More Organs → More Human

Stupid things I've figured out so that you don't have to.



Friday, April 28, 2006

Full of goo... mission goo...

Normally, I try to keep this blog relatively free of the random, stream-of-consciousness, unloading-of-random-personal-feelings family of writing. At the moment, though, I can't resist. It's a gorgeous Friday morning, it's already almost 70 degrees outside, not a cloud in the sky, and it's early enough in the day that, if I left now, I could go spend the day hiking in the Gorge and still make it back with plenty of time for dinner this evening. Instead, I'm managing, through sheer force of will, to stay in my office, at my computer, doing homework. In an hour, I'll actually go to class. I might even pay attention. Then I'll read the article for today's seminar (or at least skim the abstract while I eat lunch), go to the seminar, return to my office, and continue to work. From my window, I'll have an excellent view of the sun making its way west. Just as it's heading below the hills, I'll get on the bus and head home.


I am in full academic mission mode. I will not be deterred. I am stronger than the weather. I can stay focused. Just a few more weeks until summer vacation, and then I'll have all the time in the world to go hiking. Back to proofreading.


Something about this sort of weather, though, makes me really, really, really, really, really not want to stay inside. Yesterday, I spent the better part of the afternoon pretending to pay attention to a recorded lecture that my professor left for us to watch while simultaneously playing the "airfare game" (you know, the one where you bring up Travelocity or Expedia, enter in some far-off destination, and try to find the best fare): "Hmm... if I fly through London, stay over a night in Dhaka, and change planes in Bangkok, I can make it to Jakarta for only $1,350!" Or, "Hey, if I bump my return flight back a week, I can get to Tel Aviv and back for under $1,500!" It is for this reason that credit cards are dangerous things. I see some ridiculously huge fare on the computer screen, and I know that it would be a trivial physical motion for me to pull my credit card out of my wallet and enter the numbers into the computer. I know that, if I did that, I could then go home to get my camera, drive to the airport, get on an airplane, and in a few short hours (ok, actually, quite a few really, really, really loooong hours) find myself somewhere far away having an adventure.


For now, the simple thought of having to return to the Sisyphean treadmill of credit card payments is enough to deter me... but I know that, one of these days, I'm just going to actually do it. I'll type in the numbers, get on a plane, and find myself in a souk in Amman, or on a beach in Goa, when the thought of consequences finally crosses my mind. I'll slap my forehead, yell a Simpsonian "D'oh!", and... well, I'm not actually sure what I'll do then. I'm sure I'll figure something out... in my mental screenplay of this event, Imagination-Steve engages in some risky-but-profitable, almost-legal money-making scheme with some friendly but somewhat disreputable locals, or maybe with the unsmiling Mossad agent he met at a bar. Possibly something "import/export"-related, or perhaps involving setting up a secure communications channel with a local underground political group, or giving bioinformatics tutorials to some people who, his friends assure him, are faculty members of the local university. Heavily-armed faculty members. Some of whom are middle-aged chain-smoking guys who speak with strong Russian accents, and whose knowledge of molecular biology seems a bit more "applied" than most American biologists', and whose internet connections come over aging Chinese satphone equipment rather than over the city's telephone network. But the money is good (they offered him his choice of currencies), and it makes a great story that he can pitch to the editor he sat next to on the airplane from Lisbon.


Of course, that's just what Imagination-Screenplay-Steve would do. Real-Steve would probably finish his trip, have a great time, come back to the US, and then do a string of annoying and hassle-ridden consulting jobs until his debt to the credit card deities was paid back. That's because Real-Steve is far more sensible than Imagination-Steve... but he has a lot less fun. Of course, he's less likely to end up stabbed and left for dead on the side of the road in Kabul, or arrested and rotting in a cell in Bogota... but he's also a lot less likely to ever get a story written up in the next edition of DP.


Something about days like this, though (warm, sunny, large stacks of unpleasant homework that desperately needs doing), makes Imagination-Steve want to come out to play. Real-Steve keeps telling me that the last time I let Imagination-Steve come out to play at all, even in a limited-trial-run sort of way, I ended up puking my guts out on the side of the road in Calcutta, but for some reason even that particular episode doesn't seem so bad in retrospect. It's really just Imagination-Steve rattling his chains and telling me "C'mon, you know you sort of liked it, it wasn't that bad, it makes a great story, you got some great pictures of the whole thing, go on, drink the tap-water, what's the worst that'll happen? They can cure dysentery these days, you know that, right?"


I know, enough already. Back to proofreading my paper. We'll be back to our regularly scheduled programming of occasional random geekery soon, I promise. In fact, right after I proofread this paper, I'm planning to write up the new R trick I learned last week (definitely falls into the category of "Stupid Stuff I Figured Out So You Don't Have To", and also the "Stuff I Will Almost Certainly Forget How To Do And Spend Hours Googling For The Next Time I Need It Unless I Write It Down Somewhere" category). Maybe if I really hurry (and stop procrastinating by writing this post), I can get enough done to at least do some of my reading outside this afternoon. The ILL-demigods finally brought my books on current statistical theory of molecular recombination. Big excitement in Real-Steve-land, eh? Meanwhile, Imagination-Steve is intrepidly hiking through the mountains outside Kabul with his camera and satellite phone, getting ready to call in a story about the latest developments in poppy cultivation or some such thing. Or maybe he's met up with the airline pilots that he met in Calcutta last summer, and is on his way to Cape Verde right now. Or perhaps he's trying to convince his kidnappers that he's Canadian, not American, or that he only looks Jewish, and that he's certainly never been to Israel. Real-Steve, on the other hand, is about to pack up his laptop and head off to biophysics class. Today's topic: fluorescence, up close and personal.

Tuesday, April 11, 2006

For the record...

I would like it noted that I just wrote the following sentence for my statistics homework:

For simplicity's sake, I used R to calculate...


I could be wrong here, but I think this may be the first time in recorded history that R has ever turned out to be the simplest way to do something. Usually, when people talk about R, they use phrases like "not as bad as I'd heard it would be", or "after a few hours of screwing around with R, I finally...", or "I gave up on R because...".

For the uninitiated, R is an open-source statistics and math program, in the same general family of application as Matlab. It's modeled on S, a statistical language that dates back to the 1970s, so its syntax is kind of... odd. Once you figure it out, it's not so bad, but getting to that point can take an awfully long time, even for programmers or people who are familiar with programs like Matlab. The upshot is that it can basically do any kind of math you'll ever need, is free, and is easy-ish to integrate with other programs. Until today, I'd never found anything that I needed that it could do faster or easier than SPSS. It turns out, though, that getting SPSS to deal with, say, a 2x2 table when you've already got the data in aggregate form is, while entirely possible, pretty dang unintuitive. So unintuitive, in fact, that it was actually easier to get R to do what I needed than it was to figure out (from the SPSS documentation) how to get SPSS to do it. In this case, I needed to do what may well be the simplest statistical task involving a 2x2 table: a normal test of the equality of two proportions. In R, it was about two lines of input. In SPSS, it's a pretty convoluted process involving dummy variables, case weighting, etc.

Part of this has to do with the way that the two programs accept data input. R assumes that you'll be dumping data directly in from some sort of input file (or pipe, or whatever). As a result, its manual interface for entering data is ABSOLUTELY HORRID, and this is what usually trips new users up. The "tutorials" that come with R spend a lot of time talking about some pretty abstract aspects of its data model, and when they finally get around to showing you how to type your data in, it looks way more complex and tedious than it would be in practice. 2x2 tables, however, are one of the few cases where the usually annoying manual input process is exactly what is needed. All of R's proportion-comparison functions want their input in the form of two vectors: one for the "successes", and one for the total "tries" (recall that the statistical basis for proportion comparison is usually derived from the binomial distribution). This means that, if you already have that data, you can just type the darn numbers more or less straight into R, and it will politely give you your results.
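
To make the "two vectors" idea concrete, here's a minimal sketch of what that input can look like. The counts are made up for illustration (they aren't from my homework), and I'm assuming prop.test() with the continuity correction switched off is what you want for a plain normal test of two proportions:

```r
# Hypothetical counts, made up for illustration:
# 15 of 50 "successes" in group 1, 25 of 50 in group 2.
successes <- c(15, 25)
tries     <- c(50, 50)

# correct = FALSE turns off the continuity correction, so the reported
# X-squared statistic is the square of the usual two-proportion z statistic.
result <- prop.test(x = successes, n = tries, correct = FALSE)
print(result)

# The same z statistic worked out by hand from the pooled proportion,
# as a sanity check on what prop.test() is doing.
p_hat  <- successes / tries
pooled <- sum(successes) / sum(tries)
z <- (p_hat[1] - p_hat[2]) / sqrt(pooled * (1 - pooled) * sum(1 / tries))
```

The first three lines of actual code are all you really need; the hand calculation at the end is only there to show that the chi-squared statistic and the normal test agree.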

SPSS, on the other hand, assumes that the user will be manually entering their data in an unaggregated way. It basically pretends to be a giant spreadsheet, and does all of the data aggregation and calculation needed for proportion comparisons "behind the scenes". Unfortunately, when your data is already aggregated, there's really no immediately obvious way to get it to do anything useful. The solution actually makes a little bit of sense, but is pretty non-obvious. For the interested reader, here's how to do it:

  1. Open a new data editor.

  2. Make three numeric integer variables. The first one will be a dummy variable for your risk factor, the second will be a dummy variable for your disease, and the third will be a count.

  3. For each square in your 2x2 table, enter a new case. For the square containing the count of subjects with the risk factor and the disease, enter "1" for the risk factor dummy variable, "1" for the disease variable, and the value of that square in your table. For the square containing the count of subjects with the risk factor and without the disease, enter "1","2", and the contents of the square. Basically, the first dummy variable is for the row number, and the second is for the column number. Do this for each cell in your 2x2 table.

  4. Once you've entered your data, go to the "Data" menu and select "Weight Cases". Tell it to weight cases by your "count" variable.

  5. Now, you can go to "Analyze->Descriptive Statistics->Crosstabs" and proceed as you would if your data were entered normally.



This method should work for arbitrarily sized tables, but I know that many of the analyses that SPSS performs will only really work with a 2x2 table.
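
For comparison, the same dummy-variables-plus-count layout can be rebuilt into a table in R. This is just a sketch with made-up counts and variable names of my own choosing (risk, disease, count); xtabs() with the count on the left-hand side of the formula does roughly the job that "Weight Cases" does in SPSS:

```r
# Hypothetical 2x2 counts, made up for illustration.
cells <- data.frame(
  risk    = c(1, 1, 2, 2),    # row number: 1 = has risk factor, 2 = does not
  disease = c(1, 2, 1, 2),    # column number: 1 = has disease, 2 = does not
  count   = c(15, 35, 25, 25) # contents of each square of the table
)

# Putting count on the left of the formula sums the counts for each
# risk/disease combination, giving back the original 2x2 table.
tab <- xtabs(count ~ risk + disease, data = cells)
print(tab)

# Chi-squared test on the rebuilt table, without continuity correction.
test <- chisq.test(tab, correct = FALSE)
print(test)
```

Nothing here is specific to 2x2 tables: more rows in the data frame just give a bigger table.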

You can clearly see that, in this particular case, R was totally the easier way to go. I don't want to go spreading false R-hope around, though— for most people, most of the time, if you've got access to SPSS or SAS and know how to use them, R is probably not a good first choice for your statistical computing. If, however, you find yourself needing something a bit burlier than Excel, and don't have anything else lying around, it's definitely worth a shot.