More Organs → More Human

Stupid things I've figured out so that you don't have to.



Thursday, August 12, 2010

The move...

So, after more than a year of non-bloggery, I've decided to change things up a little bit and try out a different blogging platform. Going forward, More Organs -> More Human will be located at http://blog.bedrick.org, and will hopefully be updated slightly more often. Thanks, Blogger- it's been great!

Monday, January 28, 2008

Notes on the Cuban Internet

I was poking around O'Reilly's Radar blog, and found the following post: O'Reilly Books and the Cuban Internet. It's about the role that O'Reilly books had in the early development of the Cuban Internet. Naturally, it mentions the current state of the Internet in Cuba (short version: heavily censored). I decided to chime in with what was supposed to be a short reply, but it turned into a slightly longer blurb than I'd intended. It seemed like the sort of thing that people who know me might be interested in, so I thought I'd re-post it here. Enjoy!




I was actually in Cuba a year ago for the medical track of an IT conference, and while there I tried to learn as much as I could about the state of the Cuban Internet experience.


First, a caveat: I had the opportunity to meet many tech-savvy Cubans from many different sectors of Cuban society, all of whom were friendly and knowledgeable, and none of whom *seemed* to be pulling any wool over my eyes.... however, in Cuba, as an outsider, it is very difficult to know what's true and what isn't. Conversations between locals and foreigners are generally assumed to be under surveillance, and the locals know it and tend to choose their words carefully. There wasn't any reason for anybody to be BSing me- it's not like we were directly talking politics, or anything like that- and people were generally quite open about certain negative aspects to Internet use in Cuba... but I certainly take what I learned with a very large grain of salt, and suggest that you do the same.


From what I was able to find out, one's ability to use the Internet depends heavily on one's position in Cuban society. The nicer tourist hotels in Havana have (very, very expensive) Internet facilities for their guests to use, but I never saw any actual Cubans using them- not sure if that's because of the price, or because most Cubans aren't really allowed in the tourist hotels in downtown Havana.


Everybody I talked to was quite open about the impossibility of average Cubans having computers and Internet access in their homes-- I don't know one way or the other if that impossibility is "de jure" or "de facto", but either way, most people have no legitimate way to get online and this was common knowledge among Cuban digerati. Nearly everybody attributed this impossibility to "El Bloqueo" (the Embargo). This is not surprising: in Cuba, *everything* is blamed on the embargo... and, to be completely fair, the embargo does indeed affect Cuban society in many ways (some obvious, some not). Without getting into what is an enormously complex subject, I'll just say that it seems plausible to me that some of the limits on Cuban Internet access might indeed be caused by the embargo; however, that simply can't be the whole story. It suits two goals of the regime to limit the Internet's penetration in Cuban society: first, it allows them to control the potentially disruptive effects of the Internet; second, it gives the government one more thing to blame on the embargo.


However, many of the doctors and medical researchers I talked to reported having at least some dial-up Internet access in their homes and offices, theoretically for professional use- Cuba has done some really impressive things in terms of using computer networks for medical communication and training, and the dedication on the part of the (ludicrously underpaid and undersupplied) doctors towards their patients was impressive. Being a doctor in Cuba really isn't a 9-to-5 at all, and having access to the national medical intranet (called Infomed) from home is crucial. People being people, however, it is almost a guaranteed thing that many Cubans with legitimate Internet access in their homes are finding ways to share that access with their neighbors in some fashion.


Another way that the Cubans I met were using the Internet was via their schools. Medical students have Internet access through their universities, but only (from what I was told) via public computer labs. I was able to meet on several occasions with students from the local university's IT/CS departments. They all reported having unfettered Internet access, though they said it was slow and flaky at times. They all had Gmail accounts, and I saw them using various instant messaging clients just as obsessively as college students here in the US do- I don't think they were "Potemkin Email Accounts". Most of them used Linux in some form or another, and reported being able to access message boards without issue. I asked specifically about news sources such as the BBC-- from the conference center and from various hotels, I had been able to access it without issue. The students I talked to reported being able to access it without restriction from their school labs, but who knows if that was really the case or not- furthermore, even if it was the case, who knows what sort of logging or monitoring was being carried out by their schools!


Incidentally, I got the distinct impression that the Cuban Internet users I talked to were well aware of the fact that their online activities were almost certainly being monitored and logged, at least to some extent. Nobody came out and said it, but after a week or two in Cuba I began to be able to pick up on certain hints and cues- the Cuban people are far from stupid, and know perfectly well when they're being watched in real life... it's hard to explain, but they seemed to know the score regarding their online activities. Alternatively, I could have imagined all of it- after a little while, I began to absorb some of the paranoia that comes naturally from being in what is fundamentally a police state. A very warm and welcoming police state, with many wonderful people and many fine attributes- but a police state nonetheless. One of my traveling companions grew up in an Eastern Bloc country in the 1940s and 1950s, and said that in many ways being in Cuba felt like "being back home," so to speak, and that, in some ways, the Cubans we met had many of the same mannerisms that he remembered from his youth- always looking over their shoulder, being aware of who was around them while they were talking, etc.


I talked to several people involved in the design and operation of Infomed, and according to what they told me, there is plenty of network bandwidth *within* Cuba. There is, however, a severe bottleneck on traffic *leaving* Cuba- partially due to the sketchy nature of their connection (a couple of fiber lines to Venezuela, according to one engineer I talked to) and, presumably, partially due to whatever traffic monitoring system the Cuban government has put in place. It would not surprise me in the least to learn that the Cuban telecom authorities were using some sort of traffic shaping to prioritize tourism-related network traffic (e.g., from hotels and resorts) over Cuban traffic.


Interestingly, several engineers told me that one problem facing Cuban Internet access was that the American government was actively trying to interfere with their traffic, and that every couple of months the telecom engineers who maintained the external connection had to make some sort of routing change to get around whatever blocks the Americans had put in place. I have *absolutely* no way of knowing if this is true or not- I wouldn't be surprised either way. On the one hand, it sounds pretty far-fetched, and as near as I can tell Cuba's national pastime is to find a way to blame everything, up to and including the weather, on the embargo. On the other hand, that's *exactly* the sort of crazy and time-wasting stunt that our government would try and pull. The history of US-Cuban relations is littered with dozens of crazier and further-fetched attempts by both governments to get on each other's nerves, so who knows?


In spite of the embargo, I saw a fair amount of American tech being used- Cisco routers, HP servers, a couple of Epson scanners, and so on. According to one person I talked to, there are various Latin American resellers who sell Cuba American electronics at a significant markup. I also saw several Chinese telecom companies exhibiting their wares- apparently, much of the Cuban telecom infrastructure is built on Chinese equipment. Interestingly, I saw a surprising number of Apple machines in use, mostly by the team developing Cuba's homebrew radiology imaging system. I hadn't been expecting to see *any* Macs besides my own, so seeing five or six set up demoing a PACS application was a pleasant surprise.


Wow, this really ended up being a lot longer than I'd intended it to be. There's all kinds of stuff that I'm probably leaving out, so shoot me an email if you want to hear more. Basically, the bottom line is that Internet use in Cuba is growing, but in a very controlled and directed fashion. At some point, the floodgates will *have* to open, just as they have everywhere else in the world, and I think the government knows it... but they're delaying for as long as possible. Their reasons presumably have less to do with keeping Cubans from getting information from the outside world- there is plenty of foreign media available in Cuba, especially to people who are connected in some way to the tourism industry (which is an awful lot of people these days...). I suspect that the government's reasons for not wanting the general population to have Internet access have more to do with restricting the sorts of communications Cubans can have with one another- I don't imagine that they'd want Cubans to be able to set up Google Groups to complain about the government, or for underground groups to be able to communicate securely. To my mind, this is probably why they have restricted Internet access to individuals with at least some stake in the status quo.


However, I think the history of the Internet has taught us that betting against the free flow of information is generally a losing bet. People have a way of getting around whatever barriers are put in their way when it comes to accessing and using the Internet, and the sooner governments everywhere learn this and adapt accordingly, the happier we'll all be.

Wednesday, January 16, 2008

Notes from Macworld 2008

Live, from the Microsoft "Blogger's Lounge" at this year's Macworld Expo, some disjointed notes and observations:



  • I had absolutely no idea how many different companies were attempting to make a business out of carrying, storing, protecting, decorating, or otherwise interacting with your iPod. Seriously, it's insane. Every third or fourth booth is occupied by somebody with a new and improved carrying case.


  • Office 2008 is certainly shiny, and seems to have lots of good features. However, it's introducing a UI innovation not unlike its corresponding Windows version's "Ribbon", and the implementation is flawed. It has lots of flashy animations- when you change to a different "tab" of controls, the new controls sort of swoop in from the left. This is cute the first time, and then gets annoying. Fast. The spokespeople all seemed to agree that there is no way to disable the animations. Lame.


  • Office 2008 also introduces some basic bibliographic management features. Unfortunately, they only support three or four reference styles: Chicago, APA, and one or two others that I've forgotten about. There's no way to import or add new styles. Double lame. Apparently, you can edit existing ones, but then what are you supposed to do if you need Chicago style for something? I don't think EndNote has anything to worry about.


  • The people at OmniGroup's booth were friendly. Big congrats for winning the "Best of Show" award with OmniFocus!


  • The Nikon D3 is a seriously sexy camera. I took a picture at ISO 3200 that had about the same amount of noise as my D70 has at ISO 400. You can practically shoot in the dark with that thing.


  • The new MacBook Air is even shinier in person than in pictures. That doesn't change its constrained feature set or exorbitant price, but it's still impressive. Also, I'm guessing that the target customer for that thing isn't worried about either of those things...




More notes as they come...

Wednesday, October 24, 2007

Word of the day: Rhombencephalitis

While reading about the ongoing listeriosis outbreak in Norway, I came across an interesting new word: Rhombencephalitis. Apparently, this is an inflammation of the rhombencephalon (or hindbrain)— the part of the brain that manages our heart rate, breathing, and so on. Needless to say, this is a very, very, very bad place to have an infection. For some informative reading, check out:

Popescu, G A, Saquepee, M, Poisson, D, Prazuck, T. Treatment difficulties of a listerial rhombencephalitis in an adult patient allergic to penicillins. J Clin Pathol 2004 57: 665-666 [journal][pubmed]

Nichter, C. A., Pavlakis, S. G., Shaikh, U., Cherian, K. A., Dobrosyzcki, J., Porricolo, M. E., Chatturvedi, I. Rhombencephalitis caused by West Nile fever virus. Neurology 2000 55: 153 [journal] [pubmed]

Friday, October 19, 2007

PostgreSQL tip: Make sure your indexed columns' data types match!

Hello again, kiddies. Today's post describes a particular behavior of PostgreSQL that has bitten me on the bum more than once. Do you have any queries that should be using an index, but aren't? Read on for a possible explanation.

I've been working with the UMLS a lot lately. For those who are unfamiliar with this acronym, it stands for the "Unified Medical Language System" and is a monstrously large database containing millions of medical terms and concepts in dozens of languages. While it can be a wonderful resource, the adjectives I'd use to describe it do not include "easy to use." One of its nifty features is that, in addition to simply storing medical terms, it stores a little bit of semantic information about the terms. In other words, if I know the UMLS code number for "Weil's Disease", I can (in theory) find out that it is a type of "Leptospirosis", and that it is therefore a "Zoonotic bacterial infection" and a "Spirochetal infection".

As you might imagine, this involves writing an SQL query that JOINs data from several tables on a particular set of specified values. Now, given the size of the UMLS, some of these tables have millions of rows. As anybody out there who's done anything with SQL will tell you, you really want the database engine to do that join using an index scan (as opposed to a simple sequential scan, which would have to visit each and every one of those millions of rows). Luckily, in PostgreSQL, if you are trying to join two tables, and the column you're joining on is indexed in both of those tables, the system is usually smart enough to do an index scan, and life is good.

Note the use of that all-important word, "usually". There are certain circumstances in which, in spite of both columns being indexed, the database will insist on doing a sequential scan on one or both of the tables. You'll know when this happens because a query that should take under one second will instead take ninety (depending, of course, on how large your tables are). The first step for diagnosing a slow query should always be to use the "explain" command. When you do explain your mysteriously slow query, the odds are good that you'll see the dreaded "Seq Scan on..." listed as one of the steps instead of "Index Scan using...".
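As a minimal sketch of that first diagnostic step (the customers and orders tables and their columns are hypothetical, made up purely for illustration), the two flavors of the command look something like this:

```sql
-- Hypothetical tables and columns, for illustration only.
-- EXPLAIN prints the plan the planner would pick, without running the query.
EXPLAIN
SELECT c.name
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.order_date = '2007-10-19';

-- EXPLAIN ANALYZE actually runs the query and adds real timings and row
-- counts to each step, so be careful with statements that modify data.
EXPLAIN ANALYZE
SELECT c.name
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.order_date = '2007-10-19';
```

In the output, each "->" line is one step of the plan; a "Seq Scan on" step against a big table that you expected to be reached through an index is the thing to look for.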

There are all sorts of reasons why Postgres could be choosing not to use an index, but the reason that has gotten me most often over the years is that the columns involved in comparisons (i.e., the columns that are being specified as the join conditions) are not using the same datatype. The first time I ran into this problem, I was using two tables that were involved in storing customer shopping cart data. For some reason that I've long since forgotten, we'd used bigint (i.e., int8) as the datatype for a column in one table, and only used int (int4) for the corresponding column in the other table. Even though each table had a column index for the relevant column, Postgres was doing a sequential scan, which began to absolutely murder our query's performance once one of the tables got to having more than 100,000 rows or so.

The solution was refreshingly easy: in the query, simply cast one of the columns to the appropriate datatype. In this case, we cast the int4 column to int8, and *boom*, the query ran thousands of times faster.
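Here's a rough sketch of what that kind of cast can look like. The table and column names (carts, cart_items, cart_id, sku) are invented for this example; they are not the real schema from that project:

```sql
-- Hypothetical schema: carts.id is int8, cart_items.cart_id is int4,
-- and each of those columns has its own index.

-- Before: the join compares an int8 column with an int4 column.
SELECT c.*
FROM cart_items i
JOIN carts c ON c.id = i.cart_id
WHERE i.sku = 'ABC-123';

-- After: cast the int4 column so both sides of the comparison are int8,
-- which gives the planner a chance to use the index on carts.id.
SELECT c.*
FROM cart_items i
JOIN carts c ON c.id = i.cart_id::int8
WHERE i.sku = 'ABC-123';
```

One thing to keep in mind: as far as I can tell, once a column is wrapped in a cast like this, a plain index on that column generally can't be used for that comparison, so the cast belongs on the side whose index you don't need for that step. Run explain on both variants and see which one the planner actually likes.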

Today, in the UMLS, I found myself running into the same problem trying to do a join between mrhier (~8,000,000 rows) and mrconso (~4,000,000 rows): for some reason, the mrhier table's paui column is set to varchar(10), whereas every other column storing an AUI is varchar(9). I forced my query to cast to varchar(9), and things suddenly started using their appropriate indices. I don't know if the difference in data types is a bug in the UMLS SQL load scripts, or what; MRCOLS.RRF tells me that the PAUI column should have a maximum of 9 characters, so that spare character is almost certainly anomalous.

For example, consider the following query:

select str from mrconso where aui in (select distinct mh.paui from mrhier mh where mh.aui = 'A2883423');


Note that I'm using a subquery mostly out of convenience. explain analyze gives the following output: (Blogger's screwed up the formatting, you might need to copy and paste into Text Edit or something)



umls=# explain analyze select str from mrconso where aui in (select distinct mh.paui from mrhier mh where mh.aui = 'A2883423');
                                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash IN Join  (cost=164014.37..352398.59 rows=43224 width=32) (actual time=2008.163..50013.062 rows=2 loops=1)
   Hash Cond: (("outer".aui)::bpchar = "inner".paui)
   ->  Seq Scan on mrconso  (cost=0.00..166340.32 rows=4322332 width=63) (actual time=0.100..43613.254 rows=4322332 loops=1)
   ->  Hash  (cost=164014.36..164014.36 rows=2 width=34) (actual time=0.205..0.205 rows=0 loops=1)
         ->  Subquery Scan "IN_subquery"  (cost=163814.12..164014.36 rows=2 width=34) (actual time=0.189..0.197 rows=2 loops=1)
               ->  Unique  (cost=163814.12..164014.34 rows=2 width=34) (actual time=0.185..0.192 rows=2 loops=1)
                     ->  Sort  (cost=163814.12..163914.23 rows=40045 width=34) (actual time=0.184..0.185 rows=3 loops=1)
                           Sort Key: paui
                           ->  Index Scan using x_mrhier_aui on mrhier mh  (cost=0.00..160080.82 rows=40045 width=34) (actual time=0.075..0.096 rows=3 loops=1)
                                 Index Cond: ((aui)::text = 'A2883423'::text)
 Total runtime: 50049.380 ms



Crikey! This thing took almost a minute to run! The culprit is the "Seq Scan on mrconso" step: that dang sequential scan, which visited all 4,322,332 rows of the table. Let's change the query a little bit, and use the casting trick, like so:

select str from mrconso where aui in (select distinct mh.paui::varchar(9) from mrhier mh where mh.aui = 'A2883423');


The only change is the ::varchar(9) cast on mh.paui. Let's run explain analyze again:



                                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=164014.34..337010.71 rows=43224 width=32) (actual time=40.762..87.066 rows=2 loops=1)
   ->  Subquery Scan "IN_subquery"  (cost=164014.34..164214.59 rows=2 width=31) (actual time=0.232..0.261 rows=2 loops=1)
         ->  Unique  (cost=164014.34..164214.57 rows=2 width=34) (actual time=0.229..0.249 rows=2 loops=1)
               ->  Sort  (cost=164014.34..164114.46 rows=40045 width=34) (actual time=0.227..0.232 rows=3 loops=1)
                     Sort Key: (paui)::character varying(9)
                     ->  Index Scan using x_mrhier_aui on mrhier mh  (cost=0.00..160281.04 rows=40045 width=34) (actual time=0.091..0.111 rows=3 loops=1)
                           Index Cond: ((aui)::text = 'A2883423'::text)
   ->  Index Scan using x_mrconso_aui on mrconso  (cost=0.00..86127.91 rows=21612 width=63) (actual time=43.368..43.382 rows=1 loops=2)
         Index Cond: ((mrconso.aui)::text = ("outer".paui)::text)
 Total runtime: 87.286 ms



Look, ma, no sequential scans! Pretty remarkable performance improvement, eh? Sub-100-ms vs 50,000 ms! Amazing what a good index will do on sufficiently large tables.

So, long story short, Postgres' query planner is very picky about column data types, even down to how many characters a varchar is given. If a query is misbehaving, try experimenting with some casting and see what you get.

Update: For the sake of completeness, I should mention that the PAUI column is not supposed to be 10 characters long- it's supposed to be 9 characters long, but there's a bug in the SQL load file that the UMLS gives out wherein the column is set to 8 characters by mistake. To "fix" this bug, I set it to 10, thereby introducing this other bug. This all happened before I had found the master data dictionary for the UMLS; if I were to run into this problem today, I would be able to discover that the correct column width for PAUI is 9 characters, and none of this would have happened. However, the larger point of the article--- that it's important to JOIN on similarly-typed columns--- is still valid.
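If you'd rather track down a mismatch like this than cast around it, the declared column types can be read out of the standard information_schema.columns view. A rough sketch, using the UMLS table and column names from this post (the ALTER TABLE at the end is my assumption about how one might fix the width, not something I ran for this article):

```sql
-- List the declared type and length of the AUI columns in both tables.
SELECT table_name, column_name, data_type, character_maximum_length
FROM information_schema.columns
WHERE table_name IN ('mrhier', 'mrconso')
  AND column_name IN ('aui', 'paui')
ORDER BY table_name, column_name;

-- If the widths differ, the column itself can be changed in reasonably
-- recent versions of PostgreSQL. This should raise an error rather than
-- truncate if an existing value doesn't fit, and it may take a while on a
-- table this size, so try it on a copy first.
ALTER TABLE mrhier ALTER COLUMN paui TYPE varchar(9);
```

Fixing the column once means every later query gets matching types for free, instead of each one needing its own cast.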

Monday, September 24, 2007

Bubonic Plague and Mongolian Cooking

There exists an excellent and widely-read infectious disease notification listserv, published by the International Society for Infectious Diseases. It's named ProMED, and it covers pretty much any interesting disease outbreak going on in the world. The best part is actually the commentary--- once a report makes it onto ProMED, other readers are able to contribute their opinions and findings back to the group. The listserv's editors are all experienced infectious disease specialists, and they often offer useful commentary as well.

This afternoon's ProMED contained a fun little report of bubonic plague in Mongolia--- apparently, a 16-year-old boy was skinning a marmot his father had caught and cut his finger in the process. Marmots apparently serve as the main plague reservoir in Mongolia, and our unfortunate Mongolian teenager soon found himself infected with glandular bubonic plague. Luckily, he lived in an area with plenty of doctors and medicines, and recovered nicely.

So far, this is an interesting but not particularly funny disease report. Immediately following the initial report, however, were the ProMED editors' comments:



[Plague is endemic in Mongolian marmots and their fleas, but as roast
marmot ("boodog" in Mongolian) is a popular dish there, some unlucky
hunters catch it every year. See marmot photo at
<http://www.bobak.ru/pics/view/Marmota%20sibirica.jpg>;
and recipes at
<http://www.e-mongol.com/mongolia_culture_cooking-recipes.htm>;
- - Mod.JW]



Yup, you read that correctly: a link to actual roast marmot recipes, following a link about a case of one of the world's most feared pathogens! The recipes themselves sound pretty good--- apparently, standard marmot-roasting practice involves packing the body cavity of a cleaned and de-boned marmot with extremely hot rocks, sealing it up again, and letting it cook from within, making for very tender and tasty meat. Best of all, according to the recipe, this dish (called boodog) is a risky one to prepare: there is a chance that an improperly-packed marmot could explode during cooking due, one imagines, to gas buildup.

Three cheers for infectious disease epidemiology! One learns new things every day.

Friday, June 22, 2007

Word of the Day: Lithopedion

OK, boys and girls, it's time to learn a new word: Lithopedion. Roughly translated from the Greek, lithopedion essentially means "stone child". Apparently, some ectopic or abdominal pregnancies result in fetuses that are too large for the mother's body to reabsorb. Very, very, very rarely, in these cases, the mother's immune system walls off the necrotic fetal tissue by calcification, after which the lithopedion can remain present and undetected for decades. While hacking away at our medical image retrieval program this afternoon, we ran across this radiograph in our collection. Pretty wild, eh? Here's some further reading:

Lithopedion: laparoscopic diagnosis and removal.
Fertil Steril. 2007 May;87(5):1208-9. Epub 2007 Feb 6.
PMID: 17289039

Lithopedion presenting as intra-abdominal abscess and fecal fistula: report of a case and review of the literature.
Am Surg. 2006 Jan;72(1):77-8.
PMID: 16494190

Old abdominal pregnancy presenting as an ovarian neoplasm.
J Korean Med Sci. 2002 Apr;17(2):274-5.
PMID: 11961318

Lithopedion: a case report.
Clin Anat. 2001;14(1):52-4.
PMID: 11135399

For those of you who don't have access to medical libraries, I can pull any of these articles for you if you're interested.