More Organs → More Human

Stupid things I've figured out so that you don't have to.



Sunday, June 26, 2005

Cisco VOIP JTAPI magic

So, for a variety of reasons, work decided about two years ago to install a Cisco VOIP telephone system in our office. One of the major selling points of Cisco's system is that it allows custom Java applications to interface with it using the JTAPI (Java Telephony API). Now, anybody who has worked with this API can tell you that it is neither simple to use nor well-documented. Cisco's implementation of the API is far from complete, and their documentation is even worse than Sun's. The problem is compounded by the fact that Cisco made all sorts of interesting custom extensions to the API, but failed to provide any useful documentation or example programs. In other words, this thing's a bitch to program for. After all sorts of drama, however, we were able to figure out what we needed to know and finish our project. In the interests of saving anybody who might be doing Cisco JTAPI development some serious pain, I'd like to put up a brief description of how to do something so basic, so simple, so crucial, and so seemingly obvious that the uninitiated might think that I'm lying when I say that Cisco gives almost no clues how to do it: stream audio from your application to a phone. (Quick disclaimer: I haven't done any hacking on Cisco's phone system in about a year, so it's quite possible that the situation has improved since I used it last.)

What's that you say? Surely, this would be one of the most obvious features that you'd want in an application that interfaces with a phone system, right? Think of all the phone-based applications you interact with on a daily basis— voicemail, the bank's auto-teller, and so on— that rely on a computer program playing audio to your phone. Pretty much any application that has to interact with a user in some way over the phone relies on media streaming. Luckily, the JTAPI contains a whole package of classes that provide these functions, so clearly the designers of the API knew that it was something people would want to do.

Cisco, however, did not see fit to implement those handy functions (the one thing that Cisco actually does document well is what is and is not implemented). Its documentation is, in fact, very sparse on the subject of just how somebody would go about streaming media over its phone system using Java. There are a few tantalizingly-named classes in the "com.cisco" section of their JTAPI implementation's Javadocs (MediaTerminal, etc), but there is basically no documentation on how to go about using them. No sample code, nothing. I know, I know— I'm crazy, to think that they'd provide sample code demonstrating how to do one of the (presumably) most common things you'd want to do with their phone system. :-)

After banging our heads against the wall for a little while, we noticed a small paragraph buried deep within the Cisco JTAPI developer's guide explaining why. See, it turns out that Cisco thinks that, contrary to the API designers' point of view, you're not actually supposed to use the JTAPI to handle the audio transport; for that, you have to use something else. Cisco doesn't tell you what else to use, however. Here's where things get really fun.

See, Cisco VOIP systems use RTP (the Real-time Transport Protocol) to handle audio transport. RTP is a UDP-based network protocol that's designed to handle all sorts of media streaming. It's a huge, gnarly, complex mess, but works very well. It turns out that if you want to send audio across your Cisco phone system, your code has to handle all of the transport. The good news is that doing this in Java is not that big of a deal, theoretically speaking, since Sun provides a massive library of classes called the Java Media Framework (JMF). The JMF is designed to handle pretty much anything you'd ever care to do with any sort of time-based media. Want to write a shoutcast-style server in Java? The JMF can do that. Want to write video-conferencing software? JMF's got you covered.
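
To make "RTP is UDP-based" a little more concrete, here is a minimal sketch of what a single RTP packet looks like on the wire, following the fixed 12-byte header layout from RFC 3550. The class name RtpPacketSketch and its build method are made up for illustration. This is not what the JMF does internally and it is not code from our project; it only shows the packet shape your stream ultimately has to produce, assuming G.711 μ-law audio (static payload type 0) and no CSRC entries or header extensions.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical helper: builds one RTP packet carrying G.711 mu-law audio. */
final class RtpPacketSketch {
    static final int HEADER_BYTES = 12;      // fixed RTP header, no CSRC entries
    static final int PAYLOAD_TYPE_PCMU = 0;  // static payload type for 8 kHz mu-law

    static byte[] build(int sequence, long timestamp, int ssrc, byte[] payload) {
        // ByteBuffer is big-endian by default, which is network byte order.
        ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES + payload.length);
        buf.put((byte) 0x80);                        // version 2, no padding, no extension, CSRC count 0
        buf.put((byte) (PAYLOAD_TYPE_PCMU & 0x7F));  // marker bit clear, 7-bit payload type
        buf.putShort((short) sequence);              // 16-bit sequence number, +1 per packet
        buf.putInt((int) timestamp);                 // 32-bit timestamp, counted in samples
        buf.putInt(ssrc);                            // identifier for this stream
        buf.put(payload);                            // the encoded audio itself
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] silence = new byte[160];
        Arrays.fill(silence, (byte) 0xFF);           // 0xFF is mu-law silence
        byte[] packet = build(1, 160L, 0x12345678, silence);
        System.out.println(packet.length);           // prints 172 (12 header + 160 payload)
    }
}
```

In principle each array returned by build could be wrapped in a java.net.DatagramPacket and sent with a DatagramSocket to the address and port you dug out of JTAPI. The JMF does all of this bookkeeping (sequence numbers, timestamps, pacing) for you, which is the reason to put up with it.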

The bad news is that, like all powerful libraries that manage extremely difficult and complex tasks, the JMF is a little bit tricky to use. OK, it's worse than that. It's really tricky to use. And the documentation's not great. OK, ok, you got me— the documentation totally sucks. Luckily, however, our problem— how to stream audio over RTP to a particular IP address— is just about the simplest thing that it is possible to do using the JMF. From the API's standpoint, what we're trying to do is a little bit like sandblasting a soup cracker. Once you've figured out how to work with the JMF, there is really only one tricky thing needed to get it to work with the Cisco VOIP system.

Before I tell you all about what that tricky step is, though, let's go over the general process for media streaming:


  1. Use JTAPI to somehow connect your application's code with a particular call.

  2. Determine the target endpoint's IP address and port number.

  3. Initiate an RTP session with that address/port.

  4. Transmit your media.

  5. Do at least one of the following:

    1. Catch when your playback is complete, and take appropriate action.

    2. Catch when the endpoint is no longer active (i.e., the user hangs up) and take appropriate action.

    3. Catch user input events (i.e., DTMF) and take appropriate action.

    4. Do whatever it is your application does.





Steps 1, 2, 5.2, and 5.3 are actually fairly easy to figure out, if you're willing to spend a ton of time digging through Cisco's JTAPI documentation. It's a pain, and is way harder than it needs to be, and will require lots of experimentation and trial-and-error, but it is at least possible. Steps 3, 4, and 5.1 are doable if you follow a similar protocol with the JMF docs, but, as it turns out, only if you happen to stumble across some serious magic.

See, doing any kind of audio streaming involves selecting a codec. When you set up your Cisco VOIP system, the administrator made some codec-related decisions. Most likely, they decided on some variant of μLAW. This is important because your JMF code will need to match up exactly with what the phone is expecting. Otherwise, bits will arrive at the phone, but no sound will come out. That's what held us up for about two weeks: we'd gotten code to catch that a call was happening, figure out the endpoint address and port info, set up a connection, and stream audio. We knew the bits were hitting the phone, because the Cisco 7940 handset has a network diagnostic mode where it can tell you how many packets have been sent or received by the phone, and the Rx number would increment as long as playback was taking place and stop as soon as the playback did.
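
For a sense of what the codec actually does to your samples, here is a sketch of the textbook G.711 μ-law encoder, which squeezes one 16-bit linear PCM sample into a single byte. MuLawSketch is a hypothetical name, and this is the standard published algorithm, not the source of the JMF's own encoder. It is only meant to show why both ends have to agree on the encoding: the bytes mean nothing unless the receiver decodes them with the same scheme.

```java
/** Hypothetical helper: textbook G.711 mu-law encoding of 16-bit linear PCM. */
final class MuLawSketch {
    private static final int BIAS = 0x84;   // 132, added before finding the segment
    private static final int CLIP = 32635;  // largest magnitude that survives the bias

    static byte encode(short pcm) {
        int sample = pcm;
        int sign = (sample >> 8) & 0x80;    // 0x80 if the sample is negative, else 0
        if (sign != 0) {
            sample = -sample;               // work on the magnitude
        }
        if (sample > CLIP) {
            sample = CLIP;
        }
        sample += BIAS;

        // The exponent is the position of the highest set bit among bits 14..7.
        int exponent = 7;
        for (int mask = 0x4000; (sample & mask) == 0 && exponent > 0; mask >>= 1) {
            exponent--;
        }
        int mantissa = (sample >> (exponent + 3)) & 0x0F;

        // mu-law bytes are transmitted with every bit inverted.
        return (byte) ~(sign | (exponent << 4) | mantissa);
    }

    public static void main(String[] args) {
        System.out.println(encode((short) 0) & 0xFF);   // prints 255: silence encodes as 0xFF
    }
}
```

One consequence worth noticing: each sample becomes exactly one byte, so at 8,000 samples per second the stream runs at 8,000 bytes per second.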

I knew it had to be some sort of codec problem, but I couldn't imagine what it might be. I'd been careful to match my code's codec setup to what I knew the network was set to. Finally, I was able to get in touch with somebody who had already solved this problem. What he told me was that we needed to configure the packet size, meaning how much audio would go in each packet over the network. Doing this meant delving a bit deeper into the JMF than we had been, and doing some pretty crazy custom codec configuration. The final solution involved a "magic number"... I still have no idea where this guy got it from, but it worked perfectly. I manually set my packets to 160 bytes in length (with 8 kHz μLAW at one byte per sample, that works out to 20ms of audio per packet), and suddenly my wav file was playing out of my handset. I swore at the time that I would get this information to somewhere publicly accessible on the internet so that nobody would ever have to spend two months of their lives banging away at such a stupid problem, so here it is. To give the code some context: in the JMF, the basic flow of audio playback is like this:

  1. Instantiate a Processor object

  2. Give it a source and a sink

  3. Configure it by invoking its configure() method

  4. Start playback



The following code snippet can be called from your ControllerListener when the ConfigureCompleteEvent fires at the end of the configuration step. It is mostly self-contained, but there is a reference to an instance-scope Processor object called mProcessor. Copy-and-pasters, beware. :-)



// Imports this method needs. The com.sun.media packages are where Sun's JMF
// reference implementation keeps these codecs; check them against your JMF jar.
import javax.media.Codec;
import javax.media.Format;
import javax.media.control.TrackControl;
import javax.media.protocol.ContentDescriptor;
import javax.media.protocol.FileTypeDescriptor;
import com.sun.media.codec.audio.rc.RCModule;
import com.sun.media.codec.audio.ulaw.JavaEncoder;
import com.sun.media.codec.audio.ulaw.Packetizer;

// Lives in the class that owns the instance-scope Processor, mProcessor.
private boolean setTracksAndCodec() {

    // Will only work for RTP.
    ContentDescriptor content = new FileTypeDescriptor(FileTypeDescriptor.RAW_RTP);
    mProcessor.setContentDescriptor(content);

    TrackControl[] track = mProcessor.getTrackControls();
    boolean encodingOk = false;

    // Go through the tracks and try to program one of them to output ulaw data.
    for (int i = 0; i < track.length; i++) {
        if (track[i].isEnabled()) {

            // Rate conversion, then ulaw encoding, then RTP packetizing.
            Codec[] ciscoCodecChain = new Codec[3];
            ciscoCodecChain[0] = new RCModule();
            ciscoCodecChain[1] = new JavaEncoder();
            ciscoCodecChain[2] = new Packetizer();
            ((Packetizer) ciscoCodecChain[2]).setPacketSize(160); // the magic happens here!!!
            try {
                track[i].setCodecChain(ciscoCodecChain);
            } catch (Exception ex) {
                System.out.println("Couldn't set codec chain: " + ex);
                System.exit(-1);
            }

            // Pick the ULAW/rtp output format if the track offers one.
            Format[] supportedFormats = track[i].getSupportedFormats();
            int formatToSet = -1;
            for (int j = 0; j < supportedFormats.length; j++) {
                if (supportedFormats[j].toString().indexOf("ULAW/rtp") >= 0) {
                    formatToSet = j;
                }
            }

            if (formatToSet >= 0) {
                track[i].setFormat(supportedFormats[formatToSet]);
                encodingOk = true;
            } else {
                // This track can't produce ULAW/rtp, so leave it out of the stream.
                track[i].setEnabled(false);
            }
        }
    }

    if (encodingOk) {
        return true;
    } else {
        System.out.println("Couldn't program any tracks, quitting.");
        return false;
    }
}
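
A note on the magic number, offered as my best reading rather than anything Cisco documents: the 160 handed to setPacketSize looks like a byte count, not a duration. μ-law audio at 8 kHz uses one byte per sample, so the arithmetic below turns 160 bytes into 20 ms of audio per packet. PacketSizeSketch and its method names are invented for this illustration, and the 8 kHz, one-byte-per-sample figures are assumptions about a standard G.711 setup.

```java
/** Hypothetical helper: converts between packet duration and packet size. */
final class PacketSizeSketch {

    /** Bytes of audio in one packet of the given duration. */
    static int bytesPerPacket(int sampleRateHz, int bytesPerSample, int packetMillis) {
        return sampleRateHz * bytesPerSample * packetMillis / 1000;
    }

    /** Duration in milliseconds of one packet of the given size. */
    static int millisPerPacket(int sampleRateHz, int bytesPerSample, int packetBytes) {
        return packetBytes * 1000 / (sampleRateHz * bytesPerSample);
    }

    public static void main(String[] args) {
        // Assumed G.711 mu-law parameters: 8000 samples per second, 1 byte per sample.
        System.out.println(bytesPerPacket(8000, 1, 20));    // prints 160
        System.out.println(millisPerPacket(8000, 1, 160));  // prints 20
    }
}
```

If your administrator picked a different codec or packetization interval, the same arithmetic should give you the number to try in place of 160.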



So, there you have it. I know this is kind of an odd thing to start off a blog with, but it seemed appropriate. It falls squarely under the banner of "Stupid things I figured out so that you don't have to". If just one person out there is spared the weeks of pain caused by this stupid problem, this post will have done its job. Enjoy!