Introducing PsiMedia
Voice (and video) chat is a feature we’ve wanted in Psi for a long time. However, implementing voice/video chat is not straightforward, and this is partly due to all of the new concepts that have to be introduced into the application in order to make it happen. Cameras, microphones, codecs, and RTP are all just very foreign to Psi. The code necessary to handle a multimedia “stack” could easily exceed the amount of code in our own IM stack! Fortunately, there are libraries out there to handle the task.
In 2004, we considered RealNetworks’ Helix framework. For receiving content, we found this framework to be quite mature. However, for transmitting content, it was clearly not designed for end-user desktop applications and was even GPL-incompatible in that scenario. Quite some work went into the Psi+Helix effort, but ultimately it was abandoned.
In 2005, we considered Google’s libjingle. We managed to get voice chat working with it, but the code never went beyond the experimental stage. This was due to the limited platform support at the time (Linux audio only at first, though Remko managed to add in Mac audio support) and libjingle’s lack of maintenance. Libjingle works as a black box, handling not only multimedia but also the Jingle protocol. Unfortunately, this meant that as the Jingle protocol changed, libjingle fell out of spec. We also felt it was a tad intrusive for libjingle to be handling XMPP stuff.
In 2006, we investigated GStreamer. This framework has proved to be the most interesting thus far, for a number of reasons. Unlike the limited libjingle black-box, GStreamer is a comprehensive and flexible multimedia framework, similar in nature to Helix. It goes further than Helix though, by offering a better API for transmitting, by being GPL-compatible throughout, and by being easier to extend. I feel confident we can accomplish everything we need with GStreamer.
Today there is Phonon, but it currently lacks input and transmission facilities. We will keep an eye on it for the future. There is also Farsight, which integrates with GStreamer. We may make use of Farsight, depending on our needs.
In any case, I’ve started a new “wrapper” project called PsiMedia. The goal of PsiMedia is to offer an API designed for the purpose of adding voice and video chat to Psi or a Psi-like client. All of the details the client does not care about will be hidden behind PsiMedia. It solves only the multimedia aspects, and not Jingle/XMPP, as I consider these two problems to be orthogonal. Currently PsiMedia wraps GStreamer, but the requirements are abstract enough that the client should not care what is actually wrapped. PsiMedia can be considered the successor of the old “Media” module I started in 2004, to wrap Helix.
Below are the requirements of the system.
What PsiMedia does:
- Tell you what audio and video devices are available.
- Tell you what audio/video modes are possible (codecs, sample rates, video resolutions, etc).
- Allow you to specify your desired modes, and the modes of the remote party, to arrive at a list of common modes.
- Capture audio/video and encode as RTP into a series of QByteArrays.
- Accept QByteArrays containing RTP, and play back any audio/video contained within.
- Play back video in a QWidget.
- Allow displaying video currently being captured (preview of yourself).
- Provide volume controls.
- Allow the backend to be separated into a plugin, so that no new compile-time dependencies are introduced to Psi.
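To make the "common modes" item concrete, here is a minimal sketch of how two mode lists might be reduced to the modes both parties support. It is written in Python for brevity and is not PsiMedia's API: the AudioMode type, the common_modes function, and the sample codec entries are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioMode:
    """Hypothetical description of one audio mode (not a PsiMedia type)."""
    codec: str        # codec name, e.g. "speex"
    sample_rate: int  # samples per second
    channels: int     # 1 = mono, 2 = stereo


def common_modes(local, remote):
    """Return the modes present in both lists, keeping local preference order."""
    remote_set = set(remote)
    return [mode for mode in local if mode in remote_set]


# Illustrative values only: what we would like to use, most preferred first...
local = [
    AudioMode("speex", 16000, 1),
    AudioMode("speex", 8000, 1),
    AudioMode("pcmu", 8000, 1),
]
# ...and what the remote party says it can handle.
remote = [
    AudioMode("pcmu", 8000, 1),
    AudioMode("speex", 8000, 1),
]

shared = common_modes(local, remote)
for mode in shared:
    print(mode.codec, mode.sample_rate, mode.channels)
```

Keeping the local ordering means the first entry of the result is the mode we prefer most among those the other side also accepts. A real negotiation would likely compare more parameters than these three.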
(RTP, by the way, is a standard packet format for transporting multimedia data in real-time. It is used by SIP, Jingle, and, well, everybody.)
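For readers who have not seen RTP before, the sketch below decodes the 12-byte fixed header that starts every RTP packet, following the field layout in RFC 3550. It is in Python purely for illustration and has nothing to do with how PsiMedia itself is implemented. The names RtpHeader and parse_rtp_header and the sample packet are made up for this example, and it ignores CSRC entries and header extensions.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int       # 2 for RFC 3550 RTP
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int  # identifies the codec in use
    sequence: int      # increments by one per packet
    timestamp: int     # media clock, in codec-specific units
    ssrc: int          # identifies the sending source


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed 12-byte RTP header at the start of a packet."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


# A hand-built sample packet: version 2, marker bit set, payload type 96
# (a value from the dynamic range), followed by dummy payload bytes.
sample = struct.pack("!BBHII", 0x80, 0x80 | 96, 1, 160, 0x1234ABCD) + b"payload"
header = parse_rtp_header(sample)
print(header)
```

Everything after the header (and after any CSRC list or extension) is the encoded audio or video payload.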
What PsiMedia does not do:
- Use the network.
- Implement Jingle or anything XMPP.
- Expose anything more than very basic multimedia details. There are no filters, no pipelines, etc.
In short, PsiMedia should make implementing voice/video chat in Psi straightforward.




Spike said,
July 12, 2008 @ 3:00 am
Good news!
By the way, how does it fit in Kopete Jingle GSoC 2008 project?
justin said,
July 12, 2008 @ 9:43 am
According to the Kopete-devel mailing list, the GSoC plan is to use Phonon, and to add any missing features to it as necessary.
However, Kopete is welcome to use PsiMedia if they choose to. It is designed to be reusable.
kael said,
November 13, 2008 @ 3:04 am
This looks great. I’m impatient to use Psi with Jingle media. Is there any planned release date for beta-testing with Psi?
BTW, have you considered VCR-like media-control capabilities (e.g. “play”, “stop”, “rewind”, etc.)?
I’ve been thinking that it might be possible to emulate media-control with Jingle DTMF and a keypad (like Jabbin’s one). Although this part would probably be implemented in Psi, perhaps media-control capabilities would be part of PsiMedia.
justin said,
November 13, 2008 @ 9:31 am
kael,
PsiMedia is meant just for live voice/video chat. I’m not sure how media-controls (rewind? :)) fit into that.
I’d say it will be January before there is anything to test in Psi. The PsiMedia demo app can be tested earlier, of course. If you’re interested in following development, see the Delta mailing list.
kael said,
November 14, 2008 @ 12:28 am
Justin,
Actually, I’d like to run a bot to broadcast media files with Jingle - similarly to the UCT IPtv Streaming Server (note that it lets you launch a video by “calling” a SIP URI), and use Psi as a simple media player, similarly to the UCT IMS Client.
I’d also like to control the stream on the server side, and DTMF looks like a good solution for that (yes, with “rewind” too). I’m thinking that there could be an XMPP extension to remote-control (future) Jingle media players, though.
But I realize that the media-control capabilities should be handled mainly by the bot and that PsiMedia would have little to do with them.
BTW, is it (or will it be) possible to display video full screen? Also, how is the quality of the video? Can it be very good?
Anyway, I’m following the list and am impatient to beta-test the Psi implementation.
justin said,
November 14, 2008 @ 1:18 am
PsiMedia does allow using a file as an input source instead of hardware, but this feature is limited and meant just for testing. Maybe once PsiMedia is ready for use, you can consider expanding on it. I suggest proposing your ideas on the mailing list.
From an API perspective, full screen should be possible by simply setting the video QWidget to full screen. However, this may not perform very well at the current time.
As for “very good” quality, I’m not sure. Certainly the Theora codec can look beautiful in an ogg stream, but I don’t know what constraints RTP/UDP imposes on quality.