JEP-01xx: RTSP binding

Real time streaming between XMPP entities

WARNING: This Standards-Track JEP is Experimental. Publication as a Jabber Enhancement Proposal does not imply acceptance or approval of this proposal by the Jabber Software Foundation. Implementation of the protocol described herein is NOT RECOMMENDED except in an exploratory fashion (e.g., in a proof of concept). Production systems SHOULD NOT deploy implementations of this protocol until it advances to a status of Draft.

Author Information

Justin Karneges

Email: justin@affinix.com
JID: justin@andbit.net

JEP Information

Status: Experimental
Type: Standards Track
Number: 01xx
Version: 0.1
Last Updated: 2004-03-22
JIG: Standards JIG
Dependencies: JEP-0065, RFC 2326
Supersedes: None
Superseded By: None
Short Name: rtsp

Legal Notice

This Jabber Enhancement Proposal is copyright 1999 - 2004 by the Jabber Software Foundation (JSF) and is in full conformance with the JSF's Intellectual Property Rights Policy <http://jabber.org/jsf/ipr-policy.php>. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, v1.0 or later (the latest version is presently available at <http://www.opencontent.org/openpub/>).

Relation to XMPP

The Extensible Messaging and Presence Protocol (XMPP) is defined in the XMPP Core and XMPP IM specifications contributed by the Jabber Software Foundation to the Internet Standards Process, which is managed by the Internet Engineering Task Force in accordance with RFC 2026. Any protocols defined herein have been developed outside the Internet Standards Process and are to be understood as extensions to XMPP rather than as an evolution, development, or modification of XMPP itself.

Discussion Venue

The preferred venue for discussion of this document is the Standards-JIG mailing list: <https://jabberstudio.org/mailman/listinfo/standards-jig>.


Table of Contents:
1. Introduction
2. Brief explanation of RTSP
3. Terminology
4. Protocol
5. Virtual UDP Ports
6. Resource Parameters
7. Implementation Notes
8. Security Considerations
9. IANA Considerations
10. Jabber Registrar Considerations
Notes
Revision History


1. Introduction

RFC 2326 [1] (Real Time Streaming Protocol) specifies a way of managing the transmission of real-time data. This is most useful for audio and video streaming, although RTSP can be used for any kind of content. Normally, RTSP takes place between endpoints addressed by IP. This JEP discusses a formal XMPP-RTSP binding, which acts as an abstraction so that RTSP can be performed between JIDs. It also provides a protocol for obtaining stream parameters over XMPP, something that normally would be performed using RFC 2327 [2] (SDP) over HTTP. We take this a step further by allowing the server to offer multiple stream parameter configurations that the client can choose from.

2. Brief explanation of RTSP

RTSP is a large RFC, and it is just one of many RFCs needed to do anything useful with real-time streaming. Implementors of this JEP are not expected to know every detail of RTSP, but a basic understanding of how it works is useful. Briefly, here is how RTSP works:

First, it is an HTTP-like protocol that takes place over TCP. The client makes requests to the server over RTSP for certain resources. The following is a sample, lifted from the RFC. C=client and S=server.

    C->S: SETUP rtsp://example.com/foo/bar/baz.rm RTSP/1.0
          CSeq: 302
          Transport: RTP/AVP;unicast;client_port=4588-4589

    S->C: RTSP/1.0 200 OK
          CSeq: 302
          Date: 23 Jan 1997 15:35:06 GMT
          Session: 47112344
          Transport: RTP/AVP;unicast;
            client_port=4588-4589;server_port=6256-6257

This sets up the resource identified by the "rtsp://" URL. The client might follow up with a PLAY command to start the data streaming. Notice the "client_port" and "server_port" fields, which prepare for UDP transmission using those ports. Generally, the RTSP channel is simply a control channel: the client uses it to start, pause, or stop any resources it wishes, but the actual streaming data is generally transmitted out of band. It is possible to transmit streaming data in-band over the RTSP TCP channel, but TCP is not suitable for transmitting data that requires real-time priority. For applications like voice chat, UDP is preferred.
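As an illustration of the "client_port" and "server_port" fields discussed above, the following minimal Python sketch splits a Transport header value into its parts. The function names parse_transport and port_pair are hypothetical helpers chosen for this example; they are not defined by RTSP or by this JEP, and the sketch ignores most of the Transport syntax allowed by RFC 2326.

```python
def parse_transport(value):
    """Split a Transport header value into (transport spec, parameters).

    Parameters without a value (such as "unicast") are stored as True.
    """
    parts = [p.strip() for p in value.split(";") if p.strip()]
    spec, params = parts[0], {}
    for part in parts[1:]:
        name, sep, val = part.partition("=")
        params[name] = val if sep else True
    return spec, params


def port_pair(text):
    """Turn a port range such as '4588-4589' into a (low, high) tuple."""
    low, _, high = text.partition("-")
    return int(low), int(high or low)


# The Transport value from the sample response above.
spec, params = parse_transport(
    "RTP/AVP;unicast;client_port=4588-4589;server_port=6256-6257"
)
client_ports = port_pair(params["client_port"])
server_ports = port_pair(params["server_port"])
```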

3. Terminology

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [3].

4. Protocol

RTSP normally takes place over TCP, while the actual streaming content is carried over UDP. For our specification, instead of using raw TCP/UDP, we turn to SOCKS5 Bytestreams [4] to provide authenticated TCP connections and UDP sessions between JIDs. We then use these for RTSP and streaming data, respectively.

First, the Initiator (or 'serving entity') offers a stream to the Target (or 'client entity'):

Example 1. Initiating a video stream request

<iq type='set' id='rtsp_1' to='target@jabber.org/resource' from='initiator@jabber.org/resource'>
  <query xmlns='http://jabber.org/protocol/rtsp' sid='mySID'>
    <datagrams/>
    <resource type='video' name='rtsp://example.com/foo/video'>
      <params id='0'>
        <codec name='video/JPEG'/>
        <size width='320' height='240'/>
        <framerate>15</framerate>
      </params>
      <params id='1'>
        <codec name='video/JPEG'/>
        <size width='160' height='120'/>
        <framerate>10</framerate>
      </params>
    </resource>
    <resource type='audio' name='rtsp://example.com/foo/audio'>
      <params id='0'>
        <codec name='audio/x-speex'/>
        <sample rate='22050' size='16'/>
        <channels>1</channels>
      </params>
      <params id='1'>
        <codec name='audio/GSM'/>
        <sample rate='11025' size='8'/>
        <channels>1</channels>
      </params>
    </resource>
  </query>
</iq>

The 'sid' attribute acts as a unique id for this session. Each <resource> element describes the available parameter configurations for a particular resource type. There can be multiple <resource> elements, but only one per type. In the example above, the offered stream is composed of both audio and video resources, each with multiple parameter configurations contained in <params> elements. There can be any number of <params> elements for a given resource type, and each one must have an 'id' attribute that is unique within its <resource> element for this sid (in the example, the ids '0' and '1' are reused across the video and audio resources). The <datagrams/> element indicates that the offering entity is willing to use UDP for stream transmission.
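The structure of the offer can be sketched with Python's standard xml.etree.ElementTree module. This is only an illustration: build_offer and the shape of its 'resources' argument are assumptions made for this example and are not part of the protocol. The sketch enforces the one-resource-per-type rule and checks that params ids do not repeat inside a single resource, as in Example 1.

```python
import xml.etree.ElementTree as ET

NS = "http://jabber.org/protocol/rtsp"


def build_offer(iq_id, sender, target, sid, resources, datagrams=True):
    """Build an offer <iq/> like Example 1 (hypothetical helper).

    'resources' is a list of dicts with keys 'type', 'name' and 'params';
    'params' is a list of (id, [(tag, attributes, text), ...]) tuples.
    """
    iq = ET.Element("iq", {"type": "set", "id": iq_id,
                           "to": target, "from": sender})
    query = ET.SubElement(iq, "query", {"xmlns": NS, "sid": sid})
    if datagrams:
        ET.SubElement(query, "datagrams")
    seen_types = set()
    for res in resources:
        if res["type"] in seen_types:
            raise ValueError("only one <resource> per type is allowed")
        seen_types.add(res["type"])
        node = ET.SubElement(query, "resource",
                             {"type": res["type"], "name": res["name"]})
        seen_ids = set()
        for params_id, children in res["params"]:
            if params_id in seen_ids:
                raise ValueError("duplicate params id within a resource")
            seen_ids.add(params_id)
            params = ET.SubElement(node, "params", {"id": params_id})
            for tag, attributes, text in children:
                child = ET.SubElement(params, tag, attributes)
                child.text = text
    return iq


video = {
    "type": "video",
    "name": "rtsp://example.com/foo/video",
    "params": [
        ("0", [("codec", {"name": "video/JPEG"}, None),
               ("size", {"width": "320", "height": "240"}, None),
               ("framerate", {}, "15")]),
    ],
}
offer = build_offer("rtsp_1", "initiator@jabber.org/resource",
                    "target@jabber.org/resource", "mySID", [video])
print(ET.tostring(offer, encoding="unicode"))
```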

The Target replies and chooses a suitable parameter configuration for each resource type. If the Initiator offered datagrams support and the Target wishes to take advantage of this, the <datagrams/> element MUST be present in the reply:

Example 2. Accepting a video stream request

<iq type='result' id='rtsp_1' to='initiator@jabber.org/resource' from='target@jabber.org/resource'>
  <query xmlns='http://jabber.org/protocol/rtsp' sid='mySID'>
    <datagrams/>
    <resource type='video' params='0'/>
    <resource type='audio' params='1'/>
  </query>
</iq>
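The selection made in the reply above could be sketched as follows. The helper choose_params is hypothetical, and its policy (pick the first offered configuration whose codec the Target supports, and give up if any resource has none) is an assumption made for illustration; this JEP does not prescribe how the Target chooses.

```python
import xml.etree.ElementTree as ET

NS = {"r": "http://jabber.org/protocol/rtsp"}

# A shortened version of the offer from Example 1.
OFFER_XML = """
<iq type='set' id='rtsp_1'>
  <query xmlns='http://jabber.org/protocol/rtsp' sid='mySID'>
    <datagrams/>
    <resource type='video' name='rtsp://example.com/foo/video'>
      <params id='0'><codec name='video/JPEG'/></params>
      <params id='1'><codec name='video/JPEG'/></params>
    </resource>
    <resource type='audio' name='rtsp://example.com/foo/audio'>
      <params id='0'><codec name='audio/x-speex'/></params>
      <params id='1'><codec name='audio/GSM'/></params>
    </resource>
  </query>
</iq>
"""


def choose_params(offer_iq, supported_codecs):
    """Return {resource type: params id}, or None if some resource
    offers no configuration with a supported codec (assumed policy)."""
    chosen = {}
    for res in offer_iq.findall("r:query/r:resource", NS):
        for params in res.findall("r:params", NS):
            codec = params.find("r:codec", NS)
            if codec is not None and codec.get("name") in supported_codecs:
                chosen[res.get("type")] = params.get("id")
                break
        else:
            # No usable configuration for this resource type.
            return None
    return chosen


offer_iq = ET.fromstring(OFFER_XML)
choice = choose_params(offer_iq, {"video/JPEG", "audio/GSM"})
```

With the codecs assumed here, the result corresponds to the params values shown in Example 2. A Target that finds no usable configuration would presumably refuse the request.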

If the Target is unwilling to accept the stream request, it MUST return a "Not Acceptable" error to the Initiator:

Example 3. Target refuses video stream request

<iq type='error' id='rtsp_1' to='initiator@jabber.org/resource' from='target@jabber.org/resource'>
  <error code='406' type='modify'>
    <not-acceptable xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/>
  </error>
</iq>

If the Target accepted the request, then the next step is for the Initiator to form the necessary SOCKS5 Bytestreams connections using the same 'sid' value. If datagrams support was negotiated, then the Initiator MUST first establish a UDP-based SOCKS5 Bytestreams connection with the Target. After this, or if datagrams support was not negotiated, the Initiator MUST form a TCP-based connection. Once the TCP-based connection is ready, the Target can now act as an RTSP client and the Initiator can act as an RTSP server using the established channel. The client MUST NOT issue RTSP commands to any resources other than the ones specified in the request (found in the 'name' attribute of the <resource> elements).
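The connection order and the restriction on RTSP commands described above can be sketched as two small Python functions. Both names are hypothetical, and the exact-match comparison of the request URL against the offered 'name' values is an assumption; this JEP does not say how URLs are to be compared.

```python
def connection_order(datagrams_negotiated):
    """Return the SOCKS5 Bytestreams connections to form, in order.

    The UDP-based connection comes first when datagrams were negotiated;
    the TCP-based connection is always formed.
    """
    return ["udp", "tcp"] if datagrams_negotiated else ["tcp"]


def is_permitted(request_line, offered_names):
    """Check that an RTSP request line targets an offered resource.

    'offered_names' holds the 'name' attributes of the <resource>
    elements from the original request. Exact string matching is an
    assumption made for this sketch.
    """
    parts = request_line.split()
    if len(parts) != 3 or not parts[2].startswith("RTSP/"):
        return False
    return parts[1] in offered_names


offered = {"rtsp://example.com/foo/video", "rtsp://example.com/foo/audio"}
ok = is_permitted("SETUP rtsp://example.com/foo/video RTSP/1.0", offered)
```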

5. Virtual UDP Ports

If the datagrams mode has been negotiated, implementations MUST use "virtual" ports rather than real UDP ports, to multiplex many packets over one session. When the 'SETUP' command is used, client_port and server_port from both the request and response MUST contain virtual ports. The ports chosen MUST be in the range 1-65535, and care should be taken to not use the same virtual ports for different activities within the same RTSP session.
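One way to avoid reusing a virtual port within a session is a small allocator, sketched below in Python. The VirtualPorts class and its round-robin strategy are assumptions made for illustration; this JEP only requires that ports fall in the range 1-65535 and recommends not sharing a port between activities.

```python
class VirtualPorts:
    """Hand out virtual ports in the range 1-65535 for one RTSP session."""

    MAX_PORT = 65535

    def __init__(self):
        self._in_use = set()
        self._next = 1

    def allocate(self):
        """Return a virtual port that is not currently in use."""
        for _ in range(self.MAX_PORT):
            port = self._next
            # Advance and wrap from 65535 back to 1 (0 is never used).
            self._next = port % self.MAX_PORT + 1
            if port not in self._in_use:
                self._in_use.add(port)
                return port
        raise RuntimeError("no free virtual ports in this session")

    def release(self, port):
        """Mark a virtual port as free again."""
        self._in_use.discard(port)


ports = VirtualPorts()
low, high = ports.allocate(), ports.allocate()
# Value for a SETUP request using the allocated virtual ports.
transport = "RTP/AVP;unicast;client_port=%d-%d" % (low, high)
```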

When a UDP packet is sent by either party, it MUST contain a 4-byte header (in addition to other possible headers, such as that of SOCKS5 Bytestreams), which consists of the source virtual port and then the destination virtual port of the packet, both 16-bit values in network byte order.

A programming interface that exposes Jabber-RTSP-aware UDP MUST report an available buffer space for UDP datagrams that is 4 octets smaller than the space actually provided by the operating system and, if applicable, the SOCKS5 Bytestreams layer. The difference accounts for the virtual port header.
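The 4-byte virtual port header and the reduced buffer size described in this section can be sketched with Python's struct module. The names wrap, unwrap and usable_payload_size are hypothetical; only the header layout (source port, then destination port, both 16-bit values in network byte order) comes from the text above.

```python
import struct

# Two unsigned 16-bit integers in network (big-endian) byte order.
_HEADER = struct.Struct("!HH")


def wrap(src_port, dst_port, payload):
    """Prefix a payload with the virtual port header."""
    for port in (src_port, dst_port):
        if not 1 <= port <= 65535:
            raise ValueError("virtual ports must be in the range 1-65535")
    return _HEADER.pack(src_port, dst_port) + payload


def unwrap(packet):
    """Split a packet into (source port, destination port, payload)."""
    if len(packet) < _HEADER.size:
        raise ValueError("packet is shorter than the virtual port header")
    src_port, dst_port = _HEADER.unpack_from(packet)
    return src_port, dst_port, packet[_HEADER.size:]


def usable_payload_size(underlying_space):
    """Buffer space to report to callers: 4 octets less than what the
    layers below provide, never negative."""
    return max(0, underlying_space - _HEADER.size)


packet = wrap(4588, 6256, b"abc")
src, dst, payload = unwrap(packet)
```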

6. Resource Parameters

The <params> element can have any number of configuration child elements. These elements are specific to the type of resource. All elements MUST be present unless indicated as Optional. Consult the table below:

Table 1:

Resource Type  Element    Description                                                        Example
-------------  ---------  -----------------------------------------------------------------  --------------------------------
video          codec      RFC 3555 [5] identifier of the video codec to use.                 <codec name='video/JPEG'/>
               size       The dimensions of the video frame.                                 <size width='320' height='240'/>
               framerate  The video framerate in frames per second.                          <framerate>15</framerate>
               bitrate    The target bitrate of the data in bytes per second. (Optional)     <bitrate>4096</bitrate>
               colors     Number of colors. (Optional)                                       <colors>65536</colors>
audio          codec      RFC 3555 identifier of the audio codec to use.                     <codec name='audio/GSM'/>
               sample     Audio sample rate (samples per second) and sample size (in bits).  <sample rate='22050' size='16'/>
               channels   Number of audio channels. 1=mono, 2=stereo, and so forth.          <channels>1</channels>
               bitrate    The target bitrate of the data in bytes per second. (Optional)     <bitrate>1024</bitrate>
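
The required and optional elements listed in the table can be checked mechanically. The following Python sketch is an illustration only; the REQUIRED and OPTIONAL mappings simply restate the table, and missing_elements is a hypothetical helper.

```python
# Required and optional <params> children per resource type, per Table 1.
REQUIRED = {
    "video": {"codec", "size", "framerate"},
    "audio": {"codec", "sample", "channels"},
}
OPTIONAL = {
    "video": {"bitrate", "colors"},
    "audio": {"bitrate"},
}


def missing_elements(resource_type, present):
    """Return the sorted names of required elements that are absent.

    'present' is an iterable of child element names found in <params>.
    Elements not listed in the table are ignored rather than rejected.
    """
    return sorted(REQUIRED[resource_type] - set(present))


# The first audio configuration from Example 1 has everything required.
problems = missing_elements("audio", ["codec", "sample", "channels"])
```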

7. Implementation Notes

8. Security Considerations

Section 12.39 of the RTSP RFC says that specifying a destination address other than the client's own could be used to cause denial-of-service attacks, and thus such addresses should not be allowed unless the client is authenticated. SOCKS5 Bytestreams provides authentication; therefore, specifying alternate addresses MUST always be allowed.

9. IANA Considerations

No interaction with the Internet Assigned Numbers Authority (IANA) [6] is required as a result of this JEP.

10. Jabber Registrar Considerations

The Jabber Registrar [7] shall register the 'http://jabber.org/protocol/rtsp' namespace as a result of this JEP.


Notes

1. RFC 2326: Real Time Streaming Protocol (RTSP) <http://www.ietf.org/rfc/rfc2326.txt>.

2. RFC 2327: SDP: Session Description Protocol <http://www.ietf.org/rfc/rfc2327.txt>.

3. RFC 2119: Key words for use in RFCs to Indicate Requirement Levels <http://www.ietf.org/rfc/rfc2119.txt>.

4. JEP-0065: SOCKS5 Bytestreams <http://www.jabber.org/jeps/jep-0065.html>.

5. RFC 3555: MIME Type Registration of RTP Payload Formats <http://www.ietf.org/rfc/rfc3555.txt>.

6. The Internet Assigned Numbers Authority (IANA) is the central coordinator for the assignment of unique parameter values for Internet protocols, such as port numbers and URI schemes. For further information, see <http://www.iana.org/>.

7. The Jabber Registrar maintains a list of reserved Jabber protocol namespaces as well as registries of parameters used in the context of protocols approved by the Jabber Software Foundation. For further information, see <http://www.jabber.org/registrar/>.


Revision History

Version 0.1 (2004-03-22)

Initial version. (jk)