9.4 – Protocols for Real-Time Conversational Applications
- RTP and SIP are both standards for real-time conversational applications and are widely implemented in industry products.
9.4.1 – RTP
- RTP, defined in RFC 3550, standardizes a packet structure that includes fields for audio/video data, a sequence number, and a timestamp, as well as other potentially useful fields.
- RTP can be used to transport common formats such as PCM, AAC, and MP3 for sound and MPEG and H.263 for video. It can also be used to transport proprietary sound and video formats.
- RTP is implemented in many products and research prototypes, and it is commonly used together with SIP.
- RTP typically runs on top of UDP.
- The sending side encapsulates a media chunk within an RTP packet, then encapsulates the packet in a UDP segment, and then hands the segment to IP.
- The receiving side extracts the RTP packet from the UDP segment, then extracts the media chunk from the RTP packet, and then passes the chunk to the media player for decoding and rendering.
- The RTP header is normally 12 bytes long.
- The media chunk (e.g., an audio chunk) together with the RTP header forms the RTP packet.
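The sender and receiver steps above can be sketched in Python with the standard `socket` and `struct` modules. This is a minimal sketch, not a full RTP stack: it assumes the 12-byte fixed header with no CSRC list, no header extension and no padding, uses payload type 0 (PCM) with a hypothetical all-zero 160-byte audio chunk, and sends over the loopback interface so both sides run in one process. The helper names `build_rtp_packet` and `extract_chunk` are illustrative, not part of any library.

```python
import socket
import struct

RTP_VERSION = 2
RTP_HEADER_LEN = 12  # fixed header: no CSRC list, no extension, no padding (assumed)


def build_rtp_packet(payload_type: int, seq: int, timestamp: int,
                     ssrc: int, chunk: bytes) -> bytes:
    """Sender side: prepend a 12-byte RTP header to a media chunk."""
    byte0 = RTP_VERSION << 6          # V=2, P=0, X=0, CC=0 -> 0x80
    byte1 = payload_type & 0x7F       # marker bit 0, then the 7-bit payload type
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + chunk


def extract_chunk(packet: bytes) -> bytes:
    """Receiver side: strip the fixed header and return the media chunk."""
    return packet[RTP_HEADER_LEN:]


# Receiver socket on loopback; port 0 lets the OS choose a free port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

chunk = bytes(160)  # hypothetical chunk: 160 one-byte PCM samples (20 ms at 8 kHz)
packet_out = build_rtp_packet(payload_type=0, seq=1, timestamp=0,
                              ssrc=0x1234ABCD, chunk=chunk)
sender.sendto(packet_out, receiver.getsockname())  # the OS adds the UDP and IP headers

packet_in, _ = receiver.recvfrom(2048)
media = extract_chunk(packet_in)  # this is what would be passed to the media player
print(len(packet_in), len(media))  # 172 160

sender.close()
receiver.close()
```

The 172-byte RTP packet is just the 12-byte header plus the 160-byte chunk; UDP and IP headers are added below it by the operating system.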
- If an application incorporates RTP (instead of a proprietary scheme), it will interoperate more easily with other networked multimedia applications.
- For example, if two different VoIP applications incorporate RTP, there may be some hope that a user of one VoIP product will be able to communicate with a user of the other.
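- RTP does not provide any mechanism to ensure timely delivery of data or to provide other quality-of-service (QoS) guarantees; it does not even guarantee packet delivery or prevent out-of-order delivery. RTP encapsulation is seen only at the end systems: routers do not distinguish IP datagrams that carry RTP packets from those that do not.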
- RTP allows each source (for example, a camera or a microphone) to be assigned its own independent RTP stream of packets.
- For example, a video conference between two participants could open four RTP streams: two for audio (one in each direction) and two for video (one in each direction).
- However, many popular encoding techniques (e.g., MPEG 1 and MPEG 2) bundle the audio and video into a single stream during the encoding process; in that case only one RTP stream is generated in each direction.
- RTP packets are not limited to unicast applications.
- RTP packets can also be sent over one-to-many and many-to-many multicast trees.
- For a many-to-many multicast session, all of the session’s senders and sources typically use the same multicast group for sending their RTP streams.
- RTP multicast streams that belong together, such as the audio and video streams from multiple senders in a video conference application, form an RTP session.
- RTP packet header fields:
- Payload type:
- 7 bits long.
- For an audio stream, the payload type field indicates the type of audio encoding being used (for a video stream, it indicates the type of video encoding). If a sender decides to change the encoding in the middle of a session, it can inform the receiver of the change through this field.
- Sequence number:
- 16 bits long.
- Increments by one for each packet sent.
- The receiver can use it to detect packet loss and to restore the packet sequence.
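As a sketch of how a receiver might use the sequence number, the functions below count missing packets and put buffered packets back in order. Because the field is 16 bits, the counter wraps from 65535 to 0, so the arithmetic is done modulo 2^16. The function names are hypothetical, and the sketch assumes no duplicate packets; `count_lost` additionally assumes in-order arrival.

```python
SEQ_MOD = 1 << 16  # sequence numbers are 16 bits, so they wrap from 65535 to 0


def seq_gap(prev_seq: int, seq: int) -> int:
    """Packets missing between two consecutive arrivals (assumes in order, no duplicates)."""
    return (seq - prev_seq - 1) % SEQ_MOD


def count_lost(received: list[int]) -> int:
    """Total packets lost across a run of in-order arrivals."""
    return sum(seq_gap(a, b) for a, b in zip(received, received[1:]))


def restore_order(packets: list[tuple[int, bytes]],
                  first_seq: int) -> list[tuple[int, bytes]]:
    """Sort (seq, chunk) pairs by their distance from first_seq, modulo 2**16.

    The modular distance keeps 0 after 65535 when the counter wraps;
    it assumes first_seq is the earliest sequence number in the buffer.
    """
    return sorted(packets, key=lambda p: (p[0] - first_seq) % SEQ_MOD)


print(count_lost([65533, 65534, 1, 2, 4]))  # 3 (65535, 0 and 3 are missing)

buffered = [(1, b"c"), (65535, b"a"), (0, b"b")]  # arrived out of order
print([seq for seq, _ in restore_order(buffered, first_seq=65535)])  # [65535, 0, 1]
```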
- Timestamp field:
- 32 bits long.
- It reflects the sampling instant of the first byte in the RTP data packet.
- The receiver can use it to remove packet jitter introduced by the network and to provide synchronous playout.
- The timestamp is derived from the sampling clock at the sender.
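For example, with an 8 kHz audio sampling clock the timestamp advances by one per sampling period (125 µs); if each packet carries 160 samples, the timestamp grows by 160 per packet. The sketch below assumes such a stream and uses one simple jitter-removal strategy, a hypothetical fixed playout delay of 100 ms: each chunk is scheduled from its timestamp rather than its arrival time, and a chunk arriving after its scheduled time counts as late. The constants, arrival times and function names are illustrative assumptions.

```python
CLOCK_RATE = 8000         # Hz: audio sampling clock (assumed)
SAMPLES_PER_PACKET = 160  # 20 ms of audio per packet (assumed)
PLAYOUT_DELAY = 0.100     # seconds: hypothetical fixed playout delay
TS_MOD = 1 << 32          # the timestamp field is 32 bits


def sender_timestamps(n_packets: int, initial: int = 0) -> list[int]:
    """Timestamps a sender would put on n consecutive packets."""
    return [(initial + i * SAMPLES_PER_PACKET) % TS_MOD for i in range(n_packets)]


def playout_times(timestamps: list[int], first_arrival: float) -> list[float]:
    """Schedule each chunk at first_arrival + delay + its media-time offset.

    Only the first arrival time is used: later packets are placed by their
    timestamps, so variation in network delay does not change playout spacing.
    """
    t0 = timestamps[0]
    return [first_arrival + PLAYOUT_DELAY + ((ts - t0) % TS_MOD) / CLOCK_RATE
            for ts in timestamps]


ts = sender_timestamps(4)
print(ts)  # [0, 160, 320, 480]

arrivals = [0.050, 0.081, 0.089, 0.230]  # hypothetical jittery arrival times (s)
schedule = playout_times(ts, arrivals[0])
late = [a > s for a, s in zip(arrivals, schedule)]
print([round(s, 3) for s in schedule])  # [0.15, 0.17, 0.19, 0.21]
print(late)  # [False, False, False, True]
```

Even though the gaps between arrivals vary, the scheduled playout times stay exactly 20 ms apart. Only the last packet, which arrives after its slot, would be treated as lost.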
- Synchronization source identifier (SSRC):
- 32 bits long.
- It identifies the source of the RTP stream.
- Typically, each stream in an RTP session has a distinct SSRC.
- It is not the IP address of the sender, but a number that the source assigns randomly when a new stream is started.
- The probability that two streams are assigned the same SSRC is very small. Should it happen, the two sources pick a new SSRC value.
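To tie the field descriptions together, here is a sketch that unpacks the 12-byte fixed header with Python's `struct` module and draws a random 32-bit SSRC with the `secrets` module. It assumes the packet has no CSRC list or header extension; the `RtpHeader` class and the `parse_rtp_header` and `new_ssrc` functions are illustrative names. The bit layout follows the RFC 3550 fixed header: version (2 bits), padding, extension, CSRC count, marker, payload type (7 bits), sequence number (16 bits), timestamp (32 bits), SSRC (32 bits).

```python
import secrets
import struct
from dataclasses import dataclass


@dataclass
class RtpHeader:
    version: int
    marker: int
    payload_type: int  # 7 bits
    sequence: int      # 16 bits
    timestamp: int     # 32 bits
    ssrc: int          # 32 bits


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed 12-byte header (assumes no CSRC list or extension)."""
    if len(packet) < 12:
        raise ValueError("shorter than the 12-byte fixed RTP header")
    byte0, byte1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=byte0 >> 6,         # top 2 bits of the first byte
        marker=byte1 >> 7,          # top bit of the second byte
        payload_type=byte1 & 0x7F,  # remaining 7 bits
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


def new_ssrc() -> int:
    """Random 32-bit identifier, chosen when a new stream starts."""
    return secrets.randbits(32)


# A hand-built header: V=2, payload type 14 (MPEG audio), seq 7, timestamp 3200.
ssrc = new_ssrc()
raw = struct.pack("!BBHII", 0x80, 14, 7, 3200, ssrc) + b"payload"
hdr = parse_rtp_header(raw)
print(hdr.version, hdr.payload_type, hdr.sequence, hdr.timestamp)  # 2 14 7 3200
print(hdr.ssrc == ssrc)  # True
```

With 2^32 possible values, two independently drawn SSRCs coincide with probability 1 in 2^32 (about 2.3 × 10^-10), which is why collisions are rare.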
- Some audio payload types supported by RTP, with their payload-type numbers:
- 0 - PCM μ-law
- 1 - 1016
- 3 - GSM
- 7 - LPC
- 9 - G.722
- 14 - MPEG audio
- 15 - G.728
- Some video payload types supported by RTP, with their payload-type numbers:
- 26 - Motion JPEG
- 31 - H.261
- 32 - MPEG 1 video
- 33 - MPEG 2 video
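A receiver could map payload-type numbers to encodings with a small lookup table and, following the payload-type field description, notice when the sender switches encodings mid-session. In this sketch the numbers follow the static assignments of the RTP audio/video profile (RFC 3551) and the names follow the lists above; the table covers only these types, and `encoding_name` and `encoding_changes` are hypothetical helpers.

```python
# Payload-type numbers for the encodings listed above (not the full registry).
PAYLOAD_TYPES = {
    0: "PCM", 1: "1016", 3: "GSM", 7: "LPC", 9: "G.722",
    14: "MPEG audio", 15: "G.728",
    26: "Motion JPEG", 31: "H.261", 32: "MPEG 1 video", 33: "MPEG 2 video",
}


def encoding_name(pt: int) -> str:
    """Look up the encoding for a payload-type number."""
    return PAYLOAD_TYPES.get(pt, f"unknown ({pt})")


def encoding_changes(payload_types: list[int]) -> list[tuple[int, str]]:
    """Return (packet index, new encoding) wherever the payload type changes."""
    changes = []
    for i in range(1, len(payload_types)):
        if payload_types[i] != payload_types[i - 1]:
            changes.append((i, encoding_name(payload_types[i])))
    return changes


stream = [0, 0, 3, 3, 3, 0]  # payload-type field of successive packets from one SSRC
print(encoding_name(stream[0]))  # PCM
print(encoding_changes(stream))  # [(2, 'GSM'), (5, 'PCM')]
```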