When media (audio, video or both) is streamed in a SIP-based voice over IP call, the participants need to know the media details: transport address, transport protocol, codec, ports and other session description metadata. SDP (Session Description Protocol) is the protocol used with SIP (Session Initiation Protocol) to advertise this information.
SDP does not stream or carry the media itself, and it is not intended to support the negotiation of session content or media encodings. It is designed to be general purpose so that it can be used in a range of network environments and applications, and it can be carried by different protocols as appropriate, such as SIP and HTTP. This makes SDP forward compatible, as it can describe new media or traffic types as they appear.
The SDP Protocol explained
An SDP description can be split into three parts. The first part, the “Session description”, advertises the session details. The second part, the “Time description”, advertises timing details related to the session. The last part, the “Media description”, advertises details about the media that will be streamed in the advertised session. Below is a list of the fields used in SDP.
Session description:
- v= (protocol version)
- o= (owner/creator and session identification)
- s= (session name)
- i= (session information)*
- u= (URI of description)*
- e= (email address – contact detail)*
- p= (phone number – contact detail)*
- c= (connection information – not required if included in media description)*
- b= (session bandwidth information)*
- z= (time zone adjustments)*
- k= (encryption key)*
- a= (zero or more session attribute lines)*

Time description:
- t= (time the session is active)
- r= (repeat times)*

Media description:
- m= (media name / transport address)
- i= (media title)*
- c= (connection information – not required if included in session description)*
- b= (bandwidth information)*
- k= (encryption key)*
- a= (zero or more media attribute lines)*
* Field is optional
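To make the three-part layout concrete, here is a minimal Python sketch that groups the lines of a description into session, time and media parts. The sample description is modelled on the example in RFC 4566, and the function name `split_sdp` is hypothetical. Note that on the wire the `z=`, `k=` and `a=` session fields follow the `t=` and `r=` lines, so the sketch classifies lines by field type rather than by position.

```python
# Sample description modelled on the example in RFC 4566 (values are illustrative).
EXAMPLE_SDP = """\
v=0
o=jdoe 2890844526 2890842807 IN IP4 10.47.16.5
s=SDP Seminar
c=IN IP4 224.2.17.12/127
t=2873397496 2873404696
a=recvonly
m=audio 49170 RTP/AVP 0
m=video 51372 RTP/AVP 99
a=rtpmap:99 h263-1998/90000
"""


def split_sdp(text):
    """Group SDP lines into session, time and media parts (hypothetical helper)."""
    session, timing, media = [], [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        field_type, _, value = line.partition("=")
        if field_type == "m":
            # Every m= line opens a new media description.
            media.append([(field_type, value)])
        elif media:
            # Anything after the first m= line belongs to the latest media block.
            media[-1].append((field_type, value))
        elif field_type in ("t", "r"):
            timing.append((field_type, value))
        else:
            session.append((field_type, value))
    return session, timing, media


session, timing, media = split_sdp(EXAMPLE_SDP)
print("session fields:", [f for f, _ in session])
print("time fields:", [f for f, _ in timing])
print("media descriptions:", len(media))
```

This is only a grouping step: it does not validate field order or check that mandatory fields are present.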
Description of each field
- v=<version> – Specifies the version of the Session Description Protocol. RFC 4566 defines only one version, which is version 0. No minor versions exist.
- o=<username> <sess-id> <sess-version> <nettype> <addrtype> <unicast-address> – Details about the originator and identification of the session.
- <username> – The user’s login. It MUST NOT contain spaces.
- <sess-id> – A numeric string used as unique identifier for the session
- <sess-version> – A numeric string used as version number for this session description
- <nettype> – Text string, specifying the network type, e.g. IN for internet
- <addrtype> – Text string specifying the type of the originator’s address, e.g. IP4 or IP6
- <unicast-address> – The address of the machine from which the session originates, which can be either an FQDN or an IP address.
- s=<session name> – Only one session name can be specified per session description. It must not be empty; if the session has no name, a single space should be used as the session name.
- i=<session description> – The “i” field can be used in the session description or in a media description, but only one session-level “i” field can be specified. When used in a media description it is primarily intended for labeling media streams. It can be a human-readable description.
- u=<uri> – The URI (Uniform Resource Identifier) specified in the “u” field is a pointer to additional information about the session.
- e=<email address> – Specifies a contact email address for the person responsible for the conference.
- p=<phone-number> – Specifies a contact phone number for the person responsible for the conference.
- c=<nettype> <addrtype> <connection-address> – Connection information can be included in the session description or in a media description. A session description MUST contain either at least one “c=” field in each media description or a single “c=” field at the session level.
- <nettype> A text string describing the network type, e.g. IN for internet.
- <addrtype> A text string describing the type of the address used in connection-address, e.g. IP4 or IP6.
- <connection-address> The connection address. For an IP4 multicast session the TTL is appended after a slash, e.g. 224.2.36.42/127
- b=<bwtype>:<bandwidth> – The bandwidth field can be used in the session description, where it specifies the total bandwidth of the whole session, or in a media description, where it applies to that media stream only.
- <bwtype> The bandwidth type can be CT (conference total, an upper limit on the bandwidth to be used) or AS (application specific, i.e. the application’s own concept of maximum bandwidth).
- <bandwidth> is interpreted as kilobits per second by default.
- z=<adjustment time> <offset> <adjustment time> <offset> – When a repeated session spans a change from daylight saving time to standard time or vice versa, this field gives the times at which the adjustments happen and the offsets from the originally scheduled times.
- k=<method>:<encryption key> – If the SDP is carried over a secure and trusted channel, it can be used to convey encryption keys. A key can be specified for the whole session or for each media description.
- <method> Indicates the mechanism used to obtain the encryption key from an external source, or how the key included in the field is encoded. Several different methods exist, such as prompt and URI.
- <encryption key> The encryption key or, if URI is used as the method, the URI from which the key can be retrieved.
- a=<attribute>:<value> – Attributes may be defined at “session-level”, at “media-level” or both. Session-level attributes advertise additional information that applies to the conference as a whole. Media-level attributes are specific to the media, i.e. they advertise information about the media stream.
- t=<start-time> <stop-time> – Specifies the start and stop times for a session. If a session is active at irregular intervals, multiple time entries can be used.
- r=<repeat interval> <active duration> <offsets from start-time> – If a session is to be repeated at fixed intervals, the “r” field is used. By default all values are specified in seconds, but to make the description more compact, times can also be given in other units, such as days, hours or minutes; e.g. r=6d 2h 14m.
- m=<media> <port>/<number of ports> <proto> <fmt> – This field is used in the media description section to advertise properties of the media stream, such as the port it will be using for transmitting, the protocol used for streaming and the format or codec.
- <media> Specifies the media type; generally this can be audio, video, text etc.
- <port> The port to which the media stream will be sent. If more than one port is used, the number of ports is given after the slash, e.g. 49170/2.
- <proto> The transport protocol used for streaming, e.g. RTP/AVP (Real-time Transport Protocol with the audio/video profile).
- <fmt> The format of the media being sent, i.e. the codec in which the media is encoded. With RTP/AVP this is given as an RTP payload type number, e.g. 0 for PCMU or 3 for GSM.
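As a rough illustration of the field layouts described above, the following Python sketch parses the values of `o=`, `c=`, `r=` and `m=` lines into dictionaries. All function names and dictionary keys are hypothetical, the input values are illustrative, and the parsers assume well-formed input; they are not a complete or validating SDP implementation.

```python
# Illustrative parsers; names and dictionary keys are hypothetical.
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def parse_origin(value):
    """Split the value of an o= line into its six sub-fields."""
    username, sess_id, sess_version, nettype, addrtype, address = value.split()
    return {"username": username, "sess_id": sess_id,
            "sess_version": sess_version, "nettype": nettype,
            "addrtype": addrtype, "unicast_address": address}


def parse_connection(value):
    """Split a c= value; a TTL suffix is only read for IP4 addresses."""
    nettype, addrtype, address = value.split()
    parts = address.split("/")
    ttl = int(parts[1]) if addrtype == "IP4" and len(parts) > 1 else None
    return {"nettype": nettype, "addrtype": addrtype,
            "address": parts[0], "ttl": ttl}


def to_seconds(token):
    """Convert a time such as '2h' or '7200' to seconds."""
    if token[-1] in UNIT_SECONDS:
        return int(token[:-1]) * UNIT_SECONDS[token[-1]]
    return int(token)


def parse_repeat(value):
    """Split an r= value into interval, duration and offsets, in seconds."""
    interval, duration, *offsets = (to_seconds(t) for t in value.split())
    return {"interval": interval, "duration": duration, "offsets": offsets}


def parse_media(value):
    """Split an m= value into media type, port, port count, proto and formats."""
    media, port_spec, proto, *formats = value.split()
    port, _, count = port_spec.partition("/")
    return {"media": media, "port": int(port),
            "port_count": int(count) if count else 1,
            "proto": proto, "formats": formats}


print(parse_origin("jdoe 2890844526 2890842807 IN IP4 10.47.16.5"))
print(parse_connection("IN IP4 224.2.36.42/127"))
print(parse_repeat("6d 2h 14m"))
print(parse_media("video 49170/2 RTP/AVP 31"))
```

Returning plain dictionaries keeps the sketch short; a real parser would also need to handle the optional fields and the error cases that are skipped here.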
SDP and VoIP
In VoIP, SDP is used between two SIP entities to determine which codec they will use to stream the voice or video call. It is also used to determine other properties of the media stream. In Part 2 of this article we will see examples and explanations of how SDP is used in a SIP-based VoIP call. For more detailed information on SDP, please check RFC 4566.
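As a very simplified illustration of how two endpoints might settle on a codec, the Python sketch below picks the first format in an offered `m=` line that the local side also supports. The function name `choose_codec` and the sample values are hypothetical, and only three well-known static RTP payload types are mapped. Real SIP endpoints follow a fuller offer/answer procedure that this sketch does not attempt to reproduce.

```python
# A few static RTP payload types for audio; the table is deliberately incomplete.
STATIC_PAYLOAD_TYPES = {"0": "PCMU", "3": "GSM", "8": "PCMA"}


def choose_codec(offered_formats, supported):
    """Return (payload type, codec name) for the first offered codec we support.

    offered_formats: payload type strings, in the order of the offered m= line.
    supported: set of codec names the local endpoint can handle.
    Returns None when there is no codec in common.
    """
    for fmt in offered_formats:
        name = STATIC_PAYLOAD_TYPES.get(fmt)
        if name in supported:
            return fmt, name
    return None


# The offer lists PCMU, PCMA and GSM; the local side only supports PCMA and GSM.
offered = "audio 49170 RTP/AVP 0 8 3".split()[3:]
print(choose_codec(offered, {"PCMA", "GSM"}))
```

The offered order is treated as the preference order here, which is why PCMA is selected ahead of GSM in this example.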
Further information about SIP, SDP
- SIP Invite Header Fields Demystified
- SIP Messages (Request, Status) Explained
- Session Description Protocol – Part 1 – Birds Eye View
- Session Description Protocol – Part 2 – A deep dive
- DTMF, SIP, RFC 2833 – how they dance together