mirror of
https://gitlab.freedesktop.org/pulseaudio/pulseaudio.git
synced 2025-11-14 06:59:53 -05:00
git-svn-id: file:///home/lennart/svn/public/pulseaudio/trunk@712 fefdeb5f-60dc-0310-8127-8f9354f1896f
2467 lines
104 KiB
Text
2467 lines
104 KiB
Text
|
||
|
||
|
||
|
||
|
||
|
||
Network Working Group H. Schulzrinne
|
||
Request for Comments: 3551 Columbia University
|
||
Obsoletes: 1890 S. Casner
|
||
Category: Standards Track Packet Design
|
||
July 2003
|
||
|
||
|
||
RTP Profile for Audio and Video Conferences
|
||
with Minimal Control
|
||
|
||
Status of this Memo
|
||
|
||
This document specifies an Internet standards track protocol for the
|
||
Internet community, and requests discussion and suggestions for
|
||
improvements. Please refer to the current edition of the "Internet
|
||
Official Protocol Standards" (STD 1) for the standardization state
|
||
and status of this protocol. Distribution of this memo is unlimited.
|
||
|
||
Copyright Notice
|
||
|
||
Copyright (C) The Internet Society (2003). All Rights Reserved.
|
||
|
||
Abstract
|
||
|
||
This document describes a profile called "RTP/AVP" for the use of the
|
||
real-time transport protocol (RTP), version 2, and the associated
|
||
control protocol, RTCP, within audio and video multiparticipant
|
||
conferences with minimal control. It provides interpretations of
|
||
generic fields within the RTP specification suitable for audio and
|
||
video conferences. In particular, this document defines a set of
|
||
default mappings from payload type numbers to encodings.
|
||
|
||
This document also describes how audio and video data may be carried
|
||
within RTP. It defines a set of standard encodings and their names
|
||
when used within RTP. The descriptions provide pointers to reference
|
||
implementations and the detailed standards. This document is meant
|
||
as an aid for implementors of audio, video and other real-time
|
||
multimedia applications.
|
||
|
||
This memorandum obsoletes RFC 1890. It is mostly backwards-
|
||
compatible except for functions removed because two interoperable
|
||
implementations were not found. The additions to RFC 1890 codify
|
||
existing practice in the use of payload formats under this profile
|
||
and include new payload formats defined since RFC 1890 was published.
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 1]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
Table of Contents
|
||
|
||
1. Introduction ................................................. 3
|
||
1.1 Terminology ............................................. 3
|
||
2. RTP and RTCP Packet Forms and Protocol Behavior .............. 4
|
||
3. Registering Additional Encodings ............................. 6
|
||
4. Audio ........................................................ 8
|
||
4.1 Encoding-Independent Rules .............................. 8
|
||
4.2 Operating Recommendations ............................... 9
|
||
4.3 Guidelines for Sample-Based Audio Encodings ............. 10
|
||
4.4 Guidelines for Frame-Based Audio Encodings .............. 11
|
||
4.5 Audio Encodings ......................................... 12
|
||
4.5.1 DVI4 ............................................ 13
|
||
4.5.2 G722 ............................................ 14
|
||
4.5.3 G723 ............................................ 14
|
||
4.5.4 G726-40, G726-32, G726-24, and G726-16 .......... 18
|
||
4.5.5 G728 ............................................ 19
|
||
4.5.6 G729 ............................................ 20
|
||
4.5.7 G729D and G729E ................................. 22
|
||
4.5.8 GSM ............................................. 24
|
||
4.5.9 GSM-EFR ......................................... 27
|
||
4.5.10 L8 .............................................. 27
|
||
4.5.11 L16 ............................................. 27
|
||
4.5.12 LPC ............................................. 27
|
||
4.5.13 MPA ............................................. 28
|
||
4.5.14 PCMA and PCMU ................................... 28
|
||
4.5.15 QCELP ........................................... 28
|
||
4.5.16 RED ............................................. 29
|
||
4.5.17 VDVI ............................................ 29
|
||
5. Video ........................................................ 30
|
||
5.1 CelB .................................................... 30
|
||
5.2 JPEG .................................................... 30
|
||
5.3 H261 .................................................... 30
|
||
5.4 H263 .................................................... 31
|
||
5.5 H263-1998 ............................................... 31
|
||
5.6 MPV ..................................................... 31
|
||
5.7 MP2T .................................................... 31
|
||
5.8 nv ...................................................... 32
|
||
6. Payload Type Definitions ..................................... 32
|
||
7. RTP over TCP and Similar Byte Stream Protocols ............... 34
|
||
8. Port Assignment .............................................. 34
|
||
9. Changes from RFC 1890 ........................................ 35
|
||
10. Security Considerations ...................................... 38
|
||
11. IANA Considerations .......................................... 39
|
||
12. References ................................................... 39
|
||
12.1 Normative References .................................... 39
|
||
12.2 Informative References .................................. 39
|
||
13. Current Locations of Related Resources ....................... 41
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 2]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
14. Acknowledgments .............................................. 42
|
||
15. Intellectual Property Rights Statement ....................... 43
|
||
16. Authors' Addresses ........................................... 43
|
||
17. Full Copyright Statement ..................................... 44
|
||
|
||
1. Introduction
|
||
|
||
This profile defines aspects of RTP left unspecified in the RTP
|
||
Version 2 protocol definition (RFC 3550) [1]. This profile is
|
||
intended for the use within audio and video conferences with minimal
|
||
session control. In particular, no support for the negotiation of
|
||
parameters or membership control is provided. The profile is
|
||
expected to be useful in sessions where no negotiation or membership
|
||
control are used (e.g., using the static payload types and the
|
||
membership indications provided by RTCP), but this profile may also
|
||
be useful in conjunction with a higher-level control protocol.
|
||
|
||
Use of this profile may be implicit in the use of the appropriate
|
||
applications; there may be no explicit indication by port number,
|
||
protocol identifier or the like. Applications such as session
|
||
directories may use the name for this profile specified in Section
|
||
11.
|
||
|
||
Other profiles may make different choices for the items specified
|
||
here.
|
||
|
||
This document also defines a set of encodings and payload formats for
|
||
audio and video. These payload format descriptions are included here
|
||
only as a matter of convenience since they are too small to warrant
|
||
separate documents. Use of these payload formats is NOT REQUIRED to
|
||
use this profile. Only the binding of some of the payload formats to
|
||
static payload type numbers in Tables 4 and 5 is normative.
|
||
|
||
1.1 Terminology
|
||
|
||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
|
||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
|
||
document are to be interpreted as described in RFC 2119 [2] and
|
||
indicate requirement levels for implementations compliant with this
|
||
RTP profile.
|
||
|
||
This document defines the term media type as dividing encodings of
|
||
audio and video content into three classes: audio, video and
|
||
audio/video (interleaved).
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 3]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
2. RTP and RTCP Packet Forms and Protocol Behavior
|
||
|
||
The section "RTP Profiles and Payload Format Specifications" of RFC
|
||
3550 enumerates a number of items that can be specified or modified
|
||
in a profile. This section addresses these items. Generally, this
|
||
profile follows the default and/or recommended aspects of the RTP
|
||
specification.
|
||
|
||
RTP data header: The standard format of the fixed RTP data
|
||
header is used (one marker bit).
|
||
|
||
Payload types: Static payload types are defined in Section 6.
|
||
|
||
RTP data header additions: No additional fixed fields are
|
||
appended to the RTP data header.
|
||
|
||
RTP data header extensions: No RTP header extensions are
|
||
defined, but applications operating under this profile MAY use
|
||
such extensions. Thus, applications SHOULD NOT assume that the
|
||
RTP header X bit is always zero and SHOULD be prepared to ignore
|
||
the header extension. If a header extension is defined in the
|
||
future, that definition MUST specify the contents of the first 16
|
||
bits in such a way that multiple different extensions can be
|
||
identified.
|
||
|
||
RTCP packet types: No additional RTCP packet types are defined
|
||
by this profile specification.
|
||
|
||
RTCP report interval: The suggested constants are to be used for
|
||
the RTCP report interval calculation. Sessions operating under
|
||
this profile MAY specify a separate parameter for the RTCP traffic
|
||
bandwidth rather than using the default fraction of the session
|
||
bandwidth. The RTCP traffic bandwidth MAY be divided into two
|
||
separate session parameters for those participants which are
|
||
active data senders and those which are not. Following the
|
||
recommendation in the RTP specification [1] that 1/4 of the RTCP
|
||
bandwidth be dedicated to data senders, the RECOMMENDED default
|
||
values for these two parameters would be 1.25% and 3.75%,
|
||
respectively. For a particular session, the RTCP bandwidth for
|
||
non-data-senders MAY be set to zero when operating on
|
||
unidirectional links or for sessions that don't require feedback
|
||
on the quality of reception. The RTCP bandwidth for data senders
|
||
SHOULD be kept non-zero so that sender reports can still be sent
|
||
for inter-media synchronization and to identify the source by
|
||
CNAME. The means by which the one or two session parameters for
|
||
RTCP bandwidth are specified is beyond the scope of this memo.
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 4]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
SR/RR extension: No extension section is defined for the RTCP SR
|
||
or RR packet.
|
||
|
||
SDES use: Applications MAY use any of the SDES items described
|
||
in the RTP specification. While CNAME information MUST be sent
|
||
every reporting interval, other items SHOULD only be sent every
|
||
third reporting interval, with NAME sent seven out of eight times
|
||
within that slot and the remaining SDES items cyclically taking up
|
||
the eighth slot, as defined in Section 6.2.2 of the RTP
|
||
specification. In other words, NAME is sent in RTCP packets 1, 4,
|
||
7, 10, 13, 16, 19, while, say, EMAIL is used in RTCP packet 22.
|
||
|
||
Security: The RTP default security services are also the default
|
||
under this profile.
|
||
|
||
String-to-key mapping: No mapping is specified by this profile.
|
||
|
||
Congestion: RTP and this profile may be used in the context of
|
||
enhanced network service, for example, through Integrated Services
|
||
(RFC 1633) [4] or Differentiated Services (RFC 2475) [5], or they
|
||
may be used with best effort service.
|
||
|
||
If enhanced service is being used, RTP receivers SHOULD monitor
|
||
packet loss to ensure that the service that was requested is
|
||
actually being delivered. If it is not, then they SHOULD assume
|
||
that they are receiving best-effort service and behave
|
||
accordingly.
|
||
|
||
If best-effort service is being used, RTP receivers SHOULD monitor
|
||
packet loss to ensure that the packet loss rate is within
|
||
acceptable parameters. Packet loss is considered acceptable if a
|
||
TCP flow across the same network path and experiencing the same
|
||
network conditions would achieve an average throughput, measured
|
||
on a reasonable timescale, that is not less than the RTP flow is
|
||
achieving. This condition can be satisfied by implementing
|
||
congestion control mechanisms to adapt the transmission rate (or
|
||
the number of layers subscribed for a layered multicast session),
|
||
or by arranging for a receiver to leave the session if the loss
|
||
rate is unacceptably high.
|
||
|
||
The comparison to TCP cannot be specified exactly, but is intended
|
||
as an "order-of-magnitude" comparison in timescale and throughput.
|
||
The timescale on which TCP throughput is measured is the round-
|
||
trip time of the connection. In essence, this requirement states
|
||
that it is not acceptable to deploy an application (using RTP or
|
||
any other transport protocol) on the best-effort Internet which
|
||
consumes bandwidth arbitrarily and does not compete fairly with
|
||
TCP within an order of magnitude.
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 5]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
Underlying protocol: The profile specifies the use of RTP over
|
||
unicast and multicast UDP as well as TCP. (This does not preclude
|
||
the use of these definitions when RTP is carried by other lower-
|
||
layer protocols.)
|
||
|
||
Transport mapping: The standard mapping of RTP and RTCP to
|
||
transport-level addresses is used.
|
||
|
||
Encapsulation: This profile leaves to applications the
|
||
specification of RTP encapsulation in protocols other than UDP.
|
||
|
||
3. Registering Additional Encodings
|
||
|
||
This profile lists a set of encodings, each of which is comprised of
|
||
a particular media data compression or representation plus a payload
|
||
format for encapsulation within RTP. Some of those payload formats
|
||
are specified here, while others are specified in separate RFCs. It
|
||
is expected that additional encodings beyond the set listed here will
|
||
be created in the future and specified in additional payload format
|
||
RFCs.
|
||
|
||
This profile also assigns to each encoding a short name which MAY be
|
||
used by higher-level control protocols, such as the Session
|
||
Description Protocol (SDP), RFC 2327 [6], to identify encodings
|
||
selected for a particular RTP session.
|
||
|
||
In some contexts it may be useful to refer to these encodings in the
|
||
form of a MIME content-type. To facilitate this, RFC 3555 [7]
|
||
provides registrations for all of the encodings names listed here as
|
||
MIME subtype names under the "audio" and "video" MIME types through
|
||
the MIME registration procedure as specified in RFC 2048 [8].
|
||
|
||
Any additional encodings specified for use under this profile (or
|
||
others) may also be assigned names registered as MIME subtypes with
|
||
the Internet Assigned Numbers Authority (IANA). This registry
|
||
provides a means to insure that the names assigned to the additional
|
||
encodings are kept unique. RFC 3555 specifies the information that
|
||
is required for the registration of RTP encodings.
|
||
|
||
In addition to assigning names to encodings, this profile also
|
||
assigns static RTP payload type numbers to some of them. However,
|
||
the payload type number space is relatively small and cannot
|
||
accommodate assignments for all existing and future encodings.
|
||
During the early stages of RTP development, it was necessary to use
|
||
statically assigned payload types because no other mechanism had been
|
||
specified to bind encodings to payload types. It was anticipated
|
||
that non-RTP means beyond the scope of this memo (such as directory
|
||
services or invitation protocols) would be specified to establish a
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 6]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
dynamic mapping between a payload type and an encoding. Now,
|
||
mechanisms for defining dynamic payload type bindings have been
|
||
specified in the Session Description Protocol (SDP) and in other
|
||
protocols such as ITU-T Recommendation H.323/H.245. These mechanisms
|
||
associate the registered name of the encoding/payload format, along
|
||
with any additional required parameters, such as the RTP timestamp
|
||
clock rate and number of channels, with a payload type number. This
|
||
association is effective only for the duration of the RTP session in
|
||
which the dynamic payload type binding is made. This association
|
||
applies only to the RTP session for which it is made, thus the
|
||
numbers can be re-used for different encodings in different sessions
|
||
so the number space limitation is avoided.
|
||
|
||
This profile reserves payload type numbers in the range 96-127
|
||
exclusively for dynamic assignment. Applications SHOULD first use
|
||
values in this range for dynamic payload types. Those applications
|
||
which need to define more than 32 dynamic payload types MAY bind
|
||
codes below 96, in which case it is RECOMMENDED that unassigned
|
||
payload type numbers be used first. However, the statically assigned
|
||
payload types are default bindings and MAY be dynamically bound to
|
||
new encodings if needed. Redefining payload types below 96 may cause
|
||
incorrect operation if an attempt is made to join a session without
|
||
obtaining session description information that defines the dynamic
|
||
payload types.
|
||
|
||
Dynamic payload types SHOULD NOT be used without a well-defined
|
||
mechanism to indicate the mapping. Systems that expect to
|
||
interoperate with others operating under this profile SHOULD NOT make
|
||
their own assignments of proprietary encodings to particular, fixed
|
||
payload types.
|
||
|
||
This specification establishes the policy that no additional static
|
||
payload types will be assigned beyond the ones defined in this
|
||
document. Establishing this policy avoids the problem of trying to
|
||
create a set of criteria for accepting static assignments and
|
||
encourages the implementation and deployment of the dynamic payload
|
||
type mechanisms.
|
||
|
||
The final set of static payload type assignments is provided in
|
||
Tables 4 and 5.
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 7]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
4. Audio
|
||
|
||
4.1 Encoding-Independent Rules
|
||
|
||
Since the ability to suppress silence is one of the primary
|
||
motivations for using packets to transmit voice, the RTP header
|
||
carries both a sequence number and a timestamp to allow a receiver to
|
||
distinguish between lost packets and periods of time when no data was
|
||
transmitted. Discontiguous transmission (silence suppression) MAY be
|
||
used with any audio payload format. Receivers MUST assume that
|
||
senders may suppress silence unless this is restricted by signaling
|
||
specified elsewhere. (Even if the transmitter does not suppress
|
||
silence, the receiver should be prepared to handle periods when no
|
||
data is present since packets may be lost.)
|
||
|
||
Some payload formats (see Sections 4.5.3 and 4.5.6) define a "silence
|
||
insertion descriptor" or "comfort noise" frame to specify parameters
|
||
for artificial noise that may be generated during a period of silence
|
||
to approximate the background noise at the source. For other payload
|
||
formats, a generic Comfort Noise (CN) payload format is specified in
|
||
RFC 3389 [9]. When the CN payload format is used with another
|
||
payload format, different values in the RTP payload type field
|
||
distinguish comfort-noise packets from those of the selected payload
|
||
format.
|
||
|
||
For applications which send either no packets or occasional comfort-
|
||
noise packets during silence, the first packet of a talkspurt, that
|
||
is, the first packet after a silence period during which packets have
|
||
not been transmitted contiguously, SHOULD be distinguished by setting
|
||
the marker bit in the RTP data header to one. The marker bit in all
|
||
other packets is zero. The beginning of a talkspurt MAY be used to
|
||
adjust the playout delay to reflect changing network delays.
|
||
Applications without silence suppression MUST set the marker bit to
|
||
zero.
|
||
|
||
The RTP clock rate used for generating the RTP timestamp is
|
||
independent of the number of channels and the encoding; it usually
|
||
equals the number of sampling periods per second. For N-channel
|
||
encodings, each sampling period (say, 1/8,000 of a second) generates
|
||
N samples. (This terminology is standard, but somewhat confusing, as
|
||
the total number of samples generated per second is then the sampling
|
||
rate times the channel count.)
|
||
|
||
If multiple audio channels are used, channels are numbered left-to-
|
||
right, starting at one. In RTP audio packets, information from
|
||
lower-numbered channels precedes that from higher-numbered channels.
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 8]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
For more than two channels, the convention followed by the AIFF-C
|
||
audio interchange format SHOULD be followed [3], using the following
|
||
notation, unless some other convention is specified for a particular
|
||
encoding or payload format:
|
||
|
||
l left
|
||
r right
|
||
c center
|
||
S surround
|
||
F front
|
||
R rear
|
||
|
||
channels description channel
|
||
1 2 3 4 5 6
|
||
_________________________________________________
|
||
2 stereo l r
|
||
3 l r c
|
||
4 l c r S
|
||
5 Fl Fr Fc Sl Sr
|
||
6 l lc c r rc S
|
||
|
||
Note: RFC 1890 defined two conventions for the ordering of four
|
||
audio channels. Since the ordering is indicated implicitly by
|
||
the number of channels, this was ambiguous. In this revision,
|
||
the order described as "quadrophonic" has been eliminated to
|
||
remove the ambiguity. This choice was based on the observation
|
||
that quadrophonic consumer audio format did not become popular
|
||
whereas surround-sound subsequently has.
|
||
|
||
Samples for all channels belonging to a single sampling instant MUST
|
||
be within the same packet. The interleaving of samples from
|
||
different channels depends on the encoding. General guidelines are
|
||
given in Section 4.3 and 4.4.
|
||
|
||
The sampling frequency SHOULD be drawn from the set: 8,000, 11,025,
|
||
16,000, 22,050, 24,000, 32,000, 44,100 and 48,000 Hz. (Older Apple
|
||
Macintosh computers had a native sample rate of 22,254.54 Hz, which
|
||
can be converted to 22,050 with acceptable quality by dropping 4
|
||
samples in a 20 ms frame.) However, most audio encodings are defined
|
||
for a more restricted set of sampling frequencies. Receivers SHOULD
|
||
be prepared to accept multi-channel audio, but MAY choose to only
|
||
play a single channel.
|
||
|
||
4.2 Operating Recommendations
|
||
|
||
The following recommendations are default operating parameters.
|
||
Applications SHOULD be prepared to handle other values. The ranges
|
||
given are meant to give guidance to application writers, allowing a
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 9]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
set of applications conforming to these guidelines to interoperate
|
||
without additional negotiation. These guidelines are not intended to
|
||
restrict operating parameters for applications that can negotiate a
|
||
set of interoperable parameters, e.g., through a conference control
|
||
protocol.
|
||
|
||
For packetized audio, the default packetization interval SHOULD have
|
||
a duration of 20 ms or one frame, whichever is longer, unless
|
||
otherwise noted in Table 1 (column "ms/packet"). The packetization
|
||
interval determines the minimum end-to-end delay; longer packets
|
||
introduce less header overhead but higher delay and make packet loss
|
||
more noticeable. For non-interactive applications such as lectures
|
||
or for links with severe bandwidth constraints, a higher
|
||
packetization delay MAY be used. A receiver SHOULD accept packets
|
||
representing between 0 and 200 ms of audio data. (For framed audio
|
||
encodings, a receiver SHOULD accept packets with a number of frames
|
||
equal to 200 ms divided by the frame duration, rounded up.) This
|
||
restriction allows reasonable buffer sizing for the receiver.
|
||
|
||
4.3 Guidelines for Sample-Based Audio Encodings
|
||
|
||
In sample-based encodings, each audio sample is represented by a
|
||
fixed number of bits. Within the compressed audio data, codes for
|
||
individual samples may span octet boundaries. An RTP audio packet
|
||
may contain any number of audio samples, subject to the constraint
|
||
that the number of bits per sample times the number of samples per
|
||
packet yields an integral octet count. Fractional encodings produce
|
||
less than one octet per sample.
|
||
|
||
The duration of an audio packet is determined by the number of
|
||
samples in the packet.
|
||
|
||
For sample-based encodings producing one or more octets per sample,
|
||
samples from different channels sampled at the same sampling instant
|
||
SHOULD be packed in consecutive octets. For example, for a two-
|
||
channel encoding, the octet sequence is (left channel, first sample),
|
||
(right channel, first sample), (left channel, second sample), (right
|
||
channel, second sample), .... For multi-octet encodings, octets
|
||
SHOULD be transmitted in network byte order (i.e., most significant
|
||
octet first).
|
||
|
||
The packing of sample-based encodings producing less than one octet
|
||
per sample is encoding-specific.
|
||
|
||
The RTP timestamp reflects the instant at which the first sample in
|
||
the packet was sampled, that is, the oldest information in the
|
||
packet.
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 10]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
4.4 Guidelines for Frame-Based Audio Encodings
|
||
|
||
Frame-based encodings encode a fixed-length block of audio into
|
||
another block of compressed data, typically also of fixed length.
|
||
For frame-based encodings, the sender MAY choose to combine several
|
||
such frames into a single RTP packet. The receiver can tell the
|
||
number of frames contained in an RTP packet, if all the frames have
|
||
the same length, by dividing the RTP payload length by the audio
|
||
frame size which is defined as part of the encoding. This does not
|
||
work when carrying frames of different sizes unless the frame sizes
|
||
are relatively prime. If not, the frames MUST indicate their size.
|
||
|
||
For frame-based codecs, the channel order is defined for the whole
|
||
block. That is, for two-channel audio, right and left samples SHOULD
|
||
be coded independently, with the encoded frame for the left channel
|
||
preceding that for the right channel.
|
||
|
||
All frame-oriented audio codecs SHOULD be able to encode and decode
|
||
several consecutive frames within a single packet. Since the frame
|
||
size for the frame-oriented codecs is given, there is no need to use
|
||
a separate designation for the same encoding, but with different
|
||
number of frames per packet.
|
||
|
||
RTP packets SHALL contain a whole number of frames, with frames
|
||
inserted according to age within a packet, so that the oldest frame
|
||
(to be played first) occurs immediately after the RTP packet header.
|
||
The RTP timestamp reflects the instant at which the first sample in
|
||
the first frame was sampled, that is, the oldest information in the
|
||
packet.
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 11]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
4.5 Audio Encodings
|
||
|
||
name of sampling default
|
||
encoding sample/frame bits/sample rate ms/frame ms/packet
|
||
__________________________________________________________________
|
||
DVI4 sample 4 var. 20
|
||
G722 sample 8 16,000 20
|
||
G723 frame N/A 8,000 30 30
|
||
G726-40 sample 5 8,000 20
|
||
G726-32 sample 4 8,000 20
|
||
G726-24 sample 3 8,000 20
|
||
G726-16 sample 2 8,000 20
|
||
G728 frame N/A 8,000 2.5 20
|
||
G729 frame N/A 8,000 10 20
|
||
G729D frame N/A 8,000 10 20
|
||
G729E frame N/A 8,000 10 20
|
||
GSM frame N/A 8,000 20 20
|
||
GSM-EFR frame N/A 8,000 20 20
|
||
L8 sample 8 var. 20
|
||
L16 sample 16 var. 20
|
||
LPC frame N/A 8,000 20 20
|
||
MPA frame N/A var. var.
|
||
PCMA sample 8 var. 20
|
||
PCMU sample 8 var. 20
|
||
QCELP frame N/A 8,000 20 20
|
||
VDVI sample var. var. 20
|
||
|
||
Table 1: Properties of Audio Encodings (N/A: not applicable; var.:
|
||
variable)
|
||
|
||
The characteristics of the audio encodings described in this document
|
||
are shown in Table 1; they are listed in order of their payload type
|
||
in Table 4. While most audio codecs are only specified for a fixed
|
||
sampling rate, some sample-based algorithms (indicated by an entry of
|
||
"var." in the sampling rate column of Table 1) may be used with
|
||
different sampling rates, resulting in different coded bit rates.
|
||
When used with a sampling rate other than that for which a static
|
||
payload type is defined, non-RTP means beyond the scope of this memo
|
||
MUST be used to define a dynamic payload type and MUST indicate the
|
||
selected RTP timestamp clock rate, which is usually the same as the
|
||
sampling rate for audio.
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 12]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
4.5.1 DVI4
|
||
|
||
DVI4 uses an adaptive delta pulse code modulation (ADPCM) encoding
|
||
scheme that was specified by the Interactive Multimedia Association
|
||
(IMA) as the "IMA ADPCM wave type". However, the encoding defined
|
||
here as DVI4 differs in three respects from the IMA specification:
|
||
|
||
o The RTP DVI4 header contains the predicted value rather than the
|
||
first sample value contained the IMA ADPCM block header.
|
||
|
||
o IMA ADPCM blocks contain an odd number of samples, since the first
|
||
sample of a block is contained just in the header (uncompressed),
|
||
followed by an even number of compressed samples. DVI4 has an
|
||
even number of compressed samples only, using the `predict' word
|
||
from the header to decode the first sample.
|
||
|
||
o For DVI4, the 4-bit samples are packed with the first sample in
|
||
the four most significant bits and the second sample in the four
|
||
least significant bits. In the IMA ADPCM codec, the samples are
|
||
packed in the opposite order.
|
||
|
||
Each packet contains a single DVI block. This profile only defines
|
||
the 4-bit-per-sample version, while IMA also specified a 3-bit-per-
|
||
sample encoding.
|
||
|
||
The "header" word for each channel has the following structure:
|
||
|
||
int16 predict; /* predicted value of first sample
|
||
from the previous block (L16 format) */
|
||
u_int8 index; /* current index into stepsize table */
|
||
u_int8 reserved; /* set to zero by sender, ignored by receiver */
|
||
|
||
Each octet following the header contains two 4-bit samples, thus the
|
||
number of samples per packet MUST be even because there is no means
|
||
to indicate a partially filled last octet.
|
||
|
||
Packing of samples for multiple channels is for further study.
|
||
|
||
The IMA ADPCM algorithm was described in the document IMA Recommended
|
||
Practices for Enhancing Digital Audio Compatibility in Multimedia
|
||
Systems (version 3.0). However, the Interactive Multimedia
|
||
Association ceased operations in 1997. Resources for an archived
|
||
copy of that document and a software implementation of the RTP DVI4
|
||
encoding are listed in Section 13.
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 13]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
4.5.2 G722
|
||
|
||
G722 is specified in ITU-T Recommendation G.722, "7 kHz audio-coding
|
||
within 64 kbit/s". The G.722 encoder produces a stream of octets,
|
||
each of which SHALL be octet-aligned in an RTP packet. The first bit
|
||
transmitted in the G.722 octet, which is the most significant bit of
|
||
the higher sub-band sample, SHALL correspond to the most significant
|
||
bit of the octet in the RTP packet.
|
||
|
||
Even though the actual sampling rate for G.722 audio is 16,000 Hz,
|
||
the RTP clock rate for the G722 payload format is 8,000 Hz because
|
||
that value was erroneously assigned in RFC 1890 and must remain
|
||
unchanged for backward compatibility. The octet rate or sample-pair
|
||
rate is 8,000 Hz.
|
||
|
||
4.5.3 G723
|
||
|
||
G723 is specified in ITU Recommendation G.723.1, "Dual-rate speech
|
||
coder for multimedia communications transmitting at 5.3 and 6.3
|
||
kbit/s". The G.723.1 5.3/6.3 kbit/s codec was defined by the ITU-T
|
||
as a mandatory codec for ITU-T H.324 GSTN videophone terminal
|
||
applications. The algorithm has a floating point specification in
|
||
Annex B to G.723.1, a silence compression algorithm in Annex A to
|
||
G.723.1 and a scalable channel coding scheme for wireless
|
||
applications in G.723.1 Annex C.
|
||
|
||
This Recommendation specifies a coded representation that can be used
|
||
for compressing the speech signal component of multi-media services
|
||
at a very low bit rate. Audio is encoded in 30 ms frames, with an
|
||
additional delay of 7.5 ms due to look-ahead. A G.723.1 frame can be
|
||
one of three sizes: 24 octets (6.3 kb/s frame), 20 octets (5.3 kb/s
|
||
frame), or 4 octets. These 4-octet frames are called SID frames
|
||
(Silence Insertion Descriptor) and are used to specify comfort noise
|
||
parameters. There is no restriction on how 4, 20, and 24 octet
|
||
frames are intermixed. The least significant two bits of the first
|
||
octet in the frame determine the frame size and codec type:
|
||
|
||
bits content octets/frame
|
||
00 high-rate speech (6.3 kb/s) 24
|
||
01 low-rate speech (5.3 kb/s) 20
|
||
10 SID frame 4
|
||
11 reserved
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 14]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
It is possible to switch between the two rates at any 30 ms frame
|
||
boundary. Both (5.3 kb/s and 6.3 kb/s) rates are a mandatory part of
|
||
the encoder and decoder. Receivers MUST accept both data rates and
|
||
MUST accept SID frames unless restriction of these capabilities has
|
||
been signaled. The MIME registration for G723 in RFC 3555 [7]
|
||
specifies parameters that MAY be used with MIME or SDP to restrict to
|
||
a single data rate or to restrict the use of SID frames. This coder
|
||
was optimized to represent speech with near-toll quality at the above
|
||
rates using a limited amount of complexity.
|
||
|
||
The packing of the encoded bit stream into octets and the
|
||
transmission order of the octets is specified in Rec. G.723.1 and is
|
||
the same as that produced by the G.723 C code reference
|
||
implementation. For the 6.3 kb/s data rate, this packing is
|
||
illustrated as follows, where the header (HDR) bits are always "0 0"
|
||
as shown in Fig. 1 to indicate operation at 6.3 kb/s, and the Z bit
|
||
is always set to zero. The diagrams show the bit packing in "network
|
||
byte order", also known as big-endian order. The bits of each 32-bit
|
||
word are numbered 0 to 31, with the most significant bit on the left
|
||
and numbered 0. The octets (bytes) of each word are transmitted most
|
||
significant octet first. The bits of each data field are numbered in
|
||
the order of the bit stream representation of the encoding (least
|
||
significant bit first). The vertical bars indicate the boundaries
|
||
between field fragments.
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 15]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
0 1 2 3
|
||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
| LPC |HDR| LPC | LPC | ACL0 |LPC|
|
||
| | | | | | |
|
||
|0 0 0 0 0 0|0 0|1 1 1 1 0 0 0 0|2 2 1 1 1 1 1 1|0 0 0 0 0 0|2 2|
|
||
|5 4 3 2 1 0| |3 2 1 0 9 8 7 6|1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2|
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
| ACL2 |ACL|A| GAIN0 |ACL|ACL| GAIN0 | GAIN1 |
|
||
| | 1 |C| | 3 | 2 | | |
|
||
|0 0 0 0 0|0 0|0|0 0 0 0|0 0|0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|
|
||
|4 3 2 1 0|1 0|6|3 2 1 0|1 0|6 5|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
| GAIN2 | GAIN1 | GAIN2 | GAIN3 | GRID | GAIN3 |
|
||
| | | | | | |
|
||
|0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|0 0 0 0|1 1 0 0|
|
||
|3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|3 2 1 0|1 0 9 8|
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
| MSBPOS |Z|POS| MSBPOS | POS0 |POS| POS0 |
|
||
| | | 0 | | | 1 | |
|
||
|0 0 0 0 0 0 0|0|0 0|1 1 1 0 0 0|0 0 0 0 0 0 0 0|0 0|1 1 1 1 1 1|
|
||
|6 5 4 3 2 1 0| |1 0|2 1 0 9 8 7|9 8 7 6 5 4 3 2|1 0|5 4 3 2 1 0|
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
| POS1 | POS2 | POS1 | POS2 | POS3 | POS2 |
|
||
| | | | | | |
|
||
|0 0 0 0 0 0 0 0|0 0 0 0|1 1 1 1|1 1 0 0 0 0 0 0|0 0 0 0|1 1 1 1|
|
||
|9 8 7 6 5 4 3 2|3 2 1 0|3 2 1 0|1 0 9 8 7 6 5 4|3 2 1 0|5 4 3 2|
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
| POS3 | PSIG0 |POS|PSIG2| PSIG1 | PSIG3 |PSIG2|
|
||
| | | 3 | | | | |
|
||
|1 1 0 0 0 0 0 0|0 0 0 0 0 0|1 1|0 0 0|0 0 0 0 0|0 0 0 0 0|0 0 0|
|
||
|1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2|2 1 0|4 3 2 1 0|4 3 2 1 0|5 4 3|
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|
||
Figure 1: G.723 (6.3 kb/s) bit packing
|
||
|
||
For the 5.3 kb/s data rate, the header (HDR) bits are always "0 1",
|
||
as shown in Fig. 2, to indicate operation at 5.3 kb/s.
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 16]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
0 1 2 3
|
||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
| LPC |HDR| LPC | LPC | ACL0 |LPC|
|
||
| | | | | | |
|
||
|0 0 0 0 0 0|0 1|1 1 1 1 0 0 0 0|2 2 1 1 1 1 1 1|0 0 0 0 0 0|2 2|
|
||
|5 4 3 2 1 0| |3 2 1 0 9 8 7 6|1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2|
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
| ACL2 |ACL|A| GAIN0 |ACL|ACL| GAIN0 | GAIN1 |
|
||
| | 1 |C| | 3 | 2 | | |
|
||
|0 0 0 0 0|0 0|0|0 0 0 0|0 0|0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|
|
||
|4 3 2 1 0|1 0|6|3 2 1 0|1 0|6 5|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
| GAIN2 | GAIN1 | GAIN2 | GAIN3 | GRID | GAIN3 |
|
||
| | | | | | |
|
||
|0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|0 0 0 0|1 1 0 0|
|
||
|3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|4 3 2 1|1 0 9 8|
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
| POS0 | POS1 | POS0 | POS1 | POS2 |
|
||
| | | | | |
|
||
|0 0 0 0 0 0 0 0|0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|
|
||
|7 6 5 4 3 2 1 0|3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
| POS3 | POS2 | POS3 | PSIG1 | PSIG0 | PSIG3 | PSIG2 |
|
||
| | | | | | | |
|
||
|0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0|0 0 0 0|0 0 0 0|0 0 0 0|
|
||
|3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|3 2 1 0|3 2 1 0|3 2 1 0|3 2 1 0|
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|
||
Figure 2: G.723 (5.3 kb/s) bit packing
|
||
|
||
The packing of G.723.1 SID (silence) frames, which are indicated by
|
||
the header (HDR) bits having the pattern "1 0", is depicted in Fig.
|
||
3.
|
||
|
||
0 1 2 3
|
||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
| LPC |HDR| LPC | LPC | GAIN |LPC|
|
||
| | | | | | |
|
||
|0 0 0 0 0 0|1 0|1 1 1 1 0 0 0 0|2 2 1 1 1 1 1 1|0 0 0 0 0 0|2 2|
|
||
|5 4 3 2 1 0| |3 2 1 0 9 8 7 6|1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2|
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|
||
Figure 3: G.723 SID mode bit packing
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 17]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
4.5.4 G726-40, G726-32, G726-24, and G726-16
|
||
|
||
ITU-T Recommendation G.726 describes, among others, the algorithm
|
||
recommended for conversion of a single 64 kbit/s A-law or mu-law PCM
|
||
channel encoded at 8,000 samples/sec to and from a 40, 32, 24, or 16
|
||
kbit/s channel. The conversion is applied to the PCM stream using an
|
||
Adaptive Differential Pulse Code Modulation (ADPCM) transcoding
|
||
technique. The ADPCM representation consists of a series of
|
||
codewords with a one-to-one correspondence to the samples in the PCM
|
||
stream. The G726 data rates of 40, 32, 24, and 16 kbit/s have
|
||
codewords of 5, 4, 3, and 2 bits, respectively.
|
||
|
||
The 16 and 24 kbit/s encodings do not provide toll quality speech.
|
||
They are designed for used in overloaded Digital Circuit
|
||
Multiplication Equipment (DCME). ITU-T G.726 recommends that the 16
|
||
and 24 kbit/s encodings should be alternated with higher data rate
|
||
encodings to provide an average sample size of between 3.5 and 3.7
|
||
bits per sample.
|
||
|
||
The encodings of G.726 are here denoted as G726-40, G726-32, G726-24,
|
||
and G726-16. Prior to 1990, G721 described the 32 kbit/s ADPCM
|
||
encoding, and G723 described the 40, 32, and 16 kbit/s encodings.
|
||
Thus, G726-32 designates the same algorithm as G721 in RFC 1890.
|
||
|
||
A stream of G726 codewords contains no information on the encoding
|
||
being used, therefore transitions between G726 encoding types are not
|
||
permitted within a sequence of packed codewords. Applications MUST
|
||
determine the encoding type of packed codewords from the RTP payload
|
||
identifier.
|
||
|
||
No payload-specific header information SHALL be included as part of
|
||
the audio data. A stream of G726 codewords MUST be packed into
|
||
octets as follows: the first codeword is placed into the first octet
|
||
such that the least significant bit of the codeword aligns with the
|
||
least significant bit in the octet, the second codeword is then
|
||
packed so that its least significant bit coincides with the least
|
||
significant unoccupied bit in the octet. When a complete codeword
|
||
cannot be placed into an octet, the bits overlapping the octet
|
||
boundary are placed into the least significant bits of the next
|
||
octet. Packing MUST end with a completely packed final octet. The
|
||
number of codewords packed will therefore be a multiple of 8, 2, 8,
|
||
and 4 for G726-40, G726-32, G726-24, and G726-16, respectively. An
|
||
example of the packing scheme for G726-32 codewords is as shown,
|
||
where bit 7 is the least significant bit of the first octet, and bit
|
||
A3 is the least significant bit of the first codeword:
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 18]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
0 1
|
||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
|
||
|B B B B|A A A A|D D D D|C C C C| ...
|
||
|0 1 2 3|0 1 2 3|0 1 2 3|0 1 2 3|
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
|
||
|
||
An example of the packing scheme for G726-24 codewords follows, where
|
||
again bit 7 is the least significant bit of the first octet, and bit
|
||
A2 is the least significant bit of the first codeword:
|
||
|
||
0 1 2
|
||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
|
||
|C C|B B B|A A A|F|E E E|D D D|C|H H H|G G G|F F| ...
|
||
|1 2|0 1 2|0 1 2|2|0 1 2|0 1 2|0|0 1 2|0 1 2|0 1|
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
|
||
|
||
Note that the "little-endian" direction in which samples are packed
|
||
into octets in the G726-16, -24, -32 and -40 payload formats
|
||
specified here is consistent with ITU-T Recommendation X.420, but is
|
||
the opposite of what is specified in ITU-T Recommendation I.366.2
|
||
Annex E for ATM AAL2 transport. A second set of RTP payload formats
|
||
matching the packetization of I.366.2 Annex E and identified by MIME
|
||
subtypes AAL2-G726-16, -24, -32 and -40 will be specified in a
|
||
separate document.
|
||
|
||
4.5.5 G728
|
||
|
||
G728 is specified in ITU-T Recommendation G.728, "Coding of speech at
|
||
16 kbit/s using low-delay code excited linear prediction".
|
||
|
||
A G.278 encoder translates 5 consecutive audio samples into a 10-bit
|
||
codebook index, resulting in a bit rate of 16 kb/s for audio sampled
|
||
at 8,000 samples per second. The group of five consecutive samples
|
||
is called a vector. Four consecutive vectors, labeled V1 to V4
|
||
(where V1 is to be played first by the receiver), build one G.728
|
||
frame. The four vectors of 40 bits are packed into 5 octets, labeled
|
||
B1 through B5. B1 SHALL be placed first in the RTP packet.
|
||
|
||
Referring to the figure below, the principle for bit order is
|
||
"maintenance of bit significance". Bits from an older vector are
|
||
more significant than bits from newer vectors. The MSB of the frame
|
||
goes to the MSB of B1 and the LSB of the frame goes to LSB of B5.
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 19]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
1 2 3 3
|
||
0 0 0 0 9
|
||
++++++++++++++++++++++++++++++++++++++++
|
||
<---V1---><---V2---><---V3---><---V4---> vectors
|
||
<--B1--><--B2--><--B3--><--B4--><--B5--> octets
|
||
<------------- frame 1 ---------------->
|
||
|
||
In particular, B1 contains the eight most significant bits of V1,
|
||
with the MSB of V1 being the MSB of B1. B2 contains the two least
|
||
significant bits of V1, the more significant of the two in its MSB,
|
||
and the six most significant bits of V2. B1 SHALL be placed first in
|
||
the RTP packet and B5 last.
|
||
|
||
4.5.6 G729
|
||
|
||
G729 is specified in ITU-T Recommendation G.729, "Coding of speech at
|
||
8 kbit/s using conjugate structure-algebraic code excited linear
|
||
prediction (CS-ACELP)". A reduced-complexity version of the G.729
|
||
algorithm is specified in Annex A to Rec. G.729. The speech coding
|
||
algorithms in the main body of G.729 and in G.729 Annex A are fully
|
||
interoperable with each other, so there is no need to further
|
||
distinguish between them. An implementation that signals or accepts
|
||
use of G729 payload format may implement either G.729 or G.729A
|
||
unless restricted by additional signaling specified elsewhere related
|
||
specifically to the encoding rather than the payload format. The
|
||
G.729 and G.729 Annex A codecs were optimized to represent speech
|
||
with high quality, where G.729 Annex A trades some speech quality for
|
||
an approximate 50% complexity reduction [10]. See the next Section
|
||
(4.5.7) for other data rates added in later G.729 Annexes. For all
|
||
data rates, the sampling frequency (and RTP timestamp clock rate) is
|
||
8,000 Hz.
|
||
|
||
A voice activity detector (VAD) and comfort noise generator (CNG)
|
||
algorithm in Annex B of G.729 is RECOMMENDED for digital simultaneous
|
||
voice and data applications and can be used in conjunction with G.729
|
||
or G.729 Annex A. A G.729 or G.729 Annex A frame contains 10 octets,
|
||
while the G.729 Annex B comfort noise frame occupies 2 octets.
|
||
Receivers MUST accept comfort noise frames if restriction of their
|
||
use has not been signaled. The MIME registration for G729 in RFC
|
||
3555 [7] specifies a parameter that MAY be used with MIME or SDP to
|
||
restrict the use of comfort noise frames.
|
||
|
||
A G729 RTP packet may consist of zero or more G.729 or G.729 Annex A
|
||
frames, followed by zero or one G.729 Annex B frames. The presence
|
||
of a comfort noise frame can be deduced from the length of the RTP
|
||
payload. The default packetization interval is 20 ms (two frames),
|
||
but in some situations it may be desirable to send 10 ms packets. An
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 20]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
example would be a transition from speech to comfort noise in the
|
||
first 10 ms of the packet. For some applications, a longer
|
||
packetization interval may be required to reduce the packet rate.
|
||
|
||
0 1 2 3
|
||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|L| L1 | L2 | L3 | P1 |P| C1 |
|
||
|0| | | | |0| |
|
||
| |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7| |0 1 2 3 4|
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
| C1 | S1 | GA1 | GB1 | P2 | C2 |
|
||
| 1 1 1| | | | | |
|
||
|5 6 7 8 9 0 1 2|0 1 2 3|0 1 2|0 1 2 3|0 1 2 3 4|0 1 2 3 4 5 6 7|
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
| C2 | S2 | GA2 | GB2 |
|
||
| 1 1 1| | | |
|
||
|8 9 0 1 2|0 1 2 3|0 1 2|0 1 2 3|
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|
||
Figure 4: G.729 and G.729A bit packing
|
||
|
||
The transmitted parameters of a G.729/G.729A 10-ms frame, consisting
|
||
of 80 bits, are defined in Recommendation G.729, Table 8/G.729. The
|
||
mapping of the these parameters is given below in Fig. 4. The
|
||
diagrams show the bit packing in "network byte order", also known as
|
||
big-endian order. The bits of each 32-bit word are numbered 0 to 31,
|
||
with the most significant bit on the left and numbered 0. The octets
|
||
(bytes) of each word are transmitted most significant octet first.
|
||
The bits of each data field are numbered in the order as produced by
|
||
the G.729 C code reference implementation.
|
||
|
||
The packing of the G.729 Annex B comfort noise frame is shown in Fig.
|
||
5.
|
||
|
||
0 1
|
||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|L| LSF1 | LSF2 | GAIN |R|
|
||
|S| | | |E|
|
||
|F| | | |S|
|
||
|0|0 1 2 3 4|0 1 2 3|0 1 2 3 4|V| RESV = Reserved (zero)
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|
||
Figure 5: G.729 Annex B bit packing
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 21]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
4.5.7 G729D and G729E
|
||
|
||
Annexes D and E to ITU-T Recommendation G.729 provide additional data
|
||
rates. Because the data rate is not signaled in the bitstream, the
|
||
different data rates are given distinct RTP encoding names which are
|
||
mapped to distinct payload type numbers. G729D indicates a 6.4
|
||
kbit/s coding mode (G.729 Annex D, for momentary reduction in channel
|
||
capacity), while G729E indicates an 11.8 kbit/s mode (G.729 Annex E,
|
||
for improved performance with a wide range of narrow-band input
|
||
signals, e.g., music and background noise). Annex E has two
|
||
operating modes, backward adaptive and forward adaptive, which are
|
||
signaled by the first two bits in each frame (the most significant
|
||
two bits of the first octet).
|
||
|
||
The voice activity detector (VAD) and comfort noise generator (CNG)
|
||
algorithm specified in Annex B of G.729 may be used with Annex D and
|
||
Annex E frames in addition to G.729 and G.729 Annex A frames. The
|
||
algorithm details for the operation of Annexes D and E with the Annex
|
||
B CNG are specified in G.729 Annexes F and G. Note that Annexes F
|
||
and G do not introduce any new encodings. Receivers MUST accept
|
||
comfort noise frames if restriction of their use has not been
|
||
signaled. The MIME registrations for G729D and G729E in RFC 3555 [7]
|
||
specify a parameter that MAY be used with MIME or SDP to restrict the
|
||
use of comfort noise frames.
|
||
|
||
For G729D, an RTP packet may consist of zero or more G.729 Annex D
|
||
frames, followed by zero or one G.729 Annex B frame. Similarly, for
|
||
G729E, an RTP packet may consist of zero or more G.729 Annex E
|
||
frames, followed by zero or one G.729 Annex B frame. The presence of
|
||
a comfort noise frame can be deduced from the length of the RTP
|
||
payload.
|
||
|
||
A single RTP packet must contain frames of only one data rate,
|
||
optionally followed by one comfort noise frame. The data rate may be
|
||
changed from packet to packet by changing the payload type number.
|
||
G.729 Annexes D, E and H describe what the encoding and decoding
|
||
algorithms must do to accommodate a change in data rate.
|
||
|
||
For G729D, the bits of a G.729 Annex D frame are formatted as shown
|
||
below in Fig. 6 (cf. Table D.1/G.729). The frame length is 64 bits.
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 22]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
0 1 2 3
|
||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|L| L1 | L2 | L3 | P1 | C1 |
|
||
|0| | | | | |
|
||
| |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7|0 1 2 3 4 5|
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
| C1 |S1 | GA1 | GB1 | P2 | C2 |S2 | GA2 | GB2 |
|
||
| | | | | | | | | |
|
||
|6 7 8|0 1|0 1 2|0 1 2|0 1 2 3|0 1 2 3 4 5 6 7 8|0 1|0 1 2|0 1 2|
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|
||
Figure 6: G.729 Annex D bit packing
|
||
|
||
The net bit rate for the G.729 Annex E algorithm is 11.8 kbit/s and a
|
||
total of 118 bits are used. Two bits are appended as "don't care"
|
||
bits to complete an integer number of octets for the frame. For
|
||
G729E, the bits of a data frame are formatted as shown in the next
|
||
two diagrams (cf. Table E.1/G.729). The fields for the G729E forward
|
||
adaptive mode are packed as shown in Fig. 7.
|
||
|
||
0 1 2 3
|
||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|0 0|L| L1 | L2 | L3 | P1 |P| C0_1|
|
||
| |0| | | | |0| |
|
||
| | |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7| |0 1 2|
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
| | C1_1 | C2_1 | C3_1 | C4_1 |
|
||
| | | | | |
|
||
|3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
| GA1 | GB1 | P2 | C0_2 | C1_2 | C2_2 |
|
||
| | | | | | |
|
||
|0 1 2|0 1 2 3|0 1 2 3 4|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5|
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
| | C3_2 | C4_2 | GA2 | GB2 |DC |
|
||
| | | | | | |
|
||
|6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2|0 1 2 3|0 1|
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|
||
Figure 7: G.729 Annex E (forward adaptive mode) bit packing
|
||
|
||
The fields for the G729E backward adaptive mode are packed as shown
|
||
in Fig. 8.
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 23]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
0 1 2 3
|
||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|1 1| P1 |P| C0_1 | C1_1 |
|
||
| | |0| 1 1 1| |
|
||
| |0 1 2 3 4 5 6 7|0|0 1 2 3 4 5 6 7 8 9 0 1 2|0 1 2 3 4 5 6 7|
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
| | C2_1 | C3_1 | C4_1 |GA1 | GB1 |P2 |
|
||
| | | | | | | |
|
||
|8 9|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2|0 1 2 3|0 1|
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
| | C0_2 | C1_2 | C2_2 |
|
||
| | 1 1 1| | |
|
||
|2 3 4|0 1 2 3 4 5 6 7 8 9 0 1 2|0 1 2 3 4 5 6 7 8 9|0 1 2 3 4 5|
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
| | C3_2 | C4_2 | GA2 | GB2 |DC |
|
||
| | | | | | |
|
||
|6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2|0 1 2 3|0 1|
|
||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|
||
Figure 8: G.729 Annex E (backward adaptive mode) bit packing
|
||
|
||
4.5.8 GSM
|
||
|
||
GSM (Group Speciale Mobile) denotes the European GSM 06.10 standard
|
||
for full-rate speech transcoding, ETS 300 961, which is based on
|
||
RPE/LTP (residual pulse excitation/long term prediction) coding at a
|
||
rate of 13 kb/s [11,12,13]. The text of the standard can be obtained
|
||
from:
|
||
|
||
ETSI (European Telecommunications Standards Institute)
|
||
ETSI Secretariat: B.P.152
|
||
F-06561 Valbonne Cedex
|
||
France
|
||
Phone: +33 92 94 42 00
|
||
Fax: +33 93 65 47 16
|
||
|
||
Blocks of 160 audio samples are compressed into 33 octets, for an
|
||
effective data rate of 13,200 b/s.
|
||
|
||
4.5.8.1 General Packaging Issues
|
||
|
||
The GSM standard (ETS 300 961) specifies the bit stream produced by
|
||
the codec, but does not specify how these bits should be packed for
|
||
transmission. The packetization specified here has subsequently been
|
||
adopted in ETSI Technical Specification TS 101 318. Some software
|
||
implementations of the GSM codec use a different packing than that
|
||
specified here.
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 24]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
field field name bits field field name bits
|
||
________________________________________________
|
||
1 LARc[0] 6 39 xmc[22] 3
|
||
2 LARc[1] 6 40 xmc[23] 3
|
||
3 LARc[2] 5 41 xmc[24] 3
|
||
4 LARc[3] 5 42 xmc[25] 3
|
||
5 LARc[4] 4 43 Nc[2] 7
|
||
6 LARc[5] 4 44 bc[2] 2
|
||
7 LARc[6] 3 45 Mc[2] 2
|
||
8 LARc[7] 3 46 xmaxc[2] 6
|
||
9 Nc[0] 7 47 xmc[26] 3
|
||
10 bc[0] 2 48 xmc[27] 3
|
||
11 Mc[0] 2 49 xmc[28] 3
|
||
12 xmaxc[0] 6 50 xmc[29] 3
|
||
13 xmc[0] 3 51 xmc[30] 3
|
||
14 xmc[1] 3 52 xmc[31] 3
|
||
15 xmc[2] 3 53 xmc[32] 3
|
||
16 xmc[3] 3 54 xmc[33] 3
|
||
17 xmc[4] 3 55 xmc[34] 3
|
||
18 xmc[5] 3 56 xmc[35] 3
|
||
19 xmc[6] 3 57 xmc[36] 3
|
||
20 xmc[7] 3 58 xmc[37] 3
|
||
21 xmc[8] 3 59 xmc[38] 3
|
||
22 xmc[9] 3 60 Nc[3] 7
|
||
23 xmc[10] 3 61 bc[3] 2
|
||
24 xmc[11] 3 62 Mc[3] 2
|
||
25 xmc[12] 3 63 xmaxc[3] 6
|
||
26 Nc[1] 7 64 xmc[39] 3
|
||
27 bc[1] 2 65 xmc[40] 3
|
||
28 Mc[1] 2 66 xmc[41] 3
|
||
29 xmaxc[1] 6 67 xmc[42] 3
|
||
30 xmc[13] 3 68 xmc[43] 3
|
||
31 xmc[14] 3 69 xmc[44] 3
|
||
32 xmc[15] 3 70 xmc[45] 3
|
||
33 xmc[16] 3 71 xmc[46] 3
|
||
34 xmc[17] 3 72 xmc[47] 3
|
||
35 xmc[18] 3 73 xmc[48] 3
|
||
36 xmc[19] 3 74 xmc[49] 3
|
||
37 xmc[20] 3 75 xmc[50] 3
|
||
38 xmc[21] 3 76 xmc[51] 3
|
||
|
||
Table 2: Ordering of GSM variables
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 25]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
Octet Bit 0 Bit 1 Bit 2 Bit 3 Bit 4 Bit 5 Bit 6 Bit 7
|
||
_____________________________________________________________________
|
||
0 1 1 0 1 LARc0.0 LARc0.1 LARc0.2 LARc0.3
|
||
1 LARc0.4 LARc0.5 LARc1.0 LARc1.1 LARc1.2 LARc1.3 LARc1.4 LARc1.5
|
||
2 LARc2.0 LARc2.1 LARc2.2 LARc2.3 LARc2.4 LARc3.0 LARc3.1 LARc3.2
|
||
3 LARc3.3 LARc3.4 LARc4.0 LARc4.1 LARc4.2 LARc4.3 LARc5.0 LARc5.1
|
||
4 LARc5.2 LARc5.3 LARc6.0 LARc6.1 LARc6.2 LARc7.0 LARc7.1 LARc7.2
|
||
5 Nc0.0 Nc0.1 Nc0.2 Nc0.3 Nc0.4 Nc0.5 Nc0.6 bc0.0
|
||
6 bc0.1 Mc0.0 Mc0.1 xmaxc00 xmaxc01 xmaxc02 xmaxc03 xmaxc04
|
||
7 xmaxc05 xmc0.0 xmc0.1 xmc0.2 xmc1.0 xmc1.1 xmc1.2 xmc2.0
|
||
8 xmc2.1 xmc2.2 xmc3.0 xmc3.1 xmc3.2 xmc4.0 xmc4.1 xmc4.2
|
||
9 xmc5.0 xmc5.1 xmc5.2 xmc6.0 xmc6.1 xmc6.2 xmc7.0 xmc7.1
|
||
10 xmc7.2 xmc8.0 xmc8.1 xmc8.2 xmc9.0 xmc9.1 xmc9.2 xmc10.0
|
||
11 xmc10.1 xmc10.2 xmc11.0 xmc11.1 xmc11.2 xmc12.0 xmc12.1 xcm12.2
|
||
12 Nc1.0 Nc1.1 Nc1.2 Nc1.3 Nc1.4 Nc1.5 Nc1.6 bc1.0
|
||
13 bc1.1 Mc1.0 Mc1.1 xmaxc10 xmaxc11 xmaxc12 xmaxc13 xmaxc14
|
||
14 xmax15 xmc13.0 xmc13.1 xmc13.2 xmc14.0 xmc14.1 xmc14.2 xmc15.0
|
||
15 xmc15.1 xmc15.2 xmc16.0 xmc16.1 xmc16.2 xmc17.0 xmc17.1 xmc17.2
|
||
16 xmc18.0 xmc18.1 xmc18.2 xmc19.0 xmc19.1 xmc19.2 xmc20.0 xmc20.1
|
||
17 xmc20.2 xmc21.0 xmc21.1 xmc21.2 xmc22.0 xmc22.1 xmc22.2 xmc23.0
|
||
18 xmc23.1 xmc23.2 xmc24.0 xmc24.1 xmc24.2 xmc25.0 xmc25.1 xmc25.2
|
||
19 Nc2.0 Nc2.1 Nc2.2 Nc2.3 Nc2.4 Nc2.5 Nc2.6 bc2.0
|
||
20 bc2.1 Mc2.0 Mc2.1 xmaxc20 xmaxc21 xmaxc22 xmaxc23 xmaxc24
|
||
21 xmaxc25 xmc26.0 xmc26.1 xmc26.2 xmc27.0 xmc27.1 xmc27.2 xmc28.0
|
||
22 xmc28.1 xmc28.2 xmc29.0 xmc29.1 xmc29.2 xmc30.0 xmc30.1 xmc30.2
|
||
23 xmc31.0 xmc31.1 xmc31.2 xmc32.0 xmc32.1 xmc32.2 xmc33.0 xmc33.1
|
||
24 xmc33.2 xmc34.0 xmc34.1 xmc34.2 xmc35.0 xmc35.1 xmc35.2 xmc36.0
|
||
25 Xmc36.1 xmc36.2 xmc37.0 xmc37.1 xmc37.2 xmc38.0 xmc38.1 xmc38.2
|
||
26 Nc3.0 Nc3.1 Nc3.2 Nc3.3 Nc3.4 Nc3.5 Nc3.6 bc3.0
|
||
27 bc3.1 Mc3.0 Mc3.1 xmaxc30 xmaxc31 xmaxc32 xmaxc33 xmaxc34
|
||
28 xmaxc35 xmc39.0 xmc39.1 xmc39.2 xmc40.0 xmc40.1 xmc40.2 xmc41.0
|
||
29 xmc41.1 xmc41.2 xmc42.0 xmc42.1 xmc42.2 xmc43.0 xmc43.1 xmc43.2
|
||
30 xmc44.0 xmc44.1 xmc44.2 xmc45.0 xmc45.1 xmc45.2 xmc46.0 xmc46.1
|
||
31 xmc46.2 xmc47.0 xmc47.1 xmc47.2 xmc48.0 xmc48.1 xmc48.2 xmc49.0
|
||
32 xmc49.1 xmc49.2 xmc50.0 xmc50.1 xmc50.2 xmc51.0 xmc51.1 xmc51.2
|
||
|
||
Table 3: GSM payload format
|
||
|
||
In the GSM packing used by RTP, the bits SHALL be packed beginning
|
||
from the most significant bit. Every 160 sample GSM frame is coded
|
||
into one 33 octet (264 bit) buffer. Every such buffer begins with a
|
||
4 bit signature (0xD), followed by the MSB encoding of the fields of
|
||
the frame. The first octet thus contains 1101 in the 4 most
|
||
significant bits (0-3) and the 4 most significant bits of F1 (0-3) in
|
||
the 4 least significant bits (4-7). The second octet contains the 2
|
||
least significant bits of F1 in bits 0-1, and F2 in bits 2-7, and so
|
||
on. The order of the fields in the frame is described in Table 2.
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 26]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
4.5.8.2 GSM Variable Names and Numbers
|
||
|
||
In the RTP encoding we have the bit pattern described in Table 3,
|
||
where F.i signifies the ith bit of the field F, bit 0 is the most
|
||
significant bit, and the bits of every octet are numbered from 0 to 7
|
||
from most to least significant.
|
||
|
||
4.5.9 GSM-EFR
|
||
|
||
GSM-EFR denotes GSM 06.60 enhanced full rate speech transcoding,
|
||
specified in ETS 300 726 which is available from ETSI at the address
|
||
given in Section 4.5.8. This codec has a frame length of 244 bits.
|
||
For transmission in RTP, each codec frame is packed into a 31 octet
|
||
(248 bit) buffer beginning with a 4-bit signature 0xC in a manner
|
||
similar to that specified here for the original GSM 06.10 codec. The
|
||
packing is specified in ETSI Technical Specification TS 101 318.
|
||
|
||
4.5.10 L8
|
||
|
||
L8 denotes linear audio data samples, using 8-bits of precision with
|
||
an offset of 128, that is, the most negative signal is encoded as
|
||
zero.
|
||
|
||
4.5.11 L16
|
||
|
||
L16 denotes uncompressed audio data samples, using 16-bit signed
|
||
representation with 65,535 equally divided steps between minimum and
|
||
maximum signal level, ranging from -32,768 to 32,767. The value is
|
||
represented in two's complement notation and transmitted in network
|
||
byte order (most significant byte first).
|
||
|
||
The MIME registration for L16 in RFC 3555 [7] specifies parameters
|
||
that MAY be used with MIME or SDP to indicate that analog pre-
|
||
emphasis was applied to the signal before quantization or to indicate
|
||
that a multiple-channel audio stream follows a different channel
|
||
ordering convention than is specified in Section 4.1.
|
||
|
||
4.5.12 LPC
|
||
|
||
LPC designates an experimental linear predictive encoding contributed
|
||
by Ron Frederick, which is based on an implementation written by Ron
|
||
Zuckerman posted to the Usenet group comp.dsp on June 26, 1992. The
|
||
codec generates 14 octets for every frame. The framesize is set to
|
||
20 ms, resulting in a bit rate of 5,600 b/s.
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 27]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
4.5.13 MPA
|
||
|
||
MPA denotes MPEG-1 or MPEG-2 audio encapsulated as elementary
|
||
streams. The encoding is defined in ISO standards ISO/IEC 11172-3
|
||
and 13818-3. The encapsulation is specified in RFC 2250 [14].
|
||
|
||
The encoding may be at any of three levels of complexity, called
|
||
Layer I, II and III. The selected layer as well as the sampling rate
|
||
and channel count are indicated in the payload. The RTP timestamp
|
||
clock rate is always 90,000, independent of the sampling rate.
|
||
MPEG-1 audio supports sampling rates of 32, 44.1, and 48 kHz (ISO/IEC
|
||
11172-3, section 1.1; "Scope"). MPEG-2 supports sampling rates of
|
||
16, 22.05 and 24 kHz. The number of samples per frame is fixed, but
|
||
the frame size will vary with the sampling rate and bit rate.
|
||
|
||
The MIME registration for MPA in RFC 3555 [7] specifies parameters
|
||
that MAY be used with MIME or SDP to restrict the selection of layer,
|
||
channel count, sampling rate, and bit rate.
|
||
|
||
4.5.14 PCMA and PCMU
|
||
|
||
PCMA and PCMU are specified in ITU-T Recommendation G.711. Audio
|
||
data is encoded as eight bits per sample, after logarithmic scaling.
|
||
PCMU denotes mu-law scaling, PCMA A-law scaling. A detailed
|
||
description is given by Jayant and Noll [15]. Each G.711 octet SHALL
|
||
be octet-aligned in an RTP packet. The sign bit of each G.711 octet
|
||
SHALL correspond to the most significant bit of the octet in the RTP
|
||
packet (i.e., assuming the G.711 samples are handled as octets on the
|
||
host machine, the sign bit SHALL be the most significant bit of the
|
||
octet as defined by the host machine format). The 56 kb/s and 48
|
||
kb/s modes of G.711 are not applicable to RTP, since PCMA and PCMU
|
||
MUST always be transmitted as 8-bit samples.
|
||
|
||
See Section 4.1 regarding silence suppression.
|
||
|
||
4.5.15 QCELP
|
||
|
||
The Electronic Industries Association (EIA) & Telecommunications
|
||
Industry Association (TIA) standard IS-733, "TR45: High Rate Speech
|
||
Service Option for Wideband Spread Spectrum Communications Systems",
|
||
defines the QCELP audio compression algorithm for use in wireless
|
||
CDMA applications. The QCELP CODEC compresses each 20 milliseconds
|
||
of 8,000 Hz, 16-bit sampled input speech into one of four different
|
||
size output frames: Rate 1 (266 bits), Rate 1/2 (124 bits), Rate 1/4
|
||
(54 bits) or Rate 1/8 (20 bits). For typical speech patterns, this
|
||
results in an average output of 6.8 kb/s for normal mode and 4.7 kb/s
|
||
for reduced rate mode. The packetization of the QCELP audio codec is
|
||
described in [16].
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 28]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
4.5.16 RED
|
||
|
||
The redundant audio payload format "RED" is specified by RFC 2198
|
||
[17]. It defines a means by which multiple redundant copies of an
|
||
audio packet may be transmitted in a single RTP stream. Each packet
|
||
in such a stream contains, in addition to the audio data for that
|
||
packetization interval, a (more heavily compressed) copy of the data
|
||
from a previous packetization interval. This allows an approximation
|
||
of the data from lost packets to be recovered upon decoding of a
|
||
subsequent packet, giving much improved sound quality when compared
|
||
with silence substitution for lost packets.
|
||
|
||
4.5.17 VDVI
|
||
|
||
VDVI is a variable-rate version of DVI4, yielding speech bit rates of
|
||
between 10 and 25 kb/s. It is specified for single-channel operation
|
||
only. Samples are packed into octets starting at the most-
|
||
significant bit. The last octet is padded with 1 bits if the last
|
||
sample does not fill the last octet. This padding is distinct from
|
||
the valid codewords. The receiver needs to detect the padding
|
||
because there is no explicit count of samples in the packet.
|
||
|
||
It uses the following encoding:
|
||
|
||
DVI4 codeword VDVI bit pattern
|
||
_______________________________
|
||
0 00
|
||
1 010
|
||
2 1100
|
||
3 11100
|
||
4 111100
|
||
5 1111100
|
||
6 11111100
|
||
7 11111110
|
||
8 10
|
||
9 011
|
||
10 1101
|
||
11 11101
|
||
12 111101
|
||
13 1111101
|
||
14 11111101
|
||
15 11111111
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 29]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
5. Video
|
||
|
||
The following sections describe the video encodings that are defined
|
||
in this memo and give their abbreviated names used for
|
||
identification. These video encodings and their payload types are
|
||
listed in Table 5.
|
||
|
||
All of these video encodings use an RTP timestamp frequency of 90,000
|
||
Hz, the same as the MPEG presentation time stamp frequency. This
|
||
frequency yields exact integer timestamp increments for the typical
|
||
24 (HDTV), 25 (PAL), and 29.97 (NTSC) and 30 Hz (HDTV) frame rates
|
||
and 50, 59.94 and 60 Hz field rates. While 90 kHz is the RECOMMENDED
|
||
rate for future video encodings used within this profile, other rates
|
||
MAY be used. However, it is not sufficient to use the video frame
|
||
rate (typically between 15 and 30 Hz) because that does not provide
|
||
adequate resolution for typical synchronization requirements when
|
||
calculating the RTP timestamp corresponding to the NTP timestamp in
|
||
an RTCP SR packet. The timestamp resolution MUST also be sufficient
|
||
for the jitter estimate contained in the receiver reports.
|
||
|
||
For most of these video encodings, the RTP timestamp encodes the
|
||
sampling instant of the video image contained in the RTP data packet.
|
||
If a video image occupies more than one packet, the timestamp is the
|
||
same on all of those packets. Packets from different video images
|
||
are distinguished by their different timestamps.
|
||
|
||
Most of these video encodings also specify that the marker bit of the
|
||
RTP header SHOULD be set to one in the last packet of a video frame
|
||
and otherwise set to zero. Thus, it is not necessary to wait for a
|
||
following packet with a different timestamp to detect that a new
|
||
frame should be displayed.
|
||
|
||
5.1 CelB
|
||
|
||
The CELL-B encoding is a proprietary encoding proposed by Sun
|
||
Microsystems. The byte stream format is described in RFC 2029 [18].
|
||
|
||
5.2 JPEG
|
||
|
||
The encoding is specified in ISO Standards 10918-1 and 10918-2. The
|
||
RTP payload format is as specified in RFC 2435 [19].
|
||
|
||
5.3 H261
|
||
|
||
The encoding is specified in ITU-T Recommendation H.261, "Video codec
|
||
for audiovisual services at p x 64 kbit/s". The packetization and
|
||
RTP-specific properties are described in RFC 2032 [20].
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 30]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
5.4 H263
|
||
|
||
The encoding is specified in the 1996 version of ITU-T Recommendation
|
||
H.263, "Video coding for low bit rate communication". The
|
||
packetization and RTP-specific properties are described in RFC 2190
|
||
[21]. The H263-1998 payload format is RECOMMENDED over this one for
|
||
use by new implementations.
|
||
|
||
5.5 H263-1998
|
||
|
||
The encoding is specified in the 1998 version of ITU-T Recommendation
|
||
H.263, "Video coding for low bit rate communication". The
|
||
packetization and RTP-specific properties are described in RFC 2429
|
||
[22]. Because the 1998 version of H.263 is a superset of the 1996
|
||
syntax, this payload format can also be used with the 1996 version of
|
||
H.263, and is RECOMMENDED for this use by new implementations. This
|
||
payload format does not replace RFC 2190, which continues to be used
|
||
by existing implementations, and may be required for backward
|
||
compatibility in new implementations. Implementations using the new
|
||
features of the 1998 version of H.263 MUST use the payload format
|
||
described in RFC 2429.
|
||
|
||
5.6 MPV
|
||
|
||
MPV designates the use of MPEG-1 and MPEG-2 video encoding elementary
|
||
streams as specified in ISO Standards ISO/IEC 11172 and 13818-2,
|
||
respectively. The RTP payload format is as specified in RFC 2250
|
||
[14], Section 3.
|
||
|
||
The MIME registration for MPV in RFC 3555 [7] specifies a parameter
|
||
that MAY be used with MIME or SDP to restrict the selection of the
|
||
type of MPEG video.
|
||
|
||
5.7 MP2T
|
||
|
||
MP2T designates the use of MPEG-2 transport streams, for either audio
|
||
or video. The RTP payload format is described in RFC 2250 [14],
|
||
Section 2.
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 31]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
5.8 nv
|
||
|
||
The encoding is implemented in the program `nv', version 4, developed
|
||
at Xerox PARC by Ron Frederick. Further information is available
|
||
from the author:
|
||
|
||
Ron Frederick
|
||
Blue Coat Systems Inc.
|
||
650 Almanor Avenue
|
||
Sunnyvale, CA 94085
|
||
United States
|
||
EMail: ronf@bluecoat.com
|
||
|
||
6. Payload Type Definitions
|
||
|
||
Tables 4 and 5 define this profile's static payload type values for
|
||
the PT field of the RTP data header. In addition, payload type
|
||
values in the range 96-127 MAY be defined dynamically through a
|
||
conference control protocol, which is beyond the scope of this
|
||
document. For example, a session directory could specify that for a
|
||
given session, payload type 96 indicates PCMU encoding, 8,000 Hz
|
||
sampling rate, 2 channels. Entries in Tables 4 and 5 with payload
|
||
type "dyn" have no static payload type assigned and are only used
|
||
with a dynamic payload type. Payload type 2 was assigned to G721 in
|
||
RFC 1890 and to its equivalent successor G726-32 in draft versions of
|
||
this specification, but its use is now deprecated and that static
|
||
payload type is marked reserved due to conflicting use for the
|
||
payload formats G726-32 and AAL2-G726-32 (see Section 4.5.4).
|
||
Payload type 13 indicates the Comfort Noise (CN) payload format
|
||
specified in RFC 3389 [9]. Payload type 19 is marked "reserved"
|
||
because some draft versions of this specification assigned that
|
||
number to an earlier version of the comfort noise payload format.
|
||
The payload type range 72-76 is marked "reserved" so that RTCP and
|
||
RTP packets can be reliably distinguished (see Section "Summary of
|
||
Protocol Constants" of the RTP protocol specification).
|
||
|
||
The payload types currently defined in this profile are assigned to
|
||
exactly one of three categories or media types: audio only, video
|
||
only and those combining audio and video. The media types are marked
|
||
in Tables 4 and 5 as "A", "V" and "AV", respectively. Payload types
|
||
of different media types SHALL NOT be interleaved or multiplexed
|
||
within a single RTP session, but multiple RTP sessions MAY be used in
|
||
parallel to send multiple media types. An RTP source MAY change
|
||
payload types within the same media type during a session. See the
|
||
section "Multiplexing RTP Sessions" of RFC 3550 for additional
|
||
explanation.
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 32]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
PT encoding media type clock rate channels
|
||
name (Hz)
|
||
___________________________________________________
|
||
0 PCMU A 8,000 1
|
||
1 reserved A
|
||
2 reserved A
|
||
3 GSM A 8,000 1
|
||
4 G723 A 8,000 1
|
||
5 DVI4 A 8,000 1
|
||
6 DVI4 A 16,000 1
|
||
7 LPC A 8,000 1
|
||
8 PCMA A 8,000 1
|
||
9 G722 A 8,000 1
|
||
10 L16 A 44,100 2
|
||
11 L16 A 44,100 1
|
||
12 QCELP A 8,000 1
|
||
13 CN A 8,000 1
|
||
14 MPA A 90,000 (see text)
|
||
15 G728 A 8,000 1
|
||
16 DVI4 A 11,025 1
|
||
17 DVI4 A 22,050 1
|
||
18 G729 A 8,000 1
|
||
19 reserved A
|
||
20 unassigned A
|
||
21 unassigned A
|
||
22 unassigned A
|
||
23 unassigned A
|
||
dyn G726-40 A 8,000 1
|
||
dyn G726-32 A 8,000 1
|
||
dyn G726-24 A 8,000 1
|
||
dyn G726-16 A 8,000 1
|
||
dyn G729D A 8,000 1
|
||
dyn G729E A 8,000 1
|
||
dyn GSM-EFR A 8,000 1
|
||
dyn L8 A var. var.
|
||
dyn RED A (see text)
|
||
dyn VDVI A var. 1
|
||
|
||
Table 4: Payload types (PT) for audio encodings
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 33]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
PT encoding media type clock rate
|
||
name (Hz)
|
||
_____________________________________________
|
||
24 unassigned V
|
||
25 CelB V 90,000
|
||
26 JPEG V 90,000
|
||
27 unassigned V
|
||
28 nv V 90,000
|
||
29 unassigned V
|
||
30 unassigned V
|
||
31 H261 V 90,000
|
||
32 MPV V 90,000
|
||
33 MP2T AV 90,000
|
||
34 H263 V 90,000
|
||
35-71 unassigned ?
|
||
72-76 reserved N/A N/A
|
||
77-95 unassigned ?
|
||
96-127 dynamic ?
|
||
dyn H263-1998 V 90,000
|
||
|
||
Table 5: Payload types (PT) for video and combined
|
||
encodings
|
||
|
||
Session participants agree through mechanisms beyond the scope of
|
||
this specification on the set of payload types allowed in a given
|
||
session. This set MAY, for example, be defined by the capabilities
|
||
of the applications used, negotiated by a conference control protocol
|
||
or established by agreement between the human participants.
|
||
|
||
Audio applications operating under this profile SHOULD, at a minimum,
|
||
be able to send and/or receive payload types 0 (PCMU) and 5 (DVI4).
|
||
This allows interoperability without format negotiation and ensures
|
||
successful negotiation with a conference control protocol.
|
||
|
||
7. RTP over TCP and Similar Byte Stream Protocols
|
||
|
||
Under special circumstances, it may be necessary to carry RTP in
|
||
protocols offering a byte stream abstraction, such as TCP, possibly
|
||
multiplexed with other data. The application MUST define its own
|
||
method of delineating RTP and RTCP packets (RTSP [23] provides an
|
||
example of such an encapsulation specification).
|
||
|
||
8. Port Assignment
|
||
|
||
As specified in the RTP protocol definition, RTP data SHOULD be
|
||
carried on an even UDP port number and the corresponding RTCP packets
|
||
SHOULD be carried on the next higher (odd) port number.
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 34]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
Applications operating under this profile MAY use any such UDP port
|
||
pair. For example, the port pair MAY be allocated randomly by a
|
||
session management program. A single fixed port number pair cannot
|
||
be required because multiple applications using this profile are
|
||
likely to run on the same host, and there are some operating systems
|
||
that do not allow multiple processes to use the same UDP port with
|
||
different multicast addresses.
|
||
|
||
However, port numbers 5004 and 5005 have been registered for use with
|
||
this profile for those applications that choose to use them as the
|
||
default pair. Applications that operate under multiple profiles MAY
|
||
use this port pair as an indication to select this profile if they
|
||
are not subject to the constraint of the previous paragraph.
|
||
Applications need not have a default and MAY require that the port
|
||
pair be explicitly specified. The particular port numbers were
|
||
chosen to lie in the range above 5000 to accommodate port number
|
||
allocation practice within some versions of the Unix operating
|
||
system, where port numbers below 1024 can only be used by privileged
|
||
processes and port numbers between 1024 and 5000 are automatically
|
||
assigned by the operating system.
|
||
|
||
9. Changes from RFC 1890
|
||
|
||
This RFC revises RFC 1890. It is mostly backwards-compatible with
|
||
RFC 1890 except for functions removed because two interoperable
|
||
implementations were not found. The additions to RFC 1890 codify
|
||
existing practice in the use of payload formats under this profile.
|
||
Since this profile may be used without using any of the payload
|
||
formats listed here, the addition of new payload formats in this
|
||
revision does not affect backwards compatibility. The changes are
|
||
listed below, categorized into functional and non-functional changes.
|
||
|
||
Functional changes:
|
||
|
||
o Section 11, "IANA Considerations" was added to specify the
|
||
registration of the name for this profile. That appendix also
|
||
references a new Section 3 "Registering Additional Encodings"
|
||
which establishes a policy that no additional registration of
|
||
static payload types for this profile will be made beyond those
|
||
added in this revision and included in Tables 4 and 5. Instead,
|
||
additional encoding names may be registered as MIME subtypes for
|
||
binding to dynamic payload types. Non-normative references were
|
||
added to RFC 3555 [7] where MIME subtypes for all the listed
|
||
payload formats are registered, some with optional parameters for
|
||
use of the payload formats.
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 35]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
o Static payload types 4, 16, 17 and 34 were added to incorporate
|
||
IANA registrations made since the publication of RFC 1890, along
|
||
with the corresponding payload format descriptions for G723 and
|
||
H263.
|
||
|
||
o Following working group discussion, static payload types 12 and 18
|
||
were added along with the corresponding payload format
|
||
descriptions for QCELP and G729. Static payload type 13 was
|
||
assigned to the Comfort Noise (CN) payload format defined in RFC
|
||
3389. Payload type 19 was marked reserved because it had been
|
||
temporarily allocated to an earlier version of Comfort Noise
|
||
present in some draft revisions of this document.
|
||
|
||
o The payload format for G721 was renamed to G726-32 following the
|
||
ITU-T renumbering, and the payload format description for G726 was
|
||
expanded to include the -16, -24 and -40 data rates. Because of
|
||
confusion regarding draft revisions of this document, some
|
||
implementations of these G726 payload formats packed samples into
|
||
octets starting with the most significant bit rather than the
|
||
least significant bit as specified here. To partially resolve
|
||
this incompatibility, new payload formats named AAL2-G726-16, -24,
|
||
-32 and -40 will be specified in a separate document (see note in
|
||
Section 4.5.4), and use of static payload type 2 is deprecated as
|
||
explained in Section 6.
|
||
|
||
o Payload formats G729D and G729E were added following the ITU-T
|
||
addition of Annexes D and E to Recommendation G.729. Listings
|
||
were added for payload formats GSM-EFR, RED, and H263-1998
|
||
published in other documents subsequent to RFC 1890. These
|
||
additional payload formats are referenced only by dynamic payload
|
||
type numbers.
|
||
|
||
o The descriptions of the payload formats for G722, G728, GSM, VDVI
|
||
were expanded.
|
||
|
||
o The payload format for 1016 audio was removed and its static
|
||
payload type assignment 1 was marked "reserved" because two
|
||
interoperable implementations were not found.
|
||
|
||
o Requirements for congestion control were added in Section 2.
|
||
|
||
o This profile follows the suggestion in the revised RTP spec that
|
||
RTCP bandwidth may be specified separately from the session
|
||
bandwidth and separately for active senders and passive receivers.
|
||
|
||
o The mapping of a user pass-phrase string into an encryption key
|
||
was deleted from Section 2 because two interoperable
|
||
implementations were not found.
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 36]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
o The "quadrophonic" sample ordering convention for four-channel
|
||
audio was removed to eliminate an ambiguity as noted in Section
|
||
4.1.
|
||
|
||
Non-functional changes:
|
||
|
||
o In Section 4.1, it is now explicitly stated that silence
|
||
suppression is allowed for all audio payload formats. (This has
|
||
always been the case and derives from a fundamental aspect of
|
||
RTP's design and the motivations for packet audio, but was not
|
||
explicit stated before.) The use of comfort noise is also
|
||
explained.
|
||
|
||
o In Section 4.1, the requirement level for setting of the marker
|
||
bit on the first packet after silence for audio was changed from
|
||
"is" to "SHOULD be", and clarified that the marker bit is set only
|
||
when packets are intentionally not sent.
|
||
|
||
o Similarly, text was added to specify that the marker bit SHOULD be
|
||
set to one on the last packet of a video frame, and that video
|
||
frames are distinguished by their timestamps.
|
||
|
||
o RFC references are added for payload formats published after RFC
|
||
1890.
|
||
|
||
o The security considerations and full copyright sections were
|
||
added.
|
||
|
||
o According to Peter Hoddie of Apple, only pre-1994 Macintosh used
|
||
the 22254.54 rate and none the 11127.27 rate, so the latter was
|
||
dropped from the discussion of suggested sampling frequencies.
|
||
|
||
o Table 1 was corrected to move some values from the "ms/packet"
|
||
column to the "default ms/packet" column where they belonged.
|
||
|
||
o Since the Interactive Multimedia Association ceased operations, an
|
||
alternate resource was provided for a referenced IMA document.
|
||
|
||
o A note has been added for G722 to clarify a discrepancy between
|
||
the actual sampling rate and the RTP timestamp clock rate.
|
||
|
||
o Small clarifications of the text have been made in several places,
|
||
some in response to questions from readers. In particular:
|
||
|
||
- A definition for "media type" is given in Section 1.1 to allow
|
||
the explanation of multiplexing RTP sessions in Section 6 to be
|
||
more clear regarding the multiplexing of multiple media.
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 37]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
- The explanation of how to determine the number of audio frames
|
||
in a packet from the length was expanded.
|
||
|
||
- More description of the allocation of bandwidth to SDES items
|
||
is given.
|
||
|
||
- A note was added that the convention for the order of channels
|
||
specified in Section 4.1 may be overridden by a particular
|
||
encoding or payload format specification.
|
||
|
||
- The terms MUST, SHOULD, MAY, etc. are used as defined in RFC
|
||
2119.
|
||
|
||
o A second author for this document was added.
|
||
|
||
10. Security Considerations
|
||
|
||
Implementations using the profile defined in this specification are
|
||
subject to the security considerations discussed in the RTP
|
||
specification [1]. This profile does not specify any different
|
||
security services. The primary function of this profile is to list a
|
||
set of data compression encodings for audio and video media.
|
||
|
||
Confidentiality of the media streams is achieved by encryption.
|
||
Because the data compression used with the payload formats described
|
||
in this profile is applied end-to-end, encryption may be performed
|
||
after compression so there is no conflict between the two operations.
|
||
|
||
A potential denial-of-service threat exists for data encodings using
|
||
compression techniques that have non-uniform receiver-end
|
||
computational load. The attacker can inject pathological datagrams
|
||
into the stream which are complex to decode and cause the receiver to
|
||
be overloaded.
|
||
|
||
As with any IP-based protocol, in some circumstances a receiver may
|
||
be overloaded simply by the receipt of too many packets, either
|
||
desired or undesired. Network-layer authentication MAY be used to
|
||
discard packets from undesired sources, but the processing cost of
|
||
the authentication itself may be too high. In a multicast
|
||
environment, source pruning is implemented in IGMPv3 (RFC 3376) [24]
|
||
and in multicast routing protocols to allow a receiver to select
|
||
which sources are allowed to reach it.
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 38]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
11. IANA Considerations
|
||
|
||
The RTP specification establishes a registry of profile names for use
|
||
by higher-level control protocols, such as the Session Description
|
||
Protocol (SDP), RFC 2327 [6], to refer to transport methods. This
|
||
profile registers the name "RTP/AVP".
|
||
|
||
Section 3 establishes the policy that no additional registration of
|
||
static RTP payload types for this profile will be made beyond those
|
||
added in this document revision and included in Tables 4 and 5. IANA
|
||
may reference that section in declining to accept any additional
|
||
registration requests. In Tables 4 and 5, note that types 1 and 2
|
||
have been marked reserved and the set of "dyn" payload types included
|
||
has been updated. These changes are explained in Sections 6 and 9.
|
||
|
||
12. References
|
||
|
||
12.1 Normative References
|
||
|
||
[1] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
|
||
"RTP: A Transport Protocol for Real-Time Applications", RFC
|
||
3550, July 2003.
|
||
|
||
[2] Bradner, S., "Key Words for Use in RFCs to Indicate Requirement
|
||
Levels", BCP 14, RFC 2119, March 1997.
|
||
|
||
[3] Apple Computer, "Audio Interchange File Format AIFF-C", August
|
||
1991. (also ftp://ftp.sgi.com/sgi/aiff-c.9.26.91.ps.Z).
|
||
|
||
12.2 Informative References
|
||
|
||
[4] Braden, R., Clark, D. and S. Shenker, "Integrated Services in
|
||
the Internet Architecture: an Overview", RFC 1633, June 1994.
|
||
|
||
[5] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z. and W.
|
||
Weiss, "An Architecture for Differentiated Service", RFC 2475,
|
||
December 1998.
|
||
|
||
[6] Handley, M. and V. Jacobson, "SDP: Session Description
|
||
Protocol", RFC 2327, April 1998.
|
||
|
||
[7] Casner, S. and P. Hoschka, "MIME Type Registration of RTP
|
||
Payload Types", RFC 3555, July 2003.
|
||
|
||
[8] Freed, N., Klensin, J. and J. Postel, "Multipurpose Internet
|
||
Mail Extensions (MIME) Part Four: Registration Procedures", BCP
|
||
13, RFC 2048, November 1996.
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 39]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
[9] Zopf, R., "Real-time Transport Protocol (RTP) Payload for
|
||
Comfort Noise (CN)", RFC 3389, September 2002.
|
||
|
||
[10] Deleam, D. and J.-P. Petit, "Real-time implementations of the
|
||
recent ITU-T low bit rate speech coders on the TI TMS320C54X
|
||
DSP: results, methodology, and applications", in Proc. of
|
||
International Conference on Signal Processing, Technology, and
|
||
Applications (ICSPAT) , (Boston, Massachusetts), pp. 1656--1660,
|
||
October 1996.
|
||
|
||
[11] Mouly, M. and M.-B. Pautet, The GSM system for mobile
|
||
communications Lassay-les-Chateaux, France: Europe Media
|
||
Duplication, 1993.
|
||
|
||
[12] Degener, J., "Digital Speech Compression", Dr. Dobb's Journal,
|
||
December 1994.
|
||
|
||
[13] Redl, S., Weber, M. and M. Oliphant, An Introduction to GSM
|
||
Boston: Artech House, 1995.
|
||
|
||
[14] Hoffman, D., Fernando, G., Goyal, V. and M. Civanlar, "RTP
|
||
Payload Format for MPEG1/MPEG2 Video", RFC 2250, January 1998.
|
||
|
||
[15] Jayant, N. and P. Noll, Digital Coding of Waveforms--Principles
|
||
and Applications to Speech and Video Englewood Cliffs, New
|
||
Jersey: Prentice-Hall, 1984.
|
||
|
||
[16] McKay, K., "RTP Payload Format for PureVoice(tm) Audio", RFC
|
||
2658, August 1999.
|
||
|
||
[17] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley, M.,
|
||
Bolot, J.-C., Vega-Garcia, A. and S. Fosse-Parisis, "RTP Payload
|
||
for Redundant Audio Data", RFC 2198, September 1997.
|
||
|
||
[18] Speer, M. and D. Hoffman, "RTP Payload Format of Sun's CellB
|
||
Video Encoding", RFC 2029, October 1996.
|
||
|
||
[19] Berc, L., Fenner, W., Frederick, R., McCanne, S. and P. Stewart,
|
||
"RTP Payload Format for JPEG-Compressed Video", RFC 2435,
|
||
October 1998.
|
||
|
||
[20] Turletti, T. and C. Huitema, "RTP Payload Format for H.261 Video
|
||
Streams", RFC 2032, October 1996.
|
||
|
||
[21] Zhu, C., "RTP Payload Format for H.263 Video Streams", RFC 2190,
|
||
September 1997.
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 40]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
[22] Bormann, C., Cline, L., Deisher, G., Gardos, T., Maciocco, C.,
|
||
Newell, D., Ott, J., Sullivan, G., Wenger, S. and C. Zhu, "RTP
|
||
Payload Format for the 1998 Version of ITU-T Rec. H.263 Video
|
||
(H.263+)", RFC 2429, October 1998.
|
||
|
||
[23] Schulzrinne, H., Rao, A. and R. Lanphier, "Real Time Streaming
|
||
Protocol (RTSP)", RFC 2326, April 1998.
|
||
|
||
[24] Cain, B., Deering, S., Kouvelas, I., Fenner, B. and A.
|
||
Thyagarajan, "Internet Group Management Protocol, Version 3",
|
||
RFC 3376, October 2002.
|
||
|
||
13. Current Locations of Related Resources
|
||
|
||
Note: Several sections below refer to the ITU-T Software Tool
|
||
Library (STL). It is available from the ITU Sales Service, Place des
|
||
Nations, CH-1211 Geneve 20, Switzerland (also check
|
||
http://www.itu.int). The ITU-T STL is covered by a license defined
|
||
in ITU-T Recommendation G.191, "Software tools for speech and audio
|
||
coding standardization".
|
||
|
||
DVI4
|
||
|
||
An archived copy of the document IMA Recommended Practices for
|
||
Enhancing Digital Audio Compatibility in Multimedia Systems (version
|
||
3.0), which describes the IMA ADPCM algorithm, is available at:
|
||
|
||
http://www.cs.columbia.edu/~hgs/audio/dvi/
|
||
|
||
An implementation is available from Jack Jansen at
|
||
|
||
ftp://ftp.cwi.nl/local/pub/audio/adpcm.shar
|
||
|
||
G722
|
||
|
||
An implementation of the G.722 algorithm is available as part of the
|
||
ITU-T STL, described above.
|
||
|
||
G723
|
||
|
||
The reference C code implementation defining the G.723.1 algorithm
|
||
and its Annexes A, B, and C are available as an integral part of
|
||
Recommendation G.723.1 from the ITU Sales Service, address listed
|
||
above. Both the algorithm and C code are covered by a specific
|
||
license. The ITU-T Secretariat should be contacted to obtain such
|
||
licensing information.
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 41]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
G726
|
||
|
||
G726 is specified in the ITU-T Recommendation G.726, "40, 32, 24, and
|
||
16 kb/s Adaptive Differential Pulse Code Modulation (ADPCM)". An
|
||
implementation of the G.726 algorithm is available as part of the
|
||
ITU-T STL, described above.
|
||
|
||
G729
|
||
|
||
The reference C code implementation defining the G.729 algorithm and
|
||
its Annexes A through I are available as an integral part of
|
||
Recommendation G.729 from the ITU Sales Service, listed above. Annex
|
||
I contains the integrated C source code for all G.729 operating
|
||
modes. The G.729 algorithm and associated C code are covered by a
|
||
specific license. The contact information for obtaining the license
|
||
is available from the ITU-T Secretariat.
|
||
|
||
GSM
|
||
|
||
A reference implementation was written by Carsten Bormann and Jutta
|
||
Degener (then at TU Berlin, Germany). It is available at
|
||
|
||
http://www.dmn.tzi.org/software/gsm/
|
||
|
||
Although the RPE-LTP algorithm is not an ITU-T standard, there is a C
|
||
code implementation of the RPE-LTP algorithm available as part of the
|
||
ITU-T STL. The STL implementation is an adaptation of the TU Berlin
|
||
version.
|
||
|
||
LPC
|
||
|
||
An implementation is available at
|
||
|
||
ftp://parcftp.xerox.com/pub/net-research/lpc.tar.Z
|
||
|
||
PCMU, PCMA
|
||
|
||
An implementation of these algorithms is available as part of the
|
||
ITU-T STL, described above.
|
||
|
||
14. Acknowledgments
|
||
|
||
The comments and careful review of Simao Campos, Richard Cox and AVT
|
||
Working Group participants are gratefully acknowledged. The GSM
|
||
description was adopted from the IMTC Voice over IP Forum Service
|
||
Interoperability Implementation Agreement (January 1997). Fred Burg
|
||
and Terry Lyons helped with the G.729 description.
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 42]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
15. Intellectual Property Rights Statement
|
||
|
||
The IETF takes no position regarding the validity or scope of any
|
||
intellectual property or other rights that might be claimed to
|
||
pertain to the implementation or use of the technology described in
|
||
this document or the extent to which any license under such rights
|
||
might or might not be available; neither does it represent that it
|
||
has made any effort to identify any such rights. Information on the
|
||
IETF's procedures with respect to rights in standards-track and
|
||
standards-related documentation can be found in BCP-11. Copies of
|
||
claims of rights made available for publication and any assurances of
|
||
licenses to be made available, or the result of an attempt made to
|
||
obtain a general license or permission for the use of such
|
||
proprietary rights by implementors or users of this specification can
|
||
be obtained from the IETF Secretariat.
|
||
|
||
The IETF invites any interested party to bring to its attention any
|
||
copyrights, patents or patent applications, or other proprietary
|
||
rights which may cover technology that may be required to practice
|
||
this standard. Please address the information to the IETF Executive
|
||
Director.
|
||
|
||
16. Authors' Addresses
|
||
|
||
Henning Schulzrinne
|
||
Department of Computer Science
|
||
Columbia University
|
||
1214 Amsterdam Avenue
|
||
New York, NY 10027
|
||
United States
|
||
|
||
EMail: schulzrinne@cs.columbia.edu
|
||
|
||
|
||
Stephen L. Casner
|
||
Packet Design
|
||
3400 Hillview Avenue, Building 3
|
||
Palo Alto, CA 94304
|
||
United States
|
||
|
||
EMail: casner@acm.org
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 43]
|
||
|
||
RFC 3551 RTP A/V Profile July 2003
|
||
|
||
|
||
17. Full Copyright Statement
|
||
|
||
Copyright (C) The Internet Society (2003). All Rights Reserved.
|
||
|
||
This document and translations of it may be copied and furnished to
|
||
others, and derivative works that comment on or otherwise explain it
|
||
or assist in its implementation may be prepared, copied, published
|
||
and distributed, in whole or in part, without restriction of any
|
||
kind, provided that the above copyright notice and this paragraph are
|
||
included on all such copies and derivative works. However, this
|
||
document itself may not be modified in any way, such as by removing
|
||
the copyright notice or references to the Internet Society or other
|
||
Internet organizations, except as needed for the purpose of
|
||
developing Internet standards in which case the procedures for
|
||
copyrights defined in the Internet Standards process must be
|
||
followed, or as required to translate it into languages other than
|
||
English.
|
||
|
||
The limited permissions granted above are perpetual and will not be
|
||
revoked by the Internet Society or its successors or assigns.
|
||
|
||
This document and the information contained herein is provided on an
|
||
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
|
||
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
|
||
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
|
||
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
|
||
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
|
||
|
||
Acknowledgement
|
||
|
||
Funding for the RFC Editor function is currently provided by the
|
||
Internet Society.
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Schulzrinne & Casner Standards Track [Page 44]
|
||
|