mirror of
https://gitlab.freedesktop.org/pulseaudio/pulseaudio.git
synced 2025-11-14 06:59:53 -05:00
2468 lines
104 KiB
Text
2468 lines
104 KiB
Text
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Network Working Group H. Schulzrinne
|
|||
|
|
Request for Comments: 3551 Columbia University
|
|||
|
|
Obsoletes: 1890 S. Casner
|
|||
|
|
Category: Standards Track Packet Design
|
|||
|
|
July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
RTP Profile for Audio and Video Conferences
|
|||
|
|
with Minimal Control
|
|||
|
|
|
|||
|
|
Status of this Memo
|
|||
|
|
|
|||
|
|
This document specifies an Internet standards track protocol for the
|
|||
|
|
Internet community, and requests discussion and suggestions for
|
|||
|
|
improvements. Please refer to the current edition of the "Internet
|
|||
|
|
Official Protocol Standards" (STD 1) for the standardization state
|
|||
|
|
and status of this protocol. Distribution of this memo is unlimited.
|
|||
|
|
|
|||
|
|
Copyright Notice
|
|||
|
|
|
|||
|
|
Copyright (C) The Internet Society (2003). All Rights Reserved.
|
|||
|
|
|
|||
|
|
Abstract
|
|||
|
|
|
|||
|
|
This document describes a profile called "RTP/AVP" for the use of the
|
|||
|
|
real-time transport protocol (RTP), version 2, and the associated
|
|||
|
|
control protocol, RTCP, within audio and video multiparticipant
|
|||
|
|
conferences with minimal control. It provides interpretations of
|
|||
|
|
generic fields within the RTP specification suitable for audio and
|
|||
|
|
video conferences. In particular, this document defines a set of
|
|||
|
|
default mappings from payload type numbers to encodings.
|
|||
|
|
|
|||
|
|
This document also describes how audio and video data may be carried
|
|||
|
|
within RTP. It defines a set of standard encodings and their names
|
|||
|
|
when used within RTP. The descriptions provide pointers to reference
|
|||
|
|
implementations and the detailed standards. This document is meant
|
|||
|
|
as an aid for implementors of audio, video and other real-time
|
|||
|
|
multimedia applications.
|
|||
|
|
|
|||
|
|
This memorandum obsoletes RFC 1890. It is mostly backwards-
|
|||
|
|
compatible except for functions removed because two interoperable
|
|||
|
|
implementations were not found. The additions to RFC 1890 codify
|
|||
|
|
existing practice in the use of payload formats under this profile
|
|||
|
|
and include new payload formats defined since RFC 1890 was published.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 1]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
Table of Contents
|
|||
|
|
|
|||
|
|
1. Introduction ................................................. 3
|
|||
|
|
1.1 Terminology ............................................. 3
|
|||
|
|
2. RTP and RTCP Packet Forms and Protocol Behavior .............. 4
|
|||
|
|
3. Registering Additional Encodings ............................. 6
|
|||
|
|
4. Audio ........................................................ 8
|
|||
|
|
4.1 Encoding-Independent Rules .............................. 8
|
|||
|
|
4.2 Operating Recommendations ............................... 9
|
|||
|
|
4.3 Guidelines for Sample-Based Audio Encodings ............. 10
|
|||
|
|
4.4 Guidelines for Frame-Based Audio Encodings .............. 11
|
|||
|
|
4.5 Audio Encodings ......................................... 12
|
|||
|
|
4.5.1 DVI4 ............................................ 13
|
|||
|
|
4.5.2 G722 ............................................ 14
|
|||
|
|
4.5.3 G723 ............................................ 14
|
|||
|
|
4.5.4 G726-40, G726-32, G726-24, and G726-16 .......... 18
|
|||
|
|
4.5.5 G728 ............................................ 19
|
|||
|
|
4.5.6 G729 ............................................ 20
|
|||
|
|
4.5.7 G729D and G729E ................................. 22
|
|||
|
|
4.5.8 GSM ............................................. 24
|
|||
|
|
4.5.9 GSM-EFR ......................................... 27
|
|||
|
|
4.5.10 L8 .............................................. 27
|
|||
|
|
4.5.11 L16 ............................................. 27
|
|||
|
|
4.5.12 LPC ............................................. 27
|
|||
|
|
4.5.13 MPA ............................................. 28
|
|||
|
|
4.5.14 PCMA and PCMU ................................... 28
|
|||
|
|
4.5.15 QCELP ........................................... 28
|
|||
|
|
4.5.16 RED ............................................. 29
|
|||
|
|
4.5.17 VDVI ............................................ 29
|
|||
|
|
5. Video ........................................................ 30
|
|||
|
|
5.1 CelB .................................................... 30
|
|||
|
|
5.2 JPEG .................................................... 30
|
|||
|
|
5.3 H261 .................................................... 30
|
|||
|
|
5.4 H263 .................................................... 31
|
|||
|
|
5.5 H263-1998 ............................................... 31
|
|||
|
|
5.6 MPV ..................................................... 31
|
|||
|
|
5.7 MP2T .................................................... 31
|
|||
|
|
5.8 nv ...................................................... 32
|
|||
|
|
6. Payload Type Definitions ..................................... 32
|
|||
|
|
7. RTP over TCP and Similar Byte Stream Protocols ............... 34
|
|||
|
|
8. Port Assignment .............................................. 34
|
|||
|
|
9. Changes from RFC 1890 ........................................ 35
|
|||
|
|
10. Security Considerations ...................................... 38
|
|||
|
|
11. IANA Considerations .......................................... 39
|
|||
|
|
12. References ................................................... 39
|
|||
|
|
12.1 Normative References .................................... 39
|
|||
|
|
12.2 Informative References .................................. 39
|
|||
|
|
13. Current Locations of Related Resources ....................... 41
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 2]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
14. Acknowledgments .............................................. 42
|
|||
|
|
15. Intellectual Property Rights Statement ....................... 43
|
|||
|
|
16. Authors' Addresses ........................................... 43
|
|||
|
|
17. Full Copyright Statement ..................................... 44
|
|||
|
|
|
|||
|
|
1. Introduction
|
|||
|
|
|
|||
|
|
This profile defines aspects of RTP left unspecified in the RTP
|
|||
|
|
Version 2 protocol definition (RFC 3550) [1]. This profile is
|
|||
|
|
intended for the use within audio and video conferences with minimal
|
|||
|
|
session control. In particular, no support for the negotiation of
|
|||
|
|
parameters or membership control is provided. The profile is
|
|||
|
|
expected to be useful in sessions where no negotiation or membership
|
|||
|
|
control are used (e.g., using the static payload types and the
|
|||
|
|
membership indications provided by RTCP), but this profile may also
|
|||
|
|
be useful in conjunction with a higher-level control protocol.
|
|||
|
|
|
|||
|
|
Use of this profile may be implicit in the use of the appropriate
|
|||
|
|
applications; there may be no explicit indication by port number,
|
|||
|
|
protocol identifier or the like. Applications such as session
|
|||
|
|
directories may use the name for this profile specified in Section
|
|||
|
|
11.
|
|||
|
|
|
|||
|
|
Other profiles may make different choices for the items specified
|
|||
|
|
here.
|
|||
|
|
|
|||
|
|
This document also defines a set of encodings and payload formats for
|
|||
|
|
audio and video. These payload format descriptions are included here
|
|||
|
|
only as a matter of convenience since they are too small to warrant
|
|||
|
|
separate documents. Use of these payload formats is NOT REQUIRED to
|
|||
|
|
use this profile. Only the binding of some of the payload formats to
|
|||
|
|
static payload type numbers in Tables 4 and 5 is normative.
|
|||
|
|
|
|||
|
|
1.1 Terminology
|
|||
|
|
|
|||
|
|
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
|
|||
|
|
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
|
|||
|
|
document are to be interpreted as described in RFC 2119 [2] and
|
|||
|
|
indicate requirement levels for implementations compliant with this
|
|||
|
|
RTP profile.
|
|||
|
|
|
|||
|
|
This document defines the term media type as dividing encodings of
|
|||
|
|
audio and video content into three classes: audio, video and
|
|||
|
|
audio/video (interleaved).
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 3]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
2. RTP and RTCP Packet Forms and Protocol Behavior
|
|||
|
|
|
|||
|
|
The section "RTP Profiles and Payload Format Specifications" of RFC
|
|||
|
|
3550 enumerates a number of items that can be specified or modified
|
|||
|
|
in a profile. This section addresses these items. Generally, this
|
|||
|
|
profile follows the default and/or recommended aspects of the RTP
|
|||
|
|
specification.
|
|||
|
|
|
|||
|
|
RTP data header: The standard format of the fixed RTP data
|
|||
|
|
header is used (one marker bit).
|
|||
|
|
|
|||
|
|
Payload types: Static payload types are defined in Section 6.
|
|||
|
|
|
|||
|
|
RTP data header additions: No additional fixed fields are
|
|||
|
|
appended to the RTP data header.
|
|||
|
|
|
|||
|
|
RTP data header extensions: No RTP header extensions are
|
|||
|
|
defined, but applications operating under this profile MAY use
|
|||
|
|
such extensions. Thus, applications SHOULD NOT assume that the
|
|||
|
|
RTP header X bit is always zero and SHOULD be prepared to ignore
|
|||
|
|
the header extension. If a header extension is defined in the
|
|||
|
|
future, that definition MUST specify the contents of the first 16
|
|||
|
|
bits in such a way that multiple different extensions can be
|
|||
|
|
identified.
|
|||
|
|
|
|||
|
|
RTCP packet types: No additional RTCP packet types are defined
|
|||
|
|
by this profile specification.
|
|||
|
|
|
|||
|
|
RTCP report interval: The suggested constants are to be used for
|
|||
|
|
the RTCP report interval calculation. Sessions operating under
|
|||
|
|
this profile MAY specify a separate parameter for the RTCP traffic
|
|||
|
|
bandwidth rather than using the default fraction of the session
|
|||
|
|
bandwidth. The RTCP traffic bandwidth MAY be divided into two
|
|||
|
|
separate session parameters for those participants which are
|
|||
|
|
active data senders and those which are not. Following the
|
|||
|
|
recommendation in the RTP specification [1] that 1/4 of the RTCP
|
|||
|
|
bandwidth be dedicated to data senders, the RECOMMENDED default
|
|||
|
|
values for these two parameters would be 1.25% and 3.75%,
|
|||
|
|
respectively. For a particular session, the RTCP bandwidth for
|
|||
|
|
non-data-senders MAY be set to zero when operating on
|
|||
|
|
unidirectional links or for sessions that don't require feedback
|
|||
|
|
on the quality of reception. The RTCP bandwidth for data senders
|
|||
|
|
SHOULD be kept non-zero so that sender reports can still be sent
|
|||
|
|
for inter-media synchronization and to identify the source by
|
|||
|
|
CNAME. The means by which the one or two session parameters for
|
|||
|
|
RTCP bandwidth are specified is beyond the scope of this memo.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 4]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
SR/RR extension: No extension section is defined for the RTCP SR
|
|||
|
|
or RR packet.
|
|||
|
|
|
|||
|
|
SDES use: Applications MAY use any of the SDES items described
|
|||
|
|
in the RTP specification. While CNAME information MUST be sent
|
|||
|
|
every reporting interval, other items SHOULD only be sent every
|
|||
|
|
third reporting interval, with NAME sent seven out of eight times
|
|||
|
|
within that slot and the remaining SDES items cyclically taking up
|
|||
|
|
the eighth slot, as defined in Section 6.2.2 of the RTP
|
|||
|
|
specification. In other words, NAME is sent in RTCP packets 1, 4,
|
|||
|
|
7, 10, 13, 16, 19, while, say, EMAIL is used in RTCP packet 22.
|
|||
|
|
|
|||
|
|
Security: The RTP default security services are also the default
|
|||
|
|
under this profile.
|
|||
|
|
|
|||
|
|
String-to-key mapping: No mapping is specified by this profile.
|
|||
|
|
|
|||
|
|
Congestion: RTP and this profile may be used in the context of
|
|||
|
|
enhanced network service, for example, through Integrated Services
|
|||
|
|
(RFC 1633) [4] or Differentiated Services (RFC 2475) [5], or they
|
|||
|
|
may be used with best effort service.
|
|||
|
|
|
|||
|
|
If enhanced service is being used, RTP receivers SHOULD monitor
|
|||
|
|
packet loss to ensure that the service that was requested is
|
|||
|
|
actually being delivered. If it is not, then they SHOULD assume
|
|||
|
|
that they are receiving best-effort service and behave
|
|||
|
|
accordingly.
|
|||
|
|
|
|||
|
|
If best-effort service is being used, RTP receivers SHOULD monitor
|
|||
|
|
packet loss to ensure that the packet loss rate is within
|
|||
|
|
acceptable parameters. Packet loss is considered acceptable if a
|
|||
|
|
TCP flow across the same network path and experiencing the same
|
|||
|
|
network conditions would achieve an average throughput, measured
|
|||
|
|
on a reasonable timescale, that is not less than the RTP flow is
|
|||
|
|
achieving. This condition can be satisfied by implementing
|
|||
|
|
congestion control mechanisms to adapt the transmission rate (or
|
|||
|
|
the number of layers subscribed for a layered multicast session),
|
|||
|
|
or by arranging for a receiver to leave the session if the loss
|
|||
|
|
rate is unacceptably high.
|
|||
|
|
|
|||
|
|
The comparison to TCP cannot be specified exactly, but is intended
|
|||
|
|
as an "order-of-magnitude" comparison in timescale and throughput.
|
|||
|
|
The timescale on which TCP throughput is measured is the round-
|
|||
|
|
trip time of the connection. In essence, this requirement states
|
|||
|
|
that it is not acceptable to deploy an application (using RTP or
|
|||
|
|
any other transport protocol) on the best-effort Internet which
|
|||
|
|
consumes bandwidth arbitrarily and does not compete fairly with
|
|||
|
|
TCP within an order of magnitude.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 5]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
Underlying protocol: The profile specifies the use of RTP over
|
|||
|
|
unicast and multicast UDP as well as TCP. (This does not preclude
|
|||
|
|
the use of these definitions when RTP is carried by other lower-
|
|||
|
|
layer protocols.)
|
|||
|
|
|
|||
|
|
Transport mapping: The standard mapping of RTP and RTCP to
|
|||
|
|
transport-level addresses is used.
|
|||
|
|
|
|||
|
|
Encapsulation: This profile leaves to applications the
|
|||
|
|
specification of RTP encapsulation in protocols other than UDP.
|
|||
|
|
|
|||
|
|
3. Registering Additional Encodings
|
|||
|
|
|
|||
|
|
This profile lists a set of encodings, each of which is comprised of
|
|||
|
|
a particular media data compression or representation plus a payload
|
|||
|
|
format for encapsulation within RTP. Some of those payload formats
|
|||
|
|
are specified here, while others are specified in separate RFCs. It
|
|||
|
|
is expected that additional encodings beyond the set listed here will
|
|||
|
|
be created in the future and specified in additional payload format
|
|||
|
|
RFCs.
|
|||
|
|
|
|||
|
|
This profile also assigns to each encoding a short name which MAY be
|
|||
|
|
used by higher-level control protocols, such as the Session
|
|||
|
|
Description Protocol (SDP), RFC 2327 [6], to identify encodings
|
|||
|
|
selected for a particular RTP session.
|
|||
|
|
|
|||
|
|
In some contexts it may be useful to refer to these encodings in the
|
|||
|
|
form of a MIME content-type. To facilitate this, RFC 3555 [7]
|
|||
|
|
provides registrations for all of the encodings names listed here as
|
|||
|
|
MIME subtype names under the "audio" and "video" MIME types through
|
|||
|
|
the MIME registration procedure as specified in RFC 2048 [8].
|
|||
|
|
|
|||
|
|
Any additional encodings specified for use under this profile (or
|
|||
|
|
others) may also be assigned names registered as MIME subtypes with
|
|||
|
|
the Internet Assigned Numbers Authority (IANA). This registry
|
|||
|
|
provides a means to insure that the names assigned to the additional
|
|||
|
|
encodings are kept unique. RFC 3555 specifies the information that
|
|||
|
|
is required for the registration of RTP encodings.
|
|||
|
|
|
|||
|
|
In addition to assigning names to encodings, this profile also
|
|||
|
|
assigns static RTP payload type numbers to some of them. However,
|
|||
|
|
the payload type number space is relatively small and cannot
|
|||
|
|
accommodate assignments for all existing and future encodings.
|
|||
|
|
During the early stages of RTP development, it was necessary to use
|
|||
|
|
statically assigned payload types because no other mechanism had been
|
|||
|
|
specified to bind encodings to payload types. It was anticipated
|
|||
|
|
that non-RTP means beyond the scope of this memo (such as directory
|
|||
|
|
services or invitation protocols) would be specified to establish a
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 6]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
dynamic mapping between a payload type and an encoding. Now,
|
|||
|
|
mechanisms for defining dynamic payload type bindings have been
|
|||
|
|
specified in the Session Description Protocol (SDP) and in other
|
|||
|
|
protocols such as ITU-T Recommendation H.323/H.245. These mechanisms
|
|||
|
|
associate the registered name of the encoding/payload format, along
|
|||
|
|
with any additional required parameters, such as the RTP timestamp
|
|||
|
|
clock rate and number of channels, with a payload type number. This
|
|||
|
|
association is effective only for the duration of the RTP session in
|
|||
|
|
which the dynamic payload type binding is made. This association
|
|||
|
|
applies only to the RTP session for which it is made, thus the
|
|||
|
|
numbers can be re-used for different encodings in different sessions
|
|||
|
|
so the number space limitation is avoided.
|
|||
|
|
|
|||
|
|
This profile reserves payload type numbers in the range 96-127
|
|||
|
|
exclusively for dynamic assignment. Applications SHOULD first use
|
|||
|
|
values in this range for dynamic payload types. Those applications
|
|||
|
|
which need to define more than 32 dynamic payload types MAY bind
|
|||
|
|
codes below 96, in which case it is RECOMMENDED that unassigned
|
|||
|
|
payload type numbers be used first. However, the statically assigned
|
|||
|
|
payload types are default bindings and MAY be dynamically bound to
|
|||
|
|
new encodings if needed. Redefining payload types below 96 may cause
|
|||
|
|
incorrect operation if an attempt is made to join a session without
|
|||
|
|
obtaining session description information that defines the dynamic
|
|||
|
|
payload types.
|
|||
|
|
|
|||
|
|
Dynamic payload types SHOULD NOT be used without a well-defined
|
|||
|
|
mechanism to indicate the mapping. Systems that expect to
|
|||
|
|
interoperate with others operating under this profile SHOULD NOT make
|
|||
|
|
their own assignments of proprietary encodings to particular, fixed
|
|||
|
|
payload types.
|
|||
|
|
|
|||
|
|
This specification establishes the policy that no additional static
|
|||
|
|
payload types will be assigned beyond the ones defined in this
|
|||
|
|
document. Establishing this policy avoids the problem of trying to
|
|||
|
|
create a set of criteria for accepting static assignments and
|
|||
|
|
encourages the implementation and deployment of the dynamic payload
|
|||
|
|
type mechanisms.
|
|||
|
|
|
|||
|
|
The final set of static payload type assignments is provided in
|
|||
|
|
Tables 4 and 5.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 7]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
4. Audio
|
|||
|
|
|
|||
|
|
4.1 Encoding-Independent Rules
|
|||
|
|
|
|||
|
|
Since the ability to suppress silence is one of the primary
|
|||
|
|
motivations for using packets to transmit voice, the RTP header
|
|||
|
|
carries both a sequence number and a timestamp to allow a receiver to
|
|||
|
|
distinguish between lost packets and periods of time when no data was
|
|||
|
|
transmitted. Discontiguous transmission (silence suppression) MAY be
|
|||
|
|
used with any audio payload format. Receivers MUST assume that
|
|||
|
|
senders may suppress silence unless this is restricted by signaling
|
|||
|
|
specified elsewhere. (Even if the transmitter does not suppress
|
|||
|
|
silence, the receiver should be prepared to handle periods when no
|
|||
|
|
data is present since packets may be lost.)
|
|||
|
|
|
|||
|
|
Some payload formats (see Sections 4.5.3 and 4.5.6) define a "silence
|
|||
|
|
insertion descriptor" or "comfort noise" frame to specify parameters
|
|||
|
|
for artificial noise that may be generated during a period of silence
|
|||
|
|
to approximate the background noise at the source. For other payload
|
|||
|
|
formats, a generic Comfort Noise (CN) payload format is specified in
|
|||
|
|
RFC 3389 [9]. When the CN payload format is used with another
|
|||
|
|
payload format, different values in the RTP payload type field
|
|||
|
|
distinguish comfort-noise packets from those of the selected payload
|
|||
|
|
format.
|
|||
|
|
|
|||
|
|
For applications which send either no packets or occasional comfort-
|
|||
|
|
noise packets during silence, the first packet of a talkspurt, that
|
|||
|
|
is, the first packet after a silence period during which packets have
|
|||
|
|
not been transmitted contiguously, SHOULD be distinguished by setting
|
|||
|
|
the marker bit in the RTP data header to one. The marker bit in all
|
|||
|
|
other packets is zero. The beginning of a talkspurt MAY be used to
|
|||
|
|
adjust the playout delay to reflect changing network delays.
|
|||
|
|
Applications without silence suppression MUST set the marker bit to
|
|||
|
|
zero.
|
|||
|
|
|
|||
|
|
The RTP clock rate used for generating the RTP timestamp is
|
|||
|
|
independent of the number of channels and the encoding; it usually
|
|||
|
|
equals the number of sampling periods per second. For N-channel
|
|||
|
|
encodings, each sampling period (say, 1/8,000 of a second) generates
|
|||
|
|
N samples. (This terminology is standard, but somewhat confusing, as
|
|||
|
|
the total number of samples generated per second is then the sampling
|
|||
|
|
rate times the channel count.)
|
|||
|
|
|
|||
|
|
If multiple audio channels are used, channels are numbered left-to-
|
|||
|
|
right, starting at one. In RTP audio packets, information from
|
|||
|
|
lower-numbered channels precedes that from higher-numbered channels.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 8]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
For more than two channels, the convention followed by the AIFF-C
|
|||
|
|
audio interchange format SHOULD be followed [3], using the following
|
|||
|
|
notation, unless some other convention is specified for a particular
|
|||
|
|
encoding or payload format:
|
|||
|
|
|
|||
|
|
l left
|
|||
|
|
r right
|
|||
|
|
c center
|
|||
|
|
S surround
|
|||
|
|
F front
|
|||
|
|
R rear
|
|||
|
|
|
|||
|
|
channels description channel
|
|||
|
|
1 2 3 4 5 6
|
|||
|
|
_________________________________________________
|
|||
|
|
2 stereo l r
|
|||
|
|
3 l r c
|
|||
|
|
4 l c r S
|
|||
|
|
5 Fl Fr Fc Sl Sr
|
|||
|
|
6 l lc c r rc S
|
|||
|
|
|
|||
|
|
Note: RFC 1890 defined two conventions for the ordering of four
|
|||
|
|
audio channels. Since the ordering is indicated implicitly by
|
|||
|
|
the number of channels, this was ambiguous. In this revision,
|
|||
|
|
the order described as "quadrophonic" has been eliminated to
|
|||
|
|
remove the ambiguity. This choice was based on the observation
|
|||
|
|
that quadrophonic consumer audio format did not become popular
|
|||
|
|
whereas surround-sound subsequently has.
|
|||
|
|
|
|||
|
|
Samples for all channels belonging to a single sampling instant MUST
|
|||
|
|
be within the same packet. The interleaving of samples from
|
|||
|
|
different channels depends on the encoding. General guidelines are
|
|||
|
|
given in Section 4.3 and 4.4.
|
|||
|
|
|
|||
|
|
The sampling frequency SHOULD be drawn from the set: 8,000, 11,025,
|
|||
|
|
16,000, 22,050, 24,000, 32,000, 44,100 and 48,000 Hz. (Older Apple
|
|||
|
|
Macintosh computers had a native sample rate of 22,254.54 Hz, which
|
|||
|
|
can be converted to 22,050 with acceptable quality by dropping 4
|
|||
|
|
samples in a 20 ms frame.) However, most audio encodings are defined
|
|||
|
|
for a more restricted set of sampling frequencies. Receivers SHOULD
|
|||
|
|
be prepared to accept multi-channel audio, but MAY choose to only
|
|||
|
|
play a single channel.
|
|||
|
|
|
|||
|
|
4.2 Operating Recommendations
|
|||
|
|
|
|||
|
|
The following recommendations are default operating parameters.
|
|||
|
|
Applications SHOULD be prepared to handle other values. The ranges
|
|||
|
|
given are meant to give guidance to application writers, allowing a
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 9]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
set of applications conforming to these guidelines to interoperate
|
|||
|
|
without additional negotiation. These guidelines are not intended to
|
|||
|
|
restrict operating parameters for applications that can negotiate a
|
|||
|
|
set of interoperable parameters, e.g., through a conference control
|
|||
|
|
protocol.
|
|||
|
|
|
|||
|
|
For packetized audio, the default packetization interval SHOULD have
|
|||
|
|
a duration of 20 ms or one frame, whichever is longer, unless
|
|||
|
|
otherwise noted in Table 1 (column "ms/packet"). The packetization
|
|||
|
|
interval determines the minimum end-to-end delay; longer packets
|
|||
|
|
introduce less header overhead but higher delay and make packet loss
|
|||
|
|
more noticeable. For non-interactive applications such as lectures
|
|||
|
|
or for links with severe bandwidth constraints, a higher
|
|||
|
|
packetization delay MAY be used. A receiver SHOULD accept packets
|
|||
|
|
representing between 0 and 200 ms of audio data. (For framed audio
|
|||
|
|
encodings, a receiver SHOULD accept packets with a number of frames
|
|||
|
|
equal to 200 ms divided by the frame duration, rounded up.) This
|
|||
|
|
restriction allows reasonable buffer sizing for the receiver.
|
|||
|
|
|
|||
|
|
4.3 Guidelines for Sample-Based Audio Encodings
|
|||
|
|
|
|||
|
|
In sample-based encodings, each audio sample is represented by a
|
|||
|
|
fixed number of bits. Within the compressed audio data, codes for
|
|||
|
|
individual samples may span octet boundaries. An RTP audio packet
|
|||
|
|
may contain any number of audio samples, subject to the constraint
|
|||
|
|
that the number of bits per sample times the number of samples per
|
|||
|
|
packet yields an integral octet count. Fractional encodings produce
|
|||
|
|
less than one octet per sample.
|
|||
|
|
|
|||
|
|
The duration of an audio packet is determined by the number of
|
|||
|
|
samples in the packet.
|
|||
|
|
|
|||
|
|
For sample-based encodings producing one or more octets per sample,
|
|||
|
|
samples from different channels sampled at the same sampling instant
|
|||
|
|
SHOULD be packed in consecutive octets. For example, for a two-
|
|||
|
|
channel encoding, the octet sequence is (left channel, first sample),
|
|||
|
|
(right channel, first sample), (left channel, second sample), (right
|
|||
|
|
channel, second sample), .... For multi-octet encodings, octets
|
|||
|
|
SHOULD be transmitted in network byte order (i.e., most significant
|
|||
|
|
octet first).
|
|||
|
|
|
|||
|
|
The packing of sample-based encodings producing less than one octet
|
|||
|
|
per sample is encoding-specific.
|
|||
|
|
|
|||
|
|
The RTP timestamp reflects the instant at which the first sample in
|
|||
|
|
the packet was sampled, that is, the oldest information in the
|
|||
|
|
packet.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 10]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
4.4 Guidelines for Frame-Based Audio Encodings
|
|||
|
|
|
|||
|
|
Frame-based encodings encode a fixed-length block of audio into
|
|||
|
|
another block of compressed data, typically also of fixed length.
|
|||
|
|
For frame-based encodings, the sender MAY choose to combine several
|
|||
|
|
such frames into a single RTP packet. The receiver can tell the
|
|||
|
|
number of frames contained in an RTP packet, if all the frames have
|
|||
|
|
the same length, by dividing the RTP payload length by the audio
|
|||
|
|
frame size which is defined as part of the encoding. This does not
|
|||
|
|
work when carrying frames of different sizes unless the frame sizes
|
|||
|
|
are relatively prime. If not, the frames MUST indicate their size.
|
|||
|
|
|
|||
|
|
For frame-based codecs, the channel order is defined for the whole
|
|||
|
|
block. That is, for two-channel audio, right and left samples SHOULD
|
|||
|
|
be coded independently, with the encoded frame for the left channel
|
|||
|
|
preceding that for the right channel.
|
|||
|
|
|
|||
|
|
All frame-oriented audio codecs SHOULD be able to encode and decode
|
|||
|
|
several consecutive frames within a single packet. Since the frame
|
|||
|
|
size for the frame-oriented codecs is given, there is no need to use
|
|||
|
|
a separate designation for the same encoding, but with different
|
|||
|
|
number of frames per packet.
|
|||
|
|
|
|||
|
|
RTP packets SHALL contain a whole number of frames, with frames
|
|||
|
|
inserted according to age within a packet, so that the oldest frame
|
|||
|
|
(to be played first) occurs immediately after the RTP packet header.
|
|||
|
|
The RTP timestamp reflects the instant at which the first sample in
|
|||
|
|
the first frame was sampled, that is, the oldest information in the
|
|||
|
|
packet.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 11]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
4.5 Audio Encodings
|
|||
|
|
|
|||
|
|
name of sampling default
|
|||
|
|
encoding sample/frame bits/sample rate ms/frame ms/packet
|
|||
|
|
__________________________________________________________________
|
|||
|
|
DVI4 sample 4 var. 20
|
|||
|
|
G722 sample 8 16,000 20
|
|||
|
|
G723 frame N/A 8,000 30 30
|
|||
|
|
G726-40 sample 5 8,000 20
|
|||
|
|
G726-32 sample 4 8,000 20
|
|||
|
|
G726-24 sample 3 8,000 20
|
|||
|
|
G726-16 sample 2 8,000 20
|
|||
|
|
G728 frame N/A 8,000 2.5 20
|
|||
|
|
G729 frame N/A 8,000 10 20
|
|||
|
|
G729D frame N/A 8,000 10 20
|
|||
|
|
G729E frame N/A 8,000 10 20
|
|||
|
|
GSM frame N/A 8,000 20 20
|
|||
|
|
GSM-EFR frame N/A 8,000 20 20
|
|||
|
|
L8 sample 8 var. 20
|
|||
|
|
L16 sample 16 var. 20
|
|||
|
|
LPC frame N/A 8,000 20 20
|
|||
|
|
MPA frame N/A var. var.
|
|||
|
|
PCMA sample 8 var. 20
|
|||
|
|
PCMU sample 8 var. 20
|
|||
|
|
QCELP frame N/A 8,000 20 20
|
|||
|
|
VDVI sample var. var. 20
|
|||
|
|
|
|||
|
|
Table 1: Properties of Audio Encodings (N/A: not applicable; var.:
|
|||
|
|
variable)
|
|||
|
|
|
|||
|
|
The characteristics of the audio encodings described in this document
|
|||
|
|
are shown in Table 1; they are listed in order of their payload type
|
|||
|
|
in Table 4. While most audio codecs are only specified for a fixed
|
|||
|
|
sampling rate, some sample-based algorithms (indicated by an entry of
|
|||
|
|
"var." in the sampling rate column of Table 1) may be used with
|
|||
|
|
different sampling rates, resulting in different coded bit rates.
|
|||
|
|
When used with a sampling rate other than that for which a static
|
|||
|
|
payload type is defined, non-RTP means beyond the scope of this memo
|
|||
|
|
MUST be used to define a dynamic payload type and MUST indicate the
|
|||
|
|
selected RTP timestamp clock rate, which is usually the same as the
|
|||
|
|
sampling rate for audio.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 12]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
4.5.1 DVI4
|
|||
|
|
|
|||
|
|
DVI4 uses an adaptive delta pulse code modulation (ADPCM) encoding
|
|||
|
|
scheme that was specified by the Interactive Multimedia Association
|
|||
|
|
(IMA) as the "IMA ADPCM wave type". However, the encoding defined
|
|||
|
|
here as DVI4 differs in three respects from the IMA specification:
|
|||
|
|
|
|||
|
|
o The RTP DVI4 header contains the predicted value rather than the
|
|||
|
|
first sample value contained the IMA ADPCM block header.
|
|||
|
|
|
|||
|
|
o IMA ADPCM blocks contain an odd number of samples, since the first
|
|||
|
|
sample of a block is contained just in the header (uncompressed),
|
|||
|
|
followed by an even number of compressed samples. DVI4 has an
|
|||
|
|
even number of compressed samples only, using the `predict' word
|
|||
|
|
from the header to decode the first sample.
|
|||
|
|
|
|||
|
|
o For DVI4, the 4-bit samples are packed with the first sample in
|
|||
|
|
the four most significant bits and the second sample in the four
|
|||
|
|
least significant bits. In the IMA ADPCM codec, the samples are
|
|||
|
|
packed in the opposite order.
|
|||
|
|
|
|||
|
|
Each packet contains a single DVI block. This profile only defines
|
|||
|
|
the 4-bit-per-sample version, while IMA also specified a 3-bit-per-
|
|||
|
|
sample encoding.
|
|||
|
|
|
|||
|
|
The "header" word for each channel has the following structure:
|
|||
|
|
|
|||
|
|
int16 predict; /* predicted value of first sample
|
|||
|
|
from the previous block (L16 format) */
|
|||
|
|
u_int8 index; /* current index into stepsize table */
|
|||
|
|
u_int8 reserved; /* set to zero by sender, ignored by receiver */
|
|||
|
|
|
|||
|
|
Each octet following the header contains two 4-bit samples, thus the
|
|||
|
|
number of samples per packet MUST be even because there is no means
|
|||
|
|
to indicate a partially filled last octet.
|
|||
|
|
|
|||
|
|
Packing of samples for multiple channels is for further study.
|
|||
|
|
|
|||
|
|
The IMA ADPCM algorithm was described in the document IMA Recommended
|
|||
|
|
Practices for Enhancing Digital Audio Compatibility in Multimedia
|
|||
|
|
Systems (version 3.0). However, the Interactive Multimedia
|
|||
|
|
Association ceased operations in 1997. Resources for an archived
|
|||
|
|
copy of that document and a software implementation of the RTP DVI4
|
|||
|
|
encoding are listed in Section 13.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 13]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
4.5.2 G722
|
|||
|
|
|
|||
|
|
G722 is specified in ITU-T Recommendation G.722, "7 kHz audio-coding
|
|||
|
|
within 64 kbit/s". The G.722 encoder produces a stream of octets,
|
|||
|
|
each of which SHALL be octet-aligned in an RTP packet. The first bit
|
|||
|
|
transmitted in the G.722 octet, which is the most significant bit of
|
|||
|
|
the higher sub-band sample, SHALL correspond to the most significant
|
|||
|
|
bit of the octet in the RTP packet.
|
|||
|
|
|
|||
|
|
Even though the actual sampling rate for G.722 audio is 16,000 Hz,
|
|||
|
|
the RTP clock rate for the G722 payload format is 8,000 Hz because
|
|||
|
|
that value was erroneously assigned in RFC 1890 and must remain
|
|||
|
|
unchanged for backward compatibility. The octet rate or sample-pair
|
|||
|
|
rate is 8,000 Hz.
|
|||
|
|
|
|||
|
|
4.5.3 G723
|
|||
|
|
|
|||
|
|
G723 is specified in ITU Recommendation G.723.1, "Dual-rate speech
|
|||
|
|
coder for multimedia communications transmitting at 5.3 and 6.3
|
|||
|
|
kbit/s". The G.723.1 5.3/6.3 kbit/s codec was defined by the ITU-T
|
|||
|
|
as a mandatory codec for ITU-T H.324 GSTN videophone terminal
|
|||
|
|
applications. The algorithm has a floating point specification in
|
|||
|
|
Annex B to G.723.1, a silence compression algorithm in Annex A to
|
|||
|
|
G.723.1 and a scalable channel coding scheme for wireless
|
|||
|
|
applications in G.723.1 Annex C.
|
|||
|
|
|
|||
|
|
This Recommendation specifies a coded representation that can be used
|
|||
|
|
for compressing the speech signal component of multi-media services
|
|||
|
|
at a very low bit rate. Audio is encoded in 30 ms frames, with an
|
|||
|
|
additional delay of 7.5 ms due to look-ahead. A G.723.1 frame can be
|
|||
|
|
one of three sizes: 24 octets (6.3 kb/s frame), 20 octets (5.3 kb/s
|
|||
|
|
frame), or 4 octets. These 4-octet frames are called SID frames
|
|||
|
|
(Silence Insertion Descriptor) and are used to specify comfort noise
|
|||
|
|
parameters. There is no restriction on how 4, 20, and 24 octet
|
|||
|
|
frames are intermixed. The least significant two bits of the first
|
|||
|
|
octet in the frame determine the frame size and codec type:
|
|||
|
|
|
|||
|
|
bits content octets/frame
|
|||
|
|
00 high-rate speech (6.3 kb/s) 24
|
|||
|
|
01 low-rate speech (5.3 kb/s) 20
|
|||
|
|
10 SID frame 4
|
|||
|
|
11 reserved
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 14]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
It is possible to switch between the two rates at any 30 ms frame
|
|||
|
|
boundary. Both (5.3 kb/s and 6.3 kb/s) rates are a mandatory part of
|
|||
|
|
the encoder and decoder. Receivers MUST accept both data rates and
|
|||
|
|
MUST accept SID frames unless restriction of these capabilities has
|
|||
|
|
been signaled. The MIME registration for G723 in RFC 3555 [7]
|
|||
|
|
specifies parameters that MAY be used with MIME or SDP to restrict to
|
|||
|
|
a single data rate or to restrict the use of SID frames. This coder
|
|||
|
|
was optimized to represent speech with near-toll quality at the above
|
|||
|
|
rates using a limited amount of complexity.
|
|||
|
|
|
|||
|
|
The packing of the encoded bit stream into octets and the
|
|||
|
|
transmission order of the octets is specified in Rec. G.723.1 and is
|
|||
|
|
the same as that produced by the G.723 C code reference
|
|||
|
|
implementation. For the 6.3 kb/s data rate, this packing is
|
|||
|
|
illustrated as follows, where the header (HDR) bits are always "0 0"
|
|||
|
|
as shown in Fig. 1 to indicate operation at 6.3 kb/s, and the Z bit
|
|||
|
|
is always set to zero. The diagrams show the bit packing in "network
|
|||
|
|
byte order", also known as big-endian order. The bits of each 32-bit
|
|||
|
|
word are numbered 0 to 31, with the most significant bit on the left
|
|||
|
|
and numbered 0. The octets (bytes) of each word are transmitted most
|
|||
|
|
significant octet first. The bits of each data field are numbered in
|
|||
|
|
the order of the bit stream representation of the encoding (least
|
|||
|
|
significant bit first). The vertical bars indicate the boundaries
|
|||
|
|
between field fragments.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 15]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
0 1 2 3
|
|||
|
|
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
| LPC |HDR| LPC | LPC | ACL0 |LPC|
|
|||
|
|
| | | | | | |
|
|||
|
|
|0 0 0 0 0 0|0 0|1 1 1 1 0 0 0 0|2 2 1 1 1 1 1 1|0 0 0 0 0 0|2 2|
|
|||
|
|
|5 4 3 2 1 0| |3 2 1 0 9 8 7 6|1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2|
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
| ACL2 |ACL|A| GAIN0 |ACL|ACL| GAIN0 | GAIN1 |
|
|||
|
|
| | 1 |C| | 3 | 2 | | |
|
|||
|
|
|0 0 0 0 0|0 0|0|0 0 0 0|0 0|0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|
|
|||
|
|
|4 3 2 1 0|1 0|6|3 2 1 0|1 0|6 5|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
| GAIN2 | GAIN1 | GAIN2 | GAIN3 | GRID | GAIN3 |
|
|||
|
|
| | | | | | |
|
|||
|
|
|0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|0 0 0 0|1 1 0 0|
|
|||
|
|
|3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|3 2 1 0|1 0 9 8|
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
| MSBPOS |Z|POS| MSBPOS | POS0 |POS| POS0 |
|
|||
|
|
| | | 0 | | | 1 | |
|
|||
|
|
|0 0 0 0 0 0 0|0|0 0|1 1 1 0 0 0|0 0 0 0 0 0 0 0|0 0|1 1 1 1 1 1|
|
|||
|
|
|6 5 4 3 2 1 0| |1 0|2 1 0 9 8 7|9 8 7 6 5 4 3 2|1 0|5 4 3 2 1 0|
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
| POS1 | POS2 | POS1 | POS2 | POS3 | POS2 |
|
|||
|
|
| | | | | | |
|
|||
|
|
|0 0 0 0 0 0 0 0|0 0 0 0|1 1 1 1|1 1 0 0 0 0 0 0|0 0 0 0|1 1 1 1|
|
|||
|
|
|9 8 7 6 5 4 3 2|3 2 1 0|3 2 1 0|1 0 9 8 7 6 5 4|3 2 1 0|5 4 3 2|
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
| POS3 | PSIG0 |POS|PSIG2| PSIG1 | PSIG3 |PSIG2|
|
|||
|
|
| | | 3 | | | | |
|
|||
|
|
|1 1 0 0 0 0 0 0|0 0 0 0 0 0|1 1|0 0 0|0 0 0 0 0|0 0 0 0 0|0 0 0|
|
|||
|
|
|1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2|2 1 0|4 3 2 1 0|4 3 2 1 0|5 4 3|
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
|
|||
|
|
Figure 1: G.723 (6.3 kb/s) bit packing
|
|||
|
|
|
|||
|
|
For the 5.3 kb/s data rate, the header (HDR) bits are always "0 1",
|
|||
|
|
as shown in Fig. 2, to indicate operation at 5.3 kb/s.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 16]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
0 1 2 3
|
|||
|
|
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
| LPC |HDR| LPC | LPC | ACL0 |LPC|
|
|||
|
|
| | | | | | |
|
|||
|
|
|0 0 0 0 0 0|0 1|1 1 1 1 0 0 0 0|2 2 1 1 1 1 1 1|0 0 0 0 0 0|2 2|
|
|||
|
|
|5 4 3 2 1 0| |3 2 1 0 9 8 7 6|1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2|
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
| ACL2 |ACL|A| GAIN0 |ACL|ACL| GAIN0 | GAIN1 |
|
|||
|
|
| | 1 |C| | 3 | 2 | | |
|
|||
|
|
|0 0 0 0 0|0 0|0|0 0 0 0|0 0|0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|
|
|||
|
|
|4 3 2 1 0|1 0|6|3 2 1 0|1 0|6 5|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
| GAIN2 | GAIN1 | GAIN2 | GAIN3 | GRID | GAIN3 |
|
|||
|
|
| | | | | | |
|
|||
|
|
|0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|0 0 0 0|1 1 0 0|
|
|||
|
|
|3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|4 3 2 1|1 0 9 8|
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
| POS0 | POS1 | POS0 | POS1 | POS2 |
|
|||
|
|
| | | | | |
|
|||
|
|
|0 0 0 0 0 0 0 0|0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|
|
|||
|
|
|7 6 5 4 3 2 1 0|3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
| POS3 | POS2 | POS3 | PSIG1 | PSIG0 | PSIG3 | PSIG2 |
|
|||
|
|
| | | | | | | |
|
|||
|
|
|0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0|0 0 0 0|0 0 0 0|0 0 0 0|
|
|||
|
|
|3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|3 2 1 0|3 2 1 0|3 2 1 0|3 2 1 0|
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
|
|||
|
|
Figure 2: G.723 (5.3 kb/s) bit packing
|
|||
|
|
|
|||
|
|
The packing of G.723.1 SID (silence) frames, which are indicated by
|
|||
|
|
the header (HDR) bits having the pattern "1 0", is depicted in Fig.
|
|||
|
|
3.
|
|||
|
|
|
|||
|
|
0 1 2 3
|
|||
|
|
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
| LPC |HDR| LPC | LPC | GAIN |LPC|
|
|||
|
|
| | | | | | |
|
|||
|
|
|0 0 0 0 0 0|1 0|1 1 1 1 0 0 0 0|2 2 1 1 1 1 1 1|0 0 0 0 0 0|2 2|
|
|||
|
|
|5 4 3 2 1 0| |3 2 1 0 9 8 7 6|1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2|
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
|
|||
|
|
Figure 3: G.723 SID mode bit packing
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 17]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
4.5.4 G726-40, G726-32, G726-24, and G726-16
|
|||
|
|
|
|||
|
|
ITU-T Recommendation G.726 describes, among others, the algorithm
|
|||
|
|
recommended for conversion of a single 64 kbit/s A-law or mu-law PCM
|
|||
|
|
channel encoded at 8,000 samples/sec to and from a 40, 32, 24, or 16
|
|||
|
|
kbit/s channel. The conversion is applied to the PCM stream using an
|
|||
|
|
Adaptive Differential Pulse Code Modulation (ADPCM) transcoding
|
|||
|
|
technique. The ADPCM representation consists of a series of
|
|||
|
|
codewords with a one-to-one correspondence to the samples in the PCM
|
|||
|
|
stream. The G726 data rates of 40, 32, 24, and 16 kbit/s have
|
|||
|
|
codewords of 5, 4, 3, and 2 bits, respectively.
|
|||
|
|
|
|||
|
|
The 16 and 24 kbit/s encodings do not provide toll quality speech.
|
|||
|
|
They are designed for used in overloaded Digital Circuit
|
|||
|
|
Multiplication Equipment (DCME). ITU-T G.726 recommends that the 16
|
|||
|
|
and 24 kbit/s encodings should be alternated with higher data rate
|
|||
|
|
encodings to provide an average sample size of between 3.5 and 3.7
|
|||
|
|
bits per sample.
|
|||
|
|
|
|||
|
|
The encodings of G.726 are here denoted as G726-40, G726-32, G726-24,
|
|||
|
|
and G726-16. Prior to 1990, G721 described the 32 kbit/s ADPCM
|
|||
|
|
encoding, and G723 described the 40, 32, and 16 kbit/s encodings.
|
|||
|
|
Thus, G726-32 designates the same algorithm as G721 in RFC 1890.
|
|||
|
|
|
|||
|
|
A stream of G726 codewords contains no information on the encoding
|
|||
|
|
being used, therefore transitions between G726 encoding types are not
|
|||
|
|
permitted within a sequence of packed codewords. Applications MUST
|
|||
|
|
determine the encoding type of packed codewords from the RTP payload
|
|||
|
|
identifier.
|
|||
|
|
|
|||
|
|
No payload-specific header information SHALL be included as part of
|
|||
|
|
the audio data. A stream of G726 codewords MUST be packed into
|
|||
|
|
octets as follows: the first codeword is placed into the first octet
|
|||
|
|
such that the least significant bit of the codeword aligns with the
|
|||
|
|
least significant bit in the octet, the second codeword is then
|
|||
|
|
packed so that its least significant bit coincides with the least
|
|||
|
|
significant unoccupied bit in the octet. When a complete codeword
|
|||
|
|
cannot be placed into an octet, the bits overlapping the octet
|
|||
|
|
boundary are placed into the least significant bits of the next
|
|||
|
|
octet. Packing MUST end with a completely packed final octet. The
|
|||
|
|
number of codewords packed will therefore be a multiple of 8, 2, 8,
|
|||
|
|
and 4 for G726-40, G726-32, G726-24, and G726-16, respectively. An
|
|||
|
|
example of the packing scheme for G726-32 codewords is as shown,
|
|||
|
|
where bit 7 is the least significant bit of the first octet, and bit
|
|||
|
|
A3 is the least significant bit of the first codeword:
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 18]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
0 1
|
|||
|
|
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
|
|||
|
|
|B B B B|A A A A|D D D D|C C C C| ...
|
|||
|
|
|0 1 2 3|0 1 2 3|0 1 2 3|0 1 2 3|
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
|
|||
|
|
|
|||
|
|
An example of the packing scheme for G726-24 codewords follows, where
|
|||
|
|
again bit 7 is the least significant bit of the first octet, and bit
|
|||
|
|
A2 is the least significant bit of the first codeword:
|
|||
|
|
|
|||
|
|
0 1 2
|
|||
|
|
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
|
|||
|
|
|C C|B B B|A A A|F|E E E|D D D|C|H H H|G G G|F F| ...
|
|||
|
|
|1 2|0 1 2|0 1 2|2|0 1 2|0 1 2|0|0 1 2|0 1 2|0 1|
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
|
|||
|
|
|
|||
|
|
Note that the "little-endian" direction in which samples are packed
|
|||
|
|
into octets in the G726-16, -24, -32 and -40 payload formats
|
|||
|
|
specified here is consistent with ITU-T Recommendation X.420, but is
|
|||
|
|
the opposite of what is specified in ITU-T Recommendation I.366.2
|
|||
|
|
Annex E for ATM AAL2 transport. A second set of RTP payload formats
|
|||
|
|
matching the packetization of I.366.2 Annex E and identified by MIME
|
|||
|
|
subtypes AAL2-G726-16, -24, -32 and -40 will be specified in a
|
|||
|
|
separate document.
|
|||
|
|
|
|||
|
|
4.5.5 G728
|
|||
|
|
|
|||
|
|
G728 is specified in ITU-T Recommendation G.728, "Coding of speech at
|
|||
|
|
16 kbit/s using low-delay code excited linear prediction".
|
|||
|
|
|
|||
|
|
A G.278 encoder translates 5 consecutive audio samples into a 10-bit
|
|||
|
|
codebook index, resulting in a bit rate of 16 kb/s for audio sampled
|
|||
|
|
at 8,000 samples per second. The group of five consecutive samples
|
|||
|
|
is called a vector. Four consecutive vectors, labeled V1 to V4
|
|||
|
|
(where V1 is to be played first by the receiver), build one G.728
|
|||
|
|
frame. The four vectors of 40 bits are packed into 5 octets, labeled
|
|||
|
|
B1 through B5. B1 SHALL be placed first in the RTP packet.
|
|||
|
|
|
|||
|
|
Referring to the figure below, the principle for bit order is
|
|||
|
|
"maintenance of bit significance". Bits from an older vector are
|
|||
|
|
more significant than bits from newer vectors. The MSB of the frame
|
|||
|
|
goes to the MSB of B1 and the LSB of the frame goes to LSB of B5.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 19]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
1 2 3 3
|
|||
|
|
0 0 0 0 9
|
|||
|
|
++++++++++++++++++++++++++++++++++++++++
|
|||
|
|
<---V1---><---V2---><---V3---><---V4---> vectors
|
|||
|
|
<--B1--><--B2--><--B3--><--B4--><--B5--> octets
|
|||
|
|
<------------- frame 1 ---------------->
|
|||
|
|
|
|||
|
|
In particular, B1 contains the eight most significant bits of V1,
|
|||
|
|
with the MSB of V1 being the MSB of B1. B2 contains the two least
|
|||
|
|
significant bits of V1, the more significant of the two in its MSB,
|
|||
|
|
and the six most significant bits of V2. B1 SHALL be placed first in
|
|||
|
|
the RTP packet and B5 last.
|
|||
|
|
|
|||
|
|
4.5.6 G729
|
|||
|
|
|
|||
|
|
G729 is specified in ITU-T Recommendation G.729, "Coding of speech at
|
|||
|
|
8 kbit/s using conjugate structure-algebraic code excited linear
|
|||
|
|
prediction (CS-ACELP)". A reduced-complexity version of the G.729
|
|||
|
|
algorithm is specified in Annex A to Rec. G.729. The speech coding
|
|||
|
|
algorithms in the main body of G.729 and in G.729 Annex A are fully
|
|||
|
|
interoperable with each other, so there is no need to further
|
|||
|
|
distinguish between them. An implementation that signals or accepts
|
|||
|
|
use of G729 payload format may implement either G.729 or G.729A
|
|||
|
|
unless restricted by additional signaling specified elsewhere related
|
|||
|
|
specifically to the encoding rather than the payload format. The
|
|||
|
|
G.729 and G.729 Annex A codecs were optimized to represent speech
|
|||
|
|
with high quality, where G.729 Annex A trades some speech quality for
|
|||
|
|
an approximate 50% complexity reduction [10]. See the next Section
|
|||
|
|
(4.5.7) for other data rates added in later G.729 Annexes. For all
|
|||
|
|
data rates, the sampling frequency (and RTP timestamp clock rate) is
|
|||
|
|
8,000 Hz.
|
|||
|
|
|
|||
|
|
A voice activity detector (VAD) and comfort noise generator (CNG)
|
|||
|
|
algorithm in Annex B of G.729 is RECOMMENDED for digital simultaneous
|
|||
|
|
voice and data applications and can be used in conjunction with G.729
|
|||
|
|
or G.729 Annex A. A G.729 or G.729 Annex A frame contains 10 octets,
|
|||
|
|
while the G.729 Annex B comfort noise frame occupies 2 octets.
|
|||
|
|
Receivers MUST accept comfort noise frames if restriction of their
|
|||
|
|
use has not been signaled. The MIME registration for G729 in RFC
|
|||
|
|
3555 [7] specifies a parameter that MAY be used with MIME or SDP to
|
|||
|
|
restrict the use of comfort noise frames.
|
|||
|
|
|
|||
|
|
A G729 RTP packet may consist of zero or more G.729 or G.729 Annex A
|
|||
|
|
frames, followed by zero or one G.729 Annex B frames. The presence
|
|||
|
|
of a comfort noise frame can be deduced from the length of the RTP
|
|||
|
|
payload. The default packetization interval is 20 ms (two frames),
|
|||
|
|
but in some situations it may be desirable to send 10 ms packets. An
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 20]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
example would be a transition from speech to comfort noise in the
|
|||
|
|
first 10 ms of the packet. For some applications, a longer
|
|||
|
|
packetization interval may be required to reduce the packet rate.
|
|||
|
|
|
|||
|
|
0 1 2 3
|
|||
|
|
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
|L| L1 | L2 | L3 | P1 |P| C1 |
|
|||
|
|
|0| | | | |0| |
|
|||
|
|
| |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7| |0 1 2 3 4|
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
| C1 | S1 | GA1 | GB1 | P2 | C2 |
|
|||
|
|
| 1 1 1| | | | | |
|
|||
|
|
|5 6 7 8 9 0 1 2|0 1 2 3|0 1 2|0 1 2 3|0 1 2 3 4|0 1 2 3 4 5 6 7|
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
| C2 | S2 | GA2 | GB2 |
|
|||
|
|
| 1 1 1| | | |
|
|||
|
|
|8 9 0 1 2|0 1 2 3|0 1 2|0 1 2 3|
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
|
|||
|
|
Figure 4: G.729 and G.729A bit packing
|
|||
|
|
|
|||
|
|
The transmitted parameters of a G.729/G.729A 10-ms frame, consisting
|
|||
|
|
of 80 bits, are defined in Recommendation G.729, Table 8/G.729. The
|
|||
|
|
mapping of the these parameters is given below in Fig. 4. The
|
|||
|
|
diagrams show the bit packing in "network byte order", also known as
|
|||
|
|
big-endian order. The bits of each 32-bit word are numbered 0 to 31,
|
|||
|
|
with the most significant bit on the left and numbered 0. The octets
|
|||
|
|
(bytes) of each word are transmitted most significant octet first.
|
|||
|
|
The bits of each data field are numbered in the order as produced by
|
|||
|
|
the G.729 C code reference implementation.
|
|||
|
|
|
|||
|
|
The packing of the G.729 Annex B comfort noise frame is shown in Fig.
|
|||
|
|
5.
|
|||
|
|
|
|||
|
|
0 1
|
|||
|
|
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
|L| LSF1 | LSF2 | GAIN |R|
|
|||
|
|
|S| | | |E|
|
|||
|
|
|F| | | |S|
|
|||
|
|
|0|0 1 2 3 4|0 1 2 3|0 1 2 3 4|V| RESV = Reserved (zero)
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
|
|||
|
|
Figure 5: G.729 Annex B bit packing
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 21]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
4.5.7 G729D and G729E
|
|||
|
|
|
|||
|
|
Annexes D and E to ITU-T Recommendation G.729 provide additional data
|
|||
|
|
rates. Because the data rate is not signaled in the bitstream, the
|
|||
|
|
different data rates are given distinct RTP encoding names which are
|
|||
|
|
mapped to distinct payload type numbers. G729D indicates a 6.4
|
|||
|
|
kbit/s coding mode (G.729 Annex D, for momentary reduction in channel
|
|||
|
|
capacity), while G729E indicates an 11.8 kbit/s mode (G.729 Annex E,
|
|||
|
|
for improved performance with a wide range of narrow-band input
|
|||
|
|
signals, e.g., music and background noise). Annex E has two
|
|||
|
|
operating modes, backward adaptive and forward adaptive, which are
|
|||
|
|
signaled by the first two bits in each frame (the most significant
|
|||
|
|
two bits of the first octet).
|
|||
|
|
|
|||
|
|
The voice activity detector (VAD) and comfort noise generator (CNG)
|
|||
|
|
algorithm specified in Annex B of G.729 may be used with Annex D and
|
|||
|
|
Annex E frames in addition to G.729 and G.729 Annex A frames. The
|
|||
|
|
algorithm details for the operation of Annexes D and E with the Annex
|
|||
|
|
B CNG are specified in G.729 Annexes F and G. Note that Annexes F
|
|||
|
|
and G do not introduce any new encodings. Receivers MUST accept
|
|||
|
|
comfort noise frames if restriction of their use has not been
|
|||
|
|
signaled. The MIME registrations for G729D and G729E in RFC 3555 [7]
|
|||
|
|
specify a parameter that MAY be used with MIME or SDP to restrict the
|
|||
|
|
use of comfort noise frames.
|
|||
|
|
|
|||
|
|
For G729D, an RTP packet may consist of zero or more G.729 Annex D
|
|||
|
|
frames, followed by zero or one G.729 Annex B frame. Similarly, for
|
|||
|
|
G729E, an RTP packet may consist of zero or more G.729 Annex E
|
|||
|
|
frames, followed by zero or one G.729 Annex B frame. The presence of
|
|||
|
|
a comfort noise frame can be deduced from the length of the RTP
|
|||
|
|
payload.
|
|||
|
|
|
|||
|
|
A single RTP packet must contain frames of only one data rate,
|
|||
|
|
optionally followed by one comfort noise frame. The data rate may be
|
|||
|
|
changed from packet to packet by changing the payload type number.
|
|||
|
|
G.729 Annexes D, E and H describe what the encoding and decoding
|
|||
|
|
algorithms must do to accommodate a change in data rate.
|
|||
|
|
|
|||
|
|
For G729D, the bits of a G.729 Annex D frame are formatted as shown
|
|||
|
|
below in Fig. 6 (cf. Table D.1/G.729). The frame length is 64 bits.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 22]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
0 1 2 3
|
|||
|
|
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
|L| L1 | L2 | L3 | P1 | C1 |
|
|||
|
|
|0| | | | | |
|
|||
|
|
| |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7|0 1 2 3 4 5|
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
| C1 |S1 | GA1 | GB1 | P2 | C2 |S2 | GA2 | GB2 |
|
|||
|
|
| | | | | | | | | |
|
|||
|
|
|6 7 8|0 1|0 1 2|0 1 2|0 1 2 3|0 1 2 3 4 5 6 7 8|0 1|0 1 2|0 1 2|
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
|
|||
|
|
Figure 6: G.729 Annex D bit packing
|
|||
|
|
|
|||
|
|
The net bit rate for the G.729 Annex E algorithm is 11.8 kbit/s and a
|
|||
|
|
total of 118 bits are used. Two bits are appended as "don't care"
|
|||
|
|
bits to complete an integer number of octets for the frame. For
|
|||
|
|
G729E, the bits of a data frame are formatted as shown in the next
|
|||
|
|
two diagrams (cf. Table E.1/G.729). The fields for the G729E forward
|
|||
|
|
adaptive mode are packed as shown in Fig. 7.
|
|||
|
|
|
|||
|
|
0 1 2 3
|
|||
|
|
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
|0 0|L| L1 | L2 | L3 | P1 |P| C0_1|
|
|||
|
|
| |0| | | | |0| |
|
|||
|
|
| | |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7| |0 1 2|
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
| | C1_1 | C2_1 | C3_1 | C4_1 |
|
|||
|
|
| | | | | |
|
|||
|
|
|3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
| GA1 | GB1 | P2 | C0_2 | C1_2 | C2_2 |
|
|||
|
|
| | | | | | |
|
|||
|
|
|0 1 2|0 1 2 3|0 1 2 3 4|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5|
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
| | C3_2 | C4_2 | GA2 | GB2 |DC |
|
|||
|
|
| | | | | | |
|
|||
|
|
|6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2|0 1 2 3|0 1|
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
|
|||
|
|
Figure 7: G.729 Annex E (forward adaptive mode) bit packing
|
|||
|
|
|
|||
|
|
The fields for the G729E backward adaptive mode are packed as shown
|
|||
|
|
in Fig. 8.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 23]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
0 1 2 3
|
|||
|
|
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
|1 1| P1 |P| C0_1 | C1_1 |
|
|||
|
|
| | |0| 1 1 1| |
|
|||
|
|
| |0 1 2 3 4 5 6 7|0|0 1 2 3 4 5 6 7 8 9 0 1 2|0 1 2 3 4 5 6 7|
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
| | C2_1 | C3_1 | C4_1 |GA1 | GB1 |P2 |
|
|||
|
|
| | | | | | | |
|
|||
|
|
|8 9|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2|0 1 2 3|0 1|
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
| | C0_2 | C1_2 | C2_2 |
|
|||
|
|
| | 1 1 1| | |
|
|||
|
|
|2 3 4|0 1 2 3 4 5 6 7 8 9 0 1 2|0 1 2 3 4 5 6 7 8 9|0 1 2 3 4 5|
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
| | C3_2 | C4_2 | GA2 | GB2 |DC |
|
|||
|
|
| | | | | | |
|
|||
|
|
|6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2|0 1 2 3|0 1|
|
|||
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
|
|||
|
|
Figure 8: G.729 Annex E (backward adaptive mode) bit packing
|
|||
|
|
|
|||
|
|
4.5.8 GSM
|
|||
|
|
|
|||
|
|
GSM (Group Speciale Mobile) denotes the European GSM 06.10 standard
|
|||
|
|
for full-rate speech transcoding, ETS 300 961, which is based on
|
|||
|
|
RPE/LTP (residual pulse excitation/long term prediction) coding at a
|
|||
|
|
rate of 13 kb/s [11,12,13]. The text of the standard can be obtained
|
|||
|
|
from:
|
|||
|
|
|
|||
|
|
ETSI (European Telecommunications Standards Institute)
|
|||
|
|
ETSI Secretariat: B.P.152
|
|||
|
|
F-06561 Valbonne Cedex
|
|||
|
|
France
|
|||
|
|
Phone: +33 92 94 42 00
|
|||
|
|
Fax: +33 93 65 47 16
|
|||
|
|
|
|||
|
|
Blocks of 160 audio samples are compressed into 33 octets, for an
|
|||
|
|
effective data rate of 13,200 b/s.
|
|||
|
|
|
|||
|
|
4.5.8.1 General Packaging Issues
|
|||
|
|
|
|||
|
|
The GSM standard (ETS 300 961) specifies the bit stream produced by
|
|||
|
|
the codec, but does not specify how these bits should be packed for
|
|||
|
|
transmission. The packetization specified here has subsequently been
|
|||
|
|
adopted in ETSI Technical Specification TS 101 318. Some software
|
|||
|
|
implementations of the GSM codec use a different packing than that
|
|||
|
|
specified here.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 24]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
field field name bits field field name bits
|
|||
|
|
________________________________________________
|
|||
|
|
1 LARc[0] 6 39 xmc[22] 3
|
|||
|
|
2 LARc[1] 6 40 xmc[23] 3
|
|||
|
|
3 LARc[2] 5 41 xmc[24] 3
|
|||
|
|
4 LARc[3] 5 42 xmc[25] 3
|
|||
|
|
5 LARc[4] 4 43 Nc[2] 7
|
|||
|
|
6 LARc[5] 4 44 bc[2] 2
|
|||
|
|
7 LARc[6] 3 45 Mc[2] 2
|
|||
|
|
8 LARc[7] 3 46 xmaxc[2] 6
|
|||
|
|
9 Nc[0] 7 47 xmc[26] 3
|
|||
|
|
10 bc[0] 2 48 xmc[27] 3
|
|||
|
|
11 Mc[0] 2 49 xmc[28] 3
|
|||
|
|
12 xmaxc[0] 6 50 xmc[29] 3
|
|||
|
|
13 xmc[0] 3 51 xmc[30] 3
|
|||
|
|
14 xmc[1] 3 52 xmc[31] 3
|
|||
|
|
15 xmc[2] 3 53 xmc[32] 3
|
|||
|
|
16 xmc[3] 3 54 xmc[33] 3
|
|||
|
|
17 xmc[4] 3 55 xmc[34] 3
|
|||
|
|
18 xmc[5] 3 56 xmc[35] 3
|
|||
|
|
19 xmc[6] 3 57 xmc[36] 3
|
|||
|
|
20 xmc[7] 3 58 xmc[37] 3
|
|||
|
|
21 xmc[8] 3 59 xmc[38] 3
|
|||
|
|
22 xmc[9] 3 60 Nc[3] 7
|
|||
|
|
23 xmc[10] 3 61 bc[3] 2
|
|||
|
|
24 xmc[11] 3 62 Mc[3] 2
|
|||
|
|
25 xmc[12] 3 63 xmaxc[3] 6
|
|||
|
|
26 Nc[1] 7 64 xmc[39] 3
|
|||
|
|
27 bc[1] 2 65 xmc[40] 3
|
|||
|
|
28 Mc[1] 2 66 xmc[41] 3
|
|||
|
|
29 xmaxc[1] 6 67 xmc[42] 3
|
|||
|
|
30 xmc[13] 3 68 xmc[43] 3
|
|||
|
|
31 xmc[14] 3 69 xmc[44] 3
|
|||
|
|
32 xmc[15] 3 70 xmc[45] 3
|
|||
|
|
33 xmc[16] 3 71 xmc[46] 3
|
|||
|
|
34 xmc[17] 3 72 xmc[47] 3
|
|||
|
|
35 xmc[18] 3 73 xmc[48] 3
|
|||
|
|
36 xmc[19] 3 74 xmc[49] 3
|
|||
|
|
37 xmc[20] 3 75 xmc[50] 3
|
|||
|
|
38 xmc[21] 3 76 xmc[51] 3
|
|||
|
|
|
|||
|
|
Table 2: Ordering of GSM variables
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 25]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
Octet Bit 0 Bit 1 Bit 2 Bit 3 Bit 4 Bit 5 Bit 6 Bit 7
|
|||
|
|
_____________________________________________________________________
|
|||
|
|
0 1 1 0 1 LARc0.0 LARc0.1 LARc0.2 LARc0.3
|
|||
|
|
1 LARc0.4 LARc0.5 LARc1.0 LARc1.1 LARc1.2 LARc1.3 LARc1.4 LARc1.5
|
|||
|
|
2 LARc2.0 LARc2.1 LARc2.2 LARc2.3 LARc2.4 LARc3.0 LARc3.1 LARc3.2
|
|||
|
|
3 LARc3.3 LARc3.4 LARc4.0 LARc4.1 LARc4.2 LARc4.3 LARc5.0 LARc5.1
|
|||
|
|
4 LARc5.2 LARc5.3 LARc6.0 LARc6.1 LARc6.2 LARc7.0 LARc7.1 LARc7.2
|
|||
|
|
5 Nc0.0 Nc0.1 Nc0.2 Nc0.3 Nc0.4 Nc0.5 Nc0.6 bc0.0
|
|||
|
|
6 bc0.1 Mc0.0 Mc0.1 xmaxc00 xmaxc01 xmaxc02 xmaxc03 xmaxc04
|
|||
|
|
7 xmaxc05 xmc0.0 xmc0.1 xmc0.2 xmc1.0 xmc1.1 xmc1.2 xmc2.0
|
|||
|
|
8 xmc2.1 xmc2.2 xmc3.0 xmc3.1 xmc3.2 xmc4.0 xmc4.1 xmc4.2
|
|||
|
|
9 xmc5.0 xmc5.1 xmc5.2 xmc6.0 xmc6.1 xmc6.2 xmc7.0 xmc7.1
|
|||
|
|
10 xmc7.2 xmc8.0 xmc8.1 xmc8.2 xmc9.0 xmc9.1 xmc9.2 xmc10.0
|
|||
|
|
11 xmc10.1 xmc10.2 xmc11.0 xmc11.1 xmc11.2 xmc12.0 xmc12.1 xcm12.2
|
|||
|
|
12 Nc1.0 Nc1.1 Nc1.2 Nc1.3 Nc1.4 Nc1.5 Nc1.6 bc1.0
|
|||
|
|
13 bc1.1 Mc1.0 Mc1.1 xmaxc10 xmaxc11 xmaxc12 xmaxc13 xmaxc14
|
|||
|
|
14 xmax15 xmc13.0 xmc13.1 xmc13.2 xmc14.0 xmc14.1 xmc14.2 xmc15.0
|
|||
|
|
15 xmc15.1 xmc15.2 xmc16.0 xmc16.1 xmc16.2 xmc17.0 xmc17.1 xmc17.2
|
|||
|
|
16 xmc18.0 xmc18.1 xmc18.2 xmc19.0 xmc19.1 xmc19.2 xmc20.0 xmc20.1
|
|||
|
|
17 xmc20.2 xmc21.0 xmc21.1 xmc21.2 xmc22.0 xmc22.1 xmc22.2 xmc23.0
|
|||
|
|
18 xmc23.1 xmc23.2 xmc24.0 xmc24.1 xmc24.2 xmc25.0 xmc25.1 xmc25.2
|
|||
|
|
19 Nc2.0 Nc2.1 Nc2.2 Nc2.3 Nc2.4 Nc2.5 Nc2.6 bc2.0
|
|||
|
|
20 bc2.1 Mc2.0 Mc2.1 xmaxc20 xmaxc21 xmaxc22 xmaxc23 xmaxc24
|
|||
|
|
21 xmaxc25 xmc26.0 xmc26.1 xmc26.2 xmc27.0 xmc27.1 xmc27.2 xmc28.0
|
|||
|
|
22 xmc28.1 xmc28.2 xmc29.0 xmc29.1 xmc29.2 xmc30.0 xmc30.1 xmc30.2
|
|||
|
|
23 xmc31.0 xmc31.1 xmc31.2 xmc32.0 xmc32.1 xmc32.2 xmc33.0 xmc33.1
|
|||
|
|
24 xmc33.2 xmc34.0 xmc34.1 xmc34.2 xmc35.0 xmc35.1 xmc35.2 xmc36.0
|
|||
|
|
25 Xmc36.1 xmc36.2 xmc37.0 xmc37.1 xmc37.2 xmc38.0 xmc38.1 xmc38.2
|
|||
|
|
26 Nc3.0 Nc3.1 Nc3.2 Nc3.3 Nc3.4 Nc3.5 Nc3.6 bc3.0
|
|||
|
|
27 bc3.1 Mc3.0 Mc3.1 xmaxc30 xmaxc31 xmaxc32 xmaxc33 xmaxc34
|
|||
|
|
28 xmaxc35 xmc39.0 xmc39.1 xmc39.2 xmc40.0 xmc40.1 xmc40.2 xmc41.0
|
|||
|
|
29 xmc41.1 xmc41.2 xmc42.0 xmc42.1 xmc42.2 xmc43.0 xmc43.1 xmc43.2
|
|||
|
|
30 xmc44.0 xmc44.1 xmc44.2 xmc45.0 xmc45.1 xmc45.2 xmc46.0 xmc46.1
|
|||
|
|
31 xmc46.2 xmc47.0 xmc47.1 xmc47.2 xmc48.0 xmc48.1 xmc48.2 xmc49.0
|
|||
|
|
32 xmc49.1 xmc49.2 xmc50.0 xmc50.1 xmc50.2 xmc51.0 xmc51.1 xmc51.2
|
|||
|
|
|
|||
|
|
Table 3: GSM payload format
|
|||
|
|
|
|||
|
|
In the GSM packing used by RTP, the bits SHALL be packed beginning
|
|||
|
|
from the most significant bit. Every 160 sample GSM frame is coded
|
|||
|
|
into one 33 octet (264 bit) buffer. Every such buffer begins with a
|
|||
|
|
4 bit signature (0xD), followed by the MSB encoding of the fields of
|
|||
|
|
the frame. The first octet thus contains 1101 in the 4 most
|
|||
|
|
significant bits (0-3) and the 4 most significant bits of F1 (0-3) in
|
|||
|
|
the 4 least significant bits (4-7). The second octet contains the 2
|
|||
|
|
least significant bits of F1 in bits 0-1, and F2 in bits 2-7, and so
|
|||
|
|
on. The order of the fields in the frame is described in Table 2.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 26]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
4.5.8.2 GSM Variable Names and Numbers
|
|||
|
|
|
|||
|
|
In the RTP encoding we have the bit pattern described in Table 3,
|
|||
|
|
where F.i signifies the ith bit of the field F, bit 0 is the most
|
|||
|
|
significant bit, and the bits of every octet are numbered from 0 to 7
|
|||
|
|
from most to least significant.
|
|||
|
|
|
|||
|
|
4.5.9 GSM-EFR
|
|||
|
|
|
|||
|
|
GSM-EFR denotes GSM 06.60 enhanced full rate speech transcoding,
|
|||
|
|
specified in ETS 300 726 which is available from ETSI at the address
|
|||
|
|
given in Section 4.5.8. This codec has a frame length of 244 bits.
|
|||
|
|
For transmission in RTP, each codec frame is packed into a 31 octet
|
|||
|
|
(248 bit) buffer beginning with a 4-bit signature 0xC in a manner
|
|||
|
|
similar to that specified here for the original GSM 06.10 codec. The
|
|||
|
|
packing is specified in ETSI Technical Specification TS 101 318.
|
|||
|
|
|
|||
|
|
4.5.10 L8
|
|||
|
|
|
|||
|
|
L8 denotes linear audio data samples, using 8-bits of precision with
|
|||
|
|
an offset of 128, that is, the most negative signal is encoded as
|
|||
|
|
zero.
|
|||
|
|
|
|||
|
|
4.5.11 L16
|
|||
|
|
|
|||
|
|
L16 denotes uncompressed audio data samples, using 16-bit signed
|
|||
|
|
representation with 65,535 equally divided steps between minimum and
|
|||
|
|
maximum signal level, ranging from -32,768 to 32,767. The value is
|
|||
|
|
represented in two's complement notation and transmitted in network
|
|||
|
|
byte order (most significant byte first).
|
|||
|
|
|
|||
|
|
The MIME registration for L16 in RFC 3555 [7] specifies parameters
|
|||
|
|
that MAY be used with MIME or SDP to indicate that analog pre-
|
|||
|
|
emphasis was applied to the signal before quantization or to indicate
|
|||
|
|
that a multiple-channel audio stream follows a different channel
|
|||
|
|
ordering convention than is specified in Section 4.1.
|
|||
|
|
|
|||
|
|
4.5.12 LPC
|
|||
|
|
|
|||
|
|
LPC designates an experimental linear predictive encoding contributed
|
|||
|
|
by Ron Frederick, which is based on an implementation written by Ron
|
|||
|
|
Zuckerman posted to the Usenet group comp.dsp on June 26, 1992. The
|
|||
|
|
codec generates 14 octets for every frame. The framesize is set to
|
|||
|
|
20 ms, resulting in a bit rate of 5,600 b/s.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 27]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
4.5.13 MPA
|
|||
|
|
|
|||
|
|
MPA denotes MPEG-1 or MPEG-2 audio encapsulated as elementary
|
|||
|
|
streams. The encoding is defined in ISO standards ISO/IEC 11172-3
|
|||
|
|
and 13818-3. The encapsulation is specified in RFC 2250 [14].
|
|||
|
|
|
|||
|
|
The encoding may be at any of three levels of complexity, called
|
|||
|
|
Layer I, II and III. The selected layer as well as the sampling rate
|
|||
|
|
and channel count are indicated in the payload. The RTP timestamp
|
|||
|
|
clock rate is always 90,000, independent of the sampling rate.
|
|||
|
|
MPEG-1 audio supports sampling rates of 32, 44.1, and 48 kHz (ISO/IEC
|
|||
|
|
11172-3, section 1.1; "Scope"). MPEG-2 supports sampling rates of
|
|||
|
|
16, 22.05 and 24 kHz. The number of samples per frame is fixed, but
|
|||
|
|
the frame size will vary with the sampling rate and bit rate.
|
|||
|
|
|
|||
|
|
The MIME registration for MPA in RFC 3555 [7] specifies parameters
|
|||
|
|
that MAY be used with MIME or SDP to restrict the selection of layer,
|
|||
|
|
channel count, sampling rate, and bit rate.
|
|||
|
|
|
|||
|
|
4.5.14 PCMA and PCMU
|
|||
|
|
|
|||
|
|
PCMA and PCMU are specified in ITU-T Recommendation G.711. Audio
|
|||
|
|
data is encoded as eight bits per sample, after logarithmic scaling.
|
|||
|
|
PCMU denotes mu-law scaling, PCMA A-law scaling. A detailed
|
|||
|
|
description is given by Jayant and Noll [15]. Each G.711 octet SHALL
|
|||
|
|
be octet-aligned in an RTP packet. The sign bit of each G.711 octet
|
|||
|
|
SHALL correspond to the most significant bit of the octet in the RTP
|
|||
|
|
packet (i.e., assuming the G.711 samples are handled as octets on the
|
|||
|
|
host machine, the sign bit SHALL be the most significant bit of the
|
|||
|
|
octet as defined by the host machine format). The 56 kb/s and 48
|
|||
|
|
kb/s modes of G.711 are not applicable to RTP, since PCMA and PCMU
|
|||
|
|
MUST always be transmitted as 8-bit samples.
|
|||
|
|
|
|||
|
|
See Section 4.1 regarding silence suppression.
|
|||
|
|
|
|||
|
|
4.5.15 QCELP
|
|||
|
|
|
|||
|
|
The Electronic Industries Association (EIA) & Telecommunications
|
|||
|
|
Industry Association (TIA) standard IS-733, "TR45: High Rate Speech
|
|||
|
|
Service Option for Wideband Spread Spectrum Communications Systems",
|
|||
|
|
defines the QCELP audio compression algorithm for use in wireless
|
|||
|
|
CDMA applications. The QCELP CODEC compresses each 20 milliseconds
|
|||
|
|
of 8,000 Hz, 16-bit sampled input speech into one of four different
|
|||
|
|
size output frames: Rate 1 (266 bits), Rate 1/2 (124 bits), Rate 1/4
|
|||
|
|
(54 bits) or Rate 1/8 (20 bits). For typical speech patterns, this
|
|||
|
|
results in an average output of 6.8 kb/s for normal mode and 4.7 kb/s
|
|||
|
|
for reduced rate mode. The packetization of the QCELP audio codec is
|
|||
|
|
described in [16].
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 28]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
4.5.16 RED
|
|||
|
|
|
|||
|
|
The redundant audio payload format "RED" is specified by RFC 2198
|
|||
|
|
[17]. It defines a means by which multiple redundant copies of an
|
|||
|
|
audio packet may be transmitted in a single RTP stream. Each packet
|
|||
|
|
in such a stream contains, in addition to the audio data for that
|
|||
|
|
packetization interval, a (more heavily compressed) copy of the data
|
|||
|
|
from a previous packetization interval. This allows an approximation
|
|||
|
|
of the data from lost packets to be recovered upon decoding of a
|
|||
|
|
subsequent packet, giving much improved sound quality when compared
|
|||
|
|
with silence substitution for lost packets.
|
|||
|
|
|
|||
|
|
4.5.17 VDVI
|
|||
|
|
|
|||
|
|
VDVI is a variable-rate version of DVI4, yielding speech bit rates of
|
|||
|
|
between 10 and 25 kb/s. It is specified for single-channel operation
|
|||
|
|
only. Samples are packed into octets starting at the most-
|
|||
|
|
significant bit. The last octet is padded with 1 bits if the last
|
|||
|
|
sample does not fill the last octet. This padding is distinct from
|
|||
|
|
the valid codewords. The receiver needs to detect the padding
|
|||
|
|
because there is no explicit count of samples in the packet.
|
|||
|
|
|
|||
|
|
It uses the following encoding:
|
|||
|
|
|
|||
|
|
DVI4 codeword VDVI bit pattern
|
|||
|
|
_______________________________
|
|||
|
|
0 00
|
|||
|
|
1 010
|
|||
|
|
2 1100
|
|||
|
|
3 11100
|
|||
|
|
4 111100
|
|||
|
|
5 1111100
|
|||
|
|
6 11111100
|
|||
|
|
7 11111110
|
|||
|
|
8 10
|
|||
|
|
9 011
|
|||
|
|
10 1101
|
|||
|
|
11 11101
|
|||
|
|
12 111101
|
|||
|
|
13 1111101
|
|||
|
|
14 11111101
|
|||
|
|
15 11111111
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 29]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
5. Video
|
|||
|
|
|
|||
|
|
The following sections describe the video encodings that are defined
|
|||
|
|
in this memo and give their abbreviated names used for
|
|||
|
|
identification. These video encodings and their payload types are
|
|||
|
|
listed in Table 5.
|
|||
|
|
|
|||
|
|
All of these video encodings use an RTP timestamp frequency of 90,000
|
|||
|
|
Hz, the same as the MPEG presentation time stamp frequency. This
|
|||
|
|
frequency yields exact integer timestamp increments for the typical
|
|||
|
|
24 (HDTV), 25 (PAL), and 29.97 (NTSC) and 30 Hz (HDTV) frame rates
|
|||
|
|
and 50, 59.94 and 60 Hz field rates. While 90 kHz is the RECOMMENDED
|
|||
|
|
rate for future video encodings used within this profile, other rates
|
|||
|
|
MAY be used. However, it is not sufficient to use the video frame
|
|||
|
|
rate (typically between 15 and 30 Hz) because that does not provide
|
|||
|
|
adequate resolution for typical synchronization requirements when
|
|||
|
|
calculating the RTP timestamp corresponding to the NTP timestamp in
|
|||
|
|
an RTCP SR packet. The timestamp resolution MUST also be sufficient
|
|||
|
|
for the jitter estimate contained in the receiver reports.
|
|||
|
|
|
|||
|
|
For most of these video encodings, the RTP timestamp encodes the
|
|||
|
|
sampling instant of the video image contained in the RTP data packet.
|
|||
|
|
If a video image occupies more than one packet, the timestamp is the
|
|||
|
|
same on all of those packets. Packets from different video images
|
|||
|
|
are distinguished by their different timestamps.
|
|||
|
|
|
|||
|
|
Most of these video encodings also specify that the marker bit of the
|
|||
|
|
RTP header SHOULD be set to one in the last packet of a video frame
|
|||
|
|
and otherwise set to zero. Thus, it is not necessary to wait for a
|
|||
|
|
following packet with a different timestamp to detect that a new
|
|||
|
|
frame should be displayed.
|
|||
|
|
|
|||
|
|
5.1 CelB
|
|||
|
|
|
|||
|
|
The CELL-B encoding is a proprietary encoding proposed by Sun
|
|||
|
|
Microsystems. The byte stream format is described in RFC 2029 [18].
|
|||
|
|
|
|||
|
|
5.2 JPEG
|
|||
|
|
|
|||
|
|
The encoding is specified in ISO Standards 10918-1 and 10918-2. The
|
|||
|
|
RTP payload format is as specified in RFC 2435 [19].
|
|||
|
|
|
|||
|
|
5.3 H261
|
|||
|
|
|
|||
|
|
The encoding is specified in ITU-T Recommendation H.261, "Video codec
|
|||
|
|
for audiovisual services at p x 64 kbit/s". The packetization and
|
|||
|
|
RTP-specific properties are described in RFC 2032 [20].
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 30]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
5.4 H263
|
|||
|
|
|
|||
|
|
The encoding is specified in the 1996 version of ITU-T Recommendation
|
|||
|
|
H.263, "Video coding for low bit rate communication". The
|
|||
|
|
packetization and RTP-specific properties are described in RFC 2190
|
|||
|
|
[21]. The H263-1998 payload format is RECOMMENDED over this one for
|
|||
|
|
use by new implementations.
|
|||
|
|
|
|||
|
|
5.5 H263-1998
|
|||
|
|
|
|||
|
|
The encoding is specified in the 1998 version of ITU-T Recommendation
|
|||
|
|
H.263, "Video coding for low bit rate communication". The
|
|||
|
|
packetization and RTP-specific properties are described in RFC 2429
|
|||
|
|
[22]. Because the 1998 version of H.263 is a superset of the 1996
|
|||
|
|
syntax, this payload format can also be used with the 1996 version of
|
|||
|
|
H.263, and is RECOMMENDED for this use by new implementations. This
|
|||
|
|
payload format does not replace RFC 2190, which continues to be used
|
|||
|
|
by existing implementations, and may be required for backward
|
|||
|
|
compatibility in new implementations. Implementations using the new
|
|||
|
|
features of the 1998 version of H.263 MUST use the payload format
|
|||
|
|
described in RFC 2429.
|
|||
|
|
|
|||
|
|
5.6 MPV
|
|||
|
|
|
|||
|
|
MPV designates the use of MPEG-1 and MPEG-2 video encoding elementary
|
|||
|
|
streams as specified in ISO Standards ISO/IEC 11172 and 13818-2,
|
|||
|
|
respectively. The RTP payload format is as specified in RFC 2250
|
|||
|
|
[14], Section 3.
|
|||
|
|
|
|||
|
|
The MIME registration for MPV in RFC 3555 [7] specifies a parameter
|
|||
|
|
that MAY be used with MIME or SDP to restrict the selection of the
|
|||
|
|
type of MPEG video.
|
|||
|
|
|
|||
|
|
5.7 MP2T
|
|||
|
|
|
|||
|
|
MP2T designates the use of MPEG-2 transport streams, for either audio
|
|||
|
|
or video. The RTP payload format is described in RFC 2250 [14],
|
|||
|
|
Section 2.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 31]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
5.8 nv
|
|||
|
|
|
|||
|
|
The encoding is implemented in the program `nv', version 4, developed
|
|||
|
|
at Xerox PARC by Ron Frederick. Further information is available
|
|||
|
|
from the author:
|
|||
|
|
|
|||
|
|
Ron Frederick
|
|||
|
|
Blue Coat Systems Inc.
|
|||
|
|
650 Almanor Avenue
|
|||
|
|
Sunnyvale, CA 94085
|
|||
|
|
United States
|
|||
|
|
EMail: ronf@bluecoat.com
|
|||
|
|
|
|||
|
|
6. Payload Type Definitions
|
|||
|
|
|
|||
|
|
Tables 4 and 5 define this profile's static payload type values for
|
|||
|
|
the PT field of the RTP data header. In addition, payload type
|
|||
|
|
values in the range 96-127 MAY be defined dynamically through a
|
|||
|
|
conference control protocol, which is beyond the scope of this
|
|||
|
|
document. For example, a session directory could specify that for a
|
|||
|
|
given session, payload type 96 indicates PCMU encoding, 8,000 Hz
|
|||
|
|
sampling rate, 2 channels. Entries in Tables 4 and 5 with payload
|
|||
|
|
type "dyn" have no static payload type assigned and are only used
|
|||
|
|
with a dynamic payload type. Payload type 2 was assigned to G721 in
|
|||
|
|
RFC 1890 and to its equivalent successor G726-32 in draft versions of
|
|||
|
|
this specification, but its use is now deprecated and that static
|
|||
|
|
payload type is marked reserved due to conflicting use for the
|
|||
|
|
payload formats G726-32 and AAL2-G726-32 (see Section 4.5.4).
|
|||
|
|
Payload type 13 indicates the Comfort Noise (CN) payload format
|
|||
|
|
specified in RFC 3389 [9]. Payload type 19 is marked "reserved"
|
|||
|
|
because some draft versions of this specification assigned that
|
|||
|
|
number to an earlier version of the comfort noise payload format.
|
|||
|
|
The payload type range 72-76 is marked "reserved" so that RTCP and
|
|||
|
|
RTP packets can be reliably distinguished (see Section "Summary of
|
|||
|
|
Protocol Constants" of the RTP protocol specification).
|
|||
|
|
|
|||
|
|
The payload types currently defined in this profile are assigned to
|
|||
|
|
exactly one of three categories or media types: audio only, video
|
|||
|
|
only and those combining audio and video. The media types are marked
|
|||
|
|
in Tables 4 and 5 as "A", "V" and "AV", respectively. Payload types
|
|||
|
|
of different media types SHALL NOT be interleaved or multiplexed
|
|||
|
|
within a single RTP session, but multiple RTP sessions MAY be used in
|
|||
|
|
parallel to send multiple media types. An RTP source MAY change
|
|||
|
|
payload types within the same media type during a session. See the
|
|||
|
|
section "Multiplexing RTP Sessions" of RFC 3550 for additional
|
|||
|
|
explanation.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 32]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
PT encoding media type clock rate channels
|
|||
|
|
name (Hz)
|
|||
|
|
___________________________________________________
|
|||
|
|
0 PCMU A 8,000 1
|
|||
|
|
1 reserved A
|
|||
|
|
2 reserved A
|
|||
|
|
3 GSM A 8,000 1
|
|||
|
|
4 G723 A 8,000 1
|
|||
|
|
5 DVI4 A 8,000 1
|
|||
|
|
6 DVI4 A 16,000 1
|
|||
|
|
7 LPC A 8,000 1
|
|||
|
|
8 PCMA A 8,000 1
|
|||
|
|
9 G722 A 8,000 1
|
|||
|
|
10 L16 A 44,100 2
|
|||
|
|
11 L16 A 44,100 1
|
|||
|
|
12 QCELP A 8,000 1
|
|||
|
|
13 CN A 8,000 1
|
|||
|
|
14 MPA A 90,000 (see text)
|
|||
|
|
15 G728 A 8,000 1
|
|||
|
|
16 DVI4 A 11,025 1
|
|||
|
|
17 DVI4 A 22,050 1
|
|||
|
|
18 G729 A 8,000 1
|
|||
|
|
19 reserved A
|
|||
|
|
20 unassigned A
|
|||
|
|
21 unassigned A
|
|||
|
|
22 unassigned A
|
|||
|
|
23 unassigned A
|
|||
|
|
dyn G726-40 A 8,000 1
|
|||
|
|
dyn G726-32 A 8,000 1
|
|||
|
|
dyn G726-24 A 8,000 1
|
|||
|
|
dyn G726-16 A 8,000 1
|
|||
|
|
dyn G729D A 8,000 1
|
|||
|
|
dyn G729E A 8,000 1
|
|||
|
|
dyn GSM-EFR A 8,000 1
|
|||
|
|
dyn L8 A var. var.
|
|||
|
|
dyn RED A (see text)
|
|||
|
|
dyn VDVI A var. 1
|
|||
|
|
|
|||
|
|
Table 4: Payload types (PT) for audio encodings
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 33]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
PT encoding media type clock rate
|
|||
|
|
name (Hz)
|
|||
|
|
_____________________________________________
|
|||
|
|
24 unassigned V
|
|||
|
|
25 CelB V 90,000
|
|||
|
|
26 JPEG V 90,000
|
|||
|
|
27 unassigned V
|
|||
|
|
28 nv V 90,000
|
|||
|
|
29 unassigned V
|
|||
|
|
30 unassigned V
|
|||
|
|
31 H261 V 90,000
|
|||
|
|
32 MPV V 90,000
|
|||
|
|
33 MP2T AV 90,000
|
|||
|
|
34 H263 V 90,000
|
|||
|
|
35-71 unassigned ?
|
|||
|
|
72-76 reserved N/A N/A
|
|||
|
|
77-95 unassigned ?
|
|||
|
|
96-127 dynamic ?
|
|||
|
|
dyn H263-1998 V 90,000
|
|||
|
|
|
|||
|
|
Table 5: Payload types (PT) for video and combined
|
|||
|
|
encodings
|
|||
|
|
|
|||
|
|
Session participants agree through mechanisms beyond the scope of
|
|||
|
|
this specification on the set of payload types allowed in a given
|
|||
|
|
session. This set MAY, for example, be defined by the capabilities
|
|||
|
|
of the applications used, negotiated by a conference control protocol
|
|||
|
|
or established by agreement between the human participants.
|
|||
|
|
|
|||
|
|
Audio applications operating under this profile SHOULD, at a minimum,
|
|||
|
|
be able to send and/or receive payload types 0 (PCMU) and 5 (DVI4).
|
|||
|
|
This allows interoperability without format negotiation and ensures
|
|||
|
|
successful negotiation with a conference control protocol.
|
|||
|
|
|
|||
|
|
7. RTP over TCP and Similar Byte Stream Protocols
|
|||
|
|
|
|||
|
|
Under special circumstances, it may be necessary to carry RTP in
|
|||
|
|
protocols offering a byte stream abstraction, such as TCP, possibly
|
|||
|
|
multiplexed with other data. The application MUST define its own
|
|||
|
|
method of delineating RTP and RTCP packets (RTSP [23] provides an
|
|||
|
|
example of such an encapsulation specification).
|
|||
|
|
|
|||
|
|
8. Port Assignment
|
|||
|
|
|
|||
|
|
As specified in the RTP protocol definition, RTP data SHOULD be
|
|||
|
|
carried on an even UDP port number and the corresponding RTCP packets
|
|||
|
|
SHOULD be carried on the next higher (odd) port number.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 34]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
Applications operating under this profile MAY use any such UDP port
|
|||
|
|
pair. For example, the port pair MAY be allocated randomly by a
|
|||
|
|
session management program. A single fixed port number pair cannot
|
|||
|
|
be required because multiple applications using this profile are
|
|||
|
|
likely to run on the same host, and there are some operating systems
|
|||
|
|
that do not allow multiple processes to use the same UDP port with
|
|||
|
|
different multicast addresses.
|
|||
|
|
|
|||
|
|
However, port numbers 5004 and 5005 have been registered for use with
|
|||
|
|
this profile for those applications that choose to use them as the
|
|||
|
|
default pair. Applications that operate under multiple profiles MAY
|
|||
|
|
use this port pair as an indication to select this profile if they
|
|||
|
|
are not subject to the constraint of the previous paragraph.
|
|||
|
|
Applications need not have a default and MAY require that the port
|
|||
|
|
pair be explicitly specified. The particular port numbers were
|
|||
|
|
chosen to lie in the range above 5000 to accommodate port number
|
|||
|
|
allocation practice within some versions of the Unix operating
|
|||
|
|
system, where port numbers below 1024 can only be used by privileged
|
|||
|
|
processes and port numbers between 1024 and 5000 are automatically
|
|||
|
|
assigned by the operating system.
|
|||
|
|
|
|||
|
|
9. Changes from RFC 1890
|
|||
|
|
|
|||
|
|
This RFC revises RFC 1890. It is mostly backwards-compatible with
|
|||
|
|
RFC 1890 except for functions removed because two interoperable
|
|||
|
|
implementations were not found. The additions to RFC 1890 codify
|
|||
|
|
existing practice in the use of payload formats under this profile.
|
|||
|
|
Since this profile may be used without using any of the payload
|
|||
|
|
formats listed here, the addition of new payload formats in this
|
|||
|
|
revision does not affect backwards compatibility. The changes are
|
|||
|
|
listed below, categorized into functional and non-functional changes.
|
|||
|
|
|
|||
|
|
Functional changes:
|
|||
|
|
|
|||
|
|
o Section 11, "IANA Considerations" was added to specify the
|
|||
|
|
registration of the name for this profile. That appendix also
|
|||
|
|
references a new Section 3 "Registering Additional Encodings"
|
|||
|
|
which establishes a policy that no additional registration of
|
|||
|
|
static payload types for this profile will be made beyond those
|
|||
|
|
added in this revision and included in Tables 4 and 5. Instead,
|
|||
|
|
additional encoding names may be registered as MIME subtypes for
|
|||
|
|
binding to dynamic payload types. Non-normative references were
|
|||
|
|
added to RFC 3555 [7] where MIME subtypes for all the listed
|
|||
|
|
payload formats are registered, some with optional parameters for
|
|||
|
|
use of the payload formats.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 35]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
o Static payload types 4, 16, 17 and 34 were added to incorporate
|
|||
|
|
IANA registrations made since the publication of RFC 1890, along
|
|||
|
|
with the corresponding payload format descriptions for G723 and
|
|||
|
|
H263.
|
|||
|
|
|
|||
|
|
o Following working group discussion, static payload types 12 and 18
|
|||
|
|
were added along with the corresponding payload format
|
|||
|
|
descriptions for QCELP and G729. Static payload type 13 was
|
|||
|
|
assigned to the Comfort Noise (CN) payload format defined in RFC
|
|||
|
|
3389. Payload type 19 was marked reserved because it had been
|
|||
|
|
temporarily allocated to an earlier version of Comfort Noise
|
|||
|
|
present in some draft revisions of this document.
|
|||
|
|
|
|||
|
|
o The payload format for G721 was renamed to G726-32 following the
|
|||
|
|
ITU-T renumbering, and the payload format description for G726 was
|
|||
|
|
expanded to include the -16, -24 and -40 data rates. Because of
|
|||
|
|
confusion regarding draft revisions of this document, some
|
|||
|
|
implementations of these G726 payload formats packed samples into
|
|||
|
|
octets starting with the most significant bit rather than the
|
|||
|
|
least significant bit as specified here. To partially resolve
|
|||
|
|
this incompatibility, new payload formats named AAL2-G726-16, -24,
|
|||
|
|
-32 and -40 will be specified in a separate document (see note in
|
|||
|
|
Section 4.5.4), and use of static payload type 2 is deprecated as
|
|||
|
|
explained in Section 6.
|
|||
|
|
|
|||
|
|
o Payload formats G729D and G729E were added following the ITU-T
|
|||
|
|
addition of Annexes D and E to Recommendation G.729. Listings
|
|||
|
|
were added for payload formats GSM-EFR, RED, and H263-1998
|
|||
|
|
published in other documents subsequent to RFC 1890. These
|
|||
|
|
additional payload formats are referenced only by dynamic payload
|
|||
|
|
type numbers.
|
|||
|
|
|
|||
|
|
o The descriptions of the payload formats for G722, G728, GSM, VDVI
|
|||
|
|
were expanded.
|
|||
|
|
|
|||
|
|
o The payload format for 1016 audio was removed and its static
|
|||
|
|
payload type assignment 1 was marked "reserved" because two
|
|||
|
|
interoperable implementations were not found.
|
|||
|
|
|
|||
|
|
o Requirements for congestion control were added in Section 2.
|
|||
|
|
|
|||
|
|
o This profile follows the suggestion in the revised RTP spec that
|
|||
|
|
RTCP bandwidth may be specified separately from the session
|
|||
|
|
bandwidth and separately for active senders and passive receivers.
|
|||
|
|
|
|||
|
|
o The mapping of a user pass-phrase string into an encryption key
|
|||
|
|
was deleted from Section 2 because two interoperable
|
|||
|
|
implementations were not found.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 36]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
o The "quadrophonic" sample ordering convention for four-channel
|
|||
|
|
audio was removed to eliminate an ambiguity as noted in Section
|
|||
|
|
4.1.
|
|||
|
|
|
|||
|
|
Non-functional changes:
|
|||
|
|
|
|||
|
|
o In Section 4.1, it is now explicitly stated that silence
|
|||
|
|
suppression is allowed for all audio payload formats. (This has
|
|||
|
|
always been the case and derives from a fundamental aspect of
|
|||
|
|
RTP's design and the motivations for packet audio, but was not
|
|||
|
|
explicit stated before.) The use of comfort noise is also
|
|||
|
|
explained.
|
|||
|
|
|
|||
|
|
o In Section 4.1, the requirement level for setting of the marker
|
|||
|
|
bit on the first packet after silence for audio was changed from
|
|||
|
|
"is" to "SHOULD be", and clarified that the marker bit is set only
|
|||
|
|
when packets are intentionally not sent.
|
|||
|
|
|
|||
|
|
o Similarly, text was added to specify that the marker bit SHOULD be
|
|||
|
|
set to one on the last packet of a video frame, and that video
|
|||
|
|
frames are distinguished by their timestamps.
|
|||
|
|
|
|||
|
|
o RFC references are added for payload formats published after RFC
|
|||
|
|
1890.
|
|||
|
|
|
|||
|
|
o The security considerations and full copyright sections were
|
|||
|
|
added.
|
|||
|
|
|
|||
|
|
o According to Peter Hoddie of Apple, only pre-1994 Macintosh used
|
|||
|
|
the 22254.54 rate and none the 11127.27 rate, so the latter was
|
|||
|
|
dropped from the discussion of suggested sampling frequencies.
|
|||
|
|
|
|||
|
|
o Table 1 was corrected to move some values from the "ms/packet"
|
|||
|
|
column to the "default ms/packet" column where they belonged.
|
|||
|
|
|
|||
|
|
o Since the Interactive Multimedia Association ceased operations, an
|
|||
|
|
alternate resource was provided for a referenced IMA document.
|
|||
|
|
|
|||
|
|
o A note has been added for G722 to clarify a discrepancy between
|
|||
|
|
the actual sampling rate and the RTP timestamp clock rate.
|
|||
|
|
|
|||
|
|
o Small clarifications of the text have been made in several places,
|
|||
|
|
some in response to questions from readers. In particular:
|
|||
|
|
|
|||
|
|
- A definition for "media type" is given in Section 1.1 to allow
|
|||
|
|
the explanation of multiplexing RTP sessions in Section 6 to be
|
|||
|
|
more clear regarding the multiplexing of multiple media.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 37]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
- The explanation of how to determine the number of audio frames
|
|||
|
|
in a packet from the length was expanded.
|
|||
|
|
|
|||
|
|
- More description of the allocation of bandwidth to SDES items
|
|||
|
|
is given.
|
|||
|
|
|
|||
|
|
- A note was added that the convention for the order of channels
|
|||
|
|
specified in Section 4.1 may be overridden by a particular
|
|||
|
|
encoding or payload format specification.
|
|||
|
|
|
|||
|
|
- The terms MUST, SHOULD, MAY, etc. are used as defined in RFC
|
|||
|
|
2119.
|
|||
|
|
|
|||
|
|
o A second author for this document was added.
|
|||
|
|
|
|||
|
|
10. Security Considerations
|
|||
|
|
|
|||
|
|
Implementations using the profile defined in this specification are
|
|||
|
|
subject to the security considerations discussed in the RTP
|
|||
|
|
specification [1]. This profile does not specify any different
|
|||
|
|
security services. The primary function of this profile is to list a
|
|||
|
|
set of data compression encodings for audio and video media.
|
|||
|
|
|
|||
|
|
Confidentiality of the media streams is achieved by encryption.
|
|||
|
|
Because the data compression used with the payload formats described
|
|||
|
|
in this profile is applied end-to-end, encryption may be performed
|
|||
|
|
after compression so there is no conflict between the two operations.
|
|||
|
|
|
|||
|
|
A potential denial-of-service threat exists for data encodings using
|
|||
|
|
compression techniques that have non-uniform receiver-end
|
|||
|
|
computational load. The attacker can inject pathological datagrams
|
|||
|
|
into the stream which are complex to decode and cause the receiver to
|
|||
|
|
be overloaded.
|
|||
|
|
|
|||
|
|
As with any IP-based protocol, in some circumstances a receiver may
|
|||
|
|
be overloaded simply by the receipt of too many packets, either
|
|||
|
|
desired or undesired. Network-layer authentication MAY be used to
|
|||
|
|
discard packets from undesired sources, but the processing cost of
|
|||
|
|
the authentication itself may be too high. In a multicast
|
|||
|
|
environment, source pruning is implemented in IGMPv3 (RFC 3376) [24]
|
|||
|
|
and in multicast routing protocols to allow a receiver to select
|
|||
|
|
which sources are allowed to reach it.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 38]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
11. IANA Considerations
|
|||
|
|
|
|||
|
|
The RTP specification establishes a registry of profile names for use
|
|||
|
|
by higher-level control protocols, such as the Session Description
|
|||
|
|
Protocol (SDP), RFC 2327 [6], to refer to transport methods. This
|
|||
|
|
profile registers the name "RTP/AVP".
|
|||
|
|
|
|||
|
|
Section 3 establishes the policy that no additional registration of
|
|||
|
|
static RTP payload types for this profile will be made beyond those
|
|||
|
|
added in this document revision and included in Tables 4 and 5. IANA
|
|||
|
|
may reference that section in declining to accept any additional
|
|||
|
|
registration requests. In Tables 4 and 5, note that types 1 and 2
|
|||
|
|
have been marked reserved and the set of "dyn" payload types included
|
|||
|
|
has been updated. These changes are explained in Sections 6 and 9.
|
|||
|
|
|
|||
|
|
12. References
|
|||
|
|
|
|||
|
|
12.1 Normative References
|
|||
|
|
|
|||
|
|
[1] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
|
|||
|
|
"RTP: A Transport Protocol for Real-Time Applications", RFC
|
|||
|
|
3550, July 2003.
|
|||
|
|
|
|||
|
|
[2] Bradner, S., "Key Words for Use in RFCs to Indicate Requirement
|
|||
|
|
Levels", BCP 14, RFC 2119, March 1997.
|
|||
|
|
|
|||
|
|
[3] Apple Computer, "Audio Interchange File Format AIFF-C", August
|
|||
|
|
1991. (also ftp://ftp.sgi.com/sgi/aiff-c.9.26.91.ps.Z).
|
|||
|
|
|
|||
|
|
12.2 Informative References
|
|||
|
|
|
|||
|
|
[4] Braden, R., Clark, D. and S. Shenker, "Integrated Services in
|
|||
|
|
the Internet Architecture: an Overview", RFC 1633, June 1994.
|
|||
|
|
|
|||
|
|
[5] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z. and W.
|
|||
|
|
Weiss, "An Architecture for Differentiated Service", RFC 2475,
|
|||
|
|
December 1998.
|
|||
|
|
|
|||
|
|
[6] Handley, M. and V. Jacobson, "SDP: Session Description
|
|||
|
|
Protocol", RFC 2327, April 1998.
|
|||
|
|
|
|||
|
|
[7] Casner, S. and P. Hoschka, "MIME Type Registration of RTP
|
|||
|
|
Payload Types", RFC 3555, July 2003.
|
|||
|
|
|
|||
|
|
[8] Freed, N., Klensin, J. and J. Postel, "Multipurpose Internet
|
|||
|
|
Mail Extensions (MIME) Part Four: Registration Procedures", BCP
|
|||
|
|
13, RFC 2048, November 1996.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 39]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
[9] Zopf, R., "Real-time Transport Protocol (RTP) Payload for
|
|||
|
|
Comfort Noise (CN)", RFC 3389, September 2002.
|
|||
|
|
|
|||
|
|
[10] Deleam, D. and J.-P. Petit, "Real-time implementations of the
|
|||
|
|
recent ITU-T low bit rate speech coders on the TI TMS320C54X
|
|||
|
|
DSP: results, methodology, and applications", in Proc. of
|
|||
|
|
International Conference on Signal Processing, Technology, and
|
|||
|
|
Applications (ICSPAT) , (Boston, Massachusetts), pp. 1656--1660,
|
|||
|
|
October 1996.
|
|||
|
|
|
|||
|
|
[11] Mouly, M. and M.-B. Pautet, The GSM system for mobile
|
|||
|
|
communications Lassay-les-Chateaux, France: Europe Media
|
|||
|
|
Duplication, 1993.
|
|||
|
|
|
|||
|
|
[12] Degener, J., "Digital Speech Compression", Dr. Dobb's Journal,
|
|||
|
|
December 1994.
|
|||
|
|
|
|||
|
|
[13] Redl, S., Weber, M. and M. Oliphant, An Introduction to GSM
|
|||
|
|
Boston: Artech House, 1995.
|
|||
|
|
|
|||
|
|
[14] Hoffman, D., Fernando, G., Goyal, V. and M. Civanlar, "RTP
|
|||
|
|
Payload Format for MPEG1/MPEG2 Video", RFC 2250, January 1998.
|
|||
|
|
|
|||
|
|
[15] Jayant, N. and P. Noll, Digital Coding of Waveforms--Principles
|
|||
|
|
and Applications to Speech and Video Englewood Cliffs, New
|
|||
|
|
Jersey: Prentice-Hall, 1984.
|
|||
|
|
|
|||
|
|
[16] McKay, K., "RTP Payload Format for PureVoice(tm) Audio", RFC
|
|||
|
|
2658, August 1999.
|
|||
|
|
|
|||
|
|
[17] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley, M.,
|
|||
|
|
Bolot, J.-C., Vega-Garcia, A. and S. Fosse-Parisis, "RTP Payload
|
|||
|
|
for Redundant Audio Data", RFC 2198, September 1997.
|
|||
|
|
|
|||
|
|
[18] Speer, M. and D. Hoffman, "RTP Payload Format of Sun's CellB
|
|||
|
|
Video Encoding", RFC 2029, October 1996.
|
|||
|
|
|
|||
|
|
[19] Berc, L., Fenner, W., Frederick, R., McCanne, S. and P. Stewart,
|
|||
|
|
"RTP Payload Format for JPEG-Compressed Video", RFC 2435,
|
|||
|
|
October 1998.
|
|||
|
|
|
|||
|
|
[20] Turletti, T. and C. Huitema, "RTP Payload Format for H.261 Video
|
|||
|
|
Streams", RFC 2032, October 1996.
|
|||
|
|
|
|||
|
|
[21] Zhu, C., "RTP Payload Format for H.263 Video Streams", RFC 2190,
|
|||
|
|
September 1997.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 40]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
[22] Bormann, C., Cline, L., Deisher, G., Gardos, T., Maciocco, C.,
|
|||
|
|
Newell, D., Ott, J., Sullivan, G., Wenger, S. and C. Zhu, "RTP
|
|||
|
|
Payload Format for the 1998 Version of ITU-T Rec. H.263 Video
|
|||
|
|
(H.263+)", RFC 2429, October 1998.
|
|||
|
|
|
|||
|
|
[23] Schulzrinne, H., Rao, A. and R. Lanphier, "Real Time Streaming
|
|||
|
|
Protocol (RTSP)", RFC 2326, April 1998.
|
|||
|
|
|
|||
|
|
[24] Cain, B., Deering, S., Kouvelas, I., Fenner, B. and A.
|
|||
|
|
Thyagarajan, "Internet Group Management Protocol, Version 3",
|
|||
|
|
RFC 3376, October 2002.
|
|||
|
|
|
|||
|
|
13. Current Locations of Related Resources
|
|||
|
|
|
|||
|
|
Note: Several sections below refer to the ITU-T Software Tool
|
|||
|
|
Library (STL). It is available from the ITU Sales Service, Place des
|
|||
|
|
Nations, CH-1211 Geneve 20, Switzerland (also check
|
|||
|
|
http://www.itu.int). The ITU-T STL is covered by a license defined
|
|||
|
|
in ITU-T Recommendation G.191, "Software tools for speech and audio
|
|||
|
|
coding standardization".
|
|||
|
|
|
|||
|
|
DVI4
|
|||
|
|
|
|||
|
|
An archived copy of the document IMA Recommended Practices for
|
|||
|
|
Enhancing Digital Audio Compatibility in Multimedia Systems (version
|
|||
|
|
3.0), which describes the IMA ADPCM algorithm, is available at:
|
|||
|
|
|
|||
|
|
http://www.cs.columbia.edu/~hgs/audio/dvi/
|
|||
|
|
|
|||
|
|
An implementation is available from Jack Jansen at
|
|||
|
|
|
|||
|
|
ftp://ftp.cwi.nl/local/pub/audio/adpcm.shar
|
|||
|
|
|
|||
|
|
G722
|
|||
|
|
|
|||
|
|
An implementation of the G.722 algorithm is available as part of the
|
|||
|
|
ITU-T STL, described above.
|
|||
|
|
|
|||
|
|
G723
|
|||
|
|
|
|||
|
|
The reference C code implementation defining the G.723.1 algorithm
|
|||
|
|
and its Annexes A, B, and C are available as an integral part of
|
|||
|
|
Recommendation G.723.1 from the ITU Sales Service, address listed
|
|||
|
|
above. Both the algorithm and C code are covered by a specific
|
|||
|
|
license. The ITU-T Secretariat should be contacted to obtain such
|
|||
|
|
licensing information.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 41]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
G726
|
|||
|
|
|
|||
|
|
G726 is specified in the ITU-T Recommendation G.726, "40, 32, 24, and
|
|||
|
|
16 kb/s Adaptive Differential Pulse Code Modulation (ADPCM)". An
|
|||
|
|
implementation of the G.726 algorithm is available as part of the
|
|||
|
|
ITU-T STL, described above.
|
|||
|
|
|
|||
|
|
G729
|
|||
|
|
|
|||
|
|
The reference C code implementation defining the G.729 algorithm and
|
|||
|
|
its Annexes A through I are available as an integral part of
|
|||
|
|
Recommendation G.729 from the ITU Sales Service, listed above. Annex
|
|||
|
|
I contains the integrated C source code for all G.729 operating
|
|||
|
|
modes. The G.729 algorithm and associated C code are covered by a
|
|||
|
|
specific license. The contact information for obtaining the license
|
|||
|
|
is available from the ITU-T Secretariat.
|
|||
|
|
|
|||
|
|
GSM
|
|||
|
|
|
|||
|
|
A reference implementation was written by Carsten Bormann and Jutta
|
|||
|
|
Degener (then at TU Berlin, Germany). It is available at
|
|||
|
|
|
|||
|
|
http://www.dmn.tzi.org/software/gsm/
|
|||
|
|
|
|||
|
|
Although the RPE-LTP algorithm is not an ITU-T standard, there is a C
|
|||
|
|
code implementation of the RPE-LTP algorithm available as part of the
|
|||
|
|
ITU-T STL. The STL implementation is an adaptation of the TU Berlin
|
|||
|
|
version.
|
|||
|
|
|
|||
|
|
LPC
|
|||
|
|
|
|||
|
|
An implementation is available at
|
|||
|
|
|
|||
|
|
ftp://parcftp.xerox.com/pub/net-research/lpc.tar.Z
|
|||
|
|
|
|||
|
|
PCMU, PCMA
|
|||
|
|
|
|||
|
|
An implementation of these algorithms is available as part of the
|
|||
|
|
ITU-T STL, described above.
|
|||
|
|
|
|||
|
|
14. Acknowledgments
|
|||
|
|
|
|||
|
|
The comments and careful review of Simao Campos, Richard Cox and AVT
|
|||
|
|
Working Group participants are gratefully acknowledged. The GSM
|
|||
|
|
description was adopted from the IMTC Voice over IP Forum Service
|
|||
|
|
Interoperability Implementation Agreement (January 1997). Fred Burg
|
|||
|
|
and Terry Lyons helped with the G.729 description.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 42]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
15. Intellectual Property Rights Statement
|
|||
|
|
|
|||
|
|
The IETF takes no position regarding the validity or scope of any
|
|||
|
|
intellectual property or other rights that might be claimed to
|
|||
|
|
pertain to the implementation or use of the technology described in
|
|||
|
|
this document or the extent to which any license under such rights
|
|||
|
|
might or might not be available; neither does it represent that it
|
|||
|
|
has made any effort to identify any such rights. Information on the
|
|||
|
|
IETF's procedures with respect to rights in standards-track and
|
|||
|
|
standards-related documentation can be found in BCP-11. Copies of
|
|||
|
|
claims of rights made available for publication and any assurances of
|
|||
|
|
licenses to be made available, or the result of an attempt made to
|
|||
|
|
obtain a general license or permission for the use of such
|
|||
|
|
proprietary rights by implementors or users of this specification can
|
|||
|
|
be obtained from the IETF Secretariat.
|
|||
|
|
|
|||
|
|
The IETF invites any interested party to bring to its attention any
|
|||
|
|
copyrights, patents or patent applications, or other proprietary
|
|||
|
|
rights which may cover technology that may be required to practice
|
|||
|
|
this standard. Please address the information to the IETF Executive
|
|||
|
|
Director.
|
|||
|
|
|
|||
|
|
16. Authors' Addresses
|
|||
|
|
|
|||
|
|
Henning Schulzrinne
|
|||
|
|
Department of Computer Science
|
|||
|
|
Columbia University
|
|||
|
|
1214 Amsterdam Avenue
|
|||
|
|
New York, NY 10027
|
|||
|
|
United States
|
|||
|
|
|
|||
|
|
EMail: schulzrinne@cs.columbia.edu
|
|||
|
|
|
|||
|
|
|
|||
|
|
Stephen L. Casner
|
|||
|
|
Packet Design
|
|||
|
|
3400 Hillview Avenue, Building 3
|
|||
|
|
Palo Alto, CA 94304
|
|||
|
|
United States
|
|||
|
|
|
|||
|
|
EMail: casner@acm.org
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 43]
|
|||
|
|
|
|||
|
|
RFC 3551 RTP A/V Profile July 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
17. Full Copyright Statement
|
|||
|
|
|
|||
|
|
Copyright (C) The Internet Society (2003). All Rights Reserved.
|
|||
|
|
|
|||
|
|
This document and translations of it may be copied and furnished to
|
|||
|
|
others, and derivative works that comment on or otherwise explain it
|
|||
|
|
or assist in its implementation may be prepared, copied, published
|
|||
|
|
and distributed, in whole or in part, without restriction of any
|
|||
|
|
kind, provided that the above copyright notice and this paragraph are
|
|||
|
|
included on all such copies and derivative works. However, this
|
|||
|
|
document itself may not be modified in any way, such as by removing
|
|||
|
|
the copyright notice or references to the Internet Society or other
|
|||
|
|
Internet organizations, except as needed for the purpose of
|
|||
|
|
developing Internet standards in which case the procedures for
|
|||
|
|
copyrights defined in the Internet Standards process must be
|
|||
|
|
followed, or as required to translate it into languages other than
|
|||
|
|
English.
|
|||
|
|
|
|||
|
|
The limited permissions granted above are perpetual and will not be
|
|||
|
|
revoked by the Internet Society or its successors or assigns.
|
|||
|
|
|
|||
|
|
This document and the information contained herein is provided on an
|
|||
|
|
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
|
|||
|
|
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
|
|||
|
|
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
|
|||
|
|
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
|
|||
|
|
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
|
|||
|
|
|
|||
|
|
Acknowledgement
|
|||
|
|
|
|||
|
|
Funding for the RFC Editor function is currently provided by the
|
|||
|
|
Internet Society.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Schulzrinne & Casner Standards Track [Page 44]
|
|||
|
|
|