| title | author | date |
|---|---|---|
| OPUS-A2DP-0.5 specification | Pauli Virtanen <pav@iki.fi> | Jun 4, 2022 |
OPUS-A2DP-0.5 specification
DRAFT
In this file, we specify how to use Opus as an A2DP vendor codec. We will call this "OPUS-A2DP-0.5". There is no previous public specification for using Opus as an A2DP vendor codec (to my knowledge), which is why we need this one.
[[TOC]]
A2DP Codec Capabilities
This section defines the codec-specific information exchanged in A2DP capabilities and configurations.
Integer fields and multi-byte bitfields are laid out in little endian order. All integer fields are unsigned.
Each entry may have a different meaning when it appears in a capability than when it appears in a selected configuration. Below, we indicate this with the abbreviations CAP/SNK for a sink capability, CAP/SRC for a source capability, CAP for a capability of either kind, and SEL for the value selected by the SRC.
Bits in fields marked RFA (Reserved For Additions) shall be set to zero.
The capability and configuration structure is as follows:
| Octet | Bits | Meaning |
|---|---|---|
| 0-5 | 0-7 | Vendor ID Part |
| 6-7 | 0-7 | Channel Configuration |
| 8-11 | 0-7 | Audio Location Configuration |
| 12-14 | 0-7 | Limits Configuration |
| 15-16 | 0-7 | Return Direction Channel Configuration |
| 17-20 | 0-7 | Return Direction Audio Location Configuration |
| 21-23 | 0-7 | Return Direction Limits Configuration |
See a2dp-codec-caps.h for definition as C structs.
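As an illustration of the layout above, the following is a minimal sketch of packing and parsing the 24-octet structure. It assumes the fields are packed back-to-back without padding, with the per-direction field sizes taken from the sections below (1 + 1 octets for Channel Configuration, 4 for Audio Location, 1 + 2 for Limits). The names `OpusCaps`, `parse_caps`, and `build_caps` are hypothetical and are not taken from a2dp-codec-caps.h.

```python
import struct
from collections import namedtuple

# Little endian, no padding: Vendor ID (4), Vendor Codec ID (2), then for each
# direction: Channel Count (1), Coupled Stream Count (1), Audio Location (4),
# Frame Durations (1), Maximum Bitrate (2).
OPUS_05_FORMAT = "<IH" + "BBIBH" + "BBIBH"

OpusCaps = namedtuple("OpusCaps", [
    "vendor_id", "codec_id",
    "channels", "coupled_streams", "location",
    "frame_durations", "max_bitrate",
    "ret_channels", "ret_coupled_streams", "ret_location",
    "ret_frame_durations", "ret_max_bitrate",
])


def parse_caps(data: bytes) -> OpusCaps:
    """Parse a 24-octet capability or configuration blob."""
    if len(data) != struct.calcsize(OPUS_05_FORMAT):
        raise ValueError("expected exactly 24 octets")
    return OpusCaps(*struct.unpack(OPUS_05_FORMAT, data))


def build_caps(caps: OpusCaps) -> bytes:
    """Serialize the structure back into 24 octets."""
    return struct.pack(OPUS_05_FORMAT, *caps)
```

For example, a stereo SEL with one coupled stream, 10 ms frames, and no return direction could be built as `build_caps(OpusCaps(0x05F1, 0x1005, 2, 1, 0x3, 0x04, 256, 0, 0, 0, 0, 0))`.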
Vendor ID Part
The Vendor ID Part has a fixed value:
| Octet | Bits | Meaning |
|---|---|---|
| 0-3 | 0-7 | A2DP Vendor ID (0x05F1) |
| 4-5 | 0-7 | A2DP Vendor Codec ID (0x1005) |
The Vendor ID is that of the Linux Foundation, and we are using it here unofficially.
Channel Configuration
The channel configuration consists of the channel count and the number of streams that encode a coupled channel pair.
| Octet | Bits | Meaning |
|---|---|---|
| 6 | 0-7 | Channel Count. CAP: maximum number supported. SEL: actual. |
| 7 | 0-7 | Coupled Stream Count. CAP: 0. SEL: actual. |
The Channel Count indicates the number of logical channels encoded in the data stream.
The Coupled Stream Count indicates the number of streams that encode a
coupled (left & right) channel pair. The count shall satisfy
(Channel Count) >= 2*(Coupled Stream Count).
The Stream Count is (Channel Count) - (Coupled Stream Count).
Streams and Coupled Streams have the same meaning as in Sec. 5.1.1 of Opus Multistream RFC7845.
The logical Channels are identified by a Channel Index j such that 0 <= j < (Channel Count). The channels 0 <= j < 2*(Coupled Stream Count)
are encoded in the k-th stream of the payload, where k = floor(j/2) and
j mod 2 determines which of the two channels of the stream the logical
channel is. The channels 2*(Coupled Stream Count) <= j < (Channel Count)
are encoded in the k-th stream of the payload, where k = j - (Coupled Stream Count).
The prescription here is identical to RFC7845 with channel mapping
mapping[j] = j.
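The channel-to-stream assignment above can be sketched as follows. The function names are hypothetical; the second element of the returned tuple is `j mod 2` for coupled streams and 0 for mono streams.

```python
def stream_count(channel_count, coupled_streams):
    """Total number of Opus streams in the multistream payload."""
    return channel_count - coupled_streams


def stream_for_channel(j, channel_count, coupled_streams):
    """Return (stream index k, channel within that stream) for Channel Index j."""
    if coupled_streams < 0 or channel_count < 2 * coupled_streams:
        raise ValueError("Channel Count must be >= 2 * Coupled Stream Count")
    if not 0 <= j < channel_count:
        raise ValueError("Channel Index out of range")
    if j < 2 * coupled_streams:
        # Channels in coupled streams: two logical channels per stream.
        return j // 2, j % 2
    # Remaining channels: one mono stream each.
    return j - coupled_streams, 0
```

With Channel Count 6 and Coupled Stream Count 2, channels 0 to 3 land in streams 0 and 1, and channels 4 and 5 land in the mono streams 2 and 3.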
The semantic meaning of each channel is determined by its Audio Location.
Audio Location Configuration
The channel audio location specification is similar to the location
bitfield of the Audio_Channel_Allocation LTV structure in Bluetooth
SIG Assigned Numbers, Generic Audio, which is used in LE Audio.
| Octet | Bits | Meaning |
|---|---|---|
| 8-11 | 0-7 | Audio Location bitfield. CAP: available. SEL: actual |
The values specified in CAP are informative, and SEL may contain bits that were not set in CAP. SNK shall handle unsupported audio locations. It may do this for example by ignoring unsupported channels or via suitable up/downmixing. Hence, SRC may transmit channels with audio locations that are not marked supported by SNK. The maximum Channel Count however shall not be exceeded.
The audio location bitfield values defined in Assigned Numbers, Generic Audio are:
| Channel Order | Bitmask | Audio Location |
|---|---|---|
| 0 | 0x00000001 | Front Left |
| 1 | 0x00000002 | Front Right |
| 2 | 0x00000400 | Side Left |
| 3 | 0x00000800 | Side Right |
| 4 | 0x00000010 | Back Left |
| 5 | 0x00000020 | Back Right |
| 6 | 0x00000040 | Front Left of Center |
| 7 | 0x00000080 | Front Right of Center |
| 8 | 0x00001000 | Top Front Left |
| 9 | 0x00002000 | Top Front Right |
| 10 | 0x00040000 | Top Side Left |
| 11 | 0x00080000 | Top Side Right |
| 12 | 0x00010000 | Top Back Left |
| 13 | 0x00020000 | Top Back Right |
| 14 | 0x00400000 | Bottom Front Left |
| 15 | 0x00800000 | Bottom Front Right |
| 16 | 0x01000000 | Front Left Wide |
| 17 | 0x02000000 | Front Right Wide |
| 18 | 0x04000000 | Left Surround |
| 19 | 0x08000000 | Right Surround |
| 20 | 0x00000004 | Front Center |
| 21 | 0x00000100 | Back Center |
| 22 | 0x00004000 | Top Front Center |
| 23 | 0x00008000 | Top Center |
| 24 | 0x00100000 | Top Back Center |
| 25 | 0x00200000 | Bottom Front Center |
| 26 | 0x00000008 | Low Frequency Effects 1 |
| 27 | 0x00000200 | Low Frequency Effects 2 |
| 28 | 0x10000000 | RFA |
| 29 | 0x20000000 | RFA |
| 30 | 0x40000000 | RFA |
| 31 | 0x80000000 | RFA |
In addition, we define a specific Channel Order for each audio location. The bits set in the bitfield define the audio locations of the channels present in the payload. The set bit with the smallest Channel Order value defines the audio location for the Channel Index j=0, the set bit with the next lowest Channel Order value defines the audio location for the Channel Index j=1, and so forth.
When the Channel Count is larger than the number of bits set in the Audio Location bitfield, the audio locations of the remaining channels are unspecified. Implementations may handle them as appropriate for their use case, considering them as AUX0-AUXN, or in the case of Channel Count = 1, as the single mono audio channel.
When the Channel Count is smaller than the number of bits set in the Audio Location bitfield, the audio locations for the channels are assigned as above, and remaining excess bits shall be ignored.
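The assignment rules above can be sketched as follows. The list reproduces the Channel Order from the table, RFA bits are skipped, and channels without a specified location are reported as `None`. The names `CHANNEL_ORDER` and `channel_locations` are hypothetical.

```python
# (bitmask, audio location), listed in Channel Order 0..27 from the table above.
CHANNEL_ORDER = [
    (0x00000001, "Front Left"), (0x00000002, "Front Right"),
    (0x00000400, "Side Left"), (0x00000800, "Side Right"),
    (0x00000010, "Back Left"), (0x00000020, "Back Right"),
    (0x00000040, "Front Left of Center"), (0x00000080, "Front Right of Center"),
    (0x00001000, "Top Front Left"), (0x00002000, "Top Front Right"),
    (0x00040000, "Top Side Left"), (0x00080000, "Top Side Right"),
    (0x00010000, "Top Back Left"), (0x00020000, "Top Back Right"),
    (0x00400000, "Bottom Front Left"), (0x00800000, "Bottom Front Right"),
    (0x01000000, "Front Left Wide"), (0x02000000, "Front Right Wide"),
    (0x04000000, "Left Surround"), (0x08000000, "Right Surround"),
    (0x00000004, "Front Center"), (0x00000100, "Back Center"),
    (0x00004000, "Top Front Center"), (0x00008000, "Top Center"),
    (0x00100000, "Top Back Center"), (0x00200000, "Bottom Front Center"),
    (0x00000008, "Low Frequency Effects 1"),
    (0x00000200, "Low Frequency Effects 2"),
]


def channel_locations(location_bits, channel_count):
    """Return the audio location for each Channel Index j (None if unspecified)."""
    names = [name for mask, name in CHANNEL_ORDER if location_bits & mask]
    names = names[:channel_count]                    # excess bits are ignored
    names += [None] * (channel_count - len(names))   # remaining channels unspecified
    return names
```

For instance, `channel_locations(0x3f, 6)` yields Front Left, Front Right, Back Left, Back Right, Front Center, Low Frequency Effects 1, in that order.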
The channel ordering defined here is compatible with the internal stream ordering in the reference Opus Multistream surround encoder Mapping Family 0 and 1 output. This allows making use of its surround masking and LFE handling capabilities. The stream ordering of the reference Opus surround encoder has been unchanged since its addition in 2013, but it is an internal detail of the encoder. Implementations using the surround encoder shall therefore check that the mapping table used by the encoder corresponds to the above channel ordering.
For reference, we list the Audio Location bitfield values corresponding to the different channel counts in Opus Mapping Family 0 and 1 surround encoder output, and the expected mapping table:
| Mapping Family | Channel Count | Audio Location Value | Stream Ordering | Mapping Table |
|---|---|---|---|---|
| 0 | 1 | 0x00000000 | mono | {0} |
| 0 | 2 | 0x00000003 | FL, FR | {0, 1} |
| 1 | 1 | 0x00000000 | mono | {0} |
| 1 | 2 | 0x00000003 | FL, FR | {0, 1} |
| 1 | 3 | 0x00000007 | FL, FR, FC | {0, 2, 1} |
| 1 | 4 | 0x00000033 | FL, FR, BL, BR | {0, 1, 2, 3} |
| 1 | 5 | 0x00000037 | FL, FR, BL, BR, FC | {0, 4, 1, 2, 3} |
| 1 | 6 | 0x0000003f | FL, FR, BL, BR, FC, LFE | {0, 4, 1, 2, 3, 5} |
| 1 | 7 | 0x00000d0f | FL, FR, SL, SR, FC, BC, LFE | {0, 4, 1, 2, 3, 5, 6} |
| 1 | 8 | 0x00000c3f | FL, FR, SL, SR, BL, BR, FC, LFE | {0, 6, 1, 2, 3, 4, 5, 7} |
The Mapping Table column lists the mapping table selected by
opus_multistream_surround_encoder_create (Opus 1.3.1). If a future
Opus encoder release outputs a different mapping table, the channel
ordering will be incorrect, and the surround encoder cannot be used.
We expect that the probability of the Opus encoder authors making
such changes is negligible.
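One way to perform the check is to derive the stream ordering implied by the mapping table and compare it with the Stream Ordering column. The sketch below assumes the encoder input is in the Vorbis channel order of RFC7845 Sec. 5.1.1.2 (with "rear" written as B, for back) and that `mapping[i]` gives the coded channel that carries input channel i. Only the 6 and 8 channel cases are shown, and all names are hypothetical.

```python
# Encoder input order (Vorbis order, Mapping Family 1) for two channel counts.
VORBIS_ORDER = {
    6: ["FL", "FC", "FR", "BL", "BR", "LFE"],
    8: ["FL", "FC", "FR", "SL", "SR", "BL", "BR", "LFE"],
}

# Stream Ordering column of the table above.
EXPECTED_ORDER = {
    6: ["FL", "FR", "BL", "BR", "FC", "LFE"],
    8: ["FL", "FR", "SL", "SR", "BL", "BR", "FC", "LFE"],
}


def stream_ordering(input_order, mapping):
    """Place input channel i at coded channel mapping[i]."""
    out = [None] * len(mapping)
    for i, coded_channel in enumerate(mapping):
        out[coded_channel] = input_order[i]
    return out


def mapping_is_compatible(channel_count, mapping):
    """True if the encoder's mapping table yields the expected channel ordering."""
    order = stream_ordering(VORBIS_ORDER[channel_count], list(mapping))
    return order == EXPECTED_ORDER[channel_count]
```

An implementation could run such a check once after creating the encoder and fall back to a non-surround encoder setup if it fails.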
Limits Configuration
The limits for allowed frame durations and maximum bitrate can also be configured.
| Octet | Bits | Meaning |
|---|---|---|
| 12 | 0 | Frame duration 2.5ms. CAP: supported, SEL: selected |
| 12 | 1 | Frame duration 5ms. CAP: supported, SEL: selected |
| 12 | 2 | Frame duration 10ms. CAP: supported, SEL: selected |
| 12 | 3 | Frame duration 20ms. CAP: supported, SEL: selected |
| 12 | 4 | Frame duration 40ms. CAP: supported, SEL: selected |
| 12 | 5-7 | RFA |
| Octet | Bits | Meaning |
|---|---|---|
| 13-14 | 0-7 | Maximum bitrate. CAP: supported, SEL: selected |
The maximum bitrate is given in units of 1024 bits per second.
The maximum bitrate field in CAP may contain the value 0 to indicate that any bitrate is supported.
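A minimal sketch of decoding the three Limits Configuration octets, assuming bit 0 is the least significant bit of the frame duration octet. RFA bits are ignored on input here, and the function names are hypothetical.

```python
# Bit position in the frame duration octet -> frame duration in milliseconds.
FRAME_DURATION_BITS = {0: 2.5, 1: 5.0, 2: 10.0, 3: 20.0, 4: 40.0}


def decode_frame_durations(octet):
    """List the frame durations (ms) whose bits are set; RFA bits are ignored."""
    return [ms for bit, ms in FRAME_DURATION_BITS.items() if octet & (1 << bit)]


def decode_max_bitrate(two_octets):
    """Maximum bitrate in bits per second; 0 in CAP means no stated limit."""
    return int.from_bytes(two_octets, "little") * 1024
```

For example, a frame duration octet of `0x0C` advertises 10 ms and 20 ms frames, and the bitrate octets `00 01` correspond to 256 * 1024 bits per second.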
Bidirectional Audio Configuration
Bidirectional audio may be supported. The Channel Configuration, Audio Location Configuration, and Limits Configuration of the return direction have the same form as those of the forward direction, and are represented by structures with the same layout.
Namely:
| Octet | Bits | Meaning |
|---|---|---|
| 15-16 | 0-7 | Channel Configuration fields, for return direction |
| 17-20 | 0-7 | Audio Location fields, for return direction |
| 21-23 | 0-7 | Limits Configuration fields, for return direction |
If no return channel is supported or selected, the return direction Channel Count is set to 0 in CAP or SEL, respectively.
Packet Structure
Each packet consists of an RTP header, an RTP payload header, and a payload containing Opus Multistream data.
| Octet | Bits | Meaning |
|---|---|---|
| 0-11 | 0-7 | RTP header |
| 12 | 0-7 | RTP payload header |
| 13-N | 0-7 | Opus Multistream payload |
For each Bluetooth packet, the payload shall contain exactly one Opus Multistream packet, or a fragment of one. The Opus Multistream packet may be fragmented to several consecutive Bluetooth packets.
The format of the Multistream data is the same as in the audio packets of RFC7845, or, as produced/consumed by the Opus Multistream API.
(Note that we DO NOT follow RFC7587, as we want fragmentation and multichannel support.)
RTP Header
See RFC3550.
The RTP payload type is pt=96 (dynamic).
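For orientation, a fixed 12-octet RTP header with pt=96 could be built and split off as sketched below. This assumes RTP version 2 with no padding, extension, CSRC entries, or marker bit, and it does not prescribe how the sequence number, timestamp, or SSRC are chosen. Note that RTP header fields are big endian (RFC3550), unlike the capability structure. The function names are hypothetical.

```python
import struct

RTP_HEADER_SIZE = 12
RTP_PAYLOAD_TYPE = 96


def build_rtp_header(seq, timestamp, ssrc):
    """Fixed RTP header: V=2, P=0, X=0, CC=0, M=0, PT=96, network byte order."""
    return struct.pack(">BBHII", 0x80, RTP_PAYLOAD_TYPE,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def split_packet(packet):
    """Return (RTP header, RTP payload header octet, Opus Multistream data)."""
    if len(packet) <= RTP_HEADER_SIZE:
        raise ValueError("packet too short")
    return (packet[:RTP_HEADER_SIZE],
            packet[RTP_HEADER_SIZE],
            packet[RTP_HEADER_SIZE + 1:])
```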
RTP Payload Header
The RTP payload header is used to indicate if and how the Opus Multistream packet is fragmented across several consecutive Bluetooth packets.
| Octet | Bits | Meaning |
|---|---|---|
| 0 | 0-3 | Frame Count |
| 0 | 4 | RFA |
| 0 | 5 | Is Last Fragment |
| 0 | 6 | Is First Fragment |
| 0 | 7 | Is Fragmented |
In each packet, Frame Count indicates how many Bluetooth packets are still to be received (including the present packet) before the Opus Multistream packet is complete.
The Is Fragmented flag indicates whether the present packet contains fragmented payload.
The Is Last Fragment flag indicates whether the present packet is the last part of fragmented payload.
The Is First Fragment flag indicates whether the present packet is the first part of fragmented payload.
In non-fragmented packets, Frame Count shall be 1, and the other bits in the header shall be zero.
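The header rules above can be sketched on the sending side as follows. This assumes bit 0 is the least significant bit of the header octet and that the Is Fragmented flag is set on every fragment, including the first and the last. The names and the `max_fragment_size` parameter are hypothetical.

```python
def pack_payload_header(frame_count, fragmented=False, first=False, last=False):
    """Build the one-octet RTP payload header."""
    if not 1 <= frame_count <= 15:
        raise ValueError("Frame Count must be in 1..15 (4 bits)")
    return frame_count | (last << 5) | (first << 6) | (fragmented << 7)


def fragment(opus_packet, max_fragment_size):
    """Split one Opus Multistream packet into header-prefixed payloads."""
    if not opus_packet or max_fragment_size < 1:
        raise ValueError("need a non-empty packet and a positive fragment size")
    chunks = [opus_packet[i:i + max_fragment_size]
              for i in range(0, len(opus_packet), max_fragment_size)]
    total = len(chunks)
    if total == 1:
        # Non-fragmented: Frame Count 1, all other bits zero.
        return [bytes([pack_payload_header(1)]) + opus_packet]
    # Frame Count counts down the packets still to come, including this one.
    return [bytes([pack_payload_header(total - i, True, i == 0, i == total - 1)]) + chunk
            for i, chunk in enumerate(chunks)]
```

Since Frame Count is 4 bits wide, this sketch rejects packets that would need more than 15 fragments.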
Opus Payload
The Opus payload is a single Opus Multistream packet, or its fragment.
In case of fragmentation, as indicated by the RTP payload header, concatenating the payloads of the fragment Bluetooth packets shall yield the total Opus Multistream packet.
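On the receiving side, reassembly could look like the sketch below. It takes RTP payloads (payload header octet followed by data, with the RTP header already removed) in arrival order, assumes bit 0 of the header is the least significant bit, and silently drops a partially received packet when a fragment is missing. The spec does not mandate this particular loss handling, and the function name is hypothetical.

```python
def reassemble(payloads):
    """Yield complete Opus Multistream packets from a sequence of RTP payloads."""
    buf = bytearray()
    expected = None  # Frame Count expected in the next fragment, if any
    for payload in payloads:
        header, data = payload[0], payload[1:]
        count = header & 0x0F
        is_last = bool(header & 0x20)
        is_first = bool(header & 0x40)
        is_fragmented = bool(header & 0x80)
        if not is_fragmented:
            buf, expected = bytearray(), None
            yield bytes(data)
            continue
        if is_first:
            buf, expected = bytearray(data), count - 1
        elif expected is not None and count == expected:
            buf += data
            expected -= 1
        else:
            # Missing or unexpected fragment: discard the partial packet.
            buf, expected = bytearray(), None
            continue
        if is_last:
            if expected == 0:
                yield bytes(buf)
            buf, expected = bytearray(), None
```

For example, `list(reassemble([b"\xc3abc", b"\x82def", b"\xa1gh"]))` gives `[b"abcdefgh"]`.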
The SRC should choose encoder parameters such that Bluetooth bandwidth limitations are not exceeded.
The SRC may include forward error correction (FEC) data. The SNK may use FEC to recover lost packets instead of relying on packet loss concealment (PLC).