mirror of
https://gitlab.freedesktop.org/pipewire/pipewire.git
synced 2026-07-03 00:06:38 -04:00
module-rtp: Add documentation about module internals and improve comments
* Add .dox file that describes the internal design of the RTP module.
* Add details to the sink module documentation about the meaning of source.ip and how it interacts with local.ifname.
* Rename "constant delay mode" to "constant latency mode" in code comments to match the documentation.
* Minor comment fixes.
parent
2f94a49962
commit
c4a9023215
7 changed files with 428 additions and 5 deletions
|
|
@ -16,6 +16,7 @@
|
|||
- \subpage page_latency
|
||||
- \subpage page_tag
|
||||
- \subpage page_native_protocol
|
||||
- \subpage page_rtp_module_internals
|
||||
|
||||
|
||||
# Components
|
||||
|
|
|
|||
335
doc/dox/internals/rtp-module-internals.dox
Normal file
|
|
@ -0,0 +1,335 @@
|
|||
/** \page page_rtp_module_internals RTP sink and source module internals
|
||||
|
||||
This document explains the architecture of PipeWire's RTP module.
|
||||
|
||||
\tableofcontents
|
||||
|
||||
# Introduction {#rtp-module-internals-introduction}
|
||||
|
||||
The "RTP module" actually refers to a set of three modules which share source code:
|
||||
|
||||
- \ref page_module_rtp_sink "RTP sink module" : Creates an RTP sink node and
|
||||
exposes it to the graph. This sink node places PCM audio into an internal ring
|
||||
buffer. This ring buffer is the source for the data of outgoing packets. The
|
||||
RTP timestamps may be synchronized against PTP time, depending on what buffer
|
||||
mode is used. This module also has a special "separate PTP sender" mode, where
|
||||
the actual send portion is done by an internal mini graph that runs on a special
|
||||
PTP based graph driver.
|
||||
- \ref page_module_rtp_source "RTP source module" : Creates an RTP source node
|
||||
and exposes it to the graph. This source node receives RTP packets and places
|
||||
their PCM data into an internal ring buffer. The node's process callback reads
|
||||
from that ring buffer and outputs that data to the graph. Depending on what mode
|
||||
is used, the position that the ring buffer is read from may be synchronized
|
||||
against a PTP time source.
|
||||
- \ref page_module_rtp_sap "SAP module" : Announces SAP sessions via multicast,
|
||||
and also listens for SAP sessions. If it discovers another SAP session, it
|
||||
instantiates the RTP source module, which in turn creates and exposes its RTP
|
||||
source node. See RFC 2974 for more about SAP.
|
||||
|
||||
For notes about the configuration, see the individual module documentation.
|
||||
|
||||
# RTP stream details {#rtp-module-internals-stream-details}
|
||||
|
||||
The core of the RTP sink and source modules is the `rtp_stream`. This is built around
|
||||
a \ref pw_stream "PipeWire stream". This stream can operate in the `PW_DIRECTION_INPUT`
|
||||
direction (used by the RTP sink module) or in the `PW_DIRECTION_OUTPUT` direction
|
||||
(used by the RTP source module).
|
||||
|
||||
The `rtp_stream` is implemented in `stream.c` and `stream.h`. `stream.c` includes
|
||||
`audio.c`, `midi.c`, and `opus.c`. These handle media subtype specific setups,
|
||||
teardowns, and data processing:
|
||||
|
||||
- `audio.c` corresponds to `SPA_MEDIA_SUBTYPE_raw` and handles PCM audio.
|
||||
- `midi.c` corresponds to `SPA_MEDIA_SUBTYPE_control` and handles MIDI.
|
||||
- `opus.c` is similar to `audio.c`, but corresponds to `SPA_MEDIA_SUBTYPE_opus`,
|
||||
and encodes PCM audio to Opus prior to sending out RTP packets and decodes
|
||||
Opus encoded audio from incoming RTP packets.
|
||||
|
||||
The process callback in `rtp_stream` is set by these sources depending
|
||||
on the media subtype. Other `rtp_stream` specific callbacks like a flush timeout
|
||||
handler are also set by these sources, since they are media subtype specific.
|
||||
|
||||
The RTP sink and source modules are configured via properties, represented by
|
||||
`pw_properties`. Both support "stream.props" values inside their properties. These
|
||||
values in turn are child `pw_properties` instances that are passed directly to
|
||||
their `rtp_stream` instances. The modules also copy some of the values of their
|
||||
own properties into that child `pw_properties` instance. The exact list of values
|
||||
that are copied over depends on the module. As a result, some values can
|
||||
be set directly in the module properties, or inside the stream.props properties.
|
||||
One example of this would be `sess.ts-direct`.
|
||||
|
||||
\note This document refers to this as "copying to the stream properties". More precisely,
|
||||
a value is copied from the module's properties to the stream properties if and only
|
||||
if that value is not already set in the stream properties. If it is, the already
|
||||
existing value takes priority.
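
The "copy only if not already set" rule from the note above can be sketched as
follows. This is a minimal illustration, not the module's code: `struct props`,
`props_get()`, `props_add()` and `copy_if_unset()` are hypothetical stand-ins for
a `pw_properties` instance and its accessors.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical, fixed-size stand-in for a pw_properties instance.
#define MAX_PROPS 8

struct props {
    const char *keys[MAX_PROPS];
    const char *values[MAX_PROPS];
    size_t count;
};

// Returns the value stored for key, or NULL if the key is not set.
static const char *props_get(const struct props *p, const char *key)
{
    for (size_t i = 0; i < p->count; i++)
        if (strcmp(p->keys[i], key) == 0)
            return p->values[i];
    return NULL;
}

// Appends a key/value pair. There is no duplicate check; callers check first.
static void props_add(struct props *p, const char *key, const char *value)
{
    assert(p->count < MAX_PROPS);
    p->keys[p->count] = key;
    p->values[p->count] = value;
    p->count++;
}

// Copies key from the module properties to the stream properties, but only
// if the module sets the key and the stream properties do not set it already.
static void copy_if_unset(const struct props *module, struct props *stream,
                          const char *key)
{
    const char *value = props_get(module, key);
    if (value != NULL && props_get(stream, key) == NULL)
        props_add(stream, key, value);
}
```

With this rule, a `sess.ts-direct` value placed in stream.props always wins over
the same key in the module properties.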
|
||||
|
||||
`audio.c` is by far the most complex of the media subtype handlers. All three
|
||||
handlers have some notion of the direct timestamp and constant latency modes, but
|
||||
`audio.c` is (currently) the only one with the fully reworked implementation that
|
||||
this document describes (the `impl->actual_max_buffer_size` modulo scheme,
|
||||
`impl->ts_align`, device delay compensation, and the exact over/underrun thresholds).
|
||||
`midi.c` and `opus.c` still carry their own, simpler direct-vs-constant-latency
|
||||
handling and a `TODO` to converge on the `audio.c` approach. `audio.c` also features
|
||||
the separate PTP sender mode, which the other two do not have at all.
|
||||
|
||||
## Ring buffer and wrap-around behavior {#rtp-module-internals-ring-buffer-behavior}
|
||||
|
||||
The `rtp_stream` sets up a fixed-size ring buffer. Its size is derived from the
|
||||
`sess.buffer-size` property, in bytes. Note that this is a *stream* property: it
|
||||
is read by `rtp_stream_new()` from the properties it is handed, and - unlike e.g.
|
||||
`sess.ts-direct` - neither the sink nor the source module copies it over from its
|
||||
own properties, so in practice it can only be set inside `stream.props`.
|
||||
|
||||
The `sess.buffer-size` value is not used verbatim. `rtp_stream_new()` derives two
|
||||
quantities from it:
|
||||
|
||||
- `impl->buffer_size` is `sess.buffer-size` rounded *up* to the next power of two
|
||||
(via `SPA_ROUND_UP_POW2_32()`), and is the size of the actual allocation (that is,
|
||||
of `impl->buffer`). It is a power of two because the `midi.c` and `opus.c` handlers
|
||||
wrap their indices with a bit mask (`impl->buffer_mask`, and `impl->buffer_mask2`
|
||||
against the half-sized `impl->buffer_size2`) rather than a modulo, and masking only
|
||||
wraps correctly for power-of-two sizes. `impl->buffer_size` is generally *not* an
|
||||
integer multiple of the stride.
|
||||
- `impl->actual_max_buffer_size` is `impl->buffer_size` rounded *down* to an integer
|
||||
multiple of the stride (via `SPA_ROUND_DOWN()`). This is used by `audio.c`, which
|
||||
- unlike `midi.c` and `opus.c` - wraps via a modulo against this value. `audio.c`
|
||||
was reworked to do this to fix the stride-alignment problem described below;
|
||||
`midi.c` and `opus.c` still use the mask scheme and carry a `TODO` to converge on
|
||||
it.
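
The two derived quantities can be illustrated with a small sketch. The helpers
`round_up_pow2()` and `round_down()` are hypothetical stand-ins written for this
example in the spirit of `SPA_ROUND_UP_POW2_32()` and `SPA_ROUND_DOWN()`; the
struct and function names are likewise assumptions.

```c
#include <assert.h>
#include <stdint.h>

// Rounds v up to the next power of two (v itself if it already is one).
// Valid for 1 <= v <= 2^31.
static uint32_t round_up_pow2(uint32_t v)
{
    assert(v >= 1 && v <= UINT32_C(0x80000000));
    uint32_t p = 1;
    while (p < v)
        p <<= 1;
    return p;
}

// Rounds v down to an integer multiple of unit.
static uint32_t round_down(uint32_t v, uint32_t unit)
{
    assert(unit > 0);
    return v - (v % unit);
}

struct ring_sizes {
    uint32_t buffer_size;            // allocation size in bytes, power of two
    uint32_t actual_max_buffer_size; // usable bytes, multiple of the stride
};

// Derives both sizes from the requested size and the stride (both in bytes).
static struct ring_sizes derive_sizes(uint32_t sess_buffer_size, uint32_t stride)
{
    struct ring_sizes s;
    s.buffer_size = round_up_pow2(sess_buffer_size);
    s.actual_max_buffer_size = round_down(s.buffer_size, stride);
    return s;
}
```

For a requested size of 100 bytes and a stride of 6, this yields an allocation of
128 bytes, of which 126 bytes (21 full strides) are usable by the modulo scheme.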
|
||||
|
||||
The actual, allocated buffer is present as `impl->buffer`. This is the pure data
|
||||
storage buffer, without any read or write index.
|
||||
|
||||
\note `impl->buffer` and `impl->target_buffer` are not to be confused. The former
|
||||
is the actual buffer, while the latter is the session latency, converted to RTP
|
||||
samples. Furthermore, `sess.buffer-size` and the session latency must be picked such
|
||||
that `impl->target_buffer` worth of samples fits within the buffer. Since
|
||||
`impl->target_buffer` is in samples while `impl->actual_max_buffer_size` is in bytes,
|
||||
this means `impl->target_buffer * stride` must not exceed
|
||||
`impl->actual_max_buffer_size` (equivalently, `impl->target_buffer` must not exceed
|
||||
`impl->actual_max_buffer_size / stride`).
|
||||
|
||||
The stride value depends on the media subtype, and is set internally by `rtp_stream_new()`.
|
||||
|
||||
The buffer contents are always interleaved when the number of channels is greater
|
||||
than 1 and the data is raw audio (so, this does not apply to MIDI for example).
|
||||
The stride value specifies the unit size inside the buffer that contains audio
|
||||
data for all channels, played at the exact same time. In the PCM case, the stride
|
||||
is (num_channels * bytes_per_pcm_sample).
|
||||
|
||||
\note It is important to keep in mind that the way the read and write index are
|
||||
handled in this ring buffer deviates somewhat from standard ring buffer usage
|
||||
in typical producer-consumer schemes, especially in the direct timestamp mode
|
||||
(more on that further below).
|
||||
|
||||
The read and write index logic is handled by `impl->ring`. Both read and write
|
||||
indices increase monotonically (as free-running values) unless they are
|
||||
resynchronized. Because they are free-running rather than being wrapped at the
|
||||
buffer boundary, the fill level is simply their difference, and that is what removes
|
||||
the usual ambiguity about whether the ring buffer is empty or full. When accessing
|
||||
the actual buffer contents, an index is first turned into a byte offset (see below),
|
||||
and that offset is then reduced to the buffer bounds - in `audio.c` by taking it
|
||||
modulo `impl->actual_max_buffer_size`, and in `midi.c` and `opus.c` by masking it
|
||||
with `impl->buffer_mask` / `impl->buffer_mask2`. Reducing modulo
|
||||
`impl->actual_max_buffer_size` (rather than the raw `impl->buffer_size`) is essential
|
||||
for the buffer modes to work properly (explained further below).
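
As a sketch of the index handling described above (not the actual code; all
function names are hypothetical), the fill level and the two ways of reducing a
byte offset to the buffer bounds could look like this. Widening to 64 bits before
the modulo is a choice made for this example only.

```c
#include <assert.h>
#include <stdint.h>

// Fill level in samples. Unsigned subtraction stays correct even after the
// free-running 32-bit indices wrap around.
static uint32_t fill_level(uint32_t write_index, uint32_t read_index)
{
    return write_index - read_index;
}

// Modulo scheme: the byte offset is reduced modulo a stride-aligned size,
// so every resulting offset has room for one full stride.
static uint32_t offset_modulo(uint32_t index, uint32_t stride,
                              uint32_t actual_max_buffer_size)
{
    assert(stride > 0 && actual_max_buffer_size > 0);
    assert(actual_max_buffer_size % stride == 0);
    return (uint32_t)(((uint64_t)index * stride) % actual_max_buffer_size);
}

// Mask scheme: only wraps correctly if buffer_size is a power of two.
static uint32_t offset_mask(uint32_t index, uint32_t stride,
                            uint32_t buffer_size)
{
    assert(buffer_size > 0 && (buffer_size & (buffer_size - 1)) == 0);
    return (index * stride) & (buffer_size - 1);
}
```

With a stride of 6 and a usable size of 126 bytes, index 21 maps back to byte
offset 0, so a full 6-byte unit is always available at the computed offset.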
|
||||
|
||||
The read and write indices are given in RTP sample units. To access data in the
|
||||
buffer, the indices are multiplied by the stride to get a byte offset. This also
|
||||
means that the buffer size (which is given in bytes) must be an integer multiple
|
||||
of the stride size - otherwise, the read and write indices may refer to places in
|
||||
the buffer that cannot contain a full data set for all channels. For example, if
|
||||
the stride is 6, and the buffer size is 100, then when the read index is 16, the
|
||||
byte offset would be 16*6 = 96 - but there, only 4 bytes could be read, not 6.
|
||||
For this reason, the buffer size is internally rounded down to the nearest
|
||||
integer multiple of the stride size, as mentioned above.
|
||||
|
||||
In the RTP sink module, the `rtp_stream` appends data to the ring buffer at its
|
||||
write index, except for when a resynchronization happens - the write index is then
|
||||
reset to match the `spa_io_clock.position` value (scaled to RTP sample units).
|
||||
One resynchronization always happens at startup. The RTP timestamps of outgoing
|
||||
packets are derived from the ring buffer's read index.
|
||||
|
||||
In the RTP source module, `rtp_stream` reads data from the ring buffer depending
|
||||
on the buffer mode. More on that further below.
|
||||
|
||||
## Threading model and data processing {#rtp-module-internals-threading-model}
|
||||
|
||||
Most of the code in `stream.c` runs in the stream's main loop, while most of the
|
||||
code in the media subtype handlers (`audio.c` etc.) runs in the stream's data loop.
|
||||
|
||||
`stream_start()` is called by `on_stream_state_changed()` when the stream's state
|
||||
changes to `PW_STREAM_STATE_STREAMING`. At that stage, the stream's data loop is
|
||||
running, but the stream's PipeWire graph node is not yet attached to the data loop,
|
||||
so no data processing takes place at this time. The attachment happens after
|
||||
`on_stream_state_changed()` has finished. This means that although `stream_start()` is
|
||||
run from the main loop, it is safe to set internal states that are accessed and
|
||||
modified by other functions that run in the data loop.
|
||||
|
||||
Similarly, `stream_stop()` is called by `on_stream_state_changed()` when the stream's
|
||||
state changes to `PW_STREAM_STATE_PAUSED`. (However, it is not called if the
|
||||
`node.always-process` property in the stream.props of the RTP source module
|
||||
is set to true.) At that stage, the stream's graph node has already been detached
|
||||
from the data loop. It therefore is safe for `stream_stop()` to touch internal
|
||||
states that normally would be accessed by functions that run in the data loop.
|
||||
|
||||
The media subtype handlers each have an init function, like `rtp_audio_init()`.
|
||||
This is one of the functions from these handlers that runs in the main loop, since
|
||||
these init functions are called by `rtp_stream_new()`. The other functions are:
|
||||
|
||||
- `stop_timer()` (called by `stream_start()`)
|
||||
- `resend_packets()` (RAOP specific - not used by the RTP sink or source modules)
|
||||
- `deinit()` (called by `rtp_stream_destroy()`)
|
||||
|
||||
Everything else in the media subtype handlers runs in the data loop, with the
|
||||
exception of `ptp_sender_process()` in `audio.c`, which runs under the separate
|
||||
PTP sender's own driver and may have a separate data loop.
|
||||
|
||||
`audio.c` has two additional special features:
|
||||
|
||||
1. It aggregates the contents of the ring buffer such that it can split it up into
|
||||
RTP packets with the specified packet time (see `rtp.ptime` in the module
|
||||
and stream properties). Depending on how full the ring buffer is, it may decide
|
||||
to send out some of its contents within the current graph cycle, and may use
|
||||
a timer (which runs in the data loop) to schedule the output of the remaining
|
||||
data later, so as not to risk an xrun by blocking the data loop in the current graph
|
||||
cycle for too long.
|
||||
2. The separate PTP sender mode is driven by its own driver. More on that
|
||||
mode is documented further below.
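
The packet aggregation from item 1 can be sketched as below. This is a simplified
illustration under stated assumptions: the packet time is taken to be in
milliseconds, and the `max_now` limit is a hypothetical policy parameter invented
for this example; the real scheduling logic in `audio.c` may differ.

```c
#include <assert.h>
#include <stdint.h>

// Number of RTP samples per packet for a packet time in milliseconds.
static uint32_t samples_per_packet(uint32_t rate, double ptime_ms)
{
    assert(rate > 0 && ptime_ms > 0.0);
    return (uint32_t)(rate * ptime_ms / 1000.0);
}

struct send_plan {
    uint32_t packets_now;   // full packets sent in the current graph cycle
    uint32_t packets_later; // full packets deferred to the timer
    uint32_t leftover;      // samples that do not fill a packet yet
};

// Splits avail samples into full packets of psamples samples each and sends
// at most max_now of them right away; the rest is left to the timer.
static struct send_plan plan_send(uint32_t avail, uint32_t psamples,
                                  uint32_t max_now)
{
    assert(psamples > 0);
    struct send_plan plan;
    uint32_t full = avail / psamples;
    plan.packets_now = full < max_now ? full : max_now;
    plan.packets_later = full - plan.packets_now;
    plan.leftover = avail % psamples;
    return plan;
}
```

For example, at 48000 Hz and a 1 ms packet time each packet carries 48 samples;
200 available samples then form four full packets, with 8 samples left in the
ring buffer for a later packet.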
|
||||
|
||||
# Buffer modes {#rtp-module-internals-buffer-modes}
|
||||
|
||||
\note Read the buffer modes documentation in \ref page_module_rtp_source first
|
||||
if you have not done so already.
|
||||
|
||||
This section specifically describes how the buffer modes in `audio.c` are
|
||||
handled. `midi.c` and `opus.c` do branch on `impl->direct_timestamp` too, but with
|
||||
their own, simpler handling (and aligning those with what `audio.c` does is an
|
||||
open `TODO`); the detailed behavior described here is `audio.c` specific.
|
||||
|
||||
The buffer mode only has a minor influence on the RTP sink module. In the constant
|
||||
latency mode, `impl->ts_align` is used in resynchronization cases to avoid a
|
||||
discontinuity in the outgoing RTP timestamps. In the direct timestamp mode,
|
||||
`impl->ts_align` is not used.
|
||||
|
||||
The rest of the buffer mode documentation is about the behavior on the receiving
|
||||
side, that is, how the RTP source module uses the `rtp_stream`.
|
||||
|
||||
In both modes, received data is inserted into the ring buffer according to the
|
||||
RTP timestamp. This timestamp is first shifted into the future by the value of
|
||||
`impl->target_buffer`. Then, the ring buffer's write index is advanced. It is
|
||||
expected by the code that the sender produces continuous timestamps; that is,
|
||||
`rtp_timestamp_of_packet_2 = rtp_timestamp_of_packet_1 + rtp_samples_per_packet`.
|
||||
In certain cases, resynchronization may take place; the read and write indices
|
||||
are then reset; the read index is set to the timestamp of the next incoming RTP
|
||||
packet, while the write index is set to that packet timestamp + `impl->target_buffer`;
|
||||
that is, the write index is set to be ahead of the read index by the session
|
||||
latency in samples.
|
||||
|
||||
The write index is advanced in `rtp_audio_receive()`, the read index is advanced
|
||||
in `rtp_audio_process_playback()`.
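
The receive-side index handling described above can be sketched like this. It is
a simplified model, not the code of `rtp_audio_receive()`: `struct rx_state` and
`on_packet()` are hypothetical names, and only the index bookkeeping is shown
(no data copy, no overrun checks).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct rx_state {
    bool have_sync;         // false until the first packet after a resync
    uint32_t read_index;    // in RTP samples
    uint32_t write_index;   // in RTP samples
    uint32_t target_buffer; // session latency in RTP samples
};

// Handles one incoming packet with RTP timestamp ts that carries num_samples
// samples. Returns the sample index at which the payload is to be written.
static uint32_t on_packet(struct rx_state *s, uint32_t ts, uint32_t num_samples)
{
    assert(s != NULL);

    // Shift the timestamp into the future by the session latency.
    uint32_t write = ts + s->target_buffer;

    if (!s->have_sync) {
        // Resynchronization: the write index ends up ahead of the read
        // index by target_buffer samples.
        s->read_index = ts;
        s->have_sync = true;
    }

    // The write index is advanced past the newly written samples.
    s->write_index = write + num_samples;
    return write;
}
```

With continuous sender timestamps, each packet lands directly after the previous
one, and the distance between the indices right after a resync is
`target_buffer` plus the samples of the first packet.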
|
||||
|
||||
## Constant latency mode {#rtp-module-internals-constant-latency-mode}
|
||||
|
||||
As mentioned in the RTP source module documentation, this is the default mode,
|
||||
where the fill level is kept at a steady value, which is `impl->target_buffer`.
|
||||
If the fill level is above or below this, a delay-locked loop (DLL) computes an error rate,
|
||||
which is then fed into the ASRC (asynchronous sample rate converter) of the `pw_stream` that the `rtp_stream` is based on.
|
||||
The estimated amount of samples that are "in-flight" (that is, samples that
|
||||
already were sent out but not yet received or which arrived right after the
|
||||
last graph cycle) are also factored into this computation. This establishes a
|
||||
control loop that resamples the audio data as needed to maintain the fill level
|
||||
at `impl->target_buffer`. Should the difference between the target and the
|
||||
actual fill level exceed a threshold, the ring buffer indices are resynchronized.
|
||||
|
||||
More concretely, the thresholds work as follows. An *underrun* is detected when
|
||||
fewer samples are available than the current graph cycle needs (`avail < wanted`);
|
||||
the missing samples are filled with silence and the sync state is dropped.
|
||||
An *overrun* on the read side is detected when the fill level exceeds
|
||||
`SPA_MIN(target_buffer * 8, impl->buffer_size / stride)`; the excess is dropped
|
||||
by advancing the read index so that only `target_buffer` worth of data remains
|
||||
(a soft correction, not a full resync). Here `target_buffer` is the
|
||||
device-delay-adjusted target (see below), i.e. `impl->target_buffer` minus the
|
||||
device delay - the two coincide only when the device delay is zero. On the write
|
||||
side (`rtp_audio_receive()`), a fill level exceeding the ring capacity
|
||||
`impl->buffer_size / stride` sets `impl->have_sync` to false, forcing a full resync.
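
The read-side checks from the paragraph above can be sketched as follows. The
function names and the order of the checks are assumptions made for this example;
clamping the adjusted target at zero is also an assumption, not something the
text above states.

```c
#include <assert.h>
#include <stdint.h>

enum fill_check { FILL_OK, FILL_UNDERRUN, FILL_OVERRUN };

// Device-delay-adjusted target fill level, in samples.
static uint32_t adjusted_target(uint32_t target_buffer, uint32_t device_delay)
{
    return device_delay < target_buffer ? target_buffer - device_delay : 0;
}

// Classifies the fill level for one graph cycle. avail, wanted and target
// are in samples; buffer_size and stride are in bytes.
static enum fill_check check_fill(uint32_t avail, uint32_t wanted,
                                  uint32_t target, uint32_t buffer_size,
                                  uint32_t stride)
{
    assert(stride > 0);
    uint64_t limit = (uint64_t)target * 8; // headroom multiplier
    uint32_t capacity = buffer_size / stride;
    if (limit > capacity)
        limit = capacity;

    if (avail < wanted)
        return FILL_UNDERRUN;
    if (avail > limit)
        return FILL_OVERRUN;
    return FILL_OK;
}

// Samples to skip on a read-side overrun so that only target samples remain.
static uint32_t overrun_skip(uint32_t avail, uint32_t target)
{
    return avail > target ? avail - target : 0;
}
```

For a target of 480 samples, a 65536 byte buffer and a stride of 8, the overrun
limit is 3840 samples (480 * 8), since that is below the ring capacity of 8192
samples.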
|
||||
|
||||
\note The factor of 8 in `target_buffer * 8` is an arbitrarily or empirically
|
||||
chosen headroom multiplier: it sets how far the fill level may run above the target
|
||||
before the buffered data is treated as stale. It is *not* a unit conversion - in
|
||||
particular, it is unrelated to the eight bits in a byte, despite the superficial
|
||||
resemblance. The `impl->buffer_size / stride` term merely caps this bound at the
|
||||
physical ring capacity, in samples.
|
||||
|
||||
If the device delay (specified by the `pw_time.delay` value) is nonzero, then it
|
||||
is subtracted from `impl->target_buffer`, and the result is then used as the target
|
||||
fill level instead of `impl->target_buffer` directly.
|
||||
|
||||
## Direct timestamp mode {#rtp-module-internals-direct-timestamp-mode}
|
||||
|
||||
Since this mode requires that the graph drivers of sender and receiver are somehow
|
||||
synchronized, it implies that, if the sender's and the receiver's
|
||||
\ref spa_io_clock::position values are sampled at the exact same moment, they
|
||||
are identical. In practice, they usually deviate a bit. This deviation is the
|
||||
time sync error, and the time synchronization mechanism that is used tries to
|
||||
keep this sync error as small as possible.
|
||||
|
||||
The aforementioned incoming RTP timestamp shift by `impl->target_buffer` plays
|
||||
a crucial role here, since it makes sure the transport delay (which is what
|
||||
the session latency specifies in this mode) is accounted for.
|
||||
|
||||
This mode is called "direct timestamp" mode since, unlike in the constant latency
|
||||
mode, the `rtp_audio_process_playback()` function directly reads from the ring
|
||||
buffer at an index that is derived from \ref spa_io_clock::position , even if this
|
||||
position jumps around. There is some logic to detect underruns and substitute
|
||||
missing data with silence, but discontinuities otherwise have no lasting effect.
|
||||
The driver must ensure that the \ref spa_io_clock::position value increases steadily
|
||||
(except in major discontinuity cases); clock drift compensation is done by the
|
||||
driver by adjusting the graph invocation timings. See \ref page_driver for more.
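
A minimal sketch of the read-side idea in this mode is shown below. It assumes
that the clock position is already expressed in RTP sample units (no rate
scaling), and `readable_samples()` is a hypothetical helper, not a function from
`audio.c`.

```c
#include <assert.h>
#include <stdint.h>

// Returns how many of the wanted samples can be read when the read position
// is taken directly from the clock position. The remainder of the cycle
// (wanted minus the return value) would be filled with silence.
static uint32_t readable_samples(uint32_t write_index, uint32_t clock_position,
                                 uint32_t wanted)
{
    assert(wanted <= (uint32_t)INT32_MAX);

    // Signed distance between the write index and the read position. The
    // conversion relies on the usual two's complement behavior.
    int32_t avail = (int32_t)(write_index - clock_position);

    if (avail <= 0)
        return 0; // the read position is at or past the written data
    return (uint32_t)avail < wanted ? (uint32_t)avail : wanted;
}
```

Because the read position follows the clock rather than a stored read index, a
jump in the clock position only affects the cycles in which no matching data is
present.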
|
||||
|
||||
In this mode, the `rtp_stream` DLL is not used.
|
||||
|
||||
# Separate PTP sender {#rtp-module-internals-separate-ptp-sender}
|
||||
|
||||
This section covers the *internals* of the separate PTP sender. Its user-facing
|
||||
behavior - what it is for, how it is activated via `aes67.driver-group`, and its
|
||||
benefits and trade-offs - is documented in \ref page_module_rtp_sink .
|
||||
|
||||
Only the `audio.c` media subtype handler supports this mode. When it is enabled,
|
||||
`rtp_audio_init()` in `audio.c` creates an internal `pw_filter` node that is kept
|
||||
isolated from the graph and is driven by the driver from the `aes67.driver-group`
|
||||
node group.
|
||||
|
||||
When this separate PTP sender is active, `rtp_audio_process_capture()` behaves
|
||||
differently. Rather than computing a drift itself, it stores the sink driver's
|
||||
timing information (`impl->sink_nsec`, `impl->sink_next_nsec`,
|
||||
`impl->sink_resamp_delay`, `impl->sink_quantum`) for the sender to use. From that
|
||||
information, `ptp_sender_process()` estimates the current total delay and computes
|
||||
the error between it and the target. That error is fed into a separate dedicated DLL
|
||||
(`impl->ptp_dll`), which outputs a rate. That rate (`impl->ptp_corr`) is then applied
|
||||
as the ASRC's rate at the start of `rtp_audio_process_capture()`. The ASRC then
|
||||
produces larger or smaller amounts of data, filling the ring buffer to a larger or
|
||||
smaller degree, thus forming a control loop that keeps the fill level at a certain
|
||||
target (see below), similar to what the constant latency mode does.
|
||||
|
||||
During the refilling state, no packets are sent out. The refilling state ends once
|
||||
the estimated total delay reaches `impl->target_buffer` (which is also what the
|
||||
control loop mentioned above targets). That estimated total delay is the sum of
|
||||
the current ring buffer fill level, the delay of the ASRC, and the estimated
|
||||
amount of samples that are "in-flight" (that is, samples that already were sent
|
||||
out but not yet received or which arrived right after the last graph cycle).
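
The refilling condition can be sketched as below. All quantities are assumed to
be in samples, and `struct sender_state`, `estimated_total_delay()` and
`update_refilling()` are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sender_state {
    bool refilling;         // true while no packets may be sent
    uint32_t target_buffer; // target total delay, in samples
};

// Sum of the three components of the estimated total delay.
static uint32_t estimated_total_delay(uint32_t fill_level, uint32_t asrc_delay,
                                      uint32_t in_flight)
{
    return fill_level + asrc_delay + in_flight;
}

// Ends the refilling state once the estimated total delay reaches the target.
// Returns true if packets may be sent in this cycle.
static bool update_refilling(struct sender_state *s, uint32_t total_delay)
{
    assert(s != NULL);
    if (s->refilling && total_delay >= s->target_buffer)
        s->refilling = false;
    return !s->refilling;
}
```

For a target of 480 samples, a fill level of 200 with an ASRC delay of 16 and 100
in-flight samples (316 in total) keeps the sender in the refilling state, while a
fill level of 400 with the same other terms (516 in total) ends it.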
|
||||
|
||||
Additionally, the sender contains code for checking for too severe deviations
|
||||
between the send progress and the current PTP time. The tolerance range is
|
||||
2x the quantum size. If the deviation goes beyond that, a resynchronization
|
||||
(and consequently, another refilling) is performed. This catches cases where
|
||||
the separate sender is starved of data (that is, the main graph is lagging
|
||||
behind), and also cases when PTP discontinuities occur.
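
The tolerance check can be sketched as follows, assuming that both positions and
the quantum are expressed in the same sample units. `needs_resync()` is a
hypothetical helper written for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Returns true if the send progress deviates from the PTP-derived position
// by more than twice the quantum, in either direction.
static bool needs_resync(uint64_t send_position, uint64_t ptp_position,
                         uint32_t quantum)
{
    assert(quantum > 0);
    uint64_t deviation = send_position > ptp_position
        ? send_position - ptp_position
        : ptp_position - send_position;
    return deviation > 2 * (uint64_t)quantum;
}
```

With a quantum of 256, a deviation of up to 512 samples is tolerated; anything
beyond that would trigger a resynchronization and another refilling.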
|
||||
|
||||
A similar check exists for the node wake up times. The filter node is scheduled
|
||||
by its own driver, independently of the sink node, so their wake ups are not
|
||||
inherently aligned. It is therefore important to check that the filter wakes
|
||||
up within the bounds of the sink node's wake up times (with some tolerance);
|
||||
if it does not, a resynchronization is performed.
|
||||
|
||||
*/
|
||||