module-rtp: Add documentation about module internals and improve comments

* Add .dox file that describes the internal design of the RTP module.
* Add details to the sink module documentation about the meaning of
  source.ip and how it interacts with local.ifname.
* Rename "constant delay mode" to "constant latency mode" in comments in
  the code to match the documentation.
* Minor comment fixes.
2026-07-03 00:06:38 -04:00 · 2026-06-30 10:12:50 +02:00 · 2026-06-30 10:12:50 +02:00 · c4a9023215
commit c4a9023215
parent 2f94a49962
7 changed files with 428 additions and 5 deletions
--- a/doc/dox/internals/index.dox
+++ b/doc/dox/internals/index.dox
@@ -16,6 +16,7 @@
 - \subpage page_latency
 - \subpage page_tag
 - \subpage page_native_protocol
+- \subpage page_rtp_module_internals


 # Components
--- a/doc/dox/internals/rtp-module-internals.dox
+++ b/doc/dox/internals/rtp-module-internals.dox
@@ -0,0 +1,335 @@
+/** \page page_rtp_module_internals RTP sink and source module internals
+
+This document explains the architecture of PipeWire's RTP module.
+
+\tableofcontents
+
+# Introduction {#rtp-module-internals-introduction}
+
+The "RTP module" actually refers to a set of three modules which share source code:
+
+- \ref page_module_rtp_sink "RTP sink module" : Creates an RTP sink node and
+  exposes it to the graph. This sink node places PCM audio into an internal ring
+  buffer. This ring buffer is the source for the data of outgoing packets. The
+  RTP timestamps may be synchronized against PTP time, depending on what buffer
+  mode is used. This module also has a special "separate PTP sender" mode, where
+  the actual send portion is done by an internal mini graph that runs on a special
+  PTP based graph driver.
+- \ref page_module_rtp_source "RTP source module" : Creates an RTP source node
+  and exposes it to the graph. This source node receives RTP packets and places
+  their PCM data into an internal ring buffer. The node's process callback reads
+  from that ring buffer and outputs that data to the graph. Depending on what mode
+  is used, the position that the ring buffer is read from may be synchronized
+  against a PTP time source.
+- \ref page_module_rtp_sap "SAP module" : Announces SAP sessions via multicast,
+  and also listens for SAP sessions. If it discovers another SAP session, it
+  instantiates the RTP source module, which in turn creates and exposes its RTP
+  source node. See RFC 2974 for more about SAP.
+
+For notes about the configuration, see the individual module documentation.
+
+# RTP stream details {#rtp-module-internals-stream-details}
+
+The core of the RTP sink and source modules is the `rtp_stream`. This is built around
+a \ref pw_stream "PipeWire stream". This stream can operate in the `PW_DIRECTION_INPUT`
+direction (used by the RTP sink module) or in the `PW_DIRECTION_OUTPUT` direction
+(used by the RTP source module).
+
+The `rtp_stream` is implemented in `stream.c` and `stream.h`. `stream.c` includes
+`audio.c`, `midi.c`, `opus.c`. These handle media subtype specific setups,
+teardowns, and data processing:
+
+- `audio.c` corresponds to `SPA_MEDIA_SUBTYPE_raw` and handles PCM audio.
+- `midi.c` corresponds to `SPA_MEDIA_SUBTYPE_control` and handles MIDI.
+- `opus.c` is similar to `audio.c`, but corresponds to `SPA_MEDIA_SUBTYPE_opus`,
+  and encodes PCM audio to Opus prior to sending out RTP packets and decodes
+  Opus encoded audio from incoming RTP packets.
+
+The process callback in `rtp_stream` is set by these source files depending
+on the media subtype. Other `rtp_stream` specific callbacks, like a flush timeout
+handler, are also set by these source files, since they are media subtype specific.
+
+The RTP sink and source modules are configured via properties, represented by
+`pw_properties`. Both support "stream.props" values inside their properties. These
+values in turn are child `pw_properties` instances that are passed directly to
+their `rtp_stream` instances. The modules also copy some of the values of their
+own properties into that child `pw_properties` instance. The exact list of values
+that are copied over depends on the module. This means, however, that some values
+can be set either directly in the module properties or inside the stream.props
+properties. One example of this would be `sess.ts-direct`.
+
+\note This document refers to this as "copying to the stream properties". Actually,
+a value is copied from the module's properties to the stream properties if and only
+if that value is not already set in the stream properties. If it is, the already
+existing value takes priority.
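
The "copy only if not already set" rule from the note above can be illustrated with a small standalone sketch. It deliberately does not use the `pw_properties` API: `struct props`, `props_get()` and `copy_if_unset()` are hypothetical names invented for this illustration, and the fixed-size dictionary is a simplification.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PROPS 16

/* Minimal stand-in for a property dictionary (NOT the pw_properties API). */
struct props {
	const char *keys[MAX_PROPS];
	const char *values[MAX_PROPS];
	size_t n;
};

/* Return the value stored for key, or NULL if the key is not set. */
static const char *props_get(const struct props *p, const char *key)
{
	for (size_t i = 0; i < p->n; i++)
		if (strcmp(p->keys[i], key) == 0)
			return p->values[i];
	return NULL;
}

/* Copy key from the module properties to the stream properties, but only
 * if the stream properties do not set it yet. Returns 1 if a value was
 * copied, 0 otherwise. */
static int copy_if_unset(const struct props *module, struct props *stream,
		const char *key)
{
	const char *v = props_get(module, key);

	if (v == NULL || props_get(stream, key) != NULL || stream->n >= MAX_PROPS)
		return 0;
	stream->keys[stream->n] = key;
	stream->values[stream->n] = v;
	stream->n++;
	return 1;
}
```

In this sketch, a `sess.ts-direct` value that is already present in the stream properties is left untouched, while a missing one is filled in from the module properties.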
+
+`audio.c` is by far the most complex of the media subtype handlers. All three
+handlers have some notion of the direct timestamp and constant latency modes, but
+`audio.c` is (currently) the only one with the fully reworked implementation that
+this document describes (the `impl->actual_max_buffer_size` modulo scheme,
+`impl->ts_align`, device delay compensation, and the exact over/underrun thresholds).
+`midi.c` and `opus.c` still carry their own, simpler direct-vs-constant-latency
+handling and a `TODO` to converge on the `audio.c` approach. `audio.c` also features
+the separate PTP sender mode, which the other two do not have at all.
+
+## Ring buffer and wrap-around behavior {#rtp-module-internals-ring-buffer-behavior}
+
+The `rtp_stream` sets up a fixed-size ring buffer. Its size is derived from the
+`sess.buffer-size` property, in bytes. Note that this is a *stream* property: it
+is read by `rtp_stream_new()` from the properties it is handed, and - unlike e.g.
+`sess.ts-direct` - neither the sink nor the source module copies it over from its
+own properties, so in practice it can only be set inside `stream.props`.
+
+The `sess.buffer-size` value is not used verbatim. `rtp_stream_new()` derives two
+quantities from it:
+
+- `impl->buffer_size` is `sess.buffer-size` rounded *up* to the next power of two
+  (via `SPA_ROUND_UP_POW2_32()`), and is the size of the actual allocation (that is,
+  of `impl->buffer`). It is a power of two because the `midi.c` and `opus.c` handlers
+  wrap their indices with a bit mask (`impl->buffer_mask`, and `impl->buffer_mask2`
+  against the half-sized `impl->buffer_size2`) rather than a modulo, and masking only
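+  wraps correctly for power-of-two sizes. `impl->buffer_size` is *not* guaranteed to
+  be an integer multiple of the stride (it is one only if the stride itself is a
+  power of two that is no larger than the buffer size).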
+- `impl->actual_max_buffer_size` is `impl->buffer_size` rounded *down* to an integer
+  multiple of the stride (via `SPA_ROUND_DOWN()`). This is used by `audio.c`, which
+  - unlike `midi.c` and `opus.c` - wraps via a modulo against this value. `audio.c`
+  was reworked to do this to fix the stride-alignment problem described below;
+  `midi.c` and `opus.c` still use the mask scheme and carry a `TODO` to converge on
+  it.
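
The two derived sizes can be sketched as follows. This is a minimal standalone illustration, not the code from `stream.c`: `round_up_pow2_32()`, `round_down()`, `struct buffer_sizes` and `derive_buffer_sizes()` are hypothetical stand-ins for `SPA_ROUND_UP_POW2_32()`, `SPA_ROUND_DOWN()` and the corresponding `impl` fields.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next power of two (v itself if it already is one).
 * Stand-in for SPA_ROUND_UP_POW2_32(); valid for 1 <= v <= 2^31. */
static uint32_t round_up_pow2_32(uint32_t v)
{
	assert(v >= 1 && v <= UINT32_C(0x80000000));
	v--;
	v |= v >> 1;
	v |= v >> 2;
	v |= v >> 4;
	v |= v >> 8;
	v |= v >> 16;
	return v + 1;
}

/* Round v down to an integer multiple of n. Stand-in for SPA_ROUND_DOWN(). */
static uint32_t round_down(uint32_t v, uint32_t n)
{
	assert(n > 0);
	return v - (v % n);
}

struct buffer_sizes {
	uint32_t buffer_size;            /* allocation size, power of two */
	uint32_t actual_max_buffer_size; /* usable size, multiple of the stride */
};

/* Derive both sizes (in bytes) from sess.buffer-size and the stride. */
static struct buffer_sizes derive_buffer_sizes(uint32_t sess_buffer_size,
		uint32_t stride)
{
	struct buffer_sizes s;

	s.buffer_size = round_up_pow2_32(sess_buffer_size);
	s.actual_max_buffer_size = round_down(s.buffer_size, stride);
	return s;
}
```

With a `sess.buffer-size` of 100 and a stride of 6, this yields a 128 byte allocation and a usable size of 126 bytes.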
+
+The actual, allocated buffer is present as `impl->buffer`. This is the pure data
+storage buffer, without any read or write index.
+
+\note `impl->buffer` and `impl->target_buffer` are not to be confused. The former
+is the actual buffer, while the latter is the session latency, converted to RTP
+samples. Furthermore, `sess.buffer-size` and the session latency must be picked such
+that `impl->target_buffer` worth of samples fits within the buffer. Since
+`impl->target_buffer` is in samples while `impl->actual_max_buffer_size` is in bytes,
+this means `impl->target_buffer * stride` must not exceed
+`impl->actual_max_buffer_size` (equivalently, `impl->target_buffer` must not exceed
+`impl->actual_max_buffer_size / stride`).
+
+The stride value depends on the media subtype, and is set internally by `rtp_stream_new()`.
+
+The buffer contents are always interleaved when the number of channels is greater
+than 1 and the data is raw audio (so, this does not apply to MIDI for example).
+The stride value specifies the unit size inside the buffer that contains audio
+data for all channels, played at the exact same time. In the PCM case, the stride
+is (num_channels * bytes_per_pcm_sample).
+
+\note It is important to keep in mind that the way the read and write index are
+handled in this ring buffer deviates somewhat from standard ring buffer usage
+in typical producer-consumer schemes, especially in the direct timestamp mode
+(more on that further below).
+
+The read and write index logic is handled by `impl->ring`. Both read and write
+indices increase monotonically (as free-running values) unless they are
+resynchronized. Because they are free-running rather than being wrapped at the
+buffer boundary, the fill level is simply their difference, and that is what removes
+the usual ambiguity about whether the ring buffer is empty or full. When accessing
+the actual buffer contents, an index is first turned into a byte offset (see below),
+and that offset is then reduced to the buffer bounds - in `audio.c` by taking it
+modulo `impl->actual_max_buffer_size`, and in `midi.c` and `opus.c` by masking it
+with `impl->buffer_mask` / `impl->buffer_mask2`. Reducing modulo
+`impl->actual_max_buffer_size` (rather than the raw `impl->buffer_size`) is essential
+for the buffer modes to work properly (explained further below).
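
Why free-running indices remove the empty/full ambiguity can be shown in a few lines. The function below is a sketch under the assumption that both indices are 32-bit unsigned counters; `fill_level()` is a hypothetical name, and the signed return value (negative when the read index is ahead of the write index) is a choice of this sketch.

```c
#include <stdint.h>

/* Fill level, in samples, of a ring whose indices are free-running 32-bit
 * counters. The unsigned subtraction stays correct when the counters wrap
 * around 2^32; the result is interpreted as signed so that a read index
 * that is ahead of the write index shows up as a negative fill level.
 * (The conversion to int32_t assumes the usual two's complement behavior.) */
static int32_t fill_level(uint32_t write_index, uint32_t read_index)
{
	return (int32_t)(write_index - read_index);
}
```

An empty ring gives 0 and a full ring gives its capacity in samples, so the two states cannot be confused, which they could be if both indices were wrapped at the buffer boundary.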
+
+The read and write indices are given in RTP sample units. To access data in the
+buffer, the indices are multiplied by the stride to get a byte offset. This also
+means that the buffer size (which is given in bytes) must be an integer multiple
+of the stride size - otherwise, the read and write indices may refer to places in
+the buffer that cannot contain a full data set for all channels. For example, if
+the stride is 6, and the buffer size is 100, then when the read index is 16, the
+byte offset would be 16*6 = 96 - but there, only 4 bytes could be read, not 6.
+For this reason, `audio.c` wraps its offsets against `impl->actual_max_buffer_size`,
+which is the buffer size rounded down to an integer multiple of the stride size,
+as mentioned above.
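
The difference between the two wrapping schemes can be made concrete with the following sketch. The function names are hypothetical, and the 64-bit multiplication in `offset_modulo()` is a choice of this sketch rather than a statement about what `audio.c` does.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset with modulo wrapping. usable_size must be an integer multiple
 * of the stride, so every resulting offset has room for a full frame. */
static uint32_t offset_modulo(uint32_t index, uint32_t stride,
		uint32_t usable_size)
{
	assert(stride > 0 && usable_size >= stride && usable_size % stride == 0);
	return (uint32_t)(((uint64_t)index * stride) % usable_size);
}

/* Byte offset with mask wrapping. pow2_size must be a power of two. */
static uint32_t offset_mask(uint32_t index, uint32_t stride,
		uint32_t pow2_size)
{
	assert(pow2_size > 0 && (pow2_size & (pow2_size - 1)) == 0);
	return (index * stride) & (pow2_size - 1);
}
```

With a stride of 6 and a 128 byte allocation (usable size 126), index 21 maps to offset 0 under the modulo scheme, whereas masking yields offset 126, where only 2 of the 6 bytes of a frame fit before the end of the allocation.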
+
+In the RTP sink module, the `rtp_stream` appends data to the ring buffer at its
+write index, except for when a resynchronization happens - the write index is then
+reset to match the `spa_io_clock.position` value (scaled to RTP sample units).
+One resynchronization always happens at startup. The RTP timestamps of outgoing
+packets are derived from the ring buffer's read index.
+
+In the RTP source module, `rtp_stream` reads data from the ring buffer depending
+on the buffer mode. More on that further below.
+
+## Threading model and data processing {#rtp-module-internals-threading-model}
+
+Most of the code in `stream.c` runs in the stream's main loop, while most of the
+code in the media subtype handlers (`audio.c` etc.) runs in the stream's data loop.
+
+`stream_start()` is called by `on_stream_state_changed()` when the stream's state
+changes to `PW_STREAM_STATE_STREAMING`. At that stage, the stream's data loop is
+running, but the stream's PipeWire graph node is not yet attached to the data loop,
+so no data processing takes place at this time. The attachment happens after
+`on_stream_state_changed()` has finished. This means that although `stream_start()`
+runs in the main loop, it can safely set internal states that are otherwise accessed
+and modified by functions that run in the data loop.
+
+Similarly, `stream_stop()` is called by `on_stream_state_changed()` when the stream's
+state changes to `PW_STREAM_STATE_PAUSED`. (It is not called, however, if
+`node.always-process` is set to true in the stream.props properties of the RTP
+source module.) At that stage, the stream's graph node has already been detached
+from the data loop. It therefore is safe for `stream_stop()` to touch internal
+states that normally would be accessed by functions that run in the data loop.
+
+The media subtype handlers each have an init function, like `rtp_audio_init()`.
+This is one of the functions from these handlers that runs in the main loop, since
+these init functions are called by `rtp_stream_new()`. The other functions are:
+
+- `stop_timer()` (called by `stream_start()`)
+- `resend_packets()` (RAOP specific - not used by the RTP sink or source modules)
+- `deinit()` (called by `rtp_stream_destroy()`)
+
+Everything else in the media subtype handlers runs in the data loop, with the
+exception of `ptp_sender_process()` in `audio.c`, which runs under the separate
+PTP sender's own driver and may have a separate data loop.
+
+`audio.c` has two extra specialties:
+
+1. It aggregates the contents of the ring buffer such that it can split it up into
+   RTP packets with the specified packet time (see `rtp.ptime` in the module
+   and stream properties). Depending on how full the ring buffer is, it may decide
+   to send out some of its contents within the current graph cycle, and may use
+   a timer (which runs in the data loop) to schedule the output of the remaining
+   data later, to not risk an xrun by blocking the data loop in the current graph
+   cycle for too long.
+2. The separate PTP sender mode is driven by its own driver. More on that
+   mode is documented further below.
+
+# Buffer modes {#rtp-module-internals-buffer-modes}
+
+\note Read the buffer modes documentation in \ref page_module_rtp_source first
+if you have not already done so.
+
+Also, this section specifically describes how the buffer modes in `audio.c` are
+handled. `midi.c` and `opus.c` do branch on `impl->direct_timestamp` too, but with
+their own, simpler handling (and aligning those with what `audio.c` does is an
+open `TODO`); the detailed behavior described here is `audio.c` specific.
+
+The buffer mode only has a minor influence on the RTP sink module. In the constant
+latency mode, `impl->ts_align` is used in resynchronization cases to avoid a
+discontinuity in the outgoing RTP timestamps. In the direct timestamp mode,
+`impl->ts_align` is not used.
+
+The rest of the buffer mode documentation is about the behavior on the receiving
+side, that is, how the RTP source module uses the `rtp_stream`.
+
+In both modes, received data is inserted into the ring buffer according to the
+RTP timestamp. This timestamp is first shifted into the future by the value of
+`impl->target_buffer`. Then, the ring buffer's write index is advanced. It is
+expected by the code that the sender produces continuous timestamps; that is,
+`rtp_timestamp_of_packet_2 = rtp_timestamp_of_packet_1 + rtp_samples_per_packet`.
+In certain cases, resynchronization may take place. The read and write indices
+are then reset: the read index is set to the timestamp of the next incoming RTP
+packet, while the write index is set to that packet timestamp + `impl->target_buffer`.
+In other words, the write index is set to be ahead of the read index by the session
+latency in samples.
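
The index handling on the receiving side can be sketched as below. This is a simplified model, not the code of `rtp_audio_receive()`: `struct rx_state` and `rx_place_packet()` are hypothetical, the actual copy into the ring buffer is omitted, and timestamp gaps are not handled.

```c
#include <stdbool.h>
#include <stdint.h>

struct rx_state {
	bool have_sync;
	uint32_t read_index;    /* free-running, in RTP sample units */
	uint32_t write_index;   /* free-running, in RTP sample units */
	uint32_t target_buffer; /* session latency, in samples */
};

/* Handle one packet and return the ring index at which its samples would
 * be stored. */
static uint32_t rx_place_packet(struct rx_state *s, uint32_t rtp_timestamp,
		uint32_t samples)
{
	/* Shift the RTP timestamp into the future by the session latency. */
	uint32_t write = rtp_timestamp + s->target_buffer;

	if (!s->have_sync) {
		/* Resynchronize: the read index starts at the packet timestamp,
		 * so the write index ends up target_buffer samples ahead of it. */
		s->read_index = rtp_timestamp;
		s->have_sync = true;
	}
	/* Advance the write index past the samples of this packet. */
	s->write_index = write + samples;
	return write;
}
```

With a session latency of 480 samples, a first packet with timestamp 1000 and 48 samples is placed at index 1480, and the next continuous packet (timestamp 1048) is placed at index 1528.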
+
+The write index is advanced in `rtp_audio_receive()`, the read index is advanced
+in `rtp_audio_process_playback()`.
+
+## Constant latency mode {#rtp-module-internals-constant-latency-mode}
+
+As mentioned in the RTP source module documentation, this is the default mode,
+where the fill level is kept at a steady value, which is `impl->target_buffer`.
+If the fill level is above or below this, a DLL (delay-locked loop) is used to
+compute a rate correction from the error, which is then fed into the ASRC of the
+`pw_stream` the `rtp_stream` is based on.
+The estimated amount of samples that are "in-flight" (that is, samples that
+already were sent out but not yet received or which arrived right after the
+last graph cycle) are also factored into this computation. This establishes a
+control loop that resamples the audio data as needed to maintain the fill level
+at `impl->target_buffer`. Should the difference between the target and the
+actual fill level exceed a threshold, the ring buffer indices are resynchronized.
+
+More concretely, the thresholds work as follows. An *underrun* is detected when
+fewer samples are available than the current graph cycle needs (`avail < wanted`);
+the missing samples are filled with silence and the sync state is dropped.
+An *overrun* on the read side is detected when the fill level exceeds
+`SPA_MIN(target_buffer * 8, impl->buffer_size / stride)`; the excess is dropped
+by advancing the read index so that only `target_buffer` worth of data remains
+(a soft correction, not a full resync). Here `target_buffer` is the
+device-delay-adjusted target (see below), i.e. `impl->target_buffer` minus the
+device delay - the two coincide only when the device delay is zero. On the write
+side (`rtp_audio_receive()`), a fill level exceeding the ring capacity
+`impl->buffer_size / stride` sets `impl->have_sync` to false, forcing a full resync.
+
+\note The factor of 8 in `target_buffer * 8` is an arbitrarily / empirically
+chosen headroom multiplier: it sets how far the fill level may run above the target
+before the buffered data is treated as stale. It is *not* a unit conversion - in
+particular, it is unrelated to the eight bits in a byte, despite the superficial
+resemblance. The `impl->buffer_size / stride` term merely caps this bound at the
+physical ring capacity, in samples.
+
+If the device delay (specified by the `pw_time.delay` value) is nonzero, then it
+is subtracted from `impl->target_buffer`, and the result is then used as the target
+fill level instead of `impl->target_buffer` directly.
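
The read-side thresholds and the device delay adjustment described above can be summarized in a sketch. The names are hypothetical, and clamping the adjusted target at zero is an assumption of this sketch, not something the text above specifies.

```c
#include <stdint.h>

enum fill_status { FILL_OK, FILL_UNDERRUN, FILL_OVERRUN };

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Target fill level after device delay compensation, in samples.
 * Clamped at zero (an assumption of this sketch). */
static uint32_t adjusted_target(uint32_t target_buffer, uint32_t device_delay)
{
	return device_delay < target_buffer ? target_buffer - device_delay : 0;
}

/* Classify the read-side fill level. avail, wanted and target are in
 * samples; buffer_size and stride are in bytes. */
static enum fill_status classify_fill(uint32_t avail, uint32_t wanted,
		uint32_t target, uint32_t buffer_size, uint32_t stride)
{
	if (avail < wanted)
		return FILL_UNDERRUN;
	if (avail > min_u32(target * 8, buffer_size / stride))
		return FILL_OVERRUN;
	return FILL_OK;
}

/* Number of samples to skip on an overrun so that only `target` samples
 * remain in the ring buffer. */
static uint32_t overrun_skip(uint32_t avail, uint32_t target)
{
	return avail > target ? avail - target : 0;
}
```

For example, with a session latency of 480 samples and a device delay of 80 samples, the adjusted target is 400 samples, so the overrun limit is 3200 samples as long as the ring capacity is larger than that.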
+
+## Direct timestamp mode {#rtp-module-internals-direct-timestamp-mode}
+
+Since this mode requires that the graph drivers of sender and receiver are somehow
+synchronized, it implies that, if the sender's and the receiver's
+\ref spa_io_clock::position values are sampled at the exact same moment, they
+are identical. In practice, they usually deviate a bit. This deviation is the
+time sync error, and the time synchronization mechanism that is used tries to
+keep this sync error as small as possible.
+
+The aforementioned incoming RTP timestamp shift by `impl->target_buffer` plays
+a crucial role here, since it makes sure the transport delay (which is what
+the session latency specifies in this mode) is accounted for.
+
+This mode is called "direct timestamp" mode since, unlike in the constant latency
+mode, the `rtp_audio_process_playback()` function directly reads from the ring
+buffer at an index that is derived from \ref spa_io_clock::position , even if this
+position jumps around. There is some logic to detect underruns and substitute
+missing data with silence, but discontinuities otherwise have no lasting effect.
+The driver must ensure that the \ref spa_io_clock::position value increases steadily
+(except in major discontinuity cases); clock drift compensation is done by the
+driver by adjusting the graph invocation timings. See \ref page_driver for more.
+
+In this mode, the `rtp_stream` DLL is not used.
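
As a rough illustration of how a read index can be derived from the clock position, the sketch below assumes a plain linear rescale from the graph clock rate to the RTP clock rate. `clock_position_to_rtp_index()` and its parameters are hypothetical; the actual derivation in `audio.c` may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Scale a clock position (counted at clock_rate) to RTP sample units
 * (counted at rtp_rate) and truncate it to the 32-bit range of the ring
 * buffer indices. Assumes position * rtp_rate fits into 64 bits. */
static uint32_t clock_position_to_rtp_index(uint64_t position,
		uint32_t clock_rate, uint32_t rtp_rate)
{
	assert(clock_rate > 0);
	return (uint32_t)(position * rtp_rate / clock_rate);
}
```

Because the index is recomputed from the position in every cycle, a jump in the position immediately moves the read location instead of accumulating as a fill level error.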
+
+# Separate PTP sender {#rtp-module-internals-separate-ptp-sender}
+
+This section covers the *internals* of the separate PTP sender. Its user-facing
+behavior - what it is for, how it is activated via `aes67.driver-group`, and its
+benefits and trade-offs - is documented in \ref page_module_rtp_sink .
+
+Only the `audio.c` media subtype handler supports this mode. When it is enabled,
+`rtp_audio_init()` in `audio.c` creates an internal `pw_filter` node that is kept
+isolated from the graph and is driven by the driver from the `aes67.driver-group`
+node group.
+
+When this separate PTP sender is active, `rtp_audio_process_capture()` behaves
+differently. Rather than computing a drift itself, it stores the sink driver's
+timing information (`impl->sink_nsec`, `impl->sink_next_nsec`,
+`impl->sink_resamp_delay`, `impl->sink_quantum`) for the sender to use. From that
+information, `ptp_sender_process()` estimates the current total delay and computes
+the error between it and the target. That error is fed into a separate dedicated DLL
+(`impl->ptp_dll`), which outputs a rate. That rate (`impl->ptp_corr`) is then applied
+as the ASRC's rate at the start of `rtp_audio_process_capture()`. The ASRC then
+produces larger or smaller amounts of data, filling the ring buffer to a larger or
+smaller degree, thus forming a control loop that keeps the fill level at a certain
+target (see below), similar to what the constant latency mode does.
+
+During the refilling state, no packets are sent out. The refilling state ends once
+the estimated total delay reaches `impl->target_buffer` (which is also what the
+control loop mentioned above targets). That estimated total delay is the sum of
+the current ring buffer fill level, the delay of the ASRC, and the estimated
+amount of samples that are "in-flight" (that is, samples that already were sent
+out but not yet received or which arrived right after the last graph cycle).
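
The refill condition can be written down as a sketch. The function names are hypothetical, and all quantities are assumed to be in samples.

```c
#include <stdbool.h>
#include <stdint.h>

/* Estimated total delay: ring buffer fill level + ASRC delay + estimated
 * number of in-flight samples. */
static uint32_t estimated_total_delay(uint32_t fill, uint32_t asrc_delay,
		uint32_t in_flight)
{
	return fill + asrc_delay + in_flight;
}

/* True while the sender is still refilling, that is, while no packets
 * should be sent out yet. */
static bool still_refilling(uint32_t total_delay, uint32_t target_buffer)
{
	return total_delay < target_buffer;
}
```

For instance, a fill level of 300 samples, an ASRC delay of 20 samples and 100 in-flight samples add up to 420 samples, which is still below a target of 480, so the sender keeps refilling.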
+
+Additionally, the sender contains code that checks for excessive deviations
+between the send progress and the current PTP time. The tolerance range is
+twice the quantum size. If the deviation goes beyond that, a resynchronization
+(and consequently, another refilling) is performed. This catches cases where
+the separate sender is starved of data (that is, the main graph is lagging
+behind), and also cases when PTP discontinuities occur.
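
The tolerance check can be sketched as follows. It assumes that both positions are expressed in samples on the same timeline; `needs_resync()` and its parameters are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the send progress deviates from the PTP-derived position by more
 * than twice the quantum, in either direction. */
static bool needs_resync(uint64_t send_position, uint64_t ptp_position,
		uint32_t quantum)
{
	uint64_t deviation = send_position > ptp_position
		? send_position - ptp_position
		: ptp_position - send_position;

	return deviation > 2 * (uint64_t)quantum;
}
```

With a quantum of 1024 samples, a deviation of exactly 2048 samples is still tolerated in this sketch, while 2049 samples triggers a resynchronization.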
+
+A similar check exists for the node wake-up times. The filter node is scheduled
+by its own driver, independently of the sink node, so their wake-ups are not
+inherently aligned. It is therefore important to check that the filter wakes
+up within the bounds of the sink node's wake-up times (with some tolerance);
+if it does not, a resynchronization is performed.
+
+*/