Align RX of streams in same ISO group:
- Ensure all streams in ISO group have same target latency also for BAP
Client
- Determine rate matching to ISO group clock from RX times of all
streams in the group
- Based on this, compute nominal packet RX times, and feed them to
decode-buffer instead of the real RX time. This is enough for
sub-sample level sync.
- Customise buffer overrun handling for ISO so that it drops data to
arrive exactly at the target, for faster convergence at RX start
The ISO clock matching is done based on kernel-provided packet RX times,
so it has unknown offset from the actual ISO clock, probably a few ms.
Current kernels (6.17) do not provide anything better to use for the
clock matching, and doing it properly appears to be controller
vendor-defined (if possible at all).
Take resampler delay into account when computing the buffer fill level,
including the fractional part.
If decode-buffer is now fed nominal packet reference times in
write_packet(), it converges the total buffer + resampler latency to the
target at sub-sample accuracy.
This is needed for aligning RX of ISO streams in the same group, so that
e.g. stereo pair alignment is achieved even though the streams have
separate resamplers. Resampler phases get aligned via independent rate
matching.
The rate matching calculations are done in the system clock domain. If
the driver ticks at a different rate, the correction factor needs to be
adjusted by the rate_diff.
setup_matching() also needs to be before spa_bt_decode_buffer_process():
as follower we should use rate matching value calculated on the
*previous* cycle, because this is what driver is doing when it adjusts
it tick rate.
On production systems, having a constant high latency is favored over
dynamically adjusting it in order to optimize for low latency,
because every time a dynamic adjustment happens, there's a glitch.
This adds an option to let the user specify the exact amount of latency
they want.
The hardcoded latency of 512/<rate> is quite low on some ALSA devices.
Instead of forcing that latency onto the graph, just don't set it at all
unless it originates from the BAP presentation delay. That means that
the functionality remains the same for BAP but changes for A2DP to favor
the preferred quantum of the ALSA sink (or whatever is the driver).
Also, avoid setting an empty string ("") latency and rate in the cases
where it's not defined. This allows users to override those properties
through the wireplumber monitor rules if they need to.
Use kernel BT_PKT_SEQNUM (likely in Linux v6.17) to provide the ISO
packets sequence numbers. Fall back to counting packets if kernel is too
old to support the feature.
Pass zero-length packets to the codec. BAP/ISO may use these to indicate
missing data.
Fix A2DP codecs to not parse input with spa_return_val_if_fail, that's
meant for assertions. Just return -EINVAL directly, it's normal that
input data may contain garbage.
If packet sequence number jumps ahead, or we would underflow, use
codec-provided packet loss concealment to produce some audio data.
When we produce it during underflow, skip the corresponding number of
sequence numbers of future packets.
If codec doesn't have PLC, keep the previous behavior (pad with zeros,
buffering pauses to wait for data).
Change media-source to use sco-io for HFP codecs.
Replace sco-source with media-source.
sco-source is mostly copypaste from media-source, only differed in the
IO handling.
Use codecs via media_codec in sco-source instead of implementing the
decoding in-place.
Also slightly adjust media-source decode semantics.
In future, media-source could replace sco-source to reduce code
duplication.
When we simply need to change some state for the code executed in the
loop, we can use locked() instead of invoke(). This is more efficient
and avoids some context switches in the normal case.
Use kernel-provided packet reception timestamps to get less jitter in
packet timings. Mostly matters for ISO/SCO which have regular schedule.
A2DP (L2CAP) doesn't currently do RX timestamps in kernel, but we can as
well use the same mechanism for it.
As AG, set node.rate for output streams that originate from remote
source, so that graph switches rate as needed. This follows what
pipewire-pulse etc. do.
The patch is by Wim.
Encoders and some decoders have additional internal latency that needs
to be accounted for.
This mostly matters for AAC (~40ms), as the other BT codecs have much
lower delays (~5ms).
Interpolate buffer level to current playback position, and change its
definition so it directly corresponds to the total buffer latency. This
is also a bit simpler.
Now that BlueZ supports delay reporting in A2DP sink role, implement
that.
Report value that gives the total latency between packet reception and
audio rendering.
Also make Latency parameter in media-source to be not just a dummy
value.
Use TX timestamps to get accurate reading of queue length and latency on
kernel + controller side.
This is new kernel BT feature, so requires kernel with the necessary
patches, available currently only in bluetooth-next/master branch.
Enabling Poll Errqueue kernel experimental Bluetooth feature is also
required for this.
Use the latency information to mitigate controller issues where ISO
streams are desynchronized due to tx problems or spontaneously when some
packets that should have been sent are left sitting in the queue, and
transmission is off by a multiple of the ISO interval. This state is
visible in the latency information, so if we see streams in a group have
persistently different latencies, drop packets to resynchronize them.
Also make corrections if the kernel/controller queues get too long, so
that we don't have too big latency there.
Since BlueZ watches the same socket for errors, and TX timestamps arrive
via the socket error queue, we need to set BT_POLL_ERRQUEUE in addition
to SO_TIMESTAMPING so that BlueZ doesn't think TX timestamps are errors.
Link: https://github.com/bluez/bluez/issues/515
Link: https://lore.kernel.org/linux-bluetooth/cover.1710440392.git.pav@iki.fi/
Link: https://lore.kernel.org/linux-bluetooth/f57e065bb571d633f811610d273711c7047af335.1712499936.git.pav@iki.fi/
Buffer sizes smaller than one cycle are possible, so don't assert that.
Instead, just provide as much samples as fits to the buffer.
If we are driver when this happens, emit a warning (once). Similarly to
ALSA, as driver we produce only one buffer at cycle start, and no new
buffers in process. If the whole cycle doesn't fit into the buffer,
recording probably will be broken and we want some debug when there will
be a bug report about that.
The duplex polling issue was due to spa_loop_add_source failing
when source and sink were both using the same fd. We now dup, so the
issue no longer exists.
Remove the now unnecessary workaround, and check the return values from
spa_add_source.
Emit BAP device set nodes, which the session manager can use to combine
the sinks/sources of a device set to a single sink/source.
Emit the actual sinks/sources with media.class=.../Internal to hide them
from pipewire-pulse.
Add separate device set routes to the set leader device. Other routes
of the set members will be marked as unavailable when the set is active.
Accordingly, return failure for attempts to set these unavailable
routes, so that volumes etc. of the "internal" nodes are only controlled
via the device set route.
Drivers should only read the target_ values in the timeout, update the
timeout with the new duration and then update the position.
For the position we simply need to add the previous duration to the
position and then set the new duration + rate.
Otherwise, everything else should read the duration/rate and not use
the target_ values.
Allow asynchronous changes in transport state in the sinks/sources.
Also allow transport acquire to be actually synchronous, in this case it
must set transport state during acquire call.
Separate driver start/stop from transport start/stop.
Add some guards against doing processing when there has been an error or
the node is not started. Set error status to IO. Continue driving on IO
errors.
In media-sink, there's no need to set RCVBUF.
In media-source, we don't need to set NONBLOCK, as reads are done with
DONTWAIT. Don't set SNDBUF as it's not needed there. Don't set RCVBUF,
but use the (big) kernel default value: decode-buffer will handle any
overruns. Small values of RCVBUF might cause problems if kernel is
sending packets in a burst faster than we wake up.
On underflow in sources, pad with explicit silence. This avoids the
audioadapter from getting off sync from the cycle. That causes problems
as driver when we want to produce a buffer only a the start of the
cycle.
In some cases, it's also possible that the io already has buffer at the
start of the cycle when rate matching as driver. Currently, we don't
produce buffer in this case, but we should. Fix that by doing things in
the exact same way as ALSA sources do.
The maximum receive buffer target of 6 packets may be too small when
there's huge jitter in reception. Increase it so that we may use all
buffer available if needed (2*quantum_limit = 370 ms @ 44100).
For SCO, explicitly set maximum buffer to 40 ms, so that latency cannot
grow too large there. For A2DP duplex, set it to 80 ms for same reason.
These are close to the old 6*packet limit.
For BAP server audio sink, set buffering target so that we try to match
the target presentation delay. Also adjust requested node latency to be
smaller than the delay.
Also fix BAP transport presentation delay value parsing, and parse also
the other BAP transport properties. Of these, transport latency value
needs to be taken into account in the total sink latency.
When reading the timerfd gives an error, we should return right away
because the timeout did not happen.
If we change the timerfd timeout before reading it, we can get -EAGAIN.
Don't log an error in that case but wait for the new timeout.