Take active rate correction properly into account when dropping data on
overrun resync.
Drop data only for the currently processed stream, after data has been
consumed from it. Make sure the rate correction factor is updated after
this for the next cycle of the stream.
Also fix the buffer fill level calculation: the fill level interpolation
should use the node rate correction, not the clock rate diff, since the
calculations are done in the system clock domain. Fix the same issue in
the fractional delay calculation, and take the case of no resampler
prefill into account.
Later, we may need some more resampler APIs to avoid such details
leaking in.
Previously, a stream could have its old rate correction locked in, and
its fill level would then end up off the target on the next cycle.
Also, when we are capable of PLC, it's better to buffer audio at
startup, to get a buffer level close to the target initially.
Delay ISO overrun handling by one cycle after buffering is complete, so
that any resamplers are filled at that point.
When PLC data was produced due to underrun and the decode buffer has
reached the target level, drop received audio if we already did PLC for
that packet. It's better to lose some packets than to have to resync
latency.
When about to underrun while PLC is active, update rate matching before
filling up the buffer, so that the rate slows down and we do not get
stuck in a continuous underrun where PLC fills in data and we drop
received packets.
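In outline, the intended ordering (the names here are illustrative, not
the actual functions):

    /* About to underrun with PLC available: update rate matching
     * first, while the low fill level is still visible, then pad
     * the gap with concealed audio. */
    if (avail < needed && codec_has_plc) {
            update_rate_matching(buf);
            plc_fill(buf, needed - avail);
    }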
The rate matching filter assumes the buffer level for cycle j+1 is

    buffer(j+1) = buffer(j) + recv(j) - corr(j+1) * duration

but what we are actually doing is

    buffer(j+1) = buffer(j) + recv(j) - corr(j-1) * duration

because the computed correction factor is not used for the next cycle,
but for the one following that. Although the filter is still stable in
theory, the extra lag causes oscillations to be damped less.
Fix by using the computed correction factor for the next cycle, as
there's no reason why we'd want more lag in rate matching.
This changes corr(j-1) -> corr(j) in the assumptions, which turns out to
fix the situation. Fix the filter derivation to match. The filter
coefficients stay as they were, and they are actually exactly correct
also for short averaging times.
In practice, it is observed that ISO RX with quantum 4096 converges to a
stable rate, whereas previously the matching retained small
oscillations.
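The effect of the extra lag can be seen in a toy simulation (a simple
proportional controller is assumed here; this is not the actual
PipeWire filter):

    #include <stdio.h>

    int main(void)
    {
            const double duration = 1024.0, recv = 1024.0, target = 2048.0;
            const double k = 0.5 / duration;  /* assumed proportional gain */
            double buf1 = 3072.0, c1 = 1.0;                /* one cycle of lag */
            double buf2 = 3072.0, c2 = 1.0, c2_old = 1.0;  /* two cycles of lag */
            int j;

            for (j = 0; j < 24; j++) {
                    buf1 += recv - c1 * duration;      /* corr from previous cycle */
                    c1 = 1.0 + k * (buf1 - target);

                    buf2 += recv - c2_old * duration;  /* corr from two cycles back */
                    c2_old = c2;
                    c2 = 1.0 + k * (buf2 - target);

                    printf("j=%2d  lag1=%7.1f  lag2=%7.1f\n", j, buf1, buf2);
            }
            return 0;
    }

The one-cycle variant converges to the target monotonically; the
two-cycle variant rings around it before settling, matching the
behavior described above.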
As BAP server, when we can't satisfy the BAP presentation delay, just
accept a bigger latency and emit a warning, instead of producing broken
audio.
Also make sure it works if the quantum is forced to a larger value than
our wanted node.latency.
Take other latencies into account when selecting the wanted node.latency
for BAP Server.
Fix up port.params usage vs. the user flag, which is not used here.
Reset decode buffer rate matching if we need to do PLC due to underrun,
so it doesn't get stuck playing back at an increased rate if the target
is too small.
Flush the errqueue for iso-io also from the media-source handler, to
avoid dropping packet TX reports.
This applies to bidirectional streams. The sink/source handlers poll on
different fds, one being dup'd, and epoll does not trigger them in any
specific interleaved order.
Align RX of streams in the same ISO group:
- Ensure all streams in the ISO group have the same target latency, also
  for BAP Client
- Determine rate matching to the ISO group clock from the RX times of
  all streams in the group
- Based on this, compute nominal packet RX times, and feed them to
  decode-buffer instead of the real RX times. This is enough for
  sub-sample level sync.
- Customise buffer overrun handling for ISO so that it drops data to
  arrive exactly at the target, for faster convergence at RX start
The ISO clock matching is done based on kernel-provided packet RX times,
so it has an unknown offset from the actual ISO clock, probably a few ms.
Current kernels (6.17) do not provide anything better to use for the
clock matching, and doing it properly appears to be controller
vendor-defined (if possible at all).
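The idea, in a simplified sketch (the struct and function are
illustrative, not the actual implementation):

    #include <stdint.h>

    struct group_clock {
            uint64_t base;      /* estimated RX time of seqnum 0, ns */
            uint64_t interval;  /* ISO interval, ns */
            double avg_err;     /* smoothed rx - nominal error, ns */
    };

    /* Feed the kernel RX time of each packet from any stream in the
     * group; return the nominal RX time to hand to decode-buffer. */
    static uint64_t nominal_rx(struct group_clock *gc,
                               uint32_t seqnum, uint64_t rx_time)
    {
            uint64_t nominal = gc->base + (uint64_t)seqnum * gc->interval;
            double err = (double)(int64_t)(rx_time - nominal);

            /* slave the group clock slowly to the observed RX times */
            gc->avg_err += 0.01 * (err - gc->avg_err);
            gc->base += (int64_t)(0.01 * gc->avg_err);

            return nominal;
    }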
Take resampler delay into account when computing the buffer fill level,
including the fractional part.
When decode-buffer is fed nominal packet reference times in
write_packet(), it now converges the total buffer + resampler latency to
the target at sub-sample accuracy.
This is needed for aligning RX of ISO streams in the same group, so that
e.g. stereo pair alignment is achieved even though the streams have
separate resamplers. Resampler phases get aligned via independent rate
matching.
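A minimal sketch of the fill level computation, with hypothetical
accessors standing in for the real resampler API:

    /* Total latency in samples: buffered data plus the resampler
     * delay, including its fractional phase. */
    double fill_level = (double)buffered_samples
            + resampler_delay_samples(res)    /* integer part */
            + resampler_phase_fraction(res);  /* 0..1 of a sample */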
The rate matching calculations are done in the system clock domain. If
the driver ticks at a different rate, the correction factor needs to be
adjusted by the rate_diff.
setup_matching() also needs to happen before spa_bt_decode_buffer_process():
as follower, we should use the rate matching value calculated on the
*previous* cycle, because that is what the driver uses when it adjusts
its tick rate.
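In outline (stub names, not the real code; the direction of the
rate_diff adjustment is an assumption here):

    struct stream { double prev_corr, clock_rate_diff; };

    static void setup_matching(struct stream *s, double corr);
    static void consume_cycle(struct stream *s);
    static double compute_corr(struct stream *s);

    static void process_follower(struct stream *s)
    {
            /* use the correction computed on the previous cycle,
             * adjusted into the driver clock domain */
            double corr = s->prev_corr * s->clock_rate_diff;

            setup_matching(s, corr);        /* before consuming data */
            consume_cycle(s);
            s->prev_corr = compute_corr(s); /* for the next cycle */
    }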
On production systems, having a constant high latency is favored over
dynamically adjusting it in order to optimize for low latency,
because every time a dynamic adjustment happens, there's a glitch.
This adds an option to let the user specify the exact amount of latency
they want.
The hardcoded latency of 512/<rate> is quite low on some ALSA devices.
Instead of forcing that latency onto the graph, just don't set it at all
unless it originates from the BAP presentation delay. That means that
the functionality remains the same for BAP but changes for A2DP to favor
the preferred quantum of the ALSA sink (or whatever the driver is).
Also, avoid setting an empty string ("") latency and rate in the cases
where they are not defined. This allows users to override those
properties through the WirePlumber monitor rules if they need to.
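For example, a WirePlumber monitor rule of roughly this shape (the exact
file layout depends on the WirePlumber version) can then pin the
properties:

    monitor.bluez.rules = [
      {
        matches = [ { node.name = "~bluez_output.*" } ]
        actions = { update-props = { node.latency = "2048/48000" } }
      }
    ]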
Use the kernel BT_PKT_SEQNUM feature (likely in Linux v6.17) to provide
the ISO packet sequence numbers. Fall back to counting packets if the
kernel is too old to support the feature.
Pass zero-length packets to the codec. BAP/ISO may use these to indicate
missing data.
Fix A2DP codecs to not validate input with spa_return_val_if_fail(),
which is meant for assertions. Just return -EINVAL directly; it's normal
for input data to contain garbage.
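Illustratively (not a verbatim diff):

    /* before: malformed input trips an assertion-style macro */
    spa_return_val_if_fail(buf_size > sizeof(struct rtp_header), -EINVAL);

    /* after: malformed input is a normal runtime condition */
    if (buf_size <= sizeof(struct rtp_header))
            return -EINVAL;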
If the packet sequence number jumps ahead, or we would underflow, use
codec-provided packet loss concealment (PLC) to produce some audio data.
When we produce it during underflow, skip the corresponding number of
sequence numbers of future packets.
If the codec doesn't have PLC, keep the previous behavior (pad with
zeros; buffering pauses to wait for data).
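A sketch of the decision (the PLC hook and helpers are hypothetical
names):

    /* seqnum jumped ahead: conceal the missing packets if the codec
     * can, otherwise keep the old zero-padding behavior */
    uint32_t missing = seqnum - expected_seqnum;
    uint32_t i;

    if (codec_has_plc) {
            for (i = 0; i < missing; i++)
                    write_plc_audio(buf, packet_samples);
    } else {
            write_silence(buf, (size_t)missing * packet_samples);
    }

    /* PLC produced during underflow must additionally skip the same
     * number of future sequence numbers when they do arrive */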
Change media-source to use sco-io for HFP codecs.
Replace sco-source with media-source.
sco-source was mostly copy-paste from media-source and differed only in
the IO handling.
Use codecs via media_codec in sco-source instead of implementing the
decoding in-place.
Also slightly adjust media-source decode semantics.
In the future, media-source could replace sco-source to reduce code
duplication.
When we simply need to change some state for the code executed in the
loop, we can use locked() instead of invoke(). This is more efficient
and avoids some context switches in the normal case.
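For example (assuming the locked() variant takes the same arguments as
invoke() minus the block flag):

    /* before: round-trip invoke into the data loop */
    spa_loop_invoke(this->data_loop, do_set_state, 0,
                    &state, sizeof(state), true, this);

    /* after: just hold the loop lock while changing the state */
    spa_loop_locked(this->data_loop, do_set_state, 0,
                    &state, sizeof(state), this);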
Use kernel-provided packet reception timestamps to get less jitter in
packet timings. This mostly matters for ISO/SCO, which have a regular
schedule. A2DP (L2CAP) doesn't currently do RX timestamps in the kernel,
but we can just as well use the same mechanism for it.
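The standard Linux timestamping API covers this; roughly:

    /* enable software RX timestamps (linux/net_tstamp.h) */
    int flags = SOF_TIMESTAMPING_RX_SOFTWARE | SOF_TIMESTAMPING_SOFTWARE;
    setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags));

    /* after recvmsg(), the timestamp arrives as ancillary data */
    uint64_t rx_time = 0;
    struct cmsghdr *cm;
    for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
            if (cm->cmsg_level == SOL_SOCKET &&
                cm->cmsg_type == SCM_TIMESTAMPING) {
                    struct scm_timestamping *ts = (void *)CMSG_DATA(cm);
                    rx_time = SPA_TIMESPEC_TO_NSEC(&ts->ts[0]);
            }
    }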
As AG, set node.rate for output streams that originate from the remote
source, so that the graph switches rate as needed. This follows what
pipewire-pulse etc. do.
The patch is by Wim.
Encoders and some decoders have additional internal latency that needs
to be accounted for.
This mostly matters for AAC (~40ms), as the other BT codecs have much
lower delays (~5ms).
Interpolate the buffer level to the current playback position, and
change its definition so that it directly corresponds to the total
buffer latency. This is also a bit simpler.
Now that BlueZ supports delay reporting in the A2DP sink role, implement
that.
Report a value that gives the total latency between packet reception and
audio rendering.
Also make the Latency parameter in media-source not just a dummy value.
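Illustratively (AVDTP delay reports are in units of 0.1 ms; the names
here are ad hoc):

    /* total delay between packet RX and rendering, in 0.1 ms units */
    uint16_t delay = (uint16_t)((buffered_samples + resampler_delay)
                                * 10000 / sample_rate
                                + graph_delay_tenths);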
Use TX timestamps to get an accurate reading of the queue length and
latency on the kernel + controller side.
This is a new kernel BT feature, so it requires a kernel with the
necessary patches, currently available only in the bluetooth-next/master
branch. Enabling the Poll Errqueue experimental kernel Bluetooth feature
is also required for this.
Use the latency information to mitigate controller issues where ISO
streams are desynchronized due to TX problems, or spontaneously, when
some packets that should have been sent are left sitting in the queue
and transmission is off by a multiple of the ISO interval. This state is
visible in the latency information, so if we see that streams in a group
have persistently different latencies, drop packets to resynchronize
them.
Also make corrections if the kernel/controller queues get too long, so
that we don't accumulate too much latency there.
Since BlueZ watches the same socket for errors, and TX timestamps arrive
via the socket error queue, we need to set BT_POLL_ERRQUEUE in addition
to SO_TIMESTAMPING, so that BlueZ doesn't mistake TX timestamps for
errors.
Link: https://github.com/bluez/bluez/issues/515
Link: https://lore.kernel.org/linux-bluetooth/cover.1710440392.git.pav@iki.fi/
Link: https://lore.kernel.org/linux-bluetooth/f57e065bb571d633f811610d273711c7047af335.1712499936.git.pav@iki.fi/
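In outline (the SOL_BLUETOOTH level for BT_POLL_ERRQUEUE is an
assumption based on the patches linked above):

    /* request software TX timestamps, with IDs for matching the
     * reports to packets (linux/net_tstamp.h) */
    int flags = SOF_TIMESTAMPING_TX_SOFTWARE | SOF_TIMESTAMPING_SOFTWARE
              | SOF_TIMESTAMPING_OPT_ID | SOF_TIMESTAMPING_OPT_TSONLY;
    setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags));

    /* keep BlueZ from treating the timestamps as socket errors */
    int on = 1;
    setsockopt(fd, SOL_BLUETOOTH, BT_POLL_ERRQUEUE, &on, sizeof(on));

    /* completions are then read with recvmsg(fd, &msg, MSG_ERRQUEUE) */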
Buffer sizes smaller than one cycle are possible, so don't assert that.
Instead, just provide as many samples as fit in the buffer.
If we are the driver when this happens, emit a warning (once). Similarly
to ALSA, as driver we produce only one buffer at cycle start, and no new
buffers in process. If the whole cycle doesn't fit into the buffer,
recording will probably be broken, and we want some debug output when a
bug report about that comes in.
The duplex polling issue was due to spa_loop_add_source() failing when
the source and sink were both using the same fd. We now dup() the fd, so
the issue no longer exists.
Remove the now-unnecessary workaround, and check the return values from
spa_loop_add_source().
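In outline:

    /* give the second loop source its own fd */
    port->fd = dup(transport->fd);

    res = spa_loop_add_source(this->data_loop, &port->source);
    if (res < 0)
            return res;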
Emit BAP device set nodes, which the session manager can use to combine
the sinks/sources of a device set into a single sink/source.
Emit the actual sinks/sources with media.class=.../Internal to hide them
from pipewire-pulse.
Add separate device set routes to the set leader device. The other
routes of the set members will be marked as unavailable when the set is
active. Accordingly, return failure for attempts to set these
unavailable routes, so that volumes etc. of the "internal" nodes are
only controlled via the device set route.
Drivers should only read the target_ values in the timeout, update the
timeout with the new duration, and then update the position.
For the position, we simply need to add the previous duration to the
position and then set the new duration + rate.
Everything else should read the duration/rate and not use the target_
values.
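For a driver's timeout handler, that amounts to roughly (fields as in
spa_io_clock):

    /* read target_* once, advance position by the previous
     * duration, then publish the new duration and rate */
    uint64_t duration = clock->target_duration;
    struct spa_fraction rate = clock->target_rate;

    clock->position += clock->duration;
    clock->duration = duration;
    clock->rate = rate;

    next_time += duration * SPA_NSEC_PER_SEC / rate.denom;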
Allow asynchronous changes of transport state in the sinks/sources.
Also allow transport acquire to be actually synchronous; in that case it
must set the transport state during the acquire call.
Separate driver start/stop from transport start/stop.
Add some guards against doing processing when there has been an error or
the node is not started. Set the error status on the IO area. Continue
driving on IO errors.
In media-sink, there's no need to set RCVBUF.
In media-source, we don't need to set NONBLOCK, as reads are done with
DONTWAIT. Don't set SNDBUF, as it's not needed there. Don't set RCVBUF
either, but use the (big) kernel default value: decode-buffer will
handle any overruns. Small values of RCVBUF might cause problems if the
kernel sends packets in a burst faster than we wake up.
On underflow in sources, pad with explicit silence. This avoids the
audioadapter getting out of sync with the cycle, which causes problems
as driver when we want to produce a buffer only at the start of the
cycle.
In some cases, it's also possible that the io already has a buffer at
the start of the cycle when rate matching as driver. Currently, we don't
produce a buffer in this case, but we should. Fix that by doing things
in exactly the same way as the ALSA sources do.
The maximum receive buffer target of 6 packets may be too small when
there's huge jitter in reception. Increase it so that we may use all the
available buffer if needed (2*quantum_limit = 370 ms @ 44100).
For SCO, explicitly set the maximum buffer to 40 ms, so that the latency
cannot grow too large there. For A2DP duplex, set it to 80 ms for the
same reason. These are close to the old 6-packet limit.