Add a filter-graph info structure with the number of inputs and outputs
in the graph definition.
Use the inputs/outputs to update the number of channels on the capture and
playback streams when not explicitly given. Also copy over the positions
when they match the other stream and were not explicitly specified.
Fixes #4404
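A minimal sketch of the fixup described above. All names here
(graph_info, stream_format, fixup_formats) are hypothetical and not the
actual module structures, and "match" is read as "the channel counts are
equal", which is an assumption:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_CHANNELS 64

/* Hypothetical stand-in for the filter-graph info structure. */
struct graph_info {
	uint32_t n_inputs;   /* inputs in the graph definition */
	uint32_t n_outputs;  /* outputs in the graph definition */
};

/* Hypothetical stream format; 0 means "not explicitly given". */
struct stream_format {
	uint32_t channels;
	uint32_t n_positions;
	uint32_t position[MAX_CHANNELS];
};

static void copy_positions(struct stream_format *dst,
		const struct stream_format *src)
{
	assert(src->n_positions <= MAX_CHANNELS);
	memcpy(dst->position, src->position,
			src->n_positions * sizeof(src->position[0]));
	dst->n_positions = src->n_positions;
}

static void fixup_formats(const struct graph_info *info,
		struct stream_format *capture, struct stream_format *playback)
{
	/* only fill in channel counts that were not set explicitly */
	if (capture->channels == 0)
		capture->channels = info->n_inputs;
	if (playback->channels == 0)
		playback->channels = info->n_outputs;

	/* positions are only copied when both sides have the same layout size */
	if (capture->channels != playback->channels)
		return;

	if (capture->n_positions == 0 &&
	    playback->n_positions == playback->channels)
		copy_positions(capture, playback);
	else if (playback->n_positions == 0 &&
	    capture->n_positions == capture->channels)
		copy_positions(playback, capture);
}
```

Explicitly configured values are never overwritten in this sketch; only
fields left at 0 are filled in.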
Add the overflow queues again. We can easily iterate atomically over the
overflow queues and flush them.
Overflowing a queue is quite common when heavy swapping is done and
should never cause a lockup, so allocate new queues as we need them. We
can share the eventfd with the main queue to avoid wasting fds.
The limit on the number of queues is then only for when concurrent
threads want to invoke things, so 128 is plenty.
When a queue overflows we place the queue back in the stack and try
again. Because it's at the top of the stack we take exactly the same
queue and keep on looping forever if the other thread is blocked for
some reason.
Instead, mark the queue as overflowed and only place it back in the
stack when we have flushed it.
This avoids a deadlock when the main-thread invokes on the data loop
and blocks and when the data loop invokes on the main-thread and
overflows the queue.
When we don't support EXPBUF according to the probe, the host will
allocate memory for us. We will then try to use USERPTR to import the
memory into v4l2.
If this is not supported, try to fall back to MMAP support, mmap the
buffers and memcpy the MMAP buffer to the host allocated buffers.
Instead of just probing with 2 buffers, probe with the MAX_BUFFERS and
remember how many buffers we received.
Some drivers (v4l2loopback) work with fewer than MAX_BUFFERS (8) and we
need to set the max number of buffers in the Buffers param.
Clean up route/profile building a bit so that it is easier to add new
device profiles.
Use names instead of magic numbers for the routes.
Fix the BAP set input route being marked unavailable by mistake because
of an off-by-one magic number.
Don't use TSS to store per-thread queues but keep a lockfree stack of
queues. We can then pick off a queue and write to that one and place it
back after use.
We need to keep the queues indexed by id in the stack because otherwise
we would need to compare-and-swap 128 bits (pointer + tag), which is
more problematic.
Because we keep the queues in an array and no queue is ever removed and
the array can only grow, we can quite easily just iterate the array
without a lock. Without the lock we also fix one of the potential
problems with ardour where the queue_flush thread is canceled while
flushing and the queue_mutex remains locked.
Because we end up with all queues in the array now, we can overflow the
fixed max amount of queues we can manage. When that happens, sleep for a
while and try again. This is a case where more than QUEUES_MAX (128) threads
are invoking at the same time and is rather unlikely.
There is also the queue overflow case which we now also must handle with
a retry. This potentially uses more eventfds but again this should be
unlikely and cause no further problems.
See #4356
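To illustrate why indexing the queues by id helps: with a 32-bit id and
a 32-bit tag, the stack head fits in one 64-bit word and a normal
compare-and-swap is enough. This is only a sketch of the idea using C11
atomics; the names (queue_stack, stack_push, stack_pop, ID_NONE) are
hypothetical and not the loop implementation:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define QUEUES_MAX 128u
#define ID_NONE    UINT32_MAX   /* marks an empty stack */

struct queue_stack {
	_Atomic uint64_t head;              /* (tag << 32) | id of the top queue */
	_Atomic uint32_t next[QUEUES_MAX];  /* next[id] is the id below it */
};

static inline uint64_t pack(uint32_t tag, uint32_t id)
{
	return ((uint64_t)tag << 32) | id;
}

static void stack_init(struct queue_stack *s)
{
	uint32_t i;
	atomic_init(&s->head, pack(0, ID_NONE));
	for (i = 0; i < QUEUES_MAX; i++)
		atomic_init(&s->next[i], ID_NONE);
}

/* Place queue `id` on top of the stack. */
static void stack_push(struct queue_stack *s, uint32_t id)
{
	uint64_t old = atomic_load(&s->head), upd;
	assert(id < QUEUES_MAX);
	do {
		/* link to the current top, bump the tag to defeat ABA */
		atomic_store(&s->next[id], (uint32_t)old);
		upd = pack((uint32_t)(old >> 32) + 1, id);
	} while (!atomic_compare_exchange_weak(&s->head, &old, upd));
}

/* Take the top queue id, or ID_NONE when the stack is empty. */
static uint32_t stack_pop(struct queue_stack *s)
{
	uint64_t old = atomic_load(&s->head), upd;
	uint32_t id;
	do {
		id = (uint32_t)old;
		if (id == ID_NONE)
			return ID_NONE;
		upd = pack((uint32_t)(old >> 32) + 1,
				atomic_load(&s->next[id]));
	} while (!atomic_compare_exchange_weak(&s->head, &old, upd));
	return id;
}
```

A caller that gets ID_NONE back would either allocate a new queue or,
once QUEUES_MAX is reached, sleep and retry as described above.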
Iterate the channels in the inner loop instead of the outer loop. This
makes it handle 0 channels better but also does the more
complicated phase increment code only once for all channels. Also the
filters might stay in the cache for each channel now.
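The loop order can be sketched as below. This uses a trivial
nearest-sample picker as a stand-in for the real polyphase filter, so
the function name and the arithmetic are illustrative assumptions only;
the point is that the phase bookkeeping runs once per output frame:

```c
#include <assert.h>
#include <stdint.h>

/* Returns the number of output frames produced. */
static uint32_t resample_nearest(const float **src, float **dst,
		uint32_t n_channels, uint32_t in_len, uint32_t out_len,
		uint32_t in_rate, uint32_t out_rate)
{
	uint32_t index = 0, phase = 0, o, c;
	uint32_t inc, frac;

	assert(out_rate > 0);
	inc = in_rate / out_rate;   /* whole input frames per output frame */
	frac = in_rate % out_rate;  /* fractional part, in units of 1/out_rate */

	for (o = 0; o < out_len && index < in_len; o++) {
		/* inner loop over channels: does nothing with 0 channels */
		for (c = 0; c < n_channels; c++)
			dst[c][o] = src[c][index];

		/* phase increment, done once for all channels */
		index += inc;
		phase += frac;
		if (phase >= out_rate) {
			phase -= out_rate;
			index++;
		}
	}
	return o;
}
```

With 0 channels the function still advances the phase and reports how
many frames it would have produced, which is the behaviour the reordered
loop makes easy to get right.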
This adds support for BAP Broadcast transport links.
Unlike unicast, broadcast links are used by a BAP Broadcast Sink
device to link together multiple transports in the same BIG that
the user wants to start receiving audio from. Each transport is
associated with a different BIS, so each one has a different fd.
Thus, each link needs to be acquired and released separately.
Add some padding to the delay buffer. If we wrap around, copy the
spilled samples to the front of the buffer. This makes it possible to
use the more optimized sse delay function in more cases.
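One possible reading of the padding scheme, as a sketch. The struct and
function names are hypothetical, and it assumes a block of at most
`pad` samples is written at a time and that `pad` is not larger than
`size`:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct delay_buf {
	float *data;    /* room for size + pad samples */
	uint32_t size;  /* ring size in samples */
	uint32_t pad;   /* extra samples past the end of the ring */
	uint32_t pos;   /* next write position, always < size */
};

/* Write n samples linearly; fold anything that spilled into the
 * padding back to the front of the ring. */
static void delay_write(struct delay_buf *b, const float *src, uint32_t n)
{
	assert(b->pad <= b->size);
	assert(n <= b->pad && b->pos < b->size);

	/* no wrap check needed: the padding absorbs the overrun */
	memcpy(&b->data[b->pos], src, n * sizeof(float));

	if (b->pos + n > b->size) {
		uint32_t spill = b->pos + n - b->size;
		memcpy(b->data, &b->data[b->size], spill * sizeof(float));
		b->pos = spill;
	} else {
		b->pos = (b->pos + n) % b->size;
	}
}
```

Because the write itself is always one contiguous block, a vectorized
routine can process it without a per-sample wrap test; only the small
spill copy handles the wrap.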
In our current world, it is possible to have a negative delay. This
means that the stream should be delayed to sync with other streams.
The pulse-server sets negative delay and the Latency message can hold
those negative values so make sure we handle them in the helper
functions as well.
Do the delay calculations in pw_stream and JACK with signed values to
correctly handle negative values. Clamp JACK latency range to 0 because
negative latency is not supported in JACK.
We should also probably make sure we never end up with negative
latency, mostly in ALSA when we set a Latency offset, but that is
another detail.
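A small sketch of the signed handling. The helper names are
hypothetical; the only assumption about JACK is that its frame counts
are unsigned 32-bit values, so a negative result has to be clamped:

```c
#include <assert.h>
#include <stdint.h>

/* Convert a signed delay in nanoseconds to frames, keeping the sign.
 * Assumes ns * rate fits in 64 bits, which holds for realistic delays. */
static int64_t delay_ns_to_frames(int64_t ns, uint32_t rate)
{
	return ns * (int64_t)rate / 1000000000LL;
}

/* Clamp a signed frame count into an unsigned 32-bit latency value. */
static uint32_t clamp_latency_frames(int64_t frames)
{
	if (frames < 0)
		return 0;
	if (frames > (int64_t)UINT32_MAX)
		return UINT32_MAX;
	return (uint32_t)frames;
}
```

Doing the first step with unsigned types would turn a small negative
delay into a huge positive one, which is the failure mode the signed
calculation avoids.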
The loop in the TSS gets an extra refcount and is unreffed when the TSS
destroy is called.
We can then also ref the queue during the function callback. When the
queue (thread) was destroyed during the callback, ignore the result and
continue with the next queues.
See #4356
When we clear we need to have all our queues removed from the TSS when
we delete the tss key or else they are leaked. Check and warn about this
using a refcount of the queues in the TSS.
See #4356
Make it possible to call loop_queue_destroy() from both the TSS destroy
and impl_clear() without races. We make sure that only one can remove
the queue from the queue list and clean up. We also store the IN_TSS flag
in the flags so that we can see it before the queue is added to the
queue list. Only free the IN_TSS queue when the TSS destroy is called.
See #4356
As in spa_dbus, don't exit if the bus goes down. Neither the audio nor
the midi Bluetooth backend reconnects to DBus, but they shouldn't
exit the process.
Use a wrap around delay ringbuffer. We can then avoid some modulo
arithmetic and read more efficiently.
Also handle the delay convolver case better by reversing the taps and
reading the taps and delay buffer without extra overhead.
Add a new followerClock block in the profiler info. This is only set
when the follower could be a driver and it contains the clock info used
for following the driver, mostly the rate difference and delay.
Dump this info in pw-profiler -J
Make sure we always set the info in the clock, especially also when we
are following.
When the follower doesn't produce enough data for this many attempts,
bail and cause an xrun to avoid an infinite loop.
The limit of 8 causes real-life problems and should be larger. It should
probably depend on the expected size per cycle (node.latency) and the
current quantum but we don't always have this information.
See #4334
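The bounded retry can be sketched like this. The names are hypothetical
and the attempt limit is left as a parameter because the new value is
not stated here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback: asks the follower for data and returns the
 * number of frames it produced in this attempt. */
typedef uint32_t (*produce_func_t)(void *data);

/* Example producer for illustration: yields a fixed number of frames. */
static uint32_t fixed_producer(void *data)
{
	return *(const uint32_t *)data;
}

/* Returns true when `needed` frames were collected, false when the
 * caller should give up and report an xrun. */
static bool collect_frames(produce_func_t produce, void *data,
		uint32_t needed, uint32_t max_attempts)
{
	uint32_t have = 0, i;

	for (i = 0; i < max_attempts && have < needed; i++)
		have += produce(data);

	return have >= needed;
}
```

A follower that never produces anything makes the loop stop after
max_attempts iterations instead of spinning forever.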
Keep a running average and variance of the error. Use this to
periodically update the DLL bandwidth. When the variance gets smaller,
we update the DLL more slowly to stay closer to the ideal rate.
This seems to improve the rate stability.
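A sketch of tracking the running statistics. The exponentially
weighted update below is a standard technique chosen for illustration,
and the mapping from variance to bandwidth (pick_bandwidth, var_ref) is
purely hypothetical; the actual formula is not given here:

```c
#include <assert.h>

struct err_stats {
	double avg;  /* running average of the error */
	double var;  /* running variance of the error */
};

/* Exponentially weighted mean/variance update, alpha in (0, 1]. */
static void err_stats_update(struct err_stats *s, double err, double alpha)
{
	double d = err - s->avg;

	assert(alpha > 0.0 && alpha <= 1.0);
	s->avg += alpha * d;
	s->var = (1.0 - alpha) * (s->var + alpha * d * d);
}

/* Hypothetical mapping: the bandwidth shrinks towards bw_min as the
 * variance drops, so the DLL reacts more slowly once it is stable. */
static double pick_bandwidth(const struct err_stats *s,
		double bw_min, double bw_max, double var_ref)
{
	double bw;

	assert(var_ref > 0.0);
	bw = bw_max * s->var / (s->var + var_ref);
	return bw < bw_min ? bw_min : bw;
}
```

The update would run once per cycle and pick_bandwidth only
periodically, so the DLL coefficients are not recomputed on every
sample.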
We don't actually need the extra allocation for the tss. We can just
mark the queue as being in the tss. When a queue is destroyed, mark it
as destroyed but when it is still in the tss, don't free the structure
yet. We free the structure when we destroy the tss.
We can also free the overflow queues of a queue immediately when it is
destroyed.
The thread that calls the impl_clear method might be the main thread and
is certainly not going to call the invoke function anymore so free the
tss if there is any.
Fixes a leak in the unit test.
The properties of the card might overwrite those of the PCM.
For example, the card's `alsa.id` will be set on the PCM too
since 37a51533e0 ("acp: add more properties for the card").
To avoid that, call `pa_alsa_init_proplist_card()` first
in `pa_alsa_init_proplist_pcm_info()` instead of last.
See #4135