Add a _fast callback function that skips the version and method check.
We can use this in performance-critical places where we do the check
outside of the critical loops.
Make all system methods _fast calls. We expect them to exist and have
the right version. If we add new versions we can make them slow.
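A minimal sketch of the idea behind a _fast call (illustrative names,
not the actual SPA macros):

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative method table; not the actual SPA definitions. */
struct my_methods {
	uint32_t version;
	int (*process)(void *object);
};

/* Checked call: verify version and method pointer before invoking. */
static inline int call_checked(const struct my_methods *m, void *object)
{
	if (m == NULL || m->version < 1 || m->process == NULL)
		return -ENOTSUP;
	return m->process(object);
}

/* _fast call: the caller guarantees the method exists and has the
 * right version, so the checks are skipped in the hot path. */
static inline int call_fast(const struct my_methods *m, void *object)
{
	return m->process(object);
}
```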
Add a check for running in the loop context and thread.
Add checks in filter and stream to avoid doing things when not run from
the context main-loop, because doing IPC from concurrent threads can
crash things.
On glibc, `pthread_t` is `unsigned long int` while on musl
it has a pointer type. To avoid format string warnings,
cast it to `void *` and use the `%p` format specifier.
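For example, a small sketch of the portable cast:

```c
#include <pthread.h>
#include <stdio.h>

/* Sketch: log a pthread_t portably by casting it to void * and
 * printing it with %p, avoiding -Wformat warnings on both glibc
 * (unsigned long) and musl (pointer type). */
static void log_thread_id(const char *msg, pthread_t t)
{
	fprintf(stderr, "%s: thread %p\n", msg, (void *) t);
}
```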
We can only write from one thread to the ringbuffer so bypass the
ringbuffer when doing in-thread invoke. Only flush the current
items so that out-of-thread items don't get inserted.
Always append the item to the ringbuffer, even if we are invoking from
the thread itself. This ensures that all items are always invoked in
the right order.
If we invoke from the thread, flush all items of the ringbuffer and
return.
Make sure to set the callback to NULL before invoking so that recursive
invoke doesn't call it again.
When we get a recursive invoke while flushing the items, detect this
with a counter and return immediately.
Mostly useful for when invoking from the thread itself so that the new
invoke item is executed before new items are added.
Imagine this case with module-loopback:
- data-loop goes into the capture process function
- mainloop invokes node remove of capture and waits
- data-loop invokes trigger -> node remove is first executed, mainloop
  is woken up
- mainloop continues
- mainloop invokes remove of playback and waits
- data-loop continues flushing the ringbuffer -> playback remove is
  executed, mainloop wakes up
- mainloop continues destroying items, frees playback
  and capture streams
- data-loop finally gets to flush the trigger and crashes because the
  streams are gone.
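Always queueing the item and flushing with a recursion guard avoids
this. A rough sketch of that behaviour (illustrative structure and
names, not the actual loop code):

```c
#include <stddef.h>
#include <string.h>

/* Illustrative invoke machinery; not the actual SPA loop code. */
struct invoke_item {
	int (*func)(void *data);
	void *data;
};

struct loop {
	struct invoke_item items[64];	/* stands in for the ringbuffer; no overflow handling */
	int n_items;
	int flushing;			/* recursion counter */
	int in_thread;			/* nonzero when called from the loop thread */
};

static void flush_items(struct loop *l)
{
	int i, count;

	/* a recursive invoke while flushing just returns immediately */
	if (l->flushing > 0)
		return;
	l->flushing++;

	/* only flush what is queued now; anything appended during the
	 * flush is handled by the next flush */
	count = l->n_items;
	for (i = 0; i < count; i++) {
		int (*func)(void *) = l->items[i].func;
		/* clear the callback before invoking so a recursive
		 * invoke cannot call it again */
		l->items[i].func = NULL;
		if (func != NULL)
			func(l->items[i].data);
	}
	/* drop the flushed items, keep what was queued meanwhile */
	memmove(l->items, l->items + count,
		(size_t)(l->n_items - count) * sizeof(struct invoke_item));
	l->n_items -= count;
	l->flushing--;
}

static void invoke(struct loop *l, int (*func)(void *), void *data)
{
	/* always append, even for in-thread invokes, so that all items
	 * are executed in the order they were submitted */
	l->items[l->n_items++] = (struct invoke_item){ func, data };

	/* when invoked from the loop thread itself, flush right away so
	 * the new item runs before more items are added */
	if (l->in_thread)
		flush_items(l);
}
```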
When we try to read one of the events and there was an error, don't
signal the callback. If the error is something other than EAGAIN, log
a warning.
Especially for timerfd, EAGAIN can happen when the timer changed
while polling. This can happen when running the profiler because it
polls and updates the timer from different threads.
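A sketch of the read path (hypothetical handler, not the actual loop
code):

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Sketch: read the expiration count from a timerfd and only signal the
 * callback when the read succeeded. EAGAIN is expected when the timer
 * was rearmed from another thread while we were polling, so it is
 * ignored silently; other errors get a warning. */
static void on_timer_ready(int fd,
		void (*callback)(uint64_t expirations, void *data), void *data)
{
	uint64_t expirations;
	ssize_t res = read(fd, &expirations, sizeof(expirations));

	if (res != (ssize_t) sizeof(expirations)) {
		if (errno != EAGAIN)
			fprintf(stderr, "timerfd read error: %s\n", strerror(errno));
		return;
	}
	callback(expirations, data);
}
```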
Register a pthread cleanup handler to guarantee
that `spa_source::{priv, rmask}` are cleared even
if the thread is cancelled while the loop is dispatching.
This is necessary, otherwise `spa_source::priv` could point
to the stack of the cancelled thread, which will lead to
problems like this later:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f846b025be2 in detach_source (source=0x7f845f435f60) at ../spa/plugins/support/loop.c:144
144 e->data = NULL;
[Current thread is 1 (LWP 5274)]
(gdb) p e
$1 = (struct spa_poll_event *) 0x7f845e297820
(gdb) bt
#0 0x00007f846b025be2 in detach_source (source=0x7f845f435f60) at ../spa/plugins/support/loop.c:144
#1 0x00007f846b0276ad in free_source (s=0x7f845f435f60) at ../spa/plugins/support/loop.c:359
#2 0x00007f846b02a453 in loop_destroy_source (object=0x7f845f3af478, source=0x7f845f435f60) at ../spa/plugins/support/loop.c:786
#3 0x00007f846b02a886 in impl_clear (handle=0x7f845f3af478) at ../spa/plugins/support/loop.c:859
#4 0x00007f846b172f40 in unref_handle (handle=0x7f845f3af450) at ../src/pipewire/pipewire.c:211
#5 0x00007f846b173579 in pw_unload_spa_handle (handle=0x7f845f3af478) at ../src/pipewire/pipewire.c:346
#6 0x00007f846b15a761 in pw_loop_destroy (loop=0x7f845f434e30) at ../src/pipewire/loop.c:159
#7 0x00007f846b135d8e in pw_data_loop_destroy (loop=0x7f845f434cb0) at ../src/pipewire/data-loop.c:166
#8 0x00007f846b12c31c in pw_context_destroy (context=0x7f845f41c690) at ../src/pipewire/context.c:485
#9 0x00007f846b3ddf9e in jack_client_close (client=0x7f845f3c1030) at ../pipewire-jack/src/pipewire-jack.c:3481
...
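The cleanup handler pattern looks roughly like this (a sketch with
hypothetical names, not the actual spa_source layout or loop code):

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical source state cleared by the handler. */
struct example_source {
	void *priv;
	unsigned int rmask;
};

struct cleanup_data {
	struct example_source **sources;
	int n_sources;
};

/* Runs on the normal path and also when the thread is cancelled in the
 * middle of dispatching, so no source is left pointing into the
 * cancelled thread's stack. */
static void clear_sources(void *arg)
{
	struct cleanup_data *d = arg;
	int i;

	for (i = 0; i < d->n_sources; i++) {
		d->sources[i]->priv = NULL;
		d->sources[i]->rmask = 0;
	}
}

static void dispatch(struct example_source **sources, int n_sources)
{
	struct cleanup_data d = { sources, n_sources };

	pthread_cleanup_push(clear_sources, &d);
	/* ... dispatch the sources; the thread may be cancelled here ... */
	pthread_cleanup_pop(1);		/* 1: also run the handler on the normal path */
}
```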
When we leave the last recursive enter of the loop, clear the polling
flag.
It is possible that the flag was not cleared because the loop was
killed with pthread_kill. In any case, the _leave calls need to be made
in this case as well.
This fixes issues when jack clients stop, where an assert was triggered
because the polling flag was still active when the object was cleared.
See !1171
The core of the issue is the following: what happens if an
active source is destroyed before it could be dispatched?
For loop-managed sources (`struct source_impl`) this was addressed
by storing all destroyed sources in a list, and only freeing them
after dispatching has been finished. (0eb73f0f06)
This approach works for both strictly single-threaded
and `pw_thread_loop` loops assuming the loop is not
reentered.
However, if the loop is reentered, there can still be issues.
Assume that in one iteration sources A and B are active,
and returned from the system call, and source B is destroyed
before the loop starts dispatching. Consider what happens when
"A" is dispatched first, and it reenters the loop with timeout 0.
Imagine there are no new events, so `loop_iterate()` will immediately
return, but it will first destroy everything in the destroy list
(this is done at the end of `loop_iterate()`).
And herein lies the problem. In the previous iteration,
there exists a `spa_poll_event` object which points to source "B",
but that has just been destroyed at the end of the recursive
iteration. This will trigger a use-after-free once the previous
iteration inspects it.
Fix that by processing the destroy list right after first
processing the returned `spa_poll_event` objects, and
"detach" the source from the loop and its iterations
in `process_destroy()` before the source is destroyed.
See #2114, #2147
It may be a little confusing that both the loop object
and the `source_impl` objects are referred to with variables
named `impl`. For this reason, rename all source_impl objects
named `impl` to `s`.
It is expected that `nfds` is non-negative in the vast majority
of cases, so hopefully the runtime performance will not be
significantly affected by removing the check. This way
it is guaranteed that the destroy list is processed.
This reverts commit c474846c42.
In addition, `s->loop` is also checked before dispatching a source.
The destroy list is needed in the presence of threads. The
issue is that a source may be destroyed between `epoll_wait()`
returning and the thread loop lock being acquired. If this
source is active, then a use-after-free will be triggered
when the thread loop acquires the lock and starts dispatching
the sources.
  thread 1                            thread 2
  ----------                          ----------
  loop_iterate
    spa_loop_control_hook_before
    // release lock
                                      pw_thread_loop_lock
    spa_system_pollfd_wait
      // assume it returns with source A
                                      pw_loop_destroy_source(..., A)
                                      // frees storage of A
                                      pw_thread_loop_unlock
    spa_loop_control_hook_after
    // acquire the lock
    for (...) {
      struct spa_source *s = ep[i].data;
      s->rmask = ep[i].events;
      // use-after-free if `s` refers to
      // the previously freed `A`
Fixes #2147
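A minimal sketch of the destroy-list approach described above
(illustrative names, not the actual loop implementation):

```c
#include <stddef.h>
#include <stdlib.h>

struct source_impl {
	struct source_impl *next;	/* link in the destroy list */
	/* ... */
};

struct loop_impl {
	struct source_impl *destroy_list;
};

static void destroy_source(struct loop_impl *loop, struct source_impl *s)
{
	/* never free right away; a returned spa_poll_event from the
	 * current iteration may still point at this source */
	s->next = loop->destroy_list;
	loop->destroy_list = s;
}

/* Called at a safe point, after the returned events of the iteration
 * have been processed. */
static void process_destroy(struct loop_impl *loop)
{
	struct source_impl *s = loop->destroy_list;

	while (s != NULL) {
		struct source_impl *next = s->next;
		free(s);
		s = next;
	}
	loop->destroy_list = NULL;
}
```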
Add an extra private field to the source to store the pollevent of
the current iteration. This changes ABI but it seems an embedded source
is not used outside of our own plugins and the unit test doesn't test
this ABI case.
Whenever a source is removed, we can set the data field of the
pollevent to NULL so that it won't be handled in any iteration anymore.
Avoid dispatching the same event multiple times when doing recursive
iterations.
Add some more unit tests for this.
Fixes #2114
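A rough sketch of the detach idea described above (field and struct
names are illustrative, not the actual SPA definitions):

```c
#include <stddef.h>
#include <stdint.h>

struct poll_event {
	uint32_t events;
	void *data;		/* the source, or NULL once detached */
};

struct source {
	struct poll_event *ep;	/* pollevent of the current iteration, if any */
	void (*func)(struct source *s);
};

/* Called when the source is removed: make sure no iteration, including
 * a recursive one, dispatches it anymore. */
static void detach_source(struct source *s)
{
	if (s->ep != NULL) {
		s->ep->data = NULL;	/* skipped by the dispatch loop below */
		s->ep = NULL;
	}
}

static void dispatch(struct poll_event *ep, int nfds)
{
	int i;

	for (i = 0; i < nfds; i++) {
		struct source *s = ep[i].data;
		if (s == NULL)		/* removed while dispatching */
			continue;
		s->ep = &ep[i];		/* remember where we are referenced */
		s->func(s);
		if (ep[i].data != NULL)	/* the callback may have removed us */
			s->ep = NULL;
	}
}
```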
This commit adds a counter for loop_enter/leave and checks:
- consecutive enters are done on the same thread
- leave is called on the same thread as enter
- at destruction, the enter_count must be 0
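A sketch of that bookkeeping (hypothetical struct, not the actual loop
code):

```c
#include <assert.h>
#include <pthread.h>

struct loop_state {
	int enter_count;
	pthread_t thread;
};

static void loop_enter(struct loop_state *l)
{
	if (l->enter_count == 0)
		l->thread = pthread_self();
	else	/* a recursive enter must come from the same thread */
		assert(pthread_equal(l->thread, pthread_self()));
	l->enter_count++;
}

static void loop_leave(struct loop_state *l)
{
	/* leave must be called from the thread that entered */
	assert(l->enter_count > 0);
	assert(pthread_equal(l->thread, pthread_self()));
	l->enter_count--;
}

static void loop_destroy(struct loop_state *l)
{
	/* every enter must have been matched by a leave */
	assert(l->enter_count == 0);
}
```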
Now that sources can no longer be dispatched after a _remove, we don't
need to keep the destroy_list and can free the source immediately.
See #2114
Keep the array of dispatched sources around in the loop. When a source
is removed while dispatching, set the data to NULL so that we don't try
to deref the source again or call its function.
Fixes #2114
The ringbuffer can't be written to from multiple threads.
When both the main loop and data thread do _invoke, they both write to
the ringbuffer and cause it to be corrupted because the ringbuffer is
not multi-writer safe.
Doing invoke from the thread itself is usually done to flush things out
so we really only need to flush the ringbuffer and call the callback.
See #1451
First calculate the size of the aligned payload and then check if
we can fit this aligned payload in the remaining space in the
ringbuffer.
Otherwise we might be able to fit the item + payload in the remaining
space but then place the alignment bytes at the beginning, which would
break alignment of the next invoke_item struct.
Also check if there is enough space to write the payload bytes.
We check if there is enough space for the invoke_item structure first.
Then we calculate how many bytes we need for the payload but we fail
to check if we can actually write that much data, risking overwriting
existing data in the ringbuffer and causing a crash later when we try
to jump to invalid memory.
Add some more comments.
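A sketch of the fixed space check (the struct layout and alignment
constant are illustrative, not the actual SPA code):

```c
#include <errno.h>
#include <stddef.h>

#define ITEM_ALIGN 8	/* assumed alignment of invoke_item */

struct invoke_item {
	size_t size;	/* size of the payload that follows the item */
};

/* Return 0 when the item plus its aligned payload fits in the space
 * still available in the ringbuffer, -ENOSPC otherwise. */
static int check_space(size_t payload_size, size_t avail)
{
	/* round the payload up first, so the next invoke_item written
	 * after it stays aligned */
	size_t aligned = (payload_size + ITEM_ALIGN - 1) & ~(size_t)(ITEM_ALIGN - 1);

	if (avail < sizeof(struct invoke_item) + aligned)
		return -ENOSPC;
	return 0;
}
```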
SPA_MEMBER is misleading; all we're doing here is pointer+offset and
type-casting the result. Rename to SPA_PTROFF, which is more expressive
(and has the same number of characters so we don't need to re-indent).
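A sketch of what the macro expresses (not necessarily the exact
definition in the headers):

```c
#include <stdint.h>

/* pointer + byte offset, cast to the requested type */
#define SPA_PTROFF(ptr, offset, type) \
	((type *) ((uintptr_t) (ptr) + (uintptr_t) (offset)))

/* e.g. the payload right after an item:
 *   void *payload = SPA_PTROFF(item, sizeof(*item), void); */
```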