Commit graph

1141 commits

Author SHA1 Message Date
Wim Taymans
fb2b314660 audioconvert: interchange the resampler loops
Iterate the channels in the inner loop instead of the outer loop. This
makes it handle with 0 channels better but also does the more
complicated phase increment code only once for all channels. Also the
filters might stay in the cache for each channel now.
2024-10-30 13:31:24 +01:00
Wim Taymans
2c132be626 audioconvert: align some buffers
so that we can use aligned read and writes in SSE.
2024-10-23 12:54:23 +02:00
Wim Taymans
662bf68122 audioconvert: handle odd writes in delay
Add some padding to the delay buffer. If we wrap around, copy the
spilled samples to the front of the buffer. This makes it possible to
use the more optimized sse delay function in more cases.
2024-10-23 12:34:04 +02:00
Wim Taymans
3309e0b244 audioconvert: don't unroll when unaligned write pointer
We require the write pointer to be a multiple of 4 for our unrolled SSE
loop to work, so enforce this to avoid segfaults.
2024-10-23 11:40:34 +02:00
Wim Taymans
2c0ce6afc2 audioconvert: SSE optimize delay and convolver 2024-10-15 16:10:25 +02:00
Wim Taymans
33fb2f04c7 audioconvert: use MAX_TAPS constant 2024-10-15 12:49:16 +02:00
Wim Taymans
ad84c45c0c audioconvert: optimize lr4 handling
Make an SSE optimized LR4 function and one that handles 2 LR4s in
parallel.
2024-10-15 12:20:04 +02:00
Wim Taymans
fec3730489 audioconvert: avoid a memcpy when we can 2024-10-15 12:19:17 +02:00
Wim Taymans
a57f2f25b6 delay: improve delay performance
Use a wrap around delay ringbuffer. We can then avoid some modulo
arithmetic and read more efficiently.

Also handle the delay convolver case better by reversing the taps and
reading the taps and delay buffer without extra overhead.
2024-10-15 12:14:57 +02:00
Wim Taymans
8cd3fc6922 adapter: increase max-retry to 64
When the follower doesn't produce enough data for this many attempts,
bail and cause an xrun to avoid an infinite loop.

The limit of 8 cause real-life problems and should be larger. It should
probably depend on the expected size per cycle (node.latency) and the
current quantum but we don't always have this information.

See #4334
2024-10-07 09:28:39 +02:00
sunyuechi
245adda985 fmt-ops: add RVV optimizations for s32_to_f32d 2024-09-29 11:17:42 +08:00
sunyuechi
79d41e183e fmt-ops: add RVV optimizations for f32d_to_s32 2024-09-26 00:55:49 +08:00
sunyuechi
74832445ba fmt-ops: add RVV optimizations for s16_to_f32d 2024-09-25 10:50:05 +00:00
sunyuechi
588f2bcb69 RISCV: Improve scalar computation of f32d_to_s16 2024-09-25 10:50:05 +00:00
sunyuechi
8a8843ba20 fmt-ops: add RVV optimizations for f32d_s16 2024-09-23 08:10:43 +00:00
sunyuechi
852de6c35c fmt-ops: add RVV optimizations for f32d_s16d 2024-09-23 08:10:43 +00:00
sunyuechi
62ec61a3bb benchmark-fmt-ops: Fix test arg for f32_s16 2024-09-20 16:02:22 +08:00
sunyuechi
d932e52d5b fmt-ops: add R-V V optimizations for f32_s16 2024-09-18 10:40:48 +00:00
Wim Taymans
e2991f6398 json: add helper function to parse channel positions
Use the helper instead of duplicating the same code.

Also add some helpers to parse a json array of uint32_t

Move some functions to convert between type name and id.
2024-09-18 09:54:34 +02:00
Wim Taymans
563186eff6 adapter: also forward the RequestProcess events 2024-09-16 17:01:50 +02:00
Wim Taymans
ce390d5b22 spa: add spa_json_object_next
This gets the next key and value from an object. This function is better
because it will skip key/value pairs that don't fit in the array to hold
the key.

The previous code patter would stop parsing the object as soon as a key
larger than the available space was found.
2024-09-16 09:50:36 +02:00
Wim Taymans
cd81b5f39a spa: add spa_json_begin_array/object and relaxed versions
Add spa_json_begin_array/object to replace
spa_json_init+spa_json_begin_array/object

This function is better because it does not waste a useless spa_json
structure as an iterator. The relaxed versions also error out when the
container is mismatched because parsing a mismatched container is not
going to give any results anyway.
2024-09-16 09:50:33 +02:00
Wim Taymans
5c2b5fa552 audioadapter: clear the handle as well to avoid leaks 2024-09-09 13:42:44 +02:00
Wim Taymans
ffed9763fd audioadapter: improve convert plugin loader
Use the converter in the current plugin when no plugin loader was given
to make the unit tests work.
2024-09-09 13:26:45 +02:00
Wim Taymans
4d2cdd6da3 audioadapter: dynamically load the audio converter
So that we can plug in other implementations. Also handle the cases
where we can't load a converter.
2024-09-06 17:30:58 +02:00
Wim Taymans
b4c8627a62 audioadapter: improve format negiotiation
First try to pass the format of the converter directly into the
follower. This allows us to avoid conversion when it can be avoided.

Iterate all follower formats (not just the first one) to find something
that intersects with the converter formats.
2024-09-06 15:08:31 +02:00
Wim Taymans
9fb14be4e3 adapter: improve format parsing some more 2024-09-06 15:06:31 +02:00
Wim Taymans
c5a7f30a68 audioadapter: use generic audio format parsing
We don't need to use the raw audio format parsing functions, we can use
the more generic audio ones. This avoids some extra parsing for the
media type and subtype and will support compressed audio formats
as well when the converter handles this.
2024-09-06 14:46:34 +02:00
Wim Taymans
f6803d4c03 audioadapter: pass the config mode around
When we are working in convert mode, configure the converter to convert
mode as well instead of DSP.
2024-09-06 11:20:25 +02:00
Wim Taymans
7036fc76e0 audioadapter: handle port flags better
Save the convert and follower port flags and use them in buffer
allocation.
2024-09-05 12:26:30 +02:00
Wim Taymans
cbbf37c3b8 audioadapter: move some checks around
Move the check for the follower==target to the negotiate functions.

Refer to the target when doing operations. The converter reference
is just some internal element that may or may not be active at the
moment. If we have multiple converter elements, the current active
one will be in target.
2024-09-02 15:18:29 +02:00
Wim Taymans
82e4b9a213 audioadapter: remove redundant statement
The same check is done a little later.
2024-09-02 11:51:40 +02:00
Arun Raghavan
70a7bae5d7 resampler: Precompute some common filter coefficients
While this is quite fast on x86 (order of a few microseconds), the
computation can take a few milliseconds on ARM (measured at 1.9ms (32000
-> 48000) and 3.3ms (32000 -> 44100) on a Cortex A53).

Let's precompute some common rates so that we can avoid this overhead on
each stream (or any other audioconvert) instantiation. The approach
taken here is to write a little program to create the resampler
instance, and run that on the host at compile-time to generate some
common rate conversions.
2024-08-08 00:30:24 -04:00
Wim Taymans
40cd8535eb audioconvert: only accept UMP on the control port 2024-07-30 09:38:40 +02:00
Wim Taymans
61dcd8dede audioconvert: set IO_Buffers only when buffers are negotiated
The IO_Buffers is used in the data thread to check if the port should be
scheduled or not. Make sure it is only set after we set buffers on the
port and cleared before the buffers are cleared.

Make sure we sync the port->io with the data thread.

See #4094
2024-07-29 18:15:06 +02:00
David Coles
5d7624001d Add spa/utils/endian.h
This provides access to GNU C library-style endian and byteswap functions.

Windows doesn't provide pre-processor defines for endianness, but
all current Windows architectures (X32, X64, ARM) are little-endian.
2024-07-01 15:28:58 +00:00
Wim Taymans
c94d5ed215 tests: don't iterate all possible values
Or else the valgrind unit test times out.
2024-07-01 17:20:25 +02:00
Roman Lebedev
7c40cafa7c
audioconvert: avoid even more precision loss in F32 to S32 conversion
This is somewhat similar to the S32->F32 conversion improvements,
but here things a bit more tricky...

The main consideration is that the limits to which we clamp
must be valid 32-bit signed integers, but not all such integers
are exactly losslessly representable in `float32_t`.

For example it we'd clamp to `2147483647`,
that is actually a `2147483648.0f`,
and `2147483648` is not a valid 32-bit signed integer,
so the post-clamp conversion would basically be UB.
We don't have this problem for negative bound, though.

But as we know, any 25-bit signed integer is losslessly
round-trippable through float32_t, and since multiplying by 2
only changes the float's exponent, we can clamp to `2147483520`!
The algorithm of selection of the pre-clamping scale is unaffected.

This additionally avoids right-shift, and thus is even faster.

As `test_lossless_s32_lossless_subset` shows,
if the integer is in the form of s25+shift,
the maximal absolute error is finally zero.

Without going through `float`->`double`->`int`,
i'm not sure if the `float`->`int` conversion
can be improved further.
2024-06-27 19:41:20 +03:00
Roman Lebedev
f4c89b1b40
audioconvert: avoid even more precision loss in S32 to F32 conversion
There's really no point in doing that s25_32 intermediate step,
to be honest i don't have a clue why the original implementation
did that \_(ツ)_/¯.

Both `S25_SCALE` and `S32_SCALE` are powers of two,
and thus are both exactly representable as floats,
and reprocial of power-of-two is also exactly representable,
so it's not like that rescaling results in precision loss.

This additionally avoids right-shift, and thus is even faster.

As `test_lossless_s32_lossless_subset` shows,
if the integer is in the form of s25+shift,
the maximal absolute error became even lower,
but not zero, because F32->S32 still goes through S25 intermediate.
I think we could theoretically do better,
but then the clamping becomes pretty finicky,
so i don't feel like touching that here.
2024-06-27 19:41:20 +03:00
Roman Lebedev
c517865864
audioconvert: somewhat avoid precision loss in S32 to F32 conversion
At the very least, we should go through s25_32 intermediate
instead of s24_32, to avoid needlessly loosing 1 LSB precision bit.

That being said, i suspect it's still not doing the right thing.
Why are we silently dropping those 7 LSB bits?
Is that really the way to do it?
2024-06-27 19:41:20 +03:00
Roman Lebedev
175d533b56
audioconvert: somewhat avoid precision loss in F32 to S32 conversion
At the very least, we should go through s25_32 intermediate
instead of s24_32, to avoid needlessly loosing 1 LSB precision bit.

FIXME: the noise codepath is not covered with tests.
2024-06-27 19:41:20 +03:00
Roman Lebedev
2a035ac49e
audioconvert: introduce s25_32 type, f32<->s25 cast is lossless
The largest integer that 32-bit floating point can exactly represent
is actually `(2^24)-1`, not`(2^23)-1` like the code assumes.
This means, whenever we use s24 as an intermediate step
to go between f32 and s32, we lose a bit of precision.

s25_32 is really a i32 with highest byte always being a sign byte.

Printing was done by adding
```
for(int e = 0; e != 13; ++e)
fprintf(stderr, "%16.32e,", ((float*)m1)[e]);
```
to `compare_mem`. I don't like how these tests work.

https://godbolt.org/z/abe94sedT
2024-06-27 19:41:20 +03:00
Wim Taymans
9d1d1fcbef impl-port: add port.group property
Can be used to group ports together. Mostly because they are all from
the same stream and split into multiple ports by audioconvert/adapter.

Also useful for the alsa sequence to group client ports together.

Also interesting when pw-filter would be able to handle streams in the
future to find out what ports belong to what streams.
2024-06-24 13:38:09 +02:00
Wim Taymans
f7d59bcea7 fix compilation some more
The math M_*f symbols are GNU extensions.
2024-06-18 15:41:12 +02:00
Wim Taymans
1ae4374ccf Fix compilation with -Werror=float-conversion
Better make the conversions explicit so that we don't get any surprises.

Fixes #4065
2024-06-18 12:17:56 +02:00
Wim Taymans
b421331275 doc: clarify the dither.noise
Fixes #4057
2024-06-13 11:38:26 +02:00
Diego Viola
7410755c03 Fix typos
found them with codespell.

Signed-off-by: Diego Viola <diego.viola@gmail.com>
2024-05-22 09:19:34 +02:00
Wim Taymans
e1e0a886d5 stream: improve async handling
We can remove most of the special async handling in adapter, filter and
stream because this is now handled in the core.

Add a node.data-loop property to assign the node to a named data-loop.

Assign the non-rt stream and filter to the main loop. This means that
the node fd will be added to the main-loop and will be woken up directly
without having to wake up the RT thread and invoke the process callback
in the main-loop first. Because non-RT implies async, we can do all of
this like we do our rt processing because the output will only be used
in the next cycle.
2024-04-18 15:20:07 +02:00
Wim Taymans
b97c6e2eac audioconvert: also clamp monitor volume to min/max
When we set a min/max value, also clamp the monitor volume to it.

Fixes #3962
2024-04-15 16:28:24 +02:00
Pauli Virtanen
e784de3933 spa: use log topics everywhere
Use log topics properly everywhere, convert from "#define NAME".
2024-03-11 18:45:21 +02:00