This provides access to GNU C library-style endian and byteswap functions.
Windows doesn't provide pre-processor defines for endianness, but
all current Windows architectures (X32, X64, ARM) are little-endian.
This is somewhat similar to the S32->F32 conversion improvements,
but here things a bit more tricky...
The main consideration is that the limits to which we clamp
must be valid 32-bit signed integers, but not all such integers
are exactly losslessly representable in `float32_t`.
For example it we'd clamp to `2147483647`,
that is actually a `2147483648.0f`,
and `2147483648` is not a valid 32-bit signed integer,
so the post-clamp conversion would basically be UB.
We don't have this problem for negative bound, though.
But as we know, any 25-bit signed integer is losslessly
round-trippable through float32_t, and since multiplying by 2
only changes the float's exponent, we can clamp to `2147483520`!
The algorithm of selection of the pre-clamping scale is unaffected.
This additionally avoids right-shift, and thus is even faster.
As `test_lossless_s32_lossless_subset` shows,
if the integer is in the form of s25+shift,
the maximal absolute error is finally zero.
Without going through `float`->`double`->`int`,
i'm not sure if the `float`->`int` conversion
can be improved further.
There's really no point in doing that s25_32 intermediate step,
to be honest i don't have a clue why the original implementation
did that \_(ツ)_/¯.
Both `S25_SCALE` and `S32_SCALE` are powers of two,
and thus are both exactly representable as floats,
and reprocial of power-of-two is also exactly representable,
so it's not like that rescaling results in precision loss.
This additionally avoids right-shift, and thus is even faster.
As `test_lossless_s32_lossless_subset` shows,
if the integer is in the form of s25+shift,
the maximal absolute error became even lower,
but not zero, because F32->S32 still goes through S25 intermediate.
I think we could theoretically do better,
but then the clamping becomes pretty finicky,
so i don't feel like touching that here.
At the very least, we should go through s25_32 intermediate
instead of s24_32, to avoid needlessly loosing 1 LSB precision bit.
That being said, i suspect it's still not doing the right thing.
Why are we silently dropping those 7 LSB bits?
Is that really the way to do it?
At the very least, we should go through s25_32 intermediate
instead of s24_32, to avoid needlessly loosing 1 LSB precision bit.
FIXME: the noise codepath is not covered with tests.
The largest integer that 32-bit floating point can exactly represent
is actually `(2^24)-1`, not`(2^23)-1` like the code assumes.
This means, whenever we use s24 as an intermediate step
to go between f32 and s32, we lose a bit of precision.
s25_32 is really a i32 with highest byte always being a sign byte.
Printing was done by adding
```
for(int e = 0; e != 13; ++e)
fprintf(stderr, "%16.32e,", ((float*)m1)[e]);
```
to `compare_mem`. I don't like how these tests work.
https://godbolt.org/z/abe94sedT
uint32_t i;
for (i = 0; i < SPA_N_ELEMENTS(some_array); i++)
.. stuff with some_array[i].foo ...
becomes:
SPA_FOR_EACH_ELEMENT_VAR(some_array, p)
.. stuff with p->foo ..
So that we can reuse optimized versions in unoptimized noise
functions.
Do allocation a little different so that we can align everything
from the start.
Make a new noise method called PATTERN and use it to add a slow (every
1024 samples) repeating pattern of -1, 0.
Only use this method when we don't already use triangular dither.
See #2540
Tweak the conversion constants a bit so that they handle the
extreme ranges a bit better.
Align the C and vector instructions.
Reactivate the unit test asserts when a conversion fails.
Make a new uint42_t and int24_t type and use that to handle 24 bits
samples. This makes it easier because we can iterate and copy the
structs like other types.
Make dither noise as a value between -0.5 and 0.5 and add this
to the scaled samples.
For this, we first need to do the scaling and then the CLAMP to
the target depth. This optimizes to the same code but allows us
to avoid under and overflows when we add the dither noise.
Add more dithering methods.
Expose a dither.method property on audioconvert. Disable dither when
the target depth > 16.
We need to do dithering and noise when converting f32 to the
target format. This is more natural because we can work in 32 bits
integers instead of floats.
This will also make it possible to actually calculate the error between
source and target values and implement some sort of feedback and
noise shaping later.
pipewire will allocate buffers aligned to the max alignment required for
the CPU. Take this into account and don't expect larger alignment.
Fixes a warning in mixer-dsp when the CPU max alignment is 16 but the
plugin requires 32 bytes alignment for the AVX2 path (that would never
be chosen on the CPU).
See #2074