This is somewhat similar to the S32->F32 conversion improvements,
but here things a bit more tricky...
The main consideration is that the limits to which we clamp
must be valid 32-bit signed integers, but not all such integers
are exactly losslessly representable in `float32_t`.
For example it we'd clamp to `2147483647`,
that is actually a `2147483648.0f`,
and `2147483648` is not a valid 32-bit signed integer,
so the post-clamp conversion would basically be UB.
We don't have this problem for negative bound, though.
But as we know, any 25-bit signed integer is losslessly
round-trippable through float32_t, and since multiplying by 2
only changes the float's exponent, we can clamp to `2147483520`!
The algorithm of selection of the pre-clamping scale is unaffected.
This additionally avoids right-shift, and thus is even faster.
As `test_lossless_s32_lossless_subset` shows,
if the integer is in the form of s25+shift,
the maximal absolute error is finally zero.
Without going through `float`->`double`->`int`,
i'm not sure if the `float`->`int` conversion
can be improved further.
There's really no point in doing that s25_32 intermediate step,
to be honest i don't have a clue why the original implementation
did that \_(ツ)_/¯.
Both `S25_SCALE` and `S32_SCALE` are powers of two,
and thus are both exactly representable as floats,
and reprocial of power-of-two is also exactly representable,
so it's not like that rescaling results in precision loss.
This additionally avoids right-shift, and thus is even faster.
As `test_lossless_s32_lossless_subset` shows,
if the integer is in the form of s25+shift,
the maximal absolute error became even lower,
but not zero, because F32->S32 still goes through S25 intermediate.
I think we could theoretically do better,
but then the clamping becomes pretty finicky,
so i don't feel like touching that here.
At the very least, we should go through s25_32 intermediate
instead of s24_32, to avoid needlessly loosing 1 LSB precision bit.
That being said, i suspect it's still not doing the right thing.
Why are we silently dropping those 7 LSB bits?
Is that really the way to do it?
At the very least, we should go through s25_32 intermediate
instead of s24_32, to avoid needlessly loosing 1 LSB precision bit.
FIXME: the noise codepath is not covered with tests.
The largest integer that 32-bit floating point can exactly represent
is actually `(2^24)-1`, not`(2^23)-1` like the code assumes.
This means, whenever we use s24 as an intermediate step
to go between f32 and s32, we lose a bit of precision.
s25_32 is really a i32 with highest byte always being a sign byte.
Printing was done by adding
```
for(int e = 0; e != 13; ++e)
fprintf(stderr, "%16.32e,", ((float*)m1)[e]);
```
to `compare_mem`. I don't like how these tests work.
https://godbolt.org/z/abe94sedT
Tweak the conversion constants a bit so that they handle the
extreme ranges a bit better.
Align the C and vector instructions.
Reactivate the unit test asserts when a conversion fails.
Mark some structures, arrays static/const at various places.
In some cases this prevents unnecessary initialization
when a function is entered.
All in all, the text segments across all shared
libraries are reduced by about 2 KiB. However,
the total size increases by about 2 KiB as well.
This makes installed-tests (see commit b852b58f) do the right thing.
For build-time testing, spa/plugins/audioconvert/meson.build overrides
this with the SPA_PLUGIN_DIR environment variable, and for ad-hoc
testing by developers, pw-uninstalled.sh sets the necessary variables.
Signed-off-by: Simon McVittie <smcv@debian.org>
Pass some state to convert and channelmix functions. This makes it
possible to select per channel optimized convert functions but
also makes it possible to implement noise shaping later.
Pass the channelmix matrix and volume in the state.
Handle specialized 2 channel s16 -> f32 conversion
Improve the allocators to always align the buffer memory to the
requested alignment
Use aligned read and writes for sse functions and check alignment,
optionally falling back to unaligned path.
Add more tests and benchmark cases
Check and warn for misaligned memory in plugins.