audioconvert: avoid even more precision loss in S32 to F32 conversion

There's really no point in doing that s25_32 intermediate step;
to be honest, I don't have a clue why the original implementation
did that ¯\_(ツ)_/¯.

Both `S25_SCALE` and `S32_SCALE` are powers of two,
and thus are both exactly representable as floats,
and the reciprocal of a power of two is also exactly representable,
so the rescaling itself does not result in precision loss.
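
To make the argument above concrete, here is a minimal scalar sketch of the two paths. The macro values (2^24 and 2^31) and all function names are assumptions for illustration, not copied from the tree.

```c
#include <stdint.h>

/* Assumed values; only the names come from the commit message. */
#define S25_SCALE 16777216.0f   /* 2^24 */
#define S32_SCALE 2147483648.0f /* 2^31 */

/* Old path: drop the low 7 bits, then scale by 1/2^24.
 * Right-shifting a negative value is implementation-defined in C;
 * this assumes the usual arithmetic shift. */
static inline float s32_to_f32_via_s25(int32_t v)
{
	return (float)(v >> 7) * (1.0f / S25_SCALE);
}

/* New path: convert the full 32-bit value, then scale by 1/2^31. */
static inline float s32_to_f32_direct(int32_t v)
{
	return (float)v * (1.0f / S32_SCALE);
}

/* Returns 1 if the multiply in the direct path added no rounding of its
 * own: the same scaling done in double (exact here) gives the same value. */
static inline int direct_scale_is_exact(int32_t v)
{
	double reference = (double)(float)v / 2147483648.0;
	return (double)s32_to_f32_direct(v) == reference;
}
```

With these definitions, an input of 127 becomes 0 in the old path because all of its set bits are shifted out, while the direct path returns 127 * 2^-31. The only rounding left in the direct path is the int-to-float conversion itself.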

This additionally avoids the right shift (one instruction fewer per iteration), and thus should be slightly faster.

As `test_lossless_s32_lossless_subset` shows,
if the integer is of the form s25+shift,
the maximal absolute error becomes even lower,
but not zero, because F32->S32 still goes through the S25 intermediate.
I think we could theoretically do better,
but then the clamping becomes pretty finicky,
so I don't feel like touching that here.
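
As a rough illustration of why the "s25+shift" subset is special in the S32->F32 direction, the sketch below checks whether a given sample survives the conversion without rounding. It is not the code of the test; the helper names and the value of `S32_SCALE` are assumptions.

```c
#include <stdint.h>

#define S32_SCALE 2147483648.0f /* 2^31, assumed value */

/* Place a signed 25-bit value in the upper bits of a 32-bit sample.
 * Multiplication is used because left-shifting a negative int is
 * undefined in C; for in-range input it equals s25 << 7. */
static inline int32_t s25_to_s32(int32_t s25)
{
	return (int32_t)(s25 * 128);
}

/* Returns 1 if converting v to float and scaling by 1/2^31 loses nothing.
 * The scaling is undone in double, where every step is exact. */
static inline int s32_to_f32_is_lossless(int32_t v)
{
	float f = (float)v * (1.0f / S32_SCALE);
	return (double)f * 2147483648.0 == (double)v;
}
```

Values built with `s25_to_s32()` have at most 24 significant magnitude bits, so they fit a float mantissa exactly; an arbitrary 32-bit value such as `INT32_MAX` does not.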
Roman Lebedev 2024-06-14 06:05:18 +03:00
parent c517865864
commit f4c89b1b40
4 changed files with 26 additions and 37 deletions

@@ -335,7 +335,7 @@ conv_s32_to_f32d_1s_sse2(void *data, void * SPA_RESTRICT dst[], const void * SPA
 float *d0 = dst[0];
 uint32_t n, unrolled;
 __m128i in;
-__m128 out, factor = _mm_set1_ps(1.0f / S25_SCALE);
+__m128 out, factor = _mm_set1_ps(1.0f / S32_SCALE);
 if (SPA_IS_ALIGNED(d0, 16))
 unrolled = n_samples & ~3;
@@ -347,14 +347,13 @@ conv_s32_to_f32d_1s_sse2(void *data, void * SPA_RESTRICT dst[], const void * SPA
 s[1*n_channels],
 s[2*n_channels],
 s[3*n_channels]);
-in = _mm_srai_epi32(in, 7);
 out = _mm_cvtepi32_ps(in);
 out = _mm_mul_ps(out, factor);
 _mm_store_ps(&d0[n], out);
 s += 4*n_channels;
 }
 for(; n < n_samples; n++) {
-out = _mm_cvtsi32_ss(factor, s[0]>>7);
+out = _mm_cvtsi32_ss(factor, s[0]);
 out = _mm_mul_ss(out, factor);
 _mm_store_ss(&d0[n], out);
 s += n_channels;
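
For readers less familiar with the intrinsics, the loop in this hunk after the change behaves roughly like the scalar sketch below. The function name, its simplified signature and the value of `S32_SCALE` are hypothetical; the real function takes the `data`/`dst`/`src` arguments shown in the hunk header.

```c
#include <stdint.h>

#define S32_SCALE 2147483648.0f /* 2^31, assumed value */

/* Hypothetical scalar reference: pull one channel out of an interleaved
 * S32 buffer and write it as planar F32, with no intermediate shift. */
static inline void conv_s32_to_f32d_1s_ref(float *d0, const int32_t *s,
		uint32_t n_channels, uint32_t n_samples)
{
	const float factor = 1.0f / S32_SCALE;
	uint32_t n;

	for (n = 0; n < n_samples; n++) {
		d0[n] = (float)s[0] * factor;  /* convert, then scale by 2^-31 */
		s += n_channels;               /* step to the next frame */
	}
}
```

The SSE2 version does the same work four samples at a time while the destination is aligned, then falls back to one sample per iteration for the remainder.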