Add an alternative avx2 s32_to_f32d implementation that doesn't use the
gather function for when gather is slow.
Don't overwrite the orinal cpu_flags but store the selected flags in a
new variable. Use this to debug the selected function cpu flags.
Build libraries with defines from previous libraries so that we can
reuse functions from them.
We can then remove the SSE2 | SLOW_GATHER function selection from the
list. We will now select avx2 and it will then switch implementations
based on the CPU flags.
Intel Skylake (level 0x16) is the first model with fast gather
opcodes. Mark lower versions with the SLOW_GATHER flag.
Prefer the SSE2 version of the format conversion without gather when
SLOW_GATHER is set. Makes the conversion much faster on my Ivy
Bridge.
uint32_t i;
for (i = 0; i < SPA_N_ELEMENTS(some_array); i++)
.. stuff with some_array[i].foo ...
becomes:
SPA_FOR_EACH_ELEMENT_VAR(some_array, p)
.. stuff with p->foo ..
So that we can reuse optimized versions in unoptimized noise
functions.
Do allocation a little different so that we can align everything
from the start.
Make a new noise method called PATTERN and use it to add a slow (every
1024 samples) repeating pattern of -1, 0.
Only use this method when we don't already use triangular dither.
See #2540
Make dither noise as a value between -0.5 and 0.5 and add this
to the scaled samples.
For this, we first need to do the scaling and then the CLAMP to
the target depth. This optimizes to the same code but allows us
to avoid under and overflows when we add the dither noise.
Add more dithering methods.
Expose a dither.method property on audioconvert. Disable dither when
the target depth > 16.
We need to do dithering and noise when converting f32 to the
target format. This is more natural because we can work in 32 bits
integers instead of floats.
This will also make it possible to actually calculate the error between
source and target values and implement some sort of feedback and
noise shaping later.
Pass some state to convert and channelmix functions. This makes it
possible to select per channel optimized convert functions but
also makes it possible to implement noise shaping later.
Pass the channelmix matrix and volume in the state.
Handle specialized 2 channel s16 -> f32 conversion
Add some const and SPA_RESTRICT to methods
When the input and output is the same, work in passthrough mode
where we simply copy place the input pointer onto the output buffer
without doing a memcpy.
Do memcpy when the resampler is not active.