Add an alternative avx2 s32_to_f32d implementation that doesn't use the
gather function for when gather is slow.
Don't overwrite the orinal cpu_flags but store the selected flags in a
new variable. Use this to debug the selected function cpu flags.
Build libraries with defines from previous libraries so that we can
reuse functions from them.
We can then remove the SSE2 | SLOW_GATHER function selection from the
list. We will now select avx2 and it will then switch implementations
based on the CPU flags.
pipewire will allocate buffers aligned to the max alignment required for
the CPU. Take this into account and don't expect larger alignment.
Fixes a warning in mixer-dsp when the CPU max alignment is 16 but the
plugin requires 32 bytes alignment for the AVX2 path (that would never
be chosen on the CPU).
See #2074