Our AVX optimizations are really AVX2 so rename the files and functions
and use the right HAVE_AVX2 and cpu flags to compile and select the
right functions.
Fixes#5072
Add avx mixer to test and benchmark
Rework and unroll the avx mixer some more.
The SSE one is 10 times faster than the C one, The AVX is 20 times
faster. The SSE2 function is 5 times faster than the C one.