final:
* include minor style fixes and build-time changes so that a single
binary can be built for NEON and non-NEON systems
v4:
* fix for sample length < 4
v3:
* convert from intrinsics to inline assembly
v2:
* load and store data with vld1/vld1q and vst1/vst1q, respectively, to
work around alignment issues with the compiler-generated vldmia
instruction
* remove redundant check for NEON flags
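The v4 item concerns buffers shorter than one four-sample vector. A
minimal sketch of the usual shape of such a loop, written in portable C
rather than NEON assembly. block4() and scalar_one() are hypothetical
stand-ins for illustration, not the functions in sconv_neon.c, and the
loop bound shown is a general pattern, not necessarily the actual fix:

```c
#include <stdint.h>

/* Hypothetical scalar conversion: scale, clamp, round half away from zero. */
static int16_t scalar_one(float v) {
    float s = v * 32767.0f;
    if (s > 32767.0f) s = 32767.0f;
    if (s < -32768.0f) s = -32768.0f;
    return (int16_t) (int) (s + (s >= 0.0f ? 0.5f : -0.5f));
}

/* Stand-in for a 4-wide vector kernel (assumption: one vector = 4 samples). */
static void block4(const float *src, int16_t *dst) {
    for (unsigned i = 0; i < 4; i++)
        dst[i] = scalar_one(src[i]);
}

static void convert(unsigned n, const float *src, int16_t *dst) {
    unsigned i = 0;

    /* "i + 4 <= n" stays correct for n < 4; writing "i <= n - 4" with an
     * unsigned n would wrap around and run the vector loop anyway. */
    for (; i + 4 <= n; i += 4)
        block4(src + i, dst + i);

    /* Scalar tail: handles the remaining 0..3 samples. */
    for (; i < n; i++)
        dst[i] = scalar_one(src[i]);
}
```

With n = 3 the vector loop is skipped entirely and only the scalar tail
runs, so nothing is written past dst[2].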
Ubuntu/Linaro gcc 4.6.3
arm-linux-gnueabi-gcc -O2 -mcpu=cortex-a8 -mfloat-abi=softfp -mfpu=neon
runtime on beagle-xm:
D: [pulseaudio] sconv_neon.c: checking NEON sconv_s16le_from_float
I: [pulseaudio] sconv_neon.c: NEON: 3754 usec.
I: [pulseaudio] sconv_neon.c: ref: 58594 usec.
D: [pulseaudio] sconv_neon.c: checking NEON sconv_s16le_to_float
I: [pulseaudio] sconv_neon.c: NEON: 1831 usec.
I: [pulseaudio] sconv_neon.c: ref: 10528 usec.
I: [pulseaudio] sconv_neon.c: Initialising ARM NEON optimized conversions.
The NEON conversion may differ from the reference by one for some
samples because of rounding differences.
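A minimal sketch of how an off-by-one of this kind can arise, assuming
one path rounds to nearest and the other truncates toward zero. That
split is an assumption for illustration only, not a description of what
the NEON or reference code actually does, and all names are
hypothetical:

```c
#include <stdint.h>

/* Scale to the s16 range and clamp (shared by both variants). */
static float scale_clamp(float v) {
    float s = v * 32767.0f;
    if (s > 32767.0f) s = 32767.0f;
    if (s < -32768.0f) s = -32768.0f;
    return s;
}

/* Hypothetical variant A: round to nearest, half away from zero. */
static int16_t s16_round(float v) {
    float s = scale_clamp(v);
    return (int16_t) (int) (s + (s >= 0.0f ? 0.5f : -0.5f));
}

/* Hypothetical variant B: truncate toward zero. */
static int16_t s16_trunc(float v) {
    return (int16_t) (int) scale_clamp(v);
}

/* Nonzero if the two results agree to within one LSB. */
static int within_one(int16_t a, int16_t b) {
    int d = (int) a - (int) b;
    return d >= -1 && d <= 1;
}
```

For an input of 0.5 the scaled value is 16383.5, so variant A yields
16384 and variant B yields 16383. A test comparing the two would need
to accept a difference of one, as within_one() does.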
This allows us to test the sconv code with the incoming samples at
various byte alignments. The test is also now split into correctness and
performance checks.
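A minimal sketch of such an alignment check, assuming the input is
offset in whole samples from a 16-byte-aligned buffer. conv_func_t,
ref_conv() and check_alignments() are hypothetical names, not the test
code itself:

```c
#include <stdint.h>

#define N_SAMPLES 64
#define N_OFFSETS 4   /* 0..3 floats = 0, 4, 8, 12 bytes past a 16-byte boundary */

typedef void (*conv_func_t)(unsigned n, const float *src, int16_t *dst);

/* Hypothetical scalar reference: scale, clamp, round half away from zero. */
static void ref_conv(unsigned n, const float *src, int16_t *dst) {
    for (unsigned i = 0; i < n; i++) {
        float s = src[i] * 32767.0f;
        if (s > 32767.0f) s = 32767.0f;
        if (s < -32768.0f) s = -32768.0f;
        dst[i] = (int16_t) (int) (s + (s >= 0.0f ? 0.5f : -0.5f));
    }
}

/* Runs func and ref on the same input at each offset and returns the
 * number of samples that differ by more than one LSB. */
static unsigned check_alignments(conv_func_t func, conv_func_t ref) {
    static _Alignas(16) float buf[N_SAMPLES + N_OFFSETS];
    int16_t out[N_SAMPLES], out_ref[N_SAMPLES];
    unsigned mismatches = 0;

    for (unsigned off = 0; off < N_OFFSETS; off++) {
        float *in = buf + off;   /* input start shifted by off samples */

        /* Deterministic test pattern covering -1.0 .. 1.0. */
        for (unsigned i = 0; i < N_SAMPLES; i++)
            in[i] = (float) ((int) (i % 21) - 10) / 10.0f;

        func(N_SAMPLES, in, out);
        ref(N_SAMPLES, in, out_ref);

        for (unsigned i = 0; i < N_SAMPLES; i++) {
            int d = (int) out[i] - (int) out_ref[i];
            if (d < -1 || d > 1)
                mismatches++;
        }
    }
    return mismatches;
}
```

An optimized routine would be passed as func; a return value of zero
means it matched the reference at every offset tried.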
This factors out the basic measurement code for each test into a
separate block, so that each test can be broken down into a basic
correctness test and a performance comparison with minimal effort.
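A minimal sketch of what a factored-out measurement helper could look
like, assuming a callback-based design and CPU time from the standard C
clock() function. timing_t, measure() and count_calls() are
hypothetical names, not the helpers used in the tests:

```c
#include <time.h>

typedef void (*test_body_t)(void *userdata);

typedef struct {
    double min_usec;
    double max_usec;
    double avg_usec;
} timing_t;

/* Runs body "times" times and records min/max/average CPU time in usec. */
static timing_t measure(test_body_t body, void *userdata, unsigned times) {
    timing_t t = { 0.0, 0.0, 0.0 };
    double total = 0.0;

    for (unsigned i = 0; i < times; i++) {
        clock_t start = clock();
        body(userdata);
        double usec = (double) (clock() - start) * 1e6 / CLOCKS_PER_SEC;

        if (i == 0 || usec < t.min_usec) t.min_usec = usec;
        if (i == 0 || usec > t.max_usec) t.max_usec = usec;
        total += usec;
    }

    if (times > 0)
        t.avg_usec = total / times;
    return t;
}

/* Example body: counts how often it was invoked. */
static void count_calls(void *userdata) {
    (*(unsigned *) userdata)++;
}
```

A performance comparison then reduces to calling measure() once with
the optimized routine and once with the reference, while the
correctness test stays a separate, untimed step.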