v2 (comments by Paul Menzel):
* generate test samples from -1..1, -0x8000..0x7fff
* check all output samples (not just half of them)
the idea is to compare the output of the C (reference) implementation
against the output of the optimized code path; currently, there are MMX
and SSE implementation for the mono-to-stereo remapper for s16 and float
sample formats
Signed-off-by: Peter Meerwald <p.meerwald@bct-electronic.com>
Cc: Paul Menzel <paulepanter@users.sourceforge.net>
SSE sconv was not tested before, only SSE2 was (on CPUs supporting both
instruction sets)
now both code path are tested on CPUs supporting both
Signed-off-by: Peter Meerwald <p.meerwald@bct-electronic.com>
nsamples should be forced to be a multiple of channels; do so correctly
and don't make nsamples larger than it actually is
Signed-off-by: Peter Meerwald <p.meerwald@bct-electronic.com>
With some optimised sconv implementations (read NEON), rounding
inaccuracy might lead to a difference of 1 with the reference
implementation. The inaccuracy is worth the performance gain.
Also increases floating-point accuracy while printing errors to make
errors easier to analyse.
final:
* includes some minor style fixes and build-time changes to allow
building a single binary for neon and non-neon systems
v4:
* fix for sample length < 4
v3:
* convert from intrinsics to inline assembly
v2:
* load and store data with vld1/vld1q and vst1/vst1q, resp., to work
around alignment issues of compiler-generated vldmia instruction
* remove redundant check for NEON flags
Ubuntu/Linaro gcc 4.6.3
arm-linux-gnueabi-gcc -O2 -mcpu=cortex-a8 -mfloat-abi=softfp -mfpu=neon
runtime on beagle-xm:
D: [pulseaudio] sconv_neon.c: checking NEON sconv_s16le_from_float
I: [pulseaudio] sconv_neon.c: NEON: 3754 usec.
I: [pulseaudio] sconv_neon.c: ref: 58594 usec.
D: [pulseaudio] sconv_neon.c: checking NEON sconv_s16le_to_float
I: [pulseaudio] sconv_neon.c: NEON: 1831 usec.
I: [pulseaudio] sconv_neon.c: ref: 10528 usec.
I: [pulseaudio] sconv_neon.c: Initialising ARM NEON optimized conversions.
conversion may be off by one for some samples due to rounding issues
This allows us to test the sconv code with the incoming samples at
various byte alignments. The test is also now split into correctness and
performance checks.
This factors out the basic measurement code for each test into a
separate block so that each test can be broken down into a basic
correctness test, and a performance comparison with minimum effort.