Commit graph

7 commits

Author SHA1 Message Date
Peter Meerwald
6d9891e8b9 sconv: Use optimized conversion function for both directions
for example, the conversion function for
convert_from_float32ne(PA_SAMPLE_S16LE) can also be used for
convert_to_s16ne(PA_SAMPLE_FLOAT32LE)

v2: ARM can potentially be big- or little endian; only apply
optimization on LE based on WORDS_BIGENDIAN #define (thanks, Tanu)

Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
2014-08-10 11:59:28 +03:00
Peter Meerwald
3b7504495f sconv: Cleanup ARM NEON code
Fix compiler warning, code formatting

Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
2014-04-27 22:04:14 +02:00
Peter Meerwald
736e041980 sconv: avoid multiply in ARM NEON s16->float conversion
Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
2013-07-05 09:52:19 +03:00
Peter Meerwald
40c3c117e7 sconv: avoid multiply in ARM NEON float->s16 conversion
optimization idea taken from libavresample

Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
2013-07-05 09:38:52 +03:00
Peter Meerwald
e66e846418 sconv: Change/fix conversion to/from float32
use (1<<15) instead of 0x7fff as a factor when converting from s16 to float32
use (1<<31) instead of 0x7fffffff as a factor when converting from s32 to float32

the change is motivated by the following desireable properties:
* s16_from_f32(f32_from_s16(x)) == x for all possible s16 values
* x / (1.0f << 15) == x * (1.0f / (1 << 15)) for all x in s16

above changes enable easier optimization while guaranteeing bit-exact results

further, other audio sample conversion code (libavresample) does it the same way

v3 (comments Tanu):
* fix saturation in pa_sconv_s16le_from_f32ne_neon(), use vqrshrn
v2 (comments Tanu):
* fix comments in ARM NEON code
* use llrintf() in pa_sconv_s32le_from_float32ne()

Signed-off-by: Peter Meerwald <p.meerwald@bct-electronic.com>
Cc: Tanu Kaskinen <tanuk@iki.fi>
2013-02-04 12:07:14 +02:00
Arun Raghavan
1a8ec3c3e0 sconv: Fix NEON sconv rounding code
Rounding with 0.5 causes us to always round up for any value of the form
x.5. IEEE754 specifies round-to-nearest-even as the behaviour in this
case. This might not always be possible with NEON code, but this change
gets us much closer to it.
2012-10-29 13:13:39 +05:30
Peter Meerwald
1319c4533a core: Add ARM NEON optimized sample conversion code
final:
* includes some minor style fixes and build-time changes to allow
  building a single binary for neon and non-neon systems
v4:
* fix for sample length < 4
v3:
* convert from intrinsics to inline assembly
v2:
* load and store data with vld1/vld1q and vst1/vst1q, resp., to work
  around alignment issues of compiler-generated vldmia instruction
* remove redundant check for NEON flags

Ubuntu/Linaro gcc 4.6.3
arm-linux-gnueabi-gcc -O2 -mcpu=cortex-a8 -mfloat-abi=softfp -mfpu=neon

runtime on beagle-xm:

D: [pulseaudio] sconv_neon.c: checking NEON sconv_s16le_from_float
I: [pulseaudio] sconv_neon.c: NEON: 3754 usec.
I: [pulseaudio] sconv_neon.c: ref: 58594 usec.
D: [pulseaudio] sconv_neon.c: checking NEON sconv_s16le_to_float
I: [pulseaudio] sconv_neon.c: NEON: 1831 usec.
I: [pulseaudio] sconv_neon.c: ref: 10528 usec.
I: [pulseaudio] sconv_neon.c: Initialising ARM NEON optimized conversions.

conversion may be off by one for some samples due to rounding issues
2012-10-29 12:49:37 +05:30