Commit graph

2 commits

Author SHA1 Message Date
Sascha Silbe
034b77823a remap: support S32NE work format
So far PulseAudio only supported two different work formats: S16NE if
it's sufficient to represent the input and output formats without loss
of precision and FLOAT32NE in all other cases. For systems that use
S32NE exclusively, this results in unnecessary conversions from S32NE to
FLOAT32NE and back again.

Add S32NE remap operations and make use of them (for the COPY and
TRIVIAL resamplers) if both input and output format are S32NE. This
avoids the back and forth conversions between S32NE and FLOAT32NE,
significantly improving performance for those cases.
2019-03-29 06:04:28 +00:00
Peter Meerwald
54a10eb915 remap: Add ARM NEON optimized remapping and rearrange code
v7:
* cleanups and reduce code; add 4->4 channels mappings, add rearrange code
v6:
* rename mono_to_stereo_float_neon_a9() to mono_to_stereo_float_arm_generic(); note that
Cortex-A8 and -A9/A15 are different, later chips do not benefit from NEON memory transfers
v5:
* 4-channel remapping
* use vrhadd instruction, fix int16 overflow for to-mono case
v4:
* fix for sample length < 4
v3:
* fix test code: init float and int map_table
* different code path for Cortex-A8 and later (-A9, A15, unknown)
* convert from intrinsics to inline assembly
v2:
* add ARM NEON stereo-to-mono remapping code
* static __attribute__ ((noinline)) is necessary to prevent inlining and
  work around gcc 4.6 ICE, see https://bugs.launchpad.net/bugs/936863
* call test code, the reference implementation is obtained using
  pa_get_init_remap_func()
* remove check for NEON flags
v1:
* ARM NEON mono-to-stereo remapping code

note that orig is the time of the special-case C implementation where available, not
the generic matric remapping implementation

on ARM Cortex-A8 (TI OMAP3 DM3730 @ 1GHz) (Linaro GCC 4.6):

Checking NEON remap (float, mono->stereo)
func: 757474 usec (avg: 7574.74, min = 6165, max = 11963, stddev = 1479.71).
orig: 784882 usec (avg: 7848.82, min = 6835, max = 17639, stddev = 1656.01).
Checking NEON remap (float, mono->4-channel)
func: 1545507 usec (avg: 15455.1, min = 6531, max = 30609, stddev = 2689.6).
orig: 2601413 usec (avg: 26014.1, min = 22796, max = 52979, stddev = 3281.84).
Checking NEON remap (s16, mono->stereo)
func: 343844 usec (avg: 3438.44, min = 1709, max = 8880, stddev = 1180.1).
orig: 474460 usec (avg: 4744.6, min = 4212, max = 7751, stddev = 1069.29).
Checking NEON remap (s16, mono->4-channel)
func: 736574 usec (avg: 7365.74, min = 3784, max = 11902, stddev = 1637.79).
orig: 1062772 usec (avg: 10627.7, min = 7630, max = 17517, stddev = 3011.44).
Checking NEON remap (float, stereo->mono)
func: 571412 usec (avg: 5714.12, min = 4608, max = 15808, stddev = 2131.7).
orig: 4356630 usec (avg: 43566.3, min = 41596, max = 52430, stddev = 2056.79).
Checking NEON remap (float, 4-channel->mono)
func: 1443202 usec (avg: 14432, min = 12298, max = 32349, stddev = 3300).
orig: 9273410 usec (avg: 92734.1, min = 81940, max = 184265, stddev = 23310).
Checking NEON remap (s16, stereo->mono)
func: 185761 usec (avg: 1857.61, min = 1556, max = 4975, stddev = 743.681).
orig: 1204776 usec (avg: 12047.8, min = 10711, max = 16022, stddev = 1596.88).
Checking NEON remap (s16, 4-channel->mono)
func: 482912 usec (avg: 4829.12, min = 4241, max = 9980, stddev = 1270.8).
orig: 1692050 usec (avg: 16920.5, min = 14679, max = 30060, stddev = 2760.7).
Checking NEON remap (float, 4-channel->4-channel)
func: 5324471 usec (avg: 53244.7, min = 49774, max = 87036, stddev = 4255.47).
orig: 73674628 usec (avg: 736746, min = 720338, max = 824128, stddev = 18361.8).
Checking NEON remap (s16, 4-channel->4-channel)
func: 5321320 usec (avg: 53213.2, min = 49591, max = 84443, stddev = 3931.49).
orig: 24122021 usec (avg: 241220, min = 233337, max = 291687, stddev = 9064.31).

Checking NEON remap (float, stereo rearrange)
func: 1116547 usec (avg: 11165.5, min = 9124, max = 27496, stddev = 3345.63).
orig: 1385011 usec (avg: 13850.1, min = 12237, max = 18005, stddev = 1793.05).
Checking NEON remap (s16, stereo rearrange)
func: 517027 usec (avg: 5170.27, min = 4577, max = 9735, stddev = 1215.23).
orig: 1208435 usec (avg: 12084.4, min = 10406, max = 25299, stddev = 2512.02).
Checking NEON remap (float, 4-channel rearrange)
func: 1564667 usec (avg: 15646.7, min = 13855, max = 20172, stddev = 1766.48).
orig: 2970000 usec (avg: 29700, min = 26215, max = 45654, stddev = 2351.07).
Checking NEON remap (s16, 4-channel rearrange)
func: 1088808 usec (avg: 10888.1, min = 9064, max = 23407, stddev = 2465.82).
orig: 1908416 usec (avg: 19084.2, min = 16968, max = 22705, stddev = 1637.46).

Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
2014-05-25 18:13:27 +02:00