pulseaudio

mirrors/pulseaudio

mirror of https://gitlab.freedesktop.org/pulseaudio/pulseaudio.git synced 2025-11-29 06:59:51 -05:00

Commit graph

Author

SHA1

Message

Date

Khem Raj

3450d1fcfe

remap/arm: Adjust inline asm constraints

gcc10 can emit single-precision registers if the right operand
modifier is not used.

This results in the assembler rejecting the code:

/tmp/ccEG4QpI.s:646: Error: VFP/Neon double precision register expected -- `vtbl.8 d3,{d0,d1},s8'
/tmp/ccEG4QpI.s:678: Error: invalid instruction shape -- `vmul.f32 d0,d0,s8'

Therefore add the %P qualifier to request double registers, since 'w' could
mean the variable is stored in s0..s14 and GCC defaults to printing s0..s14.
Note that those registers also map to d0..d7.

The generated output is exactly the same as with gcc9, and it now also
compiles with gcc10.

It's not well documented in the gcc docs, and there is a ticket for that:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84343

Signed-off-by: Khem Raj <raj.khem@gmail.com>

2020-03-05 13:11:27 -08:00
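
As a rough illustration of the constraint change described in commit 3450d1fcfe, the sketch below multiplies pairs of floats with a single `vmul.f32` on 32-bit ARM, printing every `w` operand through the `%P` modifier so that the assembler sees D registers. The helper name `scale_pairs` and its signature are hypothetical and not taken from PulseAudio; on targets without 32-bit ARM NEON a plain C fallback is compiled instead, so the example stays portable.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__arm__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not PulseAudio code): multiply interleaved pairs of
 * floats by a pair of gains. n is the number of floats and must be even. */
static void scale_pairs(float *dst, const float *src, const float gain[2], size_t n) {
    assert(n % 2 == 0);
#if defined(__arm__) && defined(__ARM_NEON)
    float32x2_t g = vld1_f32(gain);
    for (size_t i = 0; i < n; i += 2) {
        float32x2_t v = vld1_f32(src + i);
        /* %P asks GCC to print the operand as a D register. Per the commit
         * message, a bare %0 with the 'w' constraint may be printed as an
         * S register by gcc10, which the assembler rejects for vmul.f32 dN. */
        __asm__ ("vmul.f32 %P0, %P1, %P2"
                 : "=w" (v)
                 : "w" (v), "w" (g));
        vst1_f32(dst + i, v);
    }
#else
    /* Portable fallback with the same result. */
    for (size_t i = 0; i < n; i += 2) {
        dst[i] = src[i] * gain[0];
        dst[i + 1] = src[i + 1] * gain[1];
    }
#endif
}
```

Calling `scale_pairs(dst, src, gain, 4)` with `gain = {2.0f, 0.5f}` doubles the even-indexed samples and halves the odd-indexed ones on either code path.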

Sascha Silbe

034b77823a

remap: support S32NE work format

So far PulseAudio only supported two different work formats: S16NE if
it's sufficient to represent the input and output formats without loss
of precision and FLOAT32NE in all other cases. For systems that use
S32NE exclusively, this results in unnecessary conversions from S32NE to
FLOAT32NE and back again.

Add S32NE remap operations and make use of them (for the COPY and
TRIVIAL resamplers) if both input and output format are S32NE. This
avoids the back and forth conversions between S32NE and FLOAT32NE,
significantly improving performance for those cases.

2019-03-29 06:04:28 +00:00
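
To make the idea in commit 034b77823a concrete, here is a minimal sketch of a remap step that stays in the S32 integer domain instead of converting to float and back. It is not PulseAudio's actual implementation: the function name `stereo_to_mono_s32` and the equal-weight averaging are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch (not PulseAudio code): downmix interleaved stereo
 * S32 frames to mono without leaving the integer domain. */
static void stereo_to_mono_s32(int32_t *dst, const int32_t *src, size_t n_frames) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n_frames; i++) {
        /* Sum in 64 bits so that two full-scale samples cannot overflow. */
        int64_t sum = (int64_t) src[2 * i] + (int64_t) src[2 * i + 1];
        /* Integer division truncates toward zero; result fits in 32 bits. */
        dst[i] = (int32_t) (sum / 2);
    }
}
```

The point of the sketch is only that no S32NE to FLOAT32NE round trip is involved; a real remapper would apply the configured channel map weights rather than a fixed average.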

Peter Meerwald

54a10eb915

remap: Add ARM NEON optimized remapping and rearrange code

v7:
* cleanups and reduce code; add 4->4 channels mappings, add rearrange code
v6:
* rename mono_to_stereo_float_neon_a9() to mono_to_stereo_float_arm_generic(); note that
Cortex-A8 and -A9/A15 are different, later chips do not benefit from NEON memory transfers
v5:
* 4-channel remapping
* use vrhadd instruction, fix int16 overflow for to-mono case
v4:
* fix for sample length < 4
v3:
* fix test code: init float and int map_table
* different code path for Cortex-A8 and later (-A9, A15, unknown)
* convert from intrinsics to inline assembly
v2:
* add ARM NEON stereo-to-mono remapping code
* static __attribute__ ((noinline)) is necessary to prevent inlining and
  work around gcc 4.6 ICE, see https://bugs.launchpad.net/bugs/936863
* call test code, the reference implementation is obtained using
  pa_get_init_remap_func()
* remove check for NEON flags
v1:
* ARM NEON mono-to-stereo remapping code
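
The v5 note about `vrhadd` and int16 overflow can be illustrated in scalar C. NEON's VRHADD is a rounding halving add, computing (a + b + 1) >> 1 per lane without overflowing the lane width. The sketch below models that for one pair of s16 samples and uses it for a stereo-to-mono downmix; the function names are hypothetical, and this is a scalar model rather than the NEON code from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a rounding halving add on s16: the sum is formed in
 * 32 bits, so full-scale inputs cannot overflow. Assumes >> on a negative
 * int32_t is an arithmetic shift, as on GCC and Clang. */
static int16_t rhadd_s16(int16_t a, int16_t b) {
    return (int16_t) (((int32_t) a + (int32_t) b + 1) >> 1);
}

/* Hypothetical downmix helper: average interleaved stereo s16 frames. */
static void stereo_to_mono_s16(int16_t *dst, const int16_t *src, size_t n_frames) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n_frames; i++)
        dst[i] = rhadd_s16(src[2 * i], src[2 * i + 1]);
}
```

Adding two full-scale samples in a 16-bit lane before halving would wrap around, which is the overflow the halving add avoids.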

note that orig is the time of the special-case C implementation where available, not
the generic matrix remapping implementation

on ARM Cortex-A8 (TI OMAP3 DM3730 @ 1GHz) (Linaro GCC 4.6):

Checking NEON remap (float, mono->stereo)
func: 757474 usec (avg: 7574.74, min = 6165, max = 11963, stddev = 1479.71).
orig: 784882 usec (avg: 7848.82, min = 6835, max = 17639, stddev = 1656.01).
Checking NEON remap (float, mono->4-channel)
func: 1545507 usec (avg: 15455.1, min = 6531, max = 30609, stddev = 2689.6).
orig: 2601413 usec (avg: 26014.1, min = 22796, max = 52979, stddev = 3281.84).
Checking NEON remap (s16, mono->stereo)
func: 343844 usec (avg: 3438.44, min = 1709, max = 8880, stddev = 1180.1).
orig: 474460 usec (avg: 4744.6, min = 4212, max = 7751, stddev = 1069.29).
Checking NEON remap (s16, mono->4-channel)
func: 736574 usec (avg: 7365.74, min = 3784, max = 11902, stddev = 1637.79).
orig: 1062772 usec (avg: 10627.7, min = 7630, max = 17517, stddev = 3011.44).
Checking NEON remap (float, stereo->mono)
func: 571412 usec (avg: 5714.12, min = 4608, max = 15808, stddev = 2131.7).
orig: 4356630 usec (avg: 43566.3, min = 41596, max = 52430, stddev = 2056.79).
Checking NEON remap (float, 4-channel->mono)
func: 1443202 usec (avg: 14432, min = 12298, max = 32349, stddev = 3300).
orig: 9273410 usec (avg: 92734.1, min = 81940, max = 184265, stddev = 23310).
Checking NEON remap (s16, stereo->mono)
func: 185761 usec (avg: 1857.61, min = 1556, max = 4975, stddev = 743.681).
orig: 1204776 usec (avg: 12047.8, min = 10711, max = 16022, stddev = 1596.88).
Checking NEON remap (s16, 4-channel->mono)
func: 482912 usec (avg: 4829.12, min = 4241, max = 9980, stddev = 1270.8).
orig: 1692050 usec (avg: 16920.5, min = 14679, max = 30060, stddev = 2760.7).
Checking NEON remap (float, 4-channel->4-channel)
func: 5324471 usec (avg: 53244.7, min = 49774, max = 87036, stddev = 4255.47).
orig: 73674628 usec (avg: 736746, min = 720338, max = 824128, stddev = 18361.8).
Checking NEON remap (s16, 4-channel->4-channel)
func: 5321320 usec (avg: 53213.2, min = 49591, max = 84443, stddev = 3931.49).
orig: 24122021 usec (avg: 241220, min = 233337, max = 291687, stddev = 9064.31).

Checking NEON remap (float, stereo rearrange)
func: 1116547 usec (avg: 11165.5, min = 9124, max = 27496, stddev = 3345.63).
orig: 1385011 usec (avg: 13850.1, min = 12237, max = 18005, stddev = 1793.05).
Checking NEON remap (s16, stereo rearrange)
func: 517027 usec (avg: 5170.27, min = 4577, max = 9735, stddev = 1215.23).
orig: 1208435 usec (avg: 12084.4, min = 10406, max = 25299, stddev = 2512.02).
Checking NEON remap (float, 4-channel rearrange)
func: 1564667 usec (avg: 15646.7, min = 13855, max = 20172, stddev = 1766.48).
orig: 2970000 usec (avg: 29700, min = 26215, max = 45654, stddev = 2351.07).
Checking NEON remap (s16, 4-channel rearrange)
func: 1088808 usec (avg: 10888.1, min = 9064, max = 23407, stddev = 2465.82).
orig: 1908416 usec (avg: 19084.2, min = 16968, max = 22705, stddev = 1637.46).

Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>

2014-05-25 18:13:27 +02:00
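
The benchmark figures in commit 54a10eb915 are easier to compare as ratios of the `orig` total to the `func` total. The sketch below does that for a few rows copied from the commit message; the `bench_row` type and the function names are made up for this example. With these numbers, float stereo->mono comes out at roughly 7.6x, float 4-channel->4-channel at roughly 13.8x, and float mono->stereo at only about 1.04x.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One benchmark row: total microseconds for the NEON function ("func")
 * and for the reference C implementation ("orig"). */
typedef struct {
    const char *name;
    double func_usec;
    double orig_usec;
} bench_row;

/* Rows copied from the commit message (Cortex-A8, Linaro GCC 4.6). */
static const bench_row rows[] = {
    { "float, mono->stereo",         757474.0,   784882.0 },
    { "float, stereo->mono",         571412.0,  4356630.0 },
    { "s16, stereo->mono",           185761.0,  1204776.0 },
    { "float, 4-channel->4-channel", 5324471.0, 73674628.0 },
};

/* Speedup of the NEON path over the reference path. */
static double speedup(const bench_row *r) {
    assert(r->func_usec > 0.0);
    return r->orig_usec / r->func_usec;
}

/* Print one line per row with the speedup rounded to two decimals. */
static void print_speedups(void) {
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++)
        printf("%-30s %.2fx\n", rows[i].name, speedup(&rows[i]));
}
```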

3 commits