pulseaudio

mirror of https://gitlab.freedesktop.org/pulseaudio/pulseaudio.git synced 2025-11-28 07:00:13 -05:00

Author	SHA1	Message	Date
Tanu Kaskinen	5c0ad422a8	remap_neon: use register r12 instead of r7. When the Thumb instruction set is used and frame pointers are enabled (-fno-omit-frame-pointer), r7 can't be used, because it's used for the frame pointer. Trying to use r7 caused the compilation to fail. Thanks to Andre McCurdy for suggesting[1] this fix, all I had to do was to test that it works. The code builds now, and cpu-remap-test also succeeds. [1] https://lists.openembedded.org/g/openembedded-core/message/136786	2020-07-20 13:27:44 +00:00
Khem Raj	3450d1fcfe	remap/arm: Adjust inline asm constraints. gcc10 can effectively emit single-precision registers if the right operand modifier is not in use. This results in the assembler rejecting the code: /tmp/ccEG4QpI.s:646: Error: VFP/Neon double precision register expected -- `vtbl.8 d3,{d0,d1},s8' /tmp/ccEG4QpI.s:678: Error: invalid instruction shape -- `vmul.f32 d0,d0,s8' Therefore add the %P qualifier to request double registers, since 'w' could mean the variable is stored in s0..s14 and GCC defaults to printing out s0..s14. Note those registers map to d0..d7 also. The output generated is exactly the same as with gcc9, and it now also compiles with gcc10. It's not documented well in the gcc docs and there is a ticket for that: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84343 Signed-off-by: Khem Raj <raj.khem@gmail.com>	2020-03-05 13:11:27 -08:00
Sascha Silbe	034b77823a	remap: support S32NE work format. So far PulseAudio only supported two different work formats: S16NE if it's sufficient to represent the input and output formats without loss of precision and FLOAT32NE in all other cases. For systems that use S32NE exclusively, this results in unnecessary conversions from S32NE to FLOAT32NE and back again. Add S32NE remap operations and make use of them (for the COPY and TRIVIAL resamplers) if both input and output format are S32NE. This avoids the back and forth conversions between S32NE and FLOAT32NE, significantly improving performance for those cases.	2019-03-29 06:04:28 +00:00
Peter Meerwald	54a10eb915	remap: Add ARM NEON optimized remapping and rearrange code

v7:
* cleanups and reduce code; add 4->4 channels mappings, add rearrange code
v6:
* rename mono_to_stereo_float_neon_a9() to mono_to_stereo_float_arm_generic(); note that Cortex-A8 and -A9/A15 are different, later chips do not benefit from NEON memory transfers
v5:
* 4-channel remapping
* use vrhadd instruction, fix int16 overflow for to-mono case
v4:
* fix for sample length < 4
v3:
* fix test code: init float and int map_table
* different code path for Cortex-A8 and later (-A9, A15, unknown)
* convert from intrinsics to inline assembly
v2:
* add ARM NEON stereo-to-mono remapping code
* static __attribute__ ((noinline)) is necessary to prevent inlining and work around gcc 4.6 ICE, see https://bugs.launchpad.net/bugs/936863
* call test code, the reference implementation is obtained using pa_get_init_remap_func()
* remove check for NEON flags
v1:
* ARM NEON mono-to-stereo remapping code

Note that orig is the time of the special-case C implementation where available, not the generic matrix remapping implementation.

On ARM Cortex-A8 (TI OMAP3 DM3730 @ 1GHz) (Linaro GCC 4.6):

Checking NEON remap (float, mono->stereo)
func: 757474 usec (avg: 7574.74, min = 6165, max = 11963, stddev = 1479.71).
orig: 784882 usec (avg: 7848.82, min = 6835, max = 17639, stddev = 1656.01).
Checking NEON remap (float, mono->4-channel)
func: 1545507 usec (avg: 15455.1, min = 6531, max = 30609, stddev = 2689.6).
orig: 2601413 usec (avg: 26014.1, min = 22796, max = 52979, stddev = 3281.84).
Checking NEON remap (s16, mono->stereo)
func: 343844 usec (avg: 3438.44, min = 1709, max = 8880, stddev = 1180.1).
orig: 474460 usec (avg: 4744.6, min = 4212, max = 7751, stddev = 1069.29).
Checking NEON remap (s16, mono->4-channel)
func: 736574 usec (avg: 7365.74, min = 3784, max = 11902, stddev = 1637.79).
orig: 1062772 usec (avg: 10627.7, min = 7630, max = 17517, stddev = 3011.44).
Checking NEON remap (float, stereo->mono)
func: 571412 usec (avg: 5714.12, min = 4608, max = 15808, stddev = 2131.7).
orig: 4356630 usec (avg: 43566.3, min = 41596, max = 52430, stddev = 2056.79).
Checking NEON remap (float, 4-channel->mono)
func: 1443202 usec (avg: 14432, min = 12298, max = 32349, stddev = 3300).
orig: 9273410 usec (avg: 92734.1, min = 81940, max = 184265, stddev = 23310).
Checking NEON remap (s16, stereo->mono)
func: 185761 usec (avg: 1857.61, min = 1556, max = 4975, stddev = 743.681).
orig: 1204776 usec (avg: 12047.8, min = 10711, max = 16022, stddev = 1596.88).
Checking NEON remap (s16, 4-channel->mono)
func: 482912 usec (avg: 4829.12, min = 4241, max = 9980, stddev = 1270.8).
orig: 1692050 usec (avg: 16920.5, min = 14679, max = 30060, stddev = 2760.7).
Checking NEON remap (float, 4-channel->4-channel)
func: 5324471 usec (avg: 53244.7, min = 49774, max = 87036, stddev = 4255.47).
orig: 73674628 usec (avg: 736746, min = 720338, max = 824128, stddev = 18361.8).
Checking NEON remap (s16, 4-channel->4-channel)
func: 5321320 usec (avg: 53213.2, min = 49591, max = 84443, stddev = 3931.49).
orig: 24122021 usec (avg: 241220, min = 233337, max = 291687, stddev = 9064.31).
Checking NEON remap (float, stereo rearrange)
func: 1116547 usec (avg: 11165.5, min = 9124, max = 27496, stddev = 3345.63).
orig: 1385011 usec (avg: 13850.1, min = 12237, max = 18005, stddev = 1793.05).
Checking NEON remap (s16, stereo rearrange)
func: 517027 usec (avg: 5170.27, min = 4577, max = 9735, stddev = 1215.23).
orig: 1208435 usec (avg: 12084.4, min = 10406, max = 25299, stddev = 2512.02).
Checking NEON remap (float, 4-channel rearrange)
func: 1564667 usec (avg: 15646.7, min = 13855, max = 20172, stddev = 1766.48).
orig: 2970000 usec (avg: 29700, min = 26215, max = 45654, stddev = 2351.07).
Checking NEON remap (s16, 4-channel rearrange)
func: 1088808 usec (avg: 10888.1, min = 9064, max = 23407, stddev = 2465.82).
orig: 1908416 usec (avg: 19084.2, min = 16968, max = 22705, stddev = 1637.46).

Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>	2014-05-25 18:13:27 +02:00
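
The v5 note in commit 54a10eb915 mentions using vrhadd and fixing an int16 overflow in the to-mono case. As a rough illustration only, the portable C sketch below shows the arithmetic that a rounding halving add performs, (a + b + 1) >> 1, with the sum widened to 32 bits so it cannot overflow. The function names rhadd_s16 and stereo_to_mono_s16 are hypothetical and are not taken from the PulseAudio sources; the actual commit uses NEON inline assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: rounding average of two signed 16-bit samples,
 * matching the vrhadd.s16 result (a + b + 1) >> 1. The sum is widened to
 * 32 bits so that, for example, 32767 + 32767 cannot overflow. */
static int16_t rhadd_s16(int16_t a, int16_t b) {
    int32_t sum = (int32_t) a + (int32_t) b + 1;
    /* Floor division by 2 written without right-shifting a negative
     * value, which is implementation-defined in C. */
    int32_t half = (sum >= 0) ? (sum / 2) : -((-sum + 1) / 2);
    return (int16_t) half;
}

/* Hypothetical scalar reference: downmix n interleaved stereo frames
 * (L, R, L, R, ...) to n mono samples. */
static void stereo_to_mono_s16(int16_t *dst, const int16_t *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = rhadd_s16(src[2 * i], src[2 * i + 1]);
}
```

A naive (int16_t) (a + b) truncated before halving would wrap for loud samples; halving the widened sum avoids that at the cost of one extra bit of intermediate precision.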

4 commits
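
The work-format rule described in commit 034b77823a can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PulseAudio implementation: the enum sample_fmt, the helper fits_in_s16(), and choose_work_format() are hypothetical names, the set of formats treated as fitting in S16NE is assumed, and the real code additionally restricts the S32NE path to the COPY and TRIVIAL resamplers.

```c
/* Hypothetical sample format tags (not PulseAudio's pa_sample_format_t). */
typedef enum {
    FMT_U8,
    FMT_S16NE,
    FMT_S24NE,
    FMT_S32NE,
    FMT_FLOAT32NE
} sample_fmt;

/* Assumption: only 8-bit and 16-bit integer formats can be represented
 * in S16NE without loss of precision. */
static int fits_in_s16(sample_fmt f) {
    return f == FMT_U8 || f == FMT_S16NE;
}

/* Pick the intermediate (work) format for remapping:
 *  - S16NE when both ends fit in 16 bits,
 *  - S32NE when both ends are already S32NE (avoids the round trip
 *    through FLOAT32NE described in the commit message),
 *  - FLOAT32NE in all other cases. */
static sample_fmt choose_work_format(sample_fmt in, sample_fmt out) {
    if (fits_in_s16(in) && fits_in_s16(out))
        return FMT_S16NE;
    if (in == FMT_S32NE && out == FMT_S32NE)
        return FMT_S32NE;
    return FMT_FLOAT32NE;
}
```

With this rule, a pipeline that is S32NE on both sides stays in S32NE throughout, while a mixed S32NE/S16NE pipeline still falls back to FLOAT32NE.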