pulseaudio

mirror of https://gitlab.freedesktop.org/pulseaudio/pulseaudio.git synced 2026-06-25 13:14:15 -04:00

Author	SHA1	Message	Date
Johan Hedberg	8c0cca7905	bluetooth: sbc: Reduce for-loop induced indentation in sbc_unpack_frame	2011-10-28 15:43:56 +02:00
Siarhei Siamashka	00602537ba	bluetooth: sbc: overflow bugfix and audio decoding quality improvement The "(((audio_sample << 1) | 1) << frame->scale_factor[ch][sb])" part of expression "frame->sb_sample[blk][ch][sb] = (((audio_sample << 1) | 1) << frame->scale_factor[ch][sb]) / levels[ch][sb] - (1 << frame->scale_factor[ch][sb])" in "sbc_unpack_frame" function can sometimes overflow 32-bit signed int. This problem can be reproduced by first using bitpool 128 and encoding some random noise data, and then feeding it to sbc decoder. The obvious thing to do would be to change "audio_sample" variable type to uint32_t. However the problem is a little bit more complicated. According to the section "12.6.2 Scale Factors" of A2DP spec: scalefactor[ch][sb] = pow(2.0, (scale_factor[ch][sb] + 1)) And according to "12.6.4 Reconstruction of the Subband Samples": sb_sample[blk][ch][sb] = scalefactor[ch][sb] * ((audio_sample[blk][ch][sb]*2.0+1.0) / levels[ch][sb]-1.0); Hence the current code for calculating "sb_sample[blk][ch][sb]" is not quite correct, because it loses one least significant bit of sample data and passes twice smaller sample values to the synthesis filter (the filter also deviates from the spec to compensate this). This all has quite a noticeable impact on audio quality. Moreover, it makes sense to keep a few extra bits of precision here in order to minimize rounding errors. So the proposed patch introduces a new SBCDEC_FIXED_EXTRA_BITS constant and uses uint64_t data type for intermediate calculations in order to safeguard against overflows. This patch intentionally addresses only the quality issue, but performance can be also improved later (like replacing division with multiplication by reciprocal). Test for the difference of sbc encoding/decoding roundtrip vs. the original audio file for joint stereo, bitpool 128, 8 subbands and http://media.xiph.org/sintel/sintel-master-st.flac sample demonstrates some quality improvement: === before === --- comparing original / sbc_encoder.exe + sbcdec --- stddev: 4.64 PSNR: 82.97 bytes:170495708/170496000 === after === --- comparing original / sbc_encoder.exe + sbcdec --- stddev: 1.95 PSNR: 90.50 bytes:170495708/170496000	2011-10-28 15:43:48 +02:00
Maarten Bosmans	3d04a05736	bluetooth/sbc: Use __asm__ keyword	2011-09-03 12:16:49 +02:00
Paul Menzel	0bed5caf3b	bluetooth: run `make update-sbc` to pull in build fix for thumb mode This update pulls in commit c495077c [1] to fix a build error. commit c495077cf8a8c37afd90875ec5a5b16b294be15e Author: Siarhei Siamashka <siarhei.siamashka@nokia.com> Date: Tue Mar 29 01:57:39 2011 +0300 sbc: better compatibility with ARM thumb/thumb2 ARM assembly optimizations fail to compile in thumb mode, but are fine for thumb2. Update ifdefs in the code to make use of ARM assembly only when it is safe and also make sure that no optimizations are missed when compiling for thumb2. The problem was reported by Paul Menzel: https://tango.0pointer.de/pipermail/pulseaudio-discuss/2011-February/009022.html This patch is tested with OpenEmbedded using `minimal-uclibc` for `MACHINE = "at91sam9260ek"`. Note that changes to ipc.h from `8f3ef04b` had to be manually reapplied. [1] http://git.kernel.org/?p=bluetooth/bluez.git;a=commit;h=c495077cf8a8c37afd90875ec5a5b16b294be15e	2011-03-29 21:12:17 +01:00
Colin Guthrie	b676f89d85	bluetooth: Run 'make update-sbc' Note that changes to ipc.h from `8f3ef04b` had to be manually reapplied.	2011-03-20 12:49:49 +00:00
Siarhei Siamashka	ee93eff6b7	sbc: add iwmmxt optimization for sbc for pxa series cpu Benchmarked on ARM PXA platform: === Before (4 bands) ==== $ time ./sbcenc_orig -s 4 long.au > /dev/null real 0m 2.44s user 0m 2.39s sys 0m 0.05s === After (4 bands) ==== $ time ./sbcenc -s 4 long.au > /dev/null real 0m 1.59s user 0m 1.49s sys 0m 0.10s === Before (8 bands) ==== $ time ./sbcenc_orig -s 8 long.au > /dev/null real 0m 4.05s user 0m 3.98s sys 0m 0.07s === After (8 bands) ==== $ time ./sbcenc -s 8 long.au > /dev/null real 0m 1.48s user 0m 1.41s sys 0m 0.06s === Before (a2dp usage) ==== $ time ./sbcenc_orig -b53 -s8 -j long.au > /dev/null real 0m 4.51s user 0m 4.41s sys 0m 0.10s === After (a2dp usage) ==== $ time ./sbcenc -b53 -s8 -j long.au > /dev/null real 0m 2.05s user 0m 1.99s sys 0m 0.06s	2011-03-14 15:45:39 -03:00
Siarhei Siamashka	82ef8346d8	sbc: ARMv6 optimized version of analysis filter for SBC encoder The optimized filter gets enabled when the code is compiled with -mcpu=/-march options set to target the processors which support ARMv6 instructions. This code is also disabled when NEON is used (which is a lot better alternative). For additional safety ARM EABI is required and thumb mode should not be used. Benchmarks from ARM11: == 8 subbands == $ time ./sbcenc -b53 -s8 -j test.au > /dev/null real 0m 35.65s user 0m 34.17s sys 0m 1.28s $ time ./sbcenc.armv6 -b53 -s8 -j test.au > /dev/null real 0m 17.29s user 0m 15.47s sys 0m 0.67s == 4 subbands == $ time ./sbcenc -b53 -s4 -j test.au > /dev/null real 0m 25.28s user 0m 23.76s sys 0m 1.32s $ time ./sbcenc.armv6 -b53 -s4 -j test.au > /dev/null real 0m 18.64s user 0m 15.78s sys 0m 2.22s	2011-03-14 15:45:27 -03:00
Siarhei Siamashka	51d5f3c9fd	sbc: added "cc" to the clobber list of mmx inline assembly In the case of scale factors calculation optimizations, the inline assembly code has instructions which update flags register, but "cc" was not mentioned in the clobber list. When optimizing code, gcc theoretically is allowed to do a comparison before the inline assembly block, and a conditional branch after it which would lead to a problem if the flags register gets clobbered. While this is apparently not happening in practice with the current versions of gcc, the clobber list needs to be corrected. Regarding the other inline assembly blocks. While most likely it is actually unnecessary based on quick review, "cc" is also added there to the clobber list because it should have no impact on performance in practice. It's kind of cargo cult, but relieves us from the need to track the potential updates of flags register in all these places.	2011-03-14 15:44:47 -03:00
Siarhei Siamashka	5423dc1644	sbc: faster 'sbc_calculate_bits' function By using SBC_ALWAYS_INLINE trick, the implementation of 'sbc_calculate_bits' function is split into two branches, each having 'subband' variable value known at compile time. It helps the compiler to generate more optimal code by saving at least one extra register, and also provides more obvious opportunities for loops unrolling. Benchmarked on ARM Cortex-A8: == Before: == $ time ./sbcenc -b53 -s8 -j test.au > /dev/null real 0m3.989s user 0m3.602s sys 0m0.391s samples % image name symbol name 26057 32.6128 sbcenc sbc_pack_frame 20003 25.0357 sbcenc sbc_analyze_4b_8s_neon 14220 17.7977 sbcenc sbc_calculate_bits 8498 10.6361 no-vmlinux /no-vmlinux 5300 6.6335 sbcenc sbc_calc_scalefactors_j_neon 3235 4.0489 sbcenc sbc_enc_process_input_8s_be_neon 2172 2.7185 sbcenc sbc_encode == After: == $ time ./sbcenc -b53 -s8 -j test.au > /dev/null real 0m3.652s user 0m3.195s sys 0m0.445s samples % image name symbol name 26207 36.0095 sbcenc sbc_pack_frame 19820 27.2335 sbcenc sbc_analyze_4b_8s_neon 8629 11.8566 no-vmlinux /no-vmlinux 6988 9.6018 sbcenc sbc_calculate_bits 5094 6.9994 sbcenc sbc_calc_scalefactors_j_neon 3351 4.6044 sbcenc sbc_enc_process_input_8s_be_neon 2182 2.9982 sbcenc sbc_encode	2011-03-14 15:31:30 -03:00
Siarhei Siamashka	8997917000	sbc: slightly faster 'sbc_calc_scalefactors_neon' Previous variant was basically derived from C and MMX implementations. Now new variant makes use of 'vmax' instruction, which is available in NEON and can do this job faster. The same method for calculating scale factors is also used in 'sbc_calc_scalefactors_j_neon'. Benchmarked without joint stereo on ARM Cortex-A8: == Before: == $ time ./sbcenc -b53 -s8 test.au > /dev/null real 0m3.851s user 0m3.375s sys 0m0.469s samples % image name symbol name 26260 34.2672 sbcenc sbc_pack_frame 20013 26.1154 sbcenc sbc_analyze_4b_8s_neon 13796 18.0027 sbcenc sbc_calculate_bits 8388 10.9457 no-vmlinux /no-vmlinux 3229 4.2136 sbcenc sbc_enc_process_input_8s_be_neon 2408 3.1422 sbcenc sbc_calc_scalefactors_neon 2093 2.7312 sbcenc sbc_encode == After: == $ time ./sbcenc -b53 -s8 test.au > /dev/null real 0m3.796s user 0m3.344s sys 0m0.438s samples % image name symbol name 26582 34.8726 sbcenc sbc_pack_frame 20032 26.2797 sbcenc sbc_analyze_4b_8s_neon 13808 18.1146 sbcenc sbc_calculate_bits 8374 10.9858 no-vmlinux /no-vmlinux 3187 4.1810 sbcenc sbc_enc_process_input_8s_be_neon 2027 2.6592 sbcenc sbc_encode 1766 2.3168 sbcenc sbc_calc_scalefactors_neon	2011-03-14 15:29:38 -03:00
Siarhei Siamashka	68bdf5526e	sbc: ARM NEON optimizations for input permutation in SBC encoder Using SIMD optimizations for 'sbc_enc_process_input_*' functions provides a modest, but consistent speedup in all SBC encoding cases. Benchmarked on ARM Cortex-A8: == Before: == $ time ./sbcenc -b53 -s8 -j test.au > /dev/null real 0m4.389s user 0m3.969s sys 0m0.422s samples % image name symbol name 26234 29.9625 sbcenc sbc_pack_frame 20057 22.9076 sbcenc sbc_analyze_4b_8s_neon 14306 16.3393 sbcenc sbc_calculate_bits 9866 11.2682 sbcenc sbc_enc_process_input_8s_be 8506 9.7149 no-vmlinux /no-vmlinux 5219 5.9608 sbcenc sbc_calc_scalefactors_j_neon 2280 2.6040 sbcenc sbc_encode 661 0.7549 libc-2.10.1.so memcpy == After: == $ time ./sbcenc -b53 -s8 -j test.au > /dev/null real 0m3.989s user 0m3.602s sys 0m0.391s samples % image name symbol name 26057 32.6128 sbcenc sbc_pack_frame 20003 25.0357 sbcenc sbc_analyze_4b_8s_neon 14220 17.7977 sbcenc sbc_calculate_bits 8498 10.6361 no-vmlinux /no-vmlinux 5300 6.6335 sbcenc sbc_calc_scalefactors_j_neon 3235 4.0489 sbcenc sbc_enc_process_input_8s_be_neon 2172 2.7185 sbcenc sbc_encode	2011-03-14 15:28:31 -03:00
Siarhei Siamashka	718fe73cab	sbc: ARM NEON optimized joint stereo processing in SBC encoder Improves SBC encoding performance when joint stereo is used, which is a typical A2DP configuration. Benchmarked on ARM Cortex-A8: == Before: == $ time ./sbcenc -b53 -s8 -j test.au > /dev/null real 0m5.239s user 0m4.805s sys 0m0.430s samples % image name symbol name 26083 25.0856 sbcenc sbc_pack_frame 21548 20.7240 sbcenc sbc_calc_scalefactors_j 19910 19.1486 sbcenc sbc_analyze_4b_8s_neon 14377 13.8272 sbcenc sbc_calculate_bits 9990 9.6080 sbcenc sbc_enc_process_input_8s_be 8667 8.3356 no-vmlinux /no-vmlinux 2263 2.1765 sbcenc sbc_encode 696 0.6694 libc-2.10.1.so memcpy == After: == $ time ./sbcenc -b53 -s8 -j test.au > /dev/null real 0m4.389s user 0m3.969s sys 0m0.422s samples % image name symbol name 26234 29.9625 sbcenc sbc_pack_frame 20057 22.9076 sbcenc sbc_analyze_4b_8s_neon 14306 16.3393 sbcenc sbc_calculate_bits 9866 11.2682 sbcenc sbc_enc_process_input_8s_be 8506 9.7149 no-vmlinux /no-vmlinux 5219 5.9608 sbcenc sbc_calc_scalefactors_j_neon 2280 2.6040 sbcenc sbc_encode 661 0.7549 libc-2.10.1.so memcpy	2011-03-14 15:27:30 -03:00
Siarhei Siamashka	177948a6f2	sbc: fix signedness of parameters The written parameter of sbc_encode can be negative so it should be ssize_t instead of size_t.	2011-03-14 15:21:53 -03:00
Siarhei Siamashka	fd7dc68ded	sbc: ARM NEON optimization for scale factors calculation Improves SBC encoding performance when joint stereo is not used. Benchmarked on ARM Cortex-A8: == Before: == $ time ./sbcenc -b53 -s8 test.au > /dev/null real 0m4.756s user 0m4.313s sys 0m0.438s samples % image name symbol name 2569 27.6296 sbcenc sbc_pack_frame 1934 20.8002 sbcenc sbc_analyze_4b_8s_neon 1386 14.9064 sbcenc sbc_calculate_bits 1221 13.1319 sbcenc sbc_calc_scalefactors 996 10.7120 sbcenc sbc_enc_process_input_8s_be 878 9.4429 no-vmlinux /no-vmlinux 204 2.1940 sbcenc sbc_encode 56 0.6023 libc-2.10.1.so memcpy == After: == $ time ./sbcenc -b53 -s8 test.au > /dev/null real 0m4.220s user 0m3.797s sys 0m0.422s samples % image name symbol name 2563 31.3249 sbcenc sbc_pack_frame 1892 23.1239 sbcenc sbc_analyze_4b_8s_neon 1368 16.7196 sbcenc sbc_calculate_bits 961 11.7453 sbcenc sbc_enc_process_input_8s_be 836 10.2176 no-vmlinux /no-vmlinux 262 3.2022 sbcenc sbc_calc_scalefactors_neon 199 2.4322 sbcenc sbc_encode 49 0.5989 libc-2.10.1.so memcpy	2011-03-14 15:18:46 -03:00
Siarhei Siamashka	1f617ea9ec	sbc: MMX optimization for scale factors calculation Improves SBC encoding performance when joint stereo is not used. Benchmarked on Pentium-M: == Before: == $ time ./sbcenc -b53 -s8 test.au > /dev/null real 0m1.439s user 0m1.336s sys 0m0.104s samples % image name symbol name 8642 33.7473 sbcenc sbc_pack_frame 5873 22.9342 sbcenc sbc_analyze_4b_8s_mmx 4435 17.3188 sbcenc sbc_calc_scalefactors 4285 16.7331 sbcenc sbc_calculate_bits 1942 7.5836 sbcenc sbc_enc_process_input_8s_be 322 1.2574 sbcenc sbc_encode == After: == $ time ./sbcenc -b53 -s8 test.au > /dev/null real 0m1.319s user 0m1.220s sys 0m0.084s samples % image name symbol name 8706 37.9959 sbcenc sbc_pack_frame 5740 25.0513 sbcenc sbc_analyze_4b_8s_mmx 4307 18.7972 sbcenc sbc_calculate_bits 1937 8.4537 sbcenc sbc_enc_process_input_8s_be 1801 7.8602 sbcenc sbc_calc_scalefactors_mmx 307 1.3399 sbcenc sbc_encode	2011-03-14 15:17:31 -03:00
Siarhei Siamashka	c2b2fc1640	sbc: new 'sbc_calc_scalefactors_j' function added to sbc primitives The code for scale factors calculation with joint stereo support has been moved to a separate function. It can get platform-specific SIMD optimizations later for best possible performance. But even this change in C code improves performance because of the use of __builtin_clz() instead of loops similar to what was done to sbc_calc_scalefactors earlier. Also technically it does loop unrolling by processing two channels at once, which might be either good or bad for performance (if the registers pressure is increased and more data is spilled to memory). But the benchmark from 32-bit x86 system (pentium-m) shows that it got clearly faster: $ time ./sbcenc.old -b53 -s8 -j test.au > /dev/null real 0m1.868s user 0m1.808s sys 0m0.048s $ time ./sbcenc.new -b53 -s8 -j test.au > /dev/null real 0m1.742s user 0m1.668s sys 0m0.064s	2011-03-14 15:16:30 -03:00
Gustavo F. Padovan	16a05e52c6	sbc: Fix redundant null check on calling free() Issues found by smatch static check: http://smatch.sourceforge.net/	2011-03-14 15:09:50 -03:00
Siarhei Siamashka	84d91fb708	sbc: added saturated clipping of decoder output to 16-bit This prevents overflows and audible artefacts for the audio files which originally had loudness maximized. Music from audio CD disks is an example of such files, see http://en.wikipedia.org/wiki/Loudness_war	2011-03-14 15:07:38 -03:00
Siarhei Siamashka	4d2f0daba1	sbc: ensure 16-byte buffer position alignment for 4 subbands encoding Buffer position in X array was not always 16-bytes aligned. Strict 16-byte alignment is strictly required for powerpc altivec simd optimizations because altivec does not have support for unaligned vector loads at all.	2011-03-14 15:01:19 -03:00
Luiz Augusto von Dentz	e4eb467010	build: move sbc related files to its own directory This should make it easier to apply patches from BlueZ which also uses sbc subdir for this files.	2011-03-14 14:52:52 -03:00
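
The decoder-side fixes described in commits 00602537ba (64-bit intermediates for subband sample reconstruction) and 84d91fb708 (saturated clipping to 16 bits) can be illustrated with a minimal sketch. This is not the code from those commits: the names sketch_reconstruct and sketch_clip16 are hypothetical, the extra precision bits mentioned in the commit message (SBCDEC_FIXED_EXTRA_BITS) are left out, and treating levels == 0 as silence is an assumption of this sketch. It only follows the formula quoted from section 12.6.4 of the A2DP spec, with scalefactor = 2^(scale_factor + 1).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: integer form of
 *   sb_sample = 2^(scale_factor + 1) * ((2 * audio_sample + 1) / levels - 1)
 * The numerator is widened to 64 bits before shifting, because
 * (2 * audio_sample + 1) << (scale_factor + 1) can exceed 32 bits
 * when both audio_sample and scale_factor are large.
 */
int32_t sketch_reconstruct(uint32_t audio_sample, uint32_t levels,
                           unsigned scale_factor)
{
    uint64_t scaled;

    /* Assumption of this sketch: 4-bit scale factors, so at most 15. */
    assert(scale_factor <= 15);

    /* Assumption of this sketch: no quantization levels means silence. */
    if (levels == 0)
        return 0;

    scaled = (((uint64_t)audio_sample << 1) | 1) << (scale_factor + 1);

    return (int32_t)(scaled / levels) - ((int32_t)1 << (scale_factor + 1));
}

/* Hypothetical helper: saturate a 32-bit sample into the int16_t range. */
int16_t sketch_clip16(int32_t sample)
{
    if (sample > INT16_MAX)
        return INT16_MAX;
    if (sample < INT16_MIN)
        return INT16_MIN;
    return (int16_t)sample;
}
```

With audio_sample = 65534, levels = 65535 and scale_factor = 15, the shifted numerator is 131069 * 65536, which does not fit in a 32-bit integer; the 64-bit intermediate keeps the division exact enough to return 65534.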

20 commits
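
The "SBC_ALWAYS_INLINE trick" mentioned in commit 5423dc1644 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from that commit: SKETCH_ALWAYS_INLINE, sketch_sum and sketch_sum_dispatch are hypothetical names, and the loop body is a stand-in for the real bit allocation logic. The idea shown is that a force-inlined generic function is called from two branches, each passing a literal subband count, so the compiler can specialize each copy for a count known at compile time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro: force inlining on GCC-compatible compilers. */
#if defined(__GNUC__)
#define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define SKETCH_ALWAYS_INLINE inline
#endif

/* Generic body: 'subbands' is an ordinary run-time parameter here. */
static SKETCH_ALWAYS_INLINE int64_t sketch_sum(const int32_t *values,
                                               int subbands)
{
    int64_t sum = 0;

    for (int sb = 0; sb < subbands; sb++)
        sum += values[sb];

    return sum;
}

/*
 * Dispatcher: each branch inlines sketch_sum with a constant argument,
 * so the loop bound is known at compile time in both copies and the
 * compiler is free to unroll each loop separately.
 */
int64_t sketch_sum_dispatch(const int32_t *values, int subbands)
{
    /* Assumption of this sketch: only 4 or 8 subbands are valid. */
    assert(subbands == 4 || subbands == 8);

    if (subbands == 4)
        return sketch_sum(values, 4);
    return sketch_sum(values, 8);
}
```

Whether this actually produces faster code depends on the compiler and target; the commit message reports its own measurements on ARM Cortex-A8.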