similar to volume functions, simplifies leftover samples handling
for SIMD'd code path
use concrete pointer type (e.g. int16_t*) instead of void*,
saves several casts
Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
move code to function pa_mult_s16_volume() in sample-util.h
use 64 bit integers on 64 bit platforms (it's faster)
on i5, 2.5GHz (64-bit)
Running suite(s): Mult-s16
32 bit mult: 1272300 usec (avg: 12723, min = 12533, max = 18749, stddev = 620.48).
64 bit mult: 852241 usec (avg: 8522.41, min = 8420, max = 9148, stddev = 109.388).
100%: Checks: 1, Failures: 0, Errors: 0
on Pentium D, 3.4GHz (32-bit)
Running suite(s): Mult-s16
32 bit mult: 2228504 usec (avg: 22285, min = 18775, max = 29648, stddev = 3865.59).
64 bit mult: 5546861 usec (avg: 55468.6, min = 55028, max = 64924, stddev = 978.981).
100%: Checks: 1, Failures: 0, Errors: 0
on TI DM3730, Cortex-A8, 800MHz (32-bit)
Running suite(s): Mult-s16
32 bit mult: 23708900 usec (avg: 237089, min = 191864, max = 557312, stddev = 77503.6).
64 bit mult: 22190039 usec (avg: 221900, min = 177978, max = 480469, stddev = 68520.5).
100%: Checks: 1, Failures: 0, Errors: 0
there is a test program called mult-s16-test which checks that the functions compute the
same results, and compares runtime
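As a rough illustration of the two code paths being compared, here is a minimal sketch. The names mult_s16_volume_32() and mult_s16_volume_64() are hypothetical (the commit only names pa_mult_s16_volume()), and the 16.16 fixed-point layout of the volume factor is an assumption:

```c
#include <stdint.h>

/* 32-bit path: split the (assumed) 16.16 fixed-point volume into a high and
 * a low half so that no intermediate product needs more than 32 bits.
 * Relies on arithmetic right shift of negative values, as GCC and Clang do. */
static int32_t mult_s16_volume_32(int16_t v, int32_t cv) {
    int32_t hi = cv >> 16;
    int32_t lo = cv & 0xFFFF;
    return ((v * lo) >> 16) + (v * hi);
}

/* 64-bit path: one widening multiply followed by a single shift. */
static int32_t mult_s16_volume_64(int16_t v, int32_t cv) {
    return (int32_t) ((v * (int64_t) cv) >> 16);
}
```

Both variants return the same value for every input; which one is faster depends on the platform, as the numbers above show.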
Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
use individual functions for mixing streams with different sample formats
instead of a huge case block in pa_mix()
shorter functions, prepare for optimized code path
Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
idea is to allow optimized code path (similar to volume code)
and rework/specialize mixing cases to enable runtime performance improvements
no functionality changes in this patch
Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
The patch intends to reduce computational load when resampling AND remapping. The PA
resampler performs the following steps:
sample format conversion -> remapping -> resampling -> sample format conversion
In case the number of output channels is higher than the number of input channels, the
resampler has to be run more often than necessary. E.g. in case of mono to 4-channel remapping,
the resampler runs on 4 channels separately.
To improve this, the PA resampler pipeline is made adaptive:
if out-channels <= in-channels:
sample format conversion -> remapping -> resampling -> sample format conversion
if out-channels > in-channels:
sample format conversion -> resampling -> remapping -> sample format conversion
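The ordering rule above can be sketched as follows. Both function names are hypothetical and not taken from the resampler code; the sketch only shows the decision and its effect on the number of channels the resampler processes:

```c
#include <stdbool.h>

/* Returns true when remapping should run before resampling, so that the
 * resampler always works on the smaller of the two channel counts. */
static bool remap_before_resample(unsigned in_channels, unsigned out_channels) {
    return out_channels <= in_channels;
}

/* Number of channels the resampler has to process under that rule. */
static unsigned resampler_channels(unsigned in_channels, unsigned out_channels) {
    return remap_before_resample(in_channels, out_channels) ? out_channels
                                                            : in_channels;
}
```

For the mono to 4-channel case mentioned above, resampler_channels(1, 4) is 1 instead of 4.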
Signed-off-by: Peter Meerwald <p.meerwald@bct-electronic.com>
Remixing one channel map to another is (except for special cases) done
via a linear mapping between channels, whose corresponding matrix is
computed by calc_map_table(). The k-th row in this matrix corresponds to
the coefficients of the linear combination of the input channels that
result in the k-th output channel. In order to avoid clipping of samples
we require that the sum of these coefficients is (at most) 1. This
commit ensures this.
Prior to this commit tests/remix-test.c gives 52 of 132 matrices that
violate this property. For example:
'front-left,front-right,front-center,lfe' -> 'front-left,front-right'
            prior to this commit                 after this commit
       I00   I01   I02   I03                I00   I01   I02   I03
    +------------------------            +------------------------
O00 | 0.750 0.000 0.375 0.375        O00 | 0.533 0.000 0.267 0.200
O01 | 0.000 0.750 0.375 0.375        O01 | 0.000 0.533 0.267 0.200
Building the matrix is done in several steps. However, the measures
taken after each step are insufficient to preserve a row sum of 1.0 (or
to leave it at 0.0). The current patch adds a post-processing step that
checks for each row whether the sum exceeds 1.0 and, if necessary,
normalizes this row. This allows for further simplifications:
- The insufficient normalizations after some steps are removed. Gains
are adapted to (partially) resemble the old matrices.
- Handling unconnected input channels becomes a lot simpler.
- Separate the cases with PA_RESAMPLER_NO_REMAP or PA_RESAMPLER_NO_REMIX
set and remove redundant if-conditions.
- Fix C90 compiler warning due to mixing code and variable declaration.
- Do not repeatedly count number of left, right and center channels in
the input channel map.
The logic of calc_map_table() remains unaltered.
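The post-processing step described above might look roughly like this. It is a sketch only: normalize_rows() and MAX_CHANNELS are hypothetical names, and the matrix layout is an assumption rather than the one used in calc_map_table():

```c
#include <stddef.h>

#define MAX_CHANNELS 32 /* hypothetical bound for this sketch */

/* Scale every row whose coefficient sum exceeds 1.0 so that it sums to 1.0.
 * Rows with a sum of at most 1.0 (including all-zero rows) are left alone. */
static void normalize_rows(float m[][MAX_CHANNELS], size_t n_out, size_t n_in) {
    for (size_t oc = 0; oc < n_out; oc++) {
        float sum = 0.0f;

        for (size_t ic = 0; ic < n_in; ic++)
            sum += m[oc][ic];

        if (sum > 1.0f)
            for (size_t ic = 0; ic < n_in; ic++)
                m[oc][ic] /= sum;
    }
}
```

Applied to the old row (0.750, 0.000, 0.375, 0.375), this yields (0.5, 0.0, 0.25, 0.25). The matrix produced by this commit differs from that because the gains were adapted as well.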
Earlier, -1 was returned if the memchunk size was not a multiple of the frame
size. Now, it is verified unconditionally through an assertion. Error code -1
is still returned when the memblock queue is full.
In those few cases where the return value of pa_memblockq_push() is checked,
an overflow is assumed to be the reason in case an error code is returned.
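The resulting contract can be illustrated with a toy queue. Everything here (struct toy_queue, toy_queue_push()) is hypothetical and far simpler than pa_memblockq; it only shows the split between the assertion and the single remaining error case:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a memblock queue. */
struct toy_queue {
    size_t frame_size; /* bytes per frame */
    size_t max_length; /* capacity in bytes */
    size_t length;     /* bytes currently queued */
};

/* Returns 0 on success and -1 only if the queue would overflow. */
static int toy_queue_push(struct toy_queue *q, size_t chunk_length) {
    /* A misaligned chunk is a programming error, not a runtime condition. */
    assert(chunk_length % q->frame_size == 0);

    if (q->length + chunk_length > q->max_length)
        return -1;

    q->length += chunk_length;
    return 0;
}
```

With this split, a caller that sees -1 can safely treat it as an overflow.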
use (1<<15) instead of 0x7fff as a factor when converting from s16 to float32
use (1<<31) instead of 0x7fffffff as a factor when converting from s32 to float32
the change is motivated by the following desirable properties:
* s16_from_f32(f32_from_s16(x)) == x for all possible s16 values
* x / (float) (1 << 15) == x * (1.0f / (1 << 15)) for all x in s16
above changes enable easier optimization while guaranteeing bit-exact results
further, other audio sample conversion code (libavresample) does it the same way
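The round-trip property can be checked with a small sketch. The two functions below are illustrative stand-ins for the conversion routines, not the PulseAudio implementations, and the rounding is simplified to avoid a libm dependency:

```c
#include <stdint.h>

/* Scale by 1 / (1 << 15): every s16 value maps exactly onto a float. */
static float f32_from_s16(int16_t s) {
    return s * (1.0f / (1 << 15));
}

/* Inverse conversion with saturation. Rounding is half-away-from-zero here,
 * which is a simplification; NaN input is not handled. */
static int16_t s16_from_f32(float f) {
    float v = f * (float) (1 << 15);

    if (v > 32767.0f)
        v = 32767.0f;
    if (v < -32768.0f)
        v = -32768.0f;

    return (int16_t) (v >= 0.0f ? v + 0.5f : v - 0.5f);
}
```

Because the factor is a power of two, both the multiplication and its inverse are exact for every s16 value, so the round trip returns the original sample.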
v3 (comments Tanu):
* fix saturation in pa_sconv_s16le_from_f32ne_neon(), use vqrshrn
v2 (comments Tanu):
* fix comments in ARM NEON code
* use llrintf() in pa_sconv_s32le_from_float32ne()
Signed-off-by: Peter Meerwald <p.meerwald@bct-electronic.com>
Cc: Tanu Kaskinen <tanuk@iki.fi>
Problem: s16 to s32 conversion is performed as s16->float->s32 (via work
format float) for resamplers TRIVIAL, COPY, PEAKS.
Precision and efficiency suffer: e.g. 0x9fff results in 0x9ffe4001 (instead
of 0x9fff0000), and there are two sample format conversions instead of one
conversion.
Solution: If input or output format is s16, then choose the work format
to be s16 as well.
If remapping is to be performed, we could stick to work format float32ne for
precision reasons. This is debatable.
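A minimal sketch of the idea follows. The enum, choose_work_format() and s32_from_s16() are hypothetical names, and the rule is reduced to the case described above (no remapping, resamplers TRIVIAL, COPY, PEAKS):

```c
#include <stdint.h>

enum sample_fmt { FMT_S16, FMT_S32, FMT_FLOAT32 }; /* hypothetical */

/* Pick s16 as work format whenever either end of the pipeline is s16. */
static enum sample_fmt choose_work_format(enum sample_fmt in, enum sample_fmt out) {
    return (in == FMT_S16 || out == FMT_S16) ? FMT_S16 : FMT_FLOAT32;
}

/* Direct widening: the s16 value becomes the upper 16 bits of the s32 value.
 * The shift is done on an unsigned type to avoid shifting a negative number. */
static int32_t s32_from_s16(int16_t s) {
    return (int32_t) ((uint32_t) (uint16_t) s << 16);
}
```

With a single direct conversion, 0x9fff maps to 0x9fff0000 and no precision is lost in a float detour.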
Signed-off-by: Peter Meerwald <p.meerwald@bct-electronic.com>
It's useless to get the same SF_FORMAT_INFO three times; just compare the
name/extension in the same loop.
Signed-off-by: Wang Xingchao <xingchao.wang@intel.com>
I was looking at a log that showed that a suspend happened (at
a strange time), but the log didn't tell me why the suspend was done.
This patch tries to make sure that that won't happen again.
This makes sure we don't try to plug in a passthrough stream if the
final sink/source sample spec doesn't match what we want. In the future,
we might want to change rate updates to try a full sample spec update
for passthrough streams.
https://bugs.freedesktop.org/show_bug.cgi?id=50951
When a rewind is requested on a sink input, the request parameters are
stored in the pa_sink_input struct. The parameters are reset during
rewind processing, and if the sink decides to ignore the rewind
request due to being suspended, stale parameters are left in
pa_sink_input. It's particularly problematic if the rewrite_nbytes
parameter is left at -1, because that will prevent all future rewind
processing on that sink input. So, in order to avoid stale parameters,
every rewind request needs to be processed, even if the sink is
suspended.
Reported-by: Uoti Urpala
To reproduce, add resampler-method = ffmpeg to daemon.conf,
then start PA and load module-loopback.
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb2f1db40 (LWP 23047)]
0x00000000 in ?? ()
(gdb) bt
0 0x00000000 in ?? ()
1 0xb7c463cb in pa_resampler_set_input_rate (r=0x80e9438, rate=44011) at pulsecore/resampler.c:365
2 0xb7c6321d in pa_sink_input_process_msg (o=0x80e87a0, code=3, userdata=0xabeb, offset=0, chunk=0x0)
at pulsecore/sink-input.c:1833
3 0xb7e9840b in sink_input_process_msg_cb (obj=0x80e87a0, code=3, data=0xabeb, offset=0, chunk=0x0)
at modules/module-loopback.c:538
4 0xb7c2709b in pa_asyncmsgq_dispatch (object=0x80e87a0, code=3, userdata=0xabeb, offset=0, memchunk=0xb2f1d17c)
at pulsecore/asyncmsgq.c:322
5 0xb7c4c6e3 in asyncmsgq_read_work (i=0x80dd580) at pulsecore/rtpoll.c:564
6 0xb7c4b34a in pa_rtpoll_run (p=0x80fb7e0, wait_op=true) at pulsecore/rtpoll.c:238
7 0xb7dd90af in thread_func (userdata=0x80afe88) at modules/alsa/alsa-sink.c:1785
8 0xb7bf3291 in internal_thread_func (userdata=0x8095d08) at pulsecore/thread-posix.c:83
9 0xb7ab9d4c in start_thread (arg=0xb2f1db40) at pthread_create.c:308
10 0xb79f3ace in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130
A rewind may erase data that sink_input counted in playing_for or
underrun_for earlier. Add code adjusting those values after a rewind.
One visible symptom of this bug was problems recovering from an
underrun. When a client calls pa_stream_write() with a large block of
memory, the function can split that into smaller pieces before sending
it to the server. When receiving new data for a stream that had
silence queued due to underrun, the server would do a rewind to
replace the queued-but-not-played silence with the new data. Because
of the bug, this rewind itself would not change underrun_for. It's
possible for multiple rewinds to be done without filling the sink
buffer in between (which is what would eventually reset underrun_for).
In this case, the server rapidly processing the split packets would
rewind the stream for _each_ of them (as underrun_for would stay set),
erasing valid audio as a result.
As Peter Meerwald <p.meerwald@bct-electronic.com> discovered, our ARM
svolume code performance is quite terrible when the incoming samples are
not word-aligned. This can very easily be the case, since the
architecture only requires that the samples be 16-bit aligned, and we
might end up running the innermost loop after processing modulo-4
samples. The performance degradation was ~50x on a Cortex A9
(Pandaboard).
This reworks the svolume logic to first consume enough samples to make
sure the rest is word aligned, then process 4 samples at a time, and
finally deal with the remainder.
With this, performance is comparable for arbitrary alignments (~3x
faster than the C code).
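The head/body/tail structure can be sketched in plain C. This is not the ARM assembly from the patch: volume_s16_aligned() and scale_one() are hypothetical names, a single volume for all channels is assumed, and the volume is assumed to be 16.16 fixed point:

```c
#include <stddef.h>
#include <stdint.h>

/* Scale one sample by an (assumed) 16.16 fixed-point volume, saturating. */
static int16_t scale_one(int16_t s, int32_t vol) {
    int64_t v = ((int64_t) s * vol) >> 16;

    if (v > INT16_MAX)
        v = INT16_MAX;
    if (v < INT16_MIN)
        v = INT16_MIN;
    return (int16_t) v;
}

static void volume_s16_aligned(int16_t *samples, size_t n, int32_t vol) {
    size_t i = 0;

    /* 1. Head: single samples until the pointer is 32-bit aligned. */
    while (i < n && ((uintptr_t) (samples + i) & 3) != 0) {
        samples[i] = scale_one(samples[i], vol);
        i++;
    }

    /* 2. Body: four samples per iteration; a SIMD path would go here. */
    for (; i + 4 <= n; i += 4) {
        samples[i]     = scale_one(samples[i], vol);
        samples[i + 1] = scale_one(samples[i + 1], vol);
        samples[i + 2] = scale_one(samples[i + 2], vol);
        samples[i + 3] = scale_one(samples[i + 3], vol);
    }

    /* 3. Tail: the remaining zero to three samples. */
    for (; i < n; i++)
        samples[i] = scale_one(samples[i], vol);
}
```

The result does not depend on the alignment of the buffer; only the number of samples handled in the head loop changes.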
This fixes at least one crash that has been observed. The
multiplication in trivial_resample() overflowed when
resampling from 96 kHz to 48 kHz, causing an assertion
error:
Assertion 'o_index * fz < pa_memblock_get_length(output->memblock)' failed at pulsecore/resampler.c:1521, function trivial_resample(). Aborting.
Without the assertion, the memcpy() after the assertion
would have overwritten some random heap memory.
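The commit message does not show the overflowing expression. As an illustration only, the sketch below assumes an index computation of the form o_index * i_rate / o_rate; input_index_32() and input_index_64() are hypothetical names:

```c
#include <stdint.h>

/* 32-bit arithmetic: o_index * i_rate wraps once it exceeds UINT32_MAX. */
static uint32_t input_index_32(uint32_t o_index, uint32_t i_rate, uint32_t o_rate) {
    return o_index * i_rate / o_rate;
}

/* Widening to 64 bits before multiplying keeps the product exact. */
static uint32_t input_index_64(uint32_t o_index, uint32_t i_rate, uint32_t o_rate) {
    return (uint32_t) ((uint64_t) o_index * i_rate / o_rate);
}
```

At 96 kHz input and 48 kHz output, the 32-bit product already wraps for an output index of 50000, because 50000 * 96000 is larger than 2^32.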
When compiling without HAVE_SYMLINK the runtime dir is a real directory,
which is attempted to be created. In the case it already exists we shouldn't
error out. The HAVE_SYMLINK-enabled code already does this.
Currently, Windows versions of pacat and friends fail because the current
poll emulation is not sufficient (it only works for socket fds).
Luckily, Gnulib has a much better emulation that seems to work well enough.
The implementation has been largely copied (except for a few bug fixes
regarding timeout handling, to be pushed upstream) and works on pipes
and files as well. The copy has been obtained through their gnulib-tool utility,
which gives an LGPLv2.1+ licensed file.
This fixes the "Assertion (!e->dead) failed" error and lets pacat
and friends stream happily to/from a server (I didn't actually test parec).