As Maarten Baert recently reported, the current dmix code takes the
semaphore unnecessarily around stream mixing even when the lockless
mix operation is used on x86. This was introduced by mistake in
commit 267d7c7281 ("Add support of little-endian on i386/x86_64
dmix"), where the generic dmix code was included on x86, too.

To restore the original performance, this patch changes the semaphore
handling so that it is checked at run time instead of statically at
compile time.
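
The idea of a run-time check can be sketched roughly as follows. This is
only an illustration: struct mixer, mixer_lock(), mixer_unlock() and
mixer_mix_s16() are hypothetical names, the semaphore is a counting stub,
and the mixing loop is a plain placeholder rather than the real lockless
code in alsa-lib.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for per-instance dmix state (not the alsa-lib struct). */
struct mixer {
    bool use_sem;   /* decided once at open time instead of by #ifdef */
    int sem_held;   /* stub semaphore: 1 while held, 0 otherwise */
    int sem_downs;  /* number of times the stub semaphore was taken */
};

/* Take the stub semaphore only if this instance needs it. */
static void mixer_lock(struct mixer *m)
{
    if (!m->use_sem)
        return;
    assert(m->sem_held == 0);
    m->sem_held = 1;
    m->sem_downs++;
}

/* Release the stub semaphore only if it was taken. */
static void mixer_unlock(struct mixer *m)
{
    if (!m->use_sem)
        return;
    assert(m->sem_held == 1);
    m->sem_held = 0;
}

/* Saturating 16-bit mix of src into dst, guarded by the run-time check. */
static void mixer_mix_s16(struct mixer *m, int16_t *dst,
                          const int16_t *src, size_t n)
{
    mixer_lock(m);
    for (size_t i = 0; i < n; i++) {
        int32_t sum = (int32_t)dst[i] + (int32_t)src[i];
        if (sum > INT16_MAX)
            sum = INT16_MAX;
        if (sum < INT16_MIN)
            sum = INT16_MIN;
        dst[i] = (int16_t)sum;
    }
    mixer_unlock(m);
}
```

With use_sem cleared, the lock and unlock helpers return immediately, so
an instance that can mix locklessly pays only for one branch per call.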
Reviewed-by: Jaroslav Kysela <perex@perex.cz>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The dmix plugin has optimized implementations for x86 that use direct
memory accesses (this was in fact the original version), in addition
to the "generic" implementation that uses semaphore blocking. The x86
implementation relies on memory coherency *and* on fast reads and
writes to that memory.

For other architectures, it has always been disabled purely because
of memory coherency. But the recent LPE audio development revealed
that, even on x86 platforms, read/write performance can become
extremely bad when the buffer is marked as uncached. Some drivers
already know that their buffer is uncached, so we need to switch to
the generic mode in such a case.
This patch introduces yet another flag to the dmix configuration,
direct_memory_access, that indicates whether the x86-specific
optimization can be used or not. Each driver can set the flag in its
cards config namespace, and the default dmix config refers to it.
As of this patch, only the HDMI LPE Audio driver sets it.
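
How such a flag could feed into the choice of mixing path is sketched
below. The names parse_direct_memory_access(), select_mix_path() and
enum mix_path are hypothetical, and treating an unset flag as "true" is
an assumption made for the example, not something this patch states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical labels for the two mixing implementations. */
enum mix_path {
    MIX_PATH_DIRECT,   /* x86-specific direct memory access */
    MIX_PATH_GENERIC   /* semaphore-protected generic code */
};

/*
 * Hypothetical parser for the flag value.
 * Assumption: an unset value (NULL) means the optimization is allowed.
 */
static bool parse_direct_memory_access(const char *value)
{
    if (value == NULL)
        return true;
    return strcmp(value, "true") == 0 || strcmp(value, "1") == 0;
}

/* The direct path needs both an x86 host and the driver's permission. */
static enum mix_path select_mix_path(bool host_is_x86,
                                     bool direct_memory_access)
{
    if (host_is_x86 && direct_memory_access)
        return MIX_PATH_DIRECT;
    return MIX_PATH_GENERIC;
}
```

A driver that knows its buffer is uncached would set the flag to false,
which sends even an x86 host down the generic path.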
Signed-off-by: Takashi Iwai <tiwai@suse.de>
i386/x86_64 alsa-lib may need to handle big-endian formats, e.g.
when running via qemu on PPC. The generic dmix code already supports
both endiannesses, so let's use it as a fallback.
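
To illustrate why endian-independent code can serve as a fallback, here
is a minimal sketch that mixes big-endian 16-bit samples byte by byte,
so it behaves the same on any host. The function names are hypothetical
and the code is far simpler than the real generic dmix routines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one big-endian signed 16-bit sample from two bytes. */
static int16_t load_s16_be(const uint8_t *p)
{
    uint16_t u = (uint16_t)((p[0] << 8) | p[1]);
    return (u & 0x8000u) ? (int16_t)((int32_t)u - 0x10000) : (int16_t)u;
}

/* Write one signed 16-bit sample as two big-endian bytes. */
static void store_s16_be(uint8_t *p, int16_t v)
{
    uint16_t u = (uint16_t)v;
    p[0] = (uint8_t)(u >> 8);
    p[1] = (uint8_t)(u & 0xFF);
}

/* Saturating mix of `samples` big-endian S16 samples from src into dst. */
static void mix_s16_be(uint8_t *dst, const uint8_t *src, size_t samples)
{
    for (size_t i = 0; i < samples; i++) {
        int32_t sum = (int32_t)load_s16_be(dst + 2 * i)
                    + (int32_t)load_s16_be(src + 2 * i);
        if (sum > INT16_MAX)
            sum = INT16_MAX;
        if (sum < INT16_MIN)
            sum = INT16_MIN;
        store_s16_be(dst + 2 * i, (int16_t)sum);
    }
}
```

Because every sample is assembled from individual bytes, the result does
not depend on the byte order of the machine running the code.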
Add support to the dmix plugin for the S24_3LE sample format, which is
used by 24-bit USB devices.
The optimized assembler version uses only 23 bits for sample data so
that the lowest bit can be used for synchronization, because there is
no 24-bit cmpxchg instruction.
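
For reference, the S24_3LE layout and the loss of one bit can be
sketched as below. The helper names are hypothetical, and
s24_strip_sync_bit() only illustrates what "23 bits of sample data"
means numerically; it does not reproduce how the assembler code
actually uses the lowest bit.

```c
#include <assert.h>
#include <stdint.h>

/* Read one S24_3LE sample (3 bytes, little-endian) and sign-extend it. */
static int32_t s24_3le_read(const uint8_t *p)
{
    int32_t u = (int32_t)p[0] | ((int32_t)p[1] << 8) | ((int32_t)p[2] << 16);
    return (u & 0x800000) ? u - 0x1000000 : u;
}

/* Write the low 24 bits of v as one S24_3LE sample. */
static void s24_3le_write(uint8_t *p, int32_t v)
{
    uint32_t u = (uint32_t)v;
    p[0] = (uint8_t)(u & 0xFF);
    p[1] = (uint8_t)((u >> 8) & 0xFF);
    p[2] = (uint8_t)((u >> 16) & 0xFF);
}

/* Illustration only: clear bit 0 so that 23 bits carry sample data. */
static int32_t s24_strip_sync_bit(int32_t v)
{
    return v & ~1;
}
```

Clearing the lowest bit changes a sample by at most one step out of
2^24, which is the precision given up in exchange for the spare bit.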