Add support to the dmix plugin for the S24_3LE sample format, which is
used by 24-bit USB devices.
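For reference, S24_3LE stores each sample as three packed bytes in
little-endian order, with no padding byte. The helpers below are a
minimal sketch of reading and writing one such sample; the names
s24_3le_read and s24_3le_write are hypothetical and are not taken from
the dmix sources.

```c
#include <stdint.h>

/* Hypothetical helper: read one packed 24-bit little-endian sample
 * and sign-extend it to 32 bits. */
static int32_t s24_3le_read(const uint8_t *p)
{
    uint32_t v = (uint32_t)p[0] |
                 ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16);

    /* Flip the sign bit, then subtract it again: this sign-extends
     * bit 23 without relying on implementation-defined casts. */
    return (int32_t)(v ^ 0x800000u) - 0x800000;
}

/* Hypothetical helper: write the low 24 bits of a sample as three
 * little-endian bytes. */
static void s24_3le_write(uint8_t *p, int32_t sample)
{
    uint32_t v = (uint32_t)sample;

    p[0] = (uint8_t)(v & 0xffu);
    p[1] = (uint8_t)((v >> 8) & 0xffu);
    p[2] = (uint8_t)((v >> 16) & 0xffu);
}
```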
The optimized assembler version uses only 23 bits for sample data:
because there is no 24-bit cmpxchg instruction, the lowest bit is
reserved for synchronization.
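The idea of giving up the lowest bit can be illustrated in C. This is
only a sketch under assumptions, not the assembler code itself: the
helper names are hypothetical, the flag is handled with the GCC/Clang
__atomic builtins on the first byte of the sample, and the exact way
the real code uses the bit is not spelled out here.

```c
#include <stdint.h>

/* Hypothetical helper: keep the upper 23 bits of a 24-bit sample and
 * leave bit 0 clear so it can carry a synchronization flag. */
static uint32_t s24_payload(int32_t sample)
{
    return (uint32_t)sample & 0xfffffeu;
}

/* Hypothetical helper: atomically set bit 0 of the sample's first
 * (least significant) byte and return its previous value.
 * Assumption: GCC/Clang __atomic builtins are available. */
static int s24_flag_test_and_set(uint8_t *p)
{
    return __atomic_fetch_or(p, (uint8_t)1, __ATOMIC_ACQ_REL) & 1;
}

/* Hypothetical helper: atomically clear bit 0 again. */
static void s24_flag_clear(uint8_t *p)
{
    __atomic_fetch_and(p, (uint8_t)0xfe, __ATOMIC_RELEASE);
}
```

The cost of this scheme is one bit of resolution: the least
significant bit of every sample no longer carries audio data.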
Support dmix on generic architectures without atomic operations by
using a semaphore to prevent concurrent accesses. This is less efficient
than atomic operations but should work on every system.
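The semaphore fallback can be sketched as follows. This is an
illustration under assumptions rather than the plugin code: it uses a
POSIX sem_t as a binary semaphore, 16-bit samples for brevity, and a
hypothetical struct mix_area and mix_locked() that do not exist in the
dmix sources.

```c
#include <errno.h>
#include <semaphore.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared mixing state. */
struct mix_area {
    sem_t lock;    /* binary semaphore, initialised to 1 */
    int32_t *sum;  /* running sum of all clients' samples */
    int16_t *dst;  /* clipped output samples */
};

/* Clamp a 32-bit sum to the signed 16-bit range. */
static int16_t clip16(int32_t v)
{
    if (v > INT16_MAX)
        return INT16_MAX;
    if (v < INT16_MIN)
        return INT16_MIN;
    return (int16_t)v;
}

/* Mix n samples from src into the area. The whole loop runs with the
 * semaphore held, so no per-sample atomic instruction is needed.
 * Returns 0 on success, -1 on error. */
static int mix_locked(struct mix_area *a, const int16_t *src, size_t n)
{
    size_t i;

    /* Retry if the wait is interrupted by a signal. */
    while (sem_wait(&a->lock) != 0) {
        if (errno != EINTR)
            return -1;
    }
    for (i = 0; i < n; i++) {
        a->sum[i] += src[i];
        a->dst[i] = clip16(a->sum[i]);
    }
    return sem_post(&a->lock);
}
```

Every client serializes on the same semaphore for the duration of a
mixing pass, which is why this path is slower than the atomic one.
On older C libraries the sketch may need to be linked with -pthread.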