In some cases, depending on the instruction that performs the load, orc
ignores the size of the parameter when loading it for the first time.
Explicitly load the parameter into a temp to make sure it is loaded
correctly, like we do for the 2ch case.
See https://bugzilla.gnome.org/show_bug.cgi?id=742271
This adds volume scaling for 1- and 2-channel software volume scaling
using Orc. While testing the MMX and SSE backends on a Core2, I see an
~2x performance benefit over the hand-rolled MMX and SSE code. Since I
haven't been able to test on other architectures, the Orc code is only
used when MMX/SSE* is present. This can be changed in the future after
testing on AMD and ARM machines.