Add mixer benchmark. Improve the SSE and SSE2 mixer functions by removing some reads and writes to the temporary buffer, at the expense of more scattered reads.