Use a wrap around delay ringbuffer. We can then avoid some modulo
arithmetic and read more efficiently.
Also handle the delay convolver case better by reversing the taps and
reading the taps and delay buffer without extra overhead.
Add an option to do a hilbert transform on the generated rear channels
to do a 90 degree pahse shift on them. This can improve spacialization
of the rear channels.
See #861