Intel Skylake (level 0x16) is the first model with fast gather
opcodes. Mark lower versions with the SLOW_GATHER flag.
Prefer the SSE2 version of the format conversion without gather when
SLOW_GATHER is set. Makes the conversion much faster on my Ivy
Bridge.
Add a method to enable/disable the denormals flush-to-zero and
denormals-as-zero CPU options.
Add a config option to make it possible to disable this again.
Fixes high CPU usage when dealing with denormals, which can happen
in many DSP functions.
Fixes#1681