Channel: Richard Geldreich's Blog

Vectorized interleaved Range Coding using SSE 4.1

To avoid the current (and upcoming) ANS/rANS entropy coding patent minefield, we're using vectorized Range Coding instead. Here's a 24-bit SSE 4.1 example using 16 interleaved streams. This example decoder gets 550-700 megabytes/sec. with 8-bit alphabets on the various Intel/AMD CPUs I've tried:


More on the rANS patent situation (from early 2022):

This decoder design is practical on any CPU or GPU that supports fast hardware integer or float division. It explicitly uses 24-bit registers to sidestep issues with float divides. I've put much less work into optimizing the encoder, but the key step (the post-encode byte swizzle) is the next bottleneck to address.
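
To make the 24-bit point concrete, here's a minimal scalar sketch. It isn't taken from the decoder itself. It assumes the "issues with float divides" come from single-precision floats having only a 24-bit significand: when both operands are below 2^24 they convert to float exactly, and truncating the rounded quotient gives the same answer as integer division. The helper names div24_via_float and count_div24_mismatches are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the original decoder): divides two unsigned
// values that both fit in 24 bits using one float divide plus truncation.
static inline uint32_t div24_via_float(uint32_t a, uint32_t b)
{
    assert(a < (1u << 24) && b != 0 && b < (1u << 24));
    // Both operands are exactly representable as floats, so the only
    // rounding happens in the divide itself.
    return static_cast<uint32_t>(static_cast<float>(a) / static_cast<float>(b));
}

// Compares the float path against plain integer division on a deterministic
// pseudo-random sample and returns how many results disagree.
static uint32_t count_div24_mismatches(uint32_t num_samples)
{
    uint32_t mismatches = 0;
    uint32_t state = 0x12345678u; // xorshift32 state, arbitrary nonzero seed

    for (uint32_t i = 0; i < num_samples; i++)
    {
        state ^= state << 13; state ^= state >> 17; state ^= state << 5;
        const uint32_t a = state & 0xFFFFFFu;

        state ^= state << 13; state ^= state >> 17; state ^= state << 5;
        // Shift by a random amount so small divisors get tested too.
        uint32_t b = (state & 0xFFFFFFu) >> ((state >> 24) % 24u);
        if (b == 0)
            b = 1;

        if (div24_via_float(a, b) != a / b)
            mismatches++;
    }
    return mismatches;
}
```

Calling count_div24_mismatches with a large sample count is a cheap sanity check on a new target. Once operands can exceed 24 bits, the int-to-float conversion itself starts rounding and this equivalence is no longer guaranteed.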
