I've lost track of the number of Range Coders I've seen floating around over the past ~25 years. Subbotin's coders, released in the late '90s/early 2000s, seem to be very influential. Each implementation makes different tradeoffs that greatly impact a vectorized implementation. There is no single definitive range coder design, and I've seen three very different ways to handle carries. I learned a lot implementing my first vectorized range decoder.
Most of the tradeoffs boil down to how carries are handled and how many state registers the decoder needs. You can delay the carries during encoding (the earliest "carry-less" or delayed-carry Subbotin encoder I can find, which is very clever), fix up the already-encoded bytes by propagating the carries backwards in memory during encoding (LZMA's binary range coder, which is similar to Subbotin's), or move the carry logic into the decoder (another Subbotin variant called the "Russian People's Range Coder"). There are also decoder variants that track low/code/range vs. just code/range (LZMA's and Subbotin's earliest).
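To make the second option above concrete, here's a minimal sketch of an encoder that resolves carries by walking backwards over the bytes it has already emitted. This is my own illustration of the general idea, not LZMA's or Subbotin's actual code; the cum_freq/freq/tot_freq model interface and the 2^24 renormalization threshold are assumptions, and the final flush is omitted:

#include <cstdint>
#include <cstddef>
#include <vector>

struct CarryFixupEncoder
{
    uint32_t low = 0, range = 0xFFFFFFFFu;
    std::vector<uint8_t> out;

    // Encode one symbol given its cumulative frequency, frequency, and the
    // model's total frequency (cum_freq + freq <= tot_freq).
    void encode(uint32_t cum_freq, uint32_t freq, uint32_t tot_freq)
    {
        uint32_t r = range / tot_freq;
        uint32_t new_low = low + r * cum_freq;
        if (new_low < low)
            propagate_carry(); // 32-bit overflow: carry into already-emitted bytes
        low = new_low;
        range = r * freq;

        // Byte-wise renormalization: emit the top byte of 'low' whenever
        // 'range' gets too small. No conditional carry logic needed here.
        while (range < (1u << 24))
        {
            out.push_back(uint8_t(low >> 24));
            low <<= 8;
            range <<= 8;
        }
    }

    // A real encoder also needs a flush() that emits the remaining 4 bytes of 'low'.

private:
    void propagate_carry()
    {
        // 0xFF bytes wrap to 0x00 and keep carrying; anything else absorbs it.
        for (size_t i = out.size(); i-- > 0; )
            if (++out[i] != 0)
                break;
    }
};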
Importantly, range coders using this renormalization method during decoding are unnecessarily complex:
https://github.com/jkbonfield/rans_static/blob/master/arith_static.c#L222
Here's a different Subbotin variant that also uses overly complex and obfuscated renormalization logic during decoding. (Please just stop guys - this is the wrong direction.)
Why is all that messy conditional logic in the decoder's renormalization step? It appears to be carry related, but the carries should have been handled during encoding in some way. None of it is necessary for a working and correct range coder.
Instead, do this, from Subbotin's first delayed-carry range coder. It's easily vectorizable, and he only uses two variables (code and range). It's quite similar to LZMA's and LZHAM's, except it doesn't need to propagate carries backwards through the buffer containing the output bytes.
https://gist.github.com/richgel999/d522e318bf3ad67e019eabc13137350a
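For reference, here's the general shape of that kind of two-variable decoder - a minimal sketch of my own, not the code from the gist, and it pairs with the encoder sketch above. The 32-bit registers, the cum_freq/freq/tot_freq model interface, and the 2^24 renormalization threshold are assumptions. Note there's no carry logic anywhere in the decoder:

#include <cstdint>
#include <cstddef>

struct RangeDecoder
{
    const uint8_t* buf = nullptr;
    size_t pos = 0, size = 0;
    uint32_t code = 0, range = 0xFFFFFFFFu;

    void init(const uint8_t* p, size_t n)
    {
        buf = p; pos = 0; size = n; code = 0; range = 0xFFFFFFFFu;
        // Prime 'code' with the first 4 bytes of the stream.
        for (int i = 0; i < 4; i++)
            code = (code << 8) | next_byte();
    }

    // Returns a value in [0, tot_freq); the caller maps it to a symbol, then
    // calls decode_update() with that symbol's cum_freq/freq. (Production
    // coders typically clamp the result to tot_freq - 1 as a safety measure.)
    uint32_t decode_freq(uint32_t tot_freq)
    {
        range /= tot_freq;
        return code / range;
    }

    void decode_update(uint32_t cum_freq, uint32_t freq)
    {
        code -= cum_freq * range;
        range *= freq;
        // Renormalization mirrors the encoder's: shift in bytes until range
        // is large enough again. No conditionals beyond the loop test.
        while (range < (1u << 24))
        {
            code = (code << 8) | next_byte();
            range <<= 8;
        }
    }

private:
    uint8_t next_byte() { return pos < size ? buf[pos++] : 0; }
};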
Another important note: if you switch to 24 bits vs. 32 bits, you don't need to use integer division. (Note that division isn't needed for a binary range coder. It's also not strictly needed for a range coder that only supports small alphabets. The division enables using a fast decoding table for large alphabets.) Instead you can use single-precision division and truncate the results. This option is crucial for vectorization on modern CPUs. Vectorized single-precision division is fairly fast on modern CPUs - or, put another way, it's not so slow that it's a show stopper.
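Here's a sketch of mine (scalar, for clarity) illustrating why the truncation trick works: with a 24-bit coder both the dividend and divisor stay below 2^24, so they're exactly representable in a float's 24-bit significand, and the truncated single-precision quotient matches the integer quotient. In a vectorized decoder the same idea maps onto _mm_div_ps/_mm256_div_ps.

#include <cstdint>
#include <cassert>

// Integer reference: what the decoder actually needs.
static inline uint32_t div_int(uint32_t code, uint32_t range)
{
    return code / range;
}

// Float replacement: valid when both values fit in 24 bits.
static inline uint32_t div_float(uint32_t code, uint32_t range)
{
    assert(code < (1u << 24) && range < (1u << 24) && range != 0);
    return (uint32_t)((float)code / (float)range);
}

int main()
{
    // Spot check a spread of 24-bit pairs (an exhaustive check is feasible,
    // just slow).
    for (uint32_t c = 1; c < (1u << 24); c += 4999)
        for (uint32_t r = 1; r < (1u << 24); r += 7919)
            assert(div_int(c, r) == div_float(c, r));
    return 0;
}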
In vectorized implementations of both Huffman and range decoding, most of the instructions aren't actually related to the entropy coding method itself. Instead they're spent on common tasks like fetching from the decoder's lookup table (using some sort of manual or explicit gather), and fetching source bytes and distributing them to the lanes during renormalization. The FP division is a small part of the whole process.
What seems to slow down vectorized range decoding vs. Huffman is the more complex renormalization: in my implementation each range decoding step can fetch anywhere from 0 to 2 bytes. By comparison, my vectorized Huffman decoders fetch either 0 or 2 bytes per step.
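To make the 0 to 2 byte figure concrete, here's the kind of arithmetic involved, under parameters I'm assuming purely for illustration (24-bit range, 16-bit renormalization threshold, probabilities limited to 12 bits - not necessarily the exact configuration used above):

#include <cstdint>

// With range kept in [2^16, 2^24) between symbols and tot_freq <= 2^12, a
// decode step can shrink range by at most ~12 bits (down to ~2^4), so this
// loop shifts in 0, 1, or 2 bytes per symbol - a variable amount. A Huffman
// decoder with a 16+ bit buffer can instead refill a fixed 2 bytes whenever
// it runs low, which is simpler to vectorize.
static inline void renormalize24(uint32_t& code, uint32_t& range,
                                 const uint8_t*& p, const uint8_t* end)
{
    while (range < (1u << 16))
    {
        code = (code << 8) | (p < end ? *p++ : 0);
        range <<= 8;
    }
}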
All of these details are starting to matter now that rANS coding is falling under patent protection. Range Coding is a ~44 year old technique.