I finally sat down and added a simple LZ4-like simulator to my in-progress, soon-to-be open source RDO BC7 encoder. You can add blocks to it, query it for the longest/nearest match, and give it some bytes and ask how many bits they would take to code (up to 128 for a BC7 block, fewer if it finds matches). It's definitely the right path forward for RDO encoders. For BC7 modes 1 and 6, its estimates appear to be within roughly 1.5-7% of Deflate. It predicts on the high side relative to Deflate because it doesn't model Huffman coding. Mode 1's predictions tend to be more accurate, I think because this mode's encoded endpoints are nicely aligned on byte boundaries.
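To make the idea of "asking the simulator how many bits a block costs" concrete, here is a minimal sketch of such an LZ cost model. It is not the encoder's actual simulator: the class name `LZSimulator`, the greedy parse, and the costs (8 bits per literal, a flat hypothetical 20 bits per match of 3+ bytes, loosely LZ4-style) are all assumptions chosen for illustration. `window_bytes` plays the role of the "max search distance" used in the results below.

```python
from typing import Tuple


class LZSimulator:
    """Toy LZ cost model (hypothetical constants, not the encoder's real model):
    a literal costs 8 bits, a match of at least min_match bytes costs match_cost_bits."""

    def __init__(self, window_bytes: int = 8192, min_match: int = 3,
                 match_cost_bits: int = 20) -> None:
        self.window_bytes = window_bytes
        self.min_match = min_match
        self.match_cost_bits = match_cost_bits
        self.history = bytearray()

    def add_block(self, block: bytes) -> None:
        """Append an already-encoded block, keeping only the last window_bytes bytes."""
        self.history += block
        excess = len(self.history) - self.window_bytes
        if excess > 0:
            del self.history[:excess]

    def find_match(self, data: bytes, pos: int) -> Tuple[int, int]:
        """Longest match for data[pos:] in history + data[:pos]; returns (length, distance).
        Candidates are scanned nearest-first, so ties resolve to the nearest match."""
        buf = bytes(self.history) + data
        cur = len(self.history) + pos
        best_len, best_dist = 0, 0
        for start in range(cur - 1, -1, -1):
            length = 0
            # Overlapping matches are allowed, as in LZ77.
            while cur + length < len(buf) and buf[start + length] == buf[cur + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, cur - start
        return best_len, best_dist

    def cost_bits(self, block: bytes) -> int:
        """Greedy parse of block against the window; returns the estimated size in bits."""
        bits, pos = 0, 0
        while pos < len(block):
            length, _ = self.find_match(block, pos)
            if length >= self.min_match:
                bits += self.match_cost_bits
                pos += length
            else:
                bits += 8
                pos += 1
        return bits


sim = LZSimulator()
first = bytes(range(16))
print(sim.cost_bits(first))  # 128: empty window, all 16 bytes are literals
sim.add_block(first)
print(sim.cost_bits(first))  # 20: the whole block is one 16-byte match
```

The brute-force match search is quadratic and only meant to show the interface; a practical simulator would use hash chains or a similar index to find matches quickly.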
[Images: RDO BC7 mode 1+6 at lambda 10.0 and at lambda 12.0 (8KB max search distance, match replacements taken from up to 2 previous blocks), and non-RDO mode 1+6 (bc7enc level 4).]
With BC7 RDO encoding, you really need an LZ simulator of some sort, or at least decent approximations. Once you can simulate how many bits a block compresses to, the encoder can try replacing byte-aligned sequences within each block with sequences that appear in previous blocks. This is the key magic that makes this method work so well. You need to "talk" to the LZ compressor in the primary language it understands: byte matches of length 2+ or 3+.
For example, with mode 6 the selectors are 4 bits per texel and are packed at the end of the block, so each byte holds 2 texels. If your p-bits are always [0,1] (mine are in RDO mode), then it's easy to substitute various byte ranges from previously encoded mode 6 blocks and see what LZ does.
This is pretty awesome because it frees the encoder from always having to reuse an entire previous block's selectors, which greatly reduces block artifacts.
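Why fixed p-bits make mode 6 selector bytes so easy to swap can be seen by packing a block. The sketch below is my own reading of the BC7 mode 6 layout (7 mode bits, eight 7-bit endpoints in R0, R1, G0, G1, B0, B1, A0, A1 order, P0 at bit 63, P1 at bit 64, a 3-bit anchor index, then fifteen 4-bit indices); the function name `pack_bc7_mode6` is hypothetical. With P1 fixed, bytes 8-15 depend only on the selectors, so any run within them can be pasted from another block without touching the endpoints. P0 shares byte 7 with endpoint bits.

```python
from typing import Sequence


def pack_bc7_mode6(endpoints: Sequence[int], p0: int, p1: int,
                   selectors: Sequence[int]) -> bytes:
    """Pack a BC7 mode 6 block into 16 bytes (the bitstream is LSB-first).

    endpoints: eight 7-bit values in stream order R0, R1, G0, G1, B0, B1, A0, A1.
    selectors: sixteen 4-bit indices; selectors[0] is the anchor, stored with 3 bits.
    """
    if len(endpoints) != 8 or len(selectors) != 16 or selectors[0] > 7:
        raise ValueError("bad mode 6 fields")
    value, pos = 0, 0

    def put(v: int, nbits: int) -> None:
        nonlocal value, pos
        value |= (v & ((1 << nbits) - 1)) << pos
        pos += nbits

    put(1 << 6, 7)           # mode 6 = six 0 bits followed by a 1 bit
    for e in endpoints:      # bits 7..62
        put(e, 7)
    put(p0, 1)               # bit 63 (top bit of byte 7)
    put(p1, 1)               # bit 64 (bottom bit of byte 8)
    put(selectors[0], 3)     # bits 65..67: anchor index, implicit MSB of 0
    for s in selectors[1:]:  # bits 68..127: 4 bits per texel
        put(s, 4)
    return value.to_bytes(16, "little")


sel = [5, 15, 3, 12, 0, 7, 9, 1, 2, 14, 6, 8, 11, 4, 13, 10]
a = pack_bc7_mode6([10, 20, 30, 40, 50, 60, 70, 80], 0, 1, sel)
b = pack_bc7_mode6([1, 2, 3, 4, 5, 6, 7, 8], 0, 1, sel)
print(a[8:] == b[8:])  # True: with P1 fixed, bytes 8..15 are pure selector data
```

Byte 8 carries P1, the anchor index and texel 1's index; each of bytes 9-15 carries exactly two 4-bit indices.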
In one experiment, around 40% of the blocks that received selector byte substitutions from previous blocks got them by plugging in 3- or 4-byte matches and evaluating the Lagrangian.
40% is ridiculously high, which shows this technique works well. It'll work with BC1 too. The downside (as usual) is encoding performance.
I've implemented byte replacement trials for 3-8 byte matches. All are heavily used, especially 7- and 8-byte matches. I may try other combinations, such as two 3-byte matches with 2 literals. You can also do byte replacement in two passes, trying 3- or 4-byte sequences from 2 previously encoded blocks.
Making this fast will be a performance optimization challenge. I'm convinced you need to do something like this; otherwise you're always stuck replacing an entire block's worth of selectors, which can look much uglier.
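The byte replacement trials can be sketched as a loop that pastes runs of 3-8 bytes from previous blocks into the selector bytes and keeps whichever candidate minimizes the Lagrangian J = D + λ·R. This is a minimal illustration, not the encoder's code: `best_byte_substitution`, `distortion` and `rate_bits` are hypothetical names. In a real encoder `distortion` would decode the candidate and compare it against the source texels, and `rate_bits` would come from an LZ simulator's estimate. As a simplifying assumption, each run is copied from the same byte offset in the previous block. For mode 6 with fixed p-bits, any bytes pasted into bytes 8-15 still decode to a valid block.

```python
from typing import Callable, Sequence, Tuple


def best_byte_substitution(
    block: bytes,
    prev_blocks: Sequence[bytes],
    lam: float,
    distortion: Callable[[bytes], float],
    rate_bits: Callable[[bytes], float],
    region: Tuple[int, int] = (8, 16),
    lengths: Sequence[int] = range(3, 9),
) -> Tuple[bytes, float]:
    """Try pasting runs of `lengths` bytes from previous blocks into `block`'s
    selector region and return the candidate with the lowest J = D + lam * R.

    Simplifying assumption: a run is copied from the same byte offset in the
    previous block, so the pasted bytes cover the same texels they did there.
    """
    lo, hi = region
    best = block
    best_j = distortion(block) + lam * rate_bits(block)
    for prev in prev_blocks:
        for n in lengths:
            for ofs in range(lo, hi - n + 1):
                cand = block[:ofs] + prev[ofs:ofs + n] + block[ofs + n:]
                j = distortion(cand) + lam * rate_bits(cand)
                if j < best_j:  # strict: ties keep the earlier candidate
                    best, best_j = cand, j
    return best, best_j
```

Calling it again on the returned block with a different set of previous blocks is one simple way to approximate the two-pass variant mentioned above. The triple loop makes the cost of exhaustive trials obvious: every candidate needs its own distortion and rate evaluation.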
Example encodings (non-RDO mode 1+6 is 42.253 dB, 6.84 bits/texel):
- RDO BC7 mode 1+6, lambda .1, 8KB max search distance, match replacements taken from up to 2 previous blocks
41.765 RGB dB, 6.13 bits/texel (Deflate - miniz library max compression)
- RDO BC7 mode 1+6, lambda .25, 8KB max search distance, match replacements taken from up to 2 previous blocks
41.496 RGB dB, 5.78 bits/texel (Deflate - miniz library max compression)
- RDO BC7 mode 1+6, lambda .5, 8KB max search distance, match replacements taken from up to 2 previous blocks
40.830 RGB dB, 5.36 bits/texel (Deflate - miniz library max compression)
- RDO BC7 mode 1+6, lambda 1.0, 4KB max search distance
39.507 RGB dB, 4.97 bits/texel (Deflate - miniz library max compression)
Mode 6 byte replacement histogram (match length in bytes: count):
3: 5000, 4: 3688, 5: 3833, 6: 3975, 7: 4632, 8: 14752
- RDO BC7 mode 1+6, lambda 3.0, 2KB max search distance
36.161 dB, 4.59 bits/texel
- RDO BC7 mode 1+6, lambda 4.0, 2KB max search distance
35.035 dB, 4.47 bits/texel
- RDO BC7 mode 1+6, lambda 5.0, 4KB max search distance, match replacements taken from up to 2 previous blocks
33.760 dB, 3.96 bits/texel
- RDO BC7 mode 1+6, lambda 8.0, 4KB max search distance, match replacements taken from up to 2 previous blocks
32.072 dB, 3.47 bits/texel
- RDO BC7 mode 1+6, lambda 10.0, 4KB max search distance, match replacements taken from up to 2 previous blocks
31.318 dB, 3.32 bits/texel
- RDO BC7 mode 1+6, lambda 10.0, 8KB max search distance, match replacements taken from up to 2 previous blocks
31.279 dB, 3.21 bits/texel
- RDO BC7 mode 1+6, lambda 12.0, 8KB max search distance, match replacements taken from up to 2 previous blocks
30.675 dB, 3.07 bits/texel
- RDO BC7 mode 1+6, lambda 20.0, 8KB max search distance, match replacements taken from up to 2 previous blocks
29.179 dB, 2.68 bits/texel
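For reference, numbers like the ones above can be reproduced along these lines. This sketch assumes "RGB dB" means PSNR over the R, G and B channels with a peak of 255, and that bits/texel is the Deflate-compressed size of the BC7 data divided by the texel count. Python's `zlib` at level 9 stands in for miniz's max compression setting; both emit Deflate, but exact sizes can differ slightly. The function names are hypothetical, not the author's test harness.

```python
import math
import zlib
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]


def rgb_psnr(orig: Sequence[Pixel], decoded: Sequence[Pixel]) -> float:
    """PSNR in dB over the R, G and B channels (alpha ignored), 8-bit peak of 255."""
    sq_err = sum((a - b) ** 2 for p, q in zip(orig, decoded) for a, b in zip(p, q))
    mse = sq_err / (3 * len(orig))
    return math.inf if mse == 0 else 10.0 * math.log10(255.0 ** 2 / mse)


def deflate_bits_per_texel(bc7_data: bytes, width: int, height: int) -> float:
    """Raw Deflate size (level 9, no zlib header) in bits divided by the texel count.
    Uncompressed BC7 is always 8 bits/texel (16 bytes per 4x4 block)."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)  # negative wbits = raw Deflate stream
    packed = comp.compress(bc7_data) + comp.flush()
    return len(packed) * 8 / (width * height)
```

Against the 8 bits/texel of raw BC7, the non-RDO baseline of 6.84 bits/texel shows how little plain Deflate gets out of typical BC7 data, which is the gap the RDO results above are trading quality for.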