Updated bc7enc_rdo with improved smooth block handling
The command line tool now detects extremely smooth blocks and encodes them with a significantly higher MSE scale factor. It computes a per-block mask image, filters it, then supplies an array of...
bc7e.ispc integrated into bc7enc_rdo
bc7e.ispc is a very powerful/fast 8 mode encoder. It supports the entire BC7 format, unlike bc7enc's default encoder. It's 2-3x faster than ispc_texcomp at the same average quality. Now that it's been...
First-ever RDO ASTC encodings
Here are my first-ever RDO LDR ASTC 4x4 encodings. Perhaps they are the first ever for the ASTC texture format: Non-RDO: 5.951 bits/texel, 45.1 dB, 75773 PSNR/bpt. RDO: 4.286 bpt, 38.9 dB, 90752...
First RDO LDR ASTC 6x6 encodings
This is 6x6 block size, using the ERT in bc7enc_rdo: Left=Non-RDO, 37.3 dB, 2.933 bits/texel (Deflate). Right=RDO lambda=.5, 36.557 dB, 2.399 bpt. Using more aggressive ERT settings, but the same lambda:
Average rate-distortion curves for bc7enc_rdo
bc7enc_rdo is now a library that's utilized by the command line tool, which is far simpler now. This makes it trivial to call multiple times to generate large .CSV files. If you can only choose one set...
Lena is Retired
As an open source author, I will not assist or waste time implementing support for any new image/video/GPU texture file format that is not fuzzed, or if it uses the "test" image "lena" (or "lenna") for...
LZ_XOR
If LZ compression is compilation, what other instructions are useful to add? I've spent the last several years, off and on, trying to figure this out. LZSS has LIT [byte] and COPY [length, distance]....
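To make the "LZ as an instruction set" framing concrete, here is a minimal sketch of an interpreter for just the two LZSS instructions named above. The tuple layout and the `lz_decode` name are hypothetical and chosen for illustration; this is not the author's codec, and it does not include any XOR instruction.

```python
def lz_decode(ops):
    """Run a list of LZSS-style instructions and return the decoded bytes.

    Each instruction is a tuple (hypothetical layout, for illustration only):
      ("LIT", byte)               append one literal byte
      ("COPY", length, distance)  copy `length` bytes starting `distance` bytes back
    """
    out = bytearray()
    for op in ops:
        if op[0] == "LIT":
            out.append(op[1])
        elif op[0] == "COPY":
            _, length, distance = op
            if not 1 <= distance <= len(out):
                raise ValueError("COPY distance reaches before start of output")
            # Copy one byte at a time so overlapping matches
            # (length > distance) repeat the recent output.
            for _ in range(length):
                out.append(out[-distance])
        else:
            raise ValueError("unknown instruction: %r" % (op[0],))
    return bytes(out)


program = [("LIT", ord("a")), ("LIT", ord("b")), ("COPY", 4, 2), ("LIT", ord("c"))]
print(lz_decode(program))  # b'abababc'
```

Seen this way, the compressor is the "compiler" that picks an instruction sequence, and the decompressor is a very small virtual machine that executes it.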
Other LZ_XOR variants
Other LZ instruction set variants that can utilize XOR: 1. ROLZ_XOR: Reduced-Offset LZ (or here) that uses XOR partial matches. Simplifies the search, but the decompressor has to keep the offsets table...
LZ_XOR on enwik8
First results of LZ_XOR on enwik8 (a common 100MB test file of Wikipedia data, higher ratio is smaller file): LZ4: 58.09%, 2.369 GiB/sec decompress; Zstd: 69.03%, .639 GiB/sec; LZ_XOR: 63.08%, 1.204...
One nice property of highly constrained length limited prefix codes
I realized earlier that I don't need AVX2 for really fast Huffman (really length limited prefix) decoding. Using a 6-bit max Huffman code size, a 12-bit decode table (16KB - which fits in the cache) is...
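The excerpt is cut off, so the following is only a sketch of one property such a table can have, not necessarily the one the post goes on to describe. If no code is longer than 6 bits, any 12-bit window of the bitstream contains at least two complete codes, so a 4096-entry table can return two symbols per lookup (4096 entries at 4 bytes each would be 16KB). The function names, the canonical MSB-first code assignment, and the bit-string input are all assumptions made for readability.

```python
MAX_CODE_BITS = 6   # longest allowed code, as in the excerpt
TABLE_BITS = 12     # lookup window; 2 * MAX_CODE_BITS


def canonical_codes(lengths):
    """Assign canonical prefix codes (MSB-first) from per-symbol code lengths."""
    codes, code = {}, 0
    for bits in range(1, MAX_CODE_BITS + 1):
        for sym, n in enumerate(lengths):
            if n == bits:
                codes[sym] = (code, bits)
                code += 1
        code <<= 1
    return codes


def build_pair_table(lengths):
    """Build a 2**12-entry table of (symbol0, symbol1, total_bits_used).

    Assumes `lengths` describes a complete prefix code; raises otherwise.
    """
    codes = canonical_codes(lengths)

    def first_symbol(window, width):
        # Match the code that prefixes the top bits of a `width`-bit window.
        for sym, (code, n) in codes.items():
            if n <= width and (window >> (width - n)) == code:
                return sym, n
        raise ValueError("code is incomplete")

    table = []
    for index in range(1 << TABLE_BITS):
        s0, n0 = first_symbol(index, TABLE_BITS)
        rest_width = TABLE_BITS - n0          # at least 6 bits remain
        rest = index & ((1 << rest_width) - 1)
        s1, n1 = first_symbol(rest, rest_width)
        table.append((s0, s1, n0 + n1))
    return table


def decode(bits, count, table):
    """Decode `count` symbols from a '0'/'1' string, two per table lookup."""
    padded = bits + "0" * TABLE_BITS          # padding so the last window is full
    pos, out = 0, []
    while len(out) < count:
        s0, s1, used = table[int(padded[pos:pos + TABLE_BITS], 2)]
        out += [s0, s1]
        pos += used
    return out[:count]
```

A real decoder would read from a packed bit buffer rather than a string; the string form is only there to keep the lookup logic visible.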
LZ_XOR on canterbury corpus
LZ_XOR 128KB dictionary, AVX2, BMI1, mid-level CPU parsing, Ice Lake CPU (Core i7 1065G7 @ 1.3GHz, Dell Laptop). Only the XOR bytes are entropy coded, otherwise everything else (the control stream, the...
Fast AVX2 PNG writer
281-346 megapixels/sec. when using fixed filters: https://github.com/veluca93/fpnge
Lagrangian RDO PNG
Turns out PNG is very amenable to RDO optimization approaches, but few have really tried. This is something I've been wanting to try for a while. This experiment only injects 3 pixel matches into the...
EA/Microsoft Neural Network GPU Texture Compression Patents
Both Microsoft and EA have patented various ways of bolting on neural networks to GPU (ASTC/BC7/BC6h) texture compressors, in order to accelerate determining the compression params (mode, partition...
The Dark Horse of the Image Codec World: Near-Lossless Image Formats Using...
I think simple ultra-high speed lossy (or near-lossless) image codecs, built from the new generation of fast LZ codecs, are going to become more relevant in the future. Computing bottlenecks change over...
Faster LZ is not the answer to 150-250+ GB video game downloads
When the JPEG folks were working on image compression, they didn't create a better or faster LZ. Instead they developed new approaches. I see games growing >150GB and then graphs like this, and it's...
Vectorized interleaved Range Coding using SSE 4.1
In order to avoid the current (and upcoming) ANS/rANS entropy coding patent minefield, we're avoiding it and using vectorized Range Coding instead. Here's a 24-bit SSE 4.1 example using 16 interleaved...
Comparing Vectorized Huffman and Range Decoding vs. rANS (Or: rANS entropy...
The point here is to show that Huffman and range decoding are both vectorizable and competitive vs. rANS in software. What I care about is fast entropy decoding. Even scalar encoding of any of these...
Somewhere Range Coding went off the rails
I've lost track of the number of Range Coders I've seen floating around for the past ~25 years. Subbotin's coders released in the late 90's/early 2000's seem to be very influential. Each implementation...
LZ_XOR/LZ_ADD progress
I'm tired of all the endless LZ clones, so I'm trying something different. I now have two prototype LZ_XOR/ADD lossless codecs. In this design a new fundamental instruction is added to the usual LZ...