Channel: Richard Geldreich's Blog

libsquish's DXT1 "Cluster Fit" method applied to ETC1

libsquish (a popular DXT encoding library) internally uses a total ordering based method to find high-quality DXT endpoints. This method can also be applied to ETC1 encoding, using the equations in rg_etc1's optimizer's remarks to solve for optimal subblock colors given each possible selector distribution in the total ordering and the current best intensity index and subblock color.

I don't actually compute the total ordering; instead, I iterate over all selector distributions present in the total ordering, because the actual per-pixel selector values don't matter to the solver. A hash table is also used to prevent the optimizer from evaluating a trial solution more than once.

Single threaded results:

perceptual: 0 etc2: 0 rec709: 1
Source filename: kodak\kodim03.png 768x512

--- basislib Quality: 4
basislib time: 5.644
basislib ETC image Error: Max:  70, Mean: 1.952, MSE: 8.220, RMSE: 2.867, PSNR: 38.982, SSIM: 0.964853

--- etc2comp effort: 10
etc2comp time: 75.792851
etc2comp Error: Max:  75, Mean: 1.925, MSE: 8.009, RMSE: 2.830, PSNR: 39.095, SSIM: 0.965339

--- etcpak time: 0.006
etcpak Error: Max:  80, Mean: 2.492, MSE: 12.757, RMSE: 3.572, PSNR: 37.073, SSIM: 0.944697

--- ispc_etc time: 1.021655
ispc_etc1 Error: Max:  75, Mean: 1.965, MSE: 8.280, RMSE: 2.877, PSNR: 38.951, SSIM: 0.963916



After enabling multithreading (40 threads) in those encoders that support it:

J:\dev\basislib1\bin>texexp kodak\kodim03.png
perceptual: 0 etc2: 0 rec709: 1
Source filename: kodak\kodim03.png 768x512

--- basislib Quality: 4
basislib pack time: 0.266
basislib ETC image Error: Max:  70, Mean: 1.952, MSE: 8.220, RMSE: 2.867, PSNR: 38.982, SSIM: 0.964853

--- etc2comp effort: 10
etc2comp time: 3.608819
etc2comp Error: Max:  75, Mean: 1.925, MSE: 8.009, RMSE: 2.830, PSNR: 39.095, SSIM: 0.965339

--- etcpak time: 0.006
etcpak Error: Max:  80, Mean: 2.492, MSE: 12.757, RMSE: 3.572, PSNR: 37.073, SSIM: 0.944697

--- ispc_etc time: 1.054324

ispc_etc1 Error: Max:  75, Mean: 1.965, MSE: 8.280, RMSE: 2.877, PSNR: 38.951, SSIM: 0.963916

Intel is doing some kind of amazing SIMD dark magic in there. The ETC1 cluster fit method is around 10-27x faster than rg_etc1 (which uses my previous method, a hybrid of a 3D neighborhood search with iterative base color refinement) and etc2comp (effort 100) in ETC1 mode. RGB Avg. PSNR is usually within ~.1 dB of Intel.

I'm so tempted to update rg_etc1 with this algorithm, if only I had the time.

ETC1 block flip estimation

ETC1's block format includes a "flip" bit, which describes how the subblocks are oriented in each block. When this bit is set, the two subblocks are oriented horizontally in a 4x4 pixel block like this:


ABCD
EFGH

IJKL
MNOP

Otherwise they're oriented vertically like this:

AB CD
EF GH
IJ KL
MN OP

(In BC7 terminology, ETC1 supports 1 or 2 subsets per block, with 1 or 2 partitions.)

Anyhow, a high quality encoder typically tries both subblock orientations and chooses the one which results in the lowest overall error. Interestingly, it's possible to predict with a reasonable degree of certainty which subblock orientation is best. The advantage to this method is obvious: higher encoding throughput, with (hopefully) only a small loss in quality.

The method used by etc2comp computes each possible subblock's average color, then it sums the squared "gray distance" from each pixel to each subblock's average color. (To compute gray distance from color A vs. B, it projects A onto the grayscale line going through B, then it computes the squared RGB distance from A to this projected point.) etc2comp then chooses the flip orientation with the lowest overall gray distance to try first.

Success rates on a handful of images, with RGB avg. PSNR and SSIM stats (using accurate vs. estimated flips):

kodim01: 68% (35.904 vs. 35.607, .975236 vs. .973744)
kodim03: 65% (39.949 vs. 38.801, .964774 vs. .963467)
kodim18: 69% (35.917 vs. 35.661, .965767 vs. .964385)
kodim20: 63% (38.849 vs. 38.589, .975558 vs. .974669)

ETC1 block flip bit visualizations on kodim03:

Image:



basislib best flips:

Flip estimates, using etc2comp's gray line estimate algorithm:


etc2comp's flips (in ETC1 mode):


etcpak's flips (in ETC1 mode):


More ETC1 cluster fit data

Ignore the SSIM stats; this is a corpus test image built from random 4x4 blocks chosen from many other images.

This was on a 20-core Xeon workstation, with multithreading enabled in basislib and etc2comp (40 threads each). The API I'm using in etcpak is not threaded (it's already sorta ridiculously fast), and Intel's is not multithreaded either, as far as I can tell from breakpointing inside it. etcpak and Intel make heavy use of SIMD operations, while etc2comp and basislib do not.

Cluster fit (new algorithm):

J:\dev\basislib1\bin>texexp test_0.tga
perceptual: 0 etc2: 0 rec709: 1
Source filename: test_0.tga 4096x4096
--- basislib Quality: 4
basislib pack time: 4.211
basislib ETC image Error: Max: 255, Mean: 3.277, MSE: 40.847, RMSE: 6.391, PSNR: 32.019, SSIM: 0.999618

--- etc2comp effort: 100
etc2comp time: 154.562167
etc2comp Error: Max: 255, Mean: 3.286, MSE: 42.777, RMSE: 6.540, PSNR: 31.819, SSIM: 0.999612

--- etcpak time: 0.258
etcpak Error: Max: 255, Mean: 4.282, MSE: 69.927, RMSE: 8.362, PSNR: 29.684, SSIM: 0.998407

--- ispc_etc time: 43.694160
ispc_etc1 Error: Max: 255, Mean: 3.249, MSE: 39.912, RMSE: 6.318, PSNR: 32.120, SSIM: 0.999655

Previous algorithm (rg_etc1's lattice scan+refinement):

J:\dev\basislib1\bin>texexp test_0.tga
perceptual: 0 etc2: 0 rec709: 1
Source filename: test_0.tga 4096x4096

--- basislib Quality: 4
basislib pack time: 240.638
basislib ETC image Error: Max: 255, Mean: 3.203, MSE: 39.336, RMSE: 6.272, PSNR: 32.183, SSIM: 0.999680

--- etc2comp effort: 100
etc2comp time: 151.739932
etc2comp Error: Max: 255, Mean: 3.286, MSE: 42.777, RMSE: 6.540, PSNR: 31.819, SSIM: 0.999612

--- etcpak time: 0.243
etcpak Error: Max: 255, Mean: 4.282, MSE: 69.927, RMSE: 8.362, PSNR: 29.684, SSIM: 0.998407

--- ispc_etc time: 46.807596
ispc_etc1 Error: Max: 255, Mean: 3.249, MSE: 39.912, RMSE: 6.318, PSNR: 32.120, SSIM: 0.999655

ETC1 encoder performance on kodim18 at various quality/effort levels

I'm trying to get a handle on how the available ETC1 compressors perform, using their public APIs, at their various quality or effort levels. This is only for a single image (kodim18, my usual for quick tests like this).

First, here's the performance of etc2comp on kodim18, in ETC1-only mode, multithreading enabled, RGB Avg PSNR metrics, on effort levels [0,100] in increments of 5:


Efforts between roughly 40-65 seem to be the sweet spot. Effort=100 is obviously wasteful.

Here's another graph (can you tell I'm practicing my Excel graphing skills?), this time comparing the time and quality of various ETC1 (and now ETC2, for etc2comp) compressors at different encoder quality/effort settings:



Important Notes:
  • To be fair to Intel's ETC1 encoder (function CompressBlocksETC1()), which is not multithreaded, I added another bar labeled "ispc_etc1 MT" which has the total CPU time divided by 20 (to roughly match the speedup I'm seeing using 40 threads in the other natively multithreaded encoders). 
  • basislib is now using a variant of cluster fit (see previous posts). basislib_1 is lowest quality, basislib_3 is highest. Notice that basislib_3 is only ~2X slower than Intel's SIMD code, but basislib doesn't use any SIMD at all.
  • basislib and etc2comp both use 40 threads
  • etcpak's timings are currently single threaded, because I'm still using a single-threaded entrypoint inside the code (BlockData::Process()). It's on my TODO list to fix this. IMHO, unless you need a real-time ETC1 encoder, etcpak trades off too much quality. However, if you do need a real-time encoder and don't mind the quality loss, it's your best bet. If the author added a few optional SIMD-optimized cluster fit trials it would probably kick ass.
To get an idea of how efficient etc2comp currently is at scanning the ETC1 search space for kodim18, let's see what effort level (and how much CPU time) it takes to approximately match two other encoders' quality levels:
  • basislib cluster fit (64 trials out of 165): 0.115 secs, 35.917 dB. Quality matched or exceeded first at etc2comp effort 70: 1.095 secs, 35.953 dB.
  • Intel ISPC: 1.03 secs, 35.969 dB. Quality matched or exceeded first at etc2comp effort 80: 1.83 secs, 35.992 dB.

perceptual: 0 etc2: 0 rec709: 1
Source filename: kodak\kodim18.png 512x768
--- basislib Quality: 4
basislib pack time: 0.115
basislib ETC image Error: Max:  56, Mean: 2.865, MSE: 16.648, RMSE: 4.080, PSNR: 35.917, SSIM: 0.965767

--- etc2comp effort: 0
etc2comp time: 0.051663
etc2comp Error: Max:  64, Mean: 3.186, MSE: 21.123, RMSE: 4.596, PSNR: 34.883, SSIM: 0.958817
--- etc2comp effort: 5
etc2comp time: 0.083509
etc2comp Error: Max:  64, Mean: 3.129, MSE: 19.658, RMSE: 4.434, PSNR: 35.196, SSIM: 0.959313
--- etc2comp effort: 10
etc2comp time: 0.106361
etc2comp Error: Max:  64, Mean: 3.092, MSE: 19.052, RMSE: 4.365, PSNR: 35.331, SSIM: 0.959794
--- etc2comp effort: 15
etc2comp time: 0.133278
etc2comp Error: Max:  64, Mean: 3.063, MSE: 18.661, RMSE: 4.320, PSNR: 35.421, SSIM: 0.960250
--- etc2comp effort: 20
etc2comp time: 0.193460
etc2comp Error: Max:  64, Mean: 3.042, MSE: 18.416, RMSE: 4.291, PSNR: 35.479, SSIM: 0.960595
--- etc2comp effort: 25
etc2comp time: 0.162790
etc2comp Error: Max:  64, Mean: 3.027, MSE: 18.256, RMSE: 4.273, PSNR: 35.517, SSIM: 0.960869
--- etc2comp effort: 30
etc2comp time: 0.182370
etc2comp Error: Max:  64, Mean: 3.012, MSE: 18.108, RMSE: 4.255, PSNR: 35.552, SSIM: 0.961207
--- etc2comp effort: 35
etc2comp time: 0.196609
etc2comp Error: Max:  64, Mean: 2.998, MSE: 17.980, RMSE: 4.240, PSNR: 35.583, SSIM: 0.961578
--- etc2comp effort: 40
etc2comp time: 0.217227
etc2comp Error: Max:  64, Mean: 2.987, MSE: 17.888, RMSE: 4.229, PSNR: 35.605, SSIM: 0.961854
--- etc2comp effort: 45
etc2comp time: 0.248881
etc2comp Error: Max:  64, Mean: 2.970, MSE: 17.771, RMSE: 4.216, PSNR: 35.634, SSIM: 0.962461
--- etc2comp effort: 50
etc2comp time: 0.361306
etc2comp Error: Max:  59, Mean: 2.916, MSE: 17.175, RMSE: 4.144, PSNR: 35.782, SSIM: 0.963669
--- etc2comp effort: 55
etc2comp time: 0.379762
etc2comp Error: Max:  59, Mean: 2.902, MSE: 17.091, RMSE: 4.134, PSNR: 35.803, SSIM: 0.964149
--- etc2comp effort: 60
etc2comp time: 0.522357
etc2comp Error: Max:  59, Mean: 2.882, MSE: 16.840, RMSE: 4.104, PSNR: 35.867, SSIM: 0.964800
--- etc2comp effort: 65
etc2comp time: 0.560707
etc2comp Error: Max:  59, Mean: 2.878, MSE: 16.818, RMSE: 4.101, PSNR: 35.873, SSIM: 0.964974
--- etc2comp effort: 70
etc2comp time: 1.095014
etc2comp Error: Max:  59, Mean: 2.857, MSE: 16.512, RMSE: 4.063, PSNR: 35.953, SSIM: 0.965366
--- etc2comp effort: 75
etc2comp time: 1.166479
etc2comp Error: Max:  59, Mean: 2.852, MSE: 16.490, RMSE: 4.061, PSNR: 35.959, SSIM: 0.965534
--- etc2comp effort: 80
etc2comp time: 1.829960
etc2comp Error: Max:  59, Mean: 2.842, MSE: 16.362, RMSE: 4.045, PSNR: 35.992, SSIM: 0.965769
--- etc2comp effort: 85
etc2comp time: 1.904691
etc2comp Error: Max:  59, Mean: 2.836, MSE: 16.329, RMSE: 4.041, PSNR: 36.001, SSIM: 0.966037
--- etc2comp effort: 90
etc2comp time: 2.709250
etc2comp Error: Max:  59, Mean: 2.829, MSE: 16.255, RMSE: 4.032, PSNR: 36.021, SSIM: 0.966277
--- etc2comp effort: 95
etc2comp time: 2.802099
etc2comp Error: Max:  59, Mean: 2.827, MSE: 16.251, RMSE: 4.031, PSNR: 36.022, SSIM: 0.966315
--- etc2comp effort: 100
etc2comp time: 3.619217
etc2comp Error: Max:  59, Mean: 2.825, MSE: 16.216, RMSE: 4.027, PSNR: 36.031, SSIM: 0.966349

--- etcpak time: 0.006
etcpak Error: Max:  90, Mean: 3.464, MSE: 25.640, RMSE: 5.064, PSNR: 34.042, SSIM: 0.950396

--- ispc_etc time: 1.033881
ispc_etc1 Error: Max:  56, Mean: 2.866, MSE: 16.450, RMSE: 4.056, PSNR: 35.969, SSIM: 0.965412


ETC2 enabled:

perceptual: 0 etc2: 1 rec709: 1
Source filename: kodak\kodim18.png 512x768
--- basislib Quality: 1
basislib pack time: 0.036
basislib ETC image Error: Max:  73, Mean: 3.168, MSE: 20.910, RMSE: 4.573, PSNR: 34.927, SSIM: 0.959441
--- basislib Quality: 2
basislib pack time: 0.054
basislib ETC image Error: Max:  56, Mean: 2.945, MSE: 17.772, RMSE: 4.216, PSNR: 35.634, SSIM: 0.964359
--- basislib Quality: 3
basislib pack time: 0.107
basislib ETC image Error: Max:  56, Mean: 2.865, MSE: 16.648, RMSE: 4.080, PSNR: 35.917, SSIM: 0.965767
--- etc2comp effort: 0
etc2comp time: 0.086306
etc2comp Error: Max:  64, Mean: 3.158, MSE: 20.810, RMSE: 4.562, PSNR: 34.948, SSIM: 0.959069
--- etc2comp effort: 5
etc2comp time: 0.179816
etc2comp Error: Max:  59, Mean: 3.087, MSE: 19.044, RMSE: 4.364, PSNR: 35.333, SSIM: 0.959701
--- etc2comp effort: 10
etc2comp time: 0.247190
etc2comp Error: Max:  59, Mean: 3.046, MSE: 18.396, RMSE: 4.289, PSNR: 35.484, SSIM: 0.960256
--- etc2comp effort: 15
etc2comp time: 0.287620
etc2comp Error: Max:  59, Mean: 3.013, MSE: 17.977, RMSE: 4.240, PSNR: 35.584, SSIM: 0.960751
--- etc2comp effort: 20
etc2comp time: 0.339065
etc2comp Error: Max:  59, Mean: 2.990, MSE: 17.709, RMSE: 4.208, PSNR: 35.649, SSIM: 0.961162
--- etc2comp effort: 25
etc2comp time: 0.383907
etc2comp Error: Max:  59, Mean: 2.971, MSE: 17.515, RMSE: 4.185, PSNR: 35.697, SSIM: 0.961507
--- etc2comp effort: 30
etc2comp time: 0.432019
etc2comp Error: Max:  59, Mean: 2.954, MSE: 17.352, RMSE: 4.166, PSNR: 35.737, SSIM: 0.961876
--- etc2comp effort: 35
etc2comp time: 0.480186
etc2comp Error: Max:  59, Mean: 2.938, MSE: 17.210, RMSE: 4.149, PSNR: 35.773, SSIM: 0.962279
--- etc2comp effort: 40
etc2comp time: 0.516155
etc2comp Error: Max:  59, Mean: 2.925, MSE: 17.107, RMSE: 4.136, PSNR: 35.799, SSIM: 0.962590
--- etc2comp effort: 45
etc2comp time: 0.565827
etc2comp Error: Max:  59, Mean: 2.911, MSE: 17.009, RMSE: 4.124, PSNR: 35.824, SSIM: 0.963044
--- etc2comp effort: 50
etc2comp time: 1.124057
etc2comp Error: Max:  59, Mean: 2.892, MSE: 16.852, RMSE: 4.105, PSNR: 35.864, SSIM: 0.963703
--- etc2comp effort: 55
etc2comp time: 1.192462
etc2comp Error: Max:  59, Mean: 2.880, MSE: 16.772, RMSE: 4.095, PSNR: 35.885, SSIM: 0.964164
--- etc2comp effort: 60
etc2comp time: 1.713074
etc2comp Error: Max:  59, Mean: 2.851, MSE: 16.424, RMSE: 4.053, PSNR: 35.976, SSIM: 0.964913
--- etc2comp effort: 65
etc2comp time: 1.828673
etc2comp Error: Max:  59, Mean: 2.846, MSE: 16.398, RMSE: 4.049, PSNR: 35.983, SSIM: 0.965099
--- etc2comp effort: 70
etc2comp time: 2.461853
etc2comp Error: Max:  59, Mean: 2.836, MSE: 16.274, RMSE: 4.034, PSNR: 36.016, SSIM: 0.965358
--- etc2comp effort: 75
etc2comp time: 2.608303
etc2comp Error: Max:  59, Mean: 2.831, MSE: 16.247, RMSE: 4.031, PSNR: 36.023, SSIM: 0.965534
--- etc2comp effort: 80
etc2comp time: 3.383624
etc2comp Error: Max:  59, Mean: 2.820, MSE: 16.156, RMSE: 4.019, PSNR: 36.047, SSIM: 0.965855
--- etc2comp effort: 85
etc2comp time: 3.719689
etc2comp Error: Max:  59, Mean: 2.814, MSE: 16.125, RMSE: 4.016, PSNR: 36.056, SSIM: 0.966079
--- etc2comp effort: 90
etc2comp time: 4.675509
etc2comp Error: Max:  59, Mean: 2.808, MSE: 16.072, RMSE: 4.009, PSNR: 36.070, SSIM: 0.966264
--- etc2comp effort: 95
etc2comp time: 4.619700
etc2comp Error: Max:  59, Mean: 2.806, MSE: 16.068, RMSE: 4.008, PSNR: 36.071, SSIM: 0.966293
--- etc2comp effort: 100
etc2comp time: 4.771136
etc2comp Error: Max:  59, Mean: 2.805, MSE: 16.064, RMSE: 4.008, PSNR: 36.072, SSIM: 0.966309
--- etcpak time: 0.045
etcpak Error: Max:  90, Mean: 3.458, MSE: 25.593, RMSE: 5.059, PSNR: 34.050, SSIM: 0.950551
--- ispc_etc time: 1.034064
ispc_etc1 Error: Max:  56, Mean: 2.866, MSE: 16.450, RMSE: 4.056, PSNR: 35.969, SSIM: 0.965412

Visualization of random ETC2 Planar Mode blocks

For fun I've been poking around at the planar mode in ETC2. From this presentation:


Okay, they are intended for use on smoothly varying blocks. I'm intrigued by planar mode because the colors are stored at high precision (676), and there are no selectors like in the other modes. But what do planar blocks really look like? This random planar block image, sorted by standard deviation, was pixel (box filter) upsampled by 400%:


Just the green channel:


This image was computed by poking random 8-bit bytes into an ETC2 block. If the block passed the planar ETC2 mode check, it was decoded and stored in a temporary image. Once the temp image was full, it was sorted by standard deviation.

Hey - most of these random planar blocks are not actually smoothly varying! I'm unsure if this is actually useful in practice, but it's interesting.

Looking at how Planar blocks are actually unpacked, the H and V vectors are interpreted relative to the "origin" color, so they're actually signed and the unpacking code uses per-component [0,255] clamping. This is where the high frequency patterns come from.

// ro, go, bo - origin color, unpacked from 676
// rv, gv, bv and rh, gh, bh - vertical and horizontal colors, unpacked from 676
// unpack planar block's pixels - there are no selectors in this mode
// clamp255() clamps each component to [0,255], since H and V act as signed
// offsets relative to the origin color
const auto clamp255 = [](int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); };
for (int y = 0; y < 4; y++)
{
    for (int x = 0; x < 4; x++)
    {
        pDst->set(
            clamp255((4 * ro + x * (rh - ro) + y * (rv - ro) + 2) >> 2),
            clamp255((4 * go + x * (gh - go) + y * (gv - go) + 2) >> 2),
            clamp255((4 * bo + x * (bh - bo) + y * (bv - bo) + 2) >> 2),
            255);
        pDst++;
    }
}

ETC2 texture compression using exclusively planar blocks

The usual explanation given for planar blocks is that they are intended for smoothly varying blocks (see my previous post on planar blocks). So it seems natural to model the block's pixels as three planes (separately R, G, and B) when trying to create a trial planar encoding.

etc2comp tries fitting several lines along the edges of the block to compute a trial plane definition, then it "twiddles" the quantized coordinates to minimize the quantization error. From what I've been told, in practice planar blocks aren't actually used that much in ETC2 compression, which seems sad to me.

I realized while looking at the planar mode's unpack function that the "offset" or "origin" component is equivalent to the DC component in the DCT. And, the H and V components are equivalent to two of the lowest frequency basis vectors (not counting DC) in the 4x4 DCT (circled in red):


So planar blocks actually support three basis vectors (including the DC component, in the upper left). So, why not try encoding each block in a test image as an ETC2-style planar block and see what the results look like? In other words, look at using planar blocks from the perspective of transform coding.

In this experiment, I find the average block color, set the planar "O" color to that, then subtract it out from the block. I then dot (inner product) the leftover pixel values against the H basis vector, then the V basis vector. I encode the resulting values into the ETC2 planar block H and V colors, and use the same code to unpack them as I use to decode actual ETC2 planar block data. (Note: because this is only a fun little Sunday-night experiment, I'm using 888 colors for O, H, and V, not 676, but I think the results should hold up at 676 with careful quantization/twiddling.)

The results are interesting. Remember, only planar blocks are used:

DC+H+V:



DC+H+V:


DC only:



DC+H+V:


DC only:

Rest are DC+H+V:

















ETC2 planar block only output created with etcpak

Bartosz Taudul (etcpak author) sent these ETC2 planar block only encodings in a reply to my previous post. For planar-only they look amazing!

Note: I've verified these images myself by hacking etcpak's ProcessRGB_ETC2() function to immediately "return result.first" after it calls Planar(src), which forces every block to be encoded as planar. I confirmed this by generating a histogram of the ETC1/2 modes used in all the encoded blocks.

Hey GPU texture format engineers: Come on, give us more basis functions to play with! I'm starting to look more deeply at ETC2 encoded textures, and a surprising amount of blocks in some textures are using planar mode vs. the other ETC2 modes.











He also says that etcpak uses planar blocks quite often (blue indicates a planar block):












ETC1 optimization notes

I've been optimizing this function:

std::pair<etc1_bits, error> = ETC1Encode(pixels, options).

This actually gives me a really fast way of accurately computing this:

error = ETC1Distance(pixelsA, pixelsB, options).

I'm seriously considering a SIMD implementation next. I wrote one for DXT1 just for fun last week.

I need this distance function to be fast in order to justify another series of bottom-up clusterization experiments, and to justify improving the clusterization process itself.

ETC1 block color clusterization progress

I've got block color ("endpoint") clusterization working pretty well with the full ETC1 format. (Not just a subset, like in last month's endpoint clusterization experiment.)

Here are some quick examples, using only 256 unique block color/intensity table values for each image, and RGB avg. error metrics. There are actually two tables for each image, one for differential and another for individual mode, each built from the same 256 clusters. The tables are closely related, so it's possible to store the block colors in 555 format and use them as predictors to delta code the 444 block colors.

This is the first (and trickiest) major step to full ETC1 CRN/RDO support in Basis (the successor to crunch). In practice I think 256 unique endpoints is too few, but I'm purposely limiting the # of clusters to get a feel for how well the current algorithm works.











kodim18 at 256 endpoint clusters, with tile and differential bit visualizations:







RDO ETC1 texture compression prototype

I've now got a basic ETC1 RDO compressor working. Clusterization is now used on both the block colors/intensity table indices and selectors. This compressor supports the entire ETC1 format: 2 subblocks per block, flipping, and both differential and individual block color modes.

Here's kodim14 using only 256 unique selector vectors and 256 block colors/intensity table indices:


This is just a bare-minimum prototype. It doesn't support crunch-style macroblock tiling, or required things like mipmaps, texture arrays, etc. It's a proof-of-principle prototype showing that crunch-style RDO compression is totally doable in the full ETC1 format.

Here are more examples. I have PSNR and SSIM stats, which I'm going to focus on next.

16 block color, 16 selector clusters:

32, 32:


64, 64:

128, 128:

512, 512:

4096:

512 block color, 3072 selector clusters:


RDO ETC1 texture compression tool output

Here's what my current experimental compression tool outputs to stdout while compressing a single image. I've begun to experiment with different perceptual metrics, such as PSNR-HVS and PSNR-HVSM. (I'm somewhat leery of SSIM/MS-SSIM for this problem domain, but I still compute it.)

texexp -out kodim23.ktx -e 2048 -s 8192 -adaptive -file kodak\kodim23.png

Source filename: kodak\kodim23.png 768x512
Force ETC1S: 0 NumEndpointClusters: 2048 NumSelectorClusters: 8192 Adaptive: 1
Num failed 555 packing: 565 out of 24576 blocks, 692 out of 2048 clusters

clustered  RGB: Error: Max: 109, Mean: 2.925, MSE: 20.664, RMSE: 4.546, PSNR: 34.979, SSIM: 0.928872
clustered    R: Error: Max:  63, Mean: 3.007, MSE: 20.729, RMSE: 4.553, PSNR: 34.965, SSIM: 0.929768
clustered    G: Error: Max:  52, Mean: 2.049, MSE: 9.793, RMSE: 3.129, PSNR: 38.222, SSIM: 0.960504
clustered    B: Error: Max: 109, Mean: 3.719, MSE: 31.472, RMSE: 5.610, PSNR: 33.152, SSIM: 0.896344
clustered    Y: Error: Max:  53, Mean: 1.661, MSE: 6.830, RMSE: 2.613, PSNR: 39.787, SSIM: 0.971262

best_etc1    RGB: Error: Max:  61, Mean: 2.235, MSE: 11.225, RMSE: 3.350, PSNR: 37.629, SSIM: 0.946851
best_etc1      R: Error: Max:  50, Mean: 2.245, MSE: 10.829, RMSE: 3.291, PSNR: 37.785, SSIM: 0.952394
best_etc1      G: Error: Max:  35, Mean: 1.529, MSE: 5.260, RMSE: 2.294, PSNR: 40.921, SSIM: 0.973678
best_etc1      B: Error: Max:  61, Mean: 2.931, MSE: 17.587, RMSE: 4.194, PSNR: 35.679, SSIM: 0.914481
best_etc1      Y: Error: Max:  33, Mean: 1.231, MSE: 3.618, RMSE: 1.902, PSNR: 42.547, SSIM: 0.981130

etcpak_etc1  RGB: Error: Max: 113, Mean: 2.494, MSE: 14.325, RMSE: 3.785, PSNR: 36.570, SSIM: 0.940693
etcpak_etc1    R: Error: Max:  60, Mean: 2.637, MSE: 14.934, RMSE: 3.864, PSNR: 36.389, SSIM: 0.938706
etcpak_etc1    G: Error: Max:  65, Mean: 1.875, MSE: 8.011, RMSE: 2.830, PSNR: 39.094, SSIM: 0.964159
etcpak_etc1    B: Error: Max: 113, Mean: 2.970, MSE: 20.031, RMSE: 4.476, PSNR: 35.114, SSIM: 0.919215
etcpak_etc1    Y: Error: Max:  64, Mean: 1.497, MSE: 5.630, RMSE: 2.373, PSNR: 40.625, SSIM: 0.975777

clustered_s  RGB: Error: Max: 109, Mean: 3.065, MSE: 21.885, RMSE: 4.678, PSNR: 34.729, SSIM: 0.918469
clustered_s    R: Error: Max:  63, Mean: 3.136, MSE: 21.956, RMSE: 4.686, PSNR: 34.715, SSIM: 0.919488
clustered_s    G: Error: Max:  52, Mean: 2.241, MSE: 11.118, RMSE: 3.334, PSNR: 37.670, SSIM: 0.948531
clustered_s    B: Error: Max: 109, Mean: 3.818, MSE: 32.581, RMSE: 5.708, PSNR: 33.001, SSIM: 0.887388
clustered_s    Y: Error: Max:  53, Mean: 1.874, MSE: 8.115, RMSE: 2.849, PSNR: 39.038, SSIM: 0.959831

ETC1/2 block histogram:
ETC1_DIFFERENTIAL: 23576
ETC1_INDIVIDUAL: 1000
ETC2_T: 0
ETC2_H: 0
ETC2_PLANAR: 0

Total blocks: 24576, ETC1S: 22391 (91.109%), Diff: 23576 (95.931%), Indiv: 1000 (4.069%), Flip: 9005 (36.641%)

Wrote file kodim23.ktx

clustered_s LZMA compressed from 196676 to 89489 bytes, 1.820658 bits/texel
Best ETC1 LZMA compressed from 196676 to 130830 bytes, 2.661743 bits/texel
etcpak ETC1 LZMA compressed from 196676 to 117666 bytes, 2.393921 bits/texel

OpenCV SSIM:
R:     0.919503
G:     0.948527
B:     0.887368
Avg:   0.918466
709 L: 0.9599

basislib:
RGB Total   Error: Max: 109, Mean: 9.195, MSE: 65.655, RMSE: 8.103, PSNR: 29.958
RGB Average Error: Max: 109, Mean: 3.065, MSE: 21.885, RMSE: 4.678, PSNR: 34.729, SSIM: 0.918469
Luma        Error: Max:  53, Mean: 1.845, MSE: 7.865, RMSE: 2.805, PSNR: 39.174, SSIM: 0.959831
Red         Error: Max:  63, Mean: 3.136, MSE: 21.956, RMSE: 4.686, PSNR: 34.715, SSIM: 0.919488
Green       Error: Max:  52, Mean: 2.241, MSE: 11.118, RMSE: 3.334, PSNR: 37.670, SSIM: 0.948531
Blue        Error: Max: 109, Mean: 3.818, MSE: 32.581, RMSE: 5.708, PSNR: 33.001, SSIM: 0.887388

PSNR-HVS:  85.836
PSNR-HVSM: 90.828

Experiment succeeded.

The tool outputs over a dozen debug images. Here's some of the compressor prototype's output:





ETC1S visualization (white=ETC1 subset differential, green=ETC1 subset individual, black=full ETC1). The "ETC1 subset" format is a simplified form of ETC1 where both subblocks are constrained to use the same block colors.



Quantized selectors:


Differential vs. individual mode visualization (black=individual 444 444, white=differential 555 333):


Block flip visualization:


Quantized subblock 0 and 1 intensity tables:



Quantized subblock 0 and 1 block colors:



2D Haar Wavelet Transform on GPU texture selector indices

I've been very busy refining my new ETC1 compressor, so I haven't been posting much recently. Today I decided to do something different, so I've been playing around with the 2D Haar 4x4 and 8x8 transforms on ETC1 selector bits. I first did this years ago while writing crunch, on DXT1/BC1, but I unfortunately didn't publish or use the results.

To use the Haar transform on selector indices, I prepare the input samples by adding .5 to each selector index (which ranges over [0,3] in ETC1), do the transform, uniformly quantize, then do the inverse transform and truncate the resulting values back to the [0,3] selector range. (You must shift the input samples by .5 or it won't work.)

The quantization stage scales the floating-point coefficient by 4 (to get 2 bits to the right of the decimal point, which in experiments is just enough for 4x4) and converts it to an integer. This integer is then divided and re-multiplied by a quantization matrix entry, converted back to float, and divided by 4.

For this uniform quantization matrix:
  1   1   1   2   2   3   3   4
  1   1   2   2   3   3   4   4
  1   2   2   3   3   4   4   5
  2   2   3   3   4   4   5   5
  2   3   3   4   4   5   5   6
  3   3   4   4   5   5   6   6
  3   4   4   5   5   6   6   7
  4   4   5   5   6   6   7   7

I get this ETC1 image after 8x8 Haar transform+quantization+inverse transform:


The original ETC1 compressed texture (before Haar filtering):


Selector visualization:


1x difference image (the delta between the original and filtered ETC1 images):


There is error in high frequencies, which is exactly what is to be expected given the above quantization matrix.

Here's a more aggressive quantization matrix:

  2   4   6   8  10  12  14  16
  4   6   8  10  12  14  16  18
  6   8  10  12  14  16  18  20
  8  10  12  14  16  18  20  22
 10  12  14  16  18  20  22  24
 12  14  16  18  20  22  24  26
 14  16  18  20  22  24  26  28
 16  18  20  22  24  26  28  30

ETC1 image:


Selector visualization:


An even more aggressive quantization matrix:

  3   6   9  12  15  18  21  24
  6   9  12  15  18  21  24  27
  9  12  15  18  21  24  27  30
 12  15  18  21  24  27  30  33
 15  18  21  24  27  30  33  36
 18  21  24  27  30  33  36  39
 21  24  27  30  33  36  39  42
 24  27  30  33  36  39  42  45


Selector visualization:


I have some ideas on how the 4x4 Haar transform could be very useful in Basis, but they are just ideas right now. I find it amazing that the selectors can be transformed and manipulated in the frequency domain like this.


Example code: 4x4 Haar transform on ETC1 selectors

Someone asked me for the code that implements the selector frequency domain filtering experiment from one of my previous posts. Here you go. The linear algebra and utility code is all very similar to crunch's. The Haar stuff was copied straight from Wikipedia.

If you understand how coefficient quantization works in JPEG, then this example should be pretty easy to follow. Let me know if you have any questions or want to see the 8x8 version, but it's basically the same thing with larger matrices (and the selectors within 2x2 ETC1 blocks).

Notes:
  • gpu_image: A compressed GPU texture (in this case, ETC1).
  • image_u8: A 32bpp ARGB raster image.
  • etc_block: A simple helper class to get/set ETC1/2 selectors, subblock colors, codeword table indices, etc.
static void selector_haar_transform_test(const gpu_image &g)
{
    image_u8 g_unpacked;
    g.unpack_image(g_unpacked);

    image_utils::write_to_file("g_orig.png", g_unpacked);

    gpu_image g_temp(g);

#define sqrt2 (1.4142135623730951f)

    matrix44F haar_matrix(
        .5f * 1, .5f * 1, .5f * 1, .5f * 1,
        .5f * 1, .5f * 1, .5f * -1, .5f * -1,
        .5f * sqrt2, .5f * -sqrt2, 0, 0,
        0, 0, .5f * sqrt2, .5f * -sqrt2);

    matrix44F inv_haar_matrix(haar_matrix.get_transposed());

    for (uint block_y = 0; block_y < g_temp.get_blocks_y(); block_y++)
    {
        for (uint block_x = 0; block_x < g_temp.get_blocks_x(); block_x++)
        {
            etc_block &blk = g_temp.get_element_of_type<etc_block>(block_x, block_y);

            // Shift the selectors by .5 before transforming.
            matrix44F s;
            for (uint y = 0; y < 4; y++)
                for (uint x = 0; x < 4; x++)
                    s(x, y) = (float)blk.get_selector(x, y) + .5f;

            matrix44F h(haar_matrix * s * inv_haar_matrix);

            for (uint i = 0; i < 4; i++)
            {
                for (uint j = 0; j < 4; j++)
                {
                    float q = h(i, j);

                    // 2 fractional bits, then uniform quantization.
                    const float scale = 4.0f;
                    int iq = math::float_to_int_nearest(q * scale);

                    int quant = (i + j + 1);

                    if (quant < 1)
                        quant = 1;

                    iq /= quant;
                    iq *= quant;

                    h(i, j) = iq / scale;
                }
            }

            matrix44F ih(inv_haar_matrix * h * haar_matrix);

            // Truncate back to the [0,3] selector range.
            for (uint y = 0; y < 4; y++)
                for (uint x = 0; x < 4; x++)
                    //blk.set_selector(x, y, math::clamp<int>((int)ih(x, y) + .5f, 0, 3));
                    blk.set_selector(x, y, math::clamp<int>((int)ih(x, y), 0, 3));
        }
    }

    write_etc1_vis_images(g_temp, "g_temp_");

    g_temp.unpack_image(g_unpacked);

    image_utils::write_to_file("g.png", g_unpacked);
    exit(0);
}

In-app charting/graphing using pplot

pplot is a nice little graphing library:

http://pplot.sourceforge.net/

pplot is device and platform independent, which I really liked. I hooked it up to my generic image class, which supports things like vector text rendering, antialiased line drawing, etc.



status of basis ETC1 support

I've "upgraded" my 2D-only prototype into a full-blown class, instead of leaving it as a huge function in my experimental framework.

Next up are things like macroblock support, more endpoint/selector codebook refinements, an investigation into alternative selector compression schemes, and an experiment to exploit endpoint/selector codebook entry correlation. After this, I'm rewriting the code so it works on texture arrays, cubemaps, etc.

This rewritten code will be the "front end" of the full ETC1 compressor. The back end (which does the coding) comes after the front end is in good shape. Unlike crunch, basis will use the same basic front end for both RDO mode and .CRN (or .basis) mode.

This compressor is also compatible with the ETC1 "subset" format I mentioned here, which means it could be trivially transcoded to DXT1 with the aid of a precomputed lookup table.

Rate distortion performance of Basis ETC1 RDO+LZMA on the Kodak test set

At 3 quality levels, using REC709 perceptual colorspace metrics. This compares plain ETC1 (with no lossless compression), basislib highest quality ETC1+LZMA, and basislib RDO+LZMA.

"S" = selectors, "E" = endpoints.

crunch-style adaptive endpoint quantization is supported at the block/subblock level, but not yet at the macroblock (2x2 block) level. Also, the KTX writer backend is greedy: it doesn't try to choose the combination of selectors and endpoints that results in the fewest compressed bits output by LZMA (or LZHAM). The lack of both features hurts compression. I have several other improvements to both quality and bitrate coming, but this is a good milestone.





Effect of ETC1 endpoint quantization on Luma SSIM/PSNR

In this test on the 24 Kodak images I quantized the ETC1 block colors/intensity tables (or what I've been calling "endpoints", from DXT1/BC1 terminology) to 128 clusters, but the selectors were not quantized at all. 128 clusters for endpoints is at the edge of usability for many photos.

This test also adaptively limits blocks to a single endpoint (versus a unique endpoint for each subblock) if doing so doesn't lower the block's PSNR by more than 1.25 dB.

Anyhow, these two graphs show that this process is quite effective. Even at only 128 clusters, the overall SSIM is only reduced by around .01, while the bitrate is reduced by around .4-.5 bits/texel.

The results look surprisingly good. I've made great progress on quality per bit over the previous few weeks, and I'll be posting images and .KTX files in a day or so.



Two more graphs, with 3 different endpoint quantization settings:


Overall stats:

ETC1 (no quantization):
best_luma_psnr: Avg: 40.009226, Std Dev: 2.154732, Min: 35.193684, Max: 42.750275, Mean: 41.113007
best_luma_ssim: Avg: 0.983419, Std Dev: 0.002109, Min: 0.980131, Max: 0.989254, Mean: 0.983190
best_bits_per_texel: Avg: 2.851078, Std Dev: 0.341672, Min: 2.184774, Max: 3.482361, Mean: 2.822876

128 endpoints:
rdo_luma_psnr: Avg: 38.042171, Std Dev: 1.874003, Min: 34.209053, Max: 41.065495, Mean: 38.749592
rdo_luma_ssim: Avg: 0.974083, Std Dev: 0.004284, Min: 0.960817, Max: 0.983318, Mean: 0.974376
rdo_bits_per_texel: Avg: 2.351300, Std Dev: 0.318168, Min: 1.788859, Max: 2.967855, Mean: 2.344340

512 endpoints:
rdo_luma_psnr: Avg: 39.239567, Std Dev: 2.001313, Min: 34.834538, Max: 41.839687, Mean: 40.379951
rdo_luma_ssim: Avg: 0.979648, Std Dev: 0.002847, Min: 0.973445, Max: 0.987098, Mean: 0.979329
rdo_bits_per_texel: Avg: 2.617640, Std Dev: 0.345818, Min: 2.031942, Max: 3.296285, Mean: 2.604553

1024 endpoints:
rdo_luma_psnr: Avg: 39.490915, Std Dev: 2.033055, Min: 34.942341, Max: 42.026814, Mean: 40.666183
rdo_luma_ssim: Avg: 0.980563, Std Dev: 0.002673, Min: 0.976034, Max: 0.987617, Mean: 0.980514
rdo_bits_per_texel: Avg: 2.693218, Std Dev: 0.356560, Min: 2.069397, Max: 3.390055, Mean: 2.668416

The next two graphs show RDO ETC1 compression using 24,576 selectors and endpoints, which for all practical purposes disables quantization on the Kodak test images. Note that adaptive subblock utilization is still enabled here, so a block's subblocks can still be forced to share the same block colors/intensity tables (endpoints) if the quality loss is under 1.25 dB.

Tests like this are important, because they show that the RDO compressor is able to utilize all the features available in ETC1: flipped/non-flipped subblocks, differential/absolute block color encoding, etc.



Overall stats:

rdo_luma_psnr: Avg: 39.766113, Std Dev: 2.066657, Min: 35.116722, Max: 42.367085, Mean: 40.845627
rdo_luma_ssim: Avg: 0.981710, Std Dev: 0.002428, Min: 0.978301, Max: 0.988114, Mean: 0.981266
rdo_bits_per_texel: Avg: 2.754947, Std Dev: 0.365874, Min: 2.098104, Max: 3.464823, Mean: 2.714681
rdo_orig_size: Avg: 196676.000000, Std Dev: 0.000000, Min: 196676.000000, Max: 196676.000000, Mean: 196676.000000
rdo_compressed_size: Avg: 135411.166667, Std Dev: 17983.452669, Min: 103126.000000, Max: 170303.000000, Mean: 133432.000000

best_luma_psnr: Avg: 40.009226, Std Dev: 2.154732, Min: 35.193684, Max: 42.750275, Mean: 41.113007
best_luma_ssim: Avg: 0.983419, Std Dev: 0.002109, Min: 0.980131, Max: 0.989254, Mean: 0.983190
best_bits_per_texel: Avg: 2.851078, Std Dev: 0.341672, Min: 2.184774, Max: 3.482361, Mean: 2.822876
best_orig_size: Avg: 196676.000000, Std Dev: 0.000000, Min: 196676.000000, Max: 196676.000000, Mean: 196676.000000
best_compressed_size: Avg: 140136.166667, Std Dev: 16793.846305, Min: 107386.000000, Max: 171165.000000, Mean: 138750.000000

The next graphs are just like the previous ones, except the adaptive subblock feature is disabled. They show that RDO ETC1 with no quantization is virtually identical to basic (highest-quality, block-by-block) ETC1 compression.



Overall stats:

rdo_luma_psnr: Avg: 39.991337, Std Dev: 2.109917, Min: 35.276287, Max: 42.721352, Mean: 41.098907
rdo_luma_ssim: Avg: 0.982858, Std Dev: 0.002269, Min: 0.979608, Max: 0.988770, Mean: 0.982394
rdo_bits_per_texel: Avg: 2.853771, Std Dev: 0.348101, Min: 2.188131, Max: 3.518412, Mean: 2.828857
rdo_orig_size: Avg: 196676.000000, Std Dev: 0.000000, Min: 196676.000000, Max: 196676.000000, Mean: 196676.000000
rdo_compressed_size: Avg: 140268.541667, Std Dev: 17109.836167, Min: 107551.000000, Max: 172937.000000, Mean: 139044.000000

best_luma_psnr: Avg: 40.009226, Std Dev: 2.154732, Min: 35.193684, Max: 42.750275, Mean: 41.113007
best_luma_ssim: Avg: 0.983419, Std Dev: 0.002109, Min: 0.980131, Max: 0.989254, Mean: 0.983190
best_bits_per_texel: Avg: 2.851078, Std Dev: 0.341672, Min: 2.184774, Max: 3.482361, Mean: 2.822876
best_orig_size: Avg: 196676.000000, Std Dev: 0.000000, Min: 196676.000000, Max: 196676.000000, Mean: 196676.000000

best_compressed_size: Avg: 140136.166667, Std Dev: 16793.846305, Min: 107386.000000, Max: 171165.000000, Mean: 138750.000000

Effect of ETC1 selector quantization on Luma SSIM/PSNR

This is like the previous post, except this time only the selectors are quantized while the endpoints are left alone. Kodak test images, perceptual colorspace metrics:





Stats for non-RDO ETC1 compression:

best_luma_psnr: Avg: 40.009226, Std Dev: 2.154732, Min: 35.193684, Max: 42.750275, Mean: 41.113007
best_luma_ssim: Avg: 0.983419, Std Dev: 0.002109, Min: 0.980131, Max: 0.989254, Mean: 0.983190
best_bits_per_texel: Avg: 2.851078, Std Dev: 0.341672, Min: 2.184774, Max: 3.482361, Mean: 2.822876

RDO selectors 8192:

rdo_luma_psnr: Avg: 38.225255, Std Dev: 2.628415, Min: 31.853958, Max: 41.955276, Mean: 39.500504
rdo_luma_ssim: Avg: 0.966271, Std Dev: 0.007768, Min: 0.944449, Max: 0.981821, Mean: 0.966354
rdo_bits_per_texel: Avg: 2.366380, Std Dev: 0.231610, Min: 1.902201, Max: 2.793721, Mean: 2.337708

RDO selectors 4096:

rdo_luma_psnr: Avg: 36.581700, Std Dev: 2.874786, Min: 29.814810, Max: 40.718441, Mean: 37.796730
rdo_luma_ssim: Avg: 0.953993, Std Dev: 0.010954, Min: 0.922887, Max: 0.973516, Mean: 0.953305
rdo_bits_per_texel: Avg: 2.132147, Std Dev: 0.220503, Min: 1.668640, Max: 2.535848, Mean: 2.094666

RDO selectors 2048:

rdo_luma_psnr: Avg: 35.129581, Std Dev: 2.967410, Min: 28.291447, Max: 39.650620, Mean: 36.413860
rdo_luma_ssim: Avg: 0.942579, Std Dev: 0.013760, Min: 0.903846, Max: 0.967203, Mean: 0.941114
rdo_bits_per_texel: Avg: 1.969779, Std Dev: 0.216071, Min: 1.506246, Max: 2.368530, Mean: 1.930033

RDO selectors 1024:

rdo_luma_psnr: Avg: 33.915408, Std Dev: 2.963184, Min: 27.143675, Max: 38.416290, Mean: 35.294361
rdo_luma_ssim: Avg: 0.931751, Std Dev: 0.016440, Min: 0.886028, Max: 0.960691, Mean: 0.929749
rdo_bits_per_texel: Avg: 1.848387, Std Dev: 0.216314, Min: 1.378805, Max: 2.245748, Mean: 1.809530

RDO selectors 512:

rdo_luma_psnr: Avg: 32.898390, Std Dev: 2.920482, Min: 26.292456, Max: 37.282799, Mean: 34.293579
rdo_luma_ssim: Avg: 0.920788, Std Dev: 0.019035, Min: 0.868281, Max: 0.953666, Mean: 0.918912
rdo_bits_per_texel: Avg: 1.753840, Std Dev: 0.215968, Min: 1.278585, Max: 2.150350, Mean: 1.717773

RDO selectors 256:

rdo_luma_psnr: Avg: 32.036631, Std Dev: 2.866251, Min: 25.595591, Max: 36.275482, Mean: 33.285240
rdo_luma_ssim: Avg: 0.909641, Std Dev: 0.021761, Min: 0.849128, Max: 0.946493, Mean: 0.907937
rdo_bits_per_texel: Avg: 1.673566, Std Dev: 0.215763, Min: 1.187663, Max: 2.065999, Mean: 1.631165

RDO selectors 128:

rdo_luma_psnr: Avg: 31.255766, Std Dev: 2.800476, Min: 24.977221, Max: 35.173336, Mean: 32.437733
rdo_luma_ssim: Avg: 0.896458, Std Dev: 0.024306, Min: 0.827130, Max: 0.934879, Mean: 0.895064
rdo_bits_per_texel: Avg: 1.600956, Std Dev: 0.215559, Min: 1.127218, Max: 1.991862, Mean: 1.550741
