Channel: Richard Geldreich's Blog

libsquish's DXT1 "Cluster Fit" method applied to ETC1

libsquish (a popular DXT encoding library) internally uses a total ordering based method to find high-quality DXT endpoints. This method can also be applied to ETC1 encoding, using the equations in rg_etc1's optimizer's remarks to solve for optimal subblock colors given each possible selector distribution in the total ordering and the current best intensity index and subblock color.

I don't actually compute the total ordering; instead, I iterate over all selector distributions present in the total ordering, because the actual per-pixel selector values don't matter to the solver. A hash table is also used to prevent the optimizer from evaluating a trial solution more than once.

Single threaded results:

perceptual: 0 etc2: 0 rec709: 1
Source filename: kodak\kodim03.png 768x512

--- basislib Quality: 4
basislib time: 5.644
basislib ETC image Error: Max:  70, Mean: 1.952, MSE: 8.220, RMSE: 2.867, PSNR: 38.982, SSIM: 0.964853

--- etc2comp effort: 10
etc2comp time: 75.792851
etc2comp Error: Max:  75, Mean: 1.925, MSE: 8.009, RMSE: 2.830, PSNR: 39.095, SSIM: 0.965339

--- etcpak time: 0.006
etcpak Error: Max:  80, Mean: 2.492, MSE: 12.757, RMSE: 3.572, PSNR: 37.073, SSIM: 0.944697

--- ispc_etc time: 1.021655
ispc_etc1 Error: Max:  75, Mean: 1.965, MSE: 8.280, RMSE: 2.877, PSNR: 38.951, SSIM: 0.963916



After enabling multithreading (40 threads) in those encoders that support it:

J:\dev\basislib1\bin>texexp kodak\kodim03.png
perceptual: 0 etc2: 0 rec709: 1
Source filename: kodak\kodim03.png 768x512

--- basislib Quality: 4
basislib pack time: 0.266
basislib ETC image Error: Max:  70, Mean: 1.952, MSE: 8.220, RMSE: 2.867, PSNR: 38.982, SSIM: 0.964853

--- etc2comp effort: 10
etc2comp time: 3.608819
etc2comp Error: Max:  75, Mean: 1.925, MSE: 8.009, RMSE: 2.830, PSNR: 39.095, SSIM: 0.965339

--- etcpak time: 0.006
etcpak Error: Max:  80, Mean: 2.492, MSE: 12.757, RMSE: 3.572, PSNR: 37.073, SSIM: 0.944697

--- ispc_etc time: 1.054324

ispc_etc1 Error: Max:  75, Mean: 1.965, MSE: 8.280, RMSE: 2.877, PSNR: 38.951, SSIM: 0.963916

Intel is doing some kind of amazing SIMD dark magic in there. The ETC1 cluster fit method is around 10-27x faster than rg_etc1 (which uses my previous method, a hybrid of a 3D neighborhood search with iterative base color refinement) and etc2comp (effort 100) in ETC1 mode. RGB Avg. PSNR is usually within ~.1 dB of Intel.

I'm so tempted to update rg_etc1 with this algorithm, if only I had the time.

ETC1 block flip estimation

ETC1's block format includes a "flip" bit, which describes how the subblocks are oriented in each block. When this bit is set, the two subblocks are oriented horizontally in a 4x4 pixel block like this:


ABCD
EFGH

IJKL
MNOP

Otherwise they're oriented vertically like this:

AB CD
EF GH
IJ KL
MN OP

(In BC7 terminology, ETC1 supports 1 or 2 subsets per block, with 1 or 2 partitions.)

Anyhow, a high quality encoder typically tries both subblock orientations and chooses the one which results in the lowest overall error. Interestingly, it's possible to predict with a reasonable degree of certainty which subblock orientation is best. The advantage to this method is obvious: higher encoding throughput, with (hopefully) only a small loss in quality.

The method used by etc2comp computes each possible subblock's average color, then it sums the squared "gray distance" from each pixel to each subblock's average color. (To compute gray distance from color A vs. B, it projects A onto the grayscale line going through B, then it computes the squared RGB distance from A to this projected point.) etc2comp then chooses the flip orientation with the lowest overall gray distance to try first.

Success rates on a handful of images, with RGB avg. PSNR and SSIM stats (using accurate vs. estimated flips):

kodim01: 68% (35.904 vs. 35.607, .975236 vs. .973744)
kodim03: 65% (39.949 vs. 38.801, .964774 vs. .963467)
kodim18: 69% (35.917 vs. 35.661, .965767 vs. .964385)
kodim20: 63% (38.849 vs. 38.589, .975558 vs. .974669)

ETC1 block flip bit visualizations on kodim03:

Image:



basislib best flips:

Flip estimates, using etc2comp's gray line estimate algorithm:


etc2comp's flips (in ETC1 mode):


etcpak's flips (in ETC1 mode):


More ETC1 cluster fit data

Ignore the SSIM stats; this is a corpus test image built from random 4x4 blocks chosen from many other images.

This was on a 20-core Xeon workstation, with multithreading enabled in basislib and etc2comp (40 threads each). The API I'm using in etcpak is not threaded (it's already sorta ridiculously fast), and Intel's is not multithreaded either, as far as I can tell from breakpointing inside it. etcpak and Intel make heavy use of SIMD operations, while etc2comp and basislib do not.

Cluster fit (new algorithm):

J:\dev\basislib1\bin>texexp test_0.tga
perceptual: 0 etc2: 0 rec709: 1
Source filename: test_0.tga 4096x4096
--- basislib Quality: 4
basislib pack time: 4.211
basislib ETC image Error: Max: 255, Mean: 3.277, MSE: 40.847, RMSE: 6.391, PSNR: 32.019, SSIM: 0.999618

--- etc2comp effort: 100
etc2comp time: 154.562167
etc2comp Error: Max: 255, Mean: 3.286, MSE: 42.777, RMSE: 6.540, PSNR: 31.819, SSIM: 0.999612

--- etcpak time: 0.258
etcpak Error: Max: 255, Mean: 4.282, MSE: 69.927, RMSE: 8.362, PSNR: 29.684, SSIM: 0.998407

--- ispc_etc time: 43.694160
ispc_etc1 Error: Max: 255, Mean: 3.249, MSE: 39.912, RMSE: 6.318, PSNR: 32.120, SSIM: 0.999655

Previous algorithm (rg_etc1's lattice scan+refinement):

J:\dev\basislib1\bin>texexp test_0.tga
perceptual: 0 etc2: 0 rec709: 1
Source filename: test_0.tga 4096x4096

--- basislib Quality: 4
basislib pack time: 240.638
basislib ETC image Error: Max: 255, Mean: 3.203, MSE: 39.336, RMSE: 6.272, PSNR: 32.183, SSIM: 0.999680

--- etc2comp effort: 100
etc2comp time: 151.739932
etc2comp Error: Max: 255, Mean: 3.286, MSE: 42.777, RMSE: 6.540, PSNR: 31.819, SSIM: 0.999612

--- etcpak time: 0.243
etcpak Error: Max: 255, Mean: 4.282, MSE: 69.927, RMSE: 8.362, PSNR: 29.684, SSIM: 0.998407

--- ispc_etc time: 46.807596
ispc_etc1 Error: Max: 255, Mean: 3.249, MSE: 39.912, RMSE: 6.318, PSNR: 32.120, SSIM: 0.999655

ETC1 encoder performance on kodim18 at various quality/effort levels

I'm trying to get a handle on how the available ETC1 compressors perform, using their public APIs, at their various quality or effort levels. This is only for a single image (kodim18, my usual for quick tests like this).

First, here's the performance of etc2comp on kodim18, in ETC1-only mode, multithreading enabled, RGB Avg PSNR metrics, on effort levels [0,100] in increments of 5:


Efforts between roughly 40-65 seem to be the sweet spot. Effort=100 is obviously wasteful.

Here's another graph (can you tell I'm practicing my Excel graphing skills?), this time comparing the time and quality of various ETC1 (and now ETC2, for etc2comp) compressors at different encoder quality/effort settings:



Important Notes:
  • To be fair to Intel's ETC1 encoder (function CompressBlocksETC1()), which is not multithreaded, I added another bar labeled "ispc_etc1 MT" which has the total CPU time divided by 20 (to roughly match the speedup I'm seeing using 40 threads in the other natively multithreaded encoders). 
  • basislib is now using a variant of cluster fit (see previous posts). basislib_1 is lowest quality, basislib_3 is highest. Notice that basislib_3 is only ~2X slower than Intel's SIMD code, but basislib doesn't use any SIMD at all.
  • basislib and etc2comp both use 40 threads
  • etcpak's timings are currently single threaded, because I'm still using a single-threaded entrypoint inside the code (BlockData::Process()). It's on my TODO list to fix this. IMHO, unless you need a real-time ETC1 encoder, etcpak trades off too much quality. However, if you do need a real-time encoder and don't mind the quality loss, it's your best bet. If the author added a few optional SIMD-optimized cluster fit trials it would probably kick ass.
To get an idea of how efficient etc2comp currently is at scanning the ETC1 search space for kodim18, let's see what effort level (and how much CPU time) it takes to approximately match two other encoders' quality levels:
  • basislib cluster fit (64 trials out of 165): 0.115 secs, 35.917 dB. Quality matched or exceeded first at etc2comp effort 70: 1.095 secs, 35.953 dB.
  • Intel ISPC: 1.03 secs, 35.969 dB. Quality matched or exceeded first at etc2comp effort 80: 1.83 secs, 35.992 dB.

perceptual: 0 etc2: 0 rec709: 1
Source filename: kodak\kodim18.png 512x768
--- basislib Quality: 4
basislib pack time: 0.115
basislib ETC image Error: Max:  56, Mean: 2.865, MSE: 16.648, RMSE: 4.080, PSNR: 35.917, SSIM: 0.965767

--- etc2comp effort: 0
etc2comp time: 0.051663
etc2comp Error: Max:  64, Mean: 3.186, MSE: 21.123, RMSE: 4.596, PSNR: 34.883, SSIM: 0.958817
--- etc2comp effort: 5
etc2comp time: 0.083509
etc2comp Error: Max:  64, Mean: 3.129, MSE: 19.658, RMSE: 4.434, PSNR: 35.196, SSIM: 0.959313
--- etc2comp effort: 10
etc2comp time: 0.106361
etc2comp Error: Max:  64, Mean: 3.092, MSE: 19.052, RMSE: 4.365, PSNR: 35.331, SSIM: 0.959794
--- etc2comp effort: 15
etc2comp time: 0.133278
etc2comp Error: Max:  64, Mean: 3.063, MSE: 18.661, RMSE: 4.320, PSNR: 35.421, SSIM: 0.960250
--- etc2comp effort: 20
etc2comp time: 0.193460
etc2comp Error: Max:  64, Mean: 3.042, MSE: 18.416, RMSE: 4.291, PSNR: 35.479, SSIM: 0.960595
--- etc2comp effort: 25
etc2comp time: 0.162790
etc2comp Error: Max:  64, Mean: 3.027, MSE: 18.256, RMSE: 4.273, PSNR: 35.517, SSIM: 0.960869
--- etc2comp effort: 30
etc2comp time: 0.182370
etc2comp Error: Max:  64, Mean: 3.012, MSE: 18.108, RMSE: 4.255, PSNR: 35.552, SSIM: 0.961207
--- etc2comp effort: 35
etc2comp time: 0.196609
etc2comp Error: Max:  64, Mean: 2.998, MSE: 17.980, RMSE: 4.240, PSNR: 35.583, SSIM: 0.961578
--- etc2comp effort: 40
etc2comp time: 0.217227
etc2comp Error: Max:  64, Mean: 2.987, MSE: 17.888, RMSE: 4.229, PSNR: 35.605, SSIM: 0.961854
--- etc2comp effort: 45
etc2comp time: 0.248881
etc2comp Error: Max:  64, Mean: 2.970, MSE: 17.771, RMSE: 4.216, PSNR: 35.634, SSIM: 0.962461
--- etc2comp effort: 50
etc2comp time: 0.361306
etc2comp Error: Max:  59, Mean: 2.916, MSE: 17.175, RMSE: 4.144, PSNR: 35.782, SSIM: 0.963669
--- etc2comp effort: 55
etc2comp time: 0.379762
etc2comp Error: Max:  59, Mean: 2.902, MSE: 17.091, RMSE: 4.134, PSNR: 35.803, SSIM: 0.964149
--- etc2comp effort: 60
etc2comp time: 0.522357
etc2comp Error: Max:  59, Mean: 2.882, MSE: 16.840, RMSE: 4.104, PSNR: 35.867, SSIM: 0.964800
--- etc2comp effort: 65
etc2comp time: 0.560707
etc2comp Error: Max:  59, Mean: 2.878, MSE: 16.818, RMSE: 4.101, PSNR: 35.873, SSIM: 0.964974
--- etc2comp effort: 70
etc2comp time: 1.095014
etc2comp Error: Max:  59, Mean: 2.857, MSE: 16.512, RMSE: 4.063, PSNR: 35.953, SSIM: 0.965366
--- etc2comp effort: 75
etc2comp time: 1.166479
etc2comp Error: Max:  59, Mean: 2.852, MSE: 16.490, RMSE: 4.061, PSNR: 35.959, SSIM: 0.965534
--- etc2comp effort: 80
etc2comp time: 1.829960
etc2comp Error: Max:  59, Mean: 2.842, MSE: 16.362, RMSE: 4.045, PSNR: 35.992, SSIM: 0.965769
--- etc2comp effort: 85
etc2comp time: 1.904691
etc2comp Error: Max:  59, Mean: 2.836, MSE: 16.329, RMSE: 4.041, PSNR: 36.001, SSIM: 0.966037
--- etc2comp effort: 90
etc2comp time: 2.709250
etc2comp Error: Max:  59, Mean: 2.829, MSE: 16.255, RMSE: 4.032, PSNR: 36.021, SSIM: 0.966277
--- etc2comp effort: 95
etc2comp time: 2.802099
etc2comp Error: Max:  59, Mean: 2.827, MSE: 16.251, RMSE: 4.031, PSNR: 36.022, SSIM: 0.966315
--- etc2comp effort: 100
etc2comp time: 3.619217
etc2comp Error: Max:  59, Mean: 2.825, MSE: 16.216, RMSE: 4.027, PSNR: 36.031, SSIM: 0.966349

--- etcpak time: 0.006
etcpak Error: Max:  90, Mean: 3.464, MSE: 25.640, RMSE: 5.064, PSNR: 34.042, SSIM: 0.950396

--- ispc_etc time: 1.033881
ispc_etc1 Error: Max:  56, Mean: 2.866, MSE: 16.450, RMSE: 4.056, PSNR: 35.969, SSIM: 0.965412


ETC2 enabled:

perceptual: 0 etc2: 1 rec709: 1
Source filename: kodak\kodim18.png 512x768
--- basislib Quality: 1
basislib pack time: 0.036
basislib ETC image Error: Max:  73, Mean: 3.168, MSE: 20.910, RMSE: 4.573, PSNR: 34.927, SSIM: 0.959441
--- basislib Quality: 2
basislib pack time: 0.054
basislib ETC image Error: Max:  56, Mean: 2.945, MSE: 17.772, RMSE: 4.216, PSNR: 35.634, SSIM: 0.964359
--- basislib Quality: 3
basislib pack time: 0.107
basislib ETC image Error: Max:  56, Mean: 2.865, MSE: 16.648, RMSE: 4.080, PSNR: 35.917, SSIM: 0.965767
--- etc2comp effort: 0
etc2comp time: 0.086306
etc2comp Error: Max:  64, Mean: 3.158, MSE: 20.810, RMSE: 4.562, PSNR: 34.948, SSIM: 0.959069
--- etc2comp effort: 5
etc2comp time: 0.179816
etc2comp Error: Max:  59, Mean: 3.087, MSE: 19.044, RMSE: 4.364, PSNR: 35.333, SSIM: 0.959701
--- etc2comp effort: 10
etc2comp time: 0.247190
etc2comp Error: Max:  59, Mean: 3.046, MSE: 18.396, RMSE: 4.289, PSNR: 35.484, SSIM: 0.960256
--- etc2comp effort: 15
etc2comp time: 0.287620
etc2comp Error: Max:  59, Mean: 3.013, MSE: 17.977, RMSE: 4.240, PSNR: 35.584, SSIM: 0.960751
--- etc2comp effort: 20
etc2comp time: 0.339065
etc2comp Error: Max:  59, Mean: 2.990, MSE: 17.709, RMSE: 4.208, PSNR: 35.649, SSIM: 0.961162
--- etc2comp effort: 25
etc2comp time: 0.383907
etc2comp Error: Max:  59, Mean: 2.971, MSE: 17.515, RMSE: 4.185, PSNR: 35.697, SSIM: 0.961507
--- etc2comp effort: 30
etc2comp time: 0.432019
etc2comp Error: Max:  59, Mean: 2.954, MSE: 17.352, RMSE: 4.166, PSNR: 35.737, SSIM: 0.961876
--- etc2comp effort: 35
etc2comp time: 0.480186
etc2comp Error: Max:  59, Mean: 2.938, MSE: 17.210, RMSE: 4.149, PSNR: 35.773, SSIM: 0.962279
--- etc2comp effort: 40
etc2comp time: 0.516155
etc2comp Error: Max:  59, Mean: 2.925, MSE: 17.107, RMSE: 4.136, PSNR: 35.799, SSIM: 0.962590
--- etc2comp effort: 45
etc2comp time: 0.565827
etc2comp Error: Max:  59, Mean: 2.911, MSE: 17.009, RMSE: 4.124, PSNR: 35.824, SSIM: 0.963044
--- etc2comp effort: 50
etc2comp time: 1.124057
etc2comp Error: Max:  59, Mean: 2.892, MSE: 16.852, RMSE: 4.105, PSNR: 35.864, SSIM: 0.963703
--- etc2comp effort: 55
etc2comp time: 1.192462
etc2comp Error: Max:  59, Mean: 2.880, MSE: 16.772, RMSE: 4.095, PSNR: 35.885, SSIM: 0.964164
--- etc2comp effort: 60
etc2comp time: 1.713074
etc2comp Error: Max:  59, Mean: 2.851, MSE: 16.424, RMSE: 4.053, PSNR: 35.976, SSIM: 0.964913
--- etc2comp effort: 65
etc2comp time: 1.828673
etc2comp Error: Max:  59, Mean: 2.846, MSE: 16.398, RMSE: 4.049, PSNR: 35.983, SSIM: 0.965099
--- etc2comp effort: 70
etc2comp time: 2.461853
etc2comp Error: Max:  59, Mean: 2.836, MSE: 16.274, RMSE: 4.034, PSNR: 36.016, SSIM: 0.965358
--- etc2comp effort: 75
etc2comp time: 2.608303
etc2comp Error: Max:  59, Mean: 2.831, MSE: 16.247, RMSE: 4.031, PSNR: 36.023, SSIM: 0.965534
--- etc2comp effort: 80
etc2comp time: 3.383624
etc2comp Error: Max:  59, Mean: 2.820, MSE: 16.156, RMSE: 4.019, PSNR: 36.047, SSIM: 0.965855
--- etc2comp effort: 85
etc2comp time: 3.719689
etc2comp Error: Max:  59, Mean: 2.814, MSE: 16.125, RMSE: 4.016, PSNR: 36.056, SSIM: 0.966079
--- etc2comp effort: 90
etc2comp time: 4.675509
etc2comp Error: Max:  59, Mean: 2.808, MSE: 16.072, RMSE: 4.009, PSNR: 36.070, SSIM: 0.966264
--- etc2comp effort: 95
etc2comp time: 4.619700
etc2comp Error: Max:  59, Mean: 2.806, MSE: 16.068, RMSE: 4.008, PSNR: 36.071, SSIM: 0.966293
--- etc2comp effort: 100
etc2comp time: 4.771136
etc2comp Error: Max:  59, Mean: 2.805, MSE: 16.064, RMSE: 4.008, PSNR: 36.072, SSIM: 0.966309
--- etcpak time: 0.045
etcpak Error: Max:  90, Mean: 3.458, MSE: 25.593, RMSE: 5.059, PSNR: 34.050, SSIM: 0.950551
--- ispc_etc time: 1.034064
ispc_etc1 Error: Max:  56, Mean: 2.866, MSE: 16.450, RMSE: 4.056, PSNR: 35.969, SSIM: 0.965412

Visualization of random ETC2 Planar Mode blocks

For fun I've been poking around at the planar mode in ETC2. From this presentation:


Okay, they are intended for use on smoothly varying blocks. I'm intrigued by planar mode because the colors are stored at high precision (676), and there are no selectors like in the other modes. But what do planar blocks really look like? This random planar block image, sorted by standard deviation, was pixel (box filter) upsampled by 400%:


Just the green channel:


This image was computed by poking random 8-bit bytes into an ETC2 block. If the block passed the planar ETC2 mode check, it was decoded and stored in a temporary image. Once the temp image was full, it was sorted by standard deviation.

Hey - most of these random planar blocks are not actually smoothly varying! I'm unsure if this is actually useful in practice, but it's interesting.

Looking at how Planar blocks are actually unpacked, the H and V vectors are interpreted relative to the "origin" color, so they're actually signed and the unpacking code uses per-component [0,255] clamping. This is where the high frequency patterns come from.

// ro, go, bo - origin color, unpacked from 676
// rv, gv, bv and rh, gh, bh - vertical and horizontal colors, unpacked from 676
// unpack planar block's pixels - there are no selectors in this mode
// clamp255() clamps each component to [0,255], since H and V act as signed
// offsets relative to the origin color
const auto clamp255 = [](int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); };
for (int y = 0; y < 4; y++)
{
    for (int x = 0; x < 4; x++)
    {
        pDst->set(
            clamp255((4 * ro + x * (rh - ro) + y * (rv - ro) + 2) >> 2),
            clamp255((4 * go + x * (gh - go) + y * (gv - go) + 2) >> 2),
            clamp255((4 * bo + x * (bh - bo) + y * (bv - bo) + 2) >> 2),
            255);
        pDst++;
    }
}

ETC2 texture compression using exclusively planar blocks

The usual explanation given for planar blocks is that they are intended for smoothly varying blocks (see my previous post on planar blocks). So it seems natural to model the block's pixels as three planes (separately R, G, and B) when trying to create a trial planar encoding.

etc2comp tries fitting several lines along the edges of the block to compute a trial plane definition, then it "twiddles" the quantized coordinates to minimize the quantization error. From what I've been told, in practice planar blocks aren't actually used that much in ETC2 compression, which seems sad to me.

I realized while looking at the planar mode's unpack function that the "offset" or "origin" component is equivalent to the DC component in the DCT. And, the H and V components are equivalent to two of the lowest frequency basis vectors (not counting DC) in the 4x4 DCT (circled in red):


So planar blocks actually support three basis vectors (including the DC component, in the upper left). So, why not try encoding each block in a test image as an ETC2-style planar block and see what the results look like? In other words, look at using planar blocks from the perspective of transform coding.

In this experiment, I find the average block color, set the planar "O" color to that, then subtract it out from the block. I then dot (inner product) the leftover pixel values against the H basis vector, then the V basis vector. I encode the resulting values into the ETC2 planar block H and V colors, and use the same code to unpack them as I use to decode actual ETC2 planar block data. (Note: because this is only a fun little Sunday-night experiment, I'm using 888 colors for O, H, and V, not 676, but I think the results should hold up at 676 with careful quantization/twiddling.)

The results are interesting. Remember, only planar blocks are used:

DC+H+V:



DC+H+V:


DC only:



DC+H+V:


DC only:

Rest are DC+H+V:

















ETC2 planar block only output created with etcpak

Bartosz Taudul (etcpak author) sent these ETC2 planar block only encodings in a reply to my previous post. For planar-only they look amazing!

Note: I've verified these images myself by hacking etcpak's ProcessRGB_ETC2() function to immediately "return result.first" after it calls Planar(src), which forces every block to be encoded as planar. I confirmed this by generating a histogram of the ETC1/2 modes used in all the encoded blocks.

Hey GPU texture format engineers: Come on, give us more basis functions to play with! I'm starting to look more deeply at ETC2 encoded textures, and a surprising amount of blocks in some textures are using planar mode vs. the other ETC2 modes.











He also says that etcpak uses planar blocks quite often (blue indicates a planar block):












ETC1 optimization notes

I've been optimizing this function:

std::pair<etc1_bits, error> = ETC1Encode(pixels, options).

This actually gives me a really fast way of accurately computing this:

error = ETC1Distance(pixelsA, pixelsB, options).

I'm seriously considering a SIMD implementation next. I wrote one for DXT1 just for fun last week.

I need this distance function to be fast in order to justify another series of bottom-up clusterization experiments, and to justify improving the clusterization process itself.

ETC1 block color clusterization progress

I've got block color ("endpoint") clusterization working pretty well with the full ETC1 format. (Not just a subset, like in last month's endpoint clusterization experiment.)

Here are some quick examples, using only 256 unique block color/intensity table values for each image, and RGB avg. error metrics. There are actually two tables for each image, one for differential and another for individual mode, each built from the same 256 clusters. The tables are closely related, so it's possible to store the block colors in 555 format and use them as predictors to delta code the 444 block colors.

This is the first (and trickiest) major step to full ETC1 CRN/RDO support in Basis (the successor to crunch). In practice I think 256 unique endpoints is too few, but I'm purposely limiting the # of clusters to get a feel for how well the current algorithm works.











kodim18 at 256 endpoint clusters, with tile and differential bit visualizations:







RDO ETC1 texture compression prototype

I've now got a basic ETC1 RDO compressor working. Clusterization is now used on both the block colors/intensity table indices and selectors. This compressor supports the entire ETC1 format: 2 subblocks per block, flipping, and both differential and individual block color modes.

Here's kodim14 using only 256 unique selector vectors and 256 block colors/intensity table indices:


This is just a bare-minimum prototype. It doesn't support crunch-style macroblock tiling, or required things like mipmaps, texture arrays, etc. It's a proof-of-principle prototype showing that crunch-style RDO compression is totally doable in the full ETC1 format.

Here are more examples. I have PSNR and SSIM stats, which I'm going to focus on next.

16 block color, 16 selector clusters:

32, 32:


64, 64:

128, 128:

512, 512:

4096:

512 block color, 3072 selector clusters:


RDO ETC1 texture compression tool output

Here's what my current experimental compression tool outputs to stdout while compressing a single image. I've begun to experiment with different perceptual metrics, such as PSNR-HVS and PSNR-HVSM. (I'm somewhat leery of SSIM/MS-SSIM for this problem domain, but I still compute it.)

texexp -out kodim23.ktx -e 2048 -s 8192 -adaptive -file kodak\kodim23.png

Source filename: kodak\kodim23.png 768x512
Force ETC1S: 0 NumEndpointClusters: 2048 NumSelectorClusters: 8192 Adaptive: 1
Num failed 555 packing: 565 out of 24576 blocks, 692 out of 2048 clusters

clustered  RGB: Error: Max: 109, Mean: 2.925, MSE: 20.664, RMSE: 4.546, PSNR: 34.979, SSIM: 0.928872
clustered    R: Error: Max:  63, Mean: 3.007, MSE: 20.729, RMSE: 4.553, PSNR: 34.965, SSIM: 0.929768
clustered    G: Error: Max:  52, Mean: 2.049, MSE: 9.793, RMSE: 3.129, PSNR: 38.222, SSIM: 0.960504
clustered    B: Error: Max: 109, Mean: 3.719, MSE: 31.472, RMSE: 5.610, PSNR: 33.152, SSIM: 0.896344
clustered    Y: Error: Max:  53, Mean: 1.661, MSE: 6.830, RMSE: 2.613, PSNR: 39.787, SSIM: 0.971262

best_etc1    RGB: Error: Max:  61, Mean: 2.235, MSE: 11.225, RMSE: 3.350, PSNR: 37.629, SSIM: 0.946851
best_etc1      R: Error: Max:  50, Mean: 2.245, MSE: 10.829, RMSE: 3.291, PSNR: 37.785, SSIM: 0.952394
best_etc1      G: Error: Max:  35, Mean: 1.529, MSE: 5.260, RMSE: 2.294, PSNR: 40.921, SSIM: 0.973678
best_etc1      B: Error: Max:  61, Mean: 2.931, MSE: 17.587, RMSE: 4.194, PSNR: 35.679, SSIM: 0.914481
best_etc1      Y: Error: Max:  33, Mean: 1.231, MSE: 3.618, RMSE: 1.902, PSNR: 42.547, SSIM: 0.981130

etcpak_etc1  RGB: Error: Max: 113, Mean: 2.494, MSE: 14.325, RMSE: 3.785, PSNR: 36.570, SSIM: 0.940693
etcpak_etc1    R: Error: Max:  60, Mean: 2.637, MSE: 14.934, RMSE: 3.864, PSNR: 36.389, SSIM: 0.938706
etcpak_etc1    G: Error: Max:  65, Mean: 1.875, MSE: 8.011, RMSE: 2.830, PSNR: 39.094, SSIM: 0.964159
etcpak_etc1    B: Error: Max: 113, Mean: 2.970, MSE: 20.031, RMSE: 4.476, PSNR: 35.114, SSIM: 0.919215
etcpak_etc1    Y: Error: Max:  64, Mean: 1.497, MSE: 5.630, RMSE: 2.373, PSNR: 40.625, SSIM: 0.975777

clustered_s  RGB: Error: Max: 109, Mean: 3.065, MSE: 21.885, RMSE: 4.678, PSNR: 34.729, SSIM: 0.918469
clustered_s    R: Error: Max:  63, Mean: 3.136, MSE: 21.956, RMSE: 4.686, PSNR: 34.715, SSIM: 0.919488
clustered_s    G: Error: Max:  52, Mean: 2.241, MSE: 11.118, RMSE: 3.334, PSNR: 37.670, SSIM: 0.948531
clustered_s    B: Error: Max: 109, Mean: 3.818, MSE: 32.581, RMSE: 5.708, PSNR: 33.001, SSIM: 0.887388
clustered_s    Y: Error: Max:  53, Mean: 1.874, MSE: 8.115, RMSE: 2.849, PSNR: 39.038, SSIM: 0.959831

ETC1/2 block histogram:
ETC1_DIFFERENTIAL: 23576
ETC1_INDIVIDUAL: 1000
ETC2_T: 0
ETC2_H: 0
ETC2_PLANAR: 0

Total blocks: 24576, ETC1S: 22391 (91.109%), Diff: 23576 (95.931%), Indiv: 1000 (4.069%), Flip: 9005 (36.641%)

Wrote file kodim23.ktx

clustered_s LZMA compressed from 196676 to 89489 bytes, 1.820658 bits/texel
Best ETC1 LZMA compressed from 196676 to 130830 bytes, 2.661743 bits/texel
etcpak ETC1 LZMA compressed from 196676 to 117666 bytes, 2.393921 bits/texel

OpenCV SSIM:
R:     0.919503
G:     0.948527
B:     0.887368
Avg:   0.918466
709 L: 0.9599

basislib:
RGB Total   Error: Max: 109, Mean: 9.195, MSE: 65.655, RMSE: 8.103, PSNR: 29.958
RGB Average Error: Max: 109, Mean: 3.065, MSE: 21.885, RMSE: 4.678, PSNR: 34.729, SSIM: 0.918469
Luma        Error: Max:  53, Mean: 1.845, MSE: 7.865, RMSE: 2.805, PSNR: 39.174, SSIM: 0.959831
Red         Error: Max:  63, Mean: 3.136, MSE: 21.956, RMSE: 4.686, PSNR: 34.715, SSIM: 0.919488
Green       Error: Max:  52, Mean: 2.241, MSE: 11.118, RMSE: 3.334, PSNR: 37.670, SSIM: 0.948531
Blue        Error: Max: 109, Mean: 3.818, MSE: 32.581, RMSE: 5.708, PSNR: 33.001, SSIM: 0.887388

PSNR-HVS:  85.836
PSNR-HVSM: 90.828

Experiment succeeded.

The tool outputs over a dozen debug images. Here's some of the compressor prototype's output:





ETC1S visualization (white=ETC1 subset differential, green=ETC1 subset individual, black=full ETC1). The "ETC1 subset" format is a simplified form of ETC1 where both subblocks are constrained to use the same block colors.



Quantized selectors:


Differential vs. individual mode visualization (black=individual 444 444, white=differential 555 333):


Block flip visualization:


Quantized subblock 0 and 1 intensity tables:



Quantized subblock 0 and 1 block colors:



2D Haar Wavelet Transform on GPU texture selector indices

I've been very busy refining my new ETC1 compressor, so I haven't been posting much recently. Today I decided to do something different, so I've been playing around with the 2D Haar 4x4 and 8x8 transforms on ETC1 selector bits. I first did this years ago while writing crunch, on DXT1/BC1, but I unfortunately didn't publish or use the results.

To use the Haar transform on selector indices, I prepare the input samples by adding .5 to each selector index (which ranges over [0,3] in ETC1), do the transform, uniformly quantize, then do the inverse transform and truncate the resulting values back to the [0,3] selector range. (You must shift the input samples by .5 or it won't work.)

The quantization stage scales the floating-point coefficient by 4 (to get 2 bits to the right of the decimal point, which in experiments is just enough for 4x4) and converts it to an integer. This integer is then divided and re-multiplied by a quantization matrix entry, converted back to float, and divided by 4.

For this uniform quantization matrix:
  1   1   1   2   2   3   3   4
  1   1   2   2   3   3   4   4
  1   2   2   3   3   4   4   5
  2   2   3   3   4   4   5   5
  2   3   3   4   4   5   5   6
  3   3   4   4   5   5   6   6
  3   4   4   5   5   6   6   7
  4   4   5   5   6   6   7   7

I get this ETC1 image after 8x8 Haar transform+quantization+inverse transform:


The original ETC1 compressed texture (before Haar filtering):


Selector visualization:


1x difference image (the delta between the original and filtered ETC1 images):


There is error in high frequencies, which is exactly what is to be expected given the above quantization matrix.

Here's a more aggressive quantization matrix:

  2   4   6   8  10  12  14  16
  4   6   8  10  12  14  16  18
  6   8  10  12  14  16  18  20
  8  10  12  14  16  18  20  22
 10  12  14  16  18  20  22  24
 12  14  16  18  20  22  24  26
 14  16  18  20  22  24  26  28
 16  18  20  22  24  26  28  30

ETC1 image:


Selector visualization:


An even more aggressive quantization matrix:

  3   6   9  12  15  18  21  24
  6   9  12  15  18  21  24  27
  9  12  15  18  21  24  27  30
 12  15  18  21  24  27  30  33
 15  18  21  24  27  30  33  36
 18  21  24  27  30  33  36  39
 21  24  27  30  33  36  39  42
 24  27  30  33  36  39  42  45


Selector visualization:


I have some ideas on how the 4x4 Haar transform could be very useful in Basis, but they are just ideas right now. I find it amazing that the selectors can be transformed and manipulated in the frequency domain like this.


Example code: 4x4 Haar transform on ETC1 selectors

Someone asked me for the code that implements the selector frequency domain filtering experiment from one of my previous posts. Here you go. The linear algebra and utility code is all very similar to crunch's. The Haar stuff was copied straight from Wikipedia.

If you understand how coefficient quantization works in JPEG, then this example should be pretty easy to follow. Let me know if you have any questions or want to see the 8x8 version, but it's basically the same thing with larger matrices (and the selectors within 2x2 ETC1 blocks).

Notes:
  • gpu_image: A compressed GPU texture (in this case, ETC1).
  • image_u8: A 32bpp ARGB raster image.
  • etc_block: A simple helper class to get/set ETC1/2 selectors, subblock colors, codeword table indices, etc.
static void selector_haar_transform_test(const gpu_image &g)
{
    image_u8 g_unpacked;
    g.unpack_image(g_unpacked);

    image_utils::write_to_file("g_orig.png", g_unpacked);

    gpu_image g_temp(g);

#define sqrt2 (1.4142135623730951f)

    matrix44F haar_matrix(
        .5f * 1, .5f * 1, .5f * 1, .5f * 1,
        .5f * 1, .5f * 1, .5f * -1, .5f * -1,
        .5f * sqrt2, .5f * -sqrt2, 0, 0,
        0, 0, .5f * sqrt2, .5f * -sqrt2);

    matrix44F inv_haar_matrix(haar_matrix.get_transposed());

    for (uint block_y = 0; block_y < g_temp.get_blocks_y(); block_y++)
    {
        for (uint block_x = 0; block_x < g_temp.get_blocks_x(); block_x++)
        {
            etc_block &blk = g_temp.get_element_of_type<etc_block>(block_x, block_y);

            // Shift the selectors by .5 before transforming.
            matrix44F s;
            for (uint y = 0; y < 4; y++)
                for (uint x = 0; x < 4; x++)
                    s(x, y) = (float)blk.get_selector(x, y) + .5f;

            matrix44F h(haar_matrix * s * inv_haar_matrix);

            for (uint i = 0; i < 4; i++)
            {
                for (uint j = 0; j < 4; j++)
                {
                    float q = h(i, j);

                    // 2 fractional bits, then uniform quantization.
                    const float scale = 4.0f;
                    int iq = math::float_to_int_nearest(q * scale);

                    int quant = (i + j + 1);

                    if (quant < 1)
                        quant = 1;

                    iq /= quant;
                    iq *= quant;

                    h(i, j) = iq / scale;
                }
            }

            matrix44F ih(inv_haar_matrix * h * haar_matrix);

            // Truncate back to the [0,3] selector range.
            for (uint y = 0; y < 4; y++)
                for (uint x = 0; x < 4; x++)
                    //blk.set_selector(x, y, math::clamp<int>((int)ih(x, y) + .5f, 0, 3));
                    blk.set_selector(x, y, math::clamp<int>((int)ih(x, y), 0, 3));
        }
    }

    write_etc1_vis_images(g_temp, "g_temp_");

    g_temp.unpack_image(g_unpacked);

    image_utils::write_to_file("g.png", g_unpacked);
    exit(0);
}

In-app charting/graphing using pplot

pplot is a nice little graphing library:

http://pplot.sourceforge.net/

pplot is device and platform independent, which I really liked. I hooked it up to my generic image class, which supports things like vector text rendering, antialiased line drawing, etc.



status of basis ETC1 support

I've "upgraded" my 2D-only prototype into a full-blown class, instead of leaving it as a huge function in my experimental framework.

Next up are things like macroblock support, more endpoint/selector codebook refinements, an investigation into alternative selector compression schemes, and an experiment to exploit endpoint/selector codebook entry correlation. After this, I'm rewriting the code so it works on texture arrays, cubemaps, etc.

This rewritten code will be the "front end" of the full ETC1 compressor. The back end (which does the coding) comes after the front end is in good shape. Unlike crunch, basis will use the same basic front end for both RDO mode and .CRN (or .basis) mode.

This compressor is also compatible with the ETC1 "subset" format I mentioned here, which means it could be trivially transcoded to DXT1 with the aid of a precomputed lookup table.

Rate distortion performance of Basis ETC1 RDO+LZMA on the Kodak test set

At 3 quality levels, using REC709 perceptual colorspace metrics. This compares plain ETC1 (with no lossless compression), basislib highest quality ETC1+LZMA, and basislib RDO+LZMA.

"S" = selectors, "E" = endpoints.

crunch-style adaptive endpoint quantization is supported at the block/subblock level, but not yet at the macroblock (2x2 block) level. Also, the KTX writer backend is greedy: it doesn't try to choose the combination of selectors and endpoints that results in the fewest compressed bits output by LZMA (or LZHAM). The lack of both features hurts compression. I have several other improvements to both quality and bitrate coming, but this is a good milestone.





Effect of ETC1 endpoint quantization on Luma SSIM/PSNR

In this test on the 24 Kodak images I quantized the ETC1 block colors/intensity tables (or what I've been calling "endpoints", from DXT1/BC1 terminology) to 128 clusters, but the selectors were not quantized at all. 128 clusters for endpoints is at the edge of usability for many photos.

This test also adaptively limits blocks to a single endpoint (versus a unique endpoint for each subblock) if doing so doesn't lower the block's PSNR by more than 1.25 dB.

Anyhow, these two graphs show that this process is quite effective. Even at only 128 clusters, the overall SSIM is only reduced by around .01, while the bitrate is reduced by around .4-.5 bits/texel.

The results look surprisingly good. I've made great progress on quality per bit over the previous few weeks, and I'll be posting images and .KTX files in a day or so.



Two more graphs, with 3 different endpoint quantization settings:


Overall stats:

ETC1 (no quantization):
best_luma_psnr: Avg: 40.009226, Std Dev: 2.154732, Min: 35.193684, Max: 42.750275, Mean: 41.113007
best_luma_ssim: Avg: 0.983419, Std Dev: 0.002109, Min: 0.980131, Max: 0.989254, Mean: 0.983190
best_bits_per_texel: Avg: 2.851078, Std Dev: 0.341672, Min: 2.184774, Max: 3.482361, Mean: 2.822876

128 endpoints:
rdo_luma_psnr: Avg: 38.042171, Std Dev: 1.874003, Min: 34.209053, Max: 41.065495, Mean: 38.749592
rdo_luma_ssim: Avg: 0.974083, Std Dev: 0.004284, Min: 0.960817, Max: 0.983318, Mean: 0.974376
rdo_bits_per_texel: Avg: 2.351300, Std Dev: 0.318168, Min: 1.788859, Max: 2.967855, Mean: 2.344340

512 endpoints:
rdo_luma_psnr: Avg: 39.239567, Std Dev: 2.001313, Min: 34.834538, Max: 41.839687, Mean: 40.379951
rdo_luma_ssim: Avg: 0.979648, Std Dev: 0.002847, Min: 0.973445, Max: 0.987098, Mean: 0.979329
rdo_bits_per_texel: Avg: 2.617640, Std Dev: 0.345818, Min: 2.031942, Max: 3.296285, Mean: 2.604553

1024 endpoints:
rdo_luma_psnr: Avg: 39.490915, Std Dev: 2.033055, Min: 34.942341, Max: 42.026814, Mean: 40.666183
rdo_luma_ssim: Avg: 0.980563, Std Dev: 0.002673, Min: 0.976034, Max: 0.987617, Mean: 0.980514
rdo_bits_per_texel: Avg: 2.693218, Std Dev: 0.356560, Min: 2.069397, Max: 3.390055, Mean: 2.668416

The next two graphs show RDO ETC1 compression using 24,576 selectors and endpoints, which for all practical purposes disables quantization on the Kodak test images. Note that adaptive subblock utilization is still enabled here, so a block's subblocks can still be forced to share the same block colors/intensity tables (endpoints) if the quality loss is under 1.25 dB.

Tests like this are important, because they show that the RDO compressor is able to utilize all the features available in ETC1: flipped/non-flipped subblocks, differential/absolute block color encoding, etc.



Overall stats:

rdo_luma_psnr: Avg: 39.766113, Std Dev: 2.066657, Min: 35.116722, Max: 42.367085, Mean: 40.845627
rdo_luma_ssim: Avg: 0.981710, Std Dev: 0.002428, Min: 0.978301, Max: 0.988114, Mean: 0.981266
rdo_bits_per_texel: Avg: 2.754947, Std Dev: 0.365874, Min: 2.098104, Max: 3.464823, Mean: 2.714681
rdo_orig_size: Avg: 196676.000000, Std Dev: 0.000000, Min: 196676.000000, Max: 196676.000000, Mean: 196676.000000
rdo_compressed_size: Avg: 135411.166667, Std Dev: 17983.452669, Min: 103126.000000, Max: 170303.000000, Mean: 133432.000000

best_luma_psnr: Avg: 40.009226, Std Dev: 2.154732, Min: 35.193684, Max: 42.750275, Mean: 41.113007
best_luma_ssim: Avg: 0.983419, Std Dev: 0.002109, Min: 0.980131, Max: 0.989254, Mean: 0.983190
best_bits_per_texel: Avg: 2.851078, Std Dev: 0.341672, Min: 2.184774, Max: 3.482361, Mean: 2.822876
best_orig_size: Avg: 196676.000000, Std Dev: 0.000000, Min: 196676.000000, Max: 196676.000000, Mean: 196676.000000
best_compressed_size: Avg: 140136.166667, Std Dev: 16793.846305, Min: 107386.000000, Max: 171165.000000, Mean: 138750.000000

The next graphs are just like the previous ones, except the adaptive subblock feature is disabled. They show that RDO ETC1 with no quantization is virtually identical to basic (highest-quality, block-by-block) ETC1 compression.



Overall stats:

rdo_luma_psnr: Avg: 39.991337, Std Dev: 2.109917, Min: 35.276287, Max: 42.721352, Mean: 41.098907
rdo_luma_ssim: Avg: 0.982858, Std Dev: 0.002269, Min: 0.979608, Max: 0.988770, Mean: 0.982394
rdo_bits_per_texel: Avg: 2.853771, Std Dev: 0.348101, Min: 2.188131, Max: 3.518412, Mean: 2.828857
rdo_orig_size: Avg: 196676.000000, Std Dev: 0.000000, Min: 196676.000000, Max: 196676.000000, Mean: 196676.000000
rdo_compressed_size: Avg: 140268.541667, Std Dev: 17109.836167, Min: 107551.000000, Max: 172937.000000, Mean: 139044.000000

best_luma_psnr: Avg: 40.009226, Std Dev: 2.154732, Min: 35.193684, Max: 42.750275, Mean: 41.113007
best_luma_ssim: Avg: 0.983419, Std Dev: 0.002109, Min: 0.980131, Max: 0.989254, Mean: 0.983190
best_bits_per_texel: Avg: 2.851078, Std Dev: 0.341672, Min: 2.184774, Max: 3.482361, Mean: 2.822876
best_orig_size: Avg: 196676.000000, Std Dev: 0.000000, Min: 196676.000000, Max: 196676.000000, Mean: 196676.000000

best_compressed_size: Avg: 140136.166667, Std Dev: 16793.846305, Min: 107386.000000, Max: 171165.000000, Mean: 138750.000000

Effect of ETC1 selector quantization on Luma SSIM/PSNR

This is like the previous post, except this time only the selectors are quantized while the endpoints are left alone. Kodak test images, perceptual colorspace metrics:





Stats for non-RDO ETC1 compression:

best_luma_psnr: Avg: 40.009226, Std Dev: 2.154732, Min: 35.193684, Max: 42.750275, Mean: 41.113007
best_luma_ssim: Avg: 0.983419, Std Dev: 0.002109, Min: 0.980131, Max: 0.989254, Mean: 0.983190
best_bits_per_texel: Avg: 2.851078, Std Dev: 0.341672, Min: 2.184774, Max: 3.482361, Mean: 2.822876

RDO selectors 8192:

rdo_luma_psnr: Avg: 38.225255, Std Dev: 2.628415, Min: 31.853958, Max: 41.955276, Mean: 39.500504
rdo_luma_ssim: Avg: 0.966271, Std Dev: 0.007768, Min: 0.944449, Max: 0.981821, Mean: 0.966354
rdo_bits_per_texel: Avg: 2.366380, Std Dev: 0.231610, Min: 1.902201, Max: 2.793721, Mean: 2.337708

RDO selectors 4096:

rdo_luma_psnr: Avg: 36.581700, Std Dev: 2.874786, Min: 29.814810, Max: 40.718441, Mean: 37.796730
rdo_luma_ssim: Avg: 0.953993, Std Dev: 0.010954, Min: 0.922887, Max: 0.973516, Mean: 0.953305
rdo_bits_per_texel: Avg: 2.132147, Std Dev: 0.220503, Min: 1.668640, Max: 2.535848, Mean: 2.094666

RDO selectors 2048:

rdo_luma_psnr: Avg: 35.129581, Std Dev: 2.967410, Min: 28.291447, Max: 39.650620, Mean: 36.413860
rdo_luma_ssim: Avg: 0.942579, Std Dev: 0.013760, Min: 0.903846, Max: 0.967203, Mean: 0.941114
rdo_bits_per_texel: Avg: 1.969779, Std Dev: 0.216071, Min: 1.506246, Max: 2.368530, Mean: 1.930033

RDO selectors 1024:

rdo_luma_psnr: Avg: 33.915408, Std Dev: 2.963184, Min: 27.143675, Max: 38.416290, Mean: 35.294361
rdo_luma_ssim: Avg: 0.931751, Std Dev: 0.016440, Min: 0.886028, Max: 0.960691, Mean: 0.929749
rdo_bits_per_texel: Avg: 1.848387, Std Dev: 0.216314, Min: 1.378805, Max: 2.245748, Mean: 1.809530

RDO selectors 512:

rdo_luma_psnr: Avg: 32.898390, Std Dev: 2.920482, Min: 26.292456, Max: 37.282799, Mean: 34.293579
rdo_luma_ssim: Avg: 0.920788, Std Dev: 0.019035, Min: 0.868281, Max: 0.953666, Mean: 0.918912
rdo_bits_per_texel: Avg: 1.753840, Std Dev: 0.215968, Min: 1.278585, Max: 2.150350, Mean: 1.717773

RDO selectors 256:

rdo_luma_psnr: Avg: 32.036631, Std Dev: 2.866251, Min: 25.595591, Max: 36.275482, Mean: 33.285240
rdo_luma_ssim: Avg: 0.909641, Std Dev: 0.021761, Min: 0.849128, Max: 0.946493, Mean: 0.907937
rdo_bits_per_texel: Avg: 1.673566, Std Dev: 0.215763, Min: 1.187663, Max: 2.065999, Mean: 1.631165

RDO selectors 128:

rdo_luma_psnr: Avg: 31.255766, Std Dev: 2.800476, Min: 24.977221, Max: 35.173336, Mean: 32.437733
rdo_luma_ssim: Avg: 0.896458, Std Dev: 0.024306, Min: 0.827130, Max: 0.934879, Mean: 0.895064
rdo_bits_per_texel: Avg: 1.600956, Std Dev: 0.215559, Min: 1.127218, Max: 1.991862, Mean: 1.550741
