Richard Geldreich's Blog

Few more random thoughts on a "universal" GPU texture format (originally published 9/9/16)

In my experiments, a simple but usable subset of ETC1 can be easily converted to DXT1, BC7, and ATC. And after studying the standard, it very much looks like the full ETC1 format can be converted into BC7 with very little loss. (And when I say "converted", I mean using very little CPU, just basically some table lookup operations over the endpoint and selector entries.)

ASTC seems (at first glance) to be about as powerful as BC7, so converting the full ETC1 format to ASTC with very little loss should be possible. (Unfortunately ASTC is so dense and complex that I don't have time to determine this for sure yet.)

So I'm pretty confident now that a universal format could be compatible with ASTC, BC7, DXT1, ETC1, and ATC. The only other major format that I can't fit into this scheme easily is my old nemesis, PVRTC.

Obviously this format won't look as good as a dedicated single-format encoder's output. So what? There are many valuable use cases that don't require super high quality levels. This scheme purposely trades off a drop in quality for interchange and distribution.

Additionally, with a crunch-style encoding method, only the endpoint (and possibly the selector) codebook entries (of which there are usually only hundreds, possibly up to a few thousand in a single texture) would need to be converted to the target format. So the GPU format conversion step doesn't actually need to be insanely fast.

Another idea is to just unify ASTC and BC7, two very high quality formats. The drop in quality due to unification would be relatively much less significant with this combination. (But how valuable is this combo?)

(This blog post was originally mirrored here: http://geldreich1.rssing.com/chan-32192436/all_p6.html#item116)

Idea for next texture compression experiment (originally published 9/11/16)

Right now, I've got a GPU texture in a simple ETC1 subset that is easily converted to most other GPU formats:

Base color: 15-bits, 5:5:5 RGB
Intensity table index: 3-bits
Selectors: 2-bits/texel
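As a rough illustration of what the base color and intensity index describe, here's a C++ sketch that expands one block into the four colors its 2-bit selectors can pick from, using the standard ETC1 intensity modifier table. The packing of the 15-bit base color (R in bits 10-14, G in 5-9, B in 0-4), the selector order (most negative to most positive modifier), and the function names are assumptions for illustration, not the actual encoder's layout.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Standard ETC1 intensity modifier tables, reordered here from most negative
// to most positive (ETC1 itself encodes the selector values in another order).
static const int kEtc1Modifiers[8][4] = {
    {-8, -2, 2, 8},     {-17, -5, 5, 17},   {-29, -9, 9, 29},     {-42, -13, 13, 42},
    {-60, -18, 18, 60}, {-80, -24, 24, 80}, {-106, -33, 33, 106}, {-183, -47, 47, 183}};

struct Rgb { int r, g, b; };

// ETC1 expands 5-bit channels to 8 bits by replicating the top bits.
static int expand5(int c) { return (c << 3) | (c >> 2); }

// Hypothetical helper: the 4 colors a 2-bit selector can choose from.
// Assumed packing: R in bits 10..14, G in bits 5..9, B in bits 0..4.
std::array<Rgb, 4> subsetPalette(uint16_t base555, int intenIndex) {
    assert(intenIndex >= 0 && intenIndex < 8);
    const Rgb base{expand5((base555 >> 10) & 31), expand5((base555 >> 5) & 31),
                   expand5(base555 & 31)};
    std::array<Rgb, 4> pal{};
    for (int i = 0; i < 4; ++i) {
        const int m = kEtc1Modifiers[intenIndex][i];
        // The same offset is added to every channel, then clamped to [0,255].
        pal[i] = {std::clamp(base.r + m, 0, 255), std::clamp(base.g + m, 0, 255),
                  std::clamp(base.b + m, 0, 255)};
    }
    return pal;
}
```

Because the same offset is added to all three channels, the palette lies on a line parallel to the gray axis (until clamping bends it), which is part of why this subset maps so naturally onto DXT1's two-endpoint color line.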

Most importantly, this is a "single subset" encoding, using BC7 terminology. BC7 supports between 1 and 3 subsets per block. A subset is just a colorspace line represented by two R,G,B endpoint colors.

This format is easily converted to DXT1 using a table lookup. It's also the "base" of the universal GPU texture format I've been thinking about, because it's the data needed for DXT1 support. The next step is to experiment with attempting to refine this base data to better take advantage of the full ETC1 specification. So let's try adding two subsets to each block, with two partitions (again using BC7 terminology), top/bottom or left/right, which are supported by both ETC1 and BC7.

For example, we can code this base color, then delta code the 2 subset colors relative to this base. We'll also add a couple more intensity indices, which can be delta coded against the base index. Another bit can indicate which ETC1 block color encoding "mode" should be used (individual 4:4:4 4:4:4 or differential 5:5:5 3:3:3) to represent the subset colors in the output block.
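The mode decision is essentially a range check. ETC1's differential mode stores the second color as a 3-bit two's complement delta per channel (-4..3) relative to a 5:5:5 base, so it only works when the two subset colors are close; otherwise the block has to fall back to 4:4:4 individual mode. A minimal sketch, assuming the two subset colors have already been quantized to 5 bits per channel (the helper names are hypothetical):

```cpp
#include <array>
#include <cassert>

enum class Etc1Mode { Differential, Individual };

struct Color555 { int r, g, b; };  // each channel in [0, 31]

// Differential mode: color1 = color0 + delta, each delta a 3-bit two's
// complement value in [-4, 3]. Fills in the deltas either way.
bool fitsDifferential(const Color555& c0, const Color555& c1, std::array<int, 3>& delta) {
    delta = {c1.r - c0.r, c1.g - c0.g, c1.b - c0.b};
    for (int d : delta)
        if (d < -4 || d > 3) return false;
    return true;
}

// Pick the ETC1 color encoding for a pair of subset colors.
Etc1Mode chooseMode(const Color555& c0, const Color555& c1) {
    std::array<int, 3> delta{};
    return fitsDifferential(c0, c1, delta) ? Etc1Mode::Differential : Etc1Mode::Individual;
}

// Encode one delta into its 3-bit two's complement field.
int packDelta3(int d) {
    assert(d >= -4 && d <= 3);
    return d & 7;
}
```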

In DXT1 mode, we can ignore this extra delta coded data and just convert the basic (single subset) base format. In ETC1/BC7/ASTC modes, we can use the extra information to support 2 subsets and 2 partitions.

Currently, the idea is to share the same selector indices between the single subset (DXT1) and two subset (BC7/ASTC/full ETC1) encodings. This will constrain how well this idea works, but I think it's worth trying out.

To add more quality to the 2 subset mode, we can delta code (maybe with some fancy per-pixel prediction) another array of selectors in some way. We can also add support for more partitions (derived from BC7's or ASTC's), too.

Unified texture encoder for BC7 and ASTC 4x4

So far, it looks possible to unify a very strong subset of BC7 and ASTC 4x4. Such an encoder would be very useful, even if it didn't support rate distortion optimization. I've been looking at this problem on and off for months, and I'm convinced that there's something really interesting here.

First, it turns out there are 30 2-subset partition patterns in common between ASTC 4x4 and BC7:


This is a collection of very strong patterns. Considering ASTC's partition pattern generator is basically a fancy rand() function, this is surprisingly good! We've got almost half of BC7's 64 patterns in there!

Secondly, the way ASTC and BC7 convert indices (weights) to 6-bit scales, and the way they both interpolate endpoints is extremely similar. So similar that I believe ASTC indices could be converted directly to BC7's with little to no expensive per-pixel CPU work required.
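The weight side of this is easy to check. For weight ranges stored as plain bits (no trits or quints), ASTC unquantizes a weight by replicating its bits out to 6 bits and then adding 1 to any result above 32, producing a [0,64] scale. This C++ sketch builds those tables and compares them against BC7's fixed 2/3/4-bit weight tables; only the bit-only ASTC ranges are covered, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// BC7's fixed interpolation weight tables.
static const std::vector<int> kBc7Weights2 = {0, 21, 43, 64};
static const std::vector<int> kBc7Weights3 = {0, 9, 18, 27, 37, 46, 55, 64};
static const std::vector<int> kBc7Weights4 = {0,  4,  9,  13, 17, 21, 26, 30,
                                              34, 38, 43, 47, 51, 55, 60, 64};

// ASTC weight unquantization for bit-only ranges (1-4 bits per weight):
// replicate the bits out to at least 6 bits, keep the top 6, map [0,63] to [0,64].
int astcUnquantWeight(int w, int bits) {
    assert(bits >= 1 && bits <= 4 && w >= 0 && w < (1 << bits));
    int v = 0, total = 0;
    while (total < 6) {
        v = (v << bits) | w;
        total += bits;
    }
    v >>= total - 6;
    return v > 32 ? v + 1 : v;
}

std::vector<int> astcWeightTable(int bits) {
    std::vector<int> table;
    for (int w = 0; w < (1 << bits); ++w) table.push_back(astcUnquantWeight(w, bits));
    return table;
}

// Largest absolute difference between two equally sized tables.
int maxTableDiff(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());
    int m = 0;
    for (std::size_t i = 0; i < a.size(); ++i) m = std::max(m, std::abs(a[i] - b[i]));
    return m;
}
```

With these definitions the 2-bit and 3-bit tables come out identical, and the 4-bit tables never differ by more than 1 (for example, index 2 maps to 8 in ASTC and 9 in BC7).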

Finally, I wrote a little app that examines hundreds of valid ASTC configurations, trying to find which configurations resemble the strongest and most useful BC7 modes. Here they are:


So all the important things between BC7 and ASTC 4x4 are the same or similar enough. ASTC's endpoint ranges are all over the map, but that's fine because in most cases BC7's endpoint precision is actually higher than ASTC's. A unified encoder could just optimize for lowest error across *both* formats simultaneously, and output both ASTC and BC7 endpoints with the same selectors. Or, we could output only ASTC's endpoints and just scale them to BC7's.

The next step is to write an ASTC encoder that is limited to these 12 configs and see how strong it is. After this, I need to see if this ASTC texture data can be directly and quickly converted to BC7 texture data with little loss. (Without any recompression to BC7, i.e. we just convert and copy over the endpoints/partition table index/per-pixel selector indices and that's it.) So far, I think this is all possible.

If all this actually works, it'll give us the foundation we need to build the next version of .basis. We could quantize the unified ASTC/BC7 selector/endpoint data and store it in a ".basis2" file. We then can convert this high-quality texture data to the other formats using fast block encoders (like we do already with PVRTC1 RGB/RGBA and PVRTC2 RGBA).

We could even store hints in the .basis2 file that accelerate conversion to some formats. For example we could store optimized BC1 endpoints in the endpoint codebook. Or we could store the optimal ETC1 base color/table indices, etc. Determining the per-pixel selectors for these formats is cheap once you have this info.

I think that with a strong ASTC 4x4 12-mode encoder that supports perceptual colorspace metrics, we could actually beat (or get really close to) ispc_texcomp's BC7 encoder (which only supports linear RGB metrics). I think this encoder would get within a few dB of max achievable BC7.

If the system's quality isn't high enough, we could always tack on more ASTC modes, as long as they can be easily transcoded to one of the BC7 modes without expensive operations.

It's too bad that BC7 isn't well supported in WebGL yet. The extensions are there, but the browser support isn't yet. I have no idea why, as the format is basically ubiquitous on desktop GPUs now, and it's the highest quality LDR texture format available. For WebGL we still need very strong BC1-5 support for desktops until this situation changes.

ARM's ASTC encoder patents - is it safe to write encoders for this format?

I put this on Twitter earlier. I found this very disturbing comment in the Arm ASTC Encoder:

/**
 * Functions for finding dominant direction of a set of colors.
 *
 * Uses Arm patent pending method.
 */

Source code link.

Wow. I immediately stopped looking at this code and deleted it all once I saw this comment. I will never look at this code again in any way. So basically, ARM seems to be patenting some variant of PCA (Principal Component Analysis)? This is the first software GPU texture *encoder* I've seen that explicitly states that it uses patent pending algorithms.

ASTC is supposed to be "royalty-free": khronos.org/news/press/khr
Yet, if I implement an ASTC encoder that uses PCA, will ARM sue us for patent infringement?

I was very excited about ASTC, but now it's totally clouded by this encoder patent issue. I cannot support a supposed "royalty-free" standard that apparently has encoder patents hanging over its head. We need ARM to fix this, to basically clarify what's going on here, and make a public statement that software developers can write encoders for its format without being sued because they infringed on ARM encoder patents.

You know, just to illustrate what a slippery slope encoder patents are and why they suck for everybody: We could have patented the living daylights out of our texture encoders, our universal codec, etc. It would have been no problem whatsoever. We could take this entire field and patent it up so tight that nobody could write a practical open or closed source GPU texture encoder without having to pay up. We could then sue any IHVs that ship drivers implementing run-time texture compressors, transcoders, or converters that use our patent pending encoding algorithms.

However we didn't want to ignite a texture encoder/texture compression patent war, and I'm very allergic to software patents.

The sad reality is, if the IHVs are going to start patenting the algorithms in their reference GPU texture *encoders*, we will have no choice but to start patenting every single GPU texture encoding and transcoding algorithm we can. For defensive purposes, so we can survive.

Taking this further, we could then turn this encoder patent landgrab into a significant part of our business model. These patents are worth several million each to the big tech corps during acquisitions. We could sell out our encoders and patents to the biggest buyer and retire.

Our defense to the software development community would be: "ARM started patenting their encoders first, not us. We needed defensive encoder patents to survive, just in case they sued."

After parsing the astc-encoder's license a few times, it appears we can legally use the ASTC specification to write our own 100% independent ASTC encoders and distribute the resulting compressed texture data. That's great. But if I go and write (for example) a BC7 texture encoder that accidentally infringes on ARM's encoder patents over their variation of PCA, I'm still screwed.

BTW - The author of the "Slug" font rendering library has started to patent his algorithms. (I only point this out to show that it's very possible for a tiny middleware company to easily acquire patents.) Personally, I'm against software patents, and I hope ARM fixes this.

Parsing ASTC's overly restrictive end user license

We've been reviewing the licensing situation for all the GPU texture formats Basis Universal supports. (This is basically every LDR GPU format in existence, so this isn't easy.) Most formats are covered by various open Khronos API standards and standard documents and have been fully documented in a variety of very permissive open source works and publications.

However, the ASTC reference encoder, documentation and specification have their own End User License Agreement, which I believe makes ASTC unique. This license is distributed with ARM's "astc-encoder" project on github:

https://github.com/ARM-software/astc-encoder/blob/master/LICENSE.md

At first glance, after a casual reading, you may think this legal agreement grants the end user permission to do basically anything they want with ASTC. Actually, it's very restrictive. There's *a lot* you can't legally do with ASTC.

Here are the key lines of the license that matter the most (anything in bold is by me). This is just a subset of the full license linked above:
THIS END USER LICENCE AGREEMENT ("LICENCE") IS A LEGAL AGREEMENT BETWEEN YOU (EITHER A SINGLE INDIVIDUAL, OR SINGLE LEGAL ENTITY) AND ARM LIMITED ("ARM") FOR THE USE OF THE SOFTWARE ACCOMPANYING THIS LICENCE. ....
1. DEFINITIONS

"Authorised Purpose" means the use of the Software solely to develop products and tools which implement the Khronos ASTC specification to;

(i) compress texture images into ASTC format ("Compression Results");
(ii) distribute such Compression Results to third parties; and
(iii) decompress texture images stored in ASTC format.


"Software" means the source code and Software binaries accompanying this Licence, and any printed, electronic or online documentation supplied with it, in all cases relating to the MALI ASTC SPECIFICATION AND SOFTWARE CODEC.
2. LICENCE GRANT

ARM hereby grants to you, subject to the terms and conditions of this Licence, a nonexclusive, nontransferable, free of charge, royalty free, worldwide licence to use, copy, modify and (subject to Clause 3 below) distribute the Software solely for the Authorised Purpose.
No right is granted to use the Software to develop hardware.
Notwithstanding the foregoing, nothing in this Licence prevents you from using the Software to develop products that conform to an application programming interface specification issued by The Khronos Group Inc. ("Khronos"), provided that you have licences to develop such products under the relevant Khronos agreements.

3. RESTRICTIONS ON USE OF THE SOFTWARE
.....
TITLE AND RESERVATION OF RIGHTS: You acquire no rights to the Software other than as expressly provided by this Licence. ...
....
What does all this legalese actually mean? First, note under "Definitions" that "Software" actually means astc-encoder, its documentation, and the Mali ASTC Specification. It doesn't mean just code, it means the docs and spec too.

As far as we can tell, this license means you can only legally use astc-encoder and the Mali ASTC Specification to compress texture images into the ASTC format to create Compression Results, then distribute these Compression Results to third parties. Then you can decompress texture images stored in ASTC format. That's it. Notice the key "and" word under Clause 1 (Definitions): "(ii) distribute such Compression Results to third parties; and".  It's not "or".

You can't do anything else with the Software (meaning the astc-encoder, docs, or spec), because those use cases have been expressly forbidden by Clause 3.

This license apparently forbids all sorts of practical real world use cases, such as: encoding textures to ASTC in real time on end-user devices, transcoding from other texture formats to ASTC, compressing ASTC using a .CRN-like system and decompressing or transcoding that to ASTC, or processing or converting ASTC data at run-time.

You also cannot compress anything but "Texture Images" into the ASTC format, which is quite restrictive. If your input signal isn't a texture image, well you're out of luck.

Under Clause 2, there's this paragraph that feels crudely hacked into the license contract: "Notwithstanding the foregoing, nothing in this Licence prevents you from using the Software to develop products that conform to an application programming interface specification issued by The Khronos Group Inc. ("Khronos"), provided that you have licences to develop such products under the relevant Khronos agreements." 

So this basically means "in spite of what was just said or written, nothing in this Licence prevents you from using the Software to develop products that conform to a Khronos API". However, there are many use cases that don't involve directly calling a Khronos API. Basis Universal doesn't call any Khronos APIs at all. If you are using a rendering API that isn't a Khronos standard, you're out of luck.

If you develop a real-time ASTC encoder library or product that will be deployed on end-user devices that don't use APIs covered by a Khronos standard, you are not covered by this license.

My current guess is that ARM's lawyers weren't filled in on all the modern ways developers can encode, transcode, and manipulate texture data. As the situation stands right now, you cannot do much with ASTC except encode it offline, distribute this data, and then use it on a device. If your product uses a Khronos API, you may be able to do more, but I can't really tell for sure.

The whole situation is very fuzzy for what is supposed to be an open, royalty free standard.

Note our IP lawyer is still reviewing this license document. (We're actually spending money on this - that's how serious this is to us.)

Universal ASTC (UASTC) Tech Details

We have reached an exciting milestone: We now have a working HQ universal encoder that supports both ASTC and BC7 for RGB/RGBA textures. It's currently a bit slow and it doesn't support RDO yet, but it works. Quality is extremely high (BC7 grade, no block artifacts) and the encoder's behavior is stable across a wide range of RGB/RGBA inputs including XYZ normal maps.

We've settled on the below standard 15 ASTC modes, which we're calling "UASTC". They are 100% standard ASTC configurations, so any ASTC encoder could be modified to limit itself to output just these modes (out of the hundreds available). Any BC7 encoder could be modified too, once it supports ASTC's endpoint quantization tables, ASTC's 4-bit weight table, and ASTC's 16-bit endpoint interpolation. (That's how we prototyped this system.)

For validation purposes we are creating 100% standard ASTC data from the UASTC blocks, unpacking these ASTC blocks using an open source ASTC decoder (the one in Basis Universal), then computing RGB/RGBA average PSNR.
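For reference, PSNR on 8-bit data is normally computed as 10 * log10(255^2 / MSE). A minimal sketch, assuming interleaved 8-bit RGB buffers and an MSE averaged over every channel of every pixel; the post doesn't spell out exactly how its per-texture numbers were computed, so treat this as the textbook definition rather than the actual test harness (rgbPsnr is a hypothetical name).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// PSNR over interleaved 8-bit RGB samples: 10 * log10(255^2 / MSE), where the
// MSE is averaged over every channel of every pixel. Identical inputs give +inf.
double rgbPsnr(const std::vector<uint8_t>& a, const std::vector<uint8_t>& b) {
    assert(a.size() == b.size() && !a.empty() && a.size() % 3 == 0);
    double sumSq = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = double(a[i]) - double(b[i]);
        sumSq += d * d;
    }
    const double mse = sumSq / double(a.size());
    if (mse == 0.0) return std::numeric_limits<double>::infinity();
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}
```

An MSE of exactly 1 (every channel off by one level) works out to about 48.13 dB, which gives a feel for the numbers below.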

Average RGB PSNRs across 33 test textures:

Original->Near-optimal BC7: 46.67 dB (our high quality SIMD BC7 codec in "slow" mode)
Original->UASTC: 45.14 dB
UASTC->ASTC: 45.14 dB (always lossless)
UASTC->BC7: 44.41 dB

Original->Near-optimal BC1: 36.96 dB (stb_dxt STB_DXT_HIGHQUAL)
UASTC->Near-optimal BC1: 36.20 dB

This ASTC subset's quality is on average only ~1.5 dB lower than near-optimal BC7 for opaque content, but it's ~8.2 dB higher than near-optimal BC1. Both RGB and RGBA content look *really* good. Our experience building several production BC7 encoders helped guide us to the right ASTC modes.

These modes are easily converted directly to a BC7 texture encoding with no pixel-wise recompression, with low quality loss (around 0.75 dB on average). To convert to BC7, the endpoints are scaled, you compute the optimal p-bits to represent the ASTC endpoints (if any; this is simple), and then you either clone the ASTC indices or translate them with a tiny table. Transcoding to BC7 is very simple stuff, and doesn't require the large precomputed tables that Basis Universal's ETC1S solution needs.
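Here's a minimal sketch of the endpoint scaling step for a BC7 mode 6 target, where each endpoint stores 7 bits per component plus one p-bit shared by that endpoint's components, so a component decodes to (q << 1) | p. It tries both p-bit values and keeps the one with the lower squared error against the 8-bit (already dequantized) ASTC endpoint. Other BC7 modes use different component precisions and shared p-bits, which aren't handled here, and the names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

struct Bc7Mode6Endpoint {
    std::array<int, 4> q;  // 7-bit R, G, B, A
    int pbit;              // one p-bit shared by this endpoint's components
};

// A 7-bit component plus its p-bit decodes to the 8-bit value (q << 1) | p.
static int decodeMode6(int q, int p) { return (q << 1) | p; }

// Quantize one 8-bit RGBA endpoint (e.g. a dequantized ASTC endpoint) to
// BC7 mode 6 precision, keeping whichever p-bit gives the lower squared error.
Bc7Mode6Endpoint quantizeMode6Endpoint(const std::array<int, 4>& e) {
    Bc7Mode6Endpoint best{};
    long bestErr = -1;
    for (int p = 0; p < 2; ++p) {
        Bc7Mode6Endpoint cand{};
        cand.pbit = p;
        long err = 0;
        for (int c = 0; c < 4; ++c) {
            assert(e[c] >= 0 && e[c] <= 255);
            // Nearest 7-bit q for this p-bit: round (e - p) / 2, then clamp.
            cand.q[c] = std::clamp((e[c] - p + 1) >> 1, 0, 127);
            const long d = decodeMode6(cand.q[c], p) - e[c];
            err += d * d;
        }
        if (bestErr < 0 || err < bestErr) {
            best = cand;
            bestErr = err;
        }
    }
    return best;
}
```

For UASTC mode 0 (16 weight levels, targeting BC7 mode 6's 4-bit indices), the weight indices could then be copied over as-is.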

We're not encoding these modes to the standard ASTC block format (although we could), because the standard ASTC block encoding has a lot of unnecessary fields in there we can repurpose. Instead, we use a simple 128-bit/block BC7-like block format for the UASTC mode/endpoints/weights/partition index/comp rotation. Worst case the packed UASTC data takes 112-113 bits, leaving around 15-16 bits for other things.

We have an interesting plan on how to support ETC1/2 at high quality (way better than ETC1S) with fast transcoding. We can take the 15-16 bits left over in our custom block format to store ETC1/2 hints. These hints greatly accelerate real-time high quality ETC1/2 compression (by ~30x for ETC1 vs. a brute force encoder). The UASTC compressor will re-encode the final UASTC block to ETC1/2 and then determine the set of ETC1/2 hints that result in the lowest ETC1/2 error. 

The next major step for us is to sit down and implement ETC1/2 to make sure this plan works well on a wide range of inputs.

As this is a universal GPU texture compression system it will support ALL LDR GPU texture formats, like Basis Universal does. Here's the plan for the other formats:

ETC2 R11 and RG11 might be able to reuse the ETC1/2 hints.

We have already prototyped BC1 and found a way to make that very fast in the 1-subset cases. For the other relatively rare 2/3-subset UASTC cases we'll need to use PCA+least squares. Real-time BC3-5 are fast. 

PVRTC1, and the other niche/obsolete formats (like PVRTC2, ATC, etc.) will use solutions already implemented in Basis Universal.

UASTC mode constraints/notes:

1. All blocks are always LDR 4x4 pixels, and all UASTC modes use integer weight bits for compatibility with BC7. 

2. Only uses Color Endpoint Mode (CEM) 8 or 12 (RGB/RGBA Direct) to simplify the encoder/transcoder. The other CEMs don't help enough to justify the added complexity.

3. CEM 8 and 12 support Blue Contraction, which is never utilized in UASTC. Instead, we swap the subset's endpoints if the MSB of the subset's anchor (first) weight index is 1 (exactly like BC7). This guarantees the anchor weight index has an MSB of 0, so we don't need to store that bit in the packed block format.

The UASTC->ASTC transcoder needs to check the dequantized endpoints to see if blue contraction would kick in. If so, it'll need to invert the weight indices and swap the subset's endpoints.

4. The 2 and 3 subset modes are constrained to only use the 2/3-subset partition patterns that ASTC and BC7 have in common, which we've documented on our blog and on Twitter. Total of 60 patterns (30+11+19).

5. Mode 7 uses a 3-subset BC7 mode, but only a 2-subset ASTC mode. Two of the BC7 subset endpoints are set to equal colors to simplify the 3-subset partition pattern into a 2-subset pattern that's compatible with ASTC. This gives us 19 more useful partitions.

6. Opaque encodings get transcoded to BC7 modes 1,2,3,5,6. Alpha encodings transcode to BC7 modes 5,6,7. BC7 modes 0 and 4 are unused.

7. When the # of weight bits differs between the BC7/ASTC encodings, we choose the closest BC7 weight (just a simple table lookup into a static 4/8 entry table). Note that BC7 and ASTC use the same 2-bit and 3-bit weight tables. Some ASTC 4-bit table entries differ by ±1 from BC7's, but the encoder can work around this.

8. BC7 and ASTC interpolate endpoints in a similar way, except ASTC endpoints are scaled up to 16-bits before interpolation and then only the top 8-bits are used. This is a surprisingly minor difference that a good encoder can work around by choosing the lowest overall BC7 error from the hundreds/thousands of possible UASTC configurations/partition patterns/endpoints/etc.

9. Strong encoders can compute both ASTC and transcoded BC7 error to choose UASTC encodings that result in minimal BC7 error. (This isn't necessary, it just helps a little.)

10. A driver could easily transcode UASTC texture data to ASTC or BC7 completely transparently to the user. The blocks are completely independent and the transcode step can be done 4-8 blocks at a time with SIMD operations.
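The fix-up in item 3 can be sketched as follows. When ASTC decodes CEM 8 or CEM 12, blue contraction kicks in if the second endpoint's unquantized R+G+B sum is smaller than the first endpoint's (v1+v3+v5 < v0+v2+v4). Swapping each low/high endpoint pair and inverting the weight indices avoids it without changing any decoded texel, because the interpolation is symmetric and ASTC's bit-only weight tables are symmetric too (entry max-i equals 64 minus entry i). This sketch assumes a single subset and a single plane, takes endpoint values in ASTC order (RL RH GL GH BL BH [AL AH]), and uses hypothetical names; with more subsets only that subset's texels would be inverted, and in dual plane mode both planes' weights would be.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// endpointsQ:  quantized endpoint values as stored, in ASTC order
//              (RL RH GL GH BL BH for CEM 8, plus AL AH for CEM 12).
// endpointsDQ: the same values after ASTC endpoint unquantization to [0,255].
// weights:     plain-bit weight indices (weightBits each) for this subset's texels.
// Returns true if the endpoints were swapped and the weights inverted.
bool avoidBlueContraction(std::vector<int>& endpointsQ, std::vector<int>& endpointsDQ,
                          std::vector<int>& weights, int weightBits) {
    assert(endpointsQ.size() == endpointsDQ.size());
    assert(endpointsQ.size() == 6 || endpointsQ.size() == 8);
    const int s0 = endpointsDQ[0] + endpointsDQ[2] + endpointsDQ[4];
    const int s1 = endpointsDQ[1] + endpointsDQ[3] + endpointsDQ[5];
    if (s1 >= s0) return false;  // the decoder won't apply blue contraction

    // Swap low/high for every component (alpha too for CEM 12).
    for (std::size_t i = 0; i + 1 < endpointsQ.size(); i += 2) {
        std::swap(endpointsQ[i], endpointsQ[i + 1]);
        std::swap(endpointsDQ[i], endpointsDQ[i + 1]);
    }
    // Invert the weights so every texel still decodes to the same color.
    const int maxW = (1 << weightBits) - 1;
    for (int& w : weights) w = maxW - w;
    return true;
}
```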

UASTC modes:

Format is:
UASTC Mode #, Dual Plane Flag, Texel Weights BISE Range Index (# quant levels), # Subsets, Endpoint BISE Range Index (# quant levels), BC7 Target Mode

Opaque (CEM 8):
 0. DualPlane: 0, WeightRange: 8 (16), Subsets: 1, EndpointRange: 19 (192)  MODE6 RGB
 1. DualPlane: 0, WeightRange: 2 (4), Subsets: 1, EndpointRange: 20 (256)   MODE3
 2. DualPlane: 0, WeightRange: 5 (8), Subsets: 2, EndpointRange: 8 (16)     MODE1
 3. DualPlane: 0, WeightRange: 2 (4), Subsets: 3, EndpointRange: 7 (12)     MODE2
 4. DualPlane: 0, WeightRange: 2 (4), Subsets: 2, EndpointRange: 12 (40)    MODE3
 5. DualPlane: 0, WeightRange: 5 (8), Subsets: 1, EndpointRange: 20 (256)   MODE6 RGB
 6. DualPlane: 1, WeightRange: 2 (4), Subsets: 1, EndpointRange: 18 (160)   MODE5 RGB
 7. DualPlane: 0, WeightRange: 2 (4), Subsets: 2, EndpointRange: 12 (40)    MODE2

Solid:
 8. Void-Extent: Solid Color RGBA (MODE5 or MODE6)

Alpha (CEM 12):
 9. DualPlane: 0, WeightRange: 2 (4), Subsets: 2, EndpointRange: 8 (16)     MODE7
10. DualPlane: 0, WeightRange: 8 (16), Subsets: 1, EndpointRange: 13 (48)   MODE6
11. DualPlane: 1, WeightRange: 2 (4), Subsets: 1, EndpointRange: 13 (48)    MODE5
12. DualPlane: 0, WeightRange: 5 (8), Subsets: 1, EndpointRange: 19 (192)   MODE6
13. DualPlane: 1, WeightRange: 0 (2), Subsets: 1, EndpointRange: 20 (256)   MODE5
14. DualPlane: 0, WeightRange: 2 (4), Subsets: 1, EndpointRange: 20 (256)   MODE6

Once you have the UASTC mode index, endpoint values, weight indices, and optionally the partition index and component rotation fields extracted from the UASTC block, unpacking proceeds in exactly the same way as with a standard ASTC block. It uses the same ASTC endpoint value dequantization method, the 2/3/4-bit texel indices are converted to [0,64] interpolation weights in the same way, and the endpoints are interpolated as 16-bit values. See in particular sections 18.11-18.20 in the Khronos ASTC data format specification:
https://www.khronos.org/registry/DataFormat/specs/1.1/dataformat.1.1.html#astc-endpoint-unquantization

There are so few partition patterns that a decoder could use lookup tables, or it could use the ASTC pattern generator function (in section 18.21) with the correct seeds. The UASTC format stores partition pattern indices, not 10-bit seeds, to save space.

This is an RDO codec, so we're depending on a good LZ codec for compression. To implement multiple quality levels the current plan is to use an LZ dictionary simulator, bit price estimator, and Lagrangian optimization to choose block selector bytes which have been recently emitted into the output data stream. The quality level will control the error threshold used to choose "good enough" selectors which we've already sent (so they'll be cheap for LZ to encode). We've implemented this before in Basis BC1, but that was with already quantized selectors. So there will be some things to figure out.
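The core of that plan is the usual Lagrangian trade-off: for each block, choose the candidate that minimizes J = D + lambda * R, where D is the error and R the estimated bit cost. Below is a toy C++ sketch of just that selection step. The cost model (a fixed price for reusing a recently emitted weight pattern, a higher fixed price for sending new data) and the distortion proxy are hypothetical stand-ins for the LZ dictionary simulator and bit price estimator described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

using Weights = std::vector<uint8_t>;  // one weight index per texel

// Hypothetical distortion proxy: sum of absolute weight index differences.
// A real encoder would decode both blocks and measure color error.
double distortion(const Weights& a, const Weights& b) {
    assert(a.size() == b.size());
    double d = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) d += std::abs(int(a[i]) - int(b[i]));
    return d;
}

// Pick the weights minimizing J = D + lambda * R. "recent" stands in for weight
// patterns already emitted, which an LZ coder could match for about matchBits;
// sending the ideal weights as new data is assumed to cost literalBits.
Weights chooseWeights(const Weights& ideal, const std::vector<Weights>& recent,
                      double lambda, double matchBits, double literalBits) {
    Weights best = ideal;
    double bestJ = lambda * literalBits;  // D == 0 for the ideal weights
    for (const Weights& cand : recent) {
        const double j = distortion(ideal, cand) + lambda * matchBits;
        if (j < bestJ) {
            bestJ = j;
            best = cand;
        }
    }
    return best;
}
```

Raising lambda favors cheap, recently seen patterns at the expense of quality, playing a role similar to the quality level's error threshold.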

This system is designed to be compatible with, and to explicitly exploit, KTX2's support for RDO compression on top of block based formats.

UASTC block format encoding

UASTC is a 15 mode 4x4 pixel LDR-only subset of the ASTC specification with a simpler 128-bit block format. It can be losslessly transcoded to the standard ASTC block format, quickly transcoded to BC7 with very low quality loss (0.75 dB RGB PSNR on average), or re-encoded to high quality ETC1/2 with a small amount of per-pixel work. There are 8 opaque modes, 1 solid color mode, and 6 alpha modes. These UASTC modes each map to one of 6 BC7 modes (all except 0 and 4). UASTC is the first high quality universal GPU texture format that supports block partitioning.

This post shows how each mode is laid out in a 128-bit UASTC block at the bit level. Bits are written starting from the beginning of the block (at the first byte's LSB) working "down" towards bit 127. The mode field is always first and is stored at bit 0 in the block (bit 0 of byte 0).
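A minimal sketch of that bit ordering: each field is appended LSB first, starting at bit 0 of byte 0, so a field's lowest bit lands at its lowest bit offset. Block128, putBits, and getBits are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Block128 = std::array<uint8_t, 16>;

// Append the low numBits of value at bit offset pos (LSB first), advancing pos.
void putBits(Block128& blk, uint32_t& pos, uint32_t value, uint32_t numBits) {
    assert(numBits <= 32 && pos + numBits <= 128);
    for (uint32_t i = 0; i < numBits; ++i, ++pos) {
        const uint32_t bit = (value >> i) & 1u;
        blk[pos >> 3] = uint8_t(blk[pos >> 3] | (bit << (pos & 7)));
    }
}

// Read numBits starting at bit offset pos, returning them LSB first.
uint32_t getBits(const Block128& blk, uint32_t pos, uint32_t numBits) {
    assert(numBits <= 32 && pos + numBits <= 128);
    uint32_t v = 0;
    for (uint32_t i = 0; i < numBits; ++i, ++pos)
        v |= uint32_t((blk[pos >> 3] >> (pos & 7)) & 1) << i;
    return v;
}
```

Writing the first four mode 0 fields this way reproduces the offsets in the mode 0 layout at the end of this post (mode at bit 0, ETC1F at 5, ETC1D at 6, ETC1I0 at 7).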

See the previous post for a description of each UASTC mode: How many subsets, the weight/endpoint BISE ranges, number of planes, etc.

Unlike ASTC, the weights are not stored in reverse bit order starting from the end of the block. Instead they are stored immediately following the endpoint bits in regular (LSB first) bit order.

The CEM field is always 8 (RGB Direct) for modes 0-7, and 12 (RGBA Direct) for modes 9-14. Blue Contraction isn't supported (i.e. the endpoints can be in arbitrary order, which, like BC7, we exploit to save an index bit per subset). Mode 8 is void-extent.

This is a snapshot of the current encoding. This may change somewhat over the next few weeks.

Field Definitions:


Mode: Huffman coded mode (2, 4, or 5 bits). One mode (15) is saved for future expansion. The Huffman codes and code lengths are (first bit of Huffman code is the LSB):

{ 0xB, 5 }, { 0x1B, 5 }, { 0x7, 5 }, { 0x17, 5 }, { 0xF, 5 }, { 0x1F, 5 }, { 0x2, 4 },
{ 0xA, 4 }, { 0x6, 4 }, { 0xE, 4 }, { 0x1, 4 }, { 0x0, 2 }, { 0x9, 4 }, { 0x5, 4 }, { 0xD, 4 }, { 0x3, 4 }
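Since each code's first bit is its LSB, a decoder can peek at the block's low 5 bits and return the mode whose code matches under its length mask. The codes form a complete prefix code, so exactly one entry matches. A minimal sketch using the table above (decodeMode is a hypothetical name):

```cpp
#include <cassert>
#include <cstdint>

struct ModeCode { uint8_t code, len; };

// UASTC mode Huffman codes (first bit of each code is the LSB), indexed by mode.
static const ModeCode kModeCodes[16] = {
    {0xB, 5}, {0x1B, 5}, {0x7, 5}, {0x17, 5}, {0xF, 5}, {0x1F, 5}, {0x2, 4}, {0xA, 4},
    {0x6, 4}, {0xE, 4},  {0x1, 4}, {0x0, 2},  {0x9, 4}, {0x5, 4},  {0xD, 4}, {0x3, 4}};

// Decode the mode from the block's first byte. Also returns the code length,
// so the caller knows the bit offset of the next field.
int decodeMode(uint8_t firstByte, int& codeLen) {
    const uint32_t peek = firstByte & 0x1Fu;  // the longest code is 5 bits
    for (int mode = 0; mode < 16; ++mode) {
        const uint32_t mask = (1u << kModeCodes[mode].len) - 1;
        if ((peek & mask) == kModeCodes[mode].code) {
            codeLen = kModeCodes[mode].len;
            return mode;
        }
    }
    return -1;  // unreachable: the code set is complete
}
```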

ETC1F, ETC1D, ETC1I0, ETC1I1: 8 bits of ETC1 transcode hints (flip, differential, intensity table 0, intensity table 1).

These hints are used by the transcoder to quickly create ETC1 blocks from the unpacked UASTC texels. To use them, the transcoder computes each 4x2 or 2x4 subblock's average color, quantizes them to 555:333 or 444:444 bits, then computes the selectors in luma space. No other work is necessary (because all the hard work was done in the UASTC encoder).
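A rough sketch of that hint-driven path for one subblock in differential mode: average the subblock's 8 texels, round the average to 5 bits per channel, then pick each texel's modifier from the hinted intensity table. The post doesn't say which luma weighting the transcoder uses, so this sketch assumes an unweighted sum of the channel differences; clamping, the 4:4:4 individual mode path, and the final ETC1 bit packing are omitted, and all names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdlib>

struct Rgb { int r, g, b; };
using Block4x4 = std::array<Rgb, 16>;  // texels in raster order

// Standard ETC1 intensity modifiers, reordered from most negative to most positive.
static const int kMods[8][4] = {
    {-8, -2, 2, 8},     {-17, -5, 5, 17},   {-29, -9, 9, 29},     {-42, -13, 13, 42},
    {-60, -18, 18, 60}, {-80, -24, 24, 80}, {-106, -33, 33, 106}, {-183, -47, 47, 183}};

// ETC1 flip bit: 0 = two 2x4 subblocks (left/right), 1 = two 4x2 subblocks (top/bottom).
static int subblockOf(int x, int y, bool flip) { return flip ? (y >= 2) : (x >= 2); }

// Average the 8 texels of one subblock and round each channel to 5 bits.
Rgb subblockColor555(const Block4x4& px, bool flip, int sub) {
    int s[3] = {0, 0, 0};
    for (int y = 0; y < 4; ++y)
        for (int x = 0; x < 4; ++x)
            if (subblockOf(x, y, flip) == sub) {
                const Rgb& c = px[y * 4 + x];
                s[0] += c.r; s[1] += c.g; s[2] += c.b;
            }
    auto to5 = [](int sum) { return (((sum + 4) / 8) * 31 + 127) / 255; };
    return {to5(s[0]), to5(s[1]), to5(s[2])};
}

// Choose a texel's modifier index (0..3 into kMods[table]) by comparing its summed
// channel offset from the expanded 8-bit base color against 3x each modifier.
int pickSelector(const Rgb& texel, const Rgb& base8, int table) {
    assert(table >= 0 && table < 8);
    const int d = (texel.r - base8.r) + (texel.g - base8.g) + (texel.b - base8.b);
    int best = 0, bestErr = std::abs(d - 3 * kMods[table][0]);
    for (int i = 1; i < 4; ++i) {
        const int err = std::abs(d - 3 * kMods[table][i]);
        if (err < bestErr) { bestErr = err; best = i; }
    }
    return best;
}
```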

ETC2TM: 8-bits of ETC2 EAC A8 transcode hints (4-bit table, 4-bit multiplier)

This is similar to how ETC1 blocks are packed, except these hints are for the alpha portion of ETC2 EAC A8 blocks. These bits are only present in modes 9-14 (the alpha modes).

ETQ: Packed endpoint trit/quint values. A simplified form of BISE is used in UASTC, see:
https://www.khronos.org/registry/DataFormat/specs/1.1/dataformat.1.1.html#astc-integer-sequence-encoding

See the "UASTC BISE Endpoint Ranges table" below for the # of trits or quints for each endpoint range. Some of the ranges don't have trits or quints, so there will be no ETQ fields.

We store the trits/quints first, followed by each value's bits. The bit interleaving and trit/quint rearranging and preprocessing in section 18.2 aren't used. Instead the encoded trits/quints are stored in UASTC as-is.

For quints, each encoded value packs 3 quints and is up to 7 bits (quint2*25+quint1*5+quint0); trits work similarly, except each encoded value packs 5 trits and is up to 8 bits. When the number of endpoint values isn't a multiple of 5 (trits) or 3 (quints), the size of the final code is the minimum # of bits necessary to represent the encoded value (to save bits).

EBITS: Endpoint bits (one set of bits per ASTC endpoint value). See the "UASTC BISE Endpoint Ranges table" below for the # of bits for each endpoint range. Endpoint order is the same as ASTC's: RL, RH, GL, GH, BL, BH, etc. Max of 18 values (RGB 3-subsets: 3*2*3).

To retrieve the endpoint values, you extract the trits/quints from the encoded ETQ values, shift each one left the appropriate number of bits (depending on the UASTC mode's endpoint range) and logically OR in the EBITS values.
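Here's a sketch of that decode for the trit case, with quints handled the same way (base 5, groups of 3). It assumes the first value of each group occupies the lowest base-3 digit, extending the quint2*25+quint1*5+quint0 ordering above, and that each group's code width is the minimum needed for base^n - 1 (8 bits for a full group of 5 trits). For mode 0 (six endpoint values, range 19 = 192 levels = 3 x 2^6, i.e. trits plus 6 bits), this yields an 8-bit and a 2-bit ETQ field followed by six 6-bit EBITS fields, matching the mode 0 layout at the end of this post. The names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Minimal LSB-first bit reader over a 16-byte block.
struct BitReader {
    const uint8_t* data;
    uint32_t pos = 0;
    uint32_t get(uint32_t n) {
        assert(pos + n <= 128);
        uint32_t v = 0;
        for (uint32_t i = 0; i < n; ++i, ++pos)
            v |= uint32_t((data[pos >> 3] >> (pos & 7)) & 1) << i;
        return v;
    }
};

// Number of bits needed to store any value in [0, maxValue].
static uint32_t bitsFor(uint32_t maxValue) {
    uint32_t b = 0;
    while ((1u << b) <= maxValue) ++b;
    return b;
}

// Decode numValues endpoint values that use trits (base 3, groups of 5) or
// quints (base 5, groups of 3) plus numBits plain low bits per value.
// Layout per the post: all ETQ codes first, then each value's EBITS.
std::vector<int> decodeEndpoints(BitReader& br, int numValues, int base, int numBits) {
    assert(base == 3 || base == 5);
    const int groupSize = (base == 3) ? 5 : 3;
    std::vector<int> digits;
    for (int first = 0; first < numValues; first += groupSize) {
        const int n = std::min(groupSize, numValues - first);
        uint32_t count = 1;  // base^n distinct codes for a group of n values
        for (int i = 0; i < n; ++i) count *= uint32_t(base);
        uint32_t code = br.get(bitsFor(count - 1));
        for (int i = 0; i < n; ++i) {  // first value = lowest digit (assumed)
            digits.push_back(int(code % uint32_t(base)));
            code /= uint32_t(base);
        }
    }
    std::vector<int> values(numValues);
    for (int i = 0; i < numValues; ++i)
        values[i] = (digits[i] << numBits) | int(br.get(uint32_t(numBits)));
    return values;
}
```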

Endpoint values are a sequence of integers that must be dequantized to [0,255] by following the ASTC spec in section 18.13, see:
https://www.khronos.org/registry/DataFormat/specs/1.1/dataformat.1.1.html#astc-endpoint-unquantization

WEIGHTS: Encoded weight indices. Just like BC7, the first weight of each subset's "anchor" texel index always has an MSB of 0, so these weights can be encoded with one less bit than the others. (UASTC doesn't use Blue Contraction so we can use this trick.)

Weights are always encoded as plain bits (no BISE necessary). Weight ordering is the same as ASTC's (raster order, left to right/top to bottom scanline). In dual plane mode, the ordering is also ASTC's: p0 p1, p0 p1, p0 p1, etc. (two weight indices per texel).
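For a single-subset, single-plane mode, the anchor trick can be sketched as: if texel 0's weight has its MSB set, swap each low/high endpoint pair and invert every weight (which decodes to the same texels), after which texel 0's weight can be stored with one bit fewer. Multi-subset modes would also need each subset's anchor texel position, which comes from the partition tables and isn't shown; the names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Make the anchor (texel 0) weight's MSB zero for a single-subset,
// single-plane block. Endpoints are in ASTC order (low/high pairs per component).
void canonicalizeAnchor(std::vector<int>& endpoints, std::vector<int>& weights, int weightBits) {
    assert(weightBits >= 1 && weightBits <= 4 && !weights.empty());
    const int msb = 1 << (weightBits - 1);
    if ((weights[0] & msb) == 0) return;  // already canonical
    for (std::size_t i = 0; i + 1 < endpoints.size(); i += 2)
        std::swap(endpoints[i], endpoints[i + 1]);
    const int maxW = (1 << weightBits) - 1;
    for (int& w : weights) w = maxW - w;  // same decoded texels, anchor MSB now 0
}

// Stored weight bits for one subset and one plane of 16 texels:
// every texel uses weightBits except the anchor, which drops its MSB.
int packedWeightBits(int weightBits) { return 16 * weightBits - 1; }
```

For UASTC mode 0 (4-bit weights, one subset), that's 16 x 4 - 1 = 63 weight bits.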

The weights are dequantized to 6-bit interpolation values in the same way as ASTC's:
https://www.khronos.org/registry/DataFormat/specs/1.1/dataformat.1.1.html#_weight_unquantization

And the endpoints are interpolated in the same way as ASTC's:
https://www.khronos.org/registry/DataFormat/specs/1.1/dataformat.1.1.html#astc_weight_application
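To see how close the two interpolation rules are, here they are side by side in C++. BC7 interpolates the 8-bit endpoints directly. The ASTC version assumes LDR, non-sRGB decoding: each endpoint is expanded to 16 bits as (e << 8) | e, interpolated, and the top 8 bits are kept, as item 8 above describes (sRGB decoding expands endpoints differently and isn't covered). Writing S = e0 * (64 - w) + e1 * w, the BC7 result is floor((256 * S + 8192) / 16384) and the ASTC result is floor((257 * S + 32) / 16384); the numerators differ by S - 8160, which is always smaller in magnitude than 16384, so the two can never differ by more than 1. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

// BC7: interpolate 8-bit endpoints with a [0,64] weight, with rounding.
int bc7Interpolate(int e0, int e1, int w) {
    assert(w >= 0 && w <= 64);
    return ((64 - w) * e0 + w * e1 + 32) >> 6;
}

// ASTC LDR, non-sRGB: expand endpoints to 16 bits by replication,
// interpolate with rounding, then keep only the top 8 bits.
int astcInterpolate8(int e0, int e1, int w) {
    const int c0 = (e0 << 8) | e0;
    const int c1 = (e1 << 8) | e1;
    const int c = (c0 * (64 - w) + c1 * w + 32) >> 6;
    return c >> 8;
}

// Largest |BC7 - ASTC| over all 8-bit endpoint pairs and all [0,64] weights.
int maxInterpolationDiff() {
    int m = 0;
    for (int e0 = 0; e0 < 256; ++e0)
        for (int e1 = 0; e1 < 256; ++e1)
            for (int w = 0; w <= 64; ++w)
                m = std::max(m, std::abs(bc7Interpolate(e0, e1, w) - astcInterpolate8(e0, e1, w)));
    return m;
}
```

The exhaustive check confirms that bound, and e0 = 0, e1 = 1, w = 32 is one case where the results actually differ (1 for BC7, 0 for ASTC).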

PAT: Index into the common BC7/ASTC partition pattern table. This table contains BC7 pattern indices, ASTC pattern seeds, and permutation/flip flags which indicate how to map ASTC pattern subset indices to BC7's. There are three tables and 60 total partition patterns.

A UASTC decoder can either use ASTC's partition pattern generator or BC7's partition tables. To map ASTC's partition patterns to BC7's, the pattern subset indices are either used as-is, inverted, permuted, and/or combined to get BC7 partition pattern subset indices (see the tables/example code at the very bottom). These simple transformations correspond to changing the order of the encoded BC7 endpoints, or setting 2 endpoints in a 3-subset BC7 block to the same color/alpha values. Every ASTC pattern included in the below common tables maps to a BC7 pattern without loss (i.e. there is no subset "crosstalk" when mapping a UASTC to a BC7 pattern).
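One way to check the "no crosstalk" property for a given ASTC/BC7 pattern pair, and to derive the subset mapping at the same time, is to walk the 16 texels and record which ASTC subset each BC7 subset's texels come from. A sketch, assuming both patterns are available as 16 per-texel subset indices in raster order; it's an illustration, not the tool that generated the actual tables, and the names are hypothetical. BC7 subsets that map to the same ASTC subset simply get identical endpoints.

```cpp
#include <array>
#include <cassert>

using Pattern = std::array<int, 16>;  // per-texel subset index, raster order

// For each BC7 subset, find the ASTC subset whose endpoints it should use
// (-1 if that BC7 subset is unused). Returns false if any BC7 subset mixes
// texels from two different ASTC subsets, i.e. the mapping would be lossy.
bool mapBc7ToAstcSubsets(const Pattern& astc, const Pattern& bc7, std::array<int, 3>& subsetMap) {
    subsetMap = {-1, -1, -1};
    for (int t = 0; t < 16; ++t) {
        assert(bc7[t] >= 0 && bc7[t] < 3);
        int& m = subsetMap[bc7[t]];
        if (m == -1) m = astc[t];
        else if (m != astc[t]) return false;
    }
    return true;
}
```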

COMPSEL: ASTC's Color Component Selector field. Only present on Dual Plane modes.
This maps to BC7 mode 5's 2-bit component rotation field (the value must be remapped).
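Here's one consistent remapping, sketched under the assumption that both fields have their standard meanings. ASTC's selector names the channel driven by the second weight plane (0=R, 1=G, 2=B, 3=A). BC7 mode 5 always gives alpha its own indices, and its rotation field (0=none, 1=swap A/R, 2=swap A/G, 3=swap A/B) moves that channel into place after decoding. The function name is hypothetical.

```cpp
#include <cassert>

// Map an ASTC color component selector to a BC7 mode 5 rotation value,
// so the separately indexed BC7 channel ends up where ASTC's second plane goes.
int compsel_to_bc7_rotation(int compsel) {
    assert(compsel >= 0 && compsel <= 3);
    return (compsel + 1) & 3;  // R->1, G->2, B->3, A->0 (no rotation)
}
```

A transcoder would also have to swap the corresponding endpoint components into BC7's alpha slot. That part isn't shown.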

Other notes:


- The number of color components is 3 for modes [0,7], or 4 for modes [8,14].
- The number of subsets is [1,3].
- The total number of endpoint values is num_comps * 2 * num_subsets.
- The number of planes is either 1 or 2.
- The total number of weight values is either 16 (non-dual plane modes) or 32 (dual plane modes).
- Dual plane modes always have 1 subset in UASTC.
- Weight indices are always 1, 2, 3, or 4 bits for compatibility with BC7. BISE is used only for endpoints, never for weight indices.
- Various endpoint value ordering examples for 1 and 2 subsets (this is the same as ASTC):
1 subset RGB: RL0 RH0 GL0 GH0 BL0 BH0
1 subset RGBA: RL0 RH0 GL0 GH0 BL0 BH0 AL0 AH0
2 subset RGB: RL0 RH0 GL0 GH0 BL0 BH0 RL1 RH1 GL1 GH1 BL1 BH1
2 subset RGBA: RL0 RH0 GL0 GH0 BL0 BH0 AL0 AH0 RL1 RH1 GL1 GH1 BL1 BH1 AL1 AH1
- Transcoding UASTC->ASTC is always a 100% lossless operation. The endpoints may need to be swapped (and the corresponding weight indices inverted) to disable blue contraction, but this is a lossless transformation.
- The primary source of loss when transcoding UASTC->BC7 is mapping UASTC endpoints to BC7 endpoints. This is done using a simple scale with optional optimal p-bit computation. The UASTC weight indices are either copied as-is, or converted to the closest corresponding BC7 weight indices using a lookup table. The partition patterns are lossless, the weight tables are the same for 2/3-bit weights and very similar for 4-bit weights, and the endpoint interpolation method is nearly the same (16 bits in UASTC/ASTC vs. 8 bits in BC7, and both formats use [0,64] weights with rounding in the linear interpolation).
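The endpoint swap used to disable blue contraction can be sketched for a 1-subset RGB (CEM 8) block. ASTC applies blue contraction when the second endpoint's R+G+B (the RH0+GH0+BH0 values in the ordering above) is smaller than the first's. Swapping each L/H pair and inverting every weight index therefore keeps the decoder on the plain path. This sketch assumes the endpoints are already dequantized to [0,255]. The function name is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <utility>

// Endpoints in ASTC order RL0 RH0 GL0 GH0 BL0 BH0, dequantized to [0,255].
void avoid_blue_contraction(std::array<int, 6>& e, std::array<int, 16>& weights,
                            int num_weight_levels) {
    assert(num_weight_levels >= 2);
    const int sum_low = e[0] + e[2] + e[4];   // RL0 + GL0 + BL0
    const int sum_high = e[1] + e[3] + e[5];  // RH0 + GH0 + BH0
    if (sum_high >= sum_low)
        return;  // ASTC already decodes this ordering without blue contraction
    // Swap each low/high pair so the second endpoint has the larger sum...
    for (int c = 0; c < 3; ++c)
        std::swap(e[c * 2], e[c * 2 + 1]);
    // ...and invert the weight indices so every texel keeps its color.
    for (int& w : weights)
        w = (num_weight_levels - 1) - w;
}
```

This is lossless because the [0,64] weight tables are symmetric: 64 - table[i] == table[n-1-i]. An inverted index with swapped endpoints therefore interpolates to the same color.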

Modes:

Format is "field: bit_offset num_bits"
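Reading a field from a 128-bit block, given its bit offset and width, can be sketched as follows. The sketch assumes bits are numbered LSB-first starting at byte 0, which is the convention ASTC uses for its block header. This should be verified against a reference UASTC decoder. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Extract num_bits bits starting at bit_offset from a 16-byte block,
// assuming bit 0 is the least significant bit of byte 0.
uint32_t read_field(const uint8_t block[16], int bit_offset, int num_bits) {
    assert(num_bits >= 1 && num_bits <= 32);
    assert(bit_offset >= 0 && bit_offset + num_bits <= 128);
    uint32_t value = 0;
    for (int i = 0; i < num_bits; ++i) {
        const int bit = bit_offset + i;
        const uint32_t b = (block[bit >> 3] >> (bit & 7)) & 1u;
        value |= b << i;  // fields are assembled LSB first
    }
    return value;
}
```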

**** Mode: 0 (CEM 8)
DualPlane: 0, WeightRange: 8 (16), Subsets: 1, EndpointRange: 19 (192) MODE6 RGB
mode: 0 5
ETC1F: 5 1
ETC1D: 6 1
ETC1I0: 7 3
ETC1I1: 10 3
ETQ: 13 8
ETQ: 21 2
EBITS: 23 6
EBITS: 29 6
EBITS: 35 6
EBITS: 41 6
EBITS: 47 6
EBITS: 53 6
WEIGHTS: 59 63
Total bits: 122, endpoint bits: 46, weight bits: 63

**** Mode: 1 (CEM 8)
DualPlane: 0, WeightRange: 2 (4), Subsets: 1, EndpointRange: 20 (256) MODE3
mode: 0 5
ETC1F: 5 1
ETC1D: 6 1
ETC1I0: 7 3
ETC1I1: 10 3
EBITS: 13 8
EBITS: 21 8
EBITS: 29 8
EBITS: 37 8
EBITS: 45 8
EBITS: 53 8
WEIGHTS: 61 31
Total bits: 92, endpoint bits: 48, weight bits: 31

**** Mode: 2 (CEM 8)
DualPlane: 0, WeightRange: 5 (8), Subsets: 2, EndpointRange: 8 (16) MODE1
mode: 0 5
ETC1F: 5 1
ETC1D: 6 1
ETC1I0: 7 3
ETC1I1: 10 3
PAT: 13 5
EBITS: 18 4
EBITS: 22 4
EBITS: 26 4
EBITS: 30 4
EBITS: 34 4
EBITS: 38 4
EBITS: 42 4
EBITS: 46 4
EBITS: 50 4
EBITS: 54 4
EBITS: 58 4
EBITS: 62 4
WEIGHTS: 66 46
Total bits: 112, endpoint bits: 48, weight bits: 46

**** Mode: 3 (CEM 8)
DualPlane: 0, WeightRange: 2 (4), Subsets: 3, EndpointRange: 7 (12) MODE2
mode: 0 5
ETC1F: 5 1
ETC1D: 6 1
ETC1I0: 7 3
ETC1I1: 10 3
PAT: 13 4
ETQ: 17 8
ETQ: 25 8
ETQ: 33 8
ETQ: 41 5
EBITS: 46 2
EBITS: 48 2
EBITS: 50 2
EBITS: 52 2
EBITS: 54 2
EBITS: 56 2
EBITS: 58 2
EBITS: 60 2
EBITS: 62 2
EBITS: 64 2
EBITS: 66 2
EBITS: 68 2
EBITS: 70 2
EBITS: 72 2
EBITS: 74 2
EBITS: 76 2
EBITS: 78 2
EBITS: 80 2
WEIGHTS: 82 29
Total bits: 111, endpoint bits: 65, weight bits: 29

**** Mode: 4 (CEM 8)
DualPlane: 0, WeightRange: 2 (4), Subsets: 2, EndpointRange: 12 (40) MODE3
mode: 0 5
ETC1F: 5 1
ETC1D: 6 1
ETC1I0: 7 3
ETC1I1: 10 3
PAT: 13 5
ETQ: 18 7
ETQ: 25 7
ETQ: 32 7
ETQ: 39 7
EBITS: 46 3
EBITS: 49 3
EBITS: 52 3
EBITS: 55 3
EBITS: 58 3
EBITS: 61 3
EBITS: 64 3
EBITS: 67 3
EBITS: 70 3
EBITS: 73 3
EBITS: 76 3
EBITS: 79 3
WEIGHTS: 82 30
Total bits: 112, endpoint bits: 64, weight bits: 30

**** Mode: 5 (CEM 8)
DualPlane: 0, WeightRange: 5 (8), Subsets: 1, EndpointRange: 20 (256) MODE6 RGB
mode: 0 5
ETC1F: 5 1
ETC1D: 6 1
ETC1I0: 7 3
ETC1I1: 10 3
EBITS: 13 8
EBITS: 21 8
EBITS: 29 8
EBITS: 37 8
EBITS: 45 8
EBITS: 53 8
WEIGHTS: 61 47
Total bits: 108, endpoint bits: 48, weight bits: 47

**** Mode: 6 (CEM 8)
DualPlane: 1, WeightRange: 2 (4), Subsets: 1, EndpointRange: 18 (160) MODE5 RGB
mode: 0 4
ETC1F: 4 1
ETC1D: 5 1
ETC1I0: 6 3
ETC1I1: 9 3
COMPSEL: 12 2
ETQ: 14 7
ETQ: 21 7
EBITS: 28 5
EBITS: 33 5
EBITS: 38 5
EBITS: 43 5
EBITS: 48 5
EBITS: 53 5
WEIGHTS: 58 63
Total bits: 121, endpoint bits: 44, weight bits: 63

**** Mode: 7 (CEM 8)
DualPlane: 0, WeightRange: 2 (4), Subsets: 2, EndpointRange: 12 (40) MODE2
mode: 0 4
ETC1F: 4 1
ETC1D: 5 1
ETC1I0: 6 3
ETC1I1: 9 3
PAT: 12 5
ETQ: 17 7
ETQ: 24 7
ETQ: 31 7
ETQ: 38 7
EBITS: 45 3
EBITS: 48 3
EBITS: 51 3
EBITS: 54 3
EBITS: 57 3
EBITS: 60 3
EBITS: 63 3
EBITS: 66 3
EBITS: 69 3
EBITS: 72 3
EBITS: 75 3
EBITS: 78 3
WEIGHTS: 81 30
Total bits: 111, endpoint bits: 64, weight bits: 30

**** Mode: 8 (Void-Extent)
Void-Extent: Solid Color RGBA (MODE5 or MODE6)
mode: 0 4
R: 4 8
G: 12 8
B: 20 8
A: 28 8
Total bits: 36

**** Mode: 9 (CEM 12)
DualPlane: 0, WeightRange: 2 (4), Subsets: 2, EndpointRange: 8 (16) MODE7
mode: 0 4
ETC1F: 4 1
ETC1D: 5 1
ETC1I0: 6 3
ETC1I1: 9 3
ETC2TM: 12 8
PAT: 20 5
EBITS: 25 4
EBITS: 29 4
EBITS: 33 4
EBITS: 37 4
EBITS: 41 4
EBITS: 45 4
EBITS: 49 4
EBITS: 53 4
EBITS: 57 4
EBITS: 61 4
EBITS: 65 4
EBITS: 69 4
EBITS: 73 4
EBITS: 77 4
EBITS: 81 4
EBITS: 85 4
WEIGHTS: 89 30
Total bits: 119, endpoint bits: 64, weight bits: 30

**** Mode: 10 (CEM 12)
DualPlane: 0, WeightRange: 8 (16), Subsets: 1, EndpointRange: 13 (48) MODE6
mode: 0 4
ETC1F: 4 1
ETC1D: 5 1
ETC1I0: 6 3
ETC1I1: 9 3
ETC2TM: 12 8
ETQ: 20 8
ETQ: 28 5
EBITS: 33 4
EBITS: 37 4
EBITS: 41 4
EBITS: 45 4
EBITS: 49 4
EBITS: 53 4
EBITS: 57 4
EBITS: 61 4
WEIGHTS: 65 63
Total bits: 128, endpoint bits: 45, weight bits: 63

**** Mode: 11 (CEM 12)
DualPlane: 1, WeightRange: 2 (4), Subsets: 1, EndpointRange: 13 (48) MODE5
mode: 0 2
ETC1F: 2 1
ETC1D: 3 1
ETC1I0: 4 3
ETC1I1: 7 3
ETC2TM: 10 8
COMPSEL: 18 2
ETQ: 20 8
ETQ: 28 5
EBITS: 33 4
EBITS: 37 4
EBITS: 41 4
EBITS: 45 4
EBITS: 49 4
EBITS: 53 4
EBITS: 57 4
EBITS: 61 4
WEIGHTS: 65 63
Total bits: 128, endpoint bits: 45, weight bits: 63

**** Mode: 12 (CEM 12)
DualPlane: 0, WeightRange: 5 (8), Subsets: 1, EndpointRange: 19 (192) MODE6
mode: 0 4
ETC1F: 4 1
ETC1D: 5 1
ETC1I0: 6 3
ETC1I1: 9 3
ETC2TM: 12 8
ETQ: 20 8
ETQ: 28 5
EBITS: 33 6
EBITS: 39 6
EBITS: 45 6
EBITS: 51 6
EBITS: 57 6
EBITS: 63 6
EBITS: 69 6
EBITS: 75 6
WEIGHTS: 81 47
Total bits: 128, endpoint bits: 61, weight bits: 47

**** Mode: 13 (CEM 12)
DualPlane: 1, WeightRange: 0 (2), Subsets: 1, EndpointRange: 20 (256) MODE5
mode: 0 4
ETC1F: 4 1
ETC1D: 5 1
ETC1I0: 6 3
ETC1I1: 9 3
ETC2TM: 12 8
COMPSEL: 20 2
EBITS: 22 8
EBITS: 30 8
EBITS: 38 8
EBITS: 46 8
EBITS: 54 8
EBITS: 62 8
EBITS: 70 8
EBITS: 78 8
WEIGHTS: 86 31
Total bits: 117, endpoint bits: 64, weight bits: 31

**** Mode: 14 (CEM 12)
DualPlane: 0, WeightRange: 2 (4), Subsets: 1, EndpointRange: 20 (256) MODE6
mode: 0 4
ETC1F: 4 1
ETC1D: 5 1
ETC1I0: 6 3
ETC1I1: 9 3
ETC2TM: 12 8
EBITS: 20 8
EBITS: 28 8
EBITS: 36 8
EBITS: 44 8
EBITS: 52 8
EBITS: 60 8
EBITS: 68 8
EBITS: 76 8
WEIGHTS: 84 31
Total bits: 115, endpoint bits: 64, weight bits: 31

UASTC BISE Endpoint Ranges table:

Range  Bits  Trits  Quints  UASTC Modes   Quant. Levels
7      2     1      0       3             12
8      4     0      0       2, 9          16
12     3     0      1       4, 7          40
13     4     1      0       10, 11        48
18     5     0      1       6             160
19     6     1      0       0, 12         192
20     8     0      0       1, 5, 13, 14  256
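The endpoint bit counts in the mode list follow from this table. Each value costs its plain bits, plus 8 bits per 5 trit-encoded values or 7 bits per 3 quint-encoded values, rounded up (the standard ASTC BISE sizes). Here's a small sketch restricted to the ranges above (function name hypothetical):

```cpp
#include <cassert>

// Bits needed to BISE-encode `count` endpoint values in a UASTC endpoint range.
int uastc_endpoint_bits(int range, int count) {
    int bits = 0, trits = 0, quints = 0;
    switch (range) {
    case 7:  bits = 2; trits = 1;  break;  // 12 levels
    case 8:  bits = 4;             break;  // 16 levels
    case 12: bits = 3; quints = 1; break;  // 40 levels
    case 13: bits = 4; trits = 1;  break;  // 48 levels
    case 18: bits = 5; quints = 1; break;  // 160 levels
    case 19: bits = 6; trits = 1;  break;  // 192 levels
    case 20: bits = 8;             break;  // 256 levels
    default: assert(false && "not a UASTC endpoint range"); return -1;
    }
    return count * bits
         + (trits ? (8 * count + 4) / 5 : 0)    // ceil(8 * count / 5)
         + (quints ? (7 * count + 2) / 3 : 0);  // ceil(7 * count / 3)
}
```

For example, mode 3's 18 values at range 7 come to 36 + 29 = 65 bits, and mode 10's 8 values at range 13 come to 32 + 13 = 45 bits, matching the listings.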


UASTC/BC7 2-subset partition pattern table:


const uint32_t TOTAL_ASTC_BC7_COMMON_PARTITIONS2 = 30;

const struct
{
  int m_bc7_pattern;
  int m_astc_seed;
  // if true, invert the BC7 pattern's subset index to match ASTC's subset index
  bool m_invert;
} g_astc_bc7_common_partitions2[TOTAL_ASTC_BC7_COMMON_PARTITIONS2] =
{
  { 0, 28, false }, { 1, 20, false }, { 2, 16, true }, { 3, 29, false },
  { 4, 91, true }, { 5, 9, false }, { 6, 107, true }, { 7, 72, true },
  { 8, 149, false }, { 9, 204, true }, { 10, 50, false }, { 11, 114, true },
  { 12, 496, true }, { 13, 17, true }, { 14, 78, false }, { 15, 39, true },
  { 17, 252, true }, { 18, 828, true }, { 19, 43, false }, { 20, 156, false },
  { 21, 116, false }, { 22, 210, true }, { 23, 476, true }, { 24, 273, false },
  { 25, 684, true }, { 26, 359, false }, { 29, 246, true }, { 32, 195, true },
  { 33, 694, true }, { 52, 524, true }
};


UASTC/BC7 3-subset partition pattern table:


const uint32_t TOTAL_ASTC_BC7_COMMON_PARTITIONS3 = 11;

const struct
{
  uint8_t m_bc7;
  uint16_t m_astc;

// maps ASTC to BC7 subset indices using g_astc_bc7_subset_index_perm_tables[][]
  uint8_t m_astc_to_bc7_perm;
} g_astc_bc7_common_partitions3[TOTAL_ASTC_BC7_COMMON_PARTITIONS3] =
{
  { 4, 260, 0 },  { 8, 74, 5 },  { 9, 32, 5 },  { 10, 156, 2 },
  { 11, 183, 2 },  { 12, 15, 0 },  { 13, 745, 4 },  { 20, 0, 1 },
  { 35, 335, 1 },  { 36, 902, 5 },  { 57, 254, 0 }
};


const uint8_t g_astc_bc7_subset_index_perm_tables[6][3] = 
{
{ 0, 1, 2 },{ 1, 2, 0 },{ 2, 0, 1 },{ 2, 1, 0 },{ 0, 2, 1 },{ 1, 0, 2 }
};

UASTC/BC7 2-subset partition pattern table (mapped to the BC7 3-subset patterns, used only in UASTC mode 7):


const uint32_t TOTAL_BC73_ASTC2_COMMON_PARTITIONS = 19;

const struct
{
  uint8_t m_bc73;
  uint16_t m_astc2;
  // [0,5] - how to modify the BC7 3-subset pattern to match the ASTC pattern (LSB=invert). See convert_subset_index_3_to_2().
  uint8_t k;
} g_bc73_astc2_common_partitions[TOTAL_BC73_ASTC2_COMMON_PARTITIONS] =
{
  { 10, 36, 4 },{ 11, 48, 4 },{ 0, 61, 3 },{ 2, 137, 4 },
  { 8, 161, 5 },{ 13, 183, 4 },{ 1, 226, 2 },{ 33, 281, 2 },
  { 40, 302, 3 },{ 20, 307, 4 },{ 21, 479, 0 },{ 58, 495, 3 },
  { 3, 593, 0 },{ 32, 594, 2 },{ 59, 605, 1 },{ 34, 799, 3 },
  { 20, 812, 1 },{ 14, 988, 4 },{ 31, 993, 3 }
};

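#include <cassert>
#include <cstdint>

// Maps a BC7 3-subset index p (0-2) to a 2-subset index (0-1).
// k >> 1 selects which two BC7 subsets are merged; k & 1 inverts the result.
uint32_t convert_subset_index_3_to_2(uint32_t p, uint32_t k)
{
    assert(k < 6);
    switch (k >> 1)
    {
    case 0:
        // BC7 subsets 0 and 1 merge into subset 0; subset 2 becomes 1
        p = (p <= 1) ? 0 : 1;
        break;
    case 1:
        // subset 0 stays 0; subsets 1 and 2 merge into subset 1
        p = (p == 0) ? 0 : 1;
        break;
    case 2:
        // subsets 0 and 2 merge into subset 0; subset 1 becomes 1
        p = ((p == 0) || (p == 2)) ? 0 : 1;
        break;
    }
    if (k & 1)
        p = 1 - p;
    return p;
}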


UASTC weight tables:


const uint32_t g_astc_bc7_weights1[2] = { 0, 64 };
const uint32_t g_astc_bc7_weights2[4] = { 0, 21, 43, 64 };
const uint32_t g_astc_bc7_weights3[8] = { 0, 9, 18, 27, 37, 46, 55, 64 };
const uint32_t g_bc7_weights4[16] = { 0, 4, 9, 13, 17, 21, 26, 30, 34, 38, 43, 47, 51, 55, 60, 64 };
const uint32_t g_astc_weights4[16] = { 0, 4, 8, 12, 17, 21, 25, 29, 35, 39, 43, 47, 52, 56, 60, 64 };

Note that BC7 and ASTC use the same 2- and 3-bit weight tables, while their 4-bit tables are slightly different.
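Because the 4-bit tables differ, a transcoder needs a mapping from ASTC to BC7 4-bit indices. Here's a sketch that builds one by picking the BC7 entry with the closest interpolation value. The nearest-value criterion and the function name are assumptions.

```cpp
#include <cassert>
#include <cstdlib>

const int kAstcWeights4[16] = { 0, 4, 8, 12, 17, 21, 25, 29, 35, 39, 43, 47, 52, 56, 60, 64 };
const int kBc7Weights4[16]  = { 0, 4, 9, 13, 17, 21, 26, 30, 34, 38, 43, 47, 51, 55, 60, 64 };

// For each ASTC 4-bit index, find the BC7 4-bit index with the nearest
// [0,64] value (ties keep the lower index) and track the worst difference.
void build_astc_to_bc7_weight4_table(int table[16], int* max_error) {
    assert(table != nullptr && max_error != nullptr);
    *max_error = 0;
    for (int a = 0; a < 16; ++a) {
        int best = 0;
        for (int b = 1; b < 16; ++b)
            if (std::abs(kBc7Weights4[b] - kAstcWeights4[a]) <
                std::abs(kBc7Weights4[best] - kAstcWeights4[a]))
                best = b;
        table[a] = best;
        const int err = std::abs(kBc7Weights4[best] - kAstcWeights4[a]);
        if (err > *max_error)
            *max_error = err;
    }
}
```

For these two tables, the nearest BC7 index is the same index for all 16 entries, with at most 1 unit of difference on the [0,64] scale.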

ARM's ASTC encoder now uses the Apache 2.0 license


LDR ASTC mode list (all CEM's the same)

ASTC is a very complex format. There are 407 valid 4x4 LDR ASTC encodings (or configurations?) that meet the following criteria:

- LDR only, 4x4 block size
- Planes: 1 or 2
- Subsets: 1-4 (one plane) or 1-3 (dual plane)
- CEMs: LDR only (0, 1, 4, 5, 6, 8, 9, 10, 12, 13), with the same CEM used for every subset
- Weight Ranges: 0-11
- Endpoint Ranges: 0-20

If the "all CEMs the same" rule were relaxed, there would be a ridiculous number of modes to list (in the thousands).

Here's the list. I generated it by iterating through all the various configurations and trying to encode each one to a valid ASTC block. To double-check, I used an open source ASTC decompressor to ensure each block was decodable without errors. I went through this list to determine the 18 modes UASTC uses.
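The EndpointRange column can be reproduced with a short calculation: pick the largest endpoint range whose BISE size fits in the bits left over after the header and weights. This sketch assumes the following ASTC rules, which are worth double-checking against the spec:

- an 11-bit block mode and a 2-bit partition count
- a 4-bit CEM (1 subset), or a 10-bit partition index plus a 6-bit shared CEM field (2+ subsets)
- a 2-bit component selector for dual plane
- weight data between 24 and 96 bits
- at most 18 endpoint values
- at least 6 endpoint levels

The function names are hypothetical.

```cpp
#include <cassert>

// ASTC quantization ranges 0-20: level count and BISE encoding per value.
struct QuantRange { int levels, bits, trits, quints; };
const QuantRange kQuant[21] = {
    {2, 1, 0, 0},  {3, 0, 1, 0},  {4, 2, 0, 0},   {5, 0, 0, 1},   {6, 1, 1, 0},
    {8, 3, 0, 0},  {10, 1, 0, 1}, {12, 2, 1, 0},  {16, 4, 0, 0},  {20, 2, 0, 1},
    {24, 3, 1, 0}, {32, 5, 0, 0}, {40, 3, 0, 1},  {48, 4, 1, 0},  {64, 6, 0, 0},
    {80, 4, 0, 1}, {96, 5, 1, 0}, {128, 7, 0, 0}, {160, 5, 0, 1}, {192, 6, 1, 0},
    {256, 8, 0, 0}};

// Bits needed to BISE-encode n values in quantization range r.
int bise_bits(int r, int n) {
    assert(r >= 0 && r <= 20);
    const QuantRange& q = kQuant[r];
    return n * q.bits + (q.trits ? (8 * n + 4) / 5 : 0) + (q.quints ? (7 * n + 2) / 3 : 0);
}

// Endpoint range of a 4x4 LDR block where all subsets share one CEM, or -1 if invalid.
int astc_endpoint_range(bool dual_plane, int subsets, int weight_range, int cem) {
    assert(subsets >= 1 && subsets <= 4);
    assert(weight_range >= 0 && weight_range <= 11 && cem >= 0 && cem <= 15);
    if (dual_plane && subsets == 4)
        return -1;
    // 16 weights per plane; the weight data must fit in 24..96 bits.
    const int weight_bits = bise_bits(weight_range, 16 * (dual_plane ? 2 : 1));
    if (weight_bits < 24 || weight_bits > 96)
        return -1;
    // Block mode + partition count + (CEM | partition index + CEM field) + CCS.
    const int header_bits = 11 + 2 + (subsets == 1 ? 4 : 10 + 6) + (dual_plane ? 2 : 0);
    const int available = 128 - header_bits - weight_bits;
    // CEMs 0-3 use 2 values per subset, 4-7 use 4, 8-11 use 6, 12-15 use 8.
    const int num_values = ((cem >> 2) + 1) * 2 * subsets;
    if (num_values > 18)
        return -1;
    // Highest range that fits wins; fewer than 6 levels (range 4) is invalid.
    for (int r = 20; r >= 4; --r)
        if (bise_bits(r, num_values) <= available)
            return r;
    return -1;
}
```

For example, 1 subset with WeightRange 8 and CEM 12 leaves 47 endpoint bits, so range 13 (48 levels) is the best fit, matching the list. With 2 subsets, WeightRange 7 and CEM 12, no range with 6 or more levels fits, which is why that row is absent.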

DualPlane: 0, Subsets: 1, WeightRange: 1 (3 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 1 (3 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 1 (3 levels), CEM: 4 (LA Direct    ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 1 (3 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 1 (3 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 1 (3 levels), CEM: 8 (RGB Direct   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 1 (3 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 1 (3 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 1 (3 levels), CEM: 12 (RGBA Direct  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 1 (3 levels), CEM: 13 (RGBA Base+Ofs), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 2 (4 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 2 (4 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 2 (4 levels), CEM: 4 (LA Direct    ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 2 (4 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 2 (4 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 2 (4 levels), CEM: 8 (RGB Direct   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 2 (4 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 2 (4 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 2 (4 levels), CEM: 12 (RGBA Direct  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 2 (4 levels), CEM: 13 (RGBA Base+Ofs), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 3 (5 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 3 (5 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 3 (5 levels), CEM: 4 (LA Direct    ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 3 (5 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 3 (5 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 3 (5 levels), CEM: 8 (RGB Direct   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 3 (5 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 3 (5 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 3 (5 levels), CEM: 12 (RGBA Direct  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 3 (5 levels), CEM: 13 (RGBA Base+Ofs), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 4 (6 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 4 (6 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 4 (6 levels), CEM: 4 (LA Direct    ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 4 (6 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 4 (6 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 4 (6 levels), CEM: 8 (RGB Direct   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 4 (6 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 4 (6 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 4 (6 levels), CEM: 12 (RGBA Direct  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 4 (6 levels), CEM: 13 (RGBA Base+Ofs), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 5 (8 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 5 (8 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 5 (8 levels), CEM: 4 (LA Direct    ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 5 (8 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 5 (8 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 5 (8 levels), CEM: 8 (RGB Direct   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 5 (8 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 5 (8 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 5 (8 levels), CEM: 12 (RGBA Direct  ), EndpointRange: 19 (192 levels)
DualPlane: 0, Subsets: 1, WeightRange: 5 (8 levels), CEM: 13 (RGBA Base+Ofs), EndpointRange: 19 (192 levels)
DualPlane: 0, Subsets: 1, WeightRange: 6 (10 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 6 (10 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 6 (10 levels), CEM: 4 (LA Direct    ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 6 (10 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 6 (10 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 6 (10 levels), CEM: 8 (RGB Direct   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 6 (10 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 6 (10 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 6 (10 levels), CEM: 12 (RGBA Direct  ), EndpointRange: 17 (128 levels)
DualPlane: 0, Subsets: 1, WeightRange: 6 (10 levels), CEM: 13 (RGBA Base+Ofs), EndpointRange: 17 (128 levels)
DualPlane: 0, Subsets: 1, WeightRange: 7 (12 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 7 (12 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 7 (12 levels), CEM: 4 (LA Direct    ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 7 (12 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 7 (12 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 7 (12 levels), CEM: 8 (RGB Direct   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 7 (12 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 7 (12 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 7 (12 levels), CEM: 12 (RGBA Direct  ), EndpointRange: 16 (96 levels)
DualPlane: 0, Subsets: 1, WeightRange: 7 (12 levels), CEM: 13 (RGBA Base+Ofs), EndpointRange: 16 (96 levels)
DualPlane: 0, Subsets: 1, WeightRange: 8 (16 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 8 (16 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 8 (16 levels), CEM: 4 (LA Direct    ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 8 (16 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 8 (16 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 8 (16 levels), CEM: 8 (RGB Direct   ), EndpointRange: 19 (192 levels)
DualPlane: 0, Subsets: 1, WeightRange: 8 (16 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 19 (192 levels)
DualPlane: 0, Subsets: 1, WeightRange: 8 (16 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 19 (192 levels)
DualPlane: 0, Subsets: 1, WeightRange: 8 (16 levels), CEM: 12 (RGBA Direct  ), EndpointRange: 13 (48 levels)
DualPlane: 0, Subsets: 1, WeightRange: 8 (16 levels), CEM: 13 (RGBA Base+Ofs), EndpointRange: 13 (48 levels)
DualPlane: 0, Subsets: 1, WeightRange: 9 (20 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 9 (20 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 9 (20 levels), CEM: 4 (LA Direct    ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 9 (20 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 9 (20 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 9 (20 levels), CEM: 8 (RGB Direct   ), EndpointRange: 16 (96 levels)
DualPlane: 0, Subsets: 1, WeightRange: 9 (20 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 16 (96 levels)
DualPlane: 0, Subsets: 1, WeightRange: 9 (20 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 16 (96 levels)
DualPlane: 0, Subsets: 1, WeightRange: 9 (20 levels), CEM: 12 (RGBA Direct  ), EndpointRange: 11 (32 levels)
DualPlane: 0, Subsets: 1, WeightRange: 9 (20 levels), CEM: 13 (RGBA Base+Ofs), EndpointRange: 11 (32 levels)
DualPlane: 0, Subsets: 1, WeightRange: 10 (24 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 10 (24 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 10 (24 levels), CEM: 4 (LA Direct    ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 10 (24 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 10 (24 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 10 (24 levels), CEM: 8 (RGB Direct   ), EndpointRange: 14 (64 levels)
DualPlane: 0, Subsets: 1, WeightRange: 10 (24 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 14 (64 levels)
DualPlane: 0, Subsets: 1, WeightRange: 10 (24 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 14 (64 levels)
DualPlane: 0, Subsets: 1, WeightRange: 10 (24 levels), CEM: 12 (RGBA Direct  ), EndpointRange: 10 (24 levels)
DualPlane: 0, Subsets: 1, WeightRange: 10 (24 levels), CEM: 13 (RGBA Base+Ofs), EndpointRange: 10 (24 levels)
DualPlane: 0, Subsets: 1, WeightRange: 11 (32 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 11 (32 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 1, WeightRange: 11 (32 levels), CEM: 4 (LA Direct    ), EndpointRange: 19 (192 levels)
DualPlane: 0, Subsets: 1, WeightRange: 11 (32 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 19 (192 levels)
DualPlane: 0, Subsets: 1, WeightRange: 11 (32 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 19 (192 levels)
DualPlane: 0, Subsets: 1, WeightRange: 11 (32 levels), CEM: 8 (RGB Direct   ), EndpointRange: 11 (32 levels)
DualPlane: 0, Subsets: 1, WeightRange: 11 (32 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 11 (32 levels)
DualPlane: 0, Subsets: 1, WeightRange: 11 (32 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 11 (32 levels)
DualPlane: 0, Subsets: 1, WeightRange: 11 (32 levels), CEM: 12 (RGBA Direct  ), EndpointRange: 7 (12 levels)
DualPlane: 0, Subsets: 1, WeightRange: 11 (32 levels), CEM: 13 (RGBA Base+Ofs), EndpointRange: 7 (12 levels)
DualPlane: 0, Subsets: 2, WeightRange: 1 (3 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 2, WeightRange: 1 (3 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 2, WeightRange: 1 (3 levels), CEM: 4 (LA Direct    ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 2, WeightRange: 1 (3 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 2, WeightRange: 1 (3 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 2, WeightRange: 1 (3 levels), CEM: 8 (RGB Direct   ), EndpointRange: 14 (64 levels)
DualPlane: 0, Subsets: 2, WeightRange: 1 (3 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 14 (64 levels)
DualPlane: 0, Subsets: 2, WeightRange: 1 (3 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 14 (64 levels)
DualPlane: 0, Subsets: 2, WeightRange: 1 (3 levels), CEM: 12 (RGBA Direct  ), EndpointRange: 9 (20 levels)
DualPlane: 0, Subsets: 2, WeightRange: 1 (3 levels), CEM: 13 (RGBA Base+Ofs), EndpointRange: 9 (20 levels)
DualPlane: 0, Subsets: 2, WeightRange: 2 (4 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 2, WeightRange: 2 (4 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 2, WeightRange: 2 (4 levels), CEM: 4 (LA Direct    ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 2, WeightRange: 2 (4 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 2, WeightRange: 2 (4 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 2, WeightRange: 2 (4 levels), CEM: 8 (RGB Direct   ), EndpointRange: 12 (40 levels)
DualPlane: 0, Subsets: 2, WeightRange: 2 (4 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 12 (40 levels)
DualPlane: 0, Subsets: 2, WeightRange: 2 (4 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 12 (40 levels)
DualPlane: 0, Subsets: 2, WeightRange: 2 (4 levels), CEM: 12 (RGBA Direct  ), EndpointRange: 8 (16 levels)
DualPlane: 0, Subsets: 2, WeightRange: 2 (4 levels), CEM: 13 (RGBA Base+Ofs), EndpointRange: 8 (16 levels)
DualPlane: 0, Subsets: 2, WeightRange: 3 (5 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 2, WeightRange: 3 (5 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 2, WeightRange: 3 (5 levels), CEM: 4 (LA Direct    ), EndpointRange: 19 (192 levels)
DualPlane: 0, Subsets: 2, WeightRange: 3 (5 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 19 (192 levels)
DualPlane: 0, Subsets: 2, WeightRange: 3 (5 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 19 (192 levels)
DualPlane: 0, Subsets: 2, WeightRange: 3 (5 levels), CEM: 8 (RGB Direct   ), EndpointRange: 11 (32 levels)
DualPlane: 0, Subsets: 2, WeightRange: 3 (5 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 11 (32 levels)
DualPlane: 0, Subsets: 2, WeightRange: 3 (5 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 11 (32 levels)
DualPlane: 0, Subsets: 2, WeightRange: 3 (5 levels), CEM: 12 (RGBA Direct  ), EndpointRange: 7 (12 levels)
DualPlane: 0, Subsets: 2, WeightRange: 3 (5 levels), CEM: 13 (RGBA Base+Ofs), EndpointRange: 7 (12 levels)
DualPlane: 0, Subsets: 2, WeightRange: 4 (6 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 2, WeightRange: 4 (6 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 2, WeightRange: 4 (6 levels), CEM: 4 (LA Direct    ), EndpointRange: 17 (128 levels)
DualPlane: 0, Subsets: 2, WeightRange: 4 (6 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 17 (128 levels)
DualPlane: 0, Subsets: 2, WeightRange: 4 (6 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 17 (128 levels)
DualPlane: 0, Subsets: 2, WeightRange: 4 (6 levels), CEM: 8 (RGB Direct   ), EndpointRange: 10 (24 levels)
DualPlane: 0, Subsets: 2, WeightRange: 4 (6 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 10 (24 levels)
DualPlane: 0, Subsets: 2, WeightRange: 4 (6 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 10 (24 levels)
DualPlane: 0, Subsets: 2, WeightRange: 4 (6 levels), CEM: 12 (RGBA Direct  ), EndpointRange: 6 (10 levels)
DualPlane: 0, Subsets: 2, WeightRange: 4 (6 levels), CEM: 13 (RGBA Base+Ofs), EndpointRange: 6 (10 levels)
DualPlane: 0, Subsets: 2, WeightRange: 5 (8 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 2, WeightRange: 5 (8 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 2, WeightRange: 5 (8 levels), CEM: 4 (LA Direct    ), EndpointRange: 15 (80 levels)
DualPlane: 0, Subsets: 2, WeightRange: 5 (8 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 15 (80 levels)
DualPlane: 0, Subsets: 2, WeightRange: 5 (8 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 15 (80 levels)
DualPlane: 0, Subsets: 2, WeightRange: 5 (8 levels), CEM: 8 (RGB Direct   ), EndpointRange: 8 (16 levels)
DualPlane: 0, Subsets: 2, WeightRange: 5 (8 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 8 (16 levels)
DualPlane: 0, Subsets: 2, WeightRange: 5 (8 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 8 (16 levels)
DualPlane: 0, Subsets: 2, WeightRange: 5 (8 levels), CEM: 12 (RGBA Direct  ), EndpointRange: 5 (8 levels)
DualPlane: 0, Subsets: 2, WeightRange: 5 (8 levels), CEM: 13 (RGBA Base+Ofs), EndpointRange: 5 (8 levels)
DualPlane: 0, Subsets: 2, WeightRange: 6 (10 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 2, WeightRange: 6 (10 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 2, WeightRange: 6 (10 levels), CEM: 4 (LA Direct    ), EndpointRange: 13 (48 levels)
DualPlane: 0, Subsets: 2, WeightRange: 6 (10 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 13 (48 levels)
DualPlane: 0, Subsets: 2, WeightRange: 6 (10 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 13 (48 levels)
DualPlane: 0, Subsets: 2, WeightRange: 6 (10 levels), CEM: 8 (RGB Direct   ), EndpointRange: 7 (12 levels)
DualPlane: 0, Subsets: 2, WeightRange: 6 (10 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 7 (12 levels)
DualPlane: 0, Subsets: 2, WeightRange: 6 (10 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 7 (12 levels)
DualPlane: 0, Subsets: 2, WeightRange: 6 (10 levels), CEM: 12 (RGBA Direct  ), EndpointRange: 4 (6 levels)
DualPlane: 0, Subsets: 2, WeightRange: 6 (10 levels), CEM: 13 (RGBA Base+Ofs), EndpointRange: 4 (6 levels)
DualPlane: 0, Subsets: 2, WeightRange: 7 (12 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 2, WeightRange: 7 (12 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 2, WeightRange: 7 (12 levels), CEM: 4 (LA Direct    ), EndpointRange: 11 (32 levels)
DualPlane: 0, Subsets: 2, WeightRange: 7 (12 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 11 (32 levels)
DualPlane: 0, Subsets: 2, WeightRange: 7 (12 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 11 (32 levels)
DualPlane: 0, Subsets: 2, WeightRange: 7 (12 levels), CEM: 8 (RGB Direct   ), EndpointRange: 6 (10 levels)
DualPlane: 0, Subsets: 2, WeightRange: 7 (12 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 6 (10 levels)
DualPlane: 0, Subsets: 2, WeightRange: 7 (12 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 6 (10 levels)
DualPlane: 0, Subsets: 2, WeightRange: 8 (16 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 2, WeightRange: 8 (16 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 2, WeightRange: 8 (16 levels), CEM: 4 (LA Direct    ), EndpointRange: 9 (20 levels)
DualPlane: 0, Subsets: 2, WeightRange: 8 (16 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 9 (20 levels)
DualPlane: 0, Subsets: 2, WeightRange: 8 (16 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 9 (20 levels)
DualPlane: 0, Subsets: 2, WeightRange: 8 (16 levels), CEM: 8 (RGB Direct   ), EndpointRange: 4 (6 levels)
DualPlane: 0, Subsets: 2, WeightRange: 8 (16 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 4 (6 levels)
DualPlane: 0, Subsets: 2, WeightRange: 8 (16 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 4 (6 levels)
DualPlane: 0, Subsets: 2, WeightRange: 9 (20 levels), CEM: 0 (L Direct     ), EndpointRange: 17 (128 levels)
DualPlane: 0, Subsets: 2, WeightRange: 9 (20 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 17 (128 levels)
DualPlane: 0, Subsets: 2, WeightRange: 9 (20 levels), CEM: 4 (LA Direct    ), EndpointRange: 7 (12 levels)
DualPlane: 0, Subsets: 2, WeightRange: 9 (20 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 7 (12 levels)
DualPlane: 0, Subsets: 2, WeightRange: 9 (20 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 7 (12 levels)
DualPlane: 0, Subsets: 2, WeightRange: 10 (24 levels), CEM: 0 (L Direct     ), EndpointRange: 14 (64 levels)
DualPlane: 0, Subsets: 2, WeightRange: 10 (24 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 14 (64 levels)
DualPlane: 0, Subsets: 2, WeightRange: 10 (24 levels), CEM: 4 (LA Direct    ), EndpointRange: 5 (8 levels)
DualPlane: 0, Subsets: 2, WeightRange: 10 (24 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 5 (8 levels)
DualPlane: 0, Subsets: 2, WeightRange: 10 (24 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 5 (8 levels)
DualPlane: 0, Subsets: 2, WeightRange: 11 (32 levels), CEM: 0 (L Direct     ), EndpointRange: 10 (24 levels)
DualPlane: 0, Subsets: 2, WeightRange: 11 (32 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 10 (24 levels)
DualPlane: 0, Subsets: 3, WeightRange: 1 (3 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 3, WeightRange: 1 (3 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 3, WeightRange: 1 (3 levels), CEM: 4 (LA Direct    ), EndpointRange: 14 (64 levels)
DualPlane: 0, Subsets: 3, WeightRange: 1 (3 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 14 (64 levels)
DualPlane: 0, Subsets: 3, WeightRange: 1 (3 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 14 (64 levels)
DualPlane: 0, Subsets: 3, WeightRange: 1 (3 levels), CEM: 8 (RGB Direct   ), EndpointRange: 8 (16 levels)
DualPlane: 0, Subsets: 3, WeightRange: 1 (3 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 8 (16 levels)
DualPlane: 0, Subsets: 3, WeightRange: 1 (3 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 8 (16 levels)
DualPlane: 0, Subsets: 3, WeightRange: 2 (4 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 3, WeightRange: 2 (4 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 3, WeightRange: 2 (4 levels), CEM: 4 (LA Direct    ), EndpointRange: 12 (40 levels)
DualPlane: 0, Subsets: 3, WeightRange: 2 (4 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 12 (40 levels)
DualPlane: 0, Subsets: 3, WeightRange: 2 (4 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 12 (40 levels)
DualPlane: 0, Subsets: 3, WeightRange: 2 (4 levels), CEM: 8 (RGB Direct   ), EndpointRange: 7 (12 levels)
DualPlane: 0, Subsets: 3, WeightRange: 2 (4 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 7 (12 levels)
DualPlane: 0, Subsets: 3, WeightRange: 2 (4 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 7 (12 levels)
DualPlane: 0, Subsets: 3, WeightRange: 3 (5 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 3, WeightRange: 3 (5 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 3, WeightRange: 3 (5 levels), CEM: 4 (LA Direct    ), EndpointRange: 11 (32 levels)
DualPlane: 0, Subsets: 3, WeightRange: 3 (5 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 11 (32 levels)
DualPlane: 0, Subsets: 3, WeightRange: 3 (5 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 11 (32 levels)
DualPlane: 0, Subsets: 3, WeightRange: 3 (5 levels), CEM: 8 (RGB Direct   ), EndpointRange: 6 (10 levels)
DualPlane: 0, Subsets: 3, WeightRange: 3 (5 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 6 (10 levels)
DualPlane: 0, Subsets: 3, WeightRange: 3 (5 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 6 (10 levels)
DualPlane: 0, Subsets: 3, WeightRange: 4 (6 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 3, WeightRange: 4 (6 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 3, WeightRange: 4 (6 levels), CEM: 4 (LA Direct    ), EndpointRange: 10 (24 levels)
DualPlane: 0, Subsets: 3, WeightRange: 4 (6 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 10 (24 levels)
DualPlane: 0, Subsets: 3, WeightRange: 4 (6 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 10 (24 levels)
DualPlane: 0, Subsets: 3, WeightRange: 4 (6 levels), CEM: 8 (RGB Direct   ), EndpointRange: 5 (8 levels)
DualPlane: 0, Subsets: 3, WeightRange: 4 (6 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 5 (8 levels)
DualPlane: 0, Subsets: 3, WeightRange: 4 (6 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 5 (8 levels)
DualPlane: 0, Subsets: 3, WeightRange: 5 (8 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 3, WeightRange: 5 (8 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 3, WeightRange: 5 (8 levels), CEM: 4 (LA Direct    ), EndpointRange: 8 (16 levels)
DualPlane: 0, Subsets: 3, WeightRange: 5 (8 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 8 (16 levels)
DualPlane: 0, Subsets: 3, WeightRange: 5 (8 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 8 (16 levels)
DualPlane: 0, Subsets: 3, WeightRange: 5 (8 levels), CEM: 8 (RGB Direct   ), EndpointRange: 4 (6 levels)
DualPlane: 0, Subsets: 3, WeightRange: 5 (8 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 4 (6 levels)
DualPlane: 0, Subsets: 3, WeightRange: 5 (8 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 4 (6 levels)
DualPlane: 0, Subsets: 3, WeightRange: 6 (10 levels), CEM: 0 (L Direct     ), EndpointRange: 18 (160 levels)
DualPlane: 0, Subsets: 3, WeightRange: 6 (10 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 18 (160 levels)
DualPlane: 0, Subsets: 3, WeightRange: 6 (10 levels), CEM: 4 (LA Direct    ), EndpointRange: 7 (12 levels)
DualPlane: 0, Subsets: 3, WeightRange: 6 (10 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 7 (12 levels)
DualPlane: 0, Subsets: 3, WeightRange: 6 (10 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 7 (12 levels)
DualPlane: 0, Subsets: 3, WeightRange: 7 (12 levels), CEM: 0 (L Direct     ), EndpointRange: 16 (96 levels)
DualPlane: 0, Subsets: 3, WeightRange: 7 (12 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 16 (96 levels)
DualPlane: 0, Subsets: 3, WeightRange: 7 (12 levels), CEM: 4 (LA Direct    ), EndpointRange: 6 (10 levels)
DualPlane: 0, Subsets: 3, WeightRange: 7 (12 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 6 (10 levels)
DualPlane: 0, Subsets: 3, WeightRange: 7 (12 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 6 (10 levels)
DualPlane: 0, Subsets: 3, WeightRange: 8 (16 levels), CEM: 0 (L Direct     ), EndpointRange: 13 (48 levels)
DualPlane: 0, Subsets: 3, WeightRange: 8 (16 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 13 (48 levels)
DualPlane: 0, Subsets: 3, WeightRange: 8 (16 levels), CEM: 4 (LA Direct    ), EndpointRange: 4 (6 levels)
DualPlane: 0, Subsets: 3, WeightRange: 8 (16 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 4 (6 levels)
DualPlane: 0, Subsets: 3, WeightRange: 8 (16 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 4 (6 levels)
DualPlane: 0, Subsets: 3, WeightRange: 9 (20 levels), CEM: 0 (L Direct     ), EndpointRange: 10 (24 levels)
DualPlane: 0, Subsets: 3, WeightRange: 9 (20 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 10 (24 levels)
DualPlane: 0, Subsets: 3, WeightRange: 10 (24 levels), CEM: 0 (L Direct     ), EndpointRange: 8 (16 levels)
DualPlane: 0, Subsets: 3, WeightRange: 10 (24 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 8 (16 levels)
DualPlane: 0, Subsets: 3, WeightRange: 11 (32 levels), CEM: 0 (L Direct     ), EndpointRange: 5 (8 levels)
DualPlane: 0, Subsets: 3, WeightRange: 11 (32 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 5 (8 levels)
DualPlane: 0, Subsets: 4, WeightRange: 1 (3 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 4, WeightRange: 1 (3 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 4, WeightRange: 1 (3 levels), CEM: 4 (LA Direct    ), EndpointRange: 9 (20 levels)
DualPlane: 0, Subsets: 4, WeightRange: 1 (3 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 9 (20 levels)
DualPlane: 0, Subsets: 4, WeightRange: 1 (3 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 9 (20 levels)
DualPlane: 0, Subsets: 4, WeightRange: 2 (4 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 4, WeightRange: 2 (4 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 0, Subsets: 4, WeightRange: 2 (4 levels), CEM: 4 (LA Direct    ), EndpointRange: 8 (16 levels)
DualPlane: 0, Subsets: 4, WeightRange: 2 (4 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 8 (16 levels)
DualPlane: 0, Subsets: 4, WeightRange: 2 (4 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 8 (16 levels)
DualPlane: 0, Subsets: 4, WeightRange: 3 (5 levels), CEM: 0 (L Direct     ), EndpointRange: 19 (192 levels)
DualPlane: 0, Subsets: 4, WeightRange: 3 (5 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 19 (192 levels)
DualPlane: 0, Subsets: 4, WeightRange: 3 (5 levels), CEM: 4 (LA Direct    ), EndpointRange: 7 (12 levels)
DualPlane: 0, Subsets: 4, WeightRange: 3 (5 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 7 (12 levels)
DualPlane: 0, Subsets: 4, WeightRange: 3 (5 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 7 (12 levels)
DualPlane: 0, Subsets: 4, WeightRange: 4 (6 levels), CEM: 0 (L Direct     ), EndpointRange: 17 (128 levels)
DualPlane: 0, Subsets: 4, WeightRange: 4 (6 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 17 (128 levels)
DualPlane: 0, Subsets: 4, WeightRange: 4 (6 levels), CEM: 4 (LA Direct    ), EndpointRange: 6 (10 levels)
DualPlane: 0, Subsets: 4, WeightRange: 4 (6 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 6 (10 levels)
DualPlane: 0, Subsets: 4, WeightRange: 4 (6 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 6 (10 levels)
DualPlane: 0, Subsets: 4, WeightRange: 5 (8 levels), CEM: 0 (L Direct     ), EndpointRange: 15 (80 levels)
DualPlane: 0, Subsets: 4, WeightRange: 5 (8 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 15 (80 levels)
DualPlane: 0, Subsets: 4, WeightRange: 5 (8 levels), CEM: 4 (LA Direct    ), EndpointRange: 5 (8 levels)
DualPlane: 0, Subsets: 4, WeightRange: 5 (8 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 5 (8 levels)
DualPlane: 0, Subsets: 4, WeightRange: 5 (8 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 5 (8 levels)
DualPlane: 0, Subsets: 4, WeightRange: 6 (10 levels), CEM: 0 (L Direct     ), EndpointRange: 13 (48 levels)
DualPlane: 0, Subsets: 4, WeightRange: 6 (10 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 13 (48 levels)
DualPlane: 0, Subsets: 4, WeightRange: 6 (10 levels), CEM: 4 (LA Direct    ), EndpointRange: 4 (6 levels)
DualPlane: 0, Subsets: 4, WeightRange: 6 (10 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 4 (6 levels)
DualPlane: 0, Subsets: 4, WeightRange: 6 (10 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 4 (6 levels)
DualPlane: 0, Subsets: 4, WeightRange: 7 (12 levels), CEM: 0 (L Direct     ), EndpointRange: 11 (32 levels)
DualPlane: 0, Subsets: 4, WeightRange: 7 (12 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 11 (32 levels)
DualPlane: 0, Subsets: 4, WeightRange: 8 (16 levels), CEM: 0 (L Direct     ), EndpointRange: 9 (20 levels)
DualPlane: 0, Subsets: 4, WeightRange: 8 (16 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 9 (20 levels)
DualPlane: 0, Subsets: 4, WeightRange: 9 (20 levels), CEM: 0 (L Direct     ), EndpointRange: 7 (12 levels)
DualPlane: 0, Subsets: 4, WeightRange: 9 (20 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 7 (12 levels)
DualPlane: 0, Subsets: 4, WeightRange: 10 (24 levels), CEM: 0 (L Direct     ), EndpointRange: 5 (8 levels)
DualPlane: 0, Subsets: 4, WeightRange: 10 (24 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 5 (8 levels)
DualPlane: 1, Subsets: 1, WeightRange: 0 (2 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 0 (2 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 0 (2 levels), CEM: 4 (LA Direct    ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 0 (2 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 0 (2 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 0 (2 levels), CEM: 8 (RGB Direct   ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 0 (2 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 0 (2 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 0 (2 levels), CEM: 12 (RGBA Direct  ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 0 (2 levels), CEM: 13 (RGBA Base+Ofs), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 1 (3 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 1 (3 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 1 (3 levels), CEM: 4 (LA Direct    ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 1 (3 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 1 (3 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 1 (3 levels), CEM: 8 (RGB Direct   ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 1 (3 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 1 (3 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 1 (3 levels), CEM: 12 (RGBA Direct  ), EndpointRange: 17 (128 levels)
DualPlane: 1, Subsets: 1, WeightRange: 1 (3 levels), CEM: 13 (RGBA Base+Ofs), EndpointRange: 17 (128 levels)
DualPlane: 1, Subsets: 1, WeightRange: 2 (4 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 2 (4 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 2 (4 levels), CEM: 4 (LA Direct    ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 2 (4 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 2 (4 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 2 (4 levels), CEM: 8 (RGB Direct   ), EndpointRange: 18 (160 levels)
DualPlane: 1, Subsets: 1, WeightRange: 2 (4 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 18 (160 levels)
DualPlane: 1, Subsets: 1, WeightRange: 2 (4 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 18 (160 levels)
DualPlane: 1, Subsets: 1, WeightRange: 2 (4 levels), CEM: 12 (RGBA Direct  ), EndpointRange: 13 (48 levels)
DualPlane: 1, Subsets: 1, WeightRange: 2 (4 levels), CEM: 13 (RGBA Base+Ofs), EndpointRange: 13 (48 levels)
DualPlane: 1, Subsets: 1, WeightRange: 3 (5 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 3 (5 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 3 (5 levels), CEM: 4 (LA Direct    ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 3 (5 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 3 (5 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 3 (5 levels), CEM: 8 (RGB Direct   ), EndpointRange: 13 (48 levels)
DualPlane: 1, Subsets: 1, WeightRange: 3 (5 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 13 (48 levels)
DualPlane: 1, Subsets: 1, WeightRange: 3 (5 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 13 (48 levels)
DualPlane: 1, Subsets: 1, WeightRange: 3 (5 levels), CEM: 12 (RGBA Direct  ), EndpointRange: 8 (16 levels)
DualPlane: 1, Subsets: 1, WeightRange: 3 (5 levels), CEM: 13 (RGBA Base+Ofs), EndpointRange: 8 (16 levels)
DualPlane: 1, Subsets: 1, WeightRange: 4 (6 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 4 (6 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 1, WeightRange: 4 (6 levels), CEM: 4 (LA Direct    ), EndpointRange: 14 (64 levels)
DualPlane: 1, Subsets: 1, WeightRange: 4 (6 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 14 (64 levels)
DualPlane: 1, Subsets: 1, WeightRange: 4 (6 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 14 (64 levels)
DualPlane: 1, Subsets: 1, WeightRange: 4 (6 levels), CEM: 8 (RGB Direct   ), EndpointRange: 8 (16 levels)
DualPlane: 1, Subsets: 1, WeightRange: 4 (6 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 8 (16 levels)
DualPlane: 1, Subsets: 1, WeightRange: 4 (6 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 8 (16 levels)
DualPlane: 1, Subsets: 1, WeightRange: 4 (6 levels), CEM: 12 (RGBA Direct  ), EndpointRange: 5 (8 levels)
DualPlane: 1, Subsets: 1, WeightRange: 4 (6 levels), CEM: 13 (RGBA Base+Ofs), EndpointRange: 5 (8 levels)
DualPlane: 1, Subsets: 1, WeightRange: 5 (8 levels), CEM: 0 (L Direct     ), EndpointRange: 15 (80 levels)
DualPlane: 1, Subsets: 1, WeightRange: 5 (8 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 15 (80 levels)
DualPlane: 1, Subsets: 1, WeightRange: 5 (8 levels), CEM: 4 (LA Direct    ), EndpointRange: 5 (8 levels)
DualPlane: 1, Subsets: 1, WeightRange: 5 (8 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 5 (8 levels)
DualPlane: 1, Subsets: 1, WeightRange: 5 (8 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 5 (8 levels)
DualPlane: 1, Subsets: 2, WeightRange: 0 (2 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 2, WeightRange: 0 (2 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 2, WeightRange: 0 (2 levels), CEM: 4 (LA Direct    ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 2, WeightRange: 0 (2 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 2, WeightRange: 0 (2 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 2, WeightRange: 0 (2 levels), CEM: 8 (RGB Direct   ), EndpointRange: 12 (40 levels)
DualPlane: 1, Subsets: 2, WeightRange: 0 (2 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 12 (40 levels)
DualPlane: 1, Subsets: 2, WeightRange: 0 (2 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 12 (40 levels)
DualPlane: 1, Subsets: 2, WeightRange: 0 (2 levels), CEM: 12 (RGBA Direct  ), EndpointRange: 8 (16 levels)
DualPlane: 1, Subsets: 2, WeightRange: 0 (2 levels), CEM: 13 (RGBA Base+Ofs), EndpointRange: 8 (16 levels)
DualPlane: 1, Subsets: 2, WeightRange: 1 (3 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 2, WeightRange: 1 (3 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 2, WeightRange: 1 (3 levels), CEM: 4 (LA Direct    ), EndpointRange: 13 (48 levels)
DualPlane: 1, Subsets: 2, WeightRange: 1 (3 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 13 (48 levels)
DualPlane: 1, Subsets: 2, WeightRange: 1 (3 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 13 (48 levels)
DualPlane: 1, Subsets: 2, WeightRange: 1 (3 levels), CEM: 8 (RGB Direct   ), EndpointRange: 7 (12 levels)
DualPlane: 1, Subsets: 2, WeightRange: 1 (3 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 7 (12 levels)
DualPlane: 1, Subsets: 2, WeightRange: 1 (3 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 7 (12 levels)
DualPlane: 1, Subsets: 2, WeightRange: 1 (3 levels), CEM: 12 (RGBA Direct  ), EndpointRange: 4 (6 levels)
DualPlane: 1, Subsets: 2, WeightRange: 1 (3 levels), CEM: 13 (RGBA Base+Ofs), EndpointRange: 4 (6 levels)
DualPlane: 1, Subsets: 2, WeightRange: 2 (4 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 2, WeightRange: 2 (4 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 2, WeightRange: 2 (4 levels), CEM: 4 (LA Direct    ), EndpointRange: 8 (16 levels)
DualPlane: 1, Subsets: 2, WeightRange: 2 (4 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 8 (16 levels)
DualPlane: 1, Subsets: 2, WeightRange: 2 (4 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 8 (16 levels)
DualPlane: 1, Subsets: 2, WeightRange: 2 (4 levels), CEM: 8 (RGB Direct   ), EndpointRange: 4 (6 levels)
DualPlane: 1, Subsets: 2, WeightRange: 2 (4 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 4 (6 levels)
DualPlane: 1, Subsets: 2, WeightRange: 2 (4 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 4 (6 levels)
DualPlane: 1, Subsets: 2, WeightRange: 3 (5 levels), CEM: 0 (L Direct     ), EndpointRange: 12 (40 levels)
DualPlane: 1, Subsets: 2, WeightRange: 3 (5 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 12 (40 levels)
DualPlane: 1, Subsets: 2, WeightRange: 3 (5 levels), CEM: 4 (LA Direct    ), EndpointRange: 4 (6 levels)
DualPlane: 1, Subsets: 2, WeightRange: 3 (5 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 4 (6 levels)
DualPlane: 1, Subsets: 2, WeightRange: 3 (5 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 4 (6 levels)
DualPlane: 1, Subsets: 2, WeightRange: 4 (6 levels), CEM: 0 (L Direct     ), EndpointRange: 5 (8 levels)
DualPlane: 1, Subsets: 2, WeightRange: 4 (6 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 5 (8 levels)
DualPlane: 1, Subsets: 3, WeightRange: 0 (2 levels), CEM: 0 (L Direct     ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 3, WeightRange: 0 (2 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 20 (256 levels)
DualPlane: 1, Subsets: 3, WeightRange: 0 (2 levels), CEM: 4 (LA Direct    ), EndpointRange: 12 (40 levels)
DualPlane: 1, Subsets: 3, WeightRange: 0 (2 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 12 (40 levels)
DualPlane: 1, Subsets: 3, WeightRange: 0 (2 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 12 (40 levels)
DualPlane: 1, Subsets: 3, WeightRange: 0 (2 levels), CEM: 8 (RGB Direct   ), EndpointRange: 7 (12 levels)
DualPlane: 1, Subsets: 3, WeightRange: 0 (2 levels), CEM: 9 (RGB Base+Ofs ), EndpointRange: 7 (12 levels)
DualPlane: 1, Subsets: 3, WeightRange: 0 (2 levels), CEM: 10 (RGB Base+Sc2A), EndpointRange: 7 (12 levels)
DualPlane: 1, Subsets: 3, WeightRange: 1 (3 levels), CEM: 0 (L Direct     ), EndpointRange: 18 (160 levels)
DualPlane: 1, Subsets: 3, WeightRange: 1 (3 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 18 (160 levels)
DualPlane: 1, Subsets: 3, WeightRange: 1 (3 levels), CEM: 4 (LA Direct    ), EndpointRange: 7 (12 levels)
DualPlane: 1, Subsets: 3, WeightRange: 1 (3 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 7 (12 levels)
DualPlane: 1, Subsets: 3, WeightRange: 1 (3 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 7 (12 levels)
DualPlane: 1, Subsets: 3, WeightRange: 2 (4 levels), CEM: 0 (L Direct     ), EndpointRange: 12 (40 levels)
DualPlane: 1, Subsets: 3, WeightRange: 2 (4 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 12 (40 levels)
DualPlane: 1, Subsets: 3, WeightRange: 2 (4 levels), CEM: 4 (LA Direct    ), EndpointRange: 4 (6 levels)
DualPlane: 1, Subsets: 3, WeightRange: 2 (4 levels), CEM: 5 (LA Base+Ofs  ), EndpointRange: 4 (6 levels)
DualPlane: 1, Subsets: 3, WeightRange: 2 (4 levels), CEM: 6 (RGB Base+Sc  ), EndpointRange: 4 (6 levels)
DualPlane: 1, Subsets: 3, WeightRange: 3 (5 levels), CEM: 0 (L Direct     ), EndpointRange: 7 (12 levels)
DualPlane: 1, Subsets: 3, WeightRange: 3 (5 levels), CEM: 1 (L Base+Ofs   ), EndpointRange: 7 (12 levels)

UASTC benchmark

RGB PSNR (dB) over a 1,048,576 4x4 block compression torture test (random blocks from 81 test textures):

    Near-opt BC7 (BC7E slower):   41.743
    astcenc_thorough:             40.892
    UASTC (veryslow)->ASTC:       40.373
    UASTC (veryslow)->BC7:        39.965
    UASTC (slower)->ASTC:         40.163
    UASTC (slower)->BC7:          39.782
    UASTC (default)->ASTC:        39.372
    UASTC (default)->BC7:         39.171
    UASTC (faster)->ASTC:         39.269
    UASTC (fastest)->ASTC:        34.654
    UASTC (fastest)->BC7:         34.554
    ispc_texcomp ASTC alpha_slow: 39.768
    stb_dxt BC1 HIGHQUAL:         32.479
    UASTC (slower)->BC1:          32.148
    UASTC (fastest)->BC1:         32.256
    UASTC (slower)->ETC1:         30.956
    UASTC (fastest)->ETC1:        30.113
    UASTC (slower)->R11:          37.942

The 4096x4096 .PNG is here.

The EAC R11 result is red-channel-only PSNR, and is included for comparison purposes.

Notice that the UASTC->BC1 quality actually increased when going from "slower" to "fastest" mode. This is because in "fastest" mode, almost all the blocks used UASTC mode 0, which is more compatible with BC1. (UASTC has 1-2 bits of BC1 hints per block that allow the UASTC block to be converted directly to BC1 blocks, skipping real-time encoding.)


BC1/BC7/ASTC encoding notes

There are probably only a dozen or so developers interested in this level of detail about high quality texture encoding, but here you go. I've been spending a lot of time exploring fast BC1 encoding again, which triggered this blog post. Here's where I'm currently at:


ispc_texcomp and libsquish both use SIMD instructions, while the others are scalar.

This is BC1's "Pareto Frontier", which is a key concept used in lossless compression benchmarking. (There are 2 BC1 codecs missing: the ones in NVidia Texture Tools and AMD Compressonator. I don't think either would change this graph in a fundamental way, but I'll add them.) This applies to BC7/UASTC/ASTC (RGB/RGBA Direct CEMs) too, because the same core algorithms are used on each subset. Any basic improvements made here will benefit all similar endpoint-centric texture formats, so this frontier matters to us a great deal. (Note that GPU encoders may be much faster in an absolute sense, but they all boil down to the same basic algorithms, which is what we're really interested in here.)

For BC1 encoding (and this applies to BC7/ASTC and GPU encoders too), here are the key things I've learned that you should do to best balance quality vs. performance:

1. Do PCA to find the block's principal axis, then find the 2 colors in the block that are furthest apart along this axis. Use these colors as the initial endpoints.

Note that it's slightly better (and faster/simpler) to use the 2 actual colors from the block as the initial endpoints, not their projections onto the axis. (This is the "stb_dxt" approach.)

Interestingly, the PCA step can be approximated. This all-integer approximation is surprisingly effective, except on crazy outlier images like frymire, where it still performs admirably (especially with 2 least squares passes). You can also specialize the common grayscale case (see the same "alternate" encoder function).
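Here's a minimal sketch of step 1 in C++, assuming 8-bit RGB block colors. It estimates the principal axis with a covariance matrix and a few power iterations (the covariance-based approach mentioned under step 3, not the all-integer approximation linked above), then returns the indices of the actual block colors at the two extremes. The `Color` struct and `find_initial_endpoints` function are hypothetical names used for illustration.

```cpp
#include <cmath>
#include <utility>

struct Color { int r, g, b; };

// Returns the indices of the two block colors with the smallest and largest
// projections onto the block's principal axis (covariance + power iteration).
std::pair<int, int> find_initial_endpoints(const Color* px, int n)
{
    double mean[3] = {0.0, 0.0, 0.0};
    for (int i = 0; i < n; i++)
    {
        mean[0] += px[i].r;
        mean[1] += px[i].g;
        mean[2] += px[i].b;
    }
    for (int k = 0; k < 3; k++)
        mean[k] /= n;

    // 3x3 covariance matrix of the block colors
    double cov[3][3] = {};
    for (int i = 0; i < n; i++)
    {
        const double d[3] = {px[i].r - mean[0], px[i].g - mean[1], px[i].b - mean[2]};
        for (int a = 0; a < 3; a++)
            for (int b = 0; b < 3; b++)
                cov[a][b] += d[a] * d[b];
    }

    // Seed with the covariance column of the channel with the largest variance
    int seed = 0;
    for (int k = 1; k < 3; k++)
        if (cov[k][k] > cov[seed][seed])
            seed = k;
    if (cov[seed][seed] == 0.0)
        return {0, 0};  // solid-color block: any color works for both endpoints

    double axis[3] = {cov[0][seed], cov[1][seed], cov[2][seed]};
    for (int iter = 0; iter < 4; iter++)
    {
        double v[3];
        for (int a = 0; a < 3; a++)
            v[a] = cov[a][0] * axis[0] + cov[a][1] * axis[1] + cov[a][2] * axis[2];
        const double len = std::sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
        if (len == 0.0)
            break;
        for (int a = 0; a < 3; a++)
            axis[a] = v[a] / len;
    }

    // Pick the actual block colors (not their projections) at both extremes
    int min_i = 0, max_i = 0;
    double min_d = 0.0, max_d = 0.0;
    for (int i = 0; i < n; i++)
    {
        const double d = (px[i].r - mean[0]) * axis[0] + (px[i].g - mean[1]) * axis[1] +
                         (px[i].b - mean[2]) * axis[2];
        if (i == 0 || d < min_d) { min_d = d; min_i = i; }
        if (i == 0 || d > max_d) { max_d = d; max_i = i; }
    }
    return {min_i, max_i};
}
```

Seeding the power iteration with the covariance column of the highest-variance channel avoids a failure mode of seeding with the bounding box diagonal, which can be orthogonal to the true axis when two channels are anti-correlated.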

2. Compute initial selectors using these endpoints.

There are numerous approaches, but the one I like best computes a trial selector index using a scaled dot product that results in a selector index from [0,N-1], clamps this to [1,N-1], then computes the errors of using colors[trial_s-1] and colors[trial_s] and chooses the best. This avoids having to check every block color (a big win for 16/32 color BC7/UASTC blocks).

This is like stb_dxt's method, except that we compute the actual colorspace error for the two trial colors nearest the projection.
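A sketch of the two-candidate selector search from step 2, assuming the palette is passed in linear order from the low endpoint to the high endpoint. `find_selector` and `Color` are hypothetical names, and the exact scaling and rounding in the author's encoders may differ: here the projection is mapped to [0, N-1], floored, and offset by one, so that `colors[s-1]` and `colors[s]` bracket it.

```cpp
#include <cmath>

struct Color { int r, g, b; };

static int color_dist2(const Color& a, const Color& b)
{
    const int dr = a.r - b.r, dg = a.g - b.g, db = a.b - b.b;
    return dr * dr + dg * dg + db * db;
}

// Returns the linear-order palette index (n >= 2 entries) of the better of the two
// entries bracketing the pixel's projection onto the endpoint axis. Only two squared
// errors are computed, instead of one per palette entry.
int find_selector(const Color& px, const Color* colors, int n)
{
    const Color& lo = colors[0];
    const Color& hi = colors[n - 1];
    const int ar = hi.r - lo.r, ag = hi.g - lo.g, ab = hi.b - lo.b;
    const int len2 = ar * ar + ag * ag + ab * ab;

    int s = 1;
    if (len2 > 0)
    {
        // Scaled dot product: t is the pixel's projection mapped to [0, n-1]
        const int dot = (px.r - lo.r) * ar + (px.g - lo.g) * ag + (px.b - lo.b) * ab;
        const float t = (float)dot * (float)(n - 1) / (float)len2;
        s = (int)std::floor(t) + 1;
    }

    // Clamp to [1, n-1] so that colors[s - 1] and colors[s] are both valid
    if (s < 1) s = 1;
    if (s > n - 1) s = n - 1;

    return (color_dist2(px, colors[s - 1]) <= color_dist2(px, colors[s])) ? s - 1 : s;
}
```

For BC1 in 4-color mode, the linear order [0, 1, 2, 3] corresponds to the raw 2-bit selector values 0, 2, 3, 1.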

3. Compute new endpoints with least squares (LS) using these selectors. If LS fails (for example, because every pixel uses the same selector), try the block's average color using optimal solid-color tables instead. Then find the optimal selectors for the new endpoints.

There are many LS methods; see this code for two different approaches. One uses an incremental PCA approach that works well in 4D; the other computes the covariance matrix, then uses 3-8 power iterations.
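Step 3's least squares solve can be sketched as follows. It assumes selectors are in linear order, so pixel i is modeled as `lo * (1 - w) + hi * w` with `w = sel[i] / (n_levels - 1)`, and it solves the resulting 2x2 normal equations per channel. The function name and signature are hypothetical. A `false` return corresponds to the "LS fails" case, where the average-color fallback would be used.

```cpp
#include <cmath>

// Least squares fit of two endpoints for fixed selectors. All three channels share the
// same 2x2 normal-equation matrix, so it is accumulated and inverted only once.
// Returns false when that matrix is singular (e.g. every pixel uses the same selector).
bool least_squares_endpoints(const float (*px)[3], const int* sel, int count, int n_levels,
                             float lo[3], float hi[3])
{
    double uu = 0.0, uw = 0.0, ww = 0.0;
    double ux[3] = {0.0, 0.0, 0.0}, wx[3] = {0.0, 0.0, 0.0};
    for (int i = 0; i < count; i++)
    {
        const double w = (double)sel[i] / (double)(n_levels - 1);  // weight of hi
        const double u = 1.0 - w;                                   // weight of lo
        uu += u * u;
        uw += u * w;
        ww += w * w;
        for (int k = 0; k < 3; k++)
        {
            ux[k] += u * px[i][k];
            wx[k] += w * px[i][k];
        }
    }

    const double det = uu * ww - uw * uw;
    if (std::fabs(det) < 1e-8)
        return false;

    // Apply the inverse of [[uu, uw], [uw, ww]] to [ux, wx] for each channel
    for (int k = 0; k < 3; k++)
    {
        lo[k] = (float)((ww * ux[k] - uw * wx[k]) / det);
        hi[k] = (float)((uu * wx[k] - uw * ux[k]) / det);
    }
    return true;
}
```

The returned endpoints are still floating point and must be quantized afterwards.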

4. Optional: Try LS one more time (that's what STB_DXT_HIGHQUAL does). Good win.

Importantly, try LS a second time even if the first time failed and you chose endpoints from your optimal solid color tables. This is a small win.

5. For higher quality, carefully vary the selectors and try least squares:
- Try encoding the block's average color using optimal single-color tables. (This is a surprising win for formats with low bit endpoint components.)
- Try incrementing all the minimum selectors+LS.
- Try decrementing all the maximum selectors+LS.
- Try both incrementing the min and decrementing the max selectors+LS.
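The selector variations in step 5 can be generated with a small helper like this one, assuming "minimum selectors" means every selector holding the smallest value used in the block (and likewise for the maximum). `perturb_selectors` is a hypothetical name. Each non-empty result would then go through the least squares step, keeping the trial only if it lowers the block's error.

```cpp
#include <algorithm>
#include <vector>

// Returns a copy of 'sels' (selectors in linear order) where every selector equal to the
// block's minimum is incremented (inc_min) and/or every selector equal to the block's
// maximum is decremented (dec_max). Returns an empty vector when the result would have
// fewer than two distinct levels, since least squares would then be singular.
std::vector<int> perturb_selectors(const std::vector<int>& sels, bool inc_min, bool dec_max)
{
    if (sels.empty())
        return {};

    const auto mm = std::minmax_element(sels.begin(), sels.end());
    const int lo = *mm.first, hi = *mm.second;
    const int new_lo = inc_min ? lo + 1 : lo;
    const int new_hi = dec_max ? hi - 1 : hi;
    if (new_lo >= new_hi)
        return {};

    std::vector<int> out(sels);
    for (int& s : out)
    {
        if (inc_min && s == lo)
            s = new_lo;
        else if (dec_max && s == hi)
            s = new_hi;
    }
    return out;
}
```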

These selector manipulations are big wins. Others are possible. One uses a precomputed, table-driven approach of best unique total orderings to try given the current total ordering. (I've been tweeting about this over the past couple of days. It's a big win.) This exploits the property that some BC1 selector total orderings are used much more often than others:

More info here.

The other tries scaling the selectors to better exploit endpoint interpolation (tiny/marginal win).

Notes:
Least squares gives you floating point endpoints, which must be quantized to 5- or 6-bit components for BC1 (and similarly for BC7/ASTC). To do this correctly, use Castano's optimal rounding method: https://gist.github.com/castano/c92c7626f288f9e99e158520b14a61cf

The optimal rounding method applies to all the formats.
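A sketch of optimal rounding for 5-bit components, written in the spirit of Castano's gist rather than copied from it. Instead of rounding `x * 31`, it compares against the midpoints between the actual bit-replicated 8-bit expansions of adjacent 5-bit values. The names are hypothetical; 6-bit components work the same way using `(q << 2) | (q >> 4)` and 63.

```cpp
#include <algorithm>

// Midpoints (in [0,1]) between the 8-bit expansions of consecutive 5-bit values
static float g_midpoints5[32];

static void init_midpoints5()
{
    for (int i = 0; i < 31; i++)
    {
        // 5-bit to 8-bit expansion by bit replication (the usual BC1 expansion)
        const float f0 = (float)((i << 3) | (i >> 2)) / 255.0f;
        const float f1 = (float)(((i + 1) << 3) | ((i + 1) >> 2)) / 255.0f;
        g_midpoints5[i] = (f0 + f1) * 0.5f;
    }
    g_midpoints5[31] = 1.0f;
}

// Returns the 5-bit value whose expanded 8-bit color is closest to x (x in [0,1]).
// floor(x * 31) is either the answer or one below it, so one midpoint test suffices.
static int quantize5_optimal(float x)
{
    x = std::min(std::max(x, 0.0f), 1.0f);
    int q = (int)(x * 31.0f);
    if (x > g_midpoints5[q])
        q++;
    return q;
}
```

For example, for x = 28.6/255 and x = 226.4/255, naive `(int)(x * 31 + 0.5f)` rounding gives 3 and 28, while this version gives 4 and 27, whose expansions (33 and 222) are closer to the original values.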

Also, you need to carefully break ties between selectors that result in the same encoding error: https://twitter.com/richgel999/status/1243894923000254466

Why? Because how you break ties subtly interacts with the following least squares pass. (I called this "improved rounding" for some reason in my post-quarantined state.)

It's possible for both endpoints to quantize into a single colorspace voxel, in which case your encoder unnecessarily loses "freedom". We deal with this in our UASTC/BC7 encoders by manually "pulling" the endpoints apart. It's a tricky problem that needs more attention.
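How the UASTC/BC7 encoders pull endpoints apart isn't described here, so the following is only a naive, hypothetical illustration of the idea: if both quantized endpoints land in the same voxel even though the unquantized endpoints differ, push them one quantization step apart on each channel where they differ. The function name and the "move up, else move down" policy are assumptions for this sketch.

```cpp
// Hypothetical fix-up for endpoints that collapsed into one voxel after quantization.
// lo_f/hi_f are the unquantized endpoints, lo_q/hi_q the quantized ones in [0, max_q].
// Returns true if any component was changed.
bool pull_endpoints_apart(const float lo_f[3], const float hi_f[3],
                          int lo_q[3], int hi_q[3], int max_q)
{
    if (lo_q[0] != hi_q[0] || lo_q[1] != hi_q[1] || lo_q[2] != hi_q[2])
        return false;  // endpoints are already distinct

    bool changed = false;
    for (int k = 0; k < 3; k++)
    {
        if (lo_f[k] == hi_f[k])
            continue;  // no separation to restore on this channel

        // Move the endpoint that should be larger up; at the top of the range,
        // move the other endpoint down instead
        int& up = (hi_f[k] > lo_f[k]) ? hi_q[k] : lo_q[k];
        int& down = (hi_f[k] > lo_f[k]) ? lo_q[k] : hi_q[k];
        if (up < max_q) { up++; changed = true; }
        else if (down > 0) { down--; changed = true; }
    }
    return changed;
}
```

A real encoder would likely evaluate the block error for each candidate adjustment rather than applying a fixed rule like this one.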

Note that if you're doing BC7, you must implement p-bits correctly, or you're totally wasting the format's potential:
https://richg42.blogspot.com/2018/04/proper-pbit-computation-in-bc7-texture.html

Most of the above applies to the basic "RGB/RGBA Direct" modes in UASTC/ASTC too.

stb_dxt's BC1 encoder is, as far as I can tell, Pareto optimal for scalar BC1 encoding (once you add Castano's optimal endpoint rounding, fix its selector tie breaking, and improve the precision of the axis vector used in its selector determination step). If you can improve the quality of scalar stb_dxt BC1 without slowing it down, it's likely to be an important change that will benefit all the endpoint-centric texture formats.

For reference, here's Simon Brown's "DXT Compression Techniques" blog post and link to libsquish:
http://sjbrown.co.uk/2006/01/19/dxt-compression-techniques/
https://github.com/svn2github/libsquish

I've examined all of the available GPU/CPU encoders I can find for endpoint-centric formats, and the above algorithms are the most competitive I know about (highest quality per unit of CPU time). Generally, for every 0.25-0.5 dB you can push a SIMD encoder's quality "up", you can make it correspondingly faster at the same average quality, because quality and performance are interrelated.

More on how BC1 is approximated by actual GPUs (BC3-5 are too):
https://twitter.com/richgel999/status/1244638912401809409
https://twitter.com/richgel999/status/1244657623695339520
http://www.ludicon.com/castano/blog/2009/03/gpu-dxt-decompression/
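For reference, here's a sketch of the idealized BC1 palette that these GPU decoders approximate: 5:6:5 endpoints expanded by bit replication, then 1/3 and 2/3 blends in 4-color mode (color0 > color1), or a midpoint plus black in 3-color mode. It uses integer truncation for the blends; as the links above show, actual hardware rounds differently, so this isn't bit-exact for any particular GPU. The names are hypothetical.

```cpp
#include <cstdint>

struct RGB8 { uint8_t r, g, b; };

// Expands a packed 5:6:5 color to 8 bits per channel by bit replication
static RGB8 expand565(uint16_t c)
{
    const int r = (c >> 11) & 31, g = (c >> 5) & 63, b = c & 31;
    return RGB8{(uint8_t)((r << 3) | (r >> 2)), (uint8_t)((g << 2) | (g >> 4)),
                (uint8_t)((b << 3) | (b >> 2))};
}

// Weighted blend of two 8-bit values with integer truncation
static uint8_t blend(int a, int b, int wa, int wb, int div)
{
    return (uint8_t)((a * wa + b * wb) / div);
}

// Builds the idealized BC1 palette, indexed by the raw 2-bit selector value
void bc1_palette(uint16_t c0, uint16_t c1, RGB8 pal[4])
{
    const RGB8 a = expand565(c0), b = expand565(c1);
    pal[0] = a;
    pal[1] = b;
    if (c0 > c1)
    {
        // 4-color mode: selectors 2 and 3 sit 1/3 and 2/3 of the way from c0 to c1
        pal[2] = RGB8{blend(a.r, b.r, 2, 1, 3), blend(a.g, b.g, 2, 1, 3), blend(a.b, b.b, 2, 1, 3)};
        pal[3] = RGB8{blend(a.r, b.r, 1, 2, 3), blend(a.g, b.g, 1, 2, 3), blend(a.b, b.b, 1, 2, 3)};
    }
    else
    {
        // 3-color mode: selector 2 is the midpoint, selector 3 is black
        pal[2] = RGB8{blend(a.r, b.r, 1, 1, 2), blend(a.g, b.g, 1, 1, 2), blend(a.b, b.b, 1, 1, 2)};
        pal[3] = RGB8{0, 0, 0};
    }
}
```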


Lookup table based real-time PVRTC encoding

I've found a table-based method of improving the output from a real-time PVRTC encoder. Fast real-time encoders first find the RGB(A) bounds of each 4x4 block to determine the block endpoints, then they evaluate the interpolated endpoints at each pixel to determine the modulation values which minimize the encoded error. This works okay, but the results are barely acceptable in practice due to banding artifacts on smooth features.

One way to improve the output of this process is to precompute, for every 8-bit component value in [0,255], the best PVRTC low/high endpoints to use to encode that value, assuming the modulation values in the 7x7 pixel region are either all 1 or all 2 (or, more generally, all 0, 1, 2, or 3):

#include <climits>
#include <cstdint>
#include <cstdlib>

// Tables containing the 5-bit/5-bit L/H endpoints to use for each 8-bit value
static uint32_t g_pvrtc_opt55_e1[256];
static uint32_t g_pvrtc_opt55_e2[256];

// Tables containing the 5-bit/4-bit L/H endpoints to use for each 8-bit value
static uint32_t g_pvrtc_opt54_e1[256];
static uint32_t g_pvrtc_opt54_e2[256];

// Maximum allowed distance between the expanded L and H endpoints
const int T = 120;

static void init_pvrtc_opt55_tables()
{
    for (int c = 0; c < 256; c++)
    {
        int best_err1 = INT_MAX;
        uint32_t best_l1 = 0, best_h1 = 0;
        int best_err2 = INT_MAX;
        uint32_t best_l2 = 0, best_h2 = 0;

        for (int l = 0; l < 32; l++)
        {
            // Expand the 5-bit endpoint to 8 bits by bit replication
            const int lv = (l << 3) | (l >> 2);

            for (int h = 0; h < 32; h++)
            {
                const int hv = (h << 3) | (h >> 2);

                if (lv > hv)
                    continue;

                // Avoid endpoints that are too far apart to reduce artifacts
                const int delta = hv - lv;
                if (delta > T)
                    continue;

                // Modulation value 1: 3/8 of the way from L to H
                const int e1 = (lv * 5 + hv * 3) / 8;
                const int diff1 = std::abs(c - e1);
                if (diff1 < best_err1)
                {
                    best_err1 = diff1;
                    best_l1 = l;
                    best_h1 = h;
                }

                // Modulation value 2: 5/8 of the way from L to H
                const int e2 = (lv * 3 + hv * 5) / 8;
                const int diff2 = std::abs(c - e2);
                if (diff2 < best_err2)
                {
                    best_err2 = diff2;
                    best_l2 = l;
                    best_h2 = h;
                }
            }
        }

        // Pack the L endpoint into bits 0-7 and the H endpoint into bits 8-15
        g_pvrtc_opt55_e1[c] = best_l1 | (best_h1 << 8);
        g_pvrtc_opt55_e2[c] = best_l2 | (best_h2 << 8);
    }
}

// The 5-bit/4-bit loop (filling g_pvrtc_opt54_e1/e2) is similar

Now that you have these tables, you can loop through all the 4x4 pixel blocks in the PVRTC texture and compute the 7x7 average RGB color surrounding each block (it's 7x7 pixels because you want the average of all colors influenced by each block's endpoints, accounting for bilinear endpoint interpolation). Look up the optimal endpoints to use for each component, set the block's endpoints to those trial endpoints, find the best modulation values for the impacted 7x7 pixels, and check whether the error is reduced. On smooth blocks it very often is. You can try this process several times for each block using different precomputed tables.

For even more quality, you can also use precomputed tables for modulation values 0 and 3. You can also use two-dimensional [256][256] tables that hold the optimal endpoints to use for two colors, then quantize each 7x7 pixel area to 2 colors (using a few Lloyd algorithm iterations) and try those endpoints too. 2D tables give higher quality on high contrast transitions.

Here's some pseudocode showing how to use the tables for a single modulation value (you can apply this process multiple times for the other tables):

// Compute the average color of all pixels influenced by this block's endpoints:
// the 7x7 region starting one pixel above and to the left of the 4x4 block
vec4F c_avg(0);

for (int y = 0; y < 7; y++)
{
    const uint py = wrap_or_clamp_y(by * 4 + y - 1);
    for (int x = 0; x < 7; x++)
    {
        const uint px = wrap_or_clamp_x(bx * 4 + x - 1);

        const color_quad_u8 &c = orig_img(px, py);

        c_avg[0] += c[0];
        c_avg[1] += c[1];
        c_avg[2] += c[2];
        c_avg[3] += c[3];
    }
}

// Turn the 49 accumulated sums into averages
for (int i = 0; i < 4; i++)
    c_avg[i] /= 49.0f;

// Save the 3x3 block neighborhood surrounding the current block
for (int y = -1; y <= 1; y++)
{
    for (int x = -1; x <= 1; x++)
    {
        const uint block_x = wrap_or_clamp_block_x(bx + x);
        const uint block_y = wrap_or_clamp_block_y(by + y);
        cur_blocks[x + 1][y + 1] = m_blocks(block_x, block_y);
    }
}

// Compute the rounded 8-bit average color
// c_avg is the average color of the 7x7 pixels around the block
c_avg += vec4F(.5f);
color_quad_u8 color_avg((int)c_avg[0], (int)c_avg[1], (int)c_avg[2], (int)c_avg[3]);

// Look up the optimal PVRTC endpoints to use given this average color,
// assuming the modulation values will be all-1
color_quad_u8 l0(0), h0(0);
l0[0] = g_pvrtc_opt55_e1[color_avg[0]] & 0xFF;
h0[0] = g_pvrtc_opt55_e1[color_avg[0]] >> 8;

l0[1] = g_pvrtc_opt55_e1[color_avg[1]] & 0xFF;
h0[1] = g_pvrtc_opt55_e1[color_avg[1]] >> 8;

l0[2] = g_pvrtc_opt54_e1[color_avg[2]] & 0xFF;
h0[2] = g_pvrtc_opt54_e1[color_avg[2]] >> 8;

// Set the block's endpoints and evaluate the error of the 7x7 neighborhood (also choosing new modulation values!)
m_blocks(bx, by).set_opaque_endpoint_raw(0, l0);
m_blocks(bx, by).set_opaque_endpoint_raw(1, h0);

uint64 e1_err = remap_pixels_influenced_by_endpoint(bx, by, orig_img, perceptual, alpha_is_significant);
if (e1_err > current_best_err)
{
    // Error got worse, so restore the blocks
    for (int y = -1; y <= 1; y++)
    {
        for (int x = -1; x <= 1; x++)
        {
            const uint block_x = wrap_or_clamp_block_x(bx + x);
            const uint block_y = wrap_or_clamp_block_y(by + y);

            m_blocks(block_x, block_y) = cur_blocks[x + 1][y + 1];
        }
    }
}
else
{
    // Error didn't get worse: keep the new endpoints and modulation values
    current_best_err = e1_err;
}

Here's an example for kodim03 (cropped to 1k square due to PVRTC limitations). This example uses only 2 precomputed tables, for modulation values 1 and 2 (because it's real-time):

[Image: original]

Before table-based optimization:
RGB Average Error: Max: 86, Mean: 1.156, MSE: 9.024, RMSE: 3.004, PSNR: 38.577
[Image: endpoint and modulation data]

After:
RGB Average Error: Max: 79, Mean: 0.971, MSE: 6.694, RMSE: 2.587, PSNR: 39.874
[Image: endpoint and modulation data]

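The metrics above match the usual 8-bit definitions, RMSE = sqrt(MSE) and PSNR = 10*log10(255^2/MSE), so the table-based pass gains roughly 1.3 dB on this image. A small sketch to reproduce the numbers from the MSE values, assuming those standard formulas are what the tool uses (function names are hypothetical):

```cpp
#include <cassert>
#include <cmath>

// Peak signal-to-noise ratio in dB for 8-bit data, from the mean squared error
double psnr_from_mse(double mse) { return 10.0 * std::log10(255.0 * 255.0 / mse); }

// Root mean squared error
double rmse_from_mse(double mse) { return std::sqrt(mse); }
```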

The 2D table version looks better on high contrast transitions, but needs more memory. Using 4 1D tables followed by a single 2D lookup results in the best quality.

The lookup table example code above assumes the high endpoints will usually be >= the low endpoints. Whatever algorithm you use to create the endpoints in the first pass needs to be compatible with your lookup tables, or you'll lose quality.

You can apply this algorithm in multiple passes for higher quality. 2-3 passes seems sufficient.

For comparison, here's a grayscale ramp encoded using PVRTexTool (best quality), vs. this algorithm using 3 passes:

[Image: original]

[Image: PVRTexTool]

[Image: lookup-based algorithm]

New BC1 benchmark

Optimizing BC1 encoding is still useful and interesting because the same core algorithms are used in BC7 and ASTC/UASTC encoders. Most improvements made to BC1 encoding carry over nicely to the 2-bit and 3-bit selector modes of other formats.

Here's my latest benchmark:



The highest performing samples (above 37 dB) are rgbcx in 3-color block mode, where it can use transparent black colors (selector 3) for opaque black or very dark texels. (The only other BC1 encoder that might support this mode is the one in NVidia Texture Tools, but I'm not sure.) This technically turns opaque textures into textures with a useless alpha channel, but if the engine or shader just ignores alpha then this mode performs exceptionally well in the average case. The flags are cEncodeBC1Use3ColorBlocksForBlackPixels | cEncodeBC1Use3ColorBlocks.

This mode is super useful because it allows the 3-color block encoder to focus the endpoints on the brighter texels within the block, potentially greatly increasing quality. Blocks with very dark or black texels are common in practice.

If your engine supports ignoring the alpha channel in sampled BC1 textures, you should be using an encoder that supports this mode.
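To see why this helps, here's a minimal sketch of selector assignment for a 3-color BC1 block, given two already-decoded endpoint colors. It assumes ideal BC1 decoding (selector 2 is the midpoint, selector 3 is transparent black) and plain squared RGB error. It is not rgbcx's implementation, and the names are hypothetical. With `use_black` enabled, black or very dark texels can map to selector 3, so the endpoints only have to cover the brighter texels.

```cpp
#include <cassert>
#include <cstdint>

struct Rgb { int r, g, b; };

static int dist2(const Rgb& a, const Rgb& b) {
    const int dr = a.r - b.r, dg = a.g - b.g, db = a.b - b.b;
    return dr * dr + dg * dg + db * db;
}

// Palette for an ideal BC1 3-color block: 0 = e0, 1 = e1, 2 = midpoint,
// 3 = transparent black (only considered when use_black is true).
// Writes one selector per texel and returns the total squared RGB error.
int assign_3color_selectors(const Rgb& e0, const Rgb& e1, const Rgb* px, int n,
                            bool use_black, uint8_t* sel) {
    const Rgb pal[4] = {e0, e1,
                        {(e0.r + e1.r) / 2, (e0.g + e1.g) / 2, (e0.b + e1.b) / 2},
                        {0, 0, 0}};
    const int num_entries = use_black ? 4 : 3;
    int total = 0;
    for (int i = 0; i < n; i++) {
        int best = 0, best_err = dist2(px[i], pal[0]);
        for (int s = 1; s < num_entries; s++) {
            const int e = dist2(px[i], pal[s]);
            if (e < best_err) { best_err = e; best = s; }
        }
        sel[i] = (uint8_t)best;
        total += best_err;
    }
    return total;
}
```

In a block mixing near-black texels with light grays, the bright endpoints plus free black cover everything, while without selector 3 the dark texels have to be served by the bright palette entries (or drag the endpoints down).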

Data:

rgbcx.h flags:

- h is cEncodeBC1HighQuality
- ut is cEncodeBC1UseLikelyTotalOrderings
- ub is cEncodeBC1Use3ColorBlocksForBlackPixels
- 3 is cEncodeBC1Use3ColorBlocks

From the benchmarks I've seen, it appears NVidia Texture Tools BC1 is around the same perf. as libsquish at slightly higher quality:


I believe this was rgbcx using 10 total orderings (the default setting). The max is 32, and every additional total ordering increases average quality. So at higher settings rgbcx is likely competitive against nvtt while being faster.

I'm currently working on integrating NVTT into my test app.

CPU BC1 Encoding Pareto Frontier

rgbcx.h now defines the BC1 Pareto Frontier for high quality CPU BC1 encoding (i.e. it's stronger than all other available practical high quality CPU encoders for both performance and quality):


Data:


I didn't include AMD Compressonator's encoder because in previous benchmarks (conducted by others) it was beaten by a weaker version of rgbcx.h for both perf. and quality.

The overall CPU BC1 Pareto frontier is defined by ispc_texcomp (at low quality: ~33.1 dB) and rgbcx for any higher quality level. We're going to need SIMD to compete against ispc_texcomp BC1 (a weak stb_dxt clone), which is my next major goal.

To get rgbcx to compete against icbc for max. quality I had to add prioritized cluster fit support for 3-color blocks (not just 4).

It's possible to permit rgbcx to go to even higher quality levels by enlarging the total ordering tables. They're currently limited to 32 entries per total ordering.

I think rgbcx.h's max quality is slightly higher than icbc's HQ mode because prioritized cluster fit can afford to do optimal rounding and evaluate accurate MSE errors in every trial. Regular cluster fit can't afford to do so because it has to evaluate so many total orderings.
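Regular cluster fit (libsquish-style) sorts the block's texels along an axis and tries every split of that sorted list into contiguous runs, one run per selector, solving for the best endpoints each time. For 16 texels and 4 selectors that is C(19,3) = 969 total orderings (C(18,2) = 153 for 3-color blocks), which is why it can't afford much extra work per trial. The sketch below is a simplification (one channel, unquantized endpoints, hypothetical names) that enumerates those orderings and solves each by least squares:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Best single-channel fit found over all total orderings.
struct FitResult {
    double lo, hi;   // unquantized endpoints
    double err;      // sum of squared errors
    int orderings;   // number of total orderings evaluated
};

// Illustrative 1D cluster fit: the sorted values are split into 4 contiguous runs
// mapped to weights 0, 1/3, 2/3, 1, and for each split the endpoints are solved by
// least squares. Real BC1 encoders work on RGB and must quantize endpoints to 565.
FitResult cluster_fit_1d(std::vector<double> v) {
    std::sort(v.begin(), v.end());
    const int n = (int)v.size();
    const double w[4] = {0.0, 1.0 / 3.0, 2.0 / 3.0, 1.0};
    FitResult best{0.0, 0.0, 1e300, 0};

    // i, j, k are where runs 1, 2 and 3 start (run 0 starts at index 0)
    for (int i = 0; i <= n; i++)
        for (int j = i; j <= n; j++)
            for (int k = j; k <= n; k++) {
                best.orderings++;
                // Normal equations for x_p ~= (1 - t_p) * lo + t_p * hi
                double aa = 0, at = 0, tt = 0, ax = 0, tx = 0;
                for (int p = 0; p < n; p++) {
                    const double t = w[(p >= i) + (p >= j) + (p >= k)];
                    const double a = 1.0 - t;
                    aa += a * a; at += a * t; tt += t * t;
                    ax += a * v[p]; tx += t * v[p];
                }
                const double det = aa * tt - at * at;
                if (det < 1e-12) continue; // every value got the same weight
                const double lo = (ax * tt - tx * at) / det;
                const double hi = (tx * aa - ax * at) / det;

                double err = 0;
                for (int p = 0; p < n; p++) {
                    const double t = w[(p >= i) + (p >= j) + (p >= k)];
                    const double d = (1.0 - t) * lo + t * hi - v[p];
                    err += d * d;
                }
                if (err < best.err) {
                    best.lo = lo; best.hi = hi; best.err = err;
                }
            }
    return best;
}
```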

Links:
rgbcx: https://github.com/richgel999/bc7enc
libsquish: https://github.com/richgel999/libsquish
icbc: https://github.com/castano/icbc/blob/master/icbc.h

AMD GPU BC1 decoding lookup tables

Here are the lookup tables you can use to determine how AMD GPUs decode BC1 textures: https://pastebin.com/raw/LSgn0ent

These tables were gathered straight from a Radeon RX 580 by using a small D3D9 app that rendered a textured BC1 quad with point sampling and did a CPU readback. I used this same D3D9 app on an NVidia 1080 and the pixels I read back exactly matched what the NV BC1 formulas on the web predicted, so I'm confident in the approach.

For selectors 0 and 1, the 5->8 and 6->8 endpoint conversion just uses bit shifts/ORs (same as ideal BC1). For 4-color selector 2, use the tables. For selector 3, use the same tables with the low/high endpoints swapped. (I've verified you can do this.) For 3-color selector 2, use the tables.

To index the tables, use [color0_component*32+color1_component] for the 5-bit components, or *64 for the 6-bit component.
Reference: "Block Compression (Direct3D 10)", Win32 apps documentation, docs.microsoft.com.

Converting the tables to formulas sounds like an interesting puzzle.
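Here's a minimal sketch of the decode rules above. The table layout is an assumption: a struct with separate 4-color and 3-color tables, indexed [c0*32+c1] for 5-bit components and [c0*64+c1] for 6-bit green; the pastebin data would have to be copied into it, and all names are hypothetical. For testing without hardware data, `fill_ideal_tables` fills the tables with the ideal BC1 formulas (2*c0+c1)/3 and (c0+c1)/2, so the sketch then behaves like an ideal BC1 decoder.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: "_4" tables give 4-color selector 2, "_3" tables give 3-color selector 2.
struct Bc1MidTables {
    uint8_t mid5_4[32 * 32], mid6_4[64 * 64];
    uint8_t mid5_3[32 * 32], mid6_3[64 * 64];
};

static inline uint8_t expand5(int v) { return (uint8_t)((v << 3) | (v >> 2)); }
static inline uint8_t expand6(int v) { return (uint8_t)((v << 2) | (v >> 4)); }

// Decodes the 4-entry RGBA palette of one BC1 block.
void decode_bc1_palette_tables(uint16_t color0, uint16_t color1, const Bc1MidTables& t, uint8_t pal[4][4]) {
    const int r0 = color0 >> 11, g0 = (color0 >> 5) & 63, b0 = color0 & 31;
    const int r1 = color1 >> 11, g1 = (color1 >> 5) & 63, b1 = color1 & 31;

    // Selectors 0 and 1: plain bit replication, same as ideal BC1
    const uint8_t p0[4] = {expand5(r0), expand6(g0), expand5(b0), 255};
    const uint8_t p1[4] = {expand5(r1), expand6(g1), expand5(b1), 255};
    for (int i = 0; i < 4; i++) { pal[0][i] = p0[i]; pal[1][i] = p1[i]; }

    if (color0 > color1) {
        // 4-color mode: selector 2 from the tables, selector 3 = same tables, endpoints swapped
        pal[2][0] = t.mid5_4[r0 * 32 + r1]; pal[2][1] = t.mid6_4[g0 * 64 + g1];
        pal[2][2] = t.mid5_4[b0 * 32 + b1]; pal[2][3] = 255;
        pal[3][0] = t.mid5_4[r1 * 32 + r0]; pal[3][1] = t.mid6_4[g1 * 64 + g0];
        pal[3][2] = t.mid5_4[b1 * 32 + b0]; pal[3][3] = 255;
    } else {
        // 3-color mode: selector 2 from the 3-color tables, selector 3 = transparent black
        pal[2][0] = t.mid5_3[r0 * 32 + r1]; pal[2][1] = t.mid6_3[g0 * 64 + g1];
        pal[2][2] = t.mid5_3[b0 * 32 + b1]; pal[2][3] = 255;
        for (int i = 0; i < 4; i++) pal[3][i] = 0;
    }
}

// Test helper only: fills the tables with ideal BC1 values instead of AMD's.
void fill_ideal_tables(Bc1MidTables& t) {
    for (int a = 0; a < 32; a++)
        for (int b = 0; b < 32; b++) {
            t.mid5_4[a * 32 + b] = (uint8_t)((2 * expand5(a) + expand5(b)) / 3);
            t.mid5_3[a * 32 + b] = (uint8_t)((expand5(a) + expand5(b)) / 2);
        }
    for (int a = 0; a < 64; a++)
        for (int b = 0; b < 64; b++) {
            t.mid6_4[a * 64 + b] = (uint8_t)((2 * expand6(a) + expand6(b)) / 3);
            t.mid6_3[a * 64 + b] = (uint8_t)((expand6(a) + expand6(b)) / 2);
        }
}
```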

Example showing exactly how to use the tables to decode AMD BC1:



BC1 encoding initial endpoint determination benchmark

Benchmark of BC1 encoders using different methods to determine the initial endpoints (average PSNR, encode time per block):

stb_dxt.h PCA: 35.754 dB, .551 us/block
rgbcx.h PCA: 35.794 dB, .651 us/block
rgbcx.h PCA+inset: 35.925 dB, .640 us/block
rgbcx.h 2D LS+inset+opt round: 35.920 dB, .541 us/block
rgbcx.h bounds+inset+XY covar: 35.836 dB, .472 us/block

This is across 100 textures, so even small avg. improvements are significant. Amazingly, the inset method (a few lines of code) buys rgbcx.h PCA .131 dB! All encoders should be doing this. You *must* pay attention to every little detail in these texture encoders.

Quality is performance in competitive texture block encoding, so even small boosts in quality allow us to dial down the # of total orderings to check for the same average quality. This leads to a more competitive encoder.

Methods:

- bounds+inset+XY covar is Castano's/van Waveren's method. All encoders should be applying the "inset" method described in their paper, because from a quantization perspective it makes perfect sense.

- 2D LS is Humus's method, ported to mostly integer math, with added inset+optimal rounding to 565.

- stb_dxt.h and rgbcx.h PCA is 3D integer PCA (3x3 covariance+4 power iterations, pick 2 colors along the principal axis).

- PCA+inset+optimal rounding does PCA, picks 2 colors, then lerps between the 2 colors at 1/16 and 15/16, then optimally rounds to 565.
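The PCA+inset pipeline described in this list can be sketched as follows. This is a float sketch, not the integer code in stb_dxt.h or rgbcx.h: it builds the 3x3 covariance, runs 4 power iterations, picks the two texels with the extreme projections onto the axis, insets them to 1/16 and 15/16, and uses plain nearest rounding to 565 rather than optimal rounding. The power iteration starting vector and all names are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Color { float r, g, b; };

// Nearest rounding of an 8-bit-range color to 565 (not optimal rounding).
static uint16_t to_565(const Color& c) {
    auto q = [](float v, int maxv) {
        const int x = (int)std::lround(v * maxv / 255.0f);
        return x < 0 ? 0 : (x > maxv ? maxv : x);
    };
    return (uint16_t)((q(c.r, 31) << 11) | (q(c.g, 63) << 5) | q(c.b, 31));
}

// Assumes n > 0.
void pca_inset_endpoints(const Color* px, int n, uint16_t& lo565, uint16_t& hi565) {
    Color mean{0, 0, 0};
    for (int i = 0; i < n; i++) { mean.r += px[i].r; mean.g += px[i].g; mean.b += px[i].b; }
    mean.r /= n; mean.g /= n; mean.b /= n;

    // 3x3 covariance (unnormalized; the scale doesn't affect the axis)
    float cov[3][3] = {};
    for (int i = 0; i < n; i++) {
        const float d[3] = {px[i].r - mean.r, px[i].g - mean.g, px[i].b - mean.b};
        for (int a = 0; a < 3; a++)
            for (int b = 0; b < 3; b++) cov[a][b] += d[a] * d[b];
    }

    // 4 power iterations, starting from the covariance column with the largest variance
    int k = 0;
    for (int a = 1; a < 3; a++) if (cov[a][a] > cov[k][k]) k = a;
    float axis[3] = {cov[0][k], cov[1][k], cov[2][k]};
    for (int iter = 0; iter < 4; iter++) {
        float t[3];
        for (int a = 0; a < 3; a++) t[a] = cov[a][0] * axis[0] + cov[a][1] * axis[1] + cov[a][2] * axis[2];
        const float len = std::sqrt(t[0] * t[0] + t[1] * t[1] + t[2] * t[2]);
        if (len == 0.0f) break; // solid-color block: no meaningful axis
        for (int a = 0; a < 3; a++) axis[a] = t[a] / len;
    }

    // Texels with the smallest and largest projection onto the axis
    int imin = 0, imax = 0;
    float pmin = INFINITY, pmax = -INFINITY;
    for (int i = 0; i < n; i++) {
        const float p = px[i].r * axis[0] + px[i].g * axis[1] + px[i].b * axis[2];
        if (p < pmin) { pmin = p; imin = i; }
        if (p > pmax) { pmax = p; imax = i; }
    }

    // Inset: move to 1/16 and 15/16 of the way between the two colors
    const Color c0 = px[imin], c1 = px[imax];
    auto lerp = [&](float t) {
        return Color{c0.r + (c1.r - c0.r) * t, c0.g + (c1.g - c0.g) * t, c0.b + (c1.b - c0.b) * t};
    };
    lo565 = to_565(lerp(1.0f / 16.0f));
    hi565 = to_565(lerp(15.0f / 16.0f));
}
```

A real encoder would still have to order the two 565 values (color0 > color1 for 4-color blocks) and then choose selectors.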

.basis file format specification

[This is a work in progress. It will be copied & pasted into the Basis Universal wiki.]

The Basis Universal GPU texture codec supports reading and writing ".basis" files. Currently the file format supports ETC1S or UASTC 4x4 texture data:

  • ETC1S is a simplified subset of ETC1.

The mode is always differential (diff bit=1), the Rd, Gd, and Bd color deltas are always (0,0,0), and the flip bit is always set. ETC1S texture data is 100% compliant with all existing software and hardware ETC1 decoders. Existing encoders can be easily modified to limit their output to ETC1S.

  • UASTC 4x4 is a 19 mode subset of the ASTC texture format, with its own separate specification. UASTC texture data can always be losslessly transcoded to ASTC.
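A minimal sketch that checks whether an 8-byte ETC1 block obeys the ETC1S constraints stated above. It assumes the standard ETC1 block byte order: in differential mode the low 3 bits of bytes 0-2 hold the Rd/Gd/Bd deltas, and byte 3 holds the two 3-bit table indices, the diff bit (bit 1) and the flip bit (bit 0). It only checks the three constraints listed here; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True if the block is differential, has zero color deltas, and has the flip bit set.
bool is_etc1s_block(const uint8_t blk[8]) {
    const bool diff_bit = (blk[3] >> 1) & 1;
    const bool flip_bit = blk[3] & 1;
    const bool zero_deltas = (blk[0] & 7) == 0 && (blk[1] & 7) == 0 && (blk[2] & 7) == 0;
    return diff_bit && flip_bit && zero_deltas;
}
```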

At a high level, a typical .basis file consists of multiple sections:

  • The file header
  • Optional ETC1S compressed endpoint/selector codebooks
  • Optional ETC1S Huffman table information
  • A required "slice" description array describing the resolutions and file offset/compressed sizes of each texture slice present in the file
  • 1 or more slices containing ETC1S or UASTC compressed texture data. 
  • For future expansion, the format supports an "extended" header which may be located anywhere in the file. This section contains .PNG-like chunked data. 

Apart from the header, which must always be present at the start of the file, the other sections can appear in any order.


Enums

.basis file enums:
enum basis_texture_type
{
  cBASISTexType2D = 0,
  cBASISTexType2DArray = 1,
  cBASISTexTypeCubemapArray = 2,
  cBASISTexTypeVideoFrames = 3,
  cBASISTexTypeVolume = 4,
  cBASISTexTypeTotal
};

enum basis_slice_desc_flags
{
  cSliceDescFlagsHasAlpha = 1,
  cSliceDescFlagsFrameIsIFrame = 2
};

enum basis_tex_format
{
  cETC1S = 0,
  cUASTC4x4 = 1
};

enum basis_header_flags
{
  cBASISHeaderFlagETC1S = 1,
  cBASISHeaderFlagYFlipped = 2,
  cBASISHeaderFlagHasAlphaSlices = 4
};


File Header

The file header must always be at the beginning of the file. The individual values are byte aligned and always little endian.

struct basis_file_header
{
  uint16      m_sig;              // 2 byte file signature
  uint16      m_ver;              // File version
  uint16      m_header_size;      // Header size in bytes, sizeof(basis_file_header) or 0x4D
  uint16      m_header_crc16;     // CRC16/genibus of the remaining header data

  uint32      m_data_size;        // The total size of all data after the header
  uint16      m_data_crc16;       // The CRC16 of all data after the header

  uint24      m_total_slices;     // The number of compressed slices 
  uint24      m_total_images;     // The total # of images
         
  byte        m_tex_format;       // enum basis_tex_format
  uint16      m_flags;            // enum basis_header_flags
  byte        m_tex_type;         // enum basis_texture_type
  uint24      m_us_per_frame;     // Video: microseconds per frame

  uint32      m_reserved;         // For future use
  uint32      m_userdata0;        // For client use
  uint32      m_userdata1;        // For client use

  uint16      m_total_endpoints;          // ETC1S: The number of endpoints in the endpoint codebook 
  uint32      m_endpoint_cb_file_ofs;     // ETC1S: The compressed endpoint codebook's file offset relative to the header
  uint24      m_endpoint_cb_file_size;    // ETC1S: The compressed endpoint codebook's size in bytes

  uint16      m_total_selectors;          // ETC1S: The number of selectors in the selector codebook 
  uint32      m_selector_cb_file_ofs;     // ETC1S: The compressed selector codebook's file offset relative to the header
  uint24      m_selector_cb_file_size;    // ETC1S: The compressed selector codebook's size in bytes

  uint32      m_tables_file_ofs;          // ETC1S: The file offset of the compressed Huffman codelength tables.
  uint32      m_tables_file_size;         // ETC1S: The file size in bytes of the compressed Huffman codelength tables.

  uint32      m_slice_desc_file_ofs;      // The file offset to the slice description array, usually follows the header

  uint32      m_extended_file_ofs;        // The file offset of the "extended" header and compressed data, for future use
  uint32      m_extended_file_size;       // The file size in bytes of the "extended" header and compressed data, for future use
};
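Because the header fields are packed, byte aligned and little endian (including the 3-byte uint24 fields), a reader can't simply copy the bytes into a native C++ struct. Here's a minimal sketch, assuming the byte offsets obtained by summing the field sizes above in order (they add up to 0x4D = 77 bytes). Only a few fields are extracted, and the names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reads an unsigned little-endian integer of n bytes (n <= 4); also covers uint24 fields.
static uint32_t read_le(const uint8_t* p, int n) {
    uint32_t v = 0;
    for (int i = n - 1; i >= 0; i--) v = (v << 8) | p[i];
    return v;
}

struct BasisHeaderInfo {
    uint32_t sig, ver, header_size, header_crc16, data_size, data_crc16;
    uint32_t total_slices, total_images, tex_format, flags, tex_type;
    uint32_t slice_desc_file_ofs;
};

// Returns false if the buffer is too small or the signature/header size don't match.
bool parse_basis_header(const uint8_t* p, size_t len, BasisHeaderInfo& h) {
    if (len < 0x4D) return false;
    h.sig = read_le(p + 0, 2);
    h.ver = read_le(p + 2, 2);
    h.header_size = read_le(p + 4, 2);
    h.header_crc16 = read_le(p + 6, 2);
    h.data_size = read_le(p + 8, 4);
    h.data_crc16 = read_le(p + 12, 2);
    h.total_slices = read_le(p + 14, 3);
    h.total_images = read_le(p + 17, 3);
    h.tex_format = read_le(p + 20, 1);
    h.flags = read_le(p + 21, 2);
    h.tex_type = read_le(p + 23, 1);
    h.slice_desc_file_ofs = read_le(p + 65, 4);
    return h.sig == 0x4273 && h.header_size == 0x4D;
}
```

Note that the signature 0x4273 appears in the file as the bytes 's' then 'B', since it's stored little endian.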

Details:
  • m_sig is always 'B' * 256 + 's', or 0x4273.
  • m_ver is currently always 0x10.
  • m_header_size is sizeof(basis_file_header). It's always 0x4D.
  • m_header_crc16 is the CRC-16 of the remaining header data. The CRC-16 parameters are "CRC-16/genibus" (aka CRC-16 EPC, CRC-16 I-CODE, CRC-16 DARC). See the "CRC-16" section for more information.
  • m_data_size, m_data_crc16: The size of all data following the header, and its CRC-16.
  • m_total_slices: The total number of slices, from [1,2^24-1]
  • m_total_images: The total number of images (where one image can contain multiple mipmap levels, and each mipmap level is a different slice).
  • m_tex_format: basis_tex_format. Either cETC1S (0), or cUASTC4x4 (1).
  • m_flags: A combination of flags from the basis_header_flags enum.
  • m_tex_type: The texture type, from enum basis_texture_type
  • m_us_per_frame: Microseconds per frame, only valid for cBASISTexTypeVideoFrames texture types.
  • m_total_endpoints, m_endpoint_cb_file_ofs, m_endpoint_cb_file_size: Information about the compressed ETC1S endpoint codebook: The total # of entries, the offset to the compressed data, and the compressed data's size.
  • m_total_selectors, m_selector_cb_file_ofs, m_selector_cb_file_size: Information about the compressed ETC1S selector codebook: The total # of entries, the offset to the compressed data, and the compressed data's size.
  • m_tables_file_ofs, m_tables_file_size: The file offset and size of the compressed Huffman tables for ETC1S format files. 
  • m_slice_desc_file_ofs: The file offset to the array of slice description structures. There will be m_total_slices structures at this file offset.
  • m_extended_file_ofs, m_extended_file_size: The "extended" header, for future expansion. Currently unused.
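CRC-16/GENIBUS uses polynomial 0x1021, an initial value of 0xFFFF, no input/output bit reflection, and a final XOR of 0xFFFF; its standard check value for the ASCII string "123456789" is 0xD64E. A minimal bitwise sketch (the function name is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// CRC-16/GENIBUS: poly 0x1021, init 0xFFFF, no reflection, final XOR 0xFFFF.
uint16_t crc16_genibus(const uint8_t* data, size_t len) {
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021) : (uint16_t)(crc << 1);
    }
    return (uint16_t)(crc ^ 0xFFFF);
}
```

Assuming "remaining header data" means the bytes that follow the m_header_crc16 field, the header CRC would be computed over bytes 8 through m_header_size - 1.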




Yet another BC1 encoder benchmark

Encoders: stb_dxt v1.09, icbc, rgbcx v1.12, original crunch, and Unity's optimized variant of crunch. Both 4 and 3 color blocks can be used, but transparent texels are not utilized to get black/dark texels in this benchmark. Tested across a diverse assortment of 100 textures (not just images).



Same benchmark except this time with 3-color transparent texels used for black or dark texels in rgbcx (purple samples):


Here's an update, now with nvdxt.exe (black sample) and ispc_texcomp (brown sample). Note that the nvdxt.exe time is approximate, because I had to spawn nvdxt.exe, which loads a .png and saves a .dds file. I spawned it twice: once untimed, then immediately again with timing.


nvdxt.exe command line:

nvdxt.exe -nomipmap -quality_highest -rms_threshold 50 -file image.png -output nvcompressed.dds -dxt1c -weight 1.0 1.0 1.0


This is why we're working on Basis.

LZHAM and "crunch" IP will be placed into the Public Domain on 9/15/2020


 As the owner of the "LZHAM" and "crunch" software IP, I have decided to place these two works into the Public Domain in the United States, expressly waiving copyright protection. The upload placing these works into the Public Domain will occur on 9/15/2020 around noon EST.
