Binomial stuff

April 30, 2017, 12:27 pm

One MS employee recently said to Stephanie (my partner) that (paraphrasing) "your company isn't stable and can't possibly last". My reply: We've been in business for over a year now, and our business is just a natural extension and continuation of our careers. I've been programming since 1985, and developing commercial data compression and other software since 1993. I've been doing this for a while and I'm not going to stop anytime soon.

Having my own small consulting company vs. just working full-time for a single corporation is just a natural next step to me. One thing I really liked about working at Valve was the ability to wheel my desk to virtually anywhere in the company and start adding value. I can now "wheel my desk" to anywhere in the world, and the freedom this gives us is amazing.

Binomial is a self-funded startup. We work on both development contracts and our current product (Basis). We haven't taken any investment money. Our "runway" is basically infinite.

Basis's RDO DXTc compression API

June 19, 2017, 2:47 pm

This is a work in progress, but here's the API to the new rate distortion optimizing DXTc codec I've been working on for Basis. There's only one function (excluding basis_get_version()): basis_rdo_dxt_encode(). You call it with some encoding parameters and an array of input images (or "slices"), and it gives you back a blob of DXTc blocks which you then feed to any LZ codec like zlib, zstd, LZHAM, Oodle, etc.

The output DXTc blocks are organized in simple raster order, with slice 0's blocks first, then slice 1's, etc. The slices could be mipmap levels, or cubemap faces, etc. For highest compression, it's very important to feed the output blocks to the LZ codec in the order that this function gives them back to you.
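To make the output layout concrete, here's a small sketch of the block arithmetic a caller might use to size the output buffer and locate a given block. It assumes 8 bytes per block for DXT1/DXT5A and 16 for DXT5/DXN (the standard DXTc block sizes). It also assumes that partially covered 4x4 blocks on the right and bottom edges are padded out to full blocks; the header doesn't spell either of these out. SliceDims and the helper functions are hypothetical names, not part of the Basis API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: the pixel dimensions of one input slice
// (mirrors m_image_width / m_image_height in rdo_dxt_slice_desc).
struct SliceDims {
    uint32_t width;
    uint32_t height;
};

// DXTc encodes 4x4 pixel blocks; partial edge blocks are assumed to still
// occupy a full block in the output.
inline uint32_t blocks_wide(const SliceDims& s) { return (s.width + 3) / 4; }
inline uint32_t blocks_high(const SliceDims& s) { return (s.height + 3) / 4; }
inline size_t blocks_in_slice(const SliceDims& s) {
    return static_cast<size_t>(blocks_wide(s)) * blocks_high(s);
}

// Total output size: every slice's blocks stored back to back.
// bytes_per_block: 8 for DXT1/DXT5A, 16 for DXT5/DXN.
inline size_t total_output_bytes(const std::vector<SliceDims>& slices, size_t bytes_per_block) {
    size_t blocks = 0;
    for (const SliceDims& s : slices) blocks += blocks_in_slice(s);
    return blocks * bytes_per_block;
}

// Byte offset of block (bx, by) in the given slice: all blocks of the earlier
// slices come first, then this slice's blocks in raster (row-major) order.
inline size_t block_byte_offset(const std::vector<SliceDims>& slices, size_t slice,
                                uint32_t bx, uint32_t by, size_t bytes_per_block) {
    assert(slice < slices.size());
    assert(bx < blocks_wide(slices[slice]) && by < blocks_high(slices[slice]));
    size_t index = 0;
    for (size_t i = 0; i < slice; ++i) index += blocks_in_slice(slices[i]);
    index += static_cast<size_t>(by) * blocks_wide(slices[slice]) + bx;
    return index * bytes_per_block;
}
```

For example, a DXT1 texture with three mip levels of 256x256, 128x128 and 2x2 works out to 4096 + 1024 + 1 = 5121 blocks, or 40,968 bytes.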

On my near-term TODO list is to allow the user to specify custom per-channel weightings, and to add more color distance functions. Right now it supports either uniform weights, or a custom model for sRGB colorspace photos/textures. Also, I may expose optional per-slice weightings (for mipmaps).

I'm shipping the first version (as a Windows DLL) tomorrow.

// File: basis_rdo_dxt_public.h
#pragma once

#include <stdlib.h>
#include <string.h> // memset()

#ifdef BASIS_DLL_EXPORTS
#define BASIS_DLL_EXPORT __declspec(dllexport)
#else
#define BASIS_DLL_EXPORT
#endif

#if defined(_MSC_VER)
#define BASIS_CDECL __cdecl
#else
#define BASIS_CDECL
#endif

namespace basis
{
const int BASIS_VERSION = 0x0100;

typedef unsigned int basis_uint;
typedef basis_uint rdo_dxt_bool;

enum rdo_dxt_format
{
cRDO_DXT1 = 0,
cRDO_DXT5,
cRDO_DXN,
cRDO_DXT5A,

cRDO_DXT_FORCE_DWORD = 0xFFFFFFFF
};

const basis_uint RDO_DXT_STRUCT_VERSION = 0xABCD0001;

const basis_uint RDO_QUALITY_MIN = 1;
const basis_uint RDO_QUALITY_MAX = 255;

struct rdo_dxt_params
{
basis_uint m_struct_size;
basis_uint m_struct_version;

rdo_dxt_format m_format;

basis_uint m_quality;

basis_uint m_alpha_component_indices[2];

basis_uint m_lz_max_match_dist;
basis_uint m_output_block_size;

basis_uint m_num_color_endpoint_clusters;
basis_uint m_num_color_selector_clusters;

basis_uint m_num_alpha_endpoint_clusters;
basis_uint m_num_alpha_selector_clusters;

float m_l;
float m_selector_rdo_quality_threshold;
float m_endpoint_selector_rdo_quality_threshold;

float m_selector_rdo_quality_threshold_low;
float m_endpoint_selector_rdo_quality_threshold_low;

float m_block_max_y_std_dev_rdo_quality_scaler;

basis_uint m_endpoint_refinement_steps;
basis_uint m_selector_refinement_steps;
basis_uint m_final_block_refinement_steps;

float m_adaptive_tile_color_psnr_derating;
float m_adaptive_tile_alpha_psnr_derating;

basis_uint m_endpoint_rdo_max_search_distance;

rdo_dxt_bool m_optimize_final_endpoint_clusters;
rdo_dxt_bool m_optimize_final_selector_clusters;

rdo_dxt_bool m_srgb_metrics;
rdo_dxt_bool m_debugging;
rdo_dxt_bool m_debug_output;
rdo_dxt_bool m_hierarchical_mode;
rdo_dxt_bool m_multithreaded;
};

inline void rdo_dxt_params_set_to_defaults(rdo_dxt_params *p)
{
memset(p, 0, sizeof(rdo_dxt_params));

p->m_struct_size = sizeof(rdo_dxt_params);
p->m_struct_version = RDO_DXT_STRUCT_VERSION;

p->m_format = cRDO_DXT1;

p->m_quality = 128;

p->m_alpha_component_indices[0] = 0;
p->m_alpha_component_indices[1] = 1;

p->m_l = .001f;

p->m_selector_rdo_quality_threshold = 1.75f;
p->m_endpoint_selector_rdo_quality_threshold = 1.75f;

p->m_selector_rdo_quality_threshold_low = 1.3f;
p->m_endpoint_selector_rdo_quality_threshold_low = 1.3f;

p->m_block_max_y_std_dev_rdo_quality_scaler = 8.0f;

p->m_lz_max_match_dist = 32768;
p->m_output_block_size = 8;

p->m_endpoint_refinement_steps = 2;
p->m_selector_refinement_steps = 2;
p->m_final_block_refinement_steps = 1;

p->m_adaptive_tile_color_psnr_derating = 1.5f;
p->m_adaptive_tile_alpha_psnr_derating = 1.5f;
p->m_endpoint_rdo_max_search_distance = 8;

p->m_optimize_final_endpoint_clusters = true;
p->m_optimize_final_selector_clusters = true;

p->m_hierarchical_mode = true;

p->m_multithreaded = true;
}

const basis_uint RDO_DXT_MAX_IMAGE_DIMENSION = 16384;

struct rdo_dxt_slice_desc
{
// Pixel dimensions of this slice. A slice may be a mipmap level, a cubemap face, a video frame, or whatever.
basis_uint m_image_width;
basis_uint m_image_height;
basis_uint m_image_pitch_in_pixels;

// Pointer to 32-bit raster image. Format in memory: RGBA (R is first byte, A is last)
const void *m_pImage_pixels;
};

} // namespace basis

extern "C" BASIS_DLL_EXPORT basis::basis_uint BASIS_CDECL basis_get_version();

extern "C" BASIS_DLL_EXPORT bool BASIS_CDECL basis_rdo_dxt_encode(
const basis::rdo_dxt_params *pEncoder_params,
basis::basis_uint total_input_image_slices, const basis::rdo_dxt_slice_desc *pInput_image_slices,
void *pOutput_blocks, basis::basis_uint output_blocks_size_in_bytes);

Seattle

June 25, 2017, 2:48 pm

I'm by no means an expert on anything San Diego, having been there only around 1.5 months since leaving Seattle. I did spend 8 years in Seattle though, and here's what I think:

- Seattle is just way too dark of a city for me to live there year round. Here's Seattle vs. San Diego's sunshine (according to city-data.com).

The winter rain didn't bother me much at all. It was the lack of sun. (Hint to Seattle-area corporate recruiters: Fly in candidates from sunnier climates like Dallas to interview during July-August.)

- There's a constant background noise and auditory clutter to Seattle and the surrounding areas that's just getting louder and louder as buildings pop up and people (and their cars) move in.

Eventually this background noise got really annoying. Even downtown San Diego is surprisingly peaceful and quiet by comparison.

- Seattle's density is both a blessing and a curse. It's a very walkable city, so going without a car is possible if you live and work in the right places.

The eastside and westside buses can be incredibly, ridiculously overpacked. Seattle needs to seriously get its public transportation act together.

- As a pedestrian, I've found Seattle's drivers to be so much nicer and more peaceful on the road than San Diego's. CA drivers seem a lot more aggressive.

- San Diego is loaded with amazing beaches. Seattle - not so much.

A few misc. thoughts on Seattle and the eastside tech workers I encountered:

I lived and worked on the eastside (near downtown Bellevue) and westside (U District) for enough time to compare and contrast the two areas. The people in Seattle itself are generally quite friendly and easy going. Things seem to change quickly once you get to the eastside, which feels almost like a different state entirely.

I found eastside people to be much less friendly and living in their own little worlds. I wish I had spent more of my time living in Seattle itself instead of Bellevue. Culturally Bellevue feels cold and very corporate.

The wealthier areas on the eastside seemed the worst. Wealth and rudeness seem highly correlated. So far, I've yet to meet a Bellevue/Redmond tech 10-100 millionaire (or billionaire) that I found to be truly pleasant to be around or work with. I also learned over and over that there is only a weak correlation between someone's wealth and their ability to actually code. In many cases someone's tech wealth seemed to be related to luck of the draw, timing, personality, and even popularity. Some of the wealthiest programmers I met here were surprisingly weak software engineers.

I've seen this happen repeatedly over the years: Average software engineers get showered with mad cash and suddenly they turn inward, become raging narcissistic assholes, and firmly believe they and their code are godly. Money seems to bring out the worst personality traits in people.

Why crunch likes uncompressed texture data

August 17, 2017, 1:29 am

We've recently gotten some interest in creating an RDO compressor specifically for already compressed textures, which is why I'm writing this.

crunch works best with (and is designed for) uncompressed RGBA texture data. You can feed crunch already compressed data (by compressing to DXT, unpacking the blocks, and throwing the unpacked pixels into the compressor), but it won't perform as well. Why, you ask?

crunch uses top-down clusterization on the block endpoints. It tries to create groups of blocks that share similar endpoints. Once it finds a group of blocks that seem similar enough, it then uses its DXT endpoint optimizers on these block clusters to create the near-optimal set of endpoints for that cluster. These clusters can be very big, which is why crunch/Basis can't use off-the-shelf DXT/ETC compressors, which assume 4x4 blocks.
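For readers unfamiliar with top-down clusterization, here's a generic sketch of the idea. It is not crunch's actual code. Each block's endpoint pair is treated as a hypothetical 6-D point (low RGB, high RGB). The cluster with the largest spread is repeatedly split at the mean of its widest dimension until the target cluster count is reached. In a real encoder each resulting cluster would then be handed to an endpoint optimizer, which isn't shown.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical representation: a block's endpoints as (low R,G,B, high R,G,B).
using Endpoints = std::array<double, 6>;

static Endpoints cluster_mean(const std::vector<Endpoints>& pts, const std::vector<size_t>& members) {
    assert(!members.empty());
    Endpoints mean{};
    for (size_t m : members)
        for (int d = 0; d < 6; ++d) mean[d] += pts[m][d];
    for (int d = 0; d < 6; ++d) mean[d] /= static_cast<double>(members.size());
    return mean;
}

// Sum of squared distances of a cluster's members from their mean.
static double cluster_sse(const std::vector<Endpoints>& pts, const std::vector<size_t>& members) {
    if (members.empty()) return 0.0;
    const Endpoints mean = cluster_mean(pts, members);
    double sse = 0.0;
    for (size_t m : members)
        for (int d = 0; d < 6; ++d) sse += (pts[m][d] - mean[d]) * (pts[m][d] - mean[d]);
    return sse;
}

// Top-down (divisive) clustering: start with one cluster holding every block,
// then keep splitting the worst cluster until max_clusters is reached.
std::vector<std::vector<size_t>> top_down_clusters(const std::vector<Endpoints>& pts,
                                                   size_t max_clusters) {
    std::vector<std::vector<size_t>> clusters(1);
    for (size_t i = 0; i < pts.size(); ++i) clusters[0].push_back(i);

    while (clusters.size() < max_clusters) {
        // 1. Find the cluster whose members are spread out the most.
        size_t worst = 0;
        double worst_sse = 0.0;
        for (size_t c = 0; c < clusters.size(); ++c) {
            double sse = cluster_sse(pts, clusters[c]);
            if (sse > worst_sse) { worst_sse = sse; worst = c; }
        }
        if (worst_sse == 0.0) break;  // every cluster is already uniform

        // 2. Split it at the mean of its highest-variance dimension.
        const std::vector<size_t> members = clusters[worst];
        const Endpoints mean = cluster_mean(pts, members);
        int dim = 0;
        double best_var = -1.0;
        for (int d = 0; d < 6; ++d) {
            double var = 0.0;
            for (size_t m : members) var += (pts[m][d] - mean[d]) * (pts[m][d] - mean[d]);
            if (var > best_var) { best_var = var; dim = d; }
        }
        std::vector<size_t> left, right;
        for (size_t m : members) (pts[m][dim] < mean[dim] ? left : right).push_back(m);
        if (left.empty() || right.empty()) break;  // degenerate split; stop
        clusters[worst] = std::move(left);
        clusters.push_back(std::move(right));
    }
    return clusters;
}
```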

DXT/ETC are lossy formats, so there is no single "correct" encoding for each input (ignoring trivial inputs like solid-color blocks). There are many possible valid encodings that will look very similar. Because of this, creating a good DXT/ETC block encoder that is also fast is harder than it looks, and adding additional constraints or requirements on top of this (such as rate distortion optimization on both the endpoints and the selectors) just adds to the fun.

Anyhow, imagine the data has already been compressed, and the encoder creates a cluster containing just a single block. Because the data has already been compressed, the encoder now has the job of determining exactly which endpoints were used originally to pack that block. crunch tries to do this for DXT1 blocks, but it doesn't always succeed. There are many DXT compressors out there, each using different algorithms. (crunch could be modified to also accept the precompressed DXT data itself, which would allow it to shortcut this problem.)

What if the original compressor decided to use fewer than 4 colors spaced along the colorspace line? Also, the exact method used to interpolate the endpoint colors is only loosely defined for DXT1. It's a totally solvable problem, but it's not something I had the time to work on while writing crunch.
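Here's a sketch of how a DXT1 decoder builds a block's palette. It covers both the 4-color mode and the 3-color mode, which is used when the first endpoint is not greater than the second. The rounding of the interpolated colors is one common integer convention. Real decoders and GPUs differ here, which is part of why recovering the original endpoints is tricky. The function names are illustrative.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct Rgb { uint8_t r, g, b; };

// Expand 5- and 6-bit components to 8 bits by bit replication.
inline uint8_t expand5(uint32_t v) { return static_cast<uint8_t>((v << 3) | (v >> 2)); }
inline uint8_t expand6(uint32_t v) { return static_cast<uint8_t>((v << 2) | (v >> 4)); }

inline Rgb unpack565(uint16_t c) {
    return Rgb{expand5((c >> 11) & 31), expand6((c >> 5) & 63), expand5(c & 31)};
}

// Palette of a DXT1 block from its two 16-bit endpoints.
// c0 > c1:  four colors; entries 2 and 3 sit at 1/3 and 2/3 along c0->c1.
// c0 <= c1: three colors (entry 2 is the midpoint); entry 3 is black, which is
//           treated as transparent when the block's 1-bit alpha is used.
// Integer truncation is one convention; decoders vary in how they round.
std::array<Rgb, 4> dxt1_palette(uint16_t c0, uint16_t c1) {
    const Rgb a = unpack565(c0), b = unpack565(c1);
    std::array<Rgb, 4> p{a, b, Rgb{0, 0, 0}, Rgb{0, 0, 0}};
    auto mix = [](uint8_t x, uint8_t y, int wx, int wy, int den) {
        return static_cast<uint8_t>((wx * x + wy * y) / den);
    };
    if (c0 > c1) {
        p[2] = Rgb{mix(a.r, b.r, 2, 1, 3), mix(a.g, b.g, 2, 1, 3), mix(a.b, b.b, 2, 1, 3)};
        p[3] = Rgb{mix(a.r, b.r, 1, 2, 3), mix(a.g, b.g, 1, 2, 3), mix(a.b, b.b, 1, 2, 3)};
    } else {
        p[2] = Rgb{mix(a.r, b.r, 1, 1, 2), mix(a.g, b.g, 1, 1, 2), mix(a.b, b.b, 1, 1, 2)};
    }
    return p;
}
```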

Things get worse if the endpoint clusterization step assigns 2+ blocks with different endpoints to the same cluster. The compressor now has to find a single set of endpoints to represent both blocks. Because the input pixels have already been compressed, we're now forcing the input pixels to lie along a quantized colorspace line (using 555/565 endpoints!) two times in a row. Quality takes a nosedive.

Basis improves this situation, although I still favor working with uncompressed texture data because that's what the majority of our customers work with.

Another option is to use bottom-up clusterization (which crunch doesn't use). You first compress the input data to DXT/ETC/etc., then merge similar blocks together so they share the same endpoints and/or selectors. This approach seems to be a natural fit for already compressed data. Quantizing just the selector data is the easiest thing to do first.
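Here's a minimal sketch of bottom-up selector merging for DXT1 blocks whose 16 2-bit selectors are packed into a 32-bit word. Every block starts in its own cluster. The two clusters with the most similar selectors are then merged repeatedly until a target count is reached. Selector distance uses each value's position along the endpoint line, because DXT1 orders its palette as c0, c1, 2/3 c0 + 1/3 c1, 1/3 c0 + 2/3 c1. The names are hypothetical. Keeping the larger cluster's selectors as the shared ones is a simplification; a real encoder would re-optimize them against the member blocks' pixels.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Position of each DXT1 selector value along the c0 -> c1 line.
static const int kLinePos[4] = {0, 3, 1, 2};

// Distance between two blocks' selectors (16 x 2 bits packed in 32 bits).
inline int selector_distance(uint32_t a, uint32_t b) {
    int d = 0;
    for (int i = 0; i < 16; ++i)
        d += std::abs(kLinePos[(a >> (2 * i)) & 3] - kLinePos[(b >> (2 * i)) & 3]);
    return d;
}

struct SelectorCluster {
    uint32_t rep;               // selectors every member block will share
    std::vector<size_t> blocks; // indices of the member blocks
};

// Bottom-up (agglomerative) merging: merge the closest pair of clusters until
// at most `target` remain. Naive O(n^3), fine for a toy, not for real textures.
std::vector<SelectorCluster> merge_selectors(const std::vector<uint32_t>& sel, size_t target) {
    assert(target > 0);
    std::vector<SelectorCluster> cl;
    for (size_t i = 0; i < sel.size(); ++i) cl.push_back({sel[i], {i}});

    while (cl.size() > target) {
        size_t bi = 0, bj = 1;
        int best = selector_distance(cl[0].rep, cl[1].rep);
        for (size_t i = 0; i < cl.size(); ++i)
            for (size_t j = i + 1; j < cl.size(); ++j) {
                int d = selector_distance(cl[i].rep, cl[j].rep);
                if (d < best) { best = d; bi = i; bj = j; }
            }
        // Keep the bigger cluster's selectors as the shared representative.
        if (cl[bj].blocks.size() > cl[bi].blocks.size()) cl[bi].rep = cl[bj].rep;
        cl[bi].blocks.insert(cl[bi].blocks.end(), cl[bj].blocks.begin(), cl[bj].blocks.end());
        cl.erase(cl.begin() + static_cast<std::ptrdiff_t>(bj));
    }
    return cl;
}
```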

Things learned while running your own self-funded startup

September 27, 2017, 2:58 pm

Here's a brain dump of the things we've learned while running our business and shipping our first product (Basis).

My experience at Valve somewhat helped prepare me for doing this. Working at Valve was like a microcosm of working at your own company. You needed to find customers, interact with them, and figure out what was valuable to them. (You also needed to identify "competitors" and do your best to ignore or respond to whatever challenges they might throw your way.) Financial concerns weren't an issue, but time and your reputation at the company were. I noticed a feedback loop there: The more success you got at Valve, the easier it was to find projects to help out on. As you earned "Valve Bucks", doors opened much more easily.

Entering Valve with basically zero Valve Bucks was a big challenge. It wasn't enough to merely be a good engineer at Valve. If you were a good engineer with zero communication skills, your chances of surviving and thriving when I was there were pretty low. If you acted like an asshole and didn't have many friends it didn't matter how good you were or how awesome your accomplishments were. People like this would be fired sooner or later.

Anyhow, running your own company has a number of additional challenges. There are no bi-weekly paychecks, no free lunches, no PTO, no yearly Hawaiian vacation, and no on-site lawyers. You are now in the real world, and you're leaving the high-school-like corporate drama behind. Everything, including staying financially solvent, is now your responsibility.

Some things we've learned:

1. This is a dramatically more mature way of working vs. full-timing. Your boss is basically the bank. Keeping your account in the green is like an optimization problem. If you fail you go under, or you wind up in the arms of potentially predatory investors.

2. You want a product ASAP. Contract work is basically linear income relative to time, while products can be exponential. Just choose a product and ship it. If it fails, try again and again, because the things you learned while working on the first product will help you immensely on your second.

3. Products can take a long time to develop and monetize. Contract work can bring in immediate income, but only a trickle. The big challenge is working on contracts to stay afloat in the short term, but also finding time to work on your product for long term success.

4. There are lots of ways to stay funded until your product takes off: You can use savings, loans from friends, investor funds, government grants, and income from short contracts. I would recommend staying away from investors as much as you can, because once you get in bed with investors you no longer totally own your company (and it can be basically taken away from you).

5. Every decision must be made extremely carefully. Bad decisions cost money.

6. The large companies move very slowly. Do not place any bets on getting paid quickly by large companies, no matter how happy they say they are with you.

So, at least with a software middleware product, I would first target small customers because they move more quickly.

7. If you have a product that a very large company really wants, they'll still do everything they can to delay purchasing it for the market price. They'll try to hire you or your partner(s) away individually, or they'll wait as long as possible to see if you encounter hard financial times and go under. They won't come and just offer to license your product or buy you out until they've exhausted all other possibilities.

8. If your product offers evaluation licenses, then be very careful with the eval time period. Some companies will purposely demand very long evals as a form of negotiation leverage.

9. A company can feign interest in licensing your product, get your lawyer bogged down negotiating terms of the license (or eval license), then pull away or suddenly change their mind at the last minute. This costs money. To avoid this, a "put up or shut up" mentality can help. Either the company accepts your eval license with little fuss, or just move on.

10. No Hard Sells: If the company you are negotiating with gets overly emotional about the terms in your eval license, then move on. Either they want the value your product offers, or they don't.

11. Research pricing: Your competitor(s) will publicly advertise low-ball prices to help lure in customers, but once you start negotiating with them the price goes up (sometimes massively). Talk to your competitor's customers and just ask them what they actually paid, and you'll be amazed at how much software middleware is actually worth in the market.

The publicly advertised price is basically just for corporate programmers, who generally don't understand the true market value of their code when properly packaged as a product. The public price is optimized so programmers won't feel bad about how underpaid they are, but it won't be too low so the coders will still perceive the product as having sufficient value.

Research the concept of "Death Prices". If the price is too low, it won't be perceived as having enough value to bother with, and low prices won't sustain your efforts. Set the price sufficiently high and let the market set the actual price. Most likely, if you're a programmer, you'll set the price too low because you've been brainwashed into thinking that your software doesn't have much value.

Large companies will pay high prices to be first to use your software, if it's perceived to be groundbreaking or awesome enough.

12. Find a good lawyer. Get your eval and software license figured out early. This is going to cost money, so save up. A lawyer with patent and software license experience is a huge bonus.

13. Interactions with real customers are priceless. Ask them what they want. For us, we were amazed at all the different ways the open source predecessor of Basis (crunch) was utilized. We pivoted our strategy to RDO encoders based off customer feedback. Our long term roadmap is based off what customers are actually doing with our software right now.

14. Open source is forever: Be extremely careful releasing open source software. "Thou shalt not release too much functionality or features as open source". Open sourcing your work is both a blessing and a curse, and can actually be dangerous from a patent troll perspective.

Open source is great, because potential customers will get a chance to try out your work without spending a dime or ever talking to you. This boosts your credibility. On the flip side, if you give away too much, you are basically competing against yourself when you attempt to monetize your work as a product.

Your open source release should be a demo of the product, and no more. Give things away with the goal of eventually converting the users of the free software into paying customers.

Even if you don't intend on turning the software into a product, always keep in mind ways it can be eventually monetized. Your time is worth something.

Talk to every user of your open source software that you can find. Gather intelligence about how they actually use your software.

15. Your company must appear and be stable at all costs. If one year your large and expensive GDC booth isn't present, people will notice and you'll lose business. Even the biggest game middleware vendors have had serious cashflow problems. One almost went under a few years ago until they were bailed out by a big player. Even the biggest players sometimes take contract work to stay in the green, because product income isn't reliable.

16. Align yourself well: Being associated with CoMotion Labs and Khronos was invaluable to us. At CoMotion we were exposed to tons of other startups, and this cultural immersion was valuable.

17. Perception and psychology are extremely important. If you're a programmer, you're probably going to suck at the skills needed to bring your software to market. Find a partner who complements you well.

18. You need friends, inside and outside of companies. Make as many friends as you can.

19. Some big corps can be very nasty:

"OMG, you can't do this due to patents!"
"You'll never take off because your price is too high"
"You'll run out of money and just come work for us, so we'll just wait you out"
"You must work for us because we're going to write this software ourselves and that'll impact your market share"

Corporate programmers at these megacorps can be horrifically nasty. Also, study and become acutely aware of triangulation when dealing with large hierarchical companies.

20. Some large teams have egos and won't want to license your software because of it. The challenge to licensing software in this situation will be overcoming this institutional ego, or just waiting to see how things pan out.

21. Be aware that companies talk to each other. A larger corp can use a smaller corp to help establish prices.

22. If you're at a company with deep pockets and you want to really have influence, offer the potential of a very large license fee for key software middleware. It works.

23. Do not blindly sign NDAs. Get a checklist from your lawyer and read them very carefully. If you try to negotiate over a clause in the NDA that totally sucks and the company refuses to budge, then move on.

Also, the NDA negotiation process can be very revealing. If the company is hard to deal with at this stage, then it's safer to just move on.

24. Treat your employees well, and with respect. We don't have any employees, but we've learned a lot while talking to employees at other middleware companies.

25. Your company will be defined just as much (if not more) by the customers you turn down vs. the ones you take.

If you get a bad feeling from a potential customer, or they aren't respectful, or they treat you substantially differently vs. how they treat your partner, then it's probably best to move on. Selling software like this is actually the establishment of a relationship, and every new relationship you take on has both risks and rewards. We're careful about who we work with.

On whiteboard coding interviews

November 12, 2017, 8:07 pm

I'm in a ranty mood this evening. Looking through my past, one thing that bothers me is the ritual called "whiteboarding".

I've taken and given a lot of these interviews. I personally find the process demeaning, dehumanizing, biased, and subjective. And if the company uses the terms "cultural fit" or "calibration" when teaching you how to whiteboard, be wary.

My first software development interview was in 1996. I walked in, showed my Game Developer Magazine articles and demos (in DOS of course), spoke with the developers and my potential manager, and they made me an offer. Looking back, I was so young, inexperienced and naive at 20 years old. It was a tough gig but we shipped a cool product (Montezuma's Return). There was no whiteboard, all that mattered was the work and results.

Anyhow, my interview at Blue Shift was similar. No whiteboard, just lots of meetings.

At Ensemble (Microsoft), I got a contract gig at first. This turned into a full-time gig. The interviews there were informal and very rarely (if ever) involved problem solving on a whiteboard.

Right before Ensemble, I also interviewed at Microsoft ATG. It was a stressful, heavy duty whiteboard interview with several devs. It was intense, and that night I fell asleep at the table of an unrelated dinner with friends. I got an offer, but Ensemble's was better. I later learned it was basically a form of "Trauma Bonding". Everyone else did it, so you had to go through it too to get "in". Overall, I remember the Microsoft engineers I interviewed with seemed to be all tired and somewhat stressed out, but they were very professional and respectful.

After Age3 shipped, I interviewed at Epic. I was tired from crunching on Age3, and was unprepared. It was the most horrific interview I've ever taken or seen. Incredibly unprofessional. The devs didn't want to be interviewing anyone. I flopped this interview (and probably dodged a bullet as the working conditions there at the time seemed really bad). Nobody at Ensemble knew I interviewed there, and I'm glad I didn't leave.

Years later, I interviewed at Valve. It was another exercise in Trauma Bonding. I was so stressed it was ridiculous, and I found Dune's "The Litany Against Fear" helpful. Somehow I got through, and looking back I think Gabe Newell (who visited Ensemble and met me there) might have helped get me in without my knowledge. I was lucky to get in at all, because I interviewed as a generalist. If I had interviewed as a graphics specialist I never could have gotten in (because at the time the gfx coders at Valve had a pact of sorts, and unless you were Carmack it was virtually impossible to survive the whiteboard).

Anyhow, one of my points is, I've been pretty lucky to get to work at these places. I learned a lot. Most of the companies I worked at didn't use whiteboarding. Interestingly, the cultures of the non-whiteboarding companies were much healthier.

I sometimes wonder: if I wasn't a white male, or overweight, with all other things unchanged, would I have got these gigs? I very highly doubt it.

I've implemented and shipped tons of algorithms, products, etc. But I hate whiteboarding.

I think the tech companies use this process to slow down horizontal movement between companies. It keeps labor in place, and developer prices down. The "price" of moving between companies (in terms of stress, and potential "whiteboard defeat") is purposely held high. Independent of whether or not this is done purposely, this is the end result.

If you've got to whiteboard, it can't hurt to practice like crazy. And read a few whiteboard coding interview books. Also, tap your social network and find devs who interviewed at your target company, and ask them what happened. If companies are going to do this, at least make them put some effort into it.

"Universal" GPU texture/image format examples

November 22, 2017, 7:10 pm

The DXT1 images were directly converted from the ETC1 (really "ETC1S" - a compatible subset with no subblocks) data using a straightforward lookup table to convert the ETC1 base color to the DXT1 low/high colors, and the selectors were remapped appropriately using a byte from the lookup table. The ETC1->DXT1 lookup table is currently 3.75MB, and can be computed on the fly very quickly (using a variant of ryg_dxt) or (for higher conversion quality) precomputed offline.
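To make the ETC1S side concrete: an ETC1S block (as used here) has a single 5:5:5 base color and one 3-bit intensity table index, so it can only ever produce four colors. The sketch below derives them using the standard ETC1 intensity modifier table and 5-to-8-bit expansion. It only shows the input side of the conversion. The actual lookup table that maps these colors to DXT1 endpoints and selectors isn't described here in enough detail to reproduce. The function and type names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// ETC1 intensity modifiers, listed in ascending order (-large, -small, +small,
// +large). The raw ETC1 selector bits index these in a different order.
static const int kEtc1Inten[8][4] = {
    {-8, -2, 2, 8},     {-17, -5, 5, 17},     {-29, -9, 9, 29},     {-42, -13, 13, 42},
    {-60, -18, 18, 60}, {-80, -24, 24, 80},   {-106, -33, 33, 106}, {-183, -47, 47, 183}};

struct Rgb8 { uint8_t r, g, b; };

// 5-bit to 8-bit expansion by bit replication, as ETC1 does for 5:5:5 colors.
inline int expand5(int v) { return (v << 3) | (v >> 2); }

// The four colors an ETC1S block can use: the expanded base color plus one of
// the chosen table's modifiers, added to all three channels and clamped.
std::array<Rgb8, 4> etc1s_block_colors(int r5, int g5, int b5, int inten) {
    assert(r5 >= 0 && r5 < 32 && g5 >= 0 && g5 < 32 && b5 >= 0 && b5 < 32);
    assert(inten >= 0 && inten < 8);
    std::array<Rgb8, 4> out{};
    for (int s = 0; s < 4; ++s) {
        const int m = kEtc1Inten[inten][s];
        auto ch = [m](int c5) {
            return static_cast<uint8_t>(std::min(255, std::max(0, expand5(c5) + m)));
        };
        out[s] = Rgb8{ch(r5), ch(g5), ch(b5)};
    }
    return out;
}
```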

The encoder in these examples is still my old prototype from 2016. I'm going to be replacing it with Basis's much better ETC1S encoder next. This format can also support alpha/grayscale data.

This format is a tradeoff: for slightly reduced quality, you can distribute GPU textures to most GPUs on the planet. Encode once, use anywhere is the goal. We are planning on distributing free encoders for Linux and Windows (and eventually OSX but it's not my preferred dev platform).

The current intermediate format design supports none, partial or full GPU transcoding. Full transcoding on the GPU will only work on those GPUs that support LZ in hardware (or possibly a compute shader). The process of converting the ETC1S data to DXT1 and the block unpack to either ETC1 or DXT1 can also be done in a shader, or the CPU. By comparison, crunch's .CRN design is 100% CPU oriented. We'll be releasing the transcoder and format as open source. It's an LZ RDO design, so it's compatible with any LZ (or whatever) lossless codec including GPU hardware LZ codecs. It'll support bitrates around 0.75-2.5 bpp for RGB data (using zlib).

All PSNR figures are luma PSNR. The ETC1 images were software-decoded from the ETC1 block texture data (actually "ETC1S", because all 4x4 pixel blocks use 5:5:5 base colors with no subblocks, so the differential color is 0,0,0).

Image  ETC1 (dB)  DXT1 (dB)
1      41.233     40.9
2      45.964     45.322
3      46.461     44.865
4      43.785     43.406
5      33.516     33.339
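The post doesn't say exactly how luma is computed. The sketch below assumes Rec. 601 weights (0.299, 0.587, 0.114) on 8-bit RGBA pixels and the usual PSNR definition, 10 * log10(255^2 / MSE). Other tools use different weights or round luma to integers, which shifts results slightly. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Rec. 601 luma of one RGBA8 pixel (assumed weights; alpha is ignored).
inline double luma(const uint8_t* rgba) {
    return 0.299 * rgba[0] + 0.587 * rgba[1] + 0.114 * rgba[2];
}

// Luma PSNR in dB between two same-sized RGBA8 images stored as flat byte arrays.
double luma_psnr(const std::vector<uint8_t>& a, const std::vector<uint8_t>& b) {
    assert(a.size() == b.size() && !a.empty() && a.size() % 4 == 0);
    double sse = 0.0;
    for (size_t i = 0; i < a.size(); i += 4) {
        const double d = luma(&a[i]) - luma(&b[i]);
        sse += d * d;
    }
    const double mse = sse / static_cast<double>(a.size() / 4);
    if (mse == 0.0) return std::numeric_limits<double>::infinity();  // identical images
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}
```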

Universal GPU texture format: DXT5 support

November 23, 2017, 2:39 pm

Got grayscale ETC1 to DXT5A conversion working, using a small 32*8*3 entry table. This work is for DXT5 support in the universal texture format. Now that this is working I can proceed to finishing the full universal encoder.
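For reference, a DXT5A block (the alpha half of DXT5, essentially BC4) stores two 8-bit endpoints plus 3-bit selectors. The sketch below builds the resulting 8-entry palette for both of its modes. Integer truncation of the interpolated values is an assumption, since decoders differ in rounding. The 32*8*3 conversion table itself isn't described here, so it isn't sketched. The function name is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Palette of a DXT5A block from its two 8-bit endpoints.
// a0 > a1:  8 entries; 6 interpolated in sevenths between a0 and a1.
// a0 <= a1: 4 interpolated in fifths, plus explicit 0 and 255 as the last two.
std::array<uint8_t, 8> dxt5a_palette(uint8_t a0, uint8_t a1) {
    std::array<uint8_t, 8> p{};
    p[0] = a0;
    p[1] = a1;
    if (a0 > a1) {
        for (int i = 1; i <= 6; ++i)
            p[1 + i] = static_cast<uint8_t>(((7 - i) * a0 + i * a1) / 7);
    } else {
        for (int i = 1; i <= 4; ++i)
            p[1 + i] = static_cast<uint8_t>(((5 - i) * a0 + i * a1) / 5);
        p[6] = 0;
        p[7] = 255;
    }
    return p;
}
```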

The groundwork is laid out and it's all downhill from here now. My main worry now is the ETC1S->DXT1 lookup table's size, which is currently around 3-4MB. It can be quickly computed dynamically at startup or on the fly as needed, or it can be precomputed into the executable.

Note none of these images were created with my best ETC1 encoder. They use an early prototype from late 2016 that has so-so quality. The main point of these experiments is to prove that efficiently converting ETC1 data to DXT1/5 is practical and looks reasonable. The encoder is not aware of DXT5A transcoding, but it is aware of the ETC1S->DXT1 transcoding (which helps a lot).

All stats are dB vs. the original image. This image's subtle gradients are hard to handle; you can see this in the DXT1 version.

To those who argue that a universal GPU texture format that is based off ETC1/DXT1 isn't high quality enough: You would be amazed at the low quality levels teams use with crunch/Basis. This tech isn't about achieving highest texture quality. It's about enabling easy distribution of supercompressed GPU texture data. It's a "JPEG-like format for GPU texture data", usable on mobile or desktop.

Version                                             Image 1 (dB)  Image 2 (dB)
ETC1 near-optimal                                   48.903        51.141
ETC1S (universal format base image in ETC1 mode)    46.322        46.461
ETC1S->DXT1                                         45.664        44.865
ETC1S green channel converted to DXT5A              43.878        46.107

More universal GPU texture format examples

November 23, 2017, 6:37 pm

I've improved the quality of the ETC1S->DXT1 conversion process. All of these images come from the same exact ETC1 data. Only a straightforward transform is required on the compressed texture bits to derive the DXT1/DXT5A version. It's simple/fast enough to do in a JavaScript transcoder.

[Seven example images, each shown in ETC1, DXT1 and DXT5A versions.]

Universal GPU texture codec update

November 24, 2017, 11:02 pm

I've reduced the size of the ETC1->DXT1 lookup table to around 85KB, vs. the previous 3.75MB. There's a slight loss in quality (around 0.1-0.3 dB), but it's worth it. The larger table can still be used. The worst artifacts occur on very high contrast blocks. The size of this table is a baseline tax (especially on web) of using this codec, so it must be lightweight.

The previous conversion table was 4D, one dimension for each component of the ETC1 base color (5:5:5 bits) and a final dimension for the intensity value (3 bits). The new method is 2D: one dimension for the 5-bit component, and another for the intensity. There are two tables, one for R/B and another for G, because in DXT1 G is 6 bits and R/B are 5. There are some additional complexities, but that's the gist of it. The transcoder has to do a tiny bit of per-block work in this scheme to determine how to map the ETC1 selectors to DXT1 selectors, but it all boils down to some table lookups and adds.

The 85KB table can be precomputed, computed on the fly, or computed once at init.
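A sketch of how a 2D per-channel table like this could be computed once at init. This is only a guess at the gist, not the actual Basis table: it assumes one fixed selector mapping (the i-th smallest ETC1 value maps to the i-th smallest DXT1 palette entry) and brute-forces the best DXT1 endpoint pair for each entry, ignoring the "additional complexities" and the per-block selector work mentioned above. All function names are hypothetical.

```python
# Hypothetical sketch of a 2D, per-channel ETC1(S) -> DXT1 lookup table.
# Not the actual Basis table: assumes a single fixed selector mapping and
# brute-forces the best DXT1 endpoint pair for every table entry.

# ETC1 intensity modifier tables (small, large), from the ETC1 specification.
ETC1_MODIFIERS = [
    (2, 8), (5, 17), (9, 29), (13, 42),
    (18, 60), (24, 80), (33, 106), (47, 183),
]


def expand(value, bits):
    """Expand a 5- or 6-bit component to 8 bits by bit replication."""
    return (value << (8 - bits)) | (value >> (2 * bits - 8))


def etc1_values(base5, table):
    """The 4 values one ETC1S channel can take, in ascending order."""
    small, large = ETC1_MODIFIERS[table]
    base = expand(base5, 5)
    return [min(255, max(0, base + m)) for m in (-large, -small, small, large)]


def dxt1_palette(lo, hi, bits):
    """One channel of a DXT1 4-color palette, ascending when lo <= hi.
    Uses plain integer interpolation; real GPUs round slightly differently."""
    a, b = expand(lo, bits), expand(hi, bits)
    return [a, (2 * a + b) // 3, (a + 2 * b) // 3, b]


def build_table(dxt_bits):
    """table[base5][intensity] -> (lo, hi, squared_error) for one channel.

    Assumed selector mapping: the i-th smallest ETC1 value maps to the
    i-th smallest DXT1 palette entry.
    """
    table = []
    for base5 in range(32):
        row = []
        for intensity in range(8):
            target = etc1_values(base5, intensity)
            best = None
            for lo in range(1 << dxt_bits):
                for hi in range(lo, 1 << dxt_bits):
                    palette = dxt1_palette(lo, hi, dxt_bits)
                    err = sum((p - t) ** 2 for p, t in zip(palette, target))
                    if best is None or err < best[2]:
                        best = (lo, hi, err)
            row.append(best)
        table.append(row)
    return table
```

In this sketch `build_table(5)` would serve R/B and `build_table(6)` would serve G. Each is only 32 x 8 entries, versus 32 x 32 x 32 x 8 = 262,144 entries for a 4D table indexed by the full 5:5:5 base color plus intensity.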

Original:

ETC1 near-optimal:

ETC1S (the universal texture):

DXT1:

DXT5A:

↧

10 abusive company types

February 4, 2018, 8:44 pm

These categories were originally about abusive men, but my friend Stephanie noticed these categories could be adapted to describe abusive companies, too. From the book "Why Does He Do That?":

1. Drill Sergeant: Micromanages you, wants to control everything.

2. Mr. Sensitive: Builds up a public image of being a great company so people think you're crazy if you criticize them.

3. The Water Torturer: Is an expert at not doing anything OBVIOUSLY wrong; you feel wronged but can't pinpoint why, and wonder if you're crazy.

4. The Demand Man (or Company): Everything seems fine if you never ask for anything, like a raise. If you do that, you're suddenly painted as ungrateful and treated poorly.

5. Mr. Right: Everything is fine so long as you don't question the company's actions or say anything critical about them.

6. The Player: Never lets you feel like the job is stable. Acts interested in you only to hook you in, then you're neglected and treated poorly again.

7. Rambo: Treats everyone like shit, but tells you you're special and an exception.

8. The Victim: You caused the company so much trouble, you really messed up that one time, any mistreatment happening to you now is making up for that.

9. The Terrorist: Reminds you of the power they have to ruin your career or life, so you better not go against them.

10. Bipolar: The company oscillates between being angry and then happy with you depending on the state of your current project. They become angry when a problem is identified, and when you fix it they are temporarily happy.

↧

Lessons learned while developing Age of Empires 1 Definitive Edition

February 14, 2018, 5:33 pm

In late 2016 I began helping Forgotten Empires on Age 1 DE, a UWP app shipping in the Windows Store on Feb 20th. I only helped occasionally for the first couple months or so (because I was working on Basis and an aerospace project), but as the title got closer to shipping I spent more and more of my time working on Age problems. We started with the original 20-year-old Age 1 codebase. Here are some of the things I've learned:

1. Get networking and multiplayer working early.
DE supports both traditional peer to peer (with optional TURN server relaying to handle problematic NAT routers) and a new client-server-like mode ("host command forwarding"), where all clients send their commands to the host, which then forwards them to the other clients. Age 1 uses a lockstep simulation model, except for most AI code, which is only executed on the host (see here).

Do not underestimate the complexity of lockstep peer to peer RTS multiplayer games. If possible, choose an already debugged/shipped low-level networking library so you can focus on higher-level game-specific networking problems.

If you do use an off-the-shelf network library, test it thoroughly to build a mental model of how it actually works (vs. how you think it works or how it's supposed to work). Develop a test app you can send to the library developers to reproduce problems. If the library supports reliable in-order messaging then (at a minimum) put sequence numbers in all of your packets and assert if the library drops, reorders or duplicates packets, in case there are bugs in the reliable layer.
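A minimal sketch of that receiver-side sequence check, in Python for brevity. The class and method names are hypothetical, and a real implementation would use a fixed-width (e.g. 16- or 32-bit) counter with wraparound, which this sketch ignores. It raises `AssertionError` explicitly rather than using `assert`, so the check survives optimized builds.

```python
class SequenceChecker:
    """Receiver-side sanity check for a supposedly reliable, in-order channel.

    The sender stamps every message with an incrementing sequence number.
    The receiver verifies that each delivered message carries exactly the
    next expected number, so drops, reorders, and duplicates coming out of
    the library's reliable layer are caught immediately.
    """

    def __init__(self):
        self.expected = 0

    def on_receive(self, seq):
        if seq < self.expected:
            raise AssertionError(
                f"duplicate or late packet {seq}, expected {self.expected}")
        if seq > self.expected:
            raise AssertionError(
                f"gap: got {seq}, expected {self.expected} (dropped or reordered)")
        self.expected += 1
```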

For debugging purposes, make sure all timeouts can be increased by a factor of 10 or more. Debugging real-time network code in a debugger is sometimes impossible (breakpoints insert long pauses that trip timeouts on the other machines), so be prepared to do a lot of printf()-style debugging on multiple machines.

If you're taking an old codebase and changing it to use a new networking library or API, try (at first) to minimize the changes you make to the original code. No matter how ugly it is, the original code worked, was debugged, and shipped; don't underestimate the value of that.

If you develop your own reliable messaging system, develop a network simulator testbed (which simulates packet loss, etc.) to automate the validation of this layer whenever it's modified and always keep it working.
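A rough sketch of the kind of simulated link such a testbed could push packets through. The `LossyLink` class and its parameters are illustrative assumptions, not part of any real library. The key property is that a fixed seed makes every lossy run reproducible, so a failing test can be replayed exactly.

```python
import heapq
import random


class LossyLink:
    """Deterministic one-way link simulator for automated tests.

    Each packet sent at time `now` (milliseconds) is independently dropped,
    duplicated, and/or delayed by a base latency plus random jitter, so
    packets can also arrive out of order. A fixed seed makes runs repeatable.
    """

    def __init__(self, seed, drop=0.0, duplicate=0.0, latency_ms=50, jitter_ms=0):
        self.rng = random.Random(seed)
        self.drop = drop
        self.duplicate = duplicate
        self.latency_ms = latency_ms
        self.jitter_ms = jitter_ms
        self._queue = []    # heap of (arrival_time, tiebreak, packet)
        self._counter = 0   # tiebreak keeps equal-time arrivals in send order

    def _schedule(self, packet, now):
        arrival = now + self.latency_ms + self.rng.uniform(0, self.jitter_ms)
        heapq.heappush(self._queue, (arrival, self._counter, packet))
        self._counter += 1

    def send(self, packet, now):
        if self.rng.random() < self.drop:
            return  # lost in transit
        self._schedule(packet, now)
        if self.rng.random() < self.duplicate:
            self._schedule(packet, now)  # the network delivered it twice

    def poll(self, now):
        """Return every packet whose arrival time is <= now, in arrival order."""
        out = []
        while self._queue and self._queue[0][0] <= now:
            out.append(heapq.heappop(self._queue)[2])
        return out
```

A reliable-layer test would then run sender and receiver through a pair of these links for thousands of simulated milliseconds and check that every message arrives exactly once, in order.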

Trust nothing and verify everything at multiple levels. CRC your packets, CRC the uncompressed data if you use packet compression, use session nonces in your connection-oriented layer to validate connections, validate that your reliable layer is actually reliable, etc. Make sure the initial connection process is well defined and completely understood. Everything needs timeouts of some sort and when sending unreliable messages any packet can get lost. Gaffer on Games is a great guide to this domain of problems.
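One way to layer a few of those checks, sketched with Python's standard `zlib` and `struct` modules. The header layout (session nonce, CRC32 of the compressed body, CRC32 of the uncompressed data, body length) is an assumption for illustration, not Age DE's wire format. The 16-bit length field limits bodies to 65535 bytes.

```python
import struct
import zlib

# Assumed header: session nonce, CRC32 of compressed body,
# CRC32 of uncompressed data, compressed body length.
HEADER = struct.Struct("<IIIH")


def frame_packet(session_nonce, data):
    """Compress data and prefix it with a header that lets the receiver verify it."""
    body = zlib.compress(data)
    header = HEADER.pack(session_nonce, zlib.crc32(body), zlib.crc32(data), len(body))
    return header + body


def parse_packet(expected_nonce, packet):
    """Return the original data, or None if the packet is truncated, corrupted,
    or belongs to a different (e.g. stale) session."""
    if len(packet) < HEADER.size:
        return None
    nonce, body_crc, data_crc, length = HEADER.unpack_from(packet)
    body = packet[HEADER.size:]
    if nonce != expected_nonce or length != len(body) or zlib.crc32(body) != body_crc:
        return None
    try:
        data = zlib.decompress(body)
    except zlib.error:
        return None
    if zlib.crc32(data) != data_crc:
        return None  # catches bugs in the compression path itself
    return data
```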

Getting the game to run smoothly with X random machines across a variety of network conditions is difficult. Plan on spending a lot of time tuning the system which controls the game's turntime (command latency and sim tick rate). There are multiple sources of MP hitches (which players hate): turntime too low (so one or more machines can't keep up with the faster ones), random CPU spikes caused by AI/pathing/etc., reliable messaging retransmit delays, random client latency spikes, AIs sending too much command data, etc. Develop strong tools to track these problems down when they occur in the field and not in your test lab.

Add cheat commands to the game to help simulate a wide range of various networking and framerate conditions.

If you send unreliable ping/pong packets to measure roundtrip client latency, filter the results because some routers are quite noisy. The statistics that go into computing the sim tick rate and turntimes should be well filtered.
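A minimal ping filter sketch. The 8-sample window echoes the roughly 8 pings mentioned in the latency matrix post; the median-then-exponential-moving-average combination and the `alpha` value are assumptions, just one reasonable way to reject router noise while still tracking real latency changes.

```python
from collections import deque
from statistics import median


class PingFilter:
    """Smooths noisy round-trip time samples (milliseconds).

    A sliding-window median rejects isolated spikes from noisy routers,
    and an exponential moving average then low-pass filters the result.
    """

    def __init__(self, window=8, alpha=0.25):
        self.samples = deque(maxlen=window)
        self.alpha = alpha
        self.value = None

    def add(self, rtt_ms):
        self.samples.append(rtt_ms)
        m = median(self.samples)
        if self.value is None:
            self.value = m
        else:
            self.value += self.alpha * (m - self.value)
        return self.value
```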

Establishing the initial connections between two random machines behind NATs is still a challenging problem; test this early.

Identify your most important packets and consider adding some form of forward error correction to them to help insulate the system from packet loss. In lockstep designs like Age, the ALL_DONE packets sent by each client to every other client to indicate end of turn are the most important, and they are currently sent twice for redundancy. (Excluding AIs, there are no commands from the player on most turns!)

Internal testing doesn't mean much. You must have MP betas to discover the real problems. It seems virtually impossible to simulate network conditions as they occur in the wild, or the game running on customer machines. Make sure you get valuable test data back from MP betas to help diagnose problems.

Age DE's reliable messaging system is based on Brownlow's "A Reliable Messaging Protocol" in GPG 5. This is an elegant and simple NACK-based reliable protocol, except the retransmit method described in the article is not powerful enough and is sensitive to network latency (supporting only 1 packet retransmit request per roundtrip). We had to modify the system to support retransmit packets containing 64-bit bitmasks indicating which specific packets needed to be resent.
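The bitmask idea, sketched in Python. The wire layout (a 32-bit base sequence number plus a 64-bit mask where bit i means packet base + i is missing) and the function names are assumptions; Brownlow's protocol and DE's actual encoding may differ. The point is that one request can name up to 64 missing packets instead of one per round trip.

```python
import struct

# Assumed wire layout: 32-bit base sequence number, 64-bit missing-packet mask.
NACK = struct.Struct("<IQ")


def build_nack(base_seq, highest_seq, received):
    """Receiver side: request every missing packet in base_seq..highest_seq.

    Only the first 64 sequence numbers from base_seq fit in one request;
    bit i is set when packet base_seq + i hasn't arrived yet.
    """
    mask = 0
    for i in range(min(64, highest_seq - base_seq + 1)):
        if base_seq + i not in received:
            mask |= 1 << i
    return NACK.pack(base_seq, mask)


def parse_nack(data):
    """Sender side: expand a retransmit request into sequence numbers to resend."""
    base_seq, mask = NACK.unpack(data)
    return [base_seq + i for i in range(64) if (mask >> i) & 1]
```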

2. Develop strong out of sync (OOS) detection tools early, and learn how to use them.
As a lockstep RTS codebase is modified you will introduce many mysterious and horrifying OOS problems. Don't let them smolder in the codebase, fix them early and fix new ones ASAP.

Functions which are not safe to use in the lockstep simulation should be marked as such. We had an accessor function which returned true if the entire map was visible, which was accidentally used in some code to determine if a building could be placed at a location. This caused OOS's whenever the user resigned (which locally exposes the entire map) and another client built walls. This little OOS took 2 days to track down.

If you are getting mysterious OOS's, you need to identify the initial cause of divergence and fix that, then repeat the OOS debugging process until no more divergences remain. Don't waste time looking at downstream effects (such as out of sync random number generators) - identify and fix that first divergence.
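A sketch of the core of such a tool: given the sim logs from two machines, report only the first record where they disagree. The `(turn, description)` record format is hypothetical; in practice the records would be whatever the lockstep sim already logs, including object-creation "origin" tags like the ones described below.

```python
from itertools import zip_longest


def first_divergence(log_a, log_b):
    """Compare two machines' sim logs record by record.

    Each log is a list of (turn, description) records in simulation order.
    Returns (index, record_a, record_b) for the first mismatch, or None if
    the logs are identical. A missing record (one log is shorter) shows up
    as None. Everything after this point is a downstream effect.
    """
    for i, (a, b) in enumerate(zip_longest(log_a, log_b)):
        if a != b:
            return i, a, b
    return None
```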

In Age, the original developers logged virtually everything they could in the lockstep sim. Some important events (such as where objects were being created) were left out, so we had to add unique "origin" parameters to all object creations so we knew where in the code objects were being created.

3. Do not underestimate the complexity and depth of UWP and Xbox Live development.
Your team will need at least 1-2 developers who live and breathe these platforms. These individuals are rare so you'll just have to bite the bullet and make an investment into these technologies.

4. Develop clean and defensive coding practices early on. Use static analysis, use debug heaps, pay attention to warnings, etc. Being sloppy here will increase your OOS rate and cause player and developer pain. Be smart and use every tool at your disposal.

5. Do not disable or break "old" logging code. Make sure it always compiles.
This logging code is invaluable for tracking down mysterious/rare problems and OOS's. The original developers put all this logging code in there for a reason.

6. Add debug primitives if the engine doesn't have any
This is a basic quality of life thing: You need the ability to efficiently render 2D text, debug primitives in the world, etc. If the engine doesn't support them then get them in early.

7. Profile early and make major engine architectural decisions based off actual performance metrics.
If your new renderer design relies on a specific way of rendering the game in a non-mainstream manner, then verify that your design will actually work in a prototype before betting the farm on it. Be willing to pivot to an alternate renderer design with better performance if your initial design is too slow.

Get perf. up early: Lockstep RTS multiplayer games can only tick the simulation at the rate of the slowest machine in the game. So if one machine is a dog and can only handle 20Hz, the game will feel choppy for everyone. Other major sources of perf problems like pathing or AI spikes will be obscured if rendering is running slow.

8. Figure out early on how to split up a single-threaded engine to be multithreaded.
Constraining an RTS to live on only a single thread is a recipe for performance disaster, especially if you are massively increasing the max map size and pop caps vs. the original title.

9. Many RTS systems rely on emergent behavior and are interdependent.
If you modify one of these systems, you MUST test the hell out of it before committing, and then be prepared to deal with the unpredictable downstream effects.

For example, modifying the movement code in subtle ways can break the AI, or cause it to behave suboptimally. The movement code in Age1 DE is like Starcraft's: an unholy mess. To be successful modifying code like this you must deeply understand the game and the entire system's emergent behavior.

Carelessly hacking the movement or path finding code in an RTS is akin to hacking the kernel in an OS: expect chaos.

10. Automated regression testing
The more you automate and objectify testing of movement, AI, etc. the happier your life will be and the easier you will sleep at night.

11. Playtest constantly and with enough variety
It's not enough to just play against AIs on the same map over and over. Vary it up to exercise different codepaths. You MUST playtest constantly to understand the true state of the title.

12. Assume the original developers knew what they were doing.
The old code shipped and was successful. If you don't understand it, most likely the problem is you, not the code.

For example, Age 1's original movement system has some weird code to accelerate objects as they moved downhill. This code didn't have a max velocity cap, so on very long hills units could move very quickly. We resisted modifying this code because it turns out it's a subtle but important aspect of combat on hills.

13. Don't waste time developing new templated containers and switching the engine to use them, but do reformat and clean up the old code.
Nobody will have the time to figure out your fancy new custom container classes; they'll just use the std containers because everyone already knows how they work.

Instead, spend that time making the old code readable so it can be enhanced without the developers going crazy trying to understand it: fix its formatting, add "m_" prefixes, etc.

↧

Age DE's latency matrix

February 21, 2018, 8:28 pm

Getting the netcode to work reliably in a peer to peer multiplayer title is tricky. Every peer must be able to quickly and reliably send and receive packets with every other peer, or the whole thing falls apart. Also, if any machine runs a turn slower than expected for any reason, the entire system will hitch while waiting for the slow peer to catch up. DE constantly monitors the pings and framerates of all connections and machines in a MP game, but it can only compensate so much for bad connections.

In DE's game lobby there's a 2D matrix of blocks that shows the systemwide roundtrip latencies between all players (circled in red):

Once you're in a lobby, the game's peer to peer multiplayer code (parts of which date back to the original game) is active, and your machine is actively communicating with all the other machines. Every 4 seconds your machine pings all the other clients, the results are sent to the host, and every few seconds the host then sends the entire matrix to all peers.

For each row of this matrix, the latency to all the other players is visualized. So the first block on row 2 represents the latency from player 2 to player 1, and the third block on row 2 is the latency from player 2 to player 3, etc. Grey means no response (yet), green is <200ms ping, yellow is <=400ms, and red is >400ms. The game won't start if there are any grey blocks (even in "dedicated server" mode). The ping matrix is not necessarily symmetrical, but usually is.
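The color rules above translate directly into code. A sketch, assuming `matrix[i][j]` holds the filtered round trip in milliseconds from player i+1 to player j+1 (or `None` if there's no response yet) and that the diagonal is ignored; the function names are hypothetical.

```python
def ping_color(rtt_ms):
    """Map one matrix cell to its lobby color (None means no response yet)."""
    if rtt_ms is None:
        return "grey"
    if rtt_ms < 200:
        return "green"
    if rtt_ms <= 400:
        return "yellow"
    return "red"


def can_start(matrix):
    """The game won't start while any off-diagonal cell is still grey."""
    n = len(matrix)
    return all(matrix[i][j] is not None for i in range(n) for j in range(n) if i != j)


def render(matrix):
    """Color every cell; the diagonal (a player to itself) is shown as '-'."""
    return [[ping_color(c) if i != j else "-" for j, c in enumerate(row)]
            for i, row in enumerate(matrix)]
```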

If a block has a thin blue rectangle around it, that means that client has to use a TURN server relay to get its packets to the other client due to NAT traversal issues. This means extra overhead.

The latencies visualized here are low pass filtered over approx. 8 pings.

The "Ping" column shows the local roundtrip latencies to the other clients. Each player will have its own unique column of values. Apart from maybe the host, I think this column is kind of useless, because it's only showing local latencies. It would have been better if it displayed each player's worst latency.

This matrix is used to compute the turntimes used during the actual game. I believe all peer to peer titles should display something like this, to help players quickly see at a glance how healthy the connections are between peers.
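The post doesn't say how the turntime is derived from the matrix, so the following is only a hypothetical rule of thumb, not DE's formula: make each turn long enough to cover the worst round trip between any two peers and the slowest machine's frame time, plus a safety margin. `frames_per_turn`, `margin`, and `floor_ms` are made-up parameters.

```python
def worst_rtt_ms(matrix):
    """Largest filtered round-trip latency between any two distinct peers."""
    n = len(matrix)
    return max(matrix[i][j] for i in range(n) for j in range(n) if i != j)


def turn_time_ms(matrix, slowest_frame_ms, frames_per_turn=4, margin=1.2, floor_ms=100.0):
    """Hypothetical turntime rule (not DE's actual formula).

    A turn must be long enough for every peer's commands to reach every
    other peer (approximated conservatively by the worst round trip) and
    for the slowest machine to run its frames for that turn.
    """
    network_ms = worst_rtt_ms(matrix)
    cpu_ms = slowest_frame_ms * frames_per_turn
    return max(floor_ms, margin * max(network_ms, cpu_ms))
```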

↧

On Age DE's pathing/movement

February 28, 2018, 7:27 pm

Typical Age DE forum post:

The pathfinding in the game is terrible.

First off, Age of Empires Definitive Edition is a remaster of Age of Empires. It's not a rewrite, it's not a new engine, and that's what we've been saying for almost a year. Age of Empires 1's path finding was really bad, as most reviews point out:

https://gamespot.com/reviews/age-of-empires-the-rise-of-rome-review/1900-2532811/

This is the code we started with in DE. Not Age 2, not new code, but the original code which had a ton of flaws and quirks we had to learn about the hard way. This code was almost a quarter of a century old, and it showed. The original movement/pathing code was very weak to say the least. Yet entire complex systems above it (combat, AI, etc.) depended on this super quirky movement/pathing code. It took multiple Ensemble engineers several years of development to go from Age 1 to Age 2 level pathing.

We made a number of improvements to the path finding and unit movement code without breaking the original system. It must be emphasized that Age1's pather and movement code is extremely tricky and hard to change without breaking a hundred things about the game or AI (sometimes in subtle ways). It was a very tricky balance. The current system still has problems with chokepoints, which can be fixed with more work, but we instead had to focus on multiplayer which had to basically be 85% rewritten.

Here's a list of fixes made so far to DE's pathing and movement code in the time I had, which was only about 2 months:

DE's pathing system's findPath() function is approximately 3-4x faster than Age1's.
I performed around a dozen separate optimization passes on the core pather. I implemented the A* early exploration optimization (eliminating 1 open list insertion/removal per iteration), and massively tuned the C++ code to generate reasonably efficient x64 assembly. We retested the pather and game thoroughly after each major optimization pass.
Age1's pather's A* implementation was outright broken (the open list management was flawed, so the cheapest node wasn't always expanded upon during each iteration). DE's pather fixes all these bugs and is a proper implementation of A*.
DE's pather gives up if after many thousands of iterations it can't make forward progress towards the goal, to avoid spending CPU cycles on hopeless pathing unnecessarily. (It's more complex than this, but that's the gist of it.)
Added multiple lane support to villager pathing.
Villagers can use one of two collision sizes (either small or large), so if a villager bumps into another friendly villager we can immediately switch to the smaller radius to avoid stopping. So basically, villagers can get very close to each other, avoiding gathering slowdowns.
Movement of units through single-tile openings was greatly improved and tested with all unit types. Age 1's handling of single tile openings was so bad that players would exploit it:
http://artho.com/age/placeb.html
The DE pather was modified to have a much higher max iteration count than Age1's, so longer and more complex routes can be found.
The per-turn pathing cap in Age1 was replaced by short and long range pathing categories in DE. Up to 8 short range paths can be found per turn, and up to 4 long range findPath() calls per turn.
For short range paths, straight line paths are preferred vs. the tile path returned by findPath() if the straight line path is safe to traverse.
Boat movement was modified to have deceleration.
Waypoints along a path can be skipped if a unit can safely move from its current position to the next waypoint.
Added support for 32 facing angles vs. Age's original 8. Also, the unit direction/facing angle is interpolated in DE, instead of being snapped as in Age1. The interpolation is purposely disabled when units switch angles during combat.
Added stuck unit detection logic to DE's movement code, to automatically detect and fix permanently stuck units (rare, but possible).
We ported Age2's entire obstruction manager into DE, replacing the old bitmap system. Units use circular obstructions, and buildings use square obstructions.
Added several new behaviors to the movement code to help with chokepoints: A "wait" behavior, that checks every second or so for up to 45 seconds to see if the unit can be moved to the destination, and a stuck unit "watchdog", which watches to see if a unit hasn't made forward progress and tries to switch behaviors to get the unit unstuck. Age1's code would just give up at the slightest problem.
The pather tries to path starting from the center of each tile, but this sometimes fails in tight spaces or with lots of units around. DE tries harder to find a good starting position, so movement through single tile openings isn't broken.
Path caching system: Villagers and boats can reuse previously found paths in DE, for efficiency.
In situations that Age1's pather would just outright give up and stop, DE's pather tries a lot harder to get the unit where it needs to go using several randomized fallback behaviors.
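For reference, here's a minimal grid A* with a correctly managed open list (a binary heap, so the cheapest node is always expanded next) and a simple iteration cap. This is a generic textbook sketch, not DE's pather: DE's give-up logic is based on lack of forward progress and is more involved, and the early exploration optimization isn't shown. The grid encoding and the `max_iterations` default are assumptions.

```python
import heapq


def find_path(grid, start, goal, max_iterations=20000):
    """A* on a 4-connected tile grid; grid[y][x] == 1 means blocked.

    Nodes are (x, y) tuples. The open list is a binary heap keyed on
    f = g + h, so the cheapest node is always expanded next; stale heap
    entries are skipped via best_g. Returns the tile path from start to
    goal, or None if the goal is unreachable or the iteration cap is hit.
    """
    def h(p):
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan distance

    height, width = len(grid), len(grid[0])
    open_heap = [(h(start), 0, start)]
    best_g = {start: 0}
    came_from = {}
    iterations = 0
    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if g > best_g[node]:
            continue  # stale entry superseded by a cheaper path
        if node == goal:
            path = [node]
            while node in came_from:
                node = came_from[node]
                path.append(node)
            return path[::-1]
        iterations += 1
        if iterations > max_iterations:
            return None  # give up rather than burn CPU every turn
        x, y = node
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height and grid[ny][nx] == 0:
                ng = g + 1
                if ng < best_g.get((nx, ny), float("inf")):
                    best_g[(nx, ny)] = ng
                    came_from[(nx, ny)] = node
                    heapq.heappush(open_heap, (ng + h((nx, ny)), ng, (nx, ny)))
    return None  # goal unreachable
```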

Age1's pathing/movement systems implement a form of randomized, emergent behavior. The units are basically like dumb ants. It's imperfect in chokepoints, but it's a continuation of the essence of what made Age 1 what it was. If all the units are moving in the same direction it can usually handle chokepoints (I tested this over and over with a wide variety of units on a pathing torture test scenario from MS before release). The fundamental behaviors the AI and combat systems expected were accurately preserved in DE's pather, which was our goal.

Instead of people saying "the pathing in DE sucks!", I would much rather hear about the specific issues with movement/pathing, and actually constructive suggestions on how to improve the system without breaking the game or turning it into Age 2.

↧

Basis v1.11 with universal GPU texture support has shipped

March 26, 2018, 9:52 pm

We've sent drops to two companies so far. This is the first version that supports fast block-level transcoding of .basis files to multiple formats: ETC1 (mobile) or BC1-5 (desktop). This is a major milestone for us, because Basis is the first system available to support efficient platform independent distribution of highly compressed GPU texture data. We've been working up to this release for over a year.

For some encoded example images created during development, see this, this, or this post.

You encode your textures/images a single time, store a single set of .basis files (which are approximately the size of JPEG files), download the file on the remote device, and then transcode to the format you need for that device. Our transcoder converts the block-level data to DXT or ETC format GPU texture bits on the fly. The encoder is aware of all the formats and balances the quality levels of each.

.basis files consist of one or more 2D texture "slices", where each slice can be any dimension. Slices can be mipmap levels, tiles, cubemap faces, video frames, etc. - whatever you want.

We think the primary use case for .basis files is web apps of various types, or any kind of app that needs to distribute GPU texture data across a wide range of GPU devices. We've tested this solution on normal maps, diffuse maps, gloss maps, satellite photos, photographs, grayscale images, flight navigation maps, etc.

Anyhow, here's what you get with Basis:

bin, bin_linux, bin_osx: DLLs, .so and .dylib files containing the precompiled encoder library (which is closed source) and several command line tools. The main tools are basiscomp (our new .basis file encoder, and our RDO ETC1 compressor) and rdodxt (uses our new RDO BC1-5 encoders that are around 10-25% better than crunch's).
basisexample: Shows how to use the encoder DLL to encode .basis or RDO .KTX files.
inc: Transcoder library source code/headers.
lib: static import library for encoder DLL
transcoding_demo: Sample that uses the included transcoder library (provided in source code form in the
rdodxt: Sample that uses the encoder DLL to do RDO BC1-5 compression

I've tried to keep the few APIs in the product as simple as possible, so not much documentation is needed for them. The readme file covers them. Encoding involves filling out a struct and calling a single C function in the DLL. Transcoding slices is similar, except you use a couple of simple methods in inc/basis_decoder.h.

↧

Basis update - now with PVRTC support!

March 30, 2018, 12:34 am

Basis (our new GPU texture compression product and the successor to our popular open source crunch lib) now supports PVRTC1, along with ETC1 and BC1-5 (DXTc). This means a .basis file can be utilized on pretty much every GPU in the universe that matters, independent of platform or API. A .basis file is conceptually like JPEG but for GPU texture data, and can be used on the web (using Emscripten and WebGL) or by native apps (using a small C++ transcoder library).

All textures are 1024x1024 (due to PVRTC1 limitations).

Each image below was transcoded directly to each GPU format from the .basis file, and then converted to 24bpp .PNG. On my desktop, ETC1 is fastest (~3ms), followed by BC1/4 (~7.9ms), then PVRTC (~37ms). The transcoders (particularly PVRTC) are not yet fully optimized, and are written in straightforward C++.

The PVRTC transcoder really needs SIMD optimizations, which should give it a nice speed boost (probably around 2-3x). It would be trivial to thread the PVRTC transcoder too. The PVRTC transcoder's quality is visually somewhere in between PVRTexTool's "Lower Quality" and "Good" settings. In many cases, it looks a little better than "Good", but it's a tossup.

Note that BC3 and BC5 formats are supported by calling the transcoder twice on different input image slices. So an RGBA GPU texture is encoded into two slices (sharing the same codebooks) in a single .basis file, and it transcodes to either two ETC1 textures, an ETC1 texture twice as high, or a single BC5 texture. PVRTC2 and ETC2 support will be very easy and transcode times will be comparable to ETC1 or BC1 (PVRTC1 will always be the most expensive). The PVRTC transcoder doesn't support alpha yet (it's next).

Image: laststarfighter_1024.basis, 133966 bytes, 1.022 bits/pixel

Original:

.basis->ETC1:

.basis->BC1:

.basis->BC4:

.basis->PVRTC:

Image: map_1024.basis, 180603 bytes, 1.38 bits/pixel

Original:

.basis->ETC1:

.basis->BC1:

.basis->BC4:

.basis->PVRTC:

Image: delorean_1024.png, 138894 bytes, 1.06 bits/pixel

Original:

.basis->ETC1:

.basis->BC1:

.basis->BC4:

.basis->PVRTC:
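The bits/pixel figures above are simply the .basis file size in bits divided by the texel count. A quick check for the three 1024x1024 images:

```python
def bits_per_pixel(file_bytes, width, height):
    """Compressed size in bits divided by the number of texels."""
    return file_bytes * 8 / (width * height)


for name, size in [("laststarfighter_1024", 133966),
                   ("map_1024", 180603),
                   ("delorean_1024", 138894)]:
    print(f"{name}: {bits_per_pixel(size, 1024, 1024):.3f} bpp")
```

For comparison, ETC1, BC1, and PVRTC1 4bpp blocks themselves all cost 4 bits/pixel, so these files are roughly 3-4x smaller than the raw GPU data they transcode to.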

↧

Basis GPU format support update

April 1, 2018, 10:09 pm

Our goal is to support all the GPU formats (literally). Here's an update on our format support:

We just added PVRTC1 4bpp and BC7 support. PVRTC1 quality is approximately equal to PVRTexTool's middle setting ("good"), and significantly better than its lower two settings. Max quality in BC7 mode is currently limited to BC1/ETC1-grade quality levels (what we're calling "baseline" quality).

We've devised several ways of improving the max quality to near-BC7 grade by storing extra data in the .basis file. (You can't get something for nothing!) This high quality data would be optional, so users that don't care about super high quality levels can disable it and the codec will just transcode the baseline data to BC7 instead.

Here's what we support transcoding .basis into right now, in order of transcoding speed from fastest to slowest:

ETC1
BC1
BC3-5
BC7: RGB
PVRTC1 4bpp RGB
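Since BC7 currently transcodes at baseline (BC1/ETC1-grade) quality, a client whose GPU supports several of these formats for an opaque texture might simply pick the fastest one. A hypothetical helper; the format name strings are made up for illustration and are not Basis API identifiers.

```python
# Opaque RGB transcode targets from the list above, fastest first.
# Names are illustrative only, not identifiers from the Basis API.
OPAQUE_TARGETS_FASTEST_FIRST = ["ETC1", "BC1", "BC7_RGB", "PVRTC1_4BPP_RGB"]


def pick_target(device_formats):
    """Return the fastest-to-transcode format the device supports, or None."""
    for fmt in OPAQUE_TARGETS_FASTEST_FIRST:
        if fmt in device_formats:
            return fmt
    return None  # no supported compressed format; the app must fall back
```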

Here are the formats we're going to eventually support in order of importance (with no changes to the .basis format needed):

PVRTC1 4bpp RGBA
ETC2 RGBA
BC7: RGBA
PVRTC1 2bpp RGB/RGBA
ASTC RGB/RGBA

None of these formats require raw RGB/RGBA pixel processing during transcoding, i.e. we aren't just using real-time GPU format compressors here. Transcoding occurs at the level of GPU blocks, endpoints, and selector/modulation values.

At some point, we're going to boost quality above baseline, to better exploit BC7/ASTC. Most of our early users of this tech (who aren't building native game apps) are happy with baseline quality, so the priority of doing this is relatively low. (Games will probably want BC7/ASTC specific codecs anyway.) We are designing the .basis format with this eventual goal, so when we add "enhanced quality" support we won't break compatibility with older baseline-only transcoders.

We'll be posting benchmarks comparing .basis to crunch (and Unity's crunch) and releasing WebAssembly (or asm.js) demos within the upcoming weeks.

↧

Basis feature support

April 3, 2018, 9:24 pm

Here's what we support right now:

.basis universal format, which is transcodable to BC1-5, ETC1, PVRTC1 4bpp (currently opaque only), and BC7 (currently opaque only). Alpha support for PVRTC1/BC7 is coming, and enhanced quality for BC7 and ETC2 is on the way. This is a universal solution with two quality modes (baseline and BC7), so by its very nature it trades off max achievable quality for GPU format support.

For really small images (think icon-size), .basis can switch to using fixed selector codebooks to cut down on selector codebook overhead
Format supports arbitrary resolution texture arrays, all referring to a single set of compressed codebooks.

RDO BC1-5 - Creates more compressible files than crunch's (RDO in crunch was an afterthought and was pretty dumb/low quality), but slower compression. We put the most effort into optimizing BC1's output for LZ coding. Supports up to 32K entry codebooks (vs. crunch's 8K).
RDO ETC1 - Supports all ETC1 features, and very high quality levels (up to 32K entry codebooks), usable even on complex normal maps.
ETC1 intermediate format (supports all features of the ETC1 format, i.e. flips and both differential and individual colors). 10-20% smaller files at the same SSIM vs. Unity's crunch, the last time I checked.

All of these codecs have been utilized by customers for different purposes.

We don't support an intermediate file format exclusively for BC1-5, only ETC1. Instead, we're focusing on universal solutions first, and then we'll focus on an intermediate format solution for BC7, BC6H, and ASTC.

I get asked all the time how these solutions compare to crunch's. I'll be working on extensive benchmarks soon. I've learned a lot since I designed and wrote crunch in 2009.

↧

Imaginary GPU formats

April 4, 2018, 8:15 pm

Every once in a while I wonder about alternative GPU texture format encodings. (Why not? It's fun.) There must be a sweet spot somewhere along the continuum between BC1 and BC7. Something that is more complex than BC1 but simpler than BC7. (I somewhat dislike ASTC, mostly because of its insanely complex encoding format.)

Here's one idea for a 128-bit per 4x4 block format (8 bits/texel) that mashes together ETC1+BC7. One thing I learned from ETC1 is that a lot of bits can be saved by forcing each subset's principal axis to always lie along the intensity direction. With a strong encoder, this constraint isn't as bad as one would think.

The format only has two modes: opaque and transparent. The opaque mode has 3 subsets, and the transparent mode has 2 subsets for RGB and 1 subset for alpha. In the opaque mode, each color has 1 pbit shared by its three components, and both modes have 16 partitions for colors.

The opaque color encoding is "RGB PBit IntensityTable"; the transparent mode's colors drop the pbit. The intensity tables could be borrowed from ETC1 and expanded to 8 entries. For the transparent blocks, two 8-bit alpha values are specified (like BC4), and by borrowing degeneracy breaking from BC7 we can shave one bit from the alpha selectors. "CompRot" is a BC7-style component rotation, so any of the channels can be encoded into alpha.

Some things I like about this format: equal precision for all components, and there are only two simple modes. The opaque mode is powerful but simple: always 3 subsets, with color and selector precision better than BC1 and even better than BC7's 3 subset modes. The transparent mode is more powerful than BC3 for RGB (better color precision, and 2 subsets), but weaker for alpha (2 bit selectors vs. 3).

The main downside is that each subset's endpoints are constrained to lie along the intensity axis. I've seen commercial games ship with normal maps encoded into ETC1 and DXT1 so I know this isn't a total deal breaker.

Opaque block:
ModeBit 1
Partition 4
Color0 777 1 3
Color1 777 1 3
Color2 777 1 3

Color selectors:
3 3 3 3
3 3 3 3
3 3 3 3
3 3 3 3

Total bits: 128

Transparent block:
ModeBit 1
Partition 4
Color0 666 3
Color1 666 3
AlphaLoHi 8 8
CompRot 2

Color selectors:
2 2 2 2
2 2 2 2
2 2 2 2
2 2 2 2

Alpha selectors:
1 2 2 2
2 2 2 2
2 2 2 2
2 2 2 2

Total bits: 128
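
The 1-bit anchor in the alpha selector grid comes from BC7-style degeneracy breaking: if the first selector's MSB is set, swap the two alpha endpoints and invert every selector. Here's a minimal sketch. It assumes BC7's 2-bit interpolation weights (0, 21, 43, 64)/64 for the alpha palette, which the format doesn't specify, and all names are hypothetical.

```python
# Assumed 2-bit interpolation weights (BC7's, out of 64); the format only
# says the two 8-bit alpha values work "like BC4". Names are hypothetical.
ALPHA_WEIGHTS = (0, 21, 43, 64)


def alpha_palette(lo, hi):
    """Four alpha levels interpolated between the two 8-bit endpoints."""
    return [((64 - w) * lo + w * hi + 32) >> 6 for w in ALPHA_WEIGHTS]


def decode_alpha(lo, hi, selectors):
    palette = alpha_palette(lo, hi)
    return [palette[s] for s in selectors]


def break_degeneracy(lo, hi, selectors):
    """Make the anchor (first) selector's MSB zero, BC7 style.

    Swapping the endpoints and mapping every selector s -> 3 - s decodes to
    the same alphas, because ALPHA_WEIGHTS[3 - s] == 64 - ALPHA_WEIGHTS[s].
    """
    if selectors[0] & 2:
        lo, hi = hi, lo
        selectors = [3 - s for s in selectors]
    return lo, hi, list(selectors)


def pack_alpha_selectors(selectors):
    """Pack 16 selectors into 31 bits: 1 for the anchor, 2 for each of the rest."""
    if selectors[0] >= 2:
        raise ValueError("anchor selector MSB must be 0; call break_degeneracy() first")
    packed, nbits = selectors[0], 1
    for s in selectors[1:]:
        packed |= s << nbits
        nbits += 2
    return packed, nbits
```

Because the weights are symmetric, the swap is lossless, which is why the anchor's MSB never needs to be stored.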

A strong encoder would adaptively choose between opaque blocks and transparent blocks using various component rotations, to minimize overall error. Transparent blocks can be used even on all-opaque textures.
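
A skeleton of that per-block search might look like the sketch below. It assumes BC7's rotation convention (0 = none, 1/2/3 = swap alpha with R/G/B), and encode_opaque/encode_transparent are hypothetical callbacks, each returning an (error, payload) pair; the partition search would happen inside them.

```python
def rotate(rgba, rot):
    """BC7-style component rotation: 0 = none, 1/2/3 = swap alpha with R/G/B.

    The rotation is its own inverse, so the decoder undoes it by applying it again.
    """
    if rot == 0:
        return tuple(rgba)
    c = list(rgba)
    c[3], c[rot - 1] = c[rot - 1], c[3]
    return tuple(c)


def choose_block_mode(texels, encode_opaque, encode_transparent):
    """Return (mode, rotation, error, payload) for the lowest-error candidate.

    encode_opaque(texels) and encode_transparent(rotated_texels, rot) are
    hypothetical encoder callbacks returning (error, payload). The rotation is
    passed along so the transparent encoder can measure error in the original
    channel order.
    """
    candidates = [("opaque", 0) + tuple(encode_opaque(texels))]
    for rot in range(4):
        rotated = [rotate(t, rot) for t in texels]
        candidates.append(("transparent", rot) + tuple(encode_transparent(rotated, rot)))
    return min(candidates, key=lambda c: c[2])
```

On an all-opaque texture, rotations 1 to 3 move R, G or B into the alpha slot, where it gets its own single-subset endpoints.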

I have no idea if this format is useful. On a rainy day I'll make a simple encoder and compare it against BC1 and BC7.

↧

BC7 encoding using weighted YCbCr colorspace metrics

April 17, 2018, 10:06 am

I've written my second BC7 block encoder. My first was written in a straightforward way to gain experience with the format. My second was more focused on competing against the Fast ISPC Texture Compressor, but without using any SIMD, and was over 30x faster than my first attempt.

The BC7 encoders I've studied seem to be hyper-focused on RGB PSNR, which is just the wrong metric for many types of textures. Encoder authors who treat input textures as opaque arrays of 4x4 vectors are at a disadvantage in this domain. RGB PSNR tends to spread the error equally between the channels, which isn't what we want on sRGB textures. Instead, it's desirable to trade off a small amount of additional R/B error for less G error. This is what perceptual codecs like JPEG do: they transform the input into YCbCr space, then downsample and quantize the hell out of the CbCr coefficients, because the eye is much less sensitive to chroma detail than to luma detail, so preserving chroma precisely is a waste of bits.

Many BC1 block compression codecs support weighted RGB metrics because in BC1, not using them visibly hurts quality on sRGB photos, albedo textures, etc. Encoders using perceptual metrics look better on color gradients and on highly saturated blocks. Heavy use of perceptual metrics dates back to at least NVidia's original nvdxt compressor, and it wasn't possible for crunch to compete against nvdxt without supporting perceptual metrics. The squish library recommends using perceptual metrics by default, because BC1 without them looks worse.
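
A minimal sketch of a weighted RGB squared-error metric. The weights are the REC 709 luma coefficients scaled to integers, chosen purely for illustration; they aren't necessarily the weights nvdxt, squish or crunch actually use.

```python
# REC 709 luma coefficients scaled to integers. Used here only as an example of
# "perceptual" RGB weights; real encoders may use different values.
REC709_WEIGHTS = (2126, 7152, 722)


def weighted_rgb_error(a, b, weights=(1, 1, 1)):
    """Per-channel squared RGB differences, each multiplied by a channel weight."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))
```

For the texel (100, 100, 100), uniform weights prefer the reconstruction (100, 96, 100) (error 16) over (104, 100, 104) (error 32). With REC709_WEIGHTS the choice flips (114432 vs. 45568), pushing the error out of G and into R/B.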

Anyhow, etc2comp by John Brooks takes things a step further and supports computing error metrics in weighted YCbCr space. In my experience writing Basis, this looks better than vanilla weighted RGB metrics (especially with ETC1). I'm currently using weights (128,64,16).
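
A floating-point sketch of a weighted YCbCr squared-error metric. It uses REC 709 luma, plain R-Y and B-Y color differences for chroma, and applies the (128,64,16) weights to (Y, Cr, Cb) in that order. The chroma definition and the weight order are assumptions, since only the three numbers are given above, and etc2comp's or Basis's actual formulas may differ.

```python
# Assumptions: REC 709 luma, plain R-Y / B-Y color differences for chroma,
# and the (128, 64, 16) weights applied to (Y, Cr, Cb) in that order.
YCRCB_WEIGHTS = (128, 64, 16)


def to_ycrcb(rgb):
    r, g, b = rgb
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    return y, r - y, b - y


def weighted_ycrcb_error(a, b, weights=YCRCB_WEIGHTS):
    """Weighted sum of squared differences in (Y, Cr, Cb)."""
    return sum(w * (p - q) ** 2 for w, p, q in zip(weights, to_ycrcb(a), to_ycrcb(b)))
```

Like the weighted RGB metric, this prefers (104, 100, 104) over (100, 96, 100) as a reconstruction of the mid-gray texel (100, 100, 100).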

I measured the REC 709 luma PSNR of 31 test textures encoded with ispc_texcomp (its "slow", highest-quality profile, which uses 7 modes) and with my non-SIMD encoder in perceptual mode using just 4 modes.

The overall average PSNR was 48.57 dB for ispc_texcomp and 50.4 dB for mine. Even with ispc_texcomp's massive mode and SIMD advantages, it does worse on this metric. ispc_texcomp doesn't support optimizing for perceptual metrics, which puts it at a huge disadvantage on many texture types.
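
For reference, a minimal sketch of a REC 709 luma PSNR computation over 8-bit RGB texels. It computes luma directly from the stored 8-bit values; whether the measurement above linearizes sRGB first isn't stated, so treat that as an assumption. Function names are hypothetical.

```python
import math


def rec709_luma(rgb):
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def luma_psnr(original, decoded):
    """REC 709 luma PSNR in dB between two equal-length lists of 8-bit RGB texels.

    Luma is computed directly from the stored 8-bit values (an assumption).
    """
    if not original or len(original) != len(decoded):
        raise ValueError("need two non-empty texel lists of equal length")
    mse = sum(
        (rec709_luma(a) - rec709_luma(b)) ** 2 for a, b in zip(original, decoded)
    ) / len(original)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(255.0 ** 2 / mse)
```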

I re-encoded the textures with linear metrics. My encoder used 6 modes: 0, 1, 3, 4, 5, and 6 (including all component rotations and the index flag).
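
To show what "including all component rotations and the index flag" adds to the search, here's a small enumeration of the per-block (mode, rotation, index selector) combinations for modes 0, 1, 3, 4, 5 and 6. In BC7, modes 4 and 5 carry a 2-bit rotation field and only mode 4 has the 1-bit index selector. Partition choices for modes 0, 1 and 3 aren't counted here, and the function name is illustrative.

```python
# Per the BC7 format: modes 4 and 5 have a 2-bit component rotation field,
# and only mode 4 has a 1-bit index selector.
ROTATION_MODES = {4, 5}
INDEX_SELECTOR_MODES = {4}


def bc7_trial_configs(modes):
    """List every (mode, rotation, index_selector) combination to try per block."""
    configs = []
    for mode in modes:
        rotations = range(4) if mode in ROTATION_MODES else (0,)
        index_flags = (0, 1) if mode in INDEX_SELECTOR_MODES else (0,)
        for rot in rotations:
            for idx in index_flags:
                configs.append((mode, rot, idx))
    return configs
```

For these six modes that's 16 configurations: one each for modes 0, 1, 3 and 6, four for mode 5, and eight for mode 4.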

ispc_texcomp's average PSNR was 46.77 dB and mine was 46.50 dB. My encoder can easily bridge this ~0.27 dB gap (by using more modes and trying more partitions), but at a time penalty.

Note that ispc_texcomp in its best/slowest profile is pretty slow, and is much easier to compete against, even without SIMD code. It's just trying way too hard. It's faster in its lower-quality "basic" profile, but it still doesn't support perceptual metrics, so it'll continue to fight up a very steep hill.

For benchmarking, I ran each encoder in a single thread, and called ispc_texcomp with 64 blocks at a time.

Other findings: ispc_texcomp has a very weak mode 0 encoder, and it's weaker than it should be on grayscale textures. I'll blog examples soon.

↧
