Binary delta compression and seed dictionaries as universal OS services

Some late Saturday night compression ideas:

Brotli's inclusion of a seed dictionary that helps it gain higher ratios on textual data, and our recent work on offline and real-time binary delta compression (patching), have given me this idea:

First, Brotli's static dictionary isn't "breaking" any rules at all. What they've done makes a lot of sense. They've tailored their codec for a very common use case (text), a place where it will have a lot of value. The seed dictionary is a legitimate part of the decompressor; instead of consisting purely of executable code, the decompressor also carries some interesting "universal" data.

Which got me thinking: the OS itself should just include a universal seed dictionary, or a set of them for different data types. A small set of seed data, even just 4-16MB, would be fine for starters. This data can be automatically distributed to clients as part of an OS update or upgrade. It could contain natural language phrases, common x86 opcode sequences, XML, common JPEG quantization tables/markers/headers, common GLSL/HLSL keywords, etc. Once Linux does it, everybody will do it sooner or later.

Normally, when you want to compress some data for either real-time transmission or archival purposes, you would use plain non-delta based compression. You can't assume the remote side has your seed dictionary, right? But this is a weak if not flawed argument, because the remote side already has a compatible decompressor. They can surely download a little more data to get a good, stable universal seed dictionary too. And if they only download it once (through OS updates), then it's not a big cost.

Alternately, we can adopt a new codec that has a good built-in selection of universal seed data, and embed that into the OS. But the seed data should be available to OS processes as well, so they can compress/decompress data against it without having to carry around and distribute their own private seed dictionaries.
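
For illustration, zlib already exposes exactly this kind of preset dictionary support through its deflateSetDictionary()/inflateSetDictionary() API. Here's a minimal sketch of compressing against a hypothetical OS-provided seed dictionary (the function and dictionary names are made up for this example):

#include <cstring>
#include <zlib.h>

// Minimal sketch: deflate against a shared seed dictionary using zlib's
// preset dictionary API. "seed_dict" stands in for the hypothetical
// OS-provided universal dictionary. Both sides must preload the exact
// same bytes; the decompressor calls inflateSetDictionary() after
// inflate() returns Z_NEED_DICT.
int compress_with_seed(const unsigned char *seed_dict, unsigned dict_len,
                       const unsigned char *src, unsigned src_len,
                       unsigned char *dst, unsigned long dst_cap,
                       unsigned long *out_len)
{
    z_stream zs;
    std::memset(&zs, 0, sizeof(zs));
    if (deflateInit(&zs, Z_BEST_COMPRESSION) != Z_OK)
        return -1;
    deflateSetDictionary(&zs, seed_dict, dict_len);

    zs.next_in = (Bytef *)src;  zs.avail_in = src_len;
    zs.next_out = dst;          zs.avail_out = (uInt)dst_cap;

    int ret = deflate(&zs, Z_FINISH);
    *out_len = zs.total_out;
    deflateEnd(&zs);
    return (ret == Z_STREAM_END) ? 0 : -1;
}

Note that zlib's 32KB window means only the last 32KB of a preset dictionary is ever useful, which is exactly why a newer codec with much larger windows (Brotli, LZHAM, etc.) is a more interesting host for a multi-megabyte universal dictionary.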

Other related thoughts:


Obviously, static dictionaries don't help in all cases. No harm done, just don't use them in those cases.

This method can be applied to networking too. The seed dictionary can be made part of the actual protocol. Packets can be delta compressed against app specific or universal seed dictionaries.

New codecs should just support seed dictionaries as required features. New LZ algorithms should be competitive against bsdiff in binary patching scenarios.

In the open source world, we need something better than bsdiff, which requires huge amounts of RAM on large files (and is just unreliable). Another interesting area of work would be to figure out the "best" universal seed dictionary to use for an OS service like this.

Plain lossless data compression and binary delta compression should not be separate concepts or tools: let's unify them. LZ-Sub shows one potential way forward.

Data Compression in Dungeon Boss

Dungeon Boss (DB) is a Unity iOS/Android title I helped Boss Fight Entertainment ship this year. I recently wrote a blog post basically containing a brain dump of all the low-level memory related things we learned in the trenches while trying to make the product stable.

Due to the sheer amount of content and code, DB couldn't have shipped at all on our target devices without aggressively utilizing every built-in Unity compression option we could get our hands on:

- Audio: We found and modified a custom editor script that allowed DB's sound designer to fine tune the music and FX audio bitrates to the absolute minimum needed. We gave more bits to music vs. sound effects. (It also helped that DB's sound effects purposely have a classic 8-bit feel.)

- Meshes: We used Unity's built-in mesh compression feature as aggressively as we could. This caused a few issues on map meshes that we had to work around by hand.

- Animation: We tuned our FBX Importer compression settings several times to find the best balance between visible artifacts and memory/file savings.

- Texture compression: At the last minute, we had to switch the majority of our textures to PVRTC 2bpp to free up as much memory headroom as possible. Without doing this DB was too unstable on lower end iOS devices. Quality wise, 2bpp looked amazingly good (much better than I expected).

- Asset bundles: We disabled Unity 4.6's built-in LZMA asset bundle compression, grouped the assets by type, then we packed the raw type-grouped asset data using a better compressor (LZHAM). Thankfully, there were enough Unity modding tools available, and Unity was flexible enough, that we could do this without having access to the source code of Unity itself.

Also see this page in the Unity docs for more info on how to examine and reduce file sizes of Unity builds.

Light Indexed Deferred Rendering

I can't find this information anywhere on the net, so I'll put it here. I've always been fascinated by the various alternatives to pure deferred shading, so I was really excited when I first encountered a light indexed variant.

Backstory: Halo Wars

I tired of the challenges inherent in pure deferred rendering approaches by around 2005, so I purposely used several deferred rendering alternatives on Halo Wars. HW combined X360 specific multi-light forward lighting for precise lighting (or for lights that required dynamic shadowing and/or cookies), along with an approximate voxel-based method for much cheaper, lower priority "bulk" lights.

In the voxel based approximate approach, one volume texture contained the total incoming light at each voxel's center (irradiance), and the other contained the weighted average of all the light vectors from all the lights that influenced each voxel (with the total weight in alpha). From this data, it was possible to compute approximate exit radiances at any point/normal. This "voxel light buffering" system was very efficient and scalable, so it was also used to illuminate terrain chunks and meshes by most omni/spot lights in the game. As an added bonus, it was easy and cheap to illuminate particles using this approach too.
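
Here's a rough sketch of the reconstruction step as I remember the idea (illustrative names and diffuse-only math, not the shipped Halo Wars code; GLM is used for the vector types):

#include <algorithm>
#include <glm/glm.hpp>

// One voxel's worth of data from the two volume textures.
struct VoxelSample {
    glm::vec3 irradiance;    // total incoming light at the voxel center
    glm::vec3 weightedDirs;  // weighted sum of the incoming light vectors
    float     totalWeight;   // total weight (stored in the alpha channel)
};

// Approximate diffuse exit radiance for a surface point with the given
// normal and albedo. The shorter the recovered average direction, the
// more the lights disagreed, and the more this falls back to plain
// un-directional irradiance - which is why the method degrades well
// when many lights hit one voxel.
glm::vec3 approxExitRadiance(const VoxelSample &s, const glm::vec3 &normal,
                             const glm::vec3 &albedo)
{
    glm::vec3 avgDir = s.weightedDirs / std::max(s.totalWeight, 1e-5f);
    float directionality = std::min(glm::length(avgDir), 1.0f);
    glm::vec3 dir = (directionality > 0.0f) ? glm::normalize(avgDir) : normal;
    float ndotl = std::max(glm::dot(dir, normal), 0.0f);
    float diffuse = glm::mix(1.0f, ndotl, directionality);  // blend by coherence
    return albedo * s.irradiance * diffuse;
}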

We couldn't use multi-light forward on everything because the GPU fillrate cost from each added light was too high, and our object vs. light overlap check (just simple bounding box vs. sphere or cone) was very rough. Especially on terrain chunks, the extra pixel cost was brutal.

I had the GPU update the voxel textures each frame. Lights became as cheap as super low-res sprites, so we could have hundreds of them. The voxel texture's bounds in worldspace always covered the terrain in the view frustum (up to a certain depth and world height). The resolution was either 64x64x16 or 128x128x16 (from memory).

I found this approach interesting because it could achieve both diffuse and specular lighting, and because it visually degraded well when more than one light hit a voxel. Anyhow, after Ensemble collapsed I put all this stuff into the back of my mind.

Source 2

When I first came to Valve after HW shipped (late 2009), I kept encouraging the graphics programmers to look for another viable alternative to deferred rendering. One day, in early to middle 2010, I came in to work and Shanon Drone showed me his amazing prototype implementation of a voxel based forward rendering variant of what is known as Light Indexed Deferred Lighting. (So what would the name be? Voxel Based Light Indexed Forward Lighting?)

From what I remember, his approach voxelized in projection space. Each voxel contained an integer index into a list of lights that overlapped with that voxel's worldspace bounding box. The light lists were stored in a separate texture. The CPU (using SIMD instructions) was used to update the voxel texture, and the CPU also filled in the lightlist texture.

The scene was rendered just like with forward lighting, except the pixel shader fetched from the voxel light index texture, and then iterated over all the lights using a dynamic loop. The light parameters were fetched from the lightlist texture. I don't remember where the # of lights per voxel was stored; it could have been in either texture.
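
Here's a small CPU-side sketch of that data flow (all names are hypothetical; the real implementation lived in textures and a pixel shader):

#include <algorithm>
#include <cstdint>
#include <vector>
#include <glm/glm.hpp>

struct Light { glm::vec3 pos; glm::vec3 color; float radius; };

// Per-voxel entry: an offset and count into the shared light list
// (the "voxel texture" and "lightlist texture" of the description).
struct VoxelCell { uint32_t firstLight; uint32_t numLights; };

struct LightIndexGrid {
    std::vector<VoxelCell> cells;     // one per projection-space voxel
    std::vector<uint32_t>  lightList; // indices into the light array
};

// Forward-shade one surface point: fetch its voxel's light list, then
// run the "dynamic loop" over just the lights overlapping that voxel.
glm::vec3 shadePoint(const LightIndexGrid &grid, uint32_t voxelIdx,
                     const std::vector<Light> &lights,
                     const glm::vec3 &worldPos, const glm::vec3 &normal,
                     const glm::vec3 &albedo)
{
    const VoxelCell &cell = grid.cells[voxelIdx];
    glm::vec3 result(0.0f);
    for (uint32_t i = 0; i < cell.numLights; ++i) {
        const Light &l = lights[grid.lightList[cell.firstLight + i]];
        glm::vec3 toLight = l.pos - worldPos;
        float dist = glm::length(toLight);
        if (dist <= 0.0f || dist >= l.radius) continue;
        float ndotl = std::max(glm::dot(toLight / dist, normal), 0.0f);
        float atten = 1.0f - dist / l.radius;  // crude linear falloff
        result += albedo * l.color * (ndotl * atten);
    }
    return result;
}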

The new approach was ported and tested on X360, and I remember we switched to packing the light indices (or something - it's been a long time!) into an array of constant registers to make the approach practical at all on X360.

I have no idea if this is published anywhere, but I remember the day very clearly as an internal breakthrough. It's been many years now, so I doubt anyone will mind me mentioning it.

Published material on light indexed deferred/forward rendering techniques (or similar variants):

I'll update this blog post as I run across links:

By Damian Trebilco:

Scatter graphs can be beautiful

This is amazing. zlib - please step aside!



zlib in serious danger of becoming obsolete


Intro


I’m now starting to deeply analyze the performance of two new general purpose data compression codecs. One is Google’s Brotli codec, another is a brand new codec from Rad Game Tools named “BitKnit”. Both codecs are attempting to displace zlib, which is used by the Linux kernel, and is one of the most used compression libraries in the world. So I’m paying very close attention to what’s going on here.

To put things into perspective, in the lossless compression world we’re lucky to see a significant advancement every 5-10 years. Now, we have two independently implemented codecs that are giving zlib serious competition on multiple axes: throughput, ratio, and even code size.

Summary


I’m now using what I think is a very interesting and insightful approach to deeply analyze the practical performance characteristics of lossless codecs. As I learned while working on the Steam Linux/SteamOS project, robust benchmarking can be extremely difficult in practice. So I'm still gathering and analyzing the data, and tweaking how it’s graphed. What I’ve seen so far looks very interesting for multiple reasons.

First, it's looking pretty certain that both BitKnit and Brotli compete extremely well against zlib's decompression performance, but at much higher (LZMA/LZHAM-like) compression ratios. Amazingly, BitKnit’s compressor is also extremely fast, around the same speed as zlib’s. (By comparison, at maximum compression levels, both Brotli’s and LZHAM's compressors are pretty slow.) The graphs in this post only focus on decompression throughput, however. I’m saving the compression throughput analysis for another post.

One rough way of judging the complexity of a compressor vs. others is to compare the number of lines of code in each implementation. At 2,700 lines of code (including comments), BitKnit is smaller than LZ4 (3,306 - no comments), zlib (23,845 - no comments, incl. 3k lines of asm), and LZHAM (11,651 - no comments). Brotli's is rather large at 47,919 lines (no comments), but some fraction of this consists of embedded static tables.

Interestingly to me, BitKnit’s decompressor uses around half the temporary work RAM of LZHAM's (16k vs. 34k).

New Benchmarking Approach


While writing and analyzing LZHAM I started with a tiny set (like 5-10) of files for early testing. I spent a huge amount of time optimizing the compressor to excel on large text XML files such as enwik8/9, which are popular in the lossless data compression world. I consider this a serious mistake, so I've been rethinking how to best benchmark these systems.

The new codec analysis approach I’m using runs each decompressor on thousands of test corpus files, then I plot the resulting (throughput, ratio) data pairs in Excel using scatter graphs. The data points are colored by codec, and the points are transparent so regions with higher density (or with data points from multiple overlapping codecs) are more easily visualized. This is far better than what the Squash Compression Benchmark does IMO, because at a single glance you can see the results on thousands of (hopefully interesting) files, instead of the results on only a single file at a time from a tiny set of corpus files.

I generated these scatter graphs on 12k data files from the final LZHAM vs. LZMA corpus. There is some value in using these data files, because I used this same test corpus to analyze LZHAM to ensure it was competitive against LZMA. This corpus consists of a mix of game data, traditional textual data, every other compression corpus I could get my hands on, some artificial XML/JSON/UBJ test data, and lots of other interesting stuff I’ve encountered on the various projects I’ve worked on over the years. (Unfortunately, I cannot publicly distribute the bulk of these data files until someone comes up with an approach that allows me to share the corpus in a one way encrypted manner that doesn’t significantly impact throughput or ratio. I have no idea how this could really be done, or even if it's possible.)
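
The per-file bookkeeping is simple. Each run boils down to emitting one row per (file, codec) pair, something like this (an assumed shape for illustration, not the actual benchmark app):

#include <cstddef>
#include <cstdio>

// Write one CSV row: codec, file, ratio, decompression throughput (MB/s).
// Excel then plots throughput vs. ratio as one transparent scatter
// series per codec.
void emitScatterRow(std::FILE *csv, const char *codec, const char *file,
                    size_t uncompSize, size_t compSize, double bestDecompSecs)
{
    double ratio = (double)uncompSize / (double)compSize;
    double mbPerSec = ((double)uncompSize / (1024.0 * 1024.0)) / bestDecompSecs;
    std::fprintf(csv, "%s,%s,%.4f,%.2f\n", codec, file, ratio, mbPerSec);
}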

The Data


The X axis is decompression throughput (right=faster), and the Y axis is compression ratio (higher=better ratio or more compression). The very bottom of the graph is the uncompressible line (ratio=1.0).

Color code:

Black/Gray = LZHAM
Red = Brotli
Green = zlib
Blue = BitKnit
Yellow = LZ4

Totals for 11,999 files (including uncompressible files):

Uncomp:   2,499,169,096
lz4:      1,167,777,908 2.14
zlib:     1,044,180,362 2.39
brotli:     918,949,263 2.72
bitknit:    898,621,908 2.78
lzham:      882,723,287 2.83

Totals after removing 1,330 files with a zlib compression ratio less than 1.1 (i.e. uncompressible files):

Uncomp:    2,147,530,394
lz4:         815,221,536 2.63
zlib:        693,090,474 3.1
brotli:      568,461,065 3.78
bitknit:     547,869,148 3.92
lzham:       532,235,143 4.03

This is a log2 log2 plot, basically an overview of the data:


This is a zoomed linear plot, looking more closely at the uncompressible (ratio=1) or nearly uncompressible (ratio very close but not 1) regions:



This log2 log2 plot is limited to just LZHAM vs. BitKnit:


Finally, another log2 log2 plot showing just BitKnit vs. zlib:



Current Observations


Fabian Giesen (Rad) and I have noticed several interesting things about these scatter plots:

- The data points with a ratio of 1 (or extremely close to 1) show how well the algorithm handles uncompressible data, which is hopefully near memcpy() performance.

(Note LZMA’s data will be very interesting, because it doesn’t have good handling for uncompressible data.)

There are a handful (around 50-60, depending on the codec) of data points with a ratio slightly below 1 (.963-.999). The majority are small (287 bytes to 1KB) uncompressible files.

- Slightly "above" this ratio (very close to ratio 1, but not quite), literal handling dominates the decompressor's workload. There are distinct clusters on and near the ratio=1 line for each compressor.

LZHAM actually does kinda well here vs. the others, but it falls apart rapidly as the ratio increases.

- Notice the rough pushed down "<" shape of each algorithm's plot. LZHAM's is pretty noticeable. At the bottom right (ratios at/close to 1.0), literals dominate the decompression time.

Interpreting this as if all algorithms are plain LZ with discrete literals and matches:
As you go "up" to higher ratios, the decompressor has to process more and more matches, which (at least in LZHAM) are more expensive to handle vs. literals. After the “bend”, as you go up and to the right the matches grow increasingly numerous and longer (on average).

- LZHAM has an odd little cluster of data points to the right (on the ratio ~3 line) where it almost keeps up with BitKnit. Wonder what that is exactly? (Perhaps lots of easily decoded rep matches?)

- Notice zlib’s throughput stops increasing and plateaus as the ratio increases - why is that? Somebody needs to dive into zlib’s decompressor and see why it’s not scaling here.

I need to add my implementation of zlib’s core API (miniz) to see how well it compares.

Important Notes:

- The x64 benchmark command line app was run on Win10, 2x Xeon E5-2690 V2 3.0GHz (20 cores/40 threads). Benchmark app is single threaded.

- All test corpus files are between 256 bytes and 127.7 MB

- All algorithms were directly linked into the executable, and the decompressors were invoked in the same way

- Each decompressor is invoked repeatedly in a loop until at least 10ms has elapsed; this is done 4 times and the shortest average runtime is taken (see the sketch after this list)

- Brotli was limited to compression level 9 (level 10 can be too slow for rapid experimentation), with a 16MB dictionary size (the largest it supports)

- LZHAM was limited to using its regular parser (not its best-of-X-arrivals parser, i.e. the "extreme parsing" flag was disabled), with a 64MB dictionary size.

- LZ4HC compressor, level 8

- zlib's asm modules were not used in this run. It'll be interesting to see what difference the asm optimizations make.
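
Here's a sketch of the timing loop described above (a reconstruction of the stated methodology, not the actual benchmark source):

#include <algorithm>
#include <chrono>

// Invoke the decompressor repeatedly until at least 10ms has elapsed,
// average the per-call time, repeat 4 times, and keep the shortest
// average. Call as: bestAvgSeconds([&]{ decompress_file(...); });
template <typename DecompressFn>
double bestAvgSeconds(DecompressFn decompress)
{
    using hr_clock = std::chrono::high_resolution_clock;
    double best = 1e30;
    for (int pass = 0; pass < 4; ++pass) {
        auto start = hr_clock::now();
        int calls = 0;
        double elapsed = 0.0;
        do {
            decompress();
            ++calls;
            elapsed = std::chrono::duration<double>(hr_clock::now() - start).count();
        } while (elapsed < 0.010);  // loop until 10ms has passed
        best = std::min(best, elapsed / calls);
    }
    return best;  // shortest average per-call time across the 4 passes
}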

Next Step


The next major update will also show LZMA, Zstd, and miniz. I'm also going to throw some classes from crunch in here to clusterize the samples, so we can get a better idea of how well each algorithm performs on different classes of data.

I feel strongly that scatter graphs like these can be used to help intelligently guide the design and implementation of new practical compressors.

A big thanks to Fabian “ryg” Giesen at Rad Game Tools for giving me access to "BitKnit" for analysis. BitKnit is going to be released in the next major version of Oodle, Rad’s network and general purpose data compression product.

Compression ratio plots of zlib competitors Brotli and BitKnit

Continuing yesterday's post, here are the compression ratios for all 12k data points (between 256 bytes and 128MB) in the LZHAM vs. LZMA test file corpus, on various codecs: LZ4HC L8, Brotli L9, Rad's BitKnit VeryHigh, LZHAM m4, and of course zlib L9.

Vertical Axis = Compression ratio (higher is more compression)
Horizontal Axis = File, sorted purely by zlib's compression ratio
Color = Codec (using the same color coding as my previous post)

The data points have been sorted by zlib's compression ratio, which is why the green line is so nice and smooth. These are the same data points as yesterday's scatter graphs.


LZ4HC vs. BitKnit vs. zlib:



LZ4HC vs. Brotli vs. zlib:



Brotli vs. BitKnit vs. zlib:



Here are a couple bonus plots, this time for LZHAM vs. LZ4HC or Brotli:



Comments:


It's clear from looking at these plots that simply stating "codec X has a higher ratio than codec Y" is at best a gross approximation. It highly depends on the file's content, the file's size, and even how well a file's content resembles what the codec's designers tuned it to handle best.

For example, Brotli has special optimizations (such as a precomputed static dictionary) for textual data. Also, like zlib, it uses entropy coding tables (the Huffman symbol code lengths) precomputed by the compressor, which can give it the edge on smaller files vs. codecs that use purely adaptive table updating approaches like LZHAM. (Also, I would imagine that the more stationary the data source, the more precomputed Huffman tables make sense.)

Another advantage Brotli shares with zlib due to its usage of precomputed Huffman tables: It doesn't need to spend valuable CPU time computing Huffman code lengths during decompression. LZHAM struggles to do this quickly, particularly on small files (less than approx. 4-8KB) where most decompression time is spent computing Huffman code lengths (and not actually decompressing the file!).
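
To make that cost concrete, here's the standard RFC 1951 construction that turns a table of code lengths into canonical Huffman codes (a generic sketch, not any of these codecs' actual routines). zlib and Brotli run this kind of table setup against lengths the compressor already chose; a semi-adaptive codec like LZHAM must also periodically recompute the lengths themselves as its symbol statistics update, which is the work that dominates on small files:

#include <cstddef>
#include <cstdint>
#include <vector>

// Given a per-symbol code length table, assign canonical Huffman codes
// (RFC 1951, section 3.2.2). A decoder builds its decode tables from
// the same lengths, so this setup cost is paid per Huffman table.
std::vector<uint32_t> canonicalCodes(const std::vector<uint8_t> &lengths)
{
    const int maxBits = 15;
    std::vector<uint32_t> blCount(maxBits + 1, 0), nextCode(maxBits + 1, 0);
    for (uint8_t len : lengths)
        ++blCount[len];
    blCount[0] = 0;

    uint32_t code = 0;
    for (int bits = 1; bits <= maxBits; ++bits) {
        code = (code + blCount[bits - 1]) << 1;
        nextCode[bits] = code;
    }

    std::vector<uint32_t> codes(lengths.size(), 0);
    for (size_t i = 0; i < lengths.size(); ++i)
        if (lengths[i])
            codes[i] = nextCode[lengths[i]]++;
    return codes;
}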

It's also possible to design a codec to be very strong at handling binary files. Apparently, BitKnit is tuned more in this direction. It still handles text files well, but it makes some intelligent design tradeoffs that favor really high and symmetrical compression/decompression performance with only a small sacrifice to text file ratios. This tradeoff makes a lot of sense, particularly in the game development context where a lot of data files are in various special binary formats.

Interestingly, Brotli and BitKnit seem to flip flop back and forth as to who is best ratio-wise. There are noticeable clusters of files where Brotli is slightly better, then clusters with BitKnit. I'll be analyzing these clusters soon to attempt to see what's going on. I believe this helps show that this data file corpus has a decent amount of interesting variety.

Finally, Brotli's compression ratio is just about always at least as good as zlib's (or extremely close). IMO, the Brotli team's Zopfli roots are showing strongly here.

Next Steps:


I need to break down these ratio graphs into clusters somehow, so we can then show results for "small text files" vs. "large binary files", etc. As yesterday's compressed totals showed, Brotli and BitKnit have approximately equal compression power across the entire corpus. But there are categories of data where one codec is better than the other.

Looking into the future, it may be a good idea for the next major compressor to support both precomputed (Brotli-style) and semi-adaptive (LZHAM and presumably BitKnit-style) entropy table updating approaches.

Thanks to Blue Shift's CEO, John Brooks, for suggesting charting the data this way.

Quotes from "My Year In Startup Hell"

I loved this article on Fortune.com (excerpted from "Disrupted: My Misadventure in the Start-Up Bubble"). It covers, in a single article, so many corporate culture things I've seen or experienced in my game development career. The article is from a startup perspective, but much of this applies to many other more mature companies. These quotes especially struck a chord with me:
"Arriving here feels like landing on some remote island where a bunch of people have been living for years, in isolation, making up their own rules and rituals and religion and language—even, to some extent, inventing their own reality. This happens at all organizations, but for some reason tech startups seem to be especially prone to groupthink. Every tech startup seems to be like this. Believing that your company is not just about making money, that there is a meaning and a purpose to what you do, that your company has a mission, and that you want to be part of that mission—that is a big prerequisite for working at one of these places."
On people that get fired:
"Dharmesh’s culture code incorporates elements of HubSpeak. For example, it instructs that when someone quits or gets fired, the event will be referred to as “graduation.” In my first month at HubSpot I’ve witnessed several graduations, just in the marketing department. We’ll get an email from Cranium saying, “Team, just letting you know that Derek has graduated from HubSpot, and we’re excited to see how he uses his superpowers in his next big adventure!” Only then do you notice that Derek is gone, that his desk has been cleared out. Somehow Derek’s boss will have arranged his disappearance without anyone knowing about it. People just go up in smoke, like Spinal Tap drummers."
On what I call "Reality Shaping":
"The ideal HubSpotter is someone who exhibits a quality known as GSD, which stands for “get shit done.” This is used as an adjective, as in “Courtney is always in super-GSD mode.” The people who lead customer training seminars are called inbound marketing professors and belong to the faculty at HubSpot Academy. Our software is magical, such that when people use it—wait for it—one plus one equals three. Halligan and Dharmesh first introduced this alchemical concept at HubSpot’s annual customer conference, with a huge slide behind them that said “1 + 1 = 3.” Since then it has become an actual slogan at the company. People use the concept of one plus one equals three as a prism through which to evaluate new ideas. One day Spinner, the woman who runs PR, tells me, “I like that idea, but I’m not sure that it’s one-plus-one-equals-three enough.”
This is so true:
"Another thing I’m learning in my new job is that while people still refer to this business as the “tech industry,” in truth it is no longer really about technology at all. “You don’t get rewarded for creating great technology, not anymore,” says a friend of mine who has worked in tech since the 1980s, a former investment banker who now advises startups. “It’s all about the business model. The market pays you to have a company that scales quickly. It’s all about getting big fast. Don’t be profitable, just get big."
 Yup:
"On top of the fun stuff you create a mythology that attempts to make the work seem meaningful. Supposedly millennials don’t care so much about money, but they’re very motivated by a sense of mission. So, you give them a mission. You tell your employees how special they are and how lucky they are to be here. You tell them that it’s harder to get a job here than to get into Harvard and that because of their superpowers they have been selected to work on a very important mission to change the world. You make a team logo. You give everyone a hat and a T-shirt. You make up a culture code and talk about creating a company that everyone can love. You dangle the prospect that some might get rich."
Umm yea I know the feeling:
"Training takes place in a tiny room, where for two weeks I sit shoulder to shoulder with 20 other new recruits, listening to pep talks that start to sound like the brainwashing you get when you join a cult. It’s everything I ever imagined might take place inside a tech company, only even better."
On the office environment:
"Everyone works in vast, open spaces, crammed next to one another like seamstresses in Bangladeshi shirt factories, only instead of being hunched over sewing machines people are hunched over laptops. Nerf-gun battles rage, with people firing weapons from behind giant flat-panel monitors, ducking and rolling under desks. People hold standing meetings and even walking meetings, meaning the whole group goes for a walk and the meeting takes place while you’re walking."
Personally, I've learned over the years that I really need to live solidly in reality. I can't switch back and forth from "Corporate Enforced Reality #57" to "Real-World Ground-Level Reality" every single day of the week and stay happy and healthy long term.

I've learned that I don't need to associate myself with any corporation to be a "real" developer. If you aren't treated with respect by someone because you aren't associated with a company label, that person may very well not be a person you want to associate yourself with.

Here's a little rant: I think having to rigidly conform to a corporate mental model (like the insane one described at the company above) to earn money is demeaning and even dehumanizing. Folks, 1+1 does not equal 3. This is insane.

Somewhere back in time I must have fallen into some jacked parallel universe where treating workers like utterly replaceable mind controllable automatons is normal, accepted, encouraged, and even something to be proud of. In the universe of 1984, 1+1=3.

And the money the company gives you? It's just binary bits in some bank computer. There are plenty of ways of making money that don't involve becoming mentally insane. Trading off sanity for a bi-weekly set of binary digits added to your account balance actually isn't the greatest idea in my experience. Staying at a job you are unhappy with thinking that, eventually, you will achieve true long lasting happiness during "retirement" is also a pretty extreme way of living life. There are other paths to happiness.

My brain now pushes back when I think about living and working this way again. It basically says "hey that's super unhealthy and unsustainable!". I used to be irrationally fearful of working outside of a corporate bubble. Fear is your worst enemy and can make you much more manipulatable to others.

A long time ago, at Ensemble Studios in Dallas, I was completely wrapped up in our company's special little super insular "tribal" company culture. The company collapsed overnight and we all learned what had actually been occurring at the corporate level for the previous 6-9 months. It became super clear that we all had been living in a fairy-tale corporate enforced reality bubble. Even many of my "company friends" evaporated overnight into non-friends. It was ugly: even the very personalities of formerly awesome coworkers instantly changed.

My ego was tied solidly into this company's culture and products. After it collapsed I had to carefully hit the "ego reset button". I had to strongly resist automatically following the locally exploitative paths laid out for me after Ensemble collapsed.

Why I left Unity

I left Unity about a month ago. It was a short, but amazing and enlightening, experience. I felt that my job there was done (or here). My first little contribution to the world of VR.

Instead of pivoting onto some new project or something, I've decided to basically pivot my entire life. Currently, it seems a large subset of the industry has rearranged itself around VR/AR, so I figure my timing couldn't be better. Unsurprisingly, pivoting your entire life isn't easy but I believe it's for the better. Working a full-time software job endlessly bulldozing countless lines of anonymous C/C++ code isn't on my list of happy things to do anymore. Screw that!

I'm now living in Seattle, a city I've only visited like 3 times in my entire 6 years living in Washington. Holy shit, what an amazing city! Bellevue is such a barren wasteland compared to Seattle it's ridiculous. I'm avoiding the eastside completely because the place reminds me of a mental prison. I've got some particularly bad memories there. I can't even look at downtown anymore without thinking of various horrible memories. I'm now a much happier independent consultant here in Seattle.

You know, just thinking here: I must be a bunch of game/software industry execs worst nightmare. "Oh shit, Rich is some crazy ass SOB, he's going to spill all of our endless secrets and damage our hiring and spotless reputations!" Yes, I've seen a lot of shit in my career. A lot of it the world should know about -- someday. If it makes you feel any better, I'm almost as public as I possibly can be, and I'm staying that way. I really love blogging.

I've been thinking a lot about the word "counterculture", especially in relation to the software industry. I know exactly what the "mainstream", or "normal" software developer culture is. The corporate software development and game industry culture I've seen so far is immature, exploitative, abusive, and downright dehumanizing. Where are the alternative software developer cultures?

Imagine the kinds of amazing new software that could be created in alternative cultural environments. To really improve this industry we need to upgrade our culture, not our hardware, comp-sci curriculum, or office arrangements. We need to fix the root problem, which is to escape from this insane mainstream culture and create something healthier.


Tips for Interviewing at Software Companies

Here's another blog post in the "Rich goes off the rails and reveals a bunch of shit he's learned over the years while working as a corporate programmer" category.

Companies Must be Continually Reminded that the Interview Goes Both Ways


Many corps have an internal company culture that places the company in the superior position relative to job candidates. These companies feel they can choose who they want from a seemingly endless pool of potential employees, so who cares how candidates are treated, right? The reasoning goes "we'll just hire someone else" if a candidate pushes back.

We need to collectively turn the tables on companies like this. Let's give them a powerful form of feedback. Let's exercise our right to "route around" bad companies and not apply or accept job offers from corporations that act like we are replaceable cogs. (Alternately, let's all talk between ourselves and collectively compare notes and boost our compensation rates and working conditions. They can't stop us!)

I'm hoping this blog post will help make people more able to discern the good companies from the bad ones. For the record, I do believe there are many good companies out there, but for every good company there seems to be a bunch of bad ones.

Remember: We Write the Software Which Runs the System


Here's a key concept to internalize: We write the software that literally drives this entire system. Food production and distribution, electricity production and distribution, telecommunications, government, finance, trucking, planes, etc. It's all run by computers one way or the other, and we write the code that makes these computers work. Without computers this entire system crumbles into the dark ages.

As time goes by more and more of the system is becoming automated and computerized. This means we as programmers collectively have the power in these relationships with corporations, but we haven't effectively organized ourselves or figured out how to best exercise our power yet. We now have the technology to instantly communicate between ourselves, which if we all start using it can lead to massive changes in a relatively short period of time.

Interview Tips


Some corps are exquisitely designed to extract as much "value" from you as quickly as possible, your health and sanity be damned. Above all, I want to help other programmers avoid places like this. When applying for a position at a software corp, keep these things in mind:

- Follow your instincts.

Ask a lot of questions. Learn how to interpret body language. Are you treated with respect? Are your questions answered in a straightforward way?

Remember, the hiring pipelines of these companies are tuned to take advantage of the macro-level psychological profile of "typical" programmers. Get educated, fast. These companies are not your friends. They will try to get into your brain and "bend" you psychologically in order to make you conform and "fit in" to their brand of corporate utopia.

Trust your gut feelings during the interview! If you feel disrespected or not taken seriously, don't ignore it. It's not just in your head. Run away! You won't grow there, and it'll be a dehumanizing place.

- "You need us to achieve success, you are nothing so follow us!"

Run away fast! I've seen this tactic applied against dozens of developers after one company collapsed in Dallas. Sadly, it worked with a bunch of people and they ultimately all got screwed.

- Deeply analyze any critique given to you during the interview

Sometimes critique that seems merit-based is really bias in disguise. It's very important that if you get bad feedback, you stand back and think "Is this true? Or is there just something wrong with this company (elitism, sexism, they just didn't want to hire you, etc)?"

- How much is your time worth?

Ultimately, you are selling your time for digital digits in some bank computer somewhere. You will not get this time back, period. It's worth a lot, probably much more than you think.

How much income does the company actually make given your time? Some companies make millions of dollars per software developer, yet pay only a fraction of this to you.

Remember, this is a market and market principles apply here. By increasing our communication levels and giving feedback to the market (by routing around bad companies, demanding higher pay during negotiations, pushing back during interviews, etc.) we can collectively raise our salaries, compensation packages, and improve our working conditions.

- Are you gambling your time away trying to get your stock "lottery ticket"?

In this scenario, you're willing to be underpaid, but you're hoping the company will sell out in X months or years and make you millions. Just beware that this is a form of gambling with your time and finances. 

I've seen a few companies exquisitely exploit and continually encourage this "gambler mindset" in their workers in order to suppress wages. Be careful!

- Admit that you probably suck at negotiating

Generally, in my experience programmers make horrible negotiators. The most important thing is to approach the negotiation with the proper mindset. They need you, not vice versa, and they probably have much more money (and the capability to pay you) than you suspect.

This is a topic definitely worthy of another blog post.

- Learn how to recognize negative psychological traits like sociopathy and narcissism.

Some companies are full of sociopathic monsters whose job description (and honestly, their corporate duty) is to exploit you as much as possible.

Learn to recognize the signals. They will try to get into your head, quickly build a mental model of you, then play off your willingness to not rock the boat or be seen as a "troublemaker" to the corporation. They will find subtle ways to threaten you, if needed, to keep you in line.

Narcissists can be especially horrible to work around, especially when they are in management. Learn the signs.

- Beware of code words like "Elite", "10Xer", "no bosses", "scheduled crunch", etc.

"Elite" - Programmers willing to work endlessly in sometimes horrific conditions are labeled "Elite". Yes, we have a very screwed up system here, where people who get exploited and are worked until exhaustion are labeled "Elite". Avoid companies like the plague that use this word in their job descriptions or recruiting emails. 

(Attention recruiters: Please stop sending me emails with the word "elite" anywhere in them. Thanks!)

"10X programmer" - Someone who hacks up some shit (sometimes actually using stolen ideas), and then depends on other practicing programmers to do the actual work of making these (sometimes very sub-optimal) ideas actually shippable. These programmers tend to publicly take full public credit for things they only partially worked on or thought up. We don't need more of these "10x" assholes, instead we need to completely reboot our programmer culture so the very concept of a "10x'er" is totally alien.

"No bosses" - This recruiting shtick from 1999 means there are many powerful bosses in hiding, and/or that everyone is effectively your boss. Also avoid companies that advertise this like the plague. It's a recruiting tactic designed to attract programmers who had bad bosses in the past. (Believe it or not, there are many very good managers out there!) 

Also, without managers, you will be directly exposed to the many wolves out there who can make your programming life a living hell. A good manager will shield their programmers from endless bullshit and insanity.

"Crunching" - This means the company externalizes the cost to your health and sanity of working endless hours. They may have bad planning, or bad business models, whatever. Avoid companies that crunch endlessly like the plague unless you just don't care about your health.

"Scheduled Crunch" or "we occasionally crunch" - Again, any company that schedules crunch just doesn't understand or care how to plan, or is ignoring the cost of working crazy hours on your health.

- Ask: Does the corp own all your work?

Can you work on stuff at home, like open source software or your own commercial software? Avoid companies like the plague that don't let you work on your own stuff.

- Ask or check for insane contract clauses

Is there a non-disparagement clause? If you quit, must you wait X months or whatever before you can work on something else? Push back and avoid companies that do questionable shit like this.

- Who contacted you first?

If the company reached out to you first, did a recruiter contact you, or an actual programmer at the company? 

Ideally, a real-life programmer reached out to you. Only a programmer working at the company can genuinely answer your questions and give you a real idea about the working conditions and types of problems you will be working on there.

Remember recruiters are just part of the company's "hiring pipeline". The pipeline is basically "X Programmers In -> Y Programmers Out". You are just a replaceable number to these companies. Recruiters will say anything in order to get you to sign the dotted line.

- Is there a whiteboard interview?

Push back and say no. I've helped hire at several successful companies (easily over 100 people over the years) that didn't use whiteboard interviews at all. Anyone saying "this is just the way things are done" doesn't have perspective and is part of the serious problems our industry has.

After taking and giving way too many of these whiteboard interviews, I think they are total fucking bullshit. Whiteboard interviews test a candidate's ability to scribble uncompilable pseudo-code on a chalkboard (!) while being faced down by multiple adversarial programmers. (Who in some cases would rather be doing anything else, and who don't want more internal programmer competition in the first place!)

Some companies give programmers whiteboard questions from various books verbatim, like this one. This is just downright ridiculous, a waste of time for everyone involved, and a total demonstration that the process is completely bogus.

Whiteboard interviews are extremely stressful to candidates. I've seen amazing programmers just lock up and become dysfunctional in these conditions. We are testing candidates for the wrong abilities. I refuse to take part in any more of this insanity.

Some programmers use these bogus hazing ritual-like whiteboard interviews to help drive down the applicant's ego while simultaneously driving up their egos. This is a huge red flag -- avoid these programmers and the companies who employ them.

- "We only hire senior programmers"

Let's translate: We don't help train new programmers. The phrase "programmer empathy" isn't on our radar here. We probably treat each other like complete trash. We actually assume you are an idiot until you battle your way into a position of respect. Avoid companies like this!

Remember, all senior programmers at one time were junior programmers.

- What's the company's culture?

Ask lots of questions from people who work there. Is the official company message great, but when you pull programmers aside they actually hate working there? Search the web for reviews of the company, search linkedin and find former employees and ask them about the company.

- Talk to the executives

Are they sociopaths? Raging narcissists? Ask them what they look for in programmer candidates, and see how they respond. Do they treat you with respect?

I've known execs who thought programmers were literally crazy, and trust me the companies they ran were not healthy.

- Look around the office

Little things can give you a lot of information. Is the office a mess? How much space and privacy do employees have? Is the environment quiet or loud?

- Does this company give proper attribution for ideas it uses?

I'm throwing this out here because I've noticed one very well known VR company outright steal ideas from its competitors or academics working in the space. Explicitly ask the company about its attribution policy.

Personally, I will never involve myself in any way with people or corporations who outright steal ideas for personal or corporate gain. (I can't believe I have to even say this. We have fallen to the level of stealing and re-branding ideas from each other!)

- Master your fears

What do you fear? Recognize it and look past it, because these companies are designed to exploit your fears and use them against you as a weapon.

Hiring Group Dynamics

So there are several interesting hiring-related phenomena I've seen at various companies. I think some of the most exaggerated hiring behavior will emerge at "flat" companies with yearly bonuses based (partially) on the data gathered during peer feedback.

Here's a description of one category of emergent behavior I noticed when the programmers have nearly free rein over the hiring process and over who will ultimately get hired:

Want a good bonus? Never hire new competition!

You would think the programmers doing the hiring would always be fair and unbiased in their assessments of each candidate's abilities, right? And they would, of course, always optimize for adding value to the company by making good hires. The company programmers involved in the process would choose good candidates for each opening, irrespective of politics, or concerns over their future positions or bonuses, etc.

In practice, I think especially at companies with massive yearly bonuses, the company's programmers will band together unofficially and make it practically impossible for potential competition to enter the company, make waves, and possibly eclipse the old guard. We have a classic conflict of interest situation here. This tendency to embargo your competition is especially effective when hiring specialists, such as graphics programmers, although I've seen it happen regardless of specialty.

At one well-known company, I watched around a dozen experienced graphics programmers get rejected in our interview process. Each time, without exception it was a NO HIRE, even though we were in dire need of graphics programmers. A few of the names were pretty well known in graphics circles, so my jaw dropped after several of these NO HIRE interviews.

I was involved in some of these interviews. Almost every time, these candidates would do sometimes incredible things during the whiteboard interview, but somehow one or two graphics programmers would always find some other reason to be thumbs down. (I didn't say anything at the time, because I was afraid doing so would have made enemies and hurt my career at this company. I was basically incentivized to say nothing by the peer feedback based bonus system.)

Eventually, upper management quietly noticed that irrespective of our company's dire need of graphics engineers, we weren't hiring them anyway. This company had a major upcoming threat to its primary profit generating product looming in its future, and the counter to this competitive threat involved some very specialized graphics engineering. The CEO had to step in, basically subvert the entire broken hiring process, and start hiring graphics contractors almost sight unseen.

Unfortunately, these graphics contractors had virtually no path to full-time employment, so they got treated like 3rd class citizens at best and all were eventually pushed out. (Sometimes years later, even after delivering massive amounts of value to some teams.)

Anyhow, how do I know all this stuff? At this particular company, I somehow fell through the cracks and was interviewed and hired as a generalist programmer, not a graphics specialist. Eventually, the old graphics guard basically got lazy and shied away from the company's toughest graphics problems (or actually shipping anything involving new graphics code), but somebody had to do this "dirty" graphics work. The non-graphics programmers figured things out and started sending graphics work my way, and I started asking myself "why are we not hiring any graphics programmers?!"

Unfortunately I'm terrible at saying "no" to requests for help, so this resulted in a lot of work.

Turns out, that refined, "fair" hiring machine that management was so proud of was a total joke.

More company culture quotes from Disrupted

Interesting quotes from Disrupted: My Misadventure in the Start-Up Bubble (which I'm only 1/3rd through):
"I've been warned that at a place like HubSpot the worst thing you can say is anything that was done at your last company is something we should think about doing here. Even if your last company was Google or Apple, nobody at HubSpot wants to be told, especially by some newcomer, some outsider, that there might be a better way. HubSpot is HubSpot. It's unique. It's different. HubSpot has its own way of doing things. We're rethinking everything. We're challenging all the assumptions. We're not just making software, we're reinventing the way companies do business."
A funny take on HubSpot's usage of the DISC personality test:
"Managers, people like Zack, get the same training that I'm getting, but then they go to an extra class where they learn how to use DISC when they are managing people. Try to image the calamity of that: Zack, age twenty-eight, with no management experience, gets training from Dave, a weekend rock guitarist, on how to apply a set of fundamentally unsound psychological principles as a way to manipulate the people who report to him."
It's crazy what people will do for money.

Rad's Kraken is looking awesome

A consultant's perspective on working full-time for a corporation

I've been working as a software consultant at Binomial LLC (a company Stephanie Hurlburt and I just started) for just enough time now that my perspective has begun to shift from my former full-time mentality. I'm obviously no expert at this, but I'm learning as fast as possible, and I intend on sharing everything I learn.

Let's compare what it's like to work full-time somewhere from a consultant's perspective. This is something I couldn't do before, and very few if any of my coworkers ever talked about stuff like this at all:

- Corporations are Control Freaks


With full-timing, your "client" (your employer) is extremely controlling, even in open office and "no manager" type companies.

Your control fetish "client" probably requires you to commute every day, sit in an office somewhere in a noisy non-optimal environment, and bang out code like a machine every day of the week for weeks and months at a time.

Some clients actually encourage the other "consultants" to use peer pressure on you so you work longer than is sustainable or healthy. Or, they just require you to overwork yourself.

Some clients record and monitor all of your network traffic (or your emails, instant message traffic etc.), monitor when you enter and leave the workplace, etc.

You know what? Working for ultra-controlling clients like this sounds terrifying from my new perspective.

- Corporations and recruiters work together to suppress wages and opportunities


I know many programmers who are locked completely up within the corporate full-timing paradigm. When they inevitably lose their jobs (due to layoffs, random purges, teams or companies failing, etc.) they are thrown to the full-timing wolves that orbit all the major tech companies.

Now you need to put yourself on the market, start talking with recruiters, and dust off your elite whiteboarding skills - fast.

With consulting, a good goal is to build up a community of potential clients over time and keep this community as happy as possible. You're not dependent on any single client this way, and you gain access to more potential projects this way. Price fixing is unlikely if your client community is diverse.

- Corporations can harbor, even encourage Toxic Teams


A few years ago at one company, I got tossed onto a horribly abusive, brutal, and kinda nutty team right out of the starting gate. That was a super intense, Alice in Wonderland type experience that I will never repeat. (Why did I stay on this team for around a year, even though I should have left? Because I loved the company's products.)

With consulting, we can first sign a short-term contract (from days to a few months depending on the scope) to feel each other out. If it doesn't work out, no big deal, the damage (to both parties) is minimized. Working for a new company as a consultant isn't this stressful, life changing event like changing jobs is.

- Corporations Demand Intense and Extreme Loyalty


These "clients" want to basically own your life.

Working full-time at a corporation, you negotiate, then sign a contract. That's usually it. Truly renegotiating can be tough, because to have a real negotiation you must be willing to leave. This is a type of client that demands total absolute loyalty! Many corporations even claim ownership over all your work, even things you do on your own time at home.

As a consultant, you can potentially renegotiate in between every project. It's not that big of a deal.

- Corporations are Work-A-Holics


At a corp, if I want to take time off I must check to see if I'm allowed first. At many places you only get a handful of weeks off once every year. If you get sick, or have some life changing event you need to deal with, you're potentially screwed.

This seems analogous to a "toxic client" in consulting, one that is super jealous and extremely demanding of your time! 

- Corporations limit your freedom


At one of my previous companies, I could theoretically wheel my desk anywhere and pick and choose what team I worked with. In practice, office politics and super-cliquish teams massively constrained this freedom. At another company, I was stuck on the same team basically forever, because the org structure had ossified around us over many years.

As a consultant I can pick and choose who I want to work with. Obviously, I tend to choose nicer clients, because at the end of the day this is a social process and I personally like working with friendly and no-ego teams. My market of teams to work with is potentially massive now. I can wheel my little virtual desk anywhere in the world.

We Need to Collectively Renegotiate

I'm sitting here watching season 2 of Halt and Catch Fire. This season wipes the slate mostly clean and starts over at an early 80's garage-style software service startup in Texas. At first, I pushed back at the idea of a real-time online gaming service using early 80's Commodore-era computer, disk storage and modem technology. Then I realized, everything they are showing here was more or less technologically feasible, or at worst was at the very edge of that era's hardware/software technology.

While watching this I had another interesting realization. Lots of my previous posts are really my way of telling every full-time software engineer I can reach to basically "wake up".

Let's mentally model the current employment situation as a 2D simulation. See all those little dots? Those are the full-time software developers working at corporations. Let's hit fast forward. Wow, that's weird! All these super valuable programmers keep going to and from the same bland corporate company nodes to work every morning. Their working conditions sometimes really suck and they are generally underpaid. These corporations have even been known to illegally cooperate with each other (i.e. conspire) to keep compensation to a minimum.

We've been interacting with lots of clients, some very well known in their fields, and most paint a similar picture: Their view is that too many engineers are "locked up" inside these corporations. It's actually very hard to find good software developers. There is room in the system for more software consultants, little consulting companies with amazing programmers like Blue Shift.

So here's my idea:

Now let's try upping the communication, empathy, independent organization and trust levels across all these agents in the simulation and see what happens. A bunch of smaller companies pop up and start offering their services to a potentially huge array of clients. They can negotiate for the best pay and conditions possible in this changed economy.

To pull this off in the real world, what we need to do is start talking, trusting, and cooperating with each other much more, especially across teams and companies. We all have a common interest here that totally transcends pretty much any corporate NDA. Collectively, we as software engineers have way too much power and value in the system to be working as atomized individuals competing with each other for scraps.

We can leave these corporations to form our own consulting or product companies. This will force the market to reorganize itself. Do this and working conditions and compensation levels can be organically pressured upwards. We actually have the power to do this if we would just organize and communicate more effectively.

Personally I believe even just a small number of programmers doing this can have a surprising economic and perhaps even a cultural impact.

In practice, doing this isn't that hard. I've started three companies so far, in between working at various companies. The first one did very early deferred shading research for Microsoft, the second one created crunch, and the third (Binomial) is consulting oriented.

To start: While still employed, work on building a community of other engineers at various companies. Up your visibility by making sure your code and work is easily found online, attend every event you can, give presentations, teach and help people, and be as public as possible. Save up 6 months or whatever of finances, find some friends and make the leap.

And if you fail? No big deal, just sign up for another full-time gig for a while. One that likely pays more, because this collective renegotiation strategy results in higher average wages, and because by changing companies you might even get a raise for being more experienced now!

To find clients, tap into your network and offer your services. This will free up amazing teams to work across companies, instead of being locked up inside a few corporate fortresses.

Another fallback strategy if your new company fails is to get acqui-hired by a larger company, skipping the ridiculous interview process many companies use. Just develop a cool piece of technology that you think one or more companies would be interested in.

(So, think I'm crazy? I have a bunch of detailed mental models here, built up over time by working at several large strategically placed software companies in several states. This can work. We just need to organize better and teach each other how to do it.)

What a company-wide "reorg" looks like in a flat, manager-less company

Working at a bunch of companies over the years has given me a lot of interesting perspective. I really enjoy trying to describe how processes in top-down companies can be done in non-hierarchical ("no boss") companies. Let's try to describe, say to a hierarchical company employee, what a company-wide "reorg" could look like in a no-manager company.

I first heard the word "reorg" in relation to how Microsoft periodically reorganizes its corporate structure to "better align the company to new corporate-level goals and strategies". (That's a joke.) Issuing a company-wide reorg in a hierarchical company is very much an executive-level decision. It's a top-down directive that the company intentionally follows, like a military maneuver. It just happens, you know about it as an employee, and you must go with the flow.

But what does a deep reorg actually look like in a non-hierarchical, manager-less company? The CEO can't just come cruising in and totally reorg the place. (Remember, the CEO is not your boss in companies like this!) Such a traumatic "mass adjustment of resources" is just not in the culture. (Small-scale "horse trades" occur all the time in manager-less companies. I'm talking about a deep, planned reorganization that impacts a large chunk of the company.)

Well, here's one way you can reorg a manager-less company. This approach assumes the actor(s) attempting to pull the reorg off have the power to form new teams and make internal/external hires.

First, you need to form a small team around some new product or technology. Do it just below the radar (internally). It needs to show promise and be a rising-star type project. You should work to get as much strategic press exposure about this new team's work as possible.

Next, you start internally recruiting and externally hiring for that new team. You streamline the external hiring process so it can accept some candidates as contractors (who you may eventually hire) and others as immediate full-time employees. For internal recruits, you take only the developers most passionate about the new project's goals or its technology. Hiring on the new team must be done carefully, because it's ultimately part of a greater company-wide sorting and reorganization process.

If the new project becomes large enough, it creates a rift of sorts in the organization. The new team gets more power and size over time. An entire ecosystem of other friendly teams can form around the new team. The company self-organizes itself into a market of teams around the new project, and a block of other "deprecated teams" who may not be aligned with the reorg's goals.

These deprecated teams can be reduced in size by letting go of internal developers over time. Anyone the company doesn't want long-term can be quickly moved onto a deprecated team. To minimize shock to the deprecated team's product (which may need to remain live), the team can fall back to possibly cheaper external contractors as it internally shrinks. Ultimately, the product can be put on long-term life support with minimal internal cost.

Now, if you are a developer in a company like this, and you want to survive the reorg, you should be asking yourself right about now "am I on a deprecated team?". If you are, you better learn the company's new religion quickly or you may be pushed out. (Or, you need to visibly work on background projects that support post-reorg goals or needs.)

If you are a senior team member in this scenario, and you want your team to not become deprecated, you need to quickly figure out how to transition your product into the "new era" so it remains relevant.


Welcome to "The Hunger Games"

Pretty much required reading if you're going to work (and stand out!) at a self-organizing company:

The Hunger Games


enet networking library

I switched all the low-level networking in a VR app I've been working on over to enet today. It's a UDP-based networking library that supports optional reliable, in-order packet delivery, packet fragmentation (so very large packets can be sent over UDP), and multiple channels.

The API was super easy to use, the code is written in C, and the thing just works. Compiling it was as easy as dropping the .C files into the project and hitting Build.
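
For flavor, here's roughly what a minimal enet client looks like (a hedged sketch, not code from my app; the server address, port, channel count, and payload are placeholder assumptions):

    #include <enet/enet.h>
    #include <stdio.h>

    int main(void)
    {
        if (enet_initialize() != 0)
            return 1;

        /* Client host: 1 outgoing connection, 2 channels, no bandwidth caps. */
        ENetHost *client = enet_host_create(NULL, 1, 2, 0, 0);

        ENetAddress address;
        enet_address_set_host(&address, "localhost"); /* placeholder server */
        address.port = 1234;                          /* placeholder port */

        ENetPeer *peer = enet_host_connect(client, &address, 2, 0);

        /* Pump events; enet_host_service() also flushes queued sends. */
        ENetEvent event;
        while (enet_host_service(client, &event, 1000) > 0)
        {
            if (event.type == ENET_EVENT_TYPE_CONNECT)
            {
                /* Reliable, in-order delivery on channel 0. enet transparently
                   fragments packets too large for a single UDP datagram. */
                ENetPacket *packet =
                    enet_packet_create("hello", 6, ENET_PACKET_FLAG_RELIABLE);
                enet_peer_send(peer, 0, packet);
            }
            else if (event.type == ENET_EVENT_TYPE_RECEIVE)
            {
                printf("got %u bytes\n", (unsigned)event.packet->dataLength);
                enet_packet_destroy(event.packet);
            }
        }

        enet_host_destroy(client);
        enet_deinitialize();
        return 0;
    }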

I love libraries like this.

New lossless compression benchmarks on the way

I've been benchmarking several new lossless codecs from RAD Game Tools: Kraken, Selkie, and Mermaid. (How does RAD think up these odd but cool-sounding names?) Stay tuned!


lz4hc vs. lz4 performance on the LZHAM test corpus


Both configurations use LZ4_decompress_safe() for decompression. lz4hc compresses with LZ4_compressHC2_limitedOutput(), while lz4 uses LZ4_compress_limitedOutput(). (A minimal sketch of the two paths follows the totals below.)

22,827 total files, all files >= 1KB.

Total uncompressed size vs. compressed output size, in bytes:

total 5374152762
lz4hc 2199213331
lz4   2575990728
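
For concreteness, here's a minimal sketch of the two code paths being compared (hedged: this uses the legacy lz4 API named above, the HC level shown is illustrative, and error handling is elided):

    #include "lz4.h"
    #include "lz4hc.h"

    /* Compress 'src' via either the HC or non-HC path. Both outputs are
       decodable by the same LZ4_decompress_safe() decoder. 'dst_capacity'
       should be at least LZ4_compressBound(src_size). Returns the
       compressed size, or 0 on failure. */
    int lz4_compress_either(const char *src, int src_size, char *dst,
                            int dst_capacity, int use_hc)
    {
        return use_hc
            ? LZ4_compressHC2_limitedOutput(src, dst, src_size, dst_capacity,
                                            9 /* illustrative HC level */)
            : LZ4_compress_limitedOutput(src, dst, src_size, dst_capacity);
    }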


RAD's ground breaking lossless compression product benchmarked


Intro


Progress in the practical lossless compression field has been painfully stagnant in recent years. The state of the art is now rapidly changing, with several recently announced open source codecs (such as Brotli and Zstd) offering high ratios and fast decompression. Recently, RAD Game Tools released several new codecs as part of its Oodle data compression product.

My first impression after benchmarking these new codecs was "what the hell, this can't be right", and after running the benchmarks again and double checking everything my attitude changed to "this is the new state of the art, and open source codecs have a lot of catching up to do".

This post uses the same benchmarking tool and data corpus that I used in this one.

Updated Aug. 5th: Changed compiler optimization settings from /O2 to /Ox and disabled exceptions, re-ran the benchmark and regenerated the graphs, and added codec versions and benchmarking machine info.

Codec Settings


All benchmarking was done under Windows 10 x64, using a 64-bit executable, in a single process with no multithreading.

  • lz4 (v1.74): level 8, LZ4_compressHC2_limitedOutput() for compression, LZ4_decompress_safe() for decompression (Note: LZ4_decompress_fast() is faster, but it's inherently unsafe. I personally would never use the dangerous fast() version in a project.)
  • lzham (lzham_codec_devel on github): level "uber", dict size 2^26
  • brotli (latest on github as of Aug. 1, 2016): level 10, dict size 2^24
  • bitknit (standalone library provided by RAD): BitKnit_Level_VeryHigh
  • Zstd (v0.8.0): ZSTD_MAX_CLEVEL (see the one-shot sketch just after this list)
  • Oodle library version: v2.3.0
  • Kraken: OodleLZ_CompressionLevel_Optimal2
  • Mermaid: OodleLZ_CompressionLevel_Optimal2
  • Selkie: OodleLZ_CompressionLevel_Optimal2
  • zlib (v1.2.8): level 9
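
As a concrete illustration of what one of these settings boils down to in code, here's a one-shot zstd sketch (hedged: this uses the modern ZSTD_maxCLevel() accessor, which may differ from the exact v0.8.0 symbol; error checks are elided):

    #include <zstd.h>
    #include <stdlib.h>

    /* One-shot compression at zstd's maximum level, i.e. the max-level
       setting listed above. Sketch only: ZSTD_isError()/NULL checks elided. */
    size_t compress_max_level(const void *src, size_t src_size, void **out)
    {
        size_t bound = ZSTD_compressBound(src_size); /* worst-case dst size */
        *out = malloc(bound);
        return ZSTD_compress(*out, bound, src, src_size, ZSTD_maxCLevel());
    }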

Data corpus used: LZHAM corpus, only files 1KB or larger, 22,827 total files

Benchmarking machine:


Compiler and optimization settings:

Visual Studio 2015 Community Edition, 14.0.25424.00 Update 3

/GS- /GL /W4 /Gy /Zc:wchar_t /Zi /Gm- /Ox /Zc:inline /fp:precise /WX- /Zc:forScope /arch:AVX /Gd /Oi /MT
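
For anyone wanting to reproduce this kind of data point, here's roughly how a decompression throughput number is gathered (an illustrative sketch, not my actual benchmark tool; lz4's safe decoder stands in for any codec's):

    #include "lz4.h"
    #include <time.h>

    /* Time repeated decompressions of one preallocated compressed buffer.
       Throughput is conventionally reported against the *uncompressed* size.
       clock() measures wall time on Windows; fine for a single thread. */
    double decomp_throughput_mb_per_sec(const char *comp, int comp_size,
                                        char *out, int out_size, int iters)
    {
        clock_t start = clock();
        for (int i = 0; i < iters; ++i)
            LZ4_decompress_safe(comp, out, comp_size, out_size);
        double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
        return ((double)out_size * iters) / (secs * 1024.0 * 1024.0);
    }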

Totals


Sorted by highest to lowest ratio:

Original    5374152762

brotli      1671777094 
lzham       1676729104 
kraken      1685750158 
zstd        1686207733 
bitknit     1707850562 
mermaid     1834845751 
zlib        1963751711 
selkie      1989554820 
lz4         2131656949 

Ratio vs. Decompression Throughput - Overview




On the far left there's LZHAM (dark gray), which at this point is looking pretty slow. (For a time, it was the decompression speed leader of the high ratio codecs, being 2-3x faster than LZMA.) Moving roughly left to right, there's Brotli (brown), zlib (light blue), Zstd (dark green), BitKnit (dark blue), Kraken (red), then a cluster of very fast codecs (LZ4 - yellow, Selkie - purple, Mermaid - light green, and even a sprinkling of Kraken - red).

Notes:
  • Kraken is just amazingly strong. It has a very high ratio with ground breaking decompression performance. There is nothing else like it in the open source world. Kraken's decompressor runs circles around the other high-ratio codecs (LZHAM, Brotli, Zstd) and is even faster than zlib!
  • Mermaid and Selkie combine the best of both worlds, decompressing as fast as or faster than LZ4, but with compression ratios competitive with or better than zlib!

Ratio vs. Decompression Throughput - High decompression throughput (LZ4, Mermaid, Selkie)



* LZ4 note: LZ4's decompression performance depends on whether the data was compressed with the HC or non-HC version of the compressor. I used the HC version for this post, which appears to output data that decompresses a bit faster. I'm guessing that's because there's less compressed data to process in HC streams.



Ratio vs. Decompression Throughput - High ratio codecs



LZHAM, Brotli, Kraken, BitKnit, and Zstd, with zlib and lz4 included for reference purposes. Kraken is starting to give LZ4 some competition, which is pretty amazing considering Kraken is a high ratio codec!


Ratio vs. Compression Throughput - All codecs


For the first time on my blog, here's a ratio vs. compression throughput scatter graph.



Notes:

  • LZHAM's compressor in single threaded mode is very slow. (Compression throughput was never a priority of LZHAM.) Brotli's compressor is also a slowpoke.
  • Interestingly, most of the other compressors cluster closely together in the 5-10 MB/sec region.
  • zlib and lz4 are both very fast. lz4 isn't a surprise, but I'm a little surprised by how much zlib stands apart from the others. 
  • There's definitely room here for compression speed improvements in the other codecs.

Conclusion


The open source world should be inspired by the amazing progress RAD has made here. If you're working on a product that needs lossless compression, RAD's Oodle product offers a massive amount of value; there's nothing else out there like it. I can't stress enough how big a deal RAD's new lossless codecs are. Their existence alone clearly demonstrates that there is still room for large improvements in this field.

Thanks to RAD Game Tools for providing a drop of their Oodle Data Compression library for me to benchmark, and to Charles Bloom for feedback on the codec settings and benchmarking approach.

