A simple way to reduce memory pressure on Unity iOS (and maybe Android) titles

The key question on the Compression Team that we should be answering is: How do we build a data compression engine that Unity can talk to better, so we get higher ratios?

Right now everybody else spends endless dev time on optimizing the backend compressor itself (entropy coders, LZ virtual machine/instruction set changes etc.), which is a great thing to do but there are other ways of improving ratio too. This is done because the folks writing compressors usually don't control the caller. But this is not true for us, because Alexander and I can change anything in the entire Unity stack (we can change the caller and we can change the data compressor).

The golden rule in mainstream lossless compression has been: "you cannot change the submission API", for compatibility/simplicity reasons. Basically, "API and Data Format Compatibility=God" for a codec. The coding literature I've seen doesn't go much (if at all) into the API side either, because the agreed assumption is that the caller doesn't typically know much or care about the details inside lossless data compression libraries. It's just a blob of bytes, here you go.

Since artificial "rules are meant to be broken" (Joachim's observation), there seems to be at least two interesting approaches to breaking down the barrier, that can also be mixed together. Both involve a superset compressor/decompressor API:

1. Direct Context Control (DCC)

Let's try allowing the caller to have some control over the compressor's internal coding contexts (or select between statistical models).

In this API, the caller specifies the upcoming index of the current context to use while streaming data to/from the codec. The context can be as simple as a programmer chosen ID, or a structure member ID - anything the caller wants as long as it's consistent (and as long as they also specify same context ID during decompression) They can say stuff like "context0=DWORD's" and "context1=floats", or they can describe individual bytes within these elements, etc.

We don't care what the context really means, all we care about is that the caller doesn't lie/is consistent. We don't serialize the context ID's anywhere, this is just info supplied by the caller that they already have. The caller may need to experiment to find the proper way to break down their data into specific context ID's.

LZHAM alpha uses many more contexts internally, adding them back isn't that bad. Let's let the caller control it, as long as they understand the constraints (compressor and decompressor must always agree on contexts, don't lie, be consistent) we're okay.

2. Universal Preprocessing

This benefits from, but doesn't need, DCC. The compressor accepts Granny-style fixup metadata, and we preprocess the data before compression and postprocess the data after compression. We use DCC internally, but it's not strictly required (depends on codec).

The extra API allows the caller to describe how their data is laid out, by providing exactly the same structure, data member, and array size/pointer markup metadata as Granny's powerful binary fixup metadata system works. (Rad puts little tidbits of info about this technology on changenotes here - look for "fixups".)

So the plan here is:

We modify Unity to provide universal binary markup metadata of the important binary data it serializes into files
- It doesn't need to be exact or complete, just describe most/a lot of the data well
- Markup describes structures with offsets to other arrays of structures, basically a Granny-style "data tree". (Granny uses this for byteswapping and offset to pointer remapping.)
- Note Granny's markup metadata is "universal". Serialized data is described as structs pointing to other arrays of structs, and structs can have any arbitrary members, like bytes, words, dword's, offsets/pointers, etc. Like run-time type info type metadata.
- Worst case, you have an array of bytes for any arbitrary data, but you don't get any gains here, you need higher level info.
- See Runtime/Serialize/IterateTypeTree.h in Unity code
Next, compressor walks the data tree and reserializes it to compressor
- This is a specialized tree sampling algorithm that uses the metadata to walk over each byte of every marked up structure member in the tree.
- Both compressor and decompressor must traverse the tree data in the exact same way.
- So if the entire file is an array of floats, we first emit byte 0's of all floats, then byte 1's, etc. (this is a AOS->SOA style byte swizzle of each unique data type in file). Or other options.
- We also can provide compressor with member byte or whatever specific context ID's
- This could help on images, DXT textures, sound data, arbitrary serialized binary data
- Should be especially easy to integrate into existing auto-serialization systems
- Like data preprocessors used by RAR and other archivers
- Sampling algorithm ideas are in another doc
Compression/decompression as usual, except we can have DCC calls too (if we want)
Decompressor deswizzles data during postprocess (just like Granny does when it loads a Granny file and has to byteswap and convert from offsets to pointers)

Notes And Conclusion

Now I understand one important reason why tech companies like Apple, Facebook, Google, Microsoft, and Unity hire or internally find data compression specialists. Once you have an internal set of compression specialists those engineers can freely move up/down that company's stack. Once they do that they can achieve superior results by developing a full understanding of the API stack, usage patterns and company-specific datasets. Writing completely custom codecs becomes a doable and profitable thing.

Idea #1, "direct context control" is very interesting but will take some R&D to figure out the technical details and true feasibility. I've learned that adding more contexts must be done very carefully. Too many contexts and they get too sparse, possibly cause performance problems, memory usage goes up, etc.

Idea #2 is the "universal preprocessing" idea I've been mulling over in my head for several years. It describes how to more closely couple or blend a binary data serialization system, like the one used by Unity (or Granny, or the Halo Wars engine), with a lossless data compression system like LZHAM.

We wrote this around the same time Colt McAnlis's interesting blog post, where he mentions his ideas on several unsolved problems in data compression. (I don't quite get point #1 though - need more detail.) This post is closely related to problem #3, maybe a little of #2 as well. It's also related to "compression boosters", that Colt says "are preprocessing algorithms that allow other, existing, compressors to produce better results".

Basically, if we add a binary metadata description API to the compressor (like the Granny-style "fixup" data used for offset->pointer conversion and byteswapping) this opens up a bunch of interesting possibilities. At least on the types of binary data we deal with all the time: images, GPU textures, meshes, animation, etc.

Many engines have serialization systems like this, that handle things like byteswapping, offset->pointer conversion, and auto serialization/deserialization to binary or text formats. (Any open source ones? MessagePack solves some similar problems.)

Colt's point on Kolmogorov complexity is a key related concept to pull inspiration from. We already have the algorithm+input data (the metadata) to serialize or deserialize binary files to/from raw byte arrays. It's just the datatree graph traversal algorithm that I implemented to byteswap/pointerize "Raw" Granny data files on Halo Wars. (For reasons unrelated to this article.)

We still don't have a program that creates the data in a pure shortest program Kolmogorov sense, of course. Our program has two data inputs (metadata and raw "value" data), so all we've done is expand the total amount of real data to compress (by a small amount due to the metadata, but it's still expansion). But we do have at least one little program that can create and manipulate a new set of compressor input, or give us key type information about the compressor's input. We can use this information to better context model and/or reorganize ("sort" or permute the data) the input data so it leads to higher compression with a backend coder like LZHAM/LZMA.

The metadata itself can be transmitted in the compressed data stream (just like Granny now does according to its changenotes), or it can be present in the game engine's executable itself. This data will be necessary for deserialization, anyway, so it makes no sense to duplicate it and hurt ratio.

Unfortunately, one detail not mentioned above is that a datatree can have arrays of objects, which requires "size" fields to be present in the parent objects which point to the array. So it may be necessary for the datatree serializer to compose a separate list of array size fields and supply that information to the compressor (which will also need to be transmitted to the decompressor). Due to object serialization order, the array size fields should always appear before the array data itself, so this may not be a big deal.

Another possibility is to use a sort to rearrange the input data fed into the compressor, like the BWT transform. The close coupling between the serializer and the compressor gives the compressor a sideband of extra type/context information describing the input data to compress. The per-byte sort key can be context ID's computed by traversing the datatree, on both the compression and decompression side.

All of these ideas make my head hurt. It's very possible we're missing something key here, but I believe that our major point (deep compressor integration with a game engine's data serializers) has value. The next steps will be to conduct some quick experiments with an existing set of compressors, and see what problems and interesting new opportunities we encounter.

Type	Time
fzcat (modified for test)	64.25435829162598
zcat	109.0133900642395

Installation

Build Instructions

Usage Instructions

Observations/notes:

Short Term

Long Term

Interesting Recent Developments

The Current Situation

Now let's introduce CompressQuery()

Let's go further: push/pop the compressor state

Conclusion

miniz vs zlib

Conclusions

Intro

Some ideas on how to deeply integrate a data compressor into a game engine

Notes And Conclusion

Okay, but why is this useful?

Let's build a prototype

Some Terminology

Bring in deltacomp

Current Results

Next Steps

The old way

The Current New Way

The Other New Way (that nobody knows about)

The Demo of real-time crunch compression

GPU texture compression tools: What we really want

Now, the distant future

Conclusion

Multithreading Skinning on Age of Empires 3

Complexity Management

One solution: RCU

Another solution: Deep Copy

Conclusion