State of GL 4.x revealed via "apitest" benchmark

This excellent GL 4.x micro-benchmark has been making waves recently. Now that it's on Phoronix it's about as mainstream as it's going to get: NVIDIA Slaughters AMD Catalyst On Linux In OpenGL 4.x Micro-Benchmarks

At first glance the results sound great for NV: "The AMD Catalyst driver gets absolutely annihilated for these GL4 micro-benchmarks". But unfortunately it's bad news for everyone working in GL because it clearly demonstrates just how fractured and inconsistent the GL driver landscape actually is when the rubber hits the road.

More apitest related links and notes

More apitest related links:

OpenGL Stop Breaking my Heart

apitest results on AMD comparing various OpenGL and D3D11 approaches

Some important things about apitest and the results worth pointing out:

1. apitest results should not be compared vendor vs. vendor.
The test was not originally designed to be used in this way. Accurate benchmarking is surprisingly hard, and it's possible apitest's results are flawed or misleading in some way when compared vendor vs. vendor.

2. In many cases AMD's GL driver is within the same ballpark, or faster, compared to their D3D11 driver.

3. The relative sorted order of techniques is approximately the same on both vendors. 
This is good, because apps tend to use the slowest techniques and the authors are encouraging developers to use the faster approaches.

4. We're talking possible performance gains of 15x-20x, on drivers from both vendors.
5x-10x would be fantastic, 15x+ is amazing. 

Now all that's needed are drivers from all vendors that not only support these techniques, but handle them reliably and with reasonably consistent performance.

Moving back to Texas!

My five year, mostly sunless odyssey in the Seattle area is finally coming to an end. I'll be visiting occasionally, but I can't wait to move back to Dallas next month. Thanks to everyone at Valve for making the place such an amazing company to work at. Also, a huge thanks to the truly world class developers at Rad Game Tools for their key help during the Steam Linux launch and helping us kick start vogl's development. Without the devs at Rad a lot of the stuff we did over the past few years just would not have happened. (Umm Gabe, why don't you just buy these guys already and officialize the Valve "satellite office" in Kirkland?)

To the Linux and GL community, I feel bad about quitting Valve before completing vogl. (Not that something like vogl could ever really be completed!) In the couple months before I quit I did everything I could think of (wrote the wiki, got UE 4 compatibility, built the regression suite, wrote up a 6+ month itemized task roadmap, etc.) to ensure vogl's development would continue moving forward without me. From studying the changes made on vogl's github repo after I quit it certainly looks like the devs at Valve and LunarG have done a good job moving it forward.

I think it'll be 3 years or more before OpenGL-Next is usable and relevant to shipping products. So even though vogl has little chance of scaling beyond GL v4.x, it should remain a useful tool for a long time. I may fork it one day if I have to do any hardcore GL development again.


Interesting Talks and Articles

What Your Culture Really Says
Talks on the Science Behind Motivation, why bonuses don't work
Dan Pink: The puzzle of motivation (transcript):
http://www.ted.com/talks/dan_pink_on_motivation?language=en

Kathy Sierra: "The Secrets of the Whisperers" (motivation and gamification)
https://www.youtube.com/watch?v=QNsl5D-V8T0&app=desktop

For Best Results, Forget the Bonus
http://www.alfiekohn.org/managing/fbrftb.htm

Why Bonus Systems Don't Work
http://brodzinski.com/2013/11/bonus-systems-dont-work.html

The Unreasonable Effectiveness of C
http://damienkatz.net/2013/01/the_unreasonable_effectiveness_of_c.html

Joel on Software Stuff

Whaddaya Mean, You Can't Find Programmers?

Alex St. John on OpenGL vs Direct3D
http://www.alexstjohn.com/WP/2014/03/25/opengl-vs-direct3d-yawn/

Alex St. John on Recruiting Giants
http://www.alexstjohn.com/WP/download/Recruiting%20Giants.pdf

State of Linux Gaming

I've got one more blog post before I depart for Dallas. Here's an interesting report showing framerates and loading times of various big titles on Linux vs. Windows:

Slashdot: PCGamingWiki Looks Into Linux Gaming With 'Port Reports'
PC Gaming Wiki: Linux port report

Sadly, it's pretty clear that if you run these games on Linux your experience isn't going to be as good, and you'll be getting less "gaming value" vs. Windows. We're not talking about a bunch of little indie titles - these are big releases: Borderlands: The Pre-Sequel, Borderlands 2, Tropico 5, XCOM: Enemy Unknown, Sid Meier's Civilization V. My take is the devs doing these ports just aren't doing their best to optimize these releases for Linux and/or OpenGL.

A nice little tidbit from this report: "Unfortunately, Aspyr are currently still unable to provide support for non-Nvidia graphics cards, as with Borderlands 2. This doesn't mean the game won't work if you have an AMD or Intel GPU, but just that you're not guaranteed to receive help from the developer - the current driver situation for non-Nvidia cards may lead to degraded performance." Huh? This is not a good situation.

I know it's possible for Linux ports to equal or outperform their Windows counterparts, but it's hard. At Valve we had all the driver devs at our beck and call and it was still very difficult to get the Source engine's perf. and stability to where it needed to be relative to Windows. (And this was with a ~8 year old engine - it must be even harder with more modern engines.) These devs are probably glad to just release anything at all given how alien it can be for Windows/Xbox devs to develop, debug, and ship stuff under Linux+OpenGL.

Hey, this is just a thought, but maybe Valve developers could stop locally optimizing for their bonuses by endlessly tweaking and debugging various half-broken dysfunctional codebases and instead do more to educate developers on how to do this sort of work correctly.

The entire Intel driver situation remains in a ridiculous state. I know Intel means well and all but really, they can do better. (Are they afraid of pissing off MS? Or is this just big corp dysfunctionalism?) Valve is still paying LunarG to find and fix silly perf. bugs in Intel's slow open source driver:

Major Performance Improvement Discovered For Intel's GPU Linux Driver

Surely this can't be a sustainable way of developing a working driver?

Anyhow, onto SteamOS/Steambox. Here's a surprisingly insightful comment I found on Slashdot. I don't agree that SteamOS is done just yet, but you've got to wonder what is really going on. (So where are all those shiny Steam machines they showed earlier this year anyway? Does all this just go into the Valve memory hole now?)

by Qzukk (229616) on Friday October 24, 2014 @11:56AM (#48222551)
Let's be honest, SteamOS is done. Steam got exactly what they wanted from Microsoft and dropped it like a hot potato (so sorry, you'll never get to use that cool controller).
Consider that for decades Microsoft has not allowed anyone, anyone to touch the user experience. Even after Netscape's antitrust lawsuit over active desktop, even after BeOS withered and died hoping someone would sell a windows computer with dualboot, or hell just a windows computer with a "Setup BeOS" icon on the desktop. Steam is facing the Microsoft Store and a real threat that the Microsoft Store will become the way to buy programs (see also: iOS). Steam trots out SteamOS, and Microsoft snickers. The hype train builds up, and Microsoft sweats. Games start to port and Microsoft snaps.
Alienware ships a Windows 8 PC that boots to Steam instead of Metro.
Now, let's step back a second and look at the big picture here. At the time, windows 8 adoption is absolute total shit, swirling the drain of a public restroom that hasn't been washed for years. The last windows evangelists are all hanging on imploring people to just try it out, just give it a chance, and oh by the way install Start8 to fix metro. Think about that. PC vendors are on the verge of revolt, their customers refuse to buy their goods, and all for the want of installing a $5 program to fix the metro experience. Best Buy is probably screaming at Microsoft, begging them to allow them to remove the metro experience so they can move their inventory. Hell, they're probably begging them to let them advertise their Geek Squad services to "optimize" the experience and install that $5 program for $100. But no, the Microsoft Experience is inviolate, the holiest of holies, eternally immutable. No matter how much hatred it gets, it Must. Not. Be. Changed .
And then Alienware ships a Windows 8 PC that boots to Steam instead of Metro.
SteamOS's job is done. When no-one was looking, Steam took Microsoft and snapped it like a twig. We'll never know exactly what dark magicks were invoked here, but in the blink of an eye, Valve routed Microsoft in a war that nobody even realized was being fought. When Japan makes an anime out of this event, GabeN will point at Steve Ballmer, say omae wo shindeiru and Ballmer's head will implode, without GabeN throwing a single visible punch.
Steam OS will probably putter along, we'll probably see a few things be trotted out to keep the dream alive, after all the hype train did build up a lot of steam (pun not intended). Eventually a few of these AAA developers will say "it's really just not ready for the prime time" and we'll go back to getting a few wine ports and indie games from hardcore dedicated guys who just really love Linux.
But the masses will probably never get to hold that controller.

Open Office Spaces and Cabal Rooms Suck


In case it wasn't clear: I really dislike large open office spaces. (Not 2-3 person offices, but large industrial scale 20-100 person open office spaces of doom.) Valve's was absolutely the worst expression of the concept I've ever experienced. I can understand doing the open office thing for a while at a startup, where every dollar counts, but at an established company I just won't tolerate this craziness anymore. (See the scientific research below if you think I feel too strongly about this trend.)

As an engineer I can force myself to function in them, but only with large headphones on and a couple huge monitors to block visual noise. I do my best to mentally block out the constant audio/visual (and sometimes olfactory!) interruptions, but it's tough. It's not rocket science people: engineers cannot function at peak efficiency in Romper Room-like environments. 


In case you've never seen or worked in one of these horrible office spaces before, here's a public shot showing a small fraction of the Dota 2 cabal room:


I heard the desks got packed in so tightly that occasionally a person would lower or raise their desk and it would get caught against other nearby desks. One long-time Valve dev would try to make himself a little cubicle of sorts by parking himself into a corner with a bunch of huge monitors on his desk functioning as walls, kind of like this extreme example:


He also had little mirrors on the top of a couple monitors, so he could see what people were doing behind him. At first I thought he was a little eccentric, but I now understand.

After a while I realized "Cabal rooms" (Valve's parlance for a project-specific open office space) resembled panopticon prisons:


See that little cell in the back left there? That's your desk. Now concentrate and code!

Here's the list of issues I encountered while working in cabal (open office layout) rooms:

1. North Korea-like atmosphere of self-censorship:


Now at a place like my previous company, pretty much everyone is constantly trying to climb the stack rank ladders to get a good bonus, and everyone is trying to protect their perceived turf. Some particularly nasty devs will do everything they can to lead you down blind alleys, or just give you bad information or bogus feedback, to prevent you from doing something that could make you look good (or make something they claimed previously be perceived by the group as wrong or boneheaded).

Anyhow, in an environment like this, even simple conversations with other coworkers can be difficult, because all conversations are broadcasted into the room and you've got to be careful not to step on the toes of 10-20 other people at all times. Good luck with that.

2. Constant background noise: visual, auditory, olfactory, etc.
As an engineer, I do my best (highest value) work while in the "flow". Background noise raises the mental cost of getting into and staying in this state.

3. Bad physical cabal room placement: Don't put a cabal room next to the barber or day care rooms, people (!).

4. Constant random/unstructured interruptions. 

It can be almost impossible to concentrate on (for example) massive restructurings of the Source1 graphics engine, or debugging the vogl GL debugger with UE4, while the devs next to you are talking about their gym lessons and another dude is bragging about the new Porsche he just bought with the stock he sold back to the company.

5. Hyper-proximity to sick co-workers.
Walls make good neighbors, especially when coworkers have caught a cold but feel pressured to be seen working, so they come in anyway.

6. Noise spike in the afternoon in one cabal room, as everyone all of a sudden decides to start chatting (usually about inane crap honestly) for 30-60 minutes. There's a feedback effect at work here, as everyone needs to chat louder to be heard, causing the background noise to go up, causing everyone to speak louder, etc. Good luck if you're trying to concentrate on something.

7. Environmental issues: Temperature either too high or too low, lighting either too bright, too dark, or wrong color spectrum. Nobody is ever really happy with this arrangement except the locally optimizing bean counters.

8. Power issues or fire hazards due to extreme desk density.

9. Mixing electrical or mechanical engineers (who operate power tools, solder, destruct shit, etc.) next to developers trying their best to concentrate on code.

Related: Don't put smelly 3D printers etc. right next to where devs are trying to code.

10. Guest developers causing trouble:

Hyper-competitive graphics card vendors would watch the activity on our huge monitors and get pissed off when we emailed or chatted, even about inane crap, with other vendors.

Some guest developers treated coming to Valve like an excuse to party. We learned the hard way to always separate these devs into separate mini-cabal rooms.

11. No (or bad access to) white boards.
At Ensemble Studios (Microsoft), each 2-3 person office had a huge whiteboard on one wall. This was awesome for collaboration, planning, etc. 

More articles on the nuttiness of open office layouts:

Open-plan offices make employees less productive, less happy, and more likely to get sick

Study: Open Offices Are Making Us All Sick

The Open Office Trap


Example of a GOOD office space:


Here's a quick summary of the scientific research (from The Open Office Trap):

"The open office was originally conceived by a team from Hamburg, Germany, in the nineteen-fifties, to facilitate communication and idea flow. But a growing body of evidence suggests that the open office undermines the very things that it was designed to achieve. In June, 1997, a large oil and gas company in western Canada asked a group of psychologists at the University of Calgary to monitor workers as they transitioned from a traditional office arrangement to an open one. The psychologists assessed the employees’ satisfaction with their surroundings, as well as their stress level, job performance, and interpersonal relationships before the transition, four weeks after the transition, and, finally, six months afterward. The employees suffered according to every measure: the new space was disruptive, stressful, and cumbersome, and, instead of feeling closer, coworkers felt distant, dissatisfied, and resentful. Productivity fell."
"In 2011, the organizational psychologist Matthew Davis reviewed more than a hundred studies about office environments. He found that, though open offices often fostered a symbolic sense of organizational mission, making employees feel like part of a more laid-back, innovative enterprise, they were damaging to the workers’ attention spans, productivity, creative thinking, and satisfaction. Compared with standard offices, employees experienced more uncontrolled interactions, higher levels of stress, and lower levels of concentration and motivation. When David Craig surveyed some thirty-eight thousand workers, he found that interruptions by colleagues were detrimental to productivity, and that the more senior the employee, the worse she fared."
"Psychologically, the repercussions of open offices are relatively straightforward. Physical barriers have been closely linked to psychological privacy, and a sense of privacy boosts job performance. Open offices also remove an element of control, which can lead to feelings of helplessness. In a 2005 study that looked at organizations ranging from a Midwest auto supplier to a Southwest telecom firm, researchers found that the ability to control the environment had a significant effect on team cohesion and satisfaction. When workers couldn’t change the way that things looked, adjust the lighting and temperature, or choose how to conduct meetings, spirits plummeted."
Ultimately, I noticed the biggest proponents of open office spaces have no idea how programmers actually work, aren't up to date on the relevant science (if they are aware of it at all), and in many cases do their best to actually avoid working in the very open office spaces they enforce on everyone else.

Bungie sure packs them in

For the record, my previous post wasn't intended to be focused on Valve in particular (but of course any mention of the big V will be latched onto). I used V's offices as an example because it's the last open office environment I've experienced, and it sucked probably more than it should have due to a company culture that had utterly failed to adapt as the company scaled from a few dozen to hundreds of developers.

If I could go back in time, I would have inserted more examples from other companies. My major concern before I hit publish was pissing off the open office zealots who ignore the science.

Anyhow, Bungie's offices are literally right up the street from Valve's, so let's see what they look like. At least this space has high ceilings, so it probably doesn't feel as claustrophobic as V's and background noise doesn't propagate so much. Nonetheless, it still looks like a cattle pen to me:


One forum comment (by IISANDERII) about this pic:
"Beginning to understand why they wanted to make players suffer and grind in Destiny. It was a silent revolt. I got a feeling it didn't look like this 15yrs ago."
I think very high ceilings are an important component of making open office spaces work at all, along with a company culture actually compatible with the open expression of ideas. Also, bad company culture can massively amplify the worst aspects of open office spaces.

The reality is, this is a very competitive industry, and it's difficult and expensive to hire skilled/experienced developers. The smart employers will realize that things like enforced industrial-sized cabal rooms and toxic peer-based stack ranking systems are boneheaded ideas and they'll come up with competitive alternatives to attract the best people. Ignoring the science and developer feedback to get X more hats live pronto is not smart.

Having experienced pretty much all possible layouts in my career, I would like to see a combination: A central room for say 15 people, surrounded by a large number of 1, 2, or 3 person offices, with at least 2-3 ways of leaving the central area. The small offices should resemble Ensemble's or Microsoft's: each with a door and a small vertical window near the door. Devs should be able to work where they want. Sometimes it makes sense to work together and collaborate, and sometimes you just need to concentrate. Usable whiteboards distributed throughout the space are critical.

Sheetrock, doors, whiteboards and glass are very cheap these days. These things are much less expensive to a company vs. the cost of a developer's time.


Microsoft's vs. Valve's digs

I can't stand Microsoft's terrible "Modern" UI, but I'll give them props for having an actual Office Innovation Team that actually thinks about this stuff. (Total side note to the OIT people: Choose a different title, it's virtually impossible to search for it.)

From Pics of MS Building 4:




Contrast the above to this (from here):


Now where do you want to work?

You know, you can tell a lot about a company and its culture by just looking at photos of their offices.


Robot Entertainment knows how to make open offices work better

There are precious few public pics available of Robot Entertainment's (The Orcs Must Die people near Dallas, TX - one of the post-Ensemble Studios companies) offices. This is a shame, because they've got a very smartly laid out space. (Update: Found more pics.)







The important elements:
  • Discipline-based "pod" organization and half partitions above eye height.
  • Disruptive foot traffic and audio/visual noise mostly confined within a single pod.
  • Pervasive availability of usable and visible whiteboards
  • High ceilings
  • No power cords duct-taped to the floor
  • Lower density desk layout. 
  • Small desks with whiteboards encourage small meetings
Borderline genius considering how little most studios, even insanely successful ones, think about this stuff.

And at this studio you get an entire Beer Garden in your offices. No high school lunch room cafeteria here!




Another major difference between Robot's offices and the spaces of companies with dehumanizing, industrial scale open layouts is persistence and presence. At Robot's (and the old Ensemble Studios) offices, employees can bring in books, references, games, etc. and arrange them about their office in actual physical book cases and shelves. Sometimes Kindle doesn't cut it, especially for historical references. You can't practically do this at companies that attach wheels to your little desks and force you to play bumper cars every quarter.



For completeness, here's their lobby and meeting area:




Some impressions I get about this space by just looking at the pics: openness, welcoming, and team oriented.

BonusXP's hybrid office arrangement

After my Valve experience I'm now deeply interested in how companies arrange and maintain the actual space their employees work in. The first thing I do when I enter a studio is look beyond the reception area and poke around to see how much importance and thought went into their actual workspace. This can tell you a lot about a company. Ignore the marketing and just observe.

BonusXP (makers of Monster Crew and Cave Mania, a growing indie game developer near Dallas in Allen, TX) used feedback from everyone at the studio to decide how their new office would be arranged. They decided on a hybrid mix of a low-density open plan surrounded by normal offices, with some key desk additions to cut down on visual distractions and give devs a better sense of control over their environment.

They also have a pretty sweet optional work from home on Friday policy.





Key things about this space that I see:
  • Tidy: no rat's nests, shabby furniture, or giant collections of 2000 nerf guns and action figures.
  • Opaque glass dividers on each desk.
  • Small offices with windows surrounding the open office area.
  • Real desks with actual storage space.
  • Low density and lots of space around each mini "pod" of desks. 
  • Blue color theme, which triggers a relaxing response.
They are working on a secret project with Stardock. You can read more about BonusXP on their blog.

Utils/tools to help transition to OSX from Windows

I've developed software under Kubuntu and Windows for years now. I found adapting to Kubuntu from a Windows world to be amazingly easy (ignoring the random Linux driver installation headaches). After a few days of mucking around with keyboard shortcuts and relearning Bash, I was all set.

These days, the center of gravity in the mobile development world is OSX due to iOS's market share, so it's time I bite the bullet and dive into the Apple world.

I'll admit, transitioning to OSX has been mind numbingly painful at times. (Apple, why do you persist in using such a wonky keyboard layout? Where's my ctrl+left??! No alt+tab?? wtf? ARGH.) I mentioned this to Matt Pritchard, a long time OSX/iOS developer, and he passed along a list of OSX utilities and tools that can help make the transition easier for long-time Windows developers.


Prices are for single user. Multi-user or multi-platform licenses may be more.

Version support - Often it extends back further than listed if you download older versions.


Important Features & Utilities - Must-Haves

USB Overdrive  - Alessandro Levi Montalcini  - http://www.usboverdrive.com/
    "Take full advantage of any USB mouse, trackball, joystick or gamepad."
    Lets you adjust the acceleration curve of your mouse to match Windows, preserving your muscle memory
    OSX 10.8-10.10,   Shareware + Boot Reminder,  $20 to Register

KeyRemap4MacBook / Karabiner -  Takayama Fumihiko -  https://pqrs.org/osx/karabiner/
    "A powerful and stable keyboard customizer for OS X."
    Lets you remap the cursor keys and keypad keys on your favorite keyboard to work the same as Windows
    OSX 10.4-10.10,   Free + Donations Accepted

Trim Enabler - Cindori - http://www.cindori.org/software/trimenabler/
    "The ultimate SSD utility for Mac OSX"
    Adds TRIM support to OSX for non-Apple SSD drives (i.e. you added an SSD)
    OSX 10.7 - 10.10*,   Free, Pro Version with more features is $10

Asepsis - BinaryAge - http://asepsis.binaryage.com/
    "Asepsis prevents creation of .DS_Store files. It redirects their creation into a special folder."
    Keeps those &@#(*! .DS_Store files out of your project directories (and accidentally zipped up) or anywhere else
    OSX 10.8-10.10,   Free, Source Available ( https://github.com/binaryage/asepsis )

Little Snitch - Objective Development Software GmbH - http://www.obdev.at/products/littlesnitch/index.html  
    "Network Monitoring Redefined."
    A firewall for managing outbound traffic - similar to ZoneAlarm et al. on Windows.
    OSX 10.8-10.10,  Demo Mode (runs for 3 hours/30 days), Purchase License for $34.95


Useful Utilities

Witch - Get alt+tab back. http://manytricks.com/witch/

Wine - Guide to installing Wine on OSX (run Windows apps without using a VM) - http://www.davidbaumgold.com/tutorials/wine-mac/


Afloat - infinite labs - https://www.macupdate.com/app/mac/22237/afloat
    "It adds always-on-top, transparency, Spaces window management and more. "
    Keep app windows on top as needed (notepad, etc)
    OSX 10.6-10.10,   Free, Source Available ( https://github.com/millenomi/afloat )

TotalFinder - BinaryAge - http://totalfinder.binaryage.com/
    "Brings colored labels back to your Finder and more!"
    Very good Finder improvement - adds tabs, dual panes, toggling hidden files, and a bunch more
    OSX 10.8-10.10,  14-Day Trial, $18 to Register

TotalTerminal - BinaryAge - http://totalterminal.binaryage.com/
    "It provides persistent Visor Window which slides down when you press a hot-key (remember Quake console?)."
    Haven't used it, but seems worth checking out
    OSX 10.8-10.10,   Free

Blue Harvest - zero one twenty -  http://www.zeroonetwenty.com/blueharvest/
    "The most powerful way to keep your disks clean of Mac metadata."
    Keep your USB Thumb Drives, Network Drives, etc clean of .Trashes .FSEvents, etc
    OSX 10.8-10.10,   30-Day Trial, $14.95

SwitchResX  - Stéphane Madrau - http://www.madrau.com/
    "Get Back Control Of Your Screens!"
    Useful to set custom resolutions or if OS X has problems recognizing your (external) monitor
    OSX 10.6-10.10,   10-Day Trial, Nags to Register for Eur 18.40

gfxCardStatus - Cody Krieger -  https://gfx.io/  
    "gfxCardStatus is an unobtrusive menu bar app for OS X that allows MacBook Pro users to see which apps are affecting their battery life by using the more power-hungry graphics. "
    Utility for users of MacBook Pros with multiple GPUs to force-select/switch between them.
    OSX 10.7-10.10,  Free, Source Available ( https://github.com/codykrieger/gfxCardStatus )

Quickboot - Buttered Cat Software - https://buttered-cat.com/products/QuickBoot
    "Quickly boot an alternate OS."
    One click from the menu to reboot into Boot Camp.  Useful if you switch between OSes often.
    OSX 10.8-10.10,   Free, Broken link to donate bitcoins

Alfred - Andrew & Vero Pepperrell - http://www.alfredapp.com/  
    "Alfred saves you time when you search for files online or on your Mac. Be more productive with hotkeys, keywords and file actions at your fingertips."
    Essentially a launcher app with some neat extra functionality.
    OSX 10.6-10.10,  Free, Powerpack version adds features for UKP £17

Better Touch Tool - Andreas Hegenberg - http://www.bettertouchtool.net/  
    "BetterTouchTool is a great, feature packed FREE app that allows you to configure many gestures for your Magic Mouse, Macbook Trackpad and Magic Trackpad."
    Utility that allows the above but more also enables window snapping (a la windows hot corners).
    OSX 10.7-10.10,  Free

Carbon Copy Cloner - Bombich Software - https://bombich.com/  
    "Make your bootable backup today!"
    Creates (bootable) clones/backups of your hard drives. Older versions are free/ad-sponsored.
    OSX 10.6-10.10,  30-Day Trial,  Buy Personal License for $39.99

The Unarchiver -  Dag Ågren - http://unarchiver.c3.cx/unarchiver  
    "A set of applications for Mac OS X, iOS and other systems that can extract and inspect the contents of archive files in nearly any format."
    Much better in my experience than the standard OSX archive manager.
    OSX 10.6-10.10,   Free,  Donations accepted.  Source Available ( https://code.google.com/p/theunarchiver/ )

iStat - bjango - http://bjango.com/mac/istatmenus/  
    "An advanced Mac system monitor for your menubar"
    Mac system stats monitoring in the system bar
    OSX 10.8-10.10,  14-Day Trial, Buy single license for $16

HyperDock - Christian Baumgart -  http://hyperdock.bahoom.com/  
    "HyperDock adds long awaited features to your Dock"
    Does quite a few things, biggest are snapping windows to the edges of your screen like Windows and Dock previews
    OSX 10.6-10.10,  Shareware,  Buy Full License for $9.95

Choosy - George Brocklehurst - http://www.choosyosx.com/  
    "Forget the default browser, Choosy opens links in the right browser."
    Lets you choose which browser a link should be opened in. Really useful with multiple accounts in multiple browsers.
    OSX 10.5-10.10,  Shareware, Register after 45 days for $12

NameChanger - mrr software - http://mrrsoftware.com/namechanger/  
    "Rename a list of files quickly and easily.  See how the names will change as you type."
    Great batch/mass file renaming tool
    OSX 10.6-10.10,  Free,  Donations accepted

Bartender - Surtees Studios - http://www.macbartender.com/  
    "Organize your menu bar apps."
    Hide some of the menu bar clutter (DropBox, Copy, SideKick, Bluetooth etc.)
    OSX 10.8-10.10,   28-Day Trial, Purchase License for $15

Spectacle - Eric Czarny - http://spectacleapp.com/  
    "Move and resize windows with ease."
    That great and useful Win+Arrow shortcut from Windows. Now you can get it on Mac.
    OSX 10.8-10.10,  Free, Donations Accepted,  Source Available ( https://github.com/eczarny/spectacle )


Device / Asset Tracking Services

Prey - Fork, ltd - https://preyproject.com/  
    "Protect your devices from theft."
    Free software that allows you to track the location of your hardware. Open source.
    Windows XP+, OSX 10.6-10.10, iOS 4.3+, Android 2+,  Free w/ Limited Features, Paid plans start @ $5 / Month
    Source Available ( https://github.com/prey )

Hidden app - Hidden - http://hiddenapp.com/   
    "The Anti-Theft Software for your Mac, iPad and iPhone"
    Software + Service to track devices.  Includes Remote access, lock, wipe, keylogging, other features
    OSX 10.6-10.10, iOS (?),  Monthly plans start at $2.50/month for 3 devices and go up.


Important Applications - General

Firefox  - Mozilla - www.mozilla.org/
    Because Safari doesn't handle sooo many sites correctly 
    OSX 10.6-10.10,  Free,  Open Source / Mozilla Public License
    Recommended Add-ons: Flash,  Ad Block Plus, Ghostery, etc..

Parallels Desktop for Mac - Parallels IP Holdings - http://www.parallels.com/products/desktop/
    "Run Windows on Your Mac"
    My preferred VM for running Windows 7+ under OSX.   Alternatives: VMware or VirtualBox.
    OSX 10.8-10.10,   14-Day Trial.  Purchase New for $80 / $50 Upgrade

Vox - Coppertino - http://coppertino.com/  
    "Feature-Rich Music Player for Mac"
    A rather lightweight music player with a very compact interface, not like the iTunes behemoth. Best one I have found so far.
    I especially like that when dragging files onto it, it gives you the choice to add to the current playlist or clear it.
    OSX 10.6-10.10,  Free


Important Applications - Developer

SourceTree - Atlassian - http://www.sourcetreeapp.com/
    "A free Mercurial and Git client for Windows or Mac"
    Really useful complement to the git command line tools.
    OSX 10.7-10.10, Windows 7+.   Free, registration (free) required after 30 days

Beyond Compare  - Scooter Software - http://www.scootersoftware.com/download.php
    "Reconcile Your Differences" (Great folder/file/soruce diff tool)
    Excellent diff tool for files, folders, and more. Integrates nicely with SourceTree
    OSX 10.6-10.10, Windows XP+, Linux.   30-day Trial, Register for $30, Has Pro Version also


Package Managers

Homebrew - Max Howell / The homebrew community - http://brew.sh/  
    "The missing package manager for OS X"
    One-stop shop which provides a lot of the bits which are inexplicably missing from OSX.
    OSX 10.8-10.10,  Free,  Source Available ( https://github.com/Homebrew )

Macports - The MacPorts Project- https://www.macports.org/  
    "An open-source community initiative to design an easy-to-use system for compiling, installing, and upgrading either command-line, X11 or Aqua based open-source software on the OS X operating system"
    Aptitude style package installer.  Super useful when you want a command line version of Mercurial, or ImageMagick, or a bunch of other stuff;
    just run the installer, then it's "port install mercurial" from the Terminal.
    OSX 10.8-10.10,  Free, Source Available


Text Editors

Smultron - https://www.peterborgapps.com/smultron/  
    "An elegant and powerful text editor that is easy to use."
    Solid text editor with syntax highlighting, etc - now payware but older 3.x releases should still be free.
    OSX 10.6-10.10,  Free Trial (Unlimited?), Buy License for $5

Atom - Atom team - http://atom.io  
    "A hackable text editor for the 21st Century"
    Lovely text editor, modular package system that can add features, you can write your own if you like, crazy fast once you get used to it,
    and great for people who float between *nix and Windows environments regularly. I use it every day.
    OSX 10.8-10.10,  Free,  Source Available ( https://github.com/atom/atom )

Sublime Text - Sublime HQ Pty Ltd - http://www.sublimetext.com/  
    "Sublime Text is a sophisticated text editor for code, markup and prose."
    Popular text editor.  Very customizable and programmable.
    OSX 10.6-10.10,  Free, but purchase license for continued use for $70

XVim - XVimProject - https://github.com/XVimProject/XVim  
    "Xcode plugin for Vim keybindings "
    Makes XCode's editor behave like vim.
    Any OSX version with Xcode.  Free, Source Available ( in fact, the only way to get it )


GameDev Specific Apps

ImageOptim - porneL - https://imageoptim.com/
    "ImageOptim is a free app that makes images take up less disk space and load faster"
    Excellent tool for compressing mobile app assets.  GUI tries all and picks best compression method.
    OSX 10.7-10.10,  Free,  Donations Accepted, Source Available ( https://github.com/pornel/ImageOptim )

ImageAlpha - pornel - http://pngmini.com/
    "ImageAlpha greatly reduces file sizes of 24-bit PNG files (including alpha transparency)"
    Quantizes True-color assets and handles alpha channel.
    OSX 10.7-10.9 (10?),  Free,  Donations Accepted, Source Available ( https://github.com/pornel/ImageAlpha )

TexturePacker - CodeAndWeb GmbH - https://www.codeandweb.com/texturepacker  
    "20 seconds to your optimized sprite sheet"
    Packs textures.   Does the job I needed it to do.
    OSX 10.7-10.10,  Free Mode + 7-Day Trial of Pro Version,  purchase Pro Version license for $39.95


Specialty Developer Applications

ccache - Joel Rosdahl & Andrew Tridgell - https://ccache.samba.org/  
    "A fast C/C++ compiler cache"
    Will basically cache object files and simply fetch them when it detects you are compiling the same files with the same command line options.
    Very easy to integrate into makefiles and can massively improve clean rebuild times.
    OSX 10.8-10.10,  Free, Source Available ( https://github.com/jrosdahl/ccache )

C++ Builder XE7 (Professional) - Embarcadero - http://www.embarcadero.com/products/cbuilder
    "The C++ solution to build connected apps for Windows, OS X, iOS, Android, Gadgets, and Wearables"
    Descendant of Borland C++.  Build C++ console and GUI apps for OSX, Windows, even mobile from the same source.
    Windows 7+, plus OSX 10.8-10.10.   30-Day Trial (Full ver), $1039 New User (Pro) / $569 Upgrade

Improved Unity asset bundle file compression


Download Times Matter


Sean Cooper, Ryan Inselmann and I have been building a custom lossless archiver designed specifically for Unity asset bundle files. The archiver itself uses several well-known techniques in the hardcore archiving/game repacking world. This post is mostly about how we've begun to tune the LZMA settings used by the archiver to be more effective on Unity asset bundle data. We'll cover the actual archiver in a later post.

LZMA has several knobs you can turn to potentially improve compression:
http://stackoverflow.com/questions/3057171/lzma-compression-settings-details

The LZHAM codec (my faster to decode alternative to LZMA) isn't easily tunable in the same way as LZMA yet, which is a flaw. I totally regret this because tuning these options works:

Total asset bundle asset data (iOS): 197,514,230
Unity's built-in compression: 91,692,327
Our archiver, un-tuned LZMA: 76,049,530
Our archiver, tuned LZMA (48 option trials): 75,231,764

Our archiver, tuned LZMA (225 option trials): 74,154,817

LZ codecs with untunable models/settings are much less interesting to me now. (Yet one more thing to work on in LZHAM.)

Here are the optimal settings we've found for each of our Unity asset classes on iOS. So for example, textures are compressed using different LZMA options (8, 2, 3) vs. animation clips (0, 2, 2).

Best LZMA settings after trying all 225 options. (In case of ties the compressor just selects the lowest lc,lp,pb settings - I just placed a triply nested for() loop around calling LZMA and it chooses the first best as the "best". There's a rough sketch of this loop after the table.)

Class (id): lc lp pb

GameObject (1): 8 0 0
Light (108): 2 2 2
Animation (111): 8 2 2
MonoScript (115): 2 0 0
LineRenderer (120): 0 0 0
SphereCollider (135): 0 0 0
SkinnedMeshRenderer (137): 8 4 2
AssetBundle (142): 0 2 2
WindZone (182): 0 0 0
ParticleSystem (198): 0 2 3
ParticleSystemRenderer (199): 8 3 3
Camera (20): 0 2 2
Material (21): 3 0 1
SpriteRenderer (212): 0 0 0
MeshRenderer (23): 8 4 2
Texture2D (28): 8 2 3
MeshFilter (33): 8 4 1
Transform (4): 6 2 2
Mesh (43): 0 0 1
MeshCollider (64): 1 4 1
BoxCollider (65): 7 4 2
AnimationClip (74): 0 2 2
AudioSource (82): 0 0 0
AudioClip (83): 8 0 0
Avatar (90): 2 0 2
AnimatorController (91): 2 0 2
? (95): 7 4 1
TrailRenderer (96): 0 0 0
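
The brute-force search itself is nothing fancy. Here's a rough sketch using the LZMA SDK's C encoder API (LzmaEnc.h); the helper functions and buffer management here are simplified and hypothetical, and error handling is omitted:

/* Sketch of the brute-force lc/lp/pb search using the LZMA SDK's C encoder (LzmaEnc.h).
   "src" would hold one asset class's concatenated serialized data. */
#include <stdlib.h>
#include "LzmaEnc.h"

static void *sz_alloc(void *p, size_t size) { (void)p; return malloc(size); }
static void  sz_free(void *p, void *addr)   { (void)p; free(addr); }
static ISzAlloc g_alloc = { sz_alloc, sz_free };

/* Returns the compressed size for one lc/lp/pb combination, or (size_t)-1 on failure. */
static size_t try_lzma_settings(const Byte *src, size_t src_len,
                                Byte *dst, size_t dst_capacity,
                                int lc, int lp, int pb)
{
    CLzmaEncProps props;
    LzmaEncProps_Init(&props);
    props.level = 9;
    props.lc = lc;  /* literal context bits */
    props.lp = lp;  /* literal position bits */
    props.pb = pb;  /* position bits */

    Byte enc_props[5];        /* encoded LZMA properties header (5 bytes) */
    SizeT enc_props_size = 5;
    SizeT dst_len = dst_capacity;

    SRes res = LzmaEncode(dst, &dst_len, src, src_len, &props,
                          enc_props, &enc_props_size, 1 /* write end mark */,
                          NULL, &g_alloc, &g_alloc);
    return (res == SZ_OK) ? (size_t)dst_len : (size_t)-1;
}

/* 9 * 5 * 5 = 225 trials; ties go to the lowest lc/lp/pb because the first best wins. */
static void find_best_lzma_settings(const Byte *src, size_t src_len,
                                    Byte *dst, size_t dst_capacity,
                                    int *best_lc, int *best_lp, int *best_pb)
{
    size_t best_size = (size_t)-1;
    for (int lc = 0; lc <= 8; lc++)
        for (int lp = 0; lp <= 4; lp++)
            for (int pb = 0; pb <= 4; pb++)
            {
                size_t s = try_lzma_settings(src, src_len, dst, dst_capacity, lc, lp, pb);
                if (s < best_size)
                {
                    best_size = s;
                    *best_lc = lc; *best_lp = lp; *best_pb = pb;
                }
            }
}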

More LZHAM notes

It's been a while since I've made any major changes to LZHAM (except for minor cmake related stuff). This was a codec I wrote over a few nights and weekends while I was also working my day job. I eventually had to let active dev on LZHAM go to sleep because I got "sidetracked" shipping Portal 2. The codec has been successfully deployed in several products, such as Planetside 2 and Titanfall, which isn't bad for a few nights of R&D and implementation work.

I covered what I was thinking of doing with LZHAM in this blog post. I'm now interested in improving it again. For the types of products I'm now working on, what matters a lot is the title's retention rate, from first starting the product to the customer actually getting into real gameplay. Slow downloads or updates, loading screens, etc. equal lost users. Lost users=lower monetization. We actually measure the retention rate of every aspect of this in the field. So things like background downloading, streaming, proper organization of asset data into Unity asset bundles, and of course good data compression matter massively to us.

Anyhow, some ideas for LZHAM decompression startup and throughput improvements which I can do pretty quickly:

- After much testing on our game data, I now realize I underestimated how useful the various LZMA settings are. Right now LZHAM always uses the upper 3 MSBs of the prev. two literals for literal/delta literal contexts. Allow the user to control all of this: which prev. literal(s) (if any), say up to 8 bytes back, and which bits from those literals, separately for each type of prediction (literals/delta literals). (There's an illustrative sketch after this list.)

- In my quest to get LZHAM's ratio up to be similar to LZMA I made several tradeoffs which can greatly impact decompression perf, especially on incompressible data. Right now the codec must always init and manage 64*2 Huffman tables. Allow the user to reduce or even increase the # of tables.

- LZHAM was designed for "solid" compression, where you give the codec dozens to hundreds of MB's containing many assets, and you don't restart/reinit the codec in between assets. It's like a slow to start drag racer. So it can suck on small files.

I'm not honestly 100% sure what to do about this yet that won't kill decompression perf. The way LZHAM updates Huffman tables seems like an albatross here. Amortized over many MB's it typically works fine, but on small files they can't be updated (adapted) quickly enough. Fewer tables are probably better here.

I could just integrate something like miniz into the codec, and try using it on each internal compressor block and using whatever is better. But that seems horrible.

- The Huffman table update frequency needs to be better tuned. If I can't think of anything smarter, allow the user to control the update schedule.
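
To make the literal context idea above concrete, here's an illustration (not the actual LZHAM code) of how a context is currently formed from the top bits of the two previous bytes, and roughly what a user-configurable variant could look like:

/* Illustration only -- not the shipping LZHAM code. Current behavior: the upper
   3 bits of each of the previous two literals form a 6-bit prediction context. */
static unsigned literal_context(unsigned char prev0, unsigned char prev1)
{
    return ((unsigned)(prev0 >> 5) << 3) | (unsigned)(prev1 >> 5);  /* 0..63 */
}

/* A configurable variant along the lines proposed above: the caller picks which
   previous bytes, and which bits of each, feed the context. A real implementation
   would pack the selected bits down to keep the number of contexts manageable. */
static unsigned literal_context_configurable(const unsigned char *prev,       /* prev[0] = most recent byte */
                                             const unsigned char *bit_masks,  /* per-byte bit masks */
                                             unsigned num_prev_bytes)
{
    unsigned ctx = 0, i;
    for (i = 0; i < num_prev_bytes; i++)
        ctx = (ctx << 8) | (unsigned)(prev[i] & bit_masks[i]);
    return ctx;
}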

Note if you are very serious about fast, high ratio compression and decompression, Rad's Oodle product is very good. Given what I know about it, it's the best (fastest, highest compression, and most scalable/portable) production class lossless codec I know of.

LZHAM v1.0 progress

Ported to OSX, and exposed several new compression/decompression parameters to allow the user to configure some of the codec's inner workings: literal/delta_literal bitmasks (or number of literal/delta literal bits - less versatile but simpler), the Huff table max interval between updates, and the rate at which the Huff table update interval slows between updates. These settings are absolutely critical to the decompressor's performance, memory, and CPU cache utilization.

The very early results are promising: 25-30% faster decoding (Core i7) and much less memory usage (still determining) by just tuning the settings (less tables/slower updating), with relatively little impact on compression ratio. The ratio reduction is only a fraction of 1% on the few files I've tested. (Disclaimer: I've only just got this working. These results do make sense -- it takes a bunch of CPU to update the Huff decode tables.)

Also, by reducing the # of Huff tables the decompressor shouldn't bog down nearly so much on mostly incompressible data. The user can currently select between 1-64 tables (separately for literals and delta literals, for up to 128 total tables). The codec supports prediction orders between 0-2, with 2 programmable predictor bitmasks for literals/delta_literals. (I'm not really sure exposing separate masks for literals vs. delta literals is useful, but after my experience optimizing LZMA's options with Unity asset data I'm now leaning towards just exposing all sorts of stuff and letting the caller figure it out.)

I'm also going to expose dictionary position related bitmasks to feed into the various predictions, just like LZMA, because they are valuable on real-life game data.

Annoyingly, when I lower the compression ratio decompression can get dramatically faster. I believe this has to do with a different mix of the decode loop exercised by the lower ratio bitstream, but I'm not really sure yet (and I don't remember if I figured out why 3+ years ago). I'll be writing a guide on how to tune the various settings to speed up LZHAM's decompressor.

On the downside, the user has more knobs to turn to make max use of the codec.



More LZHAM v1.0 progress

I'm currently seeing overall decompression speedups of around 1.8x - 3.8x vs. previous LZHAM releases on Unity asset bundle files. On relatively incompressible files (like MP3's), it's around 2-2.3x faster, and 30-40% faster on enwik9. This is on a Core i7; I'll have statistics on iOS devices early next week.

I'm removing some experimental stuff in LZHAM that adds little to no real value:

- Got rid of all the Polar coding stuff and some misc. leftover params (like the cacheline modeling stuff)

- No more literal prediction, or delta literal predictions, for slightly faster decoding, lower memory usage, and faster initialization of the decompressor.

- Reduced is_match contexts from 768 to 12. The loss in ratio was a fraction of a percent (if any), but the decompressor can be initialized more quickly and the inner loop is slightly simplified because the prev/prev_prev decoded chars don't need to be tracked any more.

Just 2 main Huffman tables for literals/delta literals now, instead of 128 tables (!) like the previous releases. The tiny improvement in ratio (if any on many files) just didn't justify all the extra complexity. The decompressor's performance is now more stable (i.e. not so dependent on the data being compressed) and I don't need to worry about optimizing the initialization of a zillion Huff tables during the decoder's init.

I'm adding several optional, but extremely useful comp/decomp params:

// Controls tradeoff between ratio and decompression throughput. 0=default, or [1,LZHAM_MAX_TABLE_UPDATE_RATE], higher=faster but lower ratio.
lzham_uint32 m_table_update_rate;

m_table_update_rate is a higher level/simpler way of controlling these 2 optional params:

// Advanced settings - set to 0 if you don't care.
// def=64, typical range 12-128, controls the max interval between table updates, higher=longer interval between updates (faster decode/lower ratio)
lzham_uint32 m_table_max_update_interval;

// def=16, 8 or higher, scaled by 8, controls the slowing of the table update frequency, higher=more rapid slowing (faster decode/lower ratio), 8=no slowing at all.
lzham_uint32 m_table_update_interval_slow_rate;

These parameters allow the user to tune the scheduling of the Huffman table updates. The out of the box defaults now cause much less frequent table updating than previous releases. The overall ratio change from the slowest (more frequent) to fastest setting is around 1%. The speed difference during decompression from the slowest to fastest setting is around 2-3x.
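
Usage will look roughly like this (a sketch against the v1.0 API as it currently stands; details could still shift before release):

// Sketch only -- v1.0 API details may change.
#include <string.h>
#include "lzham.h"

size_t compress_for_fast_decode(const lzham_uint8 *src, size_t src_len,
                                lzham_uint8 *dst, size_t dst_capacity)
{
    lzham_compress_params params;
    memset(&params, 0, sizeof(params));
    params.m_struct_size = sizeof(params);
    params.m_dict_size_log2 = 26;            // 64MB dictionary
    params.m_level = LZHAM_COMP_LEVEL_UBER;
    params.m_table_update_rate = 8;          // higher = faster decode, slightly lower ratio

    size_t dst_len = dst_capacity;
    lzham_compress_status_t status =
        lzham_compress_memory(&params, dst, &dst_len, src, src_len, NULL);

    // Note: the decompressor's params must be configured with the same table
    // update settings that were used for compression.
    return (status == LZHAM_COMP_STATUS_SUCCESS) ? dst_len : 0;
}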

Next up: going to generate some CSV files to make some nice graphs, then the iOS and (eventually) Android ports.

Good lossless codec API design

I've seen many potentially good lossless codecs come out with almost useless interfaces (or none at all). Here are some attributes I've seen of good codecs:

- If you want others to use your codec in their apps, don't just provide a single command line executable with an awkward as hell command line interface. Support static libraries and SO's/DLL's, otherwise it's not useful to a large number of potential customers no matter how cool your codec is.

- Minimum number of source files, preferably in all-C.
If you use C++, don't rely on a ton of 3rd party crap like boost, etc. It just needs to compile out of the box.

Related: Programmers are generally a lazy bunch and hate mucking around with build systems, make files, etc. Make it easy for others to build your stuff, or just copy & paste into your project. Programmers will gladly sacrifice some things (such as raw perf, features, format compatibility, etc. - see stb_image.h) if it's fast and trivial to plop your code into their project. Don't rely on a ton of weird macros that must be configured by a custom build system.

- Even if the codec is C++, provide the interface in pure C so the codec can be trivially interfaced to other languages.

- Must provide heap alloc callbacks, so the caller can redirect all allocations to their own system.

- Support a "compile as ANSI C" mode, so it's easy to get your codec minimally working on new platforms. The user can fill in the platform specific stuff (atomics, threading, etc.) later, if needed.

Related: If you use threads, support pthreads and don't use pthread spinlocks (because OSX doesn't support pthread spinlocks). Basic pthreads is portable across many platforms (even Win32 with a library, but just support Win32 too because it's trivial).

- Don't assume you can go allocate a single huge 256MB+ block on the heap. On mobile platforms this isn't a hot idea. Allocate smaller blocks, or ideally just 1 block and manage the heap yourself, or don't use heaps.

- Streaming support, to minimize memory consumption on small devices. Very important in the mobile world.

- Expose a brutally simple API for memory to memory compression. (A sketch of what such an interface could look like follows this list.)

- Support a zlib-compatible API. It's a standard, everybody knows it, and it just works. If you support this API, it becomes almost trivial to plop your codec into other existing programs. This allows you to also leverage the existing body of zlib docs/knowledge.

- Support in-place memory to memory decompression, if you can, for use in very memory constrained environments.

- Single threaded performance is still important: Codecs which depend on using tons of cores (for either comp or decomp) to be practical aren't useful on many mobile devices.

- In many practical use cases, the user doesn't give a hoot about compression performance at all. They are compressing once and distributing the resulting compressed data many times, and only decompressing in their app. So expose optional parameters to allow the user to tune your codec's internal models to their data, like LZMA does. Don't worry about the extra time needed to compress, we have the cloud and 40+ core boxes.

- Provide a "reinit()" API for your codec, so the user can reuse all those expensive heap allocations you've made on the first init on subsequent blocks.

- Deal gracefully with already compressed, or incompressible data. Don't expand it, except by a tiny amount, and don't slow to a crawl. Related: don't fall over on very compressible data, or data containing repeated strings, etc.

- Communicate the intended use cases and assumptions up front:
Is it a super fast but low ratio codec that massively trades off ratio for speed?
Is it a symmetrical codec, i.e. is compression throughput roughly equal to decompression?
Is it an asymmetric codec, where (typically) compression time is longer than decompression time?
Is the codec useful on tiny or small blocks, or is it intended to be used on large solid blocks of data?
Does your codec require a zillion cores or massive amounts of RAM to be practical at all?

- Test and tune your codec on mobile and console devices. You'll be surprised at the dismally low performance available vs. even mid-range x86 devices. Also, these are the platforms that benefit greatly from data compression systems. On some CPU's, stuff like int divides, variable length int shifts, and L2 cache misses are surprisingly expensive. On some platforms, CPU load-hit-stores can crush performance on seemingly decent looking code.
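
To make some of these points concrete, here's a rough sketch of what a minimal C-callable interface along these lines could look like. The names are made up for illustration - this isn't the API of any particular codec:

/* Illustrative header only -- every name here is hypothetical. */
#ifndef MYLZ_H
#define MYLZ_H

#include <stddef.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Heap callbacks so the host app can redirect all allocations. */
typedef void *(*mylz_alloc_func)(void *user, size_t size);
typedef void  (*mylz_free_func)(void *user, void *ptr);

typedef struct mylz_params
{
    mylz_alloc_func alloc_cb;        /* may be NULL to fall back to malloc/free */
    mylz_free_func  free_cb;
    void           *alloc_user;
    unsigned        dict_size_log2;  /* tunable model knobs, LZMA-style */
    unsigned        table_update_rate;
} mylz_params;

/* Brutally simple memory-to-memory calls. Return 0 on success. */
int mylz_compress_memory(const mylz_params *params,
                         void *dst, size_t *dst_len,
                         const void *src, size_t src_len);

int mylz_decompress_memory(const mylz_params *params,
                           void *dst, size_t *dst_len,
                           const void *src, size_t src_len);

/* Streaming decompression for memory constrained targets. */
typedef struct mylz_decoder mylz_decoder;

mylz_decoder *mylz_decoder_create(const mylz_params *params);
int  mylz_decoder_reinit(mylz_decoder *d);  /* reuse the existing heap allocations between blocks */
int  mylz_decoder_decompress(mylz_decoder *d,
                             const void *in, size_t *in_len,
                             void *out, size_t *out_len,
                             int no_more_input);
void mylz_decoder_destroy(mylz_decoder *d);

#ifdef __cplusplus
}
#endif

#endif /* MYLZ_H */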

Also, think about your codec's strengths and weaknesses, and how it will be used in practice. It's doubtful that one codec will be good for all real-world use cases. Some example use cases I've seen from the video game world:

- If a game is displaying a static loading screen, the codec probably has access to almost the entire machine's CPU(s) and possibly a good chunk of temporary memory. The decompressor must be able to keep up with the data provider's (DVD/Blu-ray/network) rate, otherwise it'll be the bottleneck. As long as the codec's consumption rate is greater or equal to the provider's data rate, it can use up a ton of CPU (because it won't be the pipeline's bottleneck). A high ratio, heavy CPU, potentially threaded codec is excellent in this case.

- If a game is streaming assets in the background during gameplay, the codec probably doesn't have a lot of CPU available. The decompressor should be optimized for low memory consumption, high performance, low CPU cache overhead, etc. It's fine if the ratio is lower than the best achievable, because streaming systems are tolerant of high latencies.

- In many games I've worked on or seen, the vast majority of distributed data falls into a few big buckets: Audio, textures, meshes, animations, executable, compiled shaders, video. The rest of the game's data (scripts, protodata, misc serialized objects) forms a long tail (lots of tiny files and a small percent of the total). It can pay off to support optimizations for these specific data types.

LZHAM v1.0 vs. LZMA relative decompression rate on 262 Unity asset bundle files

Going to be honest here, graphing this stuff in a way that makes sense and is useful is tricky and time consuming. I really like the way Rad does it with their lossless data compression library, Oodle. I'm going to try making graphs like theirs.

Anyhow, here's a quick graph showing the relative speedup of LZHAM vs. LZMA on 262 uncompressed Unity asset bundle files (160MB total) from a single game. The bundle data consists of a mix of MP3 audio, Unity meshes, anims, PVRTC4 texture data, and lots of small misc. serialized Unity asset files.

The bundles are on the X axis (sorted by decompression speedup, from slowest to fastest), and the actual relative speedup is Y (higher is better for LZHAM). The blue line is the relative speedup (or slowdown on the first 5 files, each between 54-16612 bytes - these files are either very small or with pretty high ratios). Blue line at 1.0=same decompression rate as LZMA.

The red line represents each bundle's compression ratio relative to LZMA's. Above 1.0 means LZMA was better (smaller compressed file), and below 1.0 means LZHAM was better. LZHAM tracks LZMA's ratio pretty well, and equals or beats it in many cases. (I did put LZHAM's compressor into its slowest/best parsing mode to generate this data.)

I make no claims this is the best way to visualize a codec's perf relative to another, I'm just experimenting and trying to gain insight into LZHAM's actual performance relative to LZMA.

This was on a Core i7, x86 build, LZHAM options: -h12 -x (fastest/least frequent Hufftable updating, up to 4 solutions per node parsing).


Lossless codec performance on Unity asset bundle data

I really like Rad's way of graphing Oodle's (their lossless compression product) overall performance at different disk read (or download) rates. With graphs like this it's easy to determine at a glance the best codec to use given a particular CPU and download/disk rate, assuming your primary metric is just getting the original data into memory as quickly as possible.

As far as I can tell, the X axis is the disk read/download rate, and the Y axis is the effective content "delivery" rate assuming the client downloads the compressed data first and then decompresses it (load time+decomp time). We actually do these steps in parallel, which I'm going to graph next, but this is useful and simple to understand at a glance.

Inspired by this, here are some graphs hopefully computed in a similar manner comparing the effective performance of LZ4 (LZ4_decompress_safe()), miniz (my Deflate - close enough to zlib for this test, and pretty fast), LZMA, raw (no decompression), and LZHAM v1.0 (regular parsing/fastest Huffman table updating). I timed this on my Core i7 970 running at 3.3GHz according to CPUZ.

The first two graphs are for download rates up to 24 megabytes/sec, and the second three zoomed in graphs are for rates up to 2.4 megabytes/sec. I'm including these zoomed charts because a large number of our mobile customers can only download between .5 - 2 megabits/sec., and I don't care about disk read rates (we decompress as we download and store uncompressed bundle data on disk).

The 2nd and 4th graphs simulate a slower CPU, by just multiplying the decompression time by 5.0. These results are particularly interesting because the impact of a slower CPU significantly changes the relative rankings of each codec. (I need to make graphs on several popular iOS/Android devices, which would be most enlightening.)
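Here's a minimal model (my own sketch, not the exact script used to produce these graphs) of the "download, then decompress" delivery rate being plotted, including the CPU slowdown trick described above:

// Effective content delivery rate in MB/sec. when the client downloads the compressed
// data first, then decompresses it: uncompressed_size / (load_time + decomp_time).
// decomp_secs is the measured single-threaded decompression time; cpu_slowdown is the
// multiplier used for the slow CPU graphs (5.0 here, 12.0 for the very slow CPU graph below).
static double serial_delivery_rate_mb_per_sec(double comp_size_mb, double uncomp_size_mb,
                                              double decomp_secs, double download_mb_per_sec,
                                              double cpu_slowdown = 1.0)
{
    double load_secs = comp_size_mb / download_mb_per_sec;
    return uncomp_size_mb / (load_secs + decomp_secs * cpu_slowdown);
}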

Test Data: A full game's uncompressed iOS Unity asset bundles (a mix of MP3 audio, meshes, anims, PVRTC4 textures, and serialized objects), ~166MB total, compressed as a single .TAR file

1. FAST CPU UNZOOMED: 3.3 GHz - X axis scale: 24 megabytes/sec.




2. SLOW CPU UNZOOMED: ~.66 GHz - X axis scale: 24 megabytes/sec.




3. FAST CPU, 10x ZOOMED: 3.3 GHz - X axis scale: 2.4 megabytes/sec.





4. SLOW CPU, 10x ZOOMED: ~.66 GHz - X axis scale: 2.4 megabytes/sec.



Here's one more zoomed chart, 10x zoomed (2.4 megabytes/sec), simulating a very slow CPU (1/12th the perf. of my 3.3 GHz Core i7):

5. VERY SLOW CPU, 10x ZOOMED: ~.28 GHz - X axis scale: 2.4 megabytes/sec.



This last graph demonstrates why I've been sinking way too much time working on LZHAM:

- I don't care much about decompression at really fast disk speeds: LZ4 obviously fills that niche nicely. A smart content system will read from the slower medium (network, DVD, etc.), decompress and recompress to LZ4, and cache the resulting LZ4 data on a fast local device (HD/SSD). (See the sketch after this list.)

- There's a lot of value in optimizing for low download speeds, and minimizing the overall amount of data downloaded (thanks to crappy ISP monthly download caps). We have real-life data showing many customers can download at only 500-2000 kbps, and this directly impacts how quickly they can get into our title on first run.

- Mobile CPUs are too slow to execute LZMA effectively. On slow CPUs even ancient Deflate can be a better choice vs. LZMA at high enough download rates.
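Here's a hedged sketch of the "recompress to LZ4 and cache locally" idea from the first bullet above. decompress_from_network() is a hypothetical stand-in for whatever streaming LZMA/LZHAM decode the content pipeline uses; the LZ4 calls are the real single-shot functions from lz4.h:

#include <cstdio>
#include <vector>
#include "lz4.h"

extern std::vector<char> decompress_from_network();  // hypothetical: yields the uncompressed asset data

static void cache_as_lz4(const char *cache_path)
{
    std::vector<char> raw = decompress_from_network();

    // Recompress with LZ4: lower ratio than LZMA/LZHAM, but extremely cheap to decode later.
    std::vector<char> lz4_buf(LZ4_compressBound((int)raw.size()));
    int lz4_size = LZ4_compress_default(raw.data(), lz4_buf.data(), (int)raw.size(), (int)lz4_buf.size());
    if (lz4_size <= 0)
        return;

    // Cache the LZ4 data on the fast local device (HD/SSD). You'd also want to record the
    // original size somewhere so LZ4_decompress_safe() has a destination capacity later.
    if (FILE *f = fopen(cache_path, "wb"))
    {
        fwrite(lz4_buf.data(), 1, (size_t)lz4_size, f);
        fclose(f);
    }
}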

So all I need to do now is actually test LZHAM v1.0 on a real A5 CPU and see if it performs as well as I hope it will.

Parallelized download+decomp performance of various codecs

I finished porting and testing LZHAM v1.0 on OSX today. Everything works, including multithreaded compression. I've also tuned the Huffman table updating options more. Next stop is iOS.

The graphs in the previous post show the summation of load_time+decomp_time at various download rates. For large amounts of data (not small blocks), it makes sense to decompress each buffer of compressed data as it becomes available (from the disk or network) using streaming decompression, instead of waiting for all the compressed data to be fully downloaded first.

The following graphs show uncompressed_size / MAX(load_time, decomp_time) at various download rates on ~166MB of uncompressed Unity iOS asset bundle data compressed as a single .TAR file with various codecs. (Of course, in case it's not obvious, I'm assuming everywhere here that the data has already been pre-compressed. I am not including compression times anywhere here.)

My brain is tired, but this should roughly approximate the total amount of time it would take to deliver the uncompressed data (ignoring the effects of buffering and other overheads). I'm also assuming it's a single large stream of compressed data and you're not doing something fancy like parallelizing decompression.
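Same caveat as before - this is just my sketch of the pipelined model these graphs use, where the download and the streaming decompression overlap, so the total time is bounded by whichever stage is slower:

// Pipelined delivery rate: uncompressed_size / MAX(load_time, decomp_time).
static double pipelined_delivery_rate_mb_per_sec(double comp_size_mb, double uncomp_size_mb,
                                                 double decomp_secs, double download_mb_per_sec)
{
    double load_secs = comp_size_mb / download_mb_per_sec;
    double total_secs = (load_secs > decomp_secs) ? load_secs : decomp_secs;
    return uncomp_size_mb / total_secs;
}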

The version of LZHAM referenced in this post is v1.0, which I'll be releasing on github hopefully this week (not the old alpha on Google Code).

1. 3.3GHz CPU, X axis scale=281.63 megabytes/sec.


Graph 1: On fast CPUs, LZMA is the winner up to 13 MB/sec. download rates (because its ratio is highest), then LZHAM v1.0 becomes the winner up to 95 MB/sec. (its ratio is almost as high as LZMA's but it decompresses more quickly - at these rates LZMA is just too slow to keep up with the download rate). After that, good old Deflate (miniz's decompressor) is best up until ~130 MB/sec., then LZ4 takes over from there.

2. 3.3GHz CPU, X axis scale=2.35 megabytes/sec. (zoomed version of above)



Graph 2: A zoomed in version of the left side of graph 1, showing more detail at the slower download speeds. Graph 2 shows that on fast CPUs and slow networks, LZMA is the clear winner because all that matters to the overall delivery time is compression ratio. LZHAM is close but loses because it has a slightly lower ratio (usually 1-4% lower) on this data. The other codecs just have too low a ratio to compete at these low network speeds.

3. VERY SLOW CPU, X axis scale=4.7 megabytes/sec.


Graph 3: A much slower CPU (simulated 1/12th the performance of my desktop CPU - just multiplied the decomp time by 12). This graph is zoomed in to show detail at low network speeds. LZMA is the winner up to ~1MB/sec., then it plateaus because it's too slow to keep up on this slow ass CPU. LZHAM can sustain up to 3MB/sec, then Deflate takes over. From there LZ4 will eventually take over as the winner at the higher network rates because Deflate eventually won't be able to keep up. At the very highest network rates it makes more sense to not even bother using compression at all, because the CPU won't be able to keep up, even with LZ4.

My 2 cents: The "best" codec to use (where "best" minimizes the amount of time the user needs to wait for the full download) depends on the client's CPU speed, the codec's compression ratio, and the download (or disk) rate. A smart content server (that doesn't give a crap about how much data it actually sends over the network) could choose the best codec to use depending on these factors.
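As a sketch of that idea (nothing here is from a real content server - the type, names, and selection rule are all placeholders of mine), a server could pick whichever codec minimizes the pipelined delivery time for a given client:

#include <algorithm>
#include <string>
#include <vector>

struct codec_choice_t
{
    std::string name;
    double ratio;              // uncompressed_size / compressed_size on this content
    double decomp_mb_per_sec;  // client's sustained uncompressed output rate for this codec
};

static std::string pick_codec(const std::vector<codec_choice_t> &codecs,
                              double uncomp_size_mb, double download_mb_per_sec)
{
    std::string best;
    double best_secs = 1e30;
    for (const codec_choice_t &c : codecs)
    {
        double load_secs = (uncomp_size_mb / c.ratio) / download_mb_per_sec;
        double decomp_secs = uncomp_size_mb / c.decomp_mb_per_sec;
        double secs = std::max(load_secs, decomp_secs);  // pipelined: bounded by the slower stage
        if (secs < best_secs)
        {
            best_secs = secs;
            best = c.name;
        }
    }
    return best;
}

A server that also cares about total bytes sent (ISP caps, rate plans) would weight this choice toward the higher ratio codecs, which is the scenario in the paragraphs below.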

To minimize content delivery (download) time on fast desktop CPUs, use something like LZMA or LZHAM. Single-threaded LZMA doesn't scale beyond ~13 MB/sec. on my CPU, while LZHAM will scale up to 95 MB/sec. but has a slight download time penalty (around 1-4%) at the slower download rates.

On slow CPUs, LZMA plateaus very early. Beyond the rate where it plateaus, LZHAM v1.0 is the winner, then Deflate, and finally LZ4 at the highest rates.

Of course, we do care about how much data must be downloaded, due to ISP caps, monthly rate plans, etc. In this scenario, we need to stick to using a high ratio codec like LZMA or LZHAM. LZMA doesn't scale well on slow CPU's, so we need something like LZHAM that has a similar ratio but scales more effectively.

To ultimately improve the current state of the art, we need a new codec with higher ratios than LZMA that doesn't run like a complete dog, use massive amounts of RAM, or require large #'s of cores to be practical. I don't know of any open source codecs that fit these requirements yet. Somebody needs to write one.

(I think my eyes are bleeding from all this data & graphs..)
