lca2008 — cache efficient data structures minitalk

The second talk I attended on Tuesday was “Cache Efficient Data Structures” given by Joern Engel. Personally, I found this talk to be very interesting on several levels, but primarily because it focused on the nexus between theory and implementation that happens to excite me (yes, I am a dork. A good-looking dork.).

The key takeaway from Joern’s talk, IMO, was that computer science theory is dandy in the classroom, but insufficient when it comes to actually implementing stuff.

The classic data structure that computer scientists prefer for storing information is the hash table, because it can be proven mathematically that operations such as insertion, removal, and lookup all run in expected constant time.

Unfortunately, in real life, it turns out that once a data set grows large enough, it doesn’t really matter what you do with it so much as how you access it. A fetch from main memory costs on the order of hundreds of CPU cycles, while a cache hit costs just a few, so the penalty incurred by cache misses completely dominates the theoretical running time of data structure accesses.

This means that while a computer scientist is happy to use a hash table to store a million objects, a kernel hacker realizes that the number of cache misses incurred when inserting / removing objects is going to be large enough to hurt: the buckets are scattered across memory, so nearly every access lands on a different cache line (and you can only fit one hash table object per cache line anyway). You’re going to be spending all your time servicing cache misses instead of doing useful work. And may the gods help you if you’re on some sort of multi-cpu cache-coherent system with concurrent readers and writers bouncing cache lines between processors. Egads.
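To make that concrete, here’s a minimal C sketch (my own toy, not anything from Joern’s talk) that shows the effect: walking the same data as a shuffled linked list (a dependent pointer load, and likely a cache miss, per node) versus walking it sequentially. The size N and node layout are arbitrary choices for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 22)  /* ~4M nodes: far larger than any CPU cache */

struct node {
    struct node *next;
    long payload;
};

int main(void)
{
    struct node *nodes = malloc(N * sizeof(*nodes));
    size_t *order = malloc(N * sizeof(*order));
    long sum = 0;

    if (!nodes || !order)
        return 1;

    /* Build a random permutation so each ->next hop lands on an
     * unpredictable cache line, defeating the hardware prefetcher. */
    for (size_t i = 0; i < N; i++)
        order[i] = i;
    srand(42);
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = rand() % (i + 1);
        size_t tmp = order[i];
        order[i] = order[j];
        order[j] = tmp;
    }
    for (size_t i = 0; i < N - 1; i++)
        nodes[order[i]].next = &nodes[order[i + 1]];
    nodes[order[N - 1]].next = NULL;
    for (size_t i = 0; i < N; i++)
        nodes[i].payload = 1;

    /* 1. Cache-hostile: one dependent load (likely a miss) per node. */
    clock_t t0 = clock();
    for (struct node *p = &nodes[order[0]]; p; p = p->next)
        sum += p->payload;
    clock_t t1 = clock();

    /* 2. Cache-friendly: the same data, walked sequentially. */
    for (size_t i = 0; i < N; i++)
        sum += nodes[i].payload;
    clock_t t2 = clock();

    printf("pointer chase: %.3fs, sequential: %.3fs (sum=%ld)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, sum);
    free(nodes);
    free(order);
    return 0;
}
```

Both loops touch exactly the same bytes and do the same arithmetic; the only difference is access pattern, and on typical hardware the pointer chase loses by an order of magnitude or more.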

More importantly for a kernel hacker, that memory doesn’t even belong to you; it’s the user’s memory! Bad hacker! Bad!

The solution is to use more exotic data structures like radix trees or perhaps a B-tree variant. The memory characteristics of these structures happen to be much more cache friendly (lower memory overhead and more objects per cache line), and the gains are worth the increased complexity of managing them.
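For the curious, here’s a toy sketch of the radix tree idea (my own simplification, not the kernel’s actual implementation; all names are mine): each node packs its child slots into one contiguous array, and a lookup peels off a few bits of the key per level, so it touches only one cache line per level no matter how many keys are stored.

```c
#include <stdlib.h>

#define RADIX_BITS   6
#define RADIX_SLOTS  (1UL << RADIX_BITS)  /* 64 slots per node */
#define RADIX_MASK   (RADIX_SLOTS - 1)
#define RADIX_LEVELS 4                    /* 24-bit keys: ~16M entries */

struct radix_node {
    void *slots[RADIX_SLOTS];  /* child nodes, or values at the leaves */
};

/* Descend from the top bits of the key down, 6 bits per level.
 * Each level reads one pointer, i.e. one cache line: at most 4 here. */
static void *radix_lookup(struct radix_node *root, unsigned long key)
{
    struct radix_node *node = root;
    int shift = (RADIX_LEVELS - 1) * RADIX_BITS;

    while (node && shift > 0) {
        node = node->slots[(key >> shift) & RADIX_MASK];
        shift -= RADIX_BITS;
    }
    return node ? node->slots[key & RADIX_MASK] : NULL;
}

static int radix_insert(struct radix_node *root, unsigned long key, void *val)
{
    struct radix_node *node = root;
    int shift = (RADIX_LEVELS - 1) * RADIX_BITS;

    while (shift > 0) {
        unsigned long idx = (key >> shift) & RADIX_MASK;

        /* Allocate interior nodes lazily as keys appear. */
        if (!node->slots[idx] &&
            !(node->slots[idx] = calloc(1, sizeof(*node))))
            return -1;
        node = node->slots[idx];
        shift -= RADIX_BITS;
    }
    node->slots[key & RADIX_MASK] = val;
    return 0;
}
```

Contrast this with the hash table above: a million keys here cost at most four cache lines per lookup, and keys that are numerically close share most of their path through the tree.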

Two quick final notes (watch the video if you’re interested; I’ll not repeat it all here). First, willy mentioned a crazy data structure called an “xor list”, which is basically an optimization trick on a doubly-linked list: you xor the forward and backward pointers together into a single field to save yourself one pointer’s worth of overhead per node (a toy sketch appears at the end of this post). Second, Judy arrays (or whatever they’re called) could be cool if normal humans could understand them. As it is, the author of the library is probably the only person on this planet who actually understands how they work. The verbatim quote from the slide:

Judy trees: !#@$$%@##%^&& ????

Rewriting this library with a usable API might be an interesting project for a motivated individual.
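And as promised, here’s a minimal sketch of the xor list trick willy described (my own toy code, not from the talk): each node stores prev XOR next in a single field, and traversal recovers the next node from the previous node’s address.

```c
#include <stdint.h>
#include <stdio.h>

struct xnode {
    int value;
    uintptr_t link;  /* (uintptr_t)prev ^ (uintptr_t)next */
};

static uintptr_t xptr(struct xnode *a, struct xnode *b)
{
    return (uintptr_t)a ^ (uintptr_t)b;
}

int main(void)
{
    struct xnode n[3] = { { 1, 0 }, { 2, 0 }, { 3, 0 } };

    /* Wire up n[0] <-> n[1] <-> n[2] using xor'd links. */
    n[0].link = xptr(NULL, &n[1]);
    n[1].link = xptr(&n[0], &n[2]);
    n[2].link = xptr(&n[1], NULL);

    /* Walk forward: next = prev XOR link. Walking backward works the
     * same way, starting from the tail. */
    struct xnode *prev = NULL, *cur = &n[0];
    while (cur) {
        struct xnode *next = (struct xnode *)((uintptr_t)prev ^ cur->link);

        printf("%d\n", cur->value);
        prev = cur;
        cur = next;
    }
    return 0;
}
```

The catch, of course, is that you can never jump into the middle of the list from a bare node pointer; you always need a (prev, cur) pair, which is part of why this stays a party trick rather than a mainstream structure.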