Solving The Mapping Dilemma
In his essay Why Are Black Boxes So Hard To Reuse? Gregor Kiczales outlines the lamentable truth about object oriented software development - the mapping dilemma. The gist of his excellent essay is that it’s incredibly difficult to implement good abstractions in our programming languages that scale to all levels and deliver the requirements at each level.
My experience as a developer, and I think many developers, is that his analysis is heartbreakingly accurate. You always end up giving up something or your abstraction simply cannot be applied to a large class of cases that we know are fundamentally trivially different. You shake your head, you roll up your sleeves and basically write the whole thing again filled with minutae that cannot abstracted away.
Well, for what seems like the very first time in my own code, I’ve avoided the mapping dilemma.
As you might know, I’ve been working on match, a pattern matching library for Clojure. Like the facilities present in Standard ML, OCaml, Haskell, Scala, Erlang, and Racket, the library allows you to easily pattern match data. If you’re a functional programming enthusiast all the goodies are there - matching maps, sequences, or patterns, guards, Java types, and naming submatches. Unlike the aforementioned programming languages (with the exception of Racket), match is extensible.
One feature I left off until recently was vector pattern matching. Originally this was going to be specifically for Clojure’s persistent vectors, but I realized the work should apply to any array-like (random access) data structure. This should include primitive arrays, buffers, and even individual bytes.
Now here’s the rub, if you’re pattern matching primitive arrays and bytes you probably care quite a bit about performance. Introducing even one object instantiation is an exorbitant cost. So wrapper types are out. Yet hardcoding the types that a person might want to pattern match is very undesirable.
So what did I do? I opened up the machinery to match and allow clients to tell me what the most optimal access is for a particular datatype. The match library is an optimizing pattern match compiler. I provide hooks to the compiler so you can specify the precise mapping that you need. You can pattern match primitive arrays and bytes without introducing any overhead. No wrapper types.
The benefit? Pattern matching arrays is 30 times faster than in Scala and about 10 times faster than in Racket.
How much work did this all involve? The library is only around 1100 lines of code. We can use the same abstractions to pattern match everything from Java objects to individual bits. At every level of scale you get exactly the performance profile you want.