Clojure, Node.js and Concurrency Fail
My first comparisons between Aleph and Node.js were on my i7 MacBook Pro. In those tests Clojure came out the winner because Netty, the library underlying Aleph, transparently distributes work across the available cores.
From the responses to that post and some research, I've learned that Node.js can be made to take advantage of more cores by creating processes. Poking around, I found a recent Node.js library, multi-node, which makes creating a multi-process Node server easy.
As is often the case with Node.js, the simplicity of the code is lovely.
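For a sense of what a multi-process Node server looks like, here is a minimal sketch. It is an assumption-laden stand-in, not the multi-node code: it uses Node's built-in cluster module instead, and the helper name startServer and the response text are made up for illustration.

```javascript
// Sketch only: built-in cluster module, NOT the multi-node library.
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// cluster.isPrimary exists on newer Node versions; older ones expose isMaster.
const isPrimary =
  cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;

// Hypothetical helper: fork `workerCount` processes that all serve `port`.
function startServer(port, workerCount = os.cpus().length) {
  if (isPrimary) {
    // The primary process only forks workers; it serves no requests itself.
    for (let i = 0; i < workerCount; i++) {
      cluster.fork();
    }
    return null;
  }
  // Each worker re-runs this script with its own event loop and its own
  // memory; the cluster module lets them all accept on the same port.
  return http
    .createServer((req, res) => {
      res.writeHead(200, { 'Content-Type': 'text/plain' });
      res.end('Hello from process ' + process.pid + '\n');
    })
    .listen(port);
}
```

Calling startServer(8080) at the bottom of the script starts one worker per core. Note that nothing is shared between the workers: each one has its own heap, which is exactly what makes the shared counter later in this post awkward.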
So I went and fired up an Amazon Cluster Compute Instance (8 cores) and ran some basic benchmarks again using ab -n 5000 -c 50 http://localhost:8080/. Lo and behold ... Node.js is not faster than Aleph. In fact they are pretty much neck and neck. Even when I wrapped my Aleph responses in some futures, Aleph pretty much pushed 20K requests per second (as did Node.js). I'm not going to show the Aleph code because it remains the same as the code shown in my first blog post.
So it seems at first that Node.js can hold its own. However, I decided to see how Node.js might fare when sharing data between processes. First the Clojure code:
Here we have a shared resource - a simple counter. Many threads are hitting this counter and we can always report back a consistent value.
I looked into how this might be accomplished with multi-node and came up with the following:
First off, it's much more complicated than the Clojure code since we have no language support for reading consistent values in a concurrent situation. We have to communicate across processes via messages. This adds a lot of incidental complexity to the code.
Second, this code is simply wrong. If there are any mishaps whatsoever, a process will have an inconsistent state. Turns out that when this code is run for the first time using ab -n 5000 -c 50 http://localhost:8080/, many of the requests fail. The processes end up with incorrect states pretty much immediately.
So I sat there and thought about this problem. The only way to get a correct snapshot would be for a process, upon receiving a request, to ask every other process for its current value and then add those values to its own. While this is not hard, it is also not something I could whip up in 10-15 minutes of considering the problem. I hope someone will submit a correct solution.
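To make the shape of that idea concrete, here is a rough sketch of the ask-everyone-and-sum approach. It is a simulation under stated assumptions, not a working multi-node solution: the "processes" are plain objects inside one Node process, message delivery is faked with setImmediate, and the names CounterProcess, askCount, total and makeProcesses are all hypothetical.

```javascript
// Simulation only: each "process" is an object in a single Node process.
// Real code would exchange these messages over an IPC channel instead.
class CounterProcess {
  constructor(id) {
    this.id = id;
    this.count = 0; // requests handled by this process only
    this.peers = []; // filled in once every process exists
  }

  // Handle one request: bump the local count. Nothing is broadcast.
  handleRequest() {
    this.count += 1;
  }

  // Answer a peer's "what is your count?" message on a later tick,
  // mimicking the delay of a real inter-process round trip.
  askCount() {
    return new Promise((resolve) => {
      setImmediate(() => resolve(this.count));
    });
  }

  // Snapshot: this process's own count plus every peer's reported count.
  async total() {
    const replies = await Promise.all(this.peers.map((p) => p.askCount()));
    return replies.reduce((sum, n) => sum + n, this.count);
  }
}

// Build n simulated processes and introduce each one to all the others.
function makeProcesses(n) {
  const procs = Array.from({ length: n }, (_, i) => new CounterProcess(i));
  for (const p of procs) {
    p.peers = procs.filter((other) => other !== p);
  }
  return procs;
}

// Spread 5000 requests round-robin over 4 processes, then ask one of them
// for the grand total.
async function demo() {
  const procs = makeProcesses(4);
  for (let i = 0; i < 5000; i++) {
    procs[i % procs.length].handleRequest();
  }
  return procs[0].total();
}
```

Here demo() resolves to 5000 because no requests arrive while the replies are in flight. With live traffic the sum is only as fresh as the slowest reply, and every read costs a round trip to every other process, which is a lot of machinery compared to dereferencing a single shared value.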
In any case, yet again we see the benefits of Clojure's concurrency model - it's brain-dead simple. To be fair, Node.js's problems are largely JavaScript's problems, and those problems exist in pretty much every other language that doesn't have tools for making these kinds of things easy and performant.