STM, CouchDB, and pushing 5500 docs/sec
The more I play around with Clojure and CouchDB the more they seem like they were made for each other. From my earlier post, you might have gotten the impression that CouchDB can't hold its own against MySQL or PostgreSQL in terms of raw write performance in an everyday scenario: calling into a web server, hitting the db, and returning a response. After noodling around some more I came to the conclusion that the bottleneck was the relatively expensive HTTP connections required to talk to the database. While talking over HTTP has larger scalability benefits, it doesn't help much if you're starting your project on a single server and the speed at which your own code can talk to the database matters to you.
So I contemplated the problem some and wondered whether Clojure's STM (Software Transactional Memory) could be leveraged. As requests come in, instead of connecting immediately to the database, why not queue them up until we have an optimal number and then do a bulk insert? STM makes coordinating this sane, and it turned out to be ridiculously simple to implement in a few minutes.
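A minimal sketch of that queue-then-bulk-insert idea might look something like the following. The names `batch-size`, `pending`, `written`, `bulk-insert!`, and `enqueue!` are all hypothetical, the threshold of 1000 is an arbitrary guess, and `bulk-insert!` is only a stand-in for a single POST to CouchDB's `_bulk_docs` endpoint.

```clojure
;; Hypothetical batch threshold; "optimal" will depend on your setup.
(def batch-size 1000)

;; Documents waiting to be written, coordinated through the STM.
(def pending (ref []))

;; Stand-in for the real write: one POST of {"docs": [...]} to
;; /<db>/_bulk_docs. Here it only records the size of each batch.
(def written (atom []))

(defn bulk-insert! [docs]
  (swap! written conj (count docs)))

(defn enqueue!
  "Adds doc to the queue. Once the queue reaches batch-size, the
  transaction hands back the whole batch and empties the queue.
  The write happens outside the transaction, since transactions
  may retry and must stay free of side effects."
  [doc]
  (let [batch (dosync
                (let [docs (conj @pending doc)]
                  (if (>= (count docs) batch-size)
                    (do (ref-set pending []) docs)
                    (do (ref-set pending docs) nil))))]
    (when batch
      (bulk-insert! batch))
    nil))
```

Because the decision to flush and the reset of the queue happen in one transaction, two threads can never both walk away with the same batch. A real version would presumably also flush on a timer so a half-full queue doesn't sit around forever.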
This approach actually takes us very close to the maximum write performance of CouchDB. On an AWS Cluster Compute Instance I was able to insert a million small documents in about 3 minutes. That's an average of ~5500 documents per second. Not too shabby. Add a couple more CouchDB instances into the mix with replication and things get interesting pretty quickly.
You might notice that this code works in a fire-and-forget manner. I went ahead and implemented a version that uses Clojure promises and actually returns the ids of the documents written into the database. On my laptop this version performed as well as the MySQL version that used connection pooling. What should blow your mind when looking at this code is how straight-line it is. Lovely.
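For a rough idea of the promise-based variant, here is a small sketch under the same assumptions. Every name in it (`promise-batch-size`, `pending-pairs`, `bulk-insert-returning-ids!`, `enqueue-with-promise!`) is hypothetical, and the bulk write is faked: it makes up UUIDs locally rather than reading ids out of a `_bulk_docs` response, which I'm assuming comes back with one result per document in submission order.

```clojure
;; Hypothetical batch threshold, kept tiny so the example is easy to follow.
(def promise-batch-size 3)

;; Queue of [doc promise] pairs awaiting a bulk write.
(def pending-pairs (ref []))

;; Stand-in for a _bulk_docs POST. Instead of talking to CouchDB it
;; fabricates one id per document, in the same order as the input.
(defn bulk-insert-returning-ids! [docs]
  (mapv (fn [_] (str (java.util.UUID/randomUUID))) docs))

(defn enqueue-with-promise!
  "Queues doc and returns a promise that will hold the document's id
  once the batch it belongs to has been written."
  [doc]
  (let [p     (promise)
        batch (dosync
                (let [pairs (conj @pending-pairs [doc p])]
                  (if (>= (count pairs) promise-batch-size)
                    (do (ref-set pending-pairs []) pairs)
                    (do (ref-set pending-pairs pairs) nil))))]
    ;; Side effects stay outside the transaction, which may retry.
    (when batch
      (let [ids (bulk-insert-returning-ids! (mapv first batch))]
        (doseq [[[_ pr] id] (map vector batch ids)]
          (deliver pr id))))
    p))
```

A caller would block on the result with `@(enqueue-with-promise! doc)`, or use `(deref p 100 :pending)` to give up after 100 ms. Note that in this sketch a promise in a partial batch is never delivered until the batch fills, so a real version would need some kind of timed flush.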