Most people have heard the buzz surrounding Node. Words like “fast”, “scalable” and “concurrency” come to mind. At Factual, we pride ourselves on using (and finding) the right tool for the job. We use everything from jQuery to Hadoop and PostgreSQL to MongoDB, and fortunately our engineering culture gives us the leeway to experiment a little. However, any technology we deploy has to be measured and justified: agility, performance, stability and cost are all part of the equation.
We understand that there are many misinformed developers out there who think Node means instant scalability and performance. The truth is that, like a lot of technologies, it solves a very specific problem. What does it solve? How well does it solve it? What are the tradeoffs? Here is a brief account of our experience with Node and how it worked for us.
Since then, we’ve started to focus more on curated data and on an API that can deliver it quickly. This major shift in our product came with a whole new set of requirements. We now have to deliver responses in under 200ms while handling basics such as authentication, permission checking, real-time statistics and query processing. And we had to do this in an agile language that let us build out features quickly in response to popular developer requests.
Since we had experience with Ruby, we built our first prototype on a barebones Sinatra stack. This gave us decent performance (about 20ms of added latency on top of our datastores at 120 concurrent connections), but it still didn’t scale well enough for the kind of traffic we were anticipating.
Before we dive too deep into praising Node, let’s list out some of the tradeoffs we made:
- Still an immature platform (we discovered a socket leak in earlier versions of its HTTP libraries)
- Spaghetti code: callbacks galore. Good design and sticking to certain patterns can mitigate this. However, it’s still not fun.
- Debugging can be soul-crushing at times (mostly because of the first two problems)
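The callback problem, and the kind of pattern that tames it, is easier to see in code. The following is a sketch, not our production code: three hypothetical request steps (`authenticate`, `checkPermissions`, `runQuery`) are chained first by nesting and then through a small hypothetical `waterfall` helper that passes each result to the next step and stops at the first error.

```javascript
// Hypothetical request steps. Each takes (input, callback) and reports
// through Node's (err, result) convention; setImmediate fakes async IO.
function authenticate(req, cb) {
  setImmediate(() => (req.key ? cb(null, { user: 'demo' }) : cb(new Error('missing API key'))));
}
function checkPermissions(ctx, cb) {
  setImmediate(() => cb(null, Object.assign({ allowed: true }, ctx)));
}
function runQuery(ctx, cb) {
  setImmediate(() => cb(null, { user: ctx.user, rows: [] }));
}

// Nested style: every step pushes the next one further right, and every
// level repeats the same error check.
function handleNested(req, done) {
  authenticate(req, (err, ctx) => {
    if (err) return done(err);
    checkPermissions(ctx, (err2, ctx2) => {
      if (err2) return done(err2);
      runQuery(ctx2, done);
    });
  });
}

// Flat style: a hypothetical helper feeds each result into the next step
// and jumps straight to `done` on the first error.
function waterfall(steps, input, done) {
  let i = 0;
  (function next(err, value) {
    if (err || i === steps.length) return done(err, value);
    steps[i++](value, next);
  })(null, input);
}

function handleFlat(req, done) {
  waterfall([authenticate, checkPermissions, runQuery], req, done);
}

handleFlat({ key: 'abc' }, (err, result) => {
  if (err) return console.error(err.message);
  console.log(result.user); // prints "demo"
});
```

Both handlers behave the same; the flat version simply keeps the error handling in one place. Libraries such as async ship well-tested helpers in the same spirit (for example `async.waterfall`).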
Node was such a good fit for us because:
- We were IO bound
- We used very little CPU per request
- We could use evented programming to cache aggressively
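A rough way to see why the first two points matter: in this sketch a timer stands in for a hypothetical 50ms datastore call. A hundred concurrent waits overlap on Node's single thread, while a CPU-bound loop (`burnCpu`, also hypothetical) holds that thread and delays every other callback.

```javascript
// Simulated IO: a timer stands in for a hypothetical 50 ms datastore call.
function fakeDatastoreCall(id, callback) {
  setTimeout(() => callback(null, { id }), 50);
}

// Start n "requests" at once. Waiting on IO costs no CPU, so Node
// overlaps all of the waits on one thread.
function serveConcurrently(n, done) {
  const start = Date.now();
  let remaining = n;
  for (let i = 0; i < n; i++) {
    fakeDatastoreCall(i, () => {
      remaining -= 1;
      if (remaining === 0) done(Date.now() - start);
    });
  }
}

// CPU work is different: this loop holds the only thread, so no other
// callback (including finished IO) can run until it returns.
function burnCpu(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* busy wait */ }
}

serveConcurrently(100, (elapsed) => {
  console.log('100 overlapping waits took ' + elapsed + ' ms');
});
```

On an otherwise idle process the reported time should land near 50 ms rather than 100 × 50 = 5000 ms. Calling `burnCpu(200)` right after starting the requests pushes it past 200 ms, because the finished timers cannot fire until the loop returns.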
The first two reasons are nothing new. In fact, that combination seems to be the poster child for what Node does well, and it fit our problem set perfectly. The third reason, however, is the one we want to shed some light on.
Consider the following example of getting user data from the database:
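The original code isn't reproduced here, so what follows is a minimal sketch under stated assumptions: `db.findUser` is a hypothetical stand-in for a real database driver, backed by an in-memory table and answering after a short timer to mimic a network round trip. `getUser` follows Node's `(err, result)` callback convention, and every call costs one query.

```javascript
// Hypothetical stand-in for a database driver: an in-memory table that
// answers after a short delay, the way a network round trip would.
const USERS = { 1: { id: 1, name: 'alice' }, 2: { id: 2, name: 'bob' } };

const db = {
  queries: 0, // counts round trips so the cost of each lookup is visible
  findUser(id, callback) {
    this.queries += 1;
    setTimeout(() => {
      const user = USERS[id];
      if (!user) return callback(new Error('user ' + id + ' not found'));
      callback(null, user);
    }, 10);
  },
};

// Node-style (err, result) callback. While the query is outstanding the
// event loop is free to serve other requests, but every call hits the database.
function getUser(id, callback) {
  db.findUser(id, callback);
}

getUser(1, (err, user) => {
  if (err) return console.error(err.message);
  console.log(user.name); // prints "alice"
});
```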
Now let’s add a caching layer to this:
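A minimal sketch of one possible caching layer, again assuming a hypothetical `db.findUser` stub. Because Node runs all JavaScript on one thread, a plain `Map` can act as a process-wide cache with no locking. The sketch also coalesces concurrent misses: while a query for an id is in flight, later requests for that id queue up behind it instead of issuing their own. That coalescing step is an illustrative choice, not necessarily how Factual's production code worked.

```javascript
// Hypothetical database stub with the same shape as an async driver call.
const USERS = { 1: { id: 1, name: 'alice' } };
const db = {
  queries: 0,
  findUser(id, callback) {
    this.queries += 1;
    setTimeout(() => {
      const user = USERS[id];
      callback(user ? null : new Error('user ' + id + ' not found'), user);
    }, 10);
  },
};

// Process-wide cache. Single-threaded execution means no locks are needed.
const userCache = new Map(); // id -> user object
const pending = new Map();   // id -> callbacks waiting on an in-flight query

function getUser(id, callback) {
  if (userCache.has(id)) {
    // Cache hit: still answer asynchronously so callers see consistent ordering.
    const user = userCache.get(id);
    return process.nextTick(() => callback(null, user));
  }
  if (pending.has(id)) {
    // A query for this id is already running: wait for its answer.
    pending.get(id).push(callback);
    return;
  }
  pending.set(id, [callback]);
  db.findUser(id, (err, user) => {
    const waiters = pending.get(id);
    pending.delete(id);
    if (!err) userCache.set(id, user); // cache successful lookups only
    waiters.forEach((cb) => cb(err, user));
  });
}

// Three concurrent requests for the same user cost one database round trip.
let answered = 0;
for (let i = 0; i < 3; i++) {
  getUser(1, (err, user) => {
    if (err) throw err;
    answered += 1;
    if (answered === 3) console.log(user.name, db.queries); // prints "alice 1"
  });
}
```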
Since we have caching, we need a way to invalidate it. Redis pub/sub is a great way to push real-time updates to your cache:
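Redis itself is left out of the following sketch so that it runs on its own: a Node `EventEmitter` stands in for a subscriber connection, and `publish` is a local stand-in for the Redis PUBLISH command. The `'message'` event with `(channel, message)` arguments mirrors how older versions of the `redis` npm client delivered pub/sub traffic. The channel name `user-invalidate` and the `updateUser` helper are hypothetical.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for a Redis subscriber connection. With a real
// client, every API process would SUBSCRIBE to the channel and the writer
// would PUBLISH the changed id.
const bus = new EventEmitter();
const CHANNEL = 'user-invalidate'; // hypothetical channel name

function publish(channel, message) {
  // Real Redis delivers this to every subscribed process; here it is local.
  bus.emit('message', channel, message);
}

const userCache = new Map(); // id -> user object

// Drop the cached entry whenever someone announces a change to that id.
bus.on('message', (channel, message) => {
  if (channel !== CHANNEL) return;
  userCache.delete(Number(message)); // pub/sub payloads arrive as strings
});

// Hypothetical write path: update the source of truth, then notify caches.
function updateUser(id, fields, store) {
  store[id] = Object.assign({}, store[id], fields);
  publish(CHANNEL, String(id));
}

const store = { 1: { id: 1, name: 'alice' } };
userCache.set(1, store[1]);
updateUser(1, { name: 'alicia' }, store);
console.log(userCache.has(1)); // prints "false"
```

Deleting rather than overwriting the entry keeps the subscriber simple: the next read for that id falls through to the database and repopulates the cache.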
For fun, we also have a call that gives us stats on the URLs that were visited:
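How that call was built isn't shown, so here is one evented approach, sketched under assumptions: each process counts visits in memory (`recordVisit`), answers the stats call (`topUrls`) straight from memory, and periodically flushes the accumulated deltas to shared storage as a single batch. `writeBatch` and `sharedStore` are hypothetical stand-ins; a real deployment might send Redis INCRBY commands instead.

```javascript
// Hypothetical per-process visit counter. Recording a hit is a pure
// in-memory update; writes to shared storage happen in periodic batches.
const totals = new Map(); // url -> hits seen by this process (answers stats calls)
let pending = new Map();  // url -> hits not yet written to shared storage

function recordVisit(url) {
  totals.set(url, (totals.get(url) || 0) + 1);
  pending.set(url, (pending.get(url) || 0) + 1);
}

// The stats call: the n most visited URLs, answered without touching IO.
function topUrls(n) {
  return [...totals.entries()].sort((a, b) => b[1] - a[1]).slice(0, n);
}

// Hypothetical shared store standing in for Redis or a database.
const sharedStore = new Map();
let writes = 0;
function writeBatch(batch, callback) {
  writes += 1; // one round trip per batch, however many hits it holds
  setImmediate(() => {
    for (const [url, n] of batch) sharedStore.set(url, (sharedStore.get(url) || 0) + n);
    callback(null);
  });
}

// Swap in a fresh buffer first, so hits arriving mid-write land in the next batch.
function flush(callback) {
  const batch = pending;
  pending = new Map();
  if (batch.size === 0) return process.nextTick(callback, null);
  writeBatch(batch, callback);
}

const timer = setInterval(() => flush(() => {}), 1000);
timer.unref(); // the stats timer alone should not keep the process alive

recordVisit('/places');
recordVisit('/places');
recordVisit('/restaurants');
console.log(JSON.stringify(topUrls(2))); // [["/places",2],["/restaurants",1]]
```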
We understand that there are plenty of cases where Node is NOT the solution. In our case, however, it was a great fit. Since we started using Node about a year ago, it has matured tremendously and has strong support from both Joyent and the rest of the open source community. We’ve started using it for various other internal projects, and it is becoming more of a general solution and less of a niche one. I encourage any developer to explore Node and see whether it is a good fit for their problem. If nothing else, it’ll get you thinking about IO and how much time you spend waiting on it.