Varnish is an open-source, high-performance HTTP accelerator that sits in front of a web stack and caches pages. This caching layer is highly configurable and can be used for both static and dynamic content.
One great thing about Varnish is that it can improve the performance of your website without requiring any code changes. If you haven’t heard of Varnish (or have heard of it but haven’t used it), please read on. Adding Varnish to your stack can be completely noninvasive, but if you tweak your stack to play along with some of Varnish’s more advanced features, you’ll be able to increase performance by orders of magnitude.
One of Factual’s first high-profile projects was Newsweek’s “America’s Best High Schools: The List.” After realizing that we had only a few weeks to increase our throughput tenfold, we looked into a few options. We decided to go with Varnish because it was noninvasive, extremely fast, and battle-tested by other companies. The result was a system that performed 15 times faster and a successful launch that hit the front page of msn.com. Varnish now plays a major role in our stack, and we’re looking to implement more performance tweaks designed with Varnish in mind.
A Simple Use Case
The easiest and safest way to add Varnish to your stack is to serve and cache static content. Aside from using a CDN, Varnish is probably the next best thing that you can use for free. However, dynamic content is where you can squeeze real performance out of your stack if you know where and how to use it. This guide will only scratch the surface of how Varnish can drastically improve performance. Advanced features such as edge side includes and header manipulation allow you to leverage Varnish for even higher throughput. Hopefully, we’ll get to more of these advanced features in future blog posts, but for now, we’ll just give you an introduction.
Assuming you’ve installed Varnish correctly, you should be able to run your webserver and Varnish on different ports. The rest of this guide assumes that your webserver is running on port 8080 and Varnish is running on port 80.
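As a rough sketch of that setup, the backend definition below tells Varnish where the webserver lives. The host address 127.0.0.1 is an assumption (webserver on the same machine); only port 8080 comes from the text above.

```vcl
# Minimal backend definition (sketch).
# Assumption: the webserver runs on the same host as Varnish.
backend default {
    .host = "127.0.0.1";   # assumed address of the webserver
    .port = "8080";        # webserver port used throughout this guide
}
```

The port Varnish itself listens on (80 here) is set when the daemon is started, not in the VCL file.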
Varnish Configuration Language: VCL
Varnish uses its own domain-specific language for configuration. Unlike the configuration formats of many other projects, Varnish’s configuration language is not declarative. It’s very expressive and yet easy to follow. On Ubuntu, Varnish’s config file is located at /etc/varnish/default.vcl. A lot of the examples we’ll dive into are based on Varnish’s own documentation.
A simple Varnish config file can cache all requests whose URI begins with “/stylesheets”. There are a few things to note in such a config that we’ll explain later:
the removal of the Accept-Encoding header
the removal of Set-Cookie
return(lookup) and return(pass) in vcl_recv
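A config of the kind described above might look like the following sketch. It is written in Varnish 2.1/3.0-era VCL (an assumption based on the vcl_fetch and return(lookup) names used in this post) and assumes a default backend pointing at the webserver on port 8080 is already defined.

```vcl
sub vcl_recv {
    if (req.url ~ "^/stylesheets") {
        # Drop the client's encoding preferences so the webserver
        # returns a single, non-encoded variant.
        unset req.http.Accept-Encoding;
        # Try the cache first.
        return (lookup);
    }
    # Everything else skips the cache entirely.
    return (pass);
}

sub vcl_fetch {
    if (req.url ~ "^/stylesheets") {
        # Safeguard: never store a session cookie with a cached stylesheet.
        unset beresp.http.Set-Cookie;
    }
    return (deliver);
}
```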
Now let’s look at a few things in detail:
Removing Accept-Encoding Header
We do this because Varnish doesn’t handle encodings (gzip, deflate, etc.) itself. Instead, Varnish defers to the webservers for this. For now, we’re going to remove this header so that the webservers give us non-encoded content. The proper way to handle encodings is to normalize the header, but we’ll discuss this later.
Removal of Set-Cookie
We do this because we don’t want the webserver giving us session-specific content. This is just a safeguard and is probably a little unnecessary, but it’s worth noting when caching. We’ll discuss session-specific content later.
Returning “pass” vs “lookup”
Returning “pass” tells Varnish to not even try to do a cache lookup. Returning “lookup” tells Varnish to look up the object in its cache in lieu of fetching it from the webserver. If the object is cached, the webserver is never hit. If it isn’t in the cache, Varnish fetches the content from the webserver and calls vcl_fetch once the response has been retrieved, before the object is stored in the cache.
Manipulating the Hashing Function
User/Session Specific Content
Let’s say that we want to cache every user’s “/profile” page. This can be done by including the cookie in the hash function.
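A minimal sketch of this idea, assuming Varnish 3.0-style VCL, where hash_data() adds a value to the hash (in Varnish 2.x the equivalent is written as set req.hash += ...;):

```vcl
sub vcl_recv {
    if (req.url ~ "^/profile") {
        # The built-in logic would pass requests that carry cookies,
        # so ask for a cache lookup explicitly.
        return (lookup);
    }
}

sub vcl_hash {
    if (req.url ~ "^/profile" && req.http.Cookie) {
        # Each distinct Cookie header gets its own cached copy.
        hash_data(req.http.Cookie);
    }
    # No return here: the built-in vcl_hash runs next
    # and adds the URL and Host header as usual.
}
```

If the webserver sends Set-Cookie on these responses, the built-in vcl_fetch will decline to cache them, so that header may also need handling.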
Canonicalized URL Caching
In Ruby on Rails, it is common practice to append a timestamp to the URLs of static content so that the web browser fetches a fresh copy whenever the file changes instead of reusing a stale cached one (e.g. /stylesheets/main.css?123232113). Let’s say we don’t want to include this timestamp when we cache our stylesheets. We can strip it from the URL before it is hashed.
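One way to sketch this, assuming Varnish 3.0-style VCL, is to hash a copy of the URL with the trailing numeric query string removed. The regular expression is an assumption about what the timestamp looks like (digits only, at the end of the URL).

```vcl
sub vcl_hash {
    if (req.url ~ "^/stylesheets/") {
        # Hash the URL without the trailing "?<digits>" timestamp,
        # e.g. /stylesheets/main.css?123232113 -> /stylesheets/main.css
        hash_data(regsub(req.url, "\?[0-9]+$", ""));
        if (req.http.host) {
            hash_data(req.http.host);
        } else {
            hash_data(server.ip);
        }
        # Return here so the built-in vcl_hash does not
        # add the full URL (timestamp included) again.
        return (hash);
    }
}
```

The request forwarded to the webserver keeps its original URL; only the cache key changes.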
Browser Specific CSS
One trick we use is to make a small portion of our CSS browser specific to handle various differences between browsers. We do this with a dynamic call that serves up CSS based on the User-Agent header. The problem with this technique is that different CSS is served from the same URL. Varnish can still cache this if we add the User-Agent header to the hash.
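A sketch of this, assuming Varnish 3.0-style VCL. The path /stylesheets/browser.css is a hypothetical name for the dynamic stylesheet, not one taken from our setup.

```vcl
sub vcl_hash {
    # Hypothetical URL of the browser-specific stylesheet.
    if (req.url ~ "^/stylesheets/browser\.css" && req.http.User-Agent) {
        # Cache one copy per distinct User-Agent string.
        hash_data(req.http.User-Agent);
    }
    # No return: the built-in vcl_hash still adds the URL and Host.
}
```

Raw User-Agent strings vary widely, so the cache may end up holding many near-identical copies.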
Varnish lets you create ACLs to control which clients are allowed to make certain requests.
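For example, an ACL can limit who may send cache-purging requests. The sketch below assumes Varnish 3.0-style VCL; the ACL name and the address range are illustrative assumptions.

```vcl
# Clients allowed to purge (illustrative addresses).
acl purge {
    "localhost";
    "192.168.0.0"/24;
}

sub vcl_recv {
    if (req.request == "PURGE") {
        if (!client.ip ~ purge) {
            # Reject purge attempts from anyone outside the ACL.
            error 405 "Not allowed.";
        }
        # Allowed clients continue to the cache lookup.
        return (lookup);
    }
}
```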
Purging by lookup uses the vcl_hit function and the “PURGE” HTTP method.
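A sketch of lookup-based purging in Varnish 3.0-style VCL (in Varnish 2.x, the hit case is usually written by setting obj.ttl to 0s instead of calling purge). It assumes vcl_recv already sends PURGE requests to lookup.

```vcl
sub vcl_hit {
    if (req.request == "PURGE") {
        # The object was found: remove it from the cache.
        purge;
        error 200 "Purged.";
    }
}

sub vcl_miss {
    if (req.request == "PURGE") {
        # Nothing is cached under this hash; purge anyway to
        # clear any variants, then report back to the client.
        purge;
        error 200 "Purged.";
    }
}
```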
Purge by URL
Purging by URL is probably a safer bet if you are using cookies or any other tricks in your hash function, because a lookup-based purge only removes the object whose hash matches the PURGE request itself.
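A sketch of URL-based purging, assuming Varnish 3.0-style VCL, where this is done with a ban (Varnish 2.x used the purge family of functions for the same purpose):

```vcl
sub vcl_recv {
    if (req.request == "PURGE") {
        # In practice, check client.ip against an ACL here first.
        # Invalidate every cached object with this URL, regardless of
        # which cookie or User-Agent it was hashed with.
        ban("req.url == " + req.url);
        error 200 "Purged.";
    }
}
```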
It’s good to canonicalize the Accept-Encoding header of your requests because otherwise you could either end up with redundant cached objects or return incorrectly encoded objects. For more details, please refer to the Varnish FAQ on Compression.
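A sketch of such normalization, modeled on the approach commonly shown in Varnish’s documentation. The list of file extensions is illustrative.

```vcl
sub vcl_recv {
    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
            # Already compressed formats: no point in encoding them again.
            unset req.http.Accept-Encoding;
        } elsif (req.http.Accept-Encoding ~ "gzip") {
            # Collapse every gzip-capable variant into one value.
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # Unknown algorithm: fall back to non-encoded content.
            unset req.http.Accept-Encoding;
        }
    }
}
```

With the header reduced to at most three possible values, the cache holds at most three variants of each object.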
Let’s pretend that we have a special assets server that serves up just our stylesheets. Varnish lets us define multiple backends for this purpose.
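A sketch of this setup in Varnish 2.x/3.0-style VCL. The backend name assets and its address and port are hypothetical.

```vcl
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

# Hypothetical dedicated assets server.
backend assets {
    .host = "127.0.0.1";
    .port = "8081";
}

sub vcl_recv {
    if (req.url ~ "^/stylesheets") {
        # Stylesheets come from the assets server.
        set req.backend = assets;
    } else {
        set req.backend = default;
    }
}
```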
Round Robin and Random Multiple Server Backend
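Varnish can also spread requests across several servers using directors. The sketch below assumes Varnish 2.1/3.0-style director syntax; the backend names, addresses, weights, and the routing rule are all illustrative.

```vcl
backend web1 { .host = "10.0.0.1"; .port = "8080"; }
backend web2 { .host = "10.0.0.2"; .port = "8080"; }

# Takes turns between the two servers.
director rr_pool round-robin {
    { .backend = web1; }
    { .backend = web2; }
}

# Picks a server at random, favoring web1 three to one.
director random_pool random {
    { .backend = web1; .weight = 3; }
    { .backend = web2; .weight = 1; }
}

sub vcl_recv {
    # Illustrative routing so that both directors are in use.
    if (req.url ~ "^/stylesheets") {
        set req.backend = random_pool;
    } else {
        set req.backend = rr_pool;
    }
}
```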
Varnish stays on our stack happily ever after…
When we first started using Varnish, it was out of desperation and all new to us. Over the past year, we’ve been finding more creative ways to leverage its performance. At this point, we couldn’t imagine putting together a stack that didn’t include this great project.
We hope this post has been helpful for anyone interested in getting Varnish set up for the first time. Future blog posts will include more advanced features.