The Keys to Caching Drupal

If you have ever tried to run a site without any caching at all, you will quickly appreciate the value caching brings. Caching means storing the results of previous requests and calculations, and checking whether the work has already been done before doing it again. For example, if I asked you to look up a name in the address book and you kept your finger on the page when you closed the book, you could find the same name very quickly if I asked again. This is very useful for common tasks and lookups, since it saves time and resources that can be spent elsewhere. You can't cache everything, though. If I asked you to look up 10,000 different names in the address book, one time each, caching wouldn't help at all.

Serving up a web page with Drupal involves many, many layers of cache. To understand all of them, we have to follow a request from beginning to end as a page loads. Requests start at the browser of the end user requesting a web page. They then go to the web host, where a process, Apache, receives the request. Since Drupal is written in PHP, and PHP isn't a compiled language, PHP first compiles all the code it needs to service the request. It then begins to service the request. Drupal decides which module needs to respond to the page request, which takes database queries and a lot of computation. When this is done, PHP returns the response to Apache, which sends it to your browser.

Browser->Apache->PHP->MySQL

This process is actually fairly slow on your average machine if it is run from beginning to end. It also takes up a lot of CPU and RAM. Without any caching at all, an average machine would only be able to serve a few requests at a time, and each request would take at least several seconds, possibly longer. Obviously this is not ideal, and that is where caching comes in. It turns out that much of the work is very similar, if not identical, from one request to the next. We can add layers of caching and drastically speed things up.

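
To make the address book analogy concrete, here is a minimal sketch in Python (chosen only for brevity; Drupal itself is PHP). The address book contents and the function names are made up for illustration and are not part of any Drupal API. The point is only the shape of the logic: check the cache first, and do the slow work on a miss.

```python
# Hypothetical address book; the names and numbers are invented for this example.
ADDRESS_BOOK = {"Alice": "555-0100", "Bob": "555-0101"}

_cache = {}                  # results we have already looked up
stats = {"slow_lookups": 0}  # counts how often the slow path runs


def slow_lookup(name):
    """Stand-in for the expensive work: flipping through the whole book."""
    stats["slow_lookups"] += 1
    return ADDRESS_BOOK.get(name)


def cached_lookup(name):
    """Answer from the cache when possible; otherwise do the slow lookup once."""
    if name not in _cache:
        _cache[name] = slow_lookup(name)
    return _cache[name]


first = cached_lookup("Alice")   # miss: runs the slow lookup
second = cached_lookup("Alice")  # hit: answered from the cache
print(first, second, stats["slow_lookups"])  # 555-0100 555-0100 1
```

Asking for the same name twice costs one slow lookup. Asking for 10,000 different names once each would still cost 10,000 slow lookups, which is why caching only pays off for repeated work.
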
If you are running Drupal correctly, you should have caching on just about every layer of serving up a request. The first level of caching to enable is browser caching. This tells the user's browser how long it can keep using a file without checking the server for an update and downloading it again. A good example is your logo. There is no reason for someone to download the logo again every time they go to a new page on your website. Their browser should cache it and use the same copy over and over. To enable browser caching, make sure you have Apache's mod_expires turned on. If you have this turned on, it will tell the browser to cache static files for 2 weeks. If you want to tweak this, you can find the settings in your .htaccess file.

The next level of caching you should consider is a reverse proxy cache, most commonly Varnish. This changes the request path a little, to this:

Browser->Varnish->Apache->PHP->MySQL

Many people coming to your website are going to be anonymous users arriving at your home page. Is it really necessary to rebuild the home page for each one of them? Varnish allows you to cache many of the common non-dynamic pages on your website and quickly serve them up. When the first anonymous person goes to your home page, Varnish sees that there is no cache for it and asks Apache to render the page. The next anonymous user is served directly from Varnish, and Apache never needs to get involved. To use Varnish, install and configure the Varnish daemon, modify the default.vcl file, and install the Varnish Drupal module. Also be sure to use Pressflow instead of vanilla Drupal, as it has some tweaks that allow it to work better with Varnish. For more information see http://drupal.org/project/varnish

The third level of caching is the PHP being compiled on your server. PHP is a non-compiled language, meaning that you never compile binaries to run.
Instead, every time the code is run it is compiled again. This is great for a simple site that you are constantly updating, but it adds a performance hit for larger production sites. Luckily, PHP has a couple of good opcode caches that will cache the compiled code: APC and eAccelerator. These are very easy to install and use, and require no Drupal changes or customizations. Simply install one on your server and add a few lines of config to php.ini. I've seen sites drop 25% of their load time after adding a simple opcode cache. See http://drupal.org/project/apc

The fourth level of caching is Drupal's internal caching system. Drupal allows modules to store things in the cache so that they don't have to be rebuilt on every page load. You can store blocks, views and even whole pages. For caching the whole page this is a bit slower than Varnish, as Apache and PHP still have to get involved. By default, the caching system in Drupal stores the cache in the database. It is pluggable, though, and a very popular alternative is Memcache. Memcache keeps cached items in system memory, which is very fast, instead of storing them in a slow database. To use Memcache, install the daemon and use the Drupal memcache module. You can find out more about Memcache here: http://drupal.org/node/1131458

Finally, there is one more level of caching involved in serving up Drupal web pages: the database cache. Each database query (and there are a lot!) is cached and served from MySQL's cache if possible. By default, MySQL comes with decent settings for caching queries. For very high performance sites you may want to tweak the settings and improve them based on your server's capacity.
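
The idea of a fast in-memory cache sitting in front of a slower database can be sketched roughly as follows. This is an illustration in Python, not Drupal's cache API or the memcache module's interface; the class names, the `cache_get` helper and the `block:footer` key are all hypothetical.

```python
class DatabaseBackend:
    """Stand-in for the slow, authoritative store (the database)."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0  # how many times the slow store was consulted

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)


class MemoryCache:
    """Stand-in for a fast in-memory cache such as Memcache."""

    def __init__(self):
        self.items = {}

    def get(self, key):
        return self.items.get(key)

    def set(self, key, value):
        self.items[key] = value


def cache_get(key, cache, database):
    """Try the fast layer first; on a miss, read the slow layer and remember it."""
    value = cache.get(key)
    if value is None:
        value = database.get(key)
        if value is not None:  # this sketch does not cache missing keys
            cache.set(key, value)
    return value


database = DatabaseBackend({"block:footer": "<div>footer</div>"})
cache = MemoryCache()

cache_get("block:footer", cache, database)  # miss: falls through to the database
cache_get("block:footer", cache, database)  # hit: served from memory
print(database.reads)  # 1
```

Because `cache_get` only relies on `get` and `set`, the in-memory layer could be swapped for a different backend without touching the calling code, which is the sense in which a cache layer is pluggable.
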
You can find more information on tuning MySQL here: http://drupal.org/node/51263 For more information on high performance and scalability see http://drupal.org/node/326504

Obviously there is a lot more you can do when caching a website, but these are a great place to start and can dramatically improve the performance of your site for many of your users. It's amazing to take a site that takes 7-10 seconds to load, properly configure the caches, and see the response times come down to under a second.

Photo credit: http://www.flickr.com/photos/jmrosenfeld/2903513401/