Varnish: Your site faster and more stable

Are you managing a server running a blog or content website and expecting or experiencing a massive number of requests?

I have been in this situation recently myself, running a WordPress instance on a single virtual machine that suddenly got over 30,000 requests in just a single day (hint: it’s one you probably know if you’re reading this).

How did I manage this without the website going down or becoming unbearably slow? With the help of the amazing tool called Varnish, which is available in Fedora!

Introducing: Varnish

Varnish is a caching daemon that sits between your visitors and the web daemon itself. For every incoming request, it checks whether it already has a cached version of that page.

If it does, it serves the cached copy; if it doesn't, it requests the page from the backend web server, serves that to the visitor, and caches the page for the next visitor who requests the same page.

How does Varnish work?

Varnish almost seems like magic and can improve your site’s performance massively, but there are some things to take into account.

Amongst others:

  • It will not cache any request that contains cookies.
  • It will deem /mypage?counter=1 as a different page than /mypage?counter=2, so hitting the second will not result in a cache hit for the first one.
  • How do you tell Varnish that the page has changed and it needs to refresh its cache?

First things first: Varnish is an amazing tool if your website serves a lot of public, static content. It is less well suited to websites where the majority of requests are authenticated and should return different results for different users, because Varnish caches the entire web page.

If you want caching for websites that require users to be signed in or on very dynamic websites, Varnish will be less of an ideal fit and may even increase the load. In this case, you need caching that’s more specific to the website itself. For example, things like Memcached work better in these situations.

However, one case where Varnish is ideal is with blogs, where most people that visit your website will not be logged in and will be reading (mostly) static content.

Getting started with Varnish

So, how do you set this up? Very easily – you start by installing Varnish!

$ sudo dnf install varnish

Next up, you need to make sure it is used. For this article, I’m going to assume you are running the website on the same server as your Varnish setup and you are using Apache as the backend server.

You now have two ways to use Varnish:

  • Allow Varnish to handle requests directly on port 80, proxying to Apache on port 8080.
  • Set up two sites with Apache: one that listens on port 80 and proxies to Varnish (on its default port, 6081), which in turn proxies to Apache on some other port.

I'm going to focus on the first one, since it's slightly more tricky, and leave the second one, which has the advantage of enabling SSL, as a challenge to you, the reader. (Basically, you add

ProxyPass / http://localhost:6081/

to the port-80 virtual host in Apache, and add a second virtual host listening on port 8080 to Apache where you serve the actual application.)
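As a rough sketch of that second layout (server names and paths here are placeholders, and for SSL you would repeat the proxy block in a *:443 virtual host with SSLEngine and certificates configured):

```apache
# Front virtual host: receives visitors and hands everything to Varnish
<VirtualHost *:80>
    ServerName example.com                     # placeholder
    ProxyPass        / http://localhost:6081/
    ProxyPassReverse / http://localhost:6081/
</VirtualHost>

# Backend virtual host: the actual application, reached only via Varnish
<VirtualHost *:8080>
    ServerName example.com                     # placeholder
    DocumentRoot /var/www/html                 # placeholder
</VirtualHost>
```

In this layout, Varnish's backend must point at port 8080 so the request loops through Apache, then Varnish, then back to the application virtual host.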

Configuring Apache for Varnish

  1. To set up Apache, go to /etc/httpd/conf/ and open httpd.conf.

  2. Search for the line saying "Listen 80", and change it to "Listen 8080".

     Note: 8080 could be any random port in essence, but I'm picking 8080 because that's marked by the SELinux policy as the "secondary httpd port", which makes sure that Varnish is allowed to connect and Apache is allowed to listen. If you pick another port, you'll need a custom policy to allow those two steps.

  3. Next up, we set up Varnish. Go to /etc/varnish/ and open varnish.params.

  4. Search for VARNISH_LISTEN_PORT=6081, and change it to VARNISH_LISTEN_PORT=80.

  5. Now, enable Varnish (systemctl enable varnish.service) to make sure it starts on boot.

  6. To put both port changes into effect without any noticeable downtime, restart both services at the same time:

     systemctl restart httpd.service varnish.service

     (You will want to make sure httpd is the first of the two, so Varnish doesn't crash on start because Apache is still listening on port 80.)

Now, at this point you have set up Varnish with a default setup, which actually works quite well!
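One thing worth double-checking is the backend definition in /etc/varnish/default.vcl, which tells Varnish where to fetch pages from; it needs to match the port Apache now listens on. Assuming the stock Fedora configuration, a minimal definition looks like:

```vcl
# /etc/varnish/default.vcl: the port must match Apache's new "Listen 8080"
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}
```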

Testing Varnish

You can verify that it works by doing a

curl -i http://yoursite.com

You should see an X-Varnish header with one number behind it. This is the Varnish request ID, which can be used for debugging.

If you perform the same request a second time (hit arrow key up and enter), you should see that the X-Varnish header now has two numbers. The first one is the request ID of the current request, and the second one is most likely the same number as the first request. That’s the number of the request that Varnish has cached and is now responding for with the cached result.

If you see two numbers on the second request, that means Varnish cached the request and Apache was not hit for this request. Was this easy or not?
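On that second request, the relevant headers will look roughly like this (the exact numbers, and the Varnish version in the Via header, will differ on your system):

```
HTTP/1.1 200 OK
...
X-Varnish: 32773 32770
Via: 1.1 varnish-v4
Age: 12
```

The Age header is another quick sanity check: a non-zero value means the response came from the cache.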

At this moment, you should have an actually working Varnish instance that will cache all unauthenticated requests, and speed up your static website considerably!

Customizing Varnish

You might want to customize some parts of the Varnish handling flow. You do that by opening

/etc/varnish/default.vcl

, which is actually a program written in Varnish Configuration Language. This has a few subroutines that get fired during different parts of the request handling.
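For orientation, the file is structured roughly like this (a bare-bones VCL 4.0 sketch; the stock default.vcl already contains these subroutines, mostly empty):

```vcl
vcl 4.0;

sub vcl_recv {
    # runs for every incoming request, before the cache lookup
}

sub vcl_backend_response {
    # runs after a response has been fetched from the backend
}

sub vcl_deliver {
    # runs just before the response is sent to the visitor
}
```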

Ignoring arguments and cookies

One of the main things you may want to handle here is making sure that, for some pages, query arguments or cookies are ignored when determining whether a page should be served from cache (for example, because the content is static and depends on neither, but your framework still sends them).

For example, if you’re running WordPress, it’s fine to always cache

/wp-includes/

, since that’s just static JavaScript and CSS files. To tell Varnish about this, add the following in sub vcl_recv:

if (req.url ~ "^/wp-includes/") {
    unset req.http.cookie;
    set req.url = regsub(req.url, "\?.*", "");
}

That will tell Varnish to ignore cookies and query parameters, so even if you request

/wp-includes/page.js?timestamp=1

, it will return the cached entry even if the cache was recorded for

/wp-includes/page.js?timestamp=2

.

(Actually, it just removes all cookies from the request and strips the query parameters from the URL before checking whether the page is already in the cache. Note that this also means the query arguments and cookies are not available to the backend.)

Not caching specific pages

Another common thing might be that you have some pages that you never want to get served from the cache. For example, the WordPress cron page is not something you want cached.

For that, we can do in vcl_recv:

if (req.url ~ "^/wp-cron.php") {
    return (pass);
}

This will tell Varnish to always pass the request on to the backend, and never serve it from cache.

Clearing Varnish cache

Now, the last thing is that you might want your application to tell Varnish to clear its cache. For example, WordPress can tell Varnish that a new article is published, so it should purge its caches and generate a new cache that includes the new article.

Since we don't want to allow these requests from external sources (that would enable anyone to clear the cache, defeating the entire purpose), we will first set up an Access Control List (ACL) named "purge". You can name it anything, but this name always makes sense to me.

Put the following before any of the “sub” blocks in the VCL file:

acl purge {
    "localhost";
    "127.0.0.1";
}

Now to actually use it and allow the localhost to purge the cache, add the following to

vcl_recv

:

if (req.method == "PURGE") {
    if (!client.ip ~ purge) {
        return (synth(405, "Not allowed."));
    }
    return (purge);
}

After doing that, the localhost machine can send a purge request. Try it with

curl -I --request PURGE http://localhost/

You should get "HTTP/1.1 200 Purged", and the next request will again be freshly requested from the backend. (Hint: Varnish HTTP Purge is a WordPress plugin that can perform these PURGE requests automatically upon changes.)

Now you have a Varnish caching server that will cache unauthenticated requests and requests for static content, and should speed up your site considerably. You can always tweak the Varnish VCL file a lot more, but this should at least get you started with a basic cache server.


Featured image credit to 33Hops.com.


Comments

  1. Honestly, it’s one of the most interesting articles that I have found on the Internet this year.

  2. Jim

    What about using Varnish for a secured site?

    • Hi Jim,

      What do you mean here?
      Do you mean secured as in using SSL (which is the ProxyPass hint I mentioned in the article), or for logged in pages?

      • Jim

        Yeah, I was referring to SSL. But ah, crap, I missed that part of the article. lol Ok thanks for pointing that out to me.

        • For SSL, please consider Hitch, https://hitch-tls.org/ . It is a lightweight TLS proxy well suited for Varnish, as it is developed by Varnish Software for exactly this purpose.

          It is also packaged for Fedora and EPEL:

          $ sudo dnf install hitch

          Basic setup is very simple. There is a guide at http://bit.ly/1dA3TpU , but it boils down to selecting ports and adding a pem-file with a certificate.

          br,
          Ingvar (varnish and hitch package maintainer)

  3. Han

    I have used Varnish in the past, and the only thing I have to say is that every minute I spent learning and configuring it, I got back as a performance boost.

    With Memcached for SQL queries, Varnish gives a Drupal site a skyrocketing boost.

    • arthur

      I, too, use it with Drupal. For MySQL, I use the database’s built-in query cache mechanism.

  4. arthur

    There is one setting that you missed, that of the backend definition.

    You see, you have set Varnish to listen on port 80, but it also needs to know what port Apache is running from, in order to fetch content from there. I have set mine in the file default.vcl as follows:

    # Default backend definition. Set this to point to your content server.
    backend default {
        .host = "127.0.0.1";
        .port = "8008";
        .connect_timeout = 60s;
        .first_byte_timeout = 300s;
        .between_bytes_timeout = 10s;
    }

    Something else worth noting is that if you need to look at Apache logs, you will find that all traffic appears to originate from 127.0.0.1 (localhost). This is not actually the case; it only seems so because Varnish (running on localhost) is forwarding traffic to Apache.

    For a more accurate log IP, we use the X-Forwarded-For header. I have set mine also in default.vcl by adding this:

    sub vcl_recv {
        set req.http.X-Forwarded-For = client.ip;
    }


The opinions expressed on this website are those of each author, not of the author's employer or of Red Hat. Fedora Magazine aspires to publish all content under a Creative Commons license but may not be able to do so in all cases. You are responsible for ensuring that you have the necessary permission to reuse any work on this site. The Fedora logo is a trademark of Red Hat, Inc. Terms and Conditions