Varnish

Description

Varnish is a caching front-end server. This document has notes on how to use Varnish with Plone.

Introduction

This chapter contains information about using the Varnish caching proxy with Plone.

To use Varnish with Plone

  • Learn how to install and configure Varnish
  • Add a Plone virtual hosting rule to the default Varnish configuration

Installation

  • You can install using packages (RPM/DEB) - consult your operating system instructions. This is the recommended method.
  • You can install backports
  • You can install using buildout

Backporting examples (Ubuntu 8.04 Hardy Heron)

Buildout examples

Management console

varnishadm

You can access the Varnish admin console on your server with:

# Your system uses a secret handshake file
varnishadm -T localhost:6082 -S /etc/varnish/secret

(Ubuntu/Debian installation)

Telnet console

The telnet management console is available on some configurations where varnishadm cannot be used. The functionality is the same.

Example:

ssh yourhost
# Your system does not have a secret handshake file
telnet localhost 6082

Note

Port number depends on your Varnish settings.

Quit console

Quit command:

quit

Purging the cache

This will remove all entries from the Varnish cache:

url.purge .*

Loading new VCL for the live varnish daemon

More often than not, it is beneficial to load new configuration without bringing the cache down for maintenance. Using this method also checks the new VCL for syntax errors before activating it. Logging in to Varnish CLI requires the varnishadm tool, the address of the management interface, and the secret file for authentication.

See the varnishadm man-page for details.

Opening a new CLI connection to the Varnish console, in a buildout-based Varnish installation:

parts/varnish-build/bin/varnishadm -T localhost:8088

Port 8088 is defined in buildout.cfg:

[varnish-instance]
telnet = localhost:8088

Opening a new CLI connection to the Varnish console, in a system-wide Varnish installation on Ubuntu/Debian:

varnishadm -T localhost:6082 -S /etc/varnish/secret

You can dynamically load and parse a new VCL config file to memory:

vcl.load <name> <file>

For example:

vcl.load newconf_1 /etc/varnish/newconf.vcl

vcl.load loads and compiles the new configuration. If the file contains syntax errors, compilation fails and the errors are reported. Once the new configuration has been loaded, it can be activated with:

vcl.use newconf_1

Note

Varnish remembers <name> in vcl.load, so every time you need to reload your config you need to invent a new name for vcl.load / vcl.use command pair.
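
The naming requirement above can be handled by generating the name from a timestamp. The following is a minimal Python sketch, not part of any Varnish tooling: the function name build_reload_commands and the "reload_" prefix are hypothetical, and the admin address and secret path are the Ubuntu/Debian values used earlier in this chapter. It only builds the two varnishadm command lines; running them (for example with subprocess) is left to the caller.

```python
import time


def build_reload_commands(vcl_file, admin="localhost:6082",
                          secret="/etc/varnish/secret", now=None):
    """Return (name, [load_cmd, use_cmd]) for reloading vcl_file.

    The name is derived from a UTC timestamp so that each reload uses a
    name Varnish has not seen before (assumption: at most one reload
    per second).
    """
    stamp = time.strftime("%Y%m%d_%H%M%S", time.gmtime(now))
    name = "reload_" + stamp  # hypothetical naming scheme
    base = ["varnishadm", "-T", admin, "-S", secret]
    load_cmd = base + ["vcl.load", name, vcl_file]
    use_cmd = base + ["vcl.use", name]
    return name, [load_cmd, use_cmd]


name, commands = build_reload_commands("/etc/varnish/newconf.vcl")
for command in commands:
    print(" ".join(command))
```

Run the vcl.use command only if vcl.load succeeded, so a configuration with syntax errors is never activated.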

Logs

To see a real-time log dump (in a system-wide Varnish configuration):

varnishlog

By default, Varnish does not log to any file and keeps the log only in memory. If you want to extract Apache-like logs from varnish, you need to use the varnishncsa utility.

Stats

Check live "top-like" Varnish statistics:

parts/varnish-build/bin/varnishstat

Use the admin console to print stats for you:

stats
200 2114

       95717  Client connections accepted
      132889  Client requests received
       38638  Cache hits
       21261  Cache hits for pass
      ...
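
If you want to derive a figure such as the cache hit ratio from this output, the counters can be parsed with a few lines of Python. This is only a sketch: it assumes the "count, then description" line layout shown above, and the sample text is copied from that listing. The function name parse_stats is hypothetical.

```python
import re

# Sample copied from the console output above.
SAMPLE = """\
stats
200 2114

       95717  Client connections accepted
      132889  Client requests received
       38638  Cache hits
       21261  Cache hits for pass
"""

# A counter line is a number followed by a description starting with a
# letter; this skips the "200 2114" status line and blank lines.
COUNTER_LINE = re.compile(r"^\s*(\d+)\s+([A-Za-z].*?)\s*$")


def parse_stats(text):
    """Map each counter description to its integer value."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


counters = parse_stats(SAMPLE)
hit_ratio = counters["Cache hits"] / counters["Client requests received"]
print("Hit ratio: %.1f%%" % (hit_ratio * 100))
```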

Varnish buildout restart snippet

The following snippet restarts a varnishd instance that was started from a plone.recipe.varnish buildout by invoking the bin/varnish-instance command directly.

It also creates an Apache-compatible log file that you can examine with text tools. The varnishncsa command reads log data from the Varnish memory-mapped file and writes it to a text file in Apache format.

Example:

#!/bin/sh
# Varnish restart script
sudo killall varnishd
sudo bin/varnish-instance
# Create Apache compatible log file
sudo kill `cat var/varnishncsa.pid`
sudo parts/varnish-build/bin/varnishncsa -D -d -a -w var/log/varnish.log -P var/varnishncsa.pid

Virtual hosting proxy rule

Once Varnish has been set up, you need to include a Plone virtual hosting rule in its configuration file.

If you want to map a Varnish backend directly to a Plone site served through virtual hosting (i.e. Zope's VirtualHostMonster maps the site name to the Plone site instance id), rewrite req.url in vcl_recv.

The following maps the Plone site id plonecommunity to the plonecommunity.mobi domain. Plone is a single Zope instance, running on port 9999.

Example:

backend plonecommunity {
        .host = "127.0.0.1";
        .port = "9999";
}

sub vcl_recv {
        if (req.http.host ~ "^(www\.)?plonecommunity\.mobi(:[0-9]+)?$"
            || req.http.host ~ "^plonecommunity\.mfabrik\.com(:[0-9]+)?$") {

                set req.backend = plonecommunity;
                set req.url = "/VirtualHostBase/http/" req.http.host ":80/plonecommunity/VirtualHostRoot" req.url;
        }
}
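
To see what the rewritten URL looks like, the string concatenation done in vcl_recv can be mirrored in Python. This is an illustrative sketch only: build_vhm_url is a hypothetical helper, and it assumes the same http scheme and port 80 as the VCL rule above.

```python
def build_vhm_url(host, site_id, path):
    """Build the VirtualHostMonster URL that the VCL rule produces.

    host    -- value of the Host header (req.http.host in VCL)
    site_id -- id of the Plone site inside Zope
    path    -- original request path (req.url in VCL)
    """
    return "/VirtualHostBase/http/" + host + ":80/" + site_id + "/VirtualHostRoot" + path


print(build_vhm_url("plonecommunity.mobi", "plonecommunity", "/news"))
```

Zope receives this longer path, and VirtualHostMonster uses the part before VirtualHostRoot to generate links that point back to the public host name.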

Varnishd port and IP address to listen

Tell varnishd which IP address(es) and ports to listen on with the -a command-line switch. Edit /etc/default/varnish:

DAEMON_OPTS="-a 192.168.1.1:80 \
             -T localhost:6082 \
             -f /etc/varnish/default.vcl \
             -s file,/var/lib/varnish/$INSTANCE/varnish_storage.bin,1G"

Sanitizing cookies

Any cookie set on the server side (session cookie) or on the client side (e.g. Google Analytics JavaScript cookies) is poison for caching anonymous visitor content.

HTTP caching needs to deal with cookies in both the HTTP request and the response:

  • HTTP request Cookie header. A browser sending a request with a Cookie header confuses the Varnish cache look-up. This header can also be set by JavaScript, not just by the server. The Cookie header can be preprocessed in vcl_recv.
  • HTTP response Set-Cookie header. This is how the server sets a cookie. If your server sets cookies, Varnish does not cache these responses by default. However, this might be desirable behavior if, e.g., multilingual content is served from one URL with language cookies. Set-Cookie can be post-processed in vcl_fetch.

Example of how to remove all Plone-related cookies except the ones dealing with logged-in users (content authors):

sub vcl_recv {

  if (req.http.Cookie) {
      # (logged in user, status message - NO session storage or language cookie)
      set req.http.Cookie = ";" req.http.Cookie;
      set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
      set req.http.Cookie = regsuball(req.http.Cookie, ";(statusmessages|__ac|_ZopeId|__cp)=", "; \1=");
      set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
      set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

      if (req.http.Cookie == "") {
          remove req.http.Cookie;
      }
  }
  ...

# Let's not remove Set-Cookie header in VCL fetch
sub vcl_fetch {

    # Here we could unset cookies explicitly,
    # but we assume the plone.app.caching extension does its job
    # and no extra cookies fall through for HTTP responses we'd like to cache
    # (like images)

    if (!beresp.cacheable) {
        return (pass);
    }
    if (beresp.http.Set-Cookie) {
        return (pass);
    }
    set beresp.prefetch =  -30s;
    return (deliver);
}

The snippet for stripping out non-Plone cookies comes from http://www.phase2technology.com/node/1218/

That article notes that "this processing occurs only between Varnish and the backend [...]; the client, typically a user’s browser, still has all the cookies. Nothing is happening to the client’s original request." While it's true that the browser still has the cookies, they never reach the backend and are therefore ignored.
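
The chain of regsuball calls in the vcl_recv example is easier to follow when traced step by step. The sketch below reproduces the same five steps with Python's re module so you can experiment with sample Cookie headers outside Varnish. It is an illustration, not a replacement for the VCL; the function name sanitize_cookie_header is hypothetical, and the cookie values in the usage lines are made up.

```python
import re

# Cookie names to keep, as in the VCL example.
KEEP = r"statusmessages|__ac|_ZopeId|__cp"


def sanitize_cookie_header(cookie):
    """Drop every cookie except the ones named in KEEP."""
    # 1. Prefix with ";" so every cookie starts with a separator.
    cookie = ";" + cookie
    # 2. Normalize "; " separators to a bare ";".
    cookie = re.sub(r"; +", ";", cookie)
    # 3. Mark cookies to keep by putting a space after their ";".
    cookie = re.sub(r";(" + KEEP + r")=", r"; \1=", cookie)
    # 4. Remove every cookie that was not marked with a space.
    cookie = re.sub(r";[^ ][^;]*", "", cookie)
    # 5. Trim leftover separators from both ends.
    cookie = re.sub(r"^[; ]+|[; ]+$", "", cookie)
    return cookie


print(sanitize_cookie_header("__utma=123; __ac=abc; foo=bar"))
print(repr(sanitize_cookie_header("__utma=1; foo=2")))
```

An empty result corresponds to the branch in the VCL where the Cookie header is removed entirely.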

Another example, showing how to remove only Google Analytics cookies and allow other cookies by default:

sub vcl_recv {


         # Remove Google Analytics cookies - they will prevent caching of anon content
         # when using GA JavaScript. Note that you will also lose information such as
         # time spent on the site.
         if (req.http.cookie) {
            set req.http.Cookie = regsuball(req.http.Cookie, "__utm.=[^;]+(; )?", "");
            if (req.http.cookie ~ "^ *$") {
                remove req.http.cookie;
            }
          }

          ....
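
The effect of the Google Analytics rule can be checked on sample headers with the same regular expression in Python. This is a sketch for experimentation only: strip_ga_cookies is a hypothetical name, and returning None stands in for removing the header in VCL.

```python
import re


def strip_ga_cookies(cookie):
    """Remove __utm* cookies; return None if nothing else is left."""
    cleaned = re.sub(r"__utm.=[^;]+(; )?", "", cookie)
    if re.fullmatch(r" *", cleaned):
        return None  # corresponds to "remove req.http.cookie" in the VCL
    return cleaned


print(strip_ga_cookies("__utma=1; __utmz=2; __ac=abc"))
print(strip_ga_cookies("__utma=1"))
```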

Do not cache error pages

You can make sure that Varnish does not accidentally cache error pages. Otherwise it could, for example, cache an error response for the front page while the site is down:

sub vcl_fetch {
        if ( beresp.status >= 500 ) {
                set beresp.ttl = 0s;
                set beresp.cacheable = false;
        }
        ...
}

Custom and full cache purges

Below is an example of how to create an action that purges the whole Varnish cache.

First, allow HTTP PURGE requests from localhost in default.vcl. We'll create a special PURGE command that takes the URLs to purge from the cache in a special header:

acl purge {
        "localhost";
        # XXX: Add your local computer public IP here if you
        # want to test the code against the production server
        # from the development instance
}

...

sub vcl_recv {

        ...

        # Allow PURGE requests clearing everything
        if (req.request == "PURGE") {
                if (!client.ip ~ purge) {
                        error 405 "Not allowed.";
                }
                # Purge for the current host using reg-ex from X-Purge-Regex header
                purge("req.http.host == " req.http.host " && req.url ~ " req.http.X-Purge-Regex);
                error 200 "Purged.";
        }
}
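
The argument passed to purge() in the rule above is a single string built by concatenation. The Python sketch below mirrors only that concatenation, to show the expression that results for a given Host header and X-Purge-Regex value. The helper name build_purge_expression is hypothetical; how Varnish evaluates the expression is not modeled here.

```python
def build_purge_expression(host, url_regex):
    """Mirror the string concatenation used in the purge() call."""
    return "req.http.host == " + host + " && req.url ~ " + url_regex


# A regex of ".*" matches every URL for the given host.
print(build_purge_expression("site.com", ".*"))
```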

Next, create a Plone view that makes a request from Plone to Varnish (upstream localhost:80) and issues the PURGE command. This uses the Requests Python library.

Example view code:

import requests

from Products.CMFCore.interfaces import ISiteRoot
from five import grok

class Purge(grok.CodeView):
    """
    Purge upstream cache from all entries.

    This is ideal to hook up for admins e.g. through portal_actions menu.

    You can access it as admin::

        http://site.com/@@purge

    """

    grok.context(ISiteRoot)

    # Only site admins can use this
    grok.require("cmf.ManagePortal")

    def render(self):
        """
        Call the parent cache using the Requests Python library and issue a PURGE command for all URLs.

        Pipe through the response as is.
        """

        # This is the root URL which will be purged
        # - you might want to have different value here if
        # your site has different URLs for manage and themed versions
        site_url = self.context.portal_url() + "/"

        headers = {
                   # Match all pages
                   "X-Purge-Regex" : ".*"
        }

        resp = requests.request("PURGE", site_url + "*", headers=headers)

        self.request.response["Content-type"] = "text/plain"
        text = []

        text.append("HTTP " + str(resp.status_code))

        # Dump response headers as is to the Plone user,
        # so he/she can diagnose the problem
        for key, value in resp.headers.items():
            text.append(str(key) + ": " + str(value))

        # Add payload message from the server (if any)
        if resp.text:
            text.append(resp.text)

        # Return the collected lines as the plain-text body of the view
        return "\n".join(text)

Round robin balancing

Varnish can do round-robin load balancing internally. Use this to distribute CPU-intensive load between several ZEO front-end client instances, each listening on its own port.

Example:

# Round-robin between two ZEO front end clients

backend app1 {
.host = "localhost";
.port = "8080";
}

backend app2 {
.host = "localhost";
.port = "8081";
}

director app_director round-robin {
  {
      .backend = app1;
  }
  {
      .backend = app2;
  }
}


sub vcl_recv {

  if (req.http.host ~ "(www\.|www2\.)?app\.fi(:[0-9]+)?$") {
    set req.url = "/VirtualHostBase/http/www.app.fi:80/app/app/VirtualHostRoot" req.url;
    set req.backend = app_director;
  }
}
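
As an illustration of the round-robin idea itself, the following Python sketch hands out backends in strict rotation. It is a conceptual model only and says nothing about how Varnish implements its director; the class name RoundRobinDirector is hypothetical, and the two backends match app1 and app2 from the example above.

```python
from itertools import cycle


class RoundRobinDirector:
    """Hand out backends one after another, starting over at the end."""

    def __init__(self, backends):
        self._cycle = cycle(backends)

    def pick(self):
        """Return the backend that should serve the next request."""
        return next(self._cycle)


director = RoundRobinDirector([("localhost", 8080), ("localhost", 8081)])
picks = [director.pick() for _ in range(4)]
print(picks)
```

Each ZEO client receives every second request, so CPU-heavy page rendering is spread evenly as long as requests cost roughly the same.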


