Running multiple squid instances on Ubuntu

December 3, 2008 at 7:00 pm | Posted in http | 3 Comments

As described in earlier posts, our standard web application setup at work is TurboGears behind squid as a transparent caching proxy behind Apache. One of the reasons for this setup is that we want fine-grained control over the services.

Since we already decided to run each application in its own application server, we want to keep things separate in the squid layer as well, which brings us to the challenge of running multiple squid instances. It turns out this isn’t too hard to do, reusing most of the standard squid installation on Ubuntu.

  1. Create a link to make the squid daemon available under a different name:
    cd /usr/local/sbin
    ln /usr/sbin/squid squid2
  2. Create directories for logs and cache files:
    mkdir /var/log/squid2
    chown proxy:proxy /var/log/squid2
    mkdir /var/spool/squid2
    chown proxy:proxy /var/spool/squid2
  3. Create the configuration in /etc/squid/squid2.conf, specifying pid_filename, cache_dir and access_log in particular.
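For reference, a minimal squid2.conf covering those directives might look like the sketch below; the port number and cache sizes are placeholders, adjust them to your environment:

```conf
# /etc/squid/squid2.conf -- minimal sketch, port and sizes are assumptions
http_port 3129                               # second instance needs its own port
pid_filename /var/run/squid2.pid
cache_dir ufs /var/spool/squid2 1024 16 256  # 1 GB cache, 16x256 subdirectories
access_log /var/log/squid2/access.log squid
cache_log /var/log/squid2/cache.log
```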
  4. Create an init script. We started out with the one installed with the package and only had to apply the following changes:
    # diff /etc/init.d/squid2 /etc/init.d/squid
    8,10c8,9
    < NAME=squid2
    < CONF=/etc/squid/$NAME.conf
    < DAEMON=/usr/local/sbin/$NAME
    ---
    > NAME=squid
    > DAEMON=/usr/sbin/squid
    13c12
    < SQUID_ARGS="-D -sYC -f $CONF"
    ---
    > SQUID_ARGS="-D -sYC"
    38c37
    < sq=$CONF
    ---
    > sq=/etc/squid/$NAME.conf
    82c81
    < $DAEMON -z -f $CONF
    ---
    > $DAEMON -z
  5. Install the init script by running:
    update-rc.d squid2 defaults 99

LinkedData with TurboGears and squid

December 3, 2008 at 6:33 pm | Posted in http, programming, python | 1 Comment

At work we run several TurboGears web applications, deployed behind Apache acting as proxy. I like this setup a lot, because it allows us to reuse our log file analysis infrastructure for the web applications as well.

Some of the web applications serve data that rarely changes, so to lower the traffic hitting TurboGears, I decided to use a transparent caching proxy. Since logging is already taken care of by Apache, I don’t mind that not all requests reach the application server.

We settled on putting a squid cache between Apache and TurboGears, which worked well after some fiddling.

Recently a new requirement came up: serving Linked Data with these web applications. This is actually pretty easy to do with TurboGears. Creating RDF/XML with the templating engines works well, and even the content negotiation mechanism recommended for serving the data is well supported. To have TurboGears serve the RDF/XML representation for a resource, just decorate the corresponding controller as follows:

@expose(as_format="rdf",
        format="xml",
        template="...",
        content_type="application/rdf+xml",
        accept_format="application/rdf+xml")

TurboGears will pick the appropriate template based on the Accept header sent by the client.
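The dispatch TurboGears performs can be illustrated with a small standalone sketch (plain Python, no TurboGears required). It picks a representation by walking the Accept header in order; a real Accept parser would also honor q-values and wildcards, which this sketch ignores:

```python
def negotiate(accept_header, available):
    """Return the first content type from the Accept header that we
    can serve, falling back to the first available type."""
    # Strip parameters such as ";q=0.9" from each listed media type.
    accepted = [part.split(";")[0].strip()
                for part in accept_header.split(",")]
    for content_type in accepted:
        if content_type in available:
            return content_type
    return available[0]

types = ["text/html", "application/rdf+xml"]
print(negotiate("application/rdf+xml", types))           # RDF client
print(negotiate("text/html,application/xhtml+xml", types))  # browser
```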

Unfortunately this setup – different representations served for the same URL – doesn’t work well with our squid cache. But the Vary HTTP header comes to our rescue: by sending back a Vary header, we tell squid which request headers have to be taken into account when caching a response. Squid then keys the cached page on the combination of the URL and the listed headers.

Now the only header important for our content negotiation requirement is the Accept header, so putting the following line in the controller does the trick:

response.header['Vary'] = 'accept'
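To see why this is enough, here is a simplified sketch of what a Vary-aware cache does: the cache key combines the URL with the values of the request headers named in the response’s Vary header, so HTML and RDF clients get separate cache entries.

```python
def cache_key(url, request_headers, vary):
    """Build a cache key from the URL plus the request-header values
    listed in the Vary header (header names are case-insensitive)."""
    varied = tuple(request_headers.get(name.strip().lower(), "")
                   for name in vary.split(","))
    return (url, varied)

html_req = {"accept": "text/html"}
rdf_req = {"accept": "application/rdf+xml"}
print(cache_key("/resource/1", html_req, "accept"))
print(cache_key("/resource/1", rdf_req, "accept"))  # a different key
```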
