LinkedData with TurboGears and squid

December 3, 2008 at 6:33 pm | Posted in http, programming, python | 1 Comment

At work we run several TurboGears web applications, deployed behind Apache acting as proxy. I like this setup a lot, because it allows us to reuse our log file analysis infrastructure for the web applications as well.

Some of the web applications serve data that rarely changes, so to reduce the load on TurboGears, I decided to use a transparent caching proxy. Since logging is already taken care of by Apache, it doesn’t matter that not every request reaches the application server.

We settled on putting a squid cache between Apache and TurboGears, which worked well after some fiddling.

Recently a new requirement came up: serving Linked Data with these web applications. This is actually pretty easy to do with TurboGears. Creating RDF/XML with the templating engines works well, and even the content negotiation mechanism recommended for serving the data is well supported. To have TurboGears serve the RDF/XML representation of a resource, just decorate the corresponding controller method as follows:

@expose(as_format="rdf",
        format="xml",
        template="...",
        content_type="application/rdf+xml",
        accept_format="application/rdf+xml")

TurboGears will pick the appropriate template based on the Accept header sent by the client.
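To illustrate the idea of choosing a representation from the Accept header, here is a minimal sketch. It is not TurboGears' actual algorithm; the function name `preferred_type` and the simplified q-value handling are assumptions made for illustration only.

```python
def preferred_type(accept_header, offered):
    """Return the offered media type the client prefers, or None.

    Simplified model of content negotiation: it honours q-values and
    wildcards, but ignores other media type parameters and specificity.
    """
    weights = {}
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        weights[media_type] = q

    best, best_q = None, 0.0
    for media_type in offered:
        main = media_type.split("/")[0]
        # Exact match first, then "type/*", then "*/*".
        q = weights.get(media_type,
                        weights.get(main + "/*", weights.get("*/*", 0.0)))
        if q > best_q:
            best, best_q = media_type, q
    return best
```

A client sending `Accept: application/rdf+xml` would be matched to the RDF representation, while a typical browser header starting with `text/html` would be matched to the HTML page.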

Unfortunately this setup, with different representations served for the same URL, doesn’t work well with our squid cache. But the Vary HTTP header comes to our rescue. Sending a Vary header with the response tells squid which request headers have to be taken into account when caching it; squid then uses a key combined from the URL and those significant headers for the cached page.

Now the only header important for our content negotiation requirement is the Accept header, so putting the following line in the controller does the trick:

response.header['Vary'] = 'accept'
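The effect of the Vary header on a cache can be sketched as follows. This is a simplified model, not squid's actual implementation; the function `cache_key` and its arguments are hypothetical names chosen for the example.

```python
def cache_key(url, request_headers, vary):
    """Build a cache key from the URL plus the request headers named in Vary.

    Simplified model: header names are compared case-insensitively and
    header values are used verbatim.
    """
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    lowered = dict((k.lower(), v) for k, v in request_headers.items())
    return (url,) + tuple((n, lowered.get(n, "")) for n in names)


# Same URL, different Accept headers: two distinct cache entries.
html_key = cache_key("/resource/1", {"Accept": "text/html"}, "accept")
rdf_key = cache_key("/resource/1", {"Accept": "application/rdf+xml"}, "accept")
```

Without a Vary header, both requests would map to the same key, and the cache would hand out whichever representation it happened to store first.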

The Power of CGI

September 19, 2007 at 7:50 am | Posted in cgi, http, python | 4 Comments

Today, with all these web application frameworks around, CGI has almost become obsolete. At least in my toolbox, it’s slid to the bottom. And whenever I stumble upon it, I have to look up the spec.

Recently a colleague of mine had the following problem: He wanted to hand out cookies to passers-by, i.e. redirect requests while making sure that the user agents have a cookie when they request the target location.

First idea: A job for Apache’s mod_rewrite. But there’s no way to add a Set-Cookie header with mod_rewrite alone. So mod_headers should do. But the mod_headers directives are not evaluated because mod_rewrite has already returned the redirect response.

So CGI to the rescue. But how do you set a particular response status or trigger a redirect via CGI? Here’s what the spec says:

Parsed headers

The output of scripts begins with a small header. This header consists of text lines, in the same format as an HTTP header, terminated by a blank line (a line with only a linefeed or CR/LF). Any headers which are not server directives are sent directly back to the client. Currently, this specification defines three server directives:

  • Content-type This is the MIME type of the document you are returning.
  • Location This is used to specify to the server that you are returning a reference to a document rather than an actual document. If the argument to this is a URL, the server will issue a redirect to the client. If the argument to this is a virtual path, the server will retrieve the document specified as if the client had requested that document originally. ? directives will work in here, but # directives must be redirected back to the client.
  • Status This is used to give the server an HTTP/1.0 status line to send to the client. The format is nnn xxxxx, where nnn is the 3-digit status code, and xxxxx is the reason string, such as “Forbidden”.

Voila. Something like this does the trick:

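# CGI response header block: Status and Location are server directives.
print("Status: 302 Found")
print("Location: /")
# Any other header, such as Set-Cookie, is passed through to the client.
print("Set-Cookie: key=value; path=/; expires=Friday, 09-Nov-07 23:12:40 GMT")
# A blank line terminates the header block.
print("")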
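A slightly more general variant computes the expiry date instead of hard-coding it. This is only a sketch under stated assumptions: the helper name `redirect_with_cookie` and its parameters are made up for illustration, and the date is formatted with the standard library's `email.utils.formatdate`, which produces an RFC 1123 style GMT timestamp.

```python
import sys
import time
from email.utils import formatdate


def redirect_with_cookie(location, name, value, max_age_seconds, now=None):
    """Return a CGI header block that redirects and sets a cookie."""
    if now is None:
        now = time.time()
    # usegmt=True yields e.g. "Fri, 09 Nov 2007 23:12:40 GMT"
    expires = formatdate(now + max_age_seconds, usegmt=True)
    lines = [
        "Status: 302 Found",
        "Location: %s" % location,
        "Set-Cookie: %s=%s; path=/; expires=%s" % (name, value, expires),
        "",  # blank line terminates the header block
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hand out a cookie that is valid for one day, then redirect to "/".
    sys.stdout.write(redirect_with_cookie("/", "key", "value", 24 * 60 * 60))
```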

python saxparser woes

September 5, 2007 at 6:34 am | Posted in python | Leave a comment

To make a long story short: When you’re trying to use Python’s xml.sax saxparser for namespace-aware XML processing, you may be in for trouble.

If you run the following little script

import xml.sax, StringIO

class Parser(xml.sax.handler.ContentHandler):
    def startElementNS(self, name, qname, attrs):
        print name, qname, attrs
    def endElementNS(self, name, qname):
        print name, qname

p = xml.sax.make_parser(["drv_libxml2"])
p.setFeature(xml.sax.handler.feature_namespaces, 1)
p.setContentHandler(Parser())
s = xml.sax.xmlreader.InputSource()
s.setByteStream(StringIO.StringIO("""<?xml version='1.0' encoding='utf-8'?>
<prefix:element xmlns:prefix="http://www.python.org/sax_error"/>
"""))
p.parse(s)

you should see something like


@:/tmp$ python test.py
(u'http://www.python.org/sax_error', u'element') prefix:element
(u'http://www.python.org/sax_error', u'element') prefix:element

What you may see, though, is


~ > python test.py
(u'http://www.python.org/sax_error', u'element') prefix:element
(u'http://www.python.org/sax_error', u'element') None

In the latter case, endElementNS does not get a proper qualified name and code relying on this (e.g. feedparser) will not work as expected (e.g. not find elements in an RSS 2.0 feed).
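Handler code can be made independent of the qualified name the parser passes in by tracking the prefix mappings itself. The following is a minimal sketch of that workaround, assuming the parser reports namespace declarations via startPrefixMapping; the class `QNameTrackingHandler` and the helper `closed_qnames` are hypothetical names, and this is not what feedparser does.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class QNameTrackingHandler(ContentHandler):
    """Rebuild a qualified name when the parser passes None for it."""

    def __init__(self):
        ContentHandler.__init__(self)
        self._uris = {}   # prefix -> stack of namespace URIs in scope
        self.closed = []  # qualified names of closed elements, in order

    def startPrefixMapping(self, prefix, uri):
        self._uris.setdefault(prefix, []).append(uri)

    def endPrefixMapping(self, prefix):
        self._uris[prefix].pop()

    def _qname(self, name):
        uri, local = name
        # If several prefixes are bound to the same URI, any one may be chosen.
        for prefix, stack in self._uris.items():
            if stack and stack[-1] == uri:
                return local if prefix is None else prefix + ":" + local
        return local

    def endElementNS(self, name, qname):
        # Use the parser's qualified name if present, else reconstruct it.
        self.closed.append(qname or self._qname(name))


def closed_qnames(data):
    """Parse the given bytes with namespaces enabled; return closed qnames."""
    handler = QNameTrackingHandler()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.closed
```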

It turns out that setting


p = xml.sax.make_parser(["drv_libxml2"])
p.setFeature(xml.sax.handler.feature_namespaces, 1)

(i.e. specifying drv_libxml2 as the preferred driver) does not ensure you get a parser that behaves this way. Instead, if the Python bindings for libxml2 are not installed, xml.sax will silently fall back to the default parser, which, as exhibited above, does not do what you want.
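One way to make the fallback visible is to check which module the returned parser actually comes from and fail loudly on a mismatch. This is a sketch, assuming the parser class is defined in the driver module that was requested; `make_parser_strict` is a hypothetical helper, not part of xml.sax.

```python
import xml.sax


def make_parser_strict(driver):
    """Return a parser from the given driver module, or raise.

    Unlike xml.sax.make_parser, this refuses to fall back silently to
    another driver when the requested one cannot be imported.
    """
    parser = xml.sax.make_parser([driver])
    actual = type(parser).__module__
    if actual != driver:
        raise xml.sax.SAXReaderNotAvailable(
            "requested SAX driver %r, but got %r" % (driver, actual))
    return parser
```

Calling `make_parser_strict("drv_libxml2")` on a machine without the libxml2 bindings would then raise an exception instead of quietly returning the default parser.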

This behaviour is in my opinion totally inappropriate, and it makes problems really hard to debug. My colleague and I actually started doubting the universality of the universal feedparser. It’s also not in line with “explicit is better than implicit”.

But then again, maybe it’s just one of the things you need to know.
