What Designers of a Protocol can learn from OAI-PMH

July 12, 2009 at 5:35 pm | Posted in cgi, programming | 1 Comment

I spent the last couple of days writing an OAI-PMH data provider (two, actually). And to give a positive twist to the things that bothered me about OAI-PMH, I’ll list them as “things not to do in a protocol”:

  • Additional arguments are not ignored but must be treated as an error.

    This leads to a lot of additional code just to make sure a data provider is compliant. But for a different OAI-PMH data provider I wrote before it was even worse: I had one data provider serving different repositories, but I couldn’t simply add an additional URL parameter to the base URL to distinguish these.

  • HTTP is used merely as a transport layer.

    An error response in OAI-PMH is not an HTTP response with a status != 200, but rather an XML error message delivered with HTTP 200. This means more programming work on both the server and the client side.

  • noRecordsMatch error message instead of delivering an empty list.

    Again, this means more work on the data provider side, and I assume for most harvesters too. Additionally, since this error can only be detected after running some logic on the server, it is hard to implement a single check_args routine that is called before any other action, so that the rest of the code only has to handle the success case.

  • The resumption token.

    The flow control mechanism of OAI-PMH is way too specific. It’s completely geared towards stateful data providers. Imagine your resources live in an SQL database; to retrieve them in batches you would simply use LIMIT and OFFSET. But you can’t just put the offset in the resumption token, because it is an exclusive argument, i.e. with the next request from the client you will receive only the resumption token, but none of the other arguments supplied before. So what the typical CGI programmer ends up doing is embedding all other arguments in the resumption token and, upon the next request, parsing the resumption token to reconstruct the complete request.
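
To make the resumption token workaround concrete, here is a minimal sketch of how a stateless data provider might pack the original arguments and the next offset into the token. The names PAGE_SIZE, encode_token, decode_token and resolve_request are made up for this illustration, and the JSON-in-base64 encoding is just one possible choice, not something the protocol prescribes.

```python
import base64
import json

# Hypothetical batch size; a real provider would pick its own.
PAGE_SIZE = 100


def encode_token(args, offset):
    """Pack the original request arguments and the next offset into a token."""
    payload = json.dumps({"args": args, "offset": offset}, sort_keys=True)
    return base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")


def decode_token(token):
    """Recover the original arguments and the offset from a token."""
    payload = base64.urlsafe_b64decode(token.encode("ascii")).decode("utf-8")
    data = json.loads(payload)
    return data["args"], data["offset"]


def resolve_request(params):
    """Return (args, offset) for either a fresh or a resumed request.

    A resumed request carries only the resumption token (next to the verb),
    so everything else has to be reconstructed from the token itself.
    """
    if "resumptionToken" in params:
        return decode_token(params["resumptionToken"])
    return dict(params), 0


if __name__ == "__main__":
    first = {"verb": "ListRecords", "metadataPrefix": "oai_dc", "from": "2009-01-01"}
    args, offset = resolve_request(first)
    # The offset would feed a "LIMIT PAGE_SIZE OFFSET offset" query here.
    token = encode_token(args, offset + PAGE_SIZE)
    print(resolve_request({"verb": "ListRecords", "resumptionToken": token}))
```

Since the client hands the token back verbatim, a real provider would probably also want to validate or sign it; the sketch leaves that out.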

So if you are about to design a protocol on top of HTTP, learn from the mistakes of OAI-PMH.


The Power of CGI

September 19, 2007 at 7:50 am | Posted in cgi, http, python | 4 Comments

Today, with all these web application frameworks around, CGI has almost become obsolete. At least in my toolbox, it’s slid to the bottom. And whenever I stumble upon it, I have to look up the spec.

Recently a colleague of mine had the following problem: he wanted to hand out cookies to passers-by, i.e. redirect requests while making sure that the user agents have a cookie when they request the target location.

First idea: A job for Apache’s mod_rewrite. But there’s no way to add a Set-Cookie header with mod_rewrite alone. So mod_headers should do. But the mod_headers directives are not evaluated because mod_rewrite has already returned the redirect response.

So CGI to the rescue. But how do you set a particular response status or trigger a redirect via CGI? Here’s what the spec says:

Parsed headers

The output of scripts begins with a small header. This header consists of text lines, in the same format as an HTTP header, terminated by a blank line (a line with only a linefeed or CR/LF). Any headers which are not server directives are sent directly back to the client. Currently, this specification defines three server directives:

  • Content-type This is the MIME type of the document you are returning.
  • Location This is used to specify to the server that you are returning a reference to a document rather than an actual document. If the argument to this is a URL, the server will issue a redirect to the client. If the argument to this is a virtual path, the server will retrieve the document specified as if the client had requested that document originally. ? directives will work in here, but # directives must be redirected back to the client.
  • Status This is used to give the server an HTTP/1.0 status line to send to the client. The format is nnn xxxxx, where nnn is the 3-digit status code, and xxxxx is the reason string, such as “Forbidden”.

Voila. Something like this does the trick:

print("Status: 302 Found")
print("Location: /")
print("Set-Cookie: key=value; path=/; expires=Friday, 09-Nov-07 23:12:40 GMT")
print("")  # a blank line terminates the CGI header block
