What Designers of a Protocol can learn from OAI-PMH
July 12, 2009 at 5:35 pm | In cgi, programming | 1 Comment
I spent the last couple of days writing an OAI-PMH data provider (two, actually). And to give a positive twist to the things that bothered me about OAI-PMH, I’ll list them as “things not to do in a protocol”:
- Additional arguments are not ignored but have to be treated as errors.
This leads to a lot of additional code just to make sure a data provider is compliant. But for a different OAI-PMH data provider I wrote before it was even worse: I had one data provider serving different repositories, but I couldn’t simply add an additional URL parameter to the base URL to distinguish these.
- HTTP is just used as transport layer.
An error response in OAI-PMH is not an HTTP response with a status != 200, but rather an XML error message delivered with HTTP 200. This means more programming work on both the server and the client side.
- noRecordsMatch error message instead of delivering an empty list.
Again this means more work on the data provider side – but also for most harvesters, I assume. Additionally, since this error can only be detected after running some logic on the server, it is hard to implement a single check_args routine that is called before any other action, and then handle only the success case.
- The resumption token.
The flow control mechanism of OAI-PMH is way too specific. It’s completely geared towards stateful data providers. Imagine your resources live in an SQL database; what you’d do to retrieve them in batches is use simple LIMIT and OFFSET settings. But you can’t just put the offset in the resumption token, because it is an exclusive argument, i.e. with the next request from the client, you will receive only the resumption token, but none of the other arguments supplied before. So what the typical CGI programmer ends up with is embedding all other arguments in the resumption token and, upon the next request, parsing the resumption token to reconstruct the complete request.
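The resumption token workaround described above could be sketched roughly as follows. This is a minimal illustration, not the code of either data provider: the function names encode_token and decode_token, the JSON-plus-base64 encoding and the example argument values are all assumptions made for the sketch.

```python
import base64
import json


def encode_token(args, offset):
    """Pack the original request arguments and the next OFFSET into one opaque string."""
    payload = json.dumps({"args": args, "offset": offset}, sort_keys=True)
    return base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")


def decode_token(token):
    """Recover the original arguments and the offset from a resumption token."""
    raw = base64.urlsafe_b64decode(token.encode("ascii")).decode("utf-8")
    payload = json.loads(raw)
    return payload["args"], payload["offset"]


# Hypothetical first request: the harvester sent these arguments,
# and the first batch of 100 records has just been delivered.
first_request = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
token = encode_token(first_request, offset=100)

# Next request: only the token arrives, so everything else is rebuilt from it.
args, offset = decode_token(token)
```

With args and offset recovered, the data provider can run the same SQL query as before with the new OFFSET. The token may contain "=" padding, so it still has to be URL-quoted when it is sent as a query parameter.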
So if you are about to design a protocol on top of HTTP, learn from the mistakes of OAI-PMH.
Maximum object size for squid
July 3, 2009 at 5:35 pm | In http, squid
Note to self: the default maximum_object_size setting for squid is a – to me at least – non-intuitively small 4 MB.
Stop Password Masking?
June 28, 2009 at 8:02 pm | In Uncategorized
I think Jakob Nielsen has this one wrong. Masking password fields may make it harder for the user, but seeing people log in to applications in live demos – i.e. with a whole bunch of bystanders – tells me that there is a place for this technique.
A note on INNER JOIN and sqlite
April 29, 2009 at 12:20 pm | In Uncategorized
Compare
robert@forkel02:~/projects/lwt/trunk/lwt
14:11:15$ time sqlite3 devdata.sqlite "select distinct l.name from source_word_donor_language as swdl inner join source_word as sw inner join word_source_word as wsw inner join word as w inner join language as l on swdl.source_word_id = sw.id and sw.id = wsw.source_word_id and wsw.word_id = w.id and w.language_name = l.name where swdl.donor_language_id = 873046508145964;"
Imbabura Quechua (Quichua)
Yaqui
Otomi
Tzotzil of Zinacantan
Mapudungun
Q'eqchi'
Kali'na
Wichà [Matacoan]
Berber
Dutch
English
Ceq Wong
Indonesian
Japanese
Romanian
Seychelles Creole
Hawaiian
real 4m34.178s
user 3m59.379s
sys 0m32.766s
with
robert@forkel02:~/projects/lwt/trunk/lwt
14:10:43$ time sqlite3 devdata.sqlite "select distinct l.name from source_word_donor_language as swdl, source_word as sw, word_source_word as wsw, word as w, language as l where swdl.source_word_id = sw.id and sw.id = wsw.source_word_id and wsw.word_id = w.id and w.language_name = l.name and swdl.donor_language_id = 873046508145964;"
Imbabura Quechua (Quichua)
Yaqui
Otomi
Tzotzil of Zinacantan
Mapudungun
Q'eqchi'
Kali'na
Wichà [Matacoan]
Berber
Dutch
English
Ceq Wong
Indonesian
Japanese
Romanian
Seychelles Creole
Hawaiian
real 0m12.060s
user 0m12.017s
sys 0m0.048s
Wikipedia says the above SQL statements are equivalent. But that doesn’t mean an SQL engine will treat them equivalently.
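One way to see how sqlite treats the two forms is to ask it for its query plan. The following is a small sketch using Python's sqlite3 module on a made-up two-table schema (not the database from the timings above); the table contents and the helper names rows and plan are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE language (name TEXT PRIMARY KEY);
    CREATE TABLE word (id INTEGER PRIMARY KEY, language_name TEXT);
    INSERT INTO language VALUES ('Dutch'), ('English');
    INSERT INTO word VALUES (1, 'Dutch'), (2, 'English'), (3, 'Dutch');
""")

JOIN_QUERY = """
    SELECT DISTINCT l.name FROM word AS w INNER JOIN language AS l
    ON w.language_name = l.name WHERE w.id = ?
"""
COMMA_QUERY = """
    SELECT DISTINCT l.name FROM word AS w, language AS l
    WHERE w.language_name = l.name AND w.id = ?
"""


def rows(query, word_id):
    """Run the query and return the language names as a sorted list."""
    return sorted(r[0] for r in conn.execute(query, (word_id,)))


def plan(query, word_id):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    cursor = conn.execute("EXPLAIN QUERY PLAN " + query, (word_id,))
    return [row[-1] for row in cursor]


# Both forms return the same rows ...
same_result = rows(JOIN_QUERY, 1) == rows(COMMA_QUERY, 1)

# ... but the plans may differ, depending on the sqlite version and the schema.
for step in plan(JOIN_QUERY, 1) + plan(COMMA_QUERY, 1):
    print(step)
```

On a toy schema like this the two plans will probably look alike; the point is only that EXPLAIN QUERY PLAN shows the join order the engine actually chose.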
Running multiple squid instances on Ubuntu
December 3, 2008 at 7:00 pm | In http
Tags: squid, ubuntu
As described in earlier posts, our standard web application setup at work is TurboGears behind squid as transparent caching proxy behind Apache. One of the reasons for this setup is that we want fine-grained control over the services.
Since we already decided to run each application in its own application server, we want to keep things separate in the squid layer as well, which brings us to the challenge of running multiple squid instances. It turns out this isn’t too hard to do, reusing most of the standard squid installation on Ubuntu.
- Create a link to have the squid daemon available under a different name:
cd /usr/local/sbin
ln /usr/sbin/squid squid2
- Create directories for logs and cache files:
mkdir /var/log/squid2
chown proxy:proxy /var/log/squid2
mkdir /var/spool/squid2
chown proxy:proxy /var/spool/squid2
- Create the configuration in /etc/squid/squid2.conf, specifying pid_filename, cache_dir and access_log in particular.
- Create an init script. We started out with the one installed with the package and only had to apply the following changes:
# diff /etc/init.d/squid2 /etc/init.d/squid
8,10c8,9
< NAME=squid2
< CONF=/etc/squid/$NAME.conf
< DAEMON=/usr/local/sbin/$NAME
---
> NAME=squid
> DAEMON=/usr/sbin/squid
13c12
< SQUID_ARGS="-D -sYC -f $CONF"
---
> SQUID_ARGS="-D -sYC"
38c37
< sq=$CONF
---
> sq=/etc/squid/$NAME.conf
82c81
< $DAEMON -z -f $CONF
---
> $DAEMON -z
- And install the init script by running
update-rc.d squid2 defaults 99
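For the configuration step above, the settings that have to differ from the first instance might look like the following fragment. The paths follow the directories created earlier; the port number, the pid file location, the cache size and the cache_log line are assumptions for illustration, not values from our setup.

```
# /etc/squid/squid2.conf (fragment, illustrative values)
http_port 3129                                # assumed; must differ from the first instance
pid_filename /var/run/squid2.pid              # assumed location
cache_dir ufs /var/spool/squid2 100 16 256    # assumed size: 100 MB, 16 x 256 subdirectories
access_log /var/log/squid2/access.log squid
cache_log /var/log/squid2/cache.log           # assumed; keeps the instance's own log separate
```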
LinkedData with TurboGears and squid
December 3, 2008 at 6:33 pm | In http, programming, python | 1 Comment
Tags: squid
At work we run several TurboGears web applications, deployed behind Apache acting as proxy. I like this setup a lot, because it allows us to reuse our log file analysis infrastructure for the web applications as well.
Some of the web applications serve data that rarely changes, so to lower the traffic for TurboGears, I decided to use a transparent caching proxy. Since logging is already taken care of by Apache, I don’t mind that not all requests reach the application server.
We settled on putting a squid cache between Apache and TurboGears, which worked well after some fiddling.
Recently a new requirement came up: serving Linked Data with these web applications. This is actually pretty easy to do with TurboGears. Creating RDF+XML with the templating engines works well, and even the content negotiation mechanism recommended for serving the data is well supported. To have TurboGears serve the RDF+XML representation for a resource, just decorate the corresponding controller as follows:
@expose(as_format="rdf",
        format="xml",
        template="...",
        content_type="application/rdf+xml",
        accept_format="application/rdf+xml")
TurboGears will pick the appropriate template based on the Accept header sent by the client.
Unfortunately this setup – different representations served for the same URL – doesn’t work well with our squid cache. But the Vary HTTP header comes to our rescue. Sending a Vary header with the response tells squid which request headers have to be taken into account when caching that response; squid then uses a cache key combining the URL and the values of those headers.
Now the only header important for our content negotiation requirement is the Accept header, so putting the following line in the controller does the trick:
response.header['Vary'] = 'accept'
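To illustrate the idea of a cache key that depends on the Vary header, here is a toy sketch. It is not how squid implements this internally; the function name cache_key and the example URL are assumptions made for the illustration.

```python
def cache_key(url, request_headers, vary):
    """Build a cache key from the URL plus the request headers named in Vary."""
    # Header names are case-insensitive, so normalise them to lower case.
    names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    lowered = {k.lower(): v for k, v in request_headers.items()}
    return (url,) + tuple((name, lowered.get(name, "")) for name in names)


# Hypothetical resource URL, requested once as HTML and once as RDF+XML.
url = "http://example.org/resource/1"
html_key = cache_key(url, {"Accept": "text/html"}, "accept")
rdf_key = cache_key(url, {"Accept": "application/rdf+xml"}, "accept")
```

Because the two keys differ, a cache built this way keeps the HTML page and the RDF+XML representation as separate entries for the same URL.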
Claim your online persona
November 11, 2008 at 9:43 am | In Uncategorized
Andy Powell has an interesting hands-on post on the topic of claiming your online persona: define:digital identity. What I especially like: he even gives recommendations!
wlan with WPA on ThinkPad T61p with Ubuntu 7.10
November 11, 2008 at 8:24 am | In Uncategorized
For some reason, I never got wlan with WPA to work seamlessly on my ThinkPad T61p with Ubuntu 7.10. So here’s a short writeup of what it takes.
- Stop the network
- Put the WPA network information in /etc/wpa_supplicant.conf (see man wpa_supplicant.conf for examples)
- Start the wpa supplicant by running
wpa_supplicant -Dwext -iwlan0 -c/etc/wpa_supplicant.conf
- Start the DHCP client by running
dhclient wlan0
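For the second step, a minimal network block for a WPA network with a pre-shared key might look like this. The SSID and passphrase are placeholders, and the man page mentioned above remains the reference for other setups.

```
# /etc/wpa_supplicant.conf (placeholder values)
network={
    ssid="example-network"
    psk="example passphrase"
    key_mgmt=WPA-PSK
}
```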
Creating portable picture galleries from flickr photos
November 8, 2008 at 4:09 pm | In Uncategorized
Having been a flickr user for a couple of years now, my infrastructure for creating backups of the data stored at flickr grew more and more sophisticated. So finally I decided to polish it so that it can be used by others as well, and put it on Google Code.
XmlHttpRequest
September 11, 2008 at 8:43 pm | In programming | 1 Comment
Tags: bears_repeating
Last week I found out why it was a piece of cake for JSON to become the X in Ajax. XmlHttpRequest with XML is just a pain. Here’s why.
I wanted to pull data from a feed into a page. The feed is served as application/rss+xml. After a successful XmlHttpRequest, I wanted to retrieve the items from the request object’s responseXML property.
Unfortunately, this didn’t work on IE 6/7. Turns out IE only parses the response text when the response is served as text/xml. Ok, so we just grab responseText and parse it ourselves. Problem solved.
Hm. It doesn’t work in Konqueror either. Fortunately the trouble with IE pointed me in the right direction. While Konqueror does parse the response for mime-type application/xml, my feed would still end up unparsed.
Of course there’s no cross-browser way of parsing XML. So what I ended up with was:
try {
    var items = req.responseXML.documentElement.getElementsByTagName('item');
} catch(e) {
    try {
        // for IE we have to do the parsing ourselves, because the feed isn't delivered as text/xml ...
        var doc = new ActiveXObject("Microsoft.XMLDOM");
        doc.loadXML(req.responseText);
        var items = doc.documentElement.getElementsByTagName('item');
    } catch(e) {
        try {
            // ... same for Konqueror
            var p = new DOMParser();
            var doc = p.parseFromString(req.responseText, "text/xml");
            var items = doc.documentElement.getElementsByTagName('item');
        } catch(e) {
            // well, at least we'll get the title later
            var items = [];
        }
    }
}
And don’t get me started on handling namespaced XML …


