Conan The Librarian

[Damian wrote this]
So Libby and I have been redecorating Dan’s room. Before you run away screaming from this depiction of domestic bliss, please bear with me.
Dan has quite a few books, and we had planned to install enough shelving to hold all of them. But since we were moving so many dead trees, why not find out what's there? The inspiration, in some ways, was Delicious Library. However, although Delicious Library's use of the iSight for scanning was cunning, it would have been really slow.

Happily, Dan has a barcode scanner, so we could capture the barcode numbers easily. Ultimately, armed with ~900 numbers, it was time to look them up…
The best lookup site I found was ISBNdb. As far as I can tell, it uses Amazon (US) and the Library of Congress to grab as much useful information as it can, including classifications and descriptions. It has a simple API and will deal with 13-digit numbers for you. 13-digit numbers?
Well, when you scan you don't get the 10-digit ISBN. What you get is a 13-digit European Article Number (EAN). For books, the EAN is the nine core ISBN digits with a 978 prefix and a recomputed check digit, so it's pretty easy to convert (although I have found cases where the EAN and ISBN were unrelated), but I'm happy that ISBNdb will cope with them.
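Since only the check digit changes, the conversion can be done locally before any lookup. Here's a minimal Ruby sketch; `valid_ean13?` and `ean_to_isbn10` are illustrative names, not taken from lookup_amazon_z3950.rb. EANs with a 979 prefix, or non-book EANs such as those on CDs and DVDs, have no ISBN-10, so it returns nil for those.

```ruby
# Sketch: convert a 13-digit "Bookland" EAN (978 prefix) to a 10-digit ISBN.
# Method names are illustrative, not from lookup_amazon_z3950.rb.

# True if the string is 13 digits with a correct EAN-13 check digit.
def valid_ean13?(ean)
  return false unless ean =~ /\A\d{13}\z/
  digits = ean.chars.map(&:to_i)
  # The first 12 digits are weighted 1, 3, 1, 3, ...
  sum = digits[0, 12].each_with_index.sum { |d, i| i.even? ? d : d * 3 }
  (10 - sum % 10) % 10 == digits[12]
end

# Returns the ISBN-10 as a string, or nil when no conversion is possible
# (bad check digit, 979 prefix, or a non-book EAN).
def ean_to_isbn10(ean)
  return nil unless valid_ean13?(ean) && ean.start_with?("978")
  core = ean[3, 9] # drop the "978" prefix and the EAN check digit
  # ISBN-10 check digit: weights 10 down to 2, modulo 11, with 10 written as X.
  sum = core.chars.each_with_index.sum { |c, i| c.to_i * (10 - i) }
  check = (11 - sum % 11) % 11
  core + (check == 10 ? "X" : check.to_s)
end

puts ean_to_isbn10("9781851683321") # prints 1851683321
```

The EAN is validated first, so a mis-scan is reported as "not convertible" rather than silently turned into a wrong ISBN.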
However, ISBNdb wasn't finding everything. We did have some CDs and DVDs in the mix, but many UK books weren't turning up either. I couldn't change ISBNdb, so I tried reproducing it. The result is appalling Frankenstein code, but it mostly works. Here's roughly what it does:

  1. (If required) convert the EAN to an ISBN.
  2. If that isn't possible, try …, then …, which provide EAN lookups, and get the Amazon ID (the ASIN, which is identical to the ISBN-10 for books). This finds DVDs and CDs.
  3. Try to find the item in the Amazon UK store, then the US, CA, DE and JP stores (and yes, we have a couple of books that are only in the Japanese store). Grab the details, including subjects and the editorial review.
  4. Also try a Z39.50 lookup against the Library of Congress.
  5. Merge everything and spit it out as N3.
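Steps 3 to 5 boil down to "first store that answers wins, then merge in the library data". A minimal sketch of that control flow, assuming each lookup is wrapped in a lambda that takes an ASIN and returns a record hash or nil; `first_store_hit`, `merge_records` and the record layout are hypothetical, and the real script's Amazon and Z39.50 calls are not reproduced here.

```ruby
# Sketch of steps 3-5: try each Amazon store in turn, keep the first hit,
# then merge in the Library of Congress data. Each lookup is a hypothetical
# lambda (ASIN -> record hash or nil) standing in for a real web/Z39.50 call.

STORE_ORDER = %w[uk us ca de jp].freeze

# Returns [store, record] for the first store that knows the ASIN, else nil.
def first_store_hit(asin, lookups)
  STORE_ORDER.each do |store|
    lookup = lookups[store]
    next unless lookup
    record = lookup.call(asin)
    return [store, record] if record
  end
  nil
end

# Records map a property name (e.g. "dc:subject") to an array of values.
# When both sources have a property, keep all values, dropping exact repeats.
def merge_records(primary, extra)
  primary.merge(extra) { |_property, mine, theirs| (mine + theirs).uniq }
end

# Usage with stand-in lookups: only the US "store" knows this ASIN.
lookups = {
  "uk" => ->(_asin) { nil },
  "us" => ->(asin) { { "dc:identifier" => [asin] } }
}
store, record = first_store_hit("1851683321", lookups)
loc = { "dc:subject" => ["956.9405", "DS119.7"] }
p [store, merge_records(record, loc)]
```

Dropping exact repeats is a design choice of this sketch; the real output keeps duplicate `dc:identifier` lines.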

I'd like to add the British Library and COPAC to the Z39.50 lookups, but baby steps.
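Adding more Z39.50 sources is mostly a matter of keeping a list of targets and sending each the same query. The sketch below assumes the Ruby/ZOOM API as shown in that gem's README (`ZOOM::Connection.open`, `database_name=`, `preferred_record_syntax=`, `search`), plus `ResultSet#size`; check these against the gem's own documentation. Only the Library of Congress target (z3950.loc.gov, port 7090, database Voyager) is filled in; British Library or COPAC entries would need their own host, port and database, which aren't given here. The query is in PQF, where Bib-1 use attribute 7 means ISBN. `Z3950Target`, `isbn_query` and `z3950_lookup` are hypothetical names.

```ruby
# Sketch: a list of Z39.50 targets queried in turn with a Bib-1 ISBN search.
# Ruby/ZOOM method names follow its README example; treat them as assumptions.

Z3950Target = Struct.new(:host, :port, :database, :syntax)

# Only the Library of Congress is listed; other targets would be appended here.
TARGETS = [
  Z3950Target.new("z3950.loc.gov", 7090, "Voyager", "USMARC")
].freeze

# PQF query using Bib-1 use attribute 7 (ISBN). Accepts ISBN-10 or 13 digits.
def isbn_query(isbn)
  unless isbn =~ /\A(?:\d{9}[\dX]|\d{13})\z/
    raise ArgumentError, "not an ISBN: #{isbn}"
  end
  "@attr 1=7 #{isbn}"
end

# Returns the first record found across the targets, or nil.
# Ruby/ZOOM is only loaded when a lookup is actually made.
def z3950_lookup(isbn, targets = TARGETS)
  require "zoom"
  targets.each do |t|
    ZOOM::Connection.open(t.host, t.port) do |conn|
      conn.database_name = t.database
      conn.preferred_record_syntax = t.syntax
      rset = conn.search(isbn_query(isbn))
      return rset[0] if rset.size > 0
    end
  end
  nil
end
```

Keeping the targets as data means adding a new library is a one-line change rather than another copy of the lookup code.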
Here’s an example run (trimmed):

$ echo 9781851683321 | ruby lookup_amazon_z3950.rb
Skipping: 0
Line: 1 Found: '9781851683321' (1851683321) [UK]
a ex:Book ;
rdfs:seeAlso <> ;
dc:identifier """1851683321""" ;
dc:subject """956.9405""" ;
dc:subject """DS119.7""" ;
dc:subject """Arab-Israeli conflict""" ;
dc:subject """Asian / Middle Eastern history: postwar, from c 1945 -""" ;
dc:subject """Asian studies""" ;
dc:subject """Foreign Relations""" ;
dc:subject """Palestine""" ;
dc:subject """World - General""" ;
dct:issued """2003-07-01""" ;
dc:creator """Dan Cohn-Sherbok""" ;
dc:creator """Dawoud Sudqi El Alami""" ;
dc:identifier """1851683321""" ;
dc:description """Of all the intractable and inflammatory world conflicts, ...""" ;
dc:title """The Palestine-Israeli Conflict: A Beginner's Guide""" ;
dc:identifier """9781851683321""" ;
ex:pages """256""" ;
dct:hasFormat """Paperback""" ;
dc:publisher """Oneworld Publications""" ;

This is mostly Amazon data, but it includes the Dewey (956.9405) and LoC (DS119.7) classifications from the Library of Congress. You can feed it more than one number (it just loops over STDIN). I'm not happy with the output, but RDF modeling issues are the easy bit. It will change to distinguish LoC from Dewey classifications, and (hopefully) to use SKOS for the Amazon subjects.
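For the N3 side, a merged record can be printed as one predicate-object list per item. This is a minimal sketch, not the script's actual output code: the `urn:isbn:` subject is an assumption (the trimmed output above doesn't show one), the `ex:`, `dc:` and `dct:` prefixes are assumed to be declared elsewhere, and `n3_literal` and `to_n3` are hypothetical names.

```ruby
# Sketch: print a record (property => array of values) as an N3 statement.
# Assumes the ex:, dc: and dct: prefixes are declared elsewhere in the file.

# Escape backslashes and double quotes so a value can't break out of """...""".
def n3_literal(value)
  escaped = value.to_s.gsub("\\") { "\\\\" }.gsub('"') { '\\"' }
  %("""#{escaped}""")
end

# One subject, typed with `a`, then one predicate-object pair per value,
# separated by " ;" and terminated by " .".
def to_n3(subject, type, record)
  parts = ["a #{type}"]
  record.each do |property, values|
    values.each { |v| parts << "#{property} #{n3_literal(v)}" }
  end
  "#{subject} " + parts.join(" ;\n    ") + " ."
end

puts to_n3("<urn:isbn:1851683321>", "ex:Book",
           { "dc:identifier" => ["1851683321"], "ex:pages" => ["256"] })
```

This prints:

<urn:isbn:1851683321> a ex:Book ;
    dc:identifier """1851683321""" ;
    ex:pages """256""" .

Distinguishing Dewey from LoC would then only mean choosing a different property name when building the record.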
The script is lookup_amazon_z3950.rb. You need an Amazon developer token (insert it at the start of the script), Ruby/ZOOM, and REXML (which comes with Ruby 1.8). It will also have a go at DVDs and CDs, so have a play.
The final score was 934 numbers, with 64 not found (so 870 found, about 93%). A pretty good total, all in all, but other Z39.50 sites are calling me…