[Damian wrote this]
So Libby and I have been redecorating Dan’s room. Before you run away screaming from this depiction of domestic bliss, please bear with me.
Dan has quite a few books, and we had planned to install enough shelving to hold all of them. But since we were moving so many dead trees anyway, why not find out what’s there? The inspiration, in some ways, was Delicious Library. However, although Delicious Library’s use of the iSight for scanning was cunning, it would have been really slow.
Happily, Dan has a barcode scanner, so we could capture the barcode numbers easily. Armed with roughly 900 numbers, it was time to look them up…
The best lookup site I found was ISBNdb. As far as I can tell it uses Amazon (US) and the Library of Congress to grab as much useful information as it can, including classifications and descriptions. It has a simple API, and it will handle 13-digit numbers for you. 13-digit numbers?
Well, when you scan a book you don’t get the 10-digit ISBN. What you get is a 13-digit European Article Number (EAN). For books the EAN is normally just 978, the first nine digits of the ISBN, and a recomputed check digit, so it’s pretty easy to convert (although I have found cases where the EAN and ISBN were unrelated), but I’m happy that ISBNdb will cope with them.
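For the usual 978-prefixed case, converting back means dropping the 978 prefix and the EAN check digit, then recomputing the mod-11 ISBN-10 check digit. Here is a minimal sketch of that in Ruby, written for a current Ruby (2.4+) rather than the 1.8 of the script; it is not the code from lookup_amazon_z3950.rb, and the method names are made up for illustration. It returns nil for anything without an ISBN-10 equivalent (979 prefixes, CDs, DVDs), and it cannot spot the odd cases where the EAN and ISBN are unrelated.

```ruby
# Minimal sketch: EAN-13 (978 "Bookland" prefix) -> ISBN-10.
# Method names are hypothetical, not taken from lookup_amazon_z3950.rb.

# EAN-13 check: weights alternate 1, 3, 1, 3, ... across all 13 digits,
# and the weighted sum must be a multiple of 10.
def ean13_valid?(ean)
  return false unless ean.match?(/\A\d{13}\z/)
  sum = ean.chars.each_with_index.map { |c, i| c.to_i * (i.even? ? 1 : 3) }.sum
  (sum % 10).zero?
end

# Returns the ISBN-10 as a string, or nil if there is no ISBN-10 equivalent
# (invalid EAN, 979 prefix, or a non-book EAN such as a CD or DVD).
def ean_to_isbn10(ean)
  return nil unless ean13_valid?(ean) && ean.start_with?('978')
  core = ean[3, 9] # drop "978" and the trailing EAN check digit
  # ISBN-10 check: weights 10 down to 2 on the nine core digits; the check
  # digit brings the total to a multiple of 11 ("X" stands for 10).
  sum = core.chars.each_with_index.map { |c, i| c.to_i * (10 - i) }.sum
  check = (11 - sum % 11) % 11
  core + (check == 10 ? 'X' : check.to_s)
end

puts ean_to_isbn10('9781851683321') # => 1851683321
```

The example EAN is the one from the run shown further down, and the result matches the ISBN the script reports for it.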
However, ISBNdb wasn’t finding everything. We did have some CDs and DVDs in the mix, but many UK books weren’t turning up either. I couldn’t change ISBNdb, so I tried reproducing it. The result is appalling Frankenstein code, but it mostly works. Here’s roughly what it does:
- (If required) convert EAN to ISBN
- If that isn’t possible, try Amazon.de, then Amazon.co.jp, which provide EAN lookups, and get the Amazon ID (the ASIN, which is identical to the ISBN-10 for books). This finds DVDs and CDs.
- Try to find the item in the UK, then US, then CA, then DE, then JP stores (and yes, we have a couple of books only in the Japanese store). Grab details, including subjects and editorial review.
- Also try a Z39.50 lookup in the Library of Congress.
- Merge everything and spit it out as N3
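The steps above amount to a chain of fallbacks. Below is a hypothetical Ruby outline of that control flow, not the actual script: the four lambdas (to_isbn, asin_by_ean, amazon, loc) are placeholders for the real ISBN conversion, Amazon and Z39.50 calls, the merge rule (combine list values, otherwise prefer Amazon) is an assumption, and N3 output is left out.

```ruby
# Hypothetical outline of the lookup order; the lambdas passed in stand in
# for real ISBN conversion, Amazon and Z39.50 calls (not shown here).
STORES     = %w[uk us ca de jp].freeze # Amazon stores, in search order
EAN_STORES = %w[de jp].freeze          # stores used for EAN -> ASIN lookups

# Return the first truthy value produced by the block, trying keys in order.
def first_hit(keys)
  keys.each do |k|
    v = yield(k)
    return v if v
  end
  nil
end

def lookup_item(ean, to_isbn:, asin_by_ean:, amazon:, loc:)
  # 1-2. An ISBN-10 doubles as the ASIN for books; otherwise ask the
  #      stores that accept EAN lookups.
  asin = to_isbn.call(ean) || first_hit(EAN_STORES) { |s| asin_by_ean.call(s, ean) }

  # 3. The first store that knows the ASIN supplies the Amazon record.
  amazon_rec = asin && first_hit(STORES) { |s| amazon.call(s, asin) }

  # 4. Library of Congress record, looked up independently.
  loc_rec = loc.call(ean)
  return nil unless amazon_rec || loc_rec

  # 5. Merge: list-valued fields are combined, single values keep Amazon's.
  (amazon_rec || {}).merge(loc_rec || {}) do |_key, mine, theirs|
    mine.is_a?(Array) ? (mine + Array(theirs)).uniq : mine
  end
end

# Offline usage with stubbed sources:
amazon_db = { %w[uk 1851683321] => { title: 'The Palestine-Israeli Conflict', subjects: ['Asian studies'] } }
loc_db    = { '9781851683321' => { subjects: ['956.9405', 'DS119.7'] } }

rec = lookup_item('9781851683321',
                  to_isbn:     ->(ean) { ean == '9781851683321' ? '1851683321' : nil },
                  asin_by_ean: ->(_store, _ean) { nil },
                  amazon:      ->(store, asin) { amazon_db[[store, asin]] },
                  loc:         ->(ean) { loc_db[ean] })
# rec[:subjects] is ["Asian studies", "956.9405", "DS119.7"]
```

Passing the sources in as lambdas keeps the ordering logic testable offline; a DVD simply takes the EAN_STORES branch because to_isbn returns nil.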
I’d like to add the British Library and COPAC to the Z39.50 lookups, but baby steps.
Here’s an example run (trimmed):
$ echo 9781851683321 | ruby lookup_amazon_z3950.rb
Skipping: 0
Line: 1
Found: '9781851683321' (1851683321) [UK]
<urn:isbn:1-851-68332-1> a ex:Book ;
    rdfs:seeAlso <http://www.amazon.co.uk/exec/obidos/redirect?tag=ws%26l...> ;
    dc:identifier """1851683321""" ;
    dc:subject """956.9405""" ;
    dc:subject """DS119.7""" ;
    dc:subject """Arab-Israeli conflict""" ;
    dc:subject """Asian / Middle Eastern history: postwar, from c 1945 -""" ;
    dc:subject """Asian studies""" ;
    dc:subject """Foreign Relations""" ;
    dc:subject """Palestine""" ;
    dc:subject """World - General""" ;
    dct:issued """2003-07-01""" ;
    dc:creator """Dan Cohn-Sherbok""" ;
    dc:creator """Dawoud Sudqi El Alami""" ;
    dc:identifier """1851683321""" ;
    dc:description """Of all the intractable and inflammatory world conflicts, ...""" ;
    dc:title """The Palestine-Israeli Conflict: A Beginner's Guide""" ;
    dc:identifier """9781851683321""" ;
    ex:pages """256""" ;
    dct:hasFormat """Paperback""" ;
    dc:publisher """Oneworld Publications""" ;
    .
This is mostly Amazon data, but it includes a Dewey number (956.9405) and a Library of Congress classification (DS119.7) from the Library of Congress. You can feed it more than one number (it just loops over STDIN). I’m not happy with the output, but RDF modeling issues are the easy bit. It will change to distinguish LC classifications from Dewey numbers, and (hopefully) use SKOS for the Amazon subjects.
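One way that modeling change might look: sort each subject string into Dewey, LC classification or free text, and give each kind its own predicate, wrapping the free-text (Amazon) subjects in SKOS concepts. This is a hypothetical Ruby sketch, not the script’s planned code: the regexes are rough heuristics, ex:dewey and ex:lcc are invented predicate names, and the skos: prefix is assumed to be declared elsewhere (skos:Concept and skos:prefLabel are real SKOS terms).

```ruby
# Hypothetical sketch: separate Dewey numbers and LC class marks from
# free-text subjects. The regexes are rough heuristics, and ex:dewey /
# ex:lcc are invented predicate names.
DEWEY = /\A\d{3}(\.\d+)?\z/       # e.g. "956.9405"
LCC   = /\A[A-Z]{1,3}\d+(\.\d+)?/ # e.g. "DS119.7"

def classify_subject(subject)
  case subject
  when DEWEY then :dewey
  when LCC   then :lcc
  else            :label
  end
end

# Escape backslashes and double quotes for a one-line N3 string literal.
def n3_literal(str)
  '"' + str.gsub(/["\\]/) { |c| "\\#{c}" } + '"'
end

# One predicate-object line per subject, to sit inside a resource block.
def subject_lines(subjects)
  subjects.map do |s|
    case classify_subject(s)
    when :dewey then "  ex:dewey #{n3_literal(s)} ;"
    when :lcc   then "  ex:lcc #{n3_literal(s)} ;"
    else             "  dc:subject [ a skos:Concept ; skos:prefLabel #{n3_literal(s)} ] ;"
    end
  end
end

puts subject_lines(['956.9405', 'DS119.7', 'Asian studies'])
```

Using plain one-line literals rather than the triple-quoted ones in the current output means double quotes and backslashes must be escaped, which n3_literal handles.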
The script is lookup_amazon_z3950.rb. You need an Amazon developer token (insert it at the start of the script), Ruby/ZOOM, and REXML (which comes with Ruby 1.8). It will also have a go at DVDs and CDs, so have a play.
The final score: 934 numbers, 64 not found, so 870 found (about 93%). A pretty good total, all in all, but other Z39.50 sites are calling me…
See also Mark Gaved and Tom Heath’s work in this area: http://bookshelf.open.ac.uk/
Yeah, we had similar coverage issues with the Bookshelf Project (http://bookshelf.open.ac.uk/). Our first attempt used http://isbn.org.uk/ to retrieve records. This service acts as a kind of middleman to Z39.50 targets at academic and public libraries round the world. We liked the open, non-commercial aspect, but the coverage wasn’t wide enough, so we switched to the amazon.co.uk web services API.
In the tests we’ve done on a range of different people’s bookshelves, the coverage is pretty good using just the .co.uk catalogue, even for some quite obscure stuff, and it understandably gets more reliable the more recent the publication. We had mixed success with Mark Gaved’s O’Reilly books bought in India, but hey, we can’t have everything.
On the ISBN vs EAN issue, you could probably reprogram the barcode scanner to scan ISBNs if you wanted (if there’s a manual, it has probably got control barcodes in it that do this for you; it’s geek heaven). However, we decided to scan EAN-13s wherever possible as a future-proofing exercise, and then rely on our script to convert them back to ISBNs before they went off to the Amazon API.
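If a scanner has been switched to emit ISBN-10s (or there are old ISBN-10s lying around), they can still be lifted to EAN-13 for storage: put 978 in front of the first nine digits and compute the EAN check digit. A minimal, hypothetical Ruby sketch that accepts either form (hyphens and spaces allowed) and returns a validated EAN-13, or nil; the method names are made up and are not from either project’s script.

```ruby
# Hypothetical sketch: normalise scanner output (ISBN-10 or EAN-13) to a
# validated EAN-13, the form worth storing for the long term.

# EAN-13 check digit for the first 12 digits (weights 1, 3, 1, 3, ...).
def ean13_check_digit(first12)
  sum = first12.chars.each_with_index.map { |c, i| c.to_i * (i.even? ? 1 : 3) }.sum
  (10 - sum % 10) % 10
end

# ISBN-10 is valid when sum(digit * weight), weights 10 down to 1, is a
# multiple of 11; "X" in the last position stands for 10.
def isbn10_valid?(isbn)
  return false unless isbn.match?(/\A\d{9}[\dX]\z/)
  sum = isbn.chars.each_with_index.map { |c, i| (c == 'X' ? 10 : c.to_i) * (10 - i) }.sum
  (sum % 11).zero?
end

# Returns a 13-digit string, or nil if the input is neither a valid EAN-13
# nor a valid ISBN-10.
def to_ean13(code)
  code = code.gsub(/[-\s]/, '').upcase
  if code.match?(/\A\d{13}\z/)
    code if ean13_check_digit(code[0, 12]) == code[12].to_i
  elsif isbn10_valid?(code)
    first12 = '978' + code[0, 9]
    first12 + ean13_check_digit(first12).to_s
  end
end

puts to_ean13('0-8044-2957-X') # => 9780804429573
```

Storing the EAN-13 and deriving the ISBN-10 only at API time keeps one canonical key even for 979-prefixed books, which have no ISBN-10.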