I read the Saturday Guardian every week, and quite often buy a bunch of books reviewed in it. But equally, I don’t buy quite a lot of them as they’re only available in expensive and bulky hardback (plus I resent being market segmented like that, sorry). The Guardian’s reviews are very good but they only really review hardbacks in any depth or breadth, so it’s hit and miss whether I actually get to read any of them by the time they get to paperback. I just forget. I bet a lot of people do this.
Anyway, a couple of months ago I realised there was a Guardian content API as well as a data API. I applied for a developer key and, to my surprise, got one (the docs said they were giving out very few). This weekend I finally got around to having a play with it. It’s pretty neat. I’ve not explored it very thoroughly – I’m sure people can think of much more profound applications to make – but for book reviews there is lots of interesting data, and it’s available in JSON and XML.
My initial plan was to programmatically create an Amazon list – but this isn’t possible using the Amazon ECS API. However, it is possible to search (on books, title, and authors) and get XML back, including a link to the Amazon page that describes the book. I made a very simple page that requests the book reviews for the appropriate date and then, for each result returned, identifies the author and title and does an Amazon lookup to get the URL (I just pick the first one returned – I’m feeling lucky). It’s not as convenient as I’d hoped, but it does make it that tiny bit easier to
- Buy things from the list straight away
- Put things that are only available in hardback into my wishlist so I don’t forget about them
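To give a flavour of the Guardian half of that lookup, here is a minimal sketch of building the request URL for one day's book reviews. It is not the script from this post: the endpoint and the parameter names (`section`, `tag`, `from-date`, `to-date`, `api-key`) are my understanding of the present-day Guardian content API and should be checked against its docs, and `guardian_reviews_url` is a made-up helper name.

```ruby
require "uri"
require "date"

# Assumption: the current public search endpoint, not necessarily the one
# the original script called.
GUARDIAN_SEARCH = "https://content.guardianapis.com/search"

# Build the URL that asks for book reviews published on a single day.
# The parameter names are assumptions to verify against the API docs.
def guardian_reviews_url(date, api_key)
  day = date.strftime("%Y-%m-%d")
  query = URI.encode_www_form(
    "section"   => "books",
    "tag"       => "tone/reviews",
    "from-date" => day,
    "to-date"   => day,
    "api-key"   => api_key
  )
  "#{GUARDIAN_SEARCH}?#{query}"
end

# Example: the reviews from one Saturday's paper.
puts guardian_reviews_url(Date.new(2009, 6, 27), "YOUR-GUARDIAN-KEY")
```

Fetching that URL and walking the results is then ordinary HTTP and JSON (or XML) handling; the Amazon lookup is left out here because it needs a signed request.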
There are a couple of issues:
- The title and author aren’t available as separate fields in the Guardian API. Usually the linktext is very formulaic and the information can be parsed out of that, but sometimes there are non-standard items and these fail
- Characters with accents are returned as HTML entities so those need to be swapped back to characters in order to do the Amazon search
- There’s no data about whether the book is in paperback or not, annoyingly. Amazon seems to mostly return the paperback version first if available, but maybe this is just good luck, and it probably needs more thought
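The first two issues can be sketched roughly as follows. This is an illustration rather than the code from the script: it assumes the link text usually looks like "Title by Author", it splits on the last " by " as a heuristic, and the entity table is a small hand-picked sample that would need extending. The names `decode_entities` and `parse_link_text` are made up for the example.

```ruby
# A deliberately incomplete table of named entities (assumption: extend it
# with whatever the API actually returns).
NAMED_ENTITIES = {
  "amp" => "&", "quot" => "\"", "eacute" => "é", "egrave" => "è",
  "aacute" => "á", "iacute" => "í", "ouml" => "ö", "uuml" => "ü",
  "ccedil" => "ç", "ntilde" => "ñ"
}.freeze

# Turn HTML entities back into characters so the text can be used in a search.
def decode_entities(text)
  text
    .gsub(/&#x([0-9a-fA-F]+);/) { [$1.hex].pack("U") }   # hex, e.g. &#xE9;
    .gsub(/&#(\d+);/) { [$1.to_i].pack("U") }            # decimal, e.g. &#233;
    .gsub(/&([a-zA-Z]+);/) { |m| NAMED_ENTITIES.fetch($1, m) } # unknown names stay as-is
end

# Split "Title by Author" into [title, author], or return nil when the
# link text doesn't fit the pattern (the non-standard items mentioned above).
def parse_link_text(link_text)
  title, separator, author = decode_entities(link_text).strip.rpartition(" by ")
  return nil if separator.empty? || title.strip.empty? || author.strip.empty?
  [title.strip, author.strip]
end

title, author = parse_link_text("Finding Moonshine by Marcus du Sautoy")
puts "#{title} / #{author}"
```

Splitting on the last " by " copes with titles that themselves contain "by", but it will still get things wrong when the link text carries extra trailing words, so anything returning `nil` or looking odd is best checked by hand.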
The result isn’t too bad though, and maybe I’ll buy a few more books. The Ruby code is here – you’ll need your own API keys for the Guardian and for Amazon though (both are free, and you can get an Amazon one if you have an account with them).
If you get really keen, how would you feel about mashing it up with Freebase / writing back to Freebase?
For instance,
Finding Moonshine by Marcus du Sautoy
is authored by:
http://www.freebase.com/view/en/marcus_du_sautoy
… and you can add in all of the facts about him, using the mqlread and mqlwrite endpoints, which speak JSON.
Similar stuff is being done with movies, for instance:
http://blog.freebase.com/2009/06/25/freebase-data-now-on-wsj-com/
This mashup of the Guardian and Amazon APIs is really rather neat. I’m going to give it a go myself in PHP.
Thanks – it’s not as simple now as Amazon changed the authentication mechanism and I haven’t made the time to update the script yet.