Foaf Slurper

Being newly back in semweb land got me thinking about what all the existing mounds of foaf data could be used for. There's a lot of it out there but it's not used much as far as I can tell. I tried to make a foaf importer library eons ago, before there were decent query languages and before Google Social Graph and Qdos and Sindice and so on. The difference now is that companies often use the nasty 'give-me-your-password' antipattern to find your friends within a social networking site. This is horrible (encouraging acceptance of phishing) and also slows down the user's entrance into the site, maybe losing some of them. In many cases you can simply buy a library to do this for you - so why not make a library that does that with freely available foaf with no password involved?

So: usecases: let's pretend I have a social networking site and I want to:

increase use of my site by inviting a 'captured' user's friends onto the site
get the user interested and using the site - and get her to get others interested too - by matching up her friends with existing users of the site.
fill in some of the user's details with data, to save them typing it all in

I can think of some other usecases for a library like this one but these are pretty straightforward, and I thought it was worth looking into the possibilities.

There seem to be two technical options.

The first, tempting one is to use the idea of Foaf as a machine-readable homepage. We have the user specify their foaf file, we get the info from that, and, boom. Dopplr have in fact done this but with XFN.

Two issues here. The first is that the user may have more than one Foaf file. This could work for us or against us. The user may prefer to be specific about which group of friends they add in. Or, more than one may simply make the number of friends and invites we get smaller, and reduce the potential happiness of our user too ("where's Bob?").

The second problem with the Foaf file approach is that it screams geek. This may be appropriate depending on what my site does (for a semweb geek dating site it might work excellently), or I might think it's worth having a special geek option for the interest I might get - but it's not suitable for my average user. I could ask the user if they are in Livejournal, FriendFeed or what have you and get it via that - but it's an extra step for the user, which I want to avoid.

Second option: use an aggregator. One of the big things that's changed in the last three years is the rise of the big RDF aggregators, such as Swoogle, Sindice, Falcons and more specifically Foafy ones like Qdos and Google Social Graph. I chose Google Social Graph as a starting point specifically because it harvests and processes Foaf files (and XFN too), i.e. it's specific to people-related data.

The API is pretty straightforward. There are nodes and edges. URIs representing people are nodes, edges are relationship types. The restful, json-returning API allows you to get nodes claimed by other nodes (i.e. versions of 'me') plus the attributes of a user: rss, atom, name, photo, foaf file, url, profile, plus your contacts and the type of your relationship with them if you care about that sort of thing. You can ask for a number of nodes at a time. You can do lookups on urls (homepage, weblog) as well as mboxsha1sum and email.

Here's an example query:

http://socialgraph.apis.google.com/lookup?pretty=1&edo=1&q=danbri.org&fme=1
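A minimal sketch of how a lookup URL like the one above could be put together in Java. The class and method names are made up for illustration, and the parameter meanings in the comments are my reading of the API (edo for outgoing edges, fme for following 'me' links, pretty for readable JSON), so treat them as assumptions rather than documentation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch only: LookupUrl is a hypothetical helper, not part of the released code.
class LookupUrl {
    static final String ENDPOINT = "http://socialgraph.apis.google.com/lookup";

    // Builds a lookup URL for one identifier (homepage, weblog, profile URL...).
    static String build(String identifier) {
        try {
            return ENDPOINT
                + "?pretty=1"   // assumed: pretty-print the JSON
                + "&edo=1"      // assumed: include edges going out of each node
                + "&q=" + URLEncoder.encode(identifier, "UTF-8")
                + "&fme=1";     // assumed: follow 'me' links to other versions of the user
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("danbri.org"));
    }
}
```

Encoding the identifier matters once it is a full URL such as a Twitter profile page, since the colon and slashes would otherwise end up raw in the query string.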

The Social Graph basically does the trick. If I can get the user to give me some piece of information about them, such as an email that they have used elsewhere or their homepage or weblog, then I can - without them doing anything else at all:

suggest some friends on my site that they already know
ask if they want to invite some of their other friends to my site
fill in their name, photo, homepage and so on automatically

The main disadvantage I can see here is simply that people might be a bit freaked out by it. Sometimes knowing how much the web knows about you can be a little startling - and semantic web data could produce a whole new level of being startled by the specific and repurposable nature of the data. The long-dead foaf aggregation pioneer Plink closed because people kept getting angry when they found their data in it. Not everyone realises how many sites produce machine-processible versions of their data. I think I'd have to see it in action to see whether this was a genuine problem. Tom Morris pointed me at Huffduffer, which rather neatly grabbed my image for me - but which doesn't seem to use friends.

In the end I tried both ideas. I wanted to get it to a state such that a developer could take an identifier from the user, get a list of simple objects back, and then pass them to her own API to determine if they were present in the site's database.
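To make that goal concrete, here is a rough sketch of the shape I mean, not the released code. Friend, UserDirectory and FriendMatcher are hypothetical names, and UserDirectory simply stands in for whatever API the site already has for checking its own database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch only: all names here are hypothetical.
class FriendMatcher {

    // A "simple object" describing one friend found in the Foaf data.
    static final class Friend {
        final String name;
        final String identifier; // e.g. a homepage URL or an mbox sha1sum

        Friend(String name, String identifier) {
            this.name = name;
            this.identifier = identifier;
        }
    }

    // Stand-in for the site's own API: is this identifier already a user?
    interface UserDirectory {
        boolean isMember(String identifier);
    }

    // Friends who are already on the site (people to suggest to the new user).
    static List<Friend> alreadyOnSite(List<Friend> friends, UserDirectory directory) {
        List<Friend> matched = new ArrayList<>();
        for (Friend f : friends) {
            if (directory.isMember(f.identifier)) matched.add(f);
        }
        return matched;
    }

    // Friends who are not on the site yet (people the user might invite).
    static List<Friend> toInvite(List<Friend> friends, UserDirectory directory) {
        List<Friend> missing = new ArrayList<>();
        for (Friend f : friends) {
            if (!directory.isMember(f.identifier)) missing.add(f);
        }
        return missing;
    }

    public static void main(String[] args) {
        Set<String> members = new HashSet<>(Arrays.asList("http://example.org/bob"));
        List<Friend> friends = Arrays.asList(
            new Friend("Bob", "http://example.org/bob"),
            new Friend("Alice", "http://example.org/alice"));
        for (Friend f : alreadyOnSite(friends, members::contains)) {
            System.out.println("Already here: " + f.name);
        }
        for (Friend f : toInvite(friends, members::contains)) {
            System.out.println("Could invite: " + f.name);
        }
    }
}
```

The two lists map onto the first two usecases: matching friends who are already users, and inviting the ones who are not.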

For the Foaf file reader, I used a couple of SPARQL queries plus Jena, so it would be trivial to repurpose in a different language.

sh foafuser.sh --details http://swordfish.rdfweb.org/people/libby/rdfweb/webwho.xrdf

(foafuser.sh is repurposed from a Jena example, to point to the Jena files in the classpaths)
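For a flavour of what a friends query over a Foaf file can look like, here is an illustrative one held in a Java string. It is not one of the queries from the released code. The variable names are my own, and it assumes the file uses the standard foaf:knows, foaf:name, foaf:homepage and foaf:mbox_sha1sum properties. Because the query is just text, the same string could be handed to any SPARQL engine, which is what makes the approach easy to move to another language.

```java
// Sketch only: FoafFriendsQuery is a hypothetical holder for an example query.
class FoafFriendsQuery {

    // Returns a SPARQL query listing everyone the file says somebody knows,
    // with whatever identifying details happen to be present.
    static String friendsQuery() {
        return "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
             + "SELECT DISTINCT ?name ?homepage ?sha1\n"
             + "WHERE {\n"
             + "  ?person foaf:knows ?friend .\n"
             // OPTIONAL because many Foaf files give only some of these details
             + "  OPTIONAL { ?friend foaf:name ?name }\n"
             + "  OPTIONAL { ?friend foaf:homepage ?homepage }\n"
             + "  OPTIONAL { ?friend foaf:mbox_sha1sum ?sha1 }\n"
             + "}\n";
    }

    public static void main(String[] args) {
        System.out.println(friendsQuery());
    }
}
```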

For the Social Graph I used the Jackson Json library to make the queries to the server and parse them. I do several queries to the social graph to get the names of friends, and I do this in batches.

java -classpath .:jackson-asl-0.9.3.jar SGUser --details http://twitter.com/libbymiller
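The batching is simple enough to sketch. This is not the SGUser code itself: NodeBatcher is a hypothetical name, the batch size is arbitrary, and it assumes the API accepts several node URIs in one request as a comma-separated q value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: NodeBatcher is a hypothetical helper.
class NodeBatcher {

    // Splits the node URIs into groups of at most batchSize and joins each
    // group with commas, giving one q value per request.
    static List<String> batchQueries(List<String> nodes, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<String> batches = new ArrayList<>();
        for (int start = 0; start < nodes.size(); start += batchSize) {
            int end = Math.min(start + batchSize, nodes.size());
            batches.add(String.join(",", nodes.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> friends = Arrays.asList(
            "http://example.org/a", "http://example.org/b", "http://example.org/c");
        for (String q : batchQueries(friends, 2)) {
            System.out.println(q);
        }
    }
}
```

Each value would still need URL-encoding before it goes into a request, and the names come back by reading each batch's JSON response in turn.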

The code is available (Social graph, Foaf file, README) under a BSD license - or you're welcome to copy the idea and improve it. It's a small amount of code - the hard work is done by the Jena and Social Graph APIs respectively.

Notes

I spent quite a long time working out what was going on in this area, while doing this. There's a bunch of interest still in OAuth, a way for applications to be given permission to access certain aspects of another application (like permission to write to a Flickr account), though promised implementations don't seem to have materialised yet (Twitter seem to have got rid of their experimental one). There was an OpenID / OAuth summit a few weeks ago.

I always get the Google Social Graph, the aggregation of Foaf and XFN data, mixed up with OpenSocial, a kind of way of making Facebook-style apps. There's also DataPortability, an organisation for promoting ways to move your data between sites, and Portable Contacts, which is a recently developed format and protocol to enable people to move data between sites, again using existing or upcoming standards. OpenSocial is being adapted to integrate Portable Contacts.

The semweb databases I ran into were Sindice, Qdos, Falcons, and Swoogle. I think that several of these can probably perform Foaf-specific queries - and Sindice has some microformats support too, but I've not had time to look at them properly yet.

Other related links I found in the past week include: the fuss about Twitterank and giving out your password, Matt Biddulph on Dopplr and Social network subscription, who references Drew McLellan's Don't Import, Subscribe. Oh and how about hosting contacts in the DNS (and making friends too) with .tel domains and Telnic.