Partly in response to some concerns about validation and RDF, I’ve built a little tool (‘Rosco’ – RDF mumble Schema Checker mumble) to check an RDF document against a schema. It’s pretty simple – all it does is find all the classes in the schema, all the properties of each class (including superclasses), and then see if the RDF document specified says something sensible with respect to those classes and properties. Here’s the result for my foaf file (with some deliberate ‘mistakes’).
It also finds any other properties in that document for each class. For FOAF files specifically it also suggests a few improvements, e.g. use an inverseFunctionalProperty if there isn’t one there; use foaf:maker, rdfs:seeAlso where appropriate. It should work for any RDF schema though.
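The check described above can be sketched roughly as follows. This is not Rosco's actual code (which runs on a Java API); it is a minimal Python illustration that assumes triples are plain `(subject, predicate, object)` tuples, and the function names and the toy schema and document are made up for the example.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_SUBCLASS = "rdfs:subClassOf"

def superclasses(cls, schema):
    """Return cls plus every class reachable via rdfs:subClassOf."""
    seen, todo = set(), [cls]
    while todo:
        c = todo.pop()
        if c in seen:
            continue
        seen.add(c)
        todo.extend(o for s, p, o in schema if s == c and p == RDFS_SUBCLASS)
    return seen

def properties_for(cls, schema):
    """Properties whose rdfs:domain is cls or one of its superclasses."""
    classes = superclasses(cls, schema)
    return {s for s, p, o in schema if p == RDFS_DOMAIN and o in classes}

def check(document, schema):
    """For each typed resource, split the properties it uses into those
    the schema knows for its class and any others found in the document."""
    report = []
    for subj, pred, cls in document:
        if pred != RDF_TYPE:
            continue
        expected = properties_for(cls, schema)
        used = {p for s, p, o in document if s == subj and p != RDF_TYPE}
        report.append({
            "resource": subj,
            "class": cls,
            "known": sorted(used & expected),
            "other": sorted(used - expected),
        })
    return report

# Toy schema and document (hypothetical data, with a deliberate typo).
schema = [
    ("foaf:Person", RDFS_SUBCLASS, "foaf:Agent"),
    ("foaf:mbox", RDFS_DOMAIN, "foaf:Agent"),
    ("foaf:knows", RDFS_DOMAIN, "foaf:Person"),
]
document = [
    ("_:me", RDF_TYPE, "foaf:Person"),
    ("_:me", "foaf:mbox", "mailto:someone@example.org"),
    ("_:me", "foaf:knwos", "_:dan"),
]
report = check(document, schema)
```

Here the misspelt `foaf:knwos` ends up in the "other" list rather than being rejected, which is in keeping with the non-judgemental approach: the report only says what the schema does and doesn't know about.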
There’s also a little tool for looking at a schema, linked from the same page. That just tells you what the domains of the properties are in an RDF schema file (not ranges yet). It’s just a useful view on an RDF schema file, which can let you know if you’ve forgotten to put a domain in etc – maybe it can help with consistency checking. The foaf schema for example doesn’t have specific domains for title, nick etc (although of course rdfs:Resource is implied).
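A domain view like this could look something like the sketch below. Again it assumes triples as plain tuples, and `domain_view` and the sample schema are hypothetical names and data, not the tool's real code.

```python
def domain_view(schema):
    """Group properties by their rdfs:domain and list those with none."""
    # Every resource the schema declares to be an rdf:Property.
    props = {s for s, p, o in schema
             if p == "rdf:type" and o == "rdf:Property"}
    by_domain = {}
    for s, p, o in schema:
        if p == "rdfs:domain":
            by_domain.setdefault(o, []).append(s)
    with_domain = {prop for plist in by_domain.values() for prop in plist}
    # Properties with no stated domain (so only rdfs:Resource is implied).
    missing = sorted(props - with_domain)
    return by_domain, missing

# Toy schema (hypothetical data): foaf:nick has no rdfs:domain.
sample_schema = [
    ("foaf:knows", "rdf:type", "rdf:Property"),
    ("foaf:knows", "rdfs:domain", "foaf:Person"),
    ("foaf:nick", "rdf:type", "rdf:Property"),
]
by_domain, missing = domain_view(sample_schema)
```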
Rosco is polite (I hope) and non-judgemental (Dan wanted to name it after Viz’s liberal parents). Dan’s rdfweb weblog entry, ‘Missing isn’t broken: data validation and freedom on the Semantic Web’, is highly relevant here.
Rosco is also a generic tool – I tried it with the RDF calendar schema and it works – although the schema itself doesn’t have any domains more specific than rdfs:Resource.
So something along the lines of schemarama can do a more interesting sort of validation – you can specify that when you find a particular thing (identified by a query, so that could be a class or something more complex such as the presence of a series of properties), then you want it to match a certain pattern (also given as a query). So this would include any namespaces you chose, and wouldn’t be restricted to ones from a particular schema. However, I think advances in this direction probably need queries that handle optional clauses, which my version of squish doesn’t do yet. While waiting for shellac’s Super Query Engine 57 in Java, Rosco’s version of RDF document schema checking is useful for debugging purposes (why isn’t this file working in my viewer?) and for lightweight checking of RDF documents generally (especially as the schema is improved and updated).
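To make the "select by one query, then require a pattern by another" idea concrete, here is a very rough Python sketch. It is not schemarama or squish: the tiny triple-pattern matcher, the `?x` convention for the selected thing, and the sample graph are all assumptions made for illustration, and it has no support for optional clauses.

```python
def match(patterns, graph, binding=None):
    """Yield every variable binding that satisfies all triple patterns.
    Pattern terms starting with '?' are variables; others must match exactly."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it agrees with an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, graph, new)

def failures(when, require, graph):
    """Things selected by `when` (bound to ?x) that do not match `require`."""
    selected = {b["?x"] for b in match(when, graph)}
    return sorted(x for x in selected
                  if next(match(require, graph, {"?x": x}), None) is None)

# Hypothetical data: _:b is a Person with a name but no mbox_sha1sum.
graph = [
    ("_:a", "rdf:type", "foaf:Person"),
    ("_:a", "foaf:name", "A"),
    ("_:a", "foaf:mbox_sha1sum", "abc"),
    ("_:b", "rdf:type", "foaf:Person"),
    ("_:b", "foaf:name", "B"),
]
when = [("?x", "rdf:type", "foaf:Person")]
require = [("?x", "foaf:name", "?n"), ("?x", "foaf:mbox_sha1sum", "?m")]
bad = failures(when, require, graph)
```

Because both halves of a rule are just patterns over triples, nothing ties them to one schema or namespace.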
As for query, I liked darobin’s metaphor: <darobin> “I see the interest in the approach of having a query that’s a graph with missing bits, so you dip in in the graph you’re querying and the interesting bits stick to it”.
Rosco of course was the bumbling sheriff from the Dukes of Hazzard…
The whole thing is running off ‘Tinkling’, my tiny java RDF api, which is now in CVS (finally) but needs a whole lot of polishing…
Sean Palmer’s come up with a nice interface to my FOAF/codepiction database, in the process finding a bunch of bugs, mostly now fixed. I’m very interested in the notion of these ‘gettable web services’ for RDF data, as it gets around some of the issues with size and hosting…not everyone has the inclination to have their own crawled FOAF or RDF databases, but some do, and finding out what’s out there in those databases is very handy for creating more data, and actually finding things out (like Sean’s weblog location for example).
I was inspired to write up the image annotation tool a bit more, as it uses these sorts of services. It strikes me that a simple interface to an addressbook or other local database would be simple enough to do, and would be a great resource for cataloguing pictures. You wouldn’t have to make the information gettable via a query language, either – most of the services used by the photo demo are searched via a keyword appended to a url returning RDF/XML, which leaves the intermediate steps pretty undefined. Danbri’s wordnet in RDF/XML service is an excellent example of this sort of service, and very useful for saying, this is a cat, this is a beer, using a vocabulary that others can also use.
So a webservice-like interface to RDF data sources not only makes it easier to find out what existing information is out there about a person, say, but also in a way provides a vocabulary for talking about this person – a sort of TAP for the non-famous. I’ve found that when cataloguing, you find a bunch of existing data that is clearly (to the human eye) about the same person but, as far as the machine is concerned, about different people (different mbox_sha1sum, generated from a different email address).
In some cases this distinction might be deliberate; in others, it’s just because people use different email addresses or versions of one email address. In this latter case it’s nice to be able to see what has already been catalogued under each identifier. We can’t show mbox, but pictures make quite a nice visual cue (thanks MartinP :). There may still be several of course, but often that narrows it down. Even better would be if we could say this is my preferred identifier, an idea that’s been suggested a couple of times on the foaf list. That’s getting quite close to the Topicmaps published subject identifier idea.
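The identifier problem above is easy to see in code. As far as I understand the FOAF spec, an mbox_sha1sum is the SHA1 hash of the full mailto: URI, so any variation in the address gives a completely different identifier. The sketch below assumes that definition; the records and the `group_by_identifier` helper are hypothetical.

```python
import hashlib

def mbox_sha1sum(email):
    """SHA1 hex digest of the mailto: URI for an email address."""
    return hashlib.sha1(("mailto:" + email).encode("utf-8")).hexdigest()

def group_by_identifier(records):
    """Collect what has been catalogued (here, picture names) under each
    mbox_sha1sum. `records` is a list of (email, picture) pairs."""
    groups = {}
    for email, picture in records:
        groups.setdefault(mbox_sha1sum(email), []).append(picture)
    return groups

# Hypothetical records: the same person under two addresses.
records = [
    ("someone@example.org", "party.jpg"),
    ("someone@example.org", "conference.jpg"),
    ("someone@mail.example.org", "holiday.jpg"),
]
groups = group_by_identifier(records)
```

To a machine these are two people until something says otherwise, which is where seeing the pictures under each identifier helps.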
There’s a danger that this sort of thing might turn into a registry of sorts, but it’s better to think of it as a vocabulary, like TAP, or wordnet (but unlike the latter, for individuals, not classes of things). And there will be many of them, not just one. Maybe they could compete on quality or trustworthiness, or smushing quality, or inferencing capability.
I also had an interesting chat with lilo from freenode, on using FOAF and related vocabularies to describe groups, projects and people on IRC. And interesting stuff about online identity and voting from urgen (I’m very interested in electronic voting and the various issues with it). And edd on smushing…and ericp on encoding RDF ‘contexts’ using SOAP envelopes…and ephidrina about CCC (should have gone to that). Last night as well, chatting with damian and max and danbri about ‘RDFPath’ and XSLT and those possibilities – damian wants to implement that. Oh and balloons, and more balloons and light writing photographed by max.
But really it’s just far too hot to code or do anything….