Archiving a MediaWiki Installation

NoTube uses a MediaWiki installation kindly provided by for its internal documentation. As the project draws to a close (we finish officially on January 31st 2012, with our final review in late March) we wanted to make sure we had a copy of everything we had done over the last few years. Much of this is and will remain private to the partners, but there are some interesting ideas and usecases we wrote down early on that we don’t want to lose track of. I hadn’t realised that by default MediaWiki has an API, but once I did, it was pretty simple to download all the pages. I’ve put the Ruby script on GitHub in case it’s useful to anyone else. Basically the only fiddly bit is the cookies. You do, of course, need a username and password for the wiki you want to download, but thereafter there’s an API call you can make recursively to get a list of all pages, each of which you can then download individually.
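
For anyone who wants the gist without reading the Ruby, here is a rough Python sketch of the page-listing step. It is not the script mentioned above, just an illustration under a few assumptions: `fetch` is a hypothetical function you supply that sends the given parameters to the wiki’s api.php (with the login cookies attached) and returns the parsed JSON, and the wiki uses the `continue` style of paging found in current MediaWiki releases (older ones, as far as I know, reported this under `query-continue` instead). It uses a loop rather than recursion.

```python
from typing import Callable, Dict, Iterator


def all_page_titles(fetch: Callable[[Dict[str, str]], dict]) -> Iterator[str]:
    """Yield every page title, following the API's continuation markers.

    `fetch` is a hypothetical callable: it takes the query parameters,
    performs an authenticated request against api.php and returns the
    decoded JSON response as a dict.
    """
    params = {
        "action": "query",
        "list": "allpages",
        "aplimit": "max",
        "format": "json",
    }
    while True:
        data = fetch(params)
        for page in data["query"]["allpages"]:
            yield page["title"]
        # The response carries a "continue" block until the list is exhausted.
        if "continue" not in data:
            break
        # Merge the continuation values into the next request.
        params = {**params, **data["continue"]}
```

Each title can then be requested individually through the same `fetch` function. Keeping the HTTP and cookie handling outside the function means the paging logic can be tried out against canned responses first.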

Web [on|and|in|for|with|via|through] TV Workshop

In September I participated in the programme committee of the W3C’s Web On TV workshop, which was held in Japan. Because of some existing commitments I was not able to go to the face-to-face meeting, so to try and make up for it, I read through all the papers instead of just my allocated ones. My notes are below. These are just my personal opinions, and I’m not an expert in the TV field (although Web and TV is my thing – I work on the NoTube project). All the papers are public. There is also a Draft Web and TV Interest Group Charter. The title of this post is stolen from danbri who was pointing out that Web AND TV need not be Web ON TV.

These reviews are very short – most of the papers are themselves very short, being expressions of interest. In some cases I just use a representative quote from the paper. The attempt here is to summarise not to evaluate them, though I indicate where I am interested in a particular topic. The workshop summary is here.


A large group were interested in BML and explaining why it’s important, perhaps indicating that they do not see a reason to change from using that.

A group are interested in HTML5 and how it might work with BML for interactive applications, and a subgroup interested in user interfaces for TV and common UIs for TV and other devices using HTML5.

There is an overlapping group who see the TV as being a hub for home entertainment, which seems to mean that everything is controlled via the TV, web pages are viewed on the TV etc.

There is also a group interested in APIs for TV and other devices (such as controls).

There is a strong sense that IPTV is very important and standards for it are important, especially DRM and efficiency.

I get the impression that there are a lot of participants who have specific scenarios in mind and also a number who are looking for interesting applications of HTML5 to TV.

Papers 21, 27 and 30 are the most interesting from my point of view. 31 makes an important point. I’ve a lot of sympathy with 36. 39 and 41 are also interesting.


1. Shinichi Matsui (Panasonic)

Would like to attend in a personal capacity. His view is that “TVs are the most important components, not only for displaying contents, but of “Ubiquitous Home Appliances” which will evolve to “Web Appliances” surrounding consumers.”

2. Tatsuo Matsuoka (Innovative IP Architecture Center, NTT Communications Corporation)

They have made an IPTV service. They are interested in what functions are done by different devices and APIs, and IPTV standards and DRM.

3. Katsuhiko Kageyama (Hitachi)

Interested in consumer electronics: user interfaces for TV especially HTML5 capabilities, control and communications between devices.

4. Sunghan Kim (ETRI/W3C Korea Office)

Interested in the relationships between devices, and content provision, e.g. start watching on one device and continue on another, and the various W3C and other standards that could be employed to make it happen.

5. Wayne Carr (Intel)

Interested in HTML5 as a way to provide web experience across a range of devices, TV in particular.

6. Masakazu Muraoka (

Interested in APIs to TV and HTML on TV.

7. Aaron Zhang (Huawei)

They are an IPTV provider and suggest an architecture for improving the user experience of the web on TV (avoiding the bad UI experiences of early PCs).

8. Masakazu Kobayashi (KDDI)

Interested in HTML5 as a common interface to Web TV, avoiding situations such as the different standards for e-books.

9. Yusuke Kawabe (NTV (Nippon Television))

Would like to talk about BML and the usecases for it, and see what any new usecases are.

10. Hidekazu Bunne (TV Asahi)

As (9)

11. Tomokazu Yamada (IPTV Forum)

Would like to talk about IPTV, specifically DRM and EPG metadata, and have some usecases to share.

12. Tatsuto Murayama (NTT)

Describe their requirements for HTML5:

“1. Layout optimization with reflowable materials
2. Requirements for vertical writing/reading and ruby annotations
3. HTML5 widgets as containers for digital books”

Seem most interested in digital books, but also talk about easy to use layout optimisation on TV screens.

13. Koichi MARUYAMA (NTT Cyber Solutions Lab.)

Interest is in a markup language for IPTV with
“– Easy multimedia description like BML/LIME
– Interactivity as rich as that of native application
– Service integration and linkage for multiple devices”

Social networking, performance and DRM are their main interests.

14. Limin Yu (DragonTec)

They have developed a BML IPTV browser and next are doing a LIME browser. They would like to demonstrate their browser. They are interested in standards suitable for the Chinese market. They think that W3C technologies have potential for interactive TV.

15. Shigeru Owada (Sony CSL)

Interesting ideas of devices as ‘fairies’ that can communicate with each other and that humans can communicate with. “We are interested more on fun usage of ubiquitous home network than protocol layer implementation”

16. Yoshikazu Seki (Fuji Television)

As (9)

17. Kazunori Tanikawa (NEC)

Interested in IPTV, scenarios, and the potential of HTML5.

18. Kenji Sugihara (TV Tokyo)

Similar to (9), especially for broadcaster-controlled interactive applications using BML.

19 is missing

20. Hiroyuki Aizu (Toshiba)

Would like to show some usecases of HTML5 on TV as the hub within a home network, and some ideas about communication technology and TV.

21. Shuhei Habu (Allied Resources Communications)

Interesting usecases and a proposal for privacy for TV in HTML5 based on BML and APIs for TV.

22. Kenji Fukuda (Wowow)

Similar to (18)

23. Jan Lindquist (Ericsson)

A member of the Open IPTV Forum (OIPF) standardization group who would like to talk about his experiences in standardisation in the subgroup responsible for the web platform (JavaScript, embedded video).

24. Yoshiaki Ohsumi (Panasonic R&D)

Interested in possible future usecases and smarter integration of TV and web technologies; TV as a hub.

25. Ishidoshiro Takashi (Melco)

Make TVs and peripherals. Interested in the future relationship between BML and HTML5, and between traditional over-the-air broadcast and HTML5, and in how to improve the experience from the user’s point of view.

26. Keiya Motohashi (NHK)

Interested in public service usecases such as disaster information, BML and interactive applications, connecting TV with the web.

27. Hyojin Park (KAIST)

Researchers on TV. Interested in device APIs for the browser to control the TV, architecture and standards to allow appropriate UIs for different devices.


The paper is about BML, which is a markup language widely used in Japan and ‘is basically an extension for existing Web standards, e.g., XHTML 1.1’.

They would be able to provide usecases and are interested in seeing how TVs will become more of a hub for entertainment in the home, and how these changes fit with html5.

29. Makoto Nishimura (Cisco Systems)

“Our interest is the integration of LIME and HTML5 on to our video products such as IP-STB, RF-STB and other related solutions.”

30. Hiroshi Omata (

They have made remote controls from mobile phones and are therefore interested in device APIs for TV. Also interested in standardising HTML5 for TV.

31. Naomi Nakamura (ACCESS)

Think that people don’t use TV but watch it – i.e. a lean-back experience; therefore new usecases will need to be thought through that accept this.

32. Tatsuya Igarashi (TDG, Sony Corporation)

Again interested in the TV as a hub for home entertainment and integrated web technology; interested in HTML5; provide usecases.

33. Shozo FUKUI (Tomo-Digi Corporation)

They would like to explain why BML was useful, explain the difference between BML and HTML5 and have several usecases to discuss, including extensibility in the future.

34. Tatsuki Matsuda (NTT-Resonant Inc.)

They would like to join the workshop, but don’t offer a paper – they are providers of web portal services and would like to be able to integrate with TV services.

35. Masahito Kawamori (ITU-T)

From ITU-T: would like to present their experiences standardising for IPTV: declarative languages, Lua, SVG, ECMAScript.

36. Charles McCathieNevile (Opera Software)

Proposes concrete steps (e.g. testcases) for ensuring “use of HTML on TV [is] more closely aligned with its usage in general”, and that this should happen in W3C or in close collaboration with W3C.

37. Daniel Park (Samsung)

“We are supporting on developing best practices and guidelines for Web on TV as well as easy of connection with other Web-capable devices from Web application.”

38. Diot Christophe (Technicolor)

They can help bring the views of service providers and content producers to the table. Interested in web on TV applications, and in why HTML5 rather than CE-HTML.

39. Asanobu Kitamoto (NII)

Describes the concept of ‘Bayesian TV’ – not just TV on the web or vice versa, but a personalised push system, rather than the pull of the web, with recommendations and user interactions.

40. Manabu Shimobe (UIEvolution)

“we are very interested in contributing to defining the additional standards needed for smarter integration of web technologies and broadcast services” particularly user interface aspects.

41. Kiyoshi Oura (Airframe)

Interesting points made – the only one to mention advertising – describing some of the different watching scenarios of the future including different devices. Interested in HTML5 and the potential for the continuing evolution of content, especially flexible data storage mechanisms.

Some FOAF stats

Some FOAF stats from Sindice for something I had to write last week.

All classes

“Agent”, 3.84 million
“Document”, 6.15 million
“Group”, 5.78 thousand
“Image”, 711.23 thousand
“OnlineAccount”, 15.47 thousand
“OnlineChatAccount”, found 324
“OnlineEcommerceAccount”, found 242
“OnlineGamingAccount”, found 240
“Organization”, 10.05 thousand
“Person”, 2.64 million
“PersonalProfileDocument”, 11.7 thousand
“Project”, found 726

All properties

“accountName”, 8.02 thousand
“accountServiceHomepage”, 7.24 thousand
“aimChatID”, 9.54 thousand
“based_near”, 7.35 thousand
“birthday”, 2.48 thousand
“currentProject”, found 648
“depiction”, 696.31 thousand
“depicts”, 617.16 thousand
“dnaChecksum”, found 65
“family_name”, 2.46 thousand
“firstName”, 4.2 thousand
“fundedBy”, found 237
“geekcode”, found 107
“gender”, 15.8 thousand
“givenname”, 24.17 thousand
“holdsAccount”, 9.88 thousand
“homepage”, 1.22 million
“icqChatID”, 22.8 thousand
“img”, 684.38 thousand
“interest”, 64.77 thousand
“isPrimaryTopicOf”, 1.54 million
“jabberID”, 2.98 thousand
“knows”, 1.08 million
“logo”, found 374
“made”, 1.97 million
“maker”, 1.97 million
“mbox”, 3.7 thousand
“mbox_sha1sum”, 43.9 thousand
“member”, 5.53 thousand
“membershipClass”, found 58
“msnChatID”, 7.68 thousand
“myersBriggs”, found 154
“name”, 1.77 million
“nick”, 96.7 thousand
“openid”, 80.24 thousand
“page”, 5.84 million
“pastProject”, found 179
“phone”, found 999
“plan”, found 139
“primaryTopic”, 278.11 thousand
“publications”, found 202
“schoolHomepage”, found 644
“sha1”, found 60
“surname”, 25.32 thousand
“theme”, found 282
“thumbnail”, 2.51 thousand
“tipjar”, found 73
“title”, 2.02 thousand
“topic”, 3.13 million
“topic_interest”, found 90
“weblog”, 300.06 thousand
“workInfoHomepage”, found 505
“workplaceHomepage”, 1.68 thousand
“yahooChatID”, 6.72 thousand

Displaying Guardian book reviews for quick buying on Amazon

I read the Saturday Guardian every week, and quite often buy a bunch of books reviewed in it. But equally, I don’t buy quite a lot of them as they’re only available in expensive and bulky hardback (plus I resent being market segmented like that, sorry). The Guardian’s reviews are very good but they only really review hardbacks in any depth or breadth, so it’s hit and miss whether I actually get to read any of them by the time they get to paperback. I just forget. I bet a lot of people do this.

Anyway, a couple of months ago I realised there was a Guardian content API as well as a data API. I applied for a developer key and, to my surprise, got one (the docs said they were giving out very few). This weekend I finally got around to having a play with it. It’s pretty neat. I’ve not explored it very thoroughly – I’m sure people can think of much more profound applications to make – but for book reviews there is lots of interesting data, and it’s available in JSON and XML.

My initial plan was to programmatically create an Amazon list – but this isn’t possible using the Amazon ECS API. However it is possible to search (on books, title, and authors) and get XML back, including a link to the Amazon page that describes each result. I made a very simple page that requests the book reviews for the appropriate date and then, for each result returned, identifies the author and title and does an Amazon lookup to get the URL (I just pick the first one returned – I’m feeling lucky). It’s not as convenient as I’d hoped, but it does make it that tiny bit easier to:

  • Buy things from the list straight away
  • Put things that are only available in hardback into my wishlist so I don’t forget about them

There are a couple of issues:

  • The title and author aren’t available as separate fields in the Guardian API. Usually the linktext is very formulaic and the information can be parsed out of that, but sometimes there are non-standard items and these fail
  • Characters with accents are returned as HTML entities so those need to be swapped back to characters in order to do the Amazon search
  • There’s no data about whether the book is in paperback or not, annoyingly. Amazon seems to mostly return the paperback version first if available, but maybe this is just good luck, and it probably needs more thought
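
The first two issues can be handled roughly along these lines. This is a Python sketch rather than the Ruby I actually used, and the link-text pattern is an assumption on my part for illustration: it expects something like “Title by Author”, optionally followed by “– review”. Anything that doesn’t fit returns None so it can be skipped or dealt with by hand.

```python
import html
import re
from typing import Optional, Tuple

# Assumed pattern: "<title> by <author>", optionally ending in "- review".
# The greedy title means the split happens at the LAST " by " in the text.
LINK_TEXT = re.compile(
    r"^(?P<title>.+) by (?P<author>.+?)(?:\s+[-–]\s+review)?$"
)


def parse_link_text(link_text: str) -> Optional[Tuple[str, str]]:
    """Return (title, author) from a review's link text, or None if it
    doesn't follow the assumed pattern."""
    # Turn HTML entities such as &eacute; back into real characters
    # before the text is used as a search term.
    text = html.unescape(link_text).strip()
    match = LINK_TEXT.match(text)
    if match is None:
        return None
    return match.group("title"), match.group("author")
```

For example, `parse_link_text("An Example Title by Jos&eacute; Autor")` gives `("An Example Title", "José Autor")`, while a non-standard item with no “ by ” in it gives `None`.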

The result isn’t too bad though and maybe I’ll buy a few more books. The Ruby code is here – you’ll need your own API keys for the Guardian and for Amazon though (both are free, and you can just get an Amazon one if you have an account with them).

Generating specs from RDFS / OWL docs

I’ve been hacking away at danbri’s version of specgen so we can rev the FOAF spec. The idea is that you take an RDFS / OWL schema and generate some human-readable HTML from it, by taking the classes and properties and writing out their basic constituents. Optionally you can add some introductory text in a template, plus some individual bits of text for each property and class, eventually in different languages too.
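
To make the idea concrete, here is a stripped-down sketch of the core step. It is not specgen’s code. It assumes the schema has already been parsed into plain (subject, predicate, object) string triples by an RDF library, and the function names are made up for illustration. It only writes out each term’s label and comment as an HTML definition list.

```python
import re
from html import escape
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"


def local_name(uri: str) -> str:
    """Return the part of a URI after the last '#' or '/'."""
    return re.split(r"[#/]", uri)[-1]


def first_value(triples: List[Triple], subject: str, predicate: str,
                default: str) -> str:
    """Return the first object found for (subject, predicate), else default."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    return default


def render_terms(triples: Iterable[Triple], term_type: str,
                 heading: str) -> str:
    """Render every term with the given rdf:type as an HTML definition list."""
    triples = list(triples)
    terms = sorted(s for s, p, o in triples
                   if p == RDF_TYPE and o == term_type)
    lines = [f"<h2>{escape(heading)}</h2>", "<dl>"]
    for term in terms:
        name = local_name(term)
        # Fall back to the local name when the schema gives no label.
        label = first_value(triples, term, RDFS_LABEL, default=name)
        comment = first_value(triples, term, RDFS_COMMENT, default="")
        lines.append(f'<dt id="{escape(name)}">{escape(label)}</dt>')
        lines.append(f"<dd>{escape(comment)}</dd>")
    lines.append("</dl>")
    return "\n".join(lines)
```

Calling `render_terms(triples, RDFS_CLASS, "Classes")` gives the classes section. Passing the URI for rdf:Property (or the OWL class and property types) with a different heading gives the others. The template text and per-term prose would be layered on top of this.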

I slapped in some RDFa yesterday because we needed a replacement for the ugly addition of RDF directly into the HTML, which makes it invalid. I realise some people may think this is back to front, but the FOAF spec’s ‘original’ format has always been RDFS/OWL so it makes sense for us. I’m not actually sure we need two RDF versions (as there is an alternate link pointing to the RDFS/OWL version from the HTML) but heck why not, and danbri’s consulting the community so there’s probably an argument I’ve missed.

There are several specgens available and at some point it might be nice to rationalise, or maybe go for functional equivalence. These are probably better in some senses than the one I’ve been working on, especially as I’m new to Python.

The ones I’ve found:

I think the two things that unite the first three are that they are (a) self-described hacks and (b) in Python. The FOAF one uses RDFlib rather than Redland because, I believe, danbri was having trouble with the Redland installation on the Mac.

Next things I’d like to look at are

  • Generating specs from sample data (maybe someone’s done this already? It wouldn’t be complete but could be a start)
  • Defining application profiles or Argots and using them to generate, say, useful SPARQL queries
  • Pictures!