GRDDL and Its Supposed Language Neutrality
March 1st, 2007
GRDDL was designed as another way to help solve the semweb chicken-and-egg problem (no data without a killer app, no killer app without data). It describes programmatic transformations that the data consumer can use to obtain the data from regular HTML pages without requiring substantial changes to the web site itself.
This is a different approach from metamarkups such as microformats, RDF/A or eRDF, which instead embed more structured information directly inside the HTML pages and thus require more changes to the web publishing process itself.
But for it to work at all, all three of these conditions must hold:
- the transformation instructions need to be accessible by the data consumer
- they need to be executable portably and reproducibly
- they need to output data in a portable and reproducible way
By default, GRDDL uses XSLT to describe transformation instructions. This satisfies all three conditions because:
- as long as there is a URL pointing to the XSLT file, the data consumer can access it (provided it has the right permissions, of course)
- XSLT is a Turing-complete language, but it is also a very well-defined platform, with no APIs other than the ones it comes with and no access to external data other than what is fed to it as input (I’m ignoring extensions here because, at least for version 1.0, they are not portable anyway and very few stylesheets rely on them)
- the input and output streams are implicit in the XSLT workflow, and GRDDL transformations are mandated to output RDF/XML data
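To make that implicit contract concrete, here is a minimal sketch in Python of what XSLT gives you for free: the transformation is a pure function from the source document to RDF/XML, with no other inputs and no side effects. The function name `transform`, its `source_url` parameter and the choice of mapping the page title to `dc:title` are my own assumptions for illustration; nothing in GRDDL defines them.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def transform(source_xml: str, source_url: str) -> str:
    """Hypothetical transformation: map the XHTML <title> to dc:title.

    The only input is the source document (plus its URL) and the only
    output is an RDF/XML string: no network, no clock, no file system.
    """
    doc = ET.fromstring(source_xml)
    title = doc.findtext(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title", default="")

    # Use readable prefixes in the serialized output.
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)

    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": source_url}
    )
    ET.SubElement(desc, f"{{{DC_NS}}}title").text = title.strip()
    return ET.tostring(rdf, encoding="unicode")


page = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title> Hello </title></head><body/></html>"
)
print(transform(page, "http://example.org/page"))
```

Note how much of this had to be decided by hand: the name of the entry point, how the document is passed in, and how the result is handed back. In XSLT none of these are choices.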
If this were all GRDDL described, there would be nothing substantial to criticize. Unfortunately, because the working group realizes that not everybody likes XSLT for describing programmatic transformations, the spec claims that it is just as possible to define GRDDL transformation instructions in any other language.
As I mentioned in my email, I strongly disagree: it is not generally possible to implement GRDDL transformation instructions in any other Turing-complete language and still satisfy the above three operational conditions without more help from the spec.
GRDDL bases its theory of operation on a very constrained language (XSLT) and simply assumes that all other programming languages exhibit the same portability and reproducibility of execution, security and data workflow. This is simply false.
We have shown this is possible because this is how we do it in Piggy Bank.
So, my suggestions for the GRDDL WG are the following:
- stop saying that GRDDL transformations can be defined in any language; it might be true in theory, but in practice it is hardly useful if those transformations cannot respect the above three conditions;
- decide which languages you want to support and create profiles for them. If you have no time for that and you just want to do XSLT, that’s fine, but if you decouple the profiles from the GRDDL spec you can follow independent editing paths (like XSLT did with XPath and XSL-FO, for example) and divide the work.
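As a rough picture of what such a profile would have to pin down, here is a sketch in Python. Everything in it is hypothetical: `LanguageProfile`, its fields and the example values are my own illustration of the idea, not something the GRDDL spec defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageProfile:
    """Hypothetical record of what a per-language profile might fix."""

    language: str            # which language and version the profile covers
    entry_point: str         # how the consumer invokes the transformation
    input_binding: str       # how the source document reaches the code
    output_format: str       # what the transformation must emit
    allowed_apis: frozenset  # anything not listed here is off limits

    def permits(self, api: str) -> bool:
        """Return True if the profile lets a transformation use this API."""
        return api in self.allowed_apis


# For XSLT these answers are already implied by the language itself.
XSLT_PROFILE = LanguageProfile(
    language="XSLT 1.0",
    entry_point="apply the stylesheet to the source document",
    input_binding="source tree (implicit)",
    output_format="RDF/XML",
    allowed_apis=frozenset({"XPath 1.0 core functions", "XSLT 1.0 functions"}),
)

print(XSLT_PROFILE.permits("XSLT 1.0 functions"))  # allowed by the profile
print(XSLT_PROFILE.permits("network access"))      # not listed, so refused
```

For any other language, somebody has to write these answers down explicitly, and that is exactly the work a separate profile document could carry.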
I hope that my criticism is not misinterpreted as a lack of support: I fully believe GRDDL to be a useful and important step to unlock the semweb potential (not the only one, though, mind you), but I also think that specifications should be designed with implementation and practical constraints in mind and not just distilled out of theory and hope.