Semanticsheets
August 12th, 2003
The recent discussion about RDF and RSS (or Atom) integration led by Jon reminded me of a link that Santiago sent me a while ago: a short story by Jorge Luis Borges about a mythical Library of Babel that contains every possible permutation of the letters in the alphabet (and so, eventually, every book that could possibly be written), but offers no way to search it.
The humorous thing is that, potentially, there is a book in the library that explains how to search it, but nobody has been able to find it.
Borges wrote this story in 1941, and some consider the web (or the internet in general) to be heading toward the same kind of massively noisy datastore.
Borges himself was fascinated by the problems of library science. In an essay, The Analytical Language of John Wilkins, he wrote, while talking about catalogs (what today we would call a metadata taxonomy):
These ambiguities, redundancies and deficiencies remind us of those which doctor Franz Kuhn attributes to a certain Chinese encyclopaedia entitled Celestial Empire of benevolent Knowledge. In its remote pages it is written that the animals are divided into:
- (a) belonging to the emperor,
- (b) embalmed,
- (c) tame,
- (d) sucking pigs,
- (e) sirens,
- (f) fabulous,
- (g) stray dogs,
- (h) included in the present classification,
- (i) frenzied,
- (j) innumerable,
- (k) drawn with a very fine camelhair brush,
- (l) et cetera,
- (m) having just broken the water pitcher,
- (n) that from a long way off look like flies.
The above is obviously meant to be provocatively funny, but it touches a nerve: metadata is actually metacontent, and metacontent is content, so it has all the same subjectivity problems that content has. There isn't, and there cannot be, such a thing as objective metadata.
For this reason, semantic interoperability of metadata cannot be achieved directly (as the XML mental model suggests); it is going to require one (or even more!) additional layers of metadata. This is what the RDF model proposes: a way to ground symbols, let namespaced symbols be used independently, and then create an additional, higher-level connection between those symbols.
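To make the idea of an extra layer concrete, here is a minimal sketch in Python rather than in any RDF syntax. Everything in it is an assumption made up for illustration: the two example.org vocabularies, the property names, and the `sameMeaningAs` mapping property are hypothetical and do not come from any real specification.

```python
# Hypothetical vocabularies: namespaces and property names are invented.
BLOG = "http://example.org/blog-vocab/"
NEWS = "http://example.org/news-vocab/"
MAP = "http://example.org/mapping/"

# Statements made independently by two publishers, each in its own vocabulary.
triples = {
    ("http://example.org/post/1", BLOG + "author", "Stefano"),
    ("http://example.org/item/7", NEWS + "writtenBy", "Jon"),
}

# The extra layer: metadata about the metadata, connecting the two symbols.
mappings = {
    (BLOG + "author", MAP + "sameMeaningAs", NEWS + "writtenBy"),
}


def equivalents(prop):
    """Return prop plus every property mapped to it (one hop, both directions)."""
    found = {prop}
    for s, p, o in mappings:
        if p == MAP + "sameMeaningAs":
            if s == prop:
                found.add(o)
            elif o == prop:
                found.add(s)
    return found


def values_of(prop):
    """All (subject, value) pairs stated with prop or with a property mapped to it."""
    props = equivalents(prop)
    return sorted((s, o) for s, p, o in triples if p in props)


authors = values_of(BLOG + "author")
print(authors)
```

Neither publisher had to agree on a property name up front; the connection lives in a third set of statements that can be written later, by someone else, and is just as subjective as the data it connects.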
The problem with RDF is, IMO, due to two things:
- the RDF/XML syntax: it's ugly (rdf:Bag with rdf:li? please), it's old (it was written before all the other W3C specifications), it feels hacky (using # in namespace URIs to separate absolute parts from local ones? yuck!), it's too loose (there are tons of ways to say the same thing).
- the implicit concept that RDF metadata should be placed alongside the data it describes and contained in the same document.
I will not talk here about the first issue (also because there seems to be activity to change that), but I am asking a question here:
what if storing RDF inside the XML document is repeating the <font> tag problem in HTML?
I personally believe this is the case: the semantics of how markup is used to signify information should be stored in a separate file, which I've called a semanticsheet, that can be reused across different documents (just as CSS stylesheets separate style from content and can be reused across different documents).
An example of this can be seen in this very page: XHTML hyperlink tags (the <a> element) are sometimes used to point to an external source of information, and at other times to uniquely identify the person or concept that I'm referring to. The RDF people believe that we should be writing RDF inside our XHTML documents to make this explicit. I think this is not necessary: if I write
<a href="http://www.betaversion.org/~stefano/">Stefano</a>
this is not semantically equivalent to writing
<a href="http://www.betaversion.org/~stefano/">Stefano’s homepage</a>
so the first should be marked somehow, and the class attribute can be used for this
<a href="http://www.betaversion.org/~stefano/" class="identify">Stefano</a>
then if the XHTML page contains something like
<link href="weblog.ss" rel="semanticsheet" type="text/ss"/>
a semantically-aware news aggregator might choose to augment the information contained in the document with the semantic layer and perform additional inference on that data.
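As a rough sketch of the aggregator side only, the following Python uses the standard library's html.parser to pull out the two hooks described above: the `rel="semanticsheet"` link and the anchors carrying `class="identify"`. The class name `SemanticLinkCollector` and its attributes are hypothetical, and the sketch makes no assumption about what the semanticsheet file itself contains; it only finds where it lives.

```python
from html.parser import HTMLParser


class SemanticLinkCollector(HTMLParser):
    """Collects semanticsheet references and links marked class="identify"."""

    def __init__(self):
        super().__init__()
        self.semanticsheets = []   # href values of <link rel="semanticsheet">
        self.identities = []       # (href, text) pairs for <a class="identify">
        self._current_href = None  # set while inside an identifying <a>
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "semanticsheet":
            self.semanticsheets.append(attrs.get("href"))
        elif tag == "a" and "identify" in (attrs.get("class") or "").split():
            self._current_href = attrs.get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside an identifying link.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.identities.append((self._current_href, text))
            self._current_href = None


page = """
<html><head>
<link href="weblog.ss" rel="semanticsheet" type="text/ss"/>
</head><body>
<a href="http://www.betaversion.org/~stefano/" class="identify">Stefano</a>
<a href="http://www.betaversion.org/~stefano/">Stefano's homepage</a>
</body></html>
"""

collector = SemanticLinkCollector()
collector.feed(page)
print(collector.semanticsheets)
print(collector.identities)
```

Only the first link ends up in `identities`; the second, unmarked one is treated as an ordinary pointer to an external resource. What the aggregator then does with the semanticsheet is exactly the part that remains open.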
Now, what this semanticsheet looks like, well, that's another story, but I plan to start researching the concept right away.