Archive for 2005

Closed World vs. Open World: the First Semantic Web Battle

June 16th, 2005

Not many realize this, but the reason why Semantic Web technologies feel somewhat exotic (or should I say esoteric?) is the fact that they are based on the “open world” assumption.

“What’s that?”, I hear you asking. Well, suppose that you have the following statement:

“Stefano” “is a citizen of” “Italy”

Note how this is just a (hacky) pseudo syntax for an RDF statement, but it doesn’t matter at this point, just focus on the fact that we have one statement in our model.

Now, let us ask the question “Stefano” “is a citizen of” “U.S.A.”? against that model.

A system based on a closed world assumption (the one programmers are mostly familiar with and the one all XML technologies are based on) will return a sound and comforting “No”.

A system based on an open world assumption (the one RDF technologies are, so far, based on) will return a rather discouraging “I can’t tell”.

Why so? Well, because the closed world assumption implies that everything we don’t know is false, while the open world assumption states that everything we don’t know is undefined. So, the question “Stefano” “is a citizen of” “U.S.A.”? will look for that statement in our model, and since we don’t have that statement, the two systems will interpret its absence differently: false for a closed world approach and not computable for an open world approach.
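To make the difference concrete, here is a minimal sketch in Python. It assumes a toy model stored as a set of (subject, predicate, object) tuples. The function names `ask_closed_world` and `ask_open_world` are made up for this illustration and do not come from any RDF library.

```python
# Hypothetical toy model: a set of (subject, predicate, object) triples.
model = {("Stefano", "is a citizen of", "Italy")}


def ask_closed_world(model, triple):
    """Closed world: anything that is not stated in the model is false."""
    return triple in model


def ask_open_world(model, triple):
    """Open world: a missing statement is unknown, not false.

    Returns True if the statement is in the model and None
    ("I can't tell") if it is not.
    """
    return True if triple in model else None


question = ("Stefano", "is a citizen of", "U.S.A.")
closed_answer = ask_closed_world(model, question)  # False: a comforting "No"
open_answer = ask_open_world(model, question)      # None: "I can't tell"
```

Both functions do the same lookup. The only thing that changes is how a missing statement is interpreted.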

I’m sure you are now thinking this is just academic, but let me show you another example of how open world assumptions make things weird. Let’s get a new model:

“Stefano” “has father” “Franco”

“has father” “has cardinality” 1

The first statement says that Franco is my dad and the second says that the cardinality of the “has father” relationship is always one. Now, suppose that we merge our model with another one that contains the statement below (which is false! Antonio is my uncle, my dad’s brother):

“Stefano” “has father” “Antonio”

Again, the two assumptions operate in a completely different way:

  • the closed world assumption triggers an error: there can be only one father and the model contains two statements that are in conflict;
  • the open world assumption triggers the creation of a new statement: there can only be one father, therefore “Franco” “same as” “Antonio”;
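The two reactions to the merged model can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the cardinality rule is reduced to a set of predicates that may have a single value per subject, and `merge_closed_world`, `merge_open_world` and `CardinalityError` are hypothetical names, not part of any real inferencing engine.

```python
class CardinalityError(Exception):
    """Raised when a cardinality-1 property has two different values."""


def merge_closed_world(triples, single_valued):
    """Closed world: two values for a cardinality-1 property are a conflict."""
    seen = {}
    for s, p, o in triples:
        if p in single_valued:
            previous = seen.setdefault((s, p), o)
            if previous != o:
                raise CardinalityError(f"{s} {p}: {previous} conflicts with {o}")
    return set(triples)


def merge_open_world(triples, single_valued):
    """Open world: two values for a cardinality-1 property must be the same thing."""
    seen = {}
    inferred = set(triples)
    for s, p, o in triples:
        if p in single_valued:
            previous = seen.setdefault((s, p), o)
            if previous != o:
                # No error: generate a new statement instead.
                inferred.add((previous, "same as", o))
    return inferred


merged = [
    ("Stefano", "has father", "Franco"),
    ("Stefano", "has father", "Antonio"),
]
single_valued = {"has father"}

inferred = merge_open_world(merged, single_valued)

try:
    merge_closed_world(merged, single_valued)
    closed_world_rejected = False
except CardinalityError:
    closed_world_rejected = True
```

With the merged model above, the closed world version raises an error, while the open world version quietly adds “Franco” “same as” “Antonio” to the model.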

Open world is much less prone to trigger an error, and much more likely to generate new statements based on the facts that can be found in the model, hoping to be able to revise that knowledge when new information comes along.

Closed world, on the other hand, is much more strict, less diplomatic, if you wish, more absolutist: I tell you what I know now and I don’t try to second guess.

These two different views of the world are nothing but political visions: some people avoid doubt like the plague, some others embrace doubt as a way of gaining new potential knowledge. Both have pros and cons, but they are really hard (if not impossible) to mix: they are like water and oil.

Recently, the W3C held a workshop on Rule Languages for Interoperability and it turns out that many people feel the need for more “closed world” approaches or, as some like to call it, “negation as failure”. The paper “Stack or Two Towers?” explains well the fact that there is a battle starting between the two worlds, a battle that might fragment the semantic web into two partially incompatible worlds: one where doubt is embraced but validity is a myth, another where validity is possible but doubt is a plague.

Note how the open world assumption is considered evil by the entire range of XML and web service technologies that try really hard to standardize, normalize, describe and validate everything that goes on, hoping to create an ordered world in which things work smoothly.

The open world assumption tries hard to avoid that: validation, under an open world assumption, is, well, limiting, if not directly out of place.

But then again, validation is a fact of life and a very strong requirement for some environments that at least try to impose uniformity and minimize differences and disorder within a particular application or closed community.

So, the question on the table, and it’s a big one, is: how do we allow open world and closed world assumptions to coexist? Is it just a parameter in the way the inferencing engines run (one empowers doubt, the other rejects it), or is it something impossible to achieve?

It’s a very big and deep question, and touches not only technology and computer science, but also philosophy and linguistics, if not politics too.

My answer is pragmatic: use what you need and ignore the rest. I lost interest in the search for the perfect architecture, but I know some haven’t: fasten your seat belts, people, turbulence ahead.

Posted in Commentary
 

What to do if Piggy Bank ate your homework

May 27th, 2005

Somebody suggested to me that “killer apps” are not really meant to kill.

Point well taken. (David even drew this little cartoon about it)

image-0.png

He’s referring to the fact that Piggy Bank takes a while to start the first time you restart your Firefox after having installed it, and people think it’s stuck and kill it. When that happens, their profiles become corrupted and nothing works as before.

Ryan has written a blog post on how to get your Firefox back if profile corruption happens. No worries, no data gets lost: it’s just that Firefox doesn’t have the ability to roll back to a previous profile if the installation of new extensions doesn’t go well (hopefully they will fix this part soon).

We apologize for the inconvenience (and we’ll make sure to profile the startup time or find a way to provide useful visual feedback), but at least now we know where the problems are.

Posted in Article
 

Introducing Piggy Bank 2.0

May 22nd, 2005

Today we have released Piggy Bank 2.0b1.

Piggy Bank is a Firefox extension that turns your regular web browser into a semantic web browser. Cutting the buzzword crap, we enable you to take web data with you, normalized and easy to mix. Then you can search it, store it, browse it, share it, map it, tag it and so on, all in the comfort of your Firefox browser. Web data will no longer be jailed in its original container, so you can mix and match it as you wish, not as the original web site wanted.

Piggy Bank is unique in many ways:

  1. it’s mostly written in Java. Yes, you heard me: Java, not JavaScript. We found a way to write an XPCOM component in Java and make it run on Windows, Linux and Mac OS X (you need to install something there but it works after that). So what? Well, we stand on the shoulders of giants: velocity, lucene, sesame, log4j, dom4j, jtidy… hundreds of man-years of work we don’t have to redo, and immediately portable across operating systems.
  2. it digests RDF, but it’s very liberal on what RDF it can do stuff with. You have to use RDF types, but that’s about it.
  3. it makes it easy for you to add RDF to existing web sites, but it doesn’t force you to wait for web sites to do it: you can scrape them! We give you all the tools and you just have to write a scraper, either in JavaScript or XSLT. Think of a data-focused GreaseMonkey.
  4. it uses RDF for tagging, but in a way that allows people to agree or disagree, connect or disambiguate between tags used by others (as I described in an earlier post).
  5. in combination with a Semantic Bank, it allows you to share and publish information with friends or colleagues, with the granularity and trust you want, not forcing you to post everything in a single web site that somebody else owns and controls.
  6. it’s designed to allow you to mix data from various sources and use web services to provide more information for that data (for example, Google Maps and Google geo-coordinate lookup web service).
  7. it’s very easy to extend, providing scrapers, facades and templates so that you can adapt it to your own needs.

What Mosaic did for the web was to show to everybody what you would gain if you invested (time and effort) in creating your HTML data. It made it obvious. This was what really kick-started it.

Piggy Bank brings you that advantage, but for RDF data, and it couples it with a framework where users can innovate. The reward for spending a few hours converting your address book into RDF and seeing it immediately located on a map or browsed by facets is way bigger than the cost. The return on your investment is high and easily felt. This is the thing that had been missing so far: a reason. Piggy Bank gives you one and tries to help you along the way, both in the creation and in the consumption.

Piggy Bank, to me, feels like a killer app in the sense that I use it not because I want to advocate the technology (or our work) but because I need the features myself! It also made it very painful to develop on it, because when it didn’t work, it hurt! I couldn’t access the information I needed and depended on! It was a great incentive to do small incremental development and continuous testing… which turned out to be an incredibly productive way of doing software development (as I knew already, but this made it very obvious).

This will be a day to remember.