Closed World vs. Open World: the First Semantic Web Battle
June 16th, 2005
Not many realize this, but the reason why Semantic Web technologies feel somewhat exotic (or should I say esoteric?) is the fact that they are based on the “open world” assumption.
“What’s that?”, I hear you asking. Well, suppose that you have the following statement:
“Stefano” “is a citizen of” “Italy”
Note how this is just a (hacky) pseudo syntax for an RDF statement, but it doesn’t matter at this point, just focus on the fact that we have one statement in our model.
Now, let us ask the question “Stefano” “is a citizen of” “U.S.A.”? against that model.
A system based on a closed world assumption (the one programmers are mostly familiar with and the one all XML technologies are based on) will return a sound and comforting “No”.
A system based on an open world assumption (the one RDF technologies are, so far, based on) will return a rather discouraging “I can’t tell”.
Why so? Well, because the closed world assumption implies that everything we don’t know is false, while the open world assumption states that everything we don’t know is undefined. So, both systems will look for the statement “Stefano” “is a citizen of” “U.S.A.” in our model, and since it isn’t there, they will interpret its absence differently: false for a closed world approach and unknown (not computable from what we have) for an open world approach.
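To make the difference concrete, here is a minimal sketch in Python. It assumes a model is nothing more than a set of (subject, predicate, object) tuples; the function names `ask_closed` and `ask_open` are made up for illustration and do not come from any RDF library.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]

# The whole model: a single statement.
model: Set[Triple] = {("Stefano", "is a citizen of", "Italy")}


def ask_closed(model: Set[Triple], triple: Triple) -> bool:
    """Closed world: whatever is not stated is false."""
    return triple in model


def ask_open(model: Set[Triple], triple: Triple) -> Optional[bool]:
    """Open world: whatever is not stated is unknown (None), not false."""
    return True if triple in model else None


question = ("Stefano", "is a citizen of", "U.S.A.")
closed_answer = ask_closed(model, question)  # False: a firm "No"
open_answer = ask_open(model, question)      # None: "I can't tell"
```

Note that the open world version never answers `False` here: with only positive statements in the model, there is nothing that could prove a statement wrong.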
I’m sure you are now thinking this is just academic, but let me show you another example of how open world assumptions make things weird. Let’s get a new model:
“Stefano” “has father” “Franco”
“has father” “has cardinality” 1
The first statement says that Franco is my dad and the second says that the cardinality of the “has father” relationship is always one. Now, suppose that we merge our model with another one that contains the (false! Antonio is my uncle, my dad’s brother) statement below:
“Stefano” “has father” “Antonio”
Again, the two assumptions operate in a completely different way:
- the closed world assumption triggers an error: there can be only one father and the model contains two statements that are in conflict;
- the open world assumption triggers the creation of a new statement: there can only be one father, therefore “Franco” “same as” “Antonio”.
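The two reactions to the merge can be sketched in Python as well. This is a toy under loud assumptions: the cardinality statement is encoded as a hypothetical `FUNCTIONAL` set of predicates rather than as a triple, and `merge_closed`, `merge_open` and `CardinalityError` are names invented for this example. Real reasoners are far more involved, but the shape of the behavior is the same.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Assumption: predicates that may have only one value per subject.
FUNCTIONAL = {"has father"}


class CardinalityError(Exception):
    """Raised by the closed world merge when two values conflict."""


def merge_closed(a: Set[Triple], b: Set[Triple]) -> Set[Triple]:
    """Closed world: a second, different father is a validation error."""
    seen: Dict[Tuple[str, str], str] = {}
    for s, p, o in sorted(a) + sorted(b):
        if p in FUNCTIONAL:
            if (s, p) in seen and seen[(s, p)] != o:
                raise CardinalityError(f"{s} {p}: {seen[(s, p)]} vs {o}")
            seen[(s, p)] = o
    return a | b


def merge_open(a: Set[Triple], b: Set[Triple]) -> Set[Triple]:
    """Open world: two fathers must be the same individual, so say so."""
    seen: Dict[Tuple[str, str], str] = {}
    inferred: Set[Triple] = set()
    for s, p, o in sorted(a) + sorted(b):
        if p in FUNCTIONAL:
            if (s, p) in seen and seen[(s, p)] != o:
                # No error: record a new statement instead.
                inferred.add((seen[(s, p)], "same as", o))
            else:
                seen[(s, p)] = o
    return a | b | inferred


ours = {("Stefano", "has father", "Franco")}
theirs = {("Stefano", "has father", "Antonio")}

open_result = merge_open(ours, theirs)

try:
    merge_closed(ours, theirs)
    closed_failed = False
except CardinalityError:
    closed_failed = True
```

Here `open_result` contains the three statements, including the inferred (and, in my family, wrong) “Franco” “same as” “Antonio”, while the closed world merge refuses to produce a model at all.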
Open world is much less prone to trigger an error, and much more likely to generate new statements based on the facts that can be found in the model, hoping to be able to revise that knowledge when new information comes along.
Closed world, on the other hand, is much stricter, less diplomatic and, if you wish, more absolutist: I tell you what I know now and I don’t try to second guess.
These two different views of the world are nothing but political visions: some people avoid doubt like the plague, some others embrace doubt as a way of gaining new potential knowledge. Both have pros and cons, but they are really hard (if not impossible) to mix: they are like water and oil.
Recently, the W3C held a workshop on Rule Languages for Interoperability and it turns out that many people feel the need for more “closed world” approaches or, as some like to call it, “negation as failure”. The paper “Stack of Two Towers?” explains well that there is a battle starting between the two worlds, a battle that might fragment the semantic web into two partially incompatible worlds: one where doubt is embraced, but validity is a myth, another where validity is possible but doubt is a plague.
Note how the open world assumption is considered evil by the entire range of XML and web service technologies that try really hard to standardize, normalize, describe and validate everything that goes on, hoping to create an ordered world in which things work smoothly.
The open world assumption tries hard to avoid that: validation, under an open world assumption, is, well, limiting, if not directly out of place.
But then again, validation is a fact of life and a very strong requirement for some environments that at least try to enforce uniformity and minimize differences and disorder within a particular application or closed community.
So, the question on the table, and it’s a big one, is: how do we allow open world and closed world assumptions to coexist? Is it just a parameter in the way the inferencing engines run (one empowers doubt, the other rejects it) or is it something impossible to achieve?
It’s a very big and deep question, and touches not only technology and computer science, but also philosophy and linguistics, if not politics too.
My answer is pragmatic: use what you need and ignore the rest. I lost interest in the search for the perfect architecture, but I know some haven’t: fasten your seat belts, people, turbulence ahead.