Nash Equilibria in Non-Cooperative Data Modeling
April 3rd, 2008
In game theory, a Nash equilibrium is a state of a game with two or more players in which no player has anything to gain by unilaterally changing only his or her own strategy.
Basically, when two players can’t or won’t coordinate (for lack of ability to communicate, lack of trust, whatever) and rely only on maximizing their immediate personal benefit, Nash has shown that (for finite games, and allowing players to randomize their choices) there is always a state in such a game that is at equilibrium, meaning that none of the players, acting selfishly, would want to change it (as doing so would not improve their payoff).
The very interesting thing about a Nash equilibrium is that such a state is not necessarily the one that maximizes the payoff that the players would obtain if they had coordinated.
One classic example of this is the so-called “prisoner’s dilemma”, a situation in which two prisoners can decide to be silent or betray the other and obtain different payoffs depending on their choice and the other person’s choice, but without the ability to coordinate.
If both prisoners stay silent, they get 6 months, but if one stays silent while the other betrays, the betraying one goes free while the silent one goes to jail for 10 years. If both betray, they get 5 years each.
The best outcome is for both people to remain silent, but the selfish drive (which is what makes the ‘both silent’ state optimal but unstable) is to betray: if the other stays silent, betraying means going free instead of serving 6 months, and if the other betrays as well, 5 years is less than 10 (the situation where you trusted the other person to remain silent but he betrayed you).
It is worth noting a few things here:
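The reasoning above can be checked mechanically. The following is a minimal sketch in Python, assuming the sentences listed earlier are converted to months; the names `SENTENCES` and `is_nash` are made up for this illustration. It tests every state for whether either prisoner could shorten his own sentence by switching alone.

```python
from itertools import product

MOVES = ("silent", "betray")

# Jail time in months for (prisoner A, prisoner B); lower is better.
SENTENCES = {
    ("silent", "silent"): (6, 6),
    ("silent", "betray"): (120, 0),
    ("betray", "silent"): (0, 120),
    ("betray", "betray"): (60, 60),
}

def is_nash(state, sentences):
    """True if neither prisoner can shorten his own sentence by switching alone."""
    a, b = state
    current_a, current_b = sentences[state]
    a_can_gain = any(sentences[(alt, b)][0] < current_a for alt in MOVES)
    b_can_gain = any(sentences[(a, alt)][1] < current_b for alt in MOVES)
    return not (a_can_gain or b_can_gain)

equilibria = [s for s in product(MOVES, MOVES) if is_nash(s, SENTENCES)]
best_total = min(SENTENCES, key=lambda s: sum(SENTENCES[s]))

print("Nash equilibria:", equilibria)
print("Lowest total jail time:", best_total)
```

With these numbers the only equilibrium found is ‘both betray’, while the state with the lowest total jail time is ‘both silent’.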
- if betraying a silent partner stopped paying off (say, the betrayer got 1 year in jail instead of going free), the ‘both silent’ state would become both optimal and a Nash equilibrium, since neither prisoner could shorten his sentence by switching alone.
- we can therefore infer that the optimality and Nash equilibria of a particular game are not only a function of its rules, but of its payoffs as well.
This means, at the very least, that Nash stability and optimality of non-coordinated strategies can be influenced by tuning the payoffs without altering the rules of the game.
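To make the effect of tuning payoffs concrete, here is a small Python sketch that leaves the rules of the prisoner’s dilemma alone and changes a single number: the sentence handed to a prisoner who betrays a silent partner. The 12-month value is an assumption chosen only for illustration, and `sentences` and `nash_states` are hypothetical helper names.

```python
from itertools import product

MOVES = ("silent", "betray")

def sentences(betrayer_months):
    """Jail time in months (A, B); only the betrayer's sentence against a silent partner varies."""
    return {
        ("silent", "silent"): (6, 6),
        ("silent", "betray"): (120, betrayer_months),
        ("betray", "silent"): (betrayer_months, 120),
        ("betray", "betray"): (60, 60),
    }

def nash_states(table):
    """States where no prisoner can shorten his own sentence by switching alone."""
    found = []
    for a, b in product(MOVES, MOVES):
        mine_a, mine_b = table[(a, b)]
        a_gains = any(table[(alt, b)][0] < mine_a for alt in MOVES)
        b_gains = any(table[(a, alt)][1] < mine_b for alt in MOVES)
        if not (a_gains or b_gains):
            found.append((a, b))
    return found

original = nash_states(sentences(0))    # betrayer goes free, as in the classic setup
tweaked = nash_states(sentences(12))    # assumed: betrayer serves 12 months instead

print("original:", original)
print("tweaked: ", tweaked)
```

In the tweaked game ‘both silent’ shows up as an equilibrium, although ‘both betray’ remains one too: shifting a payoff can add a stable optimal state without necessarily removing the bad one.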
Now, think of distributed and decentralized data modeling efforts as a non-cooperative game: the players, who each need to model their data for their own use, will try to do so in a way that minimizes their effort and maximizes their benefit. If they all modeled their data in the same way, they could reduce their data integration costs; if not, those integration costs will be substantial. The problem is that it’s hard for the players to predict the cost of each integration and, most importantly, their need for a particular one, especially as the number of players grows large.
With absolutely no coordination, the only Nash equilibrium of such a system is the ‘babel’ state: everybody does their own thing.
At the other end of the spectrum, the state that would globally minimize integration costs for everybody (that is, everybody modeling data in precisely the same way) is not at Nash equilibrium, as individuals would perceive that serving their immediate modeling needs would increase their (easy to predict) immediate payoff more than it would increase their (hard to predict) future integration costs.
Note how the above is basically saying that no matter how descriptive, complete, well-thought-out and encompassing your data model is, its usage won’t be at Nash equilibrium, which will naturally bring diversity, dialects and changes into its uncoordinated usage.
It is critically important to realize that this is *not* a function of the quality of the data model, but a function of the difference in difficulty between predicting the immediate benefits and predicting the future integration costs. Thus, spending more time polishing the data model won’t make any difference in the outcome, as its usage will still diverge toward the Nash equilibrium.
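A toy model may help to see why. The Python sketch below is purely illustrative: the payoff numbers, the `weight` parameter (how much of the future integration cost a modeler actually anticipates) and the function names are all assumptions, not measurements. It only checks the two symmetric states, ‘everybody shared’ and ‘everybody own’, for whether a single player would gain by switching alone.

```python
def payoff(my_choice, others, weight, cost_per_partner=1.0):
    """Perceived payoff of one modeler given everyone else's choices (hypothetical numbers)."""
    # Assumed: a custom model fits local needs better than the shared one.
    immediate = 5 if my_choice == "own" else 3
    # A partner integrates for free only when both sides use the shared model.
    mismatched = sum(1 for o in others if not (my_choice == o == "shared"))
    return immediate - weight * cost_per_partner * mismatched

def is_stable(choice, n_players, weight):
    """True if, when all n players make `choice`, nobody gains by switching alone."""
    others = [choice] * (n_players - 1)
    alternative = "own" if choice == "shared" else "shared"
    return payoff(alternative, others, weight) <= payoff(choice, others, weight)

for weight in (1.0, 0.1):
    print(
        f"weight={weight}:",
        "all shared stable?", is_stable("shared", 5, weight),
        "| all own stable?", is_stable("own", 5, weight),
    )
```

Under these assumptions the ‘babel’ state is stable at either weight, while ‘everybody shared’ is stable only when players fully anticipate the integration costs (weight 1.0) and stops being stable once those costs are heavily discounted (weight 0.1).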
One of the things that irritates me the most about the semantic web and its advocates is the naive presumption that optimal states in non-coordinated data modeling systems are necessarily stable and therefore will happen naturally.
While this was the case for the web (where selfish decentralized activity brought both local improvements and global ones at the same time and it was therefore relatively easy to bootstrap), this is not the case for the semantic one; this fact is often called the “chicken and egg problem”.
Many (including myself) have tried over the last decade to solve this bootstrapping problem by forcing existing data to surface, hoping to catalyze activity and applications that would further push for more data to surface and for more applications to exist.
But one thing that I’ve come to realize recently is how surfacing data might not be enough to bootstrap an autonomous system if we don’t find a way to align the Nash equilibria and the optimal states of the distributed data modeling and integration game.
What this means in practice is that we must find a way to tweak the payoffs of the data integration game (which is clearly a non-zero-sum game) so that its Pareto optimal states are also at Nash equilibrium (something that the Prisoner’s Dilemma shows is far from guaranteed).
I personally think that Exhibit and Potluck are the best examples out there of solutions that don’t specifically change the nature of the game but shift the payoffs, thus attempting to reduce the gap between Pareto optimal states and Nash-stable ones.
A lot more has to happen on the Potluck front, of course, as it is practically just paperware, and a lot more has to happen in harvesting the collective intelligence of people using these tools, to further improve their use and to surface data that could help increase coordination and make integration costs easier to predict.
We are still far from solving the bootstrapping problem, but one thing is clear in my mind: exposing a bunch of data as RDF (no matter how well inter-linked and how many URIs can be dereferenced as URLs) is not going to be enough without a deeper and more serious analysis of the socio-economic dynamics around data modeling and data integration.
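As a rough sketch of what aligning the two could look like, the Python below builds a two-player version of the game with made-up numbers. Pareto optimality is evaluated on the actual payoffs, Nash stability on the payoffs as the players perceive them (with future integration costs discounted), and `shared_bonus` is a hypothetical immediate reward for using the shared model. Every constant and name here is an assumption for illustration only.

```python
from itertools import product

CHOICES = ("shared", "own")
PERCEIVED_WEIGHT = 0.2   # assumed: share of the future integration cost players anticipate
INTEGRATION_COST = 6     # assumed: cost each player pays unless both use the shared model

def table(shared_bonus, weight):
    """Payoffs (A, B) per state; shared_bonus is an assumed immediate reward for the shared model."""
    def one(mine, theirs):
        immediate = 5 if mine == "own" else 3 + shared_bonus
        cost = 0 if (mine, theirs) == ("shared", "shared") else INTEGRATION_COST
        return immediate - weight * cost
    return {(a, b): (one(a, b), one(b, a)) for a, b in product(CHOICES, repeat=2)}

def nash(t):
    """States where neither player gains by switching alone."""
    return {
        (a, b) for a, b in t
        if all(t[(alt, b)][0] <= t[(a, b)][0] for alt in CHOICES)
        and all(t[(a, alt)][1] <= t[(a, b)][1] for alt in CHOICES)
    }

def pareto_optimal(t):
    """States that no other state improves on for one player without hurting the other."""
    def dominated(s):
        return any(
            t[o] != t[s] and all(x >= y for x, y in zip(t[o], t[s]))
            for o in t
        )
    return {s for s in t if not dominated(s)}

aligned_by_bonus = {}
for bonus in (0, 1, 2):
    actual = table(bonus, weight=1.0)
    perceived = table(bonus, weight=PERCEIVED_WEIGHT)
    aligned_by_bonus[bonus] = pareto_optimal(actual) & nash(perceived)
    print(bonus, sorted(aligned_by_bonus[bonus]))
```

With these numbers no state is both Pareto optimal and perceived as stable when the bonus is 0, while a bonus of 1 is enough for ‘both shared’ to qualify. The rules of the game are the same in both cases; only one payoff moved.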
“Build it and they will come”, this time, might not be enough.