On the Impact of Damage Non-Locality in Incentive Economies around Data Sharing
June 17th, 2010
For centuries, it was common for scientists to exchange ideas in epistolary discussions. These days, remotely located scientists collaborate via email, or exchange digital documents when they can’t meet face to face. These are way faster and easier to exchange than hand-written letters sent via postal services. Unfortunately, they retain the same ‘after the fact’ property: they are often revealed only when some scholar later decides they were important enough to dig out and organize.
With that in mind, I find myself excited every time I get the chance to participate in ‘blog rebuttals’ like the ones that David Karger and I have been having lately about requirements, motives and incentives for people to share structured data on the web. Both of us care a great deal about this problem, and we still cross paths and cross-pollinate ideas even after I left MIT. We also have very different backgrounds, but they overlap enough that we can understand each other’s language even when we try to explain our own (sometimes still foggy) thinking.
It is a rare situation when people from different backgrounds cross paths and earn each other’s respect. It is even rarer when their discussions are aired publicly as they are happening; this creates a very healthy and stimulating environment not only for those participating but also for eventual readers.
In any case, the point of contention in the current discussion is why people would want to share structured data and what can facilitate it.
It seems to me that the basic (and implicit) assumption of David’s thinking is that because a web of hyperlinked web pages came to exist, it would be enough to understand why it did and replicate the technological substrate (and its social lubrication properties), and the same growth property would apply to different kinds of content.
I question that assumption and I’m frankly surprised that questioning whether the nature of the content can influence the growth dynamics of a sharing ecosystem makes him dismiss it as being related to a particular class of people (programmers) or to a particular class of business models (my employer’s).
It might well be that David is right and the same exact principles apply… but it seems a rather risky thing to take for granted. People post pictures on public sites, write public tweets, contribute to Wikipedia, write public blogs, or create personal web sites. All this is shared and all this is public. These are facts. They don’t publish nearly as much structured data, and this is another fact. But believing that people would do the same with structured data if only there were technology that made it easier or made it transparent is an assumption, not a fact. It implicitly assumes that the nature of the content being contributed has no impact on the incentive economies around it.
And it seems to me a rather strong assumption considering, for example, that it doesn’t hold true for open sharing of software code.
Is it because software programmers are more capricious about sharing? Is it because what’s being shared is considered more valuable? Or is it because the incentive economies around sharing change dramatically when collaboration becomes a necessary condition to sustainability?
Could it be that sharing for independent and dispersed consumption (say, a picture, a tweet, a blog post) is governed by economies of incentives that are different from sharing for collaborative and reciprocal consumption (say, software source code, Wikipedia, designs for Lego Mindstorms robots or electronic circuitry)?
I am the first to admit that it is reasonable to dismiss my questioning for being philosophical or academic, or too ephemeral to provide valuable practical benefits, but recent insights that crystallized collectively inside Metaweb (my employer) make me think otherwise. The trivial, yet far-reaching insight is this:
the impact of mistakes in hypertext is localized, while the impact of mistakes in structured data or software is not
If somebody writes something false, misleading or spammy on a web page, that action impacts the perceived value of that page but it doesn’t impact any other. Pages have different relevance depending on their location or rank, so the negative impact of that action changes depending on the page’s importance. But the ‘locality of negative impact’ property remains the same: no other page is directly influenced by that action.
This is not true for data or software: a change in one line of code, or one structured assertion, could potentially trigger a cascading effect of damage.
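To make the contrast concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the page names, the facts, and the `affected` helper are hypothetical, and treating pages as having no dependents is a simplifying assumption, not a measured property of the web. It just walks a ‘who depends on me’ graph to list what a single mistake can reach.

```python
from collections import deque


def affected(dependents, start):
    """Return every node a mistake in `start` can reach.

    `dependents` maps a node to the nodes that are derived from it
    (hypothetical 'who depends on me' edges).
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in dependents.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Assumption: a page may link to others, but no other page needs it
# to be correct, so it has no dependents.
pages = {"page_a": [], "page_b": [], "page_c": []}

# Assumption: structured assertions where some facts are derived
# from others.
facts = {
    "birth_year": ["age", "generation"],
    "age": ["eligible_to_vote"],
    "generation": [],
    "eligible_to_vote": [],
}

print(sorted(affected(pages, "page_a")))
print(sorted(affected(facts, "birth_year")))
```

On the page graph the mistake reaches only the page it was made on; on the fact graph a wrong `birth_year` also reaches everything derived from it.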
This explains very clearly, for example, why there are no successful software projects that use a Wikipedia model for collaboration and allow anybody who shows up to modify the central code repository.
Is that prospect equally unstable for collaborative development over structured data? Or is there something in between, some hybrid collaboration model that takes the best practices from both the wiki model (which shines in lowering the barrier to entry) and the open software development model (which manages to distill quality in an organic way)?
I understand these questions don’t necessarily apply to the economy of incentives of individuals wanting to publish their structured datasets without the need for collaboration, but I present them here as a cautionary tale about taking the applicability of models for granted.
More than programmers vs. professors, I think the tension between David and me is about the nature of our work: he’s focusing on facilitating the sharing of results from individual entities (including groups), while I’m focusing on fostering collaboration and catalyzing network effects between such entities.
Still, I believe that understanding the motives and the incentive economies around sharing, even for purely individualistic reasons, is the only way to provide solutions that meet people’s real needs. Taking them for granted is a very risky thing to do.