Last week, I was given the honor of being the keynote speaker at the Code4Lib 2009 conference in Providence, RI.
This is a conference self-organized by a community of alpha-techy librarians trying really hard to drag the library world (kicking and screaming) into a more modern alignment of both technological infrastructure and vision. I was never a librarian, but I worked for 5 years at the MIT Libraries (trying to do the above myself) and now work for a company that could very well become part of that future technological infrastructure, one that could be highly beneficial for a modern version of the libraries.
Both SIMILE and Metaweb are considered valuable in this space and are watched with great interest, and I know many people in this community read this blog (or at least have read some of my ramblings in passing). Which is why I was invited.
So it’s particularly painful to admit that my keynote sucked.
Network problems slowed my live demos to a crawl and made the whole experience downright pathetic; my timing slipped and I was forced off the stage by the bell (the conference ran on a surprisingly rigorous schedule) without delivering the ending of the presentation.
This wouldn’t have been that bad if the ending had been just a wrap-up… but that was, in fact, the part that I most wanted to tell the audience about (and everything else was a way to lead up to it).
My keynote was meant to highlight all the things that libraries and librarians are really good at and that will NOT die even if books become an obsolete medium of information transfer… but I didn’t have time for that, so I just told them all the things that might displace them, and even demoed a few live.
And left it at that.
It was supposed to be a “fear then hope” speech, but time forced me to cut the ‘hope’ out of it.
Yeah, right.
So, you can imagine how eager I was to go through the IRC backlog from my keynote that Erik was so prompt to send me. Now, to be frank, conference backlogs are never for the faint of heart, but I’ve done this before, I have a pretty thick skin and I already knew it was bad, so I dove into it, hoping to find more value than just “booo”.
The first part of the keynote was about showing how variations in marginal cost have driven all the innovations in information transfer, and how following that curve would seem to predict that as soon as “iPods for books” arrive that offer a user experience reasonably similar to books (no, we’re so not there yet, the Kindle is barely scratching that surface), most information would become digital and libraries would split between ‘museums of books’ and ‘something else’. To which the backchannel replied
[09:27:26] bess | umm… he knows this is CODE 4lib, right? This is reminding me of mid-1990s “the book is dead” hysteria
True. The people at code4lib already sense there is some value in there (and they all work on digital technologies already). Although when (a little earlier) I pointed out that books will probably feel one day like vinyl LPs feel today, there was a slew of
[09:26:12] rjw | vinyl LPs++
[09:26:17] gsf | fanatics++
[09:26:22] kat3 | bibliophiles++
which indicates, as I knew already, that there is a love/hate relationship going on in the libraries about information technology as a displacer. Many want the benefits of the new tools, without the disruption that these end up causing. This turns out to be a very common pattern across many industries that have to deal with information distribution.
Then I went on to ask: if there is no limitation on storage (or its cost curves are radically different from those of today’s library shelves), what is the justification for filtering?
[09:29:11] scolford | I would like to filter faulty or dangerous medical or financial information. Wouldn’t you?
Actually, no, I wouldn’t.
In order to do this, you must know at the time of filtering how to evaluate properties like ‘faulty’ or ‘dangerous’ and assume they are both generally applicable and don’t change over time. Faced with a problem of shelf-space limitation, it’s easy to evaluate that storing faulty or dangerous information at the expense of valuable and useful information isn’t a benefit, no matter how important this filtered out information turns out to be 100 years in the future. But I don’t find it to be a valid argument today.
[note, less input filtering does not imply less output filtering: I too care about usefulness of information, I just disagree that pre-emptive filtering is the best way to achieve it, as it was forced to be done in the past by shelf-space constraints]
Technology might displace the very reason why filtering was done in the libraries, but the mindset will be much harder to displace. This inertia is way more dangerous than it might seem at first, if only because it opens the doors to other institutions that have far fewer problems accepting all information and deciding on its value a posteriori, but that don’t necessarily share the libraries’ core values.
The next step was, obviously, to deal with the other side: fewer reasons to filter input imply more reasons for better output filters, and metadata has historically been the solution to this problem for libraries. But if the entire text were available, would we still need metadata? (at which many in the audience yelled “yeah!”)
[09:30:57] gsf | attack books all you want, but leave metadata alone
and also
[09:29:27] rosy1280 | metadata doesn’t exist
[09:29:31] rosy1280 | its just data
and to sum it up
[09:30:15] * jtgorman is guessing the keynote is striking dissonance tones
which was precisely the point and then continues
[09:32:48] * jtgorman thinks non-library people as keynotes is always a gamble
which is a very diplomatic way of saying that I was not pleasing
[09:32:52] bess | hi, you must be new here. Yes, in fact, people have thought about some of this stuff before. Even in libraries.
precisely, but very few ever focus on what libraries would still be useful for (or good at) after all this electronic dust settles (but neither did I, since I ran out of time, boo).
[09:32:58] dchud | ok maybe now he’s getting to metaweb.
then
[09:33:49] timmcgeary | demoing that books are dead?
Not quite. I’m not the one killing books, nor do I have an incentive to do so. But my work for years has focused on showing how new information can emerge out of other information, and for that you need the ability to re-purpose and integrate data from various sources. And you can’t (easily) do that with books. I demoed all the stuff that we’re working on at Metaweb that shows the power of that approach.
[09:34:21] akorphan | So… the semantic web is good for Jeopardy answers?
At a superficial level, yes, it really feels like that’s all it’s good for: like one of those nerdy kids who know everything about every subject but fail miserably to distill valuable knowledge from that sea of facts.
It’s not surprising really: many societies tend to equate knowing lots of things with being very smart, while most teachers (or individuals in general) know better.
The current state of affairs is that all efforts on the web of data are, at least at this stage, in the ‘notionistic’ phase. I’m the first to admit that and I’m the first to want to improve on it… unfortunately, getting latent properties to emerge from datasets requires a considerable volume of information and dense networks of relationships; these are surprisingly rare and hard to build.
The backlog gems continued:
[09:35:17] akorphan | I think the assertion that the degranularization from book to data leaves out the notion that *narrative* is a necessity for all kinds of reading, research, etc.
This is very smart criticism and I wish I had had Q&A time to discuss it live: even if the web of data turns out to be all its proponents want it to be, narrative still won’t be part of it, but it will be something to put on top.
There is no question that transforming research, random thoughts and ideas, structured queries and infoviz charts into a coherent and understandable narrative is absolutely necessary for all this web of data infrastructure to be worth anything.
But we really don’t know what this ‘narrative’ will turn out to be, or whether it will be substantially different from today’s. For example, we don’t know if the availability of interactivity will change the way people write papers or books or decide to visualize and present their findings or their arguments.
There were also nice comments during the demos (despite the network slowness)
[09:35:52] anarchivist | freebasing++
I also gave the audience very hard questions about factual information and asked them where they would look for the answers first, at which
[09:36:09] mib_nch3tkl0 | omg…I thought of freebase when he asked!
would certainly make some people at Metaweb really happy but also
[09:37:49] dchud | for every one of these questions, i know multiple librarians who would know hte answers off the top of their heads
[09:38:25] jbrinley | dchud: can I have copies of those librarians?
This next one got me laughing and scared me at the same time
[09:38:32] mbklein | Parallax is awesome, but all I really use it for is scaring the technophobic librarians.
We clearly have a lot of work to do to make all this useful. Then came a bunch of funny comments about the network being slow
[09:40:37] epoz | interesting. digital heckling burns bandwidth causing presenter demo grief
[09:40:58] akorphan | Quick, everyone start playing youtube clips!
[09:41:11] bess | should i not be bittorrenting right now?
[09:42:03] anarchivist | *crickets*
[09:42:15] MrDys | looks like my plan to do a live demo was not a great one…
[09:42:16] BillDueber | Note to self: be prepared to talk over slow network.
[09:42:31] BigD | freebase ate the tubes
[09:43:02] harmless | anyone have a fast data plan on their cell?
[09:43:19] paulalbert | I wonder if it’s so slow because I’m downloading the entire second season of Hannah Montana in HD.
and the last straw
[09:43:30] rsinger | you know, a librarian wouldn’t have these problems
which really hurt.
But there are also useful things hidden in there:
[09:45:36] mikeybe | can you make private freebase data sets?
(no, you can’t… not at the moment at least) or
[09:43:46] edsu | MikeTaylor: how do you build a distributed database on the web?
[09:44:47] MikeTaylor | edsu: something fuzzier. Any solution that begins with “Hey, let’s just get everyone to input everything with rigour!” is not a solution.
and a very interesting conversation (which is indeed the kind of thing that I wanted to spark):
[09:45:41] mib_nch3tkl0 | also, librariny q…since freebase queries things like wikipedia, how do we verify info?
[09:46:11] akorphan | What if you get two sources that provide different dates for King Lear?
[09:46:17] akorphan | how do you discriminate?
[09:48:14] jbrinley | akorphan: you look at the sources used for those sources, same as you would with books/encyclopedias
[09:48:23] jbrinley | akorphan: ultimately, it comes down to trust
[09:49:10] akorphan | Sure, but the predominant convention for trust on the web at this time is amount of linking, which is a bit suspect.
[09:50:29] jbrinley | akorphan: you’re welcome to use your own methods to determine who you trust on the web
[09:50:56] MikeTaylor | Any solution that begins “you are welcome to use your own methods to …” is not a solution.
[09:51:21] harmless | books have that problem too. a lot of questionable stuff gets printed. a lot of journalists write articles from too few research papers with too small samples or too preliminary, etc.
[09:51:24] harmless | do we have any standard metric for information quality?
[09:51:26] jbrinley | MikeTaylor: but it’s the same solution that we already have for print
[09:51:57] akorphan | jbrinley: But the notion here is that you’re trying to make some kind of automated knowledge aggregator
[09:52:05] MikeTaylor | jbrinley, I thought we were trying to IMPROVE on what we already have.
[09:53:28] jbrinley | MikeTaylor: yes, improve it. But don’t say that the Internet is worse than print just because you don’t have better solutions for certain problems
Then I showed the audience what we’re doing to improve the rate of contribution to Freebase, things like Typewriter, Genderizer and Geographer. These are ‘games with a purpose’ built on and powered by Acre (Freebase’s application platform) that aim to make it easy for people to contribute data to Freebase. Here another set of dissonance tones was struck
[09:53:24] dchud | i’m not convinced that crowdsourcing is necessarily different from Gale or whoever paying staff to edit reference sources
[09:54:17] BillDueber | Great. Votes by people who don’t know what the hell they’re talking about. I can get that now on IRC.
[09:55:14] paulalbert | wisdom of crowds only works when there’s no echo chamber effect
[09:57:42] skoczko | no one is gonnna spend time writing down relations and dependencies that seem obvious to him, unfortunately those relations are not obvious for the machine
[09:58:18] Baroquem | Not boredom. It’s fun to bring order to chaos.
[09:58:35] * JodiS has a great desire to bring info together
and last but not least
[09:58:19] akorphan | I just can’t get 100% behind the “factual by majority assertion” model of authority.
[09:58:55] JodiS | akorphan: yup, that’s a big deal. “The majority is always wrong” (or whatever Ibsen said)
which deserves a separate blog post.
So, partially to set the record straight and partially to apologize, I have made my slides available, all three parts, including the last one that I couldn’t deliver.
Enjoy.