Peer Review vs. Citation Network Topology
September 21st, 2004
Academic journals are extremely expensive, and virtually all respectable academic institutions have to subscribe to a number of them, because that’s both where the action is and where academia measures the relevance of work.
Until the advent of the web, publishing to a worldwide audience was a very expensive process, so the need for journals that would accept your papers and publish them was an obvious one.
Then the web came, designed precisely to solve the “high latency” problem of traditional centralized publishing models, and interestingly enough, academia is still massively subscribed to a bunch of journals.
The MIT Libraries have an entire basement full of paper editions of those journals, but now almost all those publishers make their last 20 years or so of papers available online in digital form (normally a PDF, either scanned from the original paper version or generated from the digital document it was printed from).
Now, the first thing that a scholar (like myself) finds extremely frustrating is that there is no search engine that runs across all of those article databases. MIT has a thing called Vera (Virtual Electronic Resource Access) which is a search engine for the e-journals themselves. No, not for the content, for just finding the journal web site based on the title of the journal you want to look for. Yeah, I had the same face when they told me.
So, where do you search for stuff? Pre-prints: pre-prints are the papers that the authors submit to the journals for review. They are still owned by the authors, even if they get accepted for printing. Once the pre-print enters the journal, the copyright gets transferred, and that’s why the journal can decide to make you pay $20 for a single printed copy of the article (that’s how much the ACM or the IEEE can make you pay for an article that can be found in Citeseer or arXiv, just with a much crappier stylesheet applied to it).
But what you are paying for with those journals is not the fancier stylesheets, the proofreading or the bibliographic record normalization they do; it’s the peer review, it’s the filtering.
In the past, journals had limited publishing space and therefore needed a way to rank the submissions they received so that only the highest quality ones would use that limited space. Peer review was established as that process: a panel of recognized experts in the field is given a number of papers, they return each paper with a score, and the journal editors decide what to accept based on those scores.
This is how academia grants credit to its researchers: if you pass peer review, you earn a credit. Some journals are more renowned than others; they yield more credits and more visibility. The more you can publish, the more your peers find your stuff interesting, the more widely the university will be recognized, the more likely students will want to study there, the more money the university makes (or, alternatively, the more money the students are willing to pay, if the number of possible students remains the same).
This is the business model of printed-press-era academia.
Everybody knows this peer-review system is kinda flawed: imagine you are a widely recognized expert in a field, probably running out of money for your faculty and PhD students, and you are asked to rate 20 papers. Do you think you would go through each and every one of them in scrupulous detail, redoing the math and checking whether the claims are meaningful and all that? Unlikely. Much more likely, you would just sort them into three categories: “oh, that could be interesting”, “bah, don’t know” and “c’mon, not again!”.
If your paper happens to be in the “oh, that could be interesting” category of enough of those reviewers, you get accepted.
Now, the problem with this scheme is that peer review does not measure how scientifically valuable a paper is, but how well the researchers were able to stimulate the curiosity of their reviewers and/or impress them with special effects. [here, almost everybody has understood that splashy/flashy graphs and diagrams greatly increase the chance of your paper being published]
Another interesting effect is “name recognition”. Reviewers tend to like names that they recognize because that makes it easier to review [insert picture of nobody gets fired for buying IBM here]. The result is a power law: those who have published a lot are more likely to be published again. Again, this has nothing to do with scientific value, but now people seek out those power-law climbers and add them as co-authors, so that reviewers are more likely to accept their papers.
A similar but slightly different effect is “affiliation recognition”. If you are from MIT you are way more likely to be published than if you submit, say, from some university in North Dakota. Why so? Because it’s far more likely that the filtering process you had to go through to be hired or to be a student at MIT is a lot stricter than the one to enter some other virtually unknown university.
Again, there is no scientific value in this claim, except in a general statistical sense… too bad that research breakthroughs don’t happen like that.
Anyway, the fascinating result of all this is that since peer review is the basis of academic recognition and such recognition drives the academic business model, the information is somewhat “locked down” in the journals’ digital vaults and not made accessible for web crawlers to make use of.
The people at Google know this and they are now running a prototype service on top of CrossRef that indexes 29 journals. Similar services are IEEE’s own Xplore and ACM’s Digital Library.
Another approach is ExLibris’ cross-searching solution MetaLib, which simply runs the query against all the various database search interfaces (where normally they have to screen-scrape the results by hand… [insert here a picture of a ton of perl-scripters locked in a basement in India] and [insert here the picture of the pile of money this system is going to cost you]).
All this fighting, copyright smackdown and publishing balkanization: all because of peer review.
But what happens if a system is found that ranks scientific recognition in a better way than peer review and this system happens to be virtually free for everybody (article consumers and producers)?
What academia used journals for was the ability to distribute worldwide. The web is now a far better way to achieve this, and academia is forcing the journals into the digital domain because hard disks are way smaller and cheaper than basements full of dead trees.
But academia finds itself locked into the symbiosis with the journals because the democracy of the web does not yield a filtering mechanism that generates that scarcity on top of which you can create a business model.
This need for scarcity is the basis on which this higher-education business runs, and it’s also a way for governments, foundations and private institutions to understand who deserves money for further research, so it’s very unlikely to go away.
But it seems to me that academia is at a turning point: inertia is huge for such a worldwide ecosystem, but the web is radically changing how students and young scholars (like me) seek and find information, especially if they have the public web on one side and, on the other, a pile of crappy custom search interfaces on top of very-expensive-for-no-reason content that their university pays for.
The day that you find a more interesting paper in Citeseer than in any IEEE or ACM e-journal is the day you don’t look back.
But what would happen next? What about the academic symbiosis with the peer review system?
Citeseer uses a Google PageRank-like algorithm for ranking: it analyzes the properties of the citation network topology to understand which papers are more influential than others. Just as Google does with hyperlinks between web pages, Citeseer does with bibliographic citations: the result is that peer review is not done by a panel of experts, but by every researcher in the field!
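To make the idea concrete, here is a minimal sketch of a PageRank-style ranking over a citation graph. It is not Citeseer’s actual code: the function name `citation_rank`, the damping factor of 0.85, the fixed iteration count and the toy graph are all assumptions made for illustration.

```python
def citation_rank(citations, damping=0.85, iterations=50):
    """Rank papers with a PageRank-style power iteration.

    `citations` maps each paper to the list of papers it cites.
    The damping factor and iteration count are illustrative assumptions.
    """
    # Collect every paper that appears as a citer or as a citation.
    papers = set(citations)
    for cited_list in citations.values():
        papers.update(cited_list)
    papers = sorted(papers)
    n = len(papers)

    # Start with the rank spread evenly over all papers.
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Papers that cite nothing would leak rank; spread theirs evenly.
        dangling = sum(rank[p] for p in papers if not citations.get(p))
        new_rank = {p: (1.0 - damping + damping * dangling) / n for p in papers}
        # Each paper splits its rank equally among the papers it cites.
        for citing, cited_list in citations.items():
            for cited in cited_list:
                new_rank[cited] += damping * rank[citing] / len(cited_list)
        rank = new_rank
    return rank


# Hypothetical toy graph: A, B and E all cite C; C and E cite D.
example = {"A": ["C"], "B": ["C"], "C": ["D"], "D": [], "E": ["C", "D"]}
ranks = citation_rank(example)
for paper in sorted(ranks, key=ranks.get, reverse=True):
    print(paper, round(ranks[paper], 3))
```

Note that a citation from a highly ranked paper is worth more than one from a paper nobody cites, which is what separates this from simply counting citations.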
It is very unlikely that a scientifically valuable paper will pass unnoticed in such an ecosystem. Here, affiliation and names don’t count: if you are looking for something, you want the value for you; you are not judging the potential interest that the field might find in it. The power-law distribution could be reduced. People would publish less, since the sheer number of publications would no longer count. The practice of publishing the same thing 15 times in 15 different sauces would go away, potentially freeing scientists to go back to doing what they are supposed to do: “thinking” instead of “writing”. It could also remove those co-authorship parasites who appear in almost every paper they can get their nose into, but contribute virtually no scientific value.
Now, is this system free of flaws? Unfortunately not. The first and most important flaw is psychological: people have a natural tendency not to believe in emergent network effects. They don’t think, especially in academia, that a bunch of people can collectively know more than a single world-wide expert. But they use Google every day with pleasure, and Google, interestingly enough, itself runs on collectively built open source software.
The second is that open systems are much more vulnerable to abuse. Google doesn’t disclose this, but I expect them to spend a lot of energy and effort trying to find and remove PageRank abusers who, for example, post a link to their site on wikis, blog comments or any other writeable web page to increase their PageRank. Can we expect immoral scientists to introduce tons of crappy papers under fake names that reference their own, to boost their citation network ranking?
The use of citation networks as a ranking method is not free from problems, but it’s, IMO, a vastly superior approach to discovering scientific value, and it would serve the scientific community in general much better than the current peer review system. Given that the content found on the web is, in some fields, more valuable than what you find in expensive journals, I think those journals (at least in some research areas) are doomed. It’s just a matter of time before people’s research starts to be judged by different criteria that are far more objectively meaningful and less open to personal influence.
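A rough sketch of that attack, under stated assumptions: the `rank` function below is a generic PageRank-style iteration (not Google’s or Citeseer’s real implementation), and the paper names, the damping factor of 0.85 and the count of 20 fake papers are all made up for illustration.

```python
def rank(citations, damping=0.85, iterations=50):
    """Generic PageRank-style scores; `citations` maps paper -> papers it cites."""
    papers = sorted(set(citations) | {c for cited in citations.values() for c in cited})
    n = len(papers)
    score = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Papers with no outgoing citations spread their score evenly.
        dangling = sum(score[p] for p in papers if not citations.get(p))
        nxt = {p: (1.0 - damping + damping * dangling) / n for p in papers}
        for citing, cited in citations.items():
            for c in cited:
                nxt[c] += damping * score[citing] / len(cited)
        score = nxt
    return score


def position(citations, paper):
    """1-based place of `paper` when all papers are sorted by descending score."""
    scores = rank(citations)
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(paper) + 1


# Hypothetical honest network: nobody cites "obscure".
honest = {
    "survey": ["classic"],
    "followup": ["classic", "survey"],
    "obscure": ["classic"],
    "classic": [],
}

# The abuser adds 20 fake papers whose only citation is their own paper.
spammed = dict(honest)
for i in range(20):
    spammed[f"fake{i}"] = ["obscure"]

print("before:", position(honest, "obscure"))
print("after: ", position(spammed, "obscure"))
```

In this toy network the uncited paper jumps from the bottom of the list to the top end of it, even though every fake paper has a negligible score on its own. A real system would need some way to discount citations from papers that nobody else cites.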