Google and the Fractal Nature of the Web

May 13th, 2003

During my usual CiteSeer wanderings, I found an interesting paper on the self-similar nature of the web at different scales. On page 10 it says:

Theorem 1 suggests that the addition of a mere few thousand arcs scattered uniformly throughout the billions nodes will result in a very strong connectivity properties of the web graph!

Voilà the math on how weblogs managed to exploit Google. Hypertextual connections are generally sparse on the regular web and much denser among blogs, so blogs screw up Google's PageRank system by adding hypertextual noise that degrades it. Bloggers usually choose what to link with much less care than the authors of regular hypertext. As a result, those links add less semantic information to the Google database, and humans perceive this as a drop in the quality of PageRank results.
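To make the mechanism a bit more concrete, here is a minimal sketch of the textbook PageRank power iteration. It assumes a damping factor of 0.85 and that pages with no outgoing links spread their rank evenly over all pages; this is the classic published formulation, not Google's actual production system. The page names and link structure are hypothetical. The toy example shows how a small, densely interlinked cluster of pages can push up the rank of a page it points to.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Pages with no outgoing links (dangling pages) spread their rank
    uniformly over all pages, so the ranks always sum to 1.
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Random-jump share: every page gets (1 - damping) / n.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: distribute its rank over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical "regular web": few, carefully chosen links.
regular_web = {
    "news": ["encyclopedia"],
    "encyclopedia": ["news", "university"],
    "university": ["encyclopedia"],
    "blogger_home": [],
}

# The same web plus three hypothetical blogs that all link to each
# other and to blogger_home.
with_blogs = {
    **regular_web,
    "blog_a": ["blog_b", "blog_c", "blogger_home"],
    "blog_b": ["blog_a", "blog_c", "blogger_home"],
    "blog_c": ["blog_a", "blog_b", "blogger_home"],
}

before = pagerank(regular_web)
after = pagerank(with_blogs)
print(f"blogger_home before: {before['blogger_home']:.4f}")
print(f"blogger_home after:  {after['blogger_home']:.4f}")
```

In this toy graph, three cheap, mutually linking pages are enough to roughly double the rank of the page they all point to, even though none of those links says anything meaningful about it.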

In a web where, if it's not on Google, it doesn't exist, my paranoid, world-scale political side is happy to know that their system can be exploited with mathematical precision should the need arise. This will keep them honest when they go public. Or, at least, I hope so. I like Google a lot; I'm just scared by such a huge concentration of power.

The Roman Catholic Church had something called the Index, which listed all the books that were not allowed to be read. Censorship. Even today, in Italian, "essere messo all'indice" (literally: to be put on the Index) means to be censored, ostracized. Those who complained were put on the Index. Negative feedback keeps the system stable.

Now, Google is a private entity, but it is also a US corporation, subject to US laws and US national security terms, and it is now the most powerful news aggregator in the world. How many journalists around the world get their information through Google today? How many will in the future? What happens if Google's owners run for US president? Would you still trust their PageRank system? How would you like Mr. Bush to shape Google's PageRank under post-9/11 national security terms? See the danger?

Yeah, I’m mulderishly paranoid. But how come nobody hears those screams coming from Orwell’s tomb?

But if you can still read this years from now, it means that either Google remained honest or we managed to find alternative ways around their negative feedback filters. So, if you are reading this, either we, the people of the web, won, or there was nothing to worry about. In any case, be happy.