
First Impressions on Sig.ma

July 22nd, 2009

Last week I went out for lunch with fellow Italians Giovanni Tummarello and Paolo Bouquet, who happened to be in LA for a conference, and they mentioned their respective projects, Sig.ma and Okkam.

Some of the things we talked about inspired me to write my last post about reconciliation, but I couldn’t link to Sig.ma at the time because it hadn’t been released yet. Giovanni posted about it today and officially released it to the public, so I can now talk about it.

Sig.ma is a sort of metacrawler with an a-posteriori reconciler: when you search for something, say “Barack Obama”, it queries a series of underlying search engines for all the data fragments found on the web about that query (currently it relies mostly on Yahoo! BOSS and Sindice). It then tries hard to merge them all into a single topic. Because this operation can result in unwanted merges, it gives you options to remove certain sources of data (the ones you disagree with or that aren’t quite relevant) and to solidify that collection of data fragments (which they call a ‘sigma’) into a permanent URL, which you can then send around or embed as an iframe in another web page.
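To make that merge-then-prune model concrete, here is a minimal sketch in Python of what an a-posteriori reconciler might do with the fragments it collects. Everything here is hypothetical: the data, the field names, and the `merge` function are mine, not Sig.ma’s actual data structures or API; it just illustrates the idea of merging per-property while keeping provenance around so sources can be removed afterwards.

```python
from collections import defaultdict

# Hypothetical data fragments, shaped like what a metacrawler might
# collect from different sources for the query "Barack Obama".
# (Illustrative only: not Sig.ma's actual data model.)
fragments = [
    {"source": "dbpedia.org",    "property": "birthPlace", "value": "Honolulu"},
    {"source": "dbpedia.org",    "property": "party",      "value": "Democratic"},
    {"source": "example.com",    "property": "birthPlace", "value": "Kenya"},  # an unwanted merge
    {"source": "whitehouse.gov", "property": "title",      "value": "President"},
]

def merge(fragments, excluded_sources=frozenset()):
    """A-posteriori reconciliation: group every fragment by property,
    remembering which source asserted which value, so the user can
    later prune the sources they disagree with and re-merge."""
    sigma = defaultdict(list)
    for f in fragments:
        if f["source"] in excluded_sources:
            continue
        sigma[f["property"]].append((f["value"], f["source"]))
    return dict(sigma)

# First pass: merge everything, unwanted sources included.
print(merge(fragments))

# After the user removes a source, the 'sigma' is recomputed; the
# result is what would get frozen behind a permanent URL.
print(merge(fragments, excluded_sources={"example.com"}))
```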

First, let me say that compared to most semweb-related academic research projects, this stands out as one of those rare cases where the scientists care about providing useful services and not just about publishing papers on potential Utopian scenarios. For that alone, Sig.ma needs to be mentioned and Giovanni and his team praised.

Just like Google Squared, Sig.ma follows a model in which the user searches for something, gathers a vast, noisy collection of more-or-less related resources, then spends a considerable amount of time and effort cleaning up, evaluating the results, and pruning dead branches, and finally condenses that reconciliation effort into a particular URL or set of rules.

Unfortunately, the reconciliation energy spent by each individual on the data periphery (at least for now) can’t easily be used to simplify the job of the next person looking to clean up this data (unlike, for example, when you edit a wiki page or commit a patch to an open source project).

While it’s not hard to imagine ways to surface such information from usage patterns or to harvest it further, my principal worry is that a-posteriori reconciliation efforts clash pretty badly with the cognitive effort a person is willing to expend when looking for something.

When you look for something, you have neither the time nor the will to do an ‘editing’ job and clean up somebody else’s mess. You might be willing to do those things, but at a separate time, not when finding useful information is your immediate goal. (Which, if you think about it, is why Google managed to wipe out the entire set of search engines that existed when it surfaced: PageRank was more effective, and the cognitive effort of sieving through crap felt much smaller on Google than on other search engines like AltaVista or Lycos.)

This is also the reason why the vast majority of people who land on a Wikipedia page from a search engine don’t stop to edit it, and don’t rearrange the ranking of Google search results even when they can: those activities would get in the way of what the user is currently doing.

This is not really a criticism of Sig.ma or Google Squared, which are both fine examples of much-needed, fresh innovation in the field of web search, but of the general approach of solutions that force users down paths that don’t match their state of mind and that, for that very reason, have a hard time collecting human activity. Understanding user intent and creating an interaction design that flows harmoniously with it cannot be an afterthought; it needs to be a firm, a-priori driver of the design of the service.

That said, I’m happy to see Sig.ma surface, if only because its non-purist approach comforts me and is refreshing in a world of semantic web research that is often so purist as to become effectively blind.