White Papers and Presentations

2010 - International Conference on Knowledge Discovery and Data Mining - Washington, D.C., USA
Shailesh Kochhar, Stefano Mazzocchi and Praveen Paritosh
The anatomy of a large-scale human computation engine
In this paper we describe Rabj, an engine designed to simplify collecting human input. We have used Rabj to collect over 2.3 million human judgments to augment data mining, data entry, and curation tasks at Freebase over the course of a year. We illustrate several successful applications that have used Rabj to collect human judgment. We describe how the architecture and design decisions of Rabj are affected by the constraints of content agnosticity, data freshness, latency and visibility. We present work aimed at increasing the yield and reliability of human computation efforts. Finally, we discuss empirical observations and lessons learned in the course of a year of operating the service.
» [full citation]: Kochhar, S., Mazzocchi, S., and Paritosh, P. 2010. The anatomy of a large-scale human computation engine. In Proceedings of the ACM SIGKDD Workshop on Human Computation (Washington DC, July 25 - 25, 2010). HCOMP '10. ACM, New York, NY, 10-17. DOI=http://doi.acm.org/10.1145/1837885.1837890
2009 - code4lib 2009 - Providence, RI, USA
Stefano Mazzocchi
[keynote] A bookless future for the libraries?
By looking at the past we can gain insights about the future. Each irreversible transition in information technology over the last 10 thousand years has brought both innovation and disruption. We'll retrace the steps of mankind, analyzing each transition for its social impact and its economic significance. Then, we'll try to forecast how technologies that were invented over the last century, and those being invented today, might reshape the institution of the library. [This last part will feature live and interactive demos based on my own and others' research on digital libraries.] Finally, we'll show how librarian skills can be put to use even in a bookless future.
2007 - ApacheCON EU, 2007 - Amsterdam, The Netherlands
Stefano Mazzocchi
A no-nonsense introduction to "semantic web" technologies
The so-called "Semantic Web" is a vision for an evolution of the web where web sites expose data not only for direct human consumption (as is mostly the case today) but also for specific software agents to consume, aggregate and enrich on behalf of humans. In this presentation, I will show an outline of this vision, together with a simple and concise description of each W3C recommendation involved (such as RDF, OWL and SPARQL) and how they are supposed to work together. I will also show the differences and similarities between this and other models for purer data interchange on the web (such as "web 2.0" and "atom/rss") and will demo existing solutions that are based on semantic web technologies.

Stefano Mazzocchi
All you wanted to know about Open Development community building but didn't know who to ask
In this presentation, I will show, explain and analyze several patterns in community building for open development projects that I've come across during my 9 years of involvement in the ASF. The presentation is aimed at those who want to understand how to seed and bootstrap communities as well as those wanting to analyze existing communities for improvement and/or signs of behavioral changes. I will also present some of the software tools that I've written over the years to help with the job of community mentor and observer.
2006 - ApacheCON, 2006 - Austin, TX, USA
Stefano Mazzocchi
A no-nonsense introduction to "semantic web" technologies
The so-called "Semantic Web" is a vision for an evolution of the web where web sites expose data not only for direct human consumption (as is mostly the case today) but also for specific software agents to consume, aggregate and enrich on behalf of humans. In this presentation, I will show an outline of this vision, together with a simple and concise description of each W3C recommendation involved (such as RDF, OWL and SPARQL) and how they are supposed to work together. I will also show the differences and similarities between this and other models for purer data interchange on the web (such as "web 2.0" and "atom/rss") and will demo existing solutions that are based on semantic web technologies.

Stefano Mazzocchi
All you wanted to know about Open Development community building but didn't know who to ask
In this presentation, I will show, explain and analyze several patterns in community building for open development projects that I've come across during my 9 years of involvement in the ASF. The presentation is aimed at those who want to understand how to seed and bootstrap communities as well as those wanting to analyze existing communities for improvement and/or signs of behavioral changes. I will also present some of the software tools that I've written over the years to help with the job of community mentor and observer.
2006 - Keane Excellence and Architecture Program, 2006 - Boston, MA, USA
Stefano Mazzocchi
Toward a web of data
This presentation is about a vision toward a "web of data", an evolution of the current web where machine-processable data can be more easily exchanged, mixed and used. I will explain the design requirements for such a system to come into existence and where already deployed technologies fail to meet them. Then I will introduce a new paradigm based around the 'semantic web' architecture and will show an abundant list of prototypes that my group at MIT and I have built to demonstrate the value of such an architecture.
2006 - ExpertZone Development Summit 2006 - Stockholm, Sweden
Stefano Mazzocchi
Toward a web of data
This presentation shows why the natural evolution of the web is toward a web of data rather than a web of pages. It shows how technologies like RSS/Atom have helped get closer to that goal, how Google is really a huge mashup, and how RDF and SPARQL can help reach that end.
2005 - International Semantic Web Conference - Galway, Ireland
David Huynh, Stefano Mazzocchi, David Karger
Piggy Bank: Experience the Semantic Web Inside Your Web Browser
The Semantic Web Initiative envisions a Web wherein information is offered free of presentation, allowing more effective exchange and mixing across web sites and across web pages. But without substantial Semantic Web content, few tools will be written to consume it; without many such tools, there is little appeal to publish Semantic Web content. To break this chicken-and-egg problem, thus enabling more flexible information access, we have created a web browser extension called Piggy Bank that lets users make use of Semantic Web content within Web content as users browse the Web. Wherever Semantic Web content is not available, Piggy Bank can invoke screenscrapers to restructure information within web pages into Semantic Web format. Through the use of Semantic Web technologies, Piggy Bank provides direct, immediate benefits to users in their use of the existing Web. Thus, the existence of even just a few Semantic Web-enabled sites or a few scrapers already benefits users. Piggy Bank thereby offers an easy, incremental upgrade path to users without requiring a wholesale adoption of the Semantic Web's vision. To further improve this Semantic Web experience, we have created Semantic Bank, a web server application that lets Piggy Bank users share the Semantic Web information they have collected, enabling collaborative efforts to build sophisticated Semantic Web information repositories through simple, everyday use of Piggy Bank.
» [full citation]: David Huynh, Stefano Mazzocchi, David Karger, Piggy Bank: Experience the Semantic Web Inside Your Web Browser, Lecture Notes in Computer Science, Volume 3729, Oct 2005, Pages 413 - 430
2005 - LOTS - Bern, Switzerland
Stefano Mazzocchi
[keynote] Please stand by while we reboot the web...
An historical-fiction novel: this is the story of how your day could be today if software had always been treated like hardware.
2005 - MIT Comparative Media Studies Symposiums - Cambridge, MA, USA
Stefano Mazzocchi
The Free and Open Source Software (FOSS) Movement
Over the last three decades the FOSS movement has taken the software industry by storm, radically changing not only the way people produce and consume software, and the economy around it, but also the perception of quality associated with distributed and volunteer authoring. Having started in software, these collaborative development practices and their innovative licensing models are now spreading to many other domains. In this talk, we'll introduce the philosophical principles that drive the FOSS movement, outline the differences between the various "currents" and shed some light on the socio-economic dynamics they have generated.
2004 - Intelligent Data Analysis in Medicine and Pharmacology (IDAMAP) - Stanford, CA, USA
Paolo Ciccarese, Stefano Mazzocchi, Fulvia Ferrazzi, Lucia Sacchia
GENIUS: a new tool for gene networks visualization
GENIUS is a graphical software tool for visualizing genetic networks composed of a large number of genes. It accepts as input a matrix summarizing gene relationships inferred by means of reverse engineering methods. GENIUS offers two visualization modalities (the Agora style and the TouchGraph style) that can be exploited in a complementary way. It is thus possible to obtain a clear and easily customizable view of one or more networks of interest, which simplifies the exploration of the inferred relationships.
2004 - O'Reilly Open Source Conference - Portland, OR, USA
Stefano Mazzocchi
Darwinian Software Development
This presentation introduces a model for software programming that is borrowed from the theory of evolution proposed by Charles Darwin for biological systems. It bridges the realms of biology and software programming to understand the dynamics of software development and the differences between the closed and open development models.
2004 - 13th International World Wide Web Conference - New York, USA
Ryan Lee & Stefano Mazzocchi
SIMILE: objectives, status and demo
Description of the objectives and status of the SIMILE Project, along with a demo of the current software.
2004 - CIDOC CRM Workshop on "Practices of Knowledge Sharing" - Heraklion, Greece
Stefano Mazzocchi
SIMILE: objectives, status and demo
Description of the objectives and status of the SIMILE Project, along with a demo of the current software.
2004 - DSpace User Group Meeting - Cambridge, MA, USA
Ben Hyde & Stefano Mazzocchi
The why and the how of Open Source Software: perspectives from the Apache Software Foundation
Description of what open source software programming is and why it works, based on our experiences in the Apache Software Foundation.
2003 - ApacheCON 2003 - Las Vegas, USA
Stefano Mazzocchi
How the ASF works
This session will give you everything you always wanted to know about the foundation but were afraid to ask: the difference between membership and committership, who decides what, how elections take place, how our infrastructure is set up, what the board is, what a PMC is, what the philosophy behind the incubator is, and why the foundation is moving away from project containment. Come and see behind the scenes of the ASF.

Stefano Mazzocchi
Past, Present and Future of the Apache Cocoon Project
The Apache Cocoon project is often presented as a big, complex and hyperfunctional piece of software. In this session, we'll follow a totally different view and present Cocoon from a historical perspective, from its beginning to the present day, and outline its possible evolutionary future. Even though Cocoon is heavily based on several XML technologies, the presentation stays at a high level and requires no XML knowledge.

Stefano Mazzocchi
Virtual Community Dynamics
This session presents the experience gained in creating Agora, a software tool for the automatic discovery of social patterns in virtual communities through the harvesting and data emergence of the foundation's email archives. It covers the principles and the software architecture, and shows how to apply the same concepts in other domains and other environments, such as an internal corporate environment or an academic institution, to discover social trends without raising privacy issues (since Agora works only with email headers and metadata, not with email content).
2003 - II Cocoon GetTogether - Ghent, Belgium
Stefano Mazzocchi
What Cocoon is: a Visual Journey
This presentation gives a highly visual outline of how I see Apache Cocoon in my mind.
2002 - Expert-Workshop: "Online Archives - Perspectives on Networked Knowledge Spaces" - Bonn, Germany
Stefano Mazzocchi
The Economy of Distributed Metadata Authoring
This presentation sketches the differences between data and metadata creation, outlining the impact of these differences on the economy of distributed content creation and consumption. It shows how these economic effects might impact both semantically-enhanced distributed technologies and the communities of users of these technologies. As a result, it suggests that economic and social projections can be used as a metric for the feasibility of a proposed technology that involves highly distributed environments.
2002 - Open Source Content Management Systems Conference - Zurich, Switzerland
Stefano Mazzocchi
Enabling Semantic Searching
This presentation outlines the problems of enabling semantic searching in a content-based environment and provides an economically feasible, incremental roadmap to make it possible.
2001 - Thesis for the degree of "Doctor of Electronic Engineering" - Pavia, Italy
Stefano Mazzocchi
Reducing The Effects Of Growth Saturation With The Adoption Of A Publishing Framework Based On XML Technologies
This thesis analyzes the effects of saturation trends that appear during the production of web sites as their size grows and the number of people involved increases. A simple mathematical model that matches such an environment is proposed and used to identify the issues.
Next, the use of software design methodologies applied to work organization backed up by a web publishing framework based on XML technologies is presented as a solution.
Finally, we describe the implementation of such a framework and identify a possible organization layout, indicating what technologies can be used to enforce it in order to reduce the impact of growth saturation and increase productivity and work quality in web site production.
2000 - ApacheCON 2000 Europe - London, UK
Stefano Mazzocchi
Toward The Semantic Web: A View Of XML From Outer Space
This paper presents a view of XML from a great distance and outlines some of the problems that will have to be faced in approaching the so-called "semantic web".
2000 - ApacheCON 2000 - Orlando
Stefano Mazzocchi
Adding XML Capabilities With Cocoon
This paper introduces the notion of XML publishing to the reader and shows how these new paradigms can be applied to existing web solutions by using the Cocoon publishing framework. Basic knowledge of XML technologies is assumed but not necessarily required, since the aim is to show the power of XML publishing to those who are used to HTML-driven models.
1998 - ApacheCON 98 - San Francisco, USA
Stefano Mazzocchi & Pierpaolo Fumagalli
Advanced Apache JServ Techniques
This paper describes advanced uses of the Apache JServ servlet engine. Even if it describes an older version of JServ, the general principles are still of interest.

Stefano Mazzocchi & Pierpaolo Fumagalli
Servlet performance and Apache JServ
This paper presents an architectural-level performance comparison between Java servlets and other widely used web technologies.
1995 - Personal Research
Stefano Mazzocchi
Information Theory of Monochromatic Sound Fields
This article gives a theoretical overview of the information carried by a general monochromatic sound field and of how that information can be retrieved from the field itself. The subject will also be useful to readers interested in problems related to human hearing: the article explains why most of today's psycho-audiology theories fail to completely describe the subject and gives a solid mathematical basis for understanding the process of human hearing, especially regarding localization.

Stefano Mazzocchi
Multiple Buffering Techniques
This article shows how the use of a multiple buffers architecture (MBA) in real-time systems (mostly video games and demos) can reduce wasted CPU cycles and cut down the overhead of managing variable frame complexity.

Stefano Mazzocchi
Theory of Non-Lossy Data Compression
This article provides a theoretical overview of the problems related to non-lossy data compression. The topic is analyzed formally to introduce general concepts such as string entropy. New approaches to the Lempel-Ziv algorithm lead to a new and better RBA (Redundancy Based Algorithm). Theoretical ways of compressing maximum-entropy strings are also discussed.

Stefano Mazzocchi
Wave Effects on Flat Textures
This article provides a short introductory description of wave effects that can be applied to flat textures, such as still images on screen. Due to the complexity of the problem, these considerations cannot be applied to non-flat textures to get 3-D bump effects. I will introduce an easy approach that can reduce the real-time calculation overhead to a few multiplications per pixel with circular wavefronts, or even one multiplication per pixel with linear wavefronts.