White Papers and Presentations
2010 - International Conference on Knowledge Discovery and Data Mining - Washington, D.C., USA
Shailesh Kochhar, Stefano Mazzocchi and Praveen Paritosh
The anatomy of a large-scale human computation engine
In this paper we describe Rabj, an engine designed to simplify collecting human input.
We have used Rabj to collect over 2.3 million human judgments to augment data mining,
data entry, and curation tasks at Freebase over the course of a year. We illustrate
several successful applications that have used Rabj to collect human judgment. We
describe how the architecture and design decisions of Rabj are affected by the
constraints of content agnosticity, data freshness, latency and visibility.
We present work aimed at increasing the yield and reliability of human computation
efforts. Finally, we discuss empirical observations and lessons learned in the course
of a year of operating the service.
» [full citation]: Kochhar, S., Mazzocchi, S., and Paritosh, P. 2010. The anatomy of a large-scale human computation engine. In Proceedings of the ACM SIGKDD Workshop on Human Computation (Washington DC, July 25 - 25, 2010). HCOMP '10. ACM, New York, NY, 10-17. DOI=http://doi.acm.org/10.1145/1837885.1837890
2009 - code4lib 2009 - Providence, RI, USA
Stefano Mazzocchi
[keynote] A bookless future for the libraries?
By looking at the past we can gain insights about the future.
Each irreversible transition in information technology over the last 10
thousand years has brought both innovation and disruption. We'll retrace
the steps of mankind, analyzing each transition for its social impact and its economic significance.
Then, we'll try to forecast how technologies that were invented over the last century, and those being
invented in the present, might reshape the institution of the library.
[This last part will feature live and interactive demos based on my own and others' research on digital libraries.]
Finally, we'll show how librarian skills can be put to use even in a bookless future.
2007 - ApacheCON EU, 2007 - Amsterdam, The Netherlands
Stefano Mazzocchi
A no-nonsense introduction to "semantic web" technologies
The so-called "Semantic Web" is a vision for an evolution of the web where
web sites expose data not only for direct human consumption (as is mostly
the case today) but also for specific software agents to consume, aggregate
and enrich on behalf of humans. In this presentation, I will show an outline
of this vision, together with a simple and concise description of each W3C
recommendation involved (such as RDF, OWL and SPARQL) and how they
are supposed to work together. I will also show the differences and similarities
between this and other models for pure-data interchange on the web
(such as "web 2.0" and "atom/rss") and will demo existing solutions that are
based on semantic web technologies.
Stefano Mazzocchi
All you wanted to know about Open Development community building but didn't know who to ask
In this presentation, I will show, explain and analyze several patterns in community
building for open development projects that I've come across during my 9 years of involvement
in the ASF. The presentation is aimed at those who want to understand how to seed and bootstrap
communities as well as those wanting to analyze existing communities for improvement and/or
signs of behavioral changes. I will also present some of the software tools that I've written
over the years to help with the job of community mentor and observer.
2006 - ApacheCON, 2006 - Austin, TX, USA
Stefano Mazzocchi
A no-nonsense introduction to "semantic web" technologies
The so-called "Semantic Web" is a vision for an evolution of the web where
web sites expose data not only for direct human consumption (as is mostly
the case today) but also for specific software agents to consume, aggregate
and enrich on behalf of humans. In this presentation, I will show an outline
of this vision, together with a simple and concise description of each W3C
recommendation involved (such as RDF, OWL and SPARQL) and how they
are supposed to work together. I will also show the differences and similarities
between this and other models for pure-data interchange on the web
(such as "web 2.0" and "atom/rss") and will demo existing solutions that are
based on semantic web technologies.
Stefano Mazzocchi
All you wanted to know about Open Development community building but didn't know who to ask
In this presentation, I will show, explain and analyze several patterns in community
building for open development projects that I've come across during my 9 years of involvement
in the ASF. The presentation is aimed at those who want to understand how to seed and bootstrap
communities as well as those wanting to analyze existing communities for improvement and/or
signs of behavioral changes. I will also present some of the software tools that I've written
over the years to help with the job of community mentor and observer.
2006 - Keane Excellence and Architecture Program, 2006 - Boston, MA, USA
Stefano Mazzocchi
Toward a web of data
This presentation is about a vision toward a "web of data", an evolution
of the current web where machine-processable data can be more easily
exchanged, mixed and used. I will explain the design requirements for
such a system to come into existence and where already deployed
technologies fail to meet them. Then I will introduce a new paradigm
based around the 'semantic web' architecture and will show an abundant
list of prototypes that my group at MIT and I have built to
demonstrate the value of such an architecture.
2006 - ExpertZone Development Summit 2006 - Stockholm, Sweden
Stefano Mazzocchi
Toward a web of data
This presentation shows why the natural evolution of the web is toward a web of data rather than a web of
pages. It shows how technologies like RSS/Atom have helped get closer to that goal, how Google is really a huge
mashup and how RDF and SPARQL can help reach that end.
2005 - International Semantic Web Conference - Galway, Ireland
David Huynh, Stefano Mazzocchi, David Karger
Piggy Bank: Experience the Semantic Web Inside Your Web Browser
The Semantic Web Initiative envisions a Web wherein information is offered free of presentation,
allowing more effective exchange and mixing across web sites and across web pages. But without
substantial Semantic Web content, few tools will be written to consume it; without many such tools,
there is little appeal to publish Semantic Web content. To break this chicken-and-egg problem, thus
enabling more flexible information access, we have created a web browser extension called Piggy Bank
that lets users make use of Semantic Web content within Web content as users browse the Web. Wherever
Semantic Web content is not available, Piggy Bank can invoke screenscrapers to restructure information
within web pages into Semantic Web format. Through the use of Semantic Web technologies, Piggy Bank
provides direct, immediate benefits to users in their use of the existing Web. Thus, the existence of
even just a few Semantic Web-enabled sites or a few scrapers already benefits users. Piggy Bank thereby
offers an easy, incremental upgrade path to users without requiring a wholesale adoption of the Semantic
Web's vision. To further improve this Semantic Web experience, we have created Semantic Bank, a web
server application that lets Piggy Bank users share the Semantic Web information they have collected,
enabling collaborative efforts to build sophisticated Semantic Web information repositories through
simple, everyday use of Piggy Bank.
» [full citation]: David Huynh, Stefano Mazzocchi, David Karger, Piggy Bank: Experience the Semantic Web Inside Your Web Browser, Lecture Notes in Computer Science, Volume 3729, Oct 2005, Pages 413 - 430
2005 - LOTS - Bern, Switzerland
Stefano Mazzocchi
[keynote] Please stand by while we reboot the web...
An historical-fiction novel: this is the story of how your day could be today if software had always been treated like hardware.
2005 - MIT Comparative Media Studies Symposiums - Cambridge, MA, USA
Stefano Mazzocchi
The Free and Open Source Software (FOSS) Movement
Over the last three decades the FOSS movement has taken the software industry by storm, radically changing not only the way people
produce and consume software, and the economy around it, but also the perception of quality associated with distributed and volunteer
authoring. Having started in software, these collaborative development practices and their innovative licensing models are now spreading to many other
domains. In this talk, we'll introduce the philosophical principles that drive the FOSS movement, outline the differences
between the various "currents" and shed some light on the socio-economic dynamics they have generated.
2004 - Intelligent Data Analysis in Medicine and Pharmacology (IDAMAP) - Stanford, CA, USA
Paolo Ciccarese, Stefano Mazzocchi, Fulvia Ferrazzi, Lucia Sacchia
GENIUS: a new tool for gene networks visualization
GENIUS is a graphical software tool for visualizing genetic networks composed
of a large number of genes. It accepts as input a matrix summarizing
gene relationships inferred by means of reverse engineering methods.
GENIUS offers two visualization modalities (the Agora style and the
TouchGraph style), which can be used in a complementary way.
It is thus possible to obtain a clear and easily customizable view
of one or more networks of interest, which simplifies the
exploration of the inferred relationships.
2004 - O'Reilly Open Source Conference - Portland, OR, USA
Stefano Mazzocchi
Darwinian Software Development
This presentation introduces a model for software programming borrowed from the theory of evolution proposed by Charles Darwin for biological systems. It bridges the realms of biology and software programming to understand the dynamics of the latter and the differences between the closed and open development models.
2004 - 13th International World Wide Web Conference - New York, USA
Ryan Lee & Stefano Mazzocchi
SIMILE: objectives, status and demo
Description of the objectives and status of the SIMILE Project, along with the demo of the current software.
2004 - CIDOC CRM Workshop on "Practices of Knowledge Sharing" - Heraklion, Greece
Stefano Mazzocchi
SIMILE: objectives, status and demo
Description of the objectives and status of the SIMILE Project, along with the demo of the current software.
2004 - DSpace User Group Meeting - Cambridge, MA, USA
Ben Hyde & Stefano Mazzocchi
The why and the how of Open Source Software: perspectives from the Apache Software Foundation
A description of what open source software programming is and why it works, based on our experiences in the Apache Software Foundation.
» slides [Microsoft PowerPoint - 220Kb]
2003 - ApacheCON 2003 - Las Vegas, USA
Stefano Mazzocchi
How the ASF works
This session will give you everything you always wanted to know about the foundation but were afraid to ask: the difference between membership and committership, who decides what, how elections take place, how our infrastructure is set up, what the board is, what a PMC is, what the philosophy behind the incubator is, and why the foundation is moving away from project containment. Come and see behind the scenes of the ASF.
Stefano Mazzocchi
Past, Present and Future of the Apache Cocoon Project
The Apache Cocoon project is often presented as a big, complex and hyperfunctional piece of software. In this session, we'll take a totally different view and present Cocoon from a historical perspective, from its beginning to the present day, and outline its possible evolutionary future. Even though Cocoon is heavily based on several XML technologies, the presentation will stay at a high level and require no XML knowledge.
Stefano Mazzocchi
Virtual Community Dynamics
This session presents the experience gained in creating Agora, a software tool for the automatic discovery of social patterns in virtual communities through the harvesting of, and data emergence from, the email archives of the foundation. It covers the principles and the software architecture, and shows how to apply the same concepts in other domains and other environments, such as an internal corporate environment or an academic institution, to discover social trends without raising privacy issues (since Agora works only with email headers and metadata, not with email content).
2003 - II Cocoon GetTogether - Ghent, Belgium
Stefano Mazzocchi
What Cocoon is: a Visual Journey
This presentation gives a highly visual outline of how I see Apache Cocoon in my mind.
2002 - Expert-Workshop: "Online Archives - Perspectives on Networked Knowledge Spaces" - Bonn, Germany
Stefano Mazzocchi
The Economy of Distributed Metadata Authoring
This presentation will sketch the differences
between data and metadata creation, outlining
the impact of these differences on the economy
of distributed content creation and consumption.
It shows how these economic effects
might impact both semantically-enhanced distributed
technologies and the communities of the users
of these technologies. As a result, it is suggested
that economic and social projections can be
used as a metric for the feasibility of a
proposed technology that involves highly
distributed environments.
» slides [Microsoft PowerPoint - 108Kb]
» video [Real Video - stream]
2002 - Open Source Content Management Systems Conference - Zurich, Switzerland
Stefano Mazzocchi
Enabling Semantic Searching
This presentation outlines the problems of enabling semantic searching in
a content-based environment and provides an incremental, economically
feasible roadmap to make it possible.
» slides [Microsoft PowerPoint - 182Kb]
2001 - Thesis for the degree of "Doctor of Electronic Engineering" - Pavia, Italy
Stefano Mazzocchi
Reducing The Effects Of Growth Saturation With The Adoption Of A Publishing Framework Based On XML Technologies
This thesis analyzes the effects of saturation trends that appear during
production of web sites as their size grows and the number of people
involved increases. A simple mathematical model that matches such an environment
is proposed and used to identify the issues.
Next, the use of software design methodologies
applied to work organization backed up by a web publishing
framework based on XML technologies is presented as a solution.
Finally, we describe the implementation of such a
framework and identify a possible organization layout, indicating
what technologies can be used to enforce it in order to reduce the
impact of growth saturation and increase productivity and work
quality in web site production.
» slides [StarOffice Presenter - 258Kb]
2000 - ApacheCON 2000 Europe - London, UK
Stefano Mazzocchi
Toward The Semantic Web: A View Of XML From Outer Space
This paper presents a view of XML from great distance and outlines some of
the problems that will have to be faced approaching the so-called "semantic
web".
» slides [xml/xslt - 28Kb (zipped)]
2000 - ApacheCON 2000 - Orlando
Stefano Mazzocchi
Adding XML Capabilities With Cocoon
This paper introduces the notion of XML publishing to the reader and shows how these
new paradigms can be applied to existing web solutions by using the Cocoon publishing
framework. Basic knowledge of XML technologies is assumed but not necessarily
required, since the aim is to show the power of XML publishing to those that are used to
HTML-driven models.
» slides [xml/xslt - 66Kb (zipped)]
1998 - ApacheCON 98 - San Francisco, USA
Stefano Mazzocchi & Pierpaolo Fumagalli
Advanced Apache JServ Techniques
This paper describes advanced uses of the Apache JServ servlet engine. Even
if it describes an older version of JServ, the general principles are still
of interest.
Stefano Mazzocchi & Pierpaolo Fumagalli
Servlet performance and Apache JServ
This paper presents an architectural-level performance comparison
between Java servlets and other widely used web technologies.
1995 - Personal Research
Stefano Mazzocchi
Information Theory of Monochromatic Sound Fields
This article is intended to give a theoretical overview of the information
carried by a general monochromatic sound field, and how that information can
be retrieved from the field itself. This subject will also be useful to readers
interested in problems related to human hearing: it explains why most of
today's psycho-audiology theories fail to completely describe the subject and
gives a solid mathematical basis for understanding the process
of human hearing, especially regarding localization.
Stefano Mazzocchi
Multiple Buffering Techniques
This article is intended to show how the use of a multiple buffers architecture
(MBA) in real-time systems (mostly video games and demos) can reduce wasted CPU cycles
and cut down the overhead due to managing variable frame complexity.
» paper [Microsoft Word - 44Kb]
Stefano Mazzocchi
Theory of Non-Lossy Data Compression
This article provides a theoretical overview of the problems related to
non-lossy data compression. The subject is analyzed in a formal way to introduce general
concepts like string entropy. New approaches to the Lempel-Ziv algorithm lead to a
new and better RBA (Redundancy Based Algorithm). Theoretical ways of compressing maximum
entropy strings are discussed.
» paper [Microsoft Word - 43Kb]
Stefano Mazzocchi
Wave Effects on Flat Textures
This article provides a short, introductory description of wave effects that can be applied
to flat textures, such as still images on screen. Due to the
complexities of the problem, it won't be possible to apply these considerations
to non-flat textures to get bump 3-D effects. I will introduce an easy approach
that can reduce the real-time calculation overhead to a few multiplications per pixel
with circular wavefronts, or even one multiplication per pixel with linear wavefronts.
» paper [Microsoft Word - 79Kb]
Copyright © 1995-2010 Stefano Mazzocchi. All rights reserved.