Archive for 2005

Piggy Bank, Cocoon and the Future of the Web

October 2nd, 2005

Today, after many months of work and a ton of source code inspected, traced and evaluated, we released Piggy Bank 2.1.0.

I’m very proud of this work and very happy to be part of it: it is, to me, the first significant step into a bright future and it got me closer to the mozilla architecture, which, I have to admit, is a pleasure to work with (especially now that we found a way to write XPCOM components in java and therefore we have a ton of existing libraries to use pretty much for free).

Just before last weekend, during my final Piggy Bank wrap-ups, I sent an email to the Cocoon development mailing list airing my concerns: the web is slowly but surely changing. Some call it the Web 2.0, some call it Ajax, some call it “told you!” and some call it “so what?”, but the truth of the matter is that web services are coming and their impact has very little to do with the protocols or architectural decisions you make, and a lot to do with the number of people you manage to catalyze.

Sylvain was the only one who explicitly uncloaked my intent: Cocoon is clearly not obsolete and it won’t be for a while, but it’s fat and sleepy, kinda watching TV (if you allow me) instead of going out exercising. Before I move on, I wanted to trigger a wake-up call.

At the heart of Piggy Bank, there is a web server running inside your web browser. It’s running a servlet, a minimal RESTful framework that David wrote called Flair (modeled after what Mark did for Longwell 1). It’s so simple it’s actually (to my web framework architect’s eyes) embarrassing, yet it does the job: Piggy Bank’s webapp (actually Longwell 2) is fully RESTful, no session, no state, no continuations, everything is passed back and forth urlencoded and urldecoded (yes, this creates issues, but that’s another story).

Since we know that Piggy Bank runs only on Firefox, we can go crazy with DHTML and know that it will work. Ajax is used as client side include, and you can even do templating and XML pipelining directly on the client. With Firefox 1.5, even the need for graphics on the server side is gone, SVG and canvas are embedded, scriptable and fully merged with the browser, no need for the amazing SVG->PNG functionality cocoon offers… also because, guess what, David cloned it with a little servlet called Picto that we now use to have our own color-coded Google Maps placeholders.

All of this, in a fraction of the space and complexity: the entire Piggy Bank, web server + database + full text indexer + webapp framework + template system + RDFizing framework + firefox extension + icons is 4.5 MB… and it’s not even stripped down (if I really wanted to, I could get it down to 1.5 MB by using ProGuard but I really don’t see the reason for it).

One thing that I miss from Cocoon is the sitemap, but its cost (in terms of megabytes and complexity) is way too high. Besides, since we use Jetty’s own APIs and not the Servlet web.xml (yes, I know, you think I completely lost my mind at this point, being one of the people that designed that Servlet API web.xml in the first place), we were able to reuse a lot of Jetty’s internals as a web server, reducing the amount of work we have to handle ourselves.

So, in short: all REST, state is never temporarily saved but always transferred until persisted, AJAX pretty much everywhere, a minimal servlet that translates a request into a different action handler doing the urlencoding and decoding (the controllers, one per command, in java), RDF as the model and velocity templates as the view. No pipeline, no multimodality, no XML awareness, no continuations, no sessions. Piggy Bank has, on the server side, the architectural appeal of a CGI-BIN and yes, after 7 years spent designing web application frameworks, I know that to be an insult.
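
To make that shape concrete, here is a minimal sketch of the same idea: one handler per command, with every bit of state decoded from the URL and encoded back into the next link. It is written in Python for brevity and is not the Flair code (which is Java); the command names `browse` and `save`, the parameter names and the `dispatch` function are all assumptions made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def browse(params):
    """Hypothetical 'browse' command: all state arrives in the query string."""
    tag = params.get("tag", ["all"])[0]
    page = int(params.get("page", ["1"])[0])
    # The link to the next page carries the full state forward in the URL,
    # so nothing has to be remembered on the server between requests.
    next_url = "/browse?" + urlencode({"tag": tag, "page": page + 1})
    return 200, {"tag": tag, "page": page, "next": next_url}


def save(params):
    """Hypothetical 'save' command: the only place where state would be persisted."""
    return 200, {"saved": params.get("uri", [""])[0]}


# One controller per command, looked up by the first path segment.
COMMANDS = {"browse": browse, "save": save}


def dispatch(url):
    """Decode the URL, pick the handler for the command, and call it."""
    parts = urlsplit(url)
    command = parts.path.strip("/").split("/")[0]
    handler = COMMANDS.get(command)
    if handler is None:
        return 404, {"error": "unknown command: " + command}
    # parse_qs takes care of the urldecoding of every parameter.
    return handler(parse_qs(parts.query))
```

Calling `dispatch("/browse?tag=rdf&page=2")` returns a response whose `next` link already contains the state for page 3; no session object is involved anywhere.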

But the overall result is incredible: light and simple on the server, light and simple on the client. Very easy to learn, very easy to adjust incrementally (once you polish up all the memory leaks and fine tune the database indexes for performance, as we now did).

As many on the (long) thread that started on dev@cocoon.apache.org mentioned, pretty much nobody has the luxury today of owning both the client and the server in a web application. But with Piggy Bank, we do! And, let me tell you, not only does it feel great and refreshing, it makes you rethink the entire web: a web where the ability to influence the clients is not locked in some vault in Redmond.

Yeah yeah, sure, IE is a long way from being gone, and so is Microsoft, but it’s not just the Mozilla Foundation and radical web standard activists pushing for Firefox anymore: it’s already on the radar of too many web sites, forcing IE to compete. And competition is good, especially when there are players like Yahoo, Google, Amazon and eBay that will not just watch from the sidelines the battle for the new web-service-powered ‘man in the middle’.

Netscape’s plan of transforming the browser into a desktop, making the operating system basically an obsolete concept, was called “Constellation”. Many believe this plan was what made Microsoft kill Netscape by all possible means. When the mozilla developers decided to rewrite Netscape Navigator from scratch, many thought they were crazy in building their own browser inside the HTML rendering environment itself. The ultimate flexibility syndrome.

But today, years and thousands of bugs later, mozilla is a platform capable of delivering a client side tool used by millions. I can’t think of another client side application framework, other than MFC, that was able to achieve such tremendous success. Java JFC (aka Swing) failed long ago to deliver that promise on the client (its look and feel is awkward, and it is unreasonably slow and massively overdesigned) and Eclipse RCP needs to take the Mac seriously before anybody will start using it for real (and yeah, the JVM needs to go to the gym with cocoon too!). Cocoa is great, but it’s another lock-in and the only reasonable answer to anyone who gives kids cigarettes for free today is “No, thanks”, especially since Apple will very likely kill the need for your little cool app in the next release of the OS anyway, and, no, you don’t get a piece of that money.

So, what does it mean for the future of the web?

I honestly don’t know, that thread was just a way to shake people up. All I know is that I’m proud of what I did for the web, first on JServ for Java and Servlets, then on Cocoon for XML and web frameworks that could deliver the web promise and scale to the number of people involved, and now for SIMILE and the semantic web, of which, if you ask me, this Web 2.0 buzz is just the very beginning.

When people ask me what I do for a living, I say that I research what the web of the future could be. At that point, they ask me to give them an example of what that would mean for them. My usual reply is “if we are successful, the only difference you’ll perceive is that you won’t feel as constantly lost as you feel today”. At that point they smile, happy to meet a technologist who thinks it’s his fault, not theirs, if they can’t do something with his software.

No matter what technology or platform we build the future of the web upon, we need to learn how to write the software that delivers those smiles: anything short of that will be a failure.

 

Data First vs. Structure First

July 28th, 2005

Some people find the act of categorizing and abstracting natural and rewarding, others find it frustrating and unnecessary.

The problem with information technologies is that computer programmers are likely to fall in the first category and users of such programs are likely to fall into the second one.

For example, the file system.

Files and folders are concepts that were invented by people that were managing tons of paper information and, because of that, they liked categorizing and abstracting… normal users don’t! Look at the average car, desk, room, closet… are they all perfectly ordered and structured, labeled and well categorized?

If it’s easier/faster to find something in 8 billion pages of the web than in your own 100 thousand files (including documents, emails, pictures, etc.), there’s a problem.

Another example, spreadsheets vs. databases.

Both are based on tables, rows being the items, columns being their attributes. Both allow relationships, yet spreadsheets feel suboptimal and amateurish to the average database guy and databases feel obnoxiously rigid to the average spreadsheet user. Spreadsheets (not databases!) were the reason why people spent $10K on a personal computer in the early 80’s. No wonder IBM didn’t see that coming.

Yet another example: blogs vs. CMSs.

When you blog, you don’t tell the blog where to put it. You just write, you blog. When you write a diary, you don’t pick a random page and then write an index to indicate where to locate that item. You just pick up from where you left off. Some people like to categorize their blog posts, some don’t. Some people decide what goes in their feeds, and some others allow you to have an RSS feed of a particular query.

See the pattern?

Data First strategies have higher usability efficiency (all else being equal) than Structure First strategies.

The reasons are not so obvious:

  1. Data First is how we learn and how languages evolve. We build rules, models, abstractions and categories in our minds after we have collected information, not before. This is why it’s easier to learn a computer language from examples than from its theory, or a natural language by just being exposed to it instead of knowing all rules and exceptions.
  2. Data First is more incrementally reversible: complexity is added to the system more gradually and is more easily rolled back.
  3. Because of the above, Data First’s Return on Investment is more immediately perceivable, which makes it easier to bootstrap.

But then, one might ask, why is everybody so obsessed with design and order? Why is it so hard to believe that self-organization could be used outside the biological realm as a way to manage complex information systems?

One important thing can be noted:

On a local time-scale and once established, “Structure First” systems are more efficient.

This basically means that in any given instant and with infinite energy to establish them, structure first systems are preferable. The problem is that both the bootstrapping costs and the capacity to evolve over time of any given designed system are endemically misjudged (costs underestimated, capacity to evolve overestimated), making pretty much any ‘Structure First’ project appear more appealing than a ‘Data First’ one, at least at design time.

But there is more: we all know that a complete mess is not a very good way to find stuff, so “data first” has to imply “structure later” to be able to achieve any useful capacity to manage information. Here is where things broke down in the past: not many believed that useful structures could emerge out of collected data.

But look around now: the examples of ‘data emergence’ are multiplying and we use them every day. Google’s PageRank, Amazon’s co-shopping, Citeseer’s co-citation, del.icio.us and Flickr co-tagging, Clusty clustering: these are all examples of systems that try to make structure emerge from data, instead of imposing the structure and expecting people to fill it up with data.
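
As a toy illustration of structure emerging from data, the sketch below counts how often two tags are applied to the same item and uses those counts to suggest related tags. It is a deliberately naive example of co-tagging, not a description of how del.icio.us or Flickr actually compute anything; the function names and the sample items are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def co_tag_counts(items):
    """Count, for every pair of tags, how many items carry both of them.

    `items` maps an item identifier to the list of tags users gave it.
    Nobody declared any relationship between tags up front: the pairs
    (and their weights) come out of the data alone.
    """
    counts = Counter()
    for tags in items.values():
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(set(tags)), 2):
            counts[(a, b)] += 1
    return counts


def related_tags(counts, tag):
    """Return the tags that co-occur with `tag`, most frequent first."""
    related = Counter()
    for (a, b), n in counts.items():
        if a == tag:
            related[b] += n
        elif b == tag:
            related[a] += n
    return related.most_common()


# Hypothetical sample data: three tagged photos.
items = {
    "photo1": ["rome", "travel", "italy"],
    "photo2": ["rome", "italy"],
    "photo3": ["travel", "boston"],
}
counts = co_tag_counts(items)
suggestions = related_tags(counts, "rome")
```

With this sample, “italy” comes out as the tag most strongly related to “rome”, even though no one ever wrote that relationship down.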

Some believe that the semantic web is an example of ‘structure first’ but it’s really not the case. Yet many, many people truly believe that in order to be successful a ‘Structure First’ design (well, “ontology first” in this case) is the way you build interoperability.

As you might have guessed, I disagree.

I think that RDF is a good data model for graph-like structures and that complex, real life systems, tend to exhibit graph-like structures. I also believe that the value is not in the ontology used to describe the data but in the ability to globally identify (and isolate) information fragments and in the existence (or lack thereof!) of relationships between them.

Don’t get me wrong, some common vocabularies (RDF, RDF Schema and Dublin Core) go a long way in reducing the bootstrapping effort and making basic interoperability happen. At the same time, I believe people will “pick cherries” in the ontology space and when they don’t find anything satisfying they will write their own. Sometimes use and abuse will be hard to tell apart, creating a sort of Babel of small deviations that will have to be processed with a ‘Data First’ approach in mind. An immune system will have to be created, trusted silos established, peer review enforced.

Next time you spend energy writing the ontology, or the database schema, or the XML schema, or the software architecture, or the protocol, that ‘foresees’ problems that you don’t have right now, think about “you ain’t gonna need it”, “do the simplest thing that can possibly work”, “keep it simple stupid”, “release early and often”, “if it ain’t broken don’t fix it” and all the various other suggestions that tell you not to trust design as the way to solve your problems.

But don’t forget to think about ways to make further structure emerge from the data, or you’ll be lost with a simple system that will fail to grow in complexity without deteriorating.


Welkin 1.1

July 14th, 2005

A lot of people (myself included) find RDF (especially its RDF/XML representation) very verbose and not really meant for humans. At the same time, it happens (at least to me, but I bet it will be more and more common for you too in the near future) that you get an RDF model and you want to ‘take a look at it’.

It was Microsoft that introduced the “tree view” for an XML document in IE 5.0 as a way for people to ‘take a look’ at the XML documents that were starting to circulate. It seemed trivial but I found it to be more and more useful (so much so that we cloned it in Cocoon for those browsers that didn’t support it).

The XML model is inherently a tree, so a tree view makes perfect sense as the maximum common denominator. But the RDF model is inherently a graph, way more complex than a tree and a ‘tree view’ of an RDF model feels much like looking at the shadow of an object to understand what it really is! A pain.

This is why Paolo and I wrote Welkin, the “IE XML tree view”-equivalent for RDF. And since we were at it, we added a few really nice features, like filtering by degree distribution of nodes, fisheye zooming, URI clustering and so on.
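
To give a rough idea of what filtering by degree means on an RDF graph, here is a minimal sketch that treats a model as a plain list of (subject, predicate, object) triples, counts how many statements each node takes part in, and keeps only the statements whose two ends fall within a degree range. This is not Welkin’s code; the function names, the tuple representation and the choice to count literals as ordinary nodes are assumptions made for illustration.

```python
from collections import Counter


def node_degrees(triples):
    """Count how many statements each node appears in, as subject or object."""
    degrees = Counter()
    for subject, _predicate, obj in triples:
        degrees[subject] += 1
        degrees[obj] += 1
    return degrees


def filter_by_degree(triples, min_degree=1, max_degree=None):
    """Keep only the statements whose subject and object both have a degree
    within [min_degree, max_degree]; max_degree=None means no upper bound."""
    degrees = node_degrees(triples)

    def keep(node):
        d = degrees[node]
        return d >= min_degree and (max_degree is None or d <= max_degree)

    return [t for t in triples if keep(t[0]) and keep(t[2])]
```

Raising `min_degree` hides the sparsely connected fringe of the graph, while lowering `max_degree` hides the hubs that would otherwise dominate the picture.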

Welkin comes out of the work for Genius (that we presented at IDAMAP in 2004) drawing upon my Agora community visualization tool (that I very recently updated and now contains all the public ASF mailing lists, including the incubated ones! check it out!) and applying it to the visualization of gene networks. We understood that pretty much everything that was a graph exhibited similar properties and that by using RDF as the underlying model, we could create a very general tool, separating the concern of data production (and rdf-ization) from the data consumption (including exploration, visualization and analysis).

There is a lot of work still to do on Welkin, but I find it already very useful.

Enjoy.