A word from Rob Pike
At the time Microsoft announced its intention to re-invent PICK in the Longhorn file system, it also saw XML as a programming language for the web, and therefore saw the combination as a means of imposing structure on the disorder of the typical PC disk and of the internet: in effect, a Yahoo-style, classification-based solution.
When Rob Pike, co-creator of Plan 9 and one of the true gurus of both Unix and C, did a web interview on Slashdot, he was working with Google and touched on this set of issues. Here's part of what he said:
This is not the first time databases and file systems have collided, merged, argued, and split up, and it won't be the last. The specifics of whether you have a file system or a database is a rather dull semantic dispute, a contest to see who's got the best technology, rigged in a way that neither side wins. Well, as with most technologies, the solution depends on the problem; there is no single right answer.
What's really interesting is how you think about accessing your data. File systems and databases provide different ways of organizing data to help find structure and meaning in what you've stored, but they're not the only approaches possible. Moreover, the structure they provide is really for one purpose: to simplify accessing it. Once you realize it's the access, not the structure, that matters, the whole debate changes character.
One of the big insights in the last few years, through work by the internet search engines but also tools like Udi Manber's glimpse, is that data with no meaningful structure can still be very powerful if the tools to help you search the data are good. In fact, structure can be bad if the structure you have doesn't fit the problem you're trying to solve today, regardless of how well it fit the problem you were solving yesterday. So I don't much care any more how my data is stored; what matters is how to retrieve the relevant pieces when I need them.
Grep was the definitive Unix tool early on; now we have tools that could be characterized as `grep my machine' and `grep the Internet'. GMail, Google's mail product, takes that idea and applies it to mail: don't bother organizing your mail messages; just put them away for searching later. It's quite liberating if you can let go your old file-and-folder-oriented mentality. Expect more liberation as searching replaces structure as the way to handle data.
From the big-picture perspective, what's important about this is the implied preference for unstructured data, because any classification imposes its own limits.
Look closely at what Microsoft is doing with XML now and you'll see a contrast with what the Cocoon people are doing. Microsoft's approach is easier to understand and consistent with classification-based text processing practice, but it is likely to be dead-ended by its lack of flexibility in the face of vast amounts of data and differing user agendas. In contrast, Cocoon's use of XML seems to be getting ever closer to the original point of the specification: its use as a flexible markup language for transmitting format information, rather than as a structuring tool describing or defining the information itself.