ODF vs. OOXML - the way I see it

Written by John Carroll, Contributor

In a blog post on Friday, I asked in the title whether ODF has sufficient detail after discovering a post by Miguel de Icaza on the subject of ODF vs. OOXML. The point from that post I keyed on was that 6,000 pages isn't a problem so long as those 6,000 pages actually have detail relevant to saving and describing office documents. Miguel de Icaza, by his reading, thinks there is information in OOXML that has never been seen before, and he finds that worthy of merit. Whether or not he thinks, personally, that OOXML should be an ISO standard is a separate question.

I dove into the ODF debate over a year ago with a series of blog posts questioning the wisdom of mandating a new standard completely unrelated to the one that currently exists, however de facto its status. It's a bit like a UN Emergency Assistance team going to a country whose staple food is rice and distributing bags of bleached flour, or a government standards group mandating train track gauges of four feet even though 99% of the market is built with a gauge of four and a half.

ODF isn't a bad standard. It seems to be built on a lot of public standards ratified by the W3C, and I've not found anything that strikes me as irrational from a technical standpoint. It's just something out of left field, market-wise, and that's a weird thing for a new standard to do. It would be somewhat akin to a standards committee dictating that all software written in a particular state use the Smalltalk programming language, even though C-family languages (C++, C#, Java) are vastly more common and developers knowledgeable in Smalltalk are few and far between.

But, I don't expect fans of ODF to agree with that, either. So, let's step back a minute and consider why this whole process got started. Governments such as Massachusetts started a process that resulted in the choice of ODF because they were looking for a document format that could be understood in the distant future, long after a particular program that created a document had passed to that great big software dustbin in the sky. In other words, governments wanted a format suitable for archival that was well defined and accessible to anyone who wished to use it.

ODF is clearly well-defined, and its backers had the foresight to initiate the official standardization process long before Microsoft did. In other words, it makes a decent archive format, so long as documents are expressed in ODF.

What Mr. de Icaza's blog makes clear, however, is that ODF does have functionality gaps. Things like spreadsheet formulas have not been standardized. Yes, you can save spreadsheet formulas in ODF, but since there is no official definition of the syntax those formulas should take, you end up with a formula the interpretation of which is application specific. If the ODF standard as it pertains to spreadsheet formulas could be compared to a vase, it would be like putting flowers in the vase in one instance and a bowling ball in the other and claiming the vase was used for the same purpose.
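To make the formula problem concrete, here is a minimal sketch in Python. It is purely illustrative: the function `eval_sum` and the two "applications" are hypothetical, and the separator rules are assumptions chosen for the example, not the documented behavior of OpenOffice, Microsoft Office, or any other product. The point is only that one stored string can yield two different results when the syntax is left to the application.

```python
def eval_sum(formula: str, arg_sep: str, decimal_sep: str) -> float:
    """Evaluate a '=SUM(...)' string under one application's syntax rules.

    arg_sep and decimal_sep stand in for the conventions a hypothetical
    application assumes when the file format itself does not define them.
    """
    if not (formula.startswith("=SUM(") and formula.endswith(")")):
        raise ValueError("only =SUM(...) is handled in this sketch")
    inner = formula[len("=SUM("):-1]
    # Split the argument list using this application's separator.
    args = [a.strip() for a in inner.split(arg_sep) if a.strip()]
    # Normalize this application's decimal mark before converting.
    return sum(float(a.replace(decimal_sep, ".")) for a in args)


# The same formula text, exactly as it might be stored in a document.
stored = "=SUM(1,5)"

# Hypothetical application A: comma separates arguments, dot is the decimal mark.
app_a = eval_sum(stored, arg_sep=",", decimal_sep=".")

# Hypothetical application B: semicolon separates arguments, comma is the decimal mark.
app_b = eval_sum(stored, arg_sep=";", decimal_sep=",")

print(app_a, app_b)
```

Application A reads two arguments and adds them; application B reads a single decimal number. Both "opened the file" successfully, yet they disagree on what the document says, which is exactly the risk when the formula syntax is not part of the standard.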

One could convert everything to what OpenOffice uses for formulas, but given the lack of an official standard, that's as credible as putting formulas as-is from an existing Microsoft Office document. David Leigh (a respondent in the talkbacks to my last blog post) noted efforts towards a standard formula specification (named OpenFormula), which is good. It doesn't change the fact that such a thing doesn't exist now, however, nor that there is no clearly defined completion date for that process.

But, let's pretend all the functionality gaps in ODF get closed in the next 5-10 years. Let's assume the full range of functionality found in a typical Microsoft Office document will get standardized in some form as part of ODF, resulting in a complete format that can handle pretty much any kind of functionality you might want to include in an office document.

How is that going to help to archive the BILLIONS of documents humans have made over the last 15+ years, a period when Microsoft's Office software was the de facto standard for creating office documents?

It's great that people have tried to devise what they believe to be an "ideal" office document format. The problem, however, is that for the past few decades most people have been using an application made by a company in Redmond. Given that product's need to compete with offerings from other companies, some of which at one time were themselves the market leader (WordPerfect, Lotus 1-2-3), it has also absorbed information relevant to accurate representation of data in those formats. In other words, compatibility in Office stretches beyond the confines of Microsoft's own document formats.

That knowledge and history is included in OOXML. Some have pointed out that OOXML is the format used in Office 2007, and as such, is far from widespread given that the product in which it is used was only officially released at the end of January, 2007. That's beside the point. Microsoft has a strong interest in making sure older documents display properly in the latest incarnation of its leading Office suite. Though OOXML is an update on XML document format work started with Office 2003, it is the "inheritor" of a long history of document formats stretching back to the binary days.

It shouldn't be contentious to state that OOXML is going to have a far easier time than ODF representing older Office document formats, or even legacy formats from some of Microsoft's competitors (WordPerfect or Lotus 1-2-3). It's a format that was DESIGNED to express such documents, along with a range of functionality that currently isn't encompassed by ODF.

If the goal of a standard document format is to be able to properly archive ALL documents and not just those made as of mid-2006 and saved to the ODF format, then OOXML should be considered rather important. It's worth noting that one of the co-sponsors of Microsoft's submission to ECMA was the British Library, an organization with a strong interest in the archiving of digital documents. It also might help to explain why OOXML requires 6,000 pages. A format that has a history that stretches back 20 years is going to have a range of functionality and supported inclusions that ODF, as a newer format, does not have.

Even if you are a fan of ODF, I think it should be obvious that ODF is going to have a heck of a time capturing all the information contained in billions of existing office documents. Given that ODF is certainly not "complete" (in the sense of having all the functionality of a typical Microsoft Office document, a point noted numerous times in the Talkbacks to my previous blog post), why not capture those documents in a format that has been designed to capture them, provided that format is sufficiently documented? 19 contradictions were lodged by participants in the ISO standardization process? Fine. That's what the process is for.

Over time, ODF might figure out ways to pull more information out of OOXML and put it into whatever standard formats ODF settles on in the future. In the interim, why try to slam square pegs into round holes?

To my mind, OOXML and ODF represent two different approaches to office document technology. One is evolutionary, and the other is revolutionary. Lest people rejoice in my calling ODF "revolutionary," note that revolutions can result in great changes that benefit humanity as a whole, or regimes that kill their own people with revolutionary zeal. Evolution has merit as an approach to change.

The battle between ODF and OOXML also implies that there is no settled opinion as to the proper way to handle office document formats, much less the functionality which should be included therein. That's important in standards, as they are supposed to be something that is relatively set in stone. Since there is no general consensus on the right way to do office document formats, why not have both working alongside each other, if not competing to see which manages to serve human interests better? Archivists get what they want, and both sides can continue to work to improve their respective standards.

At some point in the distant future, we may even manage to settle on a single standard that provides the best of both worlds (and given the documentation rigor of a standardization process, cross-pollination becomes more likely). For now, we would have something that leverages the full range of the existing ecosystem (OOXML) as well as something that takes a stab at a new direction (ODF).

Anyway, that's MY opinion. And since I'm speaking on a rather controversial topic while employed by Microsoft, it's worth reiterating that though I WORK for Microsoft, I do not SPEAK for the company. These are my opinions, wholly my opinions, and nothing but MY opinions.
