Of DVDs and documents

In their marketing claims around getting OOXML anointed as an International Standard, Microsoft says that more standards mean greater consumer choice. But sometimes less is more. We only have to examine HD-DVD vs. Blu-Ray to see the consequences of this.

[The opinions expressed here are mine alone, and not those of Google, Inc., my employer.]

Commentary-- The high-definition DVD format war is over. Toshiba's HD-DVD had been slugging it out in the market with Sony's Blu-Ray disc format. Blu-Ray has won, and no one except the creators of HD-DVD is really sorry. I have an HD-DVD player stuck upstairs in a closet (inherited from the previous owner when we moved into our new house), and even I don't care. I never bought any HD-DVDs, you see.

The problem with the battle between HD-DVD and Blu-Ray was that they were, for all practical purposes, identical standards for high-definition video players. Sure, the tech-geeks who cared knew that they were different, but for people who just wanted to watch cinema-quality video at home, having two standards that did the same thing meant only that no one bought any equipment or movies in either format. We've all seen this movie before, you see, so we knew how it ended. Back in the 1980s it was called "Betamax vs. VHS", and I was the proud owner of a Betamax video player back then as well. This time everyone waited until one of the competing standards won.

I do have concerns about a Sony-controlled format becoming the standard, because their "content" division has long been a fan of digital restrictions management, or DRM. But I'm still glad the conflict is over. Having one standard format means all I have to do is wait for the freedom-loving digital underground to break the copy-protection on the format so I can back up my movie purchases, and then it'll be safe to buy equipment and movies. For a while, I suppose, until the next attempt to change the standard format occurs.

The argument over high-definition digital video standards holds a salutary lesson for the world of document formats. Having once owned the dominant document formats, the ubiquitous .DOC, .XLS and .PPT file types, Microsoft is now attempting to force all computer users to standardize on their latest effort. This new format goes by the unwieldy name of Office Open XML (OOXML), a name so confusing that even Microsoft executives often mistakenly call it "Open Office XML", which sounds far more like their Free Software competitor, OpenOffice.

The trouble is, there's already an International Organization for Standardization (ISO) standard document format, the OpenDocument Format (ODF), so Microsoft is trying to get their OOXML format made into an ISO standard too. They don't seem to care that they might break the international standards process, or ISO itself, by doing it. I've already written about this process in my column "The Definition of Insanity", but I haven't written much about the OOXML standard itself, or about the changes it is currently undergoing in order to get it passed as an ISO standard.

If you've ever subscribed to the Microsoft Developer Network, or MSDN as it's commonly known, you'll find the OOXML "standard" document familiar. It's a typical example of MSDN-style technical documentation. It isn't badly written; indeed, for proprietary documentation it's about as good as it gets, but as I've said before of Microsoft documentation, it's fuzzy on the details. It's not a standards document, that is, something from which you can unambiguously create an implementation from scratch without a great deal of trial-and-error testing against the Microsoft version of the same "standard".

A good way to compare it with real standards documents is to examine the Internet Engineering Task Force (IETF) "Request for Comments" (RFC) documents, which are publicly available on the Web. They use key words such as "MUST", "REQUIRED", "SHALL", "SHOULD", "MAY" and "OPTIONAL", whose meanings are defined precisely in RFC 2119, so that an implementor knows exactly which behavior is mandatory and which is optional. The OOXML spec just doesn't use the precision of language that a real specification needs. It was almost certainly written by documentation professionals, not by engineers who actually understand the needs of the implementors of a standard. But of course the real goal isn't to encourage other implementations; it's to bless the one existing Microsoft Office implementation as a standard at whatever cost.
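Because each of these key words maps to a fixed requirement level, a specification written this way can almost be read mechanically. As a rough illustration (not a tool the IETF provides), the Python sketch below pulls every sentence containing an upper-case RFC 2119 key word out of a piece of specification text and tags it with its requirement level. The function name `normative_statements`, the table layout and the naive sentence splitting are illustrative assumptions.

```python
import re

# RFC 2119 key words and their requirement levels. Negated forms are
# listed before their positive forms so "MUST NOT" is not reported as "MUST".
RFC2119_LEVELS = [
    ("MUST NOT", "absolute prohibition"),
    ("SHALL NOT", "absolute prohibition"),
    ("SHOULD NOT", "not recommended"),
    ("NOT RECOMMENDED", "not recommended"),
    ("MUST", "absolute requirement"),
    ("REQUIRED", "absolute requirement"),
    ("SHALL", "absolute requirement"),
    ("SHOULD", "recommended"),
    ("RECOMMENDED", "recommended"),
    ("MAY", "optional"),
    ("OPTIONAL", "optional"),
]


def normative_statements(spec_text: str) -> list[tuple[str, str]]:
    """Hypothetical helper: return (level, sentence) for every sentence
    that contains an upper-case RFC 2119 key word."""
    results = []
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", spec_text):
        for keyword, level in RFC2119_LEVELS:
            # Case-sensitive match: lower-case "may" is ordinary prose.
            if re.search(rf"\b{keyword}\b", sentence):
                results.append((level, sentence.strip()))
                break  # the first matching entry in the table wins
    return results
```

For example, `normative_statements("Clients MUST send a header. Servers MAY ignore it.")` tags the first sentence as an absolute requirement and the second as optional, which is the kind of unambiguous checklist an implementor needs.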

As has been widely reported, OOXML has many technical flaws, which were noted in comments by National Standards Bodies. Ecma International (ECMA, originally the European Computer Manufacturers Association), the front group that Microsoft used to insert OOXML into the ISO process, then produced proposed resolutions to these comments. I've spent the last few weeks going through them to see whether they fixed the original flaws, and it's been a very illuminating task.

In some cases they did resolve the problems; in others they pushed back and claimed there was no flaw in the first place; but for the most part they were remarkably open to adding extra features, which seemed to resolve the issues. I began to realize two things. First, ECMA was willing to say yes to almost anything in order to get OOXML passed as a standard. Second, the things they pushed back on, and said "no" to, were any modifications to the specification that would require a change to the existing Microsoft implementation of OOXML. There were many thousands of pages of comments, so it's possible I missed one, but I couldn't find a single agreed change that would require Microsoft to release even one service pack for Microsoft Office. In fact, ECMA even cited the fact that a change would "break compatibility with existing implementations" as a reason for rejecting it.

An example is illustrative here. The date formats specified in OOXML are flawed. There is too much detail to go into here, but to summarize: different bugs in older Microsoft Excel and Lotus 1-2-3 implementations meant that the original OOXML specification defined two different ways to store a date, with different semantics. The obvious way to fix this for future documents is to specify a single standard date format (ISO 8601 is such a standard) and convert to that format when reading old documents in .XLS format. Oddly enough, this is exactly what the Free Software alternative OpenOffice does when converting to the existing ISO standard, the OpenDocument Format (ODF). ECMA agreed, and added the ISO 8601 format to the list of allowable date formats in OOXML. But they didn't remove the old buggy formats from the specification. They just added one more, with a note that the old formats are "deprecated".

The "change" adopted by ECMA had exactly the properties required by their sponsor. It paid lip service to the principles of ISO standardization, and required no changes to any existing Microsoft code, which will just ignore the new format. Maybe later they'll implement it, maybe not; either way it still fits within the "standard". With standards this low, it's hard not to meet them. But this is a problem for interoperability. Because there's no single mandated date format, every other implementation is forced to replicate the bugs of the past. You can't be sure your implementation will read OOXML files correctly unless it implements the bug, and you have to keep writing out the buggy dates, because you can't be certain that any other implementation will understand the ISO 8601 format. The claimed deprecation is hollow: this bug will live forever. Highly inappropriate for a date bug, if you ask me.
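To make the interoperability cost concrete, here is a minimal Python sketch of what a reader has to do with legacy dates. It assumes the two legacy formats in question are the spreadsheet serial-day systems: the 1900 system, which counts days from 1 January 1900 but, copying a Lotus 1-2-3 bug, treats 1900 as a leap year, and the 1904 system, which counts from 1 January 1904. The function name `serial_to_iso8601` and its `date1904` parameter are illustrative assumptions, not taken from the OOXML text. Note that the phantom 29 February 1900 has to be modelled to get any later date in the 1900 system right.

```python
from datetime import date, timedelta


def serial_to_iso8601(serial: int, date1904: bool = False) -> str:
    """Hypothetical helper: convert a legacy spreadsheet serial day
    number to an ISO 8601 date string."""
    if date1904:
        # 1904 system: serial 0 is 1 January 1904, no leap-year bug.
        return (date(1904, 1, 1) + timedelta(days=serial)).isoformat()
    # 1900 system: serial 1 is 1 January 1900, but serial 60 is the
    # nonexistent 29 February 1900 inherited from Lotus 1-2-3.
    if serial == 60:
        raise ValueError("serial 60 is the fictitious 1900-02-29")
    if serial < 60:
        return (date(1899, 12, 31) + timedelta(days=serial)).isoformat()
    # Past the phantom leap day every serial is one too high, so the
    # epoch shifts back a day to compensate.
    return (date(1899, 12, 30) + timedelta(days=serial)).isoformat()
```

The same calendar day, 1 January 1904, is serial 1462 in the 1900 system and serial 0 in the 1904 system, so a reader that guesses the wrong system is off by 1,462 days, roughly four years. Writing ISO 8601 strings instead removes both the guesswork and the fake leap day.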

In their marketing claims around getting OOXML anointed as an International Standard, Microsoft claims that more standards mean greater consumer choice. But sometimes less is more. We only have to examine HD-DVD vs. Blu-Ray to see the consequences of this. Or to finish with an old joke:

"How many Microsoft engineers does it take to change a light bulb?

None. They just declare darkness the new standard."

Jeremy Allison is one of the lead developers on the Samba Team, a group of programmers developing an Open Source, Windows-compatible file and print server product for UNIX systems. Developed over the Internet in a distributed manner similar to Linux, Samba is used by all Linux distributions as well as many thousands of corporations worldwide. Jeremy handles the co-ordination of Samba development efforts and acts as a corporate liaison to companies using the Samba code commercially. He works for Google, Inc., which funds him to work full-time on improving Samba and solving the problems of Windows and Linux interoperability.