PDF vs. Office XML

Continuing the theme started in my last post: Massachusetts recently ratified a digital document standard that excludes technology backed by the maker of the dominant office automation solution on the market - Microsoft. In my last post, I questioned whether that outcome was truly a demonstration of the power of democracy rather than proof of the power of interest group politics, if not a bullet item in the case against letting governments - which don't operate according to the same procurement rules as private citizens - override technology decisions made by the open marketplace.

Besides ODF, a standard for digital documents ratified last May by the OASIS group, Adobe's PDF was also included on the list. That inclusion raises a number of questions. PDF has qualities that justify its inclusion, but many of those justifications also apply to Office XML, and Office XML has merits in its own right that make it better, in some ways, than PDF.

First, a few misconceptions need to be cleared up. Office XML is offered under a royalty-free license to anyone who wishes to implement it.

In other words, Office XML does NOT require Windows, or even a Microsoft product, to read or write it. Therefore, it satisfies the requirement for an "open" standard outlined by Marc Wagner in a recent blog post...

no citizen (or government agency) of the Commonwealth of Massachusetts should be compelled to buy any Microsoft product (or products from any other single vendor) in order to have access to public records.

That is already the case with Office XML. Therefore, it's false to claim that Microsoft needs to publish its formats, or otherwise open its licenses to development and transfer without royalty payments. Publication and royalty-free licensing already exist.

That's the same situation with PDF, a format developed by Adobe but offered to third parties for implementation. That open status has certainly existed for longer than it has for Office XML, and the result - the presence of PDF readers on most computer platforms in existence and near-universal support for the format - is testimony to Adobe's careful nurturing of the technology.

That isn't an argument, however, for PDF being more "open" from a specification standpoint. Microsoft may have been late to make its document formats open, but that doesn't change the fact that Office XML is now an open specification that can be implemented on other platforms. Besides, consider the alternative: ODF, a format that was ratified in May of 2005 and has a far smaller installed base than Office XML (a format developed for Office 2003).

Some might argue that PDF's standardization by third-party standards groups (i.e. "joint stewardship") makes it more "open" than Office XML. PDF/A (ISO 19005-1) has been ratified by the ISO for long-term document preservation and archiving, and PDF/X (ISO 15930-1) is for the reliable exchange of press-ready graphic content such as high-end color advertisements.

If Massachusetts had chosen the ISO-ratified PDF variant, then we'd have something to talk about. Unfortunately, it didn't. Rather, it chose PDF version 1.5, a specification completely controlled by Adobe. Just to put that in perspective, that would be like the Massachusetts Department for the Promotion of Video Arts (which doesn't exist) standardizing on Windows Media 9 (WM9) as opposed to VC-1. WM9 serves as the foundation of VC-1, but it is not the same thing as VC-1, which is a standard in the final approval stages at SMPTE and into which third parties can have input.

The choice calls into question the notion that one of the "standards" of openness was that "it must be subject to joint stewardship." If that were so, they would have specified the officially sanctioned variant of PDF, not the one owned completely by Adobe.

On that note, Microsoft isn't averse to placing its technology under "joint stewardship." It did that with the .NET CLI, as well as with VC-1 (which, as noted, is based on WM9). The question is whether anyone asked them to do that. Again, though, I would have expected the Massachusetts standardization group to insist on the ISO variant of PDF and not one controlled by Adobe if "joint stewardship" were a critical requirement.

Berlind noted that Microsoft chose an open-patent licensing policy for its Office XML specification. This means that anyone implementing a reader/writer could use any Microsoft patent to do so. In contrast, Adobe opted for the "patent list" approach to licensing, which means a specific list of patents was licensed out for use by implementers.

This means the possibility exists (however small) that someone implementing the PDF specification might run afoul of a patent that is not on the list. The difference matters because non-specific patent grants "future proof" the specification: you can never predict which patents may arise in the future that could be relevant to implementing a particular specification. It's therefore worth marking that one down on Microsoft's side of the scorecard.

An area that Berlind claimed makes PDF more open, however, is in the ability to make derivative works:

What Engelhardt basically said is that developers are free to do whatever they want with Adobe's PDF specification. For example, they can break it apart or remix it with other specifications. The only restriction on this activity is that if the final output of the software isn't 100 percent compliant with the PDF specification, the developer cannot say that the software or the documents it produces are "PDF." The freedom to remix Adobe's work lies in stark contrast to Microsoft's license which says: "A 'Licensed Implementation' means only those specific portions of a software product that read and write files."

That's all true. That position is inconsistent, however, with the point of the Massachusetts standardization effort, which was to settle on a stable document format for long-term archival purposes, among other things.

Placing the ability to make incompatible variants of PDF on a list of reasons defending its status as an approved long-term document storage format is like listing the ability to write English-language poetry that no one but the author understands as proof of the consistency of the English language. If the situations were swapped and Microsoft were the one with a policy that defended the ability to make incompatible derivative works, the open source world would be up in arms about the potential lack of consistency.

An advantage Office XML has over PDF is its status as an XML grammar. PDF was only allowed for documents whose content and structure will not undergo further modification and need to be preserved. ODF was the real foundation of the Massachusetts policy, as it was the only approved document format that allowed modification. ODF is also an XML grammar, and according to Berlind's article, everyone involved in the standardization effort was enthusiastic about the use of an XML format for digital documents.
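To make the "XML grammar" point concrete, here is a minimal sketch of what it buys you: any stock XML parser can pull the text out of such a document, with no vendor software involved. The markup below is an assumption for illustration only. The namespace URI and element names are invented and are not taken from the Office XML or ODF schemas, and the function name extract_paragraphs is my own.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified word-processing markup. The namespace URI and
# element names are invented for illustration; they are NOT the real
# Office XML or ODF schemas.
NS = {"d": "urn:example:document"}

SAMPLE = """<d:document xmlns:d="urn:example:document">
  <d:body>
    <d:p><d:r><d:t>Public records </d:t></d:r><d:r><d:t>stay readable.</d:t></d:r></d:p>
    <d:p><d:r><d:t>No vendor software required.</d:t></d:r></d:p>
  </d:body>
</d:document>
"""


def extract_paragraphs(xml_text):
    """Return the plain text of each paragraph element, in document order."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for p in root.iterfind(".//d:p", NS):
        # A paragraph is made of text runs; join them back together.
        runs = (t.text or "" for t in p.iterfind(".//d:t", NS))
        paragraphs.append("".join(runs))
    return paragraphs


if __name__ == "__main__":
    for line in extract_paragraphs(SAMPLE):
        print(line)
```

The same dozen lines would work against any published XML document grammar once the real namespace and element names are substituted, which is the practical appeal of an XML-based format for public records.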

If XML is the goal, then Office XML makes a better standard than PDF, particularly given that a) it better ensures standard implementations through legal enforcement of compatibility, b) it allows any and all Microsoft patents to be used in the implementation, which protects implementers against patents granted in the future, c) joint stewardship doesn't appear to be that important, given that the approved PDF specification is not "jointly managed," and d) even if it were, Microsoft has already demonstrated a willingness to standardize technology through third-party standardization groups.

The inclusion of PDF is an Achilles' heel for arguments that Massachusetts wasn't swayed by the predictable hue and cry of open source advocates and Microsoft competitors in response to the suggestion that Microsoft's format would be included on its approved list. If PDF was included, it at LEAST makes sense for Office XML to be included at the same level as PDF...which is for read-only documents. I argue, however, that Office XML should have been allowed to do more.

Unfortunately, that argument will have to wait for my next post.