In Office SP2, Microsoft manages to reduce interoperability

Microsoft Office SP2 claims to have a fully compliant version of ODF, and that's probably true, as defined by the specification. It's just completely useless at interoperating with other vendors' products. This is not interoperability; it's an attack on the very concept.

Microsoft recently released Service Pack 2 (SP2) for their flagship office suite, Office 2007. As I don't use Microsoft products I normally wouldn't have noticed, but Office 2007 SP2 has an important new feature for users of Open Source office productivity software, and that made me pay attention. SP2 contains Microsoft's first native implementation of the OpenDocument Format (ODF), the file format originally created for Sun's Open Source OpenOffice product. ODF was standardized by the International Organization for Standardization (ISO) before Microsoft's rival format, Office Open XML (OOXML), and is seen as the main competitor to Microsoft's offering for the future of XML-based office file formats. Microsoft implementing it in Office is therefore a big deal.

With the implementation of ODF in SP2, we finally have one portable office file format, accepted and implemented by most office productivity software. That's the theory, right? The devil, as always, is in the details.

IBM's Rob Weir, chair of the ODF Technical Committee and one of the people involved in the design and standardization of ODF, examined how well Microsoft's implementation interoperates on spreadsheets. He looked specifically at spreadsheets that use formulas (in practical terms, most spreadsheets that users create and use), and he published his findings on his blog.

I'm not going to go into great technical detail about what Microsoft actually did wrong in their implementation; Rob does an excellent job of that on his blog. Let's just look at an overview instead. In short, by implementing ODF in SP2, Microsoft managed to reduce interoperability between office productivity suites.

How can this be? After all, ODF is an ISO standard. Surely if you implement a standard fully, which Microsoft claims to have done in SP2, you must have an interoperable product. As long as others also implement the standard as written, everything should just work together. That's the way things are supposed to work.

One reason is that standards themselves are often imperfect. Microsoft and their attendant band of astroturf bloggers are already raising a hue and cry over Rob's findings, claiming the ODF standard itself is at fault, and in some cases calling for his resignation as chair of the ODF Technical Committee for the heinous sin of pointing out that this emperor has no clothes.

They are right about the ODF standard, of course. It is missing a proper definition of spreadsheet formulas, and this is the truck-sized hole that Microsoft drove through in their implementation. Sure, Excel saves formulas in ODF documents, just in a separate namespace where no other application is currently designed to look for them. The result is that anyone trying to open an ODF spreadsheet created in Excel in another application will have it rejected. Excel reading an ODF spreadsheet created by another application does something worse: in each cell that should be governed by a formula, it uses the last saved value, and the formulas themselves are silently dropped.

Yet Microsoft Office SP2 claims to have a fully compliant version of ODF, and that's probably true, as defined by the specification. It's just completely useless at interoperating with other vendors' products. This is not interoperability; it's an attack on the very concept.
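To make the namespace problem concrete, here is a minimal Python sketch. It assumes, based on published analyses of SP2's output rather than on anything stated above, that Excel 2007 SP2 writes the `table:formula` attribute with an `msoxl:` prefix, while other applications (OpenOffice.org 3.x, for example) write OpenFormula with an `of:` prefix. The two cells are hypothetical, hand-written fragments, not files produced by either product, and `read_cell` is a hypothetical reader that understands exactly one prefix and falls back to the cell's saved value otherwise, which mirrors the Excel behaviour described above.

```python
import xml.etree.ElementTree as ET

# ODF namespace URIs for the table: and office: prefixes.
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

# Hypothetical cells: the same sum in two formula dialects, each with a
# saved result of 6 in office:value.
CELL_TEMPLATE = (
    '<table:table-cell xmlns:table="{t}" xmlns:office="{o}" '
    'table:formula="{f}" office:value-type="float" office:value="6"/>'
)
OPENOFFICE_CELL = CELL_TEMPLATE.format(t=TABLE_NS, o=OFFICE_NS, f="of:=SUM([.A1:.A3])")
EXCEL_SP2_CELL = CELL_TEMPLATE.format(t=TABLE_NS, o=OFFICE_NS, f="msoxl:=SUM(A1:A3)")


def read_cell(xml_text, understood_prefix):
    """Return ("formula", body) if the formula prefix is understood;
    otherwise return ("value", saved_value) and silently drop the formula."""
    cell = ET.fromstring(xml_text)
    formula = cell.get(f"{{{TABLE_NS}}}formula", "")
    prefix, _, body = formula.partition(":")
    if prefix == understood_prefix:
        return ("formula", body)
    return ("value", cell.get(f"{{{OFFICE_NS}}}value"))


# A reader that only knows msoxl: keeps the number but loses the formula.
print(read_cell(OPENOFFICE_CELL, "msoxl"))  # ('value', '6')
print(read_cell(EXCEL_SP2_CELL, "msoxl"))   # ('formula', '=SUM(A1:A3)')
```

Both files are valid ODF; neither reader is "wrong" by the letter of the specification, yet the formula does not survive the round trip.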

Unions are not popular in either the USA or the UK any more, which I think is a sad state of affairs. My first action on getting my first job in the UK was to join the local union. So, for readers without experience of union activity, I'd like to explain the concept of "Working to Rule". When a union is trying to negotiate with management, there is a broad range of options it can take before using the ultimate weapon of going on strike. One of these tactics is "Working to Rule". In a normal working day there are hundreds of small rules that people ignore in order to get their jobs done, from refilling the coffee machine themselves (which could be a health and safety hazard, if you really think about it) to fixing small problems with the machines they use for the job. "Working to Rule" means deliberately obeying every single one of these rules. Coffee machine out of water? "Not my job, mate." Ethernet cable fallen out of a machine? "I'm a programmer, not a hardware engineer. Someone had better come and fix that for me." I'm sure you get the idea: punctilious observation of every possible rule in order to disrupt orderly working.

This is what Microsoft has done with ODF in SP2.

They've done this before.

When Windows NT was first announced, one exciting new feature was the concept of "subsystems". Windows NT was to be a chameleon operating system: not only would it run Windows binaries, it had two other "subsystem personalities", OS/2 and POSIX. Yes, that's right, Windows NT was originally a fully POSIX-compliant operating system. POSIX is the standard programming interface for UNIX-like systems, meaning you could recompile the same source code on any POSIX-compliant system and it was supposed to work the same way. POSIX was popular in government contract specifications, as it was meant to save the government money on IT systems by forcing vendors to be interoperable.

I remember getting my hands on the first beta of Windows NT, starting up the POSIX subsystem and trying some code out on it. It was a joke. Networking? That's not part of the official POSIX spec, so no access to the network. Windowing? That's not in there either, so no fancy graphical interfaces for your POSIX programs, only pure text-based code. Anything not mandated by the spec was ripped out. Yes, it could pass the pure POSIX conformance tests, but that was all it could do. No useful code could run on this system, as all of it expected something more than the basic standard: the de facto standards that most other POSIX vendors had built on top of it. The Windows subsystem even supported some of these de facto POSIX-like standards (the Berkeley sockets networking interface, for example), but they were explicitly excluded from the Windows NT POSIX subsystem. The only purpose was to allow government purchasers to check the box marked "POSIX compliant" while buying completely proprietary Windows solutions, and that's just what they did. It implemented the letter of the law, whilst completely ignoring the spirit of it.

So how do you do real interoperability? Well, I like to think that my own project, Samba, could teach engineers a thing or two about that. We're working from specifications for the Common Internet File System (CIFS) protocol that are not an official standard, but we go out of our way to make sure we work with other vendors' implementations. We attend interoperability testing conferences, where we work with other vendors' engineers (including Microsoft engineers) to ensure that customers deploying any of our implementations don't get any nasty surprises. We've changed our code to work with Windows 95 and 98, Windows Mobile, Windows CE, Windows 7, Network Appliance, a host of unnamed embedded CIFS implementations in different appliances, even old versions of OS/2. It's not hard; it's just careful, detailed work. The only rule is to follow the words of the Internet Engineering Task Force (IETF) on interoperability: "Be conservative in what you send, be liberal in what you will accept."

If we simply worked from a specification, we'd end up with a product that worked with itself but had no chance of working in the real world with other vendors' implementations. That is very similar to what Microsoft has produced with Office 2007 SP2's ODF support.
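Applied to the spreadsheet formula problem, the IETF maxim quoted above might look like the following minimal Python sketch. It is hypothetical, not code from Samba, OpenOffice.org or Office. The reader is liberal: it accepts an `of:` (OpenFormula) prefix, an `msoxl:` prefix (assumed, as reported for Excel 2007 SP2) or no prefix at all, and tolerates stray whitespace and letter case. The writer is conservative: it only ever emits the standard `of:` form. The reference conversion is deliberately naive and handles only simple, relative A1-style references on a single sheet; sheet names, quoted strings and absolute `$A$1` references are out of scope.

```python
import re

# Formula-dialect prefixes this hypothetical reader tolerates.
KNOWN_PREFIXES = ("of", "msoxl")

# A simple A1-style reference or range (A1, A1:B3) that is not followed
# by "(", so function names such as LOG10( are left alone.
A1_REF = re.compile(r"\b([A-Z]{1,3}[0-9]+)(?::([A-Z]{1,3}[0-9]+))?\b(?!\()")


def to_openformula(match):
    """Rewrite A1 or A1:A3 into OpenFormula's [.A1] or [.A1:.A3]."""
    start, end = match.group(1), match.group(2)
    return f"[.{start}:.{end}]" if end else f"[.{start}]"


def read_formula(raw):
    """Be liberal: accept any known prefix (any case) or none at all."""
    raw = raw.strip()
    prefix, sep, body = raw.partition(":")
    if sep and prefix.strip().lower() in KNOWN_PREFIXES:
        return prefix.strip().lower(), body.strip()
    return None, raw  # no recognised prefix: treat it as a bare formula


def write_formula(dialect, body):
    """Be conservative: always emit the documented of: prefix."""
    if dialect == "of":
        return "of:" + body
    # Assumption: msoxl and bare formulas use plain A1-style references.
    return "of:" + A1_REF.sub(to_openformula, body)


print(write_formula(*read_formula("  MSOXL:=SUM(A1:A3)+B2 ")))
# of:=SUM([.A1:.A3])+[.B2]
```

The asymmetry is the point: accepting other vendors' dialects costs a little parsing code, while writing only the standard form means every other reader has a fair chance of understanding the file.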

A complete cynic would say that this was what was intended: that Microsoft, as the dominant vendor of office suites, could only benefit from creating an implementation of a competing standard that was worthless for interoperability, and that causing confusion in the marketplace like this was designed to make customers scuttle back to the safety of using only Microsoft Office and the endlessly mutating versions of .DOC or .DOCX, as those interoperability issues are at least problems customers have learned to live with over the years.

But I've seen Microsoft do better than this. I've worked with their engineers on CIFS, they've attended interoperability events, they've even logged bugs on Samba when they've found problems. They know how to do this properly.

But what we currently see in Office 2007 SP2 is still "Working to Rule" in every sense of the phrase.