Chris Capossela, who runs product management for the Office family of products, dropped by to see me to dribble out more details about the next version of Microsoft Office (currently dubbed "12"), which is due in the second half of 2006.
The important revelation, which was expected, is that some Office 12 applications (Word, Excel and PowerPoint) will use Office Open XML as the default file format. Note: Excel and Word already have XML support and related schemas for saving documents with full fidelity as XML files. The formats are industry-standard XML 1.0 and the schemas are available on a royalty-free basis. As a result, developers can query what's in a file and extract specific data or write their own compatible applications to view and manipulate the files. Users can open the .XML files in any application that can read XML. "Our value is not tied to file format, but to the user experience and quality of the software," Capossela said. Now that's a refreshing point of view, given how in the past Microsoft has often made it difficult for others to parse the file formats.
What's new for Microsoft is compacting the often-overweight XML text files with industry-standard Zip compression. Zip compresses and decompresses the data within a document, including comments, charts and document metadata, which is segmented and stored in different components. However, OLE objects and images are still stored as binaries.
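To make the "XML components inside a Zip container" idea concrete, here is a minimal Python sketch using only the standard library. The part names and XML content are hypothetical, made up for illustration; they are not the actual Office Open XML layout, which Microsoft had not detailed at the time of writing.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical part names and XML content, for illustration only;
# this is NOT the real Office Open XML package layout.
PARTS = {
    "document.xml": "<doc><p>Hello, Office 12</p><p>Second paragraph</p></doc>",
    "comments.xml": "<comments><c author='ed'>Check this</c></comments>",
    "metadata.xml": "<meta><title>Demo</title></meta>",
}


def build_package(parts):
    """Write each XML part as a separate Deflate-compressed Zip entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, xml_text in parts.items():
            zf.writestr(name, xml_text)
    return buf.getvalue()


def list_parts(package_bytes):
    """Return the names of the parts stored in the package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        return zf.namelist()


def read_paragraphs(package_bytes):
    """Extract one part, parse it as XML, and pull out the paragraph text."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        root = ET.fromstring(zf.read("document.xml"))
    return [p.text for p in root.findall("p")]


package = build_package(PARTS)
print(list_parts(package))
print(read_paragraphs(package))
```

Any tool that can read Zip and parse XML can get at a single part this way without touching the rest of the file, which is the interoperability point Capossela was making.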
Using Zip gets around the thorny issue of creating a binary XML to deal with file bloat. A few months ago I had a conversation with Jean Paoli, co-creator of the XML standard and senior director of XML architecture at Microsoft, who told me that binary XML is "nonsense." From his viewpoint, it's not possible to create a one-size-fits-all binary XML standard to solve all the performance and size issues. "I am not negating the problems, but it's not a matter of creating a binary," Paoli said. At that time he mentioned existing technology, such as XML-binary Optimized Packaging (XOP) from the W3C or using Zip. "Everybody has Zip, and XML Zips very well. For many scenarios it's good enough," Paoli said. He also projected that by 2010, 75 percent of documents would be stored in XML format.
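Paoli's point that XML compresses well is easy to try for yourself. The sketch below builds a deliberately repetitive, made-up XML string (an assumption for illustration, not real Office data) and runs it through Deflate, the algorithm Zip tools typically use. How much is saved depends entirely on the content, so treat the printed ratio as a demonstration rather than a benchmark.

```python
import zlib

# Hypothetical spreadsheet-like XML; the repeated tag names are what
# make markup like this compress so readily.
rows = "".join(
    f"<row><cell>Item {i}</cell><cell>{i * 10}</cell></row>" for i in range(500)
)
xml_bytes = f"<sheet>{rows}</sheet>".encode("utf-8")

# zlib.compress applies Deflate, the method Zip archives commonly use.
compressed = zlib.compress(xml_bytes)

saving = 1 - len(compressed) / len(xml_bytes)
print(f"raw={len(xml_bytes)} bytes, deflated={len(compressed)} bytes, saving={saving:.0%}")
```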
Using XML and Zip is not a unique approach, however, given that open-source Office competitor OpenOffice (sponsored by Sun) has been using an XML-based file format and Zip compression to store files. The OpenOffice XML file format specification is maintained by an OASIS technical committee. According to a Microsoft spokesperson, OpenOffice.org has royalty-free access to the specs for the Office Open XML formats to ensure file compatibility. The current XML filter tool in OpenOffice supports the Microsoft Office 2003 XML file formats, although not always with full fidelity.
According to Capossela, users won't notice any difference with the compressing and uncompressing of files, and file size will be reduced 50 to 75 percent, resulting in savings on bandwidth and network storage. The file formats will be backwards compatible with Office 2000 and Microsoft will have tools to bulk convert files. None of the preexisting file formats are going away either.
One of the unique benefits of the XML-based file format, besides enabling more fluid interoperability with data and applications outside of Office, is that it improves data recovery from corrupted files, because it saves different types of data in discrete components. Instead of corrupting an entire file, only a part of it would be damaged. The XML formats will also help prevent executable payloads, such as viruses, from being delivered inappropriately in files.
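Here is a rough Python sketch of why discrete components help with recovery. It is not how Office 12 itself recovers files, which Microsoft has not described; the part names are hypothetical, and the parts are stored uncompressed only so that damage can be simulated by altering a known byte. Each part in a Zip archive carries its own checksum, so a reader can skip the damaged part and keep the rest.

```python
import io
import zipfile
import zlib

# Hypothetical part names and content, for illustration only.
PARTS = {
    "document.xml": "<doc><p>Body text</p></doc>",
    "comments.xml": "<comments><c>Check this</c></comments>",
    "metadata.xml": "<meta><title>Demo</title></meta>",
}


def build_package(parts):
    """Store each part as its own Zip entry.

    ZIP_STORED (no compression) is used only so the payload bytes stay
    findable when we simulate corruption below.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name, text in parts.items():
            zf.writestr(name, text)
    return buf.getvalue()


def recover_parts(package_bytes):
    """Read every part independently; collect the ones that fail their check."""
    recovered, damaged = {}, []
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        for name in zf.namelist():
            try:
                recovered[name] = zf.read(name).decode("utf-8")
            except (zipfile.BadZipFile, zlib.error):
                # A CRC mismatch or a broken compressed stream affects this part only.
                damaged.append(name)
    return recovered, damaged


package = build_package(PARTS)
# Simulate damage: change one byte inside the comments part.
broken = package.replace(b"Check this", b"Chxck this")

recovered, damaged = recover_parts(broken)
print("recovered:", sorted(recovered))
print("damaged:", damaged)
```

In this toy case the body text and metadata survive while only the comments are lost, which is the behavior the paragraph above describes.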
A preview of Office 12 (not an initial beta, which isn't due until the fall) will be available at www.microsoft.com/office/preview on Monday, June 6. I asked about XML file formats for Macintosh Office, but Capossela wasn't sure--Mac Office is done by a different business group at Microsoft. Nor is a Linux version of Office on the drawing board. We'll also have to wait to hear about other features that will make it into Office 12. The dribbling continues...