A slim API for OOXML

Written by John Carroll, Contributor

One of the things that originally made me want more information about Microsoft Office document formats was the need to harvest data from documents uploaded to web sites, as well as to generate Excel spreadsheets that served as mini-applications for data entry by people in the field. This need occurred to me as far back as 1995, while doing web development for a telecommunications equipment company in Richardson, TX.

Microsoft Office at the time was ahead of most office suites in this regard, as its COM Automation interfaces meant I could do all of this in code. This was one of Microsoft's competitive innovations, treating every piece of software as an extension of the APIs available atop Windows. Just as they did with web browsers (turning IE into a frame around a configurable set of HTML rendering components), they treated the Office suite as an API, exposing its functionality through a common component-oriented binary framework known as COM (itself a competitive differentiator for Windows, as competing platforms never standardized on alternatives the way Microsoft did on COM). It was a framework that worked well from native C++ through scripting languages (thank you, IDispatch), making Microsoft Office vastly more useful as a tool in the office document processing toolbox.

The problem, however, was that you still needed to load an entire Office application in order to process a document. Though this wasn't much of a problem for the low-volume intranet website that I built back in the mid-90s, it would be a problem for a high-traffic site, or one that aimed to aggregate millions of incoming documents into its datacenter.

For that use, something a lot slimmer was required. Defining office documents in XML and making the details of that definition available to third parties is certainly an important step. Even so, there is still the "ant / sledgehammer" problem. There's LOTS of stuff in a specification, and I don't want to have to wade through it all to figure out how to access a few fields in a spreadsheet, or create simple office documents. 
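To make the "few fields in a spreadsheet" case concrete, here is a minimal sketch in Python of reading cells straight out of an OOXML package, which is a ZIP archive of XML parts, without loading any Office application. This is not the SDK's API. The helper names (`build_sample_package`, `read_cells`), the sample data, and the part path `xl/worksheets/sheet1.xml` are assumptions for illustration; a real workbook locates its sheets through relationship parts and usually stores text in a shared strings table, neither of which this sketch handles.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by worksheet parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Hypothetical worksheet part: three inline-string cells and one numeric cell.
SHEET_XML = (
    '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    "<sheetData>"
    '<row r="1">'
    '<c r="A1" t="inlineStr"><is><t>Site</t></is></c>'
    '<c r="B1" t="inlineStr"><is><t>Richardson</t></is></c>'
    "</row>"
    '<row r="2">'
    '<c r="A2" t="inlineStr"><is><t>Units</t></is></c>'
    '<c r="B2"><v>42</v></c>'
    "</row>"
    "</sheetData>"
    "</worksheet>"
)


def build_sample_package() -> bytes:
    """Build an in-memory ZIP holding a single worksheet part.

    This is a stand-in for an uploaded file, not a complete workbook
    that Excel would open.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("xl/worksheets/sheet1.xml", SHEET_XML)
    return buf.getvalue()


def read_cells(package: bytes, part: str = "xl/worksheets/sheet1.xml") -> dict:
    """Return {cell reference: text} for every cell in one worksheet part."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read(part))

    cells = {}
    for c in root.iterfind(".//m:c", NS):
        ref = c.get("r")
        if c.get("t") == "inlineStr":
            # Inline strings keep their text under <is><t>.
            t = c.find("m:is/m:t", NS)
            cells[ref] = t.text if t is not None else ""
        else:
            # Other cells keep their raw value under <v>.
            v = c.find("m:v", NS)
            cells[ref] = v.text if v is not None else None
    return cells


if __name__ == "__main__":
    print(read_cells(build_sample_package()))
```

Even this small sketch has to know part paths, an XML namespace, and how cell types are encoded, which is exactly the kind of detail I would rather not dig out of the specification myself.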

To satisfy that need, the new OOXML SDK (which will arrive next month) will provide .NET objects that enable developers to manipulate OOXML documents in code. Further, because this works at a much lower level than the old COM automation interfaces, with no Office application to load, it is likely to scale a lot better than the old model.

I can see a lot of uses for this. Its upcoming release is likely obscured in the minds of many ZDNet readers by the gyrations surrounding (possible) OOXML ratification in the ISO, but for people tasked with manipulating what is likely to become the dominant office document format in the enterprise (Office 2007 IS selling quite well), this is very helpful.
