Documentation is good, but source code is better

An order for full documentation of all APIs was a part of the judgment against Microsoft, but its source code would be much more informative.
Written by Larry Seltzer, Contributor
Perhaps the longest-running beef against Microsoft in the computing industry is over undocumented features in the operating system. An order for full documentation of all APIs (Application Programming Interfaces) was a part of the judgment against Microsoft, and many third parties have called for it as part of any final settlement.

There was a time when undocumented operating system calls were an important issue, but those days are basically long behind us. The overwhelming majority--I would guess 95 percent or more--of developers these days need no more than what is documented by Microsoft. But it's still time for Microsoft to do more and to lay this issue to rest.

Back in the '80s, when I and the rest of the world were writing software for DOS, it was obvious that large parts of the OS were undocumented. In some cases, the programming interface--such as it was--was a mess, and perhaps documenting it would have been embarrassing. I once asked Chris Peters, one of the authors of DOS 2, why there was no DOS call for manipulating the DOS environment, forcing programmers to move bytes around in memory. He said that they wanted to do that, but IBM wanted it out the door already. Such was the level of thought and work that went into DOS.
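To give a sense of what "moving bytes around in memory" meant, here is a minimal sketch in Python. It assumes the commonly described layout of the DOS environment block: a run of NUL-terminated NAME=value strings closed by one extra zero byte. The function names and the capacity parameter are hypothetical, and a real DOS program did this on raw memory in assembler or C rather than on a Python bytes object.

```python
def parse_env_block(block: bytes) -> dict[str, str]:
    """Read NAME=value strings until the empty string that ends the block."""
    env = {}
    pos = 0
    while pos < len(block) and block[pos] != 0:
        end = block.index(0, pos)  # NUL terminator of the current string
        name, _, value = block[pos:end].decode("ascii").partition("=")
        env[name] = value
        pos = end + 1
    return env


def set_env_var(block: bytes, name: str, value: str, capacity: int) -> bytes:
    """Rebuild the block with one variable added or changed.

    capacity is a hypothetical stand-in for the fixed amount of memory
    set aside for the environment; the block cannot grow past it.
    """
    env = parse_env_block(block)
    env[name.upper()] = value
    packed = b"".join(
        f"{key}={val}".encode("ascii") + b"\0" for key, val in env.items()
    )
    packed += b"\0"  # the extra zero byte that closes the block
    if len(packed) > capacity:
        raise ValueError("environment block is full")
    return packed


block = b"PATH=C:\\DOS\0COMSPEC=C:\\COMMAND.COM\0\0"
updated = set_env_var(block, "TEMP", "C:\\TMP", capacity=64)
print(parse_env_block(updated))
```

Even in this tidy form the programmer has to know the layout, find the end of the block, and watch for running out of room, which is the kind of work a documented call would have handled.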

But there were clearly other cases where facilities were written and used by Microsoft, and usable by the outside world, but Microsoft did not document them. Programmers did what they had to do; they reverse-engineered DOS and the Microsoft applications that used the undocumented facilities. After years of bad fragments of information being passed around among friends, Andrew Schulman released Undocumented DOS, one of the most important books in the history of the PC industry.

That Undocumented DOS has been out of print for years is both sad and gratifying, because as DOS became Windows and Windows evolved, the degree of undocumentedness (or is that "undocumentivity"?) has declined greatly. If you're writing a normal application, it's likely that if there's no documented API for what you need, there's no API at all.

And yet there are still undocumented calls in the operating system. But are these APIs? I've always assumed this definition: An API is a call into the operating system that is meant to be used by the outside world. But like any program that has external calls for the rest of the world to use, Windows has calls that are intended for other parts of Windows, not for third parties, and there can be perfectly good reasons for this. An internal function might be performing some reordering of an internal structure that is useless to the outside world. Furthermore, documenting a function, at least to some people, implies that it will be there in the future, and possibly that Microsoft will support it. Of course, there can be bad reasons too: If it turns out that Microsoft applications use these undocumented functions, then Microsoft really ought to tell the rest of the world about them. Examples of this were rampant in the '80s, but since the well-documented Win32 API came out they have been rare.

There was one very famous example--the case of Netscape and Windows 95--at the heart of the antitrust case against Microsoft. In the Findings Of Fact, Judge Jackson found that Microsoft withheld the definitions of the Remote Network Access (RNA) API from Netscape, and that this was the reason Netscape couldn't finish its Windows 95 browser until October 1995. This particular example has always confused me, because the RNA API is used for interfacing to the RAS dialer, and a browser just doesn't need access to it. And in fact, what I remember from that time is that everyone in the world was downloading perfectly usable beta versions of Netscape's browser and using them because they were a lot better than the trash Microsoft was putting out at the time. Finally, we know from recent years that Netscape is perfectly capable, without any help from Microsoft, of being years late with a product.

I look at all the facts and I conclude that few real software companies or corporate developers are negatively impacted by features that aren't documented. In fact, most developers who spend some time surfing through the Microsoft Developer Network (MSDN) find too much information. There's more here than anyone has time to learn. Programmers using VB and many other systems rarely ever use an API anyway.

Utility software authors, such as Symantec and OnTrack, definitely have a tougher time, and documenting all the APIs might not help them, since many of the things they need aren't undocumented; they just don't exist.

By leaving the issue open, Microsoft leaves a doubt about it. One thing they could do is commit to documenting all externally callable functions in Windows, whether they are useful to the real world or not. They could even note, where appropriate, whether the call is unsupported or where it might change in future versions of Windows. After Undocumented DOS came out, Microsoft did just this with many of the DOS calls that Schulman had documented.

So the real problem is that third-party developers need to have better, more complete information, and they should be able to have confidence in it. Nothing does this like source code, and Microsoft has actually begun to provide it. Microsoft's Shared Source program is a license under which the company releases source code, essentially for informative purposes. Don't confuse this with the BSD and GPL licenses, under which people have the right to create and distribute derivative works. Just because Microsoft has released Windows CE source code under this license doesn't mean you can make your own CE distribution. But you can look over the source to make sure you understand how your applications should run.

If you believe what Microsoft is saying about its Shared Source plans, the same is in store for the source code for Windows itself, at least for a much wider audience. In fact, according to Microsoft's statement of those plans, more than 1000 enterprise customers already have access to Windows 2000 source code, and the larger ISVs will too, although one wonders whether this will include companies like Sun and Oracle.

In the case of utility vendors, though, it's important to know the meaning of the data structures inside Windows, not just the API calls, and Microsoft doesn't always externally document all aspects of data structures. And sometimes the documentation for an API is inaccurate or ambiguous--or at least the developer thinks so--and this forces the developer to whip out the old debugger and see what the API is actually doing. If they had the source code, a lot of work would be saved.

While Microsoft's plans to distribute Windows source code are ambitious by past standards, one has to wonder why they don't take the next obvious step and make it generally available. My best guess is that they assume that they can trust the limited audience they are targeting, but not the general public. I'm not sure I disagree with them. Someone would surely release "Windoze XP" with who-knows-what in it.

Microsoft has also been licensing source code under other circumstances for years, but not to the general public and only under strict non-disclosure agreements. Several universities have source licenses for research purposes, and several commercial products, including SoftWindows and Citrix WinFrame, were based on Windows source licenses. But while this shows that Microsoft isn't as paranoid about its source code as many think, it doesn't address the issue, which is that people have to have the confidence that Microsoft is supplying all the information people need about their products. Documenting APIs doesn't do that. Providing source code does.

What do you think about Microsoft and its source code? E-mail Larry, or post your thoughts in our Talkback forum below.
