Microsoft has published yet more details about the glitch in its Windows Genuine Advantage (WGA) system that affected an estimated 12,000 users over the August 24-25 weekend.
Microsoft is attributing the WGA problems to "human error," according to a new August 28 blog posting by WGA Senior Product Manager Alex Kochis. Kochis explains on the WGA blog:
"Nothing more than human error started it all. Pre-production code was sent to production servers. The production servers had not yet been upgraded with a recent change to enable stronger encryption/decryption of product keys during the activation and validation processes. The result of this is that the production servers declined activation and validation requests that should have passed."
Windows XP and Windows Vista activation and validation were both impacted as a result. Kochis said Microsoft actually fixed the activation problem in less than 30 minutes, but "the effect of the preproduction code on our validation service continued after the rollback took place."
Kochis noted that this past weekend's WGA problems have led Microsoft to put some new systems in place, including improved monitoring capabilities "to alert us much sooner should anything like this happen again." He said Microsoft also is working on increasing the speed of escalations and "adding checkpoints before changes can be made to production servers."
While it's admirable that Microsoft hasn't attempted to sweep this incident under the rug and has continued to follow through on the WGA blog with updates, there are still a number of things about this past weekend's meltdown that are troubling. Kochis' insistence that the WGA mess was not "an outage" is one such sticking point. On August 28, Kochis said:
"It's important to clarify that this event was not an outage. Our system is designed to default to genuine if the service is disrupted or unavailable. In other words, we designed WGA to give the benefit of the doubt to our customers. If our servers are down, your system will pass validation every time. This event was not the same as an outage because in this case the trusted source of validations itself responded incorrectly."
Outage or no outage, users reported that they were incorrectly identified as running "non-Genuine" versions of XP and Vista. In the case of Vista users, their Aero interfaces were disabled. They were considered guilty, not innocent. This presumption goes to the very heart of why WGA, as it is currently designed, is unpopular not just with Microsoft critics, but with many customers, too.
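The distinction Kochis draws between an outage and an incorrect response can be illustrated with a rough sketch. This is not Microsoft's code. The names `validate`, `ServiceUnavailable`, `server_down` and `server_misconfigured` are hypothetical, and the sketch assumes only what Kochis describes: a client that defaults to genuine when the service is unreachable, but trusts whatever answer a reachable server returns.

```python
from enum import Enum


class Verdict(Enum):
    GENUINE = "genuine"
    NON_GENUINE = "non-genuine"


class ServiceUnavailable(Exception):
    """Raised when the (hypothetical) validation server cannot be reached."""


def validate(product_key, query_server):
    """Fail-open check: an unreachable server counts as a pass."""
    try:
        # A reachable server is treated as the trusted source of truth.
        return query_server(product_key)
    except ServiceUnavailable:
        # Benefit of the doubt: default to genuine during an outage.
        return Verdict.GENUINE


def server_down(product_key):
    # Simulates an outage: no answer at all.
    raise ServiceUnavailable("no response")


def server_misconfigured(product_key):
    # Simulates the reported failure: the server is up,
    # but it wrongly declines a request that should have passed.
    return Verdict.NON_GENUINE


outage_result = validate("VALID-KEY", server_down)
bad_answer_result = validate("VALID-KEY", server_misconfigured)
print(outage_result.value)
print(bad_answer_result.value)
```

Under these assumptions, the fail-open default protects users only when the server is silent. A server that is up but answering incorrectly bypasses that safeguard entirely, which matches the behavior users reported.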