Microsoft finally earns a passing grade (barely) for WGA

Microsoft launched its Windows Genuine Advantage (WGA) anti-piracy program in early summer 2006. Its first year was, to put it charitably, a disaster. An epic fail. A big fat F on the year’s report card. Things didn't get much better in 2007, either, as a server failure and other outages unfairly labeled thousands of legitimate Windows customers as pirates. In the past year, Microsoft has revamped and re-engineered its WGA and Vista validation systems and processes. What did they do and what does it mean for you? I went back to the same data source I used in 2006 to measure Microsoft's performance and see whether they finally deserve a passing grade.

A certain amount of error is inevitable in any activation and registration system, but the error rate was clearly too high when WGA first rolled out. In an interview last week, Microsoft WGA director Alex Kochis tacitly acknowledged that fact, pointing out that “we’ve made major strides in the accuracy of the program” in the past two years.

How bad was it? Users began suffering unpleasant consequences almost immediately, including system failures and false positives that flagged perfectly legitimate Windows copies as “non-genuine.” I wrote about WGA and its problems extensively throughout 2006 and 2007, documenting the extent of the problems. (The complete index of WGA-tagged posts is here.) In August 2006, I performed an exhaustive survey of problem reports from Microsoft’s own WGA support forum and discovered that “42% of the people who experienced problems with WGA and reported those problems to Microsoft's public forums during that period were actually running Genuine Microsoft Windows.”

There was another wave of failures in October 2006, and the first reports of Vista validation problems appeared in February 2007. I met with managers of the WGA program several times in early 2007 and we discussed how they were responding to these issues. To their credit, they made major changes in support policies, back-end systems, and the online experience. But in August 2007, just as the WGA program appeared to be running smoothly at long last, “human error” caused a WGA server failure that affected an estimated 12,000 legitimate customers. Most of the glaring bugs in the system had been worked out well before then: when I examined forum reports from December 2006, I found that the failure rate had dropped from 42% to 22%. Even so, that failure rate was still too high to rate anything higher than a D-.

The August 2007 outage inspired a wave of rethinking and re-engineering at Microsoft to ensure that this sort of problem couldn’t happen again, Kochis says. “We needed to think about what the impact to the customer was so that we minimize negative impact on customers. In response, we put in place what we call a ‘circuit breaker.’” According to Kochis, the systems are now monitored continuously in real time, through automated systems and by engineers. “If we detect anything that's happening in response to our automated and human monitoring, one of the first things we do is evaluate pulling the breaker, which will [respond to] any system that calls in for validation and either use the last validation status for that system or just pass that system for that moment in time.” In effect, any time an anomaly in the system is detected, the result defaults in the customer’s favor, declaring the system “genuine,” at least until the next check.
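The fail-open behavior Kochis describes can be illustrated with a minimal sketch. This is not Microsoft's implementation; the class name, the `check` callback, and the decision to pull the breaker automatically on any back-end exception are all assumptions made for illustration (in Kochis's description, people and monitoring systems evaluate whether to pull it).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ValidationService:
    """Hypothetical validation front end with a fail-open circuit breaker."""

    check: Callable[[str], bool]      # the real validation check (assumed)
    breaker_pulled: bool = False      # set by monitoring or by an engineer
    last_status: Dict[str, bool] = field(default_factory=dict)

    def validate(self, system_id: str) -> bool:
        if self.breaker_pulled:
            # Breaker pulled: reuse the last known status for this system,
            # or simply pass it if it has never been seen before.
            return self.last_status.get(system_id, True)
        try:
            result = self.check(system_id)
        except Exception:
            # Simplification: treat any back-end error as an anomaly,
            # pull the breaker, and default in the customer's favor.
            self.breaker_pulled = True
            return self.last_status.get(system_id, True)
        self.last_status[system_id] = result
        return result
```

The key design choice is the default of `True` for unknown systems: while the breaker is pulled, no system that lacks a recorded status can be newly flagged as non-genuine.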

Kochis also says the WGA group has revamped its internal processes to make them more responsive to issues that might affect Windows customers. “We do drills,” he told me, “many, many drills. And we get better every time. We’ve had some real events, too, [although] none have been as significant as the [August 2007] server outage. They've been invisible or transparent to end users or customers.” The biggest test of the “circuit breaker” system came in January 2008, when two undersea cables in the Mediterranean were severed, disrupting Internet service over much of the Middle East and Europe, including some of Microsoft’s busiest call centers.

“We learned about it very quickly and later that same day, we had a plan pulled together that would enable us to provide support for customers in a number of different ways. We did whatever we could to reduce call volume at that time. In Egypt, we have a call center that services a number of languages, including those in Europe. So one of the first things we did was have people on airplanes flying [from Egypt] to a call center in Germany so we could redirect phone traffic there and have local language support. Likewise, support calls for Spanish-speaking customers were routed to Latin America.

“Our online activation systems were also affected,” Kochis notes. “We actually pulled the circuit breaker in that situation, so that we would minimize call volume. All systems passed, none failed, until we were ready with our rerouting process.”

If that incident had happened a year earlier, the impact on activation and validation systems would have been catastrophic. With the new systems in place, there was literally no discernible impact. I’ve been monitoring WGA longer and more closely than anyone outside of Microsoft, and in the year since the August 2007 server outage, I have seen no reports of even brief failures in the WGA system. (One report at Ars Technica in July turned out to be a false alarm that shut down the telephone-based activation system for about 90 minutes but left WGA untouched.) That doesn’t mean WGA is working perfectly today. There’s still plenty of room for improvement, as I note in the conclusion of this report.

Back in 2006, the percentage of people affected by WGA failures and glitches was unacceptably high. Microsoft richly earned a big fat F in WGA in its freshman year. And 2007 was only a little better. Although the embarrassing conflicts with third-party software that falsely triggered WGA alerts in its early days had mostly been vanquished, the server outage of August 2007 clearly served as a wake-up call.

So the question is, two years later, has Microsoft finally gotten WGA right? Or at least good enough?

For the answers, I went back to the same rich data source I used in the original August 2006 report and for a follow-up in December 2006: Microsoft’s own WGA support forums. When I did the earlier study, Windows Vista had not yet launched, so all reports involved Windows XP. Today, two years later, there are separate WGA support forums for XP and Vista, and I looked at both of them. Back in 2006, I counted data for a 15-day period, August 1-15, and tallied 137 support requests directly related to product activation, validation, or WGA “non-genuine” messages. For the 2008 version, I used a larger sample, examining every thread on the two WGA forums that was started between August 1 and August 26.

Windows XP: No false positives?

For the 26 days in this study, I tracked 101 separate threads on the Windows XP Genuine Advantage Validation Issues forum. Of those, 20 were simple questions (How many computers can I install my copy of Windows on? Where can I order a replacement for my lost Windows CD?), 4 were comments or notices from administrators, and 13 were off-topic support requests from people who had wandered onto the wrong forum. That leaves a total of 64 separate requests for support with an issue related to product activation or validation. No doubt some were triggered by seeing this message on the desktop:
[Image: the new XP Genuine Notification message]

A total of 16 reports were confirmed cases of piracy (blocked VLK or keygen). Another 9 were cases where the report specifically noted that the system had been attacked by a virus that wreaked havoc with crucial system files, including the activation components. This report is typical:

I have just managed to clear-up a severe malware attack on my laptop. However, XP will not now validate. I searched the registry and discovered that the ProductID has has its first group replaced with 'VIRUS'. How do I recover the genuine product ID - I have my product key.

In all those cases, the correct answer was to back up data and reinstall Windows, because the compromised system was probably beyond repair. After a severe malware attack, the safest way to ensure that no trace of the original infection lingers is to start from a clean slate.

The remaining 39 cases all involved situations where the user needed help, often because they had reinstalled Windows and were stymied by some part of the activation process. Although activation is technically separate from WGA, the issues overlap so closely that it’s almost impossible to separate them. In at least three cases (here, here, and here), a problem occurred because the user was trying to reinstall Windows XP using the wrong media for the system (a Dell reinstallation CD won’t work on a system with an ASUS motherboard, for example). In many but certainly not all cases, the forum staff was able to resolve the problem by stepping the user through an online validation process or pointing to a Knowledge Base article (here and here, for example). They couldn’t help this guy, who had lost his product key, or this desperate soul, who was trying to use a borrowed copy of Windows XP Professional to repair a company-issued laptop without calling the IT department. One can only imagine the back story there.

In 26 days’ worth of problem reports, I read a handful of reports of “non-genuine” messages caused by some unknown combination of software, device drivers, malware, and hardware. I could find only one report that appeared to be a genuine false positive (but turned out not to be after all). The story was an odd one:

I purchased a new OEM version of Windows XP Home Edition.  It came with a product key certification and appears to be a genuine copy of XP.  I installed it on my  iMac under Mac OS Leopard.  The software installed with no problem, but failed online activation with "UNAUTHORIZED PRODUCT KEY'.  The telephone approach also did not work.

According to the Microsoft employee who responded on the forum, the product key in question “is not genuine and has been blocked.” Normally, this response would mean that the purchaser had bought counterfeit software from a shady online dealer, but in this case there was a twist:

I purchased the software from an authorized MS partner.  I called them and they told they me they are experiencing a number of complaints regarding invalid activations, with a number of MS products including Vista.  They have been told by MS that there is a problem with the activation server.  They requested full details of the copy, including numbers on the CD.  The vendor has assured me they are working with MS directly and will correct the problem by issuing a new key within three days.

Sure enough, three days later, the thread was updated with this post:

I reinstalled WXP and used the replacement activation key provided by Microsoft through the vendor.  It worked, activation was successful!

I asked Microsoft to comment on this particular case. After looking into the specifics, they confirmed that the person making the report was indeed a (probably innocent) victim of piracy. A Microsoft spokesperson said that the product key in the initial report “was identified as a keygen and does not match the standard product key format.” One possibility (pure speculation on my part) is that a legitimate dealer received a shipment of high-quality counterfeit product from a reseller and quickly cleaned up the mess after figuring out what had happened. I found no additional reports of similar problems, which suggests that this was an isolated case.

Windows Vista: It’s complicated

For the 26 days in this study, I read through 109 separate threads on the Vista support forum. Of those, 40 were either simple questions or off-topic posts, leaving 69 legitimate requests for help with product activation and validation issues, or a “not genuine” report.

Sorting the XP trouble reports into buckets was relatively easy compared with sorting the Vista reports. The Software Protection Platform in Windows Vista is much more complex, which means it has more points of failure than the relatively simple WGA validation process in XP. In fact, the sorting process helped me identify two big issues that still need to be fixed in Vista (and, presumably, in Windows 7).

Of the 69 legitimate requests, 10 involved counterfeit software, and another 2 came from victims of serious malware attacks that had scrambled the operating system beyond repair.

There were a total of 17 help requests, eight of which involved activation and product keys. Several of the remainder involved “Invalid License” messages, for which the response was a boilerplate set of steps to reset the license store, using a command-line switch for the familiar slmgr.vbs script (the same one that allows you to “rearm” a system beyond its initial 30-day grace period). In this typical case, the results were successful.

The remaining 40 appeared to be false positives. This one was the strangest of all. After acknowledging that the license appeared valid, a Microsoft employee wrote:

Currently what you have experienced happens very rarely. You are experiencing a problem with the Trusted store where your drivers are stored for the hardware on your computer. Usually this will correct itself once you restart the computer. Should this not resolve the situation we recommend you to update the drivers for all hardware which you have in the computer.

I’m filing that one away for future reference. Meanwhile, the remainder of the 40 error messages were split into two large groups, each of which had earned its own boilerplate response.

Problem #1 occurs when the Software Licensing service is shut down. As Microsoft’s Darin Smith explained in this typical Q&A session, the symptoms are fairly easy to identify:

In your Diagnostic Report it shows the error code "Online Validation Code: 0x80070426". This means that the Software Licensing Service has stopped. Vista uses this service to check itself and confirm it is Genuine. When the service is stopped, Vista is unable to confirm it's own Genuine status and may show Genuine or Non-Genuine. The fact that the Service has stopped is also, most likely, the cause of  other issues you may be experiencing (such as not having access to the Control Panel).
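For readers curious about where that explanation comes from, the code itself can be unpacked. A Windows HRESULT packs a failure bit, a facility number, and a 16-bit error code into one 32-bit value. The sketch below splits 0x80070426 into those fields; the helper name `decode_hresult` is made up for illustration, while the two constants are the standard Win32 values.

```python
def decode_hresult(hr: int) -> dict:
    """Split a 32-bit HRESULT into its failure bit, facility, and code."""
    return {
        "failure": bool((hr >> 31) & 1),   # top bit set means failure
        "facility": (hr >> 16) & 0x7FF,    # 11-bit facility field
        "code": hr & 0xFFFF,               # low 16 bits: the error code
    }


FACILITY_WIN32 = 7                # HRESULT wraps an ordinary Win32 error
ERROR_SERVICE_NOT_ACTIVE = 1062   # "The service has not been started."

parts = decode_hresult(0x80070426)
print(parts)
```

The low 16 bits, 0x0426, are 1062 in decimal, which is the Win32 error for a service that has not been started. That matches the forum explanation that the licensing service is stopped.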

Judging by feedback on the forums, the boilerplate instructions are usually sufficient to resolve the problem. In other cases, the shut-down service is apparently a symptom of a much larger problem, as in this extremely detailed report that included this baffling error code:

[Image: an incomprehensible validation error]

That report eventually resulted in a stalemate, with the forum staff unable to solve the problem and the customer unwilling to pay $59 to open a support ticket.

Problem #2 involved apparently erroneous reports from Windows Vista that its core components had been modified or tampered with, as is characteristic of BIOS-level hacks that try to fool the system into sliding past validation requests. According to a response from Microsoft’s Darin Smith in this thread, the symptoms appear in the online Diagnostic Report under the heading “File Scan Data,” in a format like this example:

File Scan Data-->

File Mismatch: C:\Windows\system32\msvcrt.dll[7.0.6001.18000]

File Mismatch: C:\Windows\system32\gdi32.dll[6.0.6001.18023]

File Mismatch: C:\Windows\system32\ole32.dll[6.0.6001.18000]

The mismatch is between the hash of the signature embedded in the file itself and the corresponding signature hash value listed in Vista’s System Catalog. That typically means one of two things:

1. The file has been tampered with, modified, or corrupted, so that its signature hash no longer matches the value in the System Catalog; or

2. The system has been updated legitimately but the value stored in the System Catalog was not updated to reflect the updated file's signature hash.
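The general idea of checking files against a catalog of expected hashes can be sketched as follows. This is only an illustration of the concept, not how Vista does it: the SHA-256 choice, the dictionary standing in for the catalog, and both function names are assumptions.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_mismatches(catalog: Dict[str, str], root: Path) -> List[str]:
    """List files under root whose current hash differs from the catalog.

    catalog maps a file name to the hash recorded when the file was
    last known to be good (a stand-in for a real system catalog).
    """
    mismatches = []
    for name, expected in sorted(catalog.items()):
        if file_hash(root / name) != expected:
            mismatches.append(name)
    return mismatches
```

Both failure modes in the list above produce the same result from a check like this. A tampered file changes the computed hash, while a legitimate update that never reached the catalog leaves a stale expected value. The check alone cannot tell the two apart.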

According to Microsoft, a signature hash mismatch has different effects, depending on the file involved. Vista’s status may or may not change to non-genuine, the operating system won’t validate correctly, and “other strange behavior” may occur. Typical resolution steps involve using System Restore to perform repairs, as well as uninstalling and reinstalling Service Pack 1 (which updates all system files and their corresponding values in the System Catalog). In this example, the error was caused by disk corruption of unknown cause; the resolution required a lengthy CHKDSK session and a startup repair from the Vista DVD, followed finally by using System Restore to roll back to a configuration from a few days earlier.

Those two categories collectively involve 57% of the problems reported by Vista users on Microsoft’s support forums. Some are false positives. Others might be caused by Microsoft updates that failed to install properly. Still others might be caused by undetected malware or badly written programs that are interfering with system services and tampering with system files. It’s clear that Microsoft has some work to do to identify the root causes of those two failure types and prevent them from occurring.

For 2008, WGA gets a C+

There’s no question that Microsoft’s performance on WGA and Vista validation has improved significantly in the past year. That’s the result of experience and some very diligent engineering and process improvement work by Microsoft’s WGA group. A more robust back end, more accurate detection tools, better communication, and improved self-help options for users have all resulted in improvement. I noted a dramatically lower number of support requests compared to two years ago, with a drop of more than 40% in the number of requests for help. (It helps that the measurements were taken at the same time of year, so no seasonal adjustment is necessary.)
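One way to arrive at a drop of that size is to compare requests per day, since the 2006 sample covered 15 days and the 2008 sample covered 26. The sketch below uses only the counts reported in this article. Treating the comparison as a per-day rate across both 2008 forums combined is an assumption about the method, not something the figures themselves confirm.

```python
# 2006: 137 activation/validation requests in 15 days (XP only).
requests_2006, days_2006 = 137, 15

# 2008: threads minus simple questions, admin notices, and off-topic posts.
xp_2008 = 101 - 20 - 4 - 13      # XP forum
vista_2008 = 109 - 40            # Vista forum
requests_2008, days_2008 = xp_2008 + vista_2008, 26

per_day_2006 = requests_2006 / days_2006
per_day_2008 = requests_2008 / days_2008
drop = 1 - per_day_2008 / per_day_2006

print(f"2006: {per_day_2006:.1f} requests/day")
print(f"2008: {per_day_2008:.1f} requests/day")
print(f"drop: {drop:.0%}")
```

On this reading the rate falls from roughly 9 requests per day to roughly 5, a drop of about 44%, which is consistent with "more than 40%."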

For XP users, the difference is especially striking. WGA errors seem to have become genuinely rare, with most issues relatively easy to resolve using online tools or simple commands.

There’s no historical database. The decision to remove Vista’s “reduced functionality” option with Service Pack 1 was also smart. That option shouldn’t even be considered with a detection system that is less than 100% accurate (or, in layman’s terms, never), and even then it’s too likely to sweep up innocent bystanders. As of Vista SP1, a glitch or hiccup in the Software Protection Platform components won’t prevent you from using Windows or any programs. It will have zero impact on performance, with only an annoying delay of a few seconds at startup to display a warning message, and a watermark on a black background displaying a similar “Not Genuine” message on the desktop after you successfully log on.

In the past, Microsoft has taken some heat for being disingenuous when it argues that WGA and Vista validation actually benefit customers by alerting them to potential system problems. After sifting through hundreds of problem reports on their forums, I’m willing to give that argument a little more weight. In example after example, a WGA message or validation error was the first sign of what turned out to be a larger problem.

Still, reading through those forums also provides plenty of ideas on how Microsoft can improve not just WGA but the entire Windows experience. Here are five suggestions I’d like to see incorporated into Vista and especially into Windows 7:

1. Simplify Windows licensing and activation. It is almost impossible for mere mortals to understand the nuances of OEM SLP activation and why the product key on the sticker on the side of your PC won’t work after you reinstall Windows. Corporations can pay people to figure this stuff out. Consumers and small business people shouldn’t have to.

2. Provide a plain-English license display that anyone can understand. I should be able to log on as an Administrator, click a link on the Windows Welcome Center, and see a single screen that tells me what type of license I have purchased, when and how my computer was activated, and whether the license can be transferred to another PC. Include a link to the full license agreement, but don’t make me read through it to figure out all the details. On the PC sitting to my left, for instance, clicking this link would tell me the following:

  • I have an OEM license for Windows Vista Home Premium.
  • This license was included with my purchase of an HP Pavilion Elite m9300t.
  • The product ID number is 89583-OEM-7332157-00061.
  • This license cannot be transferred to another PC.
  • The product was activated when the operating system was first installed, and will be automatically reactivated if I reinstall using the restore options HP provided me.

3. Provide a deactivation option for retail copies of Windows. That will make it easy to transfer a license from one machine to another without having to go through activation hassles.

4. Build a simple, usable, web-based front-end for troubleshooting WGA and validation errors. As I noted in my travels through the two support forums, the solutions for most common problems are simple boilerplate, lifted from Knowledge Base articles and pasted into forum messages. Wouldn’t it be easier if that information were organized in a FAQ page that users could find through a search engine and that forum posts could simply link to?

5. Make the Complete PC Backup utility a part of every Vista edition. Most Windows systems can be backed up onto two or three DVDs or to an external hard drive in 10 minutes or less. Well, they can if they’re running the Business or Ultimate or Enterprise editions of Vista. The sad part is that a good backup image can help any Windows user recover from most problems, including validation and activation hassles, in minutes. Home Basic and Home Premium users should have access to this option. It’s too late to make that change for Windows Vista, but it’s not too late to do the right thing for Windows 7.