The blogosphere is abuzz over a newly publicized bug in Windows 7. I read about it yesterday on Chris123NT’s blog, where it was described as a “critical bug in Windows 7 RTM.” The story picked up momentum today when InfoWorld’s Randall Kennedy (the man behind the “Save XP” Astroturf campaign) published a sensational polemic: “Critical Windows 7 bug risks derailing product launch.” Tom Warren at Neowin called it “rather nasty” but sensibly concluded that it’s far from a “show stopper.”
My conclusion? It’s alarming behavior if you’re unaware of what’s happening. But when you look more carefully, it’s arguably a feature, not a bug, and the likelihood that you’ll ever crash a system this way is very, very small and completely avoidable.
You can go read Chris’s initial report to see the repro steps. Basically, you need to run the Windows Chkdsk command using the /r switch, which is designed to locate and repair bad sectors on a disk. According to the initial report, if you use this tool as described, “you should see your memory quickly gobbled away in the chkdsk.exe process until it either stops at or around 90% or it maxes completely out and crashes the computer.”
Let’s all take a deep breath, shall we? I’ve done a couple of hours’ worth of testing this morning on the subject. There’s much less here than meets the eye. The idea that this bug is reproducible 100% of the time is incorrect, and in fact some of the seemingly alarming behavior is actually by design.
First, you won’t see this bug on your system drive. Why? Because if you try to run Chkdsk with this switch (either from the command line or from the graphical interface) you’ll be told that the drive is in use. Windows will politely offer to schedule the disk check to run the next time you reboot and before Windows loads. The disk check in this mode is quick and harmless.
Most systems have a single drive with a single partition. On such a system, you will never see this issue. Second, if you try to run a disk check on a non-system drive that is in use (one where you have recently worked with data files, for example), you’ll be offered the opportunity to dismount the drive and continue the check. If you refuse, Windows politely offers to reschedule the check to run at startup, just as in the previous case.
Third, I’ve heard at least one observer speculate that this might affect you if you insert a removable drive and Windows prompts you to “scan and fix it.”
I tried doing exactly that, inserting several USB flash drives until I found one 4GB model that triggered this prompt. It produced the following dialog box:
Note that the second option, “Scan for and attempt recovery of bad sectors,” is the equivalent of the /r switch for Chkdsk. It’s not selected by default, and even when I clicked it, the disk check ran perfectly, without incident. I tried running Chkdsk /r from the command prompt on the same disk, with no excessive memory usage.
As a final stress test, I ran Chkdsk with this option on a 160GB portable USB hard drive, as prompted by the Scan and Fix dialog box. It did indeed exhibit what seemed like alarming behavior, rapidly consuming all but 500MB or so of the 6GB of RAM on my test system. (That green bar on the bottom means I’m using roughly 93% of available RAM.)
I allowed the process to run, and although it took roughly 15 minutes to complete the check, memory usage never hit the system’s maximum, other programs remained completely responsive, and I was even able to run a second instance of Chkdsk /R on another USB drive.
Oh, and the original report was slightly off base. The extreme memory usage appears to be in the Explorer.exe process, not in Chkdsk. Update: The original report noted, correctly, that high memory usage is observed in the Chkdsk.exe process if you kick off the disk check from a command prompt. If you perform the exact same operation from the more familiar graphical interface, the measurement in Task Manager is different, with Explorer.exe being credited with the memory usage. However, the end result is exactly the same. I repeated these tests using both the graphical and command-line methods on multiple drives to confirm.
Windows boss Steven Sinofsky took the rare step of visiting the original blog and posted a comment explaining the issue:
In this case, we haven’t reproduced the crash…. [T]he design was to use more memory on purpose to speed things up, but never unbounded — we requset [sic] the available memory and operate within that leaving at least 50M of physical memory. Our assumption was that using /r means your disk is such that you would prefer to get the repair done and over with rather than keep working.
While we appreciate the drama of “critical bug” and then the pickup of “showstopper” that I’ve seen, we might take a step back and realize that this might not have that defcon level. Bugs that are so severe as to require immediate patches and attention would have to have no workarounds and would generally be such that a large set of people would run across them in the normal course of using their PC.
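Sinofsky’s description boils down to a simple budgeting rule: grab whatever physical memory is available to speed up the repair, but always hold back a fixed reserve. The Python sketch below models that rule only; it is not Chkdsk’s actual code, and the function name `repair_buffer_budget` and the 6GB starting figure are hypothetical. The 50MB reserve is the floor Sinofsky cites.

```python
# Hypothetical model of the "use available memory, keep a reserve" rule.
# This is NOT Microsoft's Chkdsk implementation, just an illustration.
MB = 1024 * 1024
RESERVE_BYTES = 50 * MB  # floor cited by Sinofsky: leave at least ~50MB free


def repair_buffer_budget(available_bytes: int, reserve_bytes: int = RESERVE_BYTES) -> int:
    """Return how much memory a bounded repair pass may claim (never negative)."""
    return max(0, available_bytes - reserve_bytes)


if __name__ == "__main__":
    available = 6 * 1024 * MB  # hypothetical: 6GB reported as available
    budget = repair_buffer_budget(available)
    print(f"budget: {budget // MB} MB, left free: {(available - budget) // MB} MB")
```

The point of the rule is that memory use can look enormous in Task Manager while still being capped, which is consistent with a system that stays responsive during the check.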
My experience bears out that explanation perfectly. According to Sinofsky, Microsoft is now doing “overnight stress testing of 40 machines” to see whether the bug is reproducible. If it is, I would expect a patch in short order. But based on my testing, I have to agree that this is interesting but far from a “show stopper.”