Why the blue screen of death no longer plagues Windows users

Summary: The blue screen of death is familiar to any long-time Windows PC user, but Microsoft has been developing tools to keep the dreaded BSOD at bay.


Remember the blue screen of death, a Windows PC's way of telling you it had suffered an error so catastrophic it couldn't carry on anymore?

The dreaded Windows blue screen of death.

In recent years, sightings of the BSOD have become less common in Windows operating systems, as Microsoft has stamped out some of the rogue code commonly responsible.

At a recent event in Cambridge, Microsoft talked about how it had reduced misbehaving code in its operating system, using automated tools and a huge number of crash reports from Windows XP users.

The main cause of crashes in Windows XP was device drivers, which were responsible for some 85 percent of hiccups in the OS. Drivers are the code that allows an operating system to control a hardware device, such as a video card, handling commands between the device and the core of the operating system, the kernel.

Drivers can be particularly difficult to debug, as their code will be written by different companies and is generally not open source, so is opaque to Microsoft. Their interactions can also be rather complex, with drivers commonly interoperating with a stack of other drivers.

"There's an exponentially growing number of device drivers in the ecosystem and they're written typically not by Microsoft but by our partners," said Byron Cook, principal researcher at the Microsoft Research lab in Cambridge and manager of Microsoft's Programming Principles and Tools group.

"There are a number of rules that these systems must adhere to, otherwise the whole system is going to crash."


How Microsoft stamped on driver errors

Teams in Microsoft's Windows division developed algorithms that took in driver-related crash reports from XP users and automatically categorised them by driver vendor and the likely cause.

The goal for Microsoft was to figure out which drivers were causing problems and what the most common fatal mistakes were.
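
As a rough illustration of that kind of categorisation, the sketch below counts crash reports by vendor and likely cause. The report fields, vendor names and cause labels are all hypothetical. The article does not describe the format of Microsoft's crash data or the algorithms it used.

```python
from collections import Counter

# Hypothetical crash reports: every field name and value here is made up
# for illustration and does not reflect Microsoft's actual crash data.
reports = [
    {"vendor": "VendorA", "driver": "gfx.sys", "cause": "api_misuse"},
    {"vendor": "VendorB", "driver": "net.sys", "cause": "memory_corruption"},
    {"vendor": "VendorA", "driver": "gfx.sys", "cause": "api_misuse"},
    {"vendor": "VendorA", "driver": "usb.sys", "cause": "hang"},
]


def bucket_reports(reports):
    """Count reports per (vendor, cause) pair, most frequent first."""
    counts = Counter((r["vendor"], r["cause"]) for r in reports)
    return counts.most_common()


for (vendor, cause), count in bucket_reports(reports):
    print(vendor, cause, count)
```

Sorting the buckets by frequency shows at a glance which vendors and which kinds of mistake account for the most crashes.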

Microsoft established there were three ways that device drivers commonly tripped up Windows XP.

The first was drivers breaking the rules of the Windows APIs that handle communication between the kernel and the driver. An example is a driver calling the Windows kernel API IoCompleteRequest twice for the same request, which causes Windows to crash.
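
To make the idea of an API usage rule concrete, here is a minimal sketch that checks a recorded sequence of calls against one rule: a request may be completed at most once. The trace format and the function name check_complete_once are assumptions made for this example. It checks a single recorded run, which is a much weaker guarantee than the static verification described later in the article.

```python
def check_complete_once(trace):
    """Return the index of the first rule violation in a trace, or None.

    `trace` is a list of (call_name, request_id) pairs. The rule modelled
    here is that a given request may be completed at most once.
    """
    completed = set()
    for index, (call, request_id) in enumerate(trace):
        if call == "IoCompleteRequest":
            if request_id in completed:
                return index  # second completion of the same request
            completed.add(request_id)
    return None


# Hypothetical traces; the other call names are placeholders.
good_trace = [("StartIo", 1), ("IoCompleteRequest", 1)]
bad_trace = [("StartIo", 1), ("IoCompleteRequest", 1), ("IoCompleteRequest", 1)]

print(check_complete_once(good_trace))
print(check_complete_once(bad_trace))
```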

The second major cause of errors was memory corruption, where memory is not correctly allocated for data structures needed by the driver. The third was drivers hanging the system after getting caught in an infinite loop.

To reduce the number of buggy device drivers, Microsoft embarked on what it called "data-driven program verification". This is a process whereby "you model a computer program as a mathematical system and the goal is to build tools that find proofs of correctness using mathematics and logic", said Cook.

"The goal is to build tools that automatically find proofs of correctness rather than just enumerating all the possible test cases", thereby accelerating the rate at which bugs can be stamped out.

Microsoft developed three new tools for automatically spotting and squashing software errors. The first was a piece of software called Slam, which checks that a piece of software follows the usage rules of the interfaces it calls. Slam was used as the verification engine for the Static Driver Verifier tool, which is now part of the Windows Driver Development Kit.

Another Microsoft tool, Slayer, addressed memory corruption. Slayer analyses data structures associated with a device driver and checks that every memory address the device driver touches has been properly allocated.
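
The property being checked can be illustrated with a toy model that records allocated regions and flags any access outside them. This is only a sketch of the rule itself, applied to concrete addresses at run time. The class and function names are invented for the example, and the article does not describe how Slayer works internally.

```python
class AllocationTracker:
    """Toy record of which address ranges are currently allocated."""

    def __init__(self):
        self.regions = {}  # start address -> size in bytes

    def allocate(self, start, size):
        self.regions[start] = size

    def free(self, start):
        self.regions.pop(start, None)

    def is_allocated(self, address):
        return any(
            start <= address < start + size
            for start, size in self.regions.items()
        )


def first_bad_access(tracker, addresses):
    """Return the first address that falls outside every allocated region."""
    for address in addresses:
        if not tracker.is_allocated(address):
            return address
    return None


tracker = AllocationTracker()
tracker.allocate(0x1000, 16)  # hypothetical 16-byte buffer
print(first_bad_access(tracker, [0x1000, 0x100F]))  # both inside the buffer
print(first_bad_access(tracker, [0x1000, 0x1010]))  # 0x1010 is one past the end
```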

Using these tools Microsoft found a number of bugs in device drivers written by third parties, but also among the 40 or so sample device drivers Microsoft provided as part of the Windows Driver Development Kit.

"If you're a device driver writer what you do is typically copy and paste that code and then modify it," said Cook.

"So bugs in those samples are pretty bad because they then propagate throughout the infrastructure."

As well as fixing the bugs in the samples, Microsoft has now released tools to device driver writers that they can use to find bugs in their code.

Working out whether a device driver would get stuck in an infinite loop was trickier, as Microsoft was faced with the difficulty of addressing the halting problem. The famous mathematician and father of computing Alan Turing proved that no general algorithm can decide, for every possible program and input, whether the program will eventually halt.

But Cook said the nature of device drivers meant there were ways to analyse drivers to see if they would terminate.

"The nice thing about device drivers is they are typically quite small, about 30,000 lines of code. They typically don't have too many nested loops, and there are some other things about them that means you can succeed in this domain where you might not succeed generally," he said.

The team developed a termination prover for Windows device drivers called Terminator, which works on device drivers of up to 35,000 lines of code. Terminator helped uncover a number of bugs in Windows XP. For example, unplugging a mouse while moving it would hang the system, as the OS would get stuck walking the I/O request queue forever.
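
One way a queue walk can run forever is if the queue's links loop back on themselves. The sketch below models a queue as a table of next pointers and reports whether a walk from the head ever reaches the end. It assumes a circular queue as the cause purely for illustration. The article does not give the actual cause of the mouse bug or describe Terminator's method, and a check on one small, fully known structure is far simpler than proving termination for real driver code.

```python
def walk_terminates(next_of, head):
    """Return True if following next pointers from `head` reaches None.

    `next_of` maps each node to its successor (or None at the end).
    Revisiting a node means the walk would loop forever.
    """
    seen = set()
    node = head
    while node is not None:
        if node in seen:
            return False
        seen.add(node)
        node = next_of.get(node)
    return True


# Hypothetical queues of I/O requests, named by letter.
healthy_queue = {"a": "b", "b": "c", "c": None}
corrupt_queue = {"a": "b", "b": "c", "c": "a"}  # last entry points back to the first

print(walk_terminates(healthy_queue, "a"))
print(walk_terminates(corrupt_queue, "a"))
```

The check is decidable here because the structure is finite and fully known, which echoes Cook's point that restricting the domain makes termination analysis feasible.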

Cook said the stability of recent Windows versions, such as Windows 7 and 8, has benefited from Microsoft's work on stabilising drivers.

"The internal crash data has pointed us towards buggy device drivers we should be focusing on and allowed us to figure out what the common mistakes are. It has helped us clarify with members of the Windows kernel team what rules device drivers should respect, but also what properties we should try and verify in programs," said Cook.

Topics: Software, Enterprise Software


Nick Heath is chief reporter for TechRepublic UK. He writes about the technology that IT-decision makers need to know about, and the latest happenings in the European tech scene.

  • 'Tis True

    I've yet to see a BSOD on W7.

    Next area for improvement is the updates process; it's dreadful when compared to Linux. How many reboots!
    Alan Smithie
    • BSOD in Windows these days

      For me it is always faulty hardware.

      RAM, Video Card, Disk drive, or other critical hardware going bad can cause a BSOD.

      I can't even remember the last time I have seen Windows XP BSOD unless it had faulty hardware.
      • Same here.

        Awhile back I was getting random BSOD's on a 2005-era computer running Win 7. I eliminated "all the usual suspects"--fan (overheating), memory, power supply gone flaky, CPU, swapped hard disk + OS, and it continued. Something in the mobo itself had gone flaky, so I wound up replacing the mobo, CPU and RAM with a newer one and the problem disappeared.

        A few months later the newer one started having random BSOD's. I swapped out the power supply and that got rid of the problem.
        • I would get that

          on cheaper motherboards. I think it happens when there is a fault in the board that doesn't show up until it heats up. Pairing it with a poor power supply that's not giving a consistent voltage exacerbates that problem.
      • Hardware was always the issue

        Last BSOD issue I had with Windows 7 was faulty RAM. Been using Windows since Windows 95, and since XP BSOD's have just disappeared. It's really a non-issue these days. Windows 8 UI is a much bigger issue, that and NSA back doors......
      • I haven't seen a BSOD in well over a decade.

        It was actually fairly simple to avoid them, even in XP. When I built my computers, I only used high end parts which were on Microsoft's Hardware Compatibility List (HCL.) Those components always had top quality drivers, so I never had any trouble. I left my systems on 24/7 without any problems. It always mystified me that big brand computers had driver problems and subsequently BSODs. You would think they would know about the HCL and stick to those parts.

        I've had more kernel panics on my Mac in the past year than I had in the last 12 years on my Windows PCs.
        • Exactly

          CorrectAmundo on both counts.
        • Power supply

          When I switched to a quality power supply a lot of problems went away, and it lasted about years too. Finally the caps blew.
    • Saw one in conjunction with using VMWare

      I saw one. You'll reliably BSOD if you interrupt a Ping via the Visual Studio debugger.
    • Oh, you probably have. MS changed the behavior of a BSOD

      some time ago. Instead of throwing up the screen, the computer simply restarts spontaneously. So, if while working your PC just suddenly restarted, you suffered a BSOD.
      • Interesting...

        Why didn't Microsoft just do that in the first place? Would have saved a lot of work!
        • Actually, they changed the default behavior of BSOD's

          Microsoft added the mini-dump concept and the support for forwarding these to their database of failures. After this they realized that a blue screen that just sat there was no longer the appropriate default. Most of the old behavior can still be enabled.
      • Just experienced first 'BSOD' on Windows 8 a couple days ago

        A message popped up saying something like "Your PC ran into a problem and needs to restart." We have had the Dell XPS PC since March and this is the first time I have seen this bugcheck 'BSOD.' I haven't checked the dump file yet to see what caused the reboot. My wife was filling out a job application online and lost her work and had to start over, but I guess it was still smoother than the classic BSOD.
      • exactly

        Windows has improved a lot recently. However, my PC occasionally reboots due to a hardware issue. Sure enough, the dumps are there on the HD.
    • Rare Windows 7 BSOD

      Compared to previous versions, Windows 7 has been a relatively bulletproof OS. I've only run into the BSOD once in Windows 7 versus numerous ones in Windows Vista (I don't remember seeing too many in Windows XP, but that's most likely because I didn't use so much high-end hardware). I don't even remember what the reason was for the Windows 7 BSOD I ran into.
    • Much improved, but could take more inspiration from Linux

      BSOD nowadays virtually always happen because of either faulty hardware or bad drivers. More and more the former now too. The OS itself from Win7 onward is far more robust.

      I have encountered one Win7 BSOD since its release--it was attributed to a driver and the hardware vendor responsible for said driver triaged the issue and sent out an updated driver.

      The article made a brief but profound statement:

      "Drivers can be particularly difficult to debug, as their code will be written by different companies and is generally not open source, so is opaque to Microsoft."

      BINGO! Guess why MSFT has always been playing catch-up with Linux and BSD? The single biggest reason is because the code is OPEN. The kernel devs can see the source code of drivers that talk to it, and driver authors can see the source code of the kernel. Indeed in the case of Linux drivers tend to be distributed as modules within the kernel package.

      Even ultra-closed Apple is open within its ecosystem. Apple is a Mach/BSD OS, and as a vertically integrated "silo" they hand-pick the hardware that their software runs and demand sufficient access to information from suppliers to make sure their smaller subset of drivers functions correctly.

      So besides Windows SCREAMING for something like "apt" package management and associated system update capabilities, I would say Windows needs to foster a more "open source" development ecosystem. They must reveal more of the source and inner workings of their OS to vendors responsible for developing drivers. In return, driver authors should have to disclose the source of their drivers to MSFT as a prerequisite to get the all-important "windows certification" logo on their hardware. Aside from taking everything in-house and trying to be "another Apple", this is their best hope to maintain OS quality.

      Personally I think MSFT should show everyone they are truly serious about being a "devices and services" company and give up on selling their OS entirely. The best chance they have is not to emulate Apple, as they seem to be trying to do with Nokia phones and Surface tablets, but instead to "embrace and extend and extinguish" Google's business model. Windows should be freely downloadable by anyone who wishes to build it--think "Microsoft Windows Open source Platform" (MS WOSP) to go up against Google Android Open Source Platform (AOSP). Then they wouldn't need these advanced tools like SLAM and SLAYER to ferret out driver bugs--all the vendors would fix the problems themselves and even improve the kernel--and that would kill the BSOD for sure!
      Mark Hayden
      • Windows does have an apt-like thing now.

        "So besides Windows SCREAMING for something like "apt" package management and associated system update capabilities"

        It does now. It sucks, but it has it: The Windows Store in Windows 8. Okay, it's pretty crappy right now, but I expect it to improve in 8.1 and future patches.

        "driver authors should have to disclose the source of their drivers to MSFT as a prerequisite to get the all-important "windows certification" logo on their hardware."

        I don't know if Microsoft demands source code, but they *do* have WHQL, which does this kind of certification stuff. Nick didn't mention WHQL, but it's also partially responsible for ensuring the stability of current versions of Windows.

        I seriously doubt Microsoft will go open source, though. It's unlikely to be in their plans. They do, however, have a lot of developers - larger than many open source projects. You can be pretty sure a lot of eyes are looking at their code, even if all of those eyes are under Microsoft's payroll.

        "Then they wouldn't need these advanced tools like SLAM and SLAYER to ferret out driver bugs . . ."

        That being said, any developer would love to have those tools, even for open source projects. The more debugging tools we have, the better.
      • OK - you do know that Windows Driver developers

        generally do get access to the Windows Source Code? http://www.microsoft.com/en-us/sharedsource/default.aspx
      • You obviously misunderstood what was meant with the statement

        "Drivers can be particularly difficult to debug, as their code will be written by different companies and is generally not open source, so is opaque to Microsoft."
      • OS Source Code

        I thought with the right NDA in place, Microsoft shared the Windows source code with anyone that needs it. Regarding Windows always being behind Linux, in the past device drivers were always available for Windows but generally not Linux. What does your comment about "always behind" mean?

        If Linux has a better device driver model that is somehow less prone to issues when a driver calls IoCompleteRequest twice, enlighten us. If Linux drivers always run in a special protection ring, so that they cannot jeopardize the integrity of the kernel, and that approach does not cost performance, enlighten us.

        Apple still sells their OS for desktop. They don't for mobile but they only allow you to upgrade so many older models to the latest version of the OS. As the OS versions roll, the performance on older devices also gets slower and slower. That used to be a rabid complaint of Microsoft. Folks postulated that they were colluding with Intel to require a new PC purchase.

        Open source or closed source, Apple or Google -- the names are different but the same old, same old continues. Ideal communal code ecosystems talk but profit walks. In the end they will all do what makes them money, regardless of how naive we all are in thinking each one has some "superior" system because it's ABM.