McAfee admits "inadequate" quality control caused PC meltdown

Summary: If your company uses enterprise security products from McAfee, you probably had a bad day yesterday. If you're an IT professional at one of those companies, you're probably still cleaning up the mess caused by a defective virus signature update that disabled XP systems worldwide. The worst part? According to a confidential document from McAfee, the cause was a fundamental breakdown in the most basic of quality-assurance processes. I've got the exclusive details.

TOPICS: Hardware

Update 23-Apr: Late Thursday night, McAfee posted a FAQ on this issue on its website. The FAQ includes some of the text from the confidential document I received yesterday and is clearly a later version of that document. However, the details of why the problem occurred and the specific steps that the company plans to take to avoid similar problems in the future have been replaced with general statements. I have highlighted the differences in the updates below.

As of 6AM Pacific time on 23-Apr, there is still no statement, apology, or clearly labeled link to support resources related to this issue on McAfee's home page.

If your company uses enterprise security products from McAfee, you probably had a bad day yesterday. If you're an IT professional at one of those companies, you're probably still cleaning up the mess caused by a defective virus signature update that disabled systems running Windows XP with the most recent service pack (SP3). The worst part? According to a confidential document from McAfee, the cause was a fundamental breakdown in the most basic of quality-assurance processes.

From an IT perspective, this is a nightmare scenario: an automatic update that wipes out a crucial system file and that can only be repaired manually. I've heard from more than a dozen IT pros and consultants over the past 24 hours who shared their experiences. They are, to put it mildly, unhappy.

What went wrong?

That was the question I asked in my post yesterday, and I formally asked a McAfee spokesperson for an explanation this morning. I was told that an answer will be posted on McAfee's blog later today. As of this writing, that blog post has not been published.

But I found the answer, straight from McAfee, in a document forwarded to me by an anonymous source. According to my source, the document was "a confidential communication to enterprise customers" sent via e-mail. In it, the anonymous author acknowledges that the screw-up was thoroughly preventable. The document, titled "McAfee FAQ on bad DAT issue," is written in Q&A format and includes the following exchange:

8. How did this DAT file get through McAfee’s Quality Assurance process?

There are two primary causes for why this DAT file got through our quality processes:

1) Process – Some specific steps of the existing Quality Assurance processes were not followed:  Standard Peer Review of the driver was not done, and the Risk Assessment of the driver in question was inadequate. Had it been adequate it would have triggered additional Quality Assurance steps.

2) Product Testing – there was inadequate coverage of Product and Operating System combinations in the test systems used. Specifically, XP SP3 with VSE 8.7 was not included in the test configuration at the time of release.

Update 23-Apr: The details I quoted above have been scrubbed from the FAQ posted at McAfee's website. The corresponding section of the FAQ now reads as follows: "The DAT release was designed to target the W32/Wecorl.a threat that attacks system executables and memory. The problem arose during the testing process for this solution. We had recently made a change to our QA environment. Unfortunately, this change resulted in a faulty DAT making its way out of our test environment."

McAfee has also sanitized the portion of the FAQ that describes its plans to adapt its quality control procedures. Here's the original text of the confidential document sent to enterprise customers:

9. What is McAfee going to do to ensure this does not repeat?

McAfee is currently conducting an exhaustive audit of internal processes associated with DAT creation and Quality Assurance. In the immediate term McAfee will do the following to provide mitigation from false detections:

1) Strict enforcement of rules and processes regarding DAT creation and Quality Assurance.

2) Addition of the missing Operating Systems and Product configurations.

3) Leveraging of cloud based technologies for false remediation.

4) A revision of Risk Assessment criteria is underway.

And here is the corresponding text as it appears in the final FAQ, published overnight:

What is McAfee going to do to prevent this from happening again?

Nearly all of our 7,000 employees have been working around the clock to help customers like you get back to business as usual and to make sure this never happens again. The vast majority of our customers are now back up and running and we remain focused on those that remain affected.

We are implementing additional QA protocols for any releases that directly impact critical system files. We are also rolling out additional capabilities in Artemis that will provide another level of protection against false positives by leveraging an expansive whitelist of critical system files and their associated cryptographic hashes.

That is mind-boggling. For enterprise customers, Windows XP SP3 is probably the most widely used desktop PC configuration. Leaving it out of a test matrix is about as close as one can get to IT malpractice. Any enterprise customer who received this document has every right to be furious.
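The failure described in item 2 of the confidential FAQ is a coverage gap that is easy to check mechanically: the required test matrix is every supported product version crossed with every supported OS build, and any pair with no recorded test run is uncovered. Here is a minimal sketch in Python, assuming a hypothetical list of products and OS builds and a hypothetical `missing_configurations` helper. It is an illustration only, not McAfee's actual QA tooling.

```python
from itertools import product

# Hypothetical support matrix, for illustration only (not McAfee's real list).
PRODUCTS = ["VSE 8.5", "VSE 8.7"]
OPERATING_SYSTEMS = ["XP SP2", "XP SP3", "Vista SP1", "Windows 7"]


def missing_configurations(products, systems, tested):
    """Return every required (product, OS) pair that has no recorded test run."""
    required = set(product(products, systems))
    return sorted(required - set(tested))


# Simulated test lab that covers everything except VSE 8.7 on XP SP3.
tested_configs = [
    pair
    for pair in product(PRODUCTS, OPERATING_SYSTEMS)
    if pair != ("VSE 8.7", "XP SP3")
]

gaps = missing_configurations(PRODUCTS, OPERATING_SYSTEMS, tested_configs)
if gaps:
    # A release gate would hold the update until these pairs are tested.
    print("Release blocked; untested configurations:", gaps)
else:
    print("All required configurations tested.")
```

Run against a lab that omits only VSE 8.7 on XP SP3, the check reports exactly that one pair. A gate like this, wired into a release pipeline, would refuse to ship until the gap is closed.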

Meanwhile, McAfee's website is almost completely silent on the issue. Affected customers who visit the McAfee U.S. home page see business as usual, with a rotation of large ads trumpeting McAfee's latest products. More than 24 hours after the problem occurred, only a single front-page link is available, and it's blandly headlined, "McAfee Response on Current False Positive Issue." If you go to McAfee's Enterprise home page, there is no mention of the problem and no link to any support resources. An overseas correspondent sent me a screenshot of McAfee's UK home page, which also makes no mention of the issue.

That link leads to a blog post by McAfee's Barry McPherson, published yesterday at 4:29PM. McPherson seems more intent on praising McAfee's researchers and minimizing the problem than on helping users. He writes: "We believe that this incident has impacted less than one half of one percent of our enterprise accounts globally…" I find it difficult to believe that the company could come up with an accurate estimate at all, much less do so within hours after the problem was identified. It certainly doesn't match up with the reports I'm hearing from the field.

Update 23-Apr: Yesterday afternoon, the McAfee blog post was edited to remove this reference. The sentence now reads: "We believe that this incident has impacted a small percentage of our enterprise accounts globally and a fraction of our consumer base..."

From a crisis management perspective, McAfee's response has been disastrous. If the company truly cared about its customers, the home page would contain an apology from the CEO and links to detailed support information. Instead, it appears that the company is hoping its customers will just forget about it.

Based on the 100+ comments to McPherson's post, customers who were hit by this error aren't likely to forget about it soon. And when they figure out that a lapse in the most basic of quality control steps caused them to spend thousands of dollars in IT manpower and lost productivity, they're likely to be angrier still.

Talkback

245 comments
  • Were you affected?

    If your company had to spend IT resources fixing this issue, how do you plan to deal with it? Are you considering switching to an alternative security provider?
    Ed Bott
    • Yes...

      About 1/3 of our systems still run XP SP3. We evaluated Vista but found it to be too much of a headache, so we stuck with XP. We've been rolling out Win7, though, so that helped us avoid having all of our desktops taken out. (We'll be fully on Win7 and Ubuntu in 3 months.)

      Although McAfee actually testing it properly would have helped more.

      When our contract is up, we will, as always, evaluate options. Overall, we're still very happy with McAfee's suite of products (especially Host Intrusion Prevention & ePolicy Orchestrator).

      I'd also expect that McAfee won't make this kind of mistake again - it's a difficult and expensive lesson, and one they aren't likely to forget.
      s_southern
      • What difficulty and what expense?

        McAfee just swept it under the rug and pretended it never happened. Oh yeah, they fixed the problem, but how much pain, as in monetary loss, did they suffer for this? Not enough! On the other hand, how much pain, as in monetary loss, did their customers feel? Entirely too much! And where is the compensation for McAfee's customers' losses? Nonexistent!

        Now then, what kind of lesson do you think McAfee really learned?
        softwareFlunky
        • Pain's a coming...

          I bet an army of corporate litigation lawyers are sharpening their pencils in anticipation of the claims for damages (lol, pencils are more reliable than PCs at the moment), and those ARE gonna cause McAfee some pain.
          AndyPagin
      • Some questions for you

        Leaving aside how you find Vista to be a headache and Windows 7 to *not* be a headache when they are incredibly close in terms of usability, I do have some more questions.

        For one, McAfee is, was and probably always will be a pain to use. It is badly configured, not easy to sort through, and often blocks basic system functions while STILL allowing viruses through. While I am not working now, until the middle of last year I was employed at a tertiary-level institute that used McAfee, as provided by the government to all large-scale businesses in my country. Previously Norton was used, and AVG Free was installed on certain computers. Suffice it to say, viruses still managed to hit 95% of all computers in the institute, save my personal laptop (using Bitdefender 2009 and then the 2010 beta), three computers I personally installed and configured Bitdefender Total Security 2009 on, my boss's laptop, and most of the servers. I say most, because the file hosting server often found virus-riddled software stored on it.

        What I don't understand is how you find it so useful in your institution. Even my boss found it very annoying. When I started, he told me Symantec/Norton was doing well. Within a week he was annoyed with it. Then he tried McAfee and said he was happy with it. Two days later he was uninstalling it from his laptop because it was blocking everything on his system and it was a hassle to configure everything manually. In the end I think he used AVG Free and left it alone. Most of the systems in the institute were running Windows XP SP3. My laptop used Vista Home Premium SP1, and four other computers used Vista Business SP1. Those didn't have many problems with viruses.

        Also, as for your expectation that McAfee won't make the mistake again, why do you trust them so much? Another blog post here, written a couple of days ago, confirmed that this isn't the first time this has happened with McAfee. I can't vouch for the credibility of the statement, but if his job is writing blogs, he probably wouldn't have made such a bold statement without good reason.

        Also, do you use McAfee on every system in your administration? If you do, how are you sure you have no viruses? If McAfee does not pick up viruses on one computer, it will not do so for any. You could have quite a few viruses that it simply won't detect, and you would never know. What are your thoughts on that?
        D2 Ultima
        • Which McAfee are you using?

          We use VirusScan Enterprise 8.7, Host Intrusion Prevention, etc... all managed by ePO. We've been a McAfee shop since ePO 2. This is the first issue we've experienced in almost 10 years. The last piece of malware that got onto any of our systems was Melissa/ILoveYou.

          That said, McAfee is only part of our overall setup. We use MS Forefront TMG (formerly ISA Server) for proxying, and ONLY the TMG systems are allowed to communicate with the Internet. Our firewalls all have outbound ACLs limiting traffic to what is absolutely necessary, and in some cases we have the host firewalls on each system locked down to control which processes are allowed to communicate with which hosts/ports.

          As for knowing about the viruses, we have other sensors in place that detect any abnormal behaviour on our network or systems, and it's all logged and alerted if necessary. We've gone through the process of documenting exactly what traffic is supposed to be on our network, and anything outside that is flagged as abnormal.

          As well, our email gateway uses 4 different scanning engines to detect malware in inbound or outbound emails.
          s_southern
          • Better set up than we had

            You have more things covered than what I was accustomed to. I honestly don't remember the name of the McAfee product we were using, though I remember my boss speaking about ePO numerous times. I don't know what MS Forefront is, though if it regulates gateway traffic, we had an Untangle server for that (we didn't have full government support, and my boss considered our IT budget very low, though I never knew what it was). The Untangle server only sought to prevent certain uhh... undesirable internet behaviour. It also blocked websites that had anything to do with proxies. The firewall was on the Untangle server as well, though viruses rarely came from the internet. More often they came from flash drives people used, and spread from there. We didn't limit traffic because, as a tertiary education institute, people often needed to do research (both students and lecturers). We had an internal e-mail server, though, which eliminated the possibility of outside viruses getting in. As for Host Intrusion Prevention, I'm really not sure what that is either; I've never done much with McAfee beyond what my previous job required, so I'm not sure whether we used it or not.
            D2 Ultima
    • Got lucky here....

      The lag in my ePolicy Orchestrator and system updates left me some buffer time for the problem to be found by others; I run about a day behind when the DATs are published. I did disable all updates last night to be safe. I now have serious questions about McAfee AV being used in this company. And from the looks of it, they are reluctant to be fully transparent about it, and that's not good either.
      OhTheHumanity
      • We dodged the bullet too...

        ...on our client site for the same reason (the update was one version behind), so when we did an update, we got the fixed one!
        DevJonny
    • Actually, No, but only because

      of the issues we had with McAfee a few years ago, which led us to switch to Symantec Endpoint Protection (not without its own idiosyncrasies, but easily managed).

      You would think that McAfee would strive to be better than the next guy, if only because of things like this.

      If we were using this package right now, I would just "go in for a pound" and make a vendor change altogether.

      It's one thing to have a hiccup, something much more serious to take down entire companies and municipalities with a patch.

      Admittedly, I can see this actually helping them internally: they *will* strive to make sure something like this never happens again.
      John Zern
    • No, our EPO saved our asses.

      Thanks to our lazy IT HQ we were saved.

      The time they take to push an update through our internal update server is long enough that the issue was already known.

      We lost a few desktops here and there around the network, but all of them belonged to second-class users not controlled by ePO.

      But if the shit had hit the fan, we would have lost millions in lost production. I'd rather not think about it.
      Tommy S.
      • lolkittenz!

        Waiting one day if you have other protections in place sounds like a good idea in this case.

        More than one day....
        DataFerret
      • No, We were Not Affected

        We did not update to SP3 so we were NOT affected.
        Thank God for that.
        jsparo
        • SP2 support will end in July 2010.

          Support for Windows XP with Service Pack 2 (SP2) will end on July 13, 2010.

          support.microsoft.com/gp/windowsxpsp2
          Tommy S.
          • All the more reason to use an OLD OS

            Such as Windows 2000, which is more stable, more usable, and much better than any other MS OS EVER. People had a lot of problems updating to XP sp3 and a lot of them rolled it back to SP2 because of it. WHY do they have to keep making new stuff that doesn't work when there are old solutions that work fine?
            janitorman
          • Uhh... Yeah... No.

            Because Windows 2000 is not compatible with all of the newest software, for one.
            For two, it has limitations on maximum RAM, supported processor types and speed, and other related hardware. Networking is also far easier in newer OSes, and for your information, the most stable OS I've ever seen is Windows Vista Ultimate SP1. Yeah yeah yeah, Vista bad, yadda yadda yadda. No, I don't mean Windows 7 Ultimate; I'm sure I typed the right sentence. Please stop cussing and thinking I am an idiot. I don't care how bad you *think* it is: Vista isn't bad at all and is very safe and SECURE. Most stable OS I've ever used. If you're fine with Windows 2000, then good for you, but most everybody else will find that it can't do all the stuff they want.
            D2 Ultima
    • Not really....

      I have had my share of McAfee screw-ups, going way back to when it was just AOL and CompuServe... I won't even go there today, though I really do wish my ISP hadn't picked McAfee for its Internet Security Suite. I have no doubt that for every 100 computers protected and saved by McAfee, there are maybe half that number having some problem or glitch with McAfee. I have learned a lesson: never let your system automatically do updates or install new software, even from trusted sources. Even trusted sources are bound to act like the very sources they are designed to protect you from.
      FranC.
    • lucky here

      I avoided it by the simple expedient of using Linux for the production environment, which requires no antivirus. Like many a Windows user, I'm not a McAfee client.

      It is a wake-up call for everyone, though, regardless of whether you're in development or simply supporting the IT: test before you push the upgrade, no matter what application or OS you're talking about, and the more critical the application on the box, the more you should test.
      lordshipmayhem
    • Dodged a bad situation...

      Our ePO saved me: I stopped our repository from pushing the DAT right away and woke up the agents.
      dexter_rivera@...
    • I quit using McAfee years ago. No problem here !

      I've been using Norton for years now and I have had no problems on either my home or business computers since I switched.

      Way back when I used McAfee, I had repeated problems with supposedly screened viruses getting through. The last straw came with one that destroyed some important data.

      I tell everyone ... almost anything is better than McAfee.
      bob4814