
90% of all statistics can be made to say anything... 50% of the time, aka my thoughts on the Verizon report

** Update 06/23/2008: I realize I didn't do a very good job of explaining what we're reviewing here. This is in response to the statistics gathered by Verizon from forensic analysis of data breaches over a four-year span.

Written by Nathan McFeters, Contributor June 22, 2008 at 4:28 p.m. PT

First off, let me start by saying I'm in no way calling Verizon's numbers into question. What I intend here is simply to provide a second opinion on a complex set of data that can easily be misinterpreted. To get you started, here's the report I'm talking about. When reading this, keep in mind that I've spent the last five years as a computer security consultant, the last three with Ernst & Young's Advanced Security Center, and that may certainly put a spin on these numbers that others won't agree with. That's fine; the comments here are simply meant to stimulate thought. I'm looking forward to talkbacks on this one, so if you agree, if you disagree, if you think I'm a bastard... feel free to comment.

As you dig into the report, keep in mind this is a massive compilation of over four years' worth of study from over 500 forensic investigations. Read on... if you dare.

The first thing that will jump off the report at you is the first question, "Who is behind data breaches?" which led to the following stats:

73% resulted from external sources
18% were caused by insiders
39% implicated business partners
30% involved multiple parties

The first thing you're thinking is, "Wow, my consultant has been lying to me about internal threats!" The thing is, that's not necessarily true. First off, the context around "implicated business partners" and "involved multiple parties" leaves something to be desired in terms of clarification. Here's why:

Many "business partners" have access roughly equivalent to internal access due to VPN, B2B, and extranet connections that are shared between partners. This could lead to an insider threat from an outside source. My personal experience (which admittedly is not based on four years of forensic investigations, but is sizeable) is that most clients I work with do NOT have sufficient internal network segregation controls, whether by firewall, VLAN, or anything else. I can't say how many times I've been able to hack into a business partner's weaker applications or network and use that access to compromise another company's intranet. Even more discouraging is how many of my clients I still see using a "flat" network structure.
The term "business partners" is pretty ambiguous. Does it simply mean people I do business with, like third-party vendors, or does it mean business partners that my company owns? There's a big difference: in one case you have true outsiders, and in the other you could just as well count the business partner as an insider.
What exactly does "multiple parties" mean? Does that mean several conspiring external forces, or outsiders conspiring with insiders? That raises the question of whether things are being double counted. Obviously, if you total up the above numbers, they do NOT equal 100%, so there must be some overlap, but where and how much? Hell, multiple parties could simply mean more than one person.
Does "external sources" mean exclusively external sources, or external-only attacks + external attacks with internal help + external business partner attacks? For that matter, what does "external sources" mean? Does it mean attackers that are not a part of the business, or does it simply mean the attack came from outside the victim's network?
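
To put a number on the overlap, here is a minimal sketch that simply adds up the four headline figures quoted above. The dictionary keys and the function name are labels made up for this illustration, not anything Verizon defines, and the sketch says nothing about where the overlap actually sits.

```python
# Headline figures quoted from the report, in percent of breaches.
who_is_behind = {
    "external sources": 73,
    "insiders": 18,
    "business partners": 39,
    "multiple parties": 30,
}


def overlap_excess(figures):
    """Return how far the categories overshoot 100%.

    A positive result means the categories cannot be mutually
    exclusive: some breaches must sit in more than one of them.
    """
    return sum(figures.values()) - 100


total = sum(who_is_behind.values())
excess = overlap_excess(who_is_behind)
print(f"categories sum to {total}%, an overshoot of {excess} points")
```

The four figures sum to 160%, so 60 percentage points' worth of category memberships are shared between categories. The report's headline numbers alone don't say which categories share them.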

Furthermore, as Verizon states:

"Breaches attributed to insiders, though fewer in number, were much larger than those caused by outsiders when they did occur. As a reminder of risks inherent to the extended enterprise, business partners were behind well over a third of breaches, a number that rose five-fold over the time period of the study."

OK, so all this said, you might choose to redefine the numbers that Verizon has provided. In fact, depending on how this data was actually collected and how Verizon defines its own statistics, you might be able to say the following just as easily and possibly more accurately:

34% to 73+% resulted from external sources (assuming that some portion of the 39% of implicated business partners were counted here and really should've been considered insiders as they are truly a part of the greater business and not really external entities)
18% to 87% were caused by insiders (assuming that some portion of the 39% of implicated business partners are really internal to the network and that some portion of the 30% of involved multiple parties could've included internal resources)
39% implicated business partners
30% involved multiple parties
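
For anyone who wants to check the arithmetic behind those two ranges, here is a rough sketch. It uses only the four percentages quoted from the report; the constant names and the clamp helper are invented for this example, and both bounds rest on the worst-case assumptions spelled out in the comments rather than on anything Verizon published.

```python
# Headline percentages quoted from the report.
EXTERNAL, INSIDERS, PARTNERS, MULTIPLE = 73, 18, 39, 30


def clamp(pct):
    """Keep a percentage inside the 0..100 range."""
    return max(0, min(100, pct))


# Low end for "truly external": assume every partner breach was counted
# under external sources and really belongs with the insiders.
external_low = clamp(EXTERNAL - PARTNERS)

# High end for insiders: assume every partner breach and every
# multi-party breach involved an insider, with no overlap among the three.
insider_high = clamp(INSIDERS + PARTNERS + MULTIPLE)

print(f"external: {external_low}% to {EXTERNAL}%+")
print(f"insiders: {INSIDERS}% to {insider_high}%")
```

Both ends are extremes. The real figures land somewhere in between, depending on how much of the partner and multi-party categories overlaps with the other two.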

That really changes the way you look at it. Certainly my analysis could be flawed, but keep this kind of thing in mind when you are looking at the numbers: despite Verizon's best efforts to keep us all on the same page, you can't truly know the context Verizon wrote some of this with. That is NOT to criticize Verizon. They did an amazing job of cataloging this information and actually making a LOT of sense of it... again, I reiterate, I love this study... I just have more questions that I hope Verizon will seek to answer.

To be fair, Verizon does try to cover this in their section entitled "Sources of Data Breaches" on page 10 of the 29-page PDF file. Also worth noting, there's some more clarification on what "business partners" means on page 14 of the 29-page document, where Verizon states:

"Partner-side information assets and connections were compromised and used by an external entity to attack the victim’s systems in 57 percent of breaches involving a business partner. Though not a willing accomplice, the partner’s lax security practices—often outside the victim’s control—undeniably allow such attacks to take place."

The second question brings up more questions and warrants further analysis. It asks, "How do breaches occur?" and captures the following related numbers:

62% were attributed to a significant error
59% resulted from hacking and intrusions
31% incorporated malicious code
22% exploited a vulnerability
15% were due to physical threats

So the first one that kind of blows my mind a bit is the "62% were attributed to a significant error"... so what were the other 38% attributed to? My general thought on a data breach is that somebody, somewhere, jacked something up. Maybe it wasn't the victim company's fault, because tapes were dropped off a truck or a third-party application had a stack overflow, but somebody messed up. I'm really struggling with that one. The next one that bothers me is that "59% resulted from hacking and intrusions" but only "22% exploited a vulnerability". I guess I'm looking for a definition of what hacking, intrusions, and vulnerability mean to Verizon, because I'd expect that far more than 22% of data breaches are due to a vulnerability. The confusion for me goes on, as Verizon states:

"Intrusion attempts targeted the application layer more than the operating system and less than a quarter of attacks exploited vulnerabilities. Ninety percent of known vulnerabilities exploited by these attacks had patches available for at least six months prior to the breach."

Umm.... 90% of the vulnerabilities exploited had patches, but you just said that the intrusions targeted the application layer more than the operating system. This allows me to draw one of three conclusions which I summarize below:

Verizon does not consider SQL Injection, XSS, CSRF, and other application layer intrusions to be vulnerabilities, as there are no patches for most application layer flaws.
OR, their statement that 90% of these issues had patches is way off
OR, I'm somehow missing a critical explanation that makes this straightforward (it is a large report)

Before I move further, Verizon later characterizes this information as relating to only those exploits that involved "known vulnerabilities". This doesn't really help us though. It still raises the question: are they referring simply to those things for which we have CVE reference numbers? Because SQL Injection is a known exploit at this time (God, I hope we can say that now), and it most certainly does not have a patch you can apply.

It gets stranger at a later point in the report, where it looks to me as if they are counting OS level attacks not once but twice. On page 16 of the 29-page report, the pie chart shows that:

39% are Application/Service Layer exploits
23% are OS/Platform Layer exploits
18% Exploit known vulnerabilities
5% Exploit unknown vulnerabilities

So, somehow, this totals 100% and is represented as a single pie chart... however, there really should be two charts here in my eyes... one that represents application/service layer exploits vs. OS/platform layer exploits and one that represents known vulnerabilities vs. unknown vulnerabilities.
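
Adding up the four slices as quoted above is a useful sanity check. The following is a rough sketch that uses only the figures listed here; the variable names are made up for illustration, and splitting the slices into a "layer" group and a "disclosure" group is one reading of the chart, not Verizon's own breakdown.

```python
# The four pie-chart slices as quoted in this post, in percent.
pie_slices = {
    "application/service layer": 39,
    "OS/platform layer": 23,
    "known vulnerabilities": 18,
    "unknown vulnerabilities": 5,
}

# Hypothetical regrouping into the two questions the chart seems to mix.
layer_keys = ("application/service layer", "OS/platform layer")
disclosure_keys = ("known vulnerabilities", "unknown vulnerabilities")

quoted_total = sum(pie_slices.values())
not_quoted = 100 - quoted_total  # share of the pie not listed above

layer_total = sum(pie_slices[k] for k in layer_keys)
disclosure_total = sum(pie_slices[k] for k in disclosure_keys)

print(f"quoted slices: {quoted_total}%, not quoted: {not_quoted}%")
print(f"layer question: {layer_total}%, disclosure question: {disclosure_total}%")
```

The four slices quoted here sum to 85, so the rest of the pie presumably belongs to one or more slices not listed in this post. Either way, the 62 points in the layer group and the 23 points in the disclosure group are answers to two different questions sharing one chart.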

You can actually muddy the waters even further, as there are so many joint attacks now. Do you count one of my protocol handler attacks, which exploit software such as browsers, third-party apps, and operating system components, as an OS/platform layer exploit? It's got to be delivered somehow; what if it was delivered through an XSS exposure?

Continuing, the next question asks, "What commonalities exist?" and collects the following numbers:

66% involved data the victim did not know was on the system
75% of breaches were not discovered by the victim
83% of attacks were not highly difficult
85% of breaches were the result of opportunistic attacks
87% were considered avoidable through reasonable controls

Yikes, alright, I don't have much to argue about with these stats, but I will say this... they make me really, really sad. "66% involved data the victim did not know was on the system"... argh. A great real-life example of this: your HR department puts out a salary spreadsheet for all employees on a Windows file share that includes detailed information such as Social Security numbers and bank account numbers (for direct deposits).

"75% of breaches were not discovered by the victim" and they go on to say, "Most breaches go undetected for quite a while and are discovered by a third party rather than the victim organization"... yikes. BUT, this brings up a really interesting point... these statistics are all shaded by the fact that there's probably a ton of data breaches that no one ever hears about (except the attacker). "83% of attacks were not highly difficult" is, honestly, a relative statement that is hard to puzzle out, because it raises the question: not highly difficult to whom? I actually expect this number is much higher due to the large number of web application attacks that are leading to data breaches, which all tend to be pretty easy issues to find and exploit.

God, this report is just stuffed with wonderful information. I truly commend Verizon on their analysis. I am looking right now at a pie chart on page 13 of the document that might turn some heads. It would seem that China may not be our biggest threat, as I'm seeing a combined total of 47% of the investigated data breaches coming from Eastern Europe and right here in our own backyard, North America. In fact, ALL OF ASIA only accounts for 35% of the data breaches. I'm not saying that China isn't a threat; again, keep in mind the nature of the statistics, which are all about data breaches. There's a whole lot more that hackers are interested in.

Ah, now to my favorite section of all... this one is nasty. Starting on page 22 of the 29-page document, Verizon begins characterizing numbers around the type of data being breached. In 84% of the data breaches observed, payment card data was compromised. To that, I simply say, stay gold PCI, stay gold. Don't expect this number to change much as we move forward, as PCI still lacks the bite to really help. The decisions around allowing web application firewalls as a last line of defense should keep this problem around for quite some time.

So, my final conclusions to be drawn from the Verizon report... there's lots to exploit, there's tons of data to steal, there are a lot of misconceptions and confusion, and there's still tons of snake oil that's not helping us out at all. Oh, and I'm still bitter about PCI's decision around WAFs. Finally, while this report is great, we still have some unknowns to consider and need to keep that in mind when making strategic decisions about where to focus.

-Nate
