How IE9 uses app reputation to axe malware

Microsoft security specialist Jeb Haber explains how Internet Explorer 9 is banking on application reputation to cut malware attacks
Written by Mary Branscombe, Contributor

The first release candidate of Internet Explorer 9, the next version of Microsoft's web browser, is due in days, incorporating a number of new security features. ZDNet UK spoke to Microsoft Internet Explorer security specialist Jeb Haber about the browser's application-reputation approach to malware.

According to the latest PandaLabs annual security report, a third of all viruses ever written were created in 2010. That volume of new malware is almost impossible for antivirus software or online malware-blocking services to keep up with, so Microsoft's Internet Explorer (IE) 9 browser will take another approach — and do away with most warning dialogs you see when you download files today.

With the release candidate of IE9 expected next week, we asked Microsoft's Jeb Haber, principal program manager lead for the SmartScreen service in IE, how the application-reputation feature works, what it protects you from — and whether looking at all the files downloaded in IE has privacy implications.

Q: What's the biggest security issue for users that your team is addressing?
A: We think executable downloads are the biggest threat they face. The basic intent of our team is to focus on helping users stay safe online. If you think about the threat landscape, you think about attacks on the computer, vulnerabilities and so on; and attacks on websites, cross-site scripting and that sort of stuff. And then there are attacks on the users, social engineering — that's what we focus on.

We already deal with two types of threats, phishing and malware, with this thing we call the URS, the URL Reputation Service. We picked a specific type of threat, socially engineered malware, and we blocked 1.2 billion in 16 months. Malware is really the biggest problem. We see anywhere from one in 50 to one in 100 [fewer] phishing blocks compared with malware blocks.

But you're not blocking it all, so you decided to take a different approach?
What we found with all the block-based solutions, with antivirus and our own stuff, there's this latency between detection and protection. We wanted to take that problem of identifying and blocking and turn it on its head. Instead of identifying what's bad, identify what's good — and what's left over, treat that differently.

How do you identify the known good files?
We looked at the concentration of code [on the web] by file hash and code-signing certificates to see if there was a consolidation big enough we could basically build an established reputation list and [say] the stuff that's unknown is risky.

Reputation is either for a specific program — for the hash of the file you download — or the certificate. If you sign code and use that certificate over time, you will develop a reputation.

If a certificate has established a good reputation over time, anything it produces — as long as you do not start signing malware — will have a good reputation. Part of this approach is encouraging good code-signing practices, because it is impossible for us to establish reputation on every program.
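The two routes to reputation that Haber describes, by file hash or by signing certificate, can be illustrated with a short sketch. This is not Microsoft's implementation: the `ReputationStore` class, the `classify_download` function, the use of SHA-256 and the publisher name are all assumptions made for illustration, and the real service weighs far more data than a simple set lookup.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReputationStore:
    """Hypothetical in-memory stand-in for a reputation service."""
    known_good_hashes: set = field(default_factory=set)
    known_good_signers: set = field(default_factory=set)
    revoked_signers: set = field(default_factory=set)


def file_hash(data: bytes) -> str:
    # SHA-256 is an assumption; the interview does not name a hash function.
    return hashlib.sha256(data).hexdigest()


def classify_download(data: bytes, signer: Optional[str],
                      store: ReputationStore) -> str:
    """Return 'established' or 'unknown' for a downloaded file."""
    # Route 1: this exact file has built up a reputation of its own.
    if file_hash(data) in store.known_good_hashes:
        return "established"
    # Route 2: the signing certificate has a reputation, and it has not
    # been caught signing malware.
    if (signer is not None
            and signer in store.known_good_signers
            and signer not in store.revoked_signers):
        return "established"
    # Everything else is treated as unknown and would trigger a warning.
    return "unknown"


store = ReputationStore(
    known_good_hashes={file_hash(b"popular installer")},
    known_good_signers={"Example Publisher"},
)
print(classify_download(b"popular installer", None, store))        # established
print(classify_download(b"new build", "Example Publisher", store))  # established
print(classify_download(b"never seen before", None, store))         # unknown
```

In this sketch a new build from a reputable signer inherits that signer's reputation straight away, which is the incentive for good code-signing practice that Haber mentions. Adding the signer to `revoked_signers` removes that benefit for everything it has signed.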

We've seen some malware authors signing code to avoid warnings about unsigned code…
That's great. Now I get to kill everything with one stroke instead of playing whack-a-mole all over the place. I get to take them all out.

If a download has a good reputation, IE9 won't warn you before you download it — and you believe that's safer than warning people all the time?
There's a bunch of warnings we show that are irrelevant. We wanted to get rid of that "everything is scary on the internet" warning. We didn't want that for when you download [something like] iTunes.

Because people ignore it?
It's horrible habituation. People get used to seeing it and they just look for the button to click on. We looked at the data. We know what click-through rates are. It's a meaningless warning for that particular file for that particular user.

In some large sense, yes, things from the internet might be dangerous. But how does that help me when you tell me that about everything? Don't warn people when they don't need to be warned and warn them when they're actually at greater risk. It seems simple but it's not how any other browser works; actually, a lot of software doesn't work that way today.

And you think they won't ignore warnings they don't see as often?
The typical user — probably not advanced technical users or enthusiasts but the typical user — will see this warning two or three times a year. I'm being conservative. Lots of users won't see it at all.

[In the beta] we have four different UIs in play so we can see which people are engaging with, which they are clicking through more, which end up with more malware running. We're looking at that data to refine the user experience for the release candidate and RTM. There will be changes.

How good is the warning? How many downloads can you give a good reputation to and how risky are the unknown files?
About 90 percent have established reputation by hash or cert — this is after [we've done] a bunch of modelling, a bunch of data mining, a bunch of work on the algorithm.

The scary thing is that with today's URL rep, about four percent of program and executable downloads are blocked already by SmartScreen. That's a scary number. The remainder, I call 'stranger danger' — things you probably don't want your non-technology friends and family to be downloading.

What we're seeing in this bucket varies over time, but we're seeing 25 to 40 percent of things that show this stranger-danger warning later ending up being confirmed malware. We're saying, based on our data, that clicking that button carries a 25 to 40 percent risk of infection.
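To put Haber's figures side by side, the arithmetic can be worked through for a notional 1,000 executable downloads. The sketch below assumes the three buckets he mentions (established reputation, already blocked, and the remainder) add up to all downloads. His use of "the remainder" implies that split, but the percentages are approximate, so the result is only a rough guide.

```python
# Figures quoted in the interview, applied to a notional 1,000 downloads.
DOWNLOADS = 1000

established = DOWNLOADS * 90 // 100          # about 90% have reputation by hash or cert
blocked = DOWNLOADS * 4 // 100               # about 4% are already blocked by SmartScreen
unknown = DOWNLOADS - established - blocked  # the remainder: the "stranger danger" bucket

# 25 to 40 percent of unknown downloads are later confirmed as malware.
malware_low = unknown * 25 // 100
malware_high = unknown * 40 // 100

print(f"established: {established}, blocked: {blocked}, unknown: {unknown}")
print(f"unknown downloads later confirmed as malware: {malware_low} to {malware_high}")
```

On these assumptions, roughly 60 downloads in every 1,000 would show the warning, and between 15 and 24 of those would later turn out to be malware.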

The unknown space is also volatile. About 50 percent [of executables] every day we've never seen before — and we've been tracking these for a long time. So it's either polymorphic malware, where you're getting a new package for every download, or very weird coding practices, unsigned code that's generating itself uniquely every day and so on. The fact that about half the programs behind those unknown prompts every day are new is super-concerning.

If you're looking at all the executables online, is IE tracking what everyone downloads? Is there a privacy issue?
We have a great privacy team here at Microsoft responsible for ensuring our privacy statements are upheld.

Yes, we're collecting data — that's how intelligence works. We are processing massive amounts of data so we're looking at things in aggregate data models. There's nothing in those environments that's specific to any user.

In terms of URL rep, there is an anonymising algorithm that runs on the dataset, a Microsoft standard personally-identifying-information scrubbing algorithm on the inbound data. The data is in an access-controlled environment. There're no third parties accessing the data. It's not being shared outside the company.

This data is not used to target advertising to you. There is no mechanism in our back-end that has anything to do with ads. I don't have anybody on my team that thinks about revenue. Our intent in using the data is our primary focus: protecting the Windows user.

How effective do you think application reputation is going to be?
I think this is a big one. I feel if users get it, I think it's going to have a huge impact on the number of socially engineered malware attacks.
