Bayes Theorem keeps spam out

March 25, 2005, 12:22am PST | Length: 00:03:13
When applied to spam, this probability theorem weeds out real junk mail and is less likely to create false positives.

Transcript

The first generation of spam filters was pretty crude. Those filters not only missed a lot of spam, but also misidentified real messages as spam. The new generation of spam filters is much more flexible: much more likely to catch true spam and much less likely to produce a false positive, that is, to flag a real message as spam.

Most spam filters actually use what's called Bayes filtering, which relies on Bayes Theorem. Thomas Bayes was an 18th-century English mathematician. His theorem goes something like this: the probability... just kidding, let's not do the math. Think about it this way. Bayes Theorem basically says that you can work out the likelihood of an event from the evidence you observe. If you want to know how a whole group is likely to behave, you can look at a randomly selected subset of that group and use what you see there to estimate the probability for the whole group.
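Stated formally, Bayes Theorem turns the probability of seeing some evidence given a hypothesis into the probability of the hypothesis given that evidence. Written for the spam case, with w standing for a word found in a message and "ham" used as the common nickname for legitimate mail (both labels introduced here for illustration), it reads:

```latex
P(\text{spam} \mid w) = \frac{P(w \mid \text{spam})\, P(\text{spam})}{P(w \mid \text{spam})\, P(\text{spam}) + P(w \mid \text{ham})\, P(\text{ham})}
```

Each term on the right-hand side can be estimated by counting how often w appears in mail that is already known to be spam or legitimate.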

Think about an election. Let's say you've got a group of voters here and they're going to vote on a referendum. They have to vote either yes or no, and you want to know the likelihood that the vote is going to be yes. If you take a randomly selected subgroup and learn the distribution of yes to no in that subgroup, let's say 7 yes and 3 no, you can use Bayes Theorem to predict the likelihood of a yes vote from the larger group. With a 7-to-3 split, and assuming you start with no prior opinion about how the group leans, the math works out to about an 89% probability that the vote is going to be yes. So that's how Bayes Theorem works in an election.
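The 89% figure can be checked. One way to arrive at it, assuming a uniform prior (no initial opinion about how the group leans) and treating the larger group as big enough that its yes-share is a continuous proportion p, is to compute the posterior probability that p exceeds one half. This is a minimal sketch under those assumptions; the function name `prob_majority_yes` is hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_majority_yes(yes: int, no: int) -> Fraction:
    """Posterior probability that the group's true yes-share p exceeds 1/2.

    Assumption: a uniform Beta(1, 1) prior on p, so after observing the
    sample the posterior is Beta(yes + 1, no + 1).
    For integer parameters a, b the Beta CDF at 1/2 equals a binomial tail:
        P(p <= 1/2) = P(Binomial(a + b - 1, 1/2) >= a)
    """
    a, b = yes + 1, no + 1
    n = a + b - 1
    # Sum of binomial coefficients for the tail j = a .. n
    tail = sum(comb(n, j) for j in range(a, n + 1))
    p_at_most_half = Fraction(tail, 2 ** n)
    return 1 - p_at_most_half


result = prob_majority_yes(7, 3)
print(result, float(result))  # prints: 227/256 0.88671875
```

Under the same assumptions, the probability that any single additional voter says yes is only (7 + 1) / (10 + 2) = 2/3 (Laplace's rule of succession). The roughly 89% figure answers a different question: how likely it is that yes wins a majority of the whole group.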

Now with spam filtering, it's basically the same kind of problem. You want a yes or no answer on spam across the whole universe of your e-mail inbox. So Bayes filters start from a sample of messages that have already been sorted into spam and legitimate mail, and they test for different conditions. They look at an individual word and ask: based on that sample, what is the probability that a message containing this word is spam? They look at combinations of words and assign a probability to the combination. They look at colors in an HTML e-mail: what's the likelihood that a particular color indicates spam? They look at what kind of URLs and links are in the message, and at where individual words are placed. There are all kinds of conditions, and the filter can assign a probability using an individual condition or any combination of them.

That allows you to build a filter that is very flexible: you can set it to be as restrictive or as permissive as you want. You not only catch a lot more spam, you also get far fewer false positives, because you're not relying on a single rule. In the old days, you'd say any e-mail that has "Viagra" in it is automatically spam. Now you can say that an e-mail with the word Viagra by itself is 90% likely to be spam, Viagra plus "pills" is 98% likely, and Viagra plus "buy now" is close to 100% likely.
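To make the word-and-combination idea concrete, here is a minimal naive Bayes sketch, a common textbook form of Bayesian filtering and not the method of any particular product. It assumes hypothetical training messages, looks only at words (not colors, URLs, or placement), ignores words never seen in training, and uses an arbitrary default threshold of 0.8. The class and method names are made up for illustration, and the probabilities it prints come from the toy data, not from the figures quoted in the transcript.

```python
import math
from collections import Counter


def tokenize(text: str) -> set[str]:
    """Lower-case, split on whitespace, and de-duplicate a message's words."""
    return set(text.lower().split())


class NaiveBayesSpamFilter:
    """Toy word-based Bayesian spam filter (hypothetical, illustrative only)."""

    def __init__(self) -> None:
        # For each label: in how many training messages each word appeared.
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        # How many training messages carried each label.
        self.message_counts = Counter()

    def train(self, text: str, label: str) -> None:
        self.message_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def _word_likelihood(self, word: str, label: str) -> float:
        # Add-one (Laplace) smoothing so no word/label pair gets probability 0.
        return (self.word_counts[label][word] + 1) / (self.message_counts[label] + 2)

    def spam_probability(self, text: str) -> float:
        total = self.message_counts["spam"] + self.message_counts["ham"]
        # Start from the prior odds of spam vs. legitimate mail, in log space.
        log_odds = math.log(self.message_counts["spam"] / total) - math.log(
            self.message_counts["ham"] / total
        )
        for word in tokenize(text):
            if self.word_counts["spam"][word] == 0 and self.word_counts["ham"][word] == 0:
                continue  # never seen in training, so it carries no evidence here
            # "Naive" independence assumption: each word multiplies the odds.
            log_odds += math.log(self._word_likelihood(word, "spam")) - math.log(
                self._word_likelihood(word, "ham")
            )
        # Convert log-odds back into a probability between 0 and 1.
        return 1.0 / (1.0 + math.exp(-log_odds))

    def is_spam(self, text: str, threshold: float = 0.8) -> bool:
        # A lower threshold flags more mail (more restrictive filter).
        return self.spam_probability(text) >= threshold


# Hypothetical training messages, made up for this example.
spam = ["buy viagra now", "viagra pills", "cheap pills buy now", "viagra offer"]
ham = ["meeting notes attached", "see you at lunch now",
       "doctor said the pills help", "notes from the viagra patent case"]

f = NaiveBayesSpamFilter()
for msg in spam:
    f.train(msg, "spam")
for msg in ham:
    f.train(msg, "ham")

for msg in ["viagra", "viagra pills", "viagra buy now", "meeting notes attached"]:
    print(f"{msg!r}: {f.spam_probability(msg):.2f}")
# prints 0.67, 0.75, 0.90 and 0.08 for the four messages, in that order

print(f.is_spam("viagra buy now"))           # True
print(f.is_spam("viagra"))                   # False at the default threshold
print(f.is_spam("viagra", threshold=0.6))    # True with a stricter setting
```

Because every extra spam-leaning word multiplies the odds, "viagra" plus "buy now" scores higher than "viagra" alone, and the single `threshold` parameter is what makes the filter as restrictive or as permissive as you want.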

So by using Bayes filtering you're able to reduce the number of false positives and have a much more effective spam filter.
