The first generation of spam filters was pretty crude. It not only missed a lot of spam, but it also misidentified real messages as spam. The new generation of spam filters is much more flexible, much more likely to catch true spam, and much less likely to produce a false positive, that is, flagging a real message as spam.
Most spam filters actually use what's called Bayesian filtering, which relies on Bayes' Theorem. Thomas Bayes was an 18th-century mathematician. In his theorem, well, it goes something like this. The probability... just kidding, let's not do the math. Think about it this way. Bayes' Theorem basically says that if you want to know the likelihood of an event, you can start with what you already believe and update that probability based on evidence, for example by looking at a randomly selected subset of a larger group.
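For anyone who does want the math, here is the theorem in its usual form, applied to the spam question. The events "spam" and "word" are just illustrative labels for "this message is spam" and "this message contains a given word"; they are not notation from the talk.

```latex
P(\text{spam} \mid \text{word}) =
  \frac{P(\text{word} \mid \text{spam}) \, P(\text{spam})}{P(\text{word})}
```

In words: how likely a message is to be spam, given that it contains the word, depends on how often that word shows up in spam, how common spam is overall, and how common the word is overall.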
Think about an election. Let's say you've got a group of voters here and they're going to vote on a referendum. They have to vote either yes or no, and you want to know how likely it is that the result is going to be a yes vote. If you take a randomly selected subgroup and you learn the distribution of yes to no in this group, let's say in this case 7 yes, 3 no, you can use Bayes' Theorem to predict the likelihood of a yes vote from the larger group. I think the math works out, in that case of a 7 to 3 distribution, to an 89% probability that the vote is going to be yes. So that's how Bayes' Theorem works in an election.
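One way to arrive at a figure of about 89%, as a rough sketch: assume you know nothing about the voters beforehand (a uniform prior on the true share of yes voters) and that the larger group is big enough to treat that share as a continuous number. Neither assumption is stated in the talk, and the function name `prob_majority_yes` is made up for this example. Under those assumptions, the question becomes "how likely is it that the true yes share is above one half?"

```python
from fractions import Fraction
from math import comb


def prob_majority_yes(yes, no):
    """Probability that the true yes share exceeds 1/2, given a random
    sample with `yes` yes-votes and `no` no-votes.

    Assumes a uniform prior on the yes share (an assumption of this sketch).
    """
    # With a uniform prior, the posterior is Beta(yes + 1, no + 1).
    # For whole-number parameters, its CDF at 1/2 equals a binomial tail:
    #   P(share <= 1/2) = sum_{j=a}^{n} C(n, j) / 2**n
    # where a = yes + 1 and n = yes + no + 1.
    a = yes + 1
    n = yes + no + 1
    below_half = Fraction(sum(comb(n, j) for j in range(a, n + 1)), 2 ** n)
    return 1 - below_half


print(float(prob_majority_yes(7, 3)))  # 0.88671875
```

A sample that splits evenly, such as 5 yes and 5 no, comes out at exactly one half under the same assumptions, which is a quick sanity check on the approach.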
Now with spam filtering, it's basically the same kind of problem. You want to know yes or no for spam within this whole universe of your e-mail inbox. So Bayesian filters take a subset of that and they test for different conditions. They look at an individual word and ask, what is the probability that a message with that word is spam, judging from the subset? They look at combinations of words and assign a probability to the combination. They look at colors in an HTML e-mail: what's the likelihood that a particular color indicates spam? They look at what kinds of URLs and links are in there, and where individual words are placed. All kinds of conditions, and they can assign a probability using an individual condition or any combination of them. That allows you to build a filter that is very flexible. You can set it to be as restrictive or nonrestrictive as you want. This allows you not only to catch a lot more spam, but also to get far fewer false positives. In the old days, you'd say any e-mail that has Viagra in it is automatically spam. Now you can say any e-mail that has the word Viagra by itself is 90% likely to be spam, but if it says Viagra plus pills, 98% likely, and Viagra plus 'buy now', 100% likely.
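To make the word-by-word idea concrete, here is a minimal sketch of a naive Bayes word filter. It is not how any particular product works: the training messages are invented, the function names `train` and `spam_probability` are hypothetical, it only looks at words (no colors, URLs, or word placement), and it treats words as independent of each other, which is the usual "naive" simplification.

```python
import math
from collections import Counter


def train(messages):
    """Count, per class, how many messages contain each word.

    `messages` is an iterable of (text, is_spam) pairs.
    """
    word_counts = {True: Counter(), False: Counter()}
    msg_counts = {True: 0, False: 0}
    for text, is_spam in messages:
        msg_counts[is_spam] += 1
        # Use a set so a word counts once per message.
        word_counts[is_spam].update(set(text.lower().split()))
    return word_counts, msg_counts


def spam_probability(text, word_counts, msg_counts):
    """Estimate P(spam | words in text) by combining per-word evidence."""
    total = msg_counts[True] + msg_counts[False]
    # Start from the prior odds of spam, in log space to avoid underflow.
    log_odds = math.log(msg_counts[True] / total) - math.log(msg_counts[False] / total)
    for word in set(text.lower().split()):
        # Add-one smoothing so an unseen word never yields a zero probability.
        p_word_spam = (word_counts[True][word] + 1) / (msg_counts[True] + 2)
        p_word_ham = (word_counts[False][word] + 1) / (msg_counts[False] + 2)
        log_odds += math.log(p_word_spam) - math.log(p_word_ham)
    return 1 / (1 + math.exp(-log_odds))


# Invented training data, purely for illustration.
training = [
    ("buy viagra pills now", True),
    ("viagra pills cheap", True),
    ("buy now limited offer", True),
    ("meeting notes attached", False),
    ("lunch tomorrow at noon", False),
    ("my doctor prescribed viagra", False),
]
word_counts, msg_counts = train(training)

for sample in ["viagra", "viagra pills", "viagra pills buy now", "meeting notes"]:
    print(sample, round(spam_probability(sample, word_counts, msg_counts), 3))
```

With this toy data the score climbs as more spam-associated words appear together, which mirrors the Viagra, pills, and 'buy now' progression above. To make the filter more or less restrictive, you would change the probability cutoff above which a message is flagged.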
So by using Bayesian filtering, you're able to reduce the number of false positives and have a much more effective spam filter.



















