Years ago, when I was still a bit more naive, I thought we could end the spam dilemma if we simply implemented domain-level sender authentication using digital signatures. In fact, when David Berlind wrote "Why spam could destroy the Internet" in November 2002, he quoted me saying that every domain's official SMTP server should digitally sign each message to prove the email came from that domain. When SenderID and Yahoo's DomainKeys came out around 2004, they gave me the satisfaction of knowing that I wasn't alone in calling for domain-level authentication, and DomainKeys is very similar to what I was proposing in 2002. The difference is that I proposed using standard digital certificates from commercial Certificate Authorities to distribute public keys, whereas DomainKeys used DNS to publish its public key information.
I was so sure at the time that if we could only get people to use this system, we would surely stop spam. Microsoft's Bill Gates gave me some company in 2004 when he proclaimed that "spam will be a thing of the past in two years' time". As it turns out, we were both wrong and naive to say that we can stop spam, because it's like saying you can stop crime. The most we can ever hope for is to manage it to tolerable levels, since determined adversaries will do anything to get around any barrier you put up. I am coming clean on this now because there are still so many people who believe that stopping spam is simple and that if it isn't stopped, it must be the fault of the major ISPs and corporations for dragging their feet.
My colleague David Berlind blamed the spam problem on the big-four email vendors and declared rDNS (reverse DNS) and maybe SPF (Sender Policy Framework) the solution. Now I'm certainly not trying to belittle David Berlind, because his heart is definitely in the right place. In fact, I'm essentially saying that Bill Gates and I were wrong to say that spam could be stopped, and that it's about time my colleague David Berlind took a good hard look at the problem and stopped implying that spam could be stopped if only we did XYZ.
The fundamental challenge here is that we will never stop spam because we will never go to a pure white-list model where we only accept email from verified entities. In fact, there's the little problem of human rights we have to deal with, because words can get you imprisoned or executed in many countries. I never gave much consideration to this issue in the past, but I've given it some thought over the years and I've given in to the legitimate need for anonymous and decentralized email.
Why charging for email to stop spam is just plain dumb
One of the most commonly floated ideas for stopping email spam is that if only we charged a postage fee for every email ever sent, the cost of sending spam would be so outrageous that it would deter spammers. Not only will it not work, but there is the risk that some larger ISPs will abuse it to charge users and legitimate companies for sending legitimate bulk email under the justification of stopping spam. For one thing, spammers don't send the spam directly; they have their hijacked botnet armies send it for them. These are personal computers (and some servers) that have been taken over with malicious software by criminals. If anyone is going to pay, it will be the owners of those computers.
The second obvious thing that proponents of the email postage idea miss is that if you actually had such a massive billing scheme in place, every sender would have to be registered with a credit card on file and every email sent would have to carry a digital signature proving it was sent by the purported sender. If this were the case, you would have already stopped spam without charging a dime for any email, because you could slap senders with a massive fine if they ever dared send spam. Why bother charging honest people for email when you can simply fine the bad apples and leave everyone else alone?
The key to managing spam is reliable white-lists
[Updated 4:40PM - Revised wording for clarity]
So what do we do about spam? Well, for the most part it is already being managed relatively effectively when a good SMTP gateway solution is in place. When you look inside your Hotmail or Gmail account, almost all of the spam is shoved into the spam folder (which can be quickly flushed) and rarely does spam make it to the inbox. Everything that we're certain is spam is rejected outright, and everything we're unsure of ends up in the user's spam folder. The user then skims the spam folder with human eyes, saves any legitimate messages, and empties the rest with a few simple clicks.
Some of the key criteria for ranking an email as likely spam are whether the message is bulk (checked against a centralized checksum database), heuristics, IP blacklisting, keywords, and a few other things. These methods are pretty much universal in the anti-spam industry, but the way they are implemented makes the difference between a very good spam catch rate with very low false positives and a poor catch rate with high false positives. The bottom line is that I might see 2 or 3 pieces of spam make it to my inbox and 1 legitimate email in the spam folder, and I'll simply flush the spam folder after I spend 3 seconds skimming the subject lines.
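To make those criteria concrete, here is a minimal sketch of how a gateway might combine them into a single score. Everything in it is an assumption for illustration: the `ChecksumDatabase` class is a local stand-in for a shared checksum service, and the weights, keyword list, IP address, and bulk threshold are made-up values, not figures from any real product.

```python
import hashlib
from collections import Counter

# Hypothetical values, chosen only for illustration.
BLACKLISTED_IPS = {"203.0.113.7"}
SPAM_KEYWORDS = {"viagra", "lottery", "refinance"}
BULK_THRESHOLD = 100  # sightings of the same body before we call it bulk


class ChecksumDatabase:
    """Local stand-in for a centralized checksum service shared by many gateways."""

    def __init__(self):
        self._sightings = Counter()

    def report(self, body: str) -> int:
        # Normalize case and whitespace so trivial variations hash the same.
        normalized = " ".join(body.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        self._sightings[digest] += 1
        return self._sightings[digest]


def spam_score(sender_ip: str, subject: str, body: str, db: ChecksumDatabase) -> int:
    """Return a score from 0 (clean) to 100 (almost certainly spam)."""
    score = 0
    if db.report(body) >= BULK_THRESHOLD:   # bulk check
        score += 40
    if sender_ip in BLACKLISTED_IPS:        # IP blacklisting
        score += 40
    words = set((subject + " " + body).lower().split())
    hits = len(words & SPAM_KEYWORDS)       # keyword check
    score += min(10 * hits, 20)
    return score
```

The quality difference between filters comes from exactly the parts this sketch hard-codes: how the checksum is normalized, how the weights are tuned, and where the cut-offs sit.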
What remains a problem is the occasional false positive where good email is lost. My worst fear isn't getting 2 or 3 spams in my inbox but losing legitimate email to the spam filter before the message ever makes it to my computer, and that's where the white-list becomes critical. So to make the system better and mitigate false positives where good email gets filtered, we need a reliable white-list of trusted senders that we will always accept, and we'll use software algorithms to perform statistical analysis on non-white-listed email based on a large number of criteria. The challenge is to make the white-list as encompassing as possible while keeping its integrity.
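The flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not how any particular gateway works: the `triage` function, the `Verdict` names, and the two thresholds are hypothetical, the score is assumed to come from whatever statistical filter is in use, and `sender_verified` assumes the gateway has some way to authenticate the sending domain.

```python
from enum import Enum


class Verdict(Enum):
    INBOX = "inbox"
    SPAM_FOLDER = "spam folder"
    REJECT = "reject"


# Hypothetical cut-offs on a 0-100 spam score.
REJECT_AT = 90       # certain enough to refuse outright
QUARANTINE_AT = 50   # unsure: let the user skim it in the spam folder


def triage(sender_domain: str, sender_verified: bool, score: int,
           whitelist: set) -> Verdict:
    # White-listed mail is always accepted, but only if the sender's identity
    # was actually verified; an unverified claim to be on the list means nothing.
    if sender_verified and sender_domain in whitelist:
        return Verdict.INBOX
    # Everything else goes through the statistical filter.
    if score >= REJECT_AT:
        return Verdict.REJECT
    if score >= QUARANTINE_AT:
        return Verdict.SPAM_FOLDER
    return Verdict.INBOX
```

Note that a white-listed, verified sender reaches the inbox even with a high score, which is the whole point: the white-list exists to rescue good mail the filter would otherwise misjudge.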
The biggest problem with email right now is that there is no reliable white-list mechanism in place, since relatively few email domains have implemented DKIM (the IETF-standardized version of DomainKeys). I left out SenderID because it lacks non-repudiation and it breaks email forwarding, which are two deal breakers in the creation of a trustworthy white-list. Breaking email forwarding is unacceptable for many organizations, so that's one major strike against SenderID. Non-repudiation is critical in enforcing proper behavior among white-list participants, since you can't send a piece of spam to someone and claim you didn't send it when your digital signature is on the message.
If an email to me came from a SenderID domain, from the official SenderID-designated SMTP servers, and it contained spam, what can I do about it? Nothing, since I have no way to prove it came from that domain short of having a trusted third party monitor my mail infrastructure and witness the spam coming in. If the email came from a DKIM domain, I have all the proof I need in the email itself, because it contains a digital signature that only the legitimate mail server could have generated, and I don't need any witnesses. If I were a small business that relied on sending out tens or hundreds of thousands of legitimate bulk emails, I would be happy to put up a $1000 bond that I would forfeit, along with being kicked off the white-list permanently, if anyone could submit a piece of spam bearing my mail server's DKIM signature. But what would be a travesty is if large ISPs could bully small businesses into paying thousands of dollars a week to send legitimate email.
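Why a signature counts as proof can be shown with a toy example. The sketch below is not DKIM: real DKIM signs selected headers plus a hash of the body and publishes the public key in a DNS TXT record, while this uses textbook RSA over a small modulus built from two well-known Mersenne primes, which is hopelessly insecure and only meant to show the mechanics. The key values and the `sign` and `verify` helpers are assumptions for illustration. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import hashlib

# Toy key material, for illustration only. Real keys are far larger and
# are never built from Mersenne primes.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, known only to the sender


def digest(message: bytes) -> int:
    """Hash the message and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Only the holder of D (the domain's mail server) can produce this."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Anyone with the public values (N, E) can check it, no witness needed."""
    return pow(signature, E, N) == digest(message)


spam = b"From: sales@example.com\r\n\r\nBuy now!"
evidence = sign(spam)
print(verify(spam, evidence))           # the signed message checks out
print(verify(spam + b"!", evidence))    # an altered message does not
```

The recipient can hand the message and its signature to the white-list operator, who can run the same check independently. That is what makes a forfeitable bond enforceable.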