Spam fighters open up

Yahoo and Microsoft each have put the interests of all Internet e-mail users ahead of their own by not only inventing techniques that could lay the necessary foundation for ending spam, but by making those techniques freely available to competitors.
Written by David Berlind
For almost two years, I've shied away from covering proprietary, non-interoperable anti-spam technologies--solutions that exacerbate the problem, rather than strategically correct it. The vendors of these technologies will tell you otherwise, as they have told me. I've never believed them and neither should you.

Every time I write about spam, every anti-spam solution provider -- and there are more than 200 of them now -- comes out of the woodwork to tell me why their product is the one we've all been looking for. As a matter of habit, I ask them to call me back when their focus turns to creating an anti-spam standard through which all e-mail servers can interoperate at the message transfer agent (MTA) level--a standard that's freely deployable, even by the vendor's competitors. Only then, I have maintained, will we take a step in the right direction and can I consider endorsing the approach.

Well, then is now. Finally.

My hat's off to Yahoo for its DomainKeys and Microsoft for its CallerID. As far as I can tell, Yahoo and Microsoft each have put the interests of Internet e-mail users ahead of their own by not only inventing techniques that could lay the necessary foundation for ending spam, but by making those techniques freely available in a way that allows their competitors to use them. Microsoft and Yahoo are two of only three companies with sufficient presence in the Internet's e-mail system to create or endorse interoperable anti-spam technologies. The third company -- thus comprising the unofficial controlling consortium of Internet e-mail known as AMY -- is America Online, which is testing the independently developed Sender Policy Framework (SPF).

Each of these specifications promises to establish, with a much greater degree of confidence than was ever available before, that e-mails are truly from the source they claim to come from. Should a standard emerge for authenticating an e-mail's source, it would raise a significant barrier to spoofing, a technique spammers often use to falsify their identities. Should all MTAs be enabled with an interoperable technology that establishes an e-mail sender's authenticity, the way would be paved for ISPs and e-mail servers not only to reject mail that's virtually assured of coming from spammers, but to make additional filtering decisions based on what else is known about authenticated senders (e.g., their reputation). Each of the specifications employs different techniques to accomplish this objective, but all three rely on the Internet's DNS for the retrieval and/or storage of the information necessary to complete the authentication process.

Since December 2003, all that was known for sure about Yahoo's technology was that it was called DomainKeys, that it involved the use of public and private keys, that the company SendMail was testing an implementation of the specification with its MTA, and that the technology bore some resemblance to parts of a sender authentication technology known as the Trusted E-Mail Open Standard (TEOS) from the ePrivacyGroup--enough of a resemblance that the ePrivacyGroup issued a press release applauding the move, but subtly reminding the world that it held intellectual property (IP) in the area.

Then, earlier this week, just prior to an Internet Engineering Task Force (IETF)-organized meeting of MTA Authorization Records In DNS (MARID)--a group dedicated to the DNS-related fundamentals behind DomainKeys, SPF, and CallerID--Yahoo submitted its DomainKeys specification to the IETF as a Request for Comments (RFC). Although the collective power of AMY is probably enough to turn any mutually agreed upon anti-spam technology into a de facto Internet standard, the IETF is regarded as the official standards-setting organization for most of the Internet's standard protocols; submitting an RFC, as Yahoo has done, is the first step that a specification must take before it can be considered for ratification as an IETF-endorsed Internet standard.

In addition to submitting an RFC to the IETF for DomainKeys, Yahoo also published its licensing terms for the technology. Whereas the IETF prefers that RFCs be available on a royalty-free (RF) basis, it is less restrictive when it comes to where in the range of RF license types a particular RFC falls. Though RF licensing terms are critical to the mass adoption and penetration of a standard, RF licenses may involve a range of other encumbrances that could accelerate or hinder penetration and adoption.

In offering a royalty-free and very minimally encumbered license, Yahoo is the first member of AMY to set its obligations as a key influential Netizen ahead of any business ambitions that could be connected with its anti-spam intellectual property.

"We definitely thought that a standard needed to be royalty-free with as few restrictions as possible," said Miles Libby, anti-spam product manager for Yahoo Mail. "Anyone can implement DomainKeys as long as they promise not to sue us or other users of it. As soon as they sue, they lose their license. We really hope that DomainKeys becomes an Internet standard and we want to make sure everybody has the right to use it."

Additionally, those licensing terms will not be contested by TEOS-IP holder ePrivacyGroup. According to ePrivacyGroup's Vincent Schiavone, "We will work hard with Yahoo or anybody else to resolve any conflicts that might arise in order to make sure that any [relevant] items contained in TEOS can be contributed to the public domain on a royalty-free basis."

Of course, the benefits

Even if the motives appear altruistic, that's not to say that there aren't business benefits for Yahoo or Microsoft should authentication technologies like DomainKeys, CallerID, or SPF be adopted en masse. Such standards could lead to some much needed relief to the systems that transmit and store e-mail. As a side note, judging by the e-mail storage limit war that has erupted between Google, Yahoo, and Lycos, I'm not so sure that the storage problem is as bad as many say it is.

Regardless of the business benefits, Yahoo is still to be commended for meeting its social obligation by taking this giant step. Dan Rosensweig, Yahoo's chief operating officer, told me, "As spam is an industry-wide issue, we believe it is important to collaborate with the broader Internet community on promising e-mail authentication technologies. We are encouraged by all of the progress being made to protect people from spam and e-mail forgery, and will continue to actively participate in these important discussions."

Not to be upstaged by Yahoo's well-timed-to-the-MARID-meeting announcement, Microsoft will, by the time this story is published, or very shortly thereafter, submit to the IETF the specification for its CallerID technology. According to Sean Sundwall, spokesperson for Microsoft's Anti-Spam Technology and Strategy Group (a group that employs 60 people for the spam problem alone), "Microsoft's submission of the CallerID specification to the IETF will happen within days." However, there remains a bit of controversy around the already published licensing terms for CallerID.

Not surprisingly, any royalty-free licensing terms from Microsoft, especially regarding a technology that could become a permanent fixture in one of the Internet's two biggest killer applications, get special scrutiny. The first CallerID license raised industry-wide concerns about the perpetuity of the royalty-free terms. Microsoft's revision to the licensing agreement may have fixed that problem, but raised new concerns about sub-licensing--a common wrinkle in the licensing of standard specifications. Though everyone's guard appears to be up (because it's Microsoft) and the current license is considered a barrier to mass adoption, most people I spoke with (both in and out of Microsoft) regarding CallerID's licensing terms view the hang-ups as ambiguities that Microsoft will eventually resolve.

"The idea has not been to create a business model," according to George Webb, Microsoft's general manager of its Anti-Spam group. "The intent of our licensing terms is to prevent folks from creating [functionally different derivatives] of CallerID and then having those versions being something that others profit off of."

"We've had a lot of feedback and concerns," Webb told me. "I'm aware of the sub-licensing and perpetuity issues. There will be a new IP declaration that goes with the RFC and there will be modified terms as well. The intent is to have this as broadly adopted as quickly as possible and to hand this over to a standards body and our submission this week will be a proof point of that."

Issues other than licensing also will determine the breadth and speed of adoption. The more complex a specification is, the harder it is to deploy, the longer it will take before a reasonable number of systems begin to interoperate over that spec, and ultimately, the longer it will be before any of the technologies' effects on spam can be felt. In terms of the three specifications, the first complexity issue that MTA developers are apparently looking at is how much work must be done on both the sending and receiving ends of the pipe. In this regard, CallerID and SPF may hold more promise than does DomainKeys for accelerated penetration.

Although all three specifications rely on the DNS, DomainKeys alone requires changes on both the sending and receiving systems, while neither CallerID nor SPF requires changes to the send-side of an MTA. In order to get started, only the receive-side of the MTA must be modified to compare certain information found in inbound e-mails to information that's stored in the DNS. In fact, the similarities between SPF and CallerID are compelling enough that the two are viewed as competing specifications on a path to merge once their differences can be resolved.
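
To make that receive-side comparison concrete, here is a minimal Python sketch. It is an illustration of the general idea only, not the SPF or CallerID algorithm: the PUBLISHED_SENDERS table stands in for records a domain owner would publish in the DNS, and the domain, the address ranges, and the function name are all hypothetical.

```python
import ipaddress

# Stand-in for the DNS: each domain "publishes" the networks allowed to send
# its mail. A real receiver would fetch this with a DNS query; this data is
# made up for illustration.
PUBLISHED_SENDERS = {
    "example.com": ["192.0.2.0/24", "198.51.100.17/32"],
}


def is_authorized_sender(claimed_domain: str, connecting_ip: str) -> bool:
    """Return True if connecting_ip lies in a network published for claimed_domain."""
    networks = PUBLISHED_SENDERS.get(claimed_domain.lower(), [])
    address = ipaddress.ip_address(connecting_ip)
    return any(address in ipaddress.ip_network(net) for net in networks)


# A message claiming to be from example.com, arriving from a listed address.
legit = is_authorized_sender("example.com", "192.0.2.25")
# The same claim, arriving from an address the domain never published.
spoofed = is_authorized_sender("example.com", "203.0.113.9")
print(legit, spoofed)
```

Only the receiving system runs any new logic in this sketch; the sending domain does nothing beyond publishing a record, which is why this style of check is the easier one to roll out.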

One of those differences is Microsoft's employment of XML to store information about sending systems in the DNS. According to John Levine, co-chairman of the Anti-Spam Research Group (a working group within the research arm of the IETF known as the Internet Research Task Force or IRTF), "Merging the two shouldn't be that hard. Microsoft would just have to drop XML from the CallerID specification, and we'd be done. XML is the main complaint and an XML parser is a large chunk to add to a lot of programs," said Levine. "XML is a good way to do a lot of complex data, but we don't want complexity."

Microsoft's Sundwall says that Microsoft is trying to ease the pain associated with XML by creating wizards that make it so easy to generate the XML string that's entered into the DNS that even his grandmother could do it. Said Levine, however, "Creating the string is the easy part. Parsing it is hard. Not only must a routine on a receiving system be able to parse correct XML, but it must also be able to deal with arbitrarily coded or hostile data." Complexity leads to vulnerabilities, Levine added. "The more complicated the format, the harder it is to see and deal with [in the parser] all the ways that it might be broken." Because of the complexities associated with CallerID, Levine says, "If the MARID group decided to do the easiest thing first, then that would be SPF."
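
Levine's point about parsing can be illustrated with a rough Python comparison. The flat record below follows the general shape of an SPF record; the XML record uses hypothetical element names and is not the CallerID schema. Both function names are mine, and neither parser is hardened for production use.

```python
import ipaddress
import xml.etree.ElementTree as ET

FLAT_RECORD = "v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.17 -all"
# Hypothetical XML layout, invented for this comparison.
XML_RECORD = "<senders><net>192.0.2.0/24</net><net>198.51.100.17</net></senders>"


def to_network(text):
    """Convert text to an IP network, or return None if it is malformed."""
    try:
        return ipaddress.ip_network(text.strip())
    except ValueError:
        return None


def parse_flat(record):
    """Split on whitespace and keep the ip4: terms; no parser library needed."""
    terms = [t[len("ip4:"):] for t in record.split() if t.startswith("ip4:")]
    return [net for net in map(to_network, terms) if net is not None]


def parse_xml(record):
    """Same information, but a full XML parser must first accept the input."""
    try:
        root = ET.fromstring(record)
    except ET.ParseError:
        return []  # malformed or truncated XML yields no usable data
    texts = [el.text or "" for el in root.iter("net")]
    return [net for net in map(to_network, texts) if net is not None]


flat_networks = parse_flat(FLAT_RECORD)
xml_networks = parse_xml(XML_RECORD)
print(flat_networks == xml_networks)
```

Both routes recover the same two networks, but the XML route pulls in an entire parser. Python's own documentation warns that xml.etree.ElementTree is not secure against maliciously constructed data, which is the kind of exposure Levine is describing.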

One major criticism of CallerID and SPF that DomainKeys overcomes in some situations is a forwarding problem. While CallerID and SPF are hailed as solutions that require no changes to the send-side of an MTA, there's a downside to this advantage. When users set their inboxes to auto-forward their mail to another inbox, the information from the original sender (which could have been a spammer) needed to complete the authentication process is lost.

DomainKeys, however, adds new information to an e-mail as it's being sent. That information stays with the e-mail when it's forwarded, which in turn preserves the integrity of the comparison that the final receiving system must make to data that's stored in the DNS when attempting to authenticate a sender. This, in combination with DomainKeys' use of cryptography to secure that data, is one of the reasons that DomainKeys is not considered to be a competing specification. Instead, DomainKeys is seen as complementary, and proponents of SPF and CallerID, including Microsoft's Sundwall, regard the rollout of DomainKeys, or something like it, as a natural second phase once the easier specifications are deployed.
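
The sign-on-send, verify-on-receive idea can be sketched in Python as follows. This is a toy under loud assumptions, not the DomainKeys specification: it uses textbook RSA with a deliberately tiny key built from two well-known Mersenne primes, signs only the body, and replaces the DNS with a dictionary. The message layout and all names are hypothetical.

```python
import hashlib

# Toy RSA key pair. Far too small for real use; it only keeps the example
# self-contained with the standard library (needs Python 3.8+ for pow(E, -1, m)).
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, known only to the sender

# Stand-in for the DNS: the sending domain publishes its public key here.
PUBLISHED_KEYS = {"example.com": (N, E)}


def digest(body, modulus):
    """Hash the body and reduce it so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(body.encode("utf-8")).digest(), "big") % modulus


def send(domain, body):
    """Send side: attach a signature made with the private key."""
    return {"from_domain": domain, "body": body,
            "signature": pow(digest(body, N), D, N), "relays": []}


def forward(message, relay):
    """A forwarder changes the delivery path but leaves body and signature alone."""
    return dict(message, relays=message["relays"] + [relay])


def verify(message):
    """Receive side: fetch the public key from 'DNS' and check the signature."""
    key = PUBLISHED_KEYS.get(message["from_domain"])
    if key is None:
        return False
    n, e = key
    return pow(message["signature"], e, n) == digest(message["body"], n)


original = send("example.com", "Here is the newsletter you requested.")
forwarded = forward(original, "forwarder.example.net")
tampered = dict(forwarded, body="Buy cheap pills now!")
print(verify(original), verify(forwarded), verify(tampered))
```

Because the check depends on what travels inside the message rather than on which machine delivered it, the forwarded copy still verifies, while a copy with an altered body does not. The cost is that the sending system must be changed to do the signing.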

"Spoofing is a humongous problem that needs to be solved now," said Sundwall. "We think DomainKeys is a great idea. We just don't want to wait. We see it being implemented on top of CallerID, SPF, or a specification that's a merger of the two." Sundwall said Microsoft is working closely with Meng Weng Wong, the developer of SPF, to figure out how the two specifications can be merged.

Should any of the three specifications (or some IETF-inspired permutation) penetrate the Internet to the point that most MTAs are interoperating over them, the question then becomes: What's next? Most people I've spoken with think some sort of reputation management system comes next.

IronPort is one company attempting to address the idea of a reputation service through a protocol it calls SMTPi. But the ePrivacyGroup's Schiavone thinks that's putting the cart before the horse. "Once you've established the 'who', there are still two more pieces of information that are more important than reputation: the 'what' and the 'why,' " said Schiavone. "What are you sending to me and why are you sending it? You are David Berlind from ZDNet, you are sending me a newsletter, and you're sending it to me because I [requested it]."

TEOS, as Schiavone describes it, is about providing the sort of granular information that helps recipients separate the wheat from the chaff. "The beauty of such granularity," said Schiavone, "is that existing laws address its usage. If you misrepresent any of the information, it becomes a fraud or truth-in-advertising situation."

Levine, the co-chair of the IRTF's ASRG, is helping Schiavone codify the TEOS specification into yet another RFC--a protocol layer that would sit on top of authenticating protocols like SPF, CallerID, and DomainKeys.

What chance, if any, do these specifications have of finding their way into the Internet?

Eric Allman, CTO at SendMail, is optimistic. SendMail has been testing DomainKeys and, according to Allman, is also working with CallerID. These activities bode well for the penetration of the specifications. According to Yahoo's Libby, SendMail's MTAs account for 60 percent of all MTAs on the Internet. Between support from AMY and the sort of penetration that SendMail could bring to bear, the authentication specifications in question could get a good head start. But a head start is one thing, penetration is another. As Allman pointed out, SendMail's support of a specification, and the penetration of that specification, are two entirely different issues. "The architecture of SendMail is such that support can be included in new versions as well as added to old installations," said Allman. "Whether or not users, especially existing ones, take advantage of the additional support is a separate question."

OK, so it may take a while. But it's a start, and both Yahoo and Microsoft deserve credit for doing the right thing to rid the Internet of its worst scourge. Let's hope they stay the course.
