DKIM: Useless or just disappointing?

Now that DKIM is established as the leading method for sender authentication, it's clear that it doesn't really claim to do all that much, and fails even at that.
Written by Larry Seltzer, Contributor

Spam is perhaps the oldest of security problems affecting Internet users widely. A lot of effort has been put into fighting it, and yet it persists. Even the most advanced of standards for combating spam fails in the face of a simple spoofing attack. There's probably nothing that standards bodies can do that will make a real difference.

The original designers of SMTP neglected to include any method for authenticating the sender of an email message. By the time it became clear something had to be done, SMTP was so widely deployed that changes became difficult. Any mandatory change would break existing mail software, so it would have to be optional.

A lot of work was done on this problem 5 to 10 years ago and one standard emerged as the gold standard: DKIM (DomainKeys Identified Mail), a synthesis of standards from Yahoo! and Cisco, both motivated to improve on the popular but very limited SPF standard. A newer standard, DMARC, attempts to synthesize both SPF and DKIM into a set of consistent procedures in order to facilitate interoperability.

DKIM attempts to allow a recipient to verify that the domain from which the message is purported to originate is in fact the sender of that message. The sending domain digitally signs the message body and specified header fields using a private key, and puts the signature into a "DKIM-Signature" header field. The recipient reads this field, which includes the name of the purported sending domain, and retrieves that domain's public key from the DNS. It uses this key to verify the signature against the contents of the message. This proves both that the message was in fact sent by the domain which claims to have sent it and that the signed parts of the message were not modified in transit.
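The lookup step can be made concrete with a minimal sketch. It assumes the signature's d= tag carries the signing domain, the s= tag a selector, and that the public key sits in a DNS TXT record at selector._domainkey.domain. The function names, the example.com signature and the placeholder b=/bh= values are all hypothetical, and no DNS query or cryptographic check is performed.

```python
def parse_dkim_signature(value: str) -> dict[str, str]:
    """Split a DKIM-Signature field value into its tag=value pairs."""
    tags: dict[str, str] = {}
    for part in value.split(";"):
        name, sep, val = part.partition("=")
        if not sep:
            continue  # skip empty segments, e.g. after a trailing ';'
        # Simplification: drop all whitespace (including line folding)
        # from the value, which is harmless for the tags used here.
        tags[name.strip()] = "".join(val.split())
    return tags


def key_record_name(tags: dict[str, str]) -> str:
    """DNS name whose TXT record is expected to hold the public key."""
    return f"{tags['s']}._domainkey.{tags['d']}"


# Hypothetical signature; the bh= and b= values are placeholders, not real data.
example = (
    "v=1; a=rsa-sha256; d=example.com; s=sel2013;\r\n"
    "\th=from:to:subject:date; bh=PLACEHOLDER=; b=PLACEHOLDER="
)
tags = parse_dkim_signature(example)
print(tags["d"], tags["h"], key_record_name(tags))
```

A real verifier would go on to fetch that TXT record, canonicalize the signed fields and body, and check the RSA signature in b=.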

Dave Rand and Doug Otis of Trend Micro argue that a weakness in the specification means that end-users can be effectively spoofed by messages that pass all the tests in DKIM. If you "prepend" a second From: field in the header, the user may see that one.

By default, under the spec, DKIM doesn't sign or check signatures on most parts of the message header (fields like these), but the sender can specify that they are signed with (for example) an 'h=From:;' in the DKIM-Signature header. Consider this scrap of message header:

Date: Thu, 8 Aug 2013 21:44:28 -0700 (PDT)
From: Barack Obama <barack@whitehouse.gov>
From: Random Spammer <rndmspmr_098435@yahoo.com>
Reply-To: Random Spammer <rndmspmr_098435@yahoo.com>
Subject: Awarded a Pulitzer for "DKIM is Harmful"
To: "Joseph Shmoe" <joe.shmoe@gmail.com>

Note the two From: fields. If such a message passes through DKIM signing with 'h=From:', both fields may be included in the signature (the standard isn't clear on the matter), and the end user may see the first one. In other words, the recipient (Joe Shmoe) may see a DKIM-verified message coming from barack@whitehouse.gov. Incidentally, you can also insert multiple To: or Subject: fields and these may also result in misleading behavior.
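To see why the duplicate matters, here is a small sketch that feeds the Date:, From: and To: lines of the sample header to the parser in Python's standard email package. The parser keeps both From: fields in order; which of them a given mail client shows the user is up to that client, so the "first" and "last" labels below describe only position in the header, not what any particular client displays.

```python
from email import message_from_string

raw = (
    "Date: Thu, 8 Aug 2013 21:44:28 -0700 (PDT)\n"
    "From: Barack Obama <barack@whitehouse.gov>\n"
    "From: Random Spammer <rndmspmr_098435@yahoo.com>\n"
    'To: "Joseph Shmoe" <joe.shmoe@gmail.com>\n'
    "\n"
    "body\n"
)

msg = message_from_string(raw)

# get_all() returns every instance of the field, in header order.
from_fields = msg.get_all("From")
print("number of From: fields:", len(from_fields))
print("first in header:", from_fields[0])
print("last in header: ", from_fields[-1])
```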

But it's worse than that. Because DKIM only signs the specified parts of the message, the message can be forwarded on by an intermediary that inserts the extra fields, and the signature will still match. This is called a replay attack.

The importance of this replay attack is hard to overstate. Because it can be done, not only may the user be fooled by the spoofed From: address, but the DKIM engine is fooled by the signature. In the above example, GMail receives the signed message with a signature from Yahoo! that will match when GMail checks it. But the message wasn't actually sent by Yahoo!; it was resent by some other domain, controlled by a spammer, which prepended other fields. So even the domain authentication isn't what it claims to be.
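The mechanics can be imitated with a toy model. This is not real DKIM: an HMAC with a shared secret stands in for the RSA signature, canonicalization is reduced to lowercasing and trimming, and the sketch assumes the verifier hashes only the bottom-most instance of each field named in h=. All function names are hypothetical.

```python
import hashlib
import hmac

SECRET = b"toy-signing-key"  # stand-in for the signer's private key


def last_instance(headers, name):
    """Return the bottom-most (name, value) pair with this name, or None."""
    for field_name, value in reversed(headers):
        if field_name.lower() == name.lower():
            return field_name, value
    return None


def toy_sign(headers, body, signed_names):
    """MAC over the selected header fields plus the body (not real DKIM)."""
    mac = hmac.new(SECRET, digestmod=hashlib.sha256)
    for name in signed_names:
        field = last_instance(headers, name)
        if field is not None:
            mac.update(f"{field[0].lower()}:{field[1].strip()}\r\n".encode())
    mac.update(body.encode())
    return mac.hexdigest()


def toy_verify(headers, body, signed_names, signature):
    """Recompute the MAC over the received message and compare."""
    expected = toy_sign(headers, body, signed_names)
    return hmac.compare_digest(expected, signature)


original = [
    ("From", "Random Spammer <rndmspmr_098435@yahoo.com>"),
    ("To", '"Joseph Shmoe" <joe.shmoe@gmail.com>'),
]
body = "hello\r\n"
signature = toy_sign(original, body, ["from"])

# The replaying intermediary prepends a forged From: field and resends.
replayed = [("From", "Barack Obama <barack@whitehouse.gov>")] + original

print("replayed message verifies:", toy_verify(replayed, body, ["from"], signature))
```

Under these assumptions the prepended field never enters the hash, so the check still passes even though the header a reader sees first has changed.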

Otis and Rand's objections haven't impressed the DKIM standards bodies. Their response is that checking for multiple header fields is a test that properly belongs not in DKIM but at some other level of the email software stack. It's also worth pointing out that messages with multiple From: or other such fields may or may not be legal under the "Internet Message Format" standard (RFC 5322) set as a prerequisite by DKIM. Dave Crocker of Brandenburg InternetWorking, an author of many of the DKIM specs, tells me that it is not compliant, but it's not clear to me from the spec whether it's not compliant or just not addressed.

Otis raised these objections 2 years ago at an early stage of the DKIM Working Group's deliberations on the standard for DKIM signatures. You can see how it went in this blog post by Otis and the comments below it, including one from Crocker. This dispute re-emerged recently when the IETF accepted RFC 6376 (DomainKeys Identified Mail (DKIM) Signatures) as an Internet Standard. Otis and Rand appealed the elevation of RFC 6376, laying out their arguments in this document: DKIM is Harmful as Specified. The IESG (Internet Engineering Steering Group) rejected the appeal for essentially the same reasons the DKIM working group rejected it.

Otis and Rand argued in their appeal that a one-line change to the spec, requiring a check for multiple headers, would solve the problem. DKIM software already has to read and check all the headers anyway, so it wouldn't be much of a burden on the implementation. The DKIM standards bodies were not moved.
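A check of this kind is small. The following sketch uses Python's standard email parser to flag any repeated field from a short list; the list itself (SINGLETON_FIELDS) and the function name are assumptions made for illustration, not wording taken from the appeal or from any RFC.

```python
from collections import Counter
from email import message_from_string

# Assumption for illustration: fields treated as "at most one per message".
SINGLETON_FIELDS = {"from", "to", "subject", "date", "reply-to"}


def repeated_singleton_fields(raw_message: str) -> list[str]:
    """Return the names of listed fields that occur more than once."""
    msg = message_from_string(raw_message)
    # keys() lists every header field name, duplicates included.
    counts = Counter(name.lower() for name in msg.keys())
    return sorted(n for n in SINGLETON_FIELDS if counts[n] > 1)


suspicious = (
    "From: Barack Obama <barack@whitehouse.gov>\n"
    "From: Random Spammer <rndmspmr_098435@yahoo.com>\n"
    'To: "Joseph Shmoe" <joe.shmoe@gmail.com>\n'
    "\n"
    "body\n"
)
print(repeated_singleton_fields(suspicious))
```

A filter could treat a non-empty result as grounds to fail or at least distrust the message, regardless of whether its DKIM signature verifies.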

If you read the various discussions on the matter, especially those outside of formal standards documents, you can see that the participants get really worked up. The raised voices come through clearly, even without the use of caps lock. Rand tells me that the influence of large e-mail senders, whose interests might not coincide with those of recipients, is too great in this debate. The big mass-mailers certainly want to be able to get their messages through without any problems, so they would prefer a simple, lenient standard.

I'm uncomfortable thinking of Crocker and some of the others on the DKIM WG as flunkies for mass-mailers. In fact, all of the people I've spoken to about this have long histories volunteering time to work on Internet standards for the benefit of all of us and I don't believe for a second that any of them are selling out. 

On the other hand, both sides can't be right. When it comes to facts on the ground (or in the cloud), Otis and Rand have a really good point. Whether the IETF is correct that From: header checking doesn't belong in the DKIM spec or not, the fact remains that you can easily spoof the From: field in a fully-compliant DKIM-signed message that passes all the tests. 

So how can this be? Wasn't DKIM supposed to stop this sort of thing? No, it wasn't.

When the efforts to create sender authentication began about 10 years ago, we all expected a lot more from it. As the years wore on and the true complexity of the task became clear, the scope of what DKIM actually does narrowed considerably. Here's what it tries to do: validate that the domain which purports to send the message actually sent it.

That's it. This is information designed for back-end server processes, not end users. End users can assume *nothing* based on whether an individual email was validated under DKIM. The message could easily be a phishing attack or contain a malicious attachment or have a malformed header designed to spoof the identity presented to the end users. None of this is DKIM's problem.

You might think that a message which fails DKIM checks is clearly a problem; in fact, RFC 5863, DomainKeys Identified Mail (DKIM) Development, Deployment, and Operations states that '…messages with invalid signatures need to be treated no better and no worse than those with no signature at all'. A message with no DKIM signature would seem to be one which is harder to assess by back-end analysis engines. Even by design, a single message which passes DKIM tests provides no useful information about the reputation of the sender or safety or accuracy of the message content.

Early on, some mail clients, especially web mail clients, displayed a little shield or something similar as a 'trust marker' when a message passed DKIM. The ones I've checked don't do that anymore, and for good reason: End users can't do anything with that information other than to make too much of it.

You might be thinking that DKIM sounds kind of useless, but perhaps we're just expecting too much of it. If we just lower our expectations and look on DKIM as a tool used by other back-end analysis engines to judge the reputation of sending domains, perhaps it could be useful. The idea is that if you look at a lot of email over time that DKIM has validated as coming from a given domain, you can be sure that it really is from that domain and judge the domain's reputation accordingly. This would be helpful input for an email analysis engine if you really could trust the domain authentication, but the replay attack above shows that you can't.

IETF documents don't even tell administrators to be careful with the results of DKIM checks. They say to whitelist whole domains based on reputation and to judge whether a message has come from that domain based on DKIM. Section 5.4 of the aforementioned Development, Deployment and Operations document states:

DKIM is frequently employed in a mail filtering strategy to avoid performing content analysis on email originating from trusted sources.  Messages that carry a valid DKIM signature from a trusted source can be whitelisted, avoiding the need to perform computation and hence energy-intensive content analysis to determine the disposition of the message.
Mail sources can be determined to be trusted by means of previously observed behavior and/or reference to external reputation or accreditation services.  The precise means by which this is accomplished is outside the scope of DKIM.

And this is what large sites are doing today: They are whitelisting certain other large sites which they decide are trusted and then letting messages pass through unscrutinized if they contain a valid DKIM signature from one of the whitelisted sites. This leaves open the possibility of retrospective analysis of those messages that might affect the domain's reputation, but it still doesn't make sense. With the replay attack, the domain specified in the DKIM signature is an innocent bystander, and there's no sense in diminishing its reputation.

This is why Otis and Rand go beyond calling DKIM 'useless' and use words like 'harmful'. Trend Micro is a major provider of email security services and they say they are seeing messages performing exactly this sort of abuse out in the wild.

Even if DKIM performed as advertised, with the modest claim of sender domain authentication, I'd have to call it disappointing and of speculative value. But it doesn't perform as advertised, and when administrators follow the guidance in the Development, Deployment and Operations document it can be far worse than disappointing.

Clearly the problems which cause DKIM to be unreliable aren't addressed in any Internet standard and they probably can't be. Only proprietary implementations of email security products can look for things like double From: headers. Standards, and especially sender authentication standards, have failed us.
