Boy, those folks at Google are clever when it comes to spam. The question is whether a technology that they probably have in the works in their labs will be one that can and will be embraced by other major participants in the Internet's e-mail ecosystem. First, an introduction to the problem Google is looking to solve (one that its recent IMAP announcement is probably related to). Second, what Google is thinking about.
If you follow what I've been writing about spam for what seems like forever now (more than five years), you'll know that I've talked about how beating spam requires a variety of new and universally deployed (in e-mail clients, servers, and services) protocols, all of which work together to reduce the 'surface area' in the Internet's e-mail system that makes spam possible.
One foundational protocol that gets discussed a lot is a universal sender authentication protocol, the purpose of which is to verify that an incoming e-mail is really from the sender it purports to be from. One of the biggest vulnerabilities in the Internet's e-mail system is how an e-mail can easily pretend to be from someone it's not (like your bank or eBay -- a weakness that has given rise to a form of spam known as phishing).
There are a number of authentication protocols floating around -- none of which has been universally deployed across all e-mail systems. That sort of ubiquity could be achieved if only the four biggest e-mail technology providers (Microsoft, AOL, Google, and Yahoo, a.k.a. MAGY, pronounced "maggie") would agree on one technology that they'll all support. But even if the four companies can come to some agreement, a universally deployed authentication protocol can only go so far in the war on spam.
Even though you may have established that an e-mail is indeed from the person or domain it claims to be from, that doesn't necessarily mean it's e-mail you want. Most of us get plenty of unwanted e-mail from legitimate senders. One man's junk is another man's treasure. The definition of spam, or unwanted e-mail (I'm refraining from using the phrase 'unsolicited commercial e-mail' because I think it's too confining), varies from one Internet user to the next. What you want to allow into your inbox and what I want to allow may be two very different things.
One nasty side effect of the war on spam is what I call the 'deliverability problem.' Every time I write about spam, I get e-mails and Talkbacks from people who say they've found the ultimate solution. Invariably, these are solutions that do a really good job of filtering out spam without too much collateral damage (where some legitimate e-mail gets mistakenly classified as spam and is also filtered out). But when I ask them what their wonderful solution does to make sure the e-mail they're sending to other people doesn't get automatically shuffled into the recipient's junk mail folder, that's when they realize that they don't have the ultimate solution. False positives are more troubling than the spam itself because of how easily a mission-critical e-mail -- one that hard dollars could be connected to -- might never make it to its recipient. This is why standards are important. Sending and receiving e-mail systems need to be able to engage in a series of handshakes that, taken together, will go a long way towards solving both the spam and deliverability problems.
Another nasty side effect of the war on spam is the casual, if not lazy, stereotyping of legitimate senders as spammers. By now, you have probably seen something like the "This is spam" button in the Web interfaces of the more popular online e-mail services from MAGY. Spam scholars know all too well that these buttons are kludges that can be very problematic. For example, most e-mail users assume (and rightly in many cases) that by pressing the This is Spam button while viewing an e-mail, they are issuing instructions to their e-mail server to blacklist future e-mails that, in some way, match the current e-mail.
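To make that assumption concrete, here is a minimal, hypothetical sketch of the crudest possible button handler: one that simply blacklists the sender's address. It is not how any MAGY service actually works (real systems match on far more than the From address), but it shows why such a rule catches every future message from that sender, wanted or not.

```python
from email.utils import parseaddr


class NaiveSpamBlocklist:
    """Hypothetical 'This is Spam' button that blocks future mail by sender address."""

    def __init__(self) -> None:
        self.blocked_senders: set[str] = set()

    def report_spam(self, from_header: str) -> None:
        # Remember the bare address (display name dropped, case-folded).
        _, address = parseaddr(from_header)
        self.blocked_senders.add(address.lower())

    def is_blocked(self, from_header: str) -> bool:
        # Any later message from the same address is filtered, wanted or not.
        _, address = parseaddr(from_header)
        return address.lower() in self.blocked_senders
```

Report one notification from a social network as spam and every later notification from that address, including ones you might want, is blocked too.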
One big problem is that the This is Spam button is a far more convenient way to unsubscribe from a newsletter or automatic mailing (like the ones you might get from Facebook, LinkedIn, etc.) than the official process you'd normally go through to stop those e-mails. Here again, some collateral damage results from what is otherwise a noble effort in the war on spam. Yes, some unsolicited e-mails bear instructions (usually at the bottom) on how to unsubscribe from future mailings. But then again, there are those e-mails that the recipients actually "requested." That request may have come in the form of a subscription to a newsletter (for example, one of ZDNet's newsletters) or as a result of using a service that relies on e-mail for event notification (again, Facebook and LinkedIn are great examples here).
In the case of the latter "solicited subscriptions," using the This is Spam button simply because it's the most convenient way to unsubscribe is what causes the collateral damage. For example, if enough people take this easy way out with a legitimate newsletter (one that they actually subscribed to), there's a chance that the sender's reputation could be digitally sullied to the point that multiple e-mail systems, or perhaps just the one system that internalizes every press of the This is Spam button, consider the sender to be a spammer. Or, a bit closer to home, pressing the This is Spam button as a means of unsubscribing could result in the filtering of e-mail you actually want (perhaps somewhere down the line).
Bottom line: using the This is Spam button as a means of unsubscribing may be the easy way out, but it's a bad idea. It's the equivalent of throwing an aluminum can in the garbage when, with a bit of extra effort, you could recycle it instead. Both are a means to the same end (getting rid of the can), but one has consequences you probably don't want.
To truly address the unsubscription issue, I've often mused (here on ZDNet) about the idea of a standard unsubscription protocol that makes the process of unsubscribing as easy as the process of tagging something as spam. Again, this would involve some sort of user-initiated handshake between the receiving and sending systems that basically contains instructions to discontinue the regular transmission of e-mail. The protocol would not be a simple one. For example, it would have to address a variety of unsubscription demands: unsubscribe me from future editions of this particular newsletter, unsubscribe me from all future mailings from this particular sender, or unsubscribe me from all newsletters coming from that domain.
One big difference between an unsubscription protocol and the basic white- or blacklisting that some people use to separate spammers from legitimate senders (blacklisting the sender being the usual alternative to unsubscribing) could have to do with what happens if, down the line, you want to subscribe to something from that sender or domain again. In the course of a "subscription handshake" between two systems, the publisher could interrogate the subscriber's system to make sure the subscriber isn't permanently unsubscribed from the domain in a way that would prevent a successful subscription to some newsletter. I'm imagining a user experience whereby I press a subscribe button and a dialog pops up saying that the subscription can't be completed because of an existing domain-wide unsubscription.
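Those three kinds of unsubscription demand map naturally onto three scopes. The following minimal sketch shows how a sending system might check whether a stored unsubscription request covers a given mailing. Every name in it (`Scope`, `Unsubscribe`, `Mailing`, `is_suppressed`) is hypothetical; no existing protocol defines them.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    LIST = "list"      # one particular newsletter
    SENDER = "sender"  # everything from one sender address
    DOMAIN = "domain"  # every mailing from one domain


@dataclass(frozen=True)
class Unsubscribe:
    scope: Scope
    target: str  # a list ID, sender address, or domain, depending on scope


@dataclass(frozen=True)
class Mailing:
    list_id: str
    sender: str

    @property
    def domain(self) -> str:
        # The part after the last "@" in the sender address.
        return self.sender.rsplit("@", 1)[-1].lower()


def is_suppressed(mailing: Mailing, requests: list[Unsubscribe]) -> bool:
    """Return True if any unsubscription request covers this mailing."""
    for req in requests:
        if req.scope is Scope.LIST and req.target == mailing.list_id:
            return True
        if req.scope is Scope.SENDER and req.target.lower() == mailing.sender.lower():
            return True
        if req.scope is Scope.DOMAIN and req.target.lower() == mailing.domain:
            return True
    return False
```

A list-scoped request stops one newsletter while leaving the sender's other newsletters alone; a domain-scoped request stops them all.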
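Here is a rough sketch of that subscription handshake, assuming the receiving system can answer a publisher's query about domain-wide unsubscriptions. No such query exists in today's e-mail protocols; `SubscriberSystem`, `query_domain_block`, and `subscribe` are illustrative names only.

```python
from dataclasses import dataclass, field


@dataclass
class SubscriberSystem:
    """Hypothetical receiving system that remembers domain-wide unsubscriptions."""
    unsubscribed_domains: set[str] = field(default_factory=set)

    def query_domain_block(self, domain: str) -> bool:
        # Answer the publisher's question: has this user blocked the whole domain?
        return domain.lower() in self.unsubscribed_domains


def subscribe(subscriber: SubscriberSystem, publisher_domain: str, list_id: str) -> str:
    """Publisher side of the handshake; the returned text is what the dialog would show."""
    if subscriber.query_domain_block(publisher_domain):
        return (f"Cannot subscribe to {list_id}: you have an existing "
                f"domain-wide unsubscription from {publisher_domain}.")
    return f"Subscribed to {list_id}."
```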
Until yesterday, my discussions about subscription/unsubscription protocols seemed a bit academic. Before such a protocol can be layered on top of the Internet's e-mail system, some sort of standard sender authentication protocol must take root globally. But the subject did come up in the course of my interview with David Murray, an associate product manager at Google who is intimately familiar with Google's decision to support the IMAP e-mail retrieval and synchronization protocol.
One shortcoming in the client/server architecture found in most e-mail deployments (whereby some client -- be it Outlook, Thunderbird, or a smartphone -- accesses an inbox and outbox on some e-mail server or service) is that anti-spam indicators aren't communicated between client and server. Today, for example, if you're using Outlook to access mail on some Internet-based service and you tag an e-mail in your inbox as junk, that preference usually does not get communicated back to the server. Likewise, when a server has some anti-spam technology on it and is automatically filtering suspected spam into a junk mail folder, you may not be aware of it until you visit the Web interface to your e-mail. One reason for this lack of spam signaling between client and server is that most e-mail retrieval is done over the very simplistic POP3 e-mail retrieval protocol. It simply isn't robust enough to support any sort of standard anti-spam signaling between client and server.
IMAP, on the other hand, is an alternative to POP3 that is inherently suited to such signaling. A key difference between IMAP and POP3 is that IMAP supports the organizational context of e-mail folders. As Murray explained in our discussion (which will be made available as a podcast later as part of another blog post), when the end user drags an e-mail into a junk mail folder on the client side, Gmail's support of IMAP ensures that the folder is synchronized with Gmail's spam folder (actually, it's just a tag, but let's call it a folder for the sake of simplicity), which in turn is converted into a signal identical to the one Gmail generates when a user presses the Report Spam button in the Web-based interface.
Interesting. Very interesting. Very clever, too. Instead of coming up with a new protocol for signaling anti-spam indicators between server and client, Google does it via IMAP. Another benefit of this IMAP approach is that all Gmail e-mail tagged as spam gets replicated to the junk or spam folder on the client side. This way, if you're one of those people who like to double-check a junk mail folder to make sure that no legitimate mail squirreled its way in there, you can now do so without having to visit the server itself (through the Web interface).
But perhaps just as interesting was where my discussion with Murray led when it came to how Gmail processes a Report Spam event from a Gmail user. I asked him about the problem of people using such buttons as a lazy way to unsubscribe from something, and that's when Murray made it clear that Google has not only thought hard about this problem but is actually looking into ways to solve it. While he stopped short of making any announcements, I couldn't help but read the tea leaves: Google has something coming along the lines of what Murray talks about. Here's what he said:
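For readers curious about the client side, here is a sketch of what "dragging a message into the spam folder" amounts to at the IMAP level, using Python's standard `imaplib` module. It assumes Gmail exposes its spam tag as a folder named `[Gmail]/Spam` (the name can vary with the account's language, so a real client should discover it with a LIST command) and uses the classic IMAP move sequence: COPY, flag the original `\Deleted`, then EXPUNGE. It is a plausible client behavior, not Gmail's documented procedure.

```python
import imaplib

# Assumption: the English-locale name of Gmail's spam folder over IMAP.
GMAIL_SPAM_FOLDER = "[Gmail]/Spam"


def report_spam_via_imap(conn, uid: str, spam_folder: str = GMAIL_SPAM_FOLDER) -> None:
    """Move one message (by UID) from the selected folder into the spam folder.

    `conn` is an imaplib.IMAP4 / IMAP4_SSL connection that is already logged in
    and has the source folder (e.g. INBOX) selected.
    """
    mailbox = f'"{spam_folder}"'  # quote the name, since it contains brackets and a slash
    typ, _ = conn.uid("COPY", uid, mailbox)
    if typ != "OK":
        raise imaplib.IMAP4.error(f"COPY to {spam_folder} failed")
    # Classic IMAP "move": mark the original deleted, then expunge it.
    conn.uid("STORE", uid, "+FLAGS", r"(\Deleted)")
    conn.expunge()
```

In practice `conn` would come from `imaplib.IMAP4_SSL("imap.gmail.com")`, followed by `login()` and `select("INBOX")`. The function accepts any connection object so it can be exercised without a network.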
DM: We know that users are [pressing Report Spam to unsubscribe]. Not only are we aware of this, we are actively working on ways to incorporate that into the Web experience. So, we know that when people mark something as spam, it's not necessarily because something is legitimate spam. Sometimes it is marketing mail that they just don't want. Sometimes, it's actually a mailing list from a friend that they actually don't want to be on. So they mark it as spam so that it will get out of their way and they know they don't want it in their interface. To us, spam is just unwanted mail in any form. And, so, thankfully, Google has a ton of technologies to detect a lot of these things using machine learning to figure out what is the user trying to do. Are they trying to get rid of actual spam? Or, do they just want to unsubscribe. We definitely know that when people click that button, they want to unsubscribe. Without going into any sort of details of future plans to launch stuff, we do know that users are doing this and we are actively investigating making that experience better.
DB: Looking for ways to stuff an unsubscribe message back to the sender, maybe?
DM: There are multiple ways of doing it. But, yes. That kind of thing is definitely something that would definitely add value to the user experience.
DB: This is where I've clamored for a standard in the area of unsubscriptions so that everybody works according to the same rules.
DM: Believe it or not, there actually is a standard. There's these x-notify headers for unsubscribing. They already exist. The problem is that there are a lot of folks that don't use the standard. So, I would encourage any of your listeners to look into this a little bit more. I don't know all of the technical details behind it. But what I do know is that there is a way to expose an unsubscribe header in your e-mail and being able to expose that such that when somebody marks something as spam, we can just follow through and use that header and auto-unsubscribe someone, that's definitely something that we want to be able to do because we know that's what our users want.
So, much like the way Google is using IMAP to handle client/server handshaking of anti-spam information, it is also looking to piggyback on an existing technology (the "x-notify" headers Murray mentioned) as a standard subscription and unsubscription mechanism.
While I don't have the nitty-gritty on how this would work, what I think is safe to say is that, before it could reliably work, it would require more widespread support from other e-mail systems -- particularly the e-mail service providers and solutions that handle the generation and delivery of legitimate bulk e-mail (e.g., ZDNet's newsletters). That said, if there's a company with the global influence needed to get others on board with a subscription/unsubscription standard, Google is certainly one such company.
Finally, there's the machine intelligence Murray referred to, which sounds incredibly interesting: the part where Google tries to figure out whether, when you pressed the Report Spam button, you really meant unsubscribe. Theoretically, the business process is probably relatively simple. If and when a user classifies some e-mail as spam, inspect the message for an unsubscribe mechanism (a Web link, reply-to address, etc.). If such a mechanism exists, then, instead of automatically treating the e-mail as spam, act on the user's behalf and unsubscribe instead.
Such a feature would be neat if Google could pull it off. But I can also imagine the difficulty in getting it right. For example, in the course of automagically unsubscribing me from a bulk e-mail that I reported as spam, Google may have to decide whether to also remember that and block other e-mail from the same domain or sender. These decisions are probably best left to end users, who must decide for themselves what is and is not spam. But the more work you give end users, the more complex things get, and the greater the likelihood that the feature won't get used.
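Murray didn't name a specification. The header-based unsubscribe mechanism that does exist as a published standard is `List-Unsubscribe`, defined in RFC 2369, which lists one or more URIs (typically a `mailto:` address and/or a Web link) in angle brackets. Assuming that's the kind of header he had in mind, a receiving system could extract the unsubscribe targets with Python's standard `email` module along these lines:

```python
import re
from email import message_from_string


def unsubscribe_targets(raw_message: str) -> list[str]:
    """Return the URIs listed in a message's List-Unsubscribe header (RFC 2369).

    The header holds comma-separated URIs in angle brackets, e.g.
    List-Unsubscribe: <mailto:leave@example.com>, <https://example.com/u?id=1>
    """
    msg = message_from_string(raw_message)
    header = msg.get("List-Unsubscribe")
    if header is None:
        return []  # the sender exposes no header-based unsubscribe mechanism
    # Pull out everything between each pair of angle brackets.
    return re.findall(r"<([^>]+)>", str(header))
```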
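The process just described can be sketched in a few lines. This is a hypothetical illustration, not Google's method: `handle_report_spam` checks the `List-Unsubscribe` header (RFC 2369) and, for single-part messages, any Web link in the body containing the word "unsubscribe." It ignores reply-to addresses and HTML bodies, and preferring a Web link over a `mailto:` address is an arbitrary choice.

```python
import re
from email import message_from_string
from typing import Optional, Tuple


def handle_report_spam(raw_message: str) -> Tuple[str, Optional[str]]:
    """Decide what a Report Spam click should do with one message.

    Returns ("unsubscribe", uri) if an unsubscribe mechanism is found,
    otherwise ("spam", None) so the message is treated as ordinary spam.
    """
    msg = message_from_string(raw_message)
    candidates = []
    header = msg.get("List-Unsubscribe")
    if header is not None:
        candidates += re.findall(r"<([^>]+)>", str(header))
    if not msg.is_multipart():
        # Plain-text body only: look for links that mention "unsubscribe".
        body = msg.get_payload()
        candidates += re.findall(r"https?://\S*unsubscribe\S*", body, flags=re.IGNORECASE)
    if not candidates:
        return ("spam", None)
    web = [u for u in candidates if u.lower().startswith(("http://", "https://"))]
    return ("unsubscribe", (web or candidates)[0])
```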