Waiting for Attention... or something like it


A week ago, attention seemed to be on a lot of minds at the O'Reilly ETech conference. I had interesting conversations with a number of people, watched the light bulb go off in some, and came away with the feeling that things were on the move. But its newfound visibility is bringing some negative energy out of the woodwork.

Perhaps the idea has peaked too fast. Perhaps the RSS Bubble is tipping the M&A market too quickly. Perhaps the resulting Gold Rush is convincing too many that there's no need for an open foundation of metadata, that we can just proceed to the punchline and let the market sort it out.

The idea behind attention is very simple. I know, because it's my idea. Doc Searls introduced me to Dave Sifry at a party, and Dave and I sat in the corner for two hours and brainstormed how to turn that idea into reality. Later, I came down to Technorati's office and fleshed the idea out, describing what I do (did) with NetNewsWire and how I wanted to do it better. Dave sat there, taking notes, debriefing me in a classic deconstruction of what I did with RSS data, what I found important, and what the inforouter (my name for an aggregator on steroids) could do to improve information transfer.

Soon the outlines of a spec emerged: who was consuming feed data, what they were consuming, and for how long. I insisted that OPML be used as the first bootstrap of subscription data. Sifry, in the throes of establishing a business out of Technorati, seemed to sense the value of attention, but had to fit it in with many other priorities in allocating resources. In my role as a member of the Technorati Advisory board, I evangelized what I saw as attention's profound value proposition as RSS adoption accelerated the need to deal with a second-order magnitude of information overload. I also surfaced the idea on a series of blogs, first at CRN, then at eWEEK, and lately at ZDNet/CNET.
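To make "who, what, and for how long" concrete, here is a minimal Python sketch of a single attention observation, plus a helper that bootstraps a reading priority from the order of feeds in an OPML subscription list. The class name, field names, the 10-second skim threshold, and the helper name are illustrative assumptions, not the attention.xml schema; the only OPML convention relied on is the `xmlUrl` attribute that subscription lists put on each feed's `outline` element.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AttentionRecord:
    """One hypothetical attention observation: who read what, and for how long."""
    user: str                            # who
    feed_url: str                        # the feed the item arrived on
    item_url: str                        # what: the item's permalink
    opened_at: datetime                  # when the item was opened
    dwell: timedelta                     # for how long it held attention
    links_clicked: tuple[str, ...] = ()  # outbound links followed from the item

    def skimmed(self, threshold: timedelta = timedelta(seconds=10)) -> bool:
        # Assumed rule of thumb: a short dwell counts as a skim, not a read.
        return self.dwell < threshold


def ranks_from_opml(opml_text: str) -> dict[str, int]:
    """Bootstrap a priority per feed from OPML order (0 = listed first)."""
    ranks: dict[str, int] = {}
    root = ET.fromstring(opml_text)
    # iter() walks the tree in document order and descends into folder
    # outlines; folders carry no xmlUrl, so they are skipped.
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url and url not in ranks:
            ranks[url] = len(ranks)
    return ranks
```

For a list with one top-level feed and a second feed nested inside a folder, `ranks_from_opml` assigns rank 0 to the first and rank 1 to the nested one, so reading order survives the folder structure.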

Sifry responded by hiring Tantek Celik, then a Microsoft developer with extensive experience in XML standards work, and set him loose with Kevin Marks to flesh out the spec. Late last year, I wrote the December issue of Esther Dyson's Release 1.0, initially focusing on the accelerating adoption of RSS and then, at the direction of Esther and editor Christina Koukkas, entirely on attention. During development of the report, I followed up on conversations with Adam Bosworth, Mark Fletcher, Chris Alden, Dave Winer, Dick Costolo, Brendan Eich, Scott Johnson, Scott Rafer, and others. I used the late, great Gillmor Gang radio show to maintain momentum and garner commitments from many of these players to support attention.xml or, as Bosworth put it in his seminal 2004 speech, "something else less formal and more organic."

Of these commitments, the one I most appreciate is that of Mozilla architect Brendan Eich. Once attention is recorded inside a Firefox/Thunderbird hybrid (which we have discussed, most recently at ETech), attention metadata can be harvested wherever it appears in the wild (the browser) across Bloglines, RoJo, MyYahoo, and potential RSS clouds such as GMail. Hank Barry and I are in the process of establishing a foundation to secure that data and the spec for the users who contribute it.

From the beginning, I've seen this as primarily a political process, not a technological mountain to be climbed. To be sure, handling the volume of data that attention can generate is no simple task, but I'm confident that solutions will be found. In fact, Yossi Vardi told me of one already in existence during our short conversation about attention at the San Diego airport. But I've concentrated my efforts on calling out subtle and not-so-subtle forms of blackmail, catalogued under the rubric Roach Motel. I first used this characterization at a pre-conference session at John Battelle's Web 2.0 conference, pointing to Yahoo!'s MyYahoo service as an example of attention metadata going in and not coming back out.

Jeff Jarvis publicized the refrain in his post about the session, so I kept ringing the bell as a question in several of the main conference sessions, most directly in a dialogue between Marc Andreessen and Yahoo!'s Dan Rosensweig. After the session, I buttonholed Rosensweig and asked him for a conversation for the Release 1.0 report, hoping to further pin him down about supporting an open stream of attention metadata. He agreed to talk, but passed me off to his assistant, who finally offered me Scott Gatz instead. Though Jeremy Zawodny subsequently reported Rosensweig's interest in progress, Gatz has hewed to a vague "we'll look at it when our customers ask for it" ever since. He even privately suggested that work was under way on alternate proposals for a spec. To date, none has surfaced.

Also privately, I received assurances that Microsoft would not support the effort. Given Jim Allchin's lock on company policy regarding ad-hoc (i.e., not invented in Redmond) XML standards, I believed my source. But a chance meeting with Dare Obasanjo at a Microsoft-thrown party at ETech briefly brightened my outlook (pardon the expression). As Dare notes, we had a vigorous discussion about attention that seemed to augur at least a fair hearing for the idea, if not the implementation.

Unfortunately, Dare's comments seemed typical of the kind of thinking I've been hearing so often not just about attention but the entire RSS transformation we are now moving through. Over and over, people are using denial, procrastination, redirection, and other attempts at downplaying or avoiding the fundamental issue--time is the ultimate arbiter of adoption--but then oddly ending up agreeing with the central thesis they have worked so carefully to debunk.

Thus it was that Dare, after separating what he says I define as the "attention problem" from the attention.xml spec and reiterating his opposition to its engineering, says this:

After talking to Steve Gillmor I realize another reason I didn't like the attention.xml spec; it ignores all the hard problems and assumes they've been solved. Figuring out what data or what algorithms are useful for determining what items are relevant to a user is hard. Using said data to suggest new items to the user is hard. Coming up with an XML format for describing an arbitrary set of data that could be collected by an RSS aggregator is easy.


Yes, Dare. I agree. A simple idea done simply and easily. The hard work left to be done by the community. And the problem? Dare goes on to talk about his ideas for solving the hard problems. Bring it on, Dare. Yahoo!--bring on the "better" ideas. Even the RDF crowd--bring it on. As Bosworth says, "It doesn't matter."

What does matter is a pool of attention metadata owned by the users. This open cloud of reputational presence and authority can be mined by each group of constituents. Users can barter their attention in return for access to full content, membership privileges, and incentives for strategic content. Vendors can build on top of that cloud of data with their own special sauce--the newbie crowd of MyYahoo, the pacesetter early adopters of Diller/Ask/Bloglines, the social attention farm of RoJo, and Google's emerging Office service components orchestrated by the core GMail inforouter. And the media, which now includes publishers, analysts, researchers, rating services, advertisers, sponsors, and underwriters, can use the data as a giant inference engine for leveraging the fat middle of the long tail.

With so much going for it, how and where is attention vulnerable? It's vulnerable to being pigeonholed as an automated, artificially intelligent approach to personalization. In my view (remembering that this is my idea, and the problem I wanted to solve), attention metadata is useful in service of a reputational filter: the people and ideas that I, and the people I track, are interested in. This is not about merely reorganizing my feed data based on my patterns of acquisition, but the cumulative weighting of the minds and interests represented by those feeds and items.

Here's what I told Dave Sifry in that first production meeting: I first read those feeds and items of a select few whose daily output I value highly. Some of those feeds (like Scoble's when he's not overwhelmed with what he thinks is his real job) are most useful for what they lead me to. So I might skim the first three posts, then read in full something that catches my eye because it's about something both Robert and I are deeply invested in (usually RSS-related).

Even at this early stage, attention of two strategic types is being recorded. One is the more obvious--what am I interested in. This can be derived from links clicked on in the item, keyword analysis, and so on. But equally and sometimes more valuable is the material not focused on, skimmed, or outright ignored. Here, the time-stamp becomes important, the links not clicked on, the text analysis cross-referenced with other inference data on the source feed and even parent website with its blogroll and other peripheral data.

Why is this so important, at least to me? Because RSS is about time, and the data about lack of interest is intensely valuable to me as an indicator of what can be thrown out or pushed down the priority stack. As RSS takes hold, we are moving rapidly to a multiplicity of valuable content, where throwing out duplicates, redundancies, and repetitive analyses is key to providing enough of a window for absorbing the much greater signal-to-noise of the attention stream.
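The value of lack-of-interest data can be shown with a small sketch: score each feed by the share of its presented items that actually held attention, so ignored and skimmed items count against a feed just as read items count for it. The function names, the input shapes, and the 10-second threshold are assumptions for illustration; a fuller version would also weigh time-stamps, unclicked links, and text analysis of the source feed, as described above.

```python
from collections import defaultdict


def feed_interest(presented: dict[str, str], dwell_seconds: dict[str, float],
                  skim_threshold: float = 10.0) -> dict[str, float]:
    """Score each feed by the fraction of its shown items that were really read.

    presented:     item_url -> feed_url for every item the reader was shown
    dwell_seconds: item_url -> seconds spent; never-opened items are absent
    Ignored or merely skimmed items still count as "shown", so the negative
    signal lowers a feed's score instead of being thrown away.
    """
    shown: dict[str, int] = defaultdict(int)
    read: dict[str, int] = defaultdict(int)
    for item, feed in presented.items():
        shown[feed] += 1
        if dwell_seconds.get(item, 0.0) >= skim_threshold:
            read[feed] += 1
    return {feed: read[feed] / shown[feed] for feed in shown}


def push_down_order(scores: dict[str, float]) -> list[str]:
    """Feeds in descending interest; the least-read sink to the bottom."""
    return sorted(scores, key=lambda feed: -scores[feed])
```

A feed whose items are routinely skipped drifts toward a score of 0 and to the bottom of the stack without the reader ever having to unsubscribe.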

To Dare's point, here's how I would describe an appropriate information triage algorithm:

  1. Separate duplicate links into two bins: those with unique citations, and those with a delicious/furl-like tagging format. Next, correlate duplicates in both groups with the author's attention rank (where the feed resides on the OPML or other list, based on my priority in reading them). The goal here is to separate the attention-ranking dynamic from the presentation of feed data. I don't want to see the same link over and over again, especially if the link is disguised by different text in multiple posts, but I do want to know if six of my top influentials or reputational filters have cited the same post, and to use that metadata to push the item and/or feed up my priority list.
  2. Throw away or lower the priority of duplicate links, reorder the unique citations around their relative ranking in my attention list, and add other posts by the same authors. If my reading characteristics for a certain feed show I typically read all of the author's posts, include them all. If I typically read only one or two, batch the rest together at a lower priority. Apply vanity and topic feeds next, pulling in keyword hits at a weighting that corresponds to how many duplicates there are with the previous scan. Throw away or lower the priority of the types of vanity or blogroll hits that typically populate vanity feeds. (I want to see hits for "Gillmor" but not those of articles I wrote, hits on my brother but not before hits about me, except those about the Long Tail video that cite my brother instead of me, and so on.)
  3. Now assign unique ids to this sort and track my readership patterns (and those of any who subscribe to my attention feed that make that data public to me) not just for what I read but what I don't read. Then apply that weighting data to incoming data on an updating basis. This should cull repeated hits that somehow escape my first layer of filters, and also provide valuable data for marketers should I deliver that back to the cloud or to some private cloud under contract.
  4. Now we get into discovery--items that have escaped both my subs and search feeds. First I go to my reputational thought leaders, the subs and recurring items that bubble to the top of my attention list. It's a second-degree-of-separation effect, where the feeds and items that a Jon Udell and a Doc Searls and a Dave Winer prioritize are gleaned for hits and duplicates, and returned as a weighted stream. In turn, each of those hits can be measured for that author's patterns and added in to provide a descending algorithm of influence. All the while, what is not bubbling up is driven further down the stack, making more room for more valuable content.
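Steps 1, 2, and 4 can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the inforouter itself: each post is reduced to a single cited link, `rank` is the reader's OPML-order priority (0 = read first), "top influentials" means the first `top_n` ranked feeds, and discovery weights each influencer's picks by 1/(1 + my rank of them) times 1/(1 + their own ordering), one simple choice among many for a descending curve of influence. The names `Item`, `triage`, and `second_degree` are hypothetical, and step 3's tracking of unread items is left out.

```python
from collections import defaultdict
from typing import NamedTuple


class Item(NamedTuple):
    item_id: str
    feed: str   # the author's feed
    link: str   # the link the post cites (one per post, for simplicity)


def triage(items: list[Item], rank: dict[str, int],
           top_n: int = 10) -> list[tuple[Item, int]]:
    """Steps 1-2: collapse duplicate links, promote ones my influentials cite.

    Returns one representative item per link, paired with the number of
    distinct top_n-ranked feeds that cited it, ordered by that count and
    then by the rank of the representative's author.
    """
    unranked = len(rank)  # feeds not on my list sort after every ranked feed
    by_link: dict[str, list[Item]] = defaultdict(list)
    for item in items:
        by_link[item.link].append(item)

    result = []
    for group in by_link.values():
        # Keep the copy from my highest-priority author; drop the rest.
        best = min(group, key=lambda it: rank.get(it.feed, unranked))
        influentials = {it.feed for it in group
                        if it.feed in rank and rank[it.feed] < top_n}
        result.append((best, len(influentials)))
    result.sort(key=lambda pair: (-pair[1], rank.get(pair[0].feed, unranked)))
    return result


def second_degree(picks: dict[str, list[str]], rank: dict[str, int],
                  seen: set[str]) -> list[tuple[str, float]]:
    """Step 4: weight items surfaced by the feeds I rank highest.

    picks maps an influencer's feed -> the items that person prioritized,
    most important first. Items I have already seen are skipped.
    """
    scores: dict[str, float] = defaultdict(float)
    for feed, their_items in picks.items():
        if feed not in rank:
            continue  # only people on my own attention list count
        my_weight = 1.0 / (1 + rank[feed])
        for position, item in enumerate(their_items):
            if item not in seen:
                scores[item] += my_weight / (1 + position)
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

With `rank = {"udell": 0, "searls": 1, "winer": 2, "scoble": 3}`, three posts citing the same link from Scoble, Udell, and Searls collapse to Udell's copy with a citation count of 3, and that link is returned ahead of a link cited only by Winer.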

It's important to remember that this is an open pool of data, massaged, sliced, and diced by not just a Technorati but a PubSub or Feedster or Google or Yahoo or any other service, and my inforouter will gladly consume any return feed and weight it according to the success (or lack of it) that the service provides in improving my results. Proprietary services can play here as well, by providing their unique (and contractually personalized) streams as both a value proposition for advertisers and marketers and as an attractor for more users of their service.
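One way an inforouter might weight each return feed "according to the success (or lack of it)" a service provides is an exponential moving average of that feed's hit rate, the share of its suggestions the reader actually read. The sketch below assumes that update rule, a neutral starting weight of 0.5, and a hypothetical class name; service names such as PubSub or Feedster would only be labels here.

```python
class ServiceWeights:
    """Track how useful each service's return feed has been to one reader.

    Each service's weight is an exponential moving average of its hit rate
    (recommended items actually read / items recommended), starting at 0.5.
    """

    def __init__(self, alpha: float = 0.3) -> None:
        self.alpha = alpha                   # how fast new evidence counts
        self.weight: dict[str, float] = {}

    def record(self, service: str, recommended: set[str], read: set[str]) -> None:
        if not recommended:
            return  # nothing suggested, nothing learned
        hit_rate = len(recommended & read) / len(recommended)
        old = self.weight.get(service, 0.5)
        self.weight[service] = (1 - self.alpha) * old + self.alpha * hit_rate

    def merge(self, feeds: dict[str, list[str]]) -> list[str]:
        """Merge return feeds; an item's score is the summed weight of the
        services that suggested it, so agreement between good services wins."""
        scores: dict[str, float] = {}
        for service, items in feeds.items():
            w = self.weight.get(service, 0.5)
            for item in items:
                scores[item] = scores.get(item, 0.0) + w
        return sorted(scores, key=lambda item: -scores[item])
```

The moving average lets a service that starts delivering better results climb back up without erasing its history, which fits the idea of an open pool where any service can compete on the quality of its return feed.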

Inference--in this case deriving characteristics about the larger anonymized cloud from a subset of personalized users--will let a Bloglines or a RoJo make recommendations about open attention feeds based on link and other analysis of the data they own within their own clouds. As Jon Udell has explored, you can already use Bloglines subscriber data (that which is authorized as public by its users) together with Feedburner statistics about Bloglines' market share to infer the growth or loss in the number of new subscriptions across time. Or, if my RoJo friends pay attention to my attention list feeds in a similar priority, what other feeds do they also read that might be important to me? And if they are an early predictor of what won't be paid attention to, I (and you) might even pay them money--or attention--to clear the way for better use of my time.
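The RoJo example, finding friends who prioritize my feeds in a similar order and borrowing the other feeds they read, can be sketched with a simple order-agreement score: the fraction of shared-feed pairs that two reading lists put in the same order (the concordant-pair idea behind Kendall's tau). The function names and the choice of measure are assumptions for illustration; nothing here describes how RoJo or Bloglines actually compute recommendations.

```python
from itertools import combinations


def order_agreement(mine: list[str], theirs: list[str]) -> float:
    """Fraction of shared-feed pairs both lists put in the same order (0..1)."""
    pos_mine = {feed: i for i, feed in enumerate(mine)}
    pos_theirs = {feed: i for i, feed in enumerate(theirs)}
    shared = [feed for feed in mine if feed in pos_theirs]
    pairs = list(combinations(shared, 2))
    if not pairs:
        return 0.0  # too little overlap to judge similarity
    agree = sum((pos_mine[a] < pos_mine[b]) == (pos_theirs[a] < pos_theirs[b])
                for a, b in pairs)
    return agree / len(pairs)


def recommend(mine: list[str], friends: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Suggest feeds friends read that I don't, weighted by how closely each
    friend's priorities match mine on the feeds we share."""
    mine_set = set(mine)
    scores: dict[str, float] = {}
    for their_list in friends.values():
        similarity = order_agreement(mine, their_list)
        for feed in their_list:
            if feed not in mine_set:
                scores[feed] = scores.get(feed, 0.0) + similarity
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

A friend who reads my feeds in exactly my order contributes full weight to the feeds only they read; one who reads them in reverse contributes nothing, which is a crude stand-in for "pay attention to my attention list feeds in a similar priority."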

Recently I got a Skype invitation from Kevin Werbach, to which I replied by pinging him on Skype IM. Here is part of that transcript:


27.03.2005

(13:33:56) Kevin Werbach says:

Are you still following the attention.xml stuff actively?
I'm still trying to understand what it all means.

(13:34:55) Steve Gillmor says:

yes I'm following it. My idea
axml is me and sifry
now being built out by tantek c, kevin marks at t'rati

(13:35:40) Kevin Werbach says:

Right.

(13:35:46) Kevin Werbach says:

I'm still trying to grok what the point is.

(13:36:05) Steve Gillmor says:

do u use rss?

(13:36:22) Kevin Werbach says:

sure
I've been using it since 1999

(13:36:33) Steve Gillmor says:

do u have enough time to get through it?

(13:36:34) Kevin Werbach says:

Time to get through what?

(13:37:05) Steve Gillmor says:

the rss data you receive

(13:37:18) Kevin Werbach says:

Ah.
Yes and no.
I don't have enough time for anything these days. (Two kids under 3 years old).

(13:37:40) Steve Gillmor says:

uh huh

(13:37:53) Kevin Werbach says:

But I find the time investment in reviewing the aggregator stream the most efficient way to keep track of what's happening of interest.

(13:38:16) Steve Gillmor says:

exactly
this is now a second order problem, making the stream more efficient
enter attention

(13:39:22) Kevin Werbach says:

Right.
But I don't necessarily correlate what others like with what I want to read.


(13:39:59) Steve Gillmor says:

no, but once I'm done with finding what I think I want, then I move to the influencers that I value
which corresponds to the order in which I consume their feeds

(13:40:57) Kevin Werbach says:

I guess so. I consume them chronologically.

(13:41:10) Steve Gillmor says:

yes so do I, but then what

(13:41:35) Kevin Werbach says:

Either I have half an hour to skim all the updates in a day, or I don't.

(13:42:00) Steve Gillmor says:

first chronological, then reputational filter, then throw out duplicates

(13:42:14) Kevin Werbach says:

That makes sense.

(13:42:16) Steve Gillmor says:

then infer discovery

(13:42:39) Kevin Werbach says:

It just seems like a tweak on the data set more than anything else.
But I realize I'm an atypical user of all this stuff.

(13:43:07) Steve Gillmor says:

yes, but a tweak that opens up enough time to get through more valuable data
rss is a tweak on browsing

(13:43:19) Kevin Werbach says:

Good point.

(13:43:47) Steve Gillmor says:

access to a group of disruptive posts forms the basis for disruption
it's always the last one in the chain that provokes action


