Setting Scoble's record on Technorati straight

Although I try to stay away from issues concerning the blogo-journosphere here on Between the Lines (it's just not very IT-related), I've decided to take a whirl at it today because someone else's coverage of the "blojosphere" levels some unfair criticism at a company that I've been studying closely, and also draws some attention to the blogging/journalism rub.

In my daily sweep of my favorite bloggers, I stumbled across Robert Scoble shooting from the hip in his criticism of Technorati. He said that the company was getting "outexecuted by its competition" and then set about proving his point by comparing apples to oranges (first sin: bad methodology), citing the wrong information from that incongruous comparison (second sin: bad observation skills), and finally, not giving Technorati the courtesy of a call to double-check his findings (third sin: no apparent fact checking).

For disclosure's sake, neither I nor ZDNet currently has a business relationship with Technorati. I have been studying the data that Technorati collects and aggregates to see if it might be useful to our business in a variety of capacities (everything from performance measurement to editorial products). In the process, I have not only become intimately familiar with the inner workings of the Technorati engine, but have also been routinely providing feedback on how the site works, what's confusing to us, and what we as a media company could really use, if only Technorati could provide it to us.


So, when I read Scoble's blog, my first reaction was, "OK, I'm not the only one with some suggestions." In that blog, Scoble uses (and links to) the results of a search on Technorati CEO Dave Sifry's own blog to prove a point: that Bloglines is doing a better job of indexing the blogosphere than is Technorati. In his search of Sifry's blog, he compares the 2,644 links that turn up on Bloglines to the 735 he observed for the same search on Technorati and concludes that Bloglines is doing a better job of indexing blogs. In selecting Sifry's blog, it's almost as if Scoble is saying, "If Technorati can't do a better job of indexing its own CEO's blog, then what of the rest of the 13 million+ blogs out there?"

What's wrong with this picture? Well, for starters, look at the picture (above left). Although I may need to have my eyes checked, just above the actual search results, it says 1,191 links from 735 sites. So, that's not 735 links, but rather 735 sites (isn't it nice that Technorati provides you with that sort of information?). "Even so," you're probably saying, "with 2,644 links, Bloglines must be doing better because it has more than double the links."

Look again. As little as a month ago, Technorati probably would have had more links. But one of the problems with those results was that there were a lot of duplicates. I know this because, in the course of studying Technorati to see how it could help ZDNet gauge the traction of its blogs, I noticed the duplication issue. I then helped Technorati to understand that, even though we appreciated the inflated performance totals that came as a result of the duplicates, what we really wanted was the ability to strip the duplicates out so we could get a more honest assessment. You only need to scan the first page of Scoble's sample Bloglines results to see that the results are flush with dupes. Not that you won't find some in Technorati's search results. But, if you look at the bottom of Technorati's search results page, you'll see how Sifry & Gang are making an effort to give you cleaner results through the elimination of duplicates.

Had Scoble placed a phone call to Sifry (the two know each other pretty well), not only would he have avoided the mistake of publishing a false fact (confusing the links with the sites), he may have rethought his criticism of Technorati altogether because, in addition to learning about the deduplication effort, he would have learned of a few other things that might have changed his mind.  So, I placed the call to Sifry for him.

Regarding the appearance of duplicates, there's actually a bit more rocket science involved in cleansing the results than one might imagine. A duplicate, for example, can turn up for reasons other than the same exact link being mistakenly indexed two or more times. The problem, as Sifry explained to me, is that the same data -- the data he indexes (links from a site to a blog) -- often appears in multiple places thanks to the use of different syndication technologies (RSS and Atom) as well as the usage of services like FeedBurner. Separating the wheat from that chaff is easier said than done, and I can't tell you how encouraged I was when I finally saw Technorati de-duping its results (OK, so our totals went down, but at least they're honest).
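
To make the duplicate problem concrete, here is a minimal sketch of how one link can show up three times when it arrives by RSS, Atom and a FeedBurner-style feed. It is not Technorati's code, and I have no idea how their engine really does it. The record layout, the `normalize` and `dedupe_links` helpers, and the example.* URLs are all assumptions made up for illustration.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to a comparable form (hypothetical rule set).

    Drops the scheme, a leading "www.", any query string or fragment,
    and a trailing slash, and lowercases the host.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def dedupe_links(records):
    """Keep one record per (linking post, linked URL) pair.

    The feed a record arrived by ("via") is deliberately ignored, so the
    same link seen through RSS, Atom and a republished feed counts once.
    """
    seen = set()
    unique = []
    for rec in records:
        key = (normalize(rec["post"]), normalize(rec["target"]))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Made-up sample data: the first three records are the same link.
records = [
    {"post": "http://example.org/2005/08/post-1", "target": "http://www.example.com/blog/", "via": "rss"},
    {"post": "http://example.org/2005/08/post-1/", "target": "http://example.com/blog", "via": "atom"},
    {"post": "http://EXAMPLE.org/2005/08/post-1", "target": "http://www.example.com/blog/", "via": "feedburner"},
    {"post": "http://example.net/archive/42", "target": "http://www.example.com/blog/", "via": "rss"},
]

unique = dedupe_links(records)
print(len(records), "raw links,", len(unique), "after de-duplication")
```

Even this toy version has to guess at what counts as "the same" URL, which hints at why the real job is harder than it looks.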

Another thing Scoble would have learned, had he called Sifry, is that Technorati isn't looking to provide its users with the be-all end-all list of all the links on the Web to specific blogs.  As Sifry explained it to me, "We're looking to measure authority and, while you're free to disagree with us, our methodology for determining authority is to count the number of links from a home page to a blog, not the number of links from an entire site."  Why just the home page?  Sifry says "Our belief is that authority is not independent of time.  Suppose someone who has a higher number of links from a higher number of sites than you has stopped blogging? What if they stopped six months ago? By watching home pages, we really get to zero in on who the current authorities are because if people stop blogging, it won't be long until the links to them scroll off all the home pages." 
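
The difference between "links" and "sites," and the home-page-only rule Sifry describes, can be sketched in a few lines. Again, this is only an illustration under my own assumptions, not Technorati's implementation. The `authority` function, the input format (pairs of linking site and linked blog, gathered from home pages only) and the sample names are all hypothetical.

```python
from collections import defaultdict


def authority(home_page_links):
    """Count links and distinct linking sites per blog.

    home_page_links: iterable of (linking_site, linked_blog) pairs,
    assumed to come from each site's current home page only, so links
    that have scrolled off a home page simply are not in the input.
    Returns {blog: (total_links, distinct_sites)}.
    """
    links = defaultdict(int)
    sites = defaultdict(set)
    for site, blog in home_page_links:
        links[blog] += 1
        sites[blog].add(site)
    return {blog: (links[blog], len(sites[blog])) for blog in links}


# Made-up sample: one site links to blog_x twice from its home page.
sample = [
    ("a.example", "blog_x"),
    ("a.example", "blog_x"),
    ("b.example", "blog_x"),
    ("c.example", "blog_y"),
]

result = authority(sample)
for blog, (n_links, n_sites) in sorted(result.items()):
    print(f"{blog}: {n_links} links from {n_sites} sites")
```

The two numbers answer different questions, which is why quoting the sites figure as if it were the links figure makes for a misleading comparison.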

Whether you agree with Sifry or not on his choice of methodology (and he's more than willing to accept that there are those who might not agree), the point is that it's a choice that Technorati has made and it's an explanation for why Technorati's search results are so discriminating. While I don't know what Bloglines' methodology is, Scoble's coverage could have been much more interesting had he uncovered some differences in methodology and started a debate over how best to measure authority. Personally, even though it penalizes us (because ZDNet does get lots of links), I'm in agreement with Sifry. If we want an honest assessment of how we're doing, I'm much more interested in who is actively linking to us today than in who did it six months ago. Added Sifry, "Enter 'Berlind' into Google and tell me how many links you get." Answer: 130,000. "Did you ever try to access the 130,000th link? You can't. The deepest Google and Yahoo go is about 1000. Don't get me wrong. We actually have the data [referring to all the data that's necessary to display every single link and not just links from the home page]." The point is that if he wanted to give Scoble every single link, he could. But he doesn't. His methodology is what determines the cutoff point (as opposed to something more arbitrary), and his choice of methodology was a business decision, not a bug.

Finally, I'd like to touch on the blogging/journalism thing because this is a really good case study. In a prior blog, Scoble defends his own methodology for writing, saying that he never claimed to be a journalist but that "I do occassionally do journalism here." Perhaps we could use a little icon so we know how to recognize when he's doing it. He goes on to talk about how he sometimes gets it wrong and that his cell phone number is on the home page of his blog to make it easy for people to call him with corrections. He concludes by saying "I agree that corrections don't always cut it. I too wish for a more accurate reporting system, but this system is pretty darn good at self correcting. So far I've been watching for factual mistakes where Microsoft is concerned and there hasn't been that many." What about factual mistakes where Microsoft isn't concerned?

Whether or not you have the sort of reach that Scoble's blog has (and his blog has the sort of reach that I can only dream of), if you're journaling the performance of parties other than yourself (which Scoble frequently does), then it doesn't matter what you think you are. Writing first and then waiting for the phone to ring second is the perfect way to do both your readers and your credibility a huge disservice, not to mention causing tangible harm to undeserving parties. That phone does make outbound calls, doesn't it? To be certain, I asked Sifry if he received a fact-checking call. Sifry says he did not. I tried double-checking that with Scoble (using the number on his blog), but he didn't pick up. If Sifry is lying and Scoble called him, then a pox on me for not waiting for a call-back from Scoble. But then again, a taste of one's own medicine never hurts.