Data attraction: hard science, or numbers game?

Behind the numbers.
Written by Donna Bogatin, Contributor
Statistical data holds a powerful allure. Numbers are believed to provide “ascertainable totals” and to represent “abstract mathematical systems” (Merriam-Webster).

In “Data agendas: PR by the numbers” and “Data hype? Internet data shops kingmakers, rainmakers”, however, I question the validity of publicly available Internet data and the public’s reliance on such data:

The Hitwise e-mail blasts are eagerly awaited by the blogosphere, as is data put forth by ComScore, OneStat… ‘Headlining’ data stories are easy, popular, and ‘ring true.’

As blog stories based on data shop headlines, however, may ‘anoint’ the companies headlined, data shops can become Internet ‘kingmakers.’

Also, by writing stories based on data shop headlines, the blogosphere can become part of data shop viral ‘rainmaking.’…

Data agendas and public relations by the numbers are not unique to the corporate world. The not-for-profit community is well versed in the power of using small-scale ‘surveys’ and ‘studies’ to put forth headlining conclusions which further their missions.

This week, the risks of using publicly available Internet data for “headlining stories” are again illustrated via two posts at TechCrunch:

Original Headlining Story: "Dazzle Us Again, Del.icio.us"

'the recent numbers aren’t looking so good. In fact, by some measures they’ve tanked completely'

Follow-up: "More Stats on Del.icio.us, This Time Positive"

'At the end of this process, after reviewing the public data (deeply flawed, but neutral) and Yahoo internal data (presumably accurate, but selectively disclosed), I’ve come to the conclusion that I have no idea what’s up at del.icio.us.'

In the TechCrunch comments section, Chris Lake further discusses the unreliability of publicly available Internet data:

Alexa: useful for trends, not for absolutes. I run one tech blog and one entertainment blog. The latter attracts a far greater amount of daily unique users and page impressions, sometimes 20X the daily amount we’ll see on the tech blog. Alexa never reflects this - the ents blog will show spikes, but the tech blog is *always* ranked higher. Alexa is skewed towards tech. It ain’t accurate.

Comscore: 2m installed users, plus survey-based data, and 2m is a lot of people. Should provide a good cross-section, but as Brian says, the people buying into this aren’t going to be particularly savvy if they are lured by ‘free email virus scanning’. So maybe we can say Comscore is skewed against tech. It too ain’t accurate.

Hitwise: Watches something like 20m web users across various territories. That’s a mega-sample. However, the Hitwise data seems limited to consumer behaviour. As I understand it, data is sucked up from consumer ISPs, so at-work usage isn’t tracked. More useful for B2C companies, but not so good for B2B. Presumably no real view of lunchtime web activity, slackers-at-work, 9-5 search activity, etc. So it isn’t accurate either.
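The skew described in the comment above can be made concrete with a little arithmetic. The sketch below is illustrative only: the visitor counts and panel rates are hypothetical numbers chosen to mirror the "20X" example, not figures from Alexa, Comscore, or Hitwise, and the function name `panel_counts` is made up for this illustration.

```python
# Hypothetical illustration of panel skew: all figures are assumptions,
# not real measurements from any data shop.

def panel_counts(true_visitors, panel_per_10k):
    """Return how many panel members each site would see, given the true
    daily visitors and the number of panel members per 10,000 visitors."""
    return {
        site: true_visitors[site] * panel_per_10k[site] // 10_000
        for site in true_visitors
    }

# True daily traffic: the entertainment blog gets 20X the tech blog's visitors.
true_visitors = {"ents_blog": 20_000, "tech_blog": 1_000}

# Assumed skew: the measurement panel is far more common among tech readers.
panel_per_10k = {"ents_blog": 10, "tech_blog": 500}

observed = panel_counts(true_visitors, panel_per_10k)

true_leader = max(true_visitors, key=true_visitors.get)
panel_leader = max(observed, key=observed.get)

print("True leader: ", true_leader)
print("Panel leader:", panel_leader)
print("Panel counts:", observed)
```

Under these assumed rates, the panel sees more tech blog readers than entertainment blog readers even though the entertainment blog has twenty times the real audience. A large panel does not help if the people in it are not representative of the people being measured.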

