Last week the Chicago Tribune reported the arrest of "three former employees of Tribune Co. newspapers over their alleged roles in a circulation scandal that engulfed the papers last year....The government charged the men with criminal fraud for helping overstate the papers' circulation, or sales."
By coincidence, the report came about a week after ZDNet carried the news that Google had achieved a higher market value than Time Warner:
With a current [June 7/05] stock market capitalisation of more than $80 billion, Google is now worth more than any other media company in the world. That includes Time Warner, created five years ago when AOL purchased Time Warner for $106 billion in a much-hyped combination of old and new media.
What these two reports have in common is the eyeball war. As the Tribune's report said: "Newspapers use circulation figures to set rates charged to advertisers" and so does the rest of the industry, including Google.
Believers will tell you that Google is worth more than Time Warner, despite having less than 10% of the latter's revenues and no track record of profitability, because it either already has, or soon will have, the ability to drive more sales, at a lower cost to the advertiser, than its more traditional competitor. Personally I expect Time Warner to be making money long after Google turns into AltaVista, but that's not the point. The point is that even the most basic measure of advertising reach, eyeballs on the page, isn't terribly credible.
If you run a personal or organisational website you're probably familiar with a special form of this problem. Consider, for example, this Apache log entry from my Winface server:
184.108.40.206 - - [16/Jun/2005:11:08:30 -0400] "GET /any/tips_11.html HTTP/1.1" 200 1458 "http://www.winface.com/any/index.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)"
I generated that page hit myself using a PC belonging to a school and then went to another PC in the same school and did it again, but I only got the one log record. That happens because the school district has a little internal network and uses a proxy server to minimise traffic on its external ISP connection. That server cached the response to the first request and served the second one from its cache without contacting my site at all.
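To make the undercount concrete, here is a toy Python sketch of what happens when two readers sit behind one caching proxy. Everything in it is hypothetical (the class names, the `proxy.school.example` address, the in-memory "log"); it is a simplified model of the behaviour described above, not a description of how any real proxy is implemented.

```python
class OriginServer:
    """Stands in for the web server; records one log entry per request it sees."""

    def __init__(self):
        self.access_log = []

    def get(self, path, client):
        self.access_log.append((client, path))
        return f"<html>content of {path}</html>"


class CachingProxy:
    """Stands in for the school district's proxy (hypothetical, no expiry logic)."""

    def __init__(self, origin, address):
        self.origin = origin
        self.address = address
        self.cache = {}

    def get(self, path):
        # Only a cache miss reaches the origin server and produces a log record.
        if path not in self.cache:
            self.cache[path] = self.origin.get(path, client=self.address)
        return self.cache[path]


origin = OriginServer()
proxy = CachingProxy(origin, address="proxy.school.example")

# Two different PCs in the same school request the same page through the proxy.
first_read = proxy.get("/any/tips_11.html")
second_read = proxy.get("/any/tips_11.html")

actual_reads = 2
logged_hits = len(origin.access_log)
print(f"reads: {actual_reads}, log records: {logged_hits}")
```

Both readers get the page, but the origin's log holds a single record, and that record carries the proxy's address rather than either PC's.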
The HTTP 1.1 protocol includes header directives that tell caches, such as the open source Squid proxy server, how long they may hold a document. Apache's mod_expires module, for example, lets you set cache expiry dates far into the future for infrequently changed objects like background graphics, and in the past for text objects. This causes graphics to be cached locally while text is refreshed for most new readers, meaning that you will get a log record for most reads while traffic volume is still kept down.
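The policy can be sketched in a few lines of Python. This is only an illustration of the idea, not mod_expires itself: the function name `cache_headers`, the one-year lifetime, and the rule that anything under `image/` counts as "infrequently changed" are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Assumptions for illustration: images change rarely, everything else is "text".
LONG_LIVED_PREFIXES = ("image/",)
LONG_LIFETIME = timedelta(days=365)


def cache_headers(content_type, now=None):
    """Return Expires and Cache-Control headers for a response of this type."""
    now = now or datetime.now(timezone.utc)
    if content_type.startswith(LONG_LIVED_PREFIXES):
        # Far-future expiry: proxies and browsers may keep serving their copy.
        expires = now + LONG_LIFETIME
        cache_control = f"max-age={int(LONG_LIFETIME.total_seconds())}"
    else:
        # Expiry in the past: caches should check back with the origin server,
        # so the read shows up in the origin's log.
        expires = now - timedelta(seconds=1)
        cache_control = "max-age=0, must-revalidate"
    return {
        "Expires": format_datetime(expires, usegmt=True),
        "Cache-Control": cache_control,
    }


if __name__ == "__main__":
    for content_type in ("image/png", "text/html"):
        print(content_type, cache_headers(content_type))
```

A well-behaved cache that honours these headers would keep the graphics but come back to the server for the page text, which is the split between traffic savings and countable reads described above.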
There's a simple bottom line to all this, and some very un-simple consequences. The bottom line is that organizations whose technical staff plan services around both their own measurement issues and the needs of proxy cache users can do much better on counting and verification than organizations whose staff treat measurement as something that happens after they've done their jobs.
The un-simple consequences are likely to be unpleasant for all involved. The legal action against the three former Tribune Co. employees, for example, adds weight to the arguments of those in the news business who want to restrict web access to paid subscribers, because that turns the web access measurement problem into the traditional one they believe they understand. That may seem fair, but their actions will reduce the value of the web to the rest of us by making some news reporting and interpretation inaccessible again.
More tenuously, continuing improvement in worldwide network performance may gradually reduce the advantages offered by sophisticated content distributors like Akamai to the point that the measurement advantages of directly controlling your own server dominate decision-making, eventually putting those distributors out of business and raising everyone's infrastructure costs by reducing sharing.