The eyeballs war

Summary: There's a simple bottom line to all this, and a very unsimple consequence. The bottom line is that organizations whose technical staff plan services around both their own measurement issues and the needs of proxy cache users can do much better on counting and verification than organizations whose staffs treat measurement as something that happens after they've done their jobs.

TOPICS: Servers
Last week the Chicago Tribune reported the arrest of "three former employees of Tribune Co. newspapers over their alleged roles in a circulation scandal that engulfed the papers last year....The government charged the men with criminal fraud for helping overstate the papers' circulation, or sales."

By coincidence, the report came about a week after ZDNet carried the news that Google had achieved a higher market value than Time Warner:


With a current [June 7/05] stock market capitalisation of more than $80 billion, Google is now worth more than any other media company in the world. That includes Time Warner, created five years ago when AOL purchased Time Warner for $106 billion in a much-hyped combination of old and new media.

What these two reports have in common is the eyeball war. As the Tribune's report said: "Newspapers use circulation figures to set rates charged to advertisers" and so does the rest of the industry, including Google.

Believers will tell you that Google is worth more than Time Warner despite having less than 10% of the latter's revenues and no track record of profitability because it either already has, or soon will have, the ability to drive more sales, at a lower cost to the advertiser, than its more traditional competitor. Personally I expect Time Warner to be making money long after Google turns into Altavista, but that's not the point. The point is that even the most basic measure of advertising reach, eyeballs on the page, isn't terribly credible.

If you run a personal or organisational website you're probably familiar with a special form of this problem. Consider, for example, this Apache log entry from my Winface server:

- - [16/Jun/2005:11:08:30 -0400] "GET /any/tips_11.html HTTP/1.1" 200 1458 "" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)"

I generated that page hit myself using a PC belonging to a school, then went to another PC in the same school and did it again, but I only got the one log record. That happens because the school district has a small internal network and uses a proxy server to minimise traffic on its external ISP connection. That server cached the response to the first request and served the second one from its cache, without contacting my site at all.
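To make the undercount concrete, here is a minimal Python sketch of the situation just described. It is not how any real proxy is implemented: the `OriginServer` and `CachingProxy` classes are hypothetical stand-ins invented for illustration, and the cache never expires anything. The only point is that two reads through a shared cache produce one record in the origin's log.

```python
from dataclasses import dataclass, field


@dataclass
class OriginServer:
    """Hypothetical stand-in for the web server: each request it sees is logged."""
    access_log: list = field(default_factory=list)

    def get(self, path: str) -> str:
        self.access_log.append(path)
        return f"<html>content of {path}</html>"


@dataclass
class CachingProxy:
    """Hypothetical stand-in for a shared proxy: contacts the origin only on a miss."""
    origin: OriginServer
    cache: dict = field(default_factory=dict)

    def get(self, path: str) -> str:
        if path not in self.cache:
            # Cache miss: this is the only request the origin ever sees.
            self.cache[path] = self.origin.get(path)
        return self.cache[path]


origin = OriginServer()
proxy = CachingProxy(origin)

# Two different PCs behind the same proxy read the same page.
first_read = proxy.get("/any/tips_11.html")
second_read = proxy.get("/any/tips_11.html")

page_reads = 2
logged_hits = len(origin.access_log)
print(f"page reads: {page_reads}, origin log records: {logged_hits}")
```

Both readers get the same page, but anyone counting eyeballs from the origin's log sees only one of them.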

The HTTP/1.1 protocol includes header directives telling cache holders, like the open source Squid proxy server, how long they can hold a document. Apache's mod_expires module, for example, lets you set cache expiry dates far into the future for infrequently changed objects like background graphics and in the past for text objects. This causes graphics to be cached locally while text will be refreshed for most new readers, meaning that you will get a log record for most reads but traffic volume will still be minimised.
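The policy described above can be sketched in a few lines of Python. This is an illustration of the idea, not mod_expires itself: the function name `cache_headers`, the one-year lifetime for images, and the choice of `max-age=0, must-revalidate` for everything else are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Assumption for illustration: only images count as "infrequently changed".
LONG_LIVED_PREFIXES = ("image/",)
ONE_YEAR = timedelta(days=365)


def cache_headers(content_type: str, now: datetime) -> dict:
    """Return Expires and Cache-Control headers for a response of the given type."""
    if content_type.startswith(LONG_LIVED_PREFIXES):
        # Graphics: let proxies and browsers keep them for a long time.
        expires = now + ONE_YEAR
        cache_control = f"public, max-age={int(ONE_YEAR.total_seconds())}"
    else:
        # Text: already expired, so a cache should check back with the
        # origin on each read, which leaves a record in the server log.
        expires = now - timedelta(days=1)
        cache_control = "max-age=0, must-revalidate"
    return {
        "Expires": format_datetime(expires, usegmt=True),
        "Cache-Control": cache_control,
    }


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for content_type in ("image/gif", "text/html"):
        print(content_type, cache_headers(content_type, now))
```

The trade-off is the one the paragraph describes: the large, rarely changing objects stay in the cache, while the small text object that carries the page read goes back to the server.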

People have come up with other measures, but they all have weaknesses. For example, the obvious idea that you should ignore page reads to concentrate on change in product sales doesn't apply in PR advertising and doesn't let you test one ad against another while both are running. Using cookies or client-side JavaScript sounds good, but not all proxy servers handle cookies predictably and both JavaScript and cookies can be turned off in the client. You can do panel studies, get big ISPs like AOL to meter hits for you, or expire everything as it is served and dynamically tinker with each page to force cache reloads, but those solutions cost dollars. Other solutions cost eyeballs: for example, uncacheable redirects, like DoubleClick's, or forcing readers to log in on your site, can be so annoying that a lot of readers simply go elsewhere.

There's a simple bottom line to all this, and a very unsimple consequence. The bottom line is that organizations whose technical staff plan services around both their own measurement issues and the needs of proxy cache users can do much better on counting and verification than organizations whose staffs treat measurement as something that happens after they've done their jobs.

The unsimple consequences are likely to be unpleasant for all involved. The legal action against the three former Tribune Co. employees, for example, adds weight to the arguments of those in the news business who want to restrict web access to paid subscribers because that turns the web access measurement problem into the traditional one they believe they understand. That may seem fair, but their actions will reduce the value of the web to the rest of us by making some news reporting and interpretation inaccessible again.

More tenuously, continuing improvement in worldwide network performance may gradually reduce the advantages offered by sophisticated content distributors like Akamai to the point that the measurement advantages of directly controlling your own server dominate decision-making -- eventually putting them out of business and raising everyone's infrastructure cost by reducing sharing.

  • IP's not the only way to calculate eyeballs

    At least a couple come to mind.

    Cookies, unique per visitor (subject to people actually allowing cookies.)

    Dynamic pages, unique per visitor, e.g. web bug the web page (Subject to people hitting reload)

    Flash Shared Objects

    I've seen all of these used, so it's not a difficult problem to minimize the error, it just takes thought.
    • it takes thought

      Yes it does, and perhaps commenting without reading the article should too.
      • I agree

        i was going to say the same thing.

        Reading the first paragraph doesn't mean you read the whole thing. damn speed readers.
  • Sales matter more

    I suspect advertisers are well aware of the limitations of HTTP and various stats gathered from web traffic. The interesting part that both TW and Google share is the ability to offer advertisers a track record (long and short, respectively) of advertising correlating to sales. Google figured out how to make money advertising on the web, and it works because people actually buy stuff after clicking on Google's ads. Much as people buy stuff after reading Time.
  • Skew, churn and the need for trusted auditing.

    Other factors complicating things in the great Web Versus Everything Else battle for eyeballs and ad dollars:

    1. Skew. "Skew" refers to the type of audience your media is reaching. One media outlet might get 10,000,000 "eye impressions" per month versus the other guy's 5,000,000, but if his 5,000,000 are better-heeled and contain a larger pool of prospective customers for my client's product/service, then my agency's dollars are spent with the guy with 5,000,000. One real-life example: at least through the 90's, advertisers felt that the Web skewed towards male, under-25 and at the lower end of the disposable income range. Print (specifically newspapers and magazines) was seen as skewing towards more equal gender reach, the 25-55 age group and greater disposable income. Therefore, even though most realtors had some sort of Web advertising presence, if a real estate agency was trying to sell a multi-million-dollar property and wanted fast results, they chose print over other media. I think the reality has changed quite a bit since the late 90's, but there's still a sense among my clients that for big ticket items, traditional print advertising generates better results than anything else. TV and radio and the Web are great for brand awareness campaigns, but when the customer is ready to buy and is actively shopping, you'd better be running a full-page ad in the paper. And yes, folks do increasingly buy online. But not in anywhere near the numbers of the Great Impatient Masses who scan the Sunday paper and run on down to the local retail outlet. There's that proximity effect of instant gratification from buying local that will probably always keep time-delayed Web-based purchasing in second place.

    2. Churn. Another important factor is "churn." I'm not up to speed on how the companies gathering metrics on Web advertising take churn into account. In the print world, churn is the phenomenon of a single copy of a magazine or newspaper being read by several people in a household, and tending to hang around for a period of time to allow this to happen (I'm personally a very good example of this, as I allow print materials to hang around Rancho Yen for weeks or months before getting tossed). Is there such a thing as churn with Web publishing/advertising? I know from personal experience that from time to time people in a household will call others over to the single household computer to look at something, but I've never come across any data on Web advertising churn. Churn is very important in print publishing. For example, your typical big city daily might have a churn factor of 1.3, which means that if they printed 500,000 copies of the Sunday paper, that edition would actually have a readership of 650,000. We ad agencies love churn and "persistence" (used in the sense of lingering) in a medium.

    Basically, the problem for Web publishing / broadcasting / advertising is the establishment of trusted third-party auditing services, like the Audit Bureau of Circulations for print publications, Arbitron for radio and Nielsen for TV. For advertising agencies, there currently seems to be some sort of psychological stumbling block over acceptance of data relating to Web advertising. Not that there's anything that I'm aware of in particular that's wrong with the data being presented; it's just that the groups presenting Web data aren't given as much credibility as the ABCs, Arbitrons and Nielsens, which are, relatively speaking, highly trusted in other media.