Researchers find web tracking up, privacy down

It's not your imagination. A new report from researchers at UC Berkeley says web trackers have stepped up their efforts to follow you, and they show no signs of slowing down.
Written by Ed Bott, Senior Contributing Editor

No, it's not just your imagination. You really are being tracked more online, and there's evidence that the trend will continue.

That's the conclusion of a new study published last month by the Berkeley Center for Law and Technology at the University of California.

The researchers behind the Web Privacy Census built an automated web browser with some extra diagnostic smarts. Then they used it to crawl the top 25,000 sites on the web, as measured by Quantcast. They also did deeper crawls of the top 1000 and top 100 sites.

[The] deep crawl ... consisted of visiting the home page of the domain obtained from Quantcast and then traversing up to 6 random links from that page, intended to simulate some level of activity at the website.

The shallow and deep crawls collect the same type of information at each webpage: http cookies, flash cookies, calls to HTML5 local storage, calls to flash that may be used for browser fingerprinting, as well as metadata about the webpage and crawl.
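For readers curious how such a crawl picks its pages, here is a minimal Python sketch of the link-sampling step the researchers describe. It is not their code: the function names, the same-host filter, and the optional seed are assumptions made for illustration, and it works on HTML that has already been fetched rather than driving a real browser.

```python
import random
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def pick_crawl_links(home_url, home_html, max_links=6, seed=None):
    """Return up to max_links randomly chosen links from a home page.

    Assumption (hypothetical): only http(s) links on the same host as the
    home page are kept, so the sample simulates browsing within one site.
    """
    parser = LinkCollector()
    parser.feed(home_html)
    host = urlparse(home_url).netloc
    candidates = []
    for href in parser.hrefs:
        # Resolve relative links and drop #fragments before comparing.
        absolute, _ = urldefrag(urljoin(home_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            if absolute not in candidates:  # de-duplicate, keep page order
                candidates.append(absolute)
    rng = random.Random(seed)  # a fixed seed makes a crawl repeatable
    return rng.sample(candidates, min(max_links, len(candidates)))
```

Given a home page with dozens of internal links, pick_crawl_links returns six of them; a page with fewer than six internal links simply yields all of them, which matches the "up to 6" wording in the report.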

The goal, according to the researchers, is to "formalize the benchmarking process and measure internet tracking consistently over time."

For this pass, the results were alarming.

  • Every one of the top 100 sites uses third-party cookies capable of performing online tracking, with a minimum of 1 and a maximum of 234 third-party cookies per page.
  • Among the top 1000 sites, at least one set 359 third-party cookies per visit. (One shudders to think about the performance of that page.)
  • On average, the top 1000 websites (which represent the vast majority of traffic on the web) set more than 50 third-party cookies each.

And those numbers might be conservative, in terms of their impact on your privacy. The links followed on each of the top 1000 sites (up to six per site) were chosen at random and might not represent the most popular links on a site. In addition, the researchers say:

[T]he crawler did not access content behind sites that require logins, consequently any content and trackers that existed behind a log in were not recorded. Related to this, the crawler did not login and maintain an identity while traversing sites. For example, the crawler did not log into a Facebook account and then attempt to visit websites in this iteration.

Make no mistake about it. These third-party cookies are used for tracking:

Most cookies—84% of them—were placed by a third party host. We detected over 446 third party hosts among the third party cookies. Google had cookies on 16 of the top sites; the company’s ad tracking network, doubleclick.net, had cookies on 73. Combined, Google has a presence on 78 of the top websites. Only 22 lacked some type of Google cookie.

The most frequently appearing cookie keys were: utmb, utma, utmc, utmz, uid. Many of these keys are commonly associated with unique user tracking and Google Analytics. For instance, __utma is used by Google for identifying unique visitors.
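Deciding which cookies count as "third party" comes down to comparing the domain that set a cookie with the site being visited. The sketch below shows one simple way to do that and to count how many sites each tracker appears on. It is a hypothetical illustration, not the census methodology; in particular, it uses a naive last-two-labels rule for domains, whereas real measurements rely on the Public Suffix List.

```python
from collections import defaultdict


def base_domain(host):
    """Naive registrable domain: the last two labels of a host name.

    Simplification (assumption): real tools use the Public Suffix List,
    which handles suffixes such as .co.uk that this shortcut gets wrong.
    """
    labels = host.lstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def tally_cookies(observations):
    """Tally third-party cookies.

    observations: iterable of (site_host, cookie_domain) pairs, one per
    cookie seen while crawling. Returns the share of cookies that are
    third-party, plus a dict mapping each third-party base domain to the
    set of sites on which it set a cookie.
    """
    total = 0
    third_party = 0
    sites_per_host = defaultdict(set)
    for site, cookie_domain in observations:
        total += 1
        tracker = base_domain(cookie_domain)
        if tracker != base_domain(site):
            third_party += 1
            sites_per_host[tracker].add(base_domain(site))
    share = third_party / total if total else 0.0
    return share, dict(sites_per_host)


# Hypothetical sample data, not figures from the report.
sample = [
    ("www.news.example", ".news.example"),     # first-party cookie
    ("www.news.example", ".doubleclick.net"),  # third-party cookie
    ("www.shop.example", ".doubleclick.net"),  # same tracker, second site
    ("www.shop.example", "tracker.example"),   # another third party
]
share, hosts = tally_cookies(sample)
print(share)                               # 0.75
print(sorted(hosts["doubleclick.net"]))    # ['news.example', 'shop.example']
```

Counting sites per tracker, rather than raw cookies, is what produces statements like "doubleclick.net had cookies on 73" of the top sites.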

Based on previous tallies, those numbers are up dramatically just in the past two years, as this graph shows:

[Chart: Total Cookies]

There's also some evidence to suggest that the advertisers and analytics companies behind this major increase in web tracking are shifting their focus away from cookies, which can be easily detected and blocked, and are using HTML5 local storage instead.

The leading websites have moved aggressively to adopt the new technology: according to the researchers, 311 of the top 1000 sites were using HTML5 local storage. That's more than three times the rate among the top 25,000 sites, where fewer than 10% use HTML5 local storage. This isn't necessarily a privacy risk in itself, but it has tremendous potential for the data-collection industry. At least one tracker is using HTML5 local storage to hold unique identifiers from third-party cookies, the researchers reported.
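To make that last finding concrete, one hypothetical way to spot an identifier copied from a cookie into HTML5 local storage is to look for cookie values that reappear inside stored values. The function below is a sketch under that assumption; the minimum-length cutoff and the sample data are invented for illustration and are not taken from the report.

```python
def shared_identifiers(cookies, local_storage, min_length=8):
    """Find local-storage entries that contain a cookie's value.

    cookies and local_storage are dicts of name -> string value, as a
    crawler might dump them for one page. Returns (storage_key,
    cookie_name) pairs. Assumption: values shorter than min_length
    (e.g. "1" or "true") are too generic to count as identifiers.
    """
    matches = []
    for storage_key, storage_value in local_storage.items():
        for cookie_name, cookie_value in cookies.items():
            if len(cookie_value) >= min_length and cookie_value in storage_value:
                matches.append((storage_key, cookie_name))
    return matches


# Hypothetical sample: a tracker backs up its cookie ID in local storage.
cookies = {"uid": "a1b2c3d4e5f6", "consent": "1"}
local_storage = {
    "tracker_backup": '{"id": "a1b2c3d4e5f6"}',
    "theme": "dark",
    "flag": "1",
}
print(shared_identifiers(cookies, local_storage))  # [('tracker_backup', 'uid')]
```

The point of such a copy is resilience: if a user clears cookies but not local storage, the stored ID can be used to restore the same identifier.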

This report is the first in a promised quarterly census of the web and privacy. Sadly, the smart money is betting that the number of trackers will rise significantly over time.
