
First signs of Google using Analytics data

Written by Garett Rogers, Inactive

I have been using Analytics since the day it was made free, just like over two hundred thousand other webmasters. By using this service, I acknowledge that Google may use this information to improve their search engine. Their TOS leaves enough room to use the collected data as they see fit, but Matt Cutts denies any plans to do so.

For many of my Web sites I have been running awstats, a free, open source log file analyzer that gives me at least one feature Google Analytics does not -- the ability to see robot data.

If you have dug into Analytics beyond just adding the short snippet of code to your Web pages, you may know that you can use the urchinTracker() JavaScript function to create "virtual pages" that don't actually exist. This is handy when you are tracking multiple "steps" of a process that all share the same URI.

As an example, if your shopping cart requires four steps -- from sign up to payment -- and this entire process is done on the same physical page (e.g. cart.php) using a series of POST requests, you can dynamically output "urchinTracker('step1.html')" through "urchinTracker('step4.html')" rather than simply "urchinTracker()". Most log file analyzers will see four requests to cart.php, whereas Analytics will record hits for step1.html through step4.html.
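To make the idea concrete, here is a minimal sketch of what the cart page might emit for each step. The trackCartStep() helper and the recordedPages array are hypothetical names of my own, and urchinTracker() is replaced with a stand-in that only records its argument so the snippet runs outside a browser; on a real page that function comes from Google's tracking script.

```javascript
// Hypothetical stand-in for urchinTracker(), which Google's tracking
// script normally defines in the browser. Here it only records the
// page name it was asked to report.
const recordedPages = [];
function urchinTracker(page) {
  recordedPages.push(page === undefined ? "(actual URL)" : page);
}

// Hypothetical helper: report one cart step as a "virtual page".
// The four-step range matches the shopping cart example in the text.
function trackCartStep(step) {
  if (!Number.isInteger(step) || step < 1 || step > 4) {
    urchinTracker(); // unknown step: fall back to the real URL
    return;
  }
  urchinTracker("step" + step + ".html");
}

// Simulate a visitor walking through all four steps of cart.php.
for (let step = 1; step <= 4; step++) {
  trackCartStep(step);
}
console.log(recordedPages);
```

In practice cart.php would output only the one call that matches the step currently being displayed, so each form submission to the same URI is reported under a different virtual page name.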

This by itself doesn't prove anything, but my awstats reports show that Googlebot has added a few extra pages to its crawl list, and that makes the picture much clearer. Googlebot is now crawling step1.html, step2.html, step3.html and step4.html even though they do not exist! The only way Google could know about these pages is if they use data gathered from my urchinTracker("step#.html") code!

Since we can tell they are already using some of this data, they have probably started determining the relevance of sites based on statistics like the average amount of time visitors stay on individual pages (Analytics tracks to the second BTW), what percentage of users bounce, what the most popular pages on a Web site are, etc.  These stats could be an interesting addition to the algorithm, and at the same time make search engine optimization a nightmare.

Update:
I requested more information from Google about what might be happening here, and they have responded with some information from their engineering team:

We are not using Google Analytics data to influence the crawl. Instead, Google's crawler employs advanced algorithms to find additional pages to crawl, which can include looking at JavaScript code on a page. It was this technique that led to crawling these pages. The crawl team is looking into adding an exception so that Google doesn't try to crawl this type of tracking url in the future.

Google is saying here that Googlebot does more than just follow links: it also reads JavaScript to find more content to crawl. They are definitely not using Google Analytics data for any purpose other than providing this service to the end user.
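Google has not said how its crawler reads JavaScript, so the following is only an illustration of how that kind of discovery could work, not their actual method. The sketch scans script source for quoted strings that look like page names; the findCandidateUrls() function, the regular expression, and the list of file extensions are all assumptions of mine.

```javascript
// Illustrative only: Google has not published how its crawler inspects
// JavaScript. This hypothetical function collects quoted string literals
// that end in a page-like extension (.htm, .html or .php).
function findCandidateUrls(source) {
  const pattern = /["']([^"'\s]+\.(?:html?|php))["']/g;
  const found = new Set(); // a Set drops duplicate page names
  let match;
  while ((match = pattern.exec(source)) !== null) {
    found.add(match[1]);
  }
  return Array.from(found);
}

// Script text similar to what the cart page in this article outputs.
const pageScript =
  'urchinTracker("step1.html"); urchinTracker("step2.html"); urchinTracker();';
console.log(findCandidateUrls(pageScript));
```

Run against that sample, the function returns step1.html and step2.html. A crawler working this way would end up requesting the virtual pages without ever seeing any Analytics data, which is consistent with the explanation Google gave.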
