Aust businesses are losing in the big data era, one 'Like' button at a time

Many businesses give away their customers' clickstream data in exchange for trinkets of code. It's a poor exchange.
Written by Stilgherrian, Contributor

During the first dotcom boom, the Fairfax newspaper empire refused to believe that classified advertising, their biggest revenue source, would move to the internet.

Fairfax failed to see how the internet would make transactions like "posting a job ad" far cheaper and more convenient than printing and distributing a newspaper. They also misunderstood how quickly it would happen after a deceptively slow start.

They weren't alone, and the mistakes weren't limited to media companies. Witness the bleating of Harvey Norman and other retailers as consumers started buying direct from cheaper offshore websites.

Scroll forward to 2012 — the age of big data is dawning and we're seeing a similar sense of denial.

The exact meaning of "big data" is still being argued and the snake oil merchants are starting to circle the way they did for "cloud". But from a commercial perspective, it's simple: the more data, the better. Record everything, analyse it, look for patterns, and convert that into profit.

Target (US), for example, knows when a woman is pregnant just from her shopping list, and knows which offers will persuade her to buy based on her previous shopping patterns.

When it comes to the web, the key consumer dataset is the clickstream: the record of precisely which pages were visited, when, and for how long.

Clickstreams for individual websites are generated directly by the web server, but that's a narrow view of the users' lives. Companies like Fairfax, which are able to correlate clickstreams from multiple websites, can generate a richer understanding.
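To make the idea concrete, here is a minimal sketch of how a single site's clickstream might be rebuilt from its own server log. The log format, visitor IDs, and paths are invented for illustration, and time on page is only estimated from the gap between consecutive hits.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical, simplified server log entries: (visitor_id, timestamp, path).
LOG = [
    ("visitor-a", "2012-06-01T09:00:00", "/jobs"),
    ("visitor-a", "2012-06-01T09:00:40", "/jobs/editor"),
    ("visitor-b", "2012-06-01T09:01:00", "/news"),
    ("visitor-a", "2012-06-01T09:03:10", "/apply"),
]


def build_clickstreams(log):
    """Group hits by visitor and estimate the seconds spent on each page."""
    hits = defaultdict(list)
    for visitor, stamp, path in log:
        hits[visitor].append((datetime.fromisoformat(stamp), path))

    streams = {}
    for visitor, visits in hits.items():
        visits.sort()  # chronological order per visitor
        stream = []
        for i, (when, path) in enumerate(visits):
            # Time on page is the gap until the next hit.
            # The duration of the final page is unknown.
            if i + 1 < len(visits):
                dwell = (visits[i + 1][0] - when).total_seconds()
            else:
                dwell = None
            stream.append((path, when.isoformat(), dwell))
        streams[visitor] = stream
    return streams


streams = build_clickstreams(LOG)
for visitor, stream in streams.items():
    print(visitor, stream)
```

Even this toy version shows which pages were visited, when, and roughly for how long, but only for the one site that owns the log.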

But the most comprehensive understandings of all belong to advertising platforms, like DoubleClick and AdBrite, correlating clickstreams from the multitude of sites into which they insert advertising.

Until now.

Now, Facebook and, to a lesser extent, other social networks have the most comprehensive view of all.

Facebook's "Like" button and its ilk are placed on websites, regardless of whether they carry advertising. The clickstream can be matched to a Facebook user, whether they click the button or not — and thanks to cookies and browser fingerprinting, regardless of whether they're even logged in to Facebook.
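A rough sketch of the mechanism, under loudly stated assumptions: every time a page embedding the button loads, the browser fetches the button from the social network, sending along the network's cookie and the address of the host page. The request fields, cookie values, and example domains below are hypothetical, and this is a simplified model rather than a description of how any real network's systems work. Browser fingerprinting is not modelled.

```python
from collections import defaultdict

# Hypothetical requests received by a social network each time a page that
# embeds its button is loaded. No click is needed: the browser fetches the
# button, sending the network's cookie and the address of the host page.
WIDGET_REQUESTS = [
    {"cookie": "c-123", "referer": "https://news.example/politics", "time": 1},
    {"cookie": "c-123", "referer": "https://shop.example/prams", "time": 2},
    {"cookie": "c-999", "referer": "https://news.example/sport", "time": 3},
    {"cookie": "c-123", "referer": "https://health.example/migraine", "time": 4},
]

# Hypothetical mapping from cookie to account, learned at a previous log-in.
COOKIE_TO_ACCOUNT = {"c-123": "alice"}


def cross_site_clickstreams(requests, accounts):
    """Merge button requests from many host sites into one stream per person."""
    streams = defaultdict(list)
    for req in sorted(requests, key=lambda r: r["time"]):
        # Fall back to the bare cookie when no account is known.
        who = accounts.get(req["cookie"], "unknown:" + req["cookie"])
        streams[who].append(req["referer"])
    return dict(streams)


profiles = cross_site_clickstreams(WIDGET_REQUESTS, COOKIE_TO_ACCOUNT)
for who, pages in profiles.items():
    print(who, pages)
```

In this sketch, three unrelated sites each see only their own visitor, while the party serving the button sees one person moving between all of them.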

A recent study of nearly 1,000 top websites by The Wall Street Journal reported that 75 percent of them now include such code from social networks.

Facebook can then correlate the clickstreams with its map of every user's friends and family (the "social graph"), their interests and activities, and everything that can be inferred from them.

These inferences can be powerful. One 2009 study by the Massachusetts Institute of Technology (MIT) showed that a simple analysis of a man's Facebook contacts can reveal his sexual orientation.

Three months ago, French digital strategist Frédéric Filloux sketched out a future scenario in which an analyst warned a recruiter against hiring a woman. Patterns of language in her tweets suggested that she suffered from undisclosed migraines, and analysis of her Facebook posts put her chance of becoming pregnant in the next 18 months at 75 percent.

Given the advances in data mining and textual analysis, this future seems completely plausible.

But the exchange between social networks and the sites that use their tools is unequal.

A torrent of raw user data is streamed to the social networks via their parasite code.

Sometimes, it's more than clickstreams. When The Wall Street Journal looked at 70-odd popular websites that request a log-in, more than a quarter of the time, those sites passed the user's real name, email address, or other personal details, such as the username, to third-party companies.

In return, host websites get a "Like" button, and maybe some log-in code. Neither would have been expensive to build, and counting the number of "Likes" is what marketing automation expert Will Scully-Powers calls a "vanity metric". There's little direct impact on the bottom line.

They also get to target their advertising, and they get the chance to be featured in a user's Timeline.

Conventionally, we consider this business architecture to be cloud-like, using the services of Facebook et al for specialist tasks. But perhaps we should look at it from the other end.

Social networks get to run ever more efficient and profitable analytic engines, getting their customer companies to deal with the messy business of acquiring users and dealing with the physical world.

In the age of big data, he who holds that data is master.

Yet, some businesses seem to want to be rid of the raw data of user comments. It's too hard, apparently, despite their insight and eyeball value. It's another business blind spot, I reckon.
