Artificial intelligence in your shopping basket: Machine learning for online retailers

AI techniques are becoming part of everyday computing: here's how they're being used to help online retailers keep up with the competition.
Written by Simon Bisson, Contributor

Ecommerce is a complex, convoluted thing. What started as a way of putting catalogues online has now become something much more involved. In the past we built ecommerce engines out of databases, with a little shopping cart magic wrapped around them. We generated static content for Google to search, and redirected users to our dynamic sites as soon as they clicked on a link. Manual curation was the watchword, much like the paper catalogues the web had replaced.

That's all changed, thanks to the same machine learning and cloud-scale processes that have grown out of the world of search. I recently spent some time chatting to BloomReach's CEO, Raj De Datta, about how these technologies are changing ecommerce site development.

BloomReach is a relatively new company, founded five years ago, focusing on building tools to algorithmically power ecommerce websites, making things personal - what De Datta calls a "personalised discovery platform". With a team of ex-Google data scientists, BloomReach's aim was to understand what attracted people to a site, and how they then found what they were looking for. At the heart of the problem was an old issue: marketing needs relevant content to work effectively.

That led to BloomReach's first app, a tool for driving traffic to sites via organic search. Under the hood of a service that uses machine learning to manage on-site navigation is a "web relevance" engine that uses user data, collected at scale, to understand demand. This isn't data about you, per se; it's the aggregate high-level data about all the users like you. If you liked blue sheets, that data suggests you're also likely to like a certain type of scented candle, an approach very similar to that used by machine-learning giant Amazon. And if you don't like it, and don't buy it, that information becomes an input to the next iteration of machine learning rules. The result is a set of highly optimised web pages built on the fly and delivered to users as they navigate around a site.
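To make the "blue sheets and scented candles" idea concrete, here is a minimal Python sketch of aggregate co-purchase counting. It is an illustration only, not BloomReach's method: the function names (`build_co_counts`, `recommend`) and the sample baskets are invented for this example, and a production system would use far richer signals, including the "didn't buy" feedback described above.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_counts(baskets):
    """Count how often each pair of items appears in the same basket.

    Only aggregate counts are kept; no individual shopper is tracked.
    """
    co_counts = defaultdict(Counter)
    for basket in baskets:
        # set() so a duplicated item in one basket is counted only once
        for item, other in permutations(set(basket), 2):
            co_counts[item][other] += 1
    return co_counts


def recommend(co_counts, item, k=3):
    """Return up to k items most often bought alongside `item`."""
    neighbours = co_counts.get(item, Counter())
    return [other for other, _ in neighbours.most_common(k)]


# Hypothetical sample data, invented for illustration.
baskets = [
    ["blue sheets", "scented candle"],
    ["blue sheets", "scented candle", "pillow"],
    ["blue sheets", "scented candle"],
    ["towel", "pillow"],
]

co_counts = build_co_counts(baskets)
top_pick = recommend(co_counts, "blue sheets", k=1)
```

With this sample data, the top suggestion for someone looking at blue sheets is the scented candle, because that pairing occurs in three baskets and no other pairing occurs as often.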

You've probably used BloomReach's software without knowing about it, as it's already being used by large US and European consumer brands as they struggle to compete with Amazon's "everything store". Their web relevance engine is solving a tough problem: personalisation has to be, well, personal, but it can't be creepy. It needs to show you what you're looking for right now, building on the familiar DNA of search engines like Bing and Google.

There's a lot of infrastructure needed to build this type of service. At the back end BloomReach ingests hundreds of terabytes of information, analysing tens of millions of web pages to build a list of billions of synonym pairs. Currently the machine learning system is working with one billion consumer interactions on 150 million web pages, which means processing 5TB of data every night. Some of that information comes from JavaScript-powered tracking pixels, but much more comes from ecommerce systems themselves, along with other rich data from sources like site logs. All that data is combined and processed.

BloomReach is able to aggregate data from many sources, with user data kept in silos for privacy reasons - an approach which also means keeping the computational, data-processing and machine learning infrastructure separate from the serving infrastructure. The result is a microservices model that can deliver millions of pages from the cloud, while still learning from user interactions and new content. De Datta points out that without new information, search boxes degrade over time, and the more inputs you have, the smarter the system gets.

When a user hits a BloomReach-powered site, a real-time API call sends back objects that the browser then renders. Users won't be able to tell the difference between natural site content and generated content from the BloomReach cloud service. Currently the service handles three different aspects of an ecommerce site: the landing page; the search experience and the recommendations that a site makes; and the curated aspects of the site, including promotions.

The latest component of the service is called Compass, designed to help site owners make data-informed decisions about their sites. If you've got a hypothesis, you can generate the appropriate content with Compass - even delivering test cases for A/B testing, and handling the statistical analysis of responses, comparing them to the rest of the users on the system as a whole. That means you get access to a huge control group, one that's larger than the users on your site.
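The article doesn't say which statistics Compass uses, but a common way to analyse an A/B test is a two-proportion z-test on conversion rates. The Python sketch below shows that general technique under stated assumptions: the function name `two_proportion_z_test` and the visitor and conversion figures are hypothetical, chosen only to illustrate why a larger control group makes a difference easier to detect.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (A) and variant (B).

    Returns (z, p_value), where p_value is two-sided. A positive z
    means the variant converted at a higher rate than the control.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B perform the same
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("conversion rate of 0% or 100% in both groups")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical figures: 10,000 visitors per arm,
# 200 conversions on the control page and 260 on the variant.
z, p_value = two_proportion_z_test(200, 10_000, 260, 10_000)
variant_wins = z > 0 and p_value < 0.05
```

Because the standard error shrinks as the sample sizes grow, the same difference in conversion rate produces a larger z-score when the control group is bigger, which is the practical benefit of comparing against users across the whole system.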

Understanding the data you have is key, and there's a lot of rich behavioural data you can get from a service like BloomReach. Then you have to know what actions you can take on that data, to help reveal user intent. Tools like this help expose the semiotic relationships in a product feed, and help you understand user context in terms of the device they use and the store they're accessing (along with much more contextual information) - and whether a user buys or browses. De Datta suggests that this is about removing a mismatch in expectations between value-conscious consumers and brand-focused marketeers.

He notes that tools like Google have shifted consumer expectations, with smart search tools filling in the gaps. It's a shift from hierarchical to responsive, to unstructured, to machine learning. That means that sites need to use data to become relevant.

Delivering machine learning as a service like this makes a lot of sense. One thing we've learnt with the recent breakthroughs in machine learning and AI is that systems like this need scale: they need massive amounts of data, and they need massive amounts of compute. Cloud-scale services like these are hard for retail businesses to build - and why should they need to?

Ecommerce companies are organisations that focus on selling products, on managing logistics, and on keeping track of increasingly complex international supply chains. If it's possible to buy a service from someone like BloomReach it saves them from having to invest in data science, and from having to invest in the at-scale infrastructure such services need.
