What Overstock.com learns about its customers from decades of data

At nearly 20 years old, Overstock.com has been able to parse through mountains of data to get a 10,000-foot view of customers, as well as a granular understanding of shopping habits.

In the competitive retail industry, personalization strategies have become table stakes. But for a brand to really connect with a customer, it first has to know that customer. Data science and machine learning are making it easier for brands to get useful insight into their customers based on their behavior. Overstock.com, at nearly 20 years old, has a deep well of data to pull from.

Chris Robison, Overstock.com's Director of Data and Audience, talked to ZDNet about the insights the company can gain from its customer data -- what decades-old information about a customer can tell you versus the latest updates to their shopping cart.

Here are some highlights of the conversation:

Two decades' worth of data

Overstock.com markets more than five million products globally, Robison noted.

"The very unique thing about our company that attracted me as a data scientist, data enthusiast here, is just that almost two decades' worth of rich data that we can actually mine," he said. "That puts in us in a very unique space in the e-commerce industry. What that amounts to for me as a data scientist and a marketer is years and years of rich web log data.

"We're able to actually go back and look at some of our early customer profiles and individuals who were maybe young parents at the turn of the century who are now starting to shop for that first set of dorm room furniture for an excited new undergraduate student. To see people through those actual transitions in life has been a really incredible data set and an incredible opportunity as a data scientist."

Old data vs. new data

The long-term data Overstock collects, Robison said, is useful for CRM purposes.

"We can actually look at these patterns of historically who has been shopping, who we attracted, to what does the longer-tailed nature of our core group of customers look like?" he said. "Where are they in terms of demographic data that we might be able to get from the census or a different third party platform? So it becomes sort of a what I would describe as a 10,000-foot view."

However, when it comes to converting window shoppers into buyers, Robison said some of the older data may not be as relevant.

"The way that customers interact with e-commerce websites has fundamentally changed," he said. "Online shopping, especially in the last five to 10 years, say, has become so ingrained into our day-to-day lives that those patterns and signals are really starkly different from maybe the early enthusiasts about online shopping that were coming to our site in '99 or 2000."

The data collected, Robison stressed, is from customers who opted into hosting Overstock's cookies on their browsers, which is aggregated with on-site information.

Marketing goal: Spotting propensity to purchase

When Robison joined Overstock about two years ago, his team's first task was to identify customers' "propensity to purchase," he said.

"The idea there is we'll comb over this web log data... and try and develop feature designs that help us identify when a customer is, say, just 'window shopping,' to use kind of the physical term, versus when they're actually ready to convert. How do we encode features including classic time series, lead lag variables, all the way up to event interactions that show us when someone's ready to kind of pull that trigger and make a purchase."

"Event interactions," Robison explained, could mean any action on the site. "Any time you go and click on maybe a different color swatch for a couch or you're viewing...any time you enter the site. Those end being individual events in our web log data."

Insights from disparate pieces of data

"Our job as data professionals is to take that very disparate data and roll it up into something meaningful in terms of features," Robison said. "Some of the interesting things we found were first off, confirmation of natural assumptions. People tend to browse and window shop more often during the day and may be more likely to convert in the evening.

"There's also a lot of interesting cross-device interaction. I, for one, do a lot of browsing on my cell phone, but if I'm making a large purchase, say, a sofa or a coffee table, I'm more likely to do that at home on my actual computer or a tablet. So different interaction points that show us is someone really gearing up and ready to convert. One of the more interesting ones we found were people tend to use their carts like a wishlist... Then individuals will go and start to remove items from their cart in a monotonic fashion, which signals to us they really are getting ready to convert because they're trying to meet that budget that they've set for themselves."

Moving models to production with Databricks

While Robison's task was in marketing, "There's similar challenges in any of these applications," he said, "and one of the largest challenges for anyone in the industry is that last mile of moving models that have been nicely honed by data professionals into an actual production environment."

Robison's team had been using Apache Spark, so it turned to Databricks to solve that problem. Databricks was formed by some of the creators of Apache Spark, and the company pushes the vast majority of its open source contributions back to the Apache Spark project. "It was a natural fit for the tool we were using and then looking at a managed platform that could help us productionize and accelerate our innovation using that tool," Robison said.

Dealing with different environments

"When I came to Overstock, a lot of the data scientists were running models and doing computations on individual machines or inside of Jupyter Notebooks, things of that nature," Robison said. "Being able to then take small sample data sets that you're prototyping a model on and figure out how to get that model so it's in line with a marketing process, say, pushing particular bids or propensities to a channel, tends to be very different.

"Then you're interacting with production databases, production environments. We need to come up with automated ways that we can ETL the data and get it rolled up into the format that we want. But there also needs to be checks and balances on top of those process."

The need for data validation processes

"If things are running in an automated fashion, in particular as these models move more towards real-time, there's a lot of opportunity for processes to kind of get out of control and run awry," Robison said. "You need monitoring, you need checks and balances. There's always data validation processes.

"If the fundamental behavior of our customers changes, maybe for a reason in the market or because for example this time of year we're heading into our peak season with Black Friday and the winter holidays. That's going to fundamentally change the distribution of any of my features. We need to have monitoring set up for that so one, we notice the problem before it becomes any real issue, but then it becomes a more interesting question of, 'Okay, has the behavior of our customers actually changed or did something break in my automated ETL process?' We need to be able to answer those questions very rapidly."