Marketing and advertising have enormous influence on society at large -- business, technology, media, culture, and data. So, talking with people working at the intersection of those fields can offer some insight into the state of the union of Big Data and Ad Tech.
Advertising is a multi-billion dollar business that has been going through the process of digital transformation for a couple of decades already. Some of today's most advanced, powerful, and influential companies have advertising embedded in their core.
A good part of the innovation that has been driving big data has come about as a response to the needs of advertising at scale before getting a life of its own. MapReduce, for example, the blueprint for Hadoop's first incarnation, was originally developed and deployed at scale at Google.
Mike Driscoll, CEO of Metamarkets, points out that marketing is being digitally transformed and marketers are following suit. Metamarkets is part of the Ad Tech wave, and its core business is to provide marketers with insights on their digital presence.
"The future will be digital," Driscoll says. "CMOs (Chief Marketing Officers) are turning to CMTOs (Chief Marketing Technical Officers). But as marketers are going digital, they are also starting to have less trust in some of the channels they're buying from. Investing in technology means they are now able to hold their partners accountable."
The survey's findings show that almost half of the brands using programmatic media buying believe a lack of transparency is inhibiting its future growth and scale. But what do we talk about when we talk about transparency here?
It's not hard to see where this is going: Google and Facebook dominating the market and dictating their terms. This presents a problem for all parties involved. For media, dependence on advertising translates as dependence on the Big Two. Media are trying to find ways to cope and come up with new models of doing business while maintaining their editorial independence.
Obviously, this presents a problem for advertisers and their clients. "Marketers want more transparency. They would like to get a receipt for what they buy, instead of a PowerPoint and a good story," Driscoll says. "Brands are asking their partners to provide better analytics. Historically, channels have been providing results, but not analytics on the results.
Digital transformation means that we don't just buy goods anymore, we also buy data about the goods. Take AWS, for example: When they started, they just provided the service, but by now, you also get analytics to go with it. Major channels need to invest more not just in internal technology, but also in providing better data access to their partners."
The big guys will just not share
But, seriously, this is Google and Facebook we're talking about. Are we to believe that the most iconic data-driven organizations in the world can't make the right data available to marketers? "Ask any marketer and they'll tell you -- the big guys will just not share. It's not in their interest to be transparent, but rather to be as opaque as possible," Driscoll says.
So, what can be done to deal with this? The real power of marketers is the power to check them, according to Driscoll. "If you look at the leaders emerging in the Fortune 500, the next generation of marketers are technologists, and they are demanding independent audits and data. Consider this:
For a long time, advertisers wanted to know if their ads were viewed or not. Facebook says, 'OK, we'll measure it ourselves.' And they got away with it for a while. They reported their own viewability stats, just like NBC used to report how many people viewed their own shows.
That does not mean Facebook and Co. will give away all of their data -- you also have to consider privacy issues here. But when you talk to brands, even though they will not say that in public, they are actually doing that. When you have budgets in the tens of millions, you can do that -- pull data out and do what every marketer would like to do: Build a unified view over their channels."
"We've been hearing rumours about Congress getting involved, but for most businesses that would be the last resort. It's not the ideal solution for marketers or media companies, especially considering the all-time low approval ratings in the US right now," Driscoll says. For him, the answer is in marketers investing more in analytics.
At one end of the continuum, organizations can do it all themselves, using infrastructure like Hadoop and analytics tools that sit on top and can help them collect and analyze the data they need. At the other end, Metamarkets touts itself as the right solution for marketers.
Metamarkets is a domain-specific solution that builds on four pillars: Fast data exploration, intuitive visualization, collaboration, and intelligence. Driscoll elaborates: "Scale is a requirement, and we are quickly moving towards streaming events and data.
Interactive visualization helps you understand what's going on. You need more than dashboards. Dashboards may update, but the questions they answer stay the same. You need collaboration -- like Slack for data, that helps teams communicate and share methods and insights.
And you need intelligence. In analytics, you spend 80 percent of your time preparing data and 20 percent actually doing analysis. We have ETL connectors for a multitude of platforms that help get the data where you need them. Plus, it's one thing to show data, and another thing to search for insights."
Metamarkets tries to look at what analysts do and automate that to suggest root causes. For example, a campaign running behind targets is something that can be monitored using metrics. But to get to the reason why this is happening, an analyst would slice and dice data per region or demographics.
Metamarkets says they can automate this process and suggest root causes, evolving from tracking statistically significant signals to deriving business-focused insights. "We let analysts specify metrics they are interested in, and then perform root cause analysis for them. We believe in machine and human working side by side, not in replacing analysts, but in giving them superpowers," Driscoll says.
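The slicing-and-dicing Driscoll describes can be sketched in a few lines: for each dimension, compare every segment's average metric against the overall average and flag the outliers as root-cause candidates. This is a toy illustration under assumed data and field names (rows of dicts, a `ctr` metric), not Metamarkets' actual implementation.

```python
from collections import defaultdict

def root_cause_candidates(rows, dimensions, metric, threshold=0.5):
    """Flag segments whose average metric deviates from the overall
    average by more than `threshold` (relative deviation)."""
    overall = sum(r[metric] for r in rows) / len(rows)
    candidates = []
    for dim in dimensions:
        totals = defaultdict(lambda: [0.0, 0])  # value -> [sum, count]
        for r in rows:
            totals[r[dim]][0] += r[metric]
            totals[r[dim]][1] += 1
        for value, (s, n) in totals.items():
            avg = s / n
            if abs(avg - overall) / overall > threshold:
                candidates.append((dim, value, avg, overall))
    return candidates

# Hypothetical campaign data: EU click-through rate has collapsed.
rows = [
    {"region": "US", "device": "mobile",  "ctr": 0.030},
    {"region": "US", "device": "desktop", "ctr": 0.028},
    {"region": "EU", "device": "mobile",  "ctr": 0.005},
    {"region": "EU", "device": "desktop", "ctr": 0.006},
]
print(root_cause_candidates(rows, ["region", "device"], "ctr"))
```

Here the region dimension is flagged (both US and EU deviate strongly from the blended average), while device is not -- pointing the analyst at geography rather than platform.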
Data at advertising scale and the future of pipelines
As Metamarkets has been at the forefront of data at advertising scale, and Driscoll himself has served as its CTO, he shared some insights on the evolution of big data architecture: "We have been pushing the limits of scale, so we encounter problems before others do," he says.
This has resulted in Metamarkets developing and releasing Druid, an open-source distributed column store. "We created Druid because we needed it and it did not exist, so we had to build it. And then we open sourced it, because if we had not, something else would have come along and replaced it."
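To give a flavor of what querying a column store like Druid looks like, here is a minimal Druid-style native "topN" query -- the kind of aggregation an ad-analytics workload would run constantly. The datasource and column names are illustrative, not taken from any real deployment; the query is built as a Python dict and serialized to the JSON that Druid's native query API accepts.

```python
import json

# Illustrative Druid native topN query: top 5 campaigns by summed
# impressions over one day. "ad_events", "campaign_id", and
# "impressions" are hypothetical names.
query = {
    "queryType": "topN",
    "dataSource": "ad_events",
    "dimension": "campaign_id",
    "metric": "impressions",
    "threshold": 5,
    "granularity": "all",
    "aggregations": [
        {"type": "longSum", "name": "impressions", "fieldName": "impressions"}
    ],
    "intervals": ["2017-01-01/2017-01-02"],
}
print(json.dumps(query, indent=2))
```

Because Druid stores each column separately and pre-indexes dimensions, a query like this touches only the `campaign_id` and `impressions` columns, which is what makes sub-second slicing over billions of events feasible.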
"We have the largest deployment in production, and we love being part of the community. Druid is used by the likes of Airbnb and Alibaba. But we have no plans of building a business around it. We don't believe the future is around data infrastructure, which is becoming a commodity, and we don't want to be competing against the Googles of the world there.
Sure, this may be working for companies built around Hadoop, but commercialization of open source needs widespread adoption to succeed. But I can tell you that Cloudera and Hortonworks are looking to add Druid to their stack and to the range of services they offer."
Driscoll does not believe in horizontally expanding Metamarkets, even though its experience in building data pipelines at scale could in theory be applied to other domains beyond advertising. Its own pipeline has been evolving, going from Hadoop to Spark and from Storm to Samza.
"Spark is more mature and it meets our needs at this point, and we feel much the same way about Samza," he says. "But we see streaming as the future of our pipeline. When you work with streaming, there's a sort of CAP theorem equivalent that applies there.
In distributed data stores, you have consistency, availability and partition tolerance, and you can pick two of those that your system supports simultaneously. In streaming data, you have accuracy, velocity, and volume, and your system can only support two of those simultaneously.
This is why we think the model supported by Apache Beam, Google Cloud Dataflow, and Apache Flink will be key going forward. When streaming at scale, there's no such thing as objective truth, so you have to rely on statistical approximation and on using watermarks.
Do we see our current Lambda architecture giving way to a flattened, Kappa architecture? When you work on the bleeding edge of real-time architecture, the ability of organizations like Metamarkets, which are in the business of integrating data from other sources, to do that integration matters.
But when it comes to other companies, not many are yet at the point where they can stream data out. Only the most sophisticated, agile companies out there are able to do this. At this point, only about 50 percent of our clients are there."