This year, we've had the chance to chat in depth with people high up the data scientist totem pole about how to operationalize AI projects. We've found that data science is the fundamental building block. We've also found that, while data and insight are the fuel, collaboration and agility are the lubricants that grease the skids. The spotlight might be on the skills, the access to powerful GPUs, and the frameworks for developing the algorithms, but ultimately the premium is on the people and process sides of AI projects.
Giovanni Romero, a partner who heads analytics and consulting at Mindshare, a full-service media network that is part of WPP, has had a front-row seat to how digital exhaust has expanded the analytics footprint. Mindshare's client base, mostly consumer brands, has traditionally been disintermediated from the consumer, with feedback loops typically coming from focus groups or surveys that only sampled the population. Digital changed all that, starting with clickstreams, which in turn brought targeted digital ads that now account for about half of all advertising. With it came not only direct interaction with consumers, but also lots of data.
In the ad tech and advertising spaces, we've seen this movie quite a bit: data warehouses got supplemented, or in some cases replaced, by Hadoop clusters, which then gave way to data lakes with unlimited compute and storage in the cloud. And following the hype, nobody wanted to be called a statistician or data miner anymore, as it became the age of the data scientist.
And as we've noted, data scientists now want machine learning or AI on their business cards. But as we found in our research, the moment you start replacing those static data science models with machine learning models, the project lifecycle gets a lot more complicated. Unlike conventional data science projects, machine learning or AI projects add steps like getting and labeling training data, tuning hyperparameters, and constantly monitoring not only the physical performance, but also the outcomes of models to ensure alignment.
And, before we forget, one other thing: machine learning and AI projects need a lot more data. And that means that you'll probably need more assistance from data engineers and developers, not only to get models deployed, but also to ingest, transform, and secure data. Data scientists get plenty bogged down in wrestling with data; if they get enough help from data engineers, hopefully that burden will shrink to only half their work. And then, try to translate that model that you developed on your laptop to physically scale in a cluster; let's hope it doesn't degenerate to a game of telephone.
And once your data science teams have expended all that effort in building, deploying, and maintaining ML models, the last thing you want them to do is reinvent the wheel. Mindshare is a global organization, and Romero saw teams at its many offices getting siloed, not learning from each other's insights or mistakes, and at worst, duplicating work. Ironically, the unlimited resources of the cloud might be exacerbating the problem: all of the teams now have access to all of the compute and all of the data, but they can't readily share notebooks.
It's not surprising that, going to a big data expo, much of the growth in startups has been with tools that can help data science teams collaborate and get better grips on the modeling lifecycle. Mindshare chose Domino Data Lab as the collaboration hub for sharing notebooks. Given the distributed nature of the teams, one of the biggest hurdles was getting the tool sync'ed with Active Directory to ensure robust authentication.
The other hurdle is cultural. By nature, data scientists are likely to be iconoclasts, as you need people capable of independent thought. The fact that there are multiple languages, modeling frameworks, and approaches is not simply a reflection that there is no one-size-fits-all solution, but also reflects the fact that data scientists have very strong loyalty to their tools and languages. And so weaning data scientists and developers off their laptops is an ongoing battle that Romero admits is still not over yet.
But a year into adopting Domino as their collaboration tool, Romero can point to five applications in production and a user base that has grown to 2000, including data scientists, data engineers, developers, as well as generalist planners and analysts. For instance, one of those apps analyzes and predicts the effectiveness of different visual images in ads. He reports that there are another 10 applications now in the pipeline.
The biggest lesson, according to Romero, is that people are as important as tools. That means paying attention both to changing the culture to spur more collaboration and to building skills: not only training practitioners on languages, new frameworks, or techniques for exploiting the newest GPU hardware, but also teaching them how to communicate effectively. And for practitioners, it means helping them understand how models work so they can not only collaborate with data scientists, but also have realistic expectations.