Data to analytics to AI: From descriptive to predictive analytics
Artificial Intelligence (AI) seems to be the buzzword du jour for organizations, but this is not an obvious or straightforward transition even for those building advanced products and platforms. Many organizations are still struggling with digital transformation to become data-driven, so how should they approach this new challenge?
Last week saw a number of interesting events in Europe, including Big Data Spain in Madrid and GOTO in Berlin. Both attracted key industry figures and organizations, and both were well attended (roughly 1,000 participants each), well organized, and forward-looking.
According to Gartner's classification, analytics evolves along a chain from descriptive to diagnostic to predictive, culminating in prescriptive. Many organizations are still in the descriptive stage, using more or less traditional BI approaches: get all your data together and use visualization to get quick views of what has happened.
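As a minimal sketch of the descriptive step, consider aggregating a raw event log into a summary of what has happened; the data and field names here are hypothetical:

```python
# A toy descriptive-analytics step: aggregate raw events into a quick
# summary of what has happened. Data is hypothetical.
from collections import Counter

# Raw event log: (region, revenue) per sale.
sales = [("EU", 100), ("US", 250), ("EU", 80), ("US", 300), ("EU", 120)]

revenue_by_region = Counter()
for region, amount in sales:
    revenue_by_region[region] += amount

print(dict(revenue_by_region))  # -> {'EU': 300, 'US': 550}
```

A summary like this is what typically feeds the visualizations in a BI dashboard.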
Diagnostic analytics is about figuring out why an event happened, using techniques such as drill-down, data discovery, data mining, and correlation analysis. Most analytics frameworks have been incorporating such features in their offerings.
But where things get really interesting is in trying to use predictive analytics to project what will happen. Typically this is done by using existing data to train predictive machine learning (ML) models, and this is why, according to Ravichandran, analytics is part of the evolution that leads to AI.
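The core idea of training a predictive model on existing data can be sketched in a few lines; the "model" here is a deliberately simple 1-nearest-neighbour classifier, and the customer data is hypothetical:

```python
# A toy illustration of predictive analytics: "train" on historical
# data, then project outcomes for unseen cases. The model is a simple
# 1-nearest-neighbour classifier; the data is hypothetical.

def fit_1nn(X, y):
    """Memorise the historical data (the 'training' step)."""
    return list(zip(X, y))

def predict_1nn(model, x):
    """Return the label of the closest historical record."""
    def dist(a):
        return sum((ai - xi) ** 2 for ai, xi in zip(a, x))
    nearest, label = min(model, key=lambda pair: dist(pair[0]))
    return label

# Historical records: [monthly_visits, support_tickets] per customer,
# labelled 1 if the customer later churned, 0 otherwise.
X_history = [[20, 0], [3, 5], [15, 1], [1, 7]]
y_history = [0, 1, 0, 1]

model = fit_1nn(X_history, y_history)
print(predict_1nn(model, [18, 1]))  # resembles loyal customers -> 0
print(predict_1nn(model, [2, 6]))   # resembles churners -> 1
```

Production systems would of course use richer models and pipelines, but the shape is the same: learn from what has happened, then project what will happen.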
Whether being able to make predictions using machine learning constitutes AI, and whether having analytics in place is a prerequisite for this, are key questions to ask here.
AI solutions were developed and used long before analytics. Expert systems are one example, in use with varying levels of success for years in domains such as medicine and agriculture, by organizations that were not necessarily analytics-aware.
So for AI experts, the view that analytics is a prerequisite for AI may sound strange at first. But one has to consider the difference in context: in traditional AI, knowledge bases were typically assembled and curated by soliciting expert knowledge, and were treated as the single version of the truth.
For people in the analytics business, by contrast, a recipe that goes from analytics to AI seems like a natural and pragmatic progression, and concerns such as data cleaning, reliability, location, and integration are indeed prerequisites.
Machine Learning challenges
Before tackling the question of whether machine learning constitutes AI, let's take a minute to reflect on what it takes to get ML right. The scarcity of data scientists and the diversity of skills their work requires is an often-discussed topic, and getting everyone involved in an ML project to align around a clearly defined value proposition is not trivial.
As any seasoned engineer knows, building the right thing is fundamental, even more so than building the thing right.
The Machine Learning Canvas (MLC) is a tool introduced by Louis Dorard of papis.io to ensure that teams working on ML projects have a clear shared understanding of the project: what it aims to achieve and how it will go about it. It is modeled after the well-known business model canvas, and covers aspects ranging from mission statement to data sources and features. The MLC is meant to help teams choose the right algorithm, infrastructure, and ML solution prior to implementation, as well as to guide project management.
But even though tools like the MLC can help cross-functional teams align and coordinate, they cannot resolve the wide array of issues around ML projects. Proposals and technology stacks for automating ML have been put forward by industry veterans, but despite their state-of-the-art status such pipelines require a high level of skill across a range of technologies, and they remain hard to approach for most.
As engineers working on highly complex ML projects at Google have noted, ML systems are difficult to debug, revise incrementally, and verify; algorithmic transparency is challenging; components are hard to isolate; automated integration introduces unusual risks; and as a result technical debt accumulates more readily.
At this time there is no known solution to these issues, but pointing them out is a step in the right direction. In the next part of this post we will explore attempts to address some of them, as well as the relation between ML, data, metadata, and the latest advances in deep learning and AI.
The author was granted admission to GOTO, and travel expenses and admission to Big Data Spain, by the organizers of these events. Slide decks for talks given at Big Data Spain by Paco Nathan, Natalino Busa, and Louis Dorard were also provided by the organizers. Views expressed by Ravichandran are personal and do not necessarily reflect those of his employer.