Predictive analytics and machine learning, working separately or together, can be just what a company needs to succeed. But understanding how they work is key to figuring out how they can help businesses thrive.
So, what is predictive analytics? Datafloq's Mark van Rijmenam uses the car metaphor, according to which traditional, descriptive analytics is like looking at the rear-view mirror to see what has happened, while predictive analytics is using a navigation system to tell you what will happen, and prescriptive analytics is a self-driving car that knows how to take you to your destination.
This metaphor, while easy to comprehend, may also be deceptively simple. It certainly is open to interpretation, so it's a good starting point for discussion. Some might say that a navigation system presumably has access to all the data regarding potential routes. So is suggesting a route based on that data really a prediction? Isn't that something algorithmic, deterministic, thus not really "intelligent"? Or is this a matter of definitions -- semantics?
It depends on how a navigation system is defined and how it works. Typically, navigation systems do not try to predict where you want to go. Instead, they wait for specific instructions and then figure out how to get from point A (either explicitly given as the starting point or calculated using GPS geo-location) to point B.
Let us examine a different example: Boarding Gate Readers (BGRs). BGRs are able to indicate whether a certain person should be granted access to a certain area of an airport at a certain time. For non-technical people, this is as mystifying as a navigation system: how does the system "know" what to do, what the right answer/action is?
For techies, both examples are nothing to write home about: there is a database with all the information (streets and distances, passenger lists), there is an algorithm determining the output for the given input (fastest route from A to B, whether passenger X is in the list for flight Y), there is a medium that connects the system with the outside world (GPS position, bar-code reader). In fact, there is no real prediction involved in either system.
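To make the "database plus algorithm" point concrete, here is a minimal sketch of what a BGR check boils down to. The manifest contents, flight numbers, and the function name `may_board` are all hypothetical, invented for illustration; a real system would query an airline database rather than an in-memory dictionary.

```python
# Hypothetical passenger manifest: flight number -> set of boarding-pass IDs.
# In a real BGR this would be a database lookup, not a dictionary.
MANIFEST = {
    "XY123": {"PASS-001", "PASS-002"},
    "XY456": {"PASS-003"},
}


def may_board(pass_id: str, flight: str) -> bool:
    """Return True only if the scanned pass is on the manifest for the flight."""
    return pass_id in MANIFEST.get(flight, set())


if __name__ == "__main__":
    print(may_board("PASS-001", "XY123"))  # pass is listed for this flight
    print(may_board("PASS-003", "XY123"))  # pass belongs to another flight
```

Nothing here is learned from data or estimated: the same input always produces the same answer, which is the sense in which no prediction is involved.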
Viewed through that lens, these systems may differ in terms of implementation details and complexity of algorithms and data, but they are fundamentally not that far apart. Still, while few people in the tech industry would classify a BGR as a predictive system, presumably some would do so for a navigation system. Is the fact that BGRs respond with a binary (access/no access) answer, while a navigator responds with specific instructions, a differentiating factor?
To answer this, let's look at another example: identifying malware. As described by Kaspersky's Alexey Malanov, this used to be possible using rather straightforward algorithms and rules. At some point, the search space (i.e. the number of potential malware samples to identify) became so big and started expanding so fast that it was very hard to devise rules that would cover it in its entirety and keep them up to date. Hence, enter Machine Learning (ML).
Malanov shows how ML can be used to perform the same task -- identifying malware -- more efficiently. In essence, this works by taking an algorithm that implements heuristic rules based on metrics (in this case, letter sequence frequency) and training it on a curated dataset. The process is different, and there are quite a few gotchas along the way, but the end result is basically the same: the ability to respond to input with a binary answer of malware/not malware.
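The idea of scoring input by letter sequence frequency can be sketched in a few lines. This is a toy illustration, not Malanov's actual method: the two-letter window, the "fraction of familiar pairs" score, the 0.5 threshold, and all function names are assumptions made for the example.

```python
from collections import Counter


def bigrams(text):
    """Split text into overlapping two-letter sequences."""
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]


def train(clean_samples):
    """'Training' here is just counting the bigrams seen in known-clean samples."""
    counts = Counter()
    for sample in clean_samples:
        counts.update(bigrams(sample))
    return counts


def familiarity(model, text):
    """Fraction of the text's bigrams that also appeared in the training data."""
    grams = bigrams(text)
    if not grams:
        return 0.0
    return sum(1 for g in grams if g in model) / len(grams)


def is_suspicious(model, text, threshold=0.5):
    """Binary answer: flag input whose letter sequences look unfamiliar."""
    return familiarity(model, text) < threshold


if __name__ == "__main__":
    model = train([
        "the quick brown fox jumps over the lazy dog",
        "hello world this is a normal sentence",
    ])
    print(is_suspicious(model, "the brown dog"))
    print(is_suspicious(model, "xqzvkjw"))
```

Even in this toy version the gotchas are visible: the answer depends entirely on what went into the curated dataset and on where the threshold is set.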
So, is a navigator all that different? The two examples share some similarities -- they have a big search space and devising algorithms to cover it in its entirety is pretty hard. What Malanov's example shows is how an ML algorithm works as a function that classifies input into binary output. The same principle can be extended to non-binary outputs, such as choosing a route from A to B.
This is actually an optimization problem. The optimal solution for getting from A to B would be to drive in a straight line between the two points. This, however, is not possible, as there are only certain routes that offer unobstructed access from A to B. One way to approach this would be to encode a set of rules that define what is and is not possible when driving and then use the navigator's database in conjunction with the rules to figure out what the best way to get there is.
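The rule-based approach can be sketched with a classic shortest-path search. The snippet below uses Dijkstra's algorithm as a stand-in; real navigators may well use something else. The road network, its distances, and the name `shortest_route` are made up for illustration, and driving rules such as one-way streets are encoded simply by listing a road in one direction only.

```python
import heapq

# Hypothetical road database: each entry maps a junction to the junctions
# reachable from it and the distance in km. A one-way street appears in
# one direction only, which is how a driving rule is encoded here.
ROADS = {
    "A": {"C": 4, "D": 2},
    "C": {"B": 5},
    "D": {"C": 1, "B": 8},
    "B": {},
}


def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_distance, path) or None if unreachable."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, distance in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + distance, neighbour, path + [neighbour]))
    return None


if __name__ == "__main__":
    print(shortest_route(ROADS, "A", "B"))
    print(shortest_route(ROADS, "B", "A"))
```

No historical data about what other drivers did is needed: the map and the rules are enough to compute an answer from the very first request.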
The ML way to approach the same problem would be to get data on the routes people have used to go from A to B and use that to train an algorithm. In this case, there may be many alternatives for the same route, so simply responding with a "yes/no" would not do. But the same principle can be applied to classify inputs into more than two potential bins of outputs -- what is known in ML terminology as multiclass classification. A simplistic classification for potential routes could be something like "Impossible," "Bad," "Good," or "Optimal."
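A simplistic version of such a multiclass classifier is sketched below, using a nearest-centroid rule over hand-labeled examples. Everything in it is an assumption made for illustration: the three features (detour ratio, delay in minutes, number of closed segments), the training rows, and the choice of nearest-centroid itself. The features are deliberately left unscaled to keep the example short; a real model would normalize them.

```python
import math
from collections import defaultdict

# Hypothetical labeled routes: (detour_ratio, delay_minutes, closed_segments).
TRAINING = [
    ((1.0, 0.0, 0.0), "Optimal"),
    ((1.1, 2.0, 0.0), "Optimal"),
    ((1.3, 5.0, 0.0), "Good"),
    ((1.4, 8.0, 0.0), "Good"),
    ((2.0, 20.0, 0.0), "Bad"),
    ((2.5, 30.0, 0.0), "Bad"),
    ((1.0, 0.0, 1.0), "Impossible"),
    ((1.5, 10.0, 2.0), "Impossible"),
]


def train(examples):
    """Compute one centroid (mean feature vector) per label."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def classify(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


if __name__ == "__main__":
    centroids = train(TRAINING)
    print(classify(centroids, (1.3, 6.0, 0.0)))
    print(classify(centroids, (2.2, 24.0, 0.0)))
```

The structure is the same as in the binary case: a function trained on labeled data that maps an input to one of a fixed set of bins, only now there are four bins instead of two.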
Presumably, however, most navigators don't rely on ML -- at least not for their core function. Malanov touches upon some of the reasons why ML is not a panacea: false positives, model bypass, and model updates. While valid, these may not actually be the most serious drawbacks of using ML. There seems to be a widely popular misconception at the moment that ML is something that automagically works out of the box -- you just need to throw data at it. But as Oren Etzioni of AI2 put it, "99% of machine learning is human work."
There is human work involved in finding, devising, selecting, and combining the right algorithms for the task at hand, in finding and appropriately labeling datasets to train the algorithms, in fine-tuning system parameters and so on. But equally importantly, there are cases for which ML is a great tool, others for which it is ill-suited, and others for which it needs to be combined with other techniques.
In the navigator example, one idea would be to use ML to predict where users want to go, based on their starting point and common routes that start there. But it's important to understand that the quality of predictions depends on a number of factors: how big and diverse our dataset is (if only a few people use the system and they only go from home to work and back, these are the kind of "predictions" we will get), what kind of parameters we factor into our prediction model (using time of day is probably a good idea, driver age maybe, car color probably not), and how well our model works for our scenario.
More importantly, however, not even the best combination of datasets, parameters, and models can guarantee predictions in real-world, wicked problems. We have recently seen how predicting election results has been failing since it was first attempted. Another famous example is predicting forex: while central bank decisions are clearly a defining parameter here and presumably factored into every model, they depend on factors of such qualitative and quantitative complexity that they are practically impossible to predict without inside information, as recently showcased in the case of the Swiss franc.
But would using ML really be the best solution to the core problem of finding a route? Why go to the trouble of finding data and training an algorithm to implement behavior in a domain that is well understood and can come down to a set of rules that function independently of whether there are data on what others do, avoiding the cold start issue?
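The destination-prediction idea, and its dependence on the dataset, can be illustrated with a simple frequency model. This is a sketch under stated assumptions: the trip log, the three time-of-day buckets, and the function names are hypothetical, and "most frequent past destination" is about the simplest model one could choose.

```python
from collections import Counter, defaultdict


def hour_bucket(hour):
    """Coarse time-of-day feature (an assumed, arbitrary bucketing)."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    return "evening"


def train(trips):
    """Count destinations per (start, time-of-day) from (start, hour, destination) trips."""
    model = defaultdict(Counter)
    for start, hour, destination in trips:
        model[(start, hour_bucket(hour))][destination] += 1
    return model


def predict(model, start, hour):
    """Return the most frequent past destination, or None when there is no history."""
    counts = model.get((start, hour_bucket(hour)))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    trips = [
        ("home", 8, "work"),
        ("home", 9, "work"),
        ("home", 8, "gym"),
        ("work", 18, "home"),
        ("home", 20, "cinema"),
    ]
    model = train(trips)
    print(predict(model, "home", 8))      # dominated by the commute
    print(predict(model, "airport", 10))  # no history for this start: None
```

With a log that is mostly commutes, the model can only ever suggest the commute, and for a starting point it has never seen it has nothing to offer at all.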
What about BGRs? BGRs are hopelessly deterministic: they respond with a yes/no with 100 percent accuracy -- assuming all parts of the system function properly. And in this case, that's a good thing. One could conceive of an ML-powered BGR that could take into account all sorts of properties of past passengers to "predict" whether each new passenger is entitled to access the boarding area. But is this something we'd actually like to have?
Why do this when the requirements here point towards a simple database and a couple of simple procedural rules that do not require any training or labeling? And what kind of features could someone possibly use to classify passenger eligibility? Age, height, income, ethnicity, facial features? Along similar lines of thinking, ML has been applied to do things like classifying people into certain groups based on their facial features, raising widespread criticism and bringing back memories of 19th-century phrenology.
Examples of biases and failures in algorithms abound, but even when not taken to extremes, ML is not something that just works on its own. ML is a good way to identify patterns and potential correlations in data that can greatly assist humans. For example, it can help identify common spelling errors, patterns in wrangling data, or even correlations in data that can serve as a starting point for tagging, schema management, and ontology creation. But while extremely useful, none of these really constitutes Artificial Intelligence (AI).
ML, albeit often complex in its implementation and impressive in its results, is algorithmic and deterministic. There is nothing magic about it and it certainly does not constitute AI, in the sense of Artificial General Intelligence. At best, it can be considered a building block for Artificial Narrow Intelligence. But these definitions deserve more thorough examination, along with Deep Learning, the use of ML in conjunction with Semantics, the human in the loop and the road to prescriptive analytics.