From big data to AI: Where are we now, and what is the road forward?

It took AI just a couple of years to go from undercurrent to mainstream. But despite rapid progress on many fronts, AI is still something few understand and fewer still can master. Here are some pointers on how to make it work for you, wherever you are in your AI journey.

In 2016, the AI hype was just beginning, and many people were still cautious about using the term "AI". After all, many of us had been conditioned for years to avoid it, as a term that had spread confusion, over-promised, and under-delivered. As it turned out, the path from big data and analytics to AI is a natural one.

That is not just because it helps people relate and adjust their mental models, or because big data and analytics enjoyed the kind of hype AI has now before AI overshadowed them. It is mostly because it takes data, big or not-so-big, to build AI.


It also takes some other key ingredients. So, let's revisit Big Data Spain (BDS), one of the biggest and most forward-thinking events in Europe, which marked the transition from big data to AI a couple of years back, and try to answer some questions about AI based on what we heard from its stellar lineup and lively crowd last week.

Can you fake it till you make it?

Short answer: No, not really. One of the points of Gartner's analytics maturity model is that if you want to build AI capabilities (the predictive and prescriptive end of the spectrum), you have to do it on a solid big data foundation (the descriptive and diagnostic end of the spectrum).

Part of that is the ability to store and process massive amounts of data, but that really is just the tip of the iceberg. Technical solutions for this abound these days, but as fellow ZDNet contributor Tony Baer put it, to build AI you should not forget about people and processes.


More concretely: Don't forget about data literacy and data governance in your organization. It has been pointed out time and again, but these really are table stakes. So, if you think you can develop AI solutions in your organization by somehow leapfrogging the evolutionary chain of analytics, better think again.

Gartner's analytics maturity model may be a good starting point to explain and prepare the transition to AI. (Image: Gartner)

As Oscar Mendez, Stratio CEO, emphasized in his keynote, going beyond flashy AI with often poor underpinnings requires a holistic approach. Getting your data infrastructure and governance right, and finding and training the right machine learning (ML) models on top of it, can yield impressive results. But there is a limit to how far these can take you, as the everyday failures of the likes of Alexa, Cortana, and Siri amply demonstrate.

The key message here is that bringing context and reasoning capabilities into play is needed to more closely emulate human intelligence. Mendez is not alone in this: AI researchers such as Yoshua Bengio, one of Deep Learning's top minds, share the view. Deep Learning (DL) excels at pattern matching, and the explosion in data and compute can make it outperform humans in tasks based on pattern matching.


Intelligence, however, is not all about pattern matching. Reasoning capabilities cannot be built on ML approaches alone, at least not for the time being. So what is needed is a way to integrate less hyped AI approaches from the so-called symbolic line: knowledge representation and reasoning, ontologies, and the like. This is a message we have been advocating, and seeing it take center stage at BDS was an affirmation.
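To make the contrast with pattern matching concrete, here is a minimal sketch of symbolic reasoning: a few facts stored as subject-predicate-object triples, plus two hand-written rules that derive new facts by forward chaining. The facts, predicate names, and rules are hypothetical and chosen only for illustration; production systems use ontology languages and dedicated reasoners rather than a loop like this.

```python
# Hypothetical facts stored as (subject, predicate, object) triples.
FACTS = {
    ("part_123", "is_a", "HexBolt"),
    ("HexBolt", "subclass_of", "Bolt"),
    ("Bolt", "subclass_of", "Fastener"),
}


def infer(facts):
    """Forward-chain two hand-written rules until nothing new is derived:
    1. subclass_of is transitive: A sub B, B sub C  =>  A sub C.
    2. Membership inherits upward: X is_a A, A sub B  =>  X is_a B.
    """
    known = set(facts)
    while True:
        derived = set()
        for (x, p1, a) in known:
            for (b, p2, c) in known:
                # Both rules need the first triple's object to be the
                # subject of a subclass_of triple.
                if a != b or p2 != "subclass_of":
                    continue
                if p1 in ("subclass_of", "is_a"):
                    derived.add((x, p1, c))
        if derived <= known:  # fixed point: no new facts
            return known
        known |= derived


kb = infer(FACTS)
print(("part_123", "is_a", "Fastener") in kb)  # True
```

No training data is involved: the conclusion follows from explicit facts and rules, and every derived fact can be traced back to the ones that produced it.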

Should you outsource it?

Short answer: Perhaps, but you should be very careful about it. Let's not beat around the bush: AI is hard. Yes, you should definitely build on foundational capabilities such as data governance, because this is good for your organization anyway. Some, like Telefonica, have managed to get from big data to AI by executing strategic initiatives. But it's no easy feat.

This point has been validated by what is probably the most reliable survey on ML adoption, with more than 11K respondents. Paco Nathan from Derwen presented results and insights from an O'Reilly survey he instrumented, which more or less confirmed what we knew: There is a growing gap between the AI haves and have-nots.


On one end of the spectrum, we have the Googles and Microsofts of the world: organizations applying AI as a core element of their strategy and operations. Their resources, data, and know-how are such that they are leading the AI race. Then there are the adopters, working on applying AI in their domains, and the laggards, buried too deep in technical debt to do anything meaningful in terms of AI adoption.

Leaders, adopters, laggards -- the machine learning version. (Image: Paco Nathan / Derwen)

AI leaders have offerings that, on the face of it, seem to "democratize" AI. Both Google and Microsoft presented theirs at BDS, with demos in which, for example, an image recognition application was built in a few minutes in point-and-click fashion.

The message was clear: Let us worry about models and training, and you focus on the specifics of your domain. We can identify mechanical parts, for example; just feed us your specific mechanical parts, and you are good to go.

Google also announced some new offerings at BDS: Kubeflow and AI Hub. Kubeflow aims to orchestrate ML pipelines much as Kubernetes orchestrates Docker containers for applications, and AI Hub aims to become a GitHub for ML models. These are not the only offerings that promise similar advantages. They sound alluring, but should you use them?


Who would not want to jump the AI queue and get results here and now without all the hassle? This is indeed a pragmatic approach, and one that can get you ahead of the competition. There's just one problem: If you outsource your AI entirely, you are not going to develop the skills required to be self-sufficient in the mid-to-long term.

Think of Digital Transformation. Yes, going digital, exploring technologies and redesigning processes is hard. Not all organizations got it, or dedicated enough resources to it. But the ones that did are now ahead of the curve. AI has similar, if not greater, potential to disrupt and differentiate. So, while getting immediate results is great, investment in AI should still be seen as a strategic priority.

The one part you can be less skeptical about outsourcing is infrastructure. For most organizations, the economics of maintaining their own infrastructure just don't add up at this point. The economies of scale, head start, and peace of mind that running your infrastructure in the cloud can give are substantial benefits.

Where do we go from here?

Short answer: To the moon and back. It seems like the ML feedback loop is in full swing. So, while adopters are trying to keep up and laggards keep lagging, leaders are getting more and more advanced.

As Pablo Carrier of Google Partner Engineering Iberia/Italy pointed out in his presentation, compute is going to grow exponentially if you try to improve DL accuracy linearly. In the past six years, there was a 10-million-fold increase in compute. That's hard to keep up with even if you are Google Cloud, let alone if you are not.
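To put the 10-million-fold figure in perspective, the arithmetic below converts it into a doubling time and an annual growth factor. The only inputs are the two numbers reported from the presentation; treating the growth as a smooth exponential over the six years is an assumption made for the calculation.

```python
import math

# Figures as reported from the presentation.
growth_factor = 10_000_000  # 10-million-fold increase in compute
years = 6

# Assuming smooth exponential growth: how many doublings is that,
# how often does compute double, and by how much does it grow per year?
doublings = math.log2(growth_factor)
doubling_time_months = years * 12 / doublings
annual_factor = growth_factor ** (1 / years)

print(f"{doublings:.1f} doublings")                         # 23.3 doublings
print(f"doubling every {doubling_time_months:.1f} months")  # doubling every 3.1 months
print(f"about {annual_factor:.0f}x per year")               # about 15x per year
```

A doubling roughly every three months is far faster than the roughly two-year doubling usually associated with Moore's law, which is why hardware improvements alone cannot close the gap.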

A rising trend in DL is distribution. In an overview shown in another Google presentation, by Viacheslav Kovalevskyi, technical lead at Google Cloud AI, a word of warning came before the theory and practice of distributed DL: If possible, avoid it. If you really must do it, be aware that distribution carries an overhead, and be prepared to pay the price, both in compute and complexity and in the bills you will have to foot.


Kovalevskyi offered a historical perspective on the different ways of doing distributed DL: distributing the data, the model, or both. Distributing the data is the easiest approach; distributing both is the hardest. But in any case, distributed DL is "fairy tale zone": you will not get a k-times increase in performance by increasing your compute k times.
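Why k times the compute does not buy k times the performance can be illustrated with a simple cost model. The sketch below combines Amdahl's law (a serial fraction of each step cannot be split across workers) with a fixed per-step synchronization cost. Both the model and the default numbers (95% parallelizable work, 2% overhead) are assumptions for illustration, not figures from the talk.

```python
def speedup(k, parallel_fraction=0.95, sync_overhead=0.02):
    """Estimated speedup on k workers relative to a single worker.

    Assumed model: one training step takes 1.0 time unit on one worker.
    The parallel_fraction of that work divides evenly across k workers
    (Amdahl's law); the rest stays serial. Any multi-worker step also pays
    a fixed sync_overhead for exchanging gradients.
    """
    if k == 1:
        return 1.0
    step_time = (1 - parallel_fraction) + parallel_fraction / k + sync_overhead
    return 1.0 / step_time


for k in (1, 2, 4, 8, 16, 64):
    print(f"{k:>3} workers -> {speedup(k):5.2f}x")
```

Under these assumptions, 64 workers yield well under a 12x speedup, and no number of workers can exceed 1 / (0.05 + 0.02), roughly 14x, because the serial part and the synchronization cost never shrink.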

Of course, Google's presentation was focused on TensorFlow on Google Cloud, but this is not the only way to go. Databricks has just announced support for HorovodRunner to facilitate distributed DL using Horovod. Horovod is an open source framework introduced by Uber and also used by Google. It's not the only game in town, though.
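Horovod's core idea is data-parallel training in which workers average their gradients with a ring-allreduce rather than through a central parameter server. The sketch below simulates that exchange in a single Python process to show the mechanics; it is not Horovod's API, and the function name and list-of-lists representation are assumptions for illustration.

```python
def ring_allreduce(worker_grads):
    """Simulate ring-allreduce in one process.

    worker_grads: one equal-length list of numbers per worker.
    Returns each worker's buffer afterwards; all equal the element-wise sum.
    Simplification: the length must divide evenly by the number of workers.
    """
    n = len(worker_grads)
    size = len(worker_grads[0])
    if size % n != 0:
        raise ValueError("sketch assumes length divisible by worker count")
    c = size // n
    # Each worker splits its gradient into n chunks.
    bufs = [[list(g[j * c:(j + 1) * c]) for j in range(n)] for g in worker_grads]

    # Phase 1, reduce-scatter: in each step, worker i passes one chunk to
    # its right-hand neighbour, which adds it to its own copy. After n-1
    # steps, worker i holds the complete sum of chunk (i + 1) % n.
    for step in range(n - 1):
        sends = [(i, (i - step) % n) for i in range(n)]
        payloads = [list(bufs[i][j]) for i, j in sends]  # snapshot before updates
        for (i, j), data in zip(sends, payloads):
            dst = (i + 1) % n
            bufs[dst][j] = [a + b for a, b in zip(bufs[dst][j], data)]

    # Phase 2, all-gather: completed chunks travel around the ring,
    # overwriting stale copies, until every worker has all of them.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n) for i in range(n)]
        payloads = [list(bufs[i][j]) for i, j in sends]
        for (i, j), data in zip(sends, payloads):
            bufs[(i + 1) % n][j] = data

    # Flatten the chunks back into one vector per worker.
    return [[x for chunk in b for x in chunk] for b in bufs]


# Data-parallel step: two workers computed gradients on different data shards.
grads = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
summed = ring_allreduce(grads)
averaged = [x / len(grads) for x in summed[0]]
print(averaged)  # [5.5, 11.0, 16.5, 22.0]
```

Each worker only ever talks to its neighbour and sends about 2(n-1)/n of the gradient size in total, so bandwidth per worker stays roughly constant as workers are added. The number of sequential steps, 2(n-1), still grows with n, which is one source of the distribution overhead described earlier.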

In his presentation, Marck Vaisman, Microsoft data scientist and Azure data/AI technical specialist, presented alternatives in both Python and R, without Spark in the mix. He highlighted Dask, an open source Python library that promises advanced parallelism for analytics and works in tandem with projects like NumPy, pandas, and scikit-learn.
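Dask's central idea is blocked parallelism: split a large array or dataframe into chunks, run the same operation on each chunk as an independent task, then combine the partial results. The stdlib-only sketch below mimics that pattern for a mean. It is not Dask's API (in Dask itself, `dask.array.from_array(x, chunks=...).mean().compute()` does this kind of work), and `chunked_mean` and its parameters are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked_mean(values, chunk_size=1000, max_workers=4):
    """Mean computed Dask-style: split, reduce each chunk, combine."""
    if not values:
        raise ValueError("mean of empty input")
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]

    def partial(chunk):
        # Return (sum, count) rather than a per-chunk mean, so that
        # unevenly sized chunks still combine into the exact overall mean.
        return sum(chunk), len(chunk)

    # Per-chunk tasks are independent, so they can be scheduled concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial, chunks))

    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


print(chunked_mean(list(range(10_001))))  # 5000.0
```

For pure-Python work like this, CPython threads will not actually run faster because of the global interpreter lock; the sketch shows the structure, not the speedup. Dask gets real parallelism largely because NumPy and pandas operations release that lock, or by using processes or a cluster.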


And last but not least, graphs and graph databases were also a key theme throughout BDS: Microsoft's knowledge graph, AWS Neptune, and Oracle Labs using graph analytics with notebooks. As this is a topic we are following closely, we'll be revisiting it shortly. Another Google insight worth mentioning here: an internal analysis showed that most of Google's ML models operate on structured data.

Cloud, distribution, and bringing structure to ML via graphs are some key themes to keep in mind for the future. We will continue to cover them as the journey from big data to AI progresses.
