Artificial intelligence on Hadoop: Does it make sense?

MapR just announced QSS, a new offering that enables the training of complex deep learning algorithms. We take a look at what QSS can offer, and examine AI on the Hadoop landscape.
Written by George Anadiotis, Contributor

Hadoop is becoming a substrate for artificial intelligence

This week MapR announced a new solution called Quick Start Solution (QSS), focusing on deep learning applications. MapR touts QSS as a distributed deep learning (DL) product and services offering that enables the training of complex deep learning algorithms at scale.

Here's the idea: deep learning requires lots of data, and it is complex. If MapR's Converged Data Platform is your data backbone, then QSS gives you what you need to use your data for DL applications. It makes sense, and it is in line with MapR's strategy.

MapR is the first Hadoop vendor with an offering that is marketed as what we'd call artificial intelligence (AI) on Hadoop. But does AI on Hadoop make sense more broadly? And what are other Hadoop vendors doing there?

MapR does deep learning

Remember when Hadoop first came out? It was a platform with many advantages, but required its users to go the extra mile to be able to use it. That has changed. Now Hadoop is a burgeoning ecosystem, and a big part of its success is due to what we call SQL-on-Hadoop.

Hadoop has always been able to store and process lots of data cheaply. But it was not until support for accessing that data via SQL became good enough that Hadoop became a serious contender as the enterprise data backbone. SQL was, and still is, the de facto standard for accessing data. So supporting it meant that Hadoop could be used by almost everyone.

AI and SQL are different. AI is not a backwards-compatibility, commodity feature; it is a forward-looking, trending field. But even if today AI is a differentiator for those who have it, it looks like it will soon be somewhat of a commodity as well: those who do not have it will not be able to compete.

AI and SQL are also similar: If you are a Hadoop vendor, neither is really what you do. It is something others do -- you just need to make sure that it can run on your platform, where all the data is. This is what MapR is out to achieve with QSS too.

MapR leverages open source container technology (think Docker) and orchestration technology (think Kubernetes) to deploy deep learning tools (think TensorFlow) in a distributed fashion. None of this technology is MapR's own, but the value QSS brings is in making sure everything works together seamlessly.


The distributed deep learning MapR's QSS proposes has three layers. The bottom layer is the data layer, the middle layer is the orchestration layer, and the top layer is the application layer.

Ted Dunning, MapR chief application architect, explains: "The best approach for pursuing AI/Deep learning is to deploy a scalable converged data platform that supports the latest deep learning technologies with an underlying enterprise data fabric with virtually limitless scale."

He also notes that "almost all of the machine learning software is being developed independently of Hadoop and Spark. This requires a platform like MapR that is capable of supporting both Hadoop/Spark workloads as well as traditional file system APIs."
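
To make Dunning's point about file system APIs concrete, here is a minimal Python sketch of a data loader that knows nothing about Hadoop or Spark and relies only on ordinary file I/O. It is an illustration, not MapR code: the function names, file names, and CSV layout are assumptions, and a temporary directory stands in for wherever a cluster's storage might be mounted.

```python
import csv
import tempfile
from pathlib import Path


def load_rows(data_dir: Path) -> list[dict[str, str]]:
    """Read every CSV file under data_dir using only ordinary file I/O."""
    rows: list[dict[str, str]] = []
    for csv_path in sorted(data_dir.glob("*.csv")):
        with csv_path.open(newline="") as handle:
            rows.extend(csv.DictReader(handle))
    return rows


def demo() -> list[dict[str, str]]:
    # A temporary directory stands in for a cluster mount point (hypothetical;
    # a real deployment would pass the path where the platform is mounted).
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp)
        (data_dir / "part-0.csv").write_text("label,value\ncat,1\n")
        (data_dir / "part-1.csv").write_text("label,value\ndog,2\n")
        return load_rows(data_dir)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

A loader written this way would run unchanged against any storage that presents itself as a regular directory, which is the kind of compatibility the quote is describing.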

And since that works, why don't you also use MapR-DB and MapR Streams and MapR-FS to feed your data and MapR Persistent Application Client Container (PACC) to deploy your model? Oh and we've got services for you too -- we'll help you. That is MapR's message with QSS.

Anil Gadre, MapR chief product officer, says: "DL can provide profound transformational opportunities for an organization. Our expertise...coupled with [our] unique design...form the foundation for [QSS]. QSS will enable companies to quickly take advantage of modern GPU-based architectures and set them on the right path for scaling their DL efforts."

AI on Hadoop

So, is AI on Hadoop a thing? Unlike SQL, there is no standard for AI. There is not even a widely accepted and understood definition. DL is only a part of machine learning (ML), which is only a part of AI. And even within DL, while there may be some shared concepts, there is no such thing as a common API. So QSS is DL on Hadoop, but not really AI on Hadoop.


There is more to AI than machine learning, and there is more to machine learning than deep learning.

The notion of using a data and compute platform like Hadoop as the substrate for AI is a natural one. But being able to run ML or DL on Hadoop does not really make a Hadoop vendor an AI vendor too. This is a discussion we've been having with many Hadoop vendor executives over the last few months.

For Cloudera CEO Tom Reilly, "ML is very real and very active, it's here and now and it's doing great things in practice. Our customers are trying to understand AI and what lies in their journey to the future. We are helping them with ML, our platform already supports ML and will continue to add support for it. We think of our platform as the host of the data people will use for AI."

Cloudera has been criticized for trying to pose as an AI company in its recent IPO filing. To the best of our knowledge, Cloudera does not have extensive internal expertise on AI. There is a data science team, made up of a handful of people, and there is also the recent acquisition of Sense.io.

Sense.io has been integrated into Cloudera's stack and repurposed as Cloudera Data Science Workbench (CDSW). In a recent discussion, Sean Owen, Cloudera Data Science Director, compared Sense.io to IBM's DataWorks.

"By providing ready access to data, CDSW decreases time to value of AI applications delivered with our automated ML platform," notes Jeremy Achin, DataRobot CEO, in Cloudera's press release for CDSW.

CDSW sounds like a useful tool, but it does not really have much to do with ML, let alone AI. The reason is obvious in Achin's statement: having access to data is great, but the ML really happens elsewhere. Cloudera could ramp up ML libraries for Hadoop / Spark, but this is not what CDSW is about.

For Scott Gnau, Hortonworks CTO, AI consists of two key components: loads of data, plus packaging and algorithms to traverse the data. Hortonworks supports both, and as AI wins, Hortonworks wins as well. Gnau, however, emphasizes what he sees as Hortonworks' strengths, namely enterprise governance and security.

Gnau believes that AI technology we have not yet dreamt of is still to emerge. So Hortonworks' approach is to invest in infrastructure and to be the trusted purveyor of data, while keeping an eye on emerging killer technologies and applications it can plug in.

Each vendor's approach has to be seen in the context of where they are now and how they see themselves evolving. AI is a new battlefield that vendors approach in line with their philosophy and goals. We will continue with an analysis of how these are manifested in AI in a subsequent post.
