IBM made its support for Apache Spark clear back in June at Spark Summit, when it announced a $300 million commitment to Spark -- including dedication of 3,500 researchers and the establishment of a Spark Technology Center in San Francisco. At its own IBM Insight event in Las Vegas today, the company is announcing availability of IBM Analytics on Apache Spark, a Spark-as-a-service offering as part of the IBM Bluemix cloud.
Spark and data and code, oh my!
By offering Spark in the Bluemix environment, IBM will integrate it with its other cloud data and analytics services, including the Cloudant NoSQL offering and the dashDB cloud data warehouse service. And because Bluemix is essentially an application development cloud, IBM feels it's in a good position to connect the dots from code to database, to Big Data, to analytics. And, in that spirit, IBM Analytics on Apache Spark will support working with Spark using Python-based code notebooks -- a feature also supported on the Databricks Cloud Spark platform.
And data feeds, too
Along with the Spark offering, IBM is unveiling what it calls its Insight Cloud Services, which features "external data about people, events, geospatial and businesses from sources such as Twitter and The Weather Company," according to IBM's press release. Clearly, IBM is trying to provide a complete analytics workbench, with the ability to enrich a customer's own data with external data feeds, then perform analytics on that enriched data using Spark.
(Not) losing its religion
When I spoke with Derek Schoettle, General Manager of IBM Cloud Data Services (CDS) and CEO of Cloudant before that company was acquired by IBM, he provided some color and context around IBM's Spark enthusiasm. Schoettle explained that since Spark is not only a parallel Big Data processing platform, but also one that handles machine learning, SQL access, graph engine analysis and streaming data analytics (albeit via micro-batch processing), IBM sees Spark as an all-encompassing environment for working with data.
That religion is so strong that, according to Schoettle, IBM has replatformed some fifteen of its own commerce and analytics products onto Spark. It took its DataWorks product for ETL/data prep and reduced its source code line count, he said, from 40 million down to 5 million.
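For readers wondering what the micro-batch caveat above means in practice, here is a minimal sketch in plain Python. It is not Spark's API and not IBM's code: the function names `micro_batches` and `running_counts` are hypothetical, and batches are cut by record count for simplicity, whereas Spark's streaming engine cuts them by time interval. The idea it illustrates is that the "stream" is processed as a series of small batch jobs, with results updated once per batch rather than once per record.

```python
from collections import Counter
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def micro_batches(events: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Chop an incoming stream into small batches (hypothetical helper).

    Assumption: batches are cut by record count; a real engine such as
    Spark cuts them by time interval instead.
    """
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def running_counts(events: Iterable[str], batch_size: int) -> List[Dict[str, int]]:
    """Run an ordinary batch computation (counting) on each micro-batch.

    Returns a snapshot of the running totals after every batch, which is
    the only moment results become visible in a micro-batch model.
    """
    totals: Counter = Counter()
    snapshots: List[Dict[str, int]] = []
    for batch in micro_batches(events, batch_size):
        totals.update(batch)          # batch-style update, not per-event
        snapshots.append(dict(totals))
    return snapshots


if __name__ == "__main__":
    stream = ["click", "view", "click", "buy", "click"]
    for i, snap in enumerate(running_counts(stream, batch_size=2), start=1):
        print(f"after batch {i}: {snap}")
```

The trade-off is latency: results are only as fresh as the most recent batch, which is why the approach is described as streaming "via micro-batch" rather than true record-at-a-time processing.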
Is Spark IBM's data platform glue?
Arguably no one in the industry has the data and analytics surface area of IBM. Remember, this is the company that has DB2, Watson, Netezza, Cognos, TM1, SPSS, DataStage, Informix, Cloudant and the BigInsights Hadoop distribution all under its roof. If IBM could federate all of those platforms around Apache Spark, and do it in the cloud, that would be a major, end-to-end, concrete demonstration of Spark's power and viability as the modern data analytics lifeblood.
IBM has its work cut out, though, as pulling off such a sweeping realignment of decades of home-grown and acquired technology will be anything but safe or easy. But if IBM can get traction in this campaign, it will become worthy of everyone's careful attention.