IBM today pledged to put the Apache Spark data processing platform center stage in its cloud services.
The technology giant plans to embed Spark into its analytics and commerce offerings, and to offer Spark as a cloud service on its Bluemix platform.
Spark was started in 2009 as a UC Berkeley research project to create a cluster computing framework for workloads poorly served by Hadoop. It went open source in 2010 and last year had more than 450 contributors. Its creators went on to found Databricks.
Spark has several advantages over Hadoop's MapReduce execution engine when it comes to processing big data, both in the speed with which it carries out batch processing jobs and in the wider range of computing workloads it can handle. Spark SQL supports a HiveQL-compatible SQL execution environment; Spark's MLlib enables machine learning; Spark Streaming provides high-speed stream processing of data; and GraphX provides graph processing.
Big Blue sees a role for Spark in providing the backend for apps and Internet of Things appliances - supporting real-time analysis and predictions from big data.
IBM will also put more than 3,500 IBM researchers and developers to work on Spark-related projects at more than a dozen labs worldwide; donate its IBM SystemML machine learning technology to the Spark open source ecosystem; and help provide training for more than one million data scientists and data engineers on Spark. This training will be provided in partnership with AMPLab, DataCamp, MetiStream, Galvanize and Big Data University MOOC.
Spark will also be used to power the insight platform for IBM's Watson Health Cloud - which IBM claims will deliver faster results to doctors and medical researchers when analysing population health data.
One of the organizations that will use the Spark service on Bluemix is the SETI Institute, which is working with IBM and NASA to analyze terabytes of deep space radio signals using Spark's machine learning capabilities in a hunt for patterns that suggest the existence of intelligent extraterrestrial life.
"With Spark as a Service on Bluemix, we'll be able to work with IBM to develop promising new ways to analyze signal data as we hunt for evidence of intelligence elsewhere in the cosmos," said Dr. Seth Shostak, senior astronomer and director of the Center for SETI Research.
IBM is one of four founding members of the UC Berkeley AMPLab, where Spark was invented, and as a result works closely with AMPLab researchers on projects of mutual interest.