Today is Day 2 of the three-day Spark Summit event in San Francisco. As I reported yesterday, MapR and Microsoft have already made Spark distribution-related announcements timed for the event. Today, it's IBM's turn, as the company has announced a new Spark development environment. And, going back to yesterday, there were Spark connector announcements from Couchbase and Snowflake Computing that I wasn't able to cover.
Let's look at all three announcements now.
IBM, Spark and R
IBM, which you may recall made a splashy announcement around a $300M investment in Spark support at last year's Spark Summit, today announced a major software deliverable from that initiative.
That deliverable is a cloud-based development environment for Spark that lets developers author code in R. The product, dubbed "Data Science Experience," combines a Jupyter notebook interface with RStudio, H2O and access to 250 curated data sets, according to the company.
In building Data Science Experience, IBM made contributions to the SparkR open source project as well as to the Spark SQL and MLlib components of the Apache Spark project. IBM says it has also made contributions to the PySpark project, among others.
Big Blue vs. Redmond
With respect to Microsoft's Spark announcement from yesterday, an IBM public relations representative told me that while Microsoft's Jupyter notebook implementation processes code in Python and Scala, IBM is allowing data scientists to use whatever language they prefer, by bringing R into the mix.
Fair enough, but given Microsoft's acquisition of Revolution Analytics, and its announcement yesterday that R Server for HDInsight and Hadoop is now powered by Spark, I have a feeling we may see parity there soon.
Meanwhile, back in connector-land
NoSQL database vendor Couchbase and cloud data warehouse player Snowflake Computing each used the opening day of Spark Summit yesterday to announce their new Spark connectors. The Couchbase Spark Connector and the Snowflake Data Source for Spark each provide direct connectivity to Apache Spark for their respective databases.
Snowflake's product is a native connector, based on the Spark DataFrame API. The company mentions streaming/IoT data ingestion, complex ETL and machine learning as applicable use cases for the connector. For its part, Couchbase lists real-time product recommendations, failure detection, network intrusion detection, fraud detection and "product and customer 360" as use case examples.
What seems clear is that both Couchbase and Snowflake see Spark connectivity as enabling streaming data scenarios on their respective platforms, making Spark a means to an important end. In IBM's case, R is the means to the end of making Spark more accessible to data scientists and analytics specialists.
In all three vendor cases, as with MapR and Microsoft, slapping the Spark decal on their products is seen as good business. Right now, the orange star is outshining the yellow elephant.