Partnerships are nothing new in the analytics world, and neither are integrations between technologies. But this week has seen a couple of announcements that fall into the partnership/integration category and, this time, it's all about AI. Aerospike announced interesting integrations with two popular Apache Software Foundation open source data analytics technologies and ThoughtSpot is announcing integration with Alteryx. The latter integration ties in handily with another one Alteryx announced last month. And all of these integrations are AI-relevant.
Aerospike sparks kafkaesque integration
Let's start with Aerospike, whose eponymous product is an in-memory NoSQL database that can leverage flash memory as well as RAM. The company announced on Tuesday the releases of Aerospike Connect for Spark and Aerospike Connect for Kafka, which connect to Apache Spark and Kafka, respectively. Of course, connectivity to those two open source technologies is fairly common, but there's more to it than that.
First off, the Spark integration is pretty cool. This isn't just an import-export bridge; it's something that lets Spark developers query Aerospike and get the results back as a Spark DataFrame. From there, almost any Spark operation on the data is possible. On the Kafka side, meanwhile, things are nicely bidirectional: not only can data streaming off Kafka topics come into Aerospike (that support was already there), but now data in Aerospike can stream into a Kafka topic as it changes. As explained to me by Srini Srinivasan, Aerospike's Chief Product Officer and Founder, the combination of these integrations brings three benefits:
- By leveraging Aerospike as the main data store, and bringing in an analysis-specific subset of data, Spark users avoid maxing out the RAM on the Spark cluster. Since Aerospike leverages flash, it can have a much larger memory capacity, overall. The integration balances things out.
- By getting Aerospike data into Spark, the latter's MLlib component can be leveraged to build machine learning models on that data.
- By using the combination of Aerospike, Kafka and Spark Streaming, those ML models can be kept up-to-date and retrained as the underlying data changes.
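To make that third benefit a bit more concrete, here is a minimal sketch of the general pattern in plain Python. It deliberately uses none of the Aerospike, Kafka or Spark APIs: the ChangeEvent record, the change_stream generator and the OnlineLinearModel class are all hypothetical stand-ins I made up for illustration, and the simple per-event gradient update is a swapped-in substitute for whatever retraining a real MLlib pipeline would do. The only point is the shape of the loop: records change, change events flow through a stream, and the model is nudged with each one.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class ChangeEvent:
    """Hypothetical change record, standing in for a message on a Kafka topic."""
    key: str
    x: float  # single input feature
    y: float  # observed label


class OnlineLinearModel:
    """Hypothetical one-feature model y ~ w*x + b, updated one event at a time."""

    def __init__(self, learning_rate: float = 0.1) -> None:
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate
        self.updates = 0

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def update(self, x: float, y: float) -> None:
        # One stochastic gradient step on the squared error for this event.
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error
        self.updates += 1


def change_stream(records: Iterable[Tuple[float, float]]) -> Iterator[ChangeEvent]:
    """Stand-in for the database emitting a change event per modified record."""
    for i, (x, y) in enumerate(records):
        yield ChangeEvent(key=f"rec-{i}", x=x, y=y)


def consume(stream: Iterable[ChangeEvent], model: OnlineLinearModel) -> OnlineLinearModel:
    """Stand-in for the streaming job: refresh the model as each event arrives."""
    for event in stream:
        model.update(event.x, event.y)
    return model


# Synthetic data following y = 2x + 1, replayed many times to mimic ongoing changes.
records = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(11)] * 200
model = consume(change_stream(records), OnlineLinearModel())
print(f"w={model.w:.3f} b={model.b:.3f} after {model.updates} updates")
```

In a real deployment, the generator would be replaced by a consumer reading from the Kafka topic and the toy model by something trained in Spark, but the division of labor would be similar: the database stays the system of record while the stream carries only what changed.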
Aerospike also announced a new Aerospike REST Client, to be released in April, that will augment its current language-specific software developer kits (SDKs) for developer connectivity.
ThoughtSpot and Alteryx let you search for AI
Moving on, ThoughtSpot is today announcing a partnership and integration with data prep/data pipeline specialist Alteryx that mashes up ThoughtSpot's search-based analytics with Alteryx's ability to build machine learning (ML) models. The new integration allows Alteryx users to add native ThoughtSpot Bulk Loader connections and ThoughtSpot TQL statements directly into an Alteryx workflow. As a result, a search-based query can trigger the scoring of data that's in ThoughtSpot against an Alteryx ML model (which itself is built utilizing R or Python/scikit-learn, behind the scenes). In a call-and-response fashion, the resulting predicted value(s) will come back from Alteryx and can be visualized in ThoughtSpot, automatically.
That may sound a little Rube Goldberg and, granted, I have not had this integration demoed for me. But the ability to pipe a result set out of ThoughtSpot and into an Alteryx workflow, then get the scoring data set back in, seems reasonable. Meanwhile, the search-based interface is already ThoughtSpot's primary paradigm.
And the plot thickens...
Quite coincidentally to ThoughtSpot's announcement, I had a briefing this week with Ashley Kramer, Alteryx's SVP of Product Management, and the discussion was specifically focused on Alteryx's ML capabilities (rather than the data prep and pipelining capabilities for which it is perhaps best known). What I learned was pretty intriguing and, combined with ThoughtSpot's news, it's more interesting still.
Here's the gist: to complement its native ML capabilities, Alteryx last month announced a partnership with H2O.ai that allows Alteryx to use H2O's "Driverless AI" AutoML feature. In further synchronicity, I happened to have written about AutoML earlier this week, so it's all starting to make sense.
Put the Alteryx integrations all together, and here's what you get: non-data scientists can use the combination of Alteryx and H2O Driverless AI to build machine learning models, with the feature selection, algorithm selection and hyperparameter tuning performed on an automated basis. Said models can then be brought into Alteryx and could theoretically (I haven't confirmed it) be used to score data from ThoughtSpot via search-based query, with the prediction data set streaming back to that platform to be visualized automatically.
Coordinating the interaction of these three products likely has some complexity and maybe a couple of gotchas involved, but even just as a proof of concept, it's impressive. ML model design and training, as well as query, scoring and visualization, are all available without coding and without needing data science expertise. I imagine things will simplify in the future, but the fact that all these dots can connect, today, is very exciting.
The analytics bone's connected to the AI bone
All of these integrations, and all of these vendors (Aerospike, ThoughtSpot, Alteryx and H2O.ai), are effectively endorsing the notion that the holy grail of data analytics is AI, and the construction and deployment of ML models. They are taking concrete measures to make integrated, code-less AI a reality, adding automation wherever possible, making things scale and, in Aerospike's case, keeping an eye on continuous integration of data to keep models up-to-date and accurate.
Again, there's likely a lot of assembly required to get this whole streaming data/in-memory analytics/AI pipeline working properly today, but these sorts of partnerships and integrations are usually a necessary first step before more integrated platforms become available, often from the same vendors.