DataTorrent aims to make streaming into an application
With the deluge of IoT data, streaming analytics is becoming impossible to ignore. For streaming to break beyond early adopters, it has to become less of a do-it-yourself, homegrown software development project.
There's no shortage of use cases requiring real-time do-or-die decisions. Location-based marketing, border security, smart grid optimization, cybersecurity, and adtech auctions are a few of the many processes that rely on decisions made in the here and now. And with the explosion of IoT data come even more compelling use cases for taking action on data in the here and now.
Since our days covering middleware, we've been hearing that streaming or event processing is on the cusp. Sure, complex event processing was a technology looking for a problem. But as bandwidth, commodity hardware, and declining memory prices made it thinkable for Big Data to go real time, we started drinking the Kool-Aid. In 2012 we stated, "Fast Data, used in large enterprises for highly specialized needs, has become more affordable and available to the mainstream. Just when corporations absolutely need it." And then in 2015: "Real-time streaming, machine learning, and search will become the most popular emerging workloads."
But in the meantime, streaming did not exactly take the world by storm. Instead, at Ovum we've been getting a mouthful from our enterprise clients over machine learning and cloud deployment. Few clients are asking us about streaming.
It's not that machine learning or cloud have been immune from hype. There's certainly been no lack of that to go around. Yes, there have been plenty of headlines about whether AI and self-aware robots are going to take away our jobs. Beyond the hype, however, the results from machine learning and cloud are already tangible. Machine learning already powers a growing array of consumer services and analytic tools, while cloud deployment with major providers continues to spike upward.
The challenge with streaming is that it's still hard for mere mortal organizations to implement, and for line-of-business people to understand. It's not because of a shortage of streaming engines, related tooling, or potential use cases. It's that streaming is still largely a custom development task. There is no such thing as a packaged streaming analytics product for retail or network optimization. Virtually every adopter must reinvent the perpetual motion wheel.
Yet that hasn't slowed the proliferation of streaming engines that are now crowding a landscape that extends from the classic complex event processing engines to streaming, data flow management, and message queuing. So, not only is the technology raw, but there's a bewildering array of options to choose from.
DataTorrent, one of many aspiring players, open sourced its technology as the Apache Apex project just over a year ago. As technology, Apex has differentiated itself as sort of a middle ground: unlike Spark Streaming, which handles events in micro-batches, Apex is a true streaming engine, capable of handling events one at a time. But the creators of Apex claim it works with less overhead and more flexibility than Storm, which tracks the state of every single event. And unique among streaming engines, Apex was engineered specifically for Hadoop with YARN support built in, rather than added on.
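To make the micro-batch versus event-at-a-time distinction concrete, here is a minimal Python sketch. It does not use the Apex or Spark Streaming APIs; the function names, the one-second batch interval, and the arrival times are all hypothetical, and processing time is assumed to be zero. It only shows the earliest moment at which a result for each event could be emitted under each model.

```python
import math
from typing import List


def per_event_emit_times(arrivals: List[float]) -> List[float]:
    """Event-at-a-time model: each event is handled the moment it arrives.

    Assumes zero processing time, so emit time equals arrival time.
    """
    return list(arrivals)


def micro_batch_emit_times(arrivals: List[float], interval: float) -> List[float]:
    """Micro-batch model: each event waits for the end of its batch interval.

    An event arriving exactly on a boundary is assigned to the next batch.
    """
    return [(math.floor(t / interval) + 1) * interval for t in arrivals]


# Hypothetical arrival times, in seconds.
arrivals = [0.25, 0.5, 1.75, 2.0]

print(per_event_emit_times(arrivals))         # [0.25, 0.5, 1.75, 2.0]
print(micro_batch_emit_times(arrivals, 1.0))  # [1.0, 1.0, 2.0, 3.0]
```

In this toy model the added delay under micro-batching is bounded by the batch interval, which is why the difference matters most for use cases that need a decision on each event as it lands.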
DataTorrent's commercial product, the RTS platform, provides a visual development environment for configuring streams and piecing together streaming applications. And while Apex as an open source streaming engine might not have the visibility of Spark Streaming, DataTorrent boasts a number of prominent logo customers like GE (which used Apex for its Predix IoT analytics platform) and Capital One. Operating largely under the radar, DataTorrent realized a doubling of business last year -- of course, that's coming from a very early stage company where the multiples should be steep.
Indirectly, as one of the outgrowths of Dell's EMC acquisition, DataTorrent has a new management team, with a new CEO and SVP of marketing. Incoming CEO Guy Churchward, who previously headed EMC's storage division, is realistic that 2017 won't be "the year" that streaming analytics breaks out. Churchward sees 2017 as a building year. There are a couple of related challenges. First is broadening the Apex open source community, where, for now, about three-quarters of the committers are from DataTorrent. The company reports that the project has grown to 50,000 "members." But that is not the same as committers and obviously does not represent actual production use.
Related to the growth of the community is harvesting ideas that come from it -- and developing a critical mass of applications or jumpstart templates that allow enterprises to get value quicker compared to the current profusion of raw technologies and toolkits. Right now, there are some templates for data preparation and de-duplication.
It's been pretty easy to grow jaded over streaming. Yet, given the explosion of IoT devices, it's going to be a technology that will be hard to ignore. If streaming finally gets packaged as applications, it will become impossible to ignore.