DataTorrent: Hard code around streaming data philosophy in 90 days

We give you streaming data applications in 90 days, no matter what. And we'll do anything, including building on our closest competitor's engine, to deliver.
Written by George Anadiotis, Contributor


We give you streaming data applications in 90 days, no matter what. And we'll do anything, including building on our closest competitor's engine, to deliver.

This is not something we are used to hearing from Big Data vendor CEOs. But it's the essence of what DataTorrent CEO Guy Churchward told ZDNet on the occasion of the new version DataTorrent is releasing today.

DataTorrent RTS is the commercial incarnation of the Apache Apex streaming engine. As is typically the case with Apache open source projects, there is a vendor that guides the development of the project, commits resources, and offers a commercial, hardened version of the software with added components and services, support, SLAs and the like.

DataTorrent is that vendor in the case of Apache Apex. Or at least that's what we thought. After talking to Churchward, we are not so sure anymore. DataTorrent was first covered for ZDNet by co-contributor Tony Baer a year ago, and the emphasis on delivering applications was already noted. But the release of DataTorrent RTS 3.10 today marks an interesting twist.

It's a philosophy or ethos, but there's hard code around it too

Key new features as marked by DataTorrent include extended support for SQL and analytics via integration with Druid, more machine learning and AI capabilities via Python and PMML, Complex Event Processing rule support via integration with Drools, and Store and Replay to record and replay data from a point in time.
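To make the Store and Replay idea concrete, here is a minimal Python sketch of recording events and replaying them from a point in time. It is not DataTorrent's implementation and uses none of its APIs; the `EventStore` class and its `record` and `replay` methods are hypothetical names chosen for illustration, and the sketch assumes events arrive in timestamp order.

```python
import bisect
from dataclasses import dataclass, field
from typing import Any, Iterator, List, Tuple


@dataclass
class EventStore:
    """Append-only, timestamp-ordered record of events (hypothetical)."""
    _timestamps: List[float] = field(default_factory=list)
    _payloads: List[Any] = field(default_factory=list)

    def record(self, timestamp: float, payload: Any) -> None:
        # Assumption: events are recorded in non-decreasing timestamp order.
        if self._timestamps and timestamp < self._timestamps[-1]:
            raise ValueError("events must be recorded in timestamp order")
        self._timestamps.append(timestamp)
        self._payloads.append(payload)

    def replay(self, since: float) -> Iterator[Tuple[float, Any]]:
        # Binary search for the first event at or after `since`,
        # then yield everything from that point onward.
        start = bisect.bisect_left(self._timestamps, since)
        for i in range(start, len(self._timestamps)):
            yield self._timestamps[i], self._payloads[i]


store = EventStore()
for ts, event in [(1.0, "login"), (2.0, "payment"), (3.0, "logout")]:
    store.record(ts, event)

# Replay everything recorded at or after timestamp 2.0.
replayed = list(store.replay(since=2.0))
print(replayed)
```

A production system would persist the log durably and handle out-of-order events; the sketch only shows the record-then-replay-from-a-timestamp shape of the feature.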

These are all interesting features, but for the most part not unique. In a market as crowded as that for real-time streaming data engines, what is DataTorrent's unique value proposition? This is what drove the conversation with Churchward.

Apparently it's all about Apoxi and AppFactory. DataTorrent describes Apoxi as a framework that binds components together to create optimized, pre-built applications. Apoxi can also integrate independent applications to allow them to operate as one, and AppFactory is where you go to get applications.

For DataTorrent, everything, both features and value, is built on and around Apoxi and AppFactory. New additions to the AppFactory include applications for omni-channel payment fraud prevention, online account takeover prevention, and a retail recommender. Churchward describes all these as production-ready applications.


The new DataTorrent is all about Apoxi. Image: DataTorrent

In trying to explain the thinking behind this, Churchward, an industry veteran, referred to his experience to provide a typology of organizations and their approach to getting value out of big data projects.

Churchward classifies organizations into three groups: the ones that rely on engineers, the ones that trust generalists, and the ones that buy off-the-shelf solutions. Churchward says there is something missing in each of these approaches: clearly defined goals, familiarity with specialized tools, and flexibility, respectively.

That may be so, but what's Apoxi got to do with it? Churchward says that DataTorrent was suffering from a common syndrome, and Apoxi solves this:

"People who work on open source care about components. They believe they are applications, but they are not. They are not an outcome for a customer. Our job is to stitch together these components into outcomes. This is what Apoxi is about. It's a philosophy or ethos, but there's hard code around it too."

Our job is to provide applications, not components

Apoxi includes things such as a message/service bus, connection endpoints, service and schema repositories, UI and Web Services, fault tolerance, high availability and hybrid cloud support, staging ability and backward compatibility.

"We take components, scale them, and make them bulletproof. This is what matters to deliver value, it's our unique IP. We are not a component provider, we are an outcome provider, and this is disruptive technology and a disruptive business model. We guarantee to deliver outcomes in 90 days," says Churchward.

That all sounds good, too good almost. It also sounds like DataTorrent is building an ecosystem like, say, a Hadoop distribution. Maybe because that's what it is doing, according to Churchward. Churchward goes on to add they will stop at nothing, including dumping Apex if they have to. Confused? We were too.


DataTorrent has its own stack, and it includes some solutions competitive to Apex such as Spark or Kafka. Apparently DataTorrent would not mind dumping Apex altogether. Image: DataTorrent

Apex is, after all, the core of DataTorrent, what makes it what it is: a streaming engine for handling "real" real-time data, one that does not do micro-batching like Spark but was built from the ground up to work on streams. In that respect, its closest competitor would be Apache Flink, with which it shares a number of features. Churchward, however, begs to differ:

"From a technology standpoint, our closest competitor is Flink. But going forward, we see ourselves more in the space of Hortonworks or Cloudera. We have a co-opetition relationship with them, as we need something to deploy on (think HDFS).

"We think Apex is the best, but we are not tied to it. If Apex did not turn out to be the standard way to do streaming, and Flink was, we would use Flink. Our job is to provide applications, not components."
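For readers unfamiliar with the distinction drawn above between micro-batching and event-at-a-time streaming, the following Python sketch contrasts the two. It uses none of the Apex, Spark, or Flink APIs; the function names are hypothetical, and the fixed batch size is a simplification (micro-batching engines typically group events by time interval rather than by count).

```python
from typing import Callable, Iterable, List


def per_event(events: Iterable[int], handle: Callable[[int], None]) -> None:
    # Event-at-a-time: each event is handled as soon as it arrives.
    for event in events:
        handle(event)


def micro_batch(events: Iterable[int], batch_size: int,
                handle_batch: Callable[[List[int]], None]) -> None:
    # Micro-batching: events are buffered and handled in small groups,
    # so an event waits until its batch is full before being processed.
    batch: List[int] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            handle_batch(batch)
            batch = []
    if batch:  # flush the final, possibly partial, batch
        handle_batch(batch)


per_event_calls: List[int] = []
micro_batch_calls: List[List[int]] = []

per_event(range(5), per_event_calls.append)
micro_batch(range(5), 2, micro_batch_calls.append)

print(per_event_calls)    # one handler call per event
print(micro_batch_calls)  # one handler call per batch
```

The practical difference is latency: in the batched version an event is not processed until its batch closes, whereas the per-event version handles it immediately.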

We are not an Apex distro

If the signs we have are anything to go by, that's not just talk. DataTorrent has already done this at least twice, for SQL and storage. After developing its own solutions there, it has given them up in favor of a third-party solution, which in both cases happens to be Druid.

Druid provides SQL support in the same way that Apex used to, by leveraging another Apache project, namely Calcite. Churchward says they could have gone with Druid from the start but waited until they were sure they would productize and harden it. Once they concluded that Druid is best of breed, they fully invested in it.

It's the same story for HDHT, DataTorrent's custom DHT over HDFS. HDHT was used as back-end storage for dimensional compute, and that's not needed anymore, according to Churchward. So HDHT was deprecated in favor of Druid.


DataTorrent architecture. Image: DataTorrent

Druid is somewhat orphaned, after its initiator Metamarkets was acquired. Will DataTorrent step up for Druid, after having invested in it? "We intend to step up in Druid, but not as Druid-distro. We are not a distro of any technology. We are not an Apex-distro shop. We sell Apoxi and AppFactory."

Willing and able, but let's wait and see

What about Apache Beam then? Beam is the closest thing to a standard in the streaming world. It's an API that aims to provide an abstraction over any streaming engine. Beam seems to be at a stalemate though.

Only Google (its creator), Flink and Apex support it officially at this point. When asked if this was part of DataTorrent's strategy to make the transition easier for users of other platforms, Churchward said most of DataTorrent's customers are those who faced failure with current options.

He added that although they commit resources to Beam regularly, customers are not asking for this; they are more intent on getting to production in a timely manner: "We don't see much traction, we want to see where this is going. We're willing and able, but also in a wait and see mode".

Churchward had a similar answer to offer when asked whether DataTorrent will be offering a managed cloud version to follow suit with the competition:

"It's an interesting model, it may be working for Databricks, but it's not for us. Our clients mostly want to keep their apps and data on premise. We have full support for AWS and Azure right now. We will support Google Cloud when we see demand from our customers".

So what to make of DataTorrent? Its solution has technical merits. This, coupled with the fact that it was founded by Yahoo alumni, can account for having clients such as GE. But DataTorrent operates in a crowded market, and this change of course could be make or break, as it's actually looking to move away from this market.

DataTorrent's stance seems to resemble that of a service provider more than that of a software vendor. Will that scale, and will DataTorrent be able to execute and differentiate, going up against the likes of Hadoop vendors? Will the software and the people deliver on the promise? They may be willing and able, but we'll have to wait and see.
