Hazelcast open source in-memory data grid secures $21.5 million funding, expands platform to real-time streaming data

The streaming data pie is growing, and Hazelcast wants a piece of it. It's got some way to go, funding helps, but is it enough?

Object caching. In a way, that's where it all began for Hazelcast. Back in the day, Ehcache was one of the first solutions to popularize the use of caching in Java. Ehcache was founded by Greg Luck, who later went on to serve as CEO and CTO for Hazelcast. 

Conceptually, caches and in-memory data grids are very close anyway: it's all about using fast memory to speed up access to data residing in slow(er) storage systems. Doing caching efficiently is a hard problem, and Luck is among the leading experts in the field. About a year ago, however, Luck stepped down from his role as Hazelcast CEO and took over the CTO role, while Kelly Herrell became CEO.

This move was part of a bigger plan for Hazelcast. Today, Hazelcast announced it has closed a $21.5 million funding round, by both new and existing investors, and ZDNet discussed the plan with Herrell.

Streaming data platforms for real-time applications

Herrell has helped build four successful Silicon Valley companies over the past 20 years. He said that what attracted him to his new role was the size and dynamics of Hazelcast's global community and customers, even though he was not familiar with in-memory platforms up to that point.

Hazelcast quotes Gartner, according to which "Digital business model imperatives are demanding cost-effective support for real-time analytics, hyperscale architectures and fast access to data demanded by digital business models, which in turn drives fast growth for most in-memory computing (IMC)-enabling technologies and IMC-enabled architectures."

As far as we're concerned, that's preaching to the choir -- we long ago identified in-memory processing as a key ingredient of modern data platforms for building real-time applications. In fact, some of today's most successful platforms do that already. So the real questions are: what does Hazelcast bring to the table compared to those, and how will the funding make a difference?

Herrell mentioned that Hazelcast's total funding from 2008 until today ($16.1 million) is atypical for an industry which is growing at a rapid pace, but they are set on changing that. Today's funding round of $21.5 million is led by new investor C5 Capital, while existing investors Bain Capital Ventures, Earlybird Venture Capital and Capital One Growth Ventures also participated in the round.

The goal is to accelerate Hazelcast's product roadmap and bolster go-to-market capabilities. Herrell also said that the company has grown its headcount by 33% since he joined in 2018, and they expect to reach 130 by the end of 2019. Hazelcast, once a single-product company with IMDG (in-memory data grid), has introduced two more products in the last year: Hazelcast Jet and Hazelcast Cloud.

The idea with IMDG is pretty much the same as with a cache: a layer that sits on top of underlying storage, be it file-based such as HDFS or a database, and speeds up access. Hazelcast Jet is a streaming data platform which brings access to data in-flight, in addition to data at-rest accessed via IMDG, while Hazelcast Cloud is a managed version of IMDG.
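
To make the caching idea concrete, here is a minimal read-through cache sketch in Python. It is not Hazelcast code: `SlowStore` and `ReadThroughCache` are hypothetical names used only for illustration, and a real data grid would also partition and replicate entries across cluster members.

```python
class SlowStore:
    """Hypothetical stand-in for slow underlying storage (a database, HDFS, ...)."""

    def __init__(self, data):
        self._data = dict(data)
        self.reads = 0  # counts how often the slow path is taken

    def load(self, key):
        self.reads += 1
        return self._data.get(key)


class ReadThroughCache:
    """Keeps previously read values in memory in front of a slower store."""

    def __init__(self, store):
        self._store = store
        self._memory = {}

    def get(self, key):
        if key not in self._memory:
            # Cache miss: fetch from the slow store and remember the value.
            self._memory[key] = self._store.load(key)
        return self._memory[key]


store = SlowStore({"user:1": "Ada"})
cache = ReadThroughCache(store)
first = cache.get("user:1")   # miss: goes to the slow store
second = cache.get("user:1")  # hit: served from memory
```

Only the first lookup touches the slow store; the second is answered from memory, which is the speed-up the layer is there to provide.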

System of Now

Hazelcast calls its offering System of Now™, and Herrell argued it's the only one that supports both data in-flight and data at-rest with a common API. Our objection was that Databricks with its Apache Spark-based offering and Ververica (formerly data Artisans) with its Apache Flink-based offering, for example, could reasonably argue that they do this too, through their respective platforms.

So, how does Hazelcast Jet compare to Spark and Flink? Jet just went GA after three years of development, so it's less mature than both Spark and Flink. Furthermore, both Spark and Flink have more visibility and larger and more diverse contributor communities, while Jet is a one-company project. How does Hazelcast expect to onboard people to Jet then? Speed, standards, and ease of use.

Hazelcast has released a benchmark comparing Jet to Spark and Flink, according to which Jet shows better throughput. Benchmarks are typically controversial, and without getting into the specifics one thing we can note is that the benchmark uses older versions of all platforms. In any case, Hazelcast says they have made the source code available for anyone to have a go at this. 

Hazelcast Jet is Hazelcast's streaming data platform

Herrell noted that Hazelcast is easier to use than both Spark and Flink, as they both have external dependencies, while with Jet all you need is a JAR file. Again, going into the specifics here requires a deep-dive, as there are efforts underway to make something similar possible for Flink, too, for example. In a way though, the answer to questions regarding operations increasingly is "outsource it".

Choose your platform, and use it as a managed service in the cloud, that is. Databricks has made this a key part of its strategy, and Ververica offers it, too. Hazelcast Cloud is just joining this game, and although Herrell said early signs of adoption are looking good, he also noted it's still not meant for production use. It currently targets AWS, with Azure and Google Cloud next on the roadmap.

To us, the plan to onboard people to Jet still looked kind of weak at this point. When discussing this with Herrell, however, he threw something else in the mix too: Apache Beam. Beam is a programming model for batch and streaming data processing jobs that run on any execution engine. Beam is already supported by Flink, and there are unofficial ports for Spark.

While Hazelcast's official FAQ states that Beam is not supported at this point, it leaves the possibility open for the future, and Herrell said it is on the roadmap. Apache Beam's site, on the other hand, mentions that Hazelcast Jet Runner can be used to execute Beam pipelines. SQL support seems to be more complicated: while both Spark and Flink, as well as GridGain / Apache Ignite in-memory platforms, have it, Hazelcast does not. Herrell said there is no clearly defined roadmap for SQL support.

Open source, cloud, and executing on the plan

But there's something more the Sparks and Flinks of the world have that Hazelcast does not, at this point: support for machine learning. Not that Hazelcast cannot be used for this, but it was not built for it, and custom code will be involved, for example to integrate the necessary libraries. Herrell said there are customers doing this, however, and there will be more of it in the future.

Last but not least, Herrell touched upon the open source nature of Hazelcast, the relationship with its community and with cloud vendors, and adding machine learning to the mix. Hazelcast is open core, meaning there is an open source version of its products, with additional enterprise features available in proprietary extensions.

Hazelcast's community looks dominated by contributions made by Hazelcast, and Herrell confirmed this impression: the rest of the community are really users, not contributors. Herrell said this is normal, and due to the complex nature of Hazelcast's distributed platform. He went on to add that this works well for them, and Hazelcast listens to input received from the community. 

Hazelcast Jet API. Jet builds on Hazelcast IMDG.

As for the relationship with cloud vendors, now that Hazelcast is about to cross that bridge too, Herrell emphasized a few points. First, cloud vendors don't all have the same policies, with AWS obviously being the most aggressive among them. While Herrell noted he would happily take a deal similar to the one offered to other open source data vendors by Google, he sketched Hazelcast's take on AWS too.

According to Herrell, AWS's service is meant to address the basic needs of a broad market, and is not specialized. When it comes to in-memory caching, he went on to add, AWS can offer 3 nines availability, but not 5 nines, which is what Hazelcast does. AWS does not offer a managed version of Hazelcast at this point, but even if that happens, Hazelcast will remain consistent and won't change its license.
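
To put the difference between 3 nines and 5 nines in perspective, the following Python sketch works out the downtime each level allows. It assumes availability is measured over a 365-day year, which is our assumption and not something Herrell specified; `downtime_minutes_per_year` is a hypothetical helper name.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(nines):
    """Allowed downtime per year for an availability of N nines."""
    unavailability = 10 ** -nines  # e.g. 3 nines (99.9%) -> 0.001
    return MINUTES_PER_YEAR * unavailability


three_nines = downtime_minutes_per_year(3)  # about 525.6 minutes, or roughly 8.8 hours
five_nines = downtime_minutes_per_year(5)   # about 5.3 minutes
```

Under that assumption, 3 nines leaves room for close to nine hours of downtime a year, while 5 nines allows only a little over five minutes.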

Overall, getting that funding and expanding its product line (and ambitions) seems like a reasonable move for Hazelcast. They have, after all, been doing in-memory processing longer than most of the competition. If anything, we would argue Hazelcast probably should have made that move earlier.

At this point, it seems Hazelcast has some catching up to do. While its funding round will help, and it already has a substantial clientèle to show for it, executing fast and broadening its mindshare beyond its traditional market will determine how this will play out.

NOTE: Article was updated on June 20 2019 to include clarification on the status of Jet's support for Apache Beam.