Confluent shows open source, paradigm shifts, cloud, and commercial success can all co-exist

Confluent just became a unicorn. We discuss why, what happens from now on, and how this is significant for the entire data ecosystem and the world at large with CEO Jay Kreps.

Confluent, the vendor offering a commercial version and services for the open source Apache Kafka platform, just received $125 million in funding. This has launched Confluent into unicorn territory with a valuation of $2.5 billion. You probably know this by now, and if you've been following, you also know why and how Confluent got to this point. 

ZDNet has been keeping track of Kafka and Confluent's evolution, and the news was a good opportunity to catch up with Jay Kreps, Confluent CEO. Here is the lowdown on how Kafka will evolve from now on, the latest updates on the data streaming landscape, and last but not least, what this all means for the cloud and open-source software.

The story so far

Getting funding of this magnitude, and from investors of this kind, is about more than money. It's mostly about an affirmation of the value proposition and the business model. Confluent must be doing something right with Kafka. Not bad at all for a company for which the common mantra was that it's not mainstream enough. Apparently, the prospects for its subscription-based offering looked good to investors.

Kafka acts as the gateway for continuous, real-time data processing. It supports processing data as a stream of events, rather than as discrete data points. This is a paradigm shift, and it enables building real-time applications. While the concept is not entirely new, as message buses, for example, have been around for a while, a couple of things have made Kafka stand out.

Confluent's growth has been based on a subscription business model

The first thing is the scale at which Kafka can do this. The difference, according to Kreps, is that older systems were not able to handle the scale that Kafka can: "We can scale to trillions of messages. New style, cloud data systems are just better at this, such techniques did not exist before. We benefited as we came around a bit later."

The second thing is that Kafka does more than direct traffic, or messages, to the right recipient with speed and reliability. It can also apply transformations and business logic to message payload. This is crucial, as it adds a real-time processing layer. It essentially means that Kafka can act as the host for microservices-based applications, and this is something Kreps has mentioned as a strategic goal in the past.
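To make the idea of applying business logic to messages in flight more concrete, here is a minimal sketch in plain Python. It does not use Kafka or any of its client APIs; the event shape, the `enrich` function, and the 1,000 threshold are hypothetical and chosen purely for illustration.

```python
from typing import Dict, Iterable, Iterator

Event = Dict[str, object]

def enrich(event: Event) -> Event:
    """Hypothetical business logic: flag payments of 1,000 or more."""
    amount = float(event["amount"])
    return {**event, "large": amount >= 1000.0}

def process(stream: Iterable[Event]) -> Iterator[Event]:
    """Transform each event as it arrives instead of waiting for a batch."""
    for event in stream:
        yield enrich(event)

# A small in-memory list stands in for an unbounded stream of events.
incoming = [
    {"user": "a", "amount": 250.0},
    {"user": "b", "amount": 1200.0},
]
flagged = [e["user"] for e in process(incoming) if e["large"]]
print(flagged)
```

The point of the sketch is only that the transformation runs per event, as part of the flow, rather than in a separate step after the data has landed somewhere.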

Kreps affirmed that Kafka has been very successful with microservices use cases. He mentioned a survey on microservices adoption [PDF] that shows over 50 percent of companies considering microservices are adopting Kafka, as well as a few examples of Kafka adoption as a microservices platform, such as Capital One and Yelp. Kreps added that they are also focused on winning in the stream processing space, which means advances to things like KSQL.

Real time data processing: A paradigm shift, and a new ecosystem

KSQL is among Kafka's latest additions and greatest achievements. It brings the ability to use SQL syntax for real-time processing, which makes things a lot easier. Interestingly, this is an area in which Kafka competes with other platforms. Architectures for real-time processing often have Kafka as the entry point, with other platforms picking up events downstream and applying further processing.
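As a rough illustration of what a query over a stream means, the sketch below keeps a running count per key and emits an updated result for every incoming event, which is approximately what a grouped count in a SQL-style streaming query expresses declaratively. This is plain Python, not KSQL and not how KSQL is implemented; the `running_counts` helper and the click events are assumptions made up for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

def running_counts(
    events: Iterable[Dict[str, str]], key: str
) -> Iterator[Tuple[str, int]]:
    """Emit (key value, count so far) each time an event arrives."""
    counts: Counter = Counter()
    for event in events:
        counts[event[key]] += 1
        yield event[key], counts[event[key]]

# Hypothetical page-view events standing in for a live stream.
clicks = [{"page": "/home"}, {"page": "/pricing"}, {"page": "/home"}]
updates = list(running_counts(clicks, "page"))
print(updates)
```

Unlike a query against a table, the result here is never final: every new event produces a new answer, which is the shift the SQL-on-streams approach asks users to get used to.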

One of these platforms is Apache Samza, which, like Kafka, originated at LinkedIn. Samza recently released version 1.0, introducing SQL support as well as an updated API based on Apache Beam. Apache Beam is an open-source project that provides a unified API, allowing pipelines to be ported across execution engines including Samza, Spark, or Flink. Kafka does not support Beam, and we wondered whether this changes things somehow. Not really.

Kreps sees Samza as more of a precursor to the stream processing features now in Kafka: "It was built and open sourced by my team when I was at LinkedIn. It still exists, as these open source projects always do, but Kafka's stream processing capabilities have taken that to the next level both in terms of the scope of adoption, the transactional capabilities, the flexibility, or the capabilities the API provides."

The other recent development is that data Artisans, the vendor driving Apache Flink, one of the other platforms in the real-time data processing space, was acquired by Alibaba for €90 million. Kreps said he was really happy for the folks at data Artisans to have found what seems to be a good exit, and that he wishes them success at Alibaba:

"We don't see the stream processing space as being zero-sum. The vast majority of users of Flink use it with Kafka as the source of their stream. So whether people do their stream processing with Kafka's own stream processing capabilities or with an external processing layer like Flink on top of Kafka, we're excited to see people adopt and build this kind of event streaming platform," Kreps said.

On cloud number 9

Indeed, this is not a zero-sum game, and each platform can have its own place in technical architectures. But what is perhaps the most interesting aspect of this is the antagonism between open source platform providers and cloud vendors. data Artisans has been acquired by a cloud vendor, and we've seen why this made sense for both sides. But as Confluent wants to keep growing in the cloud, it finds itself in the middle of this conflict. 

Kreps noted that hybrid on-prem and cloud deployments are an enterprise reality, so they will continue to meet customers where they are: 

"We're continuing to drive our cloud strategy forward and for customers in production, which means additions like management and monitoring tools.

We want to provide a streaming data service to support all customers on their journey to an event-driven enterprise. Providing a product both on-prem and in the cloud is essential to serving a modern company that is operating in hybrid and globally distributed environments."

Hybrid cloud and multi cloud strategies are the new normal, and database workloads are gravitating towards them. Image: Tom's IT Pro

This is a reality for every data platform vendor, not just for Confluent. Kubernetes promises to become the de facto operating system for multi-cloud and hybrid cloud deployments, and Kafka has taken steps to support it. We asked Kreps what he sees next, and whether we can expect Confluent, for example, to join the Cloud Native Computing Foundation. Kreps said that Kafka coupled with Kubernetes can make streaming data ubiquitous.

Recently, however, AWS announced a managed version of Kafka, and Confluent joined the ranks of open-source vendors that have changed their licenses to make sure this does not happen. We asked Kreps to what degree this was part of their own strategy, and to what degree it has been implicitly "dictated" by the need to present a viable business model to investors. How has the community reacted? What about cloud vendors?

Open source in the real world

Kreps noted that Kafka's license actually hasn't changed. What has changed, he said, is the license for the components they produce around Kafka. In the blog post that made this change public, Kreps defended this decision against two potential allegations, diametrically opposed to each other: that Confluent is doing this to save its faltering business, or to ensure more profits. Kreps once more asserted that business is going really well.

However, he went on to add, they think the right way to build fundamental infrastructure layers is with open code: 

"As workloads move to the cloud we need a mechanism for preserving that freedom while also enabling a cycle of investment, and this is our motivation for the licensing change.

There is no standardized, 'limited purpose' license yet. We went with a solution that grants most of the same rights as does Apache 2.0. We defined our excluded purpose as narrowly as possible to accomplish our purpose. If a standard solution in the space emerges, we're open to considering that."

So, is this a legitimate and justifiable move? Is this still open source? Kreps argued that there is open-source software that does not need this, as loose contributions and organizational structure are enough. But the kind of software Confluent is building is really hard and requires an organization to support it, and the organization requires a viable business model. This makes sense to us. The Open Source Initiative does not seem to agree, but this may be a narrow-minded view.

Jaron Lanier has made the case that the "everything for free" line of thought, prevalent among the people behind the establishment of the early internet, has led to the kind of twisted ad-based business models and oligopolies that are, in turn, behind many of the issues we are dealing with now. Making the same mistake again with the cloud would not be wise.

Since there's no such thing as a free lunch, the more rational, fair and clear business models are, the better for everyone in the long run. The clash of interests between open source commercial platforms and cloud providers seems like a case of (mostly) intellectual property versus (mostly) material resources and capital. 

Open source vendors like Confluent have shown that not only is it possible to build a better product using an open source model, but it is also possible to build a viable business on it. Cloud providers who contribute nothing to open source, and yet seek to profit from it by repackaging it as a service that directly competes with services offered by its creators, should not be allowed to do this.

Failure to realize this undermines the interests of everyone who contributes to open source. Having viable open source business models is essential for the proliferation of open source. The Open Source Initiative should get over its dogma, and work with open source vendors and contributors to come up with a workable solution.
