A few months ago, Confluent, the company founded by Apache Kafka's creators and the technology's chief commercial backer, launched Project Metamorphosis, an effort to transform the event streaming technology. The challenge was formidable: take Kafka, which began life as a complex technology run on customer-operated clusters of servers, and convert it to a cloud-based service that "just works."
Confluent also organizes the annual Kafka Summit event, which kicks off today as a virtual conference. The company is using the occasion to announce the September release of Confluent Platform 6.0 and with it, a new phase in its cloudy Kafka efforts: Project Metamorphosis – Global. The new phase of the project includes a feature called Cluster Linking that allows customers to stand up multiple Kafka clusters, each potentially in a different geographic region, and federate (or "link") them with minimal effort. Cluster Linking will be available in preview in Confluent Platform 6.0 and in private preview on Confluent Cloud.
Cluster Linking will allow data to be replicated between clusters implicitly and will accommodate implementations where specific data can be assured of staying within specific geographic boundaries. And since it will be a feature in both Confluent Platform and Confluent Cloud, it will work across clusters in multiple public clouds as well as on-premises. In fact, generic clusters running Kafka 2.4 or higher, on non-Confluent distributions, can participate in Cluster Linking as well, albeit in a unidirectional, read-only capacity.
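To make the idea concrete, here is a toy model in plain Python of one-way, read-only replication gated by a data residency rule. It is only a sketch under stated assumptions: it does not use any Kafka or Confluent API, and the `Cluster` class, the `RESIDENCY` table, the `mirror` function and the topic and region names are all hypothetical, invented for illustration rather than taken from Confluent's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for a Kafka cluster (hypothetical, not a real client)."""
    name: str
    region: str
    topics: dict = field(default_factory=dict)   # topic name -> list of records
    read_only: set = field(default_factory=set)  # topics that are mirrors here

    def produce(self, topic, record):
        # A mirrored topic only accepts data from its link, not from producers.
        if topic in self.read_only:
            raise PermissionError(f"{topic} is a read-only mirror on {self.name}")
        self.topics.setdefault(topic, []).append(record)


# Hypothetical residency policy: topic -> regions where its data may live.
# Topics missing from the table are denied everywhere by default.
RESIDENCY = {
    "orders.eu": {"eu-west"},
    "clickstream": {"eu-west", "us-east"},
}


def mirror(source, destination, topic):
    """Copy records the destination lacks, one way; return how many were copied."""
    if destination.region not in RESIDENCY.get(topic, set()):
        return 0  # the policy keeps this topic out of the destination's region
    log = source.topics.get(topic, [])
    copy = destination.topics.setdefault(topic, [])
    destination.read_only.add(topic)
    new_records = log[len(copy):]
    copy.extend(new_records)
    return len(new_records)


eu = Cluster("eu-cluster", "eu-west")
us = Cluster("us-cluster", "us-east")
eu.produce("orders.eu", "order-1")
eu.produce("clickstream", "click-1")
eu.produce("clickstream", "click-2")

# Only the topic allowed in us-east crosses the link.
mirrored = {topic: mirror(eu, us, topic) for topic in ("orders.eu", "clickstream")}
```

In this sketch the European order data never leaves its region, while the clickstream topic shows up on the US cluster as a copy that local producers cannot write to, which is roughly the behavior the announcement describes for geographic boundaries and for unidirectional, read-only participants.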
On the Confluent Cloud side, Metamorphosis features previously in preview, including self-balancing clusters, ksqlDB and infinite data retention, are now generally available (GA) across Google Cloud Platform, Amazon Web Services and Microsoft Azure.
Even Confluent, in its own press release, admits that "connecting Kafka clusters between different environments...is too complex for most organizations...[putting] hybrid cloud and multi-cloud Kafka deployments out of reach." Clearly that's not good enough. With any new technology, while raw innovation grabs everyone's attention at first, it eventually becomes clear that complexity shouldn't be the customer's problem, and that the platform should make the hard things easy.
Since Kafka itself is used to integrate data between systems, it is frustrating that replicating data between Kafka clusters themselves can be so difficult. Confluent's solution with Cluster Linking takes that to heart: it uses a Kafka broker protocol to allow two or more Kafka clusters to replicate data without the need for other components. It's kind of the technology equivalent of practicing what one preaches.
In our socially distanced reality, online systems are more in demand than ever, and they must share data seamlessly, even while respecting data sovereignty restrictions. Under these requirements, features like Cluster Linking are critical. While Metamorphosis may have been Confluent's effort to make Kafka cloud-native, irrespective of COVID-19, the pandemic has certainly validated the wisdom of embarking on that project in the first place.
Along with the Infinite Storage capability announced previously, Cluster Linking may well position Kafka to act as a system of record for event data overall, rather than just a service bus for processing that data.