Apache Kafka, the open source distributed messaging system, has steadily carved out a foothold as the real-time standard for brokering messages in scale-out environments. Confluent, the company whose founders created Kafka, has just released its third annual report on implementation. The report reached a much bigger sample, hinting at growth, while showing some modest changes in how Kafka is being used.
Gartner analyst Merv Adrian has observed that in Silicon Valley, if an idea is on a whiteboard, it must be commonplace; the same could be said of Kafka. There are alternatives: MapR Streams lets you broker messages without requiring a separate Kafka cluster, while streaming services such as Amazon Kinesis Firehose offer similar capabilities. Nonetheless, Kafka has become the de facto standard for highly distributed, high-volume, real-time message queuing, with wide vendor support. But when we reviewed Kafka a year ago, we found that the tooling was still primitive.
So it shouldn't be surprising that take-up is still largely the domain of early adopters. The survey sample, which doubled this year to 600 respondents, clearly skewed toward organizations that are ahead of the curve. Case in point? 78% of them are already using microservices architectures, and 63% of them are using Kafka to manage state for those microservices. In the general population, you won't find a majority of enterprises redesigning their application stacks to expose functionality as microservices.
Nor should it be surprising that the most represented sectors in the sample were the usual early adopter suspects: computer systems, financial services, and media and entertainment.
Nonetheless, the data provides a useful glimpse into where first-generation Kafka implementation is headed. While 30% of the sample fell into the lowest volume tier (below a million messages daily), an almost equal proportion, significantly, reported handling up to 99 million messages. Among early adopters, a sizable portion is putting Kafka to the stress test.
Over 60% are using Kafka to replace legacy messaging and pub/sub systems, while just under half are using it to transform ETL from a batch process into a real-time one.
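To see why one log-based broker can absorb both of those roles, consider a toy sketch of Kafka's core idea: an append-only log with consumer-managed offsets. This is plain Python, not the actual Kafka client API, and the class and topic names are illustrative. Because the broker never deletes a record on delivery, many consumers can read the same topic independently (pub/sub semantics), or share one offset to divide work (queue semantics), which is what lets Kafka stand in for both kinds of legacy system.

```python
# Toy model of a single Kafka partition (NOT the real Kafka API):
# an append-only log, with each consumer tracking its own offset.

class TopicLog:
    """Stands in for one partition of a Kafka topic."""
    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset):
        # Records are retained, not deleted on delivery.
        return self._records[offset:]


class Consumer:
    """Each consumer owns its offset, so consumers can replay or
    read independently -- unlike brokers that delete on delivery."""
    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self):
        records = self._log.read(self.offset)
        self.offset += len(records)
        return records


log = TopicLog()
for event in ["order-1", "order-2", "order-3"]:
    log.append(event)

a, b = Consumer(log), Consumer(log)
print(a.poll())  # ['order-1', 'order-2', 'order-3']
log.append("order-4")
print(a.poll())  # ['order-4'] -- only what's new since a's offset
print(b.poll())  # ['order-1', 'order-2', 'order-3', 'order-4']
```

The same retained log also underpins the real-time ETL use case: a transformation job is just another consumer that reads from one topic and appends results to another.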
Since last year, there have been some changes in how Kafka is used. While data pipelines proved the most popular use this year, last year it was all about streaming. Microservices event processing made a new appearance this year, but as it was not on last year's survey, that could be a sampling artifact. There were some similarities between this year and last, though: in both years, half reported using Kafka for messaging, with streaming and data integration close behind.
When it comes to taking advantage of Kafka's streaming capabilities through the Streams API, asynchronous applications and ETL were the top uses; but for what is an inherently real-time process, it was surprising that barely over 10% of the sample were using streams with IoT.
As an integration framework, just under half the respondents used Kafka Connect to integrate with Elasticsearch; behind that, roughly 25% to 30% of respondents reported connecting to PostgreSQL, HDFS, Amazon S3, and Cassandra.
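For readers unfamiliar with Kafka Connect, an integration like the Elasticsearch one is typically declared as a small JSON payload posted to the Connect REST API rather than written as code. The fragment below is a hedged sketch: it assumes Confluent's Elasticsearch sink connector is installed on the Connect workers, and the connector name, topic, and connection URL are placeholders, not values from the survey.

```json
{
  "name": "elasticsearch-sink",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "1",
    "topics": "orders",
    "connection.url": "http://localhost:9200",
    "key.ignore": "true"
  }
}
```

Connect then handles offset tracking, retries, and scaling across workers, which is much of why it has become the default way to fan Kafka topics out to stores like Elasticsearch, S3, and Cassandra.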
So what types of use cases were most popular? Not surprisingly, they differed by industry. For the e-commerce, media, and entertainment sectors, recommendation engines were the most common use of Kafka, while computer hardware and software firms were more likely to apply it to security and fraud detection. The one surprise was financial services, another early hotbed of Kafka adoption, where security and fraud did not stand out; instead, the most common were generic "financial data" use cases, the most obvious of which would be real-time ticker feeds.
Reinforcing the point that these are still early days, over three quarters of respondents noted that Kafka skills are hard to find. Despite broad industry support for Kafka, there remain hurdles to getting it off the whiteboard and into production.