Elasticsearch 6.0: not that new, but quite improved

Elasticsearch is an open source platform that is about a lot more than just search. It encompasses a whole stack of solutions and is growing rapidly. Today its new version is out, and its CEO discusses the way forward.

Shay Banon has been called a person who has written more code than is humanly possible. That output has taken him from working on a search solution in his spare time to building an open source framework, and a global company around it, with clients such as eBay and Verizon.

Elasticsearch has come a long way, and Elastic, the company behind it, is about a lot more than search. Today Elastic is announcing version 6.0 of what is now an entire stack built around the core premise of search, and Banon, the recently appointed CEO, discussed with ZDNet the past, present, and future of Elasticsearch and the trends shaping the industry.

Elasticsearch as a system of record

Our conversation did not start with the new features of version 6.0. If you are part of the Elastic community, you may already know them. If not, you may not be very impressed at first sight. This is interesting in and of itself, but we thought it would be a good idea to shed some light on what Elasticsearch can and cannot do before embarking on a detailed discussion of the new features.

When Banon started working on Elasticsearch, it was all about storing JSON and having a powerful search language. That was eight years ago, and as he notes, "NoSQL was all the rage. To me Elasticsearch was something I was passionate about, so I did not want to be part of any hype cycle. That would be shifting the focus away from the value Elasticsearch can bring as a very powerful search solution.

"People have been asking: can I replace my MongoDB, or my Oracle database, with Elasticsearch? Can it work as a system of record? My answer has always been, if you place Elasticsearch next to any of these systems, be it Cassandra or Hadoop or whatever, it will bring value. It has this angle of how to solve challenges under a search prism that no other system has. But our goal is not to replace these systems."

This "system of record" discussion has been an ongoing one for Elasticsearch. Analyses of previous versions done as part of the Jepsen project revealed conditions under which Elasticsearch may lose data. Even today, Kyle Kingsbury, Jepsen's mastermind, says "I would not use this as a system of record, so you would put your data in S3 or Postgres and have a replication tool so that it reiterates the data."


Not every database is entirely reliable. In fact, most are not. Elasticsearch is not even a database in the strict sense, so it should not be used or judged as one. Image: Jepsen / Kyle Kingsbury

Banon seems to agree with that in his own way. He acknowledges Kingsbury's contribution in pointing out deficiencies in how Elasticsearch handles sharding, and says Elastic has worked with him to address them; this work has made a lot of progress and is openly documented. And Banon would not advise anyone to use Elasticsearch as the core system for storing financial transactions.

In the end, Banon concedes, Elasticsearch has not been around for as long as the Oracles of the world, which by definition makes it less mature. Of course, as he notes, if your data in Elasticsearch gets lost or corrupted, that makes for a bad user experience, so they are working on resilience.

For Banon, however, resilience is not only about distributed algorithms and sharding, but also about things such as stability and memory footprint: "if you end up writing to a system that causes your runtime to pause, it's indistinguishable from a network partition. We have invested heavily in this area and there are many improvements in 6.0."

Elasticsearch 6.0

One such improvement Banon highlights is based on something called sequence IDs: the ability of a primary shard and its replica shards to reach consensus on the order of operations. Banon says this greatly improves the ability to keep the copies of the data consistent, and helps address a gap Elasticsearch has had historically.
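To make the idea of sequence IDs more concrete, here is a toy Python model; it is an illustration, not Elasticsearch's implementation. It assumes, as a simplification, that the primary stamps every write with an increasing sequence number and that each replica tracks a "local checkpoint": the highest sequence number up to which it has processed every operation. The minimum checkpoint across copies (a "global checkpoint") marks the operations every copy is known to have, so a lagging copy only needs the operations above its own checkpoint. Elasticsearch's documentation uses the checkpoint terminology, but all class and function names below are hypothetical.

```python
# Toy model of sequence numbers and checkpoints (illustration only, not
# Elasticsearch code). All class and function names are hypothetical.


class Primary:
    """Stamps every write with a monotonically increasing sequence number."""

    def __init__(self):
        self.next_seq_no = 0

    def index(self, doc_id, value):
        op = (self.next_seq_no, doc_id, value)
        self.next_seq_no += 1
        return op


class Replica:
    """Applies operations that may arrive out of order."""

    def __init__(self):
        self.docs = {}           # doc_id -> value
        self.doc_seq_no = {}     # doc_id -> seq_no of the value we hold
        self.processed = set()   # every seq_no seen so far
        self.local_checkpoint = -1

    def apply(self, op):
        seq_no, doc_id, value = op
        # Keep only the newest write per document, so a delayed older
        # operation cannot overwrite a newer value.
        if seq_no > self.doc_seq_no.get(doc_id, -1):
            self.docs[doc_id] = value
            self.doc_seq_no[doc_id] = seq_no
        self.processed.add(seq_no)
        # The local checkpoint is the highest seq_no such that every
        # operation up to and including it has been processed.
        while self.local_checkpoint + 1 in self.processed:
            self.local_checkpoint += 1


def global_checkpoint(copies):
    """Operations at or below this seq_no are present on every copy."""
    return min(c.local_checkpoint for c in copies)


def missing_ops(history, replica):
    """Operations a lagging copy still needs: above its checkpoint, unseen."""
    return [op for op in history
            if op[0] > replica.local_checkpoint and op[0] not in replica.processed]


if __name__ == "__main__":
    primary = Primary()
    history = [primary.index("a", 1), primary.index("b", 2), primary.index("a", 3)]
    r1, r2 = Replica(), Replica()
    for op in history:
        r1.apply(op)
    r2.apply(history[2])  # arrives early
    r2.apply(history[0])  # operation 1 is still in flight
    print(r1.local_checkpoint, r2.local_checkpoint)  # 2 0
    print(global_checkpoint([r1, r2]))              # 0
    print(missing_ops(history, r2))                 # [(1, 'b', 2)]
```

In this sketch the replica that missed operation 1 can be brought up to date by replaying a single operation rather than copying everything, which is the kind of coherency gain the paragraph above describes.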

Another area Banon highlights is what he calls circuit breakers: better detection of requests that end up consuming lots of resources, so they can be isolated without bringing down a cluster. He says a lot of work has gone into the ability to track and stop queries when needed, as well as into using Java off-heap memory techniques and structures. As a result, the memory footprint today is much smaller than it used to be.
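The circuit-breaker idea can be sketched in a few lines of Python. This is a minimal toy, assuming a single memory budget and a made-up cost of 64 bytes per aggregation bucket; Elasticsearch's real breakers estimate memory differently and come in several kinds, and every name below is hypothetical. The point it shows is that a request reserves its estimated memory before allocating it and is aborted with an error once the budget would be exceeded, while the node stays available for other requests.

```python
# Toy memory circuit breaker (illustration only). The 64-bytes-per-bucket
# estimate and all names are hypothetical, not Elasticsearch settings.


class CircuitBreakingError(Exception):
    pass


class MemoryCircuitBreaker:
    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def reserve(self, label, n_bytes):
        """Account for memory before allocating it; trip if over budget."""
        new_total = self.used_bytes + n_bytes
        if new_total > self.limit_bytes:
            raise CircuitBreakingError(
                f"[{label}] would use {new_total} bytes, limit is {self.limit_bytes}")
        self.used_bytes = new_total

    def release(self, n_bytes):
        self.used_bytes -= n_bytes


def terms_aggregation(breaker, label, terms, bytes_per_bucket=64):
    """Count terms, reserving an estimated cost for every new bucket."""
    counts = {}
    reserved = 0
    try:
        for term in terms:
            if term not in counts:
                breaker.reserve(label, bytes_per_bucket)
                reserved += bytes_per_bucket
                counts[term] = 0
            counts[term] += 1
        return counts
    finally:
        # Return the memory whether the request finished or was stopped.
        breaker.release(reserved)


if __name__ == "__main__":
    breaker = MemoryCircuitBreaker(limit_bytes=1000)
    try:
        terms_aggregation(breaker, "runaway", (f"user-{i}" for i in range(10_000)))
    except CircuitBreakingError as e:
        print("stopped:", e)
    # The node is still usable for well-behaved requests afterwards.
    print(terms_aggregation(breaker, "small", ["x", "y", "x"]))  # {'x': 2, 'y': 1}
    print(breaker.used_bytes)                                   # 0
```

Checking the budget incrementally, rather than once up front, is what lets the sketch stop a runaway request partway through instead of waiting until it has already exhausted memory.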

Many other improvements fall into that category as well: things that require expert knowledge not just to implement, but also to comprehend and evaluate the impact of. Index sorting, for example, spends extra time when documents are indexed and in return can significantly boost query-time performance. Another feature, sparse doc values, changes the way sparsely populated fields are stored, resulting in storage savings of between 30 and 70 percent.
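The index-sorting trade-off can be illustrated with a toy Python comparison; it is not Lucene's implementation. It assumes a segment holds documents with a timestamp and that a query asks for the k most recent ones. Without index sorting every document has to be visited; if the segment was sorted by timestamp when it was written (Elasticsearch 6.0 exposes this through the index.sort.field and index.sort.order index settings), collection can stop after the first k documents. The function names are hypothetical.

```python
# Toy comparison of top-k retrieval with and without index-time sorting
# (illustration only; real Lucene segments are far more involved).
import heapq


def top_k_unsorted(segment, k):
    """Documents stored in arrival order: every one must be visited."""
    heap, visited = [], 0  # min-heap of (timestamp, doc_id)
    for doc in segment:
        visited += 1
        item = (doc["timestamp"], doc["id"])
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)
    return [doc_id for _, doc_id in sorted(heap, reverse=True)], visited


def build_sorted_segment(docs):
    """The extra index-time work: sort the segment once, newest first."""
    return sorted(docs, key=lambda d: d["timestamp"], reverse=True)


def top_k_presorted(segment, k):
    """Segment already sorted by timestamp desc: stop after k documents."""
    result, visited = [], 0
    for doc in segment:
        if len(result) == k:
            break  # early termination: later docs cannot rank higher
        visited += 1
        result.append(doc["id"])
    return result, visited


if __name__ == "__main__":
    docs = [{"id": i, "timestamp": (i * 37) % 101} for i in range(100)]
    print(top_k_unsorted(docs, 3))                          # ([30, 60, 90], 100)
    print(top_k_presorted(build_sorted_segment(docs), 3))   # ([30, 60, 90], 3)
```

The sketch mirrors the trade-off in the text: build_sorted_segment is work paid at indexing time, and top_k_presorted only pays off when the query's sort order matches the order the index was sorted in.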


Index sorting is a new feature in Elasticsearch 6.0. It takes some engineering skills and a long blog post to get it, but the end result is better performance. Image: Elasticsearch

In the end, if you don't spend the time to dig into these new features, there's a good chance you may remain unimpressed by Elasticsearch 6.0. Although Banon says the new version has been progressively shared with and explained to the community through a series of blog posts, he acknowledges that not everyone will necessarily have the time and energy for that.

For the record, other new features in 6.0 are spread across the rest of the Elastic stack, which besides Elasticsearch itself comprises Kibana, Beats and Logstash: Elastic's solutions for visualization and dashboards, data ingestion, and log collection and processing, respectively. The Elastic stack is complemented by X-Pack, a premium set of features that includes things like graph visualization and anomaly detection via machine learning.

Listening to users, charging to the future

Elasticsearch started as a modest solution centered around making Lucene, the open source framework for indexing and search which is heavily used to this day, usable for efficient search on JSON. Discussing with Banon the progression that has led to where Elasticsearch is today, it becomes clear that what he sees as the key to Elasticsearch's success is also the reason you may remain unimpressed with the new features.

For Banon it's always been about connecting with and listening to the community. "One of the things I've learned about building a successful open source company is that you need to be a good listener," he says. "After releasing core Elasticsearch, it was clear that people wanted to have visualization and dashboards on top of that. So we brought Kibana in-house and made it part of our stack.

"When I started working on Elasticsearch, I never imagined one day storing logs would be part of it. But people started doing that, and today we are the number one open source solution and in fact a system of record for this. People are happy with that; our solution works much better than Splunk, for example."


Elasticsearch is open source and treasures the relationship with its community. But not everyone in that community is entirely happy about every aspect of it. Image: Logz.io

This has been pretty much the story of how Elasticsearch has grown, and apparently it will continue to be. Banon does not believe in going away and coming back with radically new things that ask people to take a bet on them, but rather in taking progressive steps. Elasticsearch has embraced things such as the cloud and machine learning, but is not going all-in on them either.

When discussing the move to the cloud, Banon says Elasticsearch was designed to work with AWS from the start, and this has contributed a great deal to its success. Today Elasticsearch also runs on Azure, Google Cloud and Alibaba Cloud, and Elastic has partnerships with the latter two. There are not many enterprise software providers with that kind of presence in China, and Banon sees this as validation of the strategy.

Still, he emphasizes that for them it's all about empowering users: "when we made the move to offer a managed version in the cloud 3.5 years ago, it was not to force our users there, but rather to be there for them. They can run Elasticsearch on whatever cloud they want, or use our managed version, or run on premises. We don't want to leave anyone behind, and with Elastic Cloud Enterprise we run the same code our users run."

As for the move to IPaaS platforms and machine learning (ML), Banon says IPaaS is very much in accord with what they do. The progression towards analytics is happening in Elasticsearch too: ML technology from the recent acquisition of Prelert has been incorporated into the stack. Initially it is used for anomaly detection; Banon says it is already seeing great adoption, and the next step is to add forecasting capabilities.

Other areas Elasticsearch will target next are application performance monitoring, fraud detection, security analytics and taking visualization up a notch. This is clearly a move up the stack to domain-specific applications, which may bring a new set of challenges, as Elasticsearch will have to compete against incumbents. Banon, however, believes in the strategy that has been paying off so far:

"Five years ago we were a small company with a relatively popular open source product, and look where we are today. The way we did it is that we embrace users and listen to them, and make sure that when they innovate on top of our platform, this will soon find its way to the platform. If we as a company behave in the same way that we have been, I have no worries."