Today, at a dedicated online event for Azure Data Explorer (ADX), Microsoft is announcing numerous enhancements to the service, including a next-gen release of the underlying engine and an array of integration points that should make it more accessible, more enticing and more useful. ADX, which is most often used for telemetry data lake workloads and analytical solutions as a service, will now run even faster overall than it had, will have numerous optimizations and will connect with a variety of other data services, streaming data sources and data visualization solutions. This will help a service that's been very successful but not especially well-known, even among Azure analytics experts, achieve more mainstream appeal.
Performance gains galore
What new features are coming to ADX? To start with, Microsoft's introducing a new version of the core engine (in preview, with GA expected in February), which takes a wholly different strategy to querying data. The Kusto v3 engine will generate multiple versions of the desired query, use the fastest, and compile it to native code before executing, so that it runs at maximum speed. The indexing layer in the v3 engine has also been rewritten. As a result of these changes, Microsoft says queries will run between 2x and 30x faster.
And beyond this raw performance gain, ADX will now offer self-refreshing materialized views, query result set caching and configurable sharding/partitioning. Near real time scoring with machine learning models -- including those hosted on Azure Machine Learning as well as those from other platforms, packaged in ONNX format -- is being added as well. Fast Fourier Transforms, geospatial joins and polynomial regression are onboarding too. ADX is also getting row-level security capabilities that will make it more appealing to customers who want to support a wide range of users, some of whom may not merit unfettered access to all data.
On the data integration side, ADX now sports an adapter for Apache Kafka that is Gold Certified by Confluent, the company founded by Kafka's creators. There's also now integration between Fluent Bit and ADX, via Azure Blob Storage, to which Fluent Bit can now deliver data and from which ADX is (and has been) able to ingest it, automatically. Also, ADX's 1-click ingestion and streaming ingestion features that Microsoft had already released in preview are now generally available (GA).
For data visualization, ADX will now offer a native dashboard facility, where the visualizations returned from data exploration queries can be pinned as tiles. This feature had previously been released in preview in June of this year; and such a dashboard, taken from the feature's documentation, is shown in the image at the top of this post. Beyond these native dashboards, ADX will also integrate with Grafana, through a plugin that will now offer a graphical query builder.
Specifically in the Microsoft ecosystem, ADX will now be query-able from Azure Data Studio (which up until now has mostly been a tool for working with SQL Server); will integrate with Azure Data Share; will support Vnet and parameterized DirectQuery integration with Power BI; and, through a data connector, will serve as a linked service to Azure Synapse Analytics. Moreover, Microsoft's roadmap includes the further integration of ADX as a fully native Synapse service, ostensibly in the same way Azure Data Factory, or even Apache Spark, is today.
- Azure Synapse Analytics combines data warehouse, lake and pipelines
- Azure Synapse Analytics: A progress report
- Azure Synapse Analytics data lake features: up close
It was already cool
It's important to keep in mind that all of this new power and versatility is being layered atop a service that was already massively powerful. ADX is the commercialization of Microsoft's internal "Kusto" technology that powers Microsoft services like Azure Monitor, Microsoft Intune, Azure Time Series Insights, and Dynamics 365 Product Insights. Even if a well-kept secret, it was a groundbreaking and innovative cloud service, from inception.
Microsoft has said that the current version (v2) of the Kusto engine can run queries over one billion rows in under one second. That performance is so good that the claim almost sounds like hyperbole, which may explain why not all customers could appreciate ADX's power and immediately put it to use. Nonetheless, ADX runs across a total of over 1 million CPU cores in the Azure cloud, ingests new data at the rate of 35 Petabytes a day, and now stores a cumulative 2+ Exabytes.
Growth Begets Growth
One Exabyte, by the way, is a thousand Petabytes, or the equivalent of one million Terabytes. And one Exabyte was where ADX's cumulative data volume was in January of this year, according to Microsoft. In other words, ADX's total data under management has fully doubled in the last 9 months, no doubt facilitated by the COVID-19 pandemic and its acceleration of digital transformation.
Clearly, then, ADX didn't exactly need a "shot in the arm," but the enhanced performance, capabilities and integration will give it enhanced visibility as well. A stronger Azure Data Explorer should make for a stronger Synapse Analytics, a stronger Azure Machine Learning, a stronger HDInsight and a more valuable Azure Data Lake Storage layer. Stronger synergies between services can make each service more valuable on its own. Synapse Analytics had already been doing that for Azure; hopefully, ADX can further enhance that synergistic effect.