Fastly is an edge cloud platform provider that, it says, processes about 10% of all requests on the Internet. Azure Data Explorer (ADX -- formerly project "Kusto") is a cloud-based big data analytics platform from Microsoft. ADX is still in public preview, but Fastly has nonetheless teamed with Microsoft to create a customer-facing solution for real-time analysis on high-volume click-stream data, based on ADX.
I was already familiar with Azure Data Explorer, but a conversation with Lee Chen, Fastly's Head of Strategic Partnerships (and its former Head of Product), helped me understand why the Fastly-ADX solution is innovative. The background also helped me better understand what ADX itself is all about. That was very useful insight since, to be frank, the Azure Data Explorer name -- and the service's marketing thus far -- can make it sound like a generic analytics offering.
ADX is not a generic data service though, despite the name. It works with fast data, but it's not a streaming data platform per se -- the Azure Event Hubs service takes on that workload, as do Azure HDInsight Kafka clusters. As shown in the figure at the top of this post, ADX also allows querying and visualizing of its data with a SQL-like language called KQL (Kusto Query Language). But ADX isn't a streaming analytics or data visualization platform either -- Azure Stream Analytics, Azure Databricks and Power BI serve those workloads.
Rather, ADX puts together these capabilities, along with time series analytics features, to perform queries over huge volumes of data -- with response times similar to those of a BI platform over relatively small data sets. Microsoft claims that ADX can "query billions of records in seconds." And Microsoft itself uses ADX to power the Azure Monitor and Azure Time Series Insights services.
These capabilities enable Fastly, which gathers all that click data at edge locations across the Internet, to let customers like Taboola analyze their data in near real-time, or over as much as the prior 7 (soon to be 30) days of historical data. This is no small feat, given that Taboola generates 22 billion records of edge delivery logs -- some 17 TB of data -- per day. Despite those data volumes, Fastly's provision of all log data in real-time from the network edge directly to Azure Blob Storage, combined with ADX's feature set, allows the solution to monitor site performance and troubleshoot issues as they occur.
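To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the two numbers quoted above (22 billion records and 17 TB per day) and assumes, for illustration, that "TB" means decimal terabytes (10^12 bytes) and that traffic is spread evenly across the day, which real click-stream traffic almost certainly is not. The function names are hypothetical and not part of any Fastly or ADX API.

```python
# Back-of-the-envelope arithmetic on the Taboola figures quoted in the article.
# Assumptions (not from the source): TB = 10**12 bytes, and load is uniform
# over the day. Real traffic will have peaks well above these averages.

RECORDS_PER_DAY = 22_000_000_000   # 22 billion edge delivery log records
BYTES_PER_DAY = 17 * 10**12        # 17 TB, assuming decimal terabytes
SECONDS_PER_DAY = 24 * 60 * 60


def daily_log_stats(records_per_day: int, bytes_per_day: int) -> dict:
    """Return average record size and average ingest rates for one day."""
    return {
        "avg_record_bytes": bytes_per_day / records_per_day,
        "records_per_second": records_per_day / SECONDS_PER_DAY,
        "megabytes_per_second": bytes_per_day / SECONDS_PER_DAY / 10**6,
    }


def retained_terabytes(bytes_per_day: int, days: int) -> float:
    """Return raw data volume (in TB) held for a given retention window."""
    return bytes_per_day * days / 10**12


stats = daily_log_stats(RECORDS_PER_DAY, BYTES_PER_DAY)

if __name__ == "__main__":
    print(f"Average record size: {stats['avg_record_bytes']:.0f} bytes")
    print(f"Average ingest rate: {stats['records_per_second']:,.0f} records/s")
    print(f"Average throughput:  {stats['megabytes_per_second']:.1f} MB/s")
    for days in (7, 30):
        print(f"{days}-day window: {retained_terabytes(BYTES_PER_DAY, days):.0f} TB raw")
```

Under those assumptions, the averages come out to a bit under 800 bytes per record and roughly a quarter of a million records per second, and the 7-day and 30-day windows correspond to 119 TB and 510 TB of raw log data. These are raw volumes only; they say nothing about how much space the data occupies once ADX has compressed and indexed it.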
Microsoft provides good detail on the ADX/Fastly solution in a blog post. Taboola describes the solution in its own post as well. Quite frankly, both posts are more than a little promotional. But beyond the marketing, there are some interesting takeaways from this solution. First, the combination of columnar storage and indexing (both of which are implemented by ADX) can produce stunning results. Second, time series analytics on truly big data can actually be straightforward.
But proprietary solutions like ADX might be necessary to get there. While you could string together a custom solution -- using the likes of, say, Apache Kafka and the Spark Streaming component of Apache Spark -- such solutions will involve a lot of complexity and require a variety of skill sets, plus active management to scale the infrastructure, as needed. But ADX can be provisioned on-demand, scaled automatically and, skill set-wise, it requires little more than learning the service's query language. The value provided there is huge. And since the source data can live in cloud storage, it's still query-able by open source technologies like Hadoop and Spark.
Ultimately, if you're doing everything with open source solutions running in Kubernetes clusters, you'll have lots of portability across public clouds and into corporate data centers as well. But time to market/value and project success may be a lot more challenging with purely open source solutions. This crystallizes the cloud data analytics trade-off. Ariel Pisetzky, VP Information Technology at Taboola, said, "Azure Data Explorer, together with Fastly's real-time logging, outperforms our previous solution with a faster update time and an intuitive interactive interface. Plus, it was so simple that we were up and running in a week, ingesting and analyzing 17 TB of data per day."
That calculus won't work for everyone, though. Your team will need to weigh ease-of-implementation vs. lock-in concerns and pick a solution accordingly. No matter what, though, it's good to know that the cloud and newer data technologies are putting solutions like the Fastly one within reach.