Microsoft has an array of components in its cloud for data management and analysis. There are the obvious ones, like Azure SQL Database, Machine Learning and HDInsight; more infrastructural ones, like Azure Data Factory and Azure Stream Analytics; IoT-related services, like Azure IoT Hub; and several others you may not even know much about.
So, while it's easy to see how Microsoft has its bases covered, it's less obvious whether and how these services fit together. And testimonials from customers using a great many of these services in an integrated fashion may seem few and far between.
I found one though: MediaBrix, a five-year-old mobile advertising technology (AdTech) firm, based right near me in New York City, and focused on "moment-based targeting." I spoke with MediaBrix's CTO, Christopher Beach, who explained in nerdy detail which parts of the Azure data stack his company is using, and how they're tied together.
Before Beach's arrival, MediaBrix was on Azure, but using a different collection of services and technologies. The new scheme retains elements of the old one, but modernizes and consolidates things quite a bit. Doing so has cut MediaBrix's monthly spend in half, from roughly $10K to roughly $5K.
The data pipeline
While MediaBrix collects data into its 50TB corpus around the clock, it tends to process that data in batch jobs that currently run during just a fraction of the day. Data sources include telemetry data, device performance data, ad performance feeds and data from Data Management Platforms (DMPs, which are essentially advertising, audience and marketing data warehouses) such as Oracle's BlueKai and others.
A lot of the data is ingested via Azure Event Hubs and Stream Analytics, then landed in Azure Blob storage, where it can be processed in different ways. For one, it can be brought into Azure SQL Data Warehouse, using the PolyBase technology that is built into that platform. MediaBrix uses Azure Premium Storage with SQL DW, to keep performance reliably consistent. Once it's in SQL DW, the data can be queried directly via T-SQL and can also be aggregated for further analysis into SQL Server Analysis Services multidimensional cubes.
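For readers who haven't used PolyBase, the Blob-to-warehouse step typically comes down to a handful of T-SQL statements: an external data source pointing at the storage container, a file format, an external table laid over the raw files, and a CREATE TABLE AS SELECT (CTAS) that copies the data into a distributed warehouse table. The Python sketch below simply assembles those statements as strings. It is not MediaBrix's script; the storage account, container, folder, credential, table and column names are all hypothetical, and it assumes the raw files are comma-delimited and that a database scoped credential already exists.

```python
from typing import List, Tuple


def polybase_load_script(account: str, container: str, folder: str,
                         credential: str, table: str,
                         columns: List[Tuple[str, str]],
                         distribution_column: str) -> List[str]:
    """Build T-SQL that exposes CSV files in Blob storage as a PolyBase
    external table, then loads them into a SQL DW table via CTAS.

    All names passed in are hypothetical placeholders.
    """
    column_list = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    source = f"{table}Source"
    file_format = f"{table}Csv"
    external = f"{table}External"
    return [
        # Points PolyBase at the Blob container. Assumes the database scoped
        # credential named by `credential` was created beforehand.
        f"CREATE EXTERNAL DATA SOURCE {source}\n"
        "WITH (TYPE = HADOOP,\n"
        f"      LOCATION = 'wasbs://{container}@{account}.blob.core.windows.net',\n"
        f"      CREDENTIAL = {credential});",
        # Describes the raw files as comma-delimited text.
        f"CREATE EXTERNAL FILE FORMAT {file_format}\n"
        "WITH (FORMAT_TYPE = DELIMITEDTEXT,\n"
        "      FORMAT_OPTIONS (FIELD_TERMINATOR = ','));",
        # Schema-on-read table over every file under the folder; no data moves yet.
        f"CREATE EXTERNAL TABLE {external} (\n    {column_list}\n)\n"
        f"WITH (LOCATION = '/{folder}/', DATA_SOURCE = {source}, "
        f"FILE_FORMAT = {file_format});",
        # CTAS copies the data into a hash-distributed, columnstore-indexed table.
        f"CREATE TABLE {table}\n"
        f"WITH (DISTRIBUTION = HASH({distribution_column}), "
        "CLUSTERED COLUMNSTORE INDEX)\n"
        f"AS SELECT * FROM {external};",
    ]


if __name__ == "__main__":
    for stmt in polybase_load_script(
            account="examplestorage", container="raw", folder="adevents",
            credential="BlobCredential", table="AdEvents",
            columns=[("EventTime", "DATETIME2"), ("DeviceId", "NVARCHAR(64)"),
                     ("Impressions", "INT")],
            distribution_column="DeviceId"):
        print(stmt, end="\n\n")
```

Once the CTAS completes, the loaded table is plain T-SQL territory, and it can serve as the source for the Analysis Services cubes.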
One of the neat things about Azure SQL DW is that it can be paused, which is to say shut down for long durations, causing compute charges to cease while storage charges persist until it is resumed. While I have personally been a bit skeptical about adoption of this feature, MediaBrix uses it liberally: in fact, the company only keeps its Azure SQL DW up for about five and a half hours each day.
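The arithmetic behind that schedule is worth spelling out. Only compute stops billing while the warehouse is paused, so the compute share of the bill scales with hours online per day. The sketch below assumes a flat, hypothetical hourly compute rate and a 30-day month; it uses no real Azure prices and ignores storage, which is billed either way.

```python
HOURS_PER_DAY = 24.0


def _check_hours(hours_online_per_day: float) -> None:
    """Reject schedules that don't fit in a day."""
    if not 0.0 <= hours_online_per_day <= HOURS_PER_DAY:
        raise ValueError("hours online per day must be between 0 and 24")


def online_fraction(hours_online_per_day: float) -> float:
    """Share of the day for which warehouse compute is billed."""
    _check_hours(hours_online_per_day)
    return hours_online_per_day / HOURS_PER_DAY


def monthly_compute_cost(hourly_rate: float, hours_online_per_day: float,
                         days: int = 30) -> float:
    """Compute-only charge for a month; `hourly_rate` is a hypothetical input."""
    _check_hours(hours_online_per_day)
    return hourly_rate * hours_online_per_day * days


def compute_savings(hours_online_per_day: float) -> float:
    """Fraction of compute spend avoided versus running 24/7."""
    return 1.0 - online_fraction(hours_online_per_day)


if __name__ == "__main__":
    print(monthly_compute_cost(1.0, 24.0))   # always-on: 720 rate-hours
    print(monthly_compute_cost(1.0, 5.5))    # paused schedule: 165 rate-hours
    print(f"{compute_savings(5.5):.1%}")     # share of compute spend avoided
```

At five and a half hours a day, the warehouse bills compute for 165 of a month's 720 hours, so roughly 77 percent of the always-on compute charge goes away, whatever the actual hourly rate happens to be.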
How is the data queried when the DW is off? Well, bear in mind that Analysis Services cubes contain a materialized view of the aggregated data, so analytical and data discovery queries can be run against the cubes regardless of whether the DW is online or off.
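The reason this works is that the aggregation happens once, ahead of time, and queries then read only the stored aggregates. The toy Python sketch below illustrates the idea with a dictionary standing in for a cube; it is not how SSAS stores data internally, and the (day, campaign, impressions) schema is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical fact rows: (day, campaign, impressions).
Row = Tuple[str, str, int]
Cube = Dict[Tuple[str, str], int]


def build_cube(rows: Iterable[Row]) -> Cube:
    """Pre-aggregate impressions by (day, campaign), as a processing job would."""
    cube: Dict[Tuple[str, str], int] = defaultdict(int)
    for day, campaign, impressions in rows:
        cube[(day, campaign)] += impressions
    return dict(cube)


def query(cube: Cube, day: Optional[str] = None,
          campaign: Optional[str] = None) -> int:
    """Roll up stored aggregates; None means 'all members' of that dimension.

    Only the cube is read, so the source rows (the paused warehouse) are not needed.
    """
    return sum(total for (d, c), total in cube.items()
               if (day is None or d == day) and (campaign is None or c == campaign))
```

Once build_cube has run, the source rows can be discarded entirely and every query in the sketch still answers, which is the property that lets the cubes stay useful while the warehouse is paused.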
But here's another twist: MediaBrix is looking at using Azure Data Lake Store, in place of vanilla Blob storage, as a place to land ingested data. This would then allow users to leverage Azure Data Lake Analytics and the U-SQL language to query the raw data, even when the warehouse is paused.
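What makes that attractive is schema-on-read: a structure is applied to raw files at query time, without loading them anywhere first. In U-SQL itself, that is expressed with an EXTRACT statement over a file path USING a built-in extractor such as Extractors.Csv(), followed by SELECT and OUTPUT statements. The Python sketch below is only an analogy for that extract-then-aggregate pattern, not U-SQL or the Data Lake Analytics API, and its column names are hypothetical.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# A schema is an ordered list of (column name, converter) pairs.
Schema = List[Tuple[str, Callable[[str], object]]]


def extract(raw_text: str, schema: Schema) -> Iterator[Dict[str, object]]:
    """Apply a schema to raw delimited text at read time (schema-on-read)."""
    for record in csv.reader(io.StringIO(raw_text)):
        if not record:  # skip blank lines
            continue
        yield {name: convert(value) for (name, convert), value in zip(schema, record)}


def total_by(rows: Iterable[Dict[str, object]], key: str,
             measure: str) -> Dict[object, int]:
    """Sum a numeric measure grouped by one column, like GROUP BY in a query."""
    totals: Dict[object, int] = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row[measure]
    return totals
```

For example, total_by(extract(raw_text, schema), "campaign", "impressions") groups a raw file's impressions by campaign without the file ever being loaded into a warehouse.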
Other technologies used include .NET/C# to build API-based services in front of the data, and some use of Azure Machine Learning (ML) for predictive analytics. MediaBrix is now investigating the use of R Server, in place of Azure ML, for reasons of cost. With SQL Server R Services now part of SQL Server Enterprise, the company can foresee the day when R Services is also part of Azure SQL Database or Data Warehouse, allowing MediaBrix to "productionalize" scoring of data against predictive models in an economical fashion.
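Scoring, in this sense, means applying an already-trained model's parameters to each new row to get a prediction. As a hypothetical illustration only (it is not MediaBrix's model and not the R Services interface), a logistic-regression scorer in Python shows how little work happens per row once a model is trained; the feature names and weights are assumptions.

```python
import math
from typing import Dict, Iterable, List


def score(features: Dict[str, float], weights: Dict[str, float],
          bias: float) -> float:
    """Logistic-regression probability for one row.

    Features missing from `weights` contribute nothing.
    """
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    # Numerically stable sigmoid: avoids overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score_batch(rows: Iterable[Dict[str, float]], weights: Dict[str, float],
                bias: float) -> List[float]:
    """Score many rows with the same model, as a batch job might."""
    return [score(row, weights, bias) for row in rows]
```

Running this logic next to the data, rather than shipping rows out to a separate service, is the economy the in-database R Services approach promises.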
In the unmanaged days...
What did the old stack consist of, you might ask? In addition to the Blob Storage and SSAS components used now, it also included SQL Azure Database, SQL Server Enterprise on Azure virtual machines, MongoDB and Azure Table Storage. One striking difference between the old and new slates of technologies is the shift to fully managed services and away from self-managed ones like SQL Server and MongoDB.
Of course, SQL Server Analysis Services can currently run only on-premises or in Azure virtual machines, so that's one self-managed component that remains. Azure Analysis Services, a managed cloud equivalent that only recently entered preview, is a possible replacement, but not until the service supports SSAS's classic Multidimensional mode, rather than just the Tabular mode the preview supports now.
The clergy and the laity
Admittedly, MediaBrix runs a very sophisticated IT shop and is able to integrate a collection of services in a way that other organizations might be challenged to replicate. But the overall takeaway is clear: companies can take advantage of IoT, streaming data, data warehouse, big data and SQL-on-big-data bridges like PolyBase today, for less than the cost of one mid-level technical employee. It can all be cloud-based and mostly managed by the service provider, and the services can be composed and integrated.
In the future, these services should get easier and cheaper to use and snap together. At that point, finding customers who are taking advantage of such a large portion of the stack will be more commonplace than rare.
But for organizations with the right tech talent on staff, that functionality is ready now.