Azure Synapse Analytics was first revealed by Microsoft in November 2019, at its Ignite conference in Orlando, back when we still had live events. With just a few months to go until its first birthday, we thought it would make sense to take a look at various platform features that have recently been released to general availability (GA) and public preview. We also learned of an interesting, just-announced partnership that Microsoft and Qlik have built around Synapse and thought that worthy of exploration as well. We'll cover it all in this post.
But first, the TL;DR. Today, Synapse is Microsoft's cloud data warehouse service, and integrated data lake functionality is now in public preview. But Synapse is more than just the sum of its lake and warehouse parts. What it really is, in strategic terms, is the service that aggregates, integrates and facilitates an array of other Azure data services that have heretofore existed as mere islands of functionality. Essentially, Synapse is the "better together" agent for data and analytics on the Azure cloud.
The plot thickened
At Microsoft's online Build event this past May, the company made broad announcements about bringing Synapse's data lake functionality and its Synapse Studio environment to public preview, as well as interesting integrations between Synapse and Cosmos DB. We already covered those but, as it turns out, there were additional capabilities added that were not as well-publicized.
Recently, we caught up with Microsoft's Principal Group Program Manager for Synapse Analytics, Charles Feddersen, who explained these deeper features to us. More recently, we spoke with Microsoft and Qlik, to talk about the two companies' partnership that mashes up Synapse with some of Qlik's technology, and data from SAP systems. The combination of these two briefings made clear how strategic Synapse is to Microsoft. Participation in both briefings by Microsoft's Director of Products for Azure Data and Artificial Intelligence, Daniel Yu, made that clearer still.
Additional feature overview
Feddersen explained to us that Synapse's data lake will support version 0.6 of Delta Lake tables. He reiterated how integrated machine learning via Apache Spark and Azure Machine Learning, as well as support for ONNX models and the T-SQL PREDICT command on the warehouse side, will be very important. He further reviewed the importance of Synapse's support for the Common Data Model.
Then Feddersen went deeper still, and introduced us to substantive core data warehouse improvements. These include the new T-SQL COPY command, available only on Synapse, for bulk loading of data. There's also a Bulk Load Wizard that supports schema inference, generates scripts based on the COPY command and creates jobs to run those scripts on a scheduled basis. And there are performance improvements as well, including better matching on materialized views and updateable hash keys, which allow true updates in place of deletions and re-insertions.
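To give a rough sense of what a COPY-based load script looks like, here is a minimal sketch in Python that assembles a COPY INTO statement for a CSV file. It is not the wizard's actual output. The helper name build_copy_statement, the table name and the storage URL are all hypothetical, and the sketch assumes the source file is reachable without an explicit credential clause.

```python
def build_copy_statement(table, source_url, first_row=2, field_terminator=","):
    """Return a T-SQL COPY INTO statement for a CSV source.

    Hypothetical helper for illustration only: it does no escaping or
    validation, so it should not be fed untrusted input.
    """
    options = [
        "FILE_TYPE = 'CSV'",
        f"FIELDTERMINATOR = '{field_terminator}'",
        f"FIRSTROW = {first_row}",  # 2 skips a single header row
    ]
    return (
        f"COPY INTO {table}\n"
        f"FROM '{source_url}'\n"
        "WITH (\n    " + ",\n    ".join(options) + "\n);"
    )


if __name__ == "__main__":
    # Table name and URL are placeholders, not real resources.
    print(build_copy_statement(
        "dbo.Sales",
        "https://example.blob.core.windows.net/data/sales.csv",
    ))
```

The generated text could then be run against a Synapse SQL pool or scheduled as a job, which is roughly the workflow the wizard is described as automating.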
There are productivity gains too. For example, if you view an object in Synapse Studio, you can right-click on it and, through a menu option, generate a new notebook with the necessary code to open the object and bring back data from it. Such code gen shortcuts are nothing new for Microsoft, but it's good to see them applied to the context of open source analytics.
Do-si-do your partner
Synapse acts as a kind of glue, and that stickiness can extend to partnerships, too. Microsoft has a really interesting one with Qlik, around Synapse Analytics and data from SAP's application platforms.
While you might think of Qlik as focused exclusively on business intelligence -- and thereby an arch competitor to Microsoft and Power BI -- that's simply no longer the case. Qlik's 2018 acquisition of Podium Data and, especially germane to this case, its 2019 acquisition of Attunity, changed all that.
Under the catch-all Qlik Data Integration brand that encompasses both of the acquired companies' products, the former Attunity technology provides industry-leading change data capture functionality via Qlik Replicate, supporting near-real-time data replication between systems. Qlik Data Integration also covers data warehouse automation, via Qlik Compose. And, going as far back as 15 years ago, such technology enabled Attunity to create special solutions for SAP data.
Put it all together with Synapse and what do you get? Automated creation of a Synapse data warehouse for SAP data, with on-the-fly updates into it, avoiding batch updates and the associated data latencies. And since the offering is based entirely on the data warehouse functionality in Synapse (i.e. the same functionality that was present in Azure SQL Data Warehouse), it's ready to run, today.
Of course, when Synapse's public preview functionality goes GA, the possibilities around this partnership grow richer still. For example, Power BI reports based on current SAP data could be run on a scheduled or interactive basis. SAP data could also be moved into the lake, where predictive models could be built on it using Spark's machine learning capabilities. Those models could then round-trip back to the warehouse, where they could be used to run in-database inferencing with the PREDICT command. That, in turn, could facilitate data-driven decision making based on up-to-date sales, inventory and supply chain data coming from SAP.
Microsoft and Qlik are confident enough in their tie-up that they're offering "proof of value" (POV) implementations based on it. These include free half-day architectural workshops and sponsored software subscriptions for Qlik Data Integration, Synapse and Power BI, for the duration of the POV.
Synapse keeps firing
Synapse may have had its coming-out party before the coronavirus/COVID-19 pandemic, but Microsoft seems ready to perpetuate the platform's momentum during the pandemic as well. New features, partnerships and sponsored implementations are good investments in that regard. Hopefully, new capabilities like stronger integration with Azure Machine Learning and Azure Event Hubs, as well as rationalization with HDInsight and Databricks, are in the pipeline too.
Beyond the Azure data stack, integrations with Microsoft 365 components like Teams, SharePoint, and Excel could really give Amazon Redshift, Google BigQuery and Snowflake good runs for their money. Recessionary times can be good opportunities to invest and create goodwill; pandemics, I would reckon, are all the more so. Microsoft's on the right path in that regard, and it should keep going. There will be plenty of opportunity to rest on laurels at the next in-person event. Whenever that is.