Microsoft Ignite kicks off in Orlando, FL today and numerous product teams -- several of them within the data and analytics stable -- are using the event as a vehicle to launch new releases, features and capabilities. My colleague Mary Jo Foley will be covering numerous announcements from the show, but this post aims to summarize all the data and AI news.
The first slate of news comes from the land of (the on-premises version of) SQL Server. And the news is Big...as in Data. First off, Microsoft is releasing a public preview of SQL Server 2019, the next major release of the product. And within the 2019 version, there's lots that's new.
The biggest news is that, thanks to a thorough revamp of the flagship database's storage engine and its PolyBase technology, SQL Server seeks to become a true Big Data platform by integrating with HDFS (the Hadoop Distributed File System) and Apache Spark. SQL Server will now be able to use HDFS for storage, will optionally leverage Spark for data engineering and machine learning tasks and can itself operate using a distributed architecture.
Nodes can run as SQL compute nodes, SQL storage nodes or HDFS data nodes. In the HDFS case, SQL Server and Apache Spark run co-located, in the same container. All of this interoperability is enabled by Kubernetes, and SQL Server 2019's Kubernetes compatibility lets it run on premises or across the various public clouds (even if not as a Platform as a Service offering).
SQL Server will also continue to work in its conventional SMP architecture and the enhancements to PolyBase are available there as well. These enhancements include the ability to connect to Oracle, Teradata, MongoDB, generic ODBC data sources and even other SQL Server instances, in addition to continued support for Azure storage, and both Cloudera and Hortonworks Hadoop clusters.
And that's not all. For example, SQL Server 2019 brings enhanced features to the (very v1.0) graph processing capabilities introduced in SQL Server 2017. It also adds support for in-place execution of Java code, using the same infrastructure that lets R and Python code run in-database and that underpins the product's Machine Learning Services component, which itself will now run on SQL Server Linux instances as well as those running on Windows.
The SQL Server technology family's innovations extend beyond the "box product" (i.e. on-premises SQL Server), though. For example, Microsoft is bringing Azure SQL Database Managed Instance to general availability (GA) today. Managed Instance brings near-full compatibility with on-prem SQL Server and yet the server instances are managed by Microsoft.
A new service, Azure SQL Database Hyperscale, which facilitates work with huge volumes of data -- up to 100TB, in fact -- is being launched in preview. While the technology is currently limited to the Azure SQL Database PaaS product, Rohan Kumar, Microsoft's Corporate Vice President for Data, told me it will eventually make its way to Managed Instances and even to Azure Database for MySQL and MariaDB. There are even plans to bring Hyperscale storage to Azure Database for PostgreSQL (though that will be harder, since Postgres isn't designed to accommodate multiple storage engines the way MySQL and MariaDB are).
In other SQL Server news, Microsoft is also introducing a new product name: Azure Data Studio. The product itself isn't new, though, as Azure Data Studio is a re-brand of SQL Operations Studio, the cross-platform front-end tool for SQL Server. With the re-brand, Microsoft will be making the product more modular, so that it can work with data sources other than SQL Server. And using that new modular design, Microsoft is also releasing an add-in that allows the product to work with SQL Server 2019.
Beyond the SQL Server family, Microsoft has news around Cosmos DB and Azure Machine Learning, and is launching a new service as well.
On the Cosmos DB side, Microsoft is bringing the multi-master feature set to GA and is introducing Cassandra API compatibility. Multi-master capabilities allow the Cosmos DB globally distributed database to accept updates to the data at any location, assuring those updates will be visible across all locations (previously data could be read at multiple locations but could only be written at one). Cassandra API compatibility means that applications written for the Cassandra/DataStax NoSQL database can now be ported to Cosmos DB; previously, only MongoDB and HBase NoSQL developers enjoyed that capability.
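To make the multi-master idea concrete, here is a toy Python sketch of several writable replicas converging on the same data. It is not Cosmos DB code and does not use any Cosmos DB API: the `Region` class, the `replicate` function, the integer timestamps and the last-writer-wins rule for reconciling concurrent updates are all assumptions chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """One writable replica (hypothetical); maps key -> (timestamp, value)."""
    name: str
    store: dict = field(default_factory=dict)

    def write(self, key, value, timestamp):
        # Accept the write locally, without coordinating with other regions.
        self.apply(key, timestamp, value)

    def apply(self, key, timestamp, value):
        current = self.store.get(key)
        # Assumed last-writer-wins rule: keep the version with the larger
        # timestamp; ties fall back to comparing values so every region
        # makes the same choice.
        if current is None or (timestamp, value) > current:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def replicate(regions):
    """Send every region's versions to every other region."""
    for source in regions:
        for key, (timestamp, value) in list(source.store.items()):
            for target in regions:
                if target is not source:
                    target.apply(key, timestamp, value)


# Two regions accept conflicting writes to the same key.
east = Region("East US")
west = Region("West Europe")
east.write("cart:42", "2 items", timestamp=1)
west.write("cart:42", "3 items", timestamp=2)

replicate([east, west])
print(east.read("cart:42"), west.read("cart:42"))  # prints "3 items 3 items"
```

Before `replicate` runs, each region answers reads with its own local value; afterward both agree on the later write. A single-master setup would instead route both writes through one region.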
On the Machine Learning front, Microsoft has acknowledged that data scientists like the dev tools they already have. As cool as it was to have the Azure Machine Learning Workbench, Microsoft is now introducing a Python SDK that will allow developers to work with Azure Machine Learning from Jupyter, Apache Zeppelin and, ostensibly, Databricks notebook environments. That will put Azure on equal footing with Spark MLlib, TensorFlow and other ML frameworks which are typically utilized from those environments as well.
Beyond the hardcore data science set, though, Azure ML will also become accessible to a broader array of developers, as it introduces AutoML capabilities. If Microsoft pulls this off well, it will mean that the rather bespoke process of (and accompanying rarefied skill set involved in) selecting machine learning algorithms and setting values for their "hyper-parameters" will become automated. That would, in theory, allow developers to bring their data sets to Azure ML, identify the features (input columns) and label (predicted column) and proceed immediately to building an ML model for later predictive analytics use.
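As a rough illustration of what "automating" algorithm and hyper-parameter selection means, the following standard-library Python sketch scores a handful of candidate models on held-out data and keeps the best one. It is not Azure ML's AutoML and uses none of its SDK: the function names, the candidate list (a majority-class baseline plus k-nearest-neighbors with several values of k), the 70/30 split and the synthetic data are all assumptions made for the example, and a real AutoML service searches a far larger space.

```python
import random
from collections import Counter


def majority_predict(train, features):
    """Baseline 'algorithm': always predict the most common training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def knn_predict(train, features, k):
    """Predict by majority vote among the k nearest training rows."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], features)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(predict, train, holdout):
    hits = sum(predict(train, x) == y for x, y in holdout)
    return hits / len(holdout)


def auto_select(rows, seed=0):
    """Score every candidate on a holdout split and return the best one."""
    candidates = {"majority-class baseline": majority_predict}
    for k in (1, 3, 5, 7):  # hyper-parameter values to try
        candidates[f"k-NN (k={k})"] = lambda train, x, k=k: knn_predict(train, x, k)

    rows = rows[:]
    random.Random(seed).shuffle(rows)
    split = int(len(rows) * 0.7)
    train, holdout = rows[:split], rows[split:]

    scores = {name: accuracy(p, train, holdout) for name, p in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Synthetic data: each row is (features, label), i.e. input columns and
# the column to predict.
rng = random.Random(1)
rows = [((rng.gauss(0, 1), rng.gauss(0, 1)), "low") for _ in range(30)]
rows += [((rng.gauss(4, 1), rng.gauss(4, 1)), "high") for _ in range(30)]

best, scores = auto_select(rows)
print("selected:", best)
for name, score in scores.items():
    print(f"  {name}: {score:.2f}")
```

The developer supplies only the rows, with features and label identified; the loop over candidates stands in for the model-selection work a data scientist would otherwise do by hand.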
On the Cognitive Services side, Microsoft is introducing a number of FPGA-powered machine learning models that customers can train with their own data.
In the one-more-thing department, Microsoft is introducing a new service, Azure Data Explorer, for performing analytics on event/time series, unstructured and natural language data. Previously known under the code-name "Kusto," this technology was used internally at Microsoft for years and seems to have played a role in powering Azure Application Insights. Now the company is bringing Kusto to the public under the Azure Data Explorer brand, as a commercial service, joining an array of streaming/messaging/event-based services in the Azure cloud.
There is a lot of news here, and further details will be necessary, especially around SQL Server 2019 and Cosmos DB. Expect further posts, with more information on these services, soon.