The big data odyssey of SQL Server 2019, and more data and AI news from Microsoft Ignite

Ignite, Microsoft's annual IT Pro and Developer confab, kicks off today, with a slew of news in the data, analytics and AI areas of the company's stack, both in-cloud and on-premises.
Written by Andrew Brust, Contributor

Microsoft Ignite kicks off in Orlando, FL today and numerous product teams -- several of them within the data and analytics stable -- are using the event as a vehicle to launch new releases, features and capabilities. My colleague Mary Jo Foley will be covering numerous announcements from the show, but this post aims to summarize all the data and AI news.

SQL Server 2019: a Big Data odyssey

The first slate of news comes from the land of (the on-premises version of) SQL Server. And the news is Big...as in Data. First off, Microsoft is releasing a public preview of SQL Server 2019, the next major release of the product. And within the 2019 version, there's lots that's new.

The biggest news is that, through a thorough revamp of the flagship database's storage engine and its PolyBase technology, SQL Server seeks to become a true Big Data platform by integrating with HDFS (the Hadoop Distributed File System) and Apache Spark. SQL Server will now be able to use HDFS for storage, will optionally leverage Spark for data engineering and machine learning tasks, and can itself operate using a distributed architecture.

Nodes can run as SQL compute nodes, SQL storage nodes or HDFS data nodes. In the HDFS case, SQL Server and Apache Spark run co-located, in the same container. All of this interoperability is enabled by Kubernetes, and SQL Server 2019's Kubernetes compatibility lets it run on premises or across the various public clouds (even if not as a Platform as a Service offering).

Not for clusters only

SQL Server will also continue to work in its conventional SMP architecture and the enhancements to PolyBase are available there as well. These enhancements include the ability to connect to Oracle, Teradata, MongoDB, generic ODBC data sources and even other SQL Server instances, in addition to continued support for Azure storage, and both Cloudera and Hortonworks Hadoop clusters.

And that's not all. For example, SQL Server 2019 brings enhanced features to the (very v1.0) graph processing capabilities introduced in SQL Server 2017. It also adds support for in-place execution of Java code, using the same infrastructure that enables R and Python code to run in-database and underpins the product's Machine Learning Services component, which itself will now run on SQL Server Linux instances as well as those running on Windows.

Managed instances and hyperscale storage

The SQL Server technology family's innovations extend beyond the "box product" (i.e. on-premises SQL Server), though. For example, Microsoft is bringing Azure SQL Database Managed Instance to general availability (GA) today. Managed Instance brings near-full compatibility with on-prem SQL Server and yet the server instances are managed by Microsoft.

A new service, Azure SQL Database Hyperscale, which facilitates work with huge volumes of data -- up to 100TB, in fact -- is being launched in preview. While the technology is limited to the Azure SQL Database PaaS product, Rohan Kumar, Microsoft's Corporate Vice President for Data, told me it will eventually make its way to Managed Instances and even to Azure Database for MySQL and MariaDB. There are even plans to bring Hyperscale storage to Azure Database for PostgreSQL (though that will be harder, since Postgres isn't designed to accommodate multiple storage engines the way MySQL and MariaDB are).

Sharpest tool in the shed?

In other SQL Server news, Microsoft is also introducing a new product name: Azure Data Studio. The product itself isn't new, though, as Azure Data Studio is a re-brand of SQL Operations Studio, the cross-platform front-end tool for SQL Server. With the re-brand, Microsoft will be making the product more modular, so that it can work with data sources other than SQL Server. And using that new modular design, Microsoft is also releasing an add-in that allows the product to work with SQL Server 2019.

Beyond the SQL Server family, Microsoft has news around Cosmos DB, Azure Machine Learning and a new service, as well.

On the Cosmos DB side, Microsoft is bringing the multi-master feature set to GA and is introducing Cassandra API compatibility. Multi-master capabilities allow the Cosmos DB globally distributed database to accept updates to the data at any location, ensuring those updates will be visible across all locations (previously data could be read at multiple locations but could only be written at one). Cassandra API compatibility means that applications written for the Cassandra/DataStax NoSQL database can now be ported to Cosmos DB; previously, only MongoDB and HBase NoSQL developers enjoyed that capability.
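
To make the multi-master idea concrete, here is a minimal Python sketch of a toy store in which every region accepts writes and a replication pass makes them visible everywhere. It is not Cosmos DB's API or implementation: the Region class, the replicate function and the "last writer wins" rule for settling conflicting writes are all assumptions made purely for illustration.

```python
class Region:
    """Hypothetical replica that accepts writes locally (toy model only)."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (timestamp, origin_region, value)

    def write(self, key, value, timestamp):
        # Any region may accept a write; that is the "multi-master" part.
        self._apply(key, (timestamp, self.name, value))

    def _apply(self, key, version):
        # Assumed conflict rule: the newest timestamp wins, and the
        # origin region's name breaks ties deterministically.
        current = self.store.get(key)
        if current is None or version[:2] > current[:2]:
            self.store[key] = version

    def read(self, key):
        return self.store[key][2]


def replicate(regions):
    """Push every region's versions to all regions so they converge."""
    for source in regions:
        for key, version in list(source.store.items()):
            for target in regions:
                target._apply(key, version)


if __name__ == "__main__":
    east, west = Region("east"), Region("west")
    east.write("cart", ["book"], timestamp=1)
    west.write("cart", ["book", "pen"], timestamp=2)  # concurrent write elsewhere
    replicate([east, west])
    print(east.read("cart"), west.read("cart"))
```

In the single-writer model described in the parenthetical above, only one of these regions would be allowed to call write; the others would only receive replicated data.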

Can developers more easily learn machine learning?

On the Machine Learning front, Microsoft has acknowledged that data scientists like the dev tools they already have. As cool as it was to have the Azure Machine Learning Workbench, Microsoft is now introducing a Python SDK that will allow developers to work with Azure Machine Learning from Jupyter, Apache Zeppelin and, ostensibly, Databricks notebook environments. That will put Azure on equal footing with Spark MLlib, TensorFlow and other ML frameworks which are typically utilized from those environments as well.

Beyond the hardcore data science set, though, Azure ML will also become accessible to a broader array of developers, as it introduces AutoML capabilities. If Microsoft pulls this off well, it will mean that the rather bespoke process of (and accompanying rarefied skill set involved in) selecting machine learning algorithms and setting values for their "hyper-parameters" will become automated. That would, in theory, allow developers to bring their data sets to Azure ML, identify the features (input columns) and label (predicted column) and proceed immediately to building an ML model for later predictive analytics use.
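
As a rough illustration of what "automating" algorithm selection and hyper-parameter setting means, the Python sketch below tries a couple of toy classifiers with a few hyper-parameter values and keeps whichever scores best on held-out rows. It does not use the Azure ML SDK; the data, the two algorithms, the SEARCH_SPACE table and the auto_select function are all hypothetical stand-ins, and a real AutoML service would search far more cleverly.

```python
from collections import Counter


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def fit_knn(train, k):
    """k-nearest neighbours; k is the hyper-parameter being tuned."""
    def predict(x):
        nearest = sorted(train, key=lambda row: sq_dist(row[0], x))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict


def fit_centroid(train):
    """Nearest-centroid classifier; it has no hyper-parameters."""
    sums = {}
    for x, y in train:
        total, n = sums.get(y, ([0.0] * len(x), 0))
        sums[y] = ([t + v for t, v in zip(total, x)], n + 1)
    centroids = {y: [t / n for t in total] for y, (total, n) in sums.items()}
    return lambda x: min(centroids, key=lambda y: sq_dist(centroids[y], x))


# Hypothetical search space: (algorithm name, fit function, hyper-parameter grid).
SEARCH_SPACE = [
    ("knn", fit_knn, [{"k": k} for k in (1, 3, 5)]),
    ("nearest_centroid", fit_centroid, [{}]),
]


def accuracy(predict, rows):
    return sum(predict(x) == y for x, y in rows) / len(rows)


def auto_select(train, valid):
    """Return (score, algorithm, params) for the best candidate on valid."""
    best = None
    for name, fit, grid in SEARCH_SPACE:
        for params in grid:
            score = accuracy(fit(train, **params), valid)
            if best is None or score > best[0]:  # first best candidate is kept
                best = (score, name, params)
    return best


# Toy rows: (features, label), i.e. input columns and the predicted column.
TRAIN = [((0, 0), "a"), ((1, 0), "a"), ((0, 1), "a"),
         ((10, 10), "b"), ((9, 10), "b"), ((10, 9), "b")]
VALID = [((1, 1), "a"), ((9, 9), "b")]

if __name__ == "__main__":
    print(auto_select(TRAIN, VALID))
```

The developer supplies only the rows, features and label; the loop stands in for the choices a data scientist would otherwise make by hand.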

On the Cognitive Services side, Microsoft is introducing a number of FPGA-powered machine learning models that customers can train with their own data.

Another product joins the data party

In the "one more thing" department, Microsoft is introducing a new service, Azure Data Explorer, for performing analytics on event/time series, unstructured and natural language data. Previously known under the code-name "Kusto," this technology was used internally at Microsoft for years and seems to have played a role in powering Azure Application Insights. Now the company is bringing Kusto to the public under the Azure Data Explorer brand, as a commercial service, joining an array of streaming/messaging/event-based services in the Azure cloud.

There is a lot of news here, and further details will be necessary, especially around SQL Server 2019 and Cosmos DB. Expect further posts, with more information on these services, soon.
