Microsoft is launching more and more internal and external services on Azure. Given CEO Satya Nadella's focus on creating a "data culture," it's not too surprising that big data services are a top priority.
HDInsight, Microsoft's Hadoop on Azure service, was Microsoft's first commercially available big-data service. I'm wondering if Cosmos might be its next.
Currently, Cosmos is an internal-facing Microsoft service. It's Microsoft's massively parallel storage and computation service that handles data from Azure, Bing, AdCenter, MSN, Skype and Windows Live. According to a recent Microsoft job posting, there are 5,000 developers and "thousands" of users inside Microsoft using Cosmos. Cosmos was built using Microsoft's Dryad distributed-processing technology.
(In 2011, Microsoft dropped plans to make Dryad commercially available in favor of going with Hadoop. But some internal Dryad work has continued.)
Cosmos provides the backbone for Bing analysis and relevance. It is used by Microsoft internally to generate custom datasets of all kinds that teams use to build and evaluate products and services. But maybe Cosmos could generate externally available datasets, too -- especially as Cosmos expands its charter to store and process exabytes of structured data, social network updates, geospatial/map data, and semantic data collected across the Web.
"Every day we (the Cosmos team) run thousands of computations that read and write petabytes of data," according to another Microsoft job post. "For our external service, we’ve just started development, and it’s super exciting to be working on this product with very real multibillion dollar potential."
The post notes that there's a Big Data Tooling Team inside Microsoft's Cloud & Enterprise unit that is building the developer experience and tools for both Cosmos and HDInsight. That team owns the end-to-end user experience for big data. Team members "get exposure to users and the inter-workings of the massively parallel systems. Most importantly, we are producing the differentiator that will sell Microsoft’s big data products against its competitors," the job post continues.
Last week, I asked Microsoft Corporate Vice President T.K. "Ranga" Rengarajan -- who heads Microsoft's data platform team -- whether Microsoft might make Cosmos available to non-Microsoft employees as an Azure service. He said "not at this time, but if it's something customers want, we'd be happy to do that." Update: A Microsoft spokesperson said that Rengarajan actually said that Cosmos is "Microsoft-only at this time, but we keep watching what customers want."
The Cosmos team isn't the only Microsoft group focused on mining big data for internal use. The Applications and Services Group (ASG) has a Shared Data Platform team that is building a set of hosted services used by Microsoft engineers internally to build and deploy data-centric applications. Those services and tools include a common metadata catalog, Web and Visual Studio authoring tools, data virtualization, data movement, SQL fabric management, data dimension processing, and a user interface surface that provides end-user access to data via Excel.
"The Data Platform team has been given responsibility for building the data sets that will be used to run Microsoft's devices and services businesses. In a One Microsoft partnership with teams across ASG, C&E (Cloud & Enterprise), Marketing, and Finance, we will be building the applications to acquire, integrate, and optimize the data that will be used throughout the company to understand the performance of our Online businesses," according to a job posting on Microsoft's career site.
There's also a team focused on harnessing big data in Microsoft's Global Foundation Services (GFS) unit, which runs Microsoft's datacenters. "Project Nova" is the unit that is collecting, storing, analyzing and visualizing datacenter telemetry, asset, networking and business process data. "The Nova Data Analytics team provides the tooling to access over 100TB of data in the GFS operational data warehouse," yet another job posting notes.
Back to Cosmos. I don't know how feasible or useful it would be for non-Microsoft employees to be able to tap into the kinds of data Microsoft currently collects and processes via Cosmos. But all those petabytes of data about what people search for, what ads they click on, which sites they visit and more definitely could be of keen interest to customers in certain businesses...