Microsoft is developing an Azure-hosted, paid version of its internal-facing Cosmos big-data computation, analysis and storage service.
I speculated last August that Microsoft was poised to make Cosmos one of its next big Azure service offerings. Based on information shared with me by sources, it appears this is indeed happening.
Microsoft is in the midst of recruiting testers for the key components of the coming offering. Those components include an analysis-engine piece codenamed "Kona" and a storage-engine piece codenamed "Cabo". There's also a new SQL-friendly language, known as SQL-IP, that will be part of the coming big-data analysis package, sources said.
Currently, Cosmos is an internal-facing Microsoft service only. It's Microsoft's massively parallel storage and computation service that handles data from Azure, Bing, AdCenter, MSN, Skype and Windows Live. According to a recent Microsoft job posting, there are 5,000 developers and "thousands" of users inside Microsoft using Cosmos. Cosmos was built using Microsoft's Dryad distributed-processing technology.
Microsoft has used Cosmos internally to process telemetry data; to perform analysis and reporting on large datasets, such as those created via Bing and Office 365; and to curate and perform back-end processing on many kinds of data. A lot of the data used for these various purposes is shared. Queries on this data can run on anywhere from one to 40,000 machines in parallel.
Microsoft is planning to position the external-facing version of Cosmos as a complement to HDInsight, which is Microsoft's Hadoop-on-Azure service. Users will have a choice of using HDInsight or SQL-IP on the same datasets, sources said.
SQL-IP is a mix of SQL, C# and .NET. It's meant to be extensible and to handle parallel computation. It sounds like there will be a Visual Studio plug-in supporting SQL-IP, from what my sources have said.
SQL-IP is an evolution of Microsoft's SCOPE language that is meant to be more inherently SQL-friendly. I blogged about SCOPE back in 2011, noting it was a parallel querying capability of Cosmos, designed to make it appear to users that their distributed/parallel queries were executing on a single machine.
An interesting aside for those tracking Cosmos' evolution: Microsoft researcher Ed Nightingale noted that starting in 2012, he spent time rearchitecting the Cosmos service "bringing to bear the lessons and principles from the Microsoft Research Flat Datacenter Storage project." Flat Datacenter Storage (FDS) is a "high-performance, fault-tolerant, large-scale, locality-oblivious blob store."
With the coming Cosmos service, Microsoft is planning to charge customers for the compute and storage they use. Users will be able to execute a query on their data and pay only for the processing they use, sources said.
I don't know when Microsoft is planning to release a public preview of the coming externally facing Cosmos service or when the company plans to make it generally available. I've asked Microsoft officials if they'd share further details. No word back so far.
Update: A Microsoft spokesperson said the company had no comment.
In a job posting (which is no longer listed on Microsoft's Careers site), company execs called the externally facing version of Cosmos a product with "multibillion dollar potential."