Microsoft's Azure and Research teams are working together to build a new AI infrastructure service, codenamed "Singularity." The Singularity team is working to build what Microsoft describes in some of its job postings for the group as "a new AI platform service ground-up from scratch that will become a major driver for AI, both inside Microsoft and outside."
Several of those working on the project have published a paper titled "Singularity: Planet-Scale, Preemptible and Elastic Scheduling of AI Workloads," which provides technical details about the Singularity effort. The Singularity service is meant to give data scientists and AI practitioners a way to build, scale, experiment and iterate on their models on a Microsoft-provided distributed infrastructure service built specifically for AI.
Authors listed on the newly published paper include Azure Chief Technology Officer Mark Russinovich; Partner Architect Rimma Nehme, who worked on Azure Cosmos DB until moving to Azure to work on AI and deep learning in 2019; and Technical Fellow Dharma Shukla. From that paper:
"At the heart of Singularity is a novel, workload-aware scheduler that can transparently preempt and elastically scale deep learning workloads to drive high utilization without impacting their correctness or performance, across a global fleet of accelerators (e.g., GPUs, FPGAs)."
Microsoft officials have previously discussed plans to make FPGAs, or field-programmable gate arrays, available to customers as a service. In 2018, Microsoft went public about its "Project Brainwave" work, which was designed to provide fast AI processing in Azure. At that time, Microsoft made available a preview of Azure Machine Learning Hardware Accelerated Models powered by Brainwave in the cloud -- a first step in making FPGA processing for AI workloads available to customers.
I would guess that Singularity is the next phase in turning Brainwave into a commercially available service. I've asked Microsoft for a comment on that. I've also asked when and how Microsoft plans to turn Singularity into a commercially available service. I will update this post with any information I get back.
While the AI supercomputer Microsoft has built is exclusively for OpenAI, Microsoft officials have said they plan to make the company's large AI models and training optimization tools available through Azure AI services and GitHub. Microsoft also makes various accelerators and services available under its "Azure AI" banner to customers who don't need a dedicated supercomputer. In November 2021, Microsoft announced it was expanding its AI supercomputer lineup with 80GB NVIDIA A100 GPUs in Azure.
Microsoft watchers may recall that Microsoft previously used the Singularity codename for another Microsoft Research project. That Singularity was a microkernel operating system and set of related tools and libraries developed completely in managed code. Singularity was not based on Windows; it was written from scratch as a proof-of-concept. Singularity ended up spawning and/or influencing several other operating system research projects at Microsoft, including Barrelfish, Helios, Midori, and Drawbridge.
It's also worth noting that Microsoft is hardly the only tech company looking at trying to make AI supercomputing capabilities available internally and to customers. Meta is doing the same, and, unsurprisingly, has positioned its work as the key to unlocking the metaverse.