Microsoft pushes ahead with conversation transcription, virtual microphone arrays

Microsoft Research's 'Project Denmark' technology lets users combine the microphones in phones and laptops into a virtual array that can handle conversation transcription and more.

Written by Mary Jo Foley, Senior Contributing Editor May 10, 2019 at 11:47 a.m. PT

Microsoft demonstrated some interesting advancements on the smart-meetings front this week during its Build 2019 keynote. Company officials showed off a new Conversation Transcription capability that's part of its Azure Speech Service. The new capability, now in preview, allows real-time transcription of multi-user conversations with automatic speaker attribution -- even when there's cross-talk happening.

But another part of this year's Build demonstration went by so quickly that many (including me) may have missed it at first: Microsoft showed this service working not only on its custom microphone-array reference hardware -- as it did at last year's Build -- but also on a cloud-powered virtual microphone array.

The virtual/cloud piece of this is still a Microsoft Research project, which is codenamed "Project Denmark." Instead of relying on dedicated microphone arrays, Project Denmark allows users to set up "virtual" microphone arrays using consumer devices like mobile phones and laptops with ordinary microphones. It fits into Microsoft's evolving ambient-computing strategy.

"Algorithms for combining speech information at multiple levels yield transcription accuracy that approaches that from close-talking microphones," say the Project Denmark researchers. There's a new project page for Project Denmark on the Microsoft Research site (thanks to WalkingCat for the link), as well as a technical report about Denmark.

From Microsoft Research's blog post about its announcements from Build this year:

"Project Denmark can potentially help our customers more easily transcribe conversations anytime and anywhere using Azure speech services, with or without a dedicated microphone array DDK. Future application scenarios are broad. For example, we may pair up multiple Microsoft Translator applications to help multiple people communicate more effectively using mobile phones to minimize language barriers."

Microsoft announced this week that it will be making the mysterious circular microphone array hardware we first saw at Build 2018 available to those outside the company in the form of device developer kits (which are codenamed "Princeton Tower"). Audio-only microphone array DDKs can be purchased from http://ddk.roobo.com for roughly $100. Advanced audio-visual microphone array DDKs are available from Microsoft systems integration partners.

The Speech Devices developer kit is made for those who want to build devices for custom virtual assistants, conversation transcription and smart speakers. (The Azure Kinect developer kit also can handle conversation transcription, for what it's worth.)

At Build 2018, Microsoft announced it was moving toward creating a single speech application programming interface (API) and software development kit (SDK) that would work across its products and services, including Windows, Office, Cortana, Xbox, and the HoloLens.