Microsoft moves toward consolidating its many speech services

Microsoft's new unified Speech Service and development platform aim to bring together the many different speech services used across Microsoft's own products and services.
Written by Mary Jo Foley, Senior Contributing Editor

Currently, there is no such thing as a single Microsoft speech service.

But Microsoft is taking the first steps toward creating a single speech application programming interface (API) and software development kit (SDK) that will work across its products and services, including Windows, Office, Cortana, Xbox, and HoloLens.

Microsoft disclosed this move last week in a rather understated way at its Build 2018 conference. (This Day 3 Build session on the "Cognitive Services Speech SDK" covers some of the details.)

Microsoft has some ambitious goals for its coming unified Speech Service, which falls under its Microsoft Cognitive Services umbrella. (Cognitive services are Azure APIs that developers can use to add various AI capabilities to their own apps and services.)

The new unified Speech Service "unites several Azure speech services that were previously available separately: Bing Speech (comprising speech recognition and text to speech), Custom Speech, and Speech Translation. Like its precursors, the Speech service is powered by the technologies used in other Microsoft products, including Cortana and Microsoft Office," according to Microsoft.

Microsoft is aiming to have the common speech API and SDK "run on all modern platforms" and "support all modern programming languages." Microsoft wants the service to be accessible to developers at all levels, from novice to expert, and to work online, offline, and in hybrid and batch scenarios, officials said. The new API and SDK will provide speech-to-text, speech-to-intent, speech translation, and custom keyword-spotter invocation. They will handle both single-shot spoken commands and continuous speech. Microsoft is committing to support all 28 spoken languages in the single unified Speech SDK.

"We don't have all that today, but this (Speech preview) is a good first step," said Rob Chambers during last week's Speech SDK session. The preview currently supports Windows 10, Linux, and Android (via the Speech Devices SDK), and works with C#, C++, and Java. Support for iOS and macOS is coming "soon."

The Speech Devices SDK is a "pre-tuned library paired with specific microphone-enabled hardware," explains Microsoft in its documentation. "The SDK makes it easy to integrate your device with the cloud-based Microsoft Speech service and create an exceptional user experience for your customers."

The Devices SDK is meant to enable companies to build their own "ambient devices with a customized wake word," and it provides noise suppression, echo cancellation, far-field voice, and more. The SDK preview currently provides access to Speech to Text and Speech Translation; Text to Speech is not yet supported.

Microsoft officials said they are moving the existing Microsoft Translator app/service to use the new unified Speech Service and SDK as of its next release. Office also is planning to replace the current dictation engine, based on Dictate technology developed by the Microsoft Garage incubator, with the new service/SDK.

"Microsoft is planning to move Office Dictation to the Microsoft Speech Service and unified SDK when it becomes generally available. In the meantime, Office Dictation will continue to be updated and the migration will be seamless for customers," a spokesperson told me when I asked about timing.

The spokesperson said Microsoft expects the service/SDK to become generally available some time in the "next few months."

I've also asked the Windows team about its plans regarding when/how Windows 10 will support the new unified speech service and SDK. With the Windows 10 April 2018 update, Microsoft officials were touting improved dictation built into Windows 10 as one of the April Update's main selling points. But Windows doesn't use the same speech engine as Office or other Microsoft products at this time; it uses legacy Microsoft speech technology.

So far, no word back from the Windows team on what it's planning on this front.

Update (May 16). The Windows team is keeping its cards close to the vest. From a spokesperson:

"Microsoft has been invested in voice and speech for years. As you saw most recently at Build, we announced an important set of updates for the speech capabilities of Microsoft Cognitive Services - a new unified Speech service in preview. From a broader customer perspective, we're listening to feedback and hearing people are increasingly wanting to use voice to engage with PCs so we'll continue to test/deliver new experiences. Beyond this, we have nothing to share regarding details on future plans."