Currently, there is no such thing as a single Microsoft speech service.
But Microsoft is taking the first steps toward creating a single speech application programming interface (API) and software development kit (SDK) that will work across its products and services, including Windows, Office, Cortana, Xbox, and the HoloLens.
Microsoft disclosed this move last week in a rather understated way at its Build 2018 conference. (This Day 3 Build session on the "Cognitive Services Speech SDK" covers some of the details.)
Microsoft has some ambitious goals for its coming unified Speech Service, which falls under its Microsoft Cognitive Services umbrella. (Cognitive services are Azure APIs that developers can use to add various AI capabilities to their own apps and services.)
Microsoft is aiming to have the common speech API and SDK "run on all modern platforms" and "support all modern programming languages." Microsoft wants the service to be accessible to developers at all levels, from novice to expert, and to work online, offline, and in hybrid and batch scenarios, officials said. The new API and SDK will provide speech-to-text, speech-to-intent, speech translation and custom keyword-spotter invocation. They will work with both single-shot and continuous spoken commands. Microsoft is committing to handle all 28 spoken languages in the one unified Speech SDK.
"We don't have all that today, but this (Speech preview) is a good first step," said Rob Chambers during last week's Speech SDK session. The preview supports Windows 10, Linux and Android (via the Speech Devices SDK), and currently works with C#, C++ and Java. Support for iOS and macOS is coming "soon."
The Speech Devices SDK is a "pre-tuned library paired with specific microphone-enabled hardware," explains Microsoft in its documentation. "The SDK makes it easy to integrate your device with the cloud-based Microsoft Speech service and create an exceptional user experience for your customers."
The Devices SDK is meant to enable companies to build their own "ambient devices with a customized wake word," and it provides noise suppression, echo cancellation, far-field voice and more. For now, the SDK preview provides access to Speech to Text and Speech Translation; Text to Speech is not yet supported.
"Microsoft is planning to move Office Dictation to the Microsoft Speech Service and unified SDK when it becomes generally available. In the meantime, Office Dictation will continue to be updated and the migration will be seamless for customers," a spokesperson told me when I asked about timing.
The spokesperson said Microsoft expects the service and SDK to become generally available sometime in the "next few months."
So far, no word back from the Windows team on what it's planning on this front.
Update (May 16). The Windows team is keeping its cards close to the vest. From a spokesperson:
"Microsoft has been invested in voice and speech for years. As you saw most recently at Build, we announced an important set of updates for the speech capabilities of Microsoft Cognitive Services - a new unified Speech service in preview. From a broader customer perspective, we're listening to feedback and hearing people are increasingly wanting to use voice to engage with PCs so we'll continue to test/deliver new experiences. Beyond this, we have nothing to share regarding details on future plans."