Microsoft making big speech bets with Windows 8, Bing

The Microsoft Tellme team is working with the Bing, Windows Phone, Kinect/Xbox, Azure and other Microsoft teams to add new speech-centric capabilities to Microsoft and third-party products in the coming year-plus.
Written by Mary Jo Foley, Senior Contributing Editor

Microsoft's consolidated speech technology unit, Microsoft Tellme, is working with several product teams inside the company to make speech recognition and understanding a key component of a number of next-generation Microsoft offerings.

Microsoft execs have been demonstrating publicly how Windows Phones currently can handle spoken queries. With Mango, Windows Phones will support even more speech functions, including speech-to-text and text-to-speech. And the Kinect sensor is going to get more sophisticated voice-command support this fall, enabling users to use Bing to search for movies, TV, music and other content via voice.

But within the coming year, even more Microsoft products and services are getting the speech recognition/understanding treatment.

Windows 7 today can recognize a limited set of spoken commands. But Microsoft will be taking this work further with Windows 8, said Ilya Bukshteyn, Tellme Senior Director of Sales and Marketing. Windows 8 on ARM and Intel slates will be able to recognize many speech commands, which makes sense given they won't be optimized for keyboard and mouse input. And because Windows 8 is "HTML-based," the HTML5 speech tag could allow developers inside and outside Microsoft to create applications for Windows 8 that are speech-capable, Bukshteyn added.

As the Tellme team pushes beyond speech recognition and into conversational understanding, scenarios become even more interesting, Bukshteyn said. When CEO Steve Ballmer recently touted the ability of Bing to support complex natural-language-query commands, he didn't explain what would make that magic happen. It turns out it's Tellme's voice technology, combined with social-graph information delivered via Windows Live, plus Bing's search functionality. ("Windows Live is a social graph hub for Facebook, Twitter and LinkedIn," Bukshteyn explained.)

Microsoft posted on August 9 a video clip highlighting how this kind of conversational understanding could work (and showed this clip at the SpeechTek conference keynote in New York today):

Example: Say you want to meet with a friend in New York for dinner next week. Maybe as soon as three to five years from now (timing reference changed at Microsoft's request), Microsoft officials think you'll be able to say to your PC "arrange a dinner with Joe in Manhattan on Thursday," and Tellme will recognize the query, link to your Facebook or LinkedIn social-graph information to discern which "Joe" you're likely looking to meet, compare your calendars, and use Bing to search for restaurants you both have indicated you "Like" on Facebook.
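For readers who think in code, here is a rough sketch of how the steps in that scenario might chain together. It is purely illustrative and is not Microsoft's implementation: the `Contact` record, the `interactions` closeness score, and the functions `parse_request`, `pick_contact` and `plan_dinner` are all hypothetical names invented for this example, and the "parser" only understands the exact phrasing of the sample request with a one-word place name.

```python
from dataclasses import dataclass, field


@dataclass
class Contact:
    # Hypothetical stand-in for a person in a social graph.
    name: str
    city: str
    interactions: int  # assumed closeness signal; not a real Tellme or Bing field
    free_days: set = field(default_factory=set)
    liked: set = field(default_factory=set)


def parse_request(utterance):
    """Toy parser for 'arrange a dinner with NAME in PLACE on DAY'."""
    words = utterance.split()
    return {
        "name": words[words.index("with") + 1],
        "place": words[words.index("in") + 1],
        "day": words[words.index("on") + 1],
    }


def pick_contact(name, place, contacts):
    """Guess which contact is meant: prefer one in the named city, then the closest."""
    matches = [c for c in contacts if c.name == name]
    if not matches:
        return None
    return max(matches, key=lambda c: (c.city == place, c.interactions))


def plan_dinner(utterance, me, contacts):
    """Chain the steps: parse, disambiguate, compare calendars, intersect likes."""
    req = parse_request(utterance)
    friend = pick_contact(req["name"], req["place"], contacts)
    if friend is None:
        return None
    # Both calendars must be open on the requested day.
    if req["day"] not in (me.free_days & friend.free_days):
        return None
    # Restaurants both people have "liked".
    shared = sorted(me.liked & friend.liked)
    return {"with": friend, "day": req["day"], "restaurants": shared}
```

Given two contacts named Joe, one in Seattle and one in Manhattan, `plan_dinner("arrange a dinner with Joe in Manhattan on Thursday", me, contacts)` would pick the Manhattan Joe and return the restaurants both parties like, provided both are free on Thursday. A real system would of course replace each toy step with speech recognition, live social-graph data and an actual search query.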

From a Tellme blog post on August 9, here's Microsoft's explanation as to what's coming with Bing/Tellme/social-graph integration:

"We see a future where the service will know you: know your intent, your social and business connections, your likes and dislikes, your privacy preferences, and the things that define the context that’s important to you. The result will be a speech NUI service that helps you accomplish everyday tasks in a more natural and conversational manner. This service will simplify tasks that used to be tedious or impossible on a TV or other device, by combining an understanding of language and intent with a deep knowledge of you, the user. We envision a future where we build on the experiences we deliver today with Kinect for Xbox 360, Windows Phone, or Bing for iPad or iPhone apps, by enhancing the speech NUI experience to understand more layers of context: what you are doing, where you are doing it, the kinds of devices you are using and your historical preferences. Because this is a cloud-based service, your interactions will be able to persist over time, enabling you to pick up where you left off, regardless of what device you may be using."

This "understanding intent" work is part of Microsoft's push to make Bing's results more personalized, Bukshteyn said. And Tellme is playing a big role here because of the volume of speech data that it is collecting and using to improve the accuracy of its results. Tellme currently is processing 11 billion "utterances" per year, Bukshteyn said.

While the Tellme team focuses on enabling these longer-term scenarios, it will continue work it is doing on nearer-term projects, such as providing interactive voice response (IVR) to customers and partners. (Quite a few automated voice call-handling systems are powered by Tellme today.) And the team is working on adding a speech programming interface to Windows Phone so that developers can write apps that take advantage of the speech technology built into the phone platform. Bukshteyn didn't have a timeframe to share as to when Windows Phone developers might get this API support.

The Tellme team also is planning to add support for the Tellme speech cloud to Windows Azure at some point, so that developers will be able to build and support IVR-enabled apps and services running on Azure. Tellme's speech cloud doesn't run on Azure today; there's no firm timetable as to when or if Microsoft may move it to Azure, Bukshteyn said. But the Tellme service will be available to third-party developers regardless of whether Microsoft moves Tellme itself to Azure or not, he said.

Is speech the unsung part of Microsoft's NUI story? Will speech support give Microsoft products much of a leg up over those of its competitors?
