Microsoft making big speech bets with Windows 8, Bing

Summary: The Microsoft Tellme team is working with the Bing, Windows Phone, Kinect/Xbox, Azure and other Microsoft teams to add new speech-centric capabilities to Microsoft and third-party products in the coming year-plus.


Microsoft's consolidated speech technology unit, Microsoft Tellme, is working with a number of product teams inside the company to make speech recognition and understanding a key component of a number of next-generation Microsoft offerings.

Microsoft execs have been demonstrating publicly how Windows Phones currently can handle spoken queries. With Mango, Windows Phones will support even more speech functions, including speech-to-text and text-to-speech. And the Kinect sensor is going to get more sophisticated voice-command support this fall, enabling users to use Bing to search for movies, TV, music and other content via voice.

But within the coming year, even more Microsoft products and services are getting the speech recognition/understanding treatment.

Windows 7 today can recognize a limited set of spoken commands. But Microsoft will be taking this work further with Windows 8, said Ilya Bukshteyn, Tellme Senior Director of Sales and Marketing. Windows 8 on ARM and Intel slates will be able to recognize many speech commands, which makes sense given they won't be optimized for keyboard and mouse input. And because Windows 8 is "HTML-based," the HTML5 speech tag could allow developers inside and outside Microsoft to create applications for Windows 8 that are speech-capable, Bukshteyn added.

As the Tellme team pushes beyond speech recognition and into conversational understanding, scenarios become even more interesting, Bukshteyn said. When CEO Steve Ballmer recently touted the ability of Bing to support complex natural-language-query commands, he didn't explain what would make that magic happen. It turns out it's Tellme's voice technology, combined with social-graph information delivered via Windows Live, plus Bing's search functionality. ("Windows Live is a social graph hub for Facebook, Twitter and LinkedIn," Bukshteyn explained.)

Microsoft posted on August 9 a video clip highlighting how this kind of conversational understanding could work (and showed this clip at the SpeechTEK conference keynote in New York today):

Example: Say you want to meet with a friend in New York for dinner next week. Maybe as soon as three to five years from now (timing reference changed due to a request from Microsoft), Microsoft officials think you'll be able to say to your PC "arrange a dinner with Joe in Manhattan on Thursday," and Tellme will recognize the query, link to your Facebook or LinkedIn social-graph information to discern which "Joe" you're likely looking to meet, compare your calendars, and use Bing to search for restaurants you both have indicated you "Like" on Facebook.
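
To make the steps in that scenario concrete, here is a minimal Python sketch of the flow described: pick the most likely "Joe," check both calendars, and intersect restaurant "Likes." It is purely illustrative and is not Microsoft's or Tellme's implementation. The Contact class, the interactions score, and both function names are hypothetical stand-ins for social-graph, calendar and search services that the article does not detail.

```python
from dataclasses import dataclass, field


@dataclass
class Contact:
    """Hypothetical record combining social-graph and calendar data."""
    name: str
    interactions: int  # assumed "closeness" signal, e.g. recent messages
    busy_days: set = field(default_factory=set)
    liked_restaurants: set = field(default_factory=set)


def pick_contact(first_name, contacts):
    """Return the most-interacted-with contact whose first name matches."""
    matches = [
        c for c in contacts
        if c.name.split()[0].lower() == first_name.lower()
    ]
    if not matches:
        return None
    return max(matches, key=lambda c: c.interactions)


def arrange_dinner(first_name, day, my_busy_days, my_likes, contacts):
    """Disambiguate the contact, compare calendars, intersect 'Likes'."""
    friend = pick_contact(first_name, contacts)
    if friend is None:
        return None  # nobody by that name in the social graph
    if day in my_busy_days or day in friend.busy_days:
        return None  # calendars conflict on the requested day
    shared = sorted(my_likes & friend.liked_restaurants)
    return {"with": friend.name, "day": day, "restaurants": shared}


contacts = [
    Contact("Joe Smith", 3, {"Thursday"}, {"Taco Hut"}),
    Contact("Joe Lee", 12, set(), {"Luigi's", "Sushi Bar"}),
]
plan = arrange_dinner(
    "Joe", "Thursday", set(), {"Sushi Bar", "Luigi's", "Taco Hut"}, contacts
)
print(plan)
```

A real service would replace the interaction count with richer signals and would ask a follow-up question when two candidates score similarly, rather than silently picking one.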

From a Tellme blog post on August 9, here's Microsoft's explanation as to what's coming with Bing/Tellme/social-graph integration:

"We see a future where the service will know you: know your intent, your social and business connections, your likes and dislikes, your privacy preferences, and the things that define the context that’s important to you. The result will be a speech NUI service that helps you accomplish everyday tasks in a more natural and conversational manner. This service will simplify tasks that used to be tedious or impossible on a TV or other device, by combining an understanding of language and intent with a deep knowledge of you, the user. We envision a future where we build on the experiences we deliver today with Kinect for Xbox 360, Windows Phone, or Bing for iPad or iPhone apps, by enhancing the speech NUI experience to understand more layers of context: what you are doing, where you are doing it, the kinds of devices you are using and your historical preferences. Because this is a cloud-based service, your interactions will be able to persist over time, enabling you to pick up where you left off, regardless of what device you may be using."

This "understanding intent" work is part of Microsoft's push to make Bing's results more personalized, Bukshteyn said. And Tellme is playing a big role here because of the volume of speech data that it is collecting and using to improve the accuracy of its results. Tellme currently is processing 11 billion "utterances" per year, Bukshteyn said.

While the Tellme team focuses on enabling these longer-term scenarios, it will continue work it is doing on nearer-term projects, such as providing interactive voice response (IVR) to customers and partners. (Quite a few automated voice call-handling systems are powered by Tellme today.) And the team is working on adding a speech programming interface to Windows Phone so that developers can write apps that take advantage of the speech technology built into the phone platform. Bukshteyn didn't have a timeframe to share as to when Windows Phone developers might get this API support.

The Tellme team also is planning to add support for the Tellme speech cloud to Windows Azure at some point, so that developers will be able to build and support IVR-enabled apps and services running on Azure. Tellme's speech cloud doesn't run on Azure today; there's no firm timetable as to when or if Microsoft may move it to Azure, Bukshteyn said. But the Tellme service will be available to third-party developers regardless of whether Microsoft moves Tellme itself to Azure or not, he said.

Is speech the unsung part of Microsoft's NUI story? Will speech support give Microsoft products much of a leg up over those of its competitors?



Mary Jo has covered the tech industry for 30 years for a variety of publications and Web sites, and is a frequent guest on radio, TV and podcasts, speaking about all things Microsoft-related. She is the author of Microsoft 2.0: How Microsoft plans to stay relevant in the post-Gates era (John Wiley & Sons, 2008).



  • RE: Microsoft making big speech bets with Windows 8, Bing

    Yeah... ok. First improve the "XBOX, Pause" command so I don't have to repeat it 5 times. Then let's move on to "PC, set up a meeting with Mike. No, not that Mike. The tall one."
    • RE: Microsoft making big speech bets with Windows 8, Bing

      @amitballs well, how exactly do you say Pose...pouse...poss...pass...? =p
    • RE: Microsoft making big speech bets with Windows 8, Bing


      If you have to repeat it 5 times, it is probably not set up correctly. English is not my primary language and the Kinect almost never misses my commands.

      I suggest you re-run the Audio Setup and make sure you're using the speakers you commonly use.
    • RE: Microsoft making big speech bets with Windows 8, Bing

      @amitballs The issue here is probably the fact that we typically sit across the room from our TVs, though I don't know how your sitting area is set up. They should create little microphones you can put around your sitting area so the Kinect can hear commands better, especially when something is already playing.
      • RE: Microsoft making big speech bets with Windows 8, Bing

        @wixostrix@... The Kinect sensor uses a microphone array consisting of 4 separate microphones that allow it to pick out voices from a room. If you have ever used the video chat feature, people will notice it sounds like you are using a clear headset and not an omnidirectional mic like a laptop or webcam would have. This removes the need for "little microphones" around you.

        If you have properly set up your Kinect, it will not only noise-cancel any normal sounds in the room, so only your voice is heard (anywhere in the room), but will also be set up to cancel out the sound produced by your speakers playing from your Xbox, so that even when listening to loud music or movies your voice commands are heard clearly.

        If anyone is having problems being heard, I would highly suggest re-running the audio/voice setup, and possibly increasing your normal volume if you are having trouble when you are playing sounds at full blast.
      • RE: Microsoft making big speech bets with Windows 8, Bing


        I am aware of the 4 separate mics used in the Kinect sensor, which is why I was surprised it hasn't worked ideally. Though, I haven't tried running the audio/voice setup. I wasn't around for the initial setup, so it probably needs the tuning. Thanks for the tip.
      • RE: Microsoft making big speech bets with Windows 8, Bing


        Do you have an accent? I've been playing with Kinect voice commands with the SDK and it cannot understand my Australian accent when I say 'pause'. Everything else works quite well (the Kinect SDK only officially supports US English I think).

        Is there a word that Americans pronounce the way an Australian or English person pronounces 'pause'?
    • RE: Microsoft making big speech bets with Windows 8, Bing

      Well, each Mike should have a last name, don't you think? And if they don't, it will ask you which one, just as Windows Phone already does today.
    • RE: Microsoft making big speech bets with Windows 8, Bing

      @amitballs - Yes. Very funny. 3-5 years from now. Make sure you have FB and Like a restaurant. Oh, and I'm sure the rest of the software will all have to be MS warez. LOL!
      The Danger is Microsoft
  • Dig that Tango action!

  • RE: Microsoft making big speech bets with Windows 8, Bing

    Would be nice if you could grab your Windows Phone and send speech commands to the Xbox even if you don't have Kinect.
  • RE: Microsoft making big speech bets with Windows 8, Bing

    "We see a future where the service will know you"

    I see a future where Microsoft and Google run the entire planet and "governments" and "human rights" are replaced by big business and marketing strategies.
    • RE: Microsoft making big speech bets with Windows 8, Bing

      @Sqrly Now that is indeed a scary thought. Governments need an overhaul, but to put our lives in the hands of big corporations is the only thing that is worse.
      • RE: Microsoft making big speech bets with Windows 8, Bing


        I'm not really sure the difference...
      • RE: Microsoft making big speech bets with Windows 8, Bing

        @Rick_Kl The governments are already run by the big corporations.
    • RE: Microsoft making big speech bets with Windows 8, Bing

      @Sqrly add Apple to that list!
      • Apple is the one that many fear the most

        as they are the most controlling.
        Tim Cook
      • RE: Microsoft making big speech bets with Windows 8, Bing

        @Mister Spock - in the amount of goodness a consumer can have and the amount of wealth an investor can have. Sounds good to me ;-)
        The Danger is Microsoft