
Microsoft making big speech bets with Windows 8, Bing

By Mary Jo Foley | August 9, 2011, 1:37pm PDT

Summary: The Microsoft Tellme team is working with the Bing, Windows Phone, Kinect/Xbox, Azure and other Microsoft teams to add new speech-centric capabilities to Microsoft and third-party products in the coming year-plus.

Microsoft’s consolidated speech technology unit, Microsoft Tellme, is working with several product teams inside the company to make speech recognition and understanding a key component of a number of next-generation Microsoft offerings.

Microsoft execs have been demonstrating publicly how Windows Phones currently can handle spoken queries. With Mango, Windows Phones will support even more speech functions, including speech-to-text and text-to-speech. And the Kinect sensor is going to get more sophisticated voice-command support this fall, enabling users to use Bing to search for movies, TV, music and other content via voice.

But within the coming year, even more Microsoft products and services are getting the speech recognition/understanding treatment.

Windows 7 today can recognize a limited set of spoken commands. But Microsoft will be taking this work further with Windows 8, said Ilya Bukshteyn, Tellme Senior Director of Sales and Marketing. Windows 8 on ARM and Intel slates will be able to recognize many speech commands, which makes sense given they won’t be optimized for keyboard and mouse input. And because Windows 8 is “HTML-based,” the HTML5 speech tag could allow developers inside and outside Microsoft to create applications for Windows 8 that are speech-capable, Bukshteyn added.

As the Tellme team pushes beyond speech recognition and into conversational understanding, scenarios become even more interesting, Bukshteyn said. When CEO Steve Ballmer recently touted the ability of Bing to support complex natural-language-query commands, he didn’t explain what would make that magic happen. It turns out it’s Tellme’s voice technology, combined with social-graph information delivered via Windows Live, plus Bing’s search functionality. (“Windows Live is a social graph hub for Facebook, Twitter and LinkedIn,” Bukshteyn explained.)

On August 9, Microsoft posted a video clip highlighting how this kind of conversational understanding could work (and showed this clip at the SpeechTEK conference keynote in New York today):

Example: Say you want to meet with a friend in New York for dinner next week. Maybe as soon as three to five years from now (timing reference changed due to a request from Microsoft), Microsoft officials think you’ll be able to say to your PC “arrange a dinner with Joe in Manhattan on Thursday,” and Tellme will recognize the query, link to your Facebook or LinkedIn social-graph information to discern which “Joe” you’re likely looking to meet, compare your calendars, and use Bing to search for restaurants you both have indicated you “Like” on Facebook.

From a Tellme blog post on August 9, here’s Microsoft’s explanation as to what’s coming with Bing/Tellme/social-graph integration:

“We see a future where the service will know you: know your intent, your social and business connections, your likes and dislikes, your privacy preferences, and the things that define the context that’s important to you. The result will be a speech NUI service that helps you accomplish everyday tasks in a more natural and conversational manner. This service will simplify tasks that used to be tedious or impossible on a TV or other device, by combining an understanding of language and intent with a deep knowledge of you, the user. We envision a future where we build on the experiences we deliver today with Kinect for Xbox 360, Windows Phone, or Bing for iPad or iPhone apps, by enhancing the speech NUI experience to understand more layers of context: what you are doing, where you are doing it, the kinds of devices you are using and your historical preferences. Because this is a cloud-based service, your interactions will be able to persist over time, enabling you to pick up where you left off, regardless of what device you may be using.”

This “understanding intent” work is part of Microsoft’s push to make Bing’s results more personalized, Bukshteyn said. And Tellme is playing a big role here because of the volume of speech data that it is collecting and using to improve the accuracy of its results. Tellme currently is processing 11 billion “utterances” per year, Bukshteyn said.

While the Tellme team focuses on enabling these longer-term scenarios, it will continue work it is doing on nearer-term projects, such as providing interactive voice response (IVR) to customers and partners. (Quite a few automated voice call-handling systems are powered by Tellme today.) And the team is working on adding a speech programming interface to Windows Phone so that developers can write apps that take advantage of the speech technology built into the phone platform. Bukshteyn didn’t have a timeframe to share as to when Windows Phone developers might get this API support.

The Tellme team also is planning to add support for the Tellme speech cloud to Windows Azure at some point, so that developers will be able to build and support IVR-enabled apps and services running on Azure. Tellme’s speech cloud doesn’t run on Azure today; there’s no firm timetable as to when or if Microsoft may move it to Azure, Bukshteyn said. But the Tellme service will be available to third-party developers regardless of whether Microsoft moves Tellme itself to Azure or not, he said.

Is speech the unsung part of Microsoft’s NUI story? Will speech support give Microsoft products much of a leg up over those of its competitors?


Mary Jo has covered the tech industry for more than 25 years for a variety of publications and Web sites, and is a frequent guest on radio, TV and podcasts, speaking about all things Microsoft-related. She is the author of Microsoft 2.0: How Microsoft plans to stay relevant in the post-Gates era (John Wiley & Sons, 2008).

Disclosure

Mary Jo Foley

Freelance journalist/blogger Mary Jo Foley has nothing to disclose. WYSIWYG (what you see is what you get). I do not own Microsoft stock or stock in any of its partners or competitors. I have no business ventures that are sponsored by/funded by Microsoft or any of its partners or competitors.

Biography

Mary Jo Foley

Mary Jo Foley has covered the tech industry for 25 years for a variety of publications, including ZDNet, eWeek and Baseline. She has kept close tabs on Microsoft strategy, products and technologies for the past 10 years. In the late 1990s, she penned the award-winning "At The Evil Empire" column for ZDNet, and more recently the Microsoft Watch blog for Ziff Davis.


56 Comments


RE: Microsoft making big speech bets with Windows 8, Bing
jackson1984-24316069205748857739440257893812 9th Oct
Privileged i stumbled on , this remarkable webpage, could make nfl jerseys 2012 assured to bookmark it so i can arrive to routinely.
Yeah... ok. First improve the "XBOX, Pause" command so I don't have to repeat it 5 times. Then let's move on to "PC, setup a meeting with Mike. No, not that Mike. The tall one."
@amitballs well, how exactly do you say Pose...pouse...poss...pass...? =p
@amitballs

If you have to repeat it 5 times, it is probably not setup correctly. English is not my primary language and the Kinect almost never misses my commands.

I suggest you re-run the Audio Setup and make sure you're using the speakers you commonly use.
@amitballs The issue here is probably the fact that we typically sit across the room from our tv's, granted, I don't know how your sitting area is set up. They should create little microphones you can put around your sitting area so the Kinect can hear commands better, especially when something is already playing.
@wixostrix@... The kinect sensor uses a microphone array consisting of 4 separate microphones which allow it to pick out voices from a room. If you have ever used the video chat feature people will notice it sounds like you are using a clear headset and not an omni directional mic like a laptop or webcam would have. This removes the need for "little microphones" around you.

If you have properly set up your kinect it will not only noise-cancel any normal sounds in the room, so only your voice is heard (anywhere in the room), but also will be set up to cancel out the sound produced by your speakers playing from your xbox so that even when listening to loud music or movies your voice commands are heard clearly.

If anyone is having problems being heard I would highly suggest re running the audio/voice set up, and possibly increasing your normal volume if you are having trouble when you are playing sounds at full blast.
@BucksterMcgee

I am aware of the 4 separate mics used in the Kinect sensor which is why I was surprised it hasn't worked ideally. Though, I haven't tried running the audio/voice set up. I wasn't around for the initial set up so it probably needs the tuning. Thanks for the tip.
@wixostrix@...

Do you have an accent? I've been playing with Kinect voice commands with the SDK and it cannot understand my Australian accent when I say 'pause'. Everything else works quite well (the Kinect SDK only officially supports US English I think).

Is there a word that Americans pronounce the way an Australian or English person pronounces 'pause'?
@allusernamestaken

"paws"
@amitballs
Well, each Mike should have a last name, don't you think? And if they don't, it will ask you which one, just as Windows Phone already does today.
@amitballs - Yes. Very funny. 3-5 years from now. Make sure you have FB and Like a restaurant. Oh, and I'm sure the rest of the software will all have to be MS warez. LOL!
Dig that Tango action!
Joe_Raby 9th Aug
....
Would be nice if you could grab your Windows Phone and send speech commands to the Xbox even if you don't have Kinect.
"We see a future where the service will know you"

I see a future where Microsoft and Google run the entire planet and "governments" and "human rights" are replaced by big business and marketing strategies.
@Sqrly Now that is indeed a scary thought. Governments need an overhaul, but to put our lives in the hands of big corporations is the only thing that is worse.
@Rick_Kl

I'm not really sure the difference...
@Rick_Kl The governments are already run by the big corporations.
@Sqrly add Apple to that list!
@apetti
as they are the most controlling.
@Mister Spock - in the amount of goodness a consumer can have and the amount of wealth an investor can have. Sounds good to me ;)
@Sqrly You see a future? Man you are too late to the party. They are not only already running our lives and know everything about us, they are also running the governments. Do you know how many lobbyists Google has in Washington? Last I heard Eric Schmidt is on some committee. And you may not know it, but these companies are hand-in-glove with the NSA and the like. So welcome to the 21st century, where privacy has gone the way of the dodo. Like the saying goes, "We are the government, we know everything."
Democracy is being replaced by...
scH4MMER Updated - 10th Aug
"Capitolistocracy" - a government of the Corporations, by the Corporations, for the Corporations.
RE: Microsoft making big speech bets with Windows 8, Bing
The Danger is Microsoft Updated - 13th Aug
@eInfinity - uh....the people still have the power if they would take their heads out of where-ever they have shoved them and take some responsibility in reading and understanding what is going on and who to vote for. Oh, and then, OhMyGosh! Actually VOTE instead of thinking 'I hate this so I will be absent'.

Whiners deserve what they get! Just like Winners deserve what they get.
Great if it happens but MS has a long history of over-promising and under-delivering.
When will this product ship?
HollywoodDog 9th Aug
Before or after iPhone 5? Because currently my plan is to get an iPhone 5 in September.
@HollywoodDog
why did you take time to post your response?

Given your feelings on Microsoft, your reply is of no relevance or importance.

I'm wondering if this is a thing
HollywoodDog 10th Aug
@Mister Spock ... where Ballmer wants to quiet restive institutional shareholders displeased with Microsoft's performance by promising to deliver a HAL 9000 one of these years. Just don't ask for any management changes and look at this thing we might someday deliver.
@Mister Spock - I don't know. It's nice to know my Apple stock is going up ;)
Because it's creepy, that's why
Robert Hahn 9th Aug
I'm sure Microsoft has done plenty of focus group research on this stuff, but I'm still surprised they are investing heavily in this. Everything I know tells me that human beings secretly hate machines that talk, and they hate machines that listen even more. The reaction to robocalls and robo-receptionists is visceral, instant, and not happy. I don't know why people are suddenly going to develop a fondness for it.
@Robert Hahn
as what is being described here, so their feelings on the matter would logically be quite different as well.
@Mister Spock
The robocalls were merely an example. Talking ATMs are unpopular as well. But your inability to generalize from the specific is well known, so this will I'm sure go over your head as well.
@Mister Spock ... as in "Your call is important to us. Please continue to wait."
0
"I'm sorry I did not understand your response."
0
"I'm sorry I did not understand your response."

etc.
@Robert Hahn ... People hate mechanical sounding things, because they don't sound natural, and they don't understand or interpret things. Moreover, they barely even "hear" what you tell them.

Robocalls, as you put it, are not so much the problem as what they are calling about... they're usually automated telemarketing calls, which officially makes that company or service doubly hated by default.

Roboreceptionists, more commonly known as automated attendants or interactive voice response systems, aren't usually hated because their routing programs are usually designed by monkeys who don't know the proper department call structure.

I have to insist, however, that as long as they are kept simple and functional, they aren't a bad thing, necessarily. For example, at my hospital, I have Microsoft Exchange 2010 running my voicemail system. It is setup with a single menu tree... "Press 1 or say In-Patient for the In-Patient Unit; Press 2 or say Out-Patient for the Out-Patient Unit; Press 3 or say Billing for the Billing department; Press 4 or say Scheduling for the Scheduling department." That's it. No sub-trees and no follow-ups, it just goes straight to a person's desk in those areas. I have even tried fooling it with accents, and it's remarkably accurate.
@GoodThings2Life - MS Exchange? Fail!
I keep thinking of that 30 Rock scene where Jack is pitching a voice command TV to his boss. When the actors on-screen say "mute," the TV mutes. When a talk show audience applauds, the TV turns itself off. When Jack finally exclaims "crap," the channel changes to "Keeping up with the Kardashians."
Do you really want everybody in your office chattering away at their computers? What happens when my co-worker in the next cube says "Delete all files!" and my computer hears him? I'll keep my keyboard and mouse, thank you.

The only place I see this being useful is hands-free phone and car stereo operations.
@thensley@... "Computer, please delete all of my coworkers' Excel documents, convert Word files to image-based PDF's, and delete original copies, then reply all to all emails with **** you all, I'm outta here." ... Surely someone will take that into consideration, lol.
Our Microsoft rep finally talked me into using a Windows 7 phone (gave me a phone). I am an iPhone user and have a Blackberry. For the last 3 weeks, I have given up my iPhone and Blackberry and to be honest, have not missed either. The speech recognition on the Windows 7 phone is very good and to date has not missed a call "fill in blank" command yet. I even tried it on names I figured it would not get and I accidentally saw the interpretive side when I once left out my wife's last name and just said "call at home" and it got it. I don't think the market has quite realized that the next MS phones may in fact be "cool" phones which is not something MS has ever been.
@tgroom@... I agree. I just switched to an HTC Arrive after using the HTC EVO for over a year. It has been an absolute joy to use it. It's fast. It's simple. It lets me get things done and move on.
@tgroom@... Bullsheit! Your check will be reduced by 50% because you failed to convince anyone.
* Win 2000 had a great magazine ad that trashed Win95. Most companies would be laughed out of existence, if they openly told people their earlier products were garbage... (win2000 also put the GUI in ring 3 as opposed to ring 0 where NT 4 had it... result = less stability...)

But that's ancient history... Microsoft can live on their brand name alone.
How else would you do it...
Joe_Raby 10th Aug
@HypnoToad72

...when the only competition you have is yourself?
I still freak people out when I 'talk to my Phone' (Windows 7) and it looks stuff up for me. It works rather well - looking forward to the improvements in Mango.

Kinect works fairly well for me, unless my air conditioner comes on and blows near it or some other unexpected sound. I wish I could train it to pause when the phone rings.
@Zappykins - You freak me out that you even USE a W7 phone. Poor barstard.
then otherwise, STFU!
Paranoid much?
HollywoodDog 11th Aug
@MSFTWorshipper ... Apple has no chance.

"There's no chance that the iPhone is going to get any significant market share. No chance. It's a $500 subsidized item. They may make a lot of money. But if you actually take a look at the 1.3 billion phones that get sold, I'd prefer to have our software in 60% or 70% or 80% of them, than I would to have 2% or 3%, which is what Apple might get."

Now who said that. Oh yeah, a Mr. Stephen A. Ballmer.
