Touch isn't Microsoft's only next-generation interface technology

By Mary Jo Foley | August 3, 2010, 1:18pm PDT

Summary: While Microsoft’s multi-touch capabilities (and lack thereof) are in the news daily, the company’s speech engine and algorithms don’t often merit a mention. At the SpeechTEK conference in New York City on August 3, Microsoft officials attempted to explain what the Redmondians have coming in the voice recognition and synthesis space, without going so far [...]

While Microsoft’s multi-touch capabilities (and lack thereof) are in the news daily, the company’s speech engine and algorithms don’t often merit a mention.

At the SpeechTEK conference in New York City on August 3, Microsoft officials attempted to explain what the Redmondians have coming in the voice recognition and synthesis space, without going so far as to announce undisclosed products. And yes, before you ask, there is a cloud angle, as there seems to be for every Microsoft product and technology these days.

Zig Serafin, the General Manager of the “Speech at Microsoft” group, outlined for SpeechTEK attendees Microsoft’s evolution in speech, a technology area that has been part of the natural user interface (NUI) focus for the Softies since 1993.

In 1999, Microsoft made its first speech-specific acquisition, the speech-toolkit vendor Entropic. In 2007, Microsoft spent $1 billion to buy speech-recognition vendor TellMe. But it wasn’t until a little over a year ago that Microsoft consolidated its various speech-focused products and technologies into the Speech at Microsoft team, whose charter is “bringing speech to everyday life,” Serafin said.

These days, Microsoft execs don’t look at speech as a standalone product or technology. They see it as an enabler of other products. They also see it as an increasingly integrated piece of Microsoft’s overall NUI plan.

Over the next 12 months, Microsoft will be bringing to market four new products that use its various speech technologies. The four:

  • Auto entertainment systems, like the Kia UVO announced at the Consumer Electronics Show at the start of this year. The first cars with UVO are due out this summer.
  • Windows Phone 7 devices, which have TellMe’s speech technology embedded right into the device shell. The phones will allow users to control dialing and search using voice, and integrated text-to-speech means the phones also will be able to “talk back” to users. (This is an example of what Microsoft execs mean when they talk about an “Internet of things” that connects up to the cloud.)
  • Kinect sensors for Xbox, which incorporate voice-recognition capabilities, allowing users to pause, play, advance and stop games, TV shows and movies via voice commands.
  • Corporate productivity products. There are more than 100 million Exchange users today who can make use of voice mail preview, voice translation and other voice-powered technologies that are built into the product (and will be built into Exchange Online, as Microsoft makes those features available to cloud users). Meanwhile, Microsoft’s TellMe product currently is handling 2.5 billion calls a year, making use of TellMe’s cloud back-end. (Interestingly, Serafin didn’t mention Office Communications Server 14, which Microsoft is touting as its entry into the “enterprise voice” market.)

In the longer term, Microsoft is trying to help answer the question “When can we deploy systems with a human level of conversational understanding?” said Larry Heck, Chief Speech Scientist in the Speech at Microsoft group.

Heck told SpeechTEKers that there are three drivers that will help the company address this question:

  • Data and relevant machine-learning algorithms
  • Cloud-computing platforms, like Azure and TellMe Network’s back-end platform
  • Search

There needs to be a lot more data collected on user-machine interaction before Microsoft and others can realistically expect machine interfaces, including speech, to be more natural, Heck said. NUIs can help provide ubiquity, by enabling users to access data wherever they are, he acknowledged. But currently entry points like search engines aren’t doing much to help advance work in making computers and devices more conversational. Users are accustomed to typing in a few keywords, rather than naturally phrased queries, but voice search on mobile devices more closely mimics human conversation, Heck explained.

Heck told attendees to “stay tuned” for new Microsoft products coming in the next few years that will reflect advances in conversational expression and understanding. (I’m guessing something like the client-plus-cloud patient-information systems Microsoft demonstrated at its Financial Analyst Meeting last week might be among those products to which Heck was alluding.)

Anywhere else you think Microsoft could, should or will incorporate speech recognition or synthesis technologies?

Mary Jo has covered the tech industry for more than 25 years for a variety of publications and Web sites, and is a frequent guest on radio, TV and podcasts, speaking about all things Microsoft-related. She is the author of Microsoft 2.0: How Microsoft plans to stay relevant in the post-Gates era (John Wiley & Sons, 2008).

Disclosure

Mary-Jo Foley

Freelance journalist/blogger Mary Jo Foley has nothing to disclose. WYSIWYG (what you see is what you get). I do not own Microsoft stock or stock in any of its partners or competitors. I have no business ventures that are sponsored by/funded by Microsoft or any of its partners or competitors.

Biography

Mary-Jo Foley

Mary Jo Foley has covered the tech industry for 25 years for a variety of publications, including ZDNet, eWeek and Baseline. She has kept close tabs on Microsoft strategy, products and technologies for the past 10 years. In the late 1990s, she penned the award-winning "At The Evil Empire" column for ZDNet, and more recently the Microsoft Watch blog for Ziff Davis.

46 Comments

Any word on ink?
0 Votes
+ -
Contributor
Ink
Mary Jo Foley 3rd Aug 2010
No mentions of ink today at speechtek. MJ
"When an we deploy systems with a human level of conversational understanding?"

Depends on how you define it, I guess.

If you mean something that is "good enough," I think we can reach that via something like ALICEbot-like technologies.

Thing is, ALICE doesn't actually parse the words and assign meanings to them or anything like that - ALICE is just a matching engine with pre-determined responses.

But it works, and perhaps we could use it for something that resembles Star Trek computers, where you can pattern match speech well enough to create a command system.

For something more like actual human level intelligence, however, I don't think that's happening any time soon.
0 Votes
+ -
Depends on the human...
jasonp@... 4th Aug 2010
I'd say computers have far exceeded the level of intelligence of some people I know.
0 Votes
+ -
Touch is another dead-end
tonymcs@... 3rd Aug 2010
While touch may be useful for small devices, its limitations become evident with larger devices. There is no action at a distance, your hand is all over the screen and fine control is a joke. Touch is the equivalent of finger painting rather than painting complex art.

Voice, body and face recognition with gesture based computing means the computer is doing the work. Glasses with head-up displays and other wearable interfaces and inevitably implants will all offer a new way to use computers.

When Scotty faced a Mac in one of the Star Trek movies, he first tried talking and then talking into the mouse before he realised he had to use the keyboard. It will not be long before people will be staring at a computer in bewilderment before finally realising they have to touch it to make it work - very quaint.
0 Votes
+ -
I do recall talk of that on board the ship
Mister Spock 3rd Aug 2010
they made fun of Mr Scott for weeks.
@Mister Spock

The same advertisers that brought us Seinfeld (lets play footsie and wiggle our shorts Bill), Laptop Hunters (that got all sorts of bad press for lies (incorrect pricing and customer never actually went into an Apple store) and portraying windows as "cheep"), And Windows 7 was Macs idea (where a college kid who can't get laid and get kicked out of his dorm room (by his Mac roommate) has to watch TV in the hall because he doesn't even have a friend whom he could visit).

I bet Kinect will not be magical either.
0 Votes
+ -
It was Star Trek 4: The Voyage Home
Bill4 3rd Aug 2010
@tonymcs@... Also known as Star Trek 4 Save the Whales. One of the all time greats.
@tonymcs@... So throw your remote control away and get up and change the channel...throw away your garage door opener and open it by going in the house and opening it by pushing the button on the wall.

The only difference is you're touching a remote, not a touch screen.

You're the dead end.
@tonymcs@... I would argue that they talk because it is a show or movie.

Remember the context of Star Trek. There is Geordi or Scott and they are the only ones talking. And the computer seems to focus on THEM! Not the minions that are floating around, who don't happen to be talking.

So if Geordi or Scott were not talking you would have to guess what they were doing. Or they would have to have characters saying, "hey Mr Scott what are you doing?" By having voice activated computers the show can offload the responsibility of the plot to the computer, which is you.
0 Votes
+ -
@tonymcs@... make product development plans based on what they saw in thirty and forty year old movies.
0 Votes
+ -
Not sure speech is all that useful
Fred Fredrickson 3rd Aug 2010
I'm sure there are some scenarios where a voice interface is OK, but using it as a general UI just won't happen. I was first involved in speech control of computers in 1975 as a user. The biggest issue then (other than the laborious training required to get the computer to recognise voice commands) was speed. It was faster to push a button (even several buttons) than speak the equivalent command, even if it was a one syllable word.

Later, I started using voice with a Mac in the early 1990s. There have been various voice control efforts on Windows too over the decades, I used one in the late 1990s. But they just aren't appealing, a keyboard and mouse (or other pointing device like touch) is very much faster and more efficient. And makes far less noise. Go into a call center sometime and listen to the cacophony of sound. That is what busy offices will be turned into if they use voice control for their PCs.

As for games, if you have 4 kids playing a game of Super Mario Cart, which voice will the console listen to? How will it know commands from general "conversation"? Voice dialling has been available on many phones for years, but it's never been a killer feature, or even a sought after feature. I had a phone capable of voice dialling for years and never used it (voice dialling that is).

I think voice control systems are interesting and great for research, likely there are a few niches they fill perfectly. But they are few and far between, so don't expect a voice UI to shake the world just yet.
@Fred Fredrickson

Agreed, for the most part. You know what I hate the most about making a phone call? No real person answers, only automated voice menus. Something takes 3 minutes that could be done in 3 seconds on a computer. To present the voice menus, the machine reads the options one by one, and you have to listen until the one you want comes up. Then comes the second level, then the third. After a lot of frustration, it finally forwards you to a real person. The person has to ask you for the same information again.

However, it might be useful in a car: because voice does not require visual interaction, it is safe. It's the same reason why you are allowed to listen to music while driving but not watch TV.
@Fred Fredrickson - where I would like to see this is closed captioning for the hard of hearing. IBM's ViaScribe is useful for colleges, courtrooms, anyplace where one might want verbatim notes. This software recognises pauses and er's and ums etc. But, most of all, I would like to see this linked to a translator that takes spoken words and translates them into a sign-language animatron. I guess how useful it is depends on where it's being used.
@Fred Fredrickson
>As for games, if you have 4 kids playing a game of Super Mario Cart, which voice will the console listen to?
Kinect on Xbox solves this in terms of motion. Each player is mapped so it knows when it's actually you moving and not someone behind you walking by. Guessing they will map your voice as well eventually.

>Voice dialling has been available on many phones for years, but it's never been a killer feature, or even a sought after feature.
Depends on the appropriateness of the feature I guess. I used to use live search on my previous non-"smartphone". I couldn't stand typing in long cities and then names of places I needed to search for, so the voice input was a godsend. My current smartphone doesn't do that.
I've seen a couple of speech recognition demos for the new Windows Phone 7 and I have to say, I am really impressed with what MS have done. It's not just because it works well, but the fact that it is integrated into the OS. Calling contacts, opening an application, searching for 'Pizza' within your locality on the internet! - It's all very impressive. This is another feature that sets WP7 apart from iOS and android.
0 Votes
+ -
@Poppets
One recent T-Mobile ad for the MyTouch 3G (running Android, of course) features a kid wanting ice cream in the middle of nowhere - and his dad says there's none - but he pushes the "genius" button on his MyTouch and says "Find Ice Cream Shops!" and it does. Mamma gets into the act - demanding to know where the outlet malls were located.
0 Votes
+ -
RE: Touch isn't Microsoft's only next-generation interface technology
de-void-21165590650301806002836337787023 4th Aug 2010
@Wolfie2K3 - I've seen both the Android voice features and WinPhone7 features. While the Android features are "cool" and "useful", they're not anywhere near as deeply and seamlessly integrated into the core of the shell. WinPhone7's voice support is just stunningly simple and ubiquitous.
0 Votes
+ -
Thinking...
CustomComputers 4th Aug 2010
Having used speech recognition in its earlier forms with add-on applications, and now seeing its incorporation into Windows 7, in my opinion it is just in its infancy. Three areas come to mind where incorporation would be beneficial if not mandatory.

(1) Bing Search... whereby one could search using voice commands of keywords could be the ingredient to set MS apart from the competition.

(2) Electronic Health Records ... as this technology gains acceptance in the medical field it is a given all doctors, nurses, transcriptionists, etc. would find speech to digital data mandatory on smart phone OS and tablets of choice. The M Pad to come should include the feature!

(3) The Microsoft automated Customer Service areas ie: MS Answers, Technet database and forums, and the enlarging MS Fix-it solutions center may well benefit from the ability to find information with voice commands. Would put a more human face on finding technical assistance without the cost involved with phone support.
0 Votes
+ -
Electronic health records...
jasonp@... 4th Aug 2010
and voice recognition aren't a very good match. I hear transcriptionists complain about having to understand what doctors are trying to say. One hospital I've worked with in the past got rid of 80% of their transcription staff after going with Dragon, then wound up just outsourcing the whole transcription department rather than re-hire everyone back to deal with the 65% accuracy rate from the speech recognition software.
0 Votes
+ -
Just what I want; voice jail for the car
HollywoodDog 4th Aug 2010
"I'm sorry, I did not understand your response. To increase the air conditioning, say 'AC up'..."
etc.
I'm all for advances in conversational expression and understanding, I just hope that Microsoft has learned its lesson from the Kin. I'm worried that Microsoft will market these new products before they're really ready and they will go the way of the Kin - DOA
0 Votes
+ -
takes the clapper to a whole new level
sparkle farkle 4th Aug 2010
the downside is a hard copy of your conversations, which could find their way onto the google, or bing. what a wonderful world where your every spoken word could be searched.....
0 Votes
+ -
I remember back in 1982...
jasonp@... 4th Aug 2010
having a speech emulator for the Commodore Vic-20. You could type in a word and it would attempt to verbalize it...usually poorly, but worth hours of entertainment for a 7th grader ("I filet fish" was a classroom favorite thanks to a popular LJS commercial at the time). We don't seem to have progressed very much in the last 28 years with voice technology.
I thought speech recognition capabilities "flat-lined" around 2001? We're at about an 80% accuracy rate and stable. I believe human discrimination is at about 98%.

If this is true, then I can see where speech recognition will be useful for phones and cars, but not in the general enterprise nor in video games. A really good Starcraft pro-player performs about 200 clicks a minute. That's like 3+ actions per second. Speak that.

Remote control? Sure.

"T.V. on."
"What honey?"
"Nothing, I was telling the T.V. to turn on."
"Oh." Wife comes in the room, "Honey, will you turn the T.V. off when you finish watching this show?"
The T.V. hears, "T.V. off" and cuts off. Right when Bill was going to reveal what is so special about Sookie too! Darn it, I didn't have my DVR set!

Speech recognition technologies have been around for a long time, and as far as I know, Microsoft has been investing in speech recognition for a while. We haven't seen much in results, and I believe this situation is because researchers just can't get really good results.
Personally, I can't see ever using voice for a general user interface for a computer. I'm sitting here staring at half a dozen application windows spread across two monitors, with literally hundreds of things I could select and manipulate. It would take me a hundred times longer to explain to a computer that I wanted to select the 45th sentence in the second column of the right pane of the mail application on my 2nd monitor and copy it into the 457th character position of the word processor document open on my 1st monitor. It takes 2 seconds to do it with a mouse and keyboard. It would take forever to do it by voice.

If they're talking about controlling home/office computers 100 percent by voice, Microsoft is heading the wrong direction and wasting time/money. Controlling some limited functions, I can certainly see. Dictating text, sure - we can do that already. Editing, manipulating, combining, splitting, or any other tasks just don't work with voice. Remember Xerox tried a zillion different ways to manipulate on screen information before inventing the mouse. It just works.
What Miss Foley??? "company's (MSFT) speech engine and algorithms don't often merit a mention."

Are you serious?? You must be joking, or you have never tried the Microsoft Sync system in a Ford car!! After trying this voice recognition system from Microsoft you can clearly understand that MSFT currently has the best voice recognition system!!! Sync is smooth and clearly understands what you say even in a noisy environment (in a car with open windows, with the noise of wind and the other cars and trucks running near you)....

Microsoft currently has exceptional voice recognition technologies!!!!
0 Votes
+ -
Another MicroFAIL project....
MSFTWorshipper 4th Aug 2010
*yawn*
1) It's annoying.
2) It's annoying.
3) It's annoying.
0 Votes
+ -
Thinking of it the wrong way
mw_griffith@... 4th Aug 2010
Sure, speech recognition is great for telling the phone to call "Sam" but that's a mere convenience. Speech recognition has to be less about automating simple tasks with simple one or two word commands and more about using it as a robust comprehensive input method. The real practical application is in professions such as medical, legal, etc. to replace transcription. Today, we are still using the methods for dictating notes that we used 20 years ago. Someone talks into a recorder and someone else types what was just spoken. If speech recognition can evolve to meet the needs of these professions, it has a real place in technology. Otherwise, if it is to be used to automate simple tasks, it is yet another gadget for techies to play with.
What's next? A Tourette's Syndrome virus.
The impact of affordable, reliable speech-to-text-and-back technology - if you can stop being dazzled by the pretty technology and flashing LEDs - is going to cut a swathe through the jobs market like a combine harvester through a wheat field. Particularly right now, as corporations will use the excuse of the recession to "pink-slip" anyone whose job can be done cheaper by a machine. Most switchboard operators have long since gone the way of last year's snow, replaced by "Press 1 for sales...." recordings. Wave goodbye to the last few remaining. Anyone whose job involves simple, script-driven conversation with strangers over the telephone had better start looking into alternative employment. And that's a LOT of people... sadly, many/most of them women. As usual, the USA leads the lemmings in the race to the cliff edge: a social system geared towards creating job losses (sorry, "productivity increases" - but they're too often the same thing) and the LEAST able to deal with the unemployment that results.
0 Votes
+ -
I'd use voice activation on my iPhone if it supported both Japanese and English in either setting.

A large portion of the North American population is bilingual, and their phone books are likewise bilingual.
0 Votes
+ -
The last time I tried to work with Speech on Windows 7, I found that it was not possible to enable Speech for other applications and disable it for Windows 7 command input (it is either enabled, or disabled for all applications including Windows). A support query on how to do this remains unanswered. I was writing the app using all Microsoft tools (Visual Studio 2008, SAPI 5.4) so there should have been no problems. The same application worked well on XP with SAPI 5.1 since speech is not "integrated" into XP.
I believe Rudy de Haas (Murphy) has room in his museum at the moment - why not apply?

