Touch isn't Microsoft's only next-generation interface technology

While Microsoft's multi-touch capabilities (and lack thereof) are in the news daily, the company's speech engine and algorithms don't often merit a mention.

At the SpeechTEK conference in New York City on August 3, Microsoft officials attempted to explain what the Redmondians have coming in the voice recognition and synthesis space -- without going so far as to announce undisclosed products. And yes -- before you ask -- there is a cloud angle, like there seems to be for every Microsoft product and technology these days.

Zig Serafin, the General Manager of the "Speech at Microsoft" group, outlined for SpeechTEK attendees Microsoft's evolution in speech, a technology area that has been part of the natural user interface (NUI) focus for the Softies since 1993.

In 1999, Microsoft made its first speech-specific acquisition, the speech-toolkit vendor Entropic. In 2007, Microsoft spent $1 billion to buy speech-recognition vendor TellMe. But it wasn't until a little over a year ago that Microsoft consolidated its various speech-focused products and technologies into the Speech at Microsoft team, whose charter is "bringing speech to everyday life," Serafin said.

These days, Microsoft execs don't look at speech as a standalone product or technology. They see it as an enabler of other products. They also see it as an increasingly integrated piece of Microsoft's overall NUI plan.

Over the next 12 months, Microsoft will be bringing to market four new products that use its various speech technologies. The four:

Auto entertainment systems, like the Kia UVO announced at the Consumer Electronics Show at the start of this year. The first cars with UVO are due out this summer.

Windows Phone 7 devices, which have TellMe's speech technology embedded right into the device shell. The phones will allow users to control dialing and search using voice, and integrated text-to-speech means the phones also will be able to "talk back" to users. (This is an example of what Microsoft execs mean when they talk about an "Internet of things" that connects up to the cloud.)

Kinect sensors for Xbox, which incorporate voice-recognition capabilities, allowing users to pause, play, advance and stop games, TV shows and movies via voice commands.

Corporate productivity products. There are more than 100 million Exchange users today who can make use of voice mail preview, voice translation and other voice-powered technologies that are built into the product (and will be built into Exchange Online, as Microsoft makes those features available to cloud users). Meanwhile, Microsoft's TellMe product currently is handling 2.5 billion calls a year, making use of TellMe's cloud back-end. (Interestingly, Serafin didn't mention Office Communications Server 14, which Microsoft is touting as its entry into the "enterprise voice" market.)

In the longer term, Microsoft is trying to help answer the question "When can we deploy systems with a human level of conversational understanding?" said Larry Heck, Chief Speech Scientist in the Speech at Microsoft group.

Heck told SpeechTEKers that there are three drivers that will help the company address this question:

  • Data and relevant machine-learning algorithms
  • Cloud-computing platforms, like Azure and TellMe Network's back-end platform
  • Search

There needs to be a lot more data collected on user-machine interaction before Microsoft and others can realistically expect machine interfaces, including speech, to be more natural, Heck said. NUIs can help provide ubiquity, by enabling users to access data wherever they are, he acknowledged. But currently entry points like search engines aren't doing much to help advance work in making computers and devices more conversational. Users are accustomed to typing in a few keywords, rather than naturally phrased queries, but voice search on mobile devices more closely mimics human conversation, Heck explained.

Heck told attendees to "stay tuned" for new Microsoft products coming in the next few years that will reflect advances in conversational expression and understanding. (I'm guessing something like the client-plus-cloud patient-information systems Microsoft demonstrated at its Financial Analyst Meeting last week might be among those products to which Heck was alluding.)

Anywhere else you think Microsoft could, should or will incorporate speech recognition or synthesis technologies?

About

Mary Jo has covered the tech industry for 30 years for a variety of publications and Web sites, and is a frequent guest on radio, TV and podcasts, speaking about all things Microsoft-related. She is the author of Microsoft 2.0: How Microsoft plans to stay relevant in the post-Gates era (John Wiley & Sons, 2008).

Talkback

44 comments
  • RE: Touch isn't Microsoft's only next-generation interface technology

    Any word on ink?
    angarita calvo
    • Ink

      No mentions of ink today at speechtek. MJ
      Mary Jo Foley
    • RE: Touch isn't Microsoft's only next-generation interface technology

      @angarita calvo Switches and routers are also among the most energy-dense, heat-generating devices in the DC. Eliminating them from the architecture not only improves network throughput but also reduces cooling requirements and energy consumption, which are major cost factors for this scale of DC.
      Arabalar
  • RE: Touch isn't Microsoft's only next-generation interface technology

    "When can we deploy systems with a human level of conversational understanding?"

    Depends on how you define it, I guess.

    If you mean something that is "good enough," I think we can reach that via something like ALICEbot-like technologies.

    Thing is, ALICE doesn't actually parse the words and assign meanings to them or anything like that - ALICE is just a matching engine with pre-determined responses.

    But it works, and perhaps we could use it for something that resembles Star Trek computers, where you can pattern match speech well enough to create a command system.

    For something more like actual human level intelligence, however, I don't think that's happening any time soon.
    CobraA1
    • Depends on the human...

      I'd say computers have far exceeded the level of intelligence of some people I know.
      jasonp@...
      • RE: Touch isn't Microsoft's only next-generation interface technology

        @jasonp@... Yes, you are right that computers have exceeded that level of intelligence. It's so true.
        otisa
  • Touch is another dead-end

    While touch may be useful for small devices, its limitations become evident with larger devices. There is no action at a distance, your hand is all over the screen and fine control is a joke. Touch is the equivalent of finger painting rather than painting complex art.

    Voice, body and face recognition with gesture based computing means the computer is doing the work. Glasses with head-up displays and other wearable interfaces and inevitably implants will all offer a new way to use computers.

    When Scotty faced a Mac in one of the Star Trek movies, he first tried talking and then talking into the mouse before he realised he had to use the keyboard. It will not be long before people will be staring at a computer in bewilderment before finally realising they have to touch it to make it work - very quaint ;-)
    tonymcs@...
    • I do recall talk of that on board the ship

      they made fun of Mr Scott for weeks.
      Tim Cook
      • RE: Touch isn't Microsoft's only next-generation interface technology

        @Mister Spock

        The same advertisers that brought us Seinfeld (let's play footsie and wiggle our shorts, Bill), Laptop Hunters (which got all sorts of bad press for lies (incorrect pricing, and the customer never actually went into an Apple store) and for portraying Windows as "cheep"), and Windows 7 was Mac's idea (where a college kid who can't get laid and gets kicked out of his dorm room (by his Mac roommate) has to watch TV in the hall because he doesn't even have a friend whom he could visit).

        I bet Kinect will not be magical either.
        IE8 had multi-process architecture before Chrome launched, and in fact was the first browser to announce the feature. That's why both Chrome and IE use far more memory than the other browsers. Chrome is a bit more strict than IE; IE will allow tabs with the same integrity level to share a single process. Outside of that, MS beat Google to the punch. Good try though. If MS came out with touch UIs for at least Word, Excel, OneNote, and Outlook, with super slick and highly effective integrated virtual keyboards, that would be mind blowing! I think that would be like lighting a rocket under PC touch computing.
        exibir
      • RE: Touch isn't Microsoft's only next-generation interface technology

        @Mister Spock Yes, they made fun of him. It's really interesting, and I appreciate him for this.
        otisa
    • It was Star Trek IV: The Voyage Home

      @tonymcs@... Also known as Star Trek 4 Save the Whales. One of the all time greats.
      Bill4
    • RE: Touch isn't Microsoft's only next-generation interface technology

      @tonymcs@... So throw your remote control away and get up and change the channel...throw away your garage door opener and open it by going in the house and opening it by pushing the button on the wall.

      The only difference is you're touching a remote, not a touch screen.

      You're the dead end.
      cyberslammer
    • RE: Touch isn't Microsoft's only next-generation interface technology

      @tonymcs@... I would argue that they talk because it is a show or movie.

      Remember the context of Star Trek. There is Geordi or Scott and they are the only ones talking. And the computer seems to focus on THEM! Not the minions that are floating around, who don't happen to be talking.

      So if Geordi or Scott were not talking you would have to guess what they were doing. Or they would have to have characters saying, "hey Mr Scott what are you doing?" By having voice activated computers the show can offload the responsibility of the plot to the computer, which is you.
      serpentmage
    • Maybe the 'big brains' at Microsoft shouldn't...

      @tonymcs@... make product development plans based on what they saw in thirty and forty year old movies.
      HollywoodDog
  • Not sure speech is all that useful

    I'm sure there are some scenarios where a voice interface is OK, but using it as a general UI just won't happen. I was first involved in speech control of computers in 1975 as a user. The biggest issue then (other than the laborious training required to get the computer to recognise voice commands) was speed. It was faster to push a button (even several buttons) than speak the equivalent command, even if it was a one syllable word.

    Later, I started using voice with a Mac in the early 1990s. There have been various voice control efforts on Windows too over the decades, I used one in the late 1990s. But they just aren't appealing, a keyboard and mouse (or other pointing device like touch) is very much faster and more efficient. And makes far less noise. Go into a call center sometime and listen to the cacophony of sound. That is what busy offices will be turned into if they use voice control for their PCs.

    As for games, if you have 4 kids playing a game of Super Mario Kart, which voice will the console listen to? How will it know commands from general "conversation"? Voice dialling has been available on many phones for years, but it's never been a killer feature, or even a sought after feature. I had a phone capable of voice dialling for years and never used it (voice dialling that is).

    I think voice control systems are interesting and great for research, likely there are a few niches they fill perfectly. But they are few and far between, so don't expect a voice UI to shake the world just yet.
    Fred Fredrickson
    • RE: Touch isn't Microsoft's only next-generation interface technology

      @Fred Fredrickson

      Agree for the most part. You know what I hate the most about making a phone call? No real person answers, only automatic voice menus. You do one thing in 3 minutes while the same thing can be done in 3 seconds on a computer. To present the voice menus, the machine reads the options one by one, and you have to listen until the one you want comes up. Then the second level, then the third. After a lot of frustration, it finally forwards you to a real person. The person has to ask you for the same information again.

      However, it might be useful in a car, because voice does not require visual interaction, so it is safe. Same reason why you are allowed to listen to music while driving but not watch TV.
      jk_10
    • RE: Touch isn't Microsoft's only next-generation interface technology

      @Fred Fredrickson - Here I would like to see this used for closed captioning for the hard of hearing. IBM's ViaScribe is useful for colleges, courtrooms, anyplace where one might want verbatim notes. This software recognises pauses and ers and ums, etc. But, most of all, I would like to see this linked to a translator that takes spoken words and translates them into a sign-language animatron. I guess how useful it is depends on where it's being used.
      madagoouk@...
    • RE: Touch isn't Microsoft's only next-generation interface technology

      @Fred Fredrickson
      >As for games, if you have 4 kids playing a game of Super Mario Kart, which voice will the console listen to?
      Kinect on Xbox solves this in terms of motion. Each player is mapped so it knows when it's actually you moving and not someone behind you walking by. Guessing they will map your voice as well eventually.

      >Voice dialling has been available on many phones for years, but it's never been a killer feature, or even a sought after feature.
      Depends on the appropriateness of the feature I guess. I used to use Live Search on my previous non-"smartphone". I couldn't stand typing in long cities and then names of places I needed to search for. But the voice input was a godsend. My current smartphone doesn't do that.
      rengek
  • RE: Touch isn't Microsoft's only next-generation interface technology

    I've seen a couple of speech recognition demos for the new Windows Phone 7 and I have to say, I am really impressed with what MS have done. It's not just because it works well, but the fact that it is integrated into the OS. Calling contacts, opening an application, searching for 'Pizza' within your locality on the internet! - It's all very impressive. This is another feature that sets WP7 apart from iOS and Android.
    Poppets