Microsoft's speech aspirations fulfilled?

TOPICS: Microsoft

Microsoft and Bill Gates have been in pursuit of speech technologies for a long time, and the company may reportedly end up buying Tellme Networks for a pretty price (Tellme has received $230 million in funding so far and is profitable).

Speech input has been the next big interface thing from Microsoft for at least a decade, but few mainstream users are giving speech commands to their Windows computers, telling them to "open file," "close window" or "take a letter." According to GigaOm, Microsoft may pay $800 million or more for Tellme's speech technology.

It turns out the big application for speech is on the telephony and Internet communications front, from call centers to cell phones. Tellme claims 35 million unique callers per month, over 10 billion utterances per year, over 2 billion calls per year, the world's largest recorded audio library and a carrier-grade network for its VoiceXML platform traffic. The company also says that it accounts for 40 percent of 411 traffic in the U.S.

You have to wonder about the effectiveness of all the money Microsoft has spent on speech technology R&D over the years if the company ends up acquiring Tellme for hundreds of millions.

Of course, Microsoft is not lacking in cash for such transactions, and views this kind of technology as core to its mobile and communications aspirations, and its Speech Server business. Microsoft's speech products group is centered around its speech recognition software for Vista and Speech Server for handling speech-enabled telephony and speech-enabled applications.

In 1998, Microsoft hired top speech recognition technologist Kai-Fu Lee, who was a founder of Microsoft Research Asia, and then lost him to Google in a much-publicized battle in 2005.

Microsoft Research has speech tech teams in Redmond and Beijing working on spoken language technologies with a vision of the "fully speech-enabled computer." Check out the marketing Flash demo explaining Microsoft's speech recognition ambitions and research activities, which focus on continuous, free-flowing speech and on ways to obtain cleaner signals for computing input in public places by sensing vibrations in skin.

More from TechCrunch and WSJ ($)

  • I find the enthusiasm baffling

    Sitting in an open-plan office where everyone is speaking to their computer sounds like my idea of hell.

    Furthermore, what happens when your neighbour shouts out "delete all the files!"? I imagine there will be a few unhappy faces in the office after that.

    It can't be reiterated often enough that getting a computer to understand continuous speech is a very difficult task. "Recognize speech" and "wreck a nice beach" are phonetically almost identical when spoken continuously; human beings are usually able to disambiguate from the context. Decades of AI research have failed to capture this kind of "common sense" knowledge in a way that is consistently usable by a computer system.

    If you want to increase your productivity I would suggest learning to touch type. It is something anyone of average intelligence can learn in a week, and you won't disturb your colleagues with it.
    • You've totally missed the concept

      The keyboard and mouse are great for lots of types of interaction with a computer, but not all, and not for all. What about those who do not have use of their limbs to type? Speech interaction can give them an effective means of machine interaction.

      Also, you seem to be forgetting interaction with different types of devices. I bought a $20 voice-activated light controller for the bedroom. It's nice to not have to get out of bed to turn the lights on and off! I'm a lazy bum I guess.

      My Windows Mobile smart phone has an application called Voice Command, which does voice dialing and will actually allow me to launch apps and such via voice. It's quite handy while driving.

      The point is that while a pointing device and keyboard are great and can be the best interaction in many cases, they are not everything. Natural human-machine interaction has its place, and it's actually gotten pretty good. Not perfect, but there are practical, useful applications for it right now, as in my examples.

      A software company like Microsoft has to make these kinds of investments. Obviously operating systems are becoming commoditized and free, with Linux and other open source projects out there. Microsoft has to do stuff that's beyond that realm. While sometimes people call it lock-in, it is oftentimes differentiation. Why would I buy an OS if I can get one for free, unless the one that costs money does something that the others don't? I'm not saying that Microsoft doesn't sometimes create lock-in artificially, but there are times when they do offer something unique.

      The Tablet PC is a good example. Yes, I know that Microsoft wasn't first in this. I had a Newton years ago. But I'd never seen such a useful and powerful application of this type of technology. Handwriting recognition on the Tablet PC is fairly remarkable. I've seen examples of the recognition actually doing a better job at reading some bad handwriting than even I could! Not perfect, but then humans misread and mishear a lot as well. (Note: There is a company that's making a Mac tablet:

      I look forward to seeing what comes of this. Natural machine interaction is an area where Microsoft has, at least in the commodity OS and platforms world, done some good work and is a leader. I hope they continue to make progress.
    • Your head is firmly planted

      There is a group called IT-Valour which exists to provide speech recognition computers to the wounded soldiers in the military hospitals. Using this software the soldiers can send emails and stay in contact with their families and with the troops in Iraq who rescued them. In addition they can also blog and surf the internet. Sorry that this doesn't seem important to you but they surely appreciate it.

      Also useful for those who are blind. I guess if you are 100% healthy and suffer no disability you can use the computer, otherwise go away. Is that your point?
      • Yes, useful for the disabled

        But for people without disabilities the keyboard (and possibly the mouse) are singularly more effective, if people take the trouble to learn how to use them of course.

        It also doesn't alter the fact that computers still cannot do continuous speech recognition.

        I should also point out that speech recognition interfaces can be just as bad as any other kind, but maybe there are people who love using telephone menu systems?
        • 1990s thinking...

          Sorry to be a little abrupt but the assumption that voice recognition will NEVER advance to the point where it surpasses manual input mechanisms is rather luddite.

          It will take time - and there will be (and have been) false starts - but I think it is reasonable to assume that voice recognition, possibly augmented by visual cue gathering (lip reading etc.), will advance to the point where it becomes pervasive and IMO preferred.

          We all take the fancy 'stuff' in our systems and PCs for granted but it has not always been that way. Mouse / GUI interfaces went thru their ups and downs in the early 1980s and started to become mainstream usable in the late 80s / 90s. In other words it took a while to get it 'right' - voice recognition (streaming / continuous) is a harder nut to crack but it will happen.

          .. and whilst we're at it, the keyboard (QWERTY) is NOT the most effective means for input into a computer system - it is simply the most pervasive. The layout of QWERTY keyboards has little to do with human efficiency... it was developed to prevent jamming strikers in mechanical typewriters.

          So all of your comments about "speech recognition does not work" or "computers cannot do continuous speech recognition" are (debatably) true today - but that is not the point of the article. The whole point of the article is that there is a recognised need by MSFT to inject a significant investment to get this stuff right.
          • 1990s no different from today

            Back then lots of people thought they would solve the continuous speech recognition problem "real soon", even though AI research had consistently demonstrated how difficult a problem it is. I would say we are still just about as far off.

            I seriously question the need for speech recognition outside a few specialist areas.

            By the way, I think blue skies research is very important and too few companies are doing it. I just don't think speech recognition is an area that is likely to deliver practical returns very soon, and there are some more important areas of usability that urgently need more effort.

            I fully agree there may be a better way of entering input than keyboards, but I haven't seen it yet, and the most practical thing an able-bodied person can do to improve their productivity is to learn how to touch type, a skill that I can say with some certainty will be valuable for the rest of your life.