While Microsoft's multi-touch capabilities (and lack thereof) are in the news daily, the company's speech engine and algorithms don't often merit a mention.
At the SpeechTEK conference in New York City on August 3, Microsoft officials attempted to explain what the Redmondians have coming in the voice recognition and synthesis space -- without going so far as to announce undisclosed products. And yes -- before you ask -- there is a cloud angle, like there seems to be for every Microsoft product and technology these days.
Zig Serafin, the General Manager of the "Speech at Microsoft" group, outlined for SpeechTEK attendees Microsoft's evolution in speech, a technology area that has been part of the natural user interface (NUI) focus for the Softies since 1993.
In 1999, Microsoft made its first speech-specific acquisition, the speech-toolkit vendor Entropic. In 2007, Microsoft spent $1 billion to buy speech-recognition vendor TellMe. But it wasn't until a little over a year ago that Microsoft consolidated its various speech-focused products and technologies into the Speech at Microsoft team, whose charter is "bringing speech to everyday life," Serafin said.
These days, Microsoft execs don't look at speech as a standalone product or technology. They see it as an enabler of other products. They also see it as an increasingly integrated piece of Microsoft's overall NUI plan.
Over the next 12 months, Microsoft will be bringing to market four new products that use its various speech technologies. The four:
- Auto entertainment systems, like the Kia UVO announced at the Consumer Electronics Show at the start of this year. The first cars with UVO are due out this summer.
- Windows Phone 7 devices, which have TellMe's speech technology embedded right into the device shell. The phones will allow users to control dialing and search using voice, and integrated text-to-speech means the phones also will be able to "talk back" to users. (This is an example of what Microsoft execs mean when they talk about an "Internet of things" that connects up to the cloud.)
- Kinect sensors for Xbox, which incorporate voice-recognition capabilities, allowing users to pause, play, advance and stop games, TV shows and movies via voice commands.
- Corporate productivity products. There are more than 100 million Exchange users today who can make use of voice mail preview, voice translation and other voice-powered technologies that are built into the product (and will be built into Exchange Online, as Microsoft makes those features available to cloud users). Meanwhile, Microsoft's TellMe product currently is handling 2.5 billion calls a year, making use of TellMe's cloud back-end. (Interestingly, Serafin didn't mention Office Communications Server 14, which Microsoft is touting as its entry into the "enterprise voice" market.)
In the longer term, Microsoft is trying to help answer the question "When can we deploy systems with a human level of conversational understanding?" said Larry Heck, Chief Speech Scientist in the Speech at Microsoft group.
Heck told SpeechTEKers that there are three drivers that will help the company address this question:
- Data and relevant machine-learning algorithms
- Cloud-computing platforms, like Azure and TellMe Network's back-end platform
There needs to be a lot more data collected on user-machine interaction before Microsoft and others can realistically expect machine interfaces, including speech, to be more natural, Heck said. NUIs can help provide ubiquity, by enabling users to access data wherever they are, he acknowledged. But currently, entry points like search engines aren't doing much to help advance work in making computers and devices more conversational. Users are accustomed to typing in a few keywords rather than naturally phrased queries, whereas voice search on mobile devices more closely mimics human conversation, Heck explained.
Heck told attendees to "stay tuned" for new Microsoft products coming in the next few years that will reflect advances in conversational expression and understanding. (I'm guessing something like the client-plus-cloud patient-information systems Microsoft demonstrated at its Financial Analyst Meeting last week might be among those products to which Heck was alluding.)
Anywhere else you think Microsoft could, should or will incorporate speech recognition or synthesis technologies?