Microsoft's voice-enabled assistant technology still in the works

Microsoft is still a big believer in voice becoming a key way for users to communicate with and query their devices in a more natural way.
Written by Mary Jo Foley, Senior Contributing Editor

Back in 2011, Microsoft CEO Steve Ballmer was heavily touting the idea that voice-enabled assistants would become key to the way users interacted with their PCs and devices.


Since that time, Microsoft execs have gone largely quiet on that front. Even at this week's TechFest Microsoft Research fair (at least on the day that was open to invited external guests), Microsoft had little to show or tell about developments on the voice-control front.

In spite of the silence, Microsoft's work on this front is alive and well. At the company's newly relaunched Envisioning Center, voice-enabled assistant technology was almost as front-and-center as touch.

The Envisioning Center features demos of technologies that Microsoft officials believe are between three and 10 years away from widespread usage. The Envisioning Center replaces Microsoft's separate Microsoft Home demo space and its Office Labs Envisioning work. The single redesigned center is meant to show that home and work are no longer separate.

I got a tour of the Envisioning Center this week while on Microsoft's Redmond campus. (No photos were allowed, but Microsoft did post a few of its own.) The Center included collaborative office, remote "touchdown" spaces, team project rooms, small-huddle brainstorming areas, and kitchen and living-room spaces. Large-screen, multitouch Perceptive Pixel Inc. (PPI) displays were everywhere -- on walls and even embedded in desks. Kinect sensors were built into display surfaces throughout. And voice-assistant technology figured prominently in home and work settings.


In these demos, users could ask their displays to pull up information, then refine their queries without having to figure out specific keywords or artfully craft and hone their search queries. Users could just ask things like "Is this part in stock? Will it fit in my design? Are there local suppliers who can get this to me today?" and let the system figure out the relevant context and access the necessary metadata attached to any given object. This is not simply Clippy 2.0 (without the avatar) or Apple's Siri; it's more far-reaching than either.

As I blogged back in 2011, there are a number of current and future Microsoft technologies at work behind the scenes in these voice-enabled-assistant demos. Kinect sensors, which include voice recognition, are one piece of the puzzle. But the work coming from Bing, Microsoft Research and the former Tellme team (which is part of Microsoft's Online Services unit) is key, too.

The Bing team has been working with Microsoft Research to improve Bing's inherent natural-language-search capabilities. At the same time, Microsoft has been doing work around Bing/Tellme/social-graph integration, officials have said, with the ultimate goal being a speech natural-user-interface (NUI) service to help users accomplish tasks in a more natural and conversational manner. 

There's also an augmented-reality-focused team in Bing that's been working on everything from camera tracking, to visual and audio recognition, to optical character recognition and translation and vision-based natural-user interfaces. The team already has made available some AR deliverables, including the Bing translation app, augmented-reality-enriched Bing Maps, and the Bing Vision and Bing Audio technologies in Windows Phone.

For those continuing to wonder whether Microsoft intends to unload Bing and throw in the towel on search, I'd say these efforts show Microsoft has no such plans. Bing is central to Microsoft's plans for the future, as I've said before and will say again.

As I noted earlier, the Envisioning Center tech demos are meant to represent projects that are three to 10 years away from commercial reality. It will be interesting to see whether this speech-enabled assistant technology materializes on the nearer or further end of that scale ....
