Microsoft is looking for new ways for people to interact with computers that build upon the actions the mouse and keyboard make possible.
The software giant's search for new human-machine interfaces has led it to investigate gesture recognition, in which a computer user controls their machine using hand movements.
On Wednesday Microsoft showed off a system that captures gestures made above the surface of the PC keyboard and translates them into commands for Windows.
At the base of this new control system is the Kinect, Microsoft's sensor bar that captures not only 2D video but also the depth of the scene it's filming, allowing it to place objects in 3D space.
Kinect was originally developed as a controller for the Xbox 360 games console, but the device is also available with a software development kit for Windows PCs and has been used to develop a number of non-gaming applications.
A gesture towards future PCs
The prototype gesture-control system on show was driving a Windows 8 PC at Microsoft Research's labs in Cambridge, where developers demonstrated the 16 gestures the system can detect.
The gestures, which include swipes, clasps and pinches, can be used for a range of everyday tasks within Windows, such as bringing an application window to the fore or showing the desktop. Because the system detects gestures and then triggers commands already built into Windows (for instance, alt-tabbing through a list of running programs or maximising a window) it could run on almost any Windows 8 or 7 machine.
Gestures are detected via a Kinect positioned above the keyboard, which captures the movement and position of the hands above the keys in 3D space.
Some gestures are designed around manipulating windows on the desktop, particularly simplifying the sizing and positioning of windows, which Microsoft believes may be achieved more easily with gestures than with a mouse.
Here a window is maximised by clenching a fist to "grab" it and then opening the hand while moving towards the top of the keyboard.
Performing the same series of gestures in reverse minimises the window. Repeating the gesture while moving the hand to the left or right edge of the keyboard docks the window with the left or right edge of the screen. The same series of gestures while moving the hand to the top-left or top-right corner of the keyboard will throw the window to the left or right of the screen, but not dock it with the edge.
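As an illustration, the grab-and-move behaviour described above amounts to a mapping from a recognised gesture plus the hand's zone over the keyboard to an existing Windows window command. The sketch below is an assumption about how such a dispatch might be structured; the gesture, zone and command names are invented and are not Microsoft's actual API.

```python
# Hypothetical dispatch table for the grab gesture: the same "clench fist,
# then open hand" gesture triggers different window commands depending on
# where over the keyboard the hand ends up. All names are illustrative.

def dispatch(gesture, zone):
    """Map a (gesture, keyboard zone) pair to a window command name."""
    if gesture == "grab_release":       # fist clenched, then hand opened
        return {
            "top": "maximise",
            "bottom": "minimise",       # same gesture performed in reverse
            "left_edge": "dock_left",
            "right_edge": "dock_right",
            "top_left": "throw_left",   # move left without docking
            "top_right": "throw_right",
        }.get(zone, "none")
    return "none"                       # unrecognised gestures do nothing
```

In a real system the returned command name would then trigger the corresponding built-in Windows action, which is why the prototype can run on an ordinary Windows 7 or 8 machine.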
Bringing hands together in the middle of the keyboard and then moving them to the keyboard's left and right edge with palms down and fingers splayed will show the desktop. Repeating the gesture restores the original view.
Placing a flat hand on the keyboard with the palm facing up allows the user to "peek", bringing to the fore another application running on the desktop.
One way this could be used is to quickly bring up a web browser while writing in a text editor, perhaps to copy text from the browser. Once the gesture is reversed, the desktop view reverts to what it was before the peek command.
To help bring the desired application to the fore, users can hold their other hand perpendicular to the keyboard and swipe left or right to select the application from an application bar, pointing with the right hand when the appropriate application is highlighted.
Another gesture sees users pinch thumb and forefinger to make the shape of a magnifying glass to bring up the Windows 8 search menu.
The software needed to run the prototype requires little processing power and memory, according to Microsoft, and is capable of running on PCs released in recent years.
Although the system on display at Microsoft's Cambridge labs used a Kinect pointing down at the keyboard, Microsoft said it has made another prototype that works with the Kinect at a 45-degree angle to the keyboard, so it could be attached to or fitted inside a monitor bezel.
Improving on Kinect
While the prototype system uses the same base hardware as a Kinect for Windows kit, Microsoft has spent about 18 months building software to augment the Kinect's abilities.
Primarily, Microsoft has been building machine-learning software that allows the system to recognise gestures accurately and to avoid confusing unrelated hand movements with commands.
Doing this required a lot of data about the different ways even these relatively few gestures could be performed.
For a system to be a workable gesture control interface, it needs to recognise gestures without getting confused by the myriad subtly different ways someone might hold or move their hand.
Collecting data on all these variations by filming the gestures being manually repeated many times was too slow, so the team decided to make its own computer model of the human hand.
"We have a skeletal model of the human hand and we will literally just say 'Here's the base pose' now perturb it. It just uses randomisation on all of the joints, knowing how the joints are allowed to move, moves them around in space and generates as many images as we need," said Chris O'Prey, a Microsoft senior development engineer who is one of the builders of the system.
"It generates images that look like they've come from a Kinect, with the right noise, depth and image aspects. That's what we use to build our machine learning processes and because we can do that we can build a classifier [for a gesture] in the space of about a day."
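The perturbation step O'Prey describes could be sketched as follows. This is a minimal illustration of the idea only, assuming a pose is a dictionary of joint angles; the joint names, motion limits and perturbation size are all invented, and the real system renders each generated pose into a Kinect-style depth image for training.

```python
import random

# Invented joint names and (min, max) rotation limits in degrees. The real
# skeletal hand model would have many more joints with measured limits.
JOINT_LIMITS = {
    "wrist_pitch": (-60.0, 60.0),
    "index_mcp": (0.0, 90.0),
    "index_pip": (0.0, 110.0),
    "thumb_cmc": (-30.0, 60.0),
}

def perturb(base_pose, rng):
    """Move every joint by a random amount, clamped to its allowed range."""
    new_pose = {}
    for joint, angle in base_pose.items():
        lo, hi = JOINT_LIMITS[joint]
        new_pose[joint] = max(lo, min(hi, angle + rng.uniform(-15.0, 15.0)))
    return new_pose

def generate_poses(base_pose, n, seed=0):
    """Generate n randomised variants of the base pose, each of which would
    then be rendered as a synthetic depth image for the classifier."""
    rng = random.Random(seed)
    return [perturb(base_pose, rng) for _ in range(n)]
```

Because the poses are generated rather than filmed, the team can produce as many labelled training images as it needs, which is what makes building a gesture classifier in about a day feasible.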
This classifier provides a yes/no decision tree, which the system uses to work out what gesture the user is trying to perform. The depth images captured by Kinect at 30 frames per second are fed into this decision tree pixel by pixel.
"It looks at each pixel we're getting back from the Kinect and says 'If this is part of the hand, where are the fingers? What is the pose? What do you think this is?' It gives us back statistics that give us a confidence for which pose we're in and how classically correct it is," O'Prey said.
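The per-pixel pipeline O'Prey outlines can be illustrated with a toy version: each depth pixel is pushed through a yes/no decision tree, and the per-pixel labels are tallied into a confidence score per class. The features, thresholds and labels below are invented for illustration; the real system learns its trees from the synthetic training images rather than hand-writing them.

```python
# Toy stand-in for the learned yes/no decision tree the article describes.
# A real Kinect-style classifier uses learned depth-difference features;
# these thresholds are made up.

def classify_pixel(depth_mm, neighbour_depth_mm):
    """Classify one depth pixel using a tiny hand-written decision tree."""
    if depth_mm == 0:                          # no depth reading: background
        return "background"
    if neighbour_depth_mm - depth_mm > 50:     # pixel protrudes: finger-like
        return "finger"
    return "palm"

def pose_confidence(pixels):
    """Aggregate per-pixel votes into a confidence per label, mirroring the
    'confidence for which pose we're in' statistic O'Prey mentions."""
    votes = {}
    for depth_mm, neighbour_depth_mm in pixels:
        label = classify_pixel(depth_mm, neighbour_depth_mm)
        votes[label] = votes.get(label, 0) + 1
    total = sum(votes.values())
    return {label: count / total for label, count in votes.items()}
```

At 30 depth frames per second this per-pixel pass has to be cheap, which is consistent with Microsoft's claim that the prototype needs little processing power.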
Microsoft has also added software to track the hand's position relative to the desktop and the keyboard.
The system correctly recognises gestures between 70 and 90 percent of the time, according to Microsoft, with the proviso that hands need to be "held steady" to achieve this.
"That's highly above the usual machine learning bracket of 60 to 65 percent that most people quote," O'Prey said.
"It's using the algorithms that we originally used for Kinect but they've been augmented and processed further.
"We designed a way of looking at body parts, as well as body poses at the same time. Rather than just saying 'Here's your arm and here's your hand'. What it's saying is 'Here's your arm, here's your hand and by the way the hand's fingers are splayed and pointing downwards'."
Researchers are also trying to build safeguards into the system to avoid the user firing off commands unintentionally, for example by telling the difference between a user reaching from the keyboard to the mouse and one of the swipe gestures.
"We have seen those false positives but as we've been building on the system and iterating it those have got lower and lower," he said.
Work is also continuing to refine the system to make it easier to use and work out what gestures are most beneficial to computer users.
"What we're learning from user studies is that some of these gestures conflict with existing usage of the machine, some are not as easy to do [as others], while some are very nice for users.
"It's very much an iterative process to design gestures that users will find easy to use, but that will also be intuitive, ergonomic and quick."
To command the system, users perform gestures just above the surface of the keyboard. Microsoft is particularly keen to avoid users having to mimic the scene in the science-fiction film Minority Report, in which Tom Cruise's character manipulated digital images by holding his hands in the air for prolonged periods.
"We don't want to end up with this sustained hands in air thing that you see in Minority Report because it's painful."
The software is written in C++, for the "high speed" components, and C#, for the "lower speed" parts and "glue that keeps everything together".
Microsoft will not say how long it will be until a similar Kinect-based gesture system is available to buy, or until the technology in the system finds its way into other Microsoft products, saying the project is currently about better understanding how to evolve computer interfaces.
"It's all about doing rapid prototyping to understand how we can interact with users and how we can potentially aid the use of computers through multiple modalities, going beyond touch, keyboard and mouse," O'Prey said.