
Using your voice to pilot your computer

Written by Roland Piquepaille

According to The Seattle Times, an interdisciplinary team of scientists at the University of Washington (UW) has developed Vocal Joystick, software that enables people with disabilities to control their computers with the sound of their voice, without needing a mouse. Their sound-driven virtual computer mouse has already been tested at the UW Medical Center with spinal-cord-injury patients and other participants with varying levels of disability. The researchers, who developed their own voice-recognition technology, hope to have a prototype available online this fall. But read more...

UW Vocal Joystick

You can see above "the mapping of the vowel sounds recognized by the Vocal Joystick engine to the radial direction resulting in a mouse pointer movement. The VJ engine also captures loudness and pitch information, which can be used to control the speed of the pointer movement." (Credit: UW) Here is a link to a larger version of this diagram.

This research project was led at UW by several teams, including professor Jacob Wobbrock's AIM group (Accessibility, Interaction and Mobility) and the DUB group (Human-Computer Interaction & Design), where professor Jeff Bilmes and his students created the sound-recognition engine.

UW VoiceDraw screenshot

You can see above "a screenshot of the VoiceDraw application showing (a) the status bar, (b) help overlay, and (c) canvas area. Susumu Harada, a graduate student in the Computer Science Department, who designed the VoiceDraw hands-free drawing program, created this painting of Mount Fuji in Japan using only his voice in about 2.5 hours. (Credit: UW)

So how does this software work? Here are some short excerpts from The Seattle Times article mentioned in the introduction. "There are several options for people who need accommodations in using computers, but the UW software is distinguished on several levels. For one, it doesn't use standard voice-recognition technology. Instead, it detects basic sounds at about 100 times a second and harnesses them to generate fluid, adaptive cursor movement. Vocal-joystick researchers maintain the system is easier to use because it allows users to exploit a large set of sounds for both continuous and discrete movement and to make visual adjustments on the fly. Kurt L. Johnson, a professor in the Department of Rehabilitation Medicine at the UW, says he believes the software has great potential because it is easy to both learn and use."
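To make the 100-updates-a-second idea concrete, here is a hedged sketch of such a control loop, reusing voice_vector from the sketch above. The recognize_frame and move_pointer callbacks are hypothetical stand-ins for the real audio recognizer and operating-system cursor API:

    import time

    def run_pointer_loop(recognize_frame, move_pointer, frame_rate=100):
        """Poll the recognizer about 100 times a second, as the article
        describes, and integrate velocities into smooth cursor motion.

        recognize_frame() -> (vowel, loudness), or None for silence;
        move_pointer(x, y) moves the OS cursor. Both are hypothetical
        callbacks supplied by the caller.
        """
        dt = 1.0 / frame_rate
        x, y = 512.0, 384.0  # arbitrary starting position
        while True:
            heard = recognize_frame()
            if heard is not None:
                vowel, loudness = heard
                dx, dy = voice_vector(vowel, loudness)  # from the sketch above
                x += dx * dt  # small per-frame steps give the fluid,
                y += dy * dt  # continuous movement the researchers describe
                move_pointer(round(x), round(y))
            time.sleep(dt)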

You'll find more details about the Vocal Joystick project at the DUB page and at the official Vocal Joystick homepage.

Here are some more details about the Vocal Joystick voice-recognition technology engine. "The VJ system consists of three main components: acoustic signal processing, pattern recognition and motion control. First, the signal processing module extracts short-term acoustic features, such as energy, autocorrelation coefficients, linear prediction coefficients and mel frequency cepstral coefficients (MFCC). Signal conditioning and analysis techniques are needed for accurate estimation of these features. Next, these features are piped into the pattern recognition module, where energy smoothing, pitch and formant tracking, vowel classification and discrete sound recognition take place. This stage involves statistical learning techniques such as neural networks and dynamic Bayesian networks. Finally, energy, pitch, vowel quality and discrete sound become acoustic parameters to be transformed into direction, speed and other motion related parameters. The application driver takes the motion control parameters and launches corresponding actions."
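A skeletal Python version of that three-stage pipeline might look like the following. The feature set, the thresholds and the always-"a" classifier stub are simplifications standing in for the real signal processing and statistical models:

    import numpy as np

    VOWEL_DIRECTION_DEG = {"a": 0, "e": 90, "u": 180, "o": 270}  # placeholder layout

    def extract_features(frame: np.ndarray) -> dict:
        """Stage 1, signal processing: short-term acoustic features.
        Only frame energy is computed here; the real engine also extracts
        autocorrelation, linear-prediction and MFCC coefficients."""
        return {"energy": float(np.mean(frame ** 2))}

    def classify(features: dict):
        """Stage 2, pattern recognition: vowel classification.
        Stands in for the neural networks and dynamic Bayesian networks the
        project describes; this stub thresholds energy and always says 'a'."""
        loudness = min(1.0, 50.0 * features["energy"])  # arbitrary normalization
        return ("a", loudness) if loudness > 0.05 else (None, 0.0)

    def to_motion(vowel, loudness):
        """Stage 3, motion control: acoustic parameters -> direction and speed."""
        if vowel is None:
            return None  # silence: the pointer stays put
        return VOWEL_DIRECTION_DEG[vowel], 400.0 * loudness  # (degrees, px/s)

    # Push one 10 ms frame of synthetic audio (a 220 Hz tone) through the pipeline.
    frame = 0.2 * np.sin(2 * np.pi * 220 * np.arange(441) / 44100)
    print(to_motion(*classify(extract_features(frame))))  # -> (0, 400.0)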

Sources: Richard Seven, The Seattle Times, October 6, 2008; and various websites

