Speech recognition gaining ground

Decades of research and development in speech recognition technology are finally beginning to show commercial promise, according to research firm Gartner.
Written by Matthew Broersma, Contributor

After declining from a peak in 2000, the worldwide market for speech-recognition products is on pace to reach US$130m in 2003, up from US$128m in 2002, Gartner said on Wednesday. The figures suggest that buyers are once again taking an interest in a technology that many believe will ultimately transform the way people interact with computers.

The industry suffered declines in 2001 and 2002, after a peak of US$140m in 2000, but now companies such as market leaders Nuance and ScanSoft are putting forward a good business case for speech recognition, according to Gartner. "Many implementations provide proof that solutions that use speech recognition can deliver business value, as cost savings or improved customer service," said analyst Steve Cramoysan in a statement. Efforts by Microsoft and IBM are adding momentum to the industry, Cramoysan said.

The technology has improved to the point where vendors can't compete solely on the basis of speech recognition success rates, Gartner said, while Internet applications and standards such as VoiceXML are helping to broaden the technology's appeal. Most products are used in call centres and business portals.

North America is the biggest speech recognition market, generating 61 percent of 2003 revenues, but this will decline as markets such as EMEA (Europe, the Middle East and Africa) develop, Gartner predicted. EMEA currently represents 26 percent of the market.

Giants of the high-tech industry such as IBM, Microsoft and Intel are continuing to invest heavily in improving the ability of PCs and servers to interpret spoken language.

Microsoft in July released the first public beta of its Speech Server, which lets servers better handle oral commands. Speech Server, formerly .Net Speech Platform, is an attempt to reduce the cost of creating automated phone response systems.

IBM, meanwhile, is using its research labs and services divisions to create showcase applications for large corporations. Financial services firm T Rowe Price has installed an account management system from Big Blue that lets its customers conduct transactions through common spoken requests.

In April, Intel released software that lets computers read lips, a step forward that could lead to better speech recognition applications. The Audio Visual Speech Recognition (AVSR) software tracks a speaker's face and mouth movements. By matching these movements with speech, the application can provide a computer with enough data to respond to spoken commands, even when these are given in noisy environments.

Most in the industry agree that the benefits of speech recognition will take some time to develop: closer to 50 years than to 10, according to Intel co-founder and chairman emeritus Gordon Moore.

By 2010, through its "Super human speech recognition project", IBM hopes to develop commercially viable systems that can transcribe speech into written text more accurately than humans. At the moment, machines have an error rate that is five to 10 times higher than that of humans, according to various estimates. Automated translation is also expected to improve greatly.

Researchers at Microsoft and elsewhere are creating computers that treat speech understanding as a matter of probability, rather than trying to parse syntax. For example, Yoda, a speech-to-text engine under development at Microsoft, can turn spoken words into coherent email messages by studying a user's habits.

ZDNet UK's Matthew Broersma reported from London. CNET News.com's Michael Kanellos contributed to this report.