A little more than a year ago, Sebastian Krahmer asked on the Dailydave security mailing list whether Vista's speech recognition could be exploited via malicious sound files hosted on websites. I was the first to answer his call, with some initial skepticism, but that turned into astonishment when I ran tests that confirmed the vulnerability. A few months ago, before Vista Service Pack 1 was finalized, stories ran claiming that SP1 would close this speech recognition vulnerability, but I couldn't get any confirmation or denial from Microsoft after multiple queries. I finally got tired of waiting, tested the exploit again with Vista SP1 RTM installed, and found that the vulnerability still exists.
The test sound file I created managed to wake Vista speech recognition, highlight all the files on my desktop or all my pictures via Windows Explorer, and invoke the shift-delete command, which wipes the files so they cannot be restored from the Recycle Bin. I could also open Internet Explorer and invoke TinyURL addresses, which in turn redirect to a malicious executable. While the damage is limited to the user space, since Vista speech recognition can't get around the UAC prompt (assuming it's on), code execution in the user space is still a serious vulnerability.
When this story first got some traction last year, it stirred up debate and controversy over the seriousness of this exploit. I had people privately and openly criticizing me, saying this was a nonissue, while others like Scott M. Fulton understood the seriousness of it and called it the "low-tech" Vista exploit. Others like Ryan Naraine and Thierry Zoller openly thought I was crying wolf and not to be taken seriously, but I'm sure those in the disabled community who need to use speech recognition would vehemently disagree. The bottom line is that while the vulnerability has zero impact on people who don't use speech recognition, it has full impact on people who do use it with a desktop microphone and speakers. While such users are still rare, Microsoft wants to make this a mainstream feature, and it should address the problem. Otherwise, why would Bill Gates declare speech recognition a key feature of Windows Vista last year?
Last year I gave two simple recommendations for mitigating this vulnerability:
- Don't allow the generic "start listening" command to wake speech recognition. Require some kind of keyword like "Jenny, start listening" if you decide to name your computer "Jenny" but you can name it whatever. That at least breaks the generic and universal attack vector.
- Don't allow sound being played by the computer itself to be processed by the speech recognition engine. While this doesn't stop neighboring computers, radios, TVs, or people nearby from shouting into the computer, it does close off one key vector for this exploit.
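To make the two recommendations a little more concrete, here is a minimal sketch in Python of how such a gate might behave. It is purely illustrative and says nothing about how Vista's speech engine is actually built: the `Utterance` and `WakeGate` names are hypothetical, the "stop listening" handling is included only to round out the example, and the `from_own_speakers` flag simply assumes the system can somehow tell that a sound came from its own speakers, which is the hard part in practice.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """A recognized phrase plus a flag for where the audio came from."""
    text: str
    # Hypothetical flag: assumes the system can tell its own playback apart.
    from_own_speakers: bool = False


class WakeGate:
    """Illustrative gate: user-chosen wake name, own playback ignored."""

    def __init__(self, wake_name: str):
        # e.g. "jenny, start listening" instead of the generic phrase
        self.wake_phrase = f"{wake_name.strip().lower()}, start listening"
        self.listening = False

    def handle(self, utterance: Utterance) -> str:
        # Recommendation 2: never act on sound the computer played itself.
        if utterance.from_own_speakers:
            return "ignored"

        text = utterance.text.strip().lower()

        if not self.listening:
            # Recommendation 1: only the personalized phrase wakes the engine.
            if text == self.wake_phrase:
                self.listening = True
                return "awake"
            return "ignored"

        if text == "stop listening":
            self.listening = False
            return "asleep"

        return f"command: {text}"
```

With a gate like this, a sound file that says the generic "start listening" gets nowhere, and neither does one that guesses the right name but is played through the computer's own speakers. It would still not stop someone who knows the name and shouts it from across the room.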
I think most security experts would agree that these are reasonable security measures and that more needs to be done in this research area. Microsoft has had a year to implement some basic security mechanisms to mitigate this vulnerability in Vista Service Pack 1, but it hasn't bothered. In my opinion, this is very disappointing and a lost opportunity for Microsoft on the security front. For now, the only thing users can do is disable Vista speech recognition and, if they need voice dictation, use only a headset rather than the more convenient desktop microphone plus speakers.