[Update 1/31/2007 - Microsoft confirms] Sebastian Krahmer started a discussion on the Dailydave security mailing list about the potential for exploiting Vista's speech recognition feature: a website could host malicious sound files that play back a series of audio commands to try to subvert the operating system. Krahmer didn't actually test any of these theories, but he raised an interesting concern about the safety of Vista's speech command system, so I followed up and came up with actual tests to prove the first Vista remote exploit.
I initially responded to the list explaining that an operating system should filter out the sounds it plays when it listens on the microphone, if only to avoid a nasty feedback problem, but that it's still possible for the mic to pick up enough of the voice to run commands. Someone else responded that Apple tried similar functionality 15 years ago and quickly realized that they had to guard the feature with a spoken keyword because people were playing gags with the "shutdown" command. But I have used Vista's speech command feature and knew that it only requires a fixed activation phrase, so I set up an actual test of these theories.
I recorded a sound file of spoken commands that would engage speech command on Vista, open the Start menu, and then ask for the command prompt. When I played back the sound file with the speakers turned up loud, it actually engaged the speech command system and fired up the Start menu. I had to try a few more times to get the recording quality high enough to trigger the exact commands I wanted, but the shocking thing is that it worked! Anyone who has ever visited MySpace knows how many annoying webpages start blasting loud MP3 music as soon as you enter the page, and the same trick could deliver these commands. [Update 4:17PM - Someone asked me how loud I had the speakers. To my surprise, not very loud at all, and I was shocked at how well it worked. I didn't even believe it would work at the loudest setting, let alone at a moderate sound level.]
There are some mitigating factors, but there is no doubt this is still a serious exploit. Most people won't have Vista speech commands configured and enabled, but if they do, the speech command control console will automatically load with the operating system and park itself at the top of the desktop, waiting for audio commands. The other mitigating factor is that if you visit a webpage and it starts barking out slow and loud Vista speech commands, it will be rather obvious to most people that something is very wrong. But it's still possible that a webpage might delay the sound playback and hope that the user is not around to stop the exploit. Another mitigating factor is that the Vista command prompt doesn't seem to take any speech commands at all, but that doesn't prevent a remote hacker from interacting with your OS in an unauthorized manner.
My recommendation is that Vista users stop the speech command feature from starting up automatically and only use it in a supervised manner until there is a patch for this. As a long-term fix, Vista speech commands should completely filter out any sound coming out of the computer's own speakers so that malicious sound files can't issue unauthorized commands. In the short term, Microsoft should at least let the user set a unique pass phrase or series of numbers to activate speech commands rather than allowing a fixed phrase to activate the system.
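To make the short-term fix concrete, here is a minimal sketch in Python of what a pass-phrase gate could look like. This is only an illustration and not how Vista's speech system is actually built: the `SpeechCommandGate` class, its `handle` method, the "stop listening" sleep phrase, and the sample phrases are all hypothetical, and the sketch assumes the recognizer has already turned the audio into text.

```python
class SpeechCommandGate:
    """Hypothetical gate: ignore everything until the user's own pass phrase is heard."""

    def __init__(self, pass_phrase, sleep_phrase="stop listening"):
        # Both phrases are illustrative; the pass phrase is chosen by the user,
        # so an attacker's pre-recorded sound file cannot know it in advance.
        self._pass_phrase = self._normalize(pass_phrase)
        self._sleep_phrase = self._normalize(sleep_phrase)
        self.listening = False

    @staticmethod
    def _normalize(text):
        # Lowercase and collapse whitespace so small recognizer differences don't matter.
        return " ".join(text.lower().split())

    def handle(self, recognized_text):
        """Return the command to execute, or None if the phrase is ignored."""
        phrase = self._normalize(recognized_text)
        if not self.listening:
            # While idle, only the user's pass phrase has any effect.
            if phrase == self._pass_phrase:
                self.listening = True
            return None
        if phrase == self._sleep_phrase:
            self.listening = False
            return None
        return phrase


gate = SpeechCommandGate("blue heron 4172")

# A sound file that guesses a well-known fixed phrase gets nowhere.
for heard in ["start listening", "start", "command prompt"]:
    print(repr(heard), "->", gate.handle(heard))

# The real user wakes the system with the private pass phrase, then issues a command.
gate.handle("blue heron 4172")
print(gate.handle("command prompt"))
```

A gate like this only raises the bar. It does nothing once the user has legitimately woken the system, which is why filtering out the computer's own audio output is still the better long-term fix.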
[Update 4:55 PM - Someone (who shall remain unnamed until they give me permission to name them) emailed me to argue that this isn't a remote exploit, that I was being "ludicrous," and that this can't bypass UAC. Well, I never claimed this would bypass UAC and secure desktop, nor do I think it needs to in order to do some serious damage. The fact that a website can play a sound file at a moderate level, activate an idle speech command system, and then interact with the desktop to delete user documents with zero user interaction is serious by any stretch of the imagination.]
[Update 2/1/2007] Disagreement over impact of Vista’s analog hole