Speech recognition is inherently unreliable...Since you deem the problem as remotely exploitable, let's ignore for one that I have to actively browse to a website and as such be physically in front of the PC and assume we use XSS to zombie the browser and play the audio 5 minutes later. Then we assume there is not too much background noise, assume the audio level is ok, assume the microphone is on, assume Speech recognition is used, assume audio is on, and so forth.
Too many assumption to make it a real risk for me remotely, sorry. That's my personal opinion. Is is a vulnerability ? Yes. Is it likely to work 100% like a good crafted exploit? No.
* Via Full Disclosure.