Users of Microsoft's voice-enabled services such as Cortana will now be able to decide whether or not the audio recordings of their interactions can be used by the company to improve its speech recognition algorithms.
By default, customers' voice clips will not be contributed for review, said Microsoft in a new blog post; instead, users will be required to actively opt in to allow the company to store and access their audio recordings.
Customers who choose not to opt in will still be able to use all of Microsoft's voice-enabled products and services, the company confirmed. Their audio recordings won't be stored, but Microsoft will still have access to some information associated with voice activity, such as the transcriptions automatically generated during user interactions with speech recognition AI.
Once users opt in, however, their voice clips may be listened to by Microsoft employees and contractors as part of a process to refine the AI systems that power the company's speech recognition technology.
According to Neeta Saran, a senior attorney at Microsoft, the announcement reflects an effort to give users more control over their privacy, and to make sure that customers have actually granted "meaningful consent" before their voice data is shared and used by the company. "This new meaningful consent release is about making sure that we're transparent with users about how we are using this audio data to improve our speech recognition technology," said Saran.
The voice clips contributed by users who have opted in will be de-identified as they are stored, said Microsoft. Microsoft identifiers will be removed, as will any strings of letters or numbers that could be telephone numbers, Social Security numbers or email addresses, so that the data cannot be tied back to an individual. Any voice clip that is found to contain personal information will be deleted.
The practice itself is well established. When users interact with voice-enabled technology, such as by dictating a text message or requesting a web search, Microsoft's algorithms automatically transcribe their speech into text, and improving the accuracy of that transcription is an ongoing challenge for the company's researchers. One way to do so is to train the AI system on more real-world data, refining the technology's ability to make out words spoken in a variety of contexts.
This is why Microsoft, along with most big tech players that offer voice-enabled products and services, is interested in re-using the recordings of customers' voice interactions with their devices. Audio clips can be stored and listened to by employees who manually transcribe what they hear, improving the scope and accuracy of the dataset that is then used to train speech recognition algorithms.
The objective is to make sure that the technology understands voice requests across many languages, regional accents and dialects, even with background noise. "The more diverse ground truth data that we are able to collect and use to update our speech models, the better and more inclusive our speech recognition technology is going to be for our users across many languages," said Saran.
Without the appropriate privacy safeguards, it is easy to see why listening in on real-world conversations can be seriously intrusive. Examples abound of private voice clips that mistakenly ended up in the hands, or rather the ears, of technology companies without the knowledge or consent of customers.
In 2019, for example, Apple had to suspend a program similar to Microsoft's, in which contractors listened to recordings of Siri users' queries to improve the voice assistant's performance, after it emerged that the contractors regularly heard voice clips containing highly sensitive information, from medical details and discussions with doctors to sexual encounters.
In the same year, Google found itself in hot water when it was reported that the company's employees were "systematically" listening to audio files recorded after users activated the Assistant by saying "Okay Google" or "Hey Google". Amazon faced similar reports of privacy-invading eavesdropping that year, involving its Alexa-enabled smart speakers.
Microsoft, for its part, stopped logging any voice data for product improvements across all of its services at the end of October 2020. The option to allow the company to use voice recordings will now roll out gradually on a product-by-product basis, but the company has already confirmed that Microsoft Translator, SwiftKey, Windows, Cortana, HoloLens, Mixed Reality and Skype voice translation will be included.
For users who choose to let the company listen to their voice recordings, all of their audio data will be contributed for review and stored on encrypted servers for up to two years. If a voice clip is sampled for transcription during that period, the recording may be retained for longer, "to continue training and improving the quality of speech recognition AI," said Microsoft.