Adobe's VoCo voice project: Now you really can put words in someone else's mouth

It may be hard to trust the authenticity of any recorded speech in the not-too-distant future.
Written by Liam Tung, Contributing Writer

Source: Chris Dotson/YouTube/Adobe MAX

You may soon be able to make people say things they never said, if Adobe one day gets to release its new voice-editing software.

The company showed off Project VoCo yesterday at its annual MAX event, revealing a tool that will do for audio what Photoshop does for the manipulation of images.

As The Verge reports, given about 20 minutes of recorded speech from a speaker, VoCo can be used to insert single new words that the speaker never said and even create entirely new, natural-sounding sentences.

The technology was demonstrated by researcher Zeyu Jin, who was offering MAX attendees a sneak peek at products under development. It's not clear whether VoCo will eventually be released as a product. Adobe Research is collaborating with Princeton University on the project.

"We have developed a technology called Project VoCo in which you can simply type in the word or words that you would like to change or insert into the voiceover. The algorithm does the rest and makes it sound like the original speaker said those words," Adobe told The Verge.

Adobe is aiming to help content creators edit voiceovers, dialog, and narration, either to fix errors or to change a storyline.

Despite that intended audience, if VoCo is released, it might be hard from that point on to trust a recording of someone's speech. On the other hand, it could open up a whole new way of preserving someone's voice or of using voices in other technology.

Adobe told TechCrunch VoCo is an example of "voice conversion" rather than speech synthesis.

Readers wanting a deeper dive into the technology can look to a paper published earlier this year, in which Jin and fellow Princeton researcher Adam Finkelstein teamed up with members of Adobe Research to describe their CUTE technique for voice conversion, which delivered significant improvements over other methods.

"The goal of voice conversion (VC) is to modify an audio recording containing the voice of one speaker, the source, so that the identity sounds like that of another speaker, the target, without altering the speech content," they write.

Not surprisingly, both Google and Microsoft are working on improving voice conversion using other techniques.

