
Finally, text-to-speech that doesn't suck

Big backing for company that makes machines sound human.

Written by Greg Nichols, Contributing Writer July 7, 2021 at 9:00 a.m. PT

We're a couple of decades into the 21st century, cars are literally starting to fly, a vacation to space is just around the corner ... and yet somehow, computers still sound like parodies of confused robots whenever they're asked to convert text to speech (TTS). Come on, devs, there has to be a better solution.

A firm called WellSaid Labs believes it has one, and it's getting a boost thanks to an oversubscribed Series A.

"Plain and simple, WellSaid is the future of content creation for voice. This is why thousands of customers love using the product daily with off-the-charts bottom-up adoption. Matt and Michael have assembled a world-class team, and we couldn't be more thrilled to be a part of the WellSaid journey," says Cameron Borumand, General Partner at FUSE, which led the round.

I'll cut to the chase: you can listen to samples of the voices here.

The problem of making a digitized voice sound human when converting text to speech is deceptively complex. It is one of the grand challenges in the field of AI and a subject of considerable research in fields like computer science, human-machine interfaces, and robotics. In June 2020, according to a statement, WellSaid Labs' text-to-speech became the first to achieve human parity for naturalness on short audio clips across multiple voices.

"We've added AI Voice to the toolkit of thousands of content creators and their teams," says Matt Hocking, CEO of WellSaid Labs. "Our human-parity AI voice can be produced faster than real-time and updated on-demand. Opening up new and exciting opportunities to 'add voice' was never before perceived possible. AI voice easily ensures every production can be created and updated efficiently at scale."

The human parity milestone has significant implications for how audio content is created, which has made investors keen to jump on board. Use cases include streaming services, radio, programmatic advertising, digital marketing, and corporate training content. WellSaid Labs has a Voice Avatar library that provides access to multiple read styles and tones. In addition, brands can create their own AI Voice Avatars to capture the vocal likeness, style, and uniqueness needed to tell their stories.

"Content creators or product experience designers were previously faced with difficult tradeoffs between quality and scalability when using TTS tools or human voiceover. WellSaid's incredible voices, accessible through a studio application or a scalable API, remove the need to choose whether you want natural, lifelike speech or infinitely scalable and easily editable voice content. WellSaid provides both and delivers it however your team would like to consume it," says James Newell of Voyager Capital. "Creative teams have found it to be extremely useful when they need to produce multiple pieces of high-quality content in a consistent voice in hours instead of weeks."
