Lyrebird claims it can recreate any voice using just one minute of sample audio

The results aren’t 100 percent convincing, but it’s a sign of things to come

Artificial intelligence is making human speech as malleable and replicable as pixels. Today, a Canadian AI startup named Lyrebird unveiled its first product: a set of algorithms the company claims can clone anyone’s voice by listening to just a single minute of sample audio.

A few years ago this would have been impossible, but the analytic prowess of machine learning has proven to be a perfect fit for the idiosyncrasies of human speech. Using artificial intelligence, companies like Google have been able to create incredibly life-like synthesized voices, while Adobe has unveiled its own prototype software called Project VoCo that can edit human speech like Photoshop tweaks digital images.

But while Project VoCo requires at least 20 minutes of sample audio before it can mimic a voice, Lyrebird cuts this requirement down to just 60 seconds. The results certainly aren’t indistinguishable from human speech, but they’re impressive all the same, and will no doubt improve over time. The company’s demo features the synthesized voices of Donald Trump, Barack Obama, and Hillary Clinton discussing the startup.

Lyrebird says its algorithms can also infuse the speech it creates with emotion, letting customers make voices sound angry, sympathetic, or stressed out. The resulting speech can be put to a wide range of uses, says Lyrebird, including “reading of audio books with famous voices, for connected devices of any kind, for speech synthesis for people with disabilities, for animation movies or for video game studios.” It takes quite a bit of computing power to generate a voice-print, but once that is done, the speech is easy to make: Lyrebird can create one thousand sentences in less than half a second.

There are more troubling uses as well. We already know that synthetic voice generators can trick biometric software used to verify identity. And, given enough source material, AI programs can generate pretty convincing fake pictures and video of anyone you like. For example, this research from 2016 uses 3D mapping to turn videos of famous politicians, including George W. Bush and Vladimir Putin, into real-time “puppets” controlled by engineers. Combine this with a realistic voice synthesizer and you could have a Facebook video of Donald Trump announcing that the US is bombing North Korea going viral before you know it. That said, while Lyrebird does do a good Trump impression, its other voices are noticeably more robotic.

Lyrebird is aware of these problems, but its suggested fix feels far from adequate. In an “Ethics” section on the company’s website, Lyrebird’s founders (three university students from the University of Montréal) acknowledge that their technology “raises important societal issues,” including bringing into question the veracity of audio recordings used in court. “This could potentially have dangerous consequences such as misleading diplomats, fraud, and more generally any other problem caused by stealing the identity of someone else,” they write.

Their solution is to release the technology publicly and make it “available to anyone.” That way, they say, the damage will be lessened because “everyone will soon be aware that such technology exists.” Speaking to The Verge, Alexandre de Brébisson of Lyrebird adds: “The situation is comparable to Photoshop. People are now aware that photos can be faked. I think in the future, audio recordings are going to become less and less reliable [as evidence].” However, de Brébisson concedes that even though Photoshop is now well known, people still fall for convincing fakes in the right context. The same would surely be true of voice synthesis.

For now, Lyrebird’s tech is still in development, and the company doesn’t want to discuss pricing. But de Brébisson says more than 6,000 individuals have signed up for early access to its APIs, and Lyrebird is working to improve its algorithms, including adding support for other languages such as French. “This technology is going to happen,” says de Brébisson. “If it’s not us it’s going to be someone else.”

Update April 25th, 12.30PM ET: Updated with quotes from Lyrebird’s Alexandre de Brébisson