Rapid progress in voice cloning technology is making it harder to tell real voices from synthetic ones. But while audio deepfakes — which can trick people into giving up sensitive information — are a growing problem, there are some good and legitimate uses for the technology as well, a group of experts told an FTC workshop this week.
“People have been mimicking voices for years, but just in the last few years, the technology has advanced to the point where we can clone voices at scale using a very small audio sample,” said Laura DeMartino, associate director in the FTC’s division of litigation technology and analysis.
At its first public workshop on audio cloning technology, the FTC enlisted experts from academia, government, medicine, and entertainment to highlight the implications of the tech and the potential harms.
FTC spokesperson Juliana Gruenwald Henderson said after the workshop that impostor schemes are the number one type of complaint the agency receives. “We began organizing this workshop after learning that machine learning techniques are rapidly improving the quality of voice clones,” she said in an email.
Deepfakes, both audio and visual, let criminals communicate anonymously, making it much easier to pull off scams, said Mona Sedky of the Department of Justice’s Computer Crime and Intellectual Property Section. Sedky, who said she was the “voice of doom” on the panel, said communication-focused crime has historically been less appealing to criminals because it is hard and time-consuming to pull off. “It’s difficult to convincingly pose as someone else,” she said. “But with deepfake audio and anonymizing tools, you can communicate anonymously with people anywhere in the world.”
Sedky said audio cloning can be weaponized just like the internet can be weaponized. “That doesn’t mean we shouldn’t use the internet, but there may be things we can do, things on the front end, to bake into the technology to make it harder to weaponize voices.”
John Costello, director of the Augmentative Communication Program at Boston Children’s Hospital, said audio cloning technology has practical applications for patients who lose their voice. They’re able to “bank” audio samples that can then be used to create synthetic versions of their voices later on. “Many people want to make sure they have an authentic-sounding synthetic voice, so after they lose their voice, for things they never thought to bank, they want to be able to ‘speak’ those things and have it sound like themselves,” he said.
For voice actors and performers, audio cloning presents a different set of problems, including consent and compensation for the use of their voices, said Rebecca Damon of the Screen Actors Guild - American Federation of Television and Radio Artists. A voice actor may have contractual obligations around where their voice is heard, or may not want their voice used in ways incompatible with their beliefs, she said.
And for broadcast journalists, she added, the misuse or replication of their voices without permission has the potential to affect their credibility. “A lot of times people get excited and rush in with the new technology and then don’t necessarily think through all the applications,” Damon said.
While people often talk about social media and its ability to spread audio and video deepfakes (think of the faked Joe Rogan voice, or Jordan Peele’s AI-assisted impersonation of President Obama), most of the panelists agreed that the most immediate audio deepfake threat to consumers comes over the telephone.
“Social media platforms are the front line, that is where messages are getting conveyed and latched on to and disseminated,” said Neil Johnson, an advisor with the Defense Advanced Research Projects Agency (DARPA). Text-to-speech systems that generate voices, such as the automated call telling you a package has been delivered, have widespread and valuable uses. But Johnson cited an example of a UK company that was defrauded of about $220,000 after someone spoofed the CEO’s voice in a wire transfer scam.
Patrick Traynor of the Herbert Wertheim College of Engineering at the University of Florida said phone scams and audio deepfakes were likely to keep growing more sophisticated. Combating and detecting synthetic or faked voices will take more than one approach, he said: “Ultimately, it will be a combination of techniques that will get us there.” The best way to determine if a caller is who they say they are, Traynor added, is a tried-and-true method: “Hang up and call them back. Unless it’s a state actor who can reroute phone calls or a very, very sophisticated hacking group, chances are that’s the best way to figure out if you were talking to who you thought you were.”