Recording advertisements and product endorsements can be lucrative work for celebrities and influencers. But is it too much like hard work? That’s what US firm Veritone is betting. Today, the company is launching a new platform called Marvel.AI that will let creators, media figures, and others generate deepfake clones of their voice to license as they wish.
“People want to do these deals but they don’t have enough time to go into a studio and produce the content,” Veritone president Ryan Steelberg tells The Verge. “Digital influencers, athletes, celebrities, and actors: this is a huge asset that’s part of their brand.”
With Marvel.AI, he says, anyone can create a realistic copy of their voice and deploy it as they see fit. While celebrity Y is sleeping, their voice might be out and about, recording radio spots, reading audiobooks, and much more. Steelberg says the platform will even be able to resurrect the voices of the dead, using archive recordings to train AI models.
“Whoever has the copyright to those voices, we will work with them to bring them to the marketplace,” he says. “That will be up to the rightsholder and what they feel is appropriate, but hypothetically, yes, you could have Walter Cronkite reading the nightly news again.”
Speech synthesis has improved rapidly in recent years, with machine learning techniques enabling the creation of ever-more realistic voices. (Just think about the difference between how Apple’s Siri sounded when it launched in 2011 and how it sounds now.) Many big tech firms like Amazon offer off-the-shelf text-to-speech models that generate voices at scale, voices that are robotic but not unpleasant. But new companies are also making boutique voice clones that sound like specific individuals, and the results can be near-indistinguishable from the real thing. Just listen to this voice clone of podcaster Joe Rogan, for example.
It’s this leap forward in quality that motivated Veritone to create Marvel.AI, says Steelberg, as well as the potential for synthetic speech to dovetail with the firm’s existing businesses.
Veritone places more than 75,000 ads in podcasts monthly
Although Veritone markets itself as an “AI company,” a big part of its revenue apparently comes from old-school advertising and content licensing. As Steelberg explains, its advertising subsidiary Veritone One is heavily invested in the podcast space, and every month places more than 75,000 “ad integrations” with influencers. “It’s mostly native integrations, like product placements,” he says. “It’s getting the talent to voice sponsorships and commercials. That’s extremely effective but very expensive and time consuming.”
Another part of the firm, Veritone Licensing, licenses video from a number of major archives. These include archives owned by broadcasters like CBS and CNN and sports organizations like the NCAA and US Open. “When you see the Apollo moon landing footage show up in movies, or Tiger Woods content in a Nike commercial, all that’s coming through Veritone,” says Steelberg. It’s this experience with licensing and advertising that will give Veritone the edge over AI startups focusing purely on technology, he says.
To customers, Marvel.AI will offer two streams. One will be a self-service model, where anyone can pick from a catalog of pre-generated voices and create speech on demand. (This is how Amazon, Microsoft, et al. have been doing it for years.) But the other stream — “the focus,” says Steelberg — will be a “managed, white-glove approach,” where customers submit training data, and Veritone will create a voice clone just for them. The resulting models will be stored on Veritone’s systems and available to generate audio as and when the client wants. Marvel.AI will also function as a marketplace, allowing potential buyers to submit requests to use these voices. (How all this will be priced isn’t yet clear.)
Veritone has yet to prove its AI voices are worth the effort
Steelberg makes a convincing case that the demand for these voices exists and that Veritone’s business model is ready to go. But one major factor will decide whether Marvel.AI succeeds: the quality of the AI voices the platform can generate. And this is much less certain.
When asked for examples of the company’s work, Veritone shared three short clips with The Verge, each a single line endorsement for a brand of mints. The first line is read by Steelberg himself, the second by his AI clone, and the third by a gender-swapped voice. You can listen to all three below:
The AI clone is, to my ear at least, a pretty good imitation, though not a perfect copy. It’s flatter and more clipped than the real thing. But it’s also not a full demonstration of what voices can do during an endorsement. Steelberg’s delivery lacks the enthusiasm and verve you’d expect of a real ad (we’re not faulting him for this — he’s an executive, not a voice actor), and so it’s not clear if Veritone’s voice models can capture a full range of emotion.
It’s also not a great sign that the voiceover for the platform’s sizzle reel (embedded at the top of the story) was done by Steelberg himself rather than an AI copy. Either the company didn’t think a voice clone was good enough for the job, or it ran out of time to generate one — either way, it’s not a great endorsement of the product.
The technology is moving quickly, though, and Steelberg is keen to stress that Veritone has the resources and expertise to adopt whatever new machine learning models emerge in the years to come. Where it’s going to differentiate itself, he says, is in managing the experience of customers and clients as they actually deploy synthetic speech at scale.
Synthetic content will raise lots of questions about authenticity
One problem Steelberg raises is how synthetic speech might dilute the power of endorsements. After all, the attraction of product endorsement hinges on the belief (however delusional) that this or that celebrity really does enjoy this particular brand of cereal / toothpaste / life insurance. If the celeb can’t be bothered to voice the endorsement themselves, doesn’t it take away from the ad’s selling power?
Steelberg’s solution is to create an industry standard for disclosure — some sort of audible tone that plays before synthetic speech to a) let listeners know it’s not a real voice, and b) reassure them that the voice’s owner endorses this use. “It’s not just about avoiding the negative connotations of tricking the consumer, but also wanting them to be confident that [this or that celebrity] really approved this synthetic content,” he says.
It’s questions like these that are going to be increasingly important as synthetic content becomes more common, and it’s clear Veritone has been thinking hard about these issues. Now the company just needs to convince the influencers, athletes, actors, podcasters, and celebrities of the world to lend it their voices.