People like to argue that technology is value neutral; that it’s neither inherently good nor inherently bad, but can simply be put to different uses. As a rebuttal, I’d like to direct the court’s attention to exhibit a): a video of “digital humans” rapping in AI-synthesized voices that is intrinsically awful.
Indeed, I’d argue that the video above not only disproves the whole value-neutral thing, but makes a decent case for shutting down this “technology” lark altogether and heading back to the trees before it’s too late. What I mean is: AI is posting cringe and I don’t like it.
Okay, so I’m being a little harsh here and the video is obviously a joke. It’s the work of Replica, an AI startup that does interesting things with synthetic speech. The company tells us that during a recent hackathon, one employee worked out how to capture live audio of himself rapping and transfer “the timing, cadence and energy of his delivery onto one of our AI voices.” Combined with a little 3D animation and rendering, this video is the result.
In fairness, here’s Replica’s mea culpa, sent to us via email:
“DISCLAIMER - we know this video is deep in the heart of the uncanny valley. That’s not because the tech is bad, it’s because we’re amateurs at using 3D real-time rendering software - that’s not our speciality. The only reason this video exists is because the team created this during an internal company hackathon for fun using a new feature that’s under development, not yet open to the public.”
The feature in question is an upcoming integration between Replica’s speech synthesis tools and Unreal Engine’s MetaHumans software, which generates realistic CGI humans. It is due to be announced at GDC in July. By combining the two tools, says Replica, anyone will be able to “create lip sync dialogue for games and movies, and even rap.”
As a reminder, though, you can also not do that. Just a thought.