Hip-hop mogul Jay-Z has become an unlikely source of inspiration within the budding community of artificial intelligence-powered impersonation, known colloquially as deepfakes, and the rap icon’s record label has taken notice and issued copyright strikes to shut some of it down.
It’s not that Jay-Z is being maligned without his consent, as is the case with the majority of deepfakes, which paste the heads of female celebrities and other women onto the bodies of adult film actresses. Rather, this is the world of safe-for-work, audio-only deepfakes that dabble mostly in parody while putting on display the awe-inspiring technical sophistication the field has achieved in just a few short years.
Roc Nation has issued strikes against some Jay-Z audio deepfakes
In a fascinating deep dive on his website Waxy, XOXO festival co-founder Andy Baio looks into AI-powered Jay-Z impersonations on YouTube, specifically one creator who uses Jay-Z’s iconic voice and hip-hop flow to rhyme classics like William Shakespeare’s “To Be or Not to Be” soliloquy from Hamlet and Billy Joel’s “We Didn’t Start the Fire.”
In a strange turn of events, Roc Nation LLC, Jay-Z’s full-service entertainment agency, filed copyright strikes against the YouTube uploads of the above-mentioned deepfakes. The notices specifically cite AI, with Roc Nation writing, “this content unlawfully uses an AI to impersonate our client’s voice,” according to Baio’s conversations with the creator, who remains anonymous and goes by the online handle Vocal Synthesis. The channel itself has nearly 40,000 subscribers, and many of its videos have racked up hundreds of thousands of views.
Weirdly enough, Jay-Z deepfake videos featuring the rapper’s synthetic voice rhyming the Book of Genesis and the infamous Navy Seal copypasta meme remain on YouTube. And you can’t chalk the difference up to the source material being free to use: “We Didn’t Start the Fire” is of course copyrighted, but Shakespeare’s works are in the public domain, and the Hamlet video drew a strike anyway.
All of Vocal Synthesis’ videos are created by feeding Jay-Z songs and lyrics into Google’s open source Tacotron 2 text-to-speech model and having the resulting synthetic voice read pre-written text. The situation raises fascinating questions about what exactly is being infringed upon here if the synthetic voice is simply producing original content using the likeness of a celebrity. For a deeper look into the copyright issues at play, I highly suggest you read Baio’s analysis in full because he gets into the finer details of fair use and why Jay-Z’s claims may not hold up in court.
“It seems reasonable to assume that a model and audio generated from copyrighted audio recordings would be considered derivative works. But is it copyright infringement? Like virtually everything in the world of copyright, it depends — on how it was used, and for what purpose,” Baio explains. He uses the example of a record producer featuring Jay-Z on a new single without his permission as an obvious legal transgression before arguing why Vocal Synthesis is likely in the right here.
After initially removing the videos, YouTube has since reinstated them. “After reviewing the DMCA takedown requests for the videos in question, we determined that they were incomplete,” a Google spokesperson tells The Verge. “Pending additional information from the claimant, we have temporarily reinstated the videos.”
Deepfakes stand on murky legal ground. Some US states like California have banned political candidates from disseminating deepfakes to try to influence an election, while Virginia last year expanded its revenge porn laws to cover computer-generated images and video. Social platforms including Facebook, Reddit, and Twitter have deepfake bans on the books that cover a wide variety of content, in most cases prohibiting manipulated or deceptive video or audio that’s designed to cause harm.
But none of these bans seem to cover harmless entertainment, which Vocal Synthesis’ videos appear to fall under. “I was pretty surprised to receive the takedown order. As far as I’m aware, this was the first time YouTube has removed a video for impersonating a voice using AI,” the account creator tells Baio in an interview. “I’ve been posting these kind of videos for months and have not had any other videos removed for this reason. There are also several other channels making speech synthesis videos similar to mine, and I’m not aware of any of them having videos removed for this reason.”
Vocal Synthesis has a point: AI-powered voice impersonation is all over the internet, especially on YouTube. It’s just that, in this one particular case, a well-known celebrity happened to take notice when he became the subject and took an action that may have far-reaching consequences. That said, YouTube’s decision to reinstate the videos suggests that the copyright argument may not hold water.
“I’m not a lawyer and have not studied intellectual property law, but logically I don’t really understand why mimicking a celebrity’s voice using an AI model should be treated differently than someone naturally doing an (extremely accurate) impression of that celebrity’s voice,” Vocal Synthesis argues.
Roc Nation does not list phone numbers or email addresses on its website for contacting the company or its record label subsidiary.
Update April 28th, 11:26PM ET: Added comment from YouTube clarifying that the takedown requests were incomplete and that the videos in question had been reinstated.