To make a convincing deepfake (an AI-generated fake of a video or audio clip), you usually need a neural model that's trained on a lot of reference material. Generally, the larger your dataset of photos, video, or sound, the more eerily accurate the result will be. But now, researchers at Samsung's AI Center have devised a method to train a model to animate a face from an extremely limited dataset: just a single photo. The results are surprisingly good.
The researchers achieve this effect (as spotted by Motherboard) by training their algorithm on "landmark" facial features (the general shape of the face, eyes, mouth, and more) scraped from a public repository of 7,000 images of celebrities gathered from YouTube.
From there, the model can map these features onto a photo to bring it to life. As the team shows, it even works on the Mona Lisa and other portraits for which only a single still image exists. In the video, famous portraits of Albert Einstein, Fyodor Dostoyevsky, and Marilyn Monroe come to life as if they're Live Photos in your iPhone's camera roll.
As with most deepfakes, it's pretty easy to see the seams at this stage. Most of the faces are surrounded by visual artifacts. Fixing those artifacts, though, is likely easier than the feat of convincingly animating the Mona Lisa to look like a breathing human.
Despite some flaws, fake videos and audio are getting more realistic. If you need more proof, check out this uncanny AI-generated recreation of Joe Rogan's voice. As researchers continue to come up with low-lift methods for making high-quality fakes, there's a concern that they'll be used against people as propaganda, or to depict people in situations they'd object to, like pornographic videos, which was the original purpose of deepfake software. According to my colleague Russell Brandom, the potential political danger of deepfakes is real, but the worry is currently overblown.