Facebook is using billions of Instagram images to train artificial intelligence algorithms


Your brunch and sunset photos are helping software learn how to see like humans do

Photo by Amelia Holowaty Krales / The Verge

Your Instagram photo of a perfectly composed plate of pancakes or an exquisitely framed sunset is helping Facebook train its artificial intelligence algorithms to better understand objects in images, the company announced today at its annual F8 developer conference. Facebook says the approach, which draws on publicly posted images tagged with hashtags, lets it amass billions of images and train software on them without human workers having to laboriously analyze and annotate the data. The result is a training system that produced algorithms Facebook says beat top industry benchmarks.

“We rely almost entirely on hand-curated, human-labeled data sets. If a person hasn’t spent the time to label something specific in an image, even the most advanced computer vision systems won’t be able to identify it,” Mike Schroepfer, Facebook’s chief technology officer, said onstage at F8. But using Instagram images that are already labeled by way of hashtags, Facebook was able to collect relevant data and use it to train its computer vision and object recognition models. “We’ve produced state-of-the-art results that are 1 to 2 percent better than any other system on the ImageNet benchmark.”
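
To make the idea of hashtags standing in for human labels concrete, here is a minimal Python sketch. It is an illustration only, not Facebook's pipeline, which the company did not detail onstage. The function names, the sample captions, and the `min_count` cutoff are all hypothetical. The sketch turns each caption's hashtags into a noisy yes/no label vector that an image classifier could, in principle, be trained against.

```python
import re
from collections import Counter

# Hypothetical pattern: a '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(caption):
    """Return the lowercase hashtags in a caption, in order, without repeats."""
    seen = []
    for tag in HASHTAG_RE.findall(caption.lower()):
        if tag not in seen:
            seen.append(tag)
    return seen


def build_vocabulary(captions, min_count=2):
    """Keep hashtags used at least min_count times (an assumed noise filter)."""
    counts = Counter(tag for c in captions for tag in extract_hashtags(c))
    return sorted(tag for tag, n in counts.items() if n >= min_count)


def weak_label_vector(caption, vocabulary):
    """Encode a caption as 0/1 flags, one per vocabulary hashtag."""
    tags = set(extract_hashtags(caption))
    return [1 if tag in tags else 0 for tag in vocabulary]


# Made-up captions standing in for public posts.
captions = [
    "Sunday brunch #pancakes #brunch",
    "Golden hour #sunset #beach",
    "More #pancakes please #brunch #yum",
    "Evening walk #sunset",
]

vocab = build_vocabulary(captions)
labels = [weak_label_vector(c, vocab) for c in captions]

print(vocab)
for caption, vector in zip(captions, labels):
    print(vector, caption)
```

Labels produced this way are noisy, since a hashtag does not guarantee the object actually appears in the photo. The appeal of the approach described at F8 is that sheer volume can offset that noise.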

Facebook needs better-trained AI to help it scale its moderation efforts

It’s a practical approach, but it’s also one that raises some interesting questions about privacy and Facebook’s competitive advantage. Because it owns and operates such a large platform encompassing billions of users across apps like Instagram, WhatsApp, and Messenger, Facebook has access to extremely valuable text and image data it can use to inform its AI models, so long as that text and those images are posted publicly. But users may not necessarily be aware that the public data they’ve shared are being mined to build AI systems, and not just for serving ads.

Of course, Facebook is only extracting object-based data at the moment, and it’s not necessarily trying to draw inferences about user behavior from the contents of photos. But as we know with Facebook’s facial recognition system that automatically tags photos, the company does see value in being able to understand who users are with and where they are in the world.

On a grander scale, Facebook is building these AI systems primarily to help it scale its moderation efforts. In addition to 20,000 new human moderators for its platform, Facebook is increasingly looking to automation as it grapples with Russian election interference, the Cambridge Analytica data privacy scandal, and other hard questions about how to moderate content on its platform and keep bad actors from abusing its tools.

“Until very recently we often had to rely on reactive reports. We had to wait for something bad to be spotted by someone and do something about it,” Schroepfer said. Now, he added, the bulk of the moderation is being handled by AI, which is helping the company screen for and scrub its platform of terrorist propaganda, nudity, violence, spam, and hate speech. “This is why we are so focused on core AI research. We require new breakthroughs, and we require new technologies to solve problems all of us want to solve.”