Facebook is using billions of Instagram images to train artificial intelligence algorithms

Photo by Amelia Holowaty Krales / The Verge

Your Instagram photo of a perfectly composed plate of pancakes or an exquisitely framed sunset is helping Facebook train its artificial intelligence algorithms to better understand objects in images, the company announced today at its annual F8 developer conference. Facebook says the approach, which culls images from publicly available hashtags, is a way to amass and train software with billions of images without the need for human workers to laboriously analyze and annotate the data. The end result is a training system that produced algorithms Facebook says beat top-of-the-line industry benchmarks.

“We rely almost entirely on hand-curated, human-labeled data sets. If a person hasn’t spent the time to label something specific in an image, even the most advanced computer vision systems won’t be able to identify it,” Mike Schroepfer, Facebook’s chief technology officer, said onstage at F8. But using Instagram images that are already labeled by way of hashtags, Facebook was able to collect relevant data and use it to train its computer vision and object recognition models. “We’ve produced state-of-the-art results that are 1 to 2 percent better than any other system on the ImageNet benchmark.”
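The core idea here is treating hashtags as weak, noisy labels instead of paying humans to annotate each image. As a rough illustration only (not Facebook's actual pipeline; the vocabulary and hashtag mapping below are invented for the example), a photo's hashtags can be mapped onto a fixed label vocabulary to produce a multi-hot training target:

```python
# Toy sketch of hashtags-as-weak-labels. Everything here is hypothetical:
# a real system would use a vocabulary of thousands of labels and a much
# larger, learned hashtag-to-label mapping.

LABEL_VOCAB = ["pancakes", "sunset", "dog", "beach"]  # toy label vocabulary

# Hashtags are noisy: synonyms map onto canonical labels, spam tags map to
# nothing, and some true labels are simply never tagged.
HASHTAG_TO_LABEL = {
    "#pancakes": "pancakes",
    "#breakfast": "pancakes",   # loose synonym; a source of label noise
    "#sunset": "sunset",
    "#goldenhour": "sunset",
    "#dogsofinstagram": "dog",
}

def hashtags_to_multihot(hashtags):
    """Convert a photo's hashtags into a multi-hot target vector."""
    target = [0] * len(LABEL_VOCAB)
    for tag in hashtags:
        label = HASHTAG_TO_LABEL.get(tag.lower())
        if label is not None:
            target[LABEL_VOCAB.index(label)] = 1
    return target

print(hashtags_to_multihot(["#Sunset", "#nofilter"]))  # -> [0, 1, 0, 0]
```

The appeal is scale: the labels are free and arrive with every upload, and with billions of images the signal can outweigh the noise from spammy or inaccurate tags.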

It’s a practical approach, but it’s also one that raises some interesting questions about privacy and Facebook’s competitive advantage. Because it owns and operates such a large platform encompassing billions of users across apps like Instagram, WhatsApp, and Messenger, Facebook has access to extremely valuable text and image data it can use to inform its AI models, so long as that text and those images are posted publicly. But users may not necessarily be aware that the public data they’ve shared are being mined to build AI systems, and not just for serving ads.

Of course, Facebook is only extracting object-based data at the moment, and it’s not necessarily trying to draw inferences about user behavior from the contents of photos. But as we know with Facebook’s facial recognition system that automatically tags photos, the company does see value in being able to understand who users are with and where they are in the world.

On a grander scale, Facebook is building these AI systems primarily to help it scale its moderation efforts. In addition to 20,000 new human moderators for its platform, Facebook is increasingly looking to automation as it grapples with Russian election interference, the Cambridge Analytica data privacy scandal, and other hard questions about how to moderate content on its platform and keep bad actors from abusing its tools.

“Until very recently we often had to rely on reactive reports. We had to wait for something bad to be spotted by someone and do something about it,” Schroepfer said. Now, he added, the bulk of the moderation is being handled by AI, which is helping the company screen for and scrub its platform of terrorist propaganda, nudity, violence, spam, and hate speech. “This is why we are so focused on core AI research. We require new breakthroughs, and we require new technologies to solve problems all of us want to solve.”


Comments

"Ads are planned to feature one of our many AI generated instagram models…"

1 to 2 percent better doesn’t sound like a lot, though

It depends on whether they mean 1–2% better or 1–2 percentage points higher. The latter would be quite impressive.

It’s 2+ percentage points. They hit 85.4% on a specific ImageNet image classification benchmark. The previous best result was 83.1%. It’s considered a pretty big deal in research circles.
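Spelling out the arithmetic behind the thread above, using the figures from the comment (85.4% vs. the previous best of 83.1% on an ImageNet classification benchmark), the three ways of quoting the same gain come out quite differently:

```python
# Three readings of "1 to 2 percent better" for 85.4% vs. 83.1% accuracy.
new_acc, old_acc = 85.4, 83.1

points = new_acc - old_acc                    # absolute percentage points
relative = (new_acc - old_acc) / old_acc * 100  # relative accuracy gain

# The framing often used in research: relative reduction in error rate
# (error falls from 16.9% to 14.6%).
error_drop = ((100 - old_acc) - (100 - new_acc)) / (100 - old_acc) * 100

print(f"{points:.1f} points, {relative:.1f}% relative, "
      f"{error_drop:.1f}% error reduction")
# -> 2.3 points, 2.8% relative, 13.6% error reduction
```

That last framing is why a "2 point" gain is considered a big deal: on an already-strong benchmark it corresponds to a double-digit cut in the remaining errors.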

It’s a practical approach, but it’s also one that raises some interesting questions about privacy […] Facebook has access to extremely valuable text and image data it can use to inform its AI models, so long as that text and those images are posted publicly.

I’m sorry, but what privacy issues are raised by images that are published publicly? You post an image on a public Instagram account; you can expect worse things to happen to it than being used to train an AI.

But users may not necessarily be aware that the public data they’ve shared are being mined to build AI systems, and not just for serving ads.


Who exactly is still not aware that Facebook is using whatever you post on its networks to train AIs and do all kinds of tech stuff?

Seriously, people, if you want your pictures to be safe from everyone in Silicon Valley, store them locally on a computer.

store them locally on a computer in a safety vault in printed form.

So Facebook’s algorithms can identify a Thot and a Thirst trap?

On a grander scale, Facebook is building these AI systems primarily to help it scale its moderation efforts. In addition to 20,000 new human moderators for its platform, Facebook is increasingly looking to automation as it grapples with Russian election interference, the Cambridge Analytica data privacy scandal, and other hard questions about how to moderate content on its platform and keep bad actors from abusing its tools.

reverse damage to its advertising business.

I wish I had your optimism.

This will create the most narcissistic AI in the world.

Can’t WAIT!

Great, the AI will have the mentality of a 20-something combined with #brands.

Great… another way to become a product of Facebook. Now more and more of our private behavior is getting understood by FB and its machines.

Anime imageboards like Danbooru literally have millions of images all exhaustively tagged by hordes of weaboos

They’re literally begging for AI training.

Interesting. One downside might be the incorrect hashtags some people deliberately apply to increase their views.
