Instagram has shared new details on how its app uses machine learning to surface content for users, stressing that, when making recommendations, it focuses on finding accounts it thinks people will enjoy, rather than individual posts.
The blog post is technical in nature and contains no big surprises, but it offers an interesting behind-the-scenes perspective at a time when algorithmic recommendation systems are under scrutiny for pushing users toward dangerous, hateful, and extremist content.
While Instagram has not been criticized with the same ferocity as YouTube (dubbed “the Great Radicalizer” by The New York Times), it certainly has its share of problems. Hateful content and misinformation thrive on the platform as much as on any other social network, and certain mechanisms in the app (like its suggested follows feature) have been shown to push users toward extreme viewpoints on topics like anti-vaccination.
In its blog post, though, Instagram’s engineers explain the operation of the Explore tab while steering clear of thorny political issues. “This is the first time we’re going into heavy detail on the foundational building blocks that help us provide personalized content at scale,” Instagram software engineer Ivan Medvedev told The Verge over email. (You can read about how Instagram organizes content on the main feed in this story from last year.)
The post emphasizes that Instagram is huge, and the content it contains is extremely varied, “with topics varying from Arabic calligraphy to model trains to slime.” This presents a challenge for recommending content, which Instagram overcomes by focusing not on what posts users might like to see, but on what accounts might interest them.
Instagram identifies accounts that are similar to one another by adapting a common machine learning method known as “word embedding.” Word embedding systems study the order in which words appear in text to measure how related they are. So, for example, a word embedding system would note that the word “fire” often appears next to the words “alarm” and “truck,” but less frequently next to the words “pelican” or “sandwich.” Instagram uses a similar process to determine how related any two accounts are to one another.
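To make the idea concrete, here is a minimal sketch of co-occurrence-based similarity applied to accounts rather than words. It is an illustration, not Instagram's code: the blog post describes learned embeddings, while this toy version simply counts which accounts appear near each other in a user's interaction history and compares those counts with cosine similarity. The account names, the `sessions` data, and both function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sessions, window=2):
    """Count, for each account, which other accounts appear within
    `window` positions of it in the same interaction session."""
    vectors = defaultdict(Counter)
    for session in sessions:
        for i, account in enumerate(session):
            lo = max(0, i - window)
            hi = min(len(session), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[account][session[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical data: each list is the sequence of accounts one user
# interacted with, playing the role a sentence plays for words.
sessions = [
    ["calligraphy_a", "calligraphy_b", "ink_shop"],
    ["calligraphy_b", "calligraphy_a", "ink_shop"],
    ["trains_a", "trains_b", "hobby_store"],
    ["trains_b", "trains_a", "hobby_store"],
]

vectors = cooccurrence_vectors(sessions)
same_topic = cosine(vectors["calligraphy_a"], vectors["calligraphy_b"])
cross_topic = cosine(vectors["calligraphy_a"], vectors["trains_a"])
```

In this toy data, the two calligraphy accounts share a neighbor and score higher than a calligraphy account paired with a model-train account, which share none. A production system would learn dense vectors from far more data, but the underlying intuition is the same: accounts that show up in similar contexts are treated as related.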
To make its recommendations, the Explore system begins by looking at “seed accounts,” which are accounts that users have interacted with in the past by liking or saving their content. It identifies accounts similar to these, and from them, it selects 500 pieces of content. These candidates are filtered to remove spam, misinformation, and “likely policy-violating content,” and the remaining posts are ranked based on how likely a user is to interact with each one. Finally, the top 25 posts are sent to the first page of the user’s Explore tab.
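The steps above can be sketched as a small pipeline. This is a simplified illustration under stated assumptions, not Instagram's implementation: the function name, its parameters, and the `flagged` and `score` fields are all hypothetical stand-ins. In particular, the post does not say how the 500 candidates are chosen or how interaction likelihood is predicted, so the sketch just truncates the candidate list and takes the scoring function as an input.

```python
def explore_page(seed_accounts, similar_accounts, posts_by_account,
                 is_allowed, predicted_engagement,
                 candidate_limit=500, page_size=25):
    """Toy version of the described flow: seeds -> similar accounts ->
    candidates -> filter -> rank -> first page."""
    # 1. Expand seed accounts into related accounts, skipping repeats.
    related, seen = [], set()
    for seed in seed_accounts:
        for account in similar_accounts.get(seed, []):
            if account not in seen:
                seen.add(account)
                related.append(account)

    # 2. Gather candidate posts from those accounts, capped at the limit.
    #    (Assumption: simple truncation stands in for real sampling.)
    candidates = []
    for account in related:
        candidates.extend(posts_by_account.get(account, []))
    candidates = candidates[:candidate_limit]

    # 3. Drop anything the filter rejects (spam, misinformation, etc.).
    eligible = [post for post in candidates if is_allowed(post)]

    # 4. Rank by predicted likelihood of interaction, keep the top page.
    ranked = sorted(eligible, key=predicted_engagement, reverse=True)
    return ranked[:page_size]


# Hypothetical example data.
similar_accounts = {
    "seed_1": ["acct_a", "acct_b"],
    "seed_2": ["acct_b", "acct_c"],
}
posts_by_account = {
    "acct_a": [
        {"id": "a1", "flagged": False, "score": 0.2},
        {"id": "a2", "flagged": True, "score": 0.99},
    ],
    "acct_b": [{"id": "b1", "flagged": False, "score": 0.8}],
    "acct_c": [{"id": "c1", "flagged": False, "score": 0.5}],
}

page = explore_page(
    ["seed_1", "seed_2"],
    similar_accounts,
    posts_by_account,
    is_allowed=lambda post: not post["flagged"],
    predicted_engagement=lambda post: post["score"],
    page_size=2,
)
page_ids = [post["id"] for post in page]
```

Note that the flagged post is removed before ranking, so even a very high predicted engagement score cannot push it onto the page. That ordering mirrors the sequence the article describes, in which filtering happens before the remaining posts are ranked.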
There are a few things to note here. First, Instagram is not being completely transparent about its process. There are no details on what signals are used to identify spam or misinformation, which is not too surprising, since explaining this would help people who want to spread this sort of content. The company is also unclear about the degree to which machine learning is used to filter inappropriate content, a key detail given that Facebook often presents AI as a magic bullet for moderation (while experts disagree).
Take the example of anti-vax content. Instagram has cracked down on this, but mainly by using manual methods. It blocks hashtags that contain what it says is “verifiably false information” like “#vaccinescauseaids” and relies on health agencies like the World Health Organization to flag dangerous posts, which it takes down.
Will AI be useful? It’s not clear, but Medvedev says the company is working on it. “We are also training AI models to proactively detect vaccine misinformation and take automatic action,” he says.
The second takeaway from the post is that, by Instagram’s own telling, the best way for users to shape what content they see in the Explore tab is by interacting with the stuff they like. (That is good for Instagram, I guess!) If you don’t want to see certain sorts of posts, then your best bet is to use the “see fewer posts like this” tool, which you can access by clicking the three-dot menu in the top-right corner of each post. The algorithm will notice.