Facebook is open sourcing a set of computer vision software tools that can identify both the variety and the shape of objects within photos. The tools, developed by the Facebook AI Research (FAIR) team, are called DeepMask, SharpMask, and MultiPathNet, and all three work in tandem to help break down and contextualize the contents of images. These technologies, though not in active use in consumer Facebook products right now, are similar to the software the company uses to describe photos to blind users, a feature it calls "automatic alternative text" that launched back in April.
DeepMask and SharpMask are more experimental research projects, and they focus on what the FAIR team calls segmentation. While human beings can discern the various elements of a photograph in mere seconds, the process is much harder for computers, which perceive an image only as a grid of pixels, each one a set of numbers describing a color. That makes it hard for software to work out where an image’s background ends and its subject begins, or which parts of the foreground are distinct objects. Getting the computer to then identify those objects correctly is a separate challenge.
Through machine learning, a widely used AI training technique, Facebook teaches algorithms to perform tasks that have traditionally required human cognition by feeding large data sets to what are called neural networks. These data sets contain millions upon millions of examples, from which the networks develop an understanding of real-world objects and environments. In other words, show an algorithm enough pictures of a sheep, tell it each time that what it’s seeing is a sheep, and it will begin identifying the animal in photos on its own.
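To make the sheep example concrete, here is a minimal sketch of that kind of supervised learning, assuming each photo has already been reduced to two hypothetical numeric features (how woolly and how white it looks). It trains a single artificial neuron, which is essentially logistic regression and the simplest building block of a neural network, on a handful of labeled examples. The features, data, and function names are invented for illustration; systems like FAIR's learn from raw pixels with deep networks and far more data.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Squash any number into the range (0, 1) so it reads as a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, labels, epochs=500, lr=0.5, seed=0):
    """Fit one artificial neuron (logistic regression) with gradient descent."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in range(len(examples[0]))]
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # how far the guess is from the label we "told" it
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(weights, bias, x) -> bool:
    """Answer 'is this a sheep?' for a feature vector x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5


# Hypothetical toy features per photo: (woolliness, whiteness), each in [0, 1].
photos = [(0.9, 0.8), (0.8, 0.9), (0.85, 0.7), (0.1, 0.2), (0.2, 0.1), (0.15, 0.4)]
is_sheep = [1, 1, 1, 0, 0, 0]  # the labels: "this is a sheep" or not

w, b = train(photos, is_sheep)
print(predict(w, b, (0.88, 0.75)))  # an unseen woolly, white photo
print(predict(w, b, (0.1, 0.3)))    # an unseen photo of something else
```

The model never receives a rule for what a sheep is; it adjusts its weights until its guesses match the labels, which is the same idea that scales up to deep neural networks.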
DeepMask is used to segment out different objects in a photo
Segmentation, the process by which a neural net separates out these objects, works by asking the computer a series of yes/no questions about an image: does a given patch contain an object, and if so, which of its pixels belong to that object? That’s DeepMask’s role, while SharpMask refines those selections for better accuracy.
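One way to picture those yes/no questions is the short sketch below. It assumes two questions per image patch (does the patch contain an object, and which pixels belong to it) and assumes a network has already produced a score for each; the `segment_patch` function and all the scores are hypothetical stand-ins, not FAIR's code or real model output.

```python
from typing import List, Optional

Grid = List[List[float]]


def segment_patch(objectness: float, pixel_scores: Grid,
                  threshold: float = 0.5) -> Optional[List[List[bool]]]:
    """Turn hypothetical network scores into yes/no answers for one patch.

    Question 1: does this patch contain an object? (one answer per patch)
    Question 2: if so, is each pixel part of it? (one answer per pixel)
    """
    if objectness < threshold:
        return None  # "no" to question 1: skip the patch entirely
    # "yes" to question 1: answer question 2 for every pixel, giving a mask
    return [[score >= threshold for score in row] for row in pixel_scores]


# Made-up per-pixel scores for a 4x4 patch with an object in the middle.
scores = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.8, 0.1],
    [0.1, 0.95, 0.7, 0.2],
    [0.0, 0.1, 0.3, 0.1],
]

mask = segment_patch(0.92, scores)
for row in mask:
    print("".join("#" if inside else "." for inside in row))
```

The printed grid marks the pixels answered "yes" with `#`. A refinement step in the spirit of SharpMask would then sharpen the edges of such a coarse mask.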
"DeepMask knows nothing about specific object types, so while it can delineate both a dog and a sheep, it can't tell them apart," writes FAIR research scientist Piotr Dollar in a technical paper. To do that, the team relies on MultiPathNet, along with foundational object recognition techniques developed by Ross Girshick, formerly of Microsoft Research and now a FAIR research scientist. MultiPathNet is the piece that tells objects apart and assigns each one a category.
FAIR sees a wide variety of applications for this type of image and object recognition. Beyond the obvious use cases, like letting you search for an image without having to tag it, this type of AI can be very useful to people with disabilities. "Our goal is to enable even more immersive experiences that allow users to 'see' a photo by swiping their finger across an image and having the system describe the content they're touching," writes Dollar.
So why is Facebook giving this technology away for free? "We open source our code and publish our discoveries as academic papers freely available from open-access sites and want to encourage others to make it easier to share techniques and technologies," a Facebook spokesperson told The Verge. "It’s our hope that others will be able to work with us to improve our tools and technologies."
Facebook and Google are racing one another to develop smarter AI
It’s important to remember, however, that companies like Microsoft and Google are conducting much of the same research. Google uses AI-powered image recognition to surface a photo of your vacation when you search for "beach" in Google Photos, and natural language processing to prewrite email responses and autocomplete search queries. Google has open sourced some of its technology too, like its TensorFlow machine learning software. So it’s likely Facebook feels pressure to contribute to the research community, in hopes that its own approaches and tools for building and training AI aren’t supplanted by others.
Looking down the line, FAIR wants to tackle the challenge of identifying what’s happening in video, a much harder task given how objects move and interact from frame to frame. This video-centric path is a no-brainer for Facebook. CEO Mark Zuckerberg has identified video as his social network’s biggest opportunity over the next five years, before things like virtual reality and more powerful AI truly take off. Over the last year, Facebook has transformed its site and mobile apps into premier destinations for both prerecorded video and live-streamed clips from users and news organizations alike. The next step is to draw insights from those videos, just as Facebook does with photos today.
"We've already made some progress with computer vision techniques to watch videos and, in real time, understand and classify what's in them, for example, cats or food," Dollar explains. "Real-time classification could help surface relevant and important live videos on Facebook, while applying more refined techniques to detect scenes, objects, and actions over space and time could one day allow for real-time narration."