Google is partnering with a Harvard professor to promote a new scale for measuring skin tones with the hope of fixing problems of bias and diversity in the company’s products.
The tech giant is working with Ellis Monk, an assistant professor of sociology at Harvard and the creator of the Monk Skin Tone Scale, or MST. The MST Scale is designed to replace outdated skin tone scales that are biased towards lighter skin. When tech companies use these older scales to categorize skin color, the result can be products that perform worse for people with darker skin, says Monk.
“Unless we have an adequate measure of differences in skin tone, we can’t really integrate that into products to make sure they’re more inclusive,” Monk tells The Verge. “The Monk Skin Tone Scale is a 10-point skin tone scale that was deliberately designed to be much more representative and inclusive of a wider range of different skin tones, especially for people [with] darker skin tones.”
Fixing bias in AI often means fixing training data
There are numerous examples of tech products, particularly those that use AI, that perform worse with darker skin tones. These include apps designed to detect skin cancer, facial recognition software, and even machine vision systems used by self-driving cars.
Although there are lots of ways this sort of bias is programmed into these systems, one common factor is the use of outdated skin tone scales when collecting training data. The most popular skin tone scale is the Fitzpatrick scale, which is widely used in both academia and AI. This scale was originally designed in the ’70s to classify how people with paler skin burn or tan in the sun and was only later expanded to include darker skin.
This has led to criticism that the Fitzpatrick scale fails to capture the full range of skin tones, which may mean that machine vision software trained on Fitzpatrick data is itself biased towards lighter skin types.
The Fitzpatrick scale comprises six categories, but the MST Scale expands this to 10 different skin tones. Monk says this number was chosen based on his own research to balance diversity and ease of use. Some skin tone scales offer more than a hundred different categories, he says, but too much choice can lead to inconsistent results.
“Usually, if you got past 10 or 12 points on these types of scales [and] ask the same person to repeatedly pick out the same tones, the more you increase that scale, the less people are able to do that,” says Monk. “Cognitively speaking, it just becomes really hard to accurately and reliably differentiate.” A choice of 10 skin tones is much more manageable, he says.
Creating a new skin tone scale is only a first step, though, and the real challenge is integrating this work into real-world applications. In order to promote the MST Scale, Google has created a new website, skintone.google, dedicated to explaining the research and best practices for its use in AI. The company says it’s also working to apply the MST Scale to a number of its own products. These include its “Real Tone” photo filters, which are designed to work better with darker skin tones, and its image search results.
Google says it’s introducing a new feature to image search that will let users refine searches based on skin tones classified by the MST Scale. So, for example, if you search for “eye makeup” or “bridal makeup looks,” you can then filter results by skin tone. In the future, the company also plans to use the MST Scale to check the diversity of its results so that if you search for images of “cute babies” or “doctors,” you won’t be shown only white faces.
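As a rough illustration only, a diversity check of this kind could be sketched in a few lines of Python. Nothing here reflects Google's actual implementation: the function name `is_homogeneous`, the `top_k` and `threshold` parameters, and the idea of flagging a result set when a couple of tones dominate are all assumptions made for the sake of the example. The only detail taken from the MST Scale itself is that each image is labeled with one of 10 tones.

```python
from collections import Counter

# The MST Scale has 10 tones; labeling each image 1-10 is an assumption
# made for this sketch.
MST_TONES = range(1, 11)


def is_homogeneous(labels, top_k=2, threshold=0.8):
    """Return True if the top_k most common tones dominate the results.

    labels: list of MST tone labels (integers 1-10), one per image result.
    top_k and threshold are hypothetical tuning knobs, not published values.
    """
    if not labels:
        return False
    for tone in labels:
        if tone not in MST_TONES:
            raise ValueError(f"not an MST tone: {tone!r}")
    counts = Counter(labels)
    # Number of results that fall into the top_k most frequent tones.
    dominant = sum(count for _, count in counts.most_common(top_k))
    return dominant / len(labels) >= threshold


# A result set concentrated on two tones is flagged; an even spread is not.
skewed = [1, 1, 1, 2, 1, 2, 1, 1, 2, 1]
spread = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(is_homogeneous(skewed), is_homogeneous(spread))
```

A fixed threshold like this is the simplest possible choice. Any real system would also have to decide what an acceptable distribution looks like for a given query and region, and that question is much harder than the arithmetic.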
“One of the things we’re doing is taking a set of [image] results, understanding when those results are particularly homogenous across a few set of tones, and improving the diversity of the results,” Google’s head of product for responsible AI, Tulsee Doshi, told The Verge. Doshi stressed, though, that these updates were in a “very early” stage of development and hadn’t yet been rolled out across the company’s services.
Google is experimenting with balancing image search results to be more inclusive
This should strike a note of caution, not just for this specific change but also for Google’s approach to fixing problems of bias in its products more generally. The company has a patchy history when it comes to these issues, and the AI industry as a whole has a tendency to promise ethical guidelines and guardrails and then fail on the follow-through.
Take, for example, the infamous Google Photos error that led to its search algorithm tagging photos of Black people as “gorillas” and “chimpanzees.” This mistake was first noticed in 2015, yet Google confirmed to The Verge this week that it has still not fixed the problem and has instead simply removed these search terms altogether. “While we’ve significantly improved our models based on feedback, they still aren’t perfect,” Google Photos spokesperson Michael Marconi told The Verge. “In order to prevent this type of mistake and potentially causing additional harm, the search terms remain disabled.”
Introducing these sorts of changes can also be culturally and politically challenging, reflecting broader difficulties in how we integrate this sort of tech into society. In the case of filtering image search results, for example, Doshi notes that “diversity” may look different in different countries, and if Google adjusts image results based on skin tone, it may have to change these results based on geography.
“What diversity means, for example, when we’re surfacing results in India [or] when we’re surfacing results in different parts of the world, is going to be inherently different,” says Doshi. “It’s hard to necessarily say, ‘oh, this is the exact set of good results we want,’ because that will differ per user, per region, per query.”
Introducing a new and more inclusive scale for measuring skin tones is a step forward, but much thornier issues involving AI and bias remain.