Speech recognition systems have more trouble understanding black users’ voices than those of white users, according to a new Stanford study.
The researchers used speech recognition tools from Apple, Amazon, Google, IBM, and Microsoft to transcribe interviews with 42 white people and 73 black people, all of which took place in the US. The tools misidentified words about 19 percent of the time during the interviews with white people and 35 percent of the time during the interviews with black people. They found 2 percent of audio snippets from white people to be unreadable, compared with 20 percent of those from black people. Error rates were highest for black men, at 41 percent, compared with 30 percent for black women.
Previous research has found similar bias in facial recognition technology. An MIT study found that an Amazon facial recognition service made no mistakes when identifying the gender of men with light skin, but performed worse when identifying an individual’s gender if they were female or had darker skin. Another paper identified similar racial and gender biases in facial recognition software from Microsoft, IBM, and Chinese firm Megvii.
In the Stanford study, Microsoft’s system achieved the best result, while Apple’s performed the worst. It’s important to note that these aren’t necessarily the tools used to build Cortana and Siri, though they may be governed by similar company practices and philosophies.
“Fairness is one of our core AI principles, and we’re committed to making progress in this area,” said a Google spokesperson in a statement to The Verge. “We’ve been working on the challenge of accurately recognizing variations of speech for several years, and will continue to do so.”
“IBM continues to develop, improve, and advance our natural language and speech processing capabilities to bring increasing levels of functionality to business users via IBM Watson,” said an IBM spokesperson. The other companies mentioned in the paper did not immediately respond to requests for comment.
The Stanford paper posits that the racial gap is likely the product of bias in the datasets used to train the systems. Recognition algorithms learn by analyzing large amounts of data; a system trained mostly on audio clips from white people may have difficulty transcribing a more diverse set of user voices.
The researchers urge makers of speech recognition systems to collect better data on African American Vernacular English (AAVE) and other varieties of English, including regional accents. They suggest these errors will make it harder for black Americans to benefit from voice assistants like Siri and Alexa. The disparity could also harm these groups when speech recognition is used in professional settings, such as job interviews and courtroom transcription.
Update March 24th, 2:33PM ET: This post has been updated with statements from Google and IBM.