Voice-controlled digital assistants such as Apple's Siri, Amazon Alexa and Google Assistant have come a long way. They now sound more human, and their responses to everyday questions have become wittier. They also let you choose their gender and change their language and accent. But a new study has found that the voice-recognition systems that power these virtual assistants have more trouble understanding black users than white users, suggesting that racial bias may be built into them.

The study, led by academics at Stanford University and Georgetown University in the US, examined how voice recognition systems from top tech firms fared when transcribing the voices of black users and white users.

Speech-recognition systems have trouble transcribing black speakers

The researchers probed five "state-of-the-art" cloud-based automated speech recognition (ASR) tools, which transcribe speech to text, from Apple, Amazon, Google, IBM and Microsoft. As part of the study, they gave the systems almost 20 hours of interviews with 42 white and 73 black interviewees to transcribe.

All five of the ASR systems exhibited "substantial racial disparities", with an average word error rate (WER) of 35 per cent for black interviewees, compared with a 19 per cent error rate for white interviewees.

In other words, for every 100 words spoken, the machine-learning tools got 35 words wrong on average for black speakers, compared with only 19 words for white speakers.
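For readers curious how a word error rate is typically calculated, here is a minimal Python sketch. It assumes the standard definition (word substitutions, deletions and insertions divided by the number of words actually spoken) and simple lowercase, whitespace-based word splitting. The function name and sample sentences are made up for illustration, and the study's own scoring pipeline may differ in its details.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()   # what the speaker actually said
    hyp = hypothesis.lower().split()  # what the ASR system transcribed
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: a spoken word is missing
                curr[j - 1] + 1,     # insertion: an extra word was added
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only, not data from the study.
spoken = "the cat sat on the mat"
transcribed = "the cat sit on mat"
print(f"WER: {word_error_rate(spoken, transcribed):.0%}")
```

In this made-up example, one word is substituted and one is dropped out of six spoken words, so the rate is two in six. Because inserted words also count as errors, a WER can exceed 100 per cent for a very poor transcript.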

Apple's ASR is worst, Microsoft's best

The study also found that Apple's speech-recognition system performed the worst for black speakers, with an average WER of 45 per cent compared with 23 per cent for white speakers. Microsoft had the best results, with an average WER of 27 per cent for black speakers and 15 per cent for white speakers.
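To put the reported figures side by side, the short Python sketch below computes the absolute gap (in percentage points) and the relative gap (as a ratio) between the two groups. The numbers are the averages quoted in this article; the `disparity` helper and the dictionary layout are illustrative assumptions, not something taken from the study.

```python
# Average WER in per cent, as reported above: (black speakers, white speakers).
reported_wer = {
    "All five systems": (35, 19),
    "Apple": (45, 23),
    "Microsoft": (27, 15),
}


def disparity(black_wer: float, white_wer: float) -> tuple[float, float]:
    """Return the gap in percentage points and the ratio between the two rates."""
    gap = black_wer - white_wer
    ratio = black_wer / white_wer
    return gap, ratio


for system, (black_wer, white_wer) in reported_wer.items():
    gap, ratio = disparity(black_wer, white_wer)
    print(f"{system}: gap of {gap} points, {ratio:.2f} times the error rate")
```

By this rough measure, even the best-performing system made about 1.8 times as many errors for black speakers as for white speakers.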

The researchers also found that 2 per cent of audio snippets from white speakers were unreadable, compared with 20 per cent of snippets from black speakers.

Black men more misunderstood than black women

The study also suggests that the voice-recognition systems found black men harder to understand than black women, with an average WER of 41 per cent for the black men interviewed compared with 30 per cent for the black women. So it seems there is a degree of gender bias too.

This isn't the first time machine-learning and artificial intelligence (AI) tools have shown racial and gender bias.

AI tools have been racially biased in the past too

Earlier research has shown that AI facial recognition technology can be racially biased too, misidentifying people with darker skin tones more often than people with lighter skin.

The latest Stanford study on ASR systems mirrors an MIT study from last year which found Amazon's facial recognition system made almost zero mistakes when identifying the gender of men with light skin, but made errors when identifying the gender of individuals with darker skin. There are several other reports of similar racial and gender biases in facial recognition systems from Microsoft and IBM.

Why this disparity?

Experts and researchers believe that the reason for these types of bias is the fact that the data used to develop AI and machine learning tools comes from predominantly white people.

The researchers behind the Stanford study urge companies that make speech recognition systems to collect better data on African American Vernacular English (AAVE) and other varieties of English, including regional accents such as those spoken in the southern states of the US.

They also suggest that these types of bias will make it harder for the African American community to benefit from virtual assistants like Siri and Alexa, which are powered by automated speech recognition systems. Such disparities could also harm the community when speech recognition systems are used in a professional setting such as job interviews and courtroom transcriptions, according to The Verge.

"Compared to 'traditional' forms of discrimination, automated discrimination is more abstract and unintuitive, subtle, intangible, and difficult to detect," a Business Insider report quoted AI expert Sandra Wachter as saying.

She suggests two ways to fight such ingrained bias. First, diversify the dataset. Second, give courts the tools to detect and punish cases where algorithms perpetuate historic discrimination. She also says that bias testing is essential and that we should act now, or we will not only "exacerbate existing inequalities in our society" but also make them "less detectable."

Google and IBM told The Verge that they are committed to making progress in the area and are working continuously to develop and improve their speech recognition systems.