SAN FRANCISCO — With an iPhone, you can dictate a text message. Put Amazon’s Alexa on your coffee table, and you can request a song from across the room.

But these devices may understand some voices better than others. Speech recognition systems from five of the world’s biggest tech companies — Amazon, Apple, Google, IBM and Microsoft — make far fewer errors with users who are white than with users who are black, according to a study published Monday in the journal Proceedings of the National Academy of Sciences.

The systems misidentified words about 19 percent of the time with white people. With black people, mistakes jumped to 35 percent. About 2 percent of audio snippets from white people were considered unreadable by these systems, according to the study, which was conducted by researchers at Stanford University. That rose to 20 percent with black people.
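Error rates like these are conventionally measured as word error rate: the number of word-level insertions, deletions and substitutions needed to turn the system's transcript into the correct one, divided by the length of the correct transcript. A minimal sketch of that calculation (the sample sentences are hypothetical, not from the study):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit distance over word tokens.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of four: a 25 percent error rate.
print(word_error_rate("turn the lights on", "turn the light on"))  # 0.25
```

By this measure, the study's 19 percent figure means roughly one word in five was transcribed wrong for white speakers, and more than one in three for black speakers.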

The study, which took an unusually comprehensive approach to measuring bias in speech recognition systems, offers another cautionary sign for A.I. technologies rapidly moving into everyday life.

Other studies have shown that as facial recognition systems move into police departments and other government agencies, they can be far less accurate when trying to identify women and people of color. Separate tests have uncovered sexist and racist behavior in “chatbots,” translation services, and other systems designed to process and mimic written and spoken language.

“I don’t understand why there is not more due diligence from these companies before these technologies are released,” said Ravi Shroff, a professor of statistics at New York University who explores bias and discrimination in new technologies. “I don’t understand why we keep seeing these problems.”

“Here are probably the five biggest companies doing speech recognition, and they are all making the same kind of mistake,” said John Rickford, one of the Stanford researchers behind the study, who specializes in African-American speech. “The assumption is that all ethnic groups are well represented by these companies. But they are not.”

The best-performing system, from Microsoft, misidentified about 15 percent of words from white people and 27 percent from black people. Apple's system, the lowest performer, failed 23 percent of the time with white people and 45 percent of the time with black people.

The black testers were based in a largely African-American rural community in eastern North Carolina, a midsize city in western New York and Washington, D.C. They spoke in what linguists call African-American Vernacular English — a variety of English sometimes spoken by African-Americans in urban areas and other parts of the United States. The white testers were in California, some in the state capital, Sacramento, and others in a rural and largely white area about 300 miles away.

The study found that the "race gap" was just as large when black and white people uttered identical phrases. This indicates that the problem lies in the way the systems are trained to recognize sound. The companies, it seems, are not training on enough data that represents African-American Vernacular English, according to the researchers.

Companies like Google may have trouble gathering the right data, and they may not be motivated enough to gather it. “This is difficult to fix,” said Brendan O’Connor, a professor at the University of Massachusetts Amherst who specializes in A.I. technologies. “The data is hard to collect. You are fighting an uphill battle.”

The companies may face a chicken-and-egg problem. If their services are used mostly by white people, they will have trouble gathering data that can serve black people. And if they have trouble gathering this data, the services will continue to be used mostly by white people.

“Those feedback loops are kind of scary when you start thinking about them,” said Noah Smith, a professor at the University of Washington. “That is a major concern.”
