If your facial recognition system works worse with women or people with darker skin, it’s in your own interest to get rid of that bias.
That’s the advice of Joy Buolamwini, an MIT researcher and founder of the Algorithmic Justice League. A huge fraction of the world’s population is made up of women or people who don’t have European-heritage white skin — the undersampled majority, as she called them in a speech Tuesday at the Women Transforming Technology conference.
“You have to include the undersampled majority if you have global aspirations as a company,” she said.
Buolamwini gave companies including Microsoft, IBM and Megvii Face++ some credit for improving their results between her first test in 2017 and a follow-up in 2018. Bias is a problem in AI because systems deployed in the real world reflect flaws in the data used to train them. But facial recognition bias is more than a commercial matter for companies selling the technology, since it also touches bigger issues like justice and institutional prejudice.
Why is there even an “undersampled majority” in facial recognition, one of the hottest areas of AI? Buolamwini rose to prominence — including a TED talk — after her research concluded that facial recognition systems worked better on white men. One problem: measuring results with benchmarks that feature a disproportionately large number of men.
“We have a lot of pale male data sets,” Buolamwini said, mentioning the Labeled Faces in the Wild (LFW) set that’s 78% male and 84% white — and that Facebook used in a 2014 paper on the subject. Another from the US National Institute of Standards and Technology has subjects who are 75.4% male and 80% lighter skinned. “Pale male data sets are destined to fail the rest of the world,” she said.
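Those skew figures come from a simple tally over each benchmark's demographic labels. The sketch below shows how such an audit could work; the records and field names are invented for illustration, since benchmarks like LFW don't ship these labels and researchers such as Buolamwini annotated them separately.

```python
from collections import Counter

# Hypothetical face-dataset metadata. Real benchmarks would need
# demographic labels annotated separately, as Buolamwini's study did.
faces = [
    {"gender": "male", "skin": "lighter"},
    {"gender": "male", "skin": "lighter"},
    {"gender": "male", "skin": "darker"},
    {"gender": "female", "skin": "darker"},
]

def composition(records, field):
    """Return each value's share of the dataset for one field."""
    counts = Counter(r[field] for r in records)
    total = len(records)
    return {value: count / total for value, count in counts.items()}

print(composition(faces, "gender"))  # {'male': 0.75, 'female': 0.25}
print(composition(faces, "skin"))    # {'lighter': 0.5, 'darker': 0.5}
```

Running the same tally over LFW's annotations is what yields figures like 78% male and 84% white.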
Just getting the right answer is only one issue with facial recognition. “Accurate facial analysis systems can also be abused,” Buolamwini added, pointing to issues like police scanning and automated military weapons.
Accuracy beyond pale males
In her 2017 research, Buolamwini measured how well facial recognition worked across different genders and skin tones using a data set of 1,270 people she drew from members of parliaments in three European and three African countries. She concluded that the systems worked best on white males and failed most often with the combination of female and dark-skinned.
For example, Microsoft correctly identified the gender of 100% of lighter-skinned men, 98.3% of lighter-skinned women, 94% of darker-skinned men and 79.2% of darker-skinned women, a 20.8 percentage point difference between the best and worst categories. IBM and Face++ fared worse, with differences of 34.4 and 33.8 percentage points, respectively.
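The disparity numbers above are straightforward arithmetic: per-group accuracy, then the gap between the best- and worst-served groups. A minimal sketch, using the Microsoft figures from the study purely as demonstration data:

```python
# Per-group gender-classification accuracy (Microsoft, from the
# Gender Shades study), reproduced here only to illustrate the math.
accuracy = {
    "lighter male": 1.000,
    "lighter female": 0.983,
    "darker male": 0.940,
    "darker female": 0.792,
}

best = max(accuracy, key=accuracy.get)
worst = min(accuracy, key=accuracy.get)
gap_points = (accuracy[best] - accuracy[worst]) * 100

print(f"Best-served group:  {best} ({accuracy[best]:.1%})")
print(f"Worst-served group: {worst} ({accuracy[worst]:.1%})")
print(f"Gap: {gap_points:.1f} percentage points")  # 20.8 for these figures
```

Substituting IBM's or Face++'s per-group accuracies into the same calculation yields their 34.4 and 33.8 point gaps.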
The 2018 follow-up study that showed improvement also added Amazon and Kairos, with similar results. Both scored 100% with lighter-skinned men, but Amazon assessed gender correctly only 68.6% of the time for darker-skinned women; Kairos scored 77.5%, Buolamwini said.
IBM, which declined to comment for this story, updated its algorithm to improve its performance on tests such as Buolamwini’s and said in 2018 that it’s “deeply committed to delivering services that are unbiased, explainable, value aligned and transparent.” Microsoft also didn’t comment for this story, but said at the time that it was committed to improvements, and a few months later in 2018 it touted its AI’s improved ability to handle different genders and skin tones. Megvii didn’t respond to a request for comment.
Amazon was more strident, calling some of Buolamwini’s conclusions “false” earlier this year — though also saying it’s “interested in working with academics in establishing a series of standardized tests for facial analysis and facial recognition and in working with policy makers on guidance and/or legislation of its use.” Amazon didn’t comment further for this story. Buolamwini countered Amazon’s stance in a blog post of her own.
But Kairos Chief Executive Melissa Doval agreed with Buolamwini’s general position.
“Ignorance is no longer a viable business strategy,” she said. “Everyone at Kairos supports Joy’s work in helping bring attention to the ethical questions the facial recognition industry has often overlooked. It was her initial study that actually catalyzed our commitment to help fix misidentification problems in facial recognition software, even going so far as completely rethinking how we design and sell our algorithms.”
Troubles for women in tech
Buolamwini spoke at a Silicon Valley conference dedicated to addressing some of the issues women face in technology. Thousands gathered at the Palo Alto, California, headquarters of server and cloud software company VMware for advice, networking, and a chance to improve resumes and LinkedIn profiles.
They also heard tales from those who struggled with sexism in the workplace, most notably programmer Susan Fowler, who skyrocketed to Silicon Valley prominence with a blog post about her ordeals at ride-hailing giant Uber. Her account helped shake Uber to its core.
Most companies and executives don’t want discrimination, harassment or retaliation, Fowler believes. If you do have a problem, she said, skip talking to your manager, go straight to the human resources department and escalate higher if necessary.
“If it is a systemic thing, it’ll never get fixed” unless you speak out, Fowler said. She raised her issues as high as the chief technology officer, but that didn’t help. “OK, I’m going to tell the world,” she recounted. “What else have you left me?”
Sexism isn’t unique to Silicon Valley, said Lisa Gelobter, a programmer who’s now CEO of tEQuitable, a company that helps other companies handle internal conflicts and other problems. What is different, she said, is the attitude Silicon Valley has about improving the world.
“Silicon Valley has this ethos and culture,” Gelobter said. Wall Street makes no bones about its naked capitalism, she said. “The tech industry pretends to be somebody else, pretends to care.”
First published April 23, 6:09 p.m. PT.
Update, 8:26 p.m. and 9:16 p.m. PT: Corrects a quotation from Joy Buolamwini, who described women and people with dark skin as the world’s “undersampled majority,” and the characterization of IBM’s work, which generally reproduced Buolamwini’s research and improved with an updated algorithm. Also adds that IBM declined to comment and Amazon didn’t comment further.