Flawed Data In, Bias Out

August 3, 2019

Artificial intelligence is biased. AI algorithms regularly perform worse for people who are not white and not male, largely because the mostly white, male teams who build them overlook the need for diverse training data. Silicon Republic shares a new way that AI is biased, this time against poorer individuals: “Biased AI Reportedly Struggles To Identify Objects From Poorer Households.”

The biggest biased-AI culprits are visual recognition algorithms built to identify people and objects, and the main cause of their bias is a lack of diverse data. The article describes how Facebook’s AI research lab found such bias in internationally used visual object recognition systems: researchers tasked algorithms from Microsoft Azure, Google Cloud Vision, Amazon Rekognition, Clarifai, and IBM Watson with identifying common household items from a global dataset. The article explains:

“The dataset covers 117 categories of different household items and documents the average monthly income of households from various countries across the world, ranging from $27 in Burundi to $10,098 in China. When the algorithms were shown the same product but from different parts of the world, the researchers found that there was a 10pc increase in chance they would fail to identify items from a household earning less than $50 versus one making more than $3,500 a month.”

This raises an interesting point about how these AIs are trained to identify objects. Take soap: in richer countries it appeared as liquid soap in a pump dispenser on a tiled counter, while in poorer countries it was a bar of soap on a worn surface. Overall, the algorithms were about 20% more likely to correctly identify objects from richer households than from poorer ones, and for living rooms the accuracy gap grew to roughly 40%, partly because poorer homes contain fewer recognizable items. The researchers attribute the bias to training data drawn mostly from wealthier countries, with little from poorer ones.
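For the curious, the comparison behind those percentages is simple arithmetic. Here is a minimal Python sketch of that kind of audit, assuming hypothetical records that pair each image’s ground-truth label with the labels a vision API returned and the household’s monthly income; the field names are invented and the $50/$3,500 thresholds mirror the quote above, so this is illustrative, not the researchers’ actual pipeline.

```python
# Illustrative sketch: compare object-recognition accuracy across income
# brackets, given results already collected from some vision API.
# The Result structure and thresholds are assumptions for this example.

from dataclasses import dataclass

@dataclass
class Result:
    true_label: str        # ground-truth household item, e.g. "soap"
    predicted: set[str]    # labels returned by the vision API
    monthly_income: float  # household income in USD, from dataset metadata

def accuracy(results: list[Result]) -> float:
    """Fraction of images whose true label appears among the predictions."""
    hits = sum(1 for r in results if r.true_label in r.predicted)
    return hits / len(results) if results else 0.0

def income_gap(results: list[Result],
               low: float = 50.0, high: float = 3500.0) -> float:
    """Accuracy difference between high-income and low-income households."""
    poor = [r for r in results if r.monthly_income < low]
    rich = [r for r in results if r.monthly_income > high]
    return accuracy(rich) - accuracy(poor)

# Toy usage: bar soap on a worn surface is missed; a pump dispenser is not.
sample = [
    Result("soap", {"food", "cheese"}, 27.0),
    Result("soap", {"soap", "dispenser"}, 10098.0),
]
print(f"accuracy gap: {income_gap(sample):+.0%}")  # +100% on this toy pair
```

On real data the gap would of course be smaller than in this two-image toy, but the bookkeeping is the same: bucket the results by income, compute per-bucket accuracy, and subtract.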

Is this another finding from Captain Obvious’ research lab? Is it really so hard to generate more representative datasets? Obviously not.

Whitney Grace, August 3, 2019
