Eliminate Bias from Human-Curated Training Sets with Training Sets Created by Humans
September 5, 2016
I love the quest for objectivity in training smart software. I recall a professor from my undergraduate days, Dr. Stephen Pence I believe, an interesting fellow who enjoyed pointing out logical fallacies. Pence introduced me to the work of Stephen Toulmin, an author who is a fun read.
I thought about argument by sign when I read “Language Necessarily Contains Human Biases, and So Will Machines Trained on Language Corpora.” The write up points out that smart software processing human utterances for “information” will end up with biases. The notion matches my experience.
I highlighted:
for 50 occupation words (doctor, engineer, …), we can accurately predict the percentage of U.S. workers in that occupation who are women using nothing but the semantic closeness of the occupation word to feminine words!… These results simultaneously show that the biases in question are embedded in human language, and that word embeddings are picking up the biases.
The write up also points out that “Algorithms don’t have a way to identify biases.”
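For readers curious about the mechanics behind the quoted claim, here is a minimal sketch, mine and not the researchers’ code, of how one might score an occupation word’s closeness to feminine versus masculine words using cosine similarity over word embeddings. The tiny vectors are invented for illustration only; a genuine test would load pretrained embeddings such as word2vec or GloVe.

```python
# Rough sketch of measuring gender association in word embeddings via
# cosine similarity. The 4-dimensional vectors below are made up for
# illustration; real embeddings have hundreds of dimensions.

import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(dims) / len(vectors) for dims in zip(*vectors)]

# Hypothetical embeddings -- values invented for this example only.
embeddings = {
    "doctor":   [0.8, 0.1, 0.3, 0.2],
    "nurse":    [0.2, 0.7, 0.4, 0.1],
    "engineer": [0.9, 0.0, 0.2, 0.3],
    "she":      [0.1, 0.8, 0.5, 0.0],
    "her":      [0.2, 0.9, 0.4, 0.1],
    "he":       [0.9, 0.1, 0.2, 0.2],
    "him":      [0.8, 0.2, 0.3, 0.3],
}

feminine = mean_vector([embeddings["she"], embeddings["her"]])
masculine = mean_vector([embeddings["he"], embeddings["him"]])

for occupation in ("doctor", "nurse", "engineer"):
    vec = embeddings[occupation]
    # Positive score = closer to the feminine words; negative = closer to masculine.
    score = cosine(vec, feminine) - cosine(vec, masculine)
    print(f"{occupation}: gender-association score {score:+.3f}")
```

The researchers’ actual method is more rigorous than this toy, but the basic idea is the same: the geometry of the embedding space encodes whatever associations were present in the text the software was trained on.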
When we read about smart software taking a query like “beautiful girls” and returning a skewed data set, we wonder how vendors can ignore the distortions in their artificially intelligent routines.
Objectivity, gentle reader, is not easy to come by. Vendors of smart software who ignore the biases created by training sets and by the engineers’ decisions about threshold settings in numerical recipes may benefit from some critical thinking. Reading the work of Toulmin may be helpful as well.
Stephen E Arnold, September 5, 2016