Algorithmic Bias: CPD or Cute Puppy Delta
September 20, 2016
With the publication of “Weapons of Math Destruction,” algorithmic bias identification has become the new parlor game. I enjoyed the write up with the cute puppy title “For More Proof Algorithms Can Be Biased, Look No Further Than Cute Puppies.”
The main point of the write up struck me as:
If men preferred small dogs and women preferred large dogs, and the researchers used more data from men, then the algorithm would lean towards ranking smaller dogs cuter, because those dogs are better-known by the algorithm.
My immediate reaction is that this type of training set bias is commonplace.
More interesting are the following biases in algorithms:
- Selecting the numerical recipes to use for numerical alchemy
- Sequencing of the chains of numerical recipes
- Threshold settings for the numerical recipes
- Methods for automatic or semi-automatic threshold adjustments
There’s more to bias than meets the eyes of puppy lovers. That’s the CPD, or cute puppy delta.
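Here is a minimal sketch, with made-up numbers, of the training-set bias the puppy write up describes: if most of the rating data comes from men who prefer small dogs, the averaged “cuteness” the system learns tilts toward small dogs even if the broader population of raters is evenly split. The rater split and rating scale below are hypothetical.

```python
# A minimal sketch (hypothetical numbers) of training-set bias: 80% of the
# ratings come from men who prefer small dogs, so small dogs end up "cuter."
import random

random.seed(42)

def rate(dog_size: str, rater: str) -> int:
    """Return a 1-5 cuteness rating; men favor small dogs, women favor large."""
    preferred = "small" if rater == "man" else "large"
    base = 4 if dog_size == preferred else 2
    return base + random.choice([0, 1])

# Skewed sample: 80 percent of the ratings come from men.
raters = ["man"] * 800 + ["woman"] * 200
ratings = {"small": [], "large": []}
for rater in raters:
    for size in ("small", "large"):
        ratings[size].append(rate(size, rater))

for size, scores in ratings.items():
    print(f"{size} dogs: average learned cuteness = {sum(scores) / len(scores):.2f}")
# With the 80/20 skew, small dogs score higher even though dog lovers as a
# whole may be evenly divided; the model only knows the data it was fed.
```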
Stephen E Arnold, September 20, 2016
The Case for Algorithmic Equity
September 20, 2016
We know that AI algorithms are skewed by the biases of both their creators and, depending on the application, their users. Social activist Cathy O’Neil addresses the broad consequences for society in her book, Weapons of Math Destruction. Time covers her views in its article, “This Mathematician Says Big Data is Causing a ‘Silent Financial Crisis’.” O’Neil studied mathematics at Harvard, worked in quantitative trading at a hedge fund, and later joined a targeted-advertising startup. It is fair to say she knows what she is talking about.
More and more businesses and organizations rely on algorithms to make decisions that have big impacts on people’s lives: choices about employment, financial matters, scholarship awards, and where to deploy police officers, for example. Yet, the processes are shrouded in secrecy, and lawmakers are nowhere close to being on top of the issue. There is currently no way to ensure these decisions are anything approaching fair. In fact, the algorithms can create a sort of feedback loop of disadvantage. Reporter Rana Foroohar writes:
Using her deep technical understanding of modeling, she shows how the algorithms used to, say, rank teacher performance are based on exactly the sort of shallow and volatile type of data sets that informed those faulty mortgage models in the run up to 2008. Her work makes particularly disturbing points about how being on the wrong side of an algorithmic decision can snowball in incredibly destructive ways—a young black man, for example, who lives in an area targeted by crime fighting algorithms that add more police to his neighborhood because of higher violent crime rates will necessarily be more likely to be targeted for any petty violation, which adds to a digital profile that could subsequently limit his credit, his job prospects, and so on. Yet neighborhoods more likely to commit white collar crime aren’t targeted in this way.
Yes, unsurprisingly, it is the underprivileged who bear the brunt of algorithmic aftermath; the above is just one example. The write-up continues:
Indeed, O’Neil writes that WMDs [Weapons of Math Destruction] punish the poor especially, since ‘they are engineered to evaluate large numbers of people. They specialize in bulk. They are cheap. That’s part of their appeal.’ Whereas the poor engage more with faceless educators and employers, ‘the wealthy, by contrast, often benefit from personal input. A white-shoe law firm or an exclusive prep school will lean far more on recommendations and face-to-face interviews than a fast-food chain or a cash-strapped urban school district. The privileged… are processed more by people, the masses by machines.’
So, algorithms add to the disparity between how the wealthy and the poor experience life. Compounding the problem, algorithms also allow the wealthy to isolate themselves online as well as in real life, through curated news and advertising that make it ever easier to deny that poverty is even a problem. See the article for its more thorough discussion.
What does O’Neil suggest we do about this? First, she proposes a “Hippocratic Oath for mathematicians.” She also joins the calls for much more thorough regulation of the AI field and for updating existing civil-rights laws to cover algorithm-based decisions. Such measures will require the cooperation of legislators, who, as a group, are hardly known for their technical understanding. It is up to those of us who do comprehend the issues to inform them that action must be taken. Sooner rather than later, please.
Cynthia Murrell, September 20, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
There is a Louisville, Kentucky Hidden Web/Dark Web meet up on September 27, 2016.
Information is at this link: https://www.meetup.com/Louisville-Hidden-Dark-Web-Meetup/events/233599645/
HonkinNews for September 20, 2016 Available
September 20, 2016
Stories in the Beyond Search weekly video news program “HonkinNews” include LinkedIn’s censorship of a former CIA professional’s post about the 2016 election. Documentum, founded in 1990, has moved to the frozen wilds of Canada. A Microsoft- and Nvidia-sponsored online beauty contest may have embraced algorithmic bias. Google can write a customer’s ad automatically and may be able to alter users’ thoughts and actions. Which vendors of intelligence-centric software may be shown the door to the retirement home? The September 20, 2016, edition of “HonkinNews,” filmed with old-fashioned technology in the wilds of rural Kentucky, is online at this link.
Kenny Toth, September 20, 2016
Lousy Backlog? Sell with Interesting Assertions.
September 19, 2016
If you are struggling to fill the sales pipeline, you will feel some pressure. If you really need to make sales, marketing collateral may be an easy weapon to seize.
I read “Examples of False Claims about Self-Service Analytics.” The write up singles out interesting sales assertions and offers them up in a listicle. I loved the write up. I lack the energy to sift through the slices of baloney in my enterprise search files. Therefore, let’s highlight the work of the brave person who singled out eight vendors’ marketing statements as containing what the author called “false claims.” Personally, I think each of these claims is probably rock solid when viewed from the point of view of the vendors’ legal advisers.
Here are three examples of false claims about self-service analytics. (For the other five, consult the article cited in the preceding paragraph.) Keep in mind that I find these statements as good as the gold for sale in the local grocery in Harrod’s Creek. Come to think of it, the gold is foil wrapped around a lousy piece of ersatz chocolate. But gold is gold, sort of.
Example 1 from Information Builders. Information Builders loves New York. The company delivers “integrated solutions for data management.” Here’s the item from the article which contains a “false claim.”
Self-service BI and analytics isn’t just about giving tools to analysts; it’s about empowering every user with actionable and relevant information for confident decision-making. (link). Self-service Analytics for Everyone…Who’s Everyone? Your entire universe of employees, customers, and partners. Our WebFOCUS Business Intelligence (BI) and Analytics platform empowers people inside and outside your organization to attain insights and make better decision.
I see a subject-verb agreement error. I see a semicolon, which puts me on my rhetorical guard. I see the universal “everyone.” I see the fact that WebFOCUS empowers.
What’s not to like? Information Builders is repeating facts which I accept. The fact that the company is in New York enhances the credibility of the statements. Footnotes, evidence? Who needs them?
Example 2 from SAP, the outfit that delivered R3 to Westinghouse and Trex to the enterprise search market. Here’s the “false assertion,” which looks as solid as a peer-reviewed journal containing data related to drug trials. Remember: this quote comes from the source article. I believe absolutely whatever SAP communicates to me. Don’t you?
This tool is intended for those who need to do analysis but are not Analysts nor wish to become them.
Why study math, statistics, and related disciplines? Why get a degree? I know that I can embrace the SAP way (which is a bit like the IBM way) and crunch numbers until the cows return to my pasture in Harrod’s Creek. Who needs to worry about data integrity, sample size, threshold settings, and algorithmic sequencing? Not me. Gibraltar does not stand on footing as solid as SAP’s tool for those who eschew analysts and do not want to wake up, like Kafka’s protagonist, transformed into an analyst.
Example 3 from ZoomData, a company which has caught the attention of some folks in the DC area. I love those cobblestones in Reston too.
ZoomData brings the power of self-service BI to the 99%—the non-data geeks of the world who thirst for a simple, intuitive, and collaborative way to visually interact with data to solve business problems.
To me this looks better than the stone tablets Moses hauled down from the mountain. I love the notion of non geeks who thirst for pointing and clicking. I would have expressed the idea as drink deep of data’s Empyrean spring, but I am okay with the split infinitive “to visually interact” because the statement is a fact. I tell you that it is a fact.
For the other five allegedly false assertions, please consult the original article. I have to take a break. When my knowledge is confirmed by these brilliant assertions, I need a moment to congratulate myself on my wisdom. Wait. I am an addled goose. Maybe these examples really are hogwash? Because I live in rural Kentucky, I will step outside and seek inputs from Henrietta, my hog.
Stephen E Arnold, September 19, 2016
Algorithm Bias in Beauty Contests
September 16, 2016
I don’t read about beauty contests. In my college dorm, I recall that the televised broadcast of the Miss America pageant was popular among some of the residents. I used the attention grabber as my cue to head to the library so I could hide reserved books from my classmates. Every little bit helps in the dog-eat-dog world of academic achievement.
“When Artificial Intelligence Judges a Beauty Contest, White People Win” surprised me. I thought that algorithms were objective little numerical recipes. Who could fiddle 1 + 1 = 2?
I learned:
The foundation of machine learning is data gathered by humans, and without careful consideration, the machines learn the same biases of their creators. Sometimes bias is difficult to track, but other times it’s clear as the nose on someone’s face—like when it’s a face the algorithm is trying to process and judge.
It seems that an algorithm likes white people. The write up informed me:
An online beauty contest called Beauty.ai, run by Youth Laboratories (that lists big names in tech like Nvidia and Microsoft as “partners and supporters” on the contest website), solicited 600,000 entries by saying they would be graded by artificial intelligence. The algorithm would look at wrinkles, face symmetry, amount of pimples and blemishes, race, and perceived age. However, race seemed to play a larger role than intended; of the 44 winners, 36 were white.
Oh, oh. Microsoft and its smart software seem to play a role in this drama.
What’s the fix? Better data. The write up includes this statement from a Microsoft expert:
“If a system is trained on photos of people who are overwhelmingly white, it will have a harder time recognizing non-white faces,” writes Kate Crawford, principal researcher at Microsoft Research New York City, in a New York Times op-ed. “So inclusivity matters—from who designs it to who sits on the company boards and which ethical perspectives are included. Otherwise, we risk constructing machine intelligence that mirrors a narrow and privileged vision of society, with its old, familiar biases and stereotypes.”
In the last few months, Microsoft’s folks were involved in Tay, a chatbot which allegedly learned to be racist. Then there was the translation of “Daesh” as Saudi Arabia. Now algorithms appear to favor folks of a particular stripe.
Exciting math. But Microsoft has also managed to gum up webcams and Kindle access in Windows 10. Yep, the new Microsoft is a sparkling example of smart.
Stephen E Arnold, September 16, 2016
HonkinNews, September 13, 2016 Now Available
September 13, 2016
Interested in having your polynomials probed? The Beyond Search weekly news explains this preventive action. In this week’s program you will learn about Google’s new enterprise search solution. Palantir is taking legal action against an investor in the company. IBM Watson helps out at the US Open. Catch up on the search, online, and content processing news that makes enterprise procurement teams squirm. Dive in with Springboard and Pool Party. To view the video, click this link.
Kenny Toth, September 13, 2016
More on Biased Algorithms: Humans in the Mix
September 6, 2016
I read “When Computers Learn Human Languages, They Also Learn Human Prejudices.” The write up makes a point which seems obvious to me and the goslings. Numbers may be neutral in the ivory tower of a mathematician in Minsk or Midland. But in the world of smart software, the human influence may be inescapable, like death. Oh, Google will solve death, and I suppose at some point Google will eliminate the human element in its fancy math.
For all others, I learned:
Implicit biases are a well-documented and pernicious feature of human languages.
Okay.
In the write up full of revelations, I highlighted this passage:
New research from computer scientists at Princeton suggests that computers learning human languages will also inevitably learn those human biases.
What’s the fix? The write up and the wizards have an answer:
The solution to these problems is probably not to train algorithms to be speakers of a more ideal English language (or believers in a more ideal world), but rather in ensuring “algorithmic accountability” (pdf), which calls for layers of accountability for any decisions in which an algorithm is involved….It may be necessary to override the results to compensate—a sort of “fake it until you make it” strategy for erasing the biases that creep into our algorithms.
I love the “fake it until you make it” idea.
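To make the idea concrete, here is a hedged sketch of one naive way to “override the results to compensate”: score candidates with whatever model is at hand, then set a separate cutoff per group so the selection rates match. The data, group labels, and target rate below are hypothetical, not from the article.

```python
# A hedged sketch of post-hoc compensation: pick the top fraction within each
# group rather than applying one global score cutoff.

def select_with_equal_rates(scored, target_rate=0.5):
    """scored: list of (group, score) pairs. Select the top `target_rate`
    fraction within each group, so every group has the same selection rate."""
    by_group = {}
    for group, score in scored:
        by_group.setdefault(group, []).append(score)

    cutoffs = {}
    for group, scores in by_group.items():
        ranked = sorted(scores, reverse=True)
        k = max(1, int(len(ranked) * target_rate))
        cutoffs[group] = ranked[k - 1]   # group-specific threshold

    return [(g, s) for g, s in scored if s >= cutoffs[g]]

candidates = [("A", 0.9), ("A", 0.7), ("A", 0.4),
              ("B", 0.6), ("B", 0.3), ("B", 0.2)]
print(select_with_equal_rates(candidates))
# A single global cutoff of 0.5 would pick two As and one B; the per-group
# cutoffs "fake" parity until the underlying scores (or data) improve.
```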
Who will analyze the often not-so-accessible numerical recipes in use at the centralized online services? Will it be “real” journalists? Will it be legal eagles? Will it be self-regulation of the sort the banking sector enforces with such assiduousness?
My hunch is that this algorithm bias thing will be a problem a bit like death; that is, no solution for now.
Stephen E Arnold, September 6, 2016
Eliminate Bias from Human-Curated Training Sets with Training Sets Created by Humans
September 5, 2016
I love the quest for objectivity in training smart software. I recall a professor from my undergraduate days named, I believe, Dr. Stephen Pence. He was an interesting fellow who enjoyed pointing out logical fallacies. Pence introduced me to the work of Stephen Toulmin, an author who is a fun read.
I thought about argument by sign when I read “Language Necessarily Contains Human Biases, and So Will Machines Trained on Language Corpora.” The write up points out that smart software processing human utterances for “information” will end up with biases. The notion matches my experience.
I highlighted:
for 50 occupation words (doctor, engineer, …), we can accurately predict the percentage of U.S. workers in that occupation who are women using nothing but the semantic closeness of the occupation word to feminine words!… These results simultaneously show that the biases in question are embedded in human language, and that word embeddings are picking up the biases.
The write up also points out that “algorithms don’t have a way to identify biases.”
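For readers who want the mechanics, here is a minimal sketch, using made-up toy vectors rather than real embeddings, of the kind of measurement the quoted passage describes: compare an occupation word’s vector to feminine and masculine words and treat the similarity gap as the learned association. The vectors and dimensions are illustrative assumptions only.

```python
# A minimal sketch of measuring gender association in word embeddings.
# The 3-dimensional vectors below are invented; real word2vec/GloVe vectors
# have hundreds of dimensions and are learned from large text corpora.
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

vectors = {
    "she":      [0.9, 0.1, 0.2],
    "he":       [0.1, 0.9, 0.2],
    "nurse":    [0.8, 0.2, 0.3],
    "engineer": [0.2, 0.8, 0.3],
}

for occupation in ("nurse", "engineer"):
    gap = (cosine(vectors[occupation], vectors["she"])
           - cosine(vectors[occupation], vectors["he"]))
    print(f"{occupation}: feminine-minus-masculine similarity = {gap:+.2f}")
# A positive gap means the corpus placed the occupation closer to feminine
# words; the cited research reports such gaps track real-world occupation
# demographics, which is how the bias gets "picked up" by the embeddings.
```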
When we read about smart software taking a query like “beautiful girls” and returning a skewed data set, we wonder how vendors can ignore the distortions in their artificially intelligent routines.
Objectivity, gentle reader, is not easy to come by. Vendors of smart software who ignore the biases created by training sets and by the engineers’ decisions about threshold settings in numerical recipes may benefit from some critical thinking. Reading the work of Toulmin may be helpful as well.
Stephen E Arnold, September 5, 2016
Machine Learning Search Algorithms Reflect Female Stereotypes
August 26, 2016
The article on MediaPost titled “Are Machine Learning Search Algorithms To Blame for Stereotypes?” poses a somewhat misleading question about the role of search algorithms such as Google’s and Bing’s in prejudice and bias. Ultimately, the algorithms are not the root of the problem but a reflection of their creators. Looking at the images returned when searching for “beautiful” and “ugly” women, researchers found the following:
“In the United States, searches for “beautiful” women return pictures that are 80% white, mostly between the ages of 19 and 28. Searches for “ugly” women return images of those about 60% white and 20% black between the ages of 30 to 50. Researchers admit they are not sure of the reason for the bias, but conclude that they may stem from a combination of available stock photos and characteristics of the indexing and ranking algorithms of the search engines.”
While it might be appealing to think that machine learning search algorithms have somehow magically fallen in line with the stereotypes of the human race, they are obviously just regurgitating the bias in the data. Or, alternatively, perhaps they learn prejudice from the humans selecting and tuning the algorithms. At any rate, the results are an unfortunate record of the harmful attitudes and racial bias of our time.
Chelsea Kerwin, August 26, 2016
Microsoft Considers Next-Generation Artificial Intelligence
August 24, 2016
While science fiction portrays artificial intelligence in novel and far-reaching ways, certain products using artificial intelligence already exist. WinBeta released a story, “Microsoft exec at London conference: AI will ‘change everything’,” which reminds us of this. Digital assistants like Cortana and Siri are one example of how mundane AI can appear. However, at a recent AI conference, Microsoft UK’s chief envisioning officer Dave Choplin projected much more impactful applications. The article summarizes the landscape of concerns:
Of course, many also are suspect about the promise of artificial intelligence and worry about its impact on everyday life or even its misuse by malevolent actors. Stephen Hawking has worried AI could be an existential threat and Tesla CEO Elon Musk has gone on to create an open source AI after worrying about its misuse. In his statements, Choplin also stressed that as more and more companies try to create AI, ‘We’ve got to start to make some decisions about whether the right people are making these algorithms.’
There is much to consider with regard to artificial intelligence. However, such a statement about “the right people” cannot stop there. Choplin goes on to refer to the biases of the people creating algorithms and of the companies they work for. If organizational structures must be considered, so too must their motivator: the economy. Perhaps applying machine learning to understand the best way to approach AI would be a good first application.
Megan Feil, August 24, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph