Algorithmic Bias and the Unintentional Discrimination in the Results
October 21, 2015
The article titled “When Big Data Becomes Bad Data” at Tech In America discusses the legal ramifications for companies that rely on algorithms. The “disparate impact” theory has been used in the courtroom for some time to strike down discriminatory policies, whether or not they were created with the intent to discriminate. Algorithmic bias occurs all the time, and under the spirit of that law it is discrimination, even if unintentional. The article states:
“It’s troubling enough when Flickr’s auto-tagging of online photos label pictures of black men as “animal” or “ape,” or when researchers determine that Google search results for black-sounding names are more likely to be accompanied by ads about criminal activity than search results for white-sounding names. But what about when big data is used to determine a person’s credit score, ability to get hired, or even the length of a prison sentence?”
The article also reminds us that data can often be a reflection of “historical or institutional discrimination.” Under disparate impact, the question of human intent becomes irrelevant; the only thing that matters is whether the results are biased. Legal scholars and researchers are arguing for ethical machine learning design that roots out algorithmic bias, but stronger regulations and better oversight of the algorithms themselves might be the only way to keep companies out of court.
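The disparate impact test itself is easy to state in code. Here is a minimal sketch of the four-fifths (80 percent) rule that regulators often cite; the groups, data, and threshold below are invented for illustration and are not from the article:

```python
# Hypothetical audit: compare selection rates across groups and flag
# possible disparate impact using the common four-fifths (80%) rule.
from collections import defaultdict

def selection_rates(decisions):
    """decisions: list of (group, selected_bool) tuples."""
    totals, selected = defaultdict(int), defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        if ok:
            selected[group] += 1
    return {g: selected[g] / totals[g] for g in totals}

def disparate_impact(decisions, threshold=0.8):
    rates = selection_rates(decisions)
    best = max(rates.values())
    # Any group whose selection rate falls below 80% of the best-treated
    # group's rate is flagged, regardless of the algorithm's intent.
    return {g: r / best < threshold for g, r in rates.items()}

decisions = [("A", True)] * 50 + [("A", False)] * 50 + \
            [("B", True)] * 20 + [("B", False)] * 80
print(selection_rates(decisions))   # {'A': 0.5, 'B': 0.2}
print(disparate_impact(decisions))  # {'A': False, 'B': True}
```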
Chelsea Kerwin, October 21, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
The State Department Delves into Social Media
October 13, 2015
Social media platforms are created by people and companies that want to expand how people communicate. Facebook was invented to take advantage of the real-time digital environment to keep people in contact and form a web of connections. Twitter was founded for quicker, more instantaneous communication based on short, 140-character blurbs. Instagram shares pictures, and Pinterest connects ideas via pictures and related topics. Using analytics, the social media companies and other organizations collect data on users and use that information to sell products and services as well as to understand the types of users on each platform.
Social media contains a variety of data that can benefit not only private companies but government agencies as well. According to GCN’s article “State Starts Development On Social Media And Analytics Platform,” the new system will let staff collaborate in real time to schedule and publish across many social media platforms, and it will be mobile-enabled. The platform will also be used to track analytics on social media:
“For analytics, the system will analyze sentiment, track trending social media topics, aggregate location and demographic information, rank of top multimedia content, identify influencers on social media and produce automated and customizable reports.”
The platform will support twenty users and track thirty million mentions each year. The purpose behind the social media and analytics platform is still vague, but the federal government has proven to lag in its understanding and adoption of modern technology. This appears to be a step forward to keep the agency from being left behind, although a social media analytics platform arguably should have been implemented years ago, at the start of the big data phenomenon.
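None of the promised analytics is exotic. As a rough, hypothetical illustration (the posts and the hashtag-counting approach below are invented, not the State Department’s design), trending-topic tracking can be as simple as tallying hashtags per day:

```python
# Toy trending-topic counter: tally hashtags per day and report the top terms.
from collections import Counter
import re

posts = [
    ("2015-10-01", "Great visit to the embassy #diplomacy #travel"),
    ("2015-10-01", "Passport renewal tips #travel"),
    ("2015-10-02", "Statement on trade talks #diplomacy #trade"),
    ("2015-10-02", "More on the trade talks #trade"),
    ("2015-10-02", "#trade deal questions"),
]

def hashtags(text):
    return [t.lower() for t in re.findall(r"#(\w+)", text)]

daily = {}
for day, text in posts:
    daily.setdefault(day, Counter()).update(hashtags(text))

for day, counts in sorted(daily.items()):
    print(day, counts.most_common(3))
# 2015-10-01 [('travel', 2), ('diplomacy', 1)]
# 2015-10-02 [('trade', 3), ('diplomacy', 1)]
```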
Whitney Grace, October 13, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Business Intelligence and Data Science: There Is a Difference
October 6, 2015
An article at the SmartDataCollective, “The Difference Between Business Intelligence and Real Data Science,” aims to help companies avoid a common pitfall. Writer Brigg Patton explains:
“To gain a competitive business advantage, companies have started combining and transforming data, which forms part of the real data science. At the same time, they are also carrying out Business Intelligence (BI) activities, such as creating charts, reports or graphs and using the data. Although there are great differences between the two sets of activities, they are equally important and complement each other well.
“For executing the BI functions and data science activities, most companies have professionally dedicated BI analysts as well as data scientists. However, it is here that companies often confuse the two without realizing that these two roles require different expertise. It is unfair to expect a BI analyst to be able to make accurate forecasts for the business. It could even spell disaster for any business. By studying the major differences between BI and real data science, you can choose the right candidate for the right tasks in your enterprise.”
So fund both, gentle reader. Patton distinguishes each position’s area of focus, the different ways they use and look at data, and their sources, migration needs, and job processes. If you need to hire someone to perform these jobs, check out this handy clarification before you write up those job descriptions.
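The split is easy to see in miniature. In the hypothetical sketch below (the sales figures are invented), the BI half describes what already happened, while the data science half fits a deliberately naive trend model to guess what happens next:

```python
# BI-style reporting vs. a (deliberately naive) data-science-style forecast.
monthly_sales = [120, 135, 150, 160, 178, 190]  # invented figures

# Business intelligence: describe the past.
print("total:", sum(monthly_sales))
print("average:", sum(monthly_sales) / len(monthly_sales))

# Data science: fit a simple linear trend and extrapolate one month ahead.
n = len(monthly_sales)
xs = range(n)
x_mean = sum(xs) / n
y_mean = sum(monthly_sales) / n
slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, monthly_sales)) / \
        sum((x - x_mean) ** 2 for x in xs)
intercept = y_mean - slope * x_mean
print("forecast for next month:", intercept + slope * n)
```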
Cynthia Murrell, October 6, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Computers Learn Discrimination from Their Programmers
September 14, 2015
One of the greatest lessons one can learn from the Broadway classic South Pacific is that children are not born racist; they learn racism from their parents and other adults. Computers are supposed to be infallible, objective machines, but according to Gizmodo’s article “Computer Programs Can Be As Biased As Humans,” they can be just as biased as we are. In this case, computers are the “children,” and they pick up discriminatory behavior from their programmers.
As an example, the article explains how companies use job application software to sift through prospective employees’ resumes. Algorithms search for keywords related to experience and skills, with the goal of remaining unbiased with respect to sex and ethnicity. The algorithms can also be used to screen out resumes that contain certain phrases and other information.
“Recently, there’s been discussion of whether these selection algorithms might be learning how to be biased. Many of the programs used to screen job applications are what computer scientists call machine-learning algorithms, which are good at detecting and learning patterns of behavior. Amazon uses machine-learning algorithms to learn your shopping habits and recommend products; Netflix uses them, too.”
The machine learning algorithms are mimicking the same discriminatory habits as humans. To catch these computer-generated biases, other machine learning algorithms are being implemented to keep the first set in check. Another option is to reload the data in a different manner so the algorithms do not fall into the old habits. From a practical standpoint it makes sense: if something does not work the first few times, change the way it is done.
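A toy example shows how the bias creeps in without anyone coding it deliberately. In the invented sketch below, the screening keywords are “learned” from past hires, so an irrelevant proxy term the past hires happened to share (here the made-up hobby “lacrosse”) scores as highly as a real skill:

```python
# Toy screen that "learns" keywords from past hires, then scores new resumes.
# Nothing here mentions a protected class, yet the learned keyword list can
# contain proxies (the invented hobby "lacrosse") for one.
from collections import Counter

past_hires = [
    "python sql lacrosse captain",
    "java sql lacrosse club",
    "python statistics lacrosse",
]
past_rejections = [
    "python sql volunteer tutor",
    "java statistics community organizer",
]

def words(docs):
    c = Counter()
    for d in docs:
        c.update(d.split())
    return c

hired, rejected = words(past_hires), words(past_rejections)
# Keep words that appear mostly among past hires.
learned_keywords = {w for w, n in hired.items() if n > rejected.get(w, 0)}
print("learned keywords:", learned_keywords)  # includes 'lacrosse'

def score(resume):
    return sum(1 for w in resume.split() if w in learned_keywords)

print(score("python sql statistics"))  # 2 -- scored on real skills
print(score("java lacrosse club"))     # 2 -- scored mostly on proxy terms
```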
Whitney Grace, September 14, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
The AI Evolution
September 10, 2015
An article at WT Vox announces, “Google Is Working on a New Type of Algorithm Called ‘Thought Vectors’.” It sounds like a good use for a baseball cap with electrodes, a battery pack, WiFi, and a person who thinks great thoughts. In actuality, it’s a project based on the work of esteemed computer scientist Geoffrey E. Hinton, who has been exploring the idea of neural networks for decades. Hinton is now working with Google to create the sophisticated algorithm of our dreams (or nightmares, depending on one’s perspective).
Existing language processing software has come a very long way; Google Translate, for example, searches dictionaries and previously translated docs to translate phrases. The app usually does a passably good job of giving one the gist of a source document, but results are far from reliably accurate (and are often grammatically comical). Thought vectors, on the other hand, will allow software to extract meanings, not just correlations, from text.
Continuing to use translation software as the example, reporter Aiden Russell writes:
“The technique works by ascribing each word a set of numbers (or vector) that define its position in a theoretical ‘meaning space’ or cloud. A sentence can be looked at as a path between these words, which can in turn be distilled down to its own set of numbers, or thought vector….
“The key is working out which numbers to assign each word in a language – this is where deep learning comes in. Initially the positions of words within each cloud are ordered at random and the translation algorithm begins training on a dataset of translated sentences. At first the translations it produces are nonsense, but a feedback loop provides an error signal that allows the position of each word to be refined until eventually the positions of words in the cloud captures the way humans use them – effectively a map of their meanings.”
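The geometry in that passage can be demonstrated at toy scale. In the sketch below the word vectors are assigned by hand rather than learned, a sentence’s “thought vector” is just the average of its word vectors, and cosine similarity stands in for closeness of meaning:

```python
# Toy "meaning space": hand-assigned word vectors, sentence vectors as
# averages, and cosine similarity as a stand-in for similarity of meaning.
import math

word_vectors = {           # invented 3-d positions, not learned ones
    "cat":   (0.9, 0.1, 0.0),
    "dog":   (0.8, 0.2, 0.0),
    "sat":   (0.1, 0.9, 0.1),
    "slept": (0.2, 0.8, 0.1),
    "stock": (0.0, 0.1, 0.9),
    "fell":  (0.1, 0.3, 0.7),
}

def sentence_vector(sentence):
    vecs = [word_vectors[w] for w in sentence.split()]
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

s1 = sentence_vector("cat sat")
s2 = sentence_vector("dog slept")
s3 = sentence_vector("stock fell")
print(cosine(s1, s2))  # high: similar "thoughts"
print(cosine(s1, s3))  # lower: a different "thought"
```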
But, won’t all efficient machine learning lead to a killer-robot-ruled dystopia? Hinton bats away that claim as a distraction; he’s actually more concerned about the ways big data is already being (mis)used by intelligence agencies. The man has a point.
Cynthia Murrell, September 10, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Algorithms Are Objective As Long As You Write Them
September 8, 2015
I read “Big Data’s Neutral Algorithms Could Discriminate against Most Vulnerable.” Ridiculous. Objective procedures cannot discriminate. The numerical recipes do what they do.
Ah, but when a human weaves together methods and look-up tables, sets thresholds, and uses Bayesian judgments, well, maybe a little bit of bias can be baked in.
The write up reports:
So how will the courts address algorithmic bias? From retail to real estate, from employment to criminal justice, the use of data mining, scoring software and predictive analytics programs is proliferating at an exponential rate. Software that makes decisions based on data like a person’s ZIP code can reflect, or even amplify, the results of historical or institutional discrimination. “[A]n algorithm is only as good as the data it works with,” Solon Barocas and Andrew Selbst write in their article “Big Data’s Disparate Impact,” forthcoming in the California Law Review. “Even in situations where data miners are extremely careful, they can still affect discriminatory results with models that, quite unintentionally, pick out proxy variables for protected classes.”
And I liked this follow on:
It’s troubling enough when Flickr’s auto-tagging of online photos label pictures of black men as “animal” or “ape,” or when researchers determine that Google search results for black-sounding names are more likely to be accompanied by ads about criminal activity than search results for white-sounding names. But what about when big data is used to determine a person’s credit score, ability to get hired, or even the length of a prison sentence?
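The proxy variable point is the important one, and it is simple to illustrate. In the invented example below, a rule that looks only at ZIP code still produces different outcomes by group, because the ZIP codes happen to line up with group membership:

```python
# Invented records: a "neutral" ZIP code feature that happens to line up
# with group membership, so a ZIP-only rule reproduces the group split.
records = [
    {"zip": "40201", "group": "A", "approved": True},
    {"zip": "40201", "group": "A", "approved": True},
    {"zip": "40201", "group": "B", "approved": True},
    {"zip": "40299", "group": "B", "approved": False},
    {"zip": "40299", "group": "B", "approved": False},
    {"zip": "40299", "group": "A", "approved": False},
]

def approval_rate(rows, key, value):
    rows = [r for r in rows if r[key] == value]
    return sum(r["approved"] for r in rows) / len(rows)

# A rule that only looks at ZIP code...
print(approval_rate(records, "zip", "40201"))   # 1.0
print(approval_rate(records, "zip", "40299"))   # 0.0
# ...still yields different outcomes by group, because ZIP acts as a proxy.
print(approval_rate(records, "group", "A"))     # roughly 0.67
print(approval_rate(records, "group", "B"))     # roughly 0.33
```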
Shift gears. Navigate to “Microsoft Is Trying to Stop Users from Downloading Chrome or Firefox.” Objective, right?
Two thoughts. The math oriented legal eagles will sort this out. Lawyers are really good at math. Also, write your own algorithm and tune it to deliver what you want. No bias there. You are expressing your inner self.
It’s just a process and billable.
Stephen E Arnold, September 8, 2015
Algorithms Still Need Oversight
September 8, 2015
Many have pondered what might happen when artificial intelligence systems go off the rails. While not spectacular enough for Hollywood, some very real consequences have been observed; the BBC examines “The Bad Things that Happen When Algorithms Run Online Shops.”
The article begins by relating the tragic tale of an online T-shirt vendor who just wanted to capitalize on the “Keep Calm and Carry On” trend. He set up an algorithm to place random terms into the second half of that oft-copied phrase and generate suggested products. Unfortunately, the list of phrases was not sufficiently vetted, resulting in a truly regrettable slogan displayed on virtual example shirts. Despite the fact that the phrase appeared only on the website, not on any actual shirts, the business never recovered its reputation and closed shortly thereafter. Reporter Chris Baraniuk writes:
“But that’s the trouble with algorithms. All sorts of unexpected results can occur. Sometimes these are costly, but in other cases they have benefited businesses to the tune of millions of pounds. What’s the real impact of the machinations of machines? And what else do they do?”
Well, one other thing is to control prices. Baraniuk reports that software designed to set online prices competitively, based on what other sites are doing, can cause prices to fluctuate day-to-day, sometimes hour-to-hour. Without human oversight, results can quickly become extreme at either end of the scale. For example, for a short time last December, prices of thousands of products sold through Amazon were set to just one penny each. Amazon itself probably weathered the unintended near-giveaways just fine, but smaller merchants selling through the site were not so well-positioned; some closed as a direct result of the error. On the other hand, vendors trying to keep their prices as high as feasible can make the opposite mistake; the article points to the time a blogger found an out-of-print textbook about flies priced at more than $23 million, the result of two sellers’ dueling algorithms.
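That last anecdote is easy to reproduce in a few lines. In the sketch below the starting prices and multipliers are invented for illustration: two repricers keyed off each other, with no cap and no human check, climb into the tens of millions within a couple of months:

```python
# Two automated repricers with no human oversight: seller A prices just
# below seller B, while seller B stays well above seller A to sell at a
# premium. The combined multiplier per round is greater than 1, so prices
# grow exponentially.
price_a, price_b = 35.00, 40.00       # invented starting prices
for day in range(60):
    price_a = 0.9983 * price_b        # A: slightly undercut B
    price_b = 1.2706 * price_a        # B: stay about 27% above A
print(round(price_a, 2), round(price_b, 2))
# After a couple of months of unattended repricing, both prices are in the
# tens of millions.
```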
Such observations clearly mean that consumers should be very wary about online prices. The bigger takeaway, though, is that we’re far from ready to hand algorithms the reins of our world without sufficient human oversight. Not yet.
Cynthia Murrell, September 8, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Neural Nets: A Categorical Affirmative Means No Exceptions
August 21, 2015
The opening sentence of “A Visual Proof That Neural Nets Can Compute Any Function” is a Duesy. Here it is:
One of the most striking facts about neural networks is that they can compute any function at all.
I am okay with neural nets, but I struggle against statements which assert universalities like “any function at all.”
There are fancy terms for this type of error; for example, formal fallacy.
What’s troubling is that the use of “all,” “every,” “any,” and other umbrella terms seems to be more common than Ashley Madison customer names.
Textbooks which “teach” that something works for “any function” accelerate generalization as a standard operating procedure.
Untidy? Yep.
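For the record, the construction the cited article visualizes shows something narrower than the slogan: a single hidden layer of steep sigmoid units can approximate a continuous function on a bounded interval to arbitrary accuracy. A minimal sketch of that step-function idea follows; the target function, slice count, and steepness are arbitrary choices here:

```python
# Pairs of steep sigmoid hidden units form near-step "bumps", and a weighted
# sum of bumps approximates a continuous target (here sin) on an interval.
# That is the actual claim: approximation to arbitrary accuracy, not exact
# computation of literally any function.
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def step(x, at, steepness=1000.0):
    # One steep sigmoid hidden unit acts as an approximate step at `at`.
    return sigmoid(steepness * (x - at))

def approx_sin(x, pieces=200, lo=0.0, hi=2 * math.pi):
    width = (hi - lo) / pieces
    total = 0.0
    for i in range(pieces):
        left = lo + i * width
        height = math.sin(left + width / 2)   # target value on this slice
        # A bump of that height: step up at `left`, back down at `left + width`.
        total += height * (step(x, left) - step(x, left + width))
    return total

for x in (0.5, 1.5, 3.0, 5.0):
    print(x, round(approx_sin(x), 3), round(math.sin(x), 3))  # approximation vs. truth
```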
Stephen E Arnold, August 21, 2015
How to Use Watson
August 17, 2015
While there are many possibilities for cognitive computing, what makes an idea a reality is its feasibility and real-life application. The Platform explores “The Real Trouble With Cognitive Computing” and the troubles IBM had (has) trying to figure out what to do with the supercomputer it made. The article explains that before Watson became a Jeopardy celebrity, the IBM folks came up with 8,000 potential experiments for Watson to do, but only 20 percent of them could be pursued.
The range is small due to many factors, including bug testing, gauging progress with fuzzy outputs, playing around with algorithmic interactions, testing in isolation, and more. This leads to a “messy” way of developing the experiments. Ideally, developers would have one big knowledge model they could query, but that option does not exist. The messy way involves keeping data sources intact, applying natural language processing, machine learning, and knowledge representation, and then distributing the work across an infrastructure.
Here is another key point that makes clear sense:
“The big issue with the Watson development cycle too is that teams are not just solving problems for one particular area. Rather, they have to create generalizable applications, which means what might be good for healthcare, for instance, might not be a good fit—and in fact even be damaging to—an area like financial services. The push and pull and tradeoff of the development cycle is therefore always hindered by this—and is the key barrier for companies any smaller than an IBM, Google, Microsoft, and other giants.”
This is exactly correct! Engineering is not the same as healthcare, and not all computer algorithms transfer over to different industries. One thing to keep in mind is that you can apply methods from other industries and come up with new methods or solutions.
Whitney Grace, August 18, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Thunderstone Rumbles about Webinator
August 13, 2015
There is nothing more frustrating than being unable to locate a specific piece of information on a Web site when you use its search function. Search is supposed to be quick, accurate, and efficient. Even if Google search is employed as a Web site’s search feature, it does not always yield the best results. Thunderstone is a company that specializes in proprietary software applications developed specifically for information management, search, retrieval, and filtering.
Thunderstone has a client list that includes, but is not limited to, government agencies, Internet developers, corporations, and online service providers. The company’s goal is to deliver “product-oriented R&D within the area of advanced information management and retrieval,” which translates to helping clients find information very, very fast and as accurately as possible. That is the premise of most information management companies. On the company blog it was announced that “Thunderstone Releases Webinator Web Index And Retrieval System Version 13.” Webinator makes it easier to integrate high-quality search into a Web site, and it has several appealing new features:
- “Query Autocomplete, guides your users to the search they want
- HTML Highlighting, lets users see the results in the original HTML for better contextual information
- Expanded XML/SOAP API allows integration of administrative interface”
We like the HTML highlighting that offers users the ability to backtrack and see a page’s original information source. It is very similar to old-fashioned research: go back to the original source to check a fact’s veracity.
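Hit highlighting itself is not magic. A generic sketch of the idea, which is in no way Thunderstone’s actual code, simply wraps matched query terms in the stored HTML:

```python
# Generic hit-highlighting sketch: wrap query terms found in an HTML snippet.
# A production system would also avoid matching text inside tags/attributes.
import re

def highlight(html, query):
    for term in query.split():
        # Case-insensitive whole-word match, wrapped in a highlight tag.
        html = re.sub(r"(?i)\b(%s)\b" % re.escape(term),
                      r"<mark>\1</mark>", html)
    return html

snippet = "<p>Thunderstone released a new retrieval system this week.</p>"
print(highlight(snippet, "retrieval system"))
# <p>Thunderstone released a new <mark>retrieval</mark> <mark>system</mark> this week.</p>
```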
Whitney Grace, August 13, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph