Algorithms: Thresholds and Recycling Partially Explained

April 19, 2019

Five or six years ago I prepared a lecture about the weaknesses in widely used algorithms. In that talk, which I delivered to intelligence operatives in Western Europe and the US, I made two points which were significant to me and my small research team.

  1. There are about nine or 10 algorithms which are used again and again. One example is k-means. The reason is that the procedure is a fixture in many university courses, and the method is good enough.
  2. Quite a bit of the work on smart software relies on cutting and pasting. In 1962, I discovered the value of this approach when I worked on a small project at my undergraduate university. Find a code snippet that does the needed task, modify it if necessary, and bingo! Today this approach remains popular.

I thought about my lectures and these two points when I read another part of the mathy series “Untold History of AI: Algorithmic Bias Was Born in the 1980s.” IEEE Spectrum does a reasonable job of explaining one case of algorithmic bias. The story is similar to the experience Amazon had with one of its smart modules. The math produced wonky results. The word “bias” is okay with me, but the outputs from systems which happily chug away and deliver “outputs” to clueless MBAs, lawyers, and marketers may be incorrect.

Several observations:

  1. The bias in methods goes back before I showed up at the university computer center to use the keypunch machines. Way back in fact.
  2. Developers today rely on copy and paste, open source, and the basic methods taught by professors who may be thinking about their side jobs as consultants.
  3. Training data may be skewed, and no one wants to spend the money or take the time to create training data. Why bother? Just use whatever is free, cheap, or already on a storage device. Close enough for horseshoes.
  4. Users do not know what’s going on behind the point and click interfaces, nor do most users care. As a result, a good graphic is “correct.”

The chatter about the one percent focuses on money. There is another, more important one percent in my opinion. The one percent who take the time to look at a sophisticated system will find the same nine or 10 algorithms, the same open source components, and some recycled procedures that few think about. Quick question: How many smart software systems rely on Thomas Bayes’ methods? Give up? Lots.

I don’t have a remedy for this problem, and I am not sure too many people care about or want to talk about the “accuracy” of a smart system’s outputs. That’s a happy thought for the weekend. Imagine bad outputs in an autonomous drone or a smart system in a commercial aircraft. Exciting.

Stephen E Arnold, April 19, 2019

Facial Recognition: An Important Technology Enters Choppy Waters

April 8, 2019

I wouldn’t hold my breath: The Electronic Frontier Foundation (EFF) declares, “Governments Must Face the Facts About Face Surveillance, and Stop Using It.” Writers Hayley Tsukayama and Adam Schwartz begin by acknowledging reality—the face surveillance technology business is booming, with the nation’s law enforcement agencies increasingly adopting it. They write:

EFF supports legislative efforts in Washington and Massachusetts to place a moratorium on government use of face surveillance technology. These bills also would ban a particularly pernicious kind of face surveillance: applying it to footage taken from police body-worn cameras. The moratoriums would stay in place, unless lawmakers determined these technologies do not have a racial disparate impact, after hearing directly from minority communities about the unfair impact face surveillance has on vulnerable people. We recently sent a letter to Washington legislators in support of that state’s moratorium bill.

EFF’s communications may be having some impact.

DarkCyber noted that Amazon will be allowing shareholders a vote about sales of the online bookstore’s facial recognition technology, Rekognition. “AI Researchers Tell Amazon to Stop Selling Facial Recognition to the Police” does not explain how Amazon can remove its FAR from those entities which have licensed the technology.

DarkCyber believes that the US is poised to become a procurement innovation center. Companies and their potential customers have to figure out how to work together without creating political, legal, and financial disruptions.

A failure to resolve what seems to be a more common problem may allow vendors in other countries to capture leading engineers, major contracts, and a lead in an important technology.

Stephen E Arnold, April 8, 2019

Content Management: Now a Playground for Smart Software?

March 28, 2019

CMS or content management systems are a hoot. Sometimes they work; sometimes they don’t. How does one keep these expensive, cranky databases chugging along in the zip zip world of content utilities which are really inexpensive?

Smart software and predictive analytics?

Managing a website is not what it used to be, and one of the biggest changes to content management systems is the use of predictive analytics. The Smart Data Collective discusses “The Fascinating Role of Predictive Analytics in CMS Today.” Reporter Ryan Kh writes:

“Predictive analytics is changing digital marketing and website management. In previous posts, we have discussed the benefits of using predictive analytics to identify the types of customers that are most likely to convert and increase the value of your lead generation strategy. However, there are also a lot of reasons that you can use predictive analytics in other ways. Improving the quality of your website is one of them. One of the main benefits of predictive analytics in 2019 is in improving the performance of content management systems. There are a number of different types of content management systems on the market, including WordPress, Joomla, Drupal, and Shopify. There are actually hundreds of content management systems on the market, but these are some of the most noteworthy. One of the reasons that they are standing out so well against their competitors is that they use big data solutions to get the most value for their customers.”

The author notes two areas in which predictive analytics are helping companies’ bottom lines: fraud detection and, of course, marketing optimization; the latter through capacities like more effective lead generation and content validation.

Yep, CMS with AI. The future with spin.

Cynthia Murrell, March 28, 2019

Good News about Big Data and AI: Not Likely

February 25, 2019

I read a write up which was a bit of a downer. The story appeared in Analytics India and was titled “10 Challenges That Data Science Industry Still Faces.” Oh, oh. Maybe not good news?

My first thought was, “Only 10?”

The write up explains that the number one challenge is humans. The idea was that smart software would solve these types of problems: Sluggish workers at fast food restaurants, fascinating decisions made by entry level workers in some government bureaus, and the often remarkable statements offered by talking heads on US cable TV “real news” programs, among others.

Nope. The number one challenge is finding humans who can do data science work.

What’s number two after this somewhat thorny problem? The answer is finding the “right data” and then getting a chunk of data one can actually process.

So one and two are what I would call bedrock issues: Expertise and information.

What about the other eight challenges? Here are three of them. I urge you to read the original article for the other five issues.

  • Informing people why data science and its related operations are good for you. Is this similar to convincing a three year old that lima beans are just super?
  • Storytelling. I think this means, “These data mean…” One hopes the humans (who are in short supply) draw the correct inferences. One hopes.
  • Models. This is a shorthand way of saying, “What’s assembled will work.” Hopefully the answer is, “Sure, our models are great.”

Analytics India has taken a risk with their write up. None of the data science acolytes want to hear “bad news.”

Let’s federate and analyze that with great data we can select to generate a useful output. Maybe 80 percent “accuracy” on a good day?

Stephen E Arnold, February 25, 2019

False Positives: The New Normal

January 1, 2019

And this is why so many people are wary of handing too much power to algorithms. TechDirt reports, “School Security Software Decided Innocent Parent Is Actually a Registered Sex Offender.” That said, it seems some common sense on the part of the humans involved would have prevented the unwarranted humiliation. The mismatch took place at an Aurora, Colorado, middle school event, where parent Larry Mitchell presumably just wanted to support his son. When office staff scanned his license, however, the Raptor system flagged him as a potential offender. Reporter Tim Cushing writes:

“Not only did these stats [exact name and date of birth] not match, but the photos of registered sex offenders with the same name looked nothing like Larry Mitchell. The journalists covering the story ran Mitchell’s info through the same databases — including Mitchell’s birth name (he was adopted) — and found zero matches. What it did find was a 62-year-old white sex offender who also sported the alias ‘Jesus Christ,’ and a black man roughly the same age as the Mitchell, who is white. School administration has little to say about this botched security effort, other than policies and protocols were followed. But if so, school personnel need better training… or maybe at least an eye check. Raptor, which provides the security system used to misidentify Mitchell, says photo-matching is a key step in the vetting process….”

We also noted:

“Even if you move past the glaring mismatch in photos (the photos returned in the Sentinel’s search of Raptor’s system are embedded in the article), neither the school nor Raptor can explain how Raptor’s system returned results that can’t be duplicated by journalists.”

This looks like a mobile version of the PEBCAK error, and such mistakes will only increase as these verification systems continue to be implemented at schools and other facilities across the country. Cushing rightly points to this problem as “an indictment of the security-over-sanity thinking.” Raptor, a private company, is happy to tout its great success at keeping registered offenders out of schools, but they do not reveal how often their false positives have ruined an innocent family’s evening, or worse. How much control is our society willing to hand over to AIs (and those who program them)?

Cynthia Murrell, January 1, 2019

Will Algorithms Become a Dying Language?

December 30, 2018

It may sound insane, considering how much of our daily life revolves around algorithms. From your work, to your online shopping, to the maps that guide you on vacation, we depend on these codes. However, some engineers fear older algorithms will be lost to the sands of time and future generations will not be able to learn from them. Thankfully, a solution has arrived in the form of The Arcane Algorithm Archive.

According to its mission statement:

“The Arcane Algorithm Archive is a collaborative effort to create a guide for all important algorithms in all languages. This goal is obviously too ambitious for a book of any size, but it is a great project to learn from and work on and will hopefully become an incredible resource for programmers in the future.”

A program like this is so important. Maybe the places that have the most to learn from this long evolution of algorithms are public government agencies. Some writers think many of these agencies have no idea what is in their algorithms, let alone how much they have to do with major policy decisions. Hindsight is truly 20/20.

Patrick Roland, December 30, 2018

Who Is a Low Risk Hire?

November 21, 2018

Last week, a person who did some contract work for me a year ago asked me if I would provide a reference. I agreed. I assumed that a caring, thoughtful human resources professional would speak with me on the telephone. Wrong. I received a text message asking me if I would complete questions. Get this. Each text message would contain a question about the person who sought a reference. After I hit send, I would receive another text message.


I was then sent a link to an online form that assured me my information was confidential. “Https” was not part of this outfit’s game plan. I worked through a form, providing scores from one to seven about the person. The fact that I hired this person to perform a specific job for me was evidence that the individual could be trusted. I am not making chopped liver or cranking out greeting cards. We produce training information for law enforcement and intelligence professionals.

I worked through the questions which struck me as worrying more about appearing to be interested in the individual than actually obtaining concrete information about the person. Here’s an example of what the online test reveals:


Yeah, pretty much useless. I am not sure what “adaptability” means. I tell contractors what I want. The successful contractor does that task and gets paid. A contractor who does not gets cut out of the pool. This means in politically incorrect speak: Gets fired.

I read “Public Attitudes Toward Computer Algorithms” a couple of days after going through this odd ball way to get information about a person working on law enforcement and intelligence related work. The write up makes clear that other people are not keen on the use of opaque methods to figure out if a person can do good work and be trusted.

Well, gentle reader, get used to this.

Human resources want to cover their precious mortgage, make a car payment, or buy a new gizmo at the Amazon online store. The HR professionals are not eager to be responsible for screening individuals and figuring out what questions to ask a person like me. For good reason, I am not sure I would spend more than two minutes on the phone with an actual HR person. For the last 30 years, I have worked as an independent consultant. My only interactions with HR are limited to my suggesting that the individual stay away from me. Fill out forms or something. Just leave me alone, or you will be talking to individuals whom I pay to make you go away. I have a Mensa paralegal who can tie almost anyone in knots.

Several observations:

  1. Algorithms for hiring are a big, big thing. Why? Tail covering and document trails that say, “See, I did everything I could required by applicable regulations.” Forget judgment.
  2. The online angle is cheaper than having an actual old fashioned HR department. Outsource benefit reduction. Outsource candidate screening. Heck, outsource the outsourcing.
  3. No one wants to be responsible— for anything. Look at the high school science club management methods at Facebook. The founder is at war. Former employees explain that no one gave direction. Yada yada.
  4. The use of algorithms presumably leads to efficiencies; that is, lower costs, better, faster, cheaper, MBA and bean counter fits of joy.

Just as Apple’s Tim Cook sees nothing objectionable about taking Google’s money as Apple talks up its privacy / security commitment, algorithms make everything — including HR — much better.

Net net: I am glad I am old and officially cranking along at 75, not a hapless 22 year old trying to get a job and do a good job at a zippy de doo dah company.

Stephen E Arnold, November 21, 2018

Amazon Rekognition: Great but…

November 9, 2018

I have been following the Amazon response to employee demands to cut off the US government. Put that facial recognition technology on “ice.” The issue is an intriguing one; for example, Rekognition plugs into DeepLens. DeepLens connects with SageMaker. The construct allows some interesting policeware functions. Ah, you didn’t know that? Some info is available if you view the October 30 and November 6, 2018, DarkCyber. Want more info? Write benkent2020 at yahoo dot com.

How realistic is 99 percent accuracy? Pretty realistic when one has one image and a bounded data set against which to compare a single image of adequate resolution and sharpness.

What caught my attention was the “real” news in “Amazon Told Employees It Would Continue to Sell Facial Recognition Software to Law Enforcement.” I am less concerned about the sales to the US government. I was drawn to these verbal perception shifters:

  • under fire. [Amazon is taking flak from its employees who don’t want Amazon technology used by LE and similar services.]
  • track human beings [The assumption is tracking is bad until the bad actor tracked is trying to kidnap your child, then tracking is wonderful. This is the worst type of situational reasoning.]
  • send them back into potentially dangerous environments overseas. [Are Central and South America overseas, gentle reader?]

These are hot buttons.

But I circled in pink this phrase:

Rekognition is research proving the system is deeply flawed, both in terms of accuracy and regarding inherent racial bias.

Well, what does one make of the statement that Rekognition is powerful but has fatal flaws?

Want proof that Rekognition is something more closely associated with Big Lots than Amazon Prime? The write up states:

The American Civil Liberties Union tested Rekognition over the summer and found that the system falsely identified 28 members of Congress from a database of 25,000 mug shots. (Amazon pushed back against the ACLU’s findings in its study, with Matt Wood, its general manager of deep learning and AI, saying in a blog post back in July that the data from its test with the Rekognition API was generated with an 80 percent confidence rate, far below the 99 percent confidence rate it recommends for law enforcement matches.)

Yeah, 99 percent confidence. Think about that. Pretty reasonable, right? Unfortunately 99 percent is like believing in the tooth fairy, just in terms of a US government spec or Statement of Work. Reality for the vast majority of policeware systems is in the 75 to 85 percent range. Pretty good in my book because these are achievable accuracy percentages. The 99 percent stuff is window dressing and will be for years to come.
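The gap between an 80 percent and a 99 percent confidence setting is easier to see with a toy example. What follows is a minimal Python sketch, not Amazon's API: the scores are invented for illustration, and the names `CANDIDATE_SCORES` and `matches_at` are hypothetical. It shows only the general idea that a lower threshold lets more candidates through.

```python
# Hypothetical similarity scores (0-100) for one probe face compared against
# a small gallery of mug shots. These numbers are invented for illustration
# only; they are not Rekognition outputs.
CANDIDATE_SCORES = {
    "mugshot_a": 99.4,
    "mugshot_b": 91.0,
    "mugshot_c": 84.2,
    "mugshot_d": 80.5,
    "mugshot_e": 62.3,
}


def matches_at(scores, threshold):
    """Return the gallery ids whose score meets or exceeds the threshold."""
    return sorted(name for name, score in scores.items() if score >= threshold)


if __name__ == "__main__":
    # Compare the two settings discussed in the write up.
    for threshold in (80, 99):
        hits = matches_at(CANDIDATE_SCORES, threshold)
        print(f"threshold {threshold}: {len(hits)} candidate(s) -> {hits}")
```

With these made-up scores, the 80 setting returns four candidates and the 99 setting returns one. Every extra candidate is one more chance for a false match, which is why the threshold a test uses matters as much as the system being tested.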

Also, Amazon, the Verge points out, is not going to let folks tinker with the Rekognition system to determine how accurate it really is. I learned:

The company has also declined to participate in a comprehensive study of algorithmic bias run by the National Institute of Standards and Technology that seeks to identify when racial and gender bias may be influencing a facial recognition algorithm’s error rate.

Yep, how about those TREC accuracy reports?

My take on this write up is that Amazon is now in the sights of the “real” journalists.

Perhaps the Verge would like Amazon to pull out of the JEDI procurement?

Great idea for some folks.

Perhaps the Verge will dig into the other components of Rekognition and then plot the improvements in accuracy when certain types of data sets are used in the analysis.

Facial recognition is not the whole cloth. Rekognition is one technology thread which needs a context that moves beyond charged language and accuracy rates which are in line with those of other advanced systems.

Amazon’s strength is not facial recognition. The company has assembled a policeware construct. That’s news.

Stephen E Arnold, November 9, 2018

Analytics: From Predictions to Prescriptions

October 19, 2018

I read an interesting essay originating at SAP. The article’s title: “The Path from Predictive to Prescriptive Analytics.” The idea is that outputs from a system can be used to understand data. Outputs can also be used to make “predictions”; that is, guesses or bets on likely outcomes in the future. Prescriptive analytics means that the systems tell or wire actions into an output. Now the output can be read by a human, but I think the key use case will be taking the prescriptive outputs and feeding them into other software systems. In short, the system decides and does. No humans really need be involved.
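The difference between predicting and prescribing can be sketched in a few lines of Python. This is a toy under stated assumptions, not anything from the SAP essay: the inventory scenario, the seven day window, the 0.5 cutoff, and the function names are all hypothetical. The first function guesses; the second turns the guess into an instruction another program could act on without a human in the loop.

```python
def predict_stockout_probability(units_on_hand, average_daily_sales):
    """Predictive step: a toy rule (not a trained model) estimating the
    risk of running out of stock within seven days."""
    if average_daily_sales <= 0:
        return 0.0
    days_of_cover = units_on_hand / average_daily_sales
    # Fewer days of cover means higher risk; clamp the result to 0..1.
    return max(0.0, min(1.0, 1.0 - days_of_cover / 7.0))


def prescribe_action(probability, reorder_threshold=0.5):
    """Prescriptive step: convert the prediction into an action record
    that a downstream system could execute automatically."""
    action = "reorder" if probability >= reorder_threshold else "none"
    return {"action": action, "reason": f"stockout risk {probability:.2f}"}


if __name__ == "__main__":
    risk = predict_stockout_probability(units_on_hand=2, average_daily_sales=2)
    print(prescribe_action(risk))
```

Note where the judgment lives: in the hard-coded cutoff. Whoever picks that number is making the decision, even if no one looks at the output afterward.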

The write up states:

There is a natural progression towards advanced analytics – it is a journey that does not have to be on separate deployments. In fact, it is enhanced by having it on the same deployment, and embedding it in a platform that brings together data visualization, planning, insight, and steering/oversight functions.

What is the optimal way to manage systems which are dictating actions or just automatically taking actions?

The answer is, quite surprisingly, a bit of MBA consultantese: Governance.

The most obvious challenge with regards to prescriptive analytics is governance.

Several observations:

  • Governance is unlikely to provide the controls which prescriptive systems warrant. Evidence is that “governance” in some high technology outfits is in short supply.
  • Enhanced automation will pull prescriptive analytics into wide use. The reasons are ones you have heard before: Better, faster, cheaper.
  • Outfits like the Google and In-Q-Tel funded Recorded Future and DarkTrace may have to prepare for new competition; for example, firms which specialize in prescription, not prediction.

To sum up, interesting write up. Perhaps SAP will be the go to player in plugging prescriptive functions into their software systems?

Stephen E Arnold, October 19, 2018

Free Data Sources

October 19, 2018

We were plowing through our research folder for Beyond Search. We overlooked the article “685 Outstanding Free Data Sources For 2017.” If you need a range of data sources related to such topics as government data, machine learning, and algorithms, you might want to bookmark this listing.

Stephen E Arnold, October 19, 2018
