Predictive Analytics: Follow These Puffy Thought Bubbles

September 21, 2020

Predictive analytics is about mathematics; for instance, Bayesian confections and Markov doodling. The write up “Predictive Analytics: 4 Primary Aspects of Predictive Analytics” uses the bound phrase “predictive analytics” twice in one headline and cheerfully ignores the mathy reality of the approach.

Does this marshmallow approach make a difference? Yes, I believe it does. Consider this statement from the write up:

These predictive models can be used by enterprise marketers to more effectively develop predictions of future user behaviors based on the sourced historical data. These statistical models are growing as a result of the wide swaths of available current data as well as the advent of capable artificial intelligence and machine learning.

Okay, marketers. Predictive analytics are right in your wheelhouse. The assumption that “statistical models are growing” is interesting. The statistical models with which I am familiar require work to create, test, refine, and implement. Yep, work, mathy work.

The source of data is important. However, data have to be accurate or verifiable or have some attribute that tries to ensure that garbage in does not become the mode of operation. Unfortunately data remain a bit of a challenge. Do marketers know how to identify squishy data? Do marketers care? Yeah, sure they do in a meeting during which smartphone fiddling is taking place.

The idea of data utility is interesting. If one is analyzing nuclear fuel pool rod placement, it does help to have data relevant to that operation. But are marketers concerned about “data utility”? Once again, thumbtypers say, “Yes.” Then what? Acquire data from a third party and move on with life? It happens.

The thrill of “deep learning” is like the promise of spring. Everyone likes spring? Who remembers the problems? Progress is evident in the application of different smart software methods. However, there is a difference between saying “deep learning” or “machine learning” and making a particular application benefit from available tools, libraries, and methods. The whiz kids who used smart software to beat a human fighter pilot got the job done. The work required to achieve the digital victory was significant, took time, and was difficult. Very difficult. Marketers, were you on the team?

Finally, what’s the point of predictive analytics? Good question. For the article, the purpose of predictive analytics is to refine a guess-timate. And the math? Just use a smart solution, click an icon, and see the future.

Yikes, puffy thought bubbles.

Stephen E Arnold, September 21, 2020

Count Bayesie Speaks Truth

September 10, 2020

Navigate to “Why Bayesian Stats Needs More Monte Carlo Methods.” Each time I read an informed write up about the 18th century Presbyterian minister who could do some math, I think about a fellow who once aspired to be the Robert Maxwell of content management. Noble objective, is it not?

That person grew apoplectic when I explained how Autonomy in the early 1990s was making use of mathematical procedures crafted in the 18th century. I wish I had made a TikTok video of his comical attempt to explain that a human or software system should not under any circumstances inject a data point that was speculative.

Well, my little innumerate content management person, get used to Bayes. Plus there’s another method at which you can rage and bay. Yep, Monte Carlo. If you were horrified by the good Reverend’s idea, wait until you dig into Monte Carlo. Strapping these two statistical stallions to the buggy called predictive analytics is commonplace.

The write up closes poetically, which may be more in line with the fuzzy wuzzy discipline of content management:

It may be tempting to blame the complexity of the details of Bayesian methods, but it’s important to realize that when we are taught the beauty of calculus and analytical methods we are often limited to a relatively small set of problems that map well to the solutions of calc 101. When trying to solve real world problems mathematically complex problems pop up everywhere and analytical solutions either escape or fail us.
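The quoted point — analytical solutions often escape or fail us — is exactly where Monte Carlo earns its keep. A minimal sketch (my own toy illustration, not from the write up): estimate a posterior mean by drawing candidates from the prior and weighting by the likelihood, then compare with the closed-form answer that happens to exist for this coin-flip case.

```python
import random

def monte_carlo_posterior_mean(successes, trials, draws=100_000, seed=42):
    """Estimate the posterior mean of a coin's bias under a flat prior:
    draw candidate biases uniformly (i.e., from the prior) and weight
    each draw by its binomial likelihood (self-normalized sampling)."""
    rng = random.Random(seed)
    num = 0.0
    den = 0.0
    for _ in range(draws):
        p = rng.random()                                      # candidate from the flat prior
        w = p ** successes * (1 - p) ** (trials - successes)  # likelihood weight
        num += w * p
        den += w
    return num / den

# With a flat Beta(1, 1) prior the exact posterior mean is (s + 1) / (n + 2),
# so the simulation can be checked against calc-101 math for once.
est = monte_carlo_posterior_mean(7, 10)
exact = (7 + 1) / (10 + 2)
```

Swap the binomial likelihood for something without a conjugate prior and the code barely changes; the analytical route simply disappears, which is the article’s point.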

Net net: Use what matches the problem. Also, understand the methods. Key word: Understand.

Stephen E Arnold, September 10, 2020

Machine Learning Like A Psychic: Sounds Scientific for 2020

September 8, 2020

DarkCyber thinks most psychics are frauds. They are observers and manipulators of human behavior. They take people’s weaknesses and turn them into profit for themselves. In other words, they do not know the winning lottery numbers, they cannot predict stock options, and they cannot find missing pets.

Machine learning algorithms built on artificial intelligence, however, might have the “powers” psychics claim to have. EurekAlert! has a brand new write up: “Study: Machine Learning Can Predict Market Behavior.” Machine learning algorithms are smart because they were programmed to find and interpret patterns. They can also assess how effective mathematical tools are at predicting financial markets.

Cornell University researchers used a large dataset to determine if a machine learning algorithm could predict future financial events. It is a large task to undertake, because financial markets have tons of information and high volatility. Maureen O’Hara, the Robert W. Purcell Professor of Management at the SC Johnson College of Business, said:

“ ‘Trying to estimate these sorts of things using standard techniques gets very tricky, because the databases are so big. The beauty of machine learning is that it’s a different way to analyze the data,’ O’Hara said. ‘The key thing we show in this paper is that in some cases, these microstructure features that attach to one contract are so powerful, they can predict the movements of other contracts. So we can pick up the patterns of how markets affect other markets, which is very difficult to do using standard tools.’”

Companies exist solely on the basis of understanding how financial markets work, and they have developed their own machine learning algorithms for that very purpose. Cornell’s study used a random forest machine learning algorithm to examine these models using a dataset with 87 futures contracts. The study used every single trade, tens of millions of them, for the analysis. The researchers discovered that some of the variables worked, while others did not.
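The article does not show Cornell’s code, but the random forest idea — many trees, each trained on a bootstrap sample, voting on the outcome — fits in a few lines. Everything below is a hypothetical toy (a synthetic order-imbalance feature and one-feature “stump” trees), not the study’s actual model:

```python
import random

def make_trades(n, seed=0):
    """Synthetic 'microstructure' rows: order-flow imbalance in [-1, 1]
    weakly predicts the sign of the next price move (the label)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        imbalance = rng.uniform(-1, 1)
        noise = rng.gauss(0, 0.5)
        rows.append((imbalance, 1 if imbalance + noise > 0 else 0))
    return rows

def train_forest(rows, n_trees=25, seed=1):
    """Bagged one-feature stumps: each 'tree' sees a bootstrap resample
    and picks the threshold that best separates up-moves from down-moves."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]      # bootstrap resample
        best_t, best_acc = 0.0, 0.0
        for t in [i / 10 - 1 for i in range(21)]:      # candidate thresholds
            acc = sum((x > t) == bool(y) for x, y in sample) / len(sample)
            if acc > best_acc:
                best_t, best_acc = t, acc
        stumps.append(best_t)
    return stumps

def predict(stumps, x):
    """Majority vote across the stumps."""
    return 1 if sum(x > t for t in stumps) > len(stumps) / 2 else 0

rows = make_trades(2000)
forest = train_forest(rows[:1500])
acc = sum(predict(forest, x) == y for x, y in rows[1500:]) / 500
```

A production forest splits on many features and grows deep trees, but the bootstrap-and-vote mechanism is the same, which is also why the result is hard to interpret.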

There are millions of data points available, since every trade has been recorded since 1980. Machine learning interprets this data and makes predictions, but it acts more like a black box. In other words, the algorithms predict patterns but do not reveal how they reach their determinations.

Psychics have tried to predict the future for centuries and have failed. Machine learning algorithms are better at it, but they still are not 100% accurate. Predicting the future still remains consigned to fantasy and science fiction.

Whitney Grace, September 8, 2020

Predictive Policing: A Work in Progress or a Problem in Action?

September 2, 2020

Amid this year’s protests of police brutality, makers of crime-predicting software took the occasion to promote their products as a solution to racial bias in law enforcement. The Markup ponders, “Data-Informed Predictive Policing Was Heralded as Less Biased. Is It?” Writer Annie Gilbertson observes, as we did, that more than 1,400 mathematicians signed on to boycott predictive policing systems. She also describes problems discovered by researchers at New York University’s AI Now Institute:

“‘Police data is open to error by omission,’ [AI Now Director Rashida Richardson] said. Witnesses who distrust the police may be reluctant to report shots fired, and rape or domestic violence victims may never report their abusers. Because it is based on crime reports, the data fed into the software may be less an objective picture of crime than it is a mirror reflecting a given police department’s priorities. Law enforcement may crack down on minor property crime while hardly scratching the surface of white-collar criminal enterprises, for instance. Officers may intensify drug arrests around public housing while ignoring drug use on college campuses. Recently, Richardson and her colleagues Jason Schultz and Kate Crawford examined law enforcement agencies that use a variety of predictive programs. They looked at police departments, including in Chicago, New Orleans, and Maricopa County, Ariz., that have had problems with controversial policing practices, such as stop and frisk, or evidence of civil rights violations, including allegations of racial profiling. They found that since ‘these systems are built on data produced during documented periods of flawed, racially biased, and sometimes unlawful practices and policies,’ it raised ‘the risk of creating inaccurate, skewed, or systemically biased data.’”

The article also looks at a study from 2016 by the Royal Statistical Society. Researchers supplied PredPol’s algorithm with arrest data from Oakland, California, a city where estimated drug use is spread fairly evenly across its diverse neighborhoods. The software’s results would have had officers target Black neighborhoods at about twice the rate of white ones. The team emphasized the documented harm over-policing can cause. The write-up goes on to cover a few more studies on the subject, so navigate there for those details. Gilbertson notes that concerns about these systems are so strong that police departments in at least two major cities, Chicago and Los Angeles, have decided against them. Will others follow suit?

Cynthia Murrell, September 2, 2020

Bias in Biometrics

August 26, 2020

How can we solve bias in facial recognition and other AI-powered biometric systems? We humans could try to correct for it, but guess where AI learns its biases—yep, from us. Researcher Samira Samadi explored whether using a human evaluator would make an AI less biased or, perhaps, even more so. We learn of her project and others in Biometric’s article, “Masks Mistaken for Duct Tape, Researchers Experiment to Reduce Human Bias in Biometrics.” Reporter Luana Pascu writes:

“Curious to understand if a human evaluator would make the process fair or more biased, Samadi recruited users for a human-user study. She taught them about facial recognition systems and how to make decisions about system accuracy. ‘We really tried to imitate a real-world scenario, but that actually made it more complicated for the users,’ Samadi said. The experiment confirmed the difficulty in finding an appropriate dataset with ethically sourced images that would not introduce bias into the study. The research was published in a paper called A Human in the Loop is Not Enough: The Need for Human-Subject Experiments in Facial Recognition.”

Many other researchers are studying the bias problem. One NIST report found a lot of software that produced a 10-fold to 100-fold increase in the probability of Asian and African American faces being inaccurately recognized (though a few systems had negligible differences). Meanwhile, a team at Wunderman Thompson Data found tools from big players Google, IBM, and Microsoft to be less accurate than they had expected. For one thing, the systems had trouble accounting for masks—still a persistent reality as of this writing. The researchers also found gender bias in all three systems, even though the technologies used are markedly different.

There is reason to hope. Researchers in Durham University’s Computer Science Department managed to reduce racial bias by one percent and improve ethnicity accuracy. To achieve these results, the team used a synthesized data set with a higher focus on feature identification. We also learn:

“New software to cut down on demographic differences in face biometric performance has also reached the market. The ethnicity-neutral facial recognition API developed by AIH Technology is officially available in the Microsoft Azure Marketplace. In March, the Canadian company joined the Microsoft Partners Network (MPN) and announced the plans for the global launch of its Facial-Recognition-as-a-Service (FRaaS).”

Bias in biometrics, and AI in general, is a thorny problem with no easy solution. At least now people are aware of the issue and bright minds are working to solve it. Now, if only companies would be willing to delay profitable but problematic implementations until solutions are found. Hmmm.

Cynthia Murrell, August 26, 2020

Informatica: An Old Dog Is Trying to Learn New Tricks?

August 20, 2020

Old dogs. Many people have to pause a moment when standing. Balancing is not the same when one is getting old. Others have to extend an arm, knee, or finger slowly. Joints? Don’t talk about those points of failure to a former athlete. Can bee pollen, a vegan diet, a training session with Glennon Doyle, or an acquisition do the trick?

“Informatica Buys AI Startup for Entity and Schema Matching” explains a digital rejuvenation. The article reports:

Informatica’s latest acquisition extends machine learning capabilities into matching of data entities and schemas.

Entities and schemas are important when fiddling with data. I want to point out that Informatica was founded in 1993 and has been in the data entities and schema business for more than a quarter century. Obviously the future is arriving at the venerable software development company.

The technology employed by Green Bay Technologies is what the article calls “Random Forest” machine learning. The article explains that Green Bay’s method possesses:

the ability to handle more diverse data across different domains, including semi-structured and unstructured data, and a crowd-sourcing approach that improves performance.

The Green Bay method employs:

a machine learning approach where multiple decision trees are run, and then subjected to a crowd sourced consensus process to identify the best results. It is a supervised approach where models are auto generated after the user applies some declarative rules – that is, he or she labels a sample set of record pairs, and from there the system infers “blocking rules” to build the models.
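The quoted workflow — user-labeled sample pairs, inferred blocking rules, a supervised ensemble — can be made concrete with a toy sketch. This is my illustration of the general entity-matching pattern, not Green Bay’s actual system: a token-overlap blocking rule prunes the cross product, and a similarity cutoff is learned from labeled pairs (a stand-in for the decision-tree consensus step).

```python
def tokens(s):
    return set(s.lower().split())

def jaccard(a, b):
    """Token-overlap similarity between two record strings."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def block(records_a, records_b):
    """Blocking rule: only compare records that share at least one token,
    instead of scoring the full cross product."""
    index = {}
    for j, r in enumerate(records_b):
        for t in tokens(r):
            index.setdefault(t, set()).add(j)
    pairs = set()
    for i, r in enumerate(records_a):
        for t in tokens(r):
            for j in index.get(t, ()):
                pairs.add((i, j))
    return pairs

def learn_threshold(labeled_pairs):
    """Supervised step: pick the similarity cutoff that best separates
    the user-labeled match / non-match sample pairs."""
    best_t, best_acc = 0.5, 0.0
    for t in [i / 20 for i in range(1, 20)]:
        acc = sum((jaccard(a, b) >= t) == m
                  for a, b, m in labeled_pairs) / len(labeled_pairs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

labeled = [
    ("Acme Corp", "Acme Corporation", True),
    ("Acme Corp", "Apex Industries", False),
    ("Global Widgets Inc", "Global Widgets", True),
    ("Global Widgets Inc", "Local Gadgets Inc", False),
]
cutoff = learn_threshold(labeled)
```

The real system learns tree ensembles over many similarity features rather than one threshold, but the division of labor — cheap blocking first, learned matching second — is the part that scales.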

Informatica will add Green Bay’s capabilities to its existing smart software engine called CLAIRE.

The write up does not dig into issues related to performance, overfitting, or dealing with rare outcomes or predictors.

Glennon Doyle does not dwell on her flaws either.

Stephen E Arnold, August 20, 2020

Predictive Analytics: A Time and a Place, Not Just in LE?

August 17, 2020

The concept seems sound: analyze data from past crimes to predict future crimes and stop them before they happen. In practice, however, the reality is not so simple, as Popular Mechanics explains in “Why Hundreds of Mathematicians Are Boycotting Predictive Policing.” Academic mathematicians are in a unique position—many were brought into the development of predictive policing algorithms in 2016 by The Institute for Computational and Experimental Research in Mathematics (ICERM). One of the partners, PredPol, makes and sells predictive policing tools. Reporter Courtney Linder informs us:

“Several prominent academic mathematicians want to sever ties with police departments across the U.S., according to a letter submitted to Notices of the American Mathematical Society on June 15. The letter arrived weeks after widespread protests against police brutality, and has inspired over 1,500 other researchers to join the boycott. These mathematicians are urging fellow researchers to stop all work related to predictive policing software, which broadly includes any data analytics tools that use historical data to help forecast future crime, potential offenders, and victims. … Some of the mathematicians include Cathy O’Neil, author of the popular book Weapons of Math Destruction, which outlines the very algorithmic bias that the letter rallies against. There’s also Federico Ardila, a Colombian mathematician currently teaching at San Francisco State University, who is known for his work to diversify the field of mathematics.”

Linder helpfully explains what predictive policing is and how it came about. The embedded four-minute video is a good place to start (interestingly, it is produced from a pro-predictive policing point of view). The article also details why many object to the use of this technology. Chicago’s Office of the Inspector General has issued an advisory with a list of best practices to avoid bias, while Santa Cruz has banned the software altogether. We’re told:

“The researchers take particular issue with PredPol, the high-profile company that helped put on the ICERM workshop, claiming in the letter that its technology creates racist feedback loops. In other words, they believe that the software doesn’t help to predict future crime, but instead reinforces the biases of the officers.”

Structural bias also comes into play, as well as the consideration that some crimes go underreported, skewing data. The piece wraps up by describing how widespread this technology is, an account that can be summarized by quoting PredPol’s own claim that one in 33 Americans is “protected” by its software.

With physics, and with disciplines like Google’s online advertising, resting on probabilities and predictive analytics, what is the scientific limit on real world applications? Subjective perceptions?

Cynthia Murrell, August 17, 2020

Search and Predicting Behavior

August 3, 2020

DarkCyber is interested in predictive analytics. Bayesian and other “statistical methods” are a go-to technique, and they find their way into many of the smart software systems. Developers rarely explain that systems share many features and functions. Marketers, usually kept in the dark like mushrooms, are free to formulate an interesting assertion or two.

I read “Google Searches During Pandemic Hint at Future Increase in Suicide,” and I was not sure about the methodology. Nevertheless, the write up provides some insight into what can be wiggled from Google search data.

Specifically, Columbia University experts have concluded that financial distress is “strongly linked to suicide.”


I learned:

The researchers used an algorithm to analyze Google trends data from March 3, 2019, to April 18, 2020, and identify proportional changes over time in searches for 18 terms related to suicide and known suicide risk factors.

What algorithm?

The method is described this way:

The proportion of queries related to depression was slightly higher than the pre-pandemic period, and moderately higher for panic attack.

Perhaps the researchers looked at the number of searches and noted the increase? So comparing raw numbers? Tenure tracks and grants await! Because that leap between search and future behavior…
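Whatever the researchers actually did, “proportional changes over time” in query volume is easy to compute once a denominator is chosen. A toy sketch (the term counts are invented; Google Trends actually reports normalized indices, not raw counts):

```python
def query_share(counts, term):
    """Share of all tracked searches accounted for by one term."""
    total = sum(counts.values())
    return counts[term] / total if total else 0.0

def proportional_change(before, after, term):
    """Relative change in a term's share of search volume between periods."""
    b = query_share(before, term)
    a = query_share(after, term)
    return (a - b) / b if b else float("inf")

# Hypothetical weekly counts for two tracked terms, pre- and mid-pandemic.
pre = {"panic attack": 120, "depression": 300}
mid = {"panic attack": 180, "depression": 330}
change = proportional_change(pre, mid, "panic attack")  # about +24 percent
```

Note what the arithmetic does and does not do: it measures how a term’s share moved, which says nothing by itself about whether the searchers go on to act.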

Stephen E Arnold, August 3, 2020

Off the Shelf Fancy Math

July 17, 2020

Did you wish you were in the Student Union instead of the engineering lab? Did you long for hanging out with your besties instead of sitting in the library trying to get some answer, any answer, to a differential equation? Answer “yes” to either question, and you will enjoy “Algorithms Are Now Commodities.” The write up states:

Algorithms are now like the bolts in a bridge: very important, but nobody talks about them. Today developers talk about story points, features, business logic, etc. Given a well-defined problem, many are now likely to search for an existing package, rather than write code from scratch (I certainly work this way). New algorithms are still being invented, and researchers continue to look for improvements to existing algorithms. This is a niche activity. There are companies where algorithms are not commodities.

The author points out:

Algorithms have not yet completed their journey to obscurity, which has to wait until people can tell computers what they want and not be concerned about the implementation details (or genetic algorithm programming gets a lot better).

With productized code modules proliferating like Star Trek’s Tribbles, math is on the way to the happy condition of a mouse click.

One commenter pointed out:

This is as misguided as a chef claiming recipes are now commodities, and the common chef need not be familiar with any. As with cooking, any organized programming of a machine necessarily involves algorithms, although lesser programmers won’t notice them.—Verisimilitude

This individual then pointed out:

The ‘chefs’ in most restaurants heat precooked components of a meal and combine them on the plate. Progress requires being able to treat what used to be important as commonplace.

An interesting topic. Amazon, among others, is pushing hard toward the “off the shelf” and “ready to consume” approach to a number of computer centric functions.

Push the wrong button, then what? An opportunity to push another button and pay again. Iteration is the name of the game, not figuring out mere exercise problems.

Stephen E Arnold, July 17, 2020

Smart Software and an Intentional Method to Increase Revenue

July 6, 2020

Navigate to the excellent write up titled “How Researchers Analyzed Allstate’s Car Insurance Algorithm.” My suggestion? Read it.

The “how to” information is detailed and instructive. The article reveals the thought process and logical thinking that allow a giant company with “good hands” to manipulate its revenues.

Here’s the most important statement in the article:

In other words, it appears that Allstate’s algorithm built a “suckers list” that would simply charge the big spenders even higher rates.

The information in the article illustrates how difficult it may be for outsiders to figure out how some smart numerical procedures are assembled into “intentional machines.”

The idea is that data allow the implementation of quite simple big ideas in a slick, automated, obfuscated way.
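The Markup’s analysis found, in essence, that proposed prices tracked what a customer was already paying rather than pure risk. The big idea really is simple. Here is a deliberately crude caricature of such a rule (my illustration, not Allstate’s actual model):

```python
def proposed_premium(current, risk_based):
    """Caricature of a 'suckers list' pricing rule: customers already paying
    a lot get only a token discount even when the risk model says they are
    overpaying; everyone else simply gets the risk-based price."""
    if risk_based < current and current > 1500:
        return round(current * 0.995, 2)   # cap the decrease at 0.5 percent
    return risk_based

# A big spender owed a large decrease barely sees one...
high = proposed_premium(2000, 1200)
# ...while a modest account gets the full risk-based price.
low = proposed_premium(800, 700)
```

Bury a rule like this under layers of automated rate-filing machinery and it becomes the “slick, automated, obfuscated” implementation the paragraph above describes, visible to outsiders only through the kind of statistical reconstruction the article documents.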

As my cranky grandfather observed, “It all comes down to money.”

Stephen E Arnold, July 6, 2020
