Google and Its Hard-to-Believe Excuse of the Week

May 27, 2020

I taught for a year or two when I was in graduate school. Did I ever hear a student say, “My dog ate my homework”? I sure did. I heard other excuses as well; for example, “I was shot on Thanksgiving Day” (a true statement; the student showed me the bullet wound in his arm). I also heard, “I had to watch my baby sister, and she was sick, so I couldn’t do the homework.” True? As it turned out, the kid was an only child.

But I never heard, “The algorithm did it.”

Believe it or not, Engadget reported this: “YouTube Blames Bug for Censoring Comments on China’s Ruling Party.” I think Engadget should have written “about China’s,” but these real journalists use Grammarly, like, you know.

The article states:

Whatever way the error made its way into YouTube, Google has been slow to address it.

For DarkCyber, the important point is that software, not a work-from-home or soon-to-be-RIFed human, made the error.

The Google plays the “algorithm did it” card.

Despite Google’s wealth of smart software, the company’s voice technology has said nothing about the glitch.

Stephen E Arnold, May 27, 2020

Discrete Mathematics: Free and Useful

May 18, 2020

DarkCyber notes that Oscar Levin’s Discrete Mathematics: An Open Introduction can be downloaded from this link. The volume, now in its third edition, includes new exercises. Our favorite section addresses Graph Theory. There are exercises and old chestnuts like coloring a map. If you want to work on those policeware relationship maps, you will find the book useful. In fact, the book provides enough information to allow one to understand the basics of determining who knows whom and other interesting insights into data from Facebook-type aggregators.
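
As a minimal sketch of the who-knows-whom idea, assuming nothing more than plain Python dictionaries (the names are invented for illustration), the basic relationship-map query is a set intersection over an adjacency list:

```python
# Toy adjacency list: who is connected to whom (invented names).
contacts = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob", "eve"},
    "dave": {"alice"},
    "eve": {"carol"},
}

def common_contacts(a, b):
    """People known to both a and b -- the basic relationship-map query."""
    return contacts[a] & contacts[b]

print(common_contacts("alice", "bob"))  # {'carol'}
```

Levin’s chapters supply the vocabulary (vertices, edges, neighborhoods) that turns this toy into the real analytical methods policeware vendors sell.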

Stephen E Arnold, May 18, 2020

Bayesian Math: Useful Book Is Free for Personal Use

May 11, 2020

The third edition of Bayesian Data Analysis (updated on February 13, 2020) is available at this link. The authors are Andrew Gelman, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. With Bayes’ principles in hand, making sense of some of the modern smart systems becomes somewhat easier. The book covers the basics as well as advanced computation. One of the more interesting sections is Part V: Nonlinear and Nonparametric Models. You may want to add this to your library.
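
For the flavor of what the book builds on, here is a minimal sketch of Bayes’ rule in Python, with invented numbers for a diagnostic-test example:

```python
# Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E), with invented numbers.
p_h = 0.01              # prior: 1% of the population has the condition
p_e_given_h = 0.95      # likelihood: test is positive 95% of the time if present
p_e_given_not_h = 0.05  # false positive rate

# Total probability of a positive test (law of total probability).
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

posterior = p_e_given_h * p_h / p_e
print(f"P(condition | positive test) = {posterior:.3f}")  # about 0.161
```

The counterintuitive smallness of that posterior is exactly the kind of result the book trains the reader to expect.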

Stephen E Arnold, May 11, 2020

LAPD Shutters Predictive Policing During Shutdown

May 7, 2020

Police departments are not immune to the economic impact of this pandemic. We learn the Los Angeles Police Department is shutting down its predictive policing program, at least for now, in TechDirt’s write-up, “LAPD’s Failed Predictive Policing Program the Latest COVID-19 Victim.” Writer Tim Cushing makes it perfectly clear he has never been a fan of the analytics approach to law enforcement:

“For the most part, predictive policing relies on garbage data generated by garbage cops, turning years of biased policing into ‘actionable intel’ by laundering it through a bunch of proprietary algorithms. More than half a decade ago, early-ish adopters were expressing skepticism about the tech’s ability to suss out the next crime wave. For millions of dollars less, average cops could have pointed out hot crime spots on a map based on where they’d made arrests, while still coming nothing close to the reasonable suspicion needed to declare nearly everyone in a high crime area a criminal suspect. The Los Angeles Police Department’s history with the tech seems to indicate it should have dumped it years ago. The department has been using some form of the tech since 2007, but all it seems to be able to do is waste limited law enforcement resources to violate the rights of Los Angeles residents. The only explanations for the LAPD’s continued use of this failed experiment are the sunk cost fallacy and its occasional use as a scapegoat for the department’s biased policing.”

Now, though, an April 15 memo from the LAPD declares the department is ceasing to use the PredPol software immediately due to COVID-19-related financial constraints. As one might suppose, Cushing hopes the software will remain off the table once the shutdown is lifted. Hey, anything is possible.

Cynthia Murrell, May 7, 2020

Google Recommendations: A Digital Jail Cell?

May 5, 2020

A team of researchers at the Centre Marc Bloch in Berlin has closely studied filter bubbles (scientifically called “confinement”) on YouTube. While the phenomenon of filter bubbles across the Web has been a topic of study for several years, scientists Camille Roth, Antoine Mazieres, and Telmo Menezes felt the role of the recommendation algorithm on YouTube had been under-examined. In performing research to plug this gap, they found the dominant video site may produce the most confining bubbles of all. The team shares its main results in “Tubes and Bubbles: Topological Confinement of Recommendations on YouTube.” They summarize:

“Contrarily to popular belief about so-called ‘filter bubbles’, several recent studies show that recommendation algorithms generally do not contribute much, if at all, to user confinement; in some cases, they even seem to increase serendipity [see e.g., 1, 2, 3, 4, 5, 6]. Our study demonstrates however that this may not be the case on YouTube: be it in topological, topical or temporal terms, we show that the landscape defined by non-personalized YouTube recommendations is generally likely to confine users in homogeneous clusters of videos. Besides, content for which confinement appears to be most significant also happens to garner the highest audience and thus plausibly viewing time.”

The abstract to the team’s paper on the study describes their approach:

“Starting from a diverse number of seed videos, we first describe the properties of the sets of suggested videos in order to design a sound exploration protocol able to capture latent recommendation graphs recursively induced by these suggestions. These graphs form the background of potential user navigations along non-personalized recommendations. From there, be it in topological, topical or temporal terms, we show that the landscape of what we call mean-field YouTube recommendations is often prone to confinement dynamics.”
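
The exploration protocol amounts to a breadth-first crawl of the recommendation graph. A hedged sketch of the general idea (not the authors’ code; `get_recommendations` and the toy data are stand-ins for whatever scraper or API call actually returns a video’s suggestions):

```python
from collections import deque

# Toy stand-in for the real recommendation lookup; plumbing omitted on purpose.
TOY_RECS = {
    "seed": ["a", "b"],
    "a": ["b", "c"],
    "b": ["a"],
    "c": ["a", "seed"],
}

def get_recommendations(video_id):
    return TOY_RECS.get(video_id, [])

def crawl_recommendation_graph(seed, depth=3):
    """Breadth-first crawl recording the recommendation graph
    recursively induced by the suggestions, per the paper's protocol."""
    graph = {}                     # video_id -> list of recommended ids
    queue = deque([(seed, 0)])
    seen = {seed}
    while queue:
        video, level = queue.popleft()
        if level >= depth:
            continue
        recs = get_recommendations(video)
        graph[video] = recs
        for rec in recs:
            if rec not in seen:
                seen.add(rec)
                queue.append((rec, level + 1))
    return graph

print(crawl_recommendation_graph("seed"))
```

Confinement then becomes a question about the shape of the resulting graph: do the crawls keep looping back into the same small cluster of videos?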

To read about the study in great scientific detail, complete with illustrations, turn to the full paper published at the PLOS ONE peer-reviewed journal site. Established in 2012, the Centre Marc Bloch’s Computational Social Science Team enlists social scientists alongside computer scientists and modelers to study the social dynamics of today’s digital landscapes. If you are curious about what, exactly, that means, the team’s page includes an interesting five-minute video describing its work.

Cynthia Murrell, May 5, 2020

Google and Its Mutating Smart Software

April 20, 2020

Google announced quantum supremacy. Earlier, the online ad company asserted that it would solve death. Yeah.

Now the company has announced YAB or yet another breakthrough, according to “Google Engineers ‘Mutate’ AI to Make It Evolve Systems Faster Than We Can Code Them”:

For years, engineers at Google have been working on a freakishly smart machine learning system known as the AutoML system (or automatic machine learning system), which is already capable of creating AI that outperforms anything we’ve made. Now, researchers have tweaked it to incorporate concepts of Darwinian evolution and shown it can build AI programs that continue to improve upon themselves faster than they would if humans were doing the coding.

You can read about this mutating wonder in “AutoML-Zero: Evolving Machine Learning Algorithms From Scratch.”
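
The core idea, evolving programs by random mutation and selection rather than by hand, fits in a few lines. Here is a toy illustration of the Darwinian loop the article describes (this is not Google’s AutoML-Zero code; it merely evolves two coefficients to fit an invented target function):

```python
import random

def fitness(candidate, data):
    """Negative squared error of candidate (a, b) on y = a*x + b."""
    a, b = candidate
    return -sum((a * x + b - y) ** 2 for x, y in data)

def mutate(candidate, scale=0.1):
    """Randomly perturb one coefficient -- the 'mutation' step."""
    a, b = candidate
    if random.random() < 0.5:
        a += random.gauss(0, scale)
    else:
        b += random.gauss(0, scale)
    return (a, b)

# Target: y = 2x + 1. Evolve a population toward it.
data = [(x, 2 * x + 1) for x in range(10)]
population = [(random.random(), random.random()) for _ in range(20)]
for _ in range(500):
    population.sort(key=lambda c: fitness(c, data), reverse=True)
    survivors = population[:10]                      # selection
    population = survivors + [mutate(random.choice(survivors))
                              for _ in range(10)]    # reproduction + mutation
print(max(population, key=lambda c: fitness(c, data)))  # approaches (2, 1)
```

AutoML-Zero applies the same select-and-mutate loop to entire learning algorithms instead of two numbers, which is where the “evolving faster than we can code them” claim comes from.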

DarkCyber assumes that the use of AutoML will allow Google to solve the death thing. However, it may be difficult for the Googlers’ method to cope with declining advertising revenue and increasing infrastructure costs.

Stephen E Arnold, April 20, 2020

Mr. Bayes and Mr. Occam, Still Popular after All These Years

April 14, 2020

In the early 2000s, I met a self-appointed expert who turned red when he talked about the good Reverend Bayes. The spark for his crimson outrage was the idea that one would make assumptions about the future and plug those assumptions into the good Reverend’s centuries-old formula. If you have forgotten it, here it is:

P(A | B) = P(B | A) × P(A) / P(B)

Why the ire? I think the person was annoyed with Autonomy’s use of the theorem in its enterprise search system and neuro-linguistic marketing. On top of that, if not trained in an appropriate manner and then retrained, Autonomy’s integrated data operating layer would drift; that is, return results less precise than before. Most licensees were not able to get their manicured nails into this concept of retraining. As a result, the benighted would rail at the flaws of the UK’s first big software company that seemed to make money.

And Occam? Well, this self-appointed expert (akin to a person who gets a PhD online and then wants everyone to call him or her “doctor”) did not know about William and his novacula Occami. This church person lived several centuries before the good Reverend Bayes. William’s big idea was KISS: keep it simple, stupid. One of my now deceased instructors loved to call this lex parsimoniae, but in this blog KISS is close enough for horseshoes. (A variant of this game may have been played a century before Willie was born in the 1280s.)

So what?

I read “How Should We Model Complex Systems?” The write up, in my opinion, makes the case for the good Reverend’s method with a dash of Willie’s as guiding principles. No doubt the self-appointed expert will be apoplectic if he reads this blog post. But the highlight of the write up is a comment by Yandon Zhang: the failings of modeling reality can be addressed in part by adding more data.

That is a triple play: Bayes, Willie, and more data.

The result? Models are good enough despite the fancy math that many firms layer on these three legs of the predicting footstool.

What’s that mean in reality? Something is better than nothing. What is often overlooked is that guessing or operating in the manner of Monte Carlo might generate results closer to reality. Want an example? Maybe Covid models?
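
A minimal sketch of what “operating in the manner of Monte Carlo” means: estimate a quantity by random sampling rather than a fitted model. The classic pi example below is purely illustrative and has nothing to do with any particular Covid model:

```python
import random

def estimate_pi(samples=1_000_000):
    """Monte Carlo: the fraction of random points in the unit square
    that land inside the quarter circle approximates pi/4."""
    inside = sum(1 for _ in range(samples)
                 if random.random() ** 2 + random.random() ** 2 <= 1.0)
    return 4 * inside / samples

print(estimate_pi())  # roughly 3.14
```

Dumb sampling, honestly reported, can beat an elaborate model whose assumptions are wrong.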

Stephen E Arnold, April 14, 2020

Forget Weak Priors, Certain Predictive Methods Just Fail

April 2, 2020

Nope. No equations. No stats speak. Tested predictive models were incorrect.

Navigate to “Researchers Find AI Is Bad at Predicting GPA, Grit, Eviction, Job Training, Layoffs, and Material Hardship.” Here’s the finding, which is delightfully clear:

A paper coauthored by over 112 researchers across 160 data and social science teams found that AI and statistical models, when used to predict six life outcomes for children, parents, and households, weren’t very accurate even when trained on 13,000 data points from over 4,000 families.

So what? The write up states, in a quote from the author of the paywalled paper:

“Here’s a setting where we have hundreds of participants and a rich data set, and even the best AI results are still not accurate,” said study co-lead author Matt Salganik, a professor of sociology at Princeton and interim director of the Center for Information Technology Policy at the Woodrow Wilson School of Public and International Affairs. “These results show us that machine learning isn’t magic; there are clearly other factors at play when it comes to predicting the life course.”

We noted this comment from a researcher at Princeton University:

In the end, even the best of the over 3,000 models submitted — which often used complex AI methods and had access to thousands of predictor variables — weren’t spot on. In fact, they were only marginally better than linear regression and logistic regression, which don’t rely on any form of machine learning.
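
For context, the “baselines” in question are ordinary regression fits. A hedged sketch on synthetic stand-in data (the Fragile Families data itself is access-restricted), assuming scikit-learn is available:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 4,000 'families' with 50 noisy predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 50))
y_cont = X @ rng.normal(size=50) + rng.normal(scale=5.0, size=4000)  # e.g., GPA
y_bin = (y_cont > np.median(y_cont)).astype(int)                     # e.g., eviction

X_tr, X_te, yc_tr, yc_te, yb_tr, yb_te = train_test_split(
    X, y_cont, y_bin, random_state=0)

# The two non-machine-learning baselines the write up mentions.
print("linear R^2:", LinearRegression().fit(X_tr, yc_tr).score(X_te, yc_te))
print("logistic acc:", LogisticRegression(max_iter=1000).fit(X_tr, yb_tr)
      .score(X_te, yb_te))
```

The study’s point is that thousands of fancier submissions barely improved on fits this simple.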

Several observations:

  1. Nice work, AAAS. Keep advancing science by paywalling research germane to criminal justice and policeware.
  2. Over-inflation of the “value” of model outputs is common in marketing. DarkCyber thinks the weaknesses of these methods need more attention than a few interviews with people like Cathy O’Neil, author of Weapons of Math Destruction, can provide.
  3. Are those afflicted with innumeracy willing to delegate important decisions to procedures that are worse than relying on luck, flipping a coin, or Monte Carlo methods?

Net net: No one made accurate predictions. Yep, no one. Thought-stimulating research with implications for predictive analytics adherents. This open source paper provides some of the information referenced in the AAAS paper: “Measuring the Predictability of Life Outcomes with a Scientific Mass Collaboration.”

Stephen E Arnold, April 2, 2020

Israel and Mobile Phone Data: Some Hypotheticals

March 19, 2020

DarkCyber spotted a story in the New York Times: “Israel Looks to Repurpose a Trove of Cell Phone Data.” The story appeared in the dead tree edition on March 17, 2020, and you can access the online version of the write up at this link.

The write up reports:

Prime Minister Benjamin Netanyahu of Israel authorized the country’s internal security agency to tap into a vast, previously undisclosed trove of cell phone data to retrace the movements of people who have contracted the coronavirus and identify others who should be quarantined because their paths crossed.

Okay, cell phone data. Track people. Paths crossed. So what?

Apparently not much.

The Gray Lady does the handwaving about privacy and the fragility of democracy in Israel. There’s a quote about the need for oversight when certain specialized data are retained and then made available for analysis. Standard journalism stuff.

DarkCyber’s team talked about the write up and what the real journalists left out of the story. Remember. DarkCyber operates from a hollow in rural Kentucky and knows zero about Israel’s data collection realities. Nevertheless, my team was able to identify some interesting use cases.

Let’s look at a couple and conclude with a handful of observations.

First, the idea of retaining cell phone data is not exactly a new one. What if these data can be extracted using an identifier for a person of interest? What if a time-series query could extract the geolocation data for each movement of the person of interest captured by a cell tower? What if this path could be displayed on a map? Here’s a dummy example of what the plot for a single person of interest might look like. Please note these graphics are examples selected from open sources. They are not related to any single investigation or vendor and are for illustrative purposes only.

image

Source: Standard mobile phone tracking within a geofence. Map with blue lines showing a person’s path. SPIE at https://bit.ly/2TXPBby

Useful indeed.
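
In code, that first hypothetical is a simple query over retained call-detail-style records. A sketch, assuming pandas and an invented column layout (`subscriber_id`, `timestamp`, `lat`, `lon` are illustrative names, not any vendor’s schema):

```python
import pandas as pd

def movement_path(records: pd.DataFrame, person_id: str) -> pd.DataFrame:
    """Extract one subscriber's tower hits in time order -- the raw
    material for the path-on-a-map plot. Column names are invented."""
    path = records[records["subscriber_id"] == person_id]
    return path.sort_values("timestamp")[["timestamp", "lat", "lon"]]
```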

Second, what if the intersection of two or more individuals’ paths can be plotted? Here’s a simulation of such a path intersection:

image

Source: Map showing the location of a person’s mobile phone over a period of time. Tyler Bell at https://bit.ly/2IVqf7y

Would these data provide a way to identify an individual with a mobile phone who was in “contact” with a person of interest? Would the authorities be able to perform additional analyses to determine who is in either party’s social network?
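
The “paths crossed” question is then a join on time and place. Another rough sketch, again with invented column names and arbitrary thresholds that are illustrative, not operational values:

```python
import pandas as pd

def crossed_paths(records: pd.DataFrame, person_a: str, person_b: str,
                  minutes: int = 15, degrees: float = 0.001) -> pd.DataFrame:
    """Rows where two subscribers were near the same place at nearly
    the same time. Thresholds are made up for illustration."""
    a = records[records["subscriber_id"] == person_a]
    b = records[records["subscriber_id"] == person_b]
    merged = a.merge(b, how="cross", suffixes=("_a", "_b"))
    close_in_time = (merged["timestamp_a"] - merged["timestamp_b"]).abs() \
        <= pd.Timedelta(minutes=minutes)
    close_in_space = ((merged["lat_a"] - merged["lat_b"]).abs() <= degrees) & \
                     ((merged["lon_a"] - merged["lon_b"]).abs() <= degrees)
    return merged[close_in_time & close_in_space]
```

Feed the resulting pairs into a link-analysis tool, and the social-network question answers itself.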

Third, could these relationship data be mined so that connections can be further explored?

image

Source:  Diagram of people who have crossed paths visualized via Analyst Notebook functions. Globalconservation.org

Can these data be arrayed on a timeline? Can the routes be converted into an animation that shows a particular person of interest’s movements within a specific window of time?

image

Source: Vertical dots diagram from Recorded Future showing events on a timeline. https://bit.ly/39Xhbex
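
Arranging the same records on a timeline is a one-liner with pandas; an animation is just iteration over the resulting time buckets. A sketch with the same invented columns as above:

```python
import pandas as pd

def timeline(records: pd.DataFrame, person_id: str,
             freq: str = "1h") -> pd.Series:
    """Bucket one subscriber's tower hits into fixed windows -- the raw
    series behind a timeline or movement animation. Columns are invented."""
    hits = records[records["subscriber_id"] == person_id]
    return hits.set_index("timestamp").resample(freq)["lat"].count()
```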

These hypothetical displays of data derived from cross-correlations, geotagging, and timeline generation based on date stamps seem feasible. If earnest individuals in rural Kentucky can see the value of these “secret” data disclosed in the New York Times’ article, why didn’t the journalist and the others who presumably read the story see it too?

What’s interesting is that systems, methods, and tools clearly disclosed in open source information are overlooked, ignored, or just not understood.

Now the big question: Do other countries have these “secret” troves of data?

DarkCyber does not know; however, it seems possible. Log files are a routine byproduct of data processing, and data exhaust may have value.

Stephen E Arnold, March 19, 2020

Machine Learning Foibles: Are We Surprised? Nope

March 18, 2020

Eurekalert published “Study Shows Widely Used Machine Learning Methods Don’t Work As Claimed.” Imagine that? The article states:

Researchers demonstrated the mathematical impossibility of representing social networks and other complex networks using popular methods of low-dimensional embeddings.

To put the allegations, and maybe the mathematical proof, in context: there are many machine learning methods and even more magical thresholds the data whiz kids fiddle with to generate acceptable outputs. The idea is that as long as the outputs are “good enough,” the training method is okay to use. Statistics is just math with some good old-fashioned “thumb on the scale” opportunities.

The article states:

The study evaluated techniques known as “low-dimensional embeddings,” which are commonly used as input to machine learning models. This is an active area of research, with new embedding methods being developed at a rapid pace. But Seshadhri and his coauthors say all these methods share the same shortcomings.

What are the shortcomings?

Seshadhri and his coauthors demonstrated mathematically that significant structural aspects of complex networks are lost in this embedding process. They also confirmed this result empirically by testing various embedding techniques on different kinds of complex networks.

The method discards or ignores information, relying on a fuzz ball which puts an individual into a “geometric representation.” Individuals’ social connections are lost in the fuzzification procedures.
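
The flavor of the argument can be reproduced in a few lines: embed a sparse, triangle-rich network in low dimensions, reconstruct it, and watch the local structure vanish. A sketch (an empirical echo of the claim, not the authors’ proof), assuming numpy and networkx:

```python
import numpy as np
import networkx as nx

# A sparse network with many triangles and low degrees -- the regime
# the paper says low-dimensional embeddings cannot capture.
G = nx.connected_watts_strogatz_graph(200, 6, 0.1, seed=42)
A = nx.to_numpy_array(G)

# Low-dimensional embedding via truncated SVD of the adjacency matrix.
d = 8  # embedding dimension, picked arbitrarily for illustration
U, s, Vt = np.linalg.svd(A)
A_hat = U[:, :d] @ np.diag(s[:d]) @ Vt[:d, :]

# Threshold the reconstruction back into a graph; compare local structure.
A_thr = (A_hat > 0.5).astype(int)
np.fill_diagonal(A_thr, 0)  # drop self-loops introduced by reconstruction
G_hat = nx.from_numpy_array(A_thr)
print("original triangles:", sum(nx.triangles(G).values()) // 3)
print("embedded triangles:", sum(nx.triangles(G_hat).values()) // 3)
```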

Big deal. Sort of. The paper opens the door to many graduate students’ beavering away on the “accuracy” of machine learning procedures.

Stephen E Arnold, March 18, 2020
