Facial Recognition: A Partial List

June 3, 2020

DarkCyber noted “From RealPlayer to Toshiba, Tech Companies Cash in on the Facial Recognition Gold Rush.” The write up provides two interesting things and one idea which is like a truck tire retread.

First, the write up points out that facial recognition or FR is a “gold rush.” That’s a comparison which eluded the DarkCyber research team. There’s no land. No seller of heavy duty pants. No beautiful scenery. No wading in cold water. No hydro mining. Come to think of it, FR is not like a gold rush.

Second, the write up provides a partial list of outfits engaged in facial recognition. The word partial is important. There are some notable omissions, but 45 is an impressive number. That’s the point. Just 45?

The aspect of the write up the DarkCyber team could not ignore is this “from the MBA classroom” observation:

Despite hundreds of vendors currently selling facial recognition technology across the United States, there is no single government body registering the technology’s rollout, nor is there a public-facing list of such companies working with law enforcement. To document which companies are selling such technology today, the best resource the public has is a governmental agency called the National Institute of Standards and Technology.

Governments are doing a wonderful job it seems. Perhaps the European Union should step forward? What about Brazil? China? Russia? The United Nations? With Covid threats apparently declining, maybe the World Health Organization? Yep, governments.

Then, hard on the heels of the wish for a central listing of FR vendors, this passage snagged the attention of one of my researchers:

NIST is a government organization responsible for setting scientific measurement standards and testing novel technology. As a public service, NIST also provides a rolling analysis of facial recognition algorithms, which evaluates the accuracy and speed of a vendor’s algorithms. Recently, that analysis has also included aspects of the facial recognition field like algorithmic bias based on race, age, and sex. NIST has previously found evidence of bias in a majority of algorithms studied.

Yep, NIST. The group has done an outstanding job for enterprise search. Plus the bias in algorithms has been documented and run through the math grinding wheel for many years. Put in snaps of bad actors and the FR system does indeed learn to match one digital watermark with a similar digital watermark. Run kindergarten snaps through the system and FR matches are essentially useless. Bias? Sure enough.
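To make the gallery point concrete, here is DarkCyber’s toy sketch. It is a hypothetical illustration, not any vendor’s pipeline: the embeddings are random stand-ins for what a face-encoding model would produce, and the names are invented. The point is that a matcher can only ever answer with an identity from the gallery it was handed.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_match(probe, gallery, threshold=0.8):
    """Return the gallery identity closest to the probe embedding.

    The matcher can only answer with someone already in the gallery, so
    the gallery's composition (bad actors, kindergartners, whoever) fixes
    the universe of possible "matches."
    """
    scored = [(name, cosine_similarity(probe, emb)) for name, emb in gallery.items()]
    name, score = max(scored, key=lambda pair: pair[1])
    return (name, score) if score >= threshold else (None, score)

# Hypothetical 128-dimensional embeddings; a real system would compute these
# with a face-encoding model, not a random number generator.
rng = np.random.default_rng(0)
gallery = {f"subject_{i}": rng.normal(size=128) for i in range(5)}
probe = gallery["subject_3"] + rng.normal(scale=0.05, size=128)  # noisy re-capture
print(best_match(probe, gallery))
```

Feed the gallery kindergarten snaps and the “matches” are still confident; they are just useless. Garbage in, garbage out.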

Consider these ideas:

  • An organization, maybe Medium, should build a database of FR companies.
  • An organization, maybe Medium, should test each of the FR systems using available datasets or, better yet, by building a training set.
  • An organization, maybe Medium, should set up a separate public policy blog to track government organizations which are not doing the job to Medium’s standards.

There is an interest in facial recognition because there is a need to figure out who is who. There are some civil disturbances underway in a certain high profile country. FR systems may not be perfect, but they may offer a useful tool to some. On the other hand, why not abandon modern tools until they are perfect?

We live in an era of good enough, and that’s what is available.

Stephen E Arnold, June 3, 2020

Google and Its Hard-to-Believe Excuse of the Week

May 27, 2020

I taught for one or two years when I was in graduate school. Did I ever hear a student say, “My dog ate my homework”? I sure did. I heard other excuses as well; for example, “I was shot on Thanksgiving Day” (a true statement; the student showed me the bullet wound in his arm). I also heard, “I had to watch my baby sister, and she was sick, so I couldn’t get the homework done.” True. As it turned out, the kid was an only child.

But I never heard, “The algorithm did it.”

Believe it or not, Engadget reported this: “YouTube Blames Bug for Censoring Comments on China’s Ruling Party.” I think Engadget should have written “about China’s” but these real journalists use Grammarly, like, you know.

The article states:

Whatever way the error made its way into YouTube, Google has been slow to address it.

For DarkCyber, the important point is that software, not a work-from-home or soon-to-be-RIFed human, made the error.

The Google plays the “algorithm did it” card.

Despite Google’s wealth of smart software, the company’s voice technology has said nothing about the glitch.

Stephen E Arnold, May 27, 2020

Discrete Mathematics: Free and Useful

May 18, 2020

DarkCyber notes that Oscar Levin’s Discrete Mathematics: An Open Introduction can be downloaded from this link. The volume, now in its third edition, includes new exercises. Our favorite section addresses Graph Theory. There are exercises and old chestnuts like coloring a map. If you want to work on those policeware relationship maps, you will find the book useful. In fact, the book provides enough information to allow one to understand the basics of determining who knows whom and other interesting insights into data from Facebook-type aggregators.
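For the curious, a minimal sketch of the who-knows-whom idea using the networkx library. The names and relationships below are invented; a real exercise would load edges harvested from a Facebook-type aggregator.

```python
import networkx as nx

# Invented "who knows whom" edges, standing in for friend lists pulled
# from a Facebook-type aggregator.
G = nx.Graph()
G.add_edges_from([
    ("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
    ("carol", "dave"), ("dave", "eve"), ("eve", "alice"),
])

# Two people with several mutual contacts but no recorded link are
# candidates for an undisclosed relationship.
candidates = [
    (u, v, len(list(nx.common_neighbors(G, u, v))))
    for u in G for v in G
    if u < v and not G.has_edge(u, v)
]
for u, v, mutual in sorted(candidates, key=lambda t: -t[2]):
    print(f"{u} -- {v}: {mutual} mutual contact(s)")
```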

Stephen E Arnold, May 18, 2020

Bayesian Math: Useful Book Is Free for Personal Use

May 11, 2020

The third edition of Bayesian Data Analysis (updated on February 13, 2020) is available at this link. The authors are Andrew Gelman, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. With Bayes’ principles in hand, making sense of some of the modern smart systems becomes somewhat easier. The book covers the basics and advanced computation. One of the more interesting sections is Part V: Nonlinear and Nonparametric Models. You may want to add this to your library.

Stephen E Arnold, May 11, 2020

LAPD Shutters Predictive Policing During Shutdown

May 7, 2020

Police departments are not immune to the economic impact of this pandemic. We learn the Los Angeles Police Department is shutting down its predictive policing program, at least for now, in TechDirt’s write-up, “LAPD’s Failed Predictive Policing Program the Latest COVID-19 Victim.” Writer Tim Cushing makes it perfectly clear he has never been a fan of the analytics approach to law enforcement:

“For the most part, predictive policing relies on garbage data generated by garbage cops, turning years of biased policing into ‘actionable intel’ by laundering it through a bunch of proprietary algorithms. More than half a decade ago, early-ish adopters were expressing skepticism about the tech’s ability to suss out the next crime wave. For millions of dollars less, average cops could have pointed out hot crime spots on a map based on where they’d made arrests, while still coming nothing close to the reasonable suspicion needed to declare nearly everyone in a high crime area a criminal suspect. The Los Angeles Police Department’s history with the tech seems to indicate it should have dumped it years ago. The department has been using some form of the tech since 2007, but all it seems to be able to do is waste limited law enforcement resources to violate the rights of Los Angeles residents. The only explanations for the LAPD’s continued use of this failed experiment are the sunk cost fallacy and its occasional use as a scapegoat for the department’s biased policing.”

Now, though, an April 15 memo from the LAPD declares the department is ceasing to use the PredPol software immediately due to COVID-19 related financial constraints. As one might suppose, Cushing hopes the software will remain off the table once the shutdown is lifted. Hey, anything is possible.

Cynthia Murrell, May 7, 2020

Google Recommendations: A Digital Jail Cell?

May 5, 2020

A team of researchers at the Centre Marc Bloch in Berlin has closely studied filter bubbles (scientifically called “confinement”) on YouTube. While the phenomenon of filter bubbles across the Web has been a topic of study for several years, scientists Camille Roth, Antoine Mazieres, and Telmo Menezes felt the role of the recommendation algorithm on YouTube had been under-examined. In performing research to plug this gap, they found the dominant video site may produce the most confining bubbles of all. The team shares their main results in “Tubes and Bubbles: Topological Confinement of Recommendations on YouTube.” They summarize:

“Contrarily to popular belief about so-called ‘filter bubbles’, several recent studies show that recommendation algorithms generally do not contribute much, if at all, to user confinement; in some cases, they even seem to increase serendipity [see e.g., 1, 2, 3, 4, 5, 6]. Our study demonstrates however that this may not be the case on YouTube: be it in topological, topical or temporal terms, we show that the landscape defined by non-personalized YouTube recommendations is generally likely to confine users in homogeneous clusters of videos. Besides, content for which confinement appears to be most significant also happens to garner the highest audience and thus plausibly viewing time.”

The abstract to the team’s paper on the study describes their approach:

“Starting from a diverse number of seed videos, we first describe the properties of the sets of suggested videos in order to design a sound exploration protocol able to capture latent recommendation graphs recursively induced by these suggestions. These graphs form the background of potential user navigations along non-personalized recommendations. From there, be it in topological, topical or temporal terms, we show that the landscape of what we call mean-field YouTube recommendations is often prone to confinement dynamics.”
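Here is DarkCyber’s rough sketch of that exploration protocol as we read it. The `recommend` function is a hypothetical stand-in for whatever fetches a video’s suggestion list, and the confinement score below is our own crude clustering proxy, not the paper’s actual measures.

```python
import networkx as nx
from collections import deque

def crawl_recommendations(seed, recommend, depth=3, fanout=5):
    """Recursively expand non-personalized recommendations from a seed video.

    `recommend` is a stand-in for whatever fetches a video's suggested
    list (the paper scrapes YouTube's own suggestions); it returns ids.
    """
    G = nx.DiGraph()
    queue = deque([(seed, 0)])
    seen = {seed}
    while queue:
        video, d = queue.popleft()
        if d >= depth:
            continue
        for rec in recommend(video)[:fanout]:
            if rec == video:
                continue  # skip self-recommendations
            G.add_edge(video, rec)
            if rec not in seen:
                seen.add(rec)
                queue.append((rec, d + 1))
    return G

def confinement_score(G):
    """Crude confinement proxy: how densely the recommended set points
    back into itself. 1.0 means every walk stays in one tight cluster."""
    return nx.average_clustering(G.to_undirected())

# Toy recommender that keeps suggesting near-identical ids -- a perfectly
# confining landscape. A real run would query YouTube's suggestions here.
def toy(v):
    return [f"{v[:4]}_{i}" for i in range(5)]

G = crawl_recommendations("seed", toy, depth=2)
print(G.number_of_nodes(), "videos,", G.number_of_edges(), "edges,",
      "clustering:", round(confinement_score(G), 3))
```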

To read about the study in great, scientific detail, complete with illustrations, turn to the full paper published at the PLOS ONE peer-reviewed journal site. Established in 2012, the Centre Marc Bloch’s Computational Social Science Team enlists social scientists alongside computer scientists and modelers to study the social dynamics of today’s digital landscapes. If you are curious what that means, exactly, their page includes an interesting five-minute video describing their work.

Cynthia Murrell, May 5, 2020

Google and Its Mutating Smart Software

April 20, 2020

Google announced quantum supremacy. Earlier the online ad company asserted that it would solve death. Yeah.

Now the company has announced YAB, or yet another breakthrough, according to “Google Engineers ‘Mutate’ AI to Make It Evolve Systems Faster Than We Can Code Them”:

For years, engineers at Google have been working on a freakishly smart machine learning system known as the AutoML system (or automatic machine learning system), which is already capable of creating AI that outperforms anything we’ve made. Now, researchers have tweaked it to incorporate concepts of Darwinian evolution and shown it can build AI programs that continue to improve upon themselves faster than they would if humans were doing the coding.

You can read about this mutating wonder in “AutoML-Zero: Evolving Machine Learning Algorithms From Scratch.”
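A toy sketch in the spirit of the paper’s regularized evolution loop: pick a random tournament, copy and mutate the fittest contestant, retire the oldest population member. The real system evolves whole machine learning programs instruction by instruction; this demo merely evolves three polynomial coefficients toward y = x².

```python
import random

# Hypothetical fitness task: fit y = x^2 on a grid of sample points.
TARGET = [(x / 10.0, (x / 10.0) ** 2) for x in range(-20, 21)]

def loss(coeffs):
    a, b, c = coeffs
    return sum((a * x * x + b * x + c - y) ** 2 for x, y in TARGET)

def mutate(coeffs):
    """Copy the parent and nudge one randomly chosen coefficient."""
    i = random.randrange(3)
    child = list(coeffs)
    child[i] += random.gauss(0, 0.1)
    return child

random.seed(1)
population = [[random.gauss(0, 1) for _ in range(3)] for _ in range(20)]
for step in range(2000):
    contestants = random.sample(population, 5)   # random tournament
    parent = min(contestants, key=loss)          # fittest contestant wins
    population.append(mutate(parent))            # mutated child joins...
    population.pop(0)                            # ...oldest member retires

best = min(population, key=loss)
print("best coefficients:", [round(v, 3) for v in best],
      "loss:", round(loss(best), 4))
```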

DarkCyber assumes that the use of AutoML will allow Google to solve the death thing. However, it may be difficult for the Googlers’ method to cope with declining advertising revenue and increasing infrastructure costs.

Stephen E Arnold, April 20, 2020

Mr. Bayes and Mr. Occam, Still Popular after All These Years

April 14, 2020

In the early 2000s, I met a self-appointed expert who turned red when he talked about the good Reverend Bayes. The spark for his crimson outrage was the idea that one would make assumptions about the future and plug those assumptions into the good Reverend’s centuries-old formula. If you have forgotten it, here it is:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Why the ire? I think the person was annoyed with Autonomy’s use of the theorem in its enterprise search system and neuro-linguistic marketing. On top of that, if not trained in an appropriate manner and then retrained, Autonomy’s integrated data operating layer would drift; that is, return results less precise than before. Most licensees were not able to get their manicured nails into this concept of retraining. As a result, the benighted would rail at the flaws of the UK’s first big software company that seemed to make money.

And Occam? Well, this self-appointed expert (akin to a person who gets a PhD online and then wants everyone to call him/her “doctor”) did not know about William and his novacula Occami. This church person lived several centuries before the good Reverend Bayes. William’s big idea was KISS or keep it simple stupid. One of my now deceased instructors loved to call this lex parsimoniae, but in this blog KISS is close enough for horseshoes. (A variant of this game may have been played a century before Willie was born in the 1280s.)

So what?

I read “How Should We Model Complex Systems?” The write up in my opinion makes the case for the good Reverend’s method with a dash of Willie’s as guiding principles. No doubt the self-appointed expert will be apoplectic if he reads this blog post. But the highlight of the write up is a comment by Yandon Zhang: the failings of modeling reality can be addressed in part by adding more data.
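Here is a minimal sketch of the more-data point, assuming a toy coin-flip model with a flat Beta prior: the same 70/30 signal, observed in larger batches, steadily shrinks the posterior’s spread.

```python
# Beta-Binomial update: the simplest Bayes-plus-more-data demonstration.
# A Beta(1, 1) prior is flat; observations sharpen it into a posterior.
def posterior(alpha, beta, heads, tails):
    return alpha + heads, beta + tails

def summarize(alpha, beta):
    mean = alpha / (alpha + beta)
    var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return mean, var ** 0.5

for heads, tails in [(7, 3), (70, 30), (700, 300)]:  # same signal, more data
    a, b = posterior(1, 1, heads, tails)             # start from a flat prior
    mean, sd = summarize(a, b)
    print(f"n={heads + tails}: posterior mean {mean:.3f}, sd {sd:.4f}")
```

More data does not fix a wrong model, but it does sharpen the guess the model is able to make.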

That is a triple play: Bayes’, Willie, and more data.

The result? Models are good enough despite the fancy math that many firms layer on these three legs of the predicting footstool.

What’s that mean in reality? Something is better than nothing. What is often overlooked is that guessing or operating in the manner of Monte Carlo might generate results closer to reality. Want an example? Maybe Covid models?

Stephen E Arnold, April 14, 2020

Forget Weak Priors, Certain Predictive Methods Just Fail

April 2, 2020

Nope. No equations. No stats speak. Tested predictive models were incorrect.

Navigate to “Researchers Find AI Is Bad at Predicting GPA, Grit, Eviction, Job Training, Layoffs, and Material Hardship.” Here’s the finding, which is delightfully clear:

A paper coauthored by over 112 researchers across 160 data and social science teams found that AI and statistical models, when used to predict six life outcomes for children, parents, and households, weren’t very accurate even when trained on 13,000 data points from over 4,000 families.

So what? The write up states in the form of a quote from the author of the paywalled paper:

“Here’s a setting where we have hundreds of participants and a rich data set, and even the best AI results are still not accurate,” said study co-lead author Matt Salganik, a professor of sociology at Princeton and interim director of the Center for Information Technology Policy at the Woodrow Wilson School of Public and International Affairs. “These results show us that machine learning isn’t magic; there are clearly other factors at play when it comes to predicting the life course.”

We noted this comment from a researcher at Princeton University:

In the end, even the best of the over 3,000 models submitted — which often used complex AI methods and had access to thousands of predictor variables — weren’t spot on. In fact, they were only marginally better than linear regression and logistic regression, which don’t rely on any form of machine learning.
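What does “marginally better than logistic regression” look like? A hedged sketch using scikit-learn on synthetic data; the Fragile Families records themselves are restricted access, so the dataset below is an invented stand-in with weak signal and noisy labels.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a noisy life-outcomes dataset: many predictors,
# few informative ones, plenty of label noise.
X, y = make_classification(n_samples=4000, n_features=100, n_informative=5,
                           flip_y=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("gradient boosting", GradientBoostingClassifier())]:
    model.fit(X_train, y_train)
    score = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {score:.3f}")
```

With a weak signal, the fancy model and the plain regression tend to land within a whisker of each other, which is the study’s point.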

Several observations:

  1. Nice work, AAAS. Keep advancing science with a paywall germane to criminal justice and policeware.
  2. Overinflation of the “value” of outputs from models is common in marketing. DarkCyber thinks that the weaknesses of these methods need more than a few interviews with people like Cathy O’Neil, author of Weapons of Math Destruction.
  3. Are those afflicted with innumeracy willing to delegate certain important actions to procedures which are worse than relying on luck, flipping a coin, or Monte Carlo methods?

Net net: No one made accurate predictions. Yep, no one. Thought-stimulating research with implications for predictive analytics adherents. This open access paper provides some of the information referenced in the AAAS paper: “Measuring the Predictability of Life Outcomes with a Scientific Mass Collaboration.”

Stephen E Arnold, April 2, 2020

Israel and Mobile Phone Data: Some Hypotheticals

March 19, 2020

DarkCyber spotted a story in the New York Times: “Israel Looks to Repurpose a Trove of Cell Phone Data.” The story appeared in the dead tree edition on March 17, 2020, and you can access the online version of the write up at this link.

The write up reports:

Prime Minister Benjamin Netanyahu of Israel authorized the country’s internal security agency to tap into a vast, previously undisclosed trove of cell phone data to retrace the movements of people who have contracted the coronavirus and identify others who should be quarantined because their paths crossed.

Okay, cell phone data. Track people. Paths crossed. So what?

Apparently not much.

The Gray Lady does the handwaving about privacy and the fragility of democracy in Israel. There’s a quote about the need for oversight when certain specialized data are retained and then made available for analysis. Standard journalism stuff.

DarkCyber’s team talked about the write up and what the real journalists left out of the story. Remember. DarkCyber operates from a hollow in rural Kentucky and knows zero about Israel’s data collection realities. Nevertheless, my team was able to identify some interesting use cases.

Let’s look at a couple and conclude with a handful of observations.

First, the idea of retaining cell phone data is not exactly a new one. What if these data can be extracted using an identifier for a person of interest? What if a time-series query could extract the geolocation data for each movement of the person of interest captured by a cell tower? What if this path could be displayed on a map? Here’s a dummy example of what the plot for a single person of interest might look like. Please note: these graphics are examples selected from open sources. They are not related to a single investigation or vendor and are for illustrative purposes only.


Source: Standard mobile phone tracking within a geofence. Map with blue lines showing a person’s path. SPIE at https://bit.ly/2TXPBby

Useful indeed.

Second, what if the intersection of the paths of two or more individuals could be plotted? Here’s a simulation of such a path intersection:


Source: Map showing the location of a person’s mobile phone over a period of time. Tyler Bell at https://bit.ly/2IVqf7y

Would these data provide a way to identify an individual with a mobile phone who was in “contact” with a person of interest? Would the authorities be able to perform additional analyses to determine who is in either party’s social network?
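A back-of-the-envelope sketch of such a “contact” query. The pings below are invented, and real systems work from cell tower or SDK logs at scale with smarter indexing, but the core test, close in space and close in time, is this simple:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in meters."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

def contacts(trace_a, trace_b, meters=50, seconds=900):
    """Pairs of pings from two phones that were close in space and time.

    Each trace is a list of (unix_time, lat, lon) tuples, the sort of
    record a cell tower or app SDK log would yield.
    """
    hits = []
    for t1, la1, lo1 in trace_a:
        for t2, la2, lo2 in trace_b:
            dist = haversine_m(la1, lo1, la2, lo2)
            if abs(t1 - t2) <= seconds and dist <= meters:
                hits.append((t1, t2, round(dist)))
    return hits

person_a = [(1000, 31.771, 35.217), (2000, 31.772, 35.218)]    # invented pings
person_b = [(1900, 31.7721, 35.2181), (9000, 31.800, 35.300)]
print(contacts(person_a, person_b))
```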

Third, could these relationship data be mined so that connections can be further explored?


Source: Diagram of people who have crossed paths visualized via Analyst Notebook functions. Globalconservation.org

Can these data be arrayed on a timeline? Can the routes be converted into an animation that shows a particular person of interest’s movements within a specific window of time?


Source: Vertical dots diagram from Recorded Future showing events on a timeline. https://bit.ly/39Xhbex

These hypothetical displays of data derived from cross correlations, geotagging, and timeline generation based on date stamps seem feasible. If earnest individuals in rural Kentucky can see the value of these “secret” data disclosed in the New York Times’ article, why didn’t the journalist and the others who presumably read the story see it as well?

What’s interesting is that systems, methods, and tools clearly disclosed in open source information are overlooked, ignored, or just not understood.

Now the big question: Do other countries have these “secret” troves of data?

DarkCyber does not know; however, it seems possible. Log files are a standard byproduct of data processes, and data exhaust may have value.

Stephen E Arnold, March 19, 2020
