Facial Recognition: More Than Faces

July 29, 2021

Facial recognition software is not just for law enforcement anymore. Israel-based firm AnyVision’s clients include retail stores, hospitals, casinos, sports stadiums, and banks. Even schools are using the software to track minors with, it appears, nary a concern for their privacy. We learn this and more from, “This Manual for a Popular Facial Recognition Tool Shows Just How Much the Software Tracks People” at The Markup. Writer Alfred Ng reports that AnyVision’s 2019 user guide reveals the software logs and analyzes all faces that appear on camera, not only those belonging to persons of interest. A representative boasted that, during a week-long pilot program at the Santa Fe Independent School District in Texas, the software logged over 164,000 detections and picked up one student 1100 times.

There are a couple of privacy features built in, but they are not turned on by default. “Privacy Mode” only logs faces of those on a watch list, and “GDPR Mode” blurs non-watch-listed faces on playbacks and downloads. (Of course, what is blurred can be unblurred.) Whether a client uses those options depends on its use case and, importantly, local privacy regulations. Ng observes:

“The growth of facial recognition has raised privacy and civil liberties concerns over the technology’s ability to constantly monitor people and track their movements. In June, the European Data Protection Board and the European Data Protection Supervisor called for a facial recognition ban in public spaces, warning that ‘deploying remote biometric identification in publicly accessible spaces means the end of anonymity in those places.’ Lawmakers, privacy advocates, and civil rights organizations have also pushed against facial recognition because of error rates that disproportionately hurt people of color. A 2018 research paper from Joy Buolamwini and Timnit Gebru highlighted how facial recognition technology from companies like Microsoft and IBM is consistently less accurate in identifying people of color and women. In December 2019, the National Institute of Standards and Technology also found that the majority of facial recognition algorithms exhibit more false positives against people of color. There have been at least three cases of a wrongful arrest of a Black man based on facial recognition.”

Schools that have implemented facial recognition software say it is an effort to prevent school shootings, a laudable goal. However, once in place it is tempting to use it for less urgent matters. Ng reports the Texas City Independent School District has used it to identify one student who was licking a security camera and to have another removed from his sister’s graduation because he had been expelled. As Georgetown University’s Clare Garvie points out:

“The mission creep issue is a real concern when you initially build out a system to find that one person who’s been suspended and is incredibly dangerous, and all of a sudden you’ve enrolled all student photos and can track them wherever they go. You’ve built a system that’s essentially like putting an ankle monitor on all your kids.”

Is this what we really want as a society? Never mind, it is probably a bit late for that discussion.

Cynthia Murrell, July 29, 2021

Why Some Outputs from Smart Software Are Wonky

July 26, 2021

Some models work like a champ. Utility rate models are reasonably reliable: when it is hot, use of electricity goes up, and rates are then “adjusted.” Perfect. Other models are less solid; for example, Bayesian systems that are not checked every hour or large neural nets that are “assumed” to be honking along like a well-ordered flight of geese. Why do I offer such Negative Ned observations? Experience, for one thing, and the nifty little concepts tossed out by Ben Kuhn, a Twitter persona. You can locate this string of observations at this link. Well, you could as of July 26, 2021, at 6:30 am US Eastern time. Here’s a selection of what are apparently the highlights of Mr. Kuhn’s conversation with “a former roommate.” That’s provenance enough for me.

Item One:

Most big number theory results are apparently 50-100 page papers where deeply understanding them is ~as hard as a semester-long course. Because of this, ~nobody has time to understand all the results they use—instead they “black-box” many of them without deeply understanding.

Could this be true? How could newly minted, “be an expert with our $40 online course” professionals, who use models packaged in downloadable and easy-to-plug-in modules, be unfamiliar with the inner workings of said bundles of brilliance? Impossible? Really?

Item Two:

A lot of number theory is figuring out how to stitch together many different such black boxes to get some new big result. Roommate described this as “flailing around” but also highly effective and endorsed my analogy to copy-pasting code from many different Stack Overflow answers.

Oh, come on. Flailing around. Do developers flail, or do they “trust” the outfits who pretend to know how some multi-layered systems work? Fiddling with assumptions, thresholds, and (close your ears) the data themselves is never, ever a way to work around a glitch.

Item Three:

Roommate told a story of using a technique to calculate a number and having a high-powered prof go “wow, I didn’t know you could actually do that”

No kidding? That’s impossible in general, and that expression would never be uttered at Amazon-, Facebook-, and Google-type operations, would it?

Will Mr. Kuhn be banned for heresy? [Keep in mind how Wikipedia defines this term: “any belief or theory that is strongly at variance with established beliefs or customs, in particular the accepted beliefs of a church or religious organization.”] Once upon a time, just repeating such an idea would warrant a close encounter with an Iron Maiden or a pile of firewood. Probably not today. Someone might emit a slightly critical tweet, however.

Stephen E Arnold, July 26, 2021

Quote to Note: Fire, but Not the Wheel, Is a Loser

July 16, 2021

If Google says something, I believe it. Don’t you? Google is the Oracle of Shoreline Drive. No, not the Oracle on Dolphin Way, which is just south on the brilliantly designed Highway 101.

I had my enthusiasm for Google’s brilliance confirmed after I read “Google CEO Still Insists AI Revolution Bigger Than Invention of Fire.” The write up states:

Pichai suggests the internet and electricity are also small potatoes compared to AI.

Absolutely. AI makes possible much more than merely frightening animals at night, cooking said animals if a humanoid was able to kill one, melting substances to fabricate computers, and enabling some types of power generation used to produce Google tchotchkes. AI is more, much more.

The write up continues with original secondary research from the Beeb:

“The progress in artificial intelligence, we are still in very early stages, but I viewed it as the most profound technology that humanity will ever develop and work on, and we have to make sure we do it in a way that we can harness it to society’s benefit,” Pichai said. “But I expect it to play a foundational role pretty much across every aspect of our lives. You know, be it health care, be it education, be it how we manufacture things and how we consume information. And so I view it as a very profound enabling technology. You know, if you think about fire or electricity or the internet, it’s like that, but I think even more profound,” Pichai continued.

The article points out that the Google Oracle does not define artificial intelligence. Never mind. Google says it, I believe it. My hunch is that if you want to get hired or become a consultant to Google believing that smart software is more important than fire is a precondition for becoming Googley.

Don’t believe me? Don’t understand the “profoundness” of the Timnit Gebru – Google dust up about AI? Not my problem. I believe. After walking my French bulldog, I will set on fire (a secondary discovery, as you know) an America Online CD-ROM.

Stephen E Arnold, July 16, 2021

Predicting Behavior from Videos: A New Frontier for Touts

July 13, 2021

I spotted “AI Learns to Predict Human Behavior from Videos.” Sounds good, sounds promising, sounds like IBM. The idea is that Watson (open source software, home grown IBM code, and software from acquisitions) can foretell the future. Feed Watson videos, and Watson can figure out what happens next.

The write up states:

In a new study, Columbia Engineering researchers unveil a computer vision technique for giving machines a more intuitive sense for what will happen next by leveraging higher-level associations between people, animals, and objects.

What’s the time horizon? Answer: Several minutes in the future.

What’s the accuracy? Answer: Uh, well.

What’s actually predicted? Answer: A higher level that links concepts.

What does this mean? Answer: Uh, well.

IBM, which like Google declared quantum supremacy-ness, is working overtime to demonstrate that Watson can deliver high value payoffs to those who embrace the IBM approach to smart software.

One of the researchers/students allegedly said, “Prediction is the basis of human intelligence.”

Okay, I will make a prediction: This watching videos angle smacks of marketing hoo hah based on the efforts of students with access to Watsony stuff and an environment which is hungry for evidence of the quantum supremacy-ness.

Confidence level: 99.999999 percent.

Stephen E Arnold, July 13, 2021

Want to Cash In on the TikTok AI?

July 8, 2021

If you want to license the artificial intelligence which chainsaws away IQ points, you can. The vendor is a company called BytePlus, and, yes, it is an official source of the TikTok goodness. Just bring cash and leave your concerns about having data from your use of the system and method winging its way to the land that won over Marco Polo.

“ByteDance Starts Selling TikTok’s AI to Other Companies” states (if you pay up to read the original write up in the weird orange newspaper):

“BytePlus offers businesses the chance to tap some of TikTok’s secret ingredient: the algorithm that keeps users scrolling by recommending them videos that it thinks they will like. They can use this technology to personalize their apps and services for their customers. Other software on offer includes automated translation of text and speech, real-time video effects and a suite of data analysis and management tools.”

Just think: you can hook your prospects on short videos about such compelling subjects as enterprise search, the MBA life, personnel management at Google, and cooking on a burning Tesla Plaid.

Stephen E Arnold, July 8, 2021

Smart Software, Humans, and Personnel: The Ingredients for Management Success

July 2, 2021

I thought this paragraph was thought provoking:

A Deloitte survey found that while 71 percent of companies see people analytics as a high priority in their organizations (31 percent rate it very important), progress has been slow. After years of discussing this issue, only 8 percent report they have usable data; only 9 percent believe they have a good understanding of which talent dimensions drive performance in their organizations; and only 15 percent have broadly deployed HR and talent scorecards for line managers. One of the reasons for the low adoption of people analytics is that companies have a closed approach to analytics in HR, and readiness remains a serious issue.

This passage appears in “How People Analytics Can Create a Culture of Care and Success.” Let’s consider two organizations with smart software, oodles of data, and a pristine track record for making big bucks. The two outfits are Amazon and Google. Both of these exemplary institutions have done an outstanding job with their people management. As I recall, there were some distasteful blog posts about warehouse workers and plastic bottles, comments about smart cameras and GPS systems monitoring delivery truck drivers, and the phone booth in which an employee can lock away the world. Peace, calm, and care.

And the Google? Staff flips at DeepMind, the outfit which has some type of smart software supremacy. Then there is the trivial ethics thing and the textbook handling of the Timnit Gebru situation. And, lest I forget, the yacht death, the diploid cell assembly in the legal department, and the fumbled suicide attempt by a big wheel’s marketing associate. Protests? I could name a few. Petitions? Well, that Maven thing. Sigh.

It seems to me that the article, including the “hire us because we are HR experts” Deloitte PR set up, describes a future which sells consulting. It seems that employees are forcing change upon employers. Others are quitting. Another group is happy to collect government “pay for being alive” checks. Smart software to the rescue? Give me a break.

When two outfits equipped to create cultures of care and success cannot do basic personnel, reality is different from the silliness of surveys chock full of buzzwords. Don’t believe me? Ask an Amazon truck driver. Better yet, pose a question to Dr. Gebru.

Stephen E Arnold, July 2, 2021

A Theory: No Room for Shortcuts in Healthcare Datasets

July 1, 2021

The value of any machine learning algorithm depends on the data it was trained on, we are reminded in the article, “Machine Learning Deserves Better Than This” at AAAS’ Science Magazine. Writer Derek Lowe makes some good points that are, nevertheless, likely to make him unpopular among the rah-rah AI crowd. He is specifically concerned with the ways machine learning is currently being applied in healthcare. As an example, Lowe examines a paper on coronavirus pathology as revealed in lung X-ray data. He writes:

“Every single one of the studies falls into clear methodological errors that invalidate their conclusions. These range from failures to reveal key details about the training and experimental data sets, to not performing robustness or sensitivity analyses of their models, not performing any external validation work, not showing any confidence intervals around the final results (or not revealing the statistical methods used to compute any such), and many more. A very common problem was the (unacknowledged) risk of bias right up front. Many of these papers relied on public collections of radiological data, but these have not been checked to see if the scans marked as COVID-19 positive patients really were (or if the ones marked negative were as well). It also needs to be noted that many of these collections are very light on actual COVID scans compared to the whole database, which is not a good foundation to work from, either, even if everything actually is labeled correctly by some miracle. Some papers used the entire dataset in such cases, while others excluded images using criteria that were not revealed, which is naturally a further source of unexamined bias.”

As our regular readers are aware, any AI is only as good as the data it is trained upon. However, data scientists can be so eager to develop tools (or, to be less charitable, to get published) that they take shortcuts. Some, for example, accept all data from public databases without any verification. Others misapply data, like the collection of lung x-rays from patients under the age of five that was included in the all-ages pneumonia dataset. Then there are the datasets and algorithms that simply do not have enough documentation to be trusted. How was the imaging data pre-processed? How was the model trained? How was it selected and validated? Crickets.
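To make the point concrete, here is a minimal, hypothetical sketch of two of the checks Lowe says are routinely skipped: reporting class balance in a public image collection and holding out an entire source for external validation. The directory layout, file-naming convention, and labels below are illustrative assumptions, not details from any of the studies discussed.

```python
from collections import Counter
from pathlib import Path
import random

# Assumed layout (illustrative only): one folder per public collection,
# file names prefixed with their label, e.g. "covid_0001.png", "normal_0042.png".
DATA_ROOT = Path("xray_collections")

def load_labels(collection: Path) -> list:
    """Return (filename, label) pairs for one public collection."""
    return [(p.name, p.name.split("_")[0]) for p in collection.glob("*.png")]

def class_balance_report(pairs) -> None:
    """A collection that is 'very light on actual COVID scans' shows up immediately here."""
    counts = Counter(label for _, label in pairs)
    total = sum(counts.values()) or 1
    for label, n in sorted(counts.items()):
        print(f"{label:>8}: {n:6d} ({100 * n / total:.1f}%)")

collections = sorted(DATA_ROOT.iterdir()) if DATA_ROOT.exists() else []
if collections:
    random.seed(0)
    # External validation: keep one entire source out of training so that
    # performance is measured on data the model never saw in any form.
    held_out = random.choice(collections)
    train_pairs = [pair for c in collections if c != held_out for pair in load_labels(c)]
    eval_pairs = load_labels(held_out)

    print("Training pool balance:")
    class_balance_report(train_pairs)
    print(f"External validation source: {held_out.name} ({len(eval_pairs)} images)")
```

None of this verifies that the labels themselves are correct, which is Lowe’s deeper complaint, but even these two simple lines of defense are missing from many of the papers he reviewed.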

We understand why people are excited about the potential of machine learning in healthcare, a high-stakes field where solutions can be frustratingly elusive. However, it benefits no one to rely on conclusions drawn from flawed data. In fact, doing so can be downright dangerous. Let us take the time to get machine learning right first.

Cynthia Murrell, July 1, 2021

Microsoft Code Recommendations: Objectivity and Relevance, Anyone?

June 30, 2021

The “real news” outfit CNBC published an interesting news item: “Microsoft and OpenAI Have a New A.I. Tool That Will Give Coding Suggestions to Software Developers.” The write up states:

Microsoft on Tuesday announced an artificial intelligence system that can recommend code for software developers to use as they write code… The system, called GitHub Copilot, draws on source code uploaded to code-sharing service GitHub, which Microsoft acquired in 2018, as well as other websites. Microsoft and GitHub developed it with help from OpenAI, an AI research start-up that Microsoft backed in 2019.

The push to make programming “easier” is moving into Recommendation Land. Recommendation technology from Bing is truly remarkable. Here’s a quick example. Navigate to Bing and enter the query “Louisville KY bookkeeper.” Here are the results:

[Screenshot of the Bing results for “Louisville KY bookkeeper” omitted.]

The page is mostly ads and links to intermediaries who sell connections to bookkeepers accepting new clients, wonky “best” lists, and links to two bookkeeping companies. FYI: There are dozens of bookkeeping services in Louisville, and the optimal way to get recommendations is to pose a query to the Nextdoor.com Web site.

Now a question: How “objective” will these code suggestions be? Will there be links to open source supported by or contributed to by such exemplary organizations as Amazon, Google, and IBM, among others?

My hunch is that Bing points the way to the future. I will be interested to see what code is recommended to a developer working on a smart cyber security system, which may challenge the most excellentness of Microsoft’s own offerings.

Stephen E Arnold, June 30, 2021

Gender Biased AI in Job Searches Now a Thing

June 30, 2021

From initial search to applications to interviews, job hunters are now steered through the process by algorithms. Employers’ demand for AI solutions has surged with the pandemic, but there is a problem—the approach tends to disadvantage women applicants. An article at MIT Technology Review describes one website’s true bro response: “LinkedIn’s Job-Matching AI Was Biased. The Company’s Solution? More AI.” Reporters Sheridan Wall and Hilke Schellmann also cover the responses of competing job search sites Monster, CareerBuilder, and ZipRecruiter. Citing former LinkedIn VP John Jersin, they write:

“These systems base their recommendations on three categories of data: information the user provides directly to the platform; data assigned to the user based on others with similar skill sets, experiences, and interests; and behavioral data, like how often a user responds to messages or interacts with job postings. In LinkedIn’s case, these algorithms exclude a person’s name, age, gender, and race, because including these characteristics can contribute to bias in automated processes. But Jersin’s team found that even so, the service’s algorithms could still detect behavioral patterns exhibited by groups with particular gender identities. For example, while men are more likely to apply for jobs that require work experience beyond their qualifications, women tend to only go for jobs in which their qualifications match the position’s requirements. The algorithm interprets this variation in behavior and adjusts its recommendations in a way that inadvertently disadvantages women. … Men also include more skills on their résumés at a lower degree of proficiency than women, and they often engage more aggressively with recruiters on the platform.”

Rather than, say, inject human judgment into the process, LinkedIn added new AI in 2018 designed to correct for the first algorithm’s bias. Other companies side-step the AI issue. CareerBuilder addresses bias by teaching employers how to eliminate it from their job postings, while Monster relies on attracting users from diverse backgrounds. ZipRecruiter’s CEO says that site classifies job hunters using 64 types of information, including geographical data but not identifying pieces like names. He refused to share more details, but is confident his team’s method is as bias-free as can be. Perhaps—but the claims of any of these sites are difficult or impossible to verify.
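The article does not spell out how LinkedIn’s corrective layer works. One generic approach from the fairness literature is to re-rank the first-stage recommendations so that the top of the list roughly mirrors a target distribution. The sketch below is a hedged illustration of that general idea only; the Candidate fields, the 50/50 target, and the greedy rule are assumptions for this example, not LinkedIn’s production system.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    score: float  # relevance score from the first-stage recommender
    group: str    # attribute used only for the representation check

def rerank(candidates, target_share, k):
    """Greedy re-rank: each slot goes to the best-scoring candidate from the
    group that is currently furthest below its target share of the list."""
    remaining = sorted(candidates, key=lambda c: c.score, reverse=True)
    picked, counts = [], {g: 0 for g in target_share}
    while remaining and len(picked) < k:
        slot = len(picked) + 1
        # deficit = how far each group lags its target count for the list so far
        deficit = {g: target_share[g] * slot - counts[g] for g in target_share}
        neediest = max(deficit, key=deficit.get)
        pool = [c for c in remaining if c.group == neediest] or remaining
        choice = max(pool, key=lambda c: c.score)
        remaining.remove(choice)
        counts[choice.group] = counts.get(choice.group, 0) + 1
        picked.append(choice)
    return picked

# Toy pool in which first-stage scores skew toward one group, echoing the
# behavioral differences described in the quoted passage.
pool = ([Candidate(f"m{i}", 0.90 - i * 0.01, "men") for i in range(8)]
        + [Candidate(f"w{i}", 0.85 - i * 0.01, "women") for i in range(8)])

for c in rerank(pool, target_share={"men": 0.5, "women": 0.5}, k=10):
    print(c.name, c.group, round(c.score, 2))
```

A pure score sort would fill the first several slots with one group; the re-ranker trades a little raw score for a list whose composition tracks the target, which is the sort of “fix the AI with more AI” trade-off the reporters question.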

Cynthia Murrell, June 30, 2021

SPACtacular Palantir Tech Gets More Attention: This Is Good?

June 30, 2021

Palantir is working to expand its public-private partnership operations beyond security into the healthcare field. Some say the company has fallen short in its efforts to peddle security software to officials in Europe, so the data-rich field of government-managed healthcare is the next logical step. Apparently the pandemic gave Palantir the opening it was looking for, paving the way for a deal it made with the UK’s National Health Service to develop the NHS COVID-19 Data Store. Now however, CNBC reports, “Campaign Launches to Try to Force Palantir Out of Britain’s NHS.” Reporter Sam L. Shead states that more than 50 organizations, led by tech-justice nonprofit Foxglove, are protesting Palantir’s involvement. We learn:

“The Covid-19 Data Store project, which involves Palantir’s Foundry data management platform, began in March 2020 alongside other tech giants as the government tried to slow the spread of the virus across the U.K. It was sold as a short-term effort to predict how best to deploy resources to deal with the pandemic. The contract was quietly extended in December, when the NHS and Palantir signed a £23 million ($34 million) two-year deal that allows the company to continue its work until December 2022. The NHS was sued by political website openDemocracy in February over the contract extension. ‘December’s new, two-year contract reaches far beyond Covid: to Brexit, general business planning and much more,’ the group said. The NHS contract allows Palantir to help manage the data lake, which contains everybody’s health data for pandemic purposes. ‘The reality is, sad to say, all this whiz-bang data integration didn’t stop the United Kingdom having one of the worst death tolls in the Western world,’ said [Foxglove co-founder Cori] Crider. ‘This kind of techno solutionism is not necessarily the best way of making an NHS sustainable for the long haul.’”

Not surprisingly, privacy is the advocacy groups’ main concern. The very personal data used in the project is not being truly anonymized—instead it is being “pseudo-anonymized,” a reversible process in which identifying details are swapped out for an alias. Both the NHS and Palantir assure citizens re-identification will only be performed if necessary, and that the company has no interest in the patient data itself. In fact, we are told, that data remains the property of the NHS, and Palantir can only do so much with it. Those protesting the project, though, understand that money can talk louder than corporate promises; many companies would offer much for the opportunity to monetize that data.
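For readers wondering why campaigners treat “pseudo-anonymized” as a far weaker promise than “anonymized,” here is a minimal, hypothetical sketch of the technique as described above. The record fields and the NHS-style number are invented for illustration; this is not the NHS or Palantir implementation.

```python
import secrets

class Pseudonymizer:
    """Swap direct identifiers for random aliases while keeping the lookup
    table that makes the process reversible."""

    def __init__(self):
        self._id_to_alias = {}
        self._alias_to_id = {}  # the re-identification key

    def pseudonymize(self, patient_id: str) -> str:
        if patient_id not in self._id_to_alias:
            alias = secrets.token_hex(8)
            self._id_to_alias[patient_id] = alias
            self._alias_to_id[alias] = patient_id
        return self._id_to_alias[patient_id]

    def reidentify(self, alias: str) -> str:
        # Trivial for anyone who holds (or obtains) the lookup table.
        return self._alias_to_id[alias]

records = [{"patient_id": "NHS-485-777-3456", "diagnosis": "covid-19"}]
p = Pseudonymizer()
for record in records:
    record["patient_id"] = p.pseudonymize(record["patient_id"])

print(records)                                   # alias in place of the patient number
print(p.reidentify(records[0]["patient_id"]))    # ...yet still reversible
```

Because the alias table exists somewhere, the data’s privacy depends entirely on who controls that table and under what terms, which is the kind of governance question the campaigners are pressing.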

Cynthia Murrell, June 30, 2021
