Facial Recognition: A Partial List

June 3, 2020

DarkCyber noted “From RealPlayer to Toshiba, Tech Companies Cash in on the Facial Recognition Gold Rush.” The write up provides two interesting things and one idea which is like a truck tire retread.

First, the write up points out that facial recognition or FR is a “gold rush.” That’s a comparison which eluded the DarkCyber research team. There’s no land. No seller of heavy duty pants. No beautiful scenery. No wading in cold water. No hydro mining. Come to think of it, FR is not like a gold rush.

Second, the write up provides a partial list of outfits engaged in facial recognition. The word partial is important. There are some notable omissions, but 45 is an impressive number. That’s the point. Just 45?

The aspect of the write up the DarkCyber team ignored is this “from the MBA classroom” observation:

Despite hundreds of vendors currently selling facial recognition technology across the United States, there is no single government body registering the technology’s rollout, nor is there a public-facing list of such companies working with law enforcement. To document which companies are selling such technology today, the best resource the public has is a governmental agency called the National Institute of Standards and Technology.

Governments are doing a wonderful job it seems. Perhaps the European Union should step forward? What about Brazil? China? Russia? The United Nations? With Covid threats apparently declining, maybe the World Health Organization? Yep, governments.

Then, after wanting a central listing of FR vendors, this passage snagged the attention of one of my researchers:

NIST is a government organization responsible for setting scientific measurement standards and testing novel technology. As a public service, NIST also provides a rolling analysis of facial recognition algorithms, which evaluates the accuracy and speed of a vendor’s algorithms. Recently, that analysis has also included aspects of facial recognition field like algorithmic bias based on race, age, and sex. NIST has previously found evidence of bias in a majority of algorithms studied.

Yep, NIST. The group has done an outstanding job for enterprise search. Plus the bias in algorithms has been documented and run through the math grinding wheel for many years. Put in snaps of bad actors and the FR system does indeed learn to match one digital watermark with a similar digital watermark. Run kindergarten snaps through the system and FR matches are essentially useless. Bias? Sure enough.

Consider these ideas:

  • An organization, maybe Medium, should build a database of FR companies
  • An organization, maybe Medium, should test each of the FR systems using available datasets or better yet building a training set
  • An organization, maybe Medium, should set up a separate public policy blog to track government organizations which are not doing the job to Medium’s standards.

There is an interest in facial recognition because there is a need to figure out who is who. There are some civil disturbances underway in a certain high profile country. FR systems may not be perfect, but they may offer a useful tool to some. On the other hand, why not abandon modern tools until they are perfect?

We live in an era of good enough, and that’s what is available.

Stephen E Arnold, June 3, 2020

Google to Australia: What! Us Pay You? Take a Walkabout, Mates

June 2, 2020

This will be interesting. Google has found the Australian request for money to save “real news” unacceptable. The information, if accurate, appears in “Google Rejects Call For Huge Australian Media Payout.” DarkCyber learned:

Google has rejected demands it pay hundreds of millions of dollars per year in compensation to Australian news media under a government-imposed revenue sharing deal.

What’s interesting is that Google, working overtime to control its costs of being the Google, said:

The company’s top executive in Australia said Google made barely Aus$10 million (US$6.7 million) per year from news-linked advertising, a fraction of a government watchdog’s estimates for the sector.

Will that explanation fly in Canberra? (Yep, DarkCyber knows there was an aircraft with the moniker Canberra, but did you know that word may mean “meeting place”?) Unfortunately the meeting place for Google and the government of Australia is likely to be in Oodnadatta in the summer.

Ms. Silva, the chief Googler in Australia:

also denied ACCC arguments that the tech firms gain significant “indirect benefits” from displaying news since the content draws users to their platforms. News “represents only a tiny number of queries” on Google, accounting last year for barely one percent of actions on Google Search in Australia, she said.

After 22 years of almost zero government initiative in regulating or using legislative mechanisms to deal with Google, Australia is moving forward to protect news. The effort will be interesting to watch. Unfortunately companies are likely to have more sticktoativity than some government professionals. What happens if Google hires some of the attorneys pushing the anti Google activity?

Stephen E Arnold, June 2, 2020

Google and Page Experience

June 2, 2020

Just a short item, a question in reality: What’s a “page experience”? I understand a Web page. This is the digital equivalent of a note card or a sheet of paper. I understand experience; for example, a tumble down a flight of steps is an experience.

But “page experience”? Fuzzy, weird word pairing like those old school America Online word pairs.

“Google Will Factor Page Experience into Search Rankings” explains that the phrase means:

page experience will measure how users perceive the experience of interacting with a web page. To determine page experience, Google will consider Core Web Vitals, metrics that measure user experience, as well as existing signals, like mobile-friendliness, safe-browsing and HTTPS-security.

Interesting. The “secret” Google ranking algorithm gets another batch of signals to use to determine relevance. Does “relevance” mean which and how many ads can be matched to information retrieved from one of Google’s universal search indexes?

The article reports:

Google says it will still prioritize the best information…

For decades methods like precision and recall, Boolean logic, and controlled vocabularies provided mechanisms for matching indexed information to a user query.
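
For readers who have not bumped into these old school measures, here is a minimal Python sketch of precision and recall for a single query. The document identifiers and the function name are made up for illustration; they come from no particular search system, and certainly not from Google.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = share of retrieved documents that are relevant
    recall    = share of relevant documents that were retrieved
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result list and hypothetical relevance judgments.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d2", "d4", "d7"]

precision, recall = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Two of the four retrieved documents are relevant, and two of the three relevant documents were found. Both numbers can be checked by anyone with the result list and the relevance judgments in hand.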

Most people have zero idea how Google determines what information to display. Does it matter? To most people online information is accurate. Jibber jabber like “page experience” is a phrase with a hefty payload of content free suggestivity.

Advertising revenue is the point of the exercise, isn’t it? Perhaps there is a correlation between Amazon’s and Facebook’s growing online advertising businesses and Google verbiage?

Using Google in a quest to find relevant information is an experience in itself.

Stephen E Arnold, June 2, 2020

Facebook: A Too Clever Ninja Move?

June 2, 2020

Facebook has some ninja DNA. “Mark Zuckerberg’s Ridiculously Wrong, Misleading, And Self-Serving Statements Regarding Twitter Fact-Checking The President” explains that Facebook is ducking the censorship dust up. The write up states:

Sure, they [this plural means Facebook] have a different policy, because almost all sites have different policies, but if you compared Facebook’s policies on content moderation to Twitter’s you’d find that Facebook does vastly more moderation than Twitter has ever done and Facebook introduced similar “fact checking” efforts years ago. To pretend that Facebook doesn’t do the exact same thing that Twitter is accused of doing here is just ridiculous. And, we all agree that no platform should be “the arbiter of truth” but that’s not the same as saying “do no moderation” (and again, Facebook does a ton of moderation). As for the final claim that Facebook is “hands off” when it comes to political speech, that’s also false. Facebook is hands off on political ads, but not all political speech. And so is Twitter, in that it bars all political ads in the first place.

Pretty close to the pin. However, a question arises, “Why is Mr. Zuckerberg taking this position?” Possible reasons include:

  • Facebook has data which suggests that making friends with Mr. Trump is a good idea. Antagonizing the president is, therefore, not a good idea. Mr. Zuckerberg is acting in his own best interests.
  • Facebook’s leader believes that Facebook is indeed different, possibly superior to the companies which are trying to gain traction in the digital world he has crafted. Thus, the statements are a reflection of the “truth” as perceived by Mr. Zuckerberg.
  • Facebook is not really doing censorship, filtering, or any of the actions cooked up in response to what DarkCyber thinks of as “the Cambridge Analytica incident.” Talk, handwaving, hiring people, paying for psychological counseling are just handwaving.

Other reasons are likely to exist. But DarkCyber is content with pointing out that with a couple of public statements, Mr. Zuckerberg has distanced himself and Facebook from the Twitter conflagration. Mr. Zuckerberg is likely to join Mr. Thiel as a go to resource for the White House. Plus, Mr. Zuckerberg in his warm, charming manner is saying, “Zuck you, Twitter.”

Stephen E Arnold, June 2, 2020

The Good Old Internet Archive Attracts Some Legal Eagle Action

June 2, 2020

Who really owns the Internet Archive? Does the Internet Archive still bundle up tweets and provide them to the august Library of Congress? Are those the caterpillar tracks of the Bezos bulldozer in the roadway near the Internet Archive’s headquarters? DarkCyber does not know the answers to these questions.

What is clear is that the Association of American Publishers (yes, there are still American publishers) is not happy with the Internet Archive. “Publishers File Suit Against Internet Archive for Systematic Mass Scanning and Distribution of Literary Works. Ask Court to Enjoin and Deter Willful Infringement” is reasonably well written, probably because there is a Vassar literature major in the PR chain. The write up states:

… member companies of the Association of American Publishers (AAP) filed a copyright infringement lawsuit against Internet Archive (“IA”) in the United States District Court for the Southern District of New York. The suit asks the Court to enjoin IA’s mass scanning, public display, and distribution of entire literary works, which it offers to the public at large through global-facing businesses coined “Open Library” and “National Emergency Library,” accessible at both openlibrary.org and archive.org. IA has brazenly reproduced some 1.3 million bootleg scans of print books, including recent works, commercial fiction and non-fiction, thrillers, and children’s books.

The AAP, like DarkCyber, finds the self aggrandizing, virtue signaling about pandemics, unemployment, and other contentious social issues tiresome. Amazon and Google are busy waving their hands after recent social turmoil. That helps if one is seeking clicks and some positive corporate CxO stroking.

The AAP’s statement continues:

Despite the self-serving library branding of its operations, IA’s conduct bears little resemblance to the trusted role that thousands of American libraries play within their communities and as participants in the lawful copyright marketplace. IA scans books from cover to cover, posts complete digital files to its website, and solicits users to access them for free by signing up for Internet Archive Accounts. The sheer scale of IA’s infringement described in the complaint—and its stated objective to enlarge its illegal trove with abandon—appear to make it one of the largest known book pirate sites in the world. IA publicly reports millions of dollars in revenue each year, including financial schemes that support its infringement design.

You can read the AAP statement via the link above.

At a time when civility is in short supply, the AAP approaches its legal foe this way:

The lawsuit reflects widespread anger among publishers, authors, and the entire creative community regarding IA’s actions and its response to objections. In an open letter to IA and its Board of Directors, the Authors Guild observed,  “You cloak your illegal scanning and distribution of books behind the pretense of magnanimously giving people access to them. But giving away what is not yours is simply stealing, and there is nothing magnanimous about that. Authors and publishers—the rights owners who legally can give their books away—are already working to provide electronic access to books to libraries and the people who need them. We do not need Internet Archive to give our works away for us.”

Yikes. The news release should have carried a trigger warning. Where is that Vassar-powered red pencil when one needs it? After the Google-centric headline, why not add “This content may be disturbing”?

Will publishers succeed in this effort? The flapping of the legal eagles over the Google Books project is less noisy than it was. (When will those angry with Google realize that projects die at Google because staff lose interest?)

Internet Archive may be different. One bright spot: The search and retrieval mechanism for Internet Archive content is darned interesting. Try to find a content object. Great stuff. When a content object cannot be found, does it exist?

The lawsuit is unlikely to consider this question.

Stephen E Arnold, June 2, 2020

List of Online Libraries

June 2, 2020

One of the DarkCyber researchers spotted a list of online libraries. The source is an unlikely one: Voat and a contributor named Auchtung. The links point to Web sites which provide access to collections. The “collections” are often duplicates; that is, there is redundancy in the list. DarkCyber believes that if one library is taken down, another one can be located. If you are curious about lists of books offered without charge, navigate to this link. Registration may be required. Also, it is possible that some of the information offered on these Web pages is protected by copyright. Just a heads up.

Stephen E Arnold, June 2, 2020

AI: Empty Calories Fuel Smart Software

June 1, 2020

Fast food is a wonder for some people. Zip into McDo, grab two Big Macs, fries, and a chocolate shake. (Is there milk in those?) Bang, zoom, hit the meeting room. After a few years of this accelerated approach to fine dining, the consequences are becoming evident.

The artificial intelligence sector is the digital equivalent of fast food. Limited menu of algorithms, a dash of big data, and massive advertising and marketing hype.

The hype calories would amaze Jenny Craig, a vendor of healthy and delicious meals. Smart software that can recognize faces — maybe half the faces on a good day. Smart software kicking a Go master in the knee was thrilling. Career ending, sure. But it’s an AI victory. Plus, there’s the virtue signaling about AI’s contributions to beating Covid 19. How is that working out? Oh, right. More time, more data needed, more money, and more marketing, Zoom presentations, and tweets. Yes, tweets.

Fast food marketing for smart software has been easy and fun to gobble down. VCs love this stuff. But heart burn? Yes, heart burn. Why?

The fact is that artificial intelligence has not made that much progress, if the information in “Eye-Catching Advances in Some AI Fields Are Not Real” is accurate.

The write up states:

But some of the improvement comes from tweaks rather than the core innovations their inventors claim—and some of the gains may not exist at all, says Davis Blalock, a computer science graduate student at the Massachusetts Institute of Technology (MIT). Blalock and his colleagues compared dozens of approaches to improving neural networks—software architectures that loosely mimic the brain. “Fifty papers in,” he says, “it became clear that it wasn’t obvious what the state of the art even was.”

The report, if accurate, comes from MIT (yep, the outfit that took money from everyone’s favorite influencer Jeffrey Epstein) and says:

The researchers evaluated 81 pruning algorithms, programs that make neural networks more efficient by trimming unneeded connections. All claimed superiority in slightly different ways. But they were rarely compared properly—and when the researchers tried to evaluate them side by side, there was no clear evidence of performance improvements over a 10-year period.
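
What does “trimming unneeded connections” look like in practice? Here is a minimal Python sketch of magnitude pruning, one common and simple approach. DarkCyber does not know whether it matches any of the 81 algorithms the researchers evaluated. The function name, the toy weights, and the 50 percent setting are assumptions for illustration.

```python
def magnitude_prune(weights, fraction):
    """Zero out roughly `fraction` of the weights, smallest absolute values first.

    `weights` is a list of rows (a toy stand-in for one layer's weight matrix).
    Ties at the threshold are all pruned, so slightly more than `fraction`
    of the weights may be removed.
    """
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * fraction)
    if k == 0:
        return [row[:] for row in weights]
    threshold = flat[k - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


# Hypothetical layer with six connections; prune the weakest half.
layer = [[0.8, -0.05, 0.3], [0.01, -0.6, 0.1]]
pruned = magnitude_prune(layer, 0.5)
print(pruned)
```

The sketch only removes connections. Whether the slimmed down network still performs is the part that needs the careful side by side comparison the researchers say is usually missing.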

And the extra side of fries? Check this statement:

Researchers are waking up to the signs of shaky progress across many subfields of AI. A 2019 meta-analysis of information retrieval algorithms used in search engines concluded the “high-water mark … was actually set in 2009.” Another study in 2019 reproduced seven neural network recommendation systems, of the kind used by media streaming services. It found that six failed to outperform much simpler, nonneural algorithms developed years before, when the earlier techniques were fine-tuned, revealing “phantom progress” in the field. In another paper posted on arXiv in March, Kevin Musgrave, a computer scientist at Cornell University, took a look at loss functions, the part of an algorithm that mathematically specifies its objective. Musgrave compared a dozen of them on equal footing, in a task involving image retrieval, and found that, contrary to their developers’ claims, accuracy had not improved since 2006. “There’s always been these waves of hype,” Musgrave says.

Oh, oh. Science Magazine’s write up reports:

Guttag says there’s also a disincentive for inventors of an algorithm to thoroughly compare its performance with others—only to find that their breakthrough is not what they thought it was. “There’s a risk to comparing too carefully.” It’s also hard work: AI researchers use different data sets, tuning methods, performance metrics, and baselines. “It’s just not really feasible to do all the apples-to-apples comparisons.”

The conclusion to the write up is a truism:

Researchers point out that even if new methods aren’t fundamentally better than old ones, the tweaks they implement can be applied to their forebears. And every once in a while, a new algorithm will be an actual breakthrough. “It’s almost like a venture capital portfolio,” Blalock says, “where some of the businesses are not really working, but some are working spectacularly well.”

When progress is slow but the market is hungry, what does an AI expert do? McDonald’s is investing about $200 million to boost marketing. Sounds like a plan. The Big Mac is more than a burger.

Stephen E Arnold, June 1, 2020

Is Cyber Crime Boring? Maybe The Characterization Masks a Painful Consequence?

June 1, 2020

DarkCyber read “Career Choice Tip: Cybercrime is Mostly Boring.” The article is clear. The experts cited are thorough and thoughtful. Practicing cyber crime is similar to what engineers, developers, and programmers do in the course of their work for firms worldwide. Much of that work is boring, filled with management friction, and repetitive.

The article states:

the academics stress that the romantic notions of those involved in cybercrime ignore the often mundane, rote aspects of the work that needs to be done to support online illicit economies. The researchers concluded that for many people involved, cybercrime amounts to little more than a boring office job sustaining the infrastructure on which these global markets rely, work that is little different in character from the activity of legitimate system administrators.

Exactly.

The paper is quoted in the article as explaining:

We find that as cybercrime has developed into industrialized illicit economies, so too have a range of tedious supportive forms of labor proliferated, much as in mainstream industrialized economies. We argue that cybercrime economies in advanced states of growth have begun to create their own tedious, low-fulfillment jobs, becoming less about charismatic transgression and deviant identity, and more about stability and the management and diffusion of risk. Those who take part in them, the research literature suggests, may well be initially attracted by exciting media portrayals of hackers and technological deviance.

The DarkCyber study team discussed the Cambridge research summary and formulated some observations:

  1. Boring means that cyber crime will be automated. Automated processes will be tuned to be more efficient. Greater efficiency translates to the benefit the cyber criminals seek. Thus, the forward momentum of boring cyber crime is an increase in the volume and velocity of attacks.
  2. Certain criminal elements are hiring out of work or disgruntled technologists from mainstream companies, including high-profile Silicon Valley companies. Our research identified one criminal organization paying 90,000 euros per month and offering benefits to contract workers with specialized skills. The economic pressures translate to a talent pool available to certain criminal orchestrators. More talent feeds the engineering resources available to cyber crime constructs. DarkCyber believes a “Google effect” is beginning, just in the cyber crime market space.
  3. Law enforcement, government agencies, and some providers of specialized services to law enforcement and intelligence entities will be unable to hire at the rate criminal constructs hire. Asymmetry will increase with bad actors having an opportunity to outpace enforcement and detection activities.

Net net: The task facing law enforcement, security, and intelligence professionals is becoming more difficult. Cyber crime may be boring, but boring tasks fuel innovation. With access to talent and cash, there is a widening chasm. Talking about boring does not make clear the internal forces pushing cyber crime forward.

Stephen E Arnold, June 1, 2020

Criticism of Zuck: Empowering the Digital Ninja

June 1, 2020

Update: Mark Zuckerberg’s kimura arm lock is weakening. For details about a surprising pushback from Facebook professionals, navigate to the San Jose Mercury News’s story “Facebook employees plan virtual walkout over Trump treatment: Some workers disagree with Zuckerberg’s hands-off policy to president’s posts.” In a battle of power, which entity has more oomph: The Zuck or the proles? Worth watching this play off game. June 1, 2020, 3 pm US Eastern

DarkCyber found that an article in Forbes Magazine (the capitalist’s tool, a phrase endorsed by the motorcycling publisher) adds juice to the Zuck. “Legal Organization Condemns Facebook, Zuckerberg For Not Condemning Trump” reports that the Lawyers’ Committee for Civil Rights Under Law is not happy with the Facebook founder’s approach to tweets from Donald J. Trump. The Forbes article states:

While a war of words and threats continues between Twitter and the President, the Lawyers’ Committee bemoaned Trump’s tweets on the killing of George Floyd by Minnesota police officers. In those posts, the President referred to protesters as “thugs” and threatened to use military force to subdue the violent and destructive protests.

Factual? Yes. However, the article is likely to add juice to the digital ninja’s relationship with the current administration. The business story is that Facebook may be moving in a direction different from that followed by other high-technology companies. Which ones? Maybe the Google? Maybe proud creators of Catalina? Facebook has some useful data to inform certain tactical actions of the Zuck.

Stephen E Arnold, May 31, 2020

Now These Are Numbers You Can Bank On

June 1, 2020

In the midst of the pandemic, DarkCyber noted “How Semantic Search Helps Users Help Themselves.” The write up is from Lucidworks, a company reselling open source engineering support, proprietary software, and other jazzed up solutions. In the write up was a reference to an IBM document. The idea is that the IBM data make a case for buying IBM? Of course not. The data support the contention that semantic search is like training wheels on a toddler’s bicycle.

What are these magical data? First, the data come from an IBM blog post dated October 17, 2017. That’s a couple of years ago. Change does happen, doesn’t it?

Check out these numbers:

  • Businesses spend $1.3 trillion on 265 billion customer service calls each year
  • Phone interactions cost around $35-$50
  • Text chat costs about $8-$10 per session
  • It is realistic to aim to deflect between 40% – 80% of common customer service inquiries to automated frameworks.
  • A drop in per-query cost from $15-$200 (human agents) to $1 (virtual agents)
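
A quick back of the envelope check on these bullets is sketched below in Python. Only the figures in the list are used; the variable names are made up, and the sketch assumes the $1.3 trillion and the 265 billion calls describe the same population of calls, which the IBM post may or may not intend.

```python
# Figures taken from the bullets above.
total_spend_usd = 1_300_000_000_000   # $1.3 trillion per year
calls_per_year = 265_000_000_000      # 265 billion customer service calls

# Average cost implied by the first bullet.
avg_cost_per_call = total_spend_usd / calls_per_year
print(f"Implied average cost per call: ${avg_cost_per_call:.2f}")

# Calls that would move to automated frameworks at 40% and at 80% deflection.
deflected_low = calls_per_year * 40 // 100
deflected_high = calls_per_year * 80 // 100
print(f"Deflected calls per year: {deflected_low:,} to {deflected_high:,}")

# Per-query saving if a human agent ($15 to $200) is replaced by a virtual agent ($1).
human_low, human_high, virtual = 15, 200, 1
print(f"Saving per deflected query: ${human_low - virtual} to ${human_high - virtual}")
```

The implied average works out to about $4.91 per call, well under the $35 to $50 quoted for a phone interaction. The bullets may be measuring different things, so treat any total savings figure built from them with care.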

What’s the connection to the SOLR centric Lucidworks? The company wants to convince prospects that it has the solution known as chatbots. Clever phrase for what is a cost reduction play. Do chatbots work? That depends on whom one asks.

The good thing about chatbots is that they don’t create Rona hot spots. The bad thing is that most of the chatbots don’t work particularly well.

The IBM data, even though old and not in step with the Rona business climate, suggest that the on going cost of helping a “customer” deal with a product and service is brutal. Combine these here and now costs with the technical debt of informationized products and services and what do you get?

The short answer is that one has to have quite a bit of money to keep the good ship technology afloat.

Even Google-type companies, faced with sky rocketing costs and a dicey economic environment, are having to make money saving changes.

Net net: The happy talk about super duper technologies often creates cost black holes. What about IBM? Layoffs and ultra hedgey forecasts. What about Lucidworks type outfits? Wow. Much sales work ahead.

One suggestion? Watch those assertions and one’s cost accounting. Can one “help oneself”? Absolutely, maybe.

Stephen E Arnold, June 1, 2020
