Secrets Patterns Database

February 15, 2023

One of my researchers called my attention to “Secrets Patterns Database.” For those interested in finding “secrets”, you may want to take a look. The data and scripts are available on GitHub… for now. Among its features are:

  • “Over 1600 regular expressions for detecting secrets, passwords, API keys, tokens, and more.
  • Format agnostic. A Single format that supports secret detection tools, including Trufflehog and Gitleaks.
  • Tested and reviewed Regular expressions.
  • Categorized by confidence levels of each pattern.
  • All regular expressions are tested against ReDos attacks.”

Links to the author’s Web site and LinkedIn profile appear in the GitHub notes.
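For readers who have not used this type of tool, here is a minimal Python sketch of how a collection of regular expressions graded by confidence can drive a scan. The three patterns, their names, and the two-level "high"/"low" confidence scale are assumptions made for illustration. They are not copied from the Secrets Patterns Database, and the sketch does nothing about the ReDoS testing the project describes.

```python
import re
from dataclasses import dataclass

# Assumed two-level confidence scale for this sketch; higher number = more confident.
CONFIDENCE_RANK = {"low": 0, "high": 1}


@dataclass(frozen=True)
class SecretPattern:
    name: str
    regex: re.Pattern
    confidence: str


# Illustrative patterns written for this sketch, not taken from the database.
PATTERNS = [
    # AWS access key IDs: "AKIA" or "ASIA" followed by 16 upper-case letters/digits.
    SecretPattern("aws_access_key_id", re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"), "high"),
    # Classic GitHub personal access tokens: "ghp_" plus 36 letters/digits.
    SecretPattern("github_pat", re.compile(r"\bghp_[A-Za-z0-9]{36}\b"), "high"),
    # A loose "password = value" rule; noisy, so marked low confidence.
    SecretPattern(
        "generic_password",
        re.compile(r"(?i)password\s*[:=]\s*['\"]?[^\s'\"]{8,}"),
        "low",
    ),
]


def scan(text: str, min_confidence: str = "low") -> list[tuple[int, str, str, str]]:
    """Return (line number, pattern name, confidence, matched text) tuples."""
    floor = CONFIDENCE_RANK[min_confidence]
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in PATTERNS:
            if CONFIDENCE_RANK[pattern.confidence] < floor:
                continue  # skip patterns below the requested confidence
            for match in pattern.regex.finditer(line):
                findings.append((lineno, pattern.name, pattern.confidence, match.group(0)))
    return findings
```

Raising `min_confidence` to "high" trades recall for fewer false alarms, which is the practical point of grading patterns by confidence.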

Stephen E Arnold, February 20, 2023

Datasette: Useful Tool for Crime Analysts

February 15, 2023

If you want to explore data sets, you may want to take a look at Datasette, an “open source multi-tool for exploring and publishing data.”

The Datasette Web site says:

It helps people take data of any shape, analyze and explore it, and publish it as an interactive website and accompanying API. Datasette is aimed at data journalists, museum curators, archivists, local governments, scientists, researchers and anyone else who has data that they wish to share with the world. It is part of a wider ecosystem of 42 tools and 110 plugins dedicated to making working with structured data as productive as possible.

A handful of demos are available. Worth a look.
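Datasette works with SQLite database files. Here is a minimal Python sketch, using only the standard library, of loading a CSV export into SQLite so Datasette has something to serve. The table name, the column names, and the incident data are hypothetical, and every column is stored as TEXT for simplicity. The sqlite-utils tool from the same ecosystem handles type detection and a good deal more.

```python
import csv
import io
import sqlite3


def load_csv(conn: sqlite3.Connection, table: str, csv_text: str) -> int:
    """Create `table` (all TEXT columns) from CSV text and return rows inserted."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    # Quote identifiers so headers with spaces still work; this sketch assumes
    # headers contain no double quotes.
    column_sql = ", ".join(f'"{name}" TEXT' for name in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({column_sql})')
    placeholders = ", ".join("?" for _ in columns)
    rows = [tuple(record[name] for name in columns) for record in reader]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Hypothetical incident data for illustration only.
SAMPLE = """date,offense,district
2023-01-02,burglary,North
2023-01-03,theft,South
2023-01-05,theft,North
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # use a file such as "incidents.db" for Datasette
    load_csv(conn, "incidents", SAMPLE)
    for row in conn.execute(
        "SELECT district, COUNT(*) FROM incidents GROUP BY district ORDER BY district"
    ):
        print(row)
```

Saved to a file-backed database rather than `:memory:`, the data could then be opened in a browser with a command along the lines of `datasette incidents.db`.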

Stephen E Arnold, February 15, 2023

Going after the Original Entitled Wizards of Wonder: Blue Chip Consultants

February 14, 2023

I read “The McKinseys and the Deloittes Have No Expertise in the Areas That They’re Advising In.” I think the wording “that they’re advising in” would make some old-school editors uncomfortable. But grammar and usage aside, the Financial Times, the odd orange newspaper, has identified what might be called “The Once Emperor-Like Are Naked So Let’s Put Them on TikTok.” Well, not TikTok, but the blue chip consultants are in the spotlight for a short time.

I noted this passage in an “interview” with Mariana Mazzucato, who wrote the book The Big Con with Rosie Collington:

The Big Con of the book’s title is not a crime; it’s a confidence trick. Consultancies and outsourcers, Mazzucato argues, know less than they claim, cost more than they seem to, and — over the long term — prevent the public sector developing in-house capabilities.

The article presents as “real” financial news something that most former employees of blue chip firms know: “Get smart about nanotechnology. We have a client meeting at 9 am.” The sentence is emitted by a sleek partner at 6:15 pm on a Wednesday evening. Yep, that’s why professionals at blue chip consulting firms get paid reasonable money: Get smart fast, spout sentences which seem spontaneously helpful, and nod at the right places. The goal: Close a job, start billing, change the scope, and bill some more.

I liked this statement in the article:

“These are private companies, the McKinseys and the Deloittes, that have no expertise in the areas that they’re advising in.”

Accurate? Yep. Will blue chip consulting firms change? Nope. Will those who hire blue chip consulting firms change their ways? No.

But why?

We can answer the question next week by getting our consulting firm to lead a discussion with the government staff involved in determining next steps. Those next steps require defining a project, a statement of work, a procurement or an add-on to an existing contract, and billing.

In short, a validation of the superior intellect of the blue chip firms and their wizards of wonder.

Stephen E Arnold, February 14, 2023

Is Your Doctor Good at Statistical Analysis? Sure, Sure, Not to Worry

February 14, 2023

I spotted an interesting story titled “FDA Has Now Cleared More Than 500 Healthcare AI Algorithms.” The write up states:

There are now more than 520 market-cleared artificial intelligence (AI) medical algorithms available in the United States, according to the U.S. Food and Drug Administration (FDA) as of January 2023. The vast majority of these are related to medical imaging.

Missing are Google’s method for solving death and the IBM Watson cancer solutions.

Another factoid in the article is that smart software in non-clinical areas is blooming. The niches with action are:

  • Population health
  • Health tracking apps
  • Identifying and addressing gaps in health equity
  • Revenue cycle management streamlining
  • Hospital-wide monitoring for length of stay, bed turnover rates, early sepsis detection and readmissions
  • Data analytics for key performance indicators
  • Enabling better patient wellness and preventative care

The item I found intriguing is:

When reviewing AI products, the FDA’s Center for Devices and Radiological Health (CDRH) is considering a total product lifecycle-based regulatory framework for these technologies that would allow for modifications to be made from real-world learning and adaptation, while ensuring that the safety and effectiveness of the software as a medical device are maintained. Such a regulatory framework could enable the FDA and manufacturers to evaluate and monitor a software product from its premarket development to post-market performance. This approach could allow for the FDA’s regulatory oversight to embrace the iterative improvement power of AI and ML-based software as a medical device, while assuring patient safety.

The FDA does a superb job. It makes perfect sense that the government agency can embrace smart software. No problemo.

Stephen E Arnold, February 14, 2023

The Chinese Balloon: A Legacy of Loon?

February 14, 2023

Several of my monographs about the Google relied on text and link analysis of Google patent applications and patents. One of the patents was “Balloon Altitude Control Using Density Adjustment and/or Volume Adjustment.” You may recall the Loon Balloon, a project to provide Internet access in locations where landlines or other delivery mechanisms were not affordable, safe, or accessible due to an issue. (An “issue” could be a war, disease, or a populace with a leader who was un-Googley.)

What is the likelihood that China has used a Loon-like invention, similar to the one described in US9033274B2 and coupled with some smart software, to enable global activities? Balloons can carry explosive devices, surveillance equipment, or electronics designed to screw up a range of electrical (wave)-centric systems.

Probably just a random connection my brain generated. Probably nothing significant.

Stephen E Arnold, February 13, 2023

Summarize for a Living: Should You Consider a New Career?

February 13, 2023

In the prehistoric age of commercial databases, many people earned money by reading and summarizing articles, documents, monographs, and consultant reports. In 1980, preparing and fact checking a 130-word summary of an article in the Harvard Business Review cost the database publisher something like $17 to $25 per summary for what I would call general business information. (If you want more information about this number, write benkent2020@yahoo.com, and maybe you will get more color on the number.) Flash forward to the present: the cost for a human to summarize an article in the Harvard Business Review has increased. That’s why it is necessary to pay to access and print an abstract from a commercial online service. Even with yesterday’s technology, the costs are a killer. Now you know why software that eliminates the human, the editorial review, the fact checking, and the editorial policies which define what is indexed, why, and what terms are assigned is a joke to many of those in the search and retrieval game.

I mention this because if you are in the A&I business (abstracting and indexing), you may want to take a look at HunKimForks ChatGPT Arxiv Extension. The idea is that ChatGPT can generate an abstract which is certainly less fraught with cost and management hassles than running one of the commercial database content generation systems dependent on humans, some with degrees in chemistry, law, or medicine.

Are the summaries any good? For the last 40 years, abstracts and summaries have been, in my opinion, degrading. Fact checking is out the window along with editorial policies, style guidelines, and considered discussion of index terms, classification codes, and time handling and signifying, among other useful knowledge attributes.

Three observations:

  1. Commercial database publishers may want to check out this early-days open source contribution.
  2. Those engaged in abstracting, writing summaries of books, and generating distillations of turgid government documents (yep, blue chip consulting firms, I am thinking of you) may want to think about their future.
  3. Say “hello” to increasingly inaccurate outputs from smart software. Recursion and liquid algorithms are not into factual stuff.

Stephen E Arnold, February 13, 2023

Modern Research Integrity: Stunning Indeed

February 13, 2023

I read “The Rise and Fall of Peer Review.” The essay addresses what happens when a researcher submits a research paper to a research journal. Many “research” journals are owned by big professional publishing companies. If you are not familiar with that sector, think about a publishing club which markets to libraries and “research” institutions. No articles in “research” publications, no promotion. The method for determining accuracy is to ask experts to read submitted papers, make comments, and send a signal about the value of the “research.” I served on a peer review panel for a year and quit. I am no academic, but I know doo doo when it is on my shoe.

Now I want to focus on one passage. Consider this statement:

Why don’t reviewers catch basic errors and blatant fraud? One reason is that they almost never look at the data behind the papers they review, which is exactly where the errors and fraud are most likely to be. In fact, most journals don’t require you to make your data public at all. You’re supposed to provide them “on request,” but most people don’t. That’s how we’ve ended up in sitcom-esque situations like ~20% of genetics papers having totally useless data because Excel autocorrected the names of genes into months and years. (When one editor started asking authors to add their raw data after they submitted a paper to his journal, half of them declined and retracted their submissions. This suggests, in the editor’s words, “a possibility that the raw data did not exist from the beginning.”)
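The Excel problem is easy to screen for, at least roughly. Here is a minimal Python sketch, assuming gene symbols sit in one column read as strings, that flags cells which look like Excel date displays such as “2-Sep” (what a symbol like SEPT2 can turn into). The function name and the matching rule are my own assumptions; this is not the method used in the research the essay cites, and it will miss other kinds of conversion.

```python
import re

MONTHS = "jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec"

# Matches day-month strings such as "2-Sep" (the classic result of SEPT2 being
# read as a date) and, as a guess at other display formats, month-day strings
# such as "Dec-01". Case-insensitive.
DATE_LIKE = re.compile(
    rf"^(?:\d{{1,2}}-(?:{MONTHS})|(?:{MONTHS})-\d{{1,2}})$", re.IGNORECASE
)


def suspicious_gene_cells(values: list[str]) -> list[tuple[int, str]]:
    """Return (row index, value) pairs for cells that look like converted dates."""
    return [(i, v) for i, v in enumerate(values) if DATE_LIKE.match(v.strip())]


column = ["TP53", "2-Sep", "BRCA1", "1-Mar", "Dec-01"]
print(suspicious_gene_cells(column))  # [(1, '2-Sep'), (3, '1-Mar'), (4, 'Dec-01')]
```

A check like this takes seconds to run, which makes the essay’s point about reviewers not looking at the data a bit sharper.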

Observations:

  1. There is exactly one commercial database which added corrections to its entries. Why? Accuracy is expensive and most publishers are not into corrections. I think that feature of the database has been in the trash heap for many, many years. The outfit which bought the database is not into excellence in anything but revenue and profit.
  2. I found it impossible to get access to [a] the author to whom I wanted to address a question directly, that is, on the telephone, or [b] the data on which the crazy statistical hoops were based. Hey, math is not the key differentiator for many researchers; getting tenure and grants are the prime movers. A peer reviewer with pointed questions? Sorry, no way.
  3. The professional publishers want to follow a process which shifts responsibility for publishing error-filled articles to the “procedure”, the peer reviewers, the editors, and probably the stray dog outside their headquarters. Everyone is responsible for mistakes except them.

Net net: Perhaps the notion of open source accuracy needs to be expanded beyond tweets and Facebook posts?

Stephen E Arnold, February 14, 2023

Prabhakar in Paris: An Expensive Google Trip

February 13, 2023

Paris has good restaurants, and it has quite a few alert, well-educated people. So why did Google take the Prabhakar Smart Search Show to the City of Light? “Google Employees Criticize CEO Sundar Pichai for Rushed, Botched Announcement of GPT Competitor Bard” does not have an answer for me or for others either.

The write up states:

Staffers took to the popular internal forum Memegen [an in house Google thing] to express their thoughts on the Bard announcement, referring to it as “rushed,” “botched” and “un-Googley,” according to messages and memes viewed by CNBC.

But here’s the killer comment:

During Google’s Wednesday event, search boss Prabhakar Raghavan briefly shared some slides with examples of Bard’s capabilities. People tuning in expected to hear more, and some employees weren’t even aware of the event. One presenter forgot to bring a phone that was required for the demo. Meanwhile, people on Twitter began pointing out that an ad for Bard offered an incorrect description of a telescope used to take the first pictures of a planet outside our solar system.

Is Prabhakar the Red Skelton of smart software infused search? By the way, the turning point for Googzilla was the interaction between the company and Dr. Timnit Gebru. If you have not read the “stochastic parrots” paper, you may find it interesting.

Polly want Google management to be organized? Squawk:

Dear Sundar, the Bard launch and the layoffs were rushed, botched, and myopic…. [now make parrot sounds]

The next high school reunion for Sundar and Prabhakar will be interesting indeed.

Stephen E Arnold, February 13, 2023

Google Shows Its Smarts by Trimming Its Market Value

February 10, 2023

The title of this blog is Beyond Search. More than a decade ago, I wanted to have a place to put my observations about search and retrieval. Retirement was coming, and I was unable to put criticism of search baloney in the write ups I was paid to do. (Nope, I won’t name the publication.)


Art generated and probably owned by Craiyon, Dreamstime, Getty, Alamy, Shutterstock and any other outfit looking to make a buck surfing on legal water droplets. I sure did not create this picture.

That’s why I have not been going head over heels with the smart software revolution. I now point to articles that offer something I find either interesting, amusing, or certifiably wacky. Today, I want to call your attention to a statement I quite like which appeared in “Google Bard or Google Storyteller”. Here’s the quote:

The problem here isn’t just the mistake. It’s the fact that this mistake was highlighted as an example of what Google Bard could accomplish. Before releasing this information, there were likely many people involved at Google. None were competent enough to fact-check what they wanted to show the world. This is not only embarrassing, but it also casts many doubts about Google’s internal checks on its products and shows an astounding level of amateurism for one of the biggest companies in the world.

Do you recall the antics of Abbott and Costello or the Three Stooges? I wonder if this slip betwixt cup and lip is the first program of the 2023 season for the Sundar and Prabhakar Show, sponsored by Microsoft and OpenAI, where you just Bing it!

I can hear the announcer saying,

“With Jeff Dean, Marcus White, and special guest stars Larry Page and Sergey Brin. Here are Sundar and Prabhakar, who have just returned from a meeting at what’s left of Charlie’s Café where the talented duo were discussing smart search. We join Sundar and Prabhakar in the once glorious dining facility…”

What would the comedy script generated by Bard say? I don’t want to know because that loss in market value was a hoot appropriate for a thunder lizard with a broken leg in the snow.

Stephen E Arnold, February 10, 2023

Google Is Busy: What about YouTube Filtering?

February 10, 2023

We noted a reference to a video produced by Wendover Productions. I know zero about the outfit’s videos, but one caught my eye. The video is “How to Illegally Cross the Mexico-US Border.” The video runs 14 minutes and has been online for more than a year. The company describes its “aboutness” this way:

Wendover Productions is all about explaining how our world works. From travel, to economics, to geography, to marketing and more, every video will leave you with a little better understanding of our world. New videos go out every other Tuesday.

The video does not strike me as particularly helpful. But there are some interesting factoids:

  1. Barriers protect about one-third of the border
  2. Walking across is possible in certain locations, maybe Brownsville, Texas
  3. Crossing in remote areas is possible, but a walk of 20 to 30 miles may be necessary
  4. Humanitarian groups have set up water stations in certain areas

New methods of dealing with certain border infractions are in place, and some, like the Anduril tower and drone method, seem to have some value. The situation, of course, determines what steps are taken and what methods are employed by authorized officials.

I wanted to highlight this video as I have highlighted others which provide information about how to obtain and hack commercial software.

My question is, “Has Google been sufficiently distracted by bonuses, possible revenue shrinkage, and Code Red cartwheels to have ignored what appears to be information facilitating illegal activity?”

Another possibility is that Google’s method of identifying certain types of content, like how to steal software or enter the US illegally, is of little interest to Googlers or to its smart software.

A related question is, “What will slip through the gaps in the Bard system?”

Stephen E Arnold, February 10, 2023
