Elasticsearch Guide: More of a Cheat Sheet

March 15, 2023

Elasticsearch has been a go-to solution for searching content either via the open source version or the Elastic technical support option. The system works, and it has many followers and enthusiasts. As a result, one can locate “help” easily online for many hitches in the git along.

I found the information in “Unlocking the Power of Elasticsearch: A Comprehensive Guide to Complex Search Use Cases.” I would suggest that the write up is more like a cheat sheet. Encounter a specific task, check the “Guide,” and sally forth.

I would suggest that many real-life enterprise search needs are difficult to solve. One example is capturing data on a sales professional’s laptop before the colleague deletes the slide deck with the revised price quotation data. No search engine on the planet can get this important information to the legal department if the project goes off the rails. “I can’t find it” is not a helpful answer.

Similar challenges arise when the Elasticsearch system must interact with a line item for a product specified in a purchase order which has a corresponding engineering drawing. Line up the chemical, civil, mechanical, and nuclear engineers and tell them, “Well, that’s an object embedded in the what-do-you-call-it software I never heard of.” Yeah.

Nevertheless, for some helpful tips give the free guide a look.

The mantra is, “Search is easy. Search is a solved problem. Search is no big deal.” Convince yourself. Keep in mind that the mantra does not ring true to me nor does it make me calm.

Stephen E Arnold, March 15, 2023

Interesting Critique of the Google

March 14, 2023

I know there are other browsers available. For many people Google Chrome is THE browser. Microsoft figured out that Credge was cheaper and probably less likely to be zapped by the Google. Vivaldi is a browser working to attract users and provide a less money-centric software cocoon for online users. It too uses the Chromium engine.

I read “Vivaldi Co-Founder: Advertisers Stole the Internet from Us.” The article is mostly content marketing; nevertheless, I noted a handful of assertions and factoids I found thought provoking.

Here are a few. My observation about the comment appears in italics.

… part of the issue companies like Google may have is that Vivaldi blocks a lot of tracking and gets around advertisements in novel ways. No surprise I believe.

Android’s Privacy Sandbox can track users by creating an offline profile on them and show relevant advertisements based on that. No surprise I believe. Google dies without ad revenue.

… data can be used to influence how people vote, à la Cambridge Analytica. No surprise. Control the information, gain power.

the current state of advertising is less profitable for sites now than it was before widespread tracking was in place. No surprise but Google benefits because it “owns” the rights to charge people to enter and leave Club Ad via its swinging door.

The situation is clear: A small company faces a long slog up Mt. Everest without cold weather gear. Does the government of Nepal care? Nope.

Stephen E Arnold, March 14, 2023

If Google Is Online Advertising, Why Does Malvertising Thrive?

March 14, 2023

I think this question struck me after reading a few paragraphs of “Malvertising on Google Ads: It’s Hiding in Plain Site.” The essay is designed to cause a reader to embrace the commerce malware service provided by Kolide. How do I know? Here’s the statement that tipped me off:

Want to see how Kolide can get your entire fleet updated, patched and compliant? Watch Kolide’s on-demand demo today.

Despite the content marketing sway in the article, I noted an interesting comment about Google. After citing a Googley statement about the online ad giant’s good intentions and methods for dealing with malware, the write up says:

Unfortunately, the search engine does not provide a definition nor examples of what falls under “egregious violations.” And given how easy it is for bad actors to simply make a new account when a new one is shut down, this approach doesn’t meet the requirements for reliability or scalability. Still, when you look at things from Google’s perspective, these policies make sense.

In my opinion, Google happily delivers malvertising because Google sells advertising. The company does not want to harm its revenue. Just as with the pop-up ads running on top of YouTube videos, Google is not losing revenue. The company says, “No more overlays in a few months.” Why? Is it because Google will introduce Amazon-Twitch style unskippable ads, insert more unskippable commercials in videos, and add more end-of-video ads? Absolutely. Google is not going to give up revenue in my opinion.

Shifting the responsibility for identifying and remediating issues with Google ad-delivered malware is good for cyber security companies and super good for Google. My view is that we have one more example of specious behavior from a company unable to get its ethical compass focused on any direction but its revenue.

Stephen E Arnold, March 13, 2023

Synthetic Data: Yes, They Are a Thing

March 13, 2023

“Real” data (that is, data generated by humans) are expensive to capture, normalize, and manipulate. But those “real” data are important. Unfortunately some companies have sucked up real data and integrated those items into products and services. Now regulators are awakening from a decades-long slumber and taking a look into the actions of certain data companies. More importantly, a few big data outfits are aware of [a] the costs and [b] the risks of real data.

Enter synthetic data.

If you are unfamiliar with the idea, navigate to “What is Synthetic Data? The Good, the Bad, and the Ugly.” The article states:

The privacy engineering community can help practitioners and stakeholders identify the use cases where synthetic data can be used safely, perhaps even in a semi-automated way. At the very least, the research community can provide actionable guidelines to understand the distributions, types of data, tasks, etc. where we could achieve reasonable privacy-utility tradeoffs via synthetic data produced by generative models.

Helpful, correct?

The article does not point out two things which I find of interest.

First, the amount of money a company can earn by operating efficient synthetic data factories is likely to be substantial. Like other digital products, the upside can be profitable and give the “owner” of the synthetic data market an IBM-type of old-school lock in.

Second, synthetic data can be weaponized either intentionally via data poisoning or algorithm shaping.

I just wanted to point out that a useful essay does not explore what may be two important attributes of synthetic data. Will regulators rise to the occasion? Unlikely.

Stephen E Arnold, March 13, 2023

The Confluence: Big Tech, Lobbyists, and the US Government

March 13, 2023

I read “Biden Admin’s Cloud Security Problem: It Could Take Down the Internet Like a Stack of Dominos.” I was thinking that the take down might be more like the collapses of outfits like Silicon Valley Bank.

I noted this statement about the US government, which is

embarking on the nation’s first comprehensive plan to regulate the security practices of cloud providers like Amazon, Microsoft, Google and Oracle, whose servers provide data storage and computing power for customers ranging from mom-and-pop businesses to the Pentagon and CIA.

Several observations:

  1. Lobbyists have worked to make it easy for cloud providers and big technology companies to generate revenue in an unregulated environment.
  2. Government officials have responded with inaction and spins through the revolving door. A regulator or elected official today becomes tomorrow’s technology decision maker and then back again.
  3. The companies themselves have figured out how to use their money and armies of attorneys to do what is best for the companies paying them.

What’s the consequence? Wonderful wordsmithing is one consequence. The problem is that now there are Mauna Loas burbling in different places.

Three of them are evident: The fragility of the Silicon Valley approach to innovation. That’s reactive and imitative at this time. The second issue is the complexity of the three body problem resulting from lobbyists, government methods, and monopolistic behaviors. Commercial enterprises have become familiar with the practice of putting their thumbs on the scale. Who will notice?

What will happen? The possible answers are not comforting. Waving a magic wand and changing what are now institutional behaviors established over decades of handcrafting will be difficult.

I touch on a few of the consequences in an upcoming lecture for the attendees at the 2023 National Cyber Crime Conference.

Stephen E Arnold, March 13, 2023

Is It Groundhog Day? Googzilla Chases Its Tail

March 10, 2023

In the buzz of Code Red, Google has a management fix for the damage caused by Microsoft’s ChatGPT marketing attack. “Google Dusts Off the Failed Google+ Playbook to Fight ChatGPT” states:

Google’s ChatGPT panic seemed a lot like its response to Google+, and several employees relayed that same sentiment to Bloomberg. Just like with G+, the report added that “current and former employees say at least some Googlers’ ratings and reviews will likely be influenced by their ability to integrate generative AI into their work.”

Google+ (try and search that, Google search fans). Does Google Plus work? How about a combo of “Google+ Plus Orkut” as a query?

The write up passes along a quote by an unnamed Google wizard:

“We’re throwing spaghetti at the wall, but it’s not even close to what’s needed to transform the company and be competitive.”

My take on this reference to Google+ or Google Plus is:

1. The sources for this story are not Googley and, therefore, cannot appreciate the management brilliance

2. The Google is out of ideas; that is, the Code Red thing and idea that it will be smart software everywhere is a knee jerk reaction

3. Googzilla is chasing its tail; that is, senior management has no idea what to do and hits upon this idea, “Google+ or Plus was a success. Let’s do that again.”

Net net: Is it groundhog day at the Googleplex? Next question: What confidence does one have in groundhogs?

Stephen E Arnold, March 10, 2023

DarkTrace: A Cyber Security Star Makes an Analyst Bayes at the Moon

March 10, 2023

DarkTrace is a cyber security firm which used the Reverend Thomas Bayes’s math to thwart bad actors. “Fresh Clouds for Darktrace as New York Hedge Fund Claims Concerns Borne Out” states:

Quintessential Capital Management, which previously expressed its “fear that sales, margins, and growth rates may be overstated” today said: “Darktrace’s recent financial results are consistent with our thesis: growth, new customers, cash generation and profits are all shrinking fast.”

Bayes works for some types of predictive applications. I think the disconnect between the technical methods of DarkTrace and the skeptical hedge fund may be related to the distance between what smart software can do and what marketers say the smart software does. In that space are perched investors, stakeholders, employees, and customers.

What has caused a market downturn? The article says that it may be a consequence of ChatGPT. Here’s a statement I noted:

The cybersecurity business said ChatGPT “may have helped increase the sophistication of phishing emails, enabling adversaries to create more targeted, personalized, and ultimately, successful attacks.” “Darktrace has found that while the number of email attacks across its own customer base remained steady since ChatGPT’s release, those that rely on tricking victims into clicking malicious links have declined, while linguistic complexity, including text volume, punctuation, and sentence length among others, have increased,” the firm said.

Is this a case of DarkTrace’s smart software being outfoxed by smarter software? I still believe the marketers bear the responsibility. Knowing exactly how DarkTrace works and the specific results the system can deliver is important. Marketers rarely share my bias. Now the claims of the collateral writers are insufficiently robust to withstand the skepticism of tweeting analysts at Quintessential Capital Management.

Stephen E Arnold, March 10, 2023

Is Intelware Square Dancing in Israel?

March 10, 2023

It is a hoe down. Allemande Left. Do Si Do. Circle Left. Now Promenade. I can hear the tune in “NSO Group Co-Founder Emerges As New Majority Owner.” My toe was tapping when I read:

Omri Lavie – the “O” in NSO Group … appears to have emerged as the company’s new majority owner. Luxembourg filings show that Lavie’s investment firm, Dufresne Holding, is – for now – the sole owner of a Luxembourg-based holding company that ultimately owns NSO Group.

What does the company’s technology enable? The Guardian says:

Pegasus can hack into any phone without leaving an obvious trace, enabling users to gain access to a person’s encrypted calls and chats, photographs, emails, and any other information held on a phone. It can also be used to turn a phone into a remote listening device by controlling its recorder.

Is the Guardian certain that this statement embraces the scope of the NSO Group’s capabilities? I don’t know. But the real newspaper sounds sure that it has its facts lined up.

Was the transition smooth? Well, there may have been some choppy water as the new owner boarded. The article reports:

[The] move follows in the wake of multiple legal fights between NSO and a US-based financial company that is now known as Treo, which controls the equity fund that owns a majority stake in NSO. A person familiar with the matter said Treo had been alerted to the change in ownership of the company’s shares in a recent letter by Lavie, which appears to have caught the financial group by surprise. The person said Treo was still trying to figure out the financial mechanism that Lavie had used to assume control of the shares, but that it believed the company’s financial lenders had, in effect, ceded control of the group to the Israeli founder.

I find it interesting when the milieu of intelligence professionals intersects with go-go money people. Is Treo surprised?

Allemande Right. Do Si Do. Promenade home.

Stephen E Arnold, March 10, 2023

Bing Begins, Dear Sundar and Prabhakar

March 9, 2023

Note: Not written by an artificial intelligence wonder system. The essay is the work of a certified dinobaby, a near-80-year-old fossil. The Purple Prose parts are made up comments by me, the dinobaby, to help improve the meaning behind the words.

I think the World War 2 Dear John letter has been updated. Today’s version begins:

Dear Sundar and Prabhakar…

“The New Bing and Edge – Progress from Our First Month” by Yusuf Mehdi explains that Bing has fallen in love with marketing. The old “we are so like one another, Sundar and Prabhakar” is now

“The magnetic Ms. OpenAI introduced me to her young son, ChatGPT. I am now going steady with that large language model. What a block of data! And I hope, Sundar and Prabhakar, we can still be friends. We can still chat, maybe at the high school reunion? Everyone will be there. Everyone. Timnit Gebru, Jerome Pesenti, Yann LeCun, Emily Bender, and you two, of course.”

The write up does not explicitly say these words. Here’s the actual verbiage from the marketing outfit also engaged in unpatchable security issues:

It’s hard to believe it’s been just over a month since we released the new AI-powered Bing and Edge to the world as your copilot for the web. In that time, we have heard your feedback, learned a lot, and shipped a number of improvements. We are delighted by the virtuous cycle of feedback and iteration that is driving strong Bing improvements and usage.

A couple of questions: Is the word virtuous related to the word virgin? Pure, chaste, unsullied, and not corrupted by … advertising? Has it been a mere 30 days since Sundar and Prabhakar entered the world of Code Red? Were they surprised that their Paris comedy act drove attendees to Le Bar Bing? Is the copilot for the Web ready to strafe the digital world with Bing blasts?

Let’s look at what the love letter reports:

  • A million new users. What has the Google pulled in with its change in the curse word policy for YouTube?
  • More searches on Le Bing than before the tryst with ChatGPT. Will Google address relevance ranking of bogus ads for a Thai restaurant favored by a certain humanoid influencer?
  • A mobile app. Sundar and Prabhakar, what’s happening with your mobile push? Hasn’t revenue from the Play store declined in the last year? Declined? Yep. As in down, down, down.

Is Bing a wonder working relevance engine? No way.

Is Bing going to dominate my world of search and retrieval? For the answer, just call 1 800 YOU WISH, please.

Is Bing winning the marketing battle for smarter search? Oh, yeah.

Well, Sundar and Prabhakar, don’t let that Code Red flashing light disturb your sleep. Love and kisses, Yusuf Mehdi. PS: The high school reunion is coming up. Maybe we can ChatGPT?

Stephen E Arnold, March 9, 2023

Hybrid Search: A Gentle Way of Saying “One Size Fits All” Search Like the Google Provides Is Not Going to Work for Some

March 9, 2023

“On Hybrid Search” is a content marketing-type report. That’s okay. I found the information useful. What causes me to highlight this post by Qdrant is that one implicit message is: Google’s approach to search is lousy because it is aiming at the lowest common denominator of retrieval while preserving its relevance eroding online ad matching business.

The guts of the write up walk through old school and sort of new school approaches to matching processed content with a query. Keep in mind that most of the technology mentioned in the write up is “old” in the sense that it’s been around for a half decade or more. The “new” technology is about ready to hop on a bike with training wheels and head to the swimming pool. (Yes, there is some risk there I suggest.)

But here’s the key statement in the report for me:

Each search scenario requires a specialized tool to achieve the best results possible. Still, combining multiple tools with minimal overhead is possible to improve the search precision even further. Introducing vector search into an existing search stack doesn’t need to be a revolution but just one small step at a time. You’ll never cover all the possible queries with a list of synonyms, so a full-text search may not find all the relevant documents. There are also some cases in which your users use different terminology than the one you have in your database.

Here’s the statement about which I am not feeling warm fuzzies:

Those problems are easily solvable with neural vector embeddings, and combining both approaches with an additional reranking step is possible. So you don’t need to resign from your well-known full-text search mechanism but extend it with vector search to support the queries you haven’t foreseen.
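For readers who want to see what “combining both approaches” might look like in miniature, here is a minimal sketch. It assumes the full-text engine and the vector engine each hand back a ranked list of document IDs. The fusion method is reciprocal rank fusion, which is my choice for illustration and not something the Qdrant write up prescribes. The function name, the document IDs, and the constant k=60 are likewise assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents returned by more than one engine rise to the top.
    The value k=60 is an assumed default for this sketch.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a full-text engine and a vector engine.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

With these toy lists, doc_a and doc_c float to the top because both engines returned them. Note what the sketch does not do: it does not check whether either list is correct, poisoned, or current.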

Observations:

  • No problems in search when humans are seeking information are “easily solvable with shotgun marriages.”
  • Finding information is no longer enough: The information or data displayed have to be [a] correct, accurate, or at least reproducible; [b] free of injected poisoned information (yep, the burden falls on the indexing engine or engines, not the user who, by definition, does not know an answer or what is needed to answer a query); and [c] available in “real time,” which creates additional computational cost that is often difficult to justify.
  • Basic finding and retrieval is morphing into projected outcomes or implications from the indexed data. Available technology for search and retrieval is not tuned for this requirement.

Stephen E Arnold, March 9, 2023
