Hybrid Search: A Gentle Way of Saying “One Size Fits All” Search Like the Google Provides Is Not Going to Work for Some
March 9, 2023
“On Hybrid Search” is a content marketing-type report. That’s okay. I found the information useful. What causes me to highlight this post by Qdrant is that one implicit message is: Google’s approach to search is lousy because it is aiming at the lowest common denominator of retrieval while preserving its relevance eroding online ad matching business.
The guts of the write up walk through old school and sort of new school approaches to matching processed content with a query. Keep in mind that most of the technology mentioned in the write up is “old” in the sense that it’s been around for a half decade or more. The “new” technology is about ready to hop on a bike with training wheels and head to the swimming pool. (Yes, there is some risk there I suggest.)
But here’s the key statement in the report for me:
Each search scenario requires a specialized tool to achieve the best results possible. Still, combining multiple tools with minimal overhead is possible to improve the search precision even further. Introducing vector search into an existing search stack doesn’t need to be a revolution but just one small step at a time. You’ll never cover all the possible queries with a list of synonyms, so a full-text search may not find all the relevant documents. There are also some cases in which your users use different terminology than the one you have in your database.
Here’s the statement about which I am not feeling warm fuzzies:
Those problems are easily solvable with neural vector embeddings, and combining both approaches with an additional reranking step is possible. So you don’t need to resign from your well-known full-text search mechanism but extend it with vector search to support the queries you haven’t foreseen.
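For readers who want to see what “combining both approaches” might look like, here is a minimal sketch. It uses reciprocal rank fusion, one common way to merge ranked lists. The quoted report does not say this is the method Qdrant has in mind, and the function name, the k value of 60, and the document ids are my assumptions for illustration only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one list.

    Each document earns 1 / (k + rank) from every list it appears in
    (rank starts at 1). Documents come back by descending total score.
    The default k=60 is an assumed, commonly used smoothing constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by doc id so the output is stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists for a single query (the ids are made up).
full_text_hits = ["doc_a", "doc_b", "doc_c"]  # e.g. from a keyword index
vector_hits = ["doc_c", "doc_a", "doc_d"]     # e.g. from an embedding index

fused = reciprocal_rank_fusion([full_text_hits, vector_hits])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that both systems like float to the top, and a document only one system found (doc_d) still makes the list. Whether that counts as “easily solvable” is another matter.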
Observations:
- No problems in search when humans are seeking information are “easily solvable with shot gun marriages”.
- Finding information is no longer enough: The information or data displayed have to be [a] correct, accurate, or at least reproducible; [b] free of injected poisoned information (yep, the burden falls on the indexing engine or engines, not the user who, by definition, does not know an answer or what is needed to answer a query); and [c] current, although access to “real time” data creates additional computational cost, which is often difficult to justify.
- Basic finding and retrieval is morphing into projected outcomes or implications from the indexed data. Available technology for search and retrieval is not tuned for this requirement.
Stephen E Arnold, March 9, 2023
Social Unhappiness, Disruption, and the Crime Explosions
March 9, 2023
Note: No smart software on earth writes like a dinobaby channeling his inner Jonathan Swift.
The mobile phones are responsible for: [a] fights on Carnival Cruise ships, [b] teens killing themselves, [c] stupid committee decisions that make the camel analogy comparatively harmless, and [d] an efflorescence of cyber crime.
How do I know this?
I read an essay called “Honestly, It’s Probably the Phones.” I admit I took the main argument of the essay and extended it. That argument proved stretchy, and I think the write up is on to something.
I noted this passage:
The first reason smartphones should be our prior is that the timing just lines up really well. The smartphone was invented in 2007, but it didn’t really become commonplace until the 2010s, exactly when teen happiness fell off a cliff…. First, they’re a distraction — the rise of smartphones was also the rise of “phubbing”, i.e. when people go on their phones instead of paying attention to the people around them. Second, phones provide a behavioral “nudge”, like a pantry stocked with junk food — when your phone is right there in your pocket, it’s easier to just text a friend instead of going and hanging out, even if the latter would be less fulfilling. And third, in-person interaction is a network effect. If 20% of people would rather be on their phones, that reduces everyone else’s options for in-person hangouts by 20%.
Okay, I am sold.
I want to shift gears and switch to a write up which purports to present facts. For the purposes of this blog post, I want to assume that the information in The US Sun (an estimable news source) article “Google Issues Six Major Alerts to Billions – You Face Bank Wipeout If You Ignore Them” is correct.
The article identifies a lottery scam, a tech support scam, fake jobs and invoices scams, Google account recovery scams, gift card scams, and blackmail and extortion scams. The idea seems to be that Google has created a massive ecosystem of crime. With most Google interactions taking place on mobile phones, it seems as if Google and its fellow traveler Apple are making clear that more than teen self-harm is a consequence of these gizmos.
Now what’s the fix? Perhaps a variation of “first, let’s kill all the lawyers” is a step too far. What about a driver’s license approach? No mobile and no phone until one reaches a certain age? What about a variation of the ever popular Chinese social credit system? Trouble in high school? No mobile for you.
I prefer that parents and guardians play a major role. I think smart software might be worth considering as a method for filtering to certain demographics some content. Why not ask the Dilbert cartoonist for some ideas?
I would suggest that the confluence of mobile phones and outfits like Google may have been like a lab experiment gone wrong. A clueless high school student (not in the science club, of course) mixes two apparently harmless household substances and makes the entire class sick. How does that get fixed? The answer, “Not easily.”
Stephen E Arnold, March 9, 2023
Take That Googzilla Because You Have One Claw in Your Digital Grave. Honest
March 8, 2023
My, my. How the “we are search experts” set have changed their tune. I am not talking about those who were terminated by the Google. I am not talking about the fawning advertising intermediaries. I am not talking about old school librarians who know how to extract information from commercial databases.
I am talking about the super clever Silicon Valley infused pundits.
Here’s an example: “Google Search Is Dying” from 2022. The write up contains one of the all-time statements from a Google wizard I have encountered. Believe me. I have noted a few over the years.
The speaker is the former champion of search engine optimization and denier of Google’s destruction of precision, recall, and relevance in search results. Here’s the statement:
You said in the post that quotes don’t give exact matches. They really do. Honest.— Google’s public search liaison (that’s a title of which to be proud)
I love it when a Googler uses the word “honest.”
Net net: The Gen X, Y’s, and Z’s perceive themselves as search experts. Okay, living in a cloud of unknowing is ubiquitous today. But “honest”?
Stephen E Arnold, March 8, 2023
Wanna Be an Old Fashioned B&E Person?
March 8, 2023
I spotted another of the info dumps which make me nervous. “Red Team, Physical Security, Covert Entry, and EDC” is another list of helpful products and tools. (EDC means every day carry.) My personal preference is that this type of information not zip around so that curious high school science club members can get some helpful ideas. What makes this list interesting is the disclaimer. Legal eagles will definitely be reluctant to take flight after reading:
Disclaimer: I am not responsible for anyone using any information in this post for any illegal activities. Getting caught with possession of burglary tools will likely land you behind bars and possibly end with a multiple felony conviction. The information in this post is for legal and authorized engagements, and to use for educational purposes only.
These types of messages are appearing with greater frequency. A good example is the message from Vaga Bond about train hopping in some interesting countries like Russia and Morocco.
If you want to see these tools, navigate to one of CosmodiumCS’s helpful YouTube videos; for example, https://www.youtube.com/watch?v=ETMHHvRrH5A.
Stephen E Arnold, March 9, 2023
Making Data Incomprehensible
March 8, 2023
I spotted a short write up with 100 pictures called “1 Dataset 100 Visualizations.” The write up presents a simple table:
And converts or presents the data 100 different ways.
Here’s an example:
My reactions to the examples are:
- Why are the colors presented with low contrast? Many of the charts’ graphics are incomprehensible. The texts’ illegibility underscores the disconnect between being understood and being in a weird world of incomprehensibility.
- What’s wrong with the basic table? It works. Why create a graph? Oh, I know. To be clever. Nope, not clever. Performative demonstration of numerical expertise perhaps?
- The Wall Street Journal and other “real news” organizations love these types of obscurification. I can visualize the goose bumps which form on the arms of these individuals. The anticipation of making something fuzzy is a thrilling moment.
Yikes. Marketing methods to be unclear.
Stephen E Arnold, March 8, 2023
Unpatchable Windows Flaw? Will Surprises Reside in Smart Software from Microsoft?
March 7, 2023
No big deal? A flaw described as “Unpatchable”? Not to worry. Okay, I will pretend not to worry, but I am worrying. Many commercial and government systems may be at risk. “Stealthy UEFI Malware Bypassing Secure Boot Enabled by Unpatchable Windows Flaw” reports:
Researchers on Wednesday [presumably March 1, 2023] announced a major cybersecurity find—the world’s first-known instance of real-world malware that can hijack a computer’s boot process even when Secure Boot and other advanced protections are enabled and running on fully updated versions of Windows.
Microsoft’s good enough engineering has produced technology which is “unpatchable.” Shouldn’t that effort be directed toward creating software which is patchable? I know. I know. People are in a hurry. There are those TikToks to watch. Plus, who wants to fool around with secure boot issues when the future is smart software?
As the Microsofties chase after the elusive “it understands human utterance” bunny rabbit, what gotchas will be tucked inside ChatGPT-inspired applications? I am not very good at predicting the future. I am not dumb enough to say, “Hey, that Microsoft smart software will be okay.” Microsoft is good at marketing. May I suggest that Microsoft is not so good at producing software that meets users’ expectations for security.
Stephen E Arnold, March 7, 2023
Publishers Face Another Existential Threat Beyond Their Own Management Decisions
March 7, 2023
Existential threat, existential threat. I hear that from many executives. The principal existential threat is a company’s own management decisions. Short-term, context-free, and uninformed deciders miss the boat, the train, and the bus to organic revenue growth. If I read a news story, I learn about another senior executive playing fast and loose with rules, regulations, and ethical guidelines.
Today I read the clickbait infused headline: “Big Media Is Gearing Up for Battle with Google and Microsoft over AI Chatbots Using Their Articles for Training: We Are Actively Considering Our Options.” (The headline seems to be pandering to the Google, does it not?)
What is an existential threat? Here’s a whack at a definition by Dictionary.com, a super duper source:
An existential threat is a threat to something’s very existence—when the continued being of something is at stake or in danger. It is used to describe threats to actual living things as well to nonliving things, such as a country or an ideology.
I think the phrase has been extended to cover an action or process which could erode the revenues of a publisher.
The write up cited is, of course, behind a paywall. No existential threat for Business Insider … yet. I learned:
It’s a moment some publishers consider the most disruptive change they’ve seen to their industry since the dawn of the internet — and the threat is no less than existential. The worry is that if people can get thorough answers to their questions through these bots, they won’t need to visit content sites anymore, undermining media’s entire revenue model, which has already been battered by digital upheaval.
But here’s the paragraph that caught my attention. Remember that Rupert Murdoch and Fox News are in the midst of a conversation about dissemination of knowingly incorrect information. Remember the New York Times is discussing in a positive manner its coverage of some individuals’ efforts to shift from male to female and other possible combinations. Yep, Rupert and the Gray Lady.
“AI is a new frontier with great opportunity, but it can’t replace the trust, independence, and integrity of quality journalism,” said Danielle Coffey, EVP and general counsel of the News/Media Alliance, a publisher trade organization whose members include The New York Times and Wall Street Journal publisher News Corp. “Without compensation, we lose the humanity that journalists bring to telling a story.”
The issue was the loss of advertising revenue. Nope, that money is not coming back. Now the issue is loss of a reason to buy a subscription to “real news” publications. Nope, those readers are unlikely to come back.
Why? How about convenience?
I subscribe to dead tree newspapers. If the paper edition arrives, it could be torn, wet, or folded incorrectly because maintenance of the paper feed rollers is just an annoyance when someone wants to get a coffee.
What’s the fix? The desired fix is the termination with extreme prejudice of the evil Googzilla and its fellow travellers: Amazon, Apple, Facebook, Microsoft, and probably a few others on publishers’ dart boards.
A few observations:
- AI is not something new. Publishers have, as far as I know, been mostly on the sidelines in the AI refinement efforts over the last 50 years. Yes, that’s a half a century.
- The publishers want money. The “content” produced is simply a worm on a fish hook. Existential threat to revenue, yes. Death of publishers? Meh.
- The costs of litigation with an outfit like Google are likely to make the CFOs of the publishing companies going after Googzilla and its fellow travellers unhappy. Why? The EU and the US government have not had a stellar track record of getting these digital outfits to return phone calls, let alone play by the rules.
- Which outfits can pay the legal fees longer: Google and Microsoft or a group of publishers who seem to want Google traffic and whatever ad revenue can be had?
Net net: How about less existential threat talk and more use of plain English like “We want cash for content use”? I would ask why the publishers and their trade associations have not been in the vanguard of AI development. The focus seems to be on replacing humanoids with software to reduce costs. Søren Kierkegaard would be amused in my opinion.
Stephen E Arnold, March 7, 2023
Gen Z and Retro Tech
March 7, 2023
I read an interesting write up about people who are younger than I. Keep in mind, please, that I am a dinobaby. “Gen Z Apparently Baffled by Basic Technology.” The write up says:
But when it comes to using a scanner or printer — or even a file system on a computer — things become a lot more challenging to a generation that has spent much of their lives online
Does this mean that a younger employee will not be able to make a photocopy of a receipt for an alleged business expense?
I learned that a 25-year-old wizard was unable to get the photocopier to produce something other than a blank page.
Okay, the idea of turning over the page eluded the budding captain of social media.
Will these future leaders ask for assistance? Nah, there’s something called tech shame. Who wants to look stupid and not get promoted?
Need another example? No, well, too bad. The write up points out that these world beaters cannot schedule meetings. Like time is hard. Follow ups are almost like work.
I am glad I am old.
Stephen E Arnold, March 7, 2023
SEO Fuels Smart Software
March 6, 2023
I read “Must Read: The 100 Most Cited Papers in 2022.” The principal finding is that Google-linked entities wrote most of the “important” papers. If one thinks back to Gene Garfield’s citation analysis work, a frequently cited paper is either really good, or it is an intellectual punching bag. Getting published is often not enough to prove that an academic is smart. Getting cited is the path to glory, tenure, and possibly an advantage when chasing grants.
Here’s a passage which explains the fact that Google is “important”:
Google is consistently the strongest player followed by Meta, Microsoft, UC Berkeley, DeepMind and Stanford.
Keep in mind that Google and DeepMind are components of Alphabet.
Why’s this important?
- There is big, big money in selling/licensing models and data sets down the road
- Integrating technology into other people’s applications is a step toward vendor lock in and surveillance of one sort or another
- Analyzing information about the users of a technology provides a useful source of signals about [a] what to buy or invest in, [b] copy, or [c] acquire
If the data in this “100 Most Cited” article are accurate or at least close enough for horseshoes, Google and OpenAI may be playing a clever game not unlike what the Cambridge Analytica crowd did.
Implications? Absolutely. I will talk about a few in my National Cyber Crime Conference lecture about OSINT Blindspots. (Yep, my old term has new life in the smart software memesphere.)
Stephen E Arnold, March 6, 2023
Google: Code Redder Because … Microsoft Markets AI Gooder
March 6, 2023
Don’t misunderstand. I think the ChatGPT search wars are more marketing than useful functionality for my work. You may have a different viewpoint. That’s great. Just keep in mind that Google’s marvelous Code Red alarm was a response to Microsoft marketing. Yep, if you want to see the Sundar and Prabhakar Duo do some fancy dancing, just get your Microsoft rep to mash the Goose Google button.
Someone took this advice and added “AI” to the truly wonderful Windows 11 software. I read “Microsoft Adds “AI” to Taskbar Search Field” and learned that either ChatGPT or a human said:
In the last three weeks, we also launched the new AI-powered Bing into preview for more than 1 million people in 169 countries, and expanded the new Bing to the Bing and Edge mobile apps as well as introduced it into Skype. It is a new era in Search, Chat and Creation and with the new Bing and Edge you now have your own copilot for the web. Today, we take the next major step forward adding to the incredible breadth and ease of use of the Windows PC by implementing a typable Windows search box and the amazing capability of the new AI-powered Bing directly into the taskbar. Putting all your search needs for Windows in one easy to find location.
Exciting because lousy search will become milk, honey, sunshine, roses, and French bulldog puppies. Nope. Search is still the Bing with a smaller index than the Google sports. But that “AI” in the search box evokes good thoughts for some users.
For Google, the AI in the search box mashes the Code Red button. I think that if that button gets pressed five times in quick succession, the Google goes from Code Red to Code Super Red with LED sparkles.
Remember this AI search is marketing at this time in my frame of reference.
Microsoft is showing that Google is not too good at marketing. I am now mashing the Code Red button five times. Mash. Mash. Mash. Mash. Mash. Now I can watch Googzilla twitch and hop. Perhaps the creature will be the opening act in the Sundar and Prabhakar Emergency Output Emission Explanation Tour. Did you hear the joke about Microsoft walks into a vegan restaurant and says, “Did you hear the joke about Google marketing?” The server says, “No.” The Softie replies, “Google searched for marketing in its search engine and couldn’t get a relevant answer.”
Ho, ho.
Stephen E Arnold, March 6, 2023