In the Midst of Info Chaos, a Path Identified and Explained

July 10, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

The Threads – Twitter spat in the midst of BlueSky and Mastodon marks a modest change in having one place to go for current information. How does one maintain awareness with high school taunts awing, Mastodon explaining how easy it is to use, and BlueSky doing its deep gaze thing?

One answer and a quite good one at that appears in “RSS for Post-Twitter News and Web Monitoring.” The author knows quite a bit about finding information, and she also has the wisdom to address me as “dinobaby.” I know a GenZ when I get an email that begins, “Hey, there.” Trust me. That salutation does not work as the author expects.

In the cited article, you will get useful information about newsfeeds, screenshots, and practical advice. Here’s an example of what’s in the excellent how-to:

If you want to check a site for RSS feeds and you think it might be a WordPress site, just add /feed/ to the end of the domain name. You might get a 404 error, but you also might get a page full of information!

There are more tips. Just navigate to Research Buzz, and learn.

This dinobaby awards one swish of its tail to Tara Calishain. Swish.

Stephen E Arnold, July 10, 2023

Neeva: Is This Google Killer on the Run?

May 18, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

Sometimes I think it is 2007 doing the déjà vu dance. I read “Report: Snowflake Is in Advanced Talks to Acquire Search Startup Neeva.” Founded by Xooglers, Neeva was positioned to revolutionize search and generate subscription revenue. Along the highway to the pot of gold, Neeva would deliver on point results. How did that pay for search model work out?

According to the article:

Snowflake Inc., the cloud-based data warehouse provider, is reportedly in advanced talks to acquire a search startup called Neeva Inc. that was founded by former Google LLC advertising executive Sridhar Ramaswamy.

Like every other content processing company I bump into, Neeva was doing smart software. Combine the relevance angle with generative AI and what do you get? A start up that is going to be acquired by a firm with some interesting ideas about how to use search and retrieval to make life better.

Are there other search outfits with a similar business model? Sure, Kagi comes to mind. I used to keep track of start ups which had technology that would provide relevant results to users and a big payday to the investors. Do these names ring a bell?

Cluuz
Deepset
Glean
Kyndi
Siderian
Umiboza

If the Snowflake Neeva deal comes to fruition, will it follow the trajectory of IBM Vivisimo? Vivisimo disappeared as an entity and morphed into a big data component. No problem. But Vivisimo was a metasearch and on-the-fly tagging system. Will the tie up be similar to the Microsoft acquisition of Fast Search & Transfer? Fast still lives, but I don’t know too many Softies who know about the backstory. Then there is the HP Autonomy deal. The acquisition is still playing out in the legal eagle sauna.

Few care about the nuances of search and retrieval. Those seemingly irrelevant details can have interesting consequences. Some are okay like the Dassault Exalead deal. Others? Less okay.

Stephen E Arnold, May 18, 2023

Am I a Moron Because I Use You.com?

May 10, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

“Only Morons Use ChatGPT As a Substitute for Google” is a declarative statement. Three words strike me as important in the title of the Lifehacker (an online publication) article.

First, “morons.” A moron according to the TheFreedictionary.com citation is: A city in Eastern Argentina, although it has the accented ó. On to the next definition, which is “A person who is considered foolish or stupid.” I think this is closer to the mark. I am not comfortable invoking the third definition because it aims a denotative punch at a person having a mental age of from seven to 12. I am 78, so let’s go with “foolish or stupid.” I am in that set.

Second, “ChatGPT.” I think the moniker can apply specifically to the for-fee service of OpenAI. It is possible that “ChatGPT” stands for an entire class of generative software. I tried to make a list of a who’s who in generative software and abandoned the task. Quite a few companies are in the game either directly like the aforementioned OpenAI or a bandwagon of companies joyfully tallied by ProductWatch.com and a few LinkedIn contributors. I think the idea is that ChatGPT outputs content which is either derivative (a characteristic of a machine eating other people’s words and images) or hallucinatory (a feature of software which can go off the rails and output like a digital Lewis Carroll galumphing around a park in which young females frolic).

Third, “Google.” My hunch is that the author is an expert online searcher who, like many open source intelligence professionals, relies on the advertising-supported Google search for objective, on-point answers. Oh, my, that’s quite a reliable source of information. I want to point out that Google focuses on revenue-generation from advertising. Accuracy of results often has little connection to the user’s query. My interpretation of the word “Google” is that Google is good, probably better than “ChatGPT” in providing answers designed to meet the needs of users who may not read above the 9th grade level, struggle with derivatives, and cannot name the capital of Tasmania. (It is Hobart, by the way.)

I am on the fence with the word “only.” I am not comfortable with categorical affirmatives. Given the context of the article and the fact that Google is the Web search engine of choice (conscious or manipulated) for 90 percent of people in North America and Western Europe, I can understand why the field of view is narrow. An expert with regard to Google knows more and more about less and less.

Why is ChatGPT presented as the yin to Google’s yang? The write up says:

If you can’t fact-check the bot, the answers it gives are useless.

Okay, so people fact-check Google outputs. What does Google present with regard to Larry Page and his alleged interactions with Mr. Epstein, the late Mr. Epstein I should say? Run this query Google Page Epstein and this is what the system presented to me on May 6, 2023:

[Screenshot: Google results for the query Google Page Epstein]

Are these stories accurate? Is CNBC the most relevant story about a lawsuit involving JPMorgan Chase and Jeffrey Epstein? What about the New York Post article “Virgin Islands Struggles to Subpoena Google’s Larry Page in Jeffrey Epstein Lawsuit against JPMorgan.” Or the tech insider outfit Gizmodo which offers “Google’s Larry Page Could Be Served in Jeffrey Epstein Case.” There are citations to the British newspaper Telegraph and one from Law & Crime. There is a Yahoo story and a Daily Beast article.

Now what does You.com offer me when I query Google Page Epstein?

There have been recent news articles reporting that the US Virgin Islands has issued subpoenas to Google co-founder Larry Page in connection with a lawsuit against JPMorgan Chase over the Jeffrey Epstein sex trafficking case 1. The lawsuit alleges that JPMorgan Chase facilitated Epstein’s trafficking scheme by continuously offering financial support to Epstein despite knowing about the illegal activities conducted at Little St. James, a private island owned by Epstein. The subpoenas seek documents and communications related to JPMorgan’s banking relationship with Epstein. Sergey Brin, another Google co-founder, has also reportedly been subpoenaed. Despite the subpoenas, it appears that the US Virgin Islands is struggling to serve Page with the legal documents. A recent ruling allows Page to be served with the subpoenas via his attorney in California. cnbc1

I would suggest that the Google citations provide a list, with no indication of which source is more or less highly regarded for accuracy. Google wants me to click on one or more of the links, ingest the content of each article, and then synthesize the items of information which strike me as on the money. You.com on the other hand provides me with the bare bones of the alleged involvement with a person who like Lewis Carroll may have had an interest in hanging out around a park on a sunny Saturday afternoon. Catching some rays and perhaps coming up with new ideas are interpretations of such an action by a lawyer hired to explain the late and much lamented Mr. Epstein.

So which is it? The harvesting of buckwheat the old-fashioned way or the pellet of information spat out in a second or two?

I think the idea is that morons are going to go the ChatGPT-like route. Wizards and authors of online “real” news articles want to swing that sickle and relive the thrill of the workers in Vincent van Gogh’s “The Harvest.”

The article says:

you can’t tell whether an AI-generated fact is true or not by the way the text looks; it’s designed to look plausible and correct. You have to fact-check it.

Does one need to fact-check what Google spits out? What about the people who follow Google Maps’s instructions and drive off a cliff? What about the links in Google Scholar to papers with non-reproducible results?

Here’s the conclusion to the write up:

So if you want to use ChatGPT to get ideas or brainstorm places to look for more information, fine. But don’t expect it to base its answers on reality. Even for something as innocuous as recommending books based on your favorites, it’s likely to make up books that don’t even exist.

I like that “don’t even exist.” Google Bard would never do that. Google management would never fire a smart software executive who points out that Google’s smart software is biased. Google would never provide search results that explain how to steal copyright protected software. Well, maybe just one time like this:

[Screenshot: Google results via YouTube for the query “Magix Vegas crack”]

Oh, no. Wonky software would never ever do that but for Google’s results via YouTube for the query “Magix Vegas crack.” Now who is a moron? Perhaps an apologist for Google?

Stephen E Arnold, May 10, 2023

Divorcing the Google: Legal Eagles Experience a Frisson of Anticipation

April 24, 2023

No smart software has been used to create this dinobaby’s blog post.

I have poked around looking for a version or copy of the contract Samsung signed with Google for the firms’ mobile phone tie up. Based on what I have heard at conferences and read on the Internet (of course, I believe everything I read on the Internet, don’t you?), it appears that there are several major deals.

The first is the use of and access to the mindlessly fragmented Android mobile phone software. Samsung can do some innovating, but the Google is into providing “great experiences.” Why would a mobile phone maker like Samsung allow a user to manage contacts and block mobile calls without implementing a modern day hunt for gold near Placer?

The second is the “suggestion” — mind you, the suggestion is nothing more than a gentle nudge — to keep that largely-malware-free Google Play Store front and center.

The third is the default search engine. Buy a Samsung get Google Search.

Now you know why the legal eagles are shivering when they think of litigation to redo the Google – Samsung deal. For those pondering the misinformation zipping around about Microsoft Bing displacing Google Search, my thought would be to ask yourself, “Who gains by pumping out this type of disinformation?” One answer is big Chinese mobile phone manufacturers. This is Art of War stuff, and I won’t dwell on this. What about Microsoft? Maybe, but I like to think happy thoughts about Microsoft. I say, “No one at Microsoft would engage in disinformation intended to make life difficult for the online advertising king.” Another possibility is Silicon Valley type journalists who pick up rumors, amplify them, and then comment that Samsung is kicking the tires of Bing with ChatGPT. Suddenly a “real” news outfit emits the Samsung rumor. Exciting for the legal eagles.

The write up “Samsung Can’t Dump Google for Bing As the Default Search Engine on Its Phones” does a good job of explaining the contours of a Google – Samsung tie up.

Several observations:

First, the alleged Samsung search replacement provides a glimpse of how certain information can move from whispers at conferences to headlines.

Second, I would not bet against lawyers. With enough money, contracts can be nullified, transformed, or left alone. The only option which disappoints attorneys is the one that lets sleeping dogs lie.

Third, the growing upswell of anti-Google sentiment is noticeable. That may be a far larger problem for Googzilla than rumors about Samsung. Perceptions can be quite real, and they translate into impacts. I am tempted to quote William James, but I won’t.

Net net: If Samsung wants to swizzle a deal with an entity other than the Google, the lawyers may vibrate with such frequency that a feather or two may fall off.

Stephen E Arnold, April 24, 2023

Useful Scholarly / Semi-Scholarly Research System with Deduplicated Results

March 24, 2023

I was delighted to receive a link to OpenAIRE Explore. The service is sponsored by a non-profit partnership established in 2018 as a legal outfit. The objective is to “ensure a permanent open scholarly communication infrastructure to support European research.” (I am not sure whoever wrote the description has read “Book Publishers Won’t Stop Until Libraries Are Dead.”)

The specific service I found interesting is Explore located at https://explore.openaire.eu. The service is described by OpenAIRE this way:

A comprehensive and open dataset of research information covering 161m publications, 58m research data, 317k research software items, from 124k data sources, linked to 3m grants and 196k organizations.

Maybe looking at that TechDirt article will be useful.

I ran a number of queries. The probably unreadable screenshot below illustrates the nice interface and the results to my query for Hopf fibrations (if this query doesn’t make sense to you, there’s not much I can do. Perhaps OpenAIRE Explore is ill-suited to queries about Taylor Swift and Ticketmaster?):

[Screenshot: OpenAIRE Explore results for the query Hopf fibrations]

The query returned 127 “hits” and identified four organizations as having people interested in the subject. (Hopf fibrations are quite important, in my opinion.) No ads, no crazy SEO baloney, but probably some non-error checked equations. Plus, the result set was deduplicated. Imagine that. A useful Vivisimo-type function available again.

Observation: Some professional publishers are likely to find the service objectionable. Four of the giants are watching their legal eagles circle the hapless Internet Archive. But soon… maybe OpenAIRE will attract some scrutiny.

For now, OpenAIRE Explore is indeed useful.

Stephen E Arnold, March 24, 2023

20 Years Ago: Primus Knowledge Solutions

March 20, 2023

Note: Written by a real-live dinobaby. No smart software involved.

I am not criticizing Primus Knowledge Solutions (acquired by ATG in 2004 and then Oracle purchased ATG in 2011). I would ask that you read this text and consider what was marketed in 2003. The source is a description of Primus’ Answer Engine which was once located at dub dub dub primus.com/products/answerEngine:

Primus Answer Engine helps companies take full advantage of the valuable content that already exists in corporate documents and databases. Using proprietary natural language processing, Answer Engine delivers quick, relevant answers to plain English questions by bringing widespread corporate knowledge to support agents, as well as to customers, partners, and employees via the web.

What “features” did the system provide two decades ago? The fact sheet I picked up at a search conference in 2003 told me:

  • Natural language processing
  • Scalability
  • Database integration
  • All major document types
  • Insightful reporting
  • Customizable interface
  • Centralized administration.

The system can suggest questions, interpret these or other questions, and return a list of answers found in a company’s online documents. This allows users to view the answer in context if desired.

I mention Primus because it is one example from dozens in my files about NLP technology.

Several observations/questions:

  • Where is Oracle in the ChatGPT derby? May I suggest this link for starters.
  • Isn’t the principal difference between Primus and other NLP “smart software” that users are chasing ChatGPT-type systems, not innovators outputting marketing words?
  • Are issues like updating training models and their content, biases in the models themselves, and the challenge of accurate, current data enjoying the 2003 naïveté?

Net net: ChatGPT is just one manifestation of innovators’ attempts to deal with the challenge of finding accurate, on-point, and timely information in the digital world. (This is a world I call the datasphere.)

Stephen E Arnold, March 20, 2023

Elasticsearch Guide: More of a Cheat Sheet

March 15, 2023

Elasticsearch has been a go-to solution for searching content either via the open source version or the Elastic technical support option. The system works, and it has many followers and enthusiasts. As a result, one can locate “help” easily online for many hitches in the git along.

I found the information in “Unlocking the Power of Elasticsearch: A Comprehensive Guide to Complex Search Use Cases.” I would suggest that the write up is more like a cheat sheet. Encounter a specific task, check the “Guide,” and sally forth.

I would suggest that many real-life enterprise search needs are often difficult to solve. Examples include capturing data on a sales professional’s laptop before the colleague deletes the slide deck with the revised price quotation data. No search engine on the planet can get this important information to the legal department if the project goes off the rails. “I can’t find it” is not a helpful answer.

Similar challenges arise when the Elasticsearch system must interact with a line item for a product specified in a purchase order which has a corresponding engineering drawing. Line up the chemical, civil, mechanical, and nuclear engineers and tell them, “Well, that’s an object embedded in the what-do-you-call-it software I never heard of.” Yeah.

Nevertheless, for some helpful tips give the free guide a look.

The mantra is, “Search is easy. Search is a solved problem. Search is no big deal.” Convince yourself. Keep in mind that the mantra does not ring true to me nor does it make me calm.

Stephen E Arnold, March 15, 2023

Hybrid Search: A Gentle Way of Saying “One Size Fits All” Search Like the Google Provides Is Not Going to Work for Some

March 9, 2023

“On Hybrid Search” is a content marketing-type report. That’s okay. I found the information useful. What causes me to highlight this post by Qdrant is that one implicit message is: Google’s approach to search is lousy because it is aiming at the lowest common denominator of retrieval while preserving its relevance-eroding online ad matching business.

The guts of the write up walk through old school and sort of new school approaches to matching processed content with a query. Keep in mind that most of the technology mentioned in the write up is “old” in the sense that it’s been around for a half decade or more. The “new” technology is about ready to hop on a bike with training wheels and head to the swimming pool. (Yes, there is some risk there I suggest.)

But here’s the key statement in the report for me:

Each search scenario requires a specialized tool to achieve the best results possible. Still, combining multiple tools with minimal overhead is possible to improve the search precision even further. Introducing vector search into an existing search stack doesn’t need to be a revolution but just one small step at a time. You’ll never cover all the possible queries with a list of synonyms, so a full-text search may not find all the relevant documents. There are also some cases in which your users use different terminology than the one you have in your database.

Here’s the statement about which I am not feeling warm fuzzies:

Those problems are easily solvable with neural vector embeddings, and combining both approaches with an additional reranking step is possible. So you don’t need to resign from your well-known full-text search mechanism but extend it with vector search to support the queries you haven’t foreseen.
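To make “combining both approaches” concrete, here is a minimal Python sketch of one common way to merge a full-text result list with a vector-search result list: reciprocal rank fusion. This is a technique I am swapping in for illustration. The report as quoted does not say which fusion or reranking method it has in mind, and the function name, the constant k=60, and the document IDs below are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a full-text engine and a vector index.
keyword_hits = ["doc-a", "doc-b", "doc-c"]
vector_hits = ["doc-c", "doc-a", "doc-d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['doc-a', 'doc-c', 'doc-b', 'doc-d']
```

The sketch shows why the mechanics are the easy part: the fusion itself is a dozen lines. Whether the merged list is correct, current, and free of poisoned content is a separate matter that no ranking formula settles.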

Observations:

  • No problems in search when humans are seeking information are “easily solvable with shot gun marriages”.
  • Finding information is no longer enough: The information or data displayed have to be [a] correct, accurate, or at least reproducible; [b] free of injected poisoned information (yep, the burden falls on the indexing engine or engines, not the user who, by definition, does not know an answer or what is needed to answer a query); and [c] current, even though access to “real time” data creates additional computational cost, which is often difficult to justify.
  • Basic finding and retrieval is morphing into projected outcomes or implications from the indexed data. Available technology for search and retrieval is not tuned for this requirement.

Stephen E Arnold, March 9, 2023

Take That Googzilla Because You Have One Claw in Your Digital Grave. Honest

March 8, 2023

My, my. How the “we are search experts” set have changed their tune. I am not talking about those who were terminated by the Google. I am not talking about the fawning advertising intermediaries. I am not talking about old school librarians who know how to extract information from commercial databases.

I am talking about the super clever Silicon Valley infused pundits.

Here’s an example: “Google Search Is Dying” from 2022. The write up contains one of the all-time statements from a Google wizard I have encountered. Believe me. I have noted a few over the years.

The speaker is the former champion of search engine optimization and denier of Google’s destruction of precision, recall, and relevance in search results. Here’s the statement:

You said in the post that quotes don’t give exact matches. They really do. Honest.— Google’s public search liaison (that’s a title of which to be proud)

I love it when a Googler uses the word “honest.”

Net net: The Gen X, Y’s, and Z’s perceive themselves as search experts. Okay, living in a cloud of unknowing is ubiquitous today. But “honest”?

Stephen E Arnold, March 8, 2023

Google Points Out That ChatGPT Has a Core Neural Disorder: LSD or Spoiled Baloney?

February 16, 2023

I am an old-fashioned dinobaby. I have a reasonably good memory for great moments in search and retrieval. I recall when Danny Sullivan told me that search engine optimization improves relevance. In 2006, Prabhakar Raghavan on a conference call with a Managing Director of a so-so financial outfit explained that Yahoo had semantic technology that made Google’s pathetic effort look like outdated technology.

Hallucinating pizza courtesy of the super smart AI app Craiyon.com. The art, not the write up it accompanies, was created by smart software. The article is the work of the dinobaby, Stephen E Arnold. Looks like pizza to me. Close enough for horseshoes like so many zippy technologies.

Now that SEO and its spawn are scrambling to find a way to fiddle with increasingly weird methods for making software return results the search engine optimization crowd’s customers demand, Google’s head of search Prabhakar Raghavan is opining about the oh, so miserable work of OpenAI and its now TikTok trend ChatGPT. May I remind you, gentle reader, that OpenAI availed itself of some Googley open source smart software and consulted with some Googlers as it ramped up to the tsunami of PR ripples? May I remind you that Microsoft said, “Yo, we’re putting some OpenAI goodies in PowerPoint.” The world rejoiced and Reddit plus Twitter kicked into rave mode.

Google responded with a nifty roll out in Paris. February is not April, but maybe it should have been in April 2023, not in les temps d’hiver?

I read with considerable amusement “Google Vice President Warns That AI Chatbots Are Hallucinating.” The write up states as rock solid, George Washington “I cannot tell a lie” truth the following:

Speaking to German newspaper Welt am Sonntag, Raghavan warned that users may be delivered complete nonsense by chatbots, despite answers seeming coherent. “This type of artificial intelligence we’re talking about can sometimes lead to something we call hallucination,” Raghavan told Welt am Sonntag. “This is then expressed in such a way that a machine delivers a convincing but completely fictitious answer.”

LSD or just the Google code relied upon? Was it the Googlers of whom OpenAI asked questions? Was it reading the gems of wisdom in Google patent documents? Was it coincidence?

I recall that Dr. Timnit Gebru and her co-authors of the Stochastic Parrot paper suggest that life on the Google island was not palm trees and friendly natives. Nope. Disagree with the Google and your future elsewhere awaits.

Now we have the hallucination issue. The implication is that smart software like Google-infused OpenAI is addled. It imagines things. It hallucinates. It is living in a fantasy land with bean bag chairs, Foosball tables, and memories of Odwalla juice.

I wrote about the after-the-fact yip yap from Google’s Chair Person of the Board. I mentioned the Father of the Darned Internet’s post ChatGPT PR blasts. Now we have the head of search’s observation about screwed up neural networks.

Yep, someone from Verity should know about flawed software. Yep, someone from Yahoo should be familiar with using PR to mask spectacular failure in search. Yep, someone from Google is definitely in a position to suggest that smart software may be somewhat unreliable because of fundamental flaws in the systems and methods implemented at Google and probably other outfits loving the Tensor T shirts.

Stephen E Arnold, February 16, 2023
