Am I a Moron Because I Use You.com?
May 10, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
“Only Morons Use ChatGPT As a Substitute for Google” is a declarative statement. Three words in the title of the Lifehacker article (Lifehacker is an online publication) strike me as important.
First, “morons.” According to TheFreeDictionary.com, the first definition of “moron” is a city in eastern Argentina, although the place name carries the accented ó (Morón). On to the next definition, which is “A person who is considered foolish or stupid.” I think this is closer to the mark. I am not comfortable invoking the third definition because it aims its denotative punch at a person having a mental age of from seven to 12. I am 78, so let’s go with “foolish or stupid.” I am in that set.
Second, “ChatGPT.” I think the moniker can apply specifically to the for-fee service of OpenAI. It is possible that “ChatGPT” stands for an entire class of generative software. I tried to make a list of who’s who in generative software and abandoned the task. Quite a few companies are in the game, either directly, like the aforementioned OpenAI, or as part of a bandwagon joyfully tallied by ProductWatch.com and a few LinkedIn contributors. I think the idea is that ChatGPT outputs content which is either derivative (a characteristic of a machine eating other people’s words and images) or hallucinatory (a feature of software which can go off the rails and output like a digital Lewis Carroll galumphing around a park in which young females frolic).
Third, “Google.” My hunch is that the author is an expert online searcher who, like many open source intelligence professionals, relies on the advertising-supported Google search for objective, on-point answers. Oh, my, that’s quite a reliable source of information. I want to point out that Google focuses on revenue generation from advertising. The accuracy of results often has little to do with the user’s query. My interpretation of the word “Google” is that Google is good, probably better than “ChatGPT,” at providing answers designed to meet the needs of users who may not read above the 9th grade level, struggle with derivatives, and cannot name the capital of Tasmania. (It is Hobart, by the way.)
I am on the fence with the word “only.” I am not comfortable with categorical affirmatives. Given the context of the article and the fact that Google is the Web search engine of choice (conscious or manipulated) for 90 percent of people in North America and Western Europe, I can understand why the field of view is narrow. An expert with regard to Google knows more and more about less and less.
Why is ChatGPT presented as the yin to Google’s yang? The write up says:
If you can’t fact-check the bot, the answers it gives are useless.
Okay, so people fact-check Google outputs. What does Google present with regard to Larry Page and his alleged interactions with Mr. Epstein, the late Mr. Epstein, I should say? I ran the query Google Page Epstein on May 6, 2023, and the system presented me with a list of news links.
Are these stories accurate? Is the CNBC story the most relevant one about a lawsuit involving JPMorgan Chase and Jeffrey Epstein? What about the New York Post article “Virgin Islands Struggles to Subpoena Google’s Larry Page in Jeffrey Epstein Lawsuit against JPMorgan”? Or the tech insider outfit Gizmodo, which offers “Google’s Larry Page Could Be Served in Jeffrey Epstein Case”? There are citations to the British newspaper the Telegraph and one from Law & Crime. There is a Yahoo story and a Daily Beast article.
Now what does You.com offer me when I query Google Page Epstein?
There have been recent news articles reporting that the US Virgin Islands has issued subpoenas to Google co-founder Larry Page in connection with a lawsuit against JPMorgan Chase over the Jeffrey Epstein sex trafficking case [1]. The lawsuit alleges that JPMorgan Chase facilitated Epstein’s trafficking scheme by continuously offering financial support to Epstein despite knowing about the illegal activities conducted at Little St. James, a private island owned by Epstein. The subpoenas seek documents and communications related to JPMorgan’s banking relationship with Epstein. Sergey Brin, another Google co-founder, has also reportedly been subpoenaed. Despite the subpoenas, it appears that the US Virgin Islands is struggling to serve Page with the legal documents. A recent ruling allows Page to be served with the subpoenas via his attorney in California. [1] CNBC
I would suggest that the Google citations provide a list with no indication of which source is more or less highly regarded for accuracy. Google wants me to click on one or more of the links, ingest the content of each article, and then synthesize the items of information which strike me as on the money. You.com, on the other hand, provides me with the bare bones of the alleged involvement with a person who, like Lewis Carroll, may have had an interest in hanging out around a park on a sunny Saturday afternoon. Catching some rays and perhaps coming up with new ideas are interpretations of such an action that a lawyer hired to explain the late and much lamented Mr. Epstein might offer.
So which is it? The harvesting of buckwheat the old-fashioned way or the pellet of information spat out in a second or two?
I think the idea is that morons are going to go the ChatGPT-like route. Wizards and authors of online “real” news articles want to swing that sickle and relive the thrill of the workers in Vincent van Gogh’s “The Harvest.”
The article says:
you can’t tell whether an AI-generated fact is true or not by the way the text looks; it’s designed to look plausible and correct. You have to fact-check it.
Does one need to fact-check what Google spits out? What about the people who follow Google Maps’s instructions and drive off a cliff? What about the links in Google Scholar to papers with non-reproducible results?
Here’s the conclusion to the write up:
So if you want to use ChatGPT to get ideas or brainstorm places to look for more information, fine. But don’t expect it to base its answers on reality. Even for something as innocuous as recommending books based on your favorites, it’s likely to make up books that don’t even exist.
I like that “don’t even exist.” Google Bard would never do that. Google management would never fire a smart software executive who points out that Google’s smart software is biased. Google would never provide search results that explain how to steal copyright protected software. Well, maybe just one time.
Oh, no. Wonky software would never ever do that, except, of course, in Google’s results via YouTube for the query “Magix Vegas crack.” Now who is a moron? Perhaps an apologist for Google?
Stephen E Arnold, May 10, 2023
Divorcing the Google: Legal Eagles Experience a Frisson of Anticipation
April 24, 2023
No smart software has been used to create this dinobaby’s blog post.
I have poked around looking for a version or copy of the contract Samsung signed with Google for the firms’ mobile phone tie-up. Based on what I have heard at conferences and read on the Internet (of course, I believe everything I read on the Internet, don’t you?), it appears that there are several major deals.
The first is the use of and access to the mindlessly fragmented Android mobile phone software. Samsung can do some innovating, but the Google is into providing “great experiences.” Why would a mobile phone maker like Samsung allow a user to manage contacts and block mobile calls without implementing a modern-day hunt for gold near Placer?
The second is the “suggestion” — mind you, the suggestion is nothing more than a gentle nudge — to keep that largely-malware-free Google Play Store front and center.
The third is the default search engine. Buy a Samsung get Google Search.
Now you know why the legal eagles are shivering when they think of litigation to redo the Google – Samsung deal. For those who believe the misinformation zipping around about Microsoft Bing displacing Google Search, my suggestion is to ask yourself, “Who gains by pumping out this type of disinformation?” One answer is big Chinese mobile phone manufacturers. This is Art of War stuff, and I won’t dwell on this. What about Microsoft? Maybe, but I like to think happy thoughts about Microsoft. I say, “No one at Microsoft would engage in disinformation intended to make life difficult for the online advertising king.” Another possibility is Silicon Valley-type journalists who pick up rumors, amplify them, and then comment that Samsung is kicking the tires of Bing with ChatGPT. Suddenly a “real” news outfit emits the Samsung rumor. Exciting for the legal eagles.
The write up “Samsung Can’t Dump Google for Bing As the Default Search Engine on Its Phones” does a good job of explaining the contours of a Google – Samsung tie up.
Several observations:
First, the alleged Samsung search replacement provides a glimpse of how certain information can move from whispers at conferences to headlines.
Second, I would not bet against lawyers. With enough money, contracts can be nullified, transformed, or left alone. The only option which disappoints attorneys is the one that lets sleeping dogs lie.
Third, the growing upswell of anti-Google sentiment is noticeable. That may be a far larger problem for Googzilla than rumors about Samsung. Perceptions can be quite real, and they translate into impacts. I am tempted to quote William James, but I won’t.
Net net: If Samsung wants to swizzle a deal with an entity other than the Google, the lawyers may vibrate with such frequency that a feather or two may fall off.
Stephen E Arnold, April 24, 2023
Useful Scholarly / Semi-Scholarly Research System with Deduplicated Results
March 24, 2023
I was delighted to receive a link to OpenAIRE Explore. The service is sponsored by a non-profit partnership established in 2018 as a legal entity. The objective is to “ensure a permanent open scholarly communication infrastructure to support European research.” (I am not sure whoever wrote the description has read “Book Publishers Won’t Stop Until Libraries Are Dead.” Maybe looking at that TechDirt article will be useful.)
The specific service I found interesting is Explore, located at https://explore.openaire.eu. The service is described by OpenAIRE this way:
A comprehensive and open dataset of research information covering 161m publications, 58m research data, 317k research software items, from 124k data sources, linked to 3m grants and 196k organizations.
I ran a number of queries. The interface is nice, and one of my queries was for Hopf fibrations (if this query doesn’t make sense to you, there’s not much I can do; perhaps OpenAIRE Explore is ill-suited to queries about Taylor Swift and Ticketmaster?).
The query returned 127 “hits” and identified four organizations as having people interested in the subject. (Hopf fibrations are quite important, in my opinion.) No ads, no crazy SEO baloney, but probably some equations that have not been error-checked. Plus, the result set was deduplicated. Imagine that. A useful Vivisimo-type function is available again.
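How OpenAIRE deduplicates is not described here, so what follows is a hypothetical sketch of the basic idea only: treat two records as duplicates when they share a DOI (DOIs are compared case-insensitively) or, when a record lacks a DOI, when the titles match after lowercasing and stripping punctuation. The record fields, titles, and DOIs are invented for illustration.

```python
import re
from typing import Optional


def normalize_title(title: str) -> str:
    """Lowercase a title and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def dedupe_results(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each DOI (or, lacking a DOI, each normalized title)."""
    seen: set[str] = set()
    unique: list[dict] = []
    for record in records:
        doi: Optional[str] = record.get("doi")
        key = "doi:" + doi.lower() if doi else "title:" + normalize_title(record["title"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical records: two pairs of near-duplicates.
results = [
    {"title": "Hopf Fibrations and Gauge Theory", "doi": "10.1000/abc"},
    {"title": "Hopf fibrations and gauge theory.", "doi": "10.1000/ABC"},
    {"title": "A Note on the Hopf Map", "doi": None},
    {"title": "A note on the Hopf map", "doi": None},
]
print(len(dedupe_results(results)))  # 2
```

Real bibliographic deduplication has to cope with messier cases (preprint versus published version, missing or wrong DOIs), which is why a working deduplicated result set is worth noticing.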
Observation: Some professional publishers are likely to find the service objectionable. Four of the giants are watching their legal eagles circle the hapless Internet Archive. But soon… maybe OpenAIRE will attract some scrutiny.
For now, OpenAIRE Explore is indeed useful.
Stephen E Arnold, March 24, 2023
20 Years Ago: Primus Knowledge Solutions
March 20, 2023
Note: Written by a real-live dinobaby. No smart software involved.
I am not criticizing Primus Knowledge Solutions (acquired by ATG in 2004 and then Oracle purchased ATG in 2011). I would ask that you read this text and consider what was marketed in 2003. The source is a description of Primus’ Answer Engine which was once located at dub dub dub primus.com/products/answerEngine:
Primus Answer Engine helps companies take full advantage of the valuable content that already exists in corporate documents and databases. Using proprietary natural language processing, Answer Engine delivers quick, relevant answers to plain English questions by bringing widespread corporate knowledge to support, agents, as well as to customers, partners, and employees via the web.
What “features” did the system provide two decades ago? The fact sheet I picked up at a search conference in 2003 told me:
- Natural language processing
- Scalability
- Database integration
- All major document types
- Insightful reporting
- Customizable interface
- Centralized administration
The system can suggest questions, interpret these or other questions, and return a list of answers found in a company’s online documents. Users can view each answer in context if desired.
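Primus did not publish how its proprietary natural language processing worked. The sketch below swaps in a deliberately crude technique to show the shape of an answer engine: score each sentence in a set of documents by word overlap with the question, then return the best sentences along with their neighbors so the user can view an answer in context. The stopword list, function names, and sample document are hypothetical.

```python
import re

# A tiny, hypothetical stopword list; real systems use far larger ones.
STOPWORDS = {"the", "a", "an", "is", "are", "how", "do", "i", "to", "what", "of", "my"}


def tokens(text: str) -> set[str]:
    """Lowercase word tokens minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def answer(question: str, documents: dict[str, str], top_n: int = 3):
    """Score every sentence by word overlap with the question.

    Returns (doc_name, sentence, context) tuples, where context is the
    sentence plus its immediate neighbors.
    """
    q = tokens(question)
    hits = []
    for name, text in documents.items():
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        for i, sentence in enumerate(sentences):
            score = len(q & tokens(sentence))
            if score:
                context = " ".join(sentences[max(0, i - 1): i + 2])
                hits.append((score, name, sentence, context))
    hits.sort(key=lambda h: h[0], reverse=True)
    return [(name, sentence, context) for _, name, sentence, context in hits[:top_n]]


docs = {"vpn.txt": "Install the client first. To reset a VPN password, open the portal. Then choose Reset."}
for name, sentence, context in answer("How do I reset my VPN password?", docs):
    print(name, "|", sentence)
```

Word overlap is not natural language processing, but the pattern (interpret a question, rank candidate passages, show the answer in context) is roughly the one the 2003 fact sheet describes.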
I mention Primus because it is one example from dozens in my files about NLP technology.
Several observations/questions:
- Where is Oracle in the ChatGPT derby? May I suggest this link for starters.
- Isn’t the principal difference between Primus and other NLP “smart software” that users are now chasing ChatGPT-type systems, rather than innovators outputting marketing words?
- Are issues like updating training models and their content, biases in the models themselves, and the challenge of accurate, current data being greeted with the same naïveté as in 2003?
Net net: ChatGPT is just one manifestation of innovators’ attempts to deal with the challenge of finding accurate, on-point, and timely information in the digital world. (This is a world I call the datasphere.)
Stephen E Arnold, March 20, 2023
Elasticsearch Guide: More of a Cheat Sheet
March 15, 2023
Elasticsearch has been a go-to solution for searching content either via the open source version or the Elastic technical support option. The system works, and it has many followers and enthusiasts. As a result, one can locate “help” easily online for many hitches in the git along.
I found the information in “Unlocking the Power of Elasticsearch: A Comprehensive Guide to Complex Search Use Cases.” I would suggest that the write up is more like a cheat sheet. Encounter a specific task, check the “Guide,” and sally forth.
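As a hedged illustration of the kind of recipe such a guide offers, here is a Python sketch that builds an Elasticsearch bool query body. It combines a scored full-text clause (must), unscored yes/no constraints (filter), and an optional boost for an exact phrase in the title (should). The field names body, department, modified, and title are hypothetical; the resulting JSON could be sent to an index’s _search endpoint.

```python
import json


def build_query(text: str, department: str, since: str) -> dict:
    """Build an Elasticsearch bool query body (field names are hypothetical)."""
    return {
        "query": {
            "bool": {
                # must: full-text clause that contributes to the relevance score
                "must": [{"match": {"body": text}}],
                # filter: yes/no constraints that are not scored
                "filter": [
                    {"term": {"department": department}},
                    {"range": {"modified": {"gte": since}}},
                ],
                # should: optional clause that lifts documents whose title holds the phrase
                "should": [{"match_phrase": {"title": {"query": text, "boost": 2}}}],
            }
        },
        "size": 10,
    }


body = build_query("revised price quotation", "sales", "2023-01-01")
print(json.dumps(body, indent=2))
```

None of this helps if the slide deck with the price quotation was never indexed in the first place.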
I would suggest that many real-life enterprise search needs are difficult to solve. One example is capturing data on a sales professional’s laptop before the colleague deletes the slide deck with the revised price quotation data. No search engine on the planet can get this important information to the legal department if the project goes off the rails. “I can’t find it” is not a helpful answer.
Similar challenges arise when the Elasticsearch system must interact with a line item for a product specified in a purchase order which has a corresponding engineering drawing. Line up the chemical, civil, mechanical, and nuclear engineers and tell them, “Well, that’s an object embedded in the what-do-you-call-it software I never heard of.” Yeah.
Nevertheless, for some helpful tips give the free guide a look.
The mantra is, “Search is easy. Search is a solved problem. Search is no big deal.” Convince yourself. Keep in mind that the mantra neither rings true to me nor makes me calm.
Stephen E Arnold, March 15, 2023
Hybrid Search: A Gentle Way of Saying “One Size Fits All” Search Like the Google Provides Is Not Going to Work for Some
March 9, 2023
“On Hybrid Search” is a content marketing-type report. That’s okay. I found the information useful. What causes me to highlight this post by Qdrant is its implicit message: Google’s approach to search is lousy because it aims at the lowest common denominator of retrieval while preserving its relevance-eroding online ad matching business.
The guts of the write up walk through old-school and sort-of-new-school approaches to matching processed content with a query. Keep in mind that most of the technology mentioned in the write up is “old” in the sense that it’s been around for a half decade or more. The “new” technology is about ready to hop on a bike with training wheels and head to the swimming pool. (Yes, there is some risk there, I suggest.)
But here’s the key statement in the report for me:
Each search scenario requires a specialized tool to achieve the best results possible. Still, combining multiple tools with minimal overhead is possible to improve the search precision even further. Introducing vector search into an existing search stack doesn’t need to be a revolution but just one small step at a time. You’ll never cover all the possible queries with a list of synonyms, so a full-text search may not find all the relevant documents. There are also some cases in which your users use different terminology than the one you have in your database.
Here’s the statement that does not give me warm fuzzies:
Those problems are easily solvable with neural vector embeddings, and combining both approaches with an additional reranking step is possible. So you don’t need to resign from your well-known full-text search mechanism but extend it with vector search to support the queries you haven’t foreseen.
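The “additional reranking step” can take several forms. Below is a hedged sketch of one simple, widely used way to merge a full-text ranking with a vector ranking: reciprocal rank fusion (RRF). It is not necessarily what Qdrant recommends. Each document earns 1/(k + rank) from every ranked list in which it appears, and the sums set the merged order. The document IDs are hypothetical; k = 60 is a commonly used default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one list.

    Each document earns 1 / (k + rank) from every list it appears in
    (rank starts at 1); documents are then sorted by the summed score.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


keyword_hits = ["d1", "d2", "d3"]  # e.g., from a full-text (BM25) index
vector_hits = ["d3", "d1", "d4"]   # e.g., from an embedding similarity search
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))  # ['d1', 'd3', 'd2', 'd4']
```

Fusion only reorders what the two retrievers found. If neither surfaces the right document, no rank arithmetic will conjure it up.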
Observations:
- No problems in search, when humans are seeking information, are “easily solvable,” least of all with shotgun marriages of retrieval methods.
- Finding information is no longer enough: The information or data displayed have to be [a] correct, accurate, or at least reproducible; [b] free of injected poisoned information (yep, the burden falls on the indexing engine or engines, not the user who, by definition, does not know an answer or what is needed to answer a query); and [c] current, because the need for access to “real time” data creates additional computational cost, which is often difficult to justify.
- Basic finding and retrieval is morphing into projected outcomes or implications from the indexed data. Available technology for search and retrieval is not tuned for this requirement.
Stephen E Arnold, March 9, 2023
Take That Googzilla Because You Have One Claw in Your Digital Grave. Honest
March 8, 2023
My, my. How the “we are search experts” set have changed their tune. I am not talking about those who were terminated by the Google. I am not talking about the fawning advertising intermediaries. I am not talking about old school librarians who know how to extract information from commercial databases.
I am talking about the super clever Silicon Valley infused pundits.
Here’s an example: “Google Search Is Dying” from 2022. The write up contains one of the all-time great statements from a Google wizard that I have encountered. Believe me. I have noted a few over the years.
The speaker is the former champion of search engine optimization and denier of Google’s destruction of precision, recall, and relevance in search results. Here’s the statement:
You said in the post that quotes don’t give exact matches. They really do. Honest. – Google’s public search liaison (that’s a title of which to be proud)
I love it when a Googler uses the word “honest.”
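What would an “exact match” for a quoted query actually mean? A minimal sketch, not a description of how Google works: a bag-of-words matcher accepts any document containing all the query words, while a phrase matcher requires the words to appear next to each other and in order. The function names and sample text are hypothetical.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase word tokens, punctuation dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words_match(query: str, doc: str) -> bool:
    """True if every query word appears somewhere in the document."""
    return set(words(query)) <= set(words(doc))


def exact_phrase_match(phrase: str, doc: str) -> bool:
    """True only if the phrase's words appear contiguously and in order."""
    p, d = words(phrase), words(doc)
    return any(d[i:i + len(p)] == p for i in range(len(d) - len(p) + 1))


doc = "Hosting for online stores, plus cloud backup."
print(bag_of_words_match("online hosting", doc))  # True
print(exact_phrase_match("online hosting", doc))  # False
```

A quoted query that returns the first kind of result when the user asked for the second is not an exact match, honest or otherwise.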
Net net: Members of Gen X, Y, and Z perceive themselves as search experts. Okay, living in a cloud of unknowing is ubiquitous today. But “honest”?
Stephen E Arnold, March 8, 2023
Google Points Out That ChatGPT Has a Core Neural Disorder: LSD or Spoiled Baloney?
February 16, 2023
I am an old-fashioned dinobaby. I have a reasonably good memory for great moments in search and retrieval. I recall when Danny Sullivan told me that search engine optimization improves relevance. In 2006, Prabhakar Raghavan on a conference call with a Managing Director of a so-so financial outfit explained that Yahoo had semantic technology that made Google’s pathetic effort look like outdated technology.
Hallucinating pizza courtesy of the super smart AI app Craiyon.com. The art, not the write up it accompanies, was created by smart software. The article is the work of the dinobaby, Stephen E Arnold. Looks like pizza to me. Close enough for horseshoes like so many zippy technologies.
Now that SEO and its spawn are scrambling to find a way to fiddle with increasingly weird methods for making software return the results the search engine optimization crowd’s customers demand, Google’s head of search Prabhakar Raghavan is opining about the oh, so miserable work of OpenAI and its TikTok-trendy ChatGPT. May I remind you, gentle reader, that OpenAI availed itself of some Googley open source smart software and consulted with some Googlers as it ramped up to the tsunami of PR ripples? May I remind you that Microsoft said, “Yo, we’re putting some OpenAI goodies in PowerPoint.” The world rejoiced and Reddit plus Twitter kicked into rave mode.
Google responded with a nifty roll out in Paris. February is not April, but maybe it should have been in April 2023, not in le temps d’hiver?
I read with considerable amusement “Google Vice President Warns That AI Chatbots Are Hallucinating.” The write up states, as rock-solid, George Washington “I cannot tell a lie” truth, the following:
Speaking to German newspaper Welt am Sonntag, Raghavan warned that users may be delivered complete nonsense by chatbots, despite answers seeming coherent. “This type of artificial intelligence we’re talking about can sometimes lead to something we call hallucination,” Raghavan told Welt Am Sonntag. “This is then expressed in such a way that a machine delivers a convincing but completely fictitious answer.”
LSD or just the Google code relied upon? Was it the Googlers of whom OpenAI asked questions? Was it reading the gems of wisdom in Google patent documents? Was it coincidence?
I recall that Dr. Timnit Gebru and her co-authors of the Stochastic Parrot paper suggested that life on the Google island was not palm trees and friendly natives. Nope. Disagree with the Google and your future elsewhere awaits.
Now we have the hallucination issue. The implication is that smart software like Google-infused OpenAI is addled. It imagines things. It hallucinates. It is living in a fantasy land with bean bag chairs, Foosball tables, and memories of Odwalla juice.
I wrote about the after-the-fact yip yap from Google’s Chair Person of the Board. I mentioned the Father of the Darned Internet’s post-ChatGPT PR blasts. Now we have the head of search’s observation about screwed up neural networks.
Yep, someone from Verity should know about flawed software. Yep, someone from Yahoo should be familiar with using PR to mask spectacular failure in search. Yep, someone from Google is definitely in a position to suggest that smart software may be somewhat unreliable because of fundamental flaws in the systems and methods implemented at Google and probably other outfits loving the Tensor T shirts.
Stephen E Arnold, February 16, 2023
Amazing Statement about Google
January 17, 2023
I am not into Twitter. I think that intelware and policeware vendors find the Twitter content interesting. A few of them may be annoyed that the Twitter application programming interface seems to have gone on a walkabout. One of the analyses of Twitter I noted this morning (January 15, 2023, 10:35 am) is “Twitter’s Latest ‘Feature’ Is How You Know Elon Musk Is in Over His Head. It’s the Cautionary Tale Every Business Needs to Hear.”
I want to skip over the Twitter palpitations and focus on one sentence:
At least, with Google, the company is good enough at what it does that you can at least squint and sort of see that when it changes its algorithm, it does it to deliver a better experience to its users–people who search for answers on Google.
What about that “at least”? Also, what do you make of the “you can at least squint and sort of see that when it [Google] changes its algorithm”? Squint to see clearly. Into Google? Hmmm. I can squint all day at the Google results for the query online hosting and not see anything except advertising and a plug for the Google Cloud.
Helpful? Sure to Google, not to this user.
Now consider the favorite Google marketing chestnut, “a better experience.” Ads and a plug for Google do not deliver a better experience to me. Compare the results for the “online hosting” query with those from www.you.com.
Google is the first result, which suggests some voodoo in the search engine optimization area. The other results point to a free hosting service, a PC Magazine review article (which is often an interesting editorial method to talk about) and an outfit called Online Hosting Solution.
Which is better? Google’s ads and self promotion or the new You.com pointer to Google and some sort of relevant links?
Now let’s run the query “online hosting” on Yandex.com (not the Russian language version) and look at what comes back.
Note that the first link is to a particular vendor with no ad label slapped on the link. The other links are to listicle articles which present a group of hosting companies for the person running the query to consider.
Which of the three services requires the “squint” test? I suppose one can squint at the Google result and conclude that it is just wonderful, just not for objective results. The You.com results are a random list of mostly relevant links. But that top hit pointing at Google Cloud makes me suspicious. Why Google? Why not Amazon AWS, Microsoft Azure, the fascinating Epik.com, or another vendor?
In this set of three, Yandex.com strikes me as delivering cleaner, more on point results. Your mileage may vary.
In my experience, systems which promise to deliver answers send users on a quest. Most of the systems to which I have been exposed seem the digital equivalent of a ride with Don Quixote. The windmills of relevance remain at risk.
Stephen E Arnold, January 17, 2023
Semantic Search for arXiv Papers
January 12, 2023
An artificial intelligence research engineer named Tom Tumiel (InstaDeep) created a Web site called arXivxplorer.com.
According to his Twitter message (posted on January 7, 2023), the system is a “semantic search engine.” The service implements OpenAI’s embedding model. The idea is that this search method allows a user to “find the most relevant papers.” There is a stream of tweets at this link about the service. Mr. Tumiel states:
I’ve even discovered a few interesting papers I hadn’t seen before using traditional search tools like Google or arXiv’s own search function or even from the ML twitter hive mind… One can search for similar or “more like this” papers by “pasting the arXiv url directly” in the search box or “click the More Like This” button.
I ran several test queries, including this one: “Google Eigenvector.” The system surfaced generally useful papers, including one from January 2022. However, when I included the date 2023 in the search string, arXiv Xplorer neither limited the results to 2023 papers nor returned a null set. The system displayed hits which did not include the date.
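arXiv Xplorer reportedly relies on OpenAI’s embedding model. The sketch below shows the general idea behind a “More Like This” button under stated assumptions: each paper is represented by a vector, and papers are ranked by cosine similarity to the vector of a query or seed paper. The three-dimensional vectors and paper IDs are made up; real embeddings have far more dimensions, and this is not Mr. Tumiel’s code.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def more_like_this(query_vec: list[float], papers: dict[str, list[float]], top_n: int = 2) -> list[str]:
    """Rank paper IDs by cosine similarity to a query (or seed-paper) embedding."""
    ranked = sorted(papers, key=lambda pid: cosine(query_vec, papers[pid]), reverse=True)
    return ranked[:top_n]


# Hypothetical toy embeddings.
papers = {
    "paper-a": [0.9, 0.1, 0.0],
    "paper-b": [0.8, 0.3, 0.1],
    "paper-c": [0.0, 0.2, 0.9],
}
print(more_like_this([1.0, 0.0, 0.0], papers))  # ['paper-a', 'paper-b']
```

Note what is missing: nothing in the vectors encodes a publication date, which helps explain why a query containing “2023” can still return undated or older hits.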
Several quick observations:
- The system seems to be “time blind,” which is a common feature of modern search systems
- The system provides the abstract when one clicks on a link. The “view” button in the pop up displays the PDF
- Adding search terms to a query reduces the size of the result set, a refreshing change from queries which display “infinite scrolling” of irrelevant documents.
For those interested in academic or research papers, will OpenAI become aware of the value of dates, limiting queries to endnotes, and displaying a relationship map among topics or authors in a manner similar to Maltego? By combining more search controls with the OpenAI content and query processing, the service might leapfrog the Lucene/Solr type methods. I think that would be a good thing.
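One way to cure the “time blind” behavior, offered as a hypothetical sketch rather than a description of any existing service: detect a four-digit year in the query and apply it as a hard filter on the semantically ranked results. A query with no matching papers then returns an empty list (the null set) instead of undated hits. The Paper class, its fields, and the sample titles are invented.

```python
import re
from dataclasses import dataclass

# Matches four-digit years from 1900 through 2099 as whole words.
YEAR = re.compile(r"\b(19|20)\d{2}\b")


@dataclass
class Paper:
    title: str
    year: int
    score: float  # similarity score from the semantic ranker (hypothetical)


def date_aware(query: str, ranked: list[Paper]) -> list[Paper]:
    """If the query names a year, keep only papers from that year, preserving rank order."""
    match = YEAR.search(query)
    if not match:
        return ranked
    year = int(match.group(0))
    return [p for p in ranked if p.year == year]


ranked = [
    Paper("Eigenvector centrality at scale", 2022, 0.91),
    Paper("PageRank revisited", 2023, 0.88),
]
print([p.title for p in date_aware("Google Eigenvector 2023", ranked)])  # ['PageRank revisited']
```

A hard filter is the bluntest option; a production system might instead treat the year as a boost or expose it as a separate date control, the kind of search control mentioned above.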
Will the implementation of this system add to Google’s search anxiety? My hunch is that Google is not sure what causes the Google system to perturbate. It may well be that the twitching, the sudden changes in direction, and the coverage of OpenAI itself in blogs may be the equivalent of tremors, soft speaking, and managerial dizziness. Oh, my, that sounds serious.
Stephen E Arnold, January 12, 2023