Amazon Is Winning the Product Search Derby… for Now
July 12, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
Google cannot be happy about these numbers. We learn from a piece at Search Engine Land that now “50% of Product Searches Start on Amazon.” That is even worse for the competition than previously predicted. In fact, Google’s share of this market has slipped to less than a third at 31.5%. What’s Google’s solution to this click loss? Higher ad pricing? Or maybe an even higher ad-to-real content ratio?
The search racers are struggling to win traffic related to products. What has Amazon accomplished? Has Google’s vehicle lost power? What about Microsoft, a company whose engine is Bing-ing?
We also learn just 14% of respondents start their searches at retail or brand websites, while social media and review sites each capture a measly 2%. But that could change as Generation Z continues to age into independent shoppers. That group is the most likely to launch searches from social media. They are also most inclined to check online reviews. Reviews with photos are especially influential. Writer Danny Goodwin cites a recent Pew survey as he writes:
“Reviews and ratings can make or break a sale more than any other factor, including product price, free shipping, free returns and exchanges, and more. Overall, 77% of respondents said they specifically seek out websites with reviews – and this number was even higher for Gen Z (87%) and millennials (81%). Ratings without accompanying reviews are considered untrustworthy by 56% of survey respondents. Where people read reviews and ratings:
- Amazon: 94%
- Retail websites (e.g., Target, Wal-Mart): 91%
- Search engines: 70%
- Brand websites (the brand that manufactures the product): 68%
- Independent review sites: 40%
User-generated photos and videos gain value. Sixty percent of consumers looked at user-generated images or videos when learning about new products.
- 77% of respondents said they trust customer photos and videos.
- 53% said user-generated photos and videos from previous customers impacted their decision whether to purchase a product.”
So there you have it—if you have a product to market online, best encourage reviews. With pics, or it didn’t happen. Videos are a significant marketing factor. What happens if Zuck’s Threads pushes into product search, effectively linking text promotions with Instagram? And the Google? Let’s ask Bard?
Cynthia Murrell, July 12, 2023
Scinapse Is A Free Academic-Centric Database
July 11, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
Quality, academic-worthy databases are difficult to locate outside of libraries and schools. Google Scholar attempted to qualify as an alternative to paywalled databases, but it returns repetitive and inaccurate results. Thanks to AI algorithms, free databases such as Scinapse have improved.
Scinapse is designed by Pluto and it is advertised as the “researcher’s favorite search engine.” Scinapse delivers accurate and updated research materials in each search. Many free databases pull their results from old citations and fail to include recent publications. Pluto promises Scinapse delivers high-performing results due to its original algorithm optimized for research.
The algorithm returns research materials based on when they were published, how many times they were cited, and how impactful a paper was in notable journals. Scinapse consistently delivers results that are better than Google Scholar. Each search item includes a complete citation for quick reference. The customized filters offer the typical ways to narrow or broaden results, including journal, field of study, conference, author, publication year, and more.
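Pluto has not published the details of its ranking, so the following is only a rough sketch of how the three signals named above (publication date, citation count, and journal impact) could be blended into one score. The Paper fields, the weights, and the formula are all assumptions made for illustration; none of them come from Scinapse.

```python
import math
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    year: int
    citations: int
    journal_impact: float  # hypothetical journal-level impact figure


def score(paper, current_year=2023, w_recency=1.0, w_citations=1.0, w_impact=1.0):
    """Blend recency, citations, and journal impact (weights are assumptions)."""
    # Newer papers score closer to 1.0; older papers decay toward 0.
    recency = 1.0 / (1 + max(0, current_year - paper.year))
    # Log-scale citations so one heavily cited paper does not swamp the rest.
    cites = math.log1p(paper.citations)
    return w_recency * recency + w_citations * cites + w_impact * paper.journal_impact


def rank(papers, **weights):
    """Return papers sorted from highest to lowest blended score."""
    return sorted(papers, key=lambda p: score(p, **weights), reverse=True)


papers = [
    Paper("Older study", 2005, 10, 2.0),
    Paper("Recent study", 2022, 10, 2.0),
]
ranked_titles = [p.title for p in rank(papers)]
```

With equal citations and journal impact, the recency term is what separates the two papers, which is the behavior the description implies.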
People can also create an account to organize their research in reading lists, share with other scholars, or export as a citation list. Perhaps the most innovative feature is the paper recommendations where Scinapse sends paper citations that align with research. Scinapse aggregates over 48,000 journals. There are users in 196 countries and 1,130 reputable affiliations. Scinapse’s data sources include Microsoft Research, PubMed, Semantic Scholar, and Springer Nature.
Whitney Grace, July 11, 2023
In the Midst of Info Chaos, a Path Identified and Explained
July 10, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
The Threads – Twitter spat in the midst of BlueSky and Mastodon marks a modest change in having one place to go for current information. How does one maintain awareness with high school taunts awing, Mastodon explaining how easy it is to use, and BlueSky doing its deep gaze thing?
One answer and a quite good one at that appears in “RSS for Post-Twitter News and Web Monitoring.” The author knows quite a bit about finding information, and she also has the wisdom to address me as “dinobaby.” I know a GenZ when I get an email that begins, “Hey, there.” Trust me. That salutation does not work as the author expects.
In the cited article, you will get useful information about newsfeeds, screenshots, and practical advice. Here’s an example of what’s in the excellent how-to:
If you want to check a site for RSS feeds and you think it might be a WordPress site, just add /feed/ to the end of the domain name. You might get a 404 error, but you also might get a page full of information!
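The tip can be sketched in a few lines of Python. This is a minimal illustration, not code from the cited article: the function names are made up here, and the check for feed-like content is a crude assumption (it only looks for an RSS or Atom root tag near the top of the response).

```python
from urllib.parse import urlsplit, urlunsplit


def wordpress_feed_url(site: str) -> str:
    """Return the conventional WordPress feed address for a site."""
    # Accept bare domains such as "example.com" by assuming https.
    if "://" not in site:
        site = "https://" + site
    parts = urlsplit(site)
    # Keep only scheme and host; the tip appends /feed/ to the domain name.
    return urlunsplit((parts.scheme, parts.netloc, "/feed/", "", ""))


def looks_like_feed(body: str) -> bool:
    """Rough check: does the response body resemble RSS or Atom XML?"""
    head = body.lstrip()[:500].lower()
    return "<rss" in head or "<feed" in head


candidate = wordpress_feed_url("example.com")
```

To try it against a live site, fetch the candidate address with urllib.request.urlopen and pass the decoded body to looks_like_feed. As the tip warns, a site without a feed will answer with a 404, which urlopen raises as urllib.error.HTTPError.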
There are more tips. Just navigate to Research Buzz, and learn.
This dinobaby awards one swish of its tail to Tara Calishain. Swish.
Stephen E Arnold, July 10, 2023
Neeva: Is This Google Killer on the Run?
May 18, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
Sometimes I think it is 2007 doing the déjà vu dance. I read “Report: Snowflake Is in Advanced Talks to Acquire Search Startup Neeva.” Founded by Xooglers, Neeva was positioned to revolutionize search and generate subscription revenue. Along the highway to the pot of gold, Neeva would deliver on-point results. How did that pay-for-search model work out?
According to the article:
Snowflake Inc., the cloud-based data warehouse provider, is reportedly in advanced talks to acquire a search startup called Neeva Inc. that was founded by former Google LLC advertising executive Sridhar Ramaswamy.
Like every other content processing company I bump into, Neeva was doing smart software. Combine the relevance angle with generative AI and what do you get? A start up that is going to be acquired by a firm with some interesting ideas about how to use search and retrieval to make life better.
Are there other search outfits with a similar business model? Sure, Kagi comes to mind. I used to keep track of start ups which had technology that would provide relevant results to users and a big payday to the investors. Do these names ring a bell?
Cluuz
Deepset
Glean
Kyndi
Siderian
Umiboza
If the Snowflake Neeva deal comes to fruition, will it follow the trajectory of IBM Vivisimo? Vivisimo disappeared as an entity and morphed into a big data component. No problem. But Vivisimo was a metasearch and on-the-fly tagging system. Will the tie up be similar to the Microsoft acquisition of Fast Search & Transfer? Fast still lives but I don’t know too many Softies who know about the backstory. Then there is the HP Autonomy deal. The acquisition is still playing out in the legal eagle sauna.
Few care about the nuances of search and retrieval. Those seemingly irrelevant details can have interesting consequences. Some are okay like the Dassault Exalead deal. Others? Less okay.
Stephen E Arnold, May 18, 2023
Am I a Moron Because I Use You.com?
May 10, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
“Only Morons Use ChatGPT As a Substitute for Google” is a declarative statement. Three words strike me as important in the title of the Lifehacker (an online publication) article.
First, “morons.” A moron according to TheFreedictionary.com citation is: A city in Eastern Argentina although it has the accented ó. On to the next definition which is “A person who is considered foolish or stupid.” I think this is closer to the mark. I am not comfortable invoking the third definition because it aims a denotative punch at a person having a mental age of seven to 12. I am 78, so let’s go with “foolish or stupid.” I am in that set.
Second, “ChatGPT.” I think the moniker can apply specifically to the for-fee service of OpenAI. It is possible that “ChatGPT” stands for an entire class of generative software. I tried to make a list of a who’s who in generative software and abandoned the task. Quite a few companies are in the game either directly like the aforementioned OpenAI or a bandwagon of companies joyfully tallied by ProductWatch.com and a few LinkedIn contributors. I think the idea is that ChatGPT outputs content which is either derivative (a characteristic of a machine eating other people’s words and images) or hallucinatory (a feature of software which can go off the rails and output like a digital Lewis Carroll galumphing around a park in which young females frolic).
Third, “Google.” My hunch is that the author is an expert online searcher who like many open source intelligence professionals relies on the advertising-supported Google search for objective, on-point answers. Oh, my, that’s quite a reliable source of information. I want to point out that Google focuses on revenue-generation from advertising. Accuracy of results often has little connection to the user’s query. My interpretation of the word “Google” is that Google is good, probably better than “ChatGPT” in providing answers designed to meet the needs of users who may not read above the 9th grade level, struggle with derivatives, and cannot name the capital of Tasmania. (It is Hobart, by the way.)
I am on the fence with the word “only.” I am not comfortable with categorical affirmatives. Given the context of the article and the fact that Google is the Web search engine of choice (conscious or manipulated) for 90 percent of people in North America and Western Europe, I can understand why the field of view is narrow. An expert with regard to Google knows more and more about less and less.
Why is ChatGPT presented as the yin to Google’s yang? The write up says:
If you can’t fact-check the bot, the answers it gives are useless.
Okay, so people fact-check Google outputs. What does Google present with regard to Larry Page and his alleged interactions with Mr. Epstein, the late Mr. Epstein I should say? Run the query Google Page Epstein, and this is what the system presented to me on May 6, 2023:
Are these stories accurate? Is CNBC the most relevant story about a lawsuit involving JPMorgan Chase and Jeffrey Epstein? What about the New York Post article “Virgin Islands Struggles to Subpoena Google’s Larry Page in Jeffrey Epstein Lawsuit against JPMorgan.” Or the tech insider outfit Gizmodo which offers “Google’s Larry Page Could Be Served in Jeffrey Epstein Case.” There are citations to the British newspaper Telegraph and one from Law & Crime. There is a Yahoo story and a Daily Beast article.
Now what does You.com offer me when I query Google Page Epstein?
There have been recent news articles reporting that the US Virgin Islands has issued subpoenas to Google co-founder Larry Page in connection with a lawsuit against JPMorgan Chase over the Jeffrey Epstein sex trafficking case 1. The lawsuit alleges that JPMorgan Chase facilitated Epstein’s trafficking scheme by continuously offering financial support to Epstein despite knowing about the illegal activities conducted at Little St. James, a private island owned by Epstein. The subpoenas seek documents and communications related to JPMorgan’s banking relationship with Epstein. Sergey Brin, another Google co-founder, has also reportedly been subpoenaed. Despite the subpoenas, it appears that the US Virgin Islands is struggling to serve Page with the legal documents. A recent ruling allows Page to be served with the subpoenas via his attorney in California. cnbc1
I would suggest that the Google citations provide a list, no indication of which source is more or less highly regarded for accuracy. Google wants me to click on one or more of the links, ingest the content of each article, and then synthesize the items of information which strike me as on the money. You.com on the other hand provides me with the bare bones of the alleged involvement with a person who like Lewis Carroll may have had an interest in hanging out around a park on a sunny Saturday afternoon. Catching some rays and perhaps coming up with new ideas are interpretations of such an action by a lawyer hired to explain the late and much lamented Mr. Epstein.
So which is it? The harvesting of buckwheat the old-fashioned way or the pellet of information spat out in a second or two?
I think the idea is that morons are going to go the ChatGPT-like route. Wizards and authors of online “real” news articles want to swing that sickle and relive the thrill of the workers in Vincent van Gogh’s “The Harvest.”
The article says:
you can’t tell whether an AI-generated fact is true or not by the way the text looks; it’s designed to look plausible and correct. You have to fact-check it.
Does one need to fact-check what Google spits out? What about the people who follow Google Maps’s instructions and drive off a cliff? What about the links in Google Scholar to papers with non-reproducible results?
Here’s the conclusion to the write up:
So if you want to use ChatGPT to get ideas or brainstorm places to look for more information, fine. But don’t expect it to base its answers on reality. Even for something as innocuous as recommending books based on your favorites, it’s likely to make up books that don’t even exist.
I like that “don’t even exist.” Google Bard would never do that. Google management would never fire a smart software executive who points out that Google’s smart software is biased. Google would never provide search results that explain how to steal copyright protected software. Well, maybe just one time like this:
Oh, no. Wonky software would never ever do that but for Google’s results via YouTube for the query “Magix Vegas crack.” Now who is a moron? Perhaps an apologist for Google?
Stephen E Arnold, May 10, 2023
Divorcing the Google: Legal Eagles Experience a Frisson of Anticipation
April 24, 2023
No smart software has been used to create this dinobaby’s blog post.
I have poked around looking for a version or copy of the contract Samsung signed with Google for the firms’ mobile phone tie up. Based on what I have heard at conferences and read on the Internet (of course, I believe everything I read on the Internet, don’t you?), it appears that there are several major deals.
The first is the use of and access to the mindlessly fragmented Android mobile phone software. Samsung can do some innovating, but the Google is into providing “great experiences.” Why would a mobile phone maker like Samsung allow a user to manage contacts and block mobile calls without implementing a modern day hunt for gold near Placer?
The second is the “suggestion” — mind you, the suggestion is nothing more than a gentle nudge — to keep that largely-malware-free Google Play Store front and center.
The third is the default search engine. Buy a Samsung get Google Search.
Now you know why the legal eagles are shivering when they think of litigation to redo the Google – Samsung deal. For those who believe the misinformation zipping around about Microsoft Bing displacing Google Search, my thought would be to ask yourself, “Who gains by pumping out this type of disinformation?” One answer is big Chinese mobile phone manufacturers. This is Art of War stuff, and I won’t dwell on this. What about Microsoft? Maybe but I like to think happy thoughts about Microsoft. I say, “No one at Microsoft would engage in disinformation intended to make life difficult for the online advertising king.” Another possibility is Silicon Valley type journalists who pick up rumors, amplify them, and then comment that Samsung is kicking the tires of Bing with ChatGPT. Suddenly a “real” news outfit emits the Samsung rumor. Exciting for the legal eagles.
The write up “Samsung Can’t Dump Google for Bing As the Default Search Engine on Its Phones” does a good job of explaining the contours of a Google – Samsung tie up.
Several observations:
First, the alleged Samsung search replacement provides a glimpse of how certain information can move from whispers at conferences to headlines.
Second, I would not bet against lawyers. With enough money, contracts can be nullified, transformed, or left alone. The only option which disappoints attorneys is the one that lets sleeping dogs lie.
Third, the growing upswell of anti-Google sentiment is noticeable. That may be a far larger problem for Googzilla than rumors about Samsung. Perceptions can be quite real, and they translate into impacts. I am tempted to quote William James, but I won’t.
Net net: If Samsung wants to swizzle a deal with an entity other than the Google, the lawyers may vibrate with such frequency that a feather or two may fall off.
Stephen E Arnold, April 24, 2023
Useful Scholarly / Semi-Scholarly Research System with Deduplicated Results
March 24, 2023
I was delighted to receive a link to OpenAIRE Explore. The service is sponsored by a non-profit partnership established in 2018 as a legal outfit. The objective is to “ensure a permanent open scholarly communication infrastructure to support European research.” (I am not sure whoever wrote the description has read “Book Publishers Won’t Stop Until Libraries Are Dead.”)
The specific service I found interesting is Explore located at https://explore.openaire.eu. The service is described by OpenAIRE this way:
A comprehensive and open dataset of research information covering 161m publications, 58m research data, 317k research software items, from 124k data sources, linked to 3m grants and 196k organizations.
Maybe looking at that TechDirt article will be useful.
I ran a number of queries. The probably unreadable screenshot below illustrates the nice interface and the results to my query for Hopf fibrations (if this query doesn’t make sense to you, there’s not much I can do. Perhaps OpenAIRE Explore is ill-suited to queries about Taylor Swift and Ticketmaster?):
The query returned 127 “hits” and identified four organizations as having people interested in the subject. (Hopf fibrations are quite important, in my opinion.) No ads, no crazy SEO baloney, but probably some non-error checked equations. Plus, the result set was deduplicated. Imagine that. A useful Vivisimo-type function available again.
Observation: Some professional publishers are likely to find the service objectionable. Four of the giants are watching their legal eagles circle the hapless Internet Archive. But soon… maybe OpenAIRE will attract some scrutiny.
For now, OpenAIRE Explore is indeed useful.
Stephen E Arnold, March 24, 2023
20 Years Ago: Primus Knowledge Solutions
March 20, 2023
Note: Written by a real-live dinobaby. No smart software involved.
I am not criticizing Primus Knowledge Solutions (acquired by ATG in 2004 and then Oracle purchased ATG in 2011). I would ask that you read this text and consider what was marketed in 2003. The source is a description of Primus’ Answer Engine which was once located at dub dub dub primus.com/products/answerEngine:
Primus Answer Engine helps companies take full advantage of the valuable content that already exists in corporate documents and databases. Using proprietary natural language processing, Answer Engine delivers quick, relevant answers to plain English questions by bringing widespread corporate knowledge to support, agents, as well as to customers, partners, and employees via the web.
What “features” did the system provide two decades ago? The fact sheet I picked up at a search conference in 2003 told me:
- Natural language processing
- Scalability
- Database integration
- All major document types
- Insightful reporting
- Customizable interface
- Centralized administration.
The system can suggest questions and interprets these or other questions and returns a list of answers found in a company’s online documents. This allows users to view the answer in context if desired.
I mention Primus because it is one example from dozens in my files about NLP technology.
Several observations/questions:
- Where is Oracle in the ChatGPT derby? May I suggest this link for starters.
- Isn’t the principal difference between Primus and other NLP “smart software” that users are chasing ChatGPT-type systems, not innovators outputting marketing words?
- Are issues like updating training models and their content, biases in the models themselves, and the challenge of accurate, current data enjoying the 2003 naïveté?
Net net: ChatGPT is just one manifestation of innovators’ attempts to deal with the challenge of finding accurate, on-point, and timely information in the digital world. (This is a world I call the datasphere.)
Stephen E Arnold, March 20, 2023
Elasticsearch Guide: More of a Cheat Sheet
March 15, 2023
Elasticsearch has been a go-to solution for searching content either via the open source version or the Elastic technical support option. The system works, and it has many followers and enthusiasts. As a result, one can locate “help” easily online for many hitches in the git along.
I found the information in “Unlocking the Power of Elasticsearch: A Comprehensive Guide to Complex Search Use Cases.” I would suggest that the write up is more like a cheat sheet. Encounter a specific task, check the “Guide,” and sally forth.
I would suggest that many real-life enterprise search needs are often difficult to solve. Examples include capturing data on a sales professional’s laptop before the colleague deletes the slide deck with the revised price quotation data. No search engine on the planet can get this important information to the legal department if the project goes off the rails. “I can’t find it” is not a helpful answer.
Similar challenges arise when the Elasticsearch system must interact with a line item for a product specified in a purchase order which has a corresponding engineering drawing. Line up the chemical, civil, mechanical, and nuclear engineers and tell them, “Well, that’s an object embedded in the what-do-you-call-it software I never heard of.” Yeah.
Nevertheless, for some helpful tips give the free guide a look.
The mantra is, “Search is easy. Search is a solved problem. Search is no big deal.” Convince yourself. Keep in mind that the mantra does not ring true to me nor does it make me calm.
Stephen E Arnold, March 15, 2023
Hybrid Search: A Gentle Way of Saying “One Size Fits All” Search Like the Google Provides Is Not Going to Work for Some
March 9, 2023
“On Hybrid Search” is a content marketing-type report. That’s okay. I found the information useful. What causes me to highlight this post by Qdrant is that one implicit message is: Google’s approach to search is lousy because it is aiming at the lowest common denominator of retrieval while preserving its relevance-eroding online ad matching business.
The guts of the write up walks through old school and sort of new school approaches to matching processed content with a query. Keep in mind that most of the technology mentioned in the write up is “old” in the sense that it’s been around for a half decade or more. The “new” technology is about ready to hop on a bike with training wheels and head to the swimming pool. (Yes, there is some risk there I suggest.)
But here’s the key statement in the report for me:
Each search scenario requires a specialized tool to achieve the best results possible. Still, combining multiple tools with minimal overhead is possible to improve the search precision even further. Introducing vector search into an existing search stack doesn’t need to be a revolution but just one small step at a time. You’ll never cover all the possible queries with a list of synonyms, so a full-text search may not find all the relevant documents. There are also some cases in which your users use different terminology than the one you have in your database.
Here’s the statement about which I am not feeling warm fuzzies:
Those problems are easily solvable with neural vector embeddings, and combining both approaches with an additional reranking step is possible. So you don’t need to resign from your well-known full-text search mechanism but extend it with vector search to support the queries you haven’t foreseen.
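For readers who want to see what “combining both approaches” can look like, here is a minimal sketch using reciprocal rank fusion, one common way to merge two ranked lists. This is a technique swapped in for illustration and not necessarily what the Qdrant write up implements. The document ids and the constant k=60 are assumptions. A separate reranking model, which the quoted passage also mentions, would be an additional step applied to the fused list.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ordering.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    # Sort by descending score; break ties by id so the output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a keyword index and a vector index.
full_text_hits = ["a", "b", "c"]
vector_hits = ["c", "a", "d"]
fused = reciprocal_rank_fusion([full_text_hits, vector_hits])
```

Documents “a” and “c” appear in both lists and land ahead of “b” and “d”, which each appear in only one. Note what the sketch does not address: whether any of the returned documents are correct.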
Observations:
- No problems in search when humans are seeking information are “easily solvable with shot gun marriages”.
- Finding information is no longer enough: The information or data displayed have to be [a] correct, accurate, or at least reproducible; [b] free of injected poisoned information (yep, the burden falls on the indexing engine or engines, not the user who, by definition, does not know an answer or what is needed to answer a query); and [c] current, even though the need for having access to “real time” data creates additional computational cost, which is often difficult to justify.
- Basic finding and retrieval is morphing into projected outcomes or implications from the indexed data. Available technology for search and retrieval is not tuned for this requirement.
Stephen E Arnold, March 9, 2023