How Do You Foster Echo Bias, Not Fake, But Real?
January 24, 2024
This essay is the work of a dumb dinobaby. No smart software required.
Research is supposed to lead to the truth. It used to when research was limited and controlled by publishers, news bureaus, and other venues. The Internet liberated information access but it also unleashed a torrent of lies. When these lies are stacked and manipulated by algorithms, they become powerful and near factual. Nieman Lab relates how a new study shows the power of confirmation in, “Asking People ‘To Do The Research’ On Fake News Stories Makes Them Seem More Believable, Not Less.”
Nature reported on a paper by Kevin Aslett, Zeve Sanderson, William Godel, Nathaniel Persily, Jonathan Nagler, and Joshua A. Tucker. The paper abstract includes the following:
“Here, across five experiments, we present consistent evidence that online search to evaluate the truthfulness of false news articles actually increases the probability of believing them. To shed light on this relationship, we combine survey data with digital trace data collected using a custom browser extension. We find that the search effect is concentrated among individuals for whom search engines return lower-quality information. Our results indicate that those who search online to evaluate misinformation risk falling into data voids, or informational spaces in which there is corroborating evidence from low-quality sources. We also find consistent evidence that searching online to evaluate news increases belief in true news from low-quality sources, but inconsistent evidence that it increases belief in true news from mainstream sources. Our findings highlight the need for media literacy programs to ground their recommendations in empirically tested strategies and for search engines to invest in solutions to the challenges identified here.”
All of the tests were similar in that they asked participants to evaluate news articles that had been rated “false or misleading” by professional fact checkers. They were asked to read the articles, research and evaluate the stories online, and decide if the fact checkers were correct. Controls were participants who were asked not to research stories.
The tests revealed that searching online increases belief in misinformation. The fifth test in the study showed that exposure to lower-quality information in search results increased the probability of believing false news.
The culprit for bad search engine results is data voids, akin to rabbit holes of misinformation, paired with SEO techniques to manipulate people. People with higher media literacy skills know how to better use search engines like Google to evaluate news. People with poor media literacy don’t know how to alter their search queries. Usually they type in a headline and their results are filled with junk.
What do we do? We need to revamp media literacy, force search engines to limit the number of paid links at the top of results, and stop chasing sensationalism.
Whitney Grace, January 24, 2024
Kagi For-Fee Search: Comments from a Thread about Search
January 2, 2024
This essay is the work of a dumb dinobaby. No smart software required.
Comparisons of search engine performance are quite difficult to design, run, and analyze. In the good old days when commercial databases reigned supreme, special librarians could select queries, refine them, and then run those queries via Dialog, LexisNexis, DataStar, or another commercial search engine. The results were tabulated, and hard copy print outs on thermal paper were examined. The process required knowledge of the search syntax, expertise in query shaping, and the knowledge in the minds of the special librarians performing the analysis. Others were involved, but the work focused on determining overlap among databases, analysis of relevance (human and mathematical), and expertise gained from work in the commercial database sector, academic training, and work in a special library.
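The overlap tabulation described above can be sketched in a few lines. This is only an illustration, assuming each database returns a list of record identifiers for the same query; the record IDs and the choice of Jaccard overlap (shared records divided by all distinct records) are assumptions for the example, not the method the librarians necessarily used.

```python
from itertools import combinations


def pairwise_overlap(results):
    """Return the Jaccard overlap for every pair of result sets.

    `results` maps a database name to the record IDs it returned
    for one query. The IDs used below are hypothetical.
    """
    overlaps = {}
    for name_a, name_b in combinations(sorted(results), 2):
        set_a, set_b = set(results[name_a]), set(results[name_b])
        union = set_a | set_b
        # Shared records divided by all distinct records in the pair.
        overlaps[(name_a, name_b)] = len(set_a & set_b) / len(union) if union else 0.0
    return overlaps


# Hypothetical result lists for one query run against three services.
sample = {
    "Dialog": ["r1", "r2", "r3"],
    "LexisNexis": ["r2", "r3", "r4"],
    "DataStar": ["r5"],
}
overlap = pairwise_overlap(sample)
for pair, score in overlap.items():
    print(pair, f"{score:.2f}")
```

A low score for a pair suggests the two services cover different material for that query, which is the kind of finding the hand tabulation was after.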
Who does that now? Answer: No one. With this prefatory statement, let’s turn our attention to “How Bad Are Search Results? Let’s Compare Google, Bing, Marginalia, Kagi, Mwmbl, and ChatGPT.” Please, read the write up. The guts of the analysis, in my opinion, appear in this table:
The point is that search sucks. Let’s move on. The most interesting outcome from this write up from my vantage point is the comments in the Hacker News post. What I want to do is highlight observations about Kagi.com, a pay-to-use Web search service. The items I have selected make clear why starting and building sustainable revenue from Web search is difficult. Furthermore, the comments I have selected make clear that without an editorial policy, specific information about the corpus, its updating, and content acquisition method — evaluation is difficult.
Long gone are the days of precision and recall, and I am not sure most of today’s users know or care. I still do, but I am a dinobaby and one of those people who created an early search-related service called The Point (Top 5% of the Internet), the Auto Channel, and a number of other long-forgotten Web sites that delivered some type of findability. Why am I roadkill on the information highway? No one knows or cares about the complexity of finding information in print or online. Sure, some graduate students do, but are you aware that the modern academic just makes up or steals other information; for instance, the former president of Stanford University.
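For readers who have not met the terms, precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. A minimal sketch follows, assuming a human has already judged which documents are relevant; the document IDs are invented for the illustration.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical judgments: the engine returned four documents,
# and three documents in the collection are actually relevant.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d2", "d4", "d7"]

p, r = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={p:.2f} recall={r:.2f}")
```

The catch for Web search is the denominator in recall: without knowing what is in the corpus, nobody outside the search company can say how many relevant documents exist.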
Okay, here are some comments about Kagi.com from Hacker News. (Please, don’t write me and complain that I am unfair. I am just doing what dinobabies with graduate degrees do — Provide footnotes)
hannasanario: I’m not able to reproduce the author’s bad results in Kagi, at all. What I’m seeing when searching the same terms is fantastic in comparison. I don’t know what went wrong there. Dinobaby comment: In the absence of editorial policies and other basic information about valid syntax, subjectivity is the guiding light for judging search results. Remember that soap operas were once sponsored influencer content.
Semaphor: This whole thread made me finally create a file for documenting bad searches on Kagi. The issue for me is usually that they drop very important search terms from the query and give me unrelated results. Dinobaby comment: Yep, editorial functions in results, and these are often surprises. But when people know zero about a topic, who cares? Not most users.
Szundi: Kagi is awesome for me too. I just realize using Google somewhere else because of the sh&t results. Dinobaby comment: Ah, the Google good enough approach is evident in this comment. But it is subjective, merely an opinion. Just ask a stockholder. Google delivers, kiddo.
Mrweasel: Currently Kagi is just as dependent on Google as DuckDuckGo is on Bing. Dinobaby comment: Perhaps Kagi is not communicating where content originates, how results are generated, and why information strikes Mrweasel as “dependent on Google.” Neeva was an outfit that wanted to go beyond Google and ended up, after marketing hoo hah, selling itself to some entity.
Fenaro: Kagi should hire the Marginalia author. Dinobaby comment: Staffing suggestions are interesting but disconnected from reality in my opinion.
ed109685: Kagi works because there is no incentive for SEO manipulators to target it since their market share is so small. Dinobaby comment: Ouch, small.
shado: I became a huge fan of Kagi after seeing it on hacker news too. It’s amazing how good a search engine can be when it’s not full of ads. Dinobaby comment: A happy customer but no hard data or examples. Subjectivity in full blossom.
yashasolutions: Kagi is great… So I switch recently to Kagi, and so far it’s been smooth sailing and a real time saver. Dinobaby comment: Score another happy, paying customer for Kagi.
innocentoldguy: I like Kagi and rarely use anything else. Kagi’s results are decent and I can blacklist sites like Amazon.com so they never show up in my search results. Dinobaby comment: Another dinobaby who is an expert about search.
What does this selection of Kagi-related comments reveal about Web search? Here’s a snapshot of my notes:
- Kagi is not marketing its features and benefits particularly well, but what search engine is? With Google sucking in more than 90 percent of the query action, big bucks are required to get the message out. This means that subscriptions may be tough to sell because marketing is expensive and people sign up, then cancel.
- There is quite a bit of misunderstanding among “expert” searchers like the denizens of Hacker News. The nuances of a Web search, money supported content, metasearch, result matching, etc. make search a great big cloud of unknowing for most users.
- The absence of reproducible results illustrates what happens when consumerization of search and retrieval becomes the benchmark. The pursuit of “good enough” results in a loss of finding functionality and expertise.
Net net: Search sucks. Oh, wait, I used that phrase in an article for Barbara Quint 35 years ago.
PS. Mwmbl is at https://mwmbl.or in case you are not familiar with the open source, non profit search engine. You have to register, well, because…
Stephen E Arnold, January 2, 2024
A Dinobaby Misses Out on the Hot Searches of 2023
December 28, 2023
This essay is the work of a dumb dinobaby. No smart software required.
I looked at “Year in Search 2023.” I was surprised at how out of the flow of consumer information I was. “Out of the flow” does not capture my reaction to the lists of the news topics, dead people, and songs. Do you know much about Bizarrap? I don’t. More to the point, I have never heard of the obviously world-class musician.
Several observations:
First, when people tell me that Google search is great, I have to recalibrate my internal yardsticks to embrace queries for entities unrelated to my microcosm of information. When I assert that Google search sucks, I am looking for information absolutely positively irrelevant to those seeking insight into most of the Google top of the search charts. No wonder Google sucks for me. Google is keeping pace with maps of sports stadia.
Second, as I reviewed these top searches, I asked myself, “What’s the correlation between advertisers’ spend and the results on these lists?” My idea is that a weird quantum linkage exists in a world inhabited by incentivized programmers, advertisers, and the individuals who want information about shirts. Is the game rigged? My hunch is, “Yep.” Spooky action at a distance I suppose.
Third, from the lists substantive topics are rare birds. Who is looking for information about artificial intelligence, precision and recall in search, or new approaches to solving matrix math problems? The answer, if the Google data are accurate and not a come on to advertisers, is almost no one.
As a dinobaby, I am going to feel more comfortable in my isolated chamber in a cave of what I find interesting. For 2024, I have steeled myself to exist without any interest in Ginny & Georgia, FIFTY FIFTY, or papeda.
I like being a dinobaby. I really do.
Stephen E Arnold, December 28, 2023
Why Google Dorks Exist and Why Most Users Do Not Know Why They Are Needed
December 4, 2023
This essay is the work of a dumb dinobaby. No smart software required.
Many people in my lectures are not familiar with the concept of “dorks”. No, not the human variety. I am referencing the concept of a “Google dork.” If you do a quick search using Yandex.com, you will get pointers to different “Google dorks.” Click on one of the links and you will find information you can use to retrieve more precise and relevant information from the Google ad-supported Web search system.
Here’s what QDORKS.com looks like:
The idea is that one plugs in search terms and uses the pull down boxes to enter specific commands to point the ad-centric system at something more closely resembling a relevant result. Other interfaces are available; for example, the “1000 Best Google Dorks List.” You get a laundry list of tips, commands, and ideas for wrestling Googzilla to the ground, twisting its tail, and (hopefully) yielding relevant information. Hopefully. Good work.
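What a dork interface does under the hood is assemble a query string from operators. Here is a rough sketch, assuming the widely documented Google operators `site:`, `filetype:`, `intitle:`, quoted phrases, and the minus sign for exclusion. The function name `build_dork` and its parameters are hypothetical and are not taken from QDORKS.com or any dork list.

```python
def build_dork(terms, site=None, filetype=None, intitle=None, exclude=()):
    """Assemble a search query string from common Google operators.

    Multi-word terms are wrapped in quotes so they match as phrases.
    This only builds the string; it does not contact any search engine.
    """
    parts = []
    for term in terms:
        parts.append(f'"{term}"' if " " in term else term)
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    if intitle:
        parts.append(f'intitle:"{intitle}"' if " " in intitle else f"intitle:{intitle}")
    # A leading minus asks the engine to drop results containing the word.
    parts.extend(f"-{word}" for word in exclude)
    return " ".join(parts)


query = build_dork(
    ["enterprise search", "relevance"],
    site="example.com",  # placeholder domain
    filetype="pdf",
    exclude=["jobs"],
)
print(query)
```

Whether Google honors each operator for a given query is up to Google, which is rather the point of this essay.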
Most people are lousy at pinning the tail on the relevance donkey. Therefore, let someone who knows define relevance for the happy people. Thanks, MSFT Copilot. Nice animal with map pins.
Why are Google Dorks or similar guides to Google search necessary? Here are three reasons:
- Precision reduces the opportunities for displaying allegedly relevant advertising. Semantic relaxation allows the Google to suggest that it is using Oingo type methods to find mathematically determined relationships. The idea is that razzle dazzle makes ad blasting something like an ugly baby wrapped in translucent fabric on a foggy day look really great.
- When Larry Page argued with me at a search engine meeting about truncation, he displayed a preconceived notion about how search should work for those not at Google or attending a specialist conference about search. Rational? To him, yep. Logical? To his framing of the search problem, the stance makes perfect sense if one discards the notion of tense, plurals, inflections, and stupid markers like “im” as in “impractical” and “non” as in “nonsense.” Hey, Larry had the answer. Live with it.
- The goal at the Google is to make search as intellectually easy for the “user” as possible. The idea was to suggest what the user intended. Also, Google had the old idea that a person’s past behavior can predict that person’s behavior now. Well, predict in the sense that “good enough” will do the job for the vast majority of search-blind users who look for the short cut or the most convenient way to get information.
Why? Control, being clever, and then selling the dream of clicks for advertisers. Over the years, Google leveraged its information framing power to a position of control, one that most people, including many Googlers, cannot perceive. When it is pointed out, those individuals refuse to believe that Google does [a] NOT index the full universe of digital data, [b] NOT want to fool around with users who prefer Boolean algebra or content curation to identify the best or most useful content, and [c] NOT fiddle around with training people to become effective searchers of online information. Obfuscation, verbal legerdemain, and the “do no evil” craziness make the railroad run the way Cornelius Vanderbilt-types implemented.
I read this morning (December 4, 2023) the Google blog post called “New Ways to Find Just What You Need on Search.” The main point of the write up in my opinion is:
Search will never be a solved problem; it continues to evolve and improve alongside our world and the web.
I agree, but it would be great if the known search and retrieval functions were available to users. Instead, we have a weird Google Mom approach. From the write up:
To help you more easily keep up with searches or topics you come back to a lot, or want to learn more about, we’re introducing the ability to follow exactly what you’re interested in.
Okay, user tracking, stored queries, and alerts. How does the Google know what you want? The answer is that users log in, use Google services, and enter queries which are automatically converted to search. You will have answers to questions you really care about.
There are other search functions available in the most recent version of Google’s attempts to deal with an unsolved problem:
As with all information on Search, our systems will look to show the most helpful, relevant and reliable information possible when you follow a topic.
Yep, Google is a helicopter parent. Mom will know what’s best, select it, and present it. Don’t like it? Mom will be recalcitrant, like shaping search results to meet what the probabilistic system says, “Take your medicine, you brat.” Who said, “Mother Google is a nice mom”? Definitely not me.
And Google will make search more social. Shades of Dr. Alon Halevy and the heirs of Orkut. The Google wants to bring people together. Social signals make sense to Google. Yep, content without Google ads must be conquered. Let’s hope the Google incentive plans encourage the behavior, or those valiant programmers will be bystanders to other Googlers’ promotions and accompanying money deliveries.
Net net: Finding relevant, on point, accurate information is more difficult today than at any other point in my 50+ year work career. How does the cloud of unknowing dissipate? I have no idea. I think it has moved in on tiny Googzilla feet and sits looking over the harbor, ready to pounce on any creature that challenges the status quo.
PS. Corny Vanderbilt was an amateur compared to the Google. He did trains; Google does information.
Stephen E Arnold, December 4, 2023
Using Smart Software to Make Google Search Less Awful
November 16, 2023
This essay is the work of a dumb humanoid. No smart software required.
Here’s a quick tip: to get useful results from Google Search, use a competitor’s software. Digital Digging blogger Henk van Ess describes “How to Teach ChatGPT to Come Up with Google Formulas.” Specifically, van Ess needed to include foreign-language results in his queries while narrowing results to certain time frames. These are not parameters Google handles well on its own. It was ChatGPT to the rescue, after some tinkering, anyway. He describes an example search goal:
“Find any official document about carbon dioxide reduction from Greek companies, anything from March 24, 2020 to December 21, 2020 will do. Hey, can you search that in Greek, please? Tough question right? Time to fire up Bing or ChatGPT. Round 1 in #chatgpt has a terrible outcome.”
But of course, van Ess did not stop there. For the technical details on the resulting “ball of yarn,” how van Ess resolved it, and how it can be extrapolated to other use cases, navigate to the write-up. One must bother to learn how to write effective prompts to get these results, but van Ess insists it is worth the effort. The post observes:
“The good news is: you only have to do it once for each of your favorite queries. Set and forget, as you just saw I used the same formulae for Greek CO2 and Japanese EV’s. The advantage of natural language processing tools like ChatGPT is that they can help you generate more accurate and relevant search queries in a faster and more efficient way than manually typing in long and complex queries into search engines like Google. By using natural language processing tools to refine and optimize your search queries, you can avoid falling into ‘rabbit holes’ of irrelevant or inaccurate results and get the information you need more quickly and easily.”
Google is currently rolling out its own AI search “experience” in phases around the world. Will it improve results, or will one still be better off employing third-party hacks?
Cynthia Murrell, November 16, 2023
Google: Slip Slidin Away? Not Yet. Defaults Work
November 14, 2023
This essay is the work of a dumb humanoid. No smart software required.
I spotted a short item in the online information service called Quartz. The story had a click magnet title, and it worked for me. “Is This the Beginning of the End of Google’s Dominance in Search?” asks a rhetorical question without providing much of an answer. The write up states:
The tech giant’s market share is being challenged by an increasingly crowded field
I am not sure what this statement means. I noticed during the week of November 6, 2023, that the search system 50kft.com stopped working. Is the service dead? Is it experiencing technical problems? No one knows. I also checked Newslookup.com. That service remains stuck in the past. And Blogsurf.io seems to be a goner. I am not sure where the renaissance in Web search is. Is there a digital Florence, Italy, I have overlooked?
A search expert lounging in the hammock of habit. Thanks, Microsoft Bing. You do understand some concepts like laziness when it comes to changing search defaults, don’t you?
The write up continues:
Google has been the world’s most popular search engine since its launch in 1997. In October, it was holding a market share of 91.6%, according to web analytics tracker StatCounter. That’s down nearly 80 basis points from a year before, though a relatively small dent considering OpenAI’s ChatGPT was introduced late last year.
And what’s number two? How about Bing with a market share of 3.1 percent according to the numbers in the article.
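The arithmetic behind those figures is worth a moment. One basis point is one hundredth of a percentage point, so a drop of “nearly 80 basis points” is a bit under 0.8 of a percentage point. A back-of-the-envelope sketch using only the numbers quoted above follows; the prior-year share is an approximation derived from them, not a figure reported in the article.

```python
BASIS_POINTS_PER_PERCENTAGE_POINT = 100

google_share_now = 91.6  # percent, as quoted from the article
bing_share_now = 3.1     # percent, as quoted from the article
drop_in_bp = 80          # "nearly 80 basis points", treated here as 80

# Convert basis points to percentage points.
drop_in_points = drop_in_bp / BASIS_POINTS_PER_PERCENTAGE_POINT

# Approximate share a year earlier, derived rather than reported.
google_share_before = round(google_share_now + drop_in_points, 1)

# How many times larger Google's share is than Bing's.
ratio = round(google_share_now / bing_share_now, 1)

print(f"Drop: {drop_in_points} percentage points")
print(f"Approximate share a year earlier: {google_share_before}%")
print(f"Google's share is about {ratio} times Bing's")
```

A decline of less than one point against a lead of nearly thirty to one is the “relatively small dent” the article mentions.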
Some people know that Google has spent big bucks to become the default search engine in places that matter. What few appreciate is that being a default is the equivalent of finding oneself in a comfy habit hammock. Changing the default setting for search is just not worth the effort.
What I think is happening is the conflation of search and retrieval with another trend. The new thing is letting software generate what looks like an answer. Forget that the outputs of a system based on smart software may be wonky or just incorrect. Thinking up a query is difficult.
But Web search sucks. Google is in a race to create bigger, more inviting hammocks.
Google is not sliding into a loss of market share. The company is coming in for the kill as it demonstrates its financial resolve with regard to the investment in Character.ai.
Let me be clear: Finding actionable information today is more difficult than at any previous time in my 50 year career in online information. Why? Software struggles to match content to what a human needs to solve certain problems. Finding a pizza joint or getting a list of results for further reading just looks like an answer. To move beyond good enough so the pizza joint does not gag a maggot or the list of citations is beyond the user’s reading level is not what’s required.
We are stuck in the Land of Good Enough, lounging in habit hammocks, and living the good life. Some people wear a T shirt with the statement, “Ignorance is bliss. Hello, Happy.”
Net net: I think the write up projects a future in which search becomes really easy and does the thinking for the humanoids. But for now, it’s the Google.
Stephen E Arnold, November 14, 2023
Autonomy: More Legal Activity
October 25, 2023
This essay is the work of a dumb humanoid. No smart software required.
Though the UK legal system seems to have lost interest, the US is still determined to throw the book at Autonomy’s founder for his alleged deceit of HP. Now, The Telegraph reports, “Mike Lynch Files Legal Challenge to Have Fraud Case Thrown Out by US Courts.” While their client languishes in San Francisco under self-funded house arrest, Lynch’s lawyers insist the US has no jurisdiction to prosecute. Reporter James Titcomb writes:
“The filing states: ‘At all times between 2009 and 2011, Autonomy was fundamentally a UK-centric business. Autonomy listed its shares on the London Stock Exchange. All major decisions about the strategic direction of the company, its revenue-generating operations, and its compliance with financial reporting obligations were made in England. ‘The “means and methods” identified in the [indictment] – revenue recognition issues, allegedly fraudulent entries in Autonomy’s books, allegedly false and misleading quarterly and annual reports – all comprise conduct that occurred in another country.’ Mr Lynch has long maintained that any case against him should be heard in Britain, but the Serious Fraud Office dropped its investigation into the matter in 2015.”
Will this tactic work? The US DOJ filed charges in 2018 and 2019. Despite all efforts to block extradition, Lynch was moved to San Francisco in May 2023. The article states a judge will hear the request to throw out the case in November. Meanwhile, the trial remains scheduled for 2024.
The saga of Autonomy and HP continues. Who knew enterprise search could become a legal thriller? Netflix, perhaps a documentary?
Cynthia Murrell, October 25, 2023
Kagi Rolls Out a Small Web Initiative
October 5, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
Recall the early expectations for the Web: It would be a powerful conduit for instant connection and knowledge-sharing around the world. Despite promises to the contrary, that rosy vision has long since given way to commercial interests’ paid content, targeted ads, bots, and data harvesting. Launched in 2018, Kagi offers a way to circumvent those factors with its ad-free, data protecting search engine—for a small fee, naturally. Now the company is promoting what it calls the Kagi Small Web initiative. We learn from the blog post:
“Since inception, we’ve been featuring content from the small web through our proprietary Teclis and TinyGem search indexes. This inclusion of high-quality, lesser-known parts of the web is part of what sets Kagi’s search results apart and gives them a unique flavor. Today we’re taking this a step further by integrating Kagi Small Web results into the index.”
See the write-up for examples. Besides these insertions into search results, one can also access these harder-to-find sources at the new Kagi Small Web website. This project displays a different random, recent Web page with each click of the “Next Post” button. Readers are also encouraged to check out their experimental Small YouTube, which we are told features content by YouTube creators with fewer than 4,000 subscribers. (Although as of this writing, the Small YouTube link supplied redirects right back to the source blog post. Hmm.)
The write-up concludes with these thoughts on Kagi’s philosophy:
“The driving question behind this initiative was simple yet profound: the web is made of millions of humans, so where are they? Why do they get overshadowed in traditional search engines, and how can we remedy this? This project required a certain leap of faith as the content we crawl may contain anything, and we are putting our reputation on the line vouching for it. But we also recognize that the ‘small web’ is the lifeblood of the internet, and the web we are fighting for. Those who contribute to it have already taken their own leaps of faith, often taking time and effort to create, without the assurance of an audience. Our goal is to change that narrative. Together with the global community of people who envision a different web, we’re committed to revitalizing a digital space abundant in creativity, self-expression, and meaningful content – a more humane web for all.”
Does this suggest that Google Programmable Search Engine is a weak sister?
Cynthia Murrell, October 5, 2023
This Dinobaby Likes Advanced Search, Boolean Operators, and Precision. Most Do Not
August 28, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
I am not sure of the chronological age of the author of “7 Reasons to Replace Advanced Search with Filters So Users Can Easily Find What They Need.” From my point of view, the author has a mental age of someone much younger than I. The article identifies a number of reasons why “advanced search” functions are lousy. As a dinobaby, I want to be crystal clear: A user should have an interface which allows that user to locate the information required to respond in a useful way to a query.
The expert online searcher says with glee, “I love it when free online search services make finding information easy. Best of all is Amazon. It suggests so many things I absolutely need.” Hey, MidJourney, thanks for the image without suggesting Mother MJ okay my word choice. “Whoever said, ‘Nothing worthwhile comes easy’ is pretty stupid,” shouts our sliding board slider.
Advanced search in my dinobaby mental space means Boolean operators like AND, OR, and NOT, among others. Advanced search requires other meaningful “tags” specifically designed to minimize the ambiguity of words; for example, terminal can mean transportation or terminal can mean computing device. English is notable because it has numerous words which make sense only when a context is provided. Thus, a Field Code can instruct the retrieval system to discard the computing device context and retrieve the transportation context.
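To make the terminal example concrete, here is a toy sketch of Boolean operators plus a field code. It assumes each record carries a human-assigned `subject` field, which stands in for the field codes of the commercial databases. The records, the `search` function, and its parameter names are all invented for the illustration.

```python
# Hypothetical records; "subject" plays the role of a field code.
DOCS = [
    {"id": 1, "text": "new bus terminal opens downtown", "subject": "transportation"},
    {"id": 2, "text": "vt100 terminal emulator released", "subject": "computing"},
    {"id": 3, "text": "airport terminal expansion delayed", "subject": "transportation"},
    {"id": 4, "text": "railway timetable updated", "subject": "transportation"},
]


def search(docs, all_of=(), any_of=(), none_of=(), subject=None):
    """Return IDs of records matching a simple Boolean query.

    all_of  -> every term must appear (AND)
    any_of  -> at least one term must appear (OR)
    none_of -> no term may appear (NOT)
    subject -> restrict to one value of the subject field
    """
    hits = []
    for doc in docs:
        words = set(doc["text"].split())
        if subject is not None and doc["subject"] != subject:
            continue
        if not all(term in words for term in all_of):
            continue
        if any_of and not any(term in words for term in any_of):
            continue
        if any(term in words for term in none_of):
            continue
        hits.append(doc["id"])
    return hits


ambiguous = search(DOCS, all_of=["terminal"])
precise = search(DOCS, all_of=["terminal"], subject="transportation")
boolean = search(DOCS, all_of=["terminal"], any_of=["bus", "airport"], none_of=["delayed"])
print(ambiguous, precise, boolean)
```

The plain keyword query drags in the computing record; the field restriction removes it without the searcher having to guess which extra words to add.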
The write up makes clear that for today’s users training wheels are important. Are these “aids,” like icons, images, and bundles of results under a category, dark patterns or assistance for a user? I can only imagine the push back I would receive if I were in a meeting with today’s “user experience” designers. Sorry, kids. I am a dinobaby.
I really want to work through seven reasons advanced search sucks. But I won’t. The number of people who know how to use key word search is tiny. One number I heard when I was a consultant to a certain big search engine is less than three percent of the Web search users. The good news for those who buy into the arguments in the cited article is that dinobabies will die.
Is it a lack of education? Is it laziness? Is it what most of today’s users understand?
I don’t know. I don’t care. A failure to understand how to obtain the specific information one requires is part of the long slow slide down a descent gradient. Enjoy the non-advanced search.
Stephen E Arnold, August 28, 2023
Academic Research Resources: Smart Software Edition
August 8, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
One of my research team called my attention to “The Best AI Tools to Power Your Academic Research.” The article identifies five AI-infused tools; specifically:
- ChatPDF
- Consensus
- Elicit.org
- Research Rabbit
- Scite.ai
Each of the tools is described briefly. The “academic research” phrase is misleading. These tools can provide useful information related to inventors and experts (real or alleged), specific technical methods, and helpful background or context for certain social, political, and intellectual issues.
If you have access to an LLM question-and-answer system, experiment with article summaries, lists of information, and names of people associated with a particular activity. Give a ChatGPT system a whirl too.
Stephen E Arnold, August 8, 2023