Kagi For-Fee Search: Comments from a Thread about Search

January 2, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Comparisons of search engine performance are quite difficult to design, run, and analyze. In the good old days when commercial databases reigned supreme, special librarians could select queries, refine them, and then run those queries via Dialog, LexisNexis, DataStar, or another commercial search engine. The results were tabulated, and hard copy printouts on thermal paper were examined. The process required knowledge of the search syntax, expertise in query shaping, and the knowledge in the minds of the special librarians performing the analysis. Others were involved, but the work focused on determining overlap among databases, analysis of relevance (human and mathematical), and expertise gained from work in the commercial database sector, academic training, and work in a special library.
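For readers who have not seen this kind of analysis, here is a minimal Python sketch of the overlap and relevance arithmetic mentioned above. It is an illustration only, not the method any particular librarian or vendor used. The function names (`precision_recall`, `overlap`) and the document IDs are hypothetical, and overlap is computed as a simple Jaccard ratio, which is one of several reasonable choices.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document IDs a search system returned
    relevant:  document IDs a human judged relevant
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def overlap(results_a, results_b):
    """Jaccard overlap between two result sets (0.0 = disjoint, 1.0 = identical)."""
    a, b = set(results_a), set(results_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical document IDs for one query run against two systems.
system_a = ["d1", "d2", "d3", "d4"]
system_b = ["d3", "d4", "d5"]
judged_relevant = ["d2", "d3", "d5", "d6"]

print(precision_recall(system_a, judged_relevant))
print(overlap(system_a, system_b))
```

The hard part is not the arithmetic. It is the human relevance judgments that feed `judged_relevant`, which is where the expertise of the special librarians came in.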

Who does that now? Answer: No one. With this prefatory statement, let’s turn our attention to “How Bad Are Search Results? Let’s Compare Google, Bing, Marginalia, Kagi, Mwmbl, and ChatGPT.” Please, read the write up. The guts of the analysis, in my opinion, appear in this table:

[Image: results table from the write up]

The point is that search sucks. Let’s move on. The most interesting outcome from this write up, from my vantage point, is the comments in the Hacker News post. What I want to do is highlight observations about Kagi.com, a pay-to-use Web search service. The items I have selected make clear why starting and building sustainable revenue from Web search is difficult. Furthermore, the comments I have selected make clear that without an editorial policy and specific information about the corpus, its updating, and the content acquisition method, evaluation is difficult.

Long gone are the days of precision and recall, and I am not sure most of today’s users know or care. I still do, but I am a dinobaby and one of those people who created an early search-related service called The Point (Top 5% of the Internet), the Auto Channel, and a number of other long-forgotten Web sites that delivered some type of findability. Why am I roadkill on the information highway? No one knows or cares about the complexity of finding information in print or online. Sure, some graduate students do, but are you aware that the modern academic just makes up or steals other information; for instance, the former president of Stanford University.

Okay, here are some comments about Kagi.com from Hacker News. (Please, don’t write me and complain that I am unfair. I am just doing what dinobabies with graduate degrees do: provide footnotes.)

hannasanario: I’m not able to reproduce the author’s bad results in Kagi, at all. What I’m seeing when searching the same terms is fantastic in comparison. I don’t know what went wrong there. Dinobaby comment: In the absence of editorial policies and other basic information about valid syntax, subjectivity is the guiding light for judging search results. Remember that soap operas were once sponsored influencer content.

Semaphor: This whole thread made me finally create a file for documenting bad searches on Kagi. The issue for me is usually that they drop very important search terms from the query and give me unrelated results. Dinobaby comment: Yep, editorial functions in results, and these are often surprises. But when people know zero about a topic, who cares? Not most users.

Szundi: Kagi is awesome for me too. I just realize using Google somewhere else because of the sh&t results. Dinobaby comment: Ah, the Google good enough approach is evident in this comment. But it is subjective, merely an opinion. Just ask a stockholder. Google delivers, kiddo.

Mrweasel: Currently Kagi is just as dependent on Google as DuckDuckGo is on Bing. Dinobaby comment: Perhaps Kagi is not communicating where content originates, how results are generated, and why information strikes Mrweasel as “dependent on Google.” Neeva was an outfit that wanted to go beyond Google and ended up, after marketing hoo hah, selling itself to some entity.

Fenaro: Kagi should hire the Marginalia author. Dinobaby comment: Staffing suggestions are interesting but disconnected from reality in my opinion.

ed109685: Kagi works because there is no incentive for SEO manipulators to target it since their market share is so small. Dinobaby comment: Ouch, small.

shado: I became a huge fan of Kagi after seeing it on hacker news too. It’s amazing how good a search engine can be when it’s not full of ads. Dinobaby comment: A happy customer but no hard data or examples. Subjectivity in full blossom.

yashasolutions: Kagi is great… So I switch recently to Kagi, and so far it’s been smooth sailing and a real time saver. Dinobaby comment: Score another happy, paying customer for Kagi.

innocentoldguy: I like Kagi and rarely use anything else. Kagi’s results are decent and I can blacklist sites like Amazon.com so they never show up in my search results. Dinobaby comment: Another dinobaby who is an expert about search.

What does this selection of Kagi-related comments reveal about Web search? Here’s a snapshot of my notes:

  1. Kagi is not marketing its features and benefits particularly well, but what search engine is? With Google sucking in more than 90 percent of the query action, big bucks are required to get the message out. This means that subscriptions may be tough to sell because marketing is expensive and people sign up, then cancel.
  2. There is quite a bit of misunderstanding among “expert” searchers like the denizens of Hacker News. The nuances of a Web search, money supported content, metasearch, result matching, etc. make search a great big cloud of unknowing for most users.
  3. The absence of reproducible results illustrates what happens when consumerization of search and retrieval becomes the benchmark. The pursuit of good enough results in loss of finding functionality and expertise.

Net net: Search sucks. Oh, wait, I used that phrase in an article for Barbara Quint 35 years ago.

PS. Mwmbl is at https://mwmbl.or in case you are not familiar with the open source, non profit search engine. You have to register, well, because…

Stephen E Arnold, January 2, 2024

Kiddie Control: Money and Power. What Is Not to Like?

January 2, 2024

I want to believe outputs from Harvard University. But the ethics professor who made up data about ethics and the more recent publicity magnet from the possibly former university president nag at me. Nevertheless, let’s assume that some of the data in “Social Media Companies Made $11 Billion in US Ad Revenue from Minors, Harvard Study Finds” are semi-correct or at least close enough for horseshoes. (You may encounter a paywall or a 404 error. Well, just trust a free Web search system to point you to a live version of the story. I admit that I was lucky. The link from my colleague worked.)

The senior executive sets the agenda for the “exploit the kiddies” meeting. Control is important. Ignorant children learn whom to trust, believe, and follow. Does this objective require an esteemed outfit like the Harvard U. to state the obvious? Seems like it. Thanks, MSFT Copilot, you output child art without complaint. Consistency is not your core competency, is it?

What follows comes from the write up, whose authors I hope are not crossing their fingers like some young people do to neutralize a lie.

Check this statement:

The researchers say the findings show a need for government regulation of social media since the companies that stand to make money from children who use their platforms have failed to meaningfully self-regulate. They note such regulations, as well as greater transparency from tech companies, could help alleviate harms to youth mental health and curtail potentially harmful advertising practices that target children and adolescents.

The sentences contain what I think are silly observations. “Self regulation” is a bit of a sci-fi notion in today’s get-rich-quick high-technology business environment. The idea of getting possible oligopolists together to set some rules that might hurt revenue generation is something from an alternative world. Plus, the concept of “government regulation” strikes me as a punch line for a stand up comedy act. How are regulatory agencies and elected officials addressing the world of digital influencing? Answer: Sorry, no regulation. The big outfits are, in many situations, the government. What elected official or Washington senior executive service professional wants to do something that cuts off the flow of nifty swag from the technology giants? Answer: No one. Love those mouse pads, right?

Now consider these numbers which are going to be tough to validate. Have you tried to ask TikTok about its revenue? What about that much-loved Google? Nevertheless, these are interesting if squishy:

According to the Harvard study, YouTube derived the greatest ad revenue from users 12 and under ($959.1 million), followed by Instagram ($801.1 million) and Facebook ($137.2 million). Instagram, meanwhile, derived the greatest ad revenue from users aged 13-17 ($4 billion), followed by TikTok ($2 billion) and YouTube ($1.2 billion). The researchers also estimate that Snapchat derived the greatest share of its overall 2022 ad revenue from users under 18 (41%), followed by TikTok (35%), YouTube (27%), and Instagram (16%).

The money is good. But let’s think about the context for the revenue. Is there another payoff from hooking minors on a particular firm’s digital content?

Control. Great idea. Self regulation will definitely address that issue.

Stephen E Arnold, January 2, 2024

Lawyer, Former Government Official, and Podcaster to Head NSO Group

January 2, 2024

The high-profile intelware and policeware vendor NSO Group has made clear that specialized software is a potent policing tool. NSO Group continues to market its products and services at low-profile trade shows like those sponsored by an obscure outfit in northern Virginia. Now the firm has found a new friend in a former US official. TechDirt reports, “Former DHS/NSA Official Stewart Baker Decides He Can Help NSO Group Turn a Profit.” Writer Tim Cushing tells us:

“This recent filing with the House of Representatives makes it official: Baker, along with his employer Steptoe and Johnson, will now be seeking to advance the interests of an Israeli company linked to abusive surveillance all over the world. In it, Stewart Baker is listed as the primary lobbyist. This is the same Stewart Baker who responded to the Commerce Department blacklist of NSO by saying it wouldn’t matter because authoritarians could always buy spyware from… say…. China.”

So, the reasoning goes, why not allow a Western company to fill that niche? This perspective apparently makes Baker just the fellow to help NSO Group buff up its reputation. Cushing predicts:

“The better Baker does clearing NSO’s tarnished name, the sooner it and its competitors can return to doing the things that got them in trouble in the first place. Once NSO is considered somewhat acceptable, it can go back to doing the things that made it the most money: i.e., hawking powerful phone exploits to human rights abusers. But this time, NSO has a former US government official in its back pocket. And not just any former government official but one who spent months telling US citizens who were horrified by the implications of the Snowden leaks that they were wrong for being alarmed about bulk surveillance.”

Perhaps a lawyer, former US government official, and podcaster in one sleek package is the winning combination for NSO Group? But there are now alternatives to the Pegasus solution. Some of these do not have the baggage carted around by the stealthy flying horse.

Perhaps there will be a podcast about NSO Group in the near future.

Cynthia Murrell, January 2, 2024

Google, There Goes Two Percent of 2022 Revenues. How Will the Company Survive?

January 1, 2024

True or false: Google owes $5 billion US. I am not sure, but the headline in Metro makes the number a semi-factoid. So let’s see what could force Googzilla to transfer the equivalent of less than two percent of Google’s alleged 2022 revenues. Wow. That will be painful for the online advertising giant. Well, fire some staff; raise ad rates; and boost the cost of YouTube subscriptions. Will the GOOG survive? I think so.

An executive ponders a court order to pay the equivalent of two percent of 2022 revenues for unproven alleged improper behavior. But the court order says, “Have a nice day.” I assume the court is sincere. Thanks, MSFT Copilot Bing thing. Good enough.

“Google Settles $5,000,000,000 Claim over Searches for Intimate and Embarrassing Things” reports:

Google has agreed to settle a US lawsuit claiming it secretly tracked millions of people who thought they were browsing privately through its Incognito Mode between 2016 and 2020. The claim was seeking at least $5 billion in damages, including at least $5,000 for each user affected. Ironically, the terms of the settlement have not been disclosed, but a formal agreement will be submitted to the court by February 24.

My thought is that Google’s legal eagles will not be partying on New Year’s Eve. These fine professionals will be huddling over their laptops, scrolling for-fee legal databases, and using Zoom (the Google video service is a bit of a hassle) to discuss ways to [a] delay, [b] deflect, [c] deny, and [d] dodge the obviously [a] fallacious, [b] foul, [c] false, [d] flimsy, and [e] flawed claims that Google did anything improper.

Hey, incognito means what Google says it means, just like the “unlimited” data claims from wireless providers. Let’s not get hung up on details. Just ask the US regulatory authorities.

For you and me, we need to read Google’s terms of service, check our computing device’s security settings, and continue to live in a Cloud of Unknowing. The allegations that Google mapping vehicles did Wi-Fi sniffing? Hey, these assertions are [a] fallacious, [b] foul, [c] false, [d] flimsy, and [e] flawed. Tracking users? Op cit, gentle reader.

Never has a commercial enterprise been subjected to so many [a] unwarranted, [b] unprovable, [c] unacceptable, and [d] unnecessary assertions. Here’s my take: [a] The Google is innocent; [b] the GOOG is misunderstood; [c] Googzilla is a victim. I ticked a, b, and c.

Stephen E Arnold, January 1, 2024

The Cost of Clever

January 1, 2024

A New Year and I want to highlight an interesting story which I spotted in SFGate: “Consulting Firm McKinsey Agrees to $78 Million Settlement with Insurers over Opioids.” The focus on efficiency and logic created an interesting consulting opportunity for a blue-chip firm. That organization responded. The SFGate story reports:

Consulting firm McKinsey and Co. has agreed to pay $78 million to settle claims from insurers and health care funds that its work with drug companies helped fuel an opioid addiction crisis.

A blue-chip consultant has been sent to the tool shed by Ms. Justice. The sleek wizard is not happy because the tool shed is the location for severe punishment by Ms. Justice. Thanks, MSFT Copilot Bing thing.

What did the prestigious firm’s advisors assist Purdue Pharma to achieve? The story says:

The insurers argued that McKinsey worked with Purdue Pharma – the maker of OxyContin – to create and employ aggressive marketing and sales tactics to overcome doctors’ reservations about the highly addictive drugs. Insurers said that forced them to pay for prescription opioids rather than safer, non-addictive and lower-cost drugs, including over-the-counter pain medication. They also had to pay for the opioid addiction treatment that followed.

The write up presents McKinsey’s view of its service this way:

“As we have stated previously, we continue to believe that our past work was lawful and deny allegations to the contrary,” the company said, adding that it reached a settlement to avoid protracted litigation. McKinsey said it stopped advising clients on any opioid-related business in 2019.

What’s interesting is that the so-called opioid crisis reveals the consequences of a certain mental orientation. The goal of generating a desired outcome for a commercial enterprise can have interesting and, in this case, expensive consequences. Have some of these methods influenced other organizations? Will blue-chip consulting firms and efficiency-oriented engineers learn from wood shed visits?

Happy New Year everyone.

Stephen E Arnold, January 1, 2024

Another AI Output Detector

January 1, 2024

It looks like AI detection may have a way to catch up with AI text capabilities. But for how long? Nature reports, “‘ChatGPT Detector’ Catches AI Generated Papers with Unprecedented Accuracy.” The key to this particular tool’s success is its specificity: it was developed by chemist Heather Desaire and her team at the University of Kansas specifically to catch AI-written chemistry papers. Reporter McKenzie Prillaman tells us:

“Using machine learning, the detector examines 20 features of writing style, including variation in sentence lengths, and the frequency of certain words and punctuation marks, to determine whether an academic scientist or ChatGPT wrote a piece of text. The findings show that ‘you could use a small set of features to get a high level of accuracy’, Desaire says.”
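To make the idea of writing-style features concrete, here is a rough Python sketch of what a few of them might look like. This is not the Kansas team's code, and the write up does not list the 20 features the detector uses. The function `style_features`, the five features it returns, and the sample text are assumptions chosen only to illustrate sentence length variation and word and punctuation frequency.

```python
import re
import statistics


def style_features(text):
    """Compute a few illustrative (hypothetical) writing-style features."""
    # Naive sentence split: break after ., !, or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    words = re.findall(r"[A-Za-z']+", text.lower())
    n_words = len(words) or 1  # avoid dividing by zero on empty input
    return {
        "sentence_count": len(sentences),
        "mean_sentence_len": statistics.mean(lengths) if lengths else 0.0,
        # Population standard deviation as a measure of length variation.
        "sentence_len_stdev": statistics.pstdev(lengths) if lengths else 0.0,
        "semicolons_per_word": text.count(";") / n_words,
        "however_per_word": words.count("however") / n_words,
    }


sample = "Short one. This sentence is quite a bit longer; however, it ends. Done!"
features = style_features(sample)
for name, value in features.items():
    print(name, value)
```

In a real detector, vectors like this would be computed for many labeled texts and handed to a classifier. That step is omitted here because the write up does not say which model was used.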

The model was trained on human-written papers from 10 chemistry journals then tested on 200 samples written by ChatGPT-3.5 and ChatGPT-4. Half the samples were based on the papers’ titles, half on the abstracts. Their tool identified the AI text 100% and 98% of the time, respectively. That clobbers the competition: ZeroGPT only caught about 35–65% and OpenAI’s own text-classifier snagged 10–55%. The write-up continues:

“The new ChatGPT catcher even performed well with introductions from journals it wasn’t trained on, and it caught AI text that was created from a variety of prompts, including one aimed to confuse AI detectors. However, the system is highly specialized for scientific journal articles. When presented with real articles from university newspapers, it failed to recognize them as being written by humans.”

The lesson here may be that AI detectors should be tailor made for each discipline. That could work—at least until the algorithms catch on. On the other hand, developers are working to make their systems more and more like humans.

Cynthia Murrell, January 1, 2024
