Microsoft Grouses and Barks, Then Regrouses and Rebarks about the Google
December 23, 2024
This blog post is the work of an authentic dinobaby. No smart software was used.
I spotted a reference to Windows Central, a very supportive yet “independent” explainer of Microsoft. That write up bounced around and a version ended up in Analytics India, an online publication from a country familiar to the Big Dogs at Microsoft and Google.
A stern mother tells her child to knock off the constant replays of a single dorky tune like “If I Knew You Were Comin’ I’d’ve Baked a Cake.” Thanks, Grok. Good enough.
The Analytics India story is titled “Google Makes More Money on Windows Than Microsoft, says Satya Nadella.” Let’s look at a couple of passages from the write up and then reflect on the “grousing” both giants in the money making department are sharing with anyone, maybe everyone.
Here’s the first snippet:
“Google makes more money on Windows than all of Microsoft,” Nadella said, discussing the company’s strategy to reclaim lost market share in the browser space.
I love that “lost market share”. Did Microsoft have market share in the browser space? Like Windows Phone, the Microsoft search engine in its many incarnations was not a click magnet. I heard when I did a teeny tiny thing for a Microsoft “specialist” outfit that Softies were running queries on Google and then reverse engineering what to index and what knobs to turn in order to replicate what Google’s massively wonderful method produced. True or false? Hey, I only know what I was told. Whatever Microsoft did in search failed. (How about that Fast Search & Transfer technology which powered alltheweb.com when it existed?)
I circled this statement as well:
Looking ahead, Nadella expressed confidence in Microsoft’s efforts to regain browser market share and promote its AI tools. “We get to relitigate,” he said, pointing to the opportunity to win back market share. “This is the best news for Microsoft shareholders—that we lost so badly that we can now go contest it and win back some share,” he said.
Ah, ha. “Lost so badly.” What an interesting word “relitigate” is. Huh? And the grouse replay: “win back market share.” What market share? Despite the ubiquity of the outstandingly wonderful Windows operating system and its baked-in browser and button fest for Bing, exactly what is the market share?
Google is chugging along with about 90 percent Web search market share. Microsoft is nipping at Google’s heels with a robust four percent. Yandex is about two percent. The more likely scenario is that Yandex could, under its new ownership, knock Microsoft out of second place. Google isn’t going anywhere fast because the company is wrapped around finding information like Cristiano Ronaldo holding one of his trophies.
What’s interesting about the Analytics India write up is what is not included in the article. For example:
- The cultural similarities of the two Big Dogs. The competition has the impact of a couple of high schoolers arguing in the cafeteria
- The lack of critical commentary about the glittering generalities the Microsoft Big Dog barks and rebarks like an annoyed French bulldog
- A total lack of interest in the fact that both companies are monopolies and that neither exists to benefit anyone other than those who hold shares in the respective companies. As long as there is money, search market share is nothing more than a money stream.
Will smart software improve the situation?
No. But the grouse and re-grouse approach to business tactics will be a very versatile rhetorical argument.
Stephen E Arnold, December 23, 2024
The Hay Day of Search Has a Ground Hog Moment
December 19, 2024
This blog post is the work of an authentic dinobaby. No smart software was used.
I think it was 2002 or 2003 that I started writing the first of three editions of Enterprise Search Report. I am not sure what happened to the publisher who liked big, fat thick printed books. He has probably retired to an island paradise to ponder the crashing blue surf.
But it seems that the salad days of enterprise search are back. Elastic is touting semantics, smart software, and cyber goodness. IBM is making noises about “Watson” in numerous forms just gift wrapped with sparkly AI ice cream jimmies. There is a start up called Swirl. The HuggingFace site includes numerous references to finding and retrieving. And there is Glean.
I keep seeing references to Glean. When I saw a link to the content marketing piece “Glean’s Approach to Smarter Systems: AI, Inferencing and Enterprise Data,” I read it. I learned that the company did not want to be an AI outfit, a statement I am not sure how to interpret; nevertheless, the founder of Glean is quoted as saying:
“We didn’t actually set out to build an AI application. We were first solving the problem of people can’t find anything in their work lives. We built a search product and we were able to use inferencing as a core part of our overall product technology,” he said. “That has allowed us to build a much better search and question-and-answering product … we’re [now] able to answer their questions using all of their enterprise knowledge.”
And what happened to finding information? The company has moved into:
- Workflows
- Intelligent data discovery
- Problem solving
And the result is not finding information:
Glean enables enterprises to improve efficiency while maintaining control over their knowledge ecosystem.
Translation: Enterprise search.
The old language of search is gone, but it seems to me that “search” is now explained with loftier verbiage than that used by Fast Search & Transfer in a lecture delivered in Switzerland before the company imploded.
Is it now time to write the “Enterprise Knowledge Ecosystem Report”? Possibly for someone, but it’s Ground Hog time. I have been there and done that. Everyone wants search to work. New words and the same challenges. The hay is growing thick and fast.
Stephen E Arnold, December 19, 2024
The Fatal Flaw in Rules-Based Smart Software
December 17, 2024
This blog post is the work of an authentic dinobaby. No smart software was used.
As a dinobaby, I have to remember the past. Does anyone know how the “smart” software in AskJeeves worked? At one time, before the cute logo and before the company followed the path of many, many other breakthrough search firms, AskJeeves used hand-crafted rules. (Oh, the reference to breakthrough is a bit of an insider joke with which I won’t trouble you.) A user would search for “weather 94401” and the system would “look up” in the weather rule the zip code for Foster City, California, and deliver the answer. Alternatively, I could have looked out my window when I ran the query. AskJeeves went down a path painfully familiar to other smart software companies today: customer service. AskJeeves was acquired by IAC Corp., which moved away from the rules-based system that was “revolutionizing” search in the late 1990s.
Rules-based wranglers keep busy a-fussin’ and a-changin’ all the dang time. The patient mule Jeeves just wants lunch. Thanks, MidJourney, good enough.
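The AskJeeves-style hand-crafted rule described above can be sketched in a few lines. The rule function, the zip code table, and the canned answer are my own illustrative inventions, not AskJeeves internals:

```python
import re

# Hypothetical hand-crafted lookup table: zip code -> place name.
ZIP_PLACES = {"94401": "Foster City, California"}

def weather_rule(query):
    # One hand-crafted rule: match "weather <zip>" and look up the place.
    match = re.fullmatch(r"weather (\d{5})", query)
    if not match:
        return None
    place = ZIP_PLACES.get(match.group(1))
    return f"Weather for {place}" if place else None

def answer(query):
    # A real rules engine would iterate over hundreds of such rules.
    for rule in (weather_rule,):
        result = rule(query)
        if result:
            return result
    return "No rule matched"  # the dreaded zero-results case

print(answer("weather 94401"))
```

The failure mode is visible immediately: any query the rule writers did not anticipate falls straight through to “No rule matched.”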
I read “Certain Names Make ChatGPT Grind to a Halt, and We Know Why.” The essay presents information about how the wizards at OpenAI solve problems its smart software creates. The fix is to channel the “rules-based approach” which was pretty darned exciting decades ago. Like the AskJeeves’ approach, the use of hand-crafted rules creates several problems. The cited essay focuses on the use of “rules” to avoid legal hassles created when smart software just makes stuff up.
I want to highlight several other problems with rules-based decision systems which are far older in computer years than the AskJeeves marketing success in 1996. Let me highlight a few which may lurk within the OpenAI and ChatGPT smart software:
- Rules have to be something created by a human in response to something another (often unpredictable) human did. Smart software gets something wrong like saying a person is in jail or dead when he is free and undead.
- Rules have to be maintained. Like legacy code, setting and forgetting can have darned exciting consequences after the original rules creator changed jobs or fell into the category “in jail” or “dead.”
- Rules work with a limited set of bounded questions and answers. Rules fail when applied to the fast-changing and weird linguistic behavior of humans. If a “rule” does not know a word like “debanking”, the system will struggle, crash, or return zero results. Bummer.
- Rules seem like a great idea until someone calculates how many rules are needed, how much it costs to create a rule, and how much maintenance rules require (typically based on the cost of creating a rule in the first place). To keep the math simple, rules are expensive.
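A back-of-the-envelope version of that rule-cost calculation, with numbers that are purely hypothetical placeholders:

```python
n_rules = 10_000           # rules needed to cover common query patterns (guess)
cost_per_rule = 50.0       # dollars of analyst time to write and test one rule (guess)
maintenance_rate = 0.25    # yearly upkeep as a fraction of creation cost (guess)

creation_cost = n_rules * cost_per_rule
annual_maintenance = creation_cost * maintenance_rate

print(f"Creation: ${creation_cost:,.0f}; upkeep: ${annual_maintenance:,.0f}/year")
```

Even with modest per-rule figures, the totals add up fast, and the maintenance line never goes away. To keep the math simple: rules are expensive.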
I liked the cited essay about OpenAI. It reminds me how darned smart today’s developers of smart software are. This dinobaby loved the article. What a great anecdote! I want to say, “OpenAI should have asked Jeeves.” I won’t. I will point out that IBM Watson, the Jeopardy winner version, was rules based. In fact, rules are still around, and like a patient donkey they still carry the cost burden.
Stephen E Arnold, December 17, 2024
Europe Wants Its Own Search System: Filtering, Trees, and More
November 20, 2024
This essay is the work of a dumb dinobaby. No smart software required.
I am not going to recount the history of search companies and government entities building an alternative to Google. One can toss in Bing, but Google is the Big Dog. Yandex is useful for Russian content. But there is a void even though Swisscows.com is providing anonymity (allegedly) and no tracking (allegedly).
Now a new European solution may become available. If you remember Pertimm, you probably know that Qwant absorbed some of that earlier search system’s goodness. And there is Ecosia, a search system which plants trees. The union of these two systems will be an alternative to Google. I think Exalead.com tried this before, but who remembers European search history in rural Kentucky?
“Two Upstart Search Engines Are Teaming Up to Take on Google” reports:
The for-profit joint venture, dubbed European Search Perspective and located in Paris, could allow the small companies and any others that decide to join up to reduce their reliance on Google and Bing and serve results that are better tailored to their companies’ missions and Europeans’ tastes.
A possible name or temporary handle for the new search system is EUSP or European Search Perspective. What’s interesting is that the plumbing will be provided by a service provider named OVH. Four years ago, OVHcloud became a strategic partner of … wait for it … Google. Apparently that deal does not prohibit OVH from providing services to a European alternative to Google.
Also, you may recall that Eric Schmidt, former adult in the room at Google, suggested that Qwant kept him awake at night. Yes, Qwant has been a threat to Google for 13 years. How has that worked out? The original Qwant was interesting, with a novel way of showing results from different types of sources. Now Qwant is actually okay. The problem with any search system, including Bing, is that maintaining an index containing new content and refreshing or updating previously indexed content is a big job. Toss in some AI goodness, and the cash burns furiously.
“Google” is now the word for search whether it works or does not. Perhaps regulatory actions will alter the fact that in Denmark, 99 percent of user queries flow to Google. Yep, Denmark. But one can’t go wrong with a ballpark figure like 95 percent of search queries outside of China and a handful of other countries are part of the Google market share.
How will the new team tackle the Google? I hope in a way that delivers more progress than Cogito. Remember that? Okay, no problem.
PS. Is a 13-year-old company an upstart? Sigh.
Stephen E Arnold, November 20, 2024
Content Conversion: Search and AI Vendors Downplay the Task
November 19, 2024
No smart software. Just a dumb dinobaby. Oh, the art? Yeah, MidJourney.
Marketers and PR people often have degrees in political science, communications, or art history. This academic foundation means that some of these professionals can listen to a presentation and struggle to figure out what’s a horse, what’s horse feathers, and what’s horse output.
Consequently, many organizations engaged in “selling” enterprise search, smart software, and fusion-capable intelligence systems downplay or just fib about how darned easy it is to take “content” and shove it into the Fancy Dan smart software. The pitch goes something like this: “We have filters that can handle 90 percent of the organization’s content. Word, PowerPoint, Excel, Portable Document Format (PDF), HTML, XML, and data from any system that can export tab delimited content. Just import and let our system increase your ability to analyze vast amounts of content. Yada yada yada.”
Thanks, Midjourney. Good enough.
The problem is that real life content is often a problem. I am not going to trot out my list of content problem children. Instead I want to ask a question: If dealing with content is a slam dunk, why do companies like IBM and Oracle sustain specialized tools to convert Content Type A into Content Type B?
The answer is that content processing is an essential step because:

- Structured and unstructured content can exist in different versions. Figuring out the one that is least wrong and most timely is tricky.
- Humans love mobile devices, laptops, home computers, photos, videos, and audio. Furthermore, how does a content processing system get those types of content from a source not located in an organization’s office (assuming it has one) and into the content processing system? The answer is, “Money, time, persuasion, and knowledge of what employee has what.” Finding a unicorn at the Kentucky Derby is more likely.
- Specialized systems employ lingo like “Export as” and provide some file types. Yeah. The problem is that the output may not contain everything that is in the specialized software program. Examples range from computational chemistry systems to those nifty AutoCAD-type drawing systems to slick electronic trace routing solutions to DaVinci Resolve video systems, which can happily pull “content” from numerous places on a proprietary network setup. Yeah, no problem.
Evidence of how big this content conversion issue is appears in the IBM write up “A New Tool to Unlock Data from Enterprise Documents for Generative AI.” If the content conversion work is trivial, why is IBM wasting time and brainpower figuring out something like making a PowerPoint file smart software friendly?
The reason is that as big outfits get “into” smart software, the people working on the project find that the exception folder gets filled up. Some documents and content types don’t convert. If a boss asks, “How do we know the data in the AI system are accurate?”, the hapless IT person looking at the exception folder either lies or says in a professional voice, “We don’t have a clue.”
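The exception-folder dynamic is easy to sketch. The converter stub, supported file types, and file names below are hypothetical; the point is that failures accumulate silently unless someone counts them:

```python
from pathlib import Path
from typing import Optional

def convert(doc: str) -> Optional[str]:
    # Stand-in for a real conversion filter; returns None on failure.
    supported = {".docx", ".pptx", ".pdf"}
    return doc.upper() if Path(doc).suffix in supported else None

def ingest(docs):
    # Route each document to the converted pile or the exception folder.
    converted, exceptions = [], []
    for doc in docs:
        result = convert(doc)
        (converted if result else exceptions).append(doc)
    return converted, exceptions

docs = ["report.docx", "deck.pptx", "traces.dwg", "assay.cdx", "scan.pdf"]
ok, failed = ingest(docs)
print(f"Converted {len(ok)}/{len(docs)}; exception folder holds {len(failed)}")
```

Notice that the engineering drawing and the chemistry file land in the exception folder, and nothing downstream knows they are missing.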
IBM’s write up says:
IBM’s new open-source toolkit, Docling, allows developers to more easily convert PDFs, manuals, and slide decks into specialized data for customizing enterprise AI models and grounding them on trusted information.
But one piece of software cannot do the job. That’s why IBM reports:
The second model, TableFormer, is designed to transform image-based tables into machine-readable formats with rows and columns of cells. Tables are a rich source of information, but because many of them lie buried in paper reports, they’ve historically been difficult for machines to parse. TableFormer was developed for IBM’s earlier DeepSearch project to excavate this data. In internal tests, TableFormer outperformed leading table-recognition tools.
Why are these tools needed? Here’s IBM’s rationale:
Researchers plan to build out Docling’s capabilities so that it can handle more complex data types, including math equations, charts, and business forms. Their overall aim is to unlock the full potential of enterprise data for AI applications, from analyzing legal documents to grounding LLM responses on corporate policy documents to extracting insights from technical manuals.
Based on my experience, the paragraph translates as, “This document conversion stuff is a killer problem.”
When you hear a trendy enterprise search or enterprise AI vendor talk about the wonders of its system, be sure to ask about document conversion. Here are a few questions to put the spotlight on what often becomes a black hole of costs:
- If I process 1,000 pages of PDFs, mostly text but with some charts and graphs, what’s the error rate?
- If I process 1,000 engineering drawings with embedded product and vendor data, what percentage of the content is parsed for the search or AI system?
- If I process 1,000 non text objects like videos and iPhone images, what is the time required and the metadata accuracy for the converted objects?
- Where do unprocessable source objects go? An exception folder, the trash bin, or to my in box for me to fix up?
Have fun asking questions.
Stephen E Arnold, November 19, 2024
Dreaming about Enterprise Search: Hope Springs Eternal…
November 6, 2024
The post is the work of a humanoid who happens to be a dinobaby. GenX, Y, and Z, read at your own risk. If art is included, smart software produces these banal images.
Enterprise search is back, baby. The marketing lingo is very year 2003, however. The jargon has been updated, but the story is the same: We can make an organization’s information accessible. Instead of Autonomy’s Neurolinguistic Programming, we have AI. Instead of “just text,” we have video content processed. Instead of filters, we have access to cloud-stored data.
An executive knows he can crack the problem of finding information instantly. The problem is doing it so that the time and cost of data clean up does not cost more than buying the Empire State Building. Thanks, Stable Diffusion. Good enough.
A good example of the current approach to selling the utility of an enterprise search and retrieval system is the article / interview in Betanews called “How AI Is Set to Democratize Information.” I want to be upfront. I am mostly aligned with the analysis of information and knowledge presented by Taichi Sakaiya. His The Knowledge Value Revolution or a History of the Future has been a useful work for me since the early 1990s. I was in Osaka, Japan, lecturing at the Kansai Institute of Technology when I learned of this book from my gracious hosts and the Managing Director of Kinokuniya (my sponsor). Devaluing knowledge by regressing to the fat part of a Gaussian distribution is not something about which I am excited.
However, the senior manager of Pyron (Raleigh, North Carolina), an AI-powered information retrieval company, finds the concept in line with what his firm’s technology provides to its customers. The article includes this statement:
The concept of AI as a ‘knowledge cloud’ is directly tied to information access and organizational intelligence. It’s essentially an interconnected network of systems of records forming a centralized repository of insights and lessons learned, accessible to individuals and organizations.
The benefit is, according to the Pyron executive:
By breaking down barriers to knowledge, the AI knowledge cloud could eliminate the need for specialized expertise to interpret complex information, providing instant access to a wide range of topics and fields.
The article introduces a fresh spin on the problems of information in organizations:
Knowledge friction is a pervasive issue in modern enterprises, stemming from the lack of an accessible and unified source of information. Historically, organizations have never had a singular repository for all their knowledge and data, akin to libraries in academic or civic communities. Instead, enterprise knowledge is scattered across numerous platforms and systems — each managed by different vendors, operating in silos.
Pyron opened its doors in 2017. After seven years, the company is presenting a vision of what access to enterprise information could, would, and probably should do.
The reality, based on my experience, is different. I am not talking about Pyron now. I am discussing the re-emergence of enterprise search as the killer application for bolting artificial intelligence to information retrieval. If you are in love with AI systems from oligopolists, you may want to stop scanning this blog post. I do not want to be responsible for a stroke or an esophageal spasm. Here we go:
- Silos of information are an emergent phenomenon. Knowledge has value. Few want to make their information available without some value returning to them. Therefore, one can talk about breaking silos and democratization, but those silos will be erected and protected. Secret skunk works, mislabeled projects, and squirreling away knowledge nuggets for a winter’s day. In the case of Senator Everett Dirksen, the information was used to get certain items prioritized. That’s why there is a building named after him.
- The “value” of information or knowledge depends on another person’s need. A database which contains the antidote to save a child from a household poisoning costs money to access. Why? Desperate people will pay. The “information wants to free” idea is not one that makes sense to those with information and the knowledge to derive value from what another finds inscrutable. I am not sure that “democratizing information” meshes smoothly with my view.
- Enterprise search, with or without AI, hits cost and time problems that have dogged the field for more than 50 years. SMART failed, STAIRS III failed, and the hundreds of followers have failed. Content is messy. The idea that one can process text, spreadsheets, Word files, and email is one thing. Doing it without skipping wonky files, or without paying the time and cost of repurposing data, remains difficult. Chemical companies deal with formulae; nuclear engineering firms deal with records management and mathematics; and consulting companies deal with highly paid people who lock up their information on a personal laptop. Without these little puddles of information, the “answer” or the “search output” will not be just a hallucination. The answer may be dead wrong.
I understand the need to whip up jargon like “democratize information”, “knowledge friction”, and “RAG frameworks”. The problem is that despite the words, delivering accurate, verifiable, timely on-point search results in response to a query is a difficult problem.
Maybe one of the monopolies will crack the problem. But most of the output is a glimpse of what may be coming in the future. When will the future arrive? Probably when the next PR or marketing write up about search appears. As I have said numerous times, I find it more difficult to locate the information I need than at any time in my more than half a century in online information retrieval.
What’s easy is recycling marketing literature from companies who were far better at describing a “to be” system, not a “here and now” system.
Stephen E Arnold, November 4, 2024
Can Prabhakar Do the Black Widow Thing to Technology at Google?
October 21, 2024
No smart software but we may use image generators to add some modern spice to the dinobaby’s output.
The reliable (mostly?) Wall Street Journal ran a story titled “Google Executive Overseeing Search and Advertising Leaves Role.” The executive in question is Prabhakar Raghavan, the other half of the Sundar and Prabhakar Comedy Team. The wizardly Prabhakar is the person Edward Zitron described as “The Man Who Killed Google Search.” I recommend reading that essay because it has more zip than the Murdoch approach to poohbah analysis.
I want to raise a question because I assume that Mr. Zitron is largely correct about the demise of Google Search. The sleek Prabhakar accelerated the decline. He was the agent of the McKinsey think infused in his comedy partner Sundar. The two still get laughs at their high school reunions amidst chums and more when classmates gather to explain their success to one another.
The Google approach: Who needs relevance? Thanks, MSFT Copilot. Not quite excellent.
What is the question? Here it is:
Will Prabhakar do to Google’s technology what he did to search?
My view is that Google’s technology has demonstrated corporate ossification. The company “invented”, according to Google lore, the transformer. Then Google — because it was concerned about its invention — released some of it as open source and then watched as Microsoft marketed AI as the next big thing for the Softies. And what was the outfit making Microsoft’s marketing coup possible? It was Sam AI-Man.
Microsoft, however, has not been a technology leader for how many years?
Suddenly the Google announced a crisis and put everyone on making Google the leader in AI. I assume the McKinsey think did not give much thought to the idea that MSFT’s transformer would be used to make Google look darned silly. In fact, it was Prabhakar who stole the attention of the pundits with a laughable AI demonstration in Paris.
Flash forward from early 2023 to late 2024: what’s Google doing with technology? My perception is that Google is trying to create AI winners, capture the corporate market from Microsoft, and convince as many people as possible that if Google is broken apart, AI in America will flop.
Yes, the fate of the nation hangs on Google’s remaining a monopoly. That sounds like a punch line to a skit in the Sundar and Prabhakar Comedy Show.
Here’s my hypothesis: The death of search (the Edward Zitron view) is a job well done. The curtains fall on Act I of the Google drama. Act II is about the Google technology. The idea is that the technology of the online advertising monopoly defines the future of America.
Stay tuned because the story will be streamed on YouTube with advertising, lots of advertising, of course.
Stephen E Arnold, October 21, 2024
Online Search: The Old Function Is in Play
October 18, 2024
Just a humanoid processing information related to online services and information access.
We spotted an interesting marketing pitch from Kagi.com, the pay-to-play Web search service. The information is located on the Kagi.com Help page at this link. The approach is what I call “fact-centric marketing.” In the article, you will find facts like these:
In 2022 alone, search advertising spending reached a staggering 185.35 billion U.S. dollars worldwide, and this is forecast to grow by six percent annually until 2028, hitting nearly 261 billion U.S. dollars.
There is a bit of consultant-type analysis which explains the difference between Google’s approach labeled “ad-based search” and the Kagi.com approach called “user-centric search.” I don’t want to get into an argument about these somewhat stark bifurcations in the murky world of information access, search, and retrieval. Let’s just accept the assertion.
I noted more numbers. Here’s a sampling (not statistically valid, of course):
Google generated $76 billion in US ad revenue in 2023. Google had 274 million unique visitors in the US as of February 2023. To estimate the revenue per user, we can divide the 2023 US ad revenue by the 2023 number of users: $76 billion / 274 million = $277 revenue per user in the US or $23 USD per month, on average! That means there is someone, somewhere, a third party and a complete stranger, an advertiser, paying $23 per month for your searches.
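Kagi’s arithmetic, as quoted, checks out to within rounding:

```python
us_ad_revenue = 76e9        # Google US ad revenue, 2023, per the Kagi page
us_unique_visitors = 274e6  # US unique visitors, February 2023, per the Kagi page

revenue_per_user_year = us_ad_revenue / us_unique_visitors
revenue_per_user_month = revenue_per_user_year / 12

print(f"${revenue_per_user_year:.0f} per user per year, "
      f"${revenue_per_user_month:.0f} per month")
```

That is where the roughly $277 per year and $23 per month figures come from.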
The Kagi.com point is:
Choosing to subscribe to Kagi means that while you are now paying for your search you are getting a fair value for your money, you are getting more relevant results, are able to personalize your experience and take advantage of all the tools and features we built, all while protecting your and your family’s privacy and data.
Why am I highlighting this Kagi.com Help information? Leo Laporte on the October 13, 2024, This Week in Tech program talked about Kagi. He asserted that Kagi uses Bing, Google, and its own search index. I found this interesting. If true, Mr. Laporte is disseminating the idea that Kagi.com is a metasearch engine like Ixquick.com (now StartPage.com). The murkiness about what a Web search engine presents to a user is interesting.
A smart person is explaining why paying for search and retrieval is a great idea. It may be, but Google has other ideas. Thanks, You.com. Good enough
In the last couple of days I received an invitation to join a webinar about a search system called Swirl, which connotes mixing content perhaps? I also received a spam message from a fund called TheStreet explaining that the firm has purchased a block of Elastic B.V. shares. A company called provided an interesting explanation of what struck me as a useful way to present search results.
Everywhere companies are circling back to the idea that one cannot “find” needed information.
With Google facing actual consequences for its business practices, that company is now suggesting this angle: “Hey, you can’t break us up. Innovation in AI will suffer.”
So what is the future? Will vendors get a chance to use the Google search index for free? Will alternative Web search solutions become financial wins? Will metasearch triumph, using multiple indexes and compiling a single list of results? Will new-fangled solutions like Glean dominate enterprise information access and then move into the mainstream? Will visual approaches to information access kick “words” to the curb?
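Metasearch, in the Ixquick/StartPage sense, is mostly a ranked-list merge. A minimal sketch, with engine names and result URLs invented purely for illustration:

```python
from itertools import zip_longest

def metasearch(result_lists):
    """Interleave ranked results from several engines, deduplicating by URL."""
    seen, merged = set(), []
    # Walk rank by rank across all engines so top hits surface first.
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

engine_a = ["example.com/1", "example.com/2", "example.com/3"]
engine_b = ["example.com/2", "example.com/4"]
print(metasearch([engine_a, engine_b]))
```

The merge itself is trivial; the hard, expensive parts are the upstream indexes being merged, which is why metasearch vendors lean on someone else’s crawl.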
Here are some questions I like to ask those who assert that they are online experts, and I include those in the OSINT specialist clan as well:
- Finding information is an unsolved problem. Can you, for example, easily locate a specific frame from a video your mobile device captured a year ago?
- Can you locate the specific expression in a book about linear algebra germane to the question you have about its application to an AI procedure?
- Are you able to find quickly the telephone number (valid at the time of the query) for a colleague you met three years ago at an international conference?
As 2024 rushes to what is likely to be a tumultuous conclusion, I want to point out that finding information is a very difficult job. Most people tell themselves they can find the information needed to address a specific question or task. In reality, these folks are living in a cloud of unknowing. Smart software has not made keyword search obsolete. For many users, ChatGPT or other smart software is a variant of search. If it is easy to use and looks okay, the output is outstanding.
So what? I am not sure the problem of finding the right information at the right time has been solved. Free or for fee, ad supported or open sourced, dumb string matching or Fancy Dan probabilistic pattern identification — none is delivering what so many people believe are on point, relevant, timely information. Don’t even get me started on the issue of “correct” or “accurate.”
Marketers, stand down. Your assertions, webinars, advertisements, special promotions, jargon, and buzzwords do not deliver findability to users who don’t want to expend effort to move beyond good enough. I know one thing for certain, however: Finding relevant information is now more difficult than it was a year ago. I have a hunch the task is only becoming harder.
Stephen E Arnold, October 18, 2024
Google and Search: A Fix or a Pipe Dream?
September 6, 2024
This essay is the work of a dumb dinobaby. No smart software required.
I read “Dawn of a New Era in Search: Balancing Innovation, Competition, and Public Good.”
Don’t get me wrong. I think multiple search systems are a good thing. The problem is that search (both enterprise and Web) is a difficult problem, and difficult problems are expensive to solve. After working more than 50 years in electronic information, I have seen search systems come and go. I have watched systems morph from search into weird products that hide the search plumbing beneath fancy words like business intelligence and OSINT tools, among others. In 2006 or 2007, one of my financial clients published some of our research. The bank received an email from an “expert” (formerly of Yahoo and Verity) asserting that his firm had better technology than Google. In that conversation, that “expert” said, “I can duplicate Google search for $300 million.” The person who said these incredibly uninformed words is now head of search at Google. Ed Zitron has characterized the individual as the person who killed Google search. Well, that fellow and Google search are still around. This suggests that baloney and high school reunions provide a career path for some people. But search is not understood particularly well at Google at this time. It is no surprise, therefore, that the problems of search remain poorly understood by judges, search engine marketing experts, developers of metasearch systems which recycle Bing results, and most of the poohbahs writing about search in blogs like Beyond Search.
The poor search kids see the rich guy with lots of money. The kids want it. The situation is not fair to those with little or nothing. Will the rich guy share the money? Thanks, Microsoft Copilot. Good enough. Aren’t you one of the poor Web search vendors?
After five decades of arm wrestling with finding on-point information for myself, my clients, and the search-related start-ups with whom I have worked, I have an awareness of how much complexity the word “search” obfuscates. There is a general perception that Google indexes the Web. It doesn’t. No one indexes the Web. What’s indexed are publicly exposed Web pages which a crawler can access. If the response is slow (as it is for many government and underfunded personal or commercial sites), spiders time out, and the pages are not indexed. The crawlers also have to cope with changes in how Web pages are presented. Upon encountering something for which the crawler is not configured, the Web page is skipped. Certain Web sites are dynamic; the crawler has to cope with those as well. Then there are Web pages which are not composed of text. The problems are compounded by the vagaries of intermediaries’ actions; for example, what’s being blocked or filtered today? The answer is the crawler skips them.
Without revealing information I am not permitted to share, I want to point out that crawlers have a list which contains bluebirds, canaries, and dead ducks. The bluebirds are indexed by crawlers on an aggressive schedule, maybe multiple times every hour. The canaries are indexed on a normal cycle, maybe once every day or two. The dead ducks are crawled when time permits. Some US government Web sites may not be updated in six or nine months. The crawler visits such a site once every six months or even less frequently. Then there are forbidden sites which the crawler won’t touch. These are on the open Web, but their URLs are passed around via private messages. In terms of a Web search, these sites don’t exist.
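The tiered scheduling described above can be sketched in a few lines. This is a minimal illustration only, not any actual crawler's code; the tier names, interval values, and the `Site`/`due_for_crawl` helpers are all assumptions invented for the example.

```python
import time
from dataclasses import dataclass

# Illustrative crawl intervals for the "bluebirds, canaries, dead ducks"
# tiers the author describes. The numbers are guesses, not real values.
CRAWL_INTERVALS = {
    "bluebird": 30 * 60,            # aggressive: multiple visits per hour
    "canary": 24 * 60 * 60,         # normal: roughly once a day or two
    "dead_duck": 180 * 24 * 60 * 60, # rarely: every six months or so
}

@dataclass
class Site:
    url: str
    tier: str               # "bluebird" | "canary" | "dead_duck" | "forbidden"
    last_crawled: float = 0.0

def due_for_crawl(site: Site, now: float) -> bool:
    """Return True if the scheduler should fetch this site now."""
    if site.tier == "forbidden":
        return False  # the crawler never touches these sites
    return now - site.last_crawled >= CRAWL_INTERVALS[site.tier]

sites = [
    Site("https://news.example.com", "bluebird"),
    Site("https://agency.example.gov", "dead_duck", last_crawled=time.time()),
    Site("https://hidden.example.org", "forbidden"),
]
now = time.time()
due = [s.url for s in sites if due_for_crawl(s, now)]
print(due)  # only the never-crawled bluebird site is due; the rest are skipped
```

The point of the sketch is the asymmetry: a site's tier, not its content, decides how fresh its index entry is, and a "forbidden" site simply does not exist as far as the index is concerned.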
How much does this cost? The answer is, “At scale, a lot. Indexing a small number of sites is really cheap.” The problem is that in order to pull lots of clicks, one has to have the money to scale or a niche no one else is occupying. Those are hard to find, and when one does, it makes sense to slap a subscription fee on them; for example, POISINDEX.
Why am I running through what strikes me as basic information about searching the Web? “Dawn of a New Era in Search: Balancing Innovation, Competition, and Public Good” is interesting and does a good job of expressing a specific view of Web search and Google’s content and information assets. I want to highlight the section of the write up titled “The Essential Facilities Doctrine.” The idea is that Google’s search index should be made available to everyone. The idea is interesting, and it might work after legal processes in the US were exhausted. The gating factors will be money and the political climate.
From a competitor’s point of view, the index, blended with new ideas about how to answer a user’s query, would level the playing field. From Google’s point of view, it would mean a loss of intellectual property.
Several observations:
- The hunger to punish Big Tech seems to demand satisfaction. Something will come from the judicial decision that Google is a monopoly. It took a couple of decades to arrive at what was obvious to some after the Yahoo ad technology settlement prior to the IPO, but most people didn’t and still don’t get “it.” So something will happen. What that something will be is not yet known.
- Wide access to the complete Google index could threaten the national security of the US. Please, think about this statement. I can’t provide any color, but it is a consideration among some professionals.
- An appeal could neutralize some of the “harms” yet allow the indexing business to continue. Specific provisions might be applied to Judge Mehta’s decision. A modified landscape for search could be created, but online services tend to coalesce into efficient structures. Consider the break up of AT&T: the seven Baby Bells and Bell Labs have become AT&T and Verizon. The same could happen if “ads” were severed from Web search. After a period of time, the break up would be fighting one of the Arnold Laws of Online: a single monopoly is more efficient and emergent.
To sum up, the time for action came and, like a train in Switzerland, left on time. Undoing Google is going to be more difficult than fiddling with Standard Oil or the railroad magnates.
Stephen E Arnold, September 6, 2024
Consensus: A Gen AI Search Fed on Research, not the Wild Wild Web
September 3, 2024
How does one make an AI search tool that is actually reliable? Maybe start by supplying it with only peer-reviewed papers instead of the whole Internet. Fast Company sings the praises of Consensus in, “Google Who? This New Service Actually Gets AI Search Right.” Writer JR Raphael begins by describing why most AI-powered search engines, including Google, are terrible:
“The problem with most generative AI search services, at the simplest possible level, is that they have no idea what they’re even telling you. By their very nature, the systems that power services like ChatGPT and Gemini simply look at patterns in language without understanding the actual context. And since they include all sorts of random internet rubbish within their source materials, you never know if or how much you can actually trust the info they give you.”
Yep, that pretty much sums it up. So, like us, Raphael was skeptical when he learned of yet another attempt to bring generative AI to search. Once he tried the easy-to-use Consensus, however, he was convinced. He writes:
“In the blink of an eye, Consensus will consult over 200 million scientific research papers and then serve up an ocean of answers for you—with clear context, citations, and even a simple ‘consensus meter’ to show you how much the results vary (because here in the real world, not everything has a simple black-and-white answer!). You can dig deeper into any individual result, too, with helpful features like summarized overviews as well as on-the-fly analyses of each cited study’s quality. Some questions will inevitably result in answers that are more complex than others, but the service does a decent job of trying to simplify as much as possible and put its info into plain English. Consensus provides helpful context on the reliability of every report it mentions.”
See the post for more on using the web-based app, including a few screenshots. Raphael notes that, if one does not have a specific question in mind, the site offers long lists of its top answers for curious users to explore. The basic service is free to search with no query cap, but the creators hope to entice us with an $8.99/month premium plan. Of course, this service is not going to help with every type of search. But if the subject is worthy of academic research, Consensus should have the (correct) answers.
Cynthia Murrell, September 3, 2024