Europe Wants Its Own Search System: Filtering, Trees, and More
November 20, 2024
This essay is the work of a dumb dinobaby. No smart software required.
I am not going to recount the history of search companies and government entities building an alternative to Google. One can toss in Bing, but Google is the Big Dog. Yandex is useful for Russian content. But there is a void even though Swisscows.com is providing anonymity (allegedly) and no tracking (allegedly).
Now a new European solution may become available. If you remember Pertimm, you probably know that Qwant absorbed some of that earlier search system’s goodness. And there is Ecosia, a search system which plants trees. The union of these two systems will be an alternative to Google. I think Exalead.com tried this before, but who remembers European search history in rural Kentucky?
“Two Upstart Search Engines Are Teaming Up to Take on Google” reports:
The for-profit joint venture, dubbed European Search Perspective and located in Paris, could allow the small companies and any others that decide to join up to reduce their reliance on Google and Bing and serve results that are better tailored to their companies’ missions and Europeans’ tastes.
A possible name or temporary handle for the new search system is EUSP or European Search Perspective. What’s interesting is that the plumbing will be provided by a service provider named OVH. Four years ago, OVHcloud became a strategic partner of … wait for it … Google. Apparently that deal does not prohibit OVH from providing services to a European alternative to Google.
Also, you may recall that Eric Schmidt, former adult in the room at Google, suggested that Qwant kept him awake at night. Yes, Qwant has been a threat to Google for 13 years. How has that worked out? The original Qwant was interesting with a novel way of showing results from different types of sources. Now Qwant is actually okay. The problem with any search system, including Bing, is that maintaining an index containing new content and refreshing or updating previously indexed content is a big, costly job. Toss in some AI goodness and the cash burns furiously.
“Google” is now the word for search whether it works or does not. Perhaps regulatory actions will alter the fact that in Denmark, 99 percent of user queries flow to Google. Yep, Denmark. But one can’t go wrong with a ballpark figure: about 95 percent of search queries outside of China and a handful of other countries are part of the Google market share.
How will the new team tackle the Google? I hope in a way that delivers more progress than Cogito. Remember that? Okay, no problem.
PS. Is a 13-year-old company an upstart? Sigh.
Stephen E Arnold, November 20, 2024
Content Conversion: Search and AI Vendors Downplay the Task
November 19, 2024
No smart software. Just a dumb dinobaby. Oh, the art? Yeah, MidJourney.
Marketers and PR people often have degrees in political science, communications, or art history. This academic foundation means that some of these professionals can listen to a presentation and struggle to figure out what’s a horse, what’s horse feathers, and what’s horse output.
Consequently, many organizations engaged in “selling” enterprise search, smart software, and fusion-capable intelligence systems downplay or just fib about how darned easy it is to take “content” and shove it into the Fancy Dan smart software. The pitch goes something like this: “We have filters that can handle 90 percent of the organization’s content. Word, PowerPoint, Excel, Portable Document Format (PDF), HTML, XML, and data from any system that can export tab delimited content. Just import and let our system increase your ability to analyze vast amounts of content. Yada yada yada.”
Thanks, Midjourney. Good enough.
The problem is that real life content is often a problem. I am not going to trot out my list of content problem children. Instead I want to ask a question: If dealing with content is a slam dunk, why do companies like IBM and Oracle sustain specialized tools to convert Content Type A into Content Type B?
The answer is that content processing is an essential step because [a] structured and unstructured content can exist in different versions. Figuring out the one that is least wrong and most timely is tricky. [b] Humans love mobile devices, laptops, home computers, photos, videos, and audio. Furthermore, how does a content processing system get those types of content from a source not located in an organization’s office (assuming it has one) and into the content processing system? The answer is, “Money, time, persuasion, and knowledge of what employee has what.” Finding a unicorn at the Kentucky Derby is more likely. [c] Specialized systems employ lingo like “Export as” and provide some file types. Yeah. The problem is that the output may not contain everything that is in the specialized software program. Examples range from computational chemistry systems to those nifty AutoCAD type drawing systems to slick electronic trace routing solutions to DaVinci Resolve video systems which can happily pull “content” from numerous places on a proprietary network set up. Yeah, no problem.
Evidence of how big this content conversion issue is appears in the IBM write up “A New Tool to Unlock Data from Enterprise Documents for Generative AI.” If the content conversion work is trivial, why is IBM wasting time and brainpower figuring out something like making a PowerPoint file smart software friendly?
The reason is that as big outfits get “into” smart software, the people working on the project find that the exception folder gets filled up. Some documents and content types don’t convert. If a boss asks, “How do we know the data in the AI system are accurate?”, the hapless IT person looking at the exception folder either lies or says in a professional voice, “We don’t have a clue?”
IBM’s write up says:
IBM’s new open-source toolkit, Docling, allows developers to more easily convert PDFs, manuals, and slide decks into specialized data for customizing enterprise AI models and grounding them on trusted information.
But one piece of software cannot do the job. That’s why IBM reports:
The second model, TableFormer, is designed to transform image-based tables into machine-readable formats with rows and columns of cells. Tables are a rich source of information, but because many of them lie buried in paper reports, they’ve historically been difficult for machines to parse. TableFormer was developed for IBM’s earlier DeepSearch project to excavate this data. In internal tests, TableFormer outperformed leading table-recognition tools.
Why are these tools needed? Here’s IBM’s rationale:
Researchers plan to build out Docling’s capabilities so that it can handle more complex data types, including math equations, charts, and business forms. Their overall aim is to unlock the full potential of enterprise data for AI applications, from analyzing legal documents to grounding LLM responses on corporate policy documents to extracting insights from technical manuals.
Based on my experience, the paragraph translates as, “This document conversion stuff is a killer problem.”
When you hear a trendy enterprise search or enterprise AI vendor talk about the wonders of its system, be sure to ask about document conversion. Here are a few questions to put the spotlight on what often becomes a black hole of costs:
- If I process 1,000 pages of PDFs, mostly text but with some charts and graphs, what’s the error rate?
- If I process 1,000 engineering drawings with embedded product and vendor data, what percentage of the content is parsed for the search or AI system?
- If I process 1,000 non text objects like videos and iPhone images, what is the time required and the metadata accuracy for the converted objects?
- Where do unprocessable source objects go? An exception folder, the trash bin, or to my in box for me to fix up?
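One way to make these questions concrete is to ask the vendor for a conversion audit. What follows is a minimal sketch, not any vendor's method: the status labels ("ok", "partial", "failed"), the `ConversionResult` record, and the `audit` helper are all hypothetical names invented for illustration. The sketch tallies what converted, computes an error rate, and lists what would land in the exception folder.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ConversionResult:
    source: str  # path or identifier of the source object
    status: str  # hypothetical labels: "ok", "partial", or "failed"


def audit(results):
    """Summarize a conversion run: counts per status, error rate, exception list."""
    counts = Counter(r.status for r in results)
    total = len(results)
    # Treat anything that is not a clean conversion as an error.
    errors = total - counts["ok"]
    error_rate = errors / total if total else 0.0
    # Objects that need human attention go to the "exception folder".
    exceptions = [r.source for r in results if r.status != "ok"]
    return {
        "total": total,
        "counts": dict(counts),
        "error_rate": error_rate,
        "exceptions": exceptions,
    }


# Made-up sample run; real numbers have to come from the vendor's logs.
sample = [
    ConversionResult("report.pdf", "ok"),
    ConversionResult("drawing.dwg", "failed"),
    ConversionResult("deck.pptx", "partial"),
    ConversionResult("memo.docx", "ok"),
]
summary = audit(sample)
print(f"error rate: {summary['error_rate']:.0%}")
print("exception folder:", summary["exceptions"])
```

If a vendor cannot produce numbers like these for a sample of your own content, that is an answer too.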
Have fun asking questions.
Stephen E Arnold, November 19, 2024
Dreaming about Enterprise Search: Hope Springs Eternal…
November 6, 2024
The post is the work of a humanoid who happens to be a dinobaby. GenX, Y, and Z, read at your own risk. If art is included, smart software produces these banal images.
Enterprise search is back, baby. The marketing lingo is very year 2003, however. The jargon has been updated, but the story is the same: We can make an organization’s information accessible. Instead of Autonomy’s Neurolinguistic Programming, we have AI. Instead of “just text,” we have video content processed. Instead of filters, we have access to cloud-stored data.
An executive knows he can crack the problem of finding information instantly. The problem is doing it so that the time and cost of data clean up does not cost more than buying the Empire State Building. Thanks, Stable Diffusion. Good enough.
A good example of the current approach to selling the utility of an enterprise search and retrieval system is the article / interview in Betanews called “How AI Is Set to Democratize Information.” I want to be upfront. I am mostly aligned with the analysis of information and knowledge presented by Taichi Sakaiya. His The Knowledge Value Revolution or a History of the Future has been a useful work for me since the early 1990s. I was in Osaka, Japan, lecturing at the Kansai Institute of Technology when I learned of this book from my gracious hosts and the Managing Director of Kinokuniya (my sponsor). Devaluing knowledge by regressing to the fat part of a Gaussian distribution is not something about which I am excited.
However, the senior manager of Pyron (Raleigh, North Carolina), an AI-powered information retrieval company, finds the concept in line with what his firm’s technology provides to its customers. The article includes this statement:
The concept of AI as a ‘knowledge cloud’ is directly tied to information access and organizational intelligence. It’s essentially an interconnected network of systems of records forming a centralized repository of insights and lessons learned, accessible to individuals and organizations.
The benefit is, according to the Pyron executive:
By breaking down barriers to knowledge, the AI knowledge cloud could eliminate the need for specialized expertise to interpret complex information, providing instant access to a wide range of topics and fields.
The article introduces a fresh spin on the problems of information in organizations:
Knowledge friction is a pervasive issue in modern enterprises, stemming from the lack of an accessible and unified source of information. Historically, organizations have never had a singular repository for all their knowledge and data, akin to libraries in academic or civic communities. Instead, enterprise knowledge is scattered across numerous platforms and systems — each managed by different vendors, operating in silos.
Pyron opened its doors in 2017. After seven years, the company is presenting a vision of what access to enterprise information could, would, and probably should do.
The reality, based on my experience, is different. I am not talking about Pyron now. I am discussing the re-emergence of enterprise search as the killer application for bolting artificial intelligence to information retrieval. If you are in love with AI systems from oligopolists, you may want to stop scanning this blog post. I do not want to be responsible for a stroke or an esophageal spasm. Here we go:
- Silos of information are an emergent phenomenon. Knowledge has value. Few want to make their information available without some value returning to them. Therefore, one can talk about breaking silos and democratization, but those silos will be erected and protected. Secret skunk works, mislabeled projects, and squirreling away knowledge nuggets for a winter’s day. In the case of Senator Everett Dirksen, the information was used to get certain items prioritized. That’s why there is a building named after him.
- The “value” of information or knowledge depends on another person’s need. A database which contains the antidote to save a child from a household poisoning costs money to access. Why? Desperate people will pay. The “information wants to be free” idea is not one that makes sense to those with information and the knowledge to derive value from what another finds inscrutable. I am not sure that “democratizing information” meshes smoothly with my view.
- Enterprise search, with or without AI, hits cost and time problems, including a small number which have been problems for more than 50 years. SMART failed, STAIRS III failed, and the hundreds of followers have failed. Content is messy. The idea that one can process text, spreadsheets, Word files, and email is one thing. Doing it without skipping wonky files or incurring the time and cost of repurposing data remains difficult. Chemical companies deal with formulae; nuclear engineering firms deal with records management and mathematics; and consulting companies deal with highly paid people who lock up their information on a personal laptop. Without these little puddles of information, the “answer” or the “search output” will not be just a hallucination. The answer may be dead wrong.
I understand the need to whip up jargon like “democratize information”, “knowledge friction”, and “RAG frameworks”. The problem is that despite the words, delivering accurate, verifiable, timely on-point search results in response to a query is a difficult problem.
Maybe one of the monopolies will crack the problem. But most of the output is a glimpse of what may be coming in the future. When will the future arrive? Probably when the next PR or marketing write up about search appears. As I have said numerous times, I find it more difficult to locate the information I need than at any time in my more than half a century in online information retrieval.
What’s easy is recycling marketing literature from companies who were far better at describing a “to be” system, not a “here and now” system.
Stephen E Arnold, November 4, 2024
Can Prabhakar Do the Black Widow Thing to Technology at Google?
October 21, 2024
No smart software but we may use image generators to add some modern spice to the dinobaby’s output.
The reliable (mostly?) Wall Street Journal ran a story titled “Google Executive Overseeing Search and Advertising Leaves Role.” The executive in question is Prabhakar Raghavan, the other half of the Sundar and Prabhakar Comedy Team. The wizardly Prabhakar is the person Edward Zitron described as “The Man Who Killed Google Search.” I recommend reading that essay because it has more zip than the Murdoch approach to poohbah analysis.
I want to raise a question because I assume that Mr. Zitron is largely correct about the demise of Google Search. The sleek Prabhakar accelerated the decline. He was the agent of the McKinsey think infused in his comedy partner Sundar. The two still get laughs at their high school reunions amidst chums and more when classmates gather to explain their success to one another.
The Google approach: Who needs relevance? Thanks, MSFT Copilot. Not quite excellent.
What is the question? Here it is:
Will Prabhakar do to Google’s technology what he did to search?
My view is that Google’s technology has demonstrated corporate ossification. The company “invented”, according to Google lore, the transformer. Then Google — because it was concerned about its invention — released some of it as open source and then watched as Microsoft marketed AI as the next big thing for the Softies. And what was the outfit making Microsoft’s marketing coup possible? It was Sam AI-Man.
Microsoft, however, has not been a technology leader for how many years?
Suddenly the Google announced a crisis and put everyone on making Google the leader in AI. I assume the McKinsey think did not give much thought to the idea that MSFT’s transformer would be used to make Google look darned silly. In fact, it was Prabhakar who stole the attention of the pundits with a laughable AI demonstration in Paris.
Flash forward from early 2023 to late 2024: what’s Google doing with technology? My perception is that Google is trying to create AI winners, capture the corporate market from Microsoft, and convince as many people as possible that if Google is broken apart, AI in America will flop.
Yes, the fate of the nation hangs on Google’s remaining a monopoly. That sounds like a punch line to a skit in the Sundar and Prabhakar Comedy Show.
Here’s my hypothesis: The death of search (the Edward Zitron view) is a job well done. The curtains fall on Act I of the Google drama. Act II is about the Google technology. The idea is that the technology of the online advertising monopoly defines the future of America.
Stay tuned because the story will be streamed on YouTube with advertising, lots of advertising, of course.
Stephen E Arnold, October 21, 2024
Online Search: The Old Function Is in Play
October 18, 2024
Just a humanoid processing information related to online services and information access.
We spotted an interesting marketing pitch from Kagi.com, the pay-to-play Web search service. The information is located on the Kagi.com Help page at this link. The approach is what I call “fact-centric marketing.” In the article, you will find facts like these:
In 2022 alone, search advertising spending reached a staggering 185.35 billion U.S. dollars worldwide, and this is forecast to grow by six percent annually until 2028, hitting nearly 261 billion U.S. dollars.
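The quoted forecast is easy to sanity check. The sketch below assumes six compounding periods (2022 to 2028) and simple annual compounding; both are my assumptions, not something the Kagi.com page states. It shows what six percent a year would produce and what growth rate the "nearly 261 billion" endpoint actually implies.

```python
# Figures from the quoted passage, in billions of US dollars.
base_2022 = 185.35
target_2028 = 261.0

# Assumption: six annual compounding periods between 2022 and 2028.
years = 2028 - 2022

# What a flat six percent per year would produce.
projected_at_six_percent = base_2022 * 1.06 ** years

# The compound annual growth rate implied by the two quoted endpoints.
implied_rate = (target_2028 / base_2022) ** (1 / years) - 1

print(f"six percent a year gives about {projected_at_six_percent:.1f} billion")
print(f"the quoted endpoints imply about {implied_rate:.2%} a year")
```

Under these assumptions the two numbers are close but not identical, which is what one would expect if "six percent" is a rounded figure.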
There is a bit of consultant-type analysis which explains the difference between Google’s approach labeled “ad-based search” and the Kagi.com approach called “user-centric search.” I don’t want to get into an argument about these somewhat stark bifurcations in the murky world of information access, search, and retrieval. Let’s just accept the assertion.
I noted more numbers. Here’s a sampling (not statistically valid, of course):
Google generated $76 billion in US ad revenue in 2023. Google had 274 million unique visitors in the US as of February 2023. To estimate the revenue per user, we can divide the 2023 US ad revenue by the 2023 number of users: $76 billion / 274 million = $277 revenue per user in the US or $23 USD per month, on average! That means there is someone, somewhere, a third party and a complete stranger, an advertiser, paying $23 per month for your searches.
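The arithmetic in the quoted passage can be reproduced in a few lines. This is a sketch of the same division, using only the two figures Kagi.com cites; the variable names are mine. Note that the divisor is a unique visitor count "as of February 2023," so the result is a rough average, not a per-person measurement.

```python
# Figures from the quoted passage.
us_ad_revenue_2023 = 76e9    # US dollars
us_unique_visitors = 274e6   # unique US visitors as of February 2023

# Average ad revenue per visitor, per year and per month.
per_user_year = us_ad_revenue_2023 / us_unique_visitors
per_user_month = per_user_year / 12

print(f"per user per year:  ${per_user_year:.0f}")
print(f"per user per month: ${per_user_month:.0f}")
```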
The Kagi.com point is:
Choosing to subscribe to Kagi means that while you are now paying for your search you are getting a fair value for your money, you are getting more relevant results, are able to personalize your experience and take advantage of all the tools and features we built, all while protecting your and your family’s privacy and data.
Why am I highlighting this Kagi.com Help information? Leo Laporte on the October 13, 2024, This Week in Tech program talked about Kagi. He asserted that Kagi uses Bing, Google, and its own search index. I found this interesting. If true, Mr. Laporte is disseminating the idea that Kagi.com is a metasearch engine like Ixquick.com (now StartPage.com). The murkiness about what a Web search engine presents to a user is interesting.
A smart person is explaining why paying for search and retrieval is a great idea. It may be, but Google has other ideas. Thanks, You.com. Good enough
In the last couple of days I received an invitation to join a webinar about a search system called Swirl, which connotes mixing content perhaps? I also received a spam message from a fund called TheStreet explaining that the firm has purchased a block of Elastic B.V. shares. A company called provided an interesting explanation of what struck me as a useful way to present search results.
Everywhere companies are circling back to the idea that one cannot “find” needed information.
With Google facing actual consequences for its business practices, that company is now suggesting this angle: “Hey, you can’t break us up. Innovation in AI will suffer.”
So what is the future? Will vendors get a chance to use the Google search index for free? Will alternative Web search solutions become financial wins? Will metasearch triumph, using multiple indexes and compiling a single list of results? Will new-fangled solutions like Glean dominate enterprise information access and then move into the mainstream? Will visual approaches to information access kick “words” to the curb?
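For readers who have not looked under the hood of a metasearch system, here is one simple way to compile a single list from multiple indexes. The sketch uses reciprocal rank fusion, a published rank-merging technique that I have picked for illustration; I am not claiming that Kagi, StartPage, or any other service works this way. The engine result lists are invented, and `k = 60` is the constant conventionally used with this method.

```python
from collections import defaultdict


def fuse(result_lists, k=60):
    """Merge ranked result lists with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) to a result's score, so a result
    ranked well by several engines rises above one ranked well by only one.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical ranked results from two upstream indexes.
engine_a = ["a.example", "b.example", "c.example"]
engine_b = ["b.example", "d.example", "a.example"]

merged = fuse([engine_a, engine_b])
print(merged)
```

The merged list is only as good as the upstream indexes. Fusion reorders what the engines return; it cannot surface a page that none of them crawled.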
Here are some questions I like to ask those who assert that they are online experts, and I include those in the OSINT specialist clan as well:
- Finding information is an unsolved problem. Can you, for example, easily locate a specific frame from a video your mobile device captured a year ago?
- Can you locate the specific expression in a book about linear algebra germane to the question you have about its application to an AI procedure?
- Are you able to find quickly the telephone number (valid at the time of the query) for a colleague you met three years ago at an international conference?
As 2024 rushes to what is likely to be a tumultuous conclusion, I want to point out that finding information is a very difficult job. Most people tell themselves they can find the information needed to address a specific question or task. In reality, these folks are living in a cloud of unknowing. Smart software has not made keyword search obsolete. For many users, ChatGPT or other smart software is a variant of search. If it is easy to use and looks okay, the output is outstanding.
So what? I am not sure the problem of finding the right information at the right time has been solved. Free or for fee, ad supported or open sourced, dumb string matching or Fancy Dan probabilistic pattern identification: none is delivering what so many people believe is on point, relevant, timely information. Don’t even get me started on the issue of “correct” or “accurate.”
Marketers, stand down. Your assertions, webinars, advertisements, special promotions, jargon, and buzzwords do not deliver findability to users who don’t want to expend effort to move beyond good enough. I know one thing for certain, however: Finding relevant information is now more difficult than it was a year ago. I have a hunch the task will only become harder.
Stephen E Arnold, October 18, 2024
Google and Search: A Fix or a Pipe Dream?
September 6, 2024
This essay is the work of a dumb dinobaby. No smart software required.
I read “Dawn of a New Era in Search: Balancing Innovation, Competition, and Public Good.”
Don’t get me wrong. I think multiple search systems are a good thing. The problem is that enterprise search and Web search are difficult problems, and these problems are expensive to solve. After working more than 50 years in electronic information, I have seen search systems come and go. I have watched systems morph from search into weird products that hide the search plumbing beneath fancy words like business intelligence and OSINT tools, among others. In 2006 or 2007, one of my financial clients published some of our research. The bank received an email from an “expert” (formerly and Verity) that his firm had better technology than Google. In that conversation, that “expert” said, “I can duplicate Google search for $300 million.” The person who said these incredibly uninformed words is now head of search at Google. Ed Zitron has characterized the individual as the person who killed Google search. Well, that fellow and Google search are still around. This suggests that baloney and high school reunions provide a career path for some people. But search is not understood particularly well at Google at this time. It follows, therefore, that the problems of search are still unknown to judges, search engine marketing experts, developers of metasearch systems which recycle Bing results, and most of the poohbahs writing about search in blogs like Beyond Search.
The poor search kids see the rich guy with lots of money. The kids want it. The situation is not fair to those with little or nothing. Will the rich guy share the money? Thanks, Microsoft Copilot. Good enough. Aren’t you one of the poor Web search vendors?
After five decades of arm wrestling with finding on point information for myself, my clients, and for the search-related start ups with whom I have worked, I have an awareness of how much complexity the word “search” obfuscates. There is a general perception that Google indexes the Web. It doesn’t. No one indexes the Web. What’s indexed are publicly exposed Web pages which a crawler can access. If the response is slow (like many government and underfunded personal / commercial sites), spiders time out. The pages are not indexed. The crawlers have to deal in a successful way with the changes on how Web pages are presented. Upon encountering something for which the crawler is not configured, the Web page is skipped. Certain Web sites are dynamic. The crawler has to cope with these. Then there are Web pages which are not composed of text. The problems are compounded by the vagaries of intermediaries’ actions; for example, what’s being blocked or filtered today? The answer is the crawler skips them.
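The skip conditions in the paragraph above can be written down as a short decision function. This is an illustrative sketch only: the `FetchAttempt` record, the ten second timeout, and the list of indexable content types are hypothetical values I made up, not the settings of any real crawler.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical settings; real crawlers tune these per site and per content type.
TIMEOUT_SECONDS = 10.0
INDEXABLE_TYPES = {"text/html", "text/plain"}


@dataclass
class FetchAttempt:
    url: str
    response_seconds: Optional[float]  # None means the server never answered
    content_type: str
    blocked: bool = False  # blocked or filtered by an intermediary


def skip_reason(attempt):
    """Return why a page is skipped, or None if it can be indexed."""
    if attempt.blocked:
        return "blocked or filtered"
    if attempt.response_seconds is None or attempt.response_seconds > TIMEOUT_SECONDS:
        return "timed out"
    if attempt.content_type not in INDEXABLE_TYPES:
        return "crawler not configured for this content"
    return None


attempts = [
    FetchAttempt("https://fast.example/", 0.4, "text/html"),
    FetchAttempt("https://slow-agency.example/", 45.0, "text/html"),
    FetchAttempt("https://video.example/clip", 1.2, "video/mp4"),
    FetchAttempt("https://filtered.example/", 0.3, "text/html", blocked=True),
]
reasons = [skip_reason(a) for a in attempts]
for attempt, reason in zip(attempts, reasons):
    print(attempt.url, "->", reason or "indexed")
```

In this made-up run, three of four pages never reach the index, and the user of the search system has no way of knowing that.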
Without revealing information I am not permitted to share, I want to point out that crawlers have a list which contains bluebirds, canaries, and dead ducks. The bluebirds are indexed by crawlers on an aggressive schedule, maybe multiple times every hour. The canaries are the index-on-a-normal-cycle, maybe once every day or two. The dead ducks are crawled when time permits. Some US government Web sites may not be updated in six or nine months. The crawler visits the site once every six months or even less frequently. Then there are forbidden sites which the crawler won’t touch. These are on the open Web but urls are passed around via private messages. In terms of a Web search, these sites don’t exist.
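The tiering described above amounts to a revisit schedule. Here is a minimal sketch under stated assumptions: the tier names come from the paragraph, but the specific intervals (20 minutes, 36 hours, 180 days) are placeholders I chose to match "multiple times every hour," "every day or two," and "once every six months." The site names and helper functions are hypothetical.

```python
from datetime import datetime, timedelta

# Placeholder revisit intervals per tier; None means the crawler never visits.
REVISIT = {
    "bluebird": timedelta(minutes=20),
    "canary": timedelta(days=1, hours=12),
    "dead_duck": timedelta(days=180),
    "forbidden": None,
}


def next_visit(tier, last_crawled):
    """Return when a site is next due, or None if it is never crawled."""
    interval = REVISIT[tier]
    return None if interval is None else last_crawled + interval


def due_now(sites, now):
    """List the sites whose next visit time has arrived, in input order."""
    due = []
    for name, tier, last_crawled in sites:
        when = next_visit(tier, last_crawled)
        if when is not None and when <= now:
            due.append(name)
    return due


now = datetime(2024, 9, 6, 12, 0)
sites = [
    ("news.example", "bluebird", now - timedelta(minutes=30)),
    ("blog.example", "canary", now - timedelta(days=1)),
    ("agency.example", "dead_duck", now - timedelta(days=200)),
    ("private.example", "forbidden", now - timedelta(days=1000)),
]
due = due_now(sites, now)
print("due for a crawl:", due)
```

The forbidden site never appears in the output no matter how stale it gets, which is the point: in terms of a Web search, it does not exist.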
How much does this cost? The answer is, “At scale, a lot. Indexing a small number of sites is really cheap.” The problem is that in order to pull lots of clicks, one has to have the money to scale or a niche no one else is occupying. Those are hard to find, and when one does, it makes sense to slap a subscription fee on them; for example, POISINDEX.
Why am I running through what strikes me as basic information about searching the Web? “Dawn of a New Era in Search: Balancing Innovation, Competition, and Public Good” is interesting and does a good job of expressing a specific view of Web search and Google’s content and information assets. I want to highlight the section of the write up titled “The Essential Facilities Doctrine.” The idea is that Google’s search index should be made available to everyone. The idea is interesting, and it might work after legal processes in the US were exhausted. The gating factor will be money and the political climate.
From a competitor’s point of view, the index blended with new ideas about how to answer a user’s query would level the playing field. From Google’s point of view it would be a loss of intellectual property.
Several observations:
- The hunger to punish Big Tech seems to demand being satisfied. Something will come from the judicial decision that Google is a monopoly. It took a couple of decades to arrive at what was obvious to some after the Yahoo ad technology settlement prior to the IPO, but most people didn’t and still don’t get “it.” So something will happen. What is not yet known.
- Wide access to the complete Google index could threaten the national security of the US. Please, think about this statement. I can’t provide any color, but it is a consideration among some professionals.
- An appeal could neutralize some of the “harms,” yet allow the indexing business to continue. Specific provisions might be applied to the decision of Judge Mehta. A modified landscape for search could be created, but online services tend to coalesce into efficient structures. Like the break up of AT&T, the seven Baby Bells and Bell Labs have become AT&T and Verizon. This could happen if “ads” were severed from Web search. But after a period of time, the break up is fighting one of the Arnold Laws of Online: A single monopoly is more efficient and emergent.
To sum up, the time for action came and like a train in Switzerland, left on time. Undoing Google is going to be more difficult than fiddling with Standard Oil or the railroad magnates.
Stephen E Arnold, September 6, 2024
Consensus: A Gen AI Search Fed on Research, not the Wild Wild Web
September 3, 2024
How does one make an AI search tool that is actually reliable? Maybe start by supplying it with only peer-reviewed papers instead of the whole Internet. Fast Company sings the praises of Consensus in, “Google Who? This New Service Actually Gets AI Search Right.” Writer JR Raphael begins by describing why most AI-powered search engines, including Google, are terrible:
“The problem with most generative AI search services, at the simplest possible level, is that they have no idea what they’re even telling you. By their very nature, the systems that power services like ChatGPT and Gemini simply look at patterns in language without understanding the actual context. And since they include all sorts of random internet rubbish within their source materials, you never know if or how much you can actually trust the info they give you.”
Yep, that pretty much sums it up. So, like us, Raphael was skeptical when he learned of yet another attempt to bring generative AI to search. Once he tried the easy-to-use Consensus, however, he was convinced. He writes:
“In the blink of an eye, Consensus will consult over 200 million scientific research papers and then serve up an ocean of answers for you—with clear context, citations, and even a simple ‘consensus meter’ to show you how much the results vary (because here in the real world, not everything has a simple black-and-white answer!). You can dig deeper into any individual result, too, with helpful features like summarized overviews as well as on-the-fly analyses of each cited study’s quality. Some questions will inevitably result in answers that are more complex than others, but the service does a decent job of trying to simplify as much as possible and put its info into plain English. Consensus provides helpful context on the reliability of every report it mentions.”
See the post for more on using the web-based app, including a few screenshots. Raphael notes that, if one does not have a specific question in mind, the site has long lists of its top answers for curious users to explore. The basic service is free to search with no query cap, but creators hope to entice us with an $8.99/ month premium plan. Of course, this service is not going to help with every type of search. But if the subject is worthy of academic research, Consensus should have the (correct) answers.
Cynthia Murrell, September 3, 2024
Elastic N.V. Faces a New Search Challenge
September 2, 2024
This essay is the work of a dumb dinobaby. No smart software required.
Elastic N.V. and Shay Banon are what I call search survivors. Gone are Autonomy (mostly), Delphis, Exalead, Fast Search & Transfer (mostly), Vivisimo, and dozens upon dozens of companies who sought to put an organization’s information at an employee’s fingertips. The marketing lingo of these and other now-defunct enterprise search vendors is surprisingly timely. One can copy and paste chunks of Autonomy’s white papers into the “OpenAI ChatGPT search is coming” articles and few would notice that the assertions and even the word choice were more than 40 years old.
Elastic N.V. survived. It rose from a failed search system called Compass. Elastic N.V. recycled the Lucene libraries, released the open source Elasticsearch, and did an IPO. Some people made a lot of money. The question is, “Will that continue?”
I noted the Silicon Angle article “Elastic Shares Plunge 25% on Lower Revenue Projections Amid Slower Customer Commitments.” That write up says:
In its earnings release, Chief Executive Officer Ash Kulkarni started positively, noting that the results in the quarter were solid and outperformed previous guidance, but then comes the catch and the reason why Elastic stock is down so heavily after hours. “We had a slower start to the year with the volume of customer commitments impacted by segmentation changes that we made at the beginning of the year, which are taking longer than expected to settle,” Kulkarni wrote. “We have been taking steps to address this, but it will impact our revenue this year.” With that warning, Elastic said that it expects fiscal second-quarter adjusted earnings per share of 37 to 39 cents on revenue of $353 million to $355 million. The earnings per share forecast was ahead of the 34 cents expected by analysts, but revenue fell short of an expected $360.8 million. It was a similar story for Elastic’s full-year outlook, with the company forecasting earnings per share of $1.52 to $1.56 on revenue of $1.436 billion to $1.444 billion. The earnings per share outlook was ahead of an expected $1.42, but like the second quarter outlook, revenue fell short, as analysts had expected $1.478 billion.
Elastic N.V. makes money via service and for-fee extras. I want to point out that the $300 million or so quarterly revenue numbers are good. Elastic N.V. has figured out a business model that has avoided the usual enterprise search troubles: [a] fiddling the books, [b] hunting for a buyer as customers complain about problems with the search software, [c] sources of financing raging about cash burn and lousy revenue, [d] government investigators poking around for tax and other financial irregularities, [e] a cost of running the software that is beyond the reach of the licensee, or [f] a system that simply does not search or retrieve what the user wanted or expected.
Elastic N.V. and its management team may have a challenge to overcome. Thanks, OpenAI, the MSFT Copilot thing crashed today.
So what’s the fix?
A partial answer appears in the Elastic N.V. blog post titled “Elasticsearch Is Open Source, Again.” The company states:
The tl;dr is that we will be adding AGPL as another license option next to ELv2 and SSPL in the coming weeks. We never stopped believing and behaving like an open source community after we changed the license. But being able to use the term Open Source, by using AGPL, an OSI approved license, removes any questions, or fud, people might have.
Without slogging through the confusion between what Elastic N.V. sells, the open source version of Elasticsearch, the dust up with Amazon over its really original approach to search inspired by Elasticsearch, Lucid Imagination’s innovation, and the creaking edifice of A9, Elastic N.V. has released Elasticsearch under an additional open source license. I think that means one can use the software and not pay Elastic N.V. until additional services are needed. In my experience, most enterprise search systems, regardless of how they are explained, need the “owner” of the system to lend a hand. Contrary to the belief that smart software can do enterprise search right now, there are some hurdles to get over.
Will “going open source again” work?
Let me offer several observations based on my experience with enterprise search and retrieval which reaches back to the days of punch cards and systems which used wooden rods to “pull” cards with a wanted tag (index term):
- When an enterprise search system loses revenue momentum, the fix is to acquire companies in an adjacent search space and use that revenue to bolster the sales prospects for upsells.
- The company with the downturn gilds the lily and seeks a buyer. One example was the sale of Exalead to Dassault Systèmes, which calculated it was more economical to buy a vendor than to keep paying its then current supplier, which I think was Autonomy, but I am not sure. Fast Search & Transfer pulled off this type of “exit” as some of the company’s activities were under scrutiny.
- The search vendor can pivot from doing “search” and morph into a business intelligence system. (By the way, that did not work for Grok.)
- The company disappears. One example is Entopia. Poof. Gone.
I hope Elastic N.V. thrives. I hope the “new” open source play works. Search, whether enterprise or Web variety, is far from a solved problem. People believe they have the answer. Others believe them and license the “new” solution. The reality is that finding information is a difficult challenge. Let’s hope the “downturn” and “negativism” go away.
Stephen E Arnold, September 2, 2024
A Familiar Cycle: The Frustration of Almost Solving the Search Problem
August 16, 2024
This essay is the work of a dumb dinobaby. No smart software required.
Search and retrieval is a difficult problem. The solutions have ranged from scrolls with labels to punched cards and rods to bags of words. Each innovation or advance sparked new ideas. Boolean gave way to natural language. Natural language evolved into semi-smart systems. Now we are in the era of what seems to be smart software. As with the punch card systems, users became aware of the value of consistent, accurate indexing. Today one expects a system to “know” what the user wants. Instead of knowing index terms, one learns to be a prompt engineer.
Search and retrieval is not “solved” using large language models. LLMs are a step forward on a long and difficult path. The potential financial cost of thinking that the methods are a sure-fire money machine is high. Thanks, MSFT Copilot. How was DEFCON?
I read “LLM Progress Is Slowing — What Will It Mean for AI?” The write up makes clear that some of the excitement of smart software which can make sense of natural language queries (prompts) has lost some of its shine. This type of insight is one that probably existed when a Babylonian tablet maker groused about not having an easy way to stack up clay tablets for the money guy. Search and retrieval is essential for productive work. A system which makes that process less of a hassle is welcomed. After a period of time one learns that the approach is not quite where the user wants it to be. Researchers and innovators hear the complaint and turn their attention to improving search and retrieval … again.
The write up states:
The leap from GPT-3 to GPT-3.5 was huge, propelling OpenAI into the public consciousness. The jump up to GPT-4 was also impressive, a giant step forward in power and capacity. Then came GPT-4 Turbo, which added some speed, then GPT-4 Vision, which really just unlocked GPT-4’s existing image recognition capabilities. And just a few weeks back, we saw the release of GPT-4o, which offered enhanced multi-modality but relatively little in terms of additional power. Other LLMs, like Claude 3 from Anthropic and Gemini Ultra from Google, have followed a similar trend and now seem to be converging around similar speed and power benchmarks to GPT-4. We aren’t yet in plateau territory — but do seem to be entering into a slowdown. The pattern that is emerging: Less progress in power and range with each generation.
This is an echo of the complaints I heard about Dr. Salton’s SMART search system.
The “fix” according to the write up may be to follow one of these remediation paths:
- More specialization
- New user interfaces
- Open source large language models
- More and better data
- New large language model architectures.
These are ideas bolted to the large language model approach to search and retrieval. I think each has upsides and downsides. These deserve thoughtful discussion. However, the development of search and retrieval has been an evolutionary process. Those chaos and order thinkers at the Santa Fe Institute suggest that certain “things” self organize and emerge. The idea has relevance to what happens with each “new” approach to search and retrieval.
The cited write up concludes with this statement:
One possible pattern that could emerge for LLMs: That they increasingly compete at the feature and ease-of-use levels. Over time, we could see some level of commoditization set in, similar to what we’ve seen elsewhere in the technology world. Think of, say, databases and cloud service providers. While there are substantial differences between the various options in the market, and some developers will have clear preferences, most would consider them broadly interchangeable. There is no clear and absolute “winner” in terms of which is the most powerful and capable.
I think the idea about competition is mostly correct. However, my impression of search and retrieval as a technology thread is that progress is being made. I find it encouraging that more users are interacting with systems. Unfortunately search and retrieval is not solved by generating a paragraph a high school student can turn in to a history teacher as an original report.
Effective search and retrieval is not just a prompt box. Effective information access remains a blend of extraordinarily trivial activities. For instance, a conversation may suggest a new way to locate relevant information. Reading an article or a longer document may trigger an unanticipated connection between ant colonies and another task-related process. The act of looking at different sources may lead to a fact previously unknown which leads in turn to another knowledge insight. Software alone cannot replicate these mental triggers.
LLMs, like stacked clay tablets, provide challenges and utility. However, search and retrieval remains a work in progress. LLMs, like semantic ad matching or using one’s search history as a context clue, are helpful. But opportunities for innovation exist. My view is that the grousing about LLM limitations is little more than a recognition that converting a human concept or information need to an “answer” is a work in progress. The difference is that today billions of dollars have been pumped into smart software in the hope that information retrieval is solved.
Sorry, it is not. Therefore, the stakes of realizing that the golden goose may not lay enough eggs to pay off the cost of the goose itself are high. Twenty years ago search and retrieval was not a sector consuming billions of dollars in the span of a couple of years. That’s what is making people nervous about LLMs. Watching Delphi or Entopia fail was expensive, but the scale of the financial loss and the emotional cost of LLM failure are a different kettle of fish.
Oh, and those five “fixes” in the bullet points from the write up? None will solve the problem of search and retrieval.
Stephen E Arnold, August 16, 2024
Publication Founded by a Googler Cheers for Google AI Search
June 5, 2024
This essay is the work of a dinobaby. Unlike some folks, no smart software improved my native ineptness.
To understand the “rah rah” portion of this article, you need to know the backstory behind Search Engine Land, a news site about search and other technology. It was founded by Danny Sullivan, who pushed the SEO bandwagon. He did this because he was angling for a job at Google. He succeeded, and now he’s the point person for SEO.
Another press release touting the popularity of Google search dropped: “Google CEO Says AI Overviews Are Increasing Search Usage.” The author, Danny Goodwin, remains skeptical about Google’s spiked popularity due to AI, despite the bias of Search Engine Land’s founder.
During the Q1 2024 Alphabet earnings call, Google/Alphabet CEO Sundar Pichai said that the search engine’s generative AI has been used for billions of queries and there are plans to develop the feature further. Pichai said positive things about AI, including that it increased user engagement, that it could answer more complex questions, and that there will be opportunities for monetization.
Goodwin wrote:
“All signs continue to indicate that Google is continuing its slow evolution toward a Search Generative Experience. I’m skeptical about user satisfaction increasing, considering what an unimpressive product AI overviews and SGE continues to be. But I’m not the average Google user – and this was an earnings call, where Pichai has mastered the art of using a lot of words to say a whole lot of nothing.”
AI is the next evolution of search and Google is heading the parade, but the technology still has tons of bugs. Who founded the publication? A Googler. Of course there is no interaction between the online ad outfit and an SEO mouthpiece. Uh-uh. No way.
Whitney Grace, June 5, 2024