Does Google Have Search Fear?

May 16, 2022

I can hear the Googlers at a search engine optimization conference saying this:

Our recent investments in search are designed to provide a better experience for our users. Our engineers are always seeking interesting, new, and useful ways to make the world’s information more accessible.

What these code words mean to me is:

Yep, the ancient Larry and Sergey thing. Not working. Oh, my goodness. What are we going to do? Buy Neeva, Kagi, Seekr, and Wecript? Let’s let Alphabet invest and we can learn and maybe earn before more people figure out our results are not as good as Bing and DuckDuckGo’s.

Even Slashdot is running items which make clear that Google and search do not warrant the title of “search giant.”

image

Source: Slashdot at https://bit.ly/3PkBOGt

I crafted this imaginary dialog when I read “This Germany-based AI Startup is Developing the Next Enterprise Search Engine Fueled by NLP and Open-Source.” That write up said:

Deepset, a German startup, is working to add to Natural Language Processing by integrating a language awareness layer into the business tech stack, allowing users to access and interact with data using language. Its flagship product, Haystack, is an open-source NLP framework that enables developers to create pipelines for a variety of search use-cases.

But here’s the snappy part of the article:

The Haystack-based NLP is typically implemented over a text database like Elasticsearch or Amazon’s OpenSearch branch and then connects directly with the end-user application through a REST API. It already has thousands of users and over 100 contributors. It uses transformer models to let developers create a variety of applications, such as production-ready question answering (QA), semantic document search, and summarization. The company has also introduced Deepset Cloud, an end-to-end platform for integrating customized and high-performing NLP-powered search systems into your application.
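The quoted passage describes a retrieve-then-read arrangement: a text store returns candidate documents, and a model picks the answer out of them. Here is a minimal sketch of that pattern in plain Python. It does not use Haystack or Elasticsearch; the corpus `DOCS` and the functions `retrieve`, `read`, and `answer` are hypothetical names invented for illustration, and simple term overlap stands in for a transformer model.

```python
import re
from collections import Counter

# Hypothetical three-document corpus standing in for a text store.
DOCS = {
    "doc1": "Haystack is an open-source NLP framework. It builds search pipelines.",
    "doc2": "Elasticsearch stores text documents. It supports keyword queries.",
    "doc3": "Deepset is based in Berlin. The company raised Series A funding.",
}


def tokens(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, docs, top_k=2):
    """Crude keyword retriever: rank documents by query-term occurrences."""
    query_terms = set(tokens(query))
    scored = []
    for doc_id, text in docs.items():
        counts = Counter(tokens(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties broken by document id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


def read(query, doc_ids, docs):
    """Stand-in 'reader': return the sentence sharing the most query terms."""
    query_terms = set(tokens(query))
    best = (0, None, None)
    for doc_id in doc_ids:
        for sentence in re.split(r"(?<=[.!?])\s+", docs[doc_id]):
            overlap = len(query_terms & set(tokens(sentence)))
            if overlap > best[0]:
                best = (overlap, doc_id, sentence)
    return best[1], best[2]


def answer(query, docs=DOCS):
    """Run the two stages in order: retrieve candidates, then read them."""
    return read(query, retrieve(query, docs), docs)
```

Calling `answer("Where is Deepset based?")` walks both stages. A production system would swap the overlap scoring for a trained model and the dictionary for a real index, which is where the cost and the bottlenecks tend to live.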

In theory, this is an open source, cloud centric super app, a meta play, a roll up of what’s needed to make finding information sort of work.

The kicker in the story is this statement:

The Berlin-based company has raised $14M in Series A funding led by GV, Alphabet’s venture capital arm.

Yep, the Google is investing. Why? Check that which applies:

( ) Its own innovation engines are the equivalent of a Ford Pinto racing a Tesla Model S Plaid? Google search is no longer the world’s largest Web site?

( ) Amazon gets more product searches than Google does?

( ) Users are starting to complain about how Google ignores what users key in the search box?

( ) Large sites are not being spidered in a comprehensive or timely manner?

( ) All of the above.

Stephen E Arnold, May 16, 2022

Kyndi: Advanced Search Technology with Quanton Methods. Yes, Quanton

April 29, 2022

One of my newsfeeds spit out this story: “Kyndi Unveils the Kyndi Natural Language Search Solution – Enables Enterprises to Discover and Deliver the Most Relevant and Precise Contextual Business Information at Unprecedented Speed.” The Kyndi founders appear to be business oriented, not engineering focused. The use of jargon like natural language understanding, contextual information, artificial intelligence, software robots, explainable artificial intelligence, and others is now almost automatic as if generated by smart software, not people who have struggled to make content processing and information retrieval work for users.

The firm’s Web site does not provide much detail about the technical plumbing for the company’s search and retrieval system. I took a quick look at the firm’s patents and noted these. I have added bold face to highlight some of the interesting words in these documents.

  • A method using Birkhoff polytopes and Landau numbers. See US11205135 “Quanton [sic] Representation for Emulating Quantum-like Computation on Classical Processors,” granted December 21, 2021. Inventor: Arun Majumdar, possibly in Alexandria, Virginia.
  • A method employing combinatorial hyper maps. See US10985775 “System and Method of Combinatorial Hypermap Based Data Representations and Operations,” granted April 20, 2021. Inventor: Arun Majumdar, possibly in Alexandria, Virginia. (As a point of interest, the document includes the word bijectively.)
  • A method making use of Q-Medoids and Q-Hashing. See US10747740 “Cognitive Memory Graph Indexing, Storage and Retrieval,” granted August 18, 2020. Inventor: Arun Majumdar, possibly in San Mateo, California.
  • A method using Semantic Boundary Indices and a variant of the VivoMind* Analogy Engine. See US10387784 “Technical and Semantic Signal Processing in Large, Unstructured Data Fields,” granted August 20, 2019. Inventor: Arun Majumdar, possibly in Alexandria, Virginia. *VivoMind was a company started by Arun Majumdar prior to his relationship with Kyndi.
  • A method using Rvachev functions and transfinite interpolations. See US10372724 “Relativistic Concept Measuring System for Data Clustering,” granted August 6, 2019. Inventor: Arun Majumdar, possibly in Alexandria, Virginia.
  • A method using Clifford algebra. See US10120933 “Weighted Subsymbolic Data Encoding,” granted November 6, 2018. Inventor: Arun Majumdar, possibly in Alexandria, Virginia.

The inventor is not listed on the firm’s Web site. Mr. Majumdar’s contributions are significant. The chief technology officer is Dan Gartung, who is a programmer and entrepreneur. However, there does not seem to be an observable link among the founders, the current CTO, and Mr. Majumdar.

The company will have to work hard to capture mindshare from companies like Algolia (now working to reinvent enterprise search), Mindbreeze, Yext, and X1 (morphing into an eDiscovery system it seems), among others. Kyndi has absorbed more than $20 million in venture funding, but a competitor like Lucidworks has captured in the neighborhood of $200 million.

It is worth noting that one facet of the firm’s marketing is to hire the whiz kids from a couple of mid tier consulting firms to explain the firm’s approach to search. It might be a good idea for the analysts from these firms to read the Kyndi patents and determine how the VivoMind methods have been updated and applied to the Kyndi product. A bit of benchmarking might be helpful. For example, my team uses a collection of Google patents and indexes them, runs test queries, and analyzes the result sets. Almost incomprehensible specialist terminology is one thing, but solid, methodical analysis of a system’s real life performance is another. Precision and recall scores remain helpful, particularly for certain content; for example, pharma research, engineered materials, and nuclear physics.

Stephen E Arnold, April 29, 2022

Web Search Alternatives Compete with Gusto

April 22, 2022

Search and information blog DKB shares a roundup of interesting search systems in, “The Next Google.” Are we confident any of these will be the next Google? Nope. But there are several our readers might find useful. While relatively popular Google alternatives like DuckDuckGo and Bing are based on the Google model, the apps on this list take their own paths. The write-up tells us:

“The next Google can’t just be an input box that spits out links. We need new thinking to create something much better than what came before. In the last few years, different groups of people came to the same conclusion, and started working on the next generation of search engines. For this new generation, privacy is necessary, and invasive ads are not an option. But that’s where the commonalities end. Beyond that, they’ve all taken the idea of a search engine in very different directions. … This new wave of search engines is only just getting started. Many of them have only recently launched. Even if they aren’t perfect yet, the paths they’re exploring can lead to promising new innovation in the stagnant search space.”

First is Kagi, which emphasizes customization. Users decide how they want information presented and can refine the sources the search taps into. Then there is Neeva, which takes searches beyond the web and into one’s personal resources, like email and a wide array of online file storage systems. You.com tries to match each query with the source most relevant to the type of question, while Andi takes a little time to pinpoint the best answer and deliver it with the feel of a real conversation. Finally, Brave Search boasts its own independent index that does not rely on Google or Bing for results, an unusual achievement indeed. See the write-up for more information on each of these systems. No, Google is not going to be replaced across the Web any time soon. But some readers may find an option here that could replace it in their own browsers, at least some of the time.

Cynthia Murrell, April 22, 2022

Nuclia: The Solution to the Enterprise Search Problem?

April 21, 2022

I read an interesting article called “Spanish Startup Nuclia Gets $5.4M to Advance Unstructured Data Search.” The article includes an illustration, presumably provided by Nuclia, which depicts search as a super app accessed via APIs.

image

Source: Silicon Angle and possibly Nuclia.com. Consult the linked story to see the red lines zip around without bottlenecks. (What? Bottlenecks in content processing, index updating, and query processing. Who ever heard of such a thing?)

Here are some of the highlights — assertions is probably a better word — about the Nuclia technology:

  • The system is “AI powered.”
  • Nuclia can “connect to any data source and automatically index its content regardless of what format or even language it is in.”
  • The system can “discover semantic results, specific paragraphs in text and relationships between data. These capabilities can be integrated in any application with ease.”
  • Nuclia can “detect images within unstructured datasets.”
  • The cloud-based service can “say one video is X% similar to another one, and so on.”

What makes the Nuclia approach tick? There are two main components:

  • The Nuclia vector database which is available via GitHub
  • The application programming interface.
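Both the “X% similar” claim and the vector database rest on one operation: comparing embedding vectors. A minimal sketch of that comparison follows, using cosine similarity. The function names and the toy two-dimensional vectors are assumptions for illustration; nothing here reflects how Nuclia actually computes or stores its embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as similar to nothing.
    return dot / (norm_a * norm_b)


def percent_similar(a, b):
    """Express cosine similarity as a percentage, rounded to one decimal."""
    return round(100 * cosine_similarity(a, b), 1)


def most_similar(query_vector, index):
    """Rank the items in a {name: vector} index by similarity to the query."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked]
```

With a hypothetical index such as `{"a": [1, 0], "b": [0, 1], "c": [1, 1]}`, `most_similar([1, 0.1], index)` ranks `a` first. The arithmetic is the easy part. The quality of the result depends entirely on the model that produced the vectors, which the article does not describe.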

The news hook for the search story is that investors have input $5.4 million in seed funding to the company.

Algolia wants to reinvent search. Maybe Nuclia has? Google is search, but it may be intrigued with the assertions about vector embeddings and finding similarities which may be otherwise overlooked. The idea is that the ad for Liberty Mutual might be displayed in YouTube videos about seized yachts by business wizards on one or more lists of interesting individuals. Elastic may want to poke around Nuclia in a quest for adding some new functionality to its search system.

Enterprise search seems to be slightly less dormant than it has been.

Stephen E Arnold, April 21, 2022

Google Web Search Quality

April 20, 2022

The cat is out of the bag. The Reddit thread “Does Anyone Else Think Google Search Quality Has Gone Downhill Fast?” provides an interesting series of comments about “quality.”

The notion of “search quality” in the good old days involved gathering a corpus of text. The text was indexed using a system; for example, Smart or maybe Personal Bibliographic software. Test queries would be created in order to determine how the system displayed search results. The research minded person would then examine the corpus and determine if the result set returned the best matches. There are tricks those skilled in the art could use to make the test queries perform. One would calculate precision and recall. Bingo metrics. Now here’s the good part. Another search system would be used to index the content; for example, something interesting like the “old” Sagemaker, the mainframe fave IBM STAIRS III, or Excalibur. The performance of the second system would be compared to the first system. One would do this over time and generate precision and recall scores which could be compared. We used to use a corpus of Google patents, and I remember that Perfect Search (remember that one, gentle reader) outperformed a number of higher profile and allegedly more advanced systems.
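For readers who have not run this kind of test, here is a minimal sketch of the arithmetic. Precision is the share of returned documents that are relevant; recall is the share of relevant documents that were returned. The function names, document ids, and the two toy “systems” in the usage note are hypothetical; a real test needs human relevance judgments for each query.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: iterable of document ids the system returned
    relevant:  iterable of document ids judged relevant for the query
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def evaluate(system_results, judgments):
    """Average precision and recall across all judged test queries."""
    scores = [
        precision_recall(system_results.get(query, []), relevant)
        for query, relevant in judgments.items()
    ]
    if not scores:
        return 0.0, 0.0
    mean_precision = sum(p for p, _ in scores) / len(scores)
    mean_recall = sum(r for _, r in scores) / len(scores)
    return mean_precision, mean_recall
```

To compare two systems, index the same corpus with each, run the same queries, and call `evaluate` on each result set with the same judgments, for example `evaluate({"q1": ["d1", "d2", "d9"]}, {"q1": {"d1", "d2", "d3"}})`. Repeat over time and the numbers show whether a system is improving or drifting.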

I am not sure Reddit posts are into precision and recall, but the responses to the question about degradation of Google search quality are fascinating. Those posting are not too happy with what Google delivers and how the present day Googley search and retrieval system works. Thank you, Prabhakar Raghavan, former search wizard executive at Verity (wow, that was outstanding) and the individual who argued with a Bear Stearns’ managing director and me about how much better Yahoo’s semantic technology was than Google’s. (Raghavan was at Yahooooo then, and we know how wonderful Yahoo search was!)

Here’s a rundown of some of the issues identified in the Reddit thread:

  • From PizzaInteraction: “always laugh when I enter like 4 search terms and all the results focus on just one of the terms.”
  • Healthy-Contest-1605: “Every algorithm is being gamed to have their trash come out in top.”
  • Cl0udSurfer: “the usual tricks like adding quotes around required words, or putting a dash in front of words that should be excluded don’t work anymore.”

Net net: This is the Verity-Yahoo trajectory. Precision and recall? Ho ho ho. What about disclosing when a source was indexed and updated? What about Boolean operators? What about making as much money as possible so one can go to a high school reunion and explain the wonderfulness of one’s cleverness? What happened to Louis Monier, Sanjay Ghemawat, and the Backrub crowd?

Stephen E Arnold, April 20, 2022

Google Responds to Amazon Product Search Growth

April 20, 2022

Here is a new feature for Google Lens which, we suspect, was designed to win back product-search share from Amazon. TechCrunch reveals, “Google’s New ‘Multisearch’ Feature Lets You Search Using Text and Images at the Same Time.” The mobile-app feature, now running as a beta in the US, is available on Android and iOS. As one would expect, it allows one to ask questions or refine search results for a photo or other image. Writer Aisha Malik reports:

“Google told TechCrunch that the new feature currently has the best results for shopping searches, with more use cases to come in the future. With this initial beta launch, you can also do things beyond shopping, but it won’t be perfect for every search. In practice, this is how the new feature could work. Say you found a dress that you like but aren’t a fan of the color it’s available in. You could pull up a photo of the dress and then add the text ‘green’ in your search query to find it in your desired color. In another example, you’re looking for new furniture, but want to make sure it complements your current furniture. You can take a photo of your dining set and add the text ‘coffee table’ in your search query to find a matching table. Or, say you got a new plant and aren’t sure how to properly take care of it. You could take a picture of the plant and add the text ‘care instructions’ in your search to learn more about it.”

Malik notes this feature is great for times when neither an image nor words by themselves produce great Google results—a problem the platform has wrestled with. Lens employs the company’s latest ready-for-prime-time AI tech, but the developers hope to go further and incorporate their budding Multitask Unified Model (MUM). See the write up for more information, including a few screenshots of Lens at work.

Cynthia Murrell, April 20, 2022

DuckDuckGo and Filtering

April 18, 2022

I read “DuckDuckGo Removes Pirate Websites from Search Results: No More YouTube-dl?” The main thrust of the story is:

The private search engine, DuckDuckGo, has decided to remove pirate websites from its official search results.

DuckDuckGo is a metasearch engine. These are systems which may do some focused original spidering, but may send a user’s query to partner indexes. Then the results are presented to the user (which may be a human or a software robot). Some metasearch systems like Vivisimo invested some intellectual cycles in de-duplicating the results. (A helpful rule of thumb is to assume a 50 to 70 percent overlap in results from one Web search system to another.) IBM bought Vivisimo, and I have to admit that I have no idea what happened to the de-duplicating technology because … IBM.
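The merge-and-de-duplicate step can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the round-robin merge is one of many possible strategies, and nothing here reflects how DuckDuckGo or Vivisimo actually handled the problem. The `overlap` function gives a quick way to check the 50 to 70 percent rule of thumb against one’s own result lists.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to a comparison key.

    Drops the scheme, a leading "www.", and any trailing slash so that
    trivially different links to the same page compare equal.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(result_lists):
    """Interleave partner result lists, keeping the first copy of each page."""
    seen = set()
    merged = []
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


def overlap(results_a, results_b):
    """Share of the smaller result set that also appears in the other."""
    keys_a = {normalize(url) for url in results_a}
    keys_b = {normalize(url) for url in results_b}
    if not keys_a or not keys_b:
        return 0.0
    return len(keys_a & keys_b) / min(len(keys_a), len(keys_b))
```

Note what the sketch leaves out: the merged list carries no record of which partner supplied a result, how old it is, or whether it was paid for. That is the provenance gap discussed below.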

There are more advanced metasearch systems. One example is Silobreaker, a system influenced by some Swedish wizards. The difference between a DuckDuckGo and an industrial strength system, in my opinion, is significant. Web search is an opaque service. Many behind-the-scenes actions take place, and some of the most important are not publicly disclosed in a way that makes sense to a person looking for pizza.

My questions: “Is DuckDuckGo actively filtering?” And, “Why did this take so long?” And, “Is DuckDuckGo virtue signaling after its privacy misstep, or is the company snagged in a content marketing bramble?”

I don’t know. My thoughts are:

  1. The editorial policies of metasearch systems should be disclosed; that is, we do this and we do that.
  2. Metasearch systems should disclose that many results are recycled and that the provenance, age, and accuracy of the results are unknown to the metasearch provider.
  3. Metasearch systems should make clear exactly what the benefits of using the metasearch system are and why the providers of some search results are not as beneficial to the user; for example, which result is an ad (explicit or implicit), sponsored, etc.

Will metasearch systems embrace some of these thoughts? Nah. Those who use “free” Web search systems are in a cloud of unknowing.

Stephen E Arnold, April 18, 2022

DuckDuckGo Metasearch Service Causes Quacks

April 12, 2022

DuckDuckGo followed its technology brethren by rescinding most of its services in Russia due to the unfortunate invasion of Ukraine. Gabriel Weinberg, CEO of the unbiased search engine, stated on March 9 that it would down rank Russian Web sites that spread disinformation. Much to the surprise of DuckDuckGo (and many others), the search engine was attacked by right-leaning, pro-free-speech supporters. The privacy search engine unintentionally attracted these supporters but did not discourage them.

Recode via Vox has the entire story: “The Free Speech Search Engine That Never Was.”

Weinberg tweeted his support for user privacy, but conservative supporters who used DuckDuckGo to search for content without Big Tech censorship were angry. They did not like that DuckDuckGo was demoting Russian propaganda Web sites. Oddly enough, these people also supported Putin’s invasion of Ukraine.

Right-wing supporters flocked to DuckDuckGo, because it was supposedly free of the censorship that plagues other search engines like Google. These conservatives believe that information relating to their political and social beliefs was censored in all search engines except DuckDuckGo. These conservative supporters are more of the alt-right, conspiracy theorist type, i.e., anti-vaccination, DC Capitol insurrection. DuckDuckGo was okay with this:

“So DuckDuckGo surely knew what many of its new fans were coming to it for. They leaned into it a bit, too. Weinberg told Fox News and Quartz that Google’s search results were biased because Google collects data on users, which it then uses to target results to them. That, he said, created filter bubbles that further polarized society. Because DuckDuckGo didn’t collect data, its results were unbiased and searchers were free from Google’s echo chamber. This was a bit of a dodge; conservatives accused Google of intentionally keeping conservative sites and content off of its results, not just returning results influenced by a searchers’ interests. But it was an answer that seemed to satisfy users of all political persuasions.”

The alt-right wingers do not like the down ranking, but DuckDuckGo explained that it is doing what search engines should be doing: “ensure that users were getting the best results for their searches.”

DuckDuckGo is one of many platforms that the alt-right adopted: Rumble, MeWe, Telegram, Substack. These platforms did not shy away from the users, because it meant more investments. We like the idea of a metasearch service protecting user privacy and, as a byproduct, demoting false propaganda. Now how about better results?

Whitney Grace, April 12, 2022

Agolo: Making Government Sales and Landing VC Money

April 4, 2022

Apple, Google, and Microsoft might be search experts, but they continue to get things wrong. These are big companies, so they cannot solely concentrate on the search function like a dedicated company. TechCrunch explains how Agolo specializes in search and improves upon what the tech giants do: “Agolo Summary-Powered Search Brings In Government Work And Fresh Funding.”

Agolo developed a powerful summary search engine that dissects articles and presents users with shorter versions that preserve key points. Agolo identified two types of search tools: dumb and smart. Dumb engines are not good at locating context or extracting data, while the smart functions are decent at finding relevance and ordering items. Both are limited in their capabilities. Agolo designed a smarter search function and described it as this:

“Agolo co-founder and CEO Sage Wohns gave the example of searching for ibuprofen. Any ordinary search engine only understands ibuprofen as a term people generally search for in order to learn more about the medicine, and that’s the way it’s reflected in the index. Even if you deploy that search tech on a domain-specific corpus, like research papers, it doesn’t magically gain better understanding. But a medical researcher searching through pandemic-related papers for ibuprofen already knows what it is — what they need is an ordered presentation of how ibuprofen appears in the literature, what other drugs and effects it is most tightly correlated with, what institutions and authors are associated with studying it.”
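The “most tightly correlated with” idea in the quotation can be illustrated with simple co-occurrence counting. The sketch below is an assumption-laden toy: the four sentences in `PAPERS`, the `DRUGS` vocabulary, and the `cooccurrence` function are invented for illustration, and raw counting is a stand-in, not a description of Agolo’s method.

```python
import re
from collections import Counter

# Hypothetical mini-corpus; each string stands in for one research paper.
PAPERS = [
    "Ibuprofen and acetaminophen were compared for fever reduction.",
    "Patients received ibuprofen or naproxen after surgery.",
    "Acetaminophen dosing in children was reviewed.",
    "Ibuprofen with acetaminophen reduced pain scores.",
]

# Hypothetical controlled vocabulary of drug names to look for.
DRUGS = {"ibuprofen", "acetaminophen", "naproxen"}


def cooccurrence(target, documents, vocabulary):
    """Count how often each vocabulary term shares a document with target.

    Returns (term, count) pairs ordered from most to least frequent.
    """
    counts = Counter()
    for text in documents:
        found = set(re.findall(r"[a-z]+", text.lower())) & vocabulary
        if target in found:
            counts.update(found - {target})
    return counts.most_common()
```

Running `cooccurrence("ibuprofen", PAPERS, DRUGS)` produces the kind of ordered list the quotation describes. Doing this well at scale requires entity recognition, synonym handling, and statistics that correct for how common each term is, none of which appear here.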

Agolo digests terabytes of data and summarizes them in usable knowledge graphs. The summary search tool is capable of handling long documents, and the US federal government uses it. Agolo does not sell an out of the box search solution; instead, it partners with enterprise system designers like Google and Microsoft. This is interesting because in a recent round of funding, Google and Microsoft invested in Agolo:

“The company’s A round was just closed, led by Lytical Ventures, plus returning investors Microsoft M12, Google’s Assistant Investment Group, Tensility Venture Partners, Ridgeline Partners and Thomson Reuters. The company has raised over $18 million in total to date.”

Are Agolo and Kyndi examples of a mini enterprise search renascence? Autonomy, Delphi, Entopia, and Fast Search & Transfer have faded from customers’, investors’, and innovators’ memories.

Whitney Grace, April 4, 2022

Google: Grade A Search Baloney

March 31, 2022

I have been involved in online information for more than 50 years. Yep, folks, that’s more than half a century. Those early days involved using big clunky computers to locate a word in a Latin corpus. Then there were the glory days of commercial online products like Business Dateline, the Health Reference Center, and others. The Internet was a source of online craziness that trumped the wackiness of Ev Brenner and his vision for petrochemical data. Against this richly colored tapestry of marketing fabrications, overpromising and under delivering, and the bizarre fantasies of the “old” Information Industry Association I read “Google Search Is Actually Getting Better at Giving You What You Need.”

The write up channels a marketing person at the Google and mixes the search wizard’s recycling of Google truisms with some pretty crazy assertions about finding information in 2022.

Let’s take a look at three points and then step back and put these online advertising charged assertions in a broader context; namely, the outcomes of a system which is a de facto information monopoly.

Here are the points I noted in the write up:

Big, baby, big.

The first idea is that Google processes a great deal of information. Plus, Google tests to tackle the challenge of “search quality.” By the way, what does “quality” mean? What happens when you combine big with quality? You get really good outputs from the Google system. Just try it. Do a search for pizza via Google on a mobile device. See what you get? Pizza information. Perfect. So big and quality means good. Do you buy that?

The second idea is that Google like little beavers or little Googzillas works to improve quality. The idea is that yesterday’s Google was not bad; it needs improvement. Many improvements mean that quality goes up. Okay, let’s try it. Say you want information about a loss of coolant accident. You know. Chernobyl, Fukushima, et al. Type in loca and you get Shakira’s video. Type in “nuclear loca” and you get links to a loss of coolant accident. Type in site:nrc.gov loca and you get results specific to a loss of coolant incident. Note what’s needed to get Google to produce something about loss of coolant accident. The user must specify a context; otherwise, Google delivers lowest common denominator results. One can use Google Dorks to work around the Shakira problem, but let’s face it, very few people are into Google Dorks. (I include them in my OSINT lecture at the National Cyber Crime Conference in April 2022, but I know from experience that not even trained investigators are into Google Dorks.)
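For those who want to script the context-adding step, here is a minimal sketch that assembles a Dork-style query and the corresponding search URL. The helper names `build_query` and `search_url` are hypothetical. The `site:` prefix, quoted phrases, and the leading dash are long-standing Google query operators, but the sketch only builds the string; whether Google honors every operator on a given day is another matter.

```python
from urllib.parse import urlencode


def build_query(terms, site=None, exact=(), exclude=()):
    """Assemble a query string from plain terms and optional operators.

    site:    restrict results to one domain, e.g. "nrc.gov"
    exact:   phrases to wrap in double quotes
    exclude: words to prefix with a dash
    """
    parts = []
    if site:
        parts.append(f"site:{site}")
    parts.extend(terms)
    parts.extend(f'"{phrase}"' for phrase in exact)
    parts.extend(f"-{word}" for word in exclude)
    return " ".join(parts)


def search_url(query):
    """Return a Google search URL with the query percent-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})
```

For example, `build_query(["loca"], site="nrc.gov")` returns the `site:nrc.gov loca` query used above, and `search_url` turns it into a link that can be pasted into a browser. The point stands: the user, not the system, supplies the context.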

The third idea is that Google is embracing artificial intelligence. That makes sense because there are not enough people to process today’s flows of information in the old fashioned subject matter expert way. One must reduce costs in order to deliver “quality.” Does that seem an unusual pairing of improvements and search results? Think about it, please.

Now let’s step back. Here are some observations I jotted on a 4×6 notecard:

  1. Google uses people looking for online information to generate revenue from ads. That which produces more ad revenue is valued. The “quality” is a repurposing of a useful concept to the need to generate revenue. Shakira is the correct result for the “loca” query. That’s quality.
  2. The notion of testing is interesting. What’s the objective? The answer is generating revenue. Thus, the notion of testing is little more than steering or tuning search results to generate more revenue. The adjustments operate on several levels: Shaping understanding via filtering and producing revenue from search results. Simple, just not exactly what a user of an ad supported system thinks about when running a query for pizza.
  3. Smart software is the number one way for Google to [a] reduce costs; [b] deflect legal challenges to its search result shaping with the statement “The algorithm does it, not a human”; and [c] create the illusion that Google search results are really smart. Use Google and you will be smarter too.

Believe these assertions? You’re the ideal Google user. Have doubts? You are not Googley. Don’t apply for a job at the Google and for heaven’s sake, don’t expect the Google outputs to be objective, just accept that some information is unfindable by design.

Google Dorks exist for a reason. Google has made finding relevant information more difficult than at any time in my professional career. And every year, the Google system becomes more detached from what most people believe fuels Google’s responses to what Google users need.

Yep, need. Sell ads. Reduce costs. Generate feedback into the system from users who have biases. Why are government agencies pushing back on outfits like Google? The quest for quality? Nope. The pushback reflects a growing awareness of disinformation, manipulation, and behavior that stifles options in my opinion.

Stephen E Arnold, March 31, 2022
