Another Plea for Web Search That Sort of Works: Andrew Carnegie, Where Are You?

July 11, 2022

I am not going to do any history. Oh, well. Not really. Does anyone on TikTok know about Andrew Carnegie? Okay, let’s try another angle. How about a semi-rapacious dude with roots in Scotland who wanted to do good. Please, ignore the Carnegie era Monongahela River. The cheerful Mr. Carnegie came up with the idea of a free public library. Looking up information was a useful thing for poor folks and monopolistic steel barons alike. One person sort of fixed the “problem” of information access.

Flash forward to Backrub. Two bright young sprouts realized that a person had a tough time finding relevant information on Lycos and the other search engines available at the “dawn” of the Internet. The fix? Take a little bit of Kleinberg, add a pinch of technology, use available computing resources whether others at Stanford University knew or cared, and mix in continuous feedback to a bundle of mostly automatic rules. More links in, good. Not many links in, meh. Then advertising. Yeah, that worked great for some. For others, ho ho ho.

The result is the weaponized findability environment of good old 2022.

What’s the fix? “Why the World Needs a Non-Profit Search Engine” explains that donors contribute money, and an objective Web search system will return relevant results. The write up states:

Sometimes I forget why I’ve taken on this crazy, huge task. Why am I building a search engine? Will it really be better than Google one day? Will people support it? Will people even use it? And then I read something like The Bullshit Web and I remember, that, yes, there is a point. Even if I make the web better for one person, it’s worth it. Because the way things are is just wrong. Search engines are in a unique position to fix the situation. Not only do we create a view on the world’s knowledge, we influence it too. If we promote bullshit-free sites, then people will create more bullshit-free sites. More importantly, search engines are a filter on the world’s knowledge. Do you really want your filter to be “whatever makes $SEARCH_ENGINE more money”, particularly when that means, “show ads instead of search results, and prioritize search results that also make us more money”? We can and should do better.

I want to point out that what may be required is an Andrew Carnegie type who already has money and a guilty conscience. It is a modern perception that if one can get lots and lots of people to contribute money, one can fund anything.

Nice idea. My response? “Where’s the Andrew Carnegie?”

Why?

Traffic means monetization. Do-gooding is walking on the information highway. One has to speed, and speed is infinitely expensive. Ergo: Monetization lies over the horizon.

Stephen E Arnold, July 11, 2022

Akn Unfindable Search Utillity: Wild Spelling and Naming Idea

July 7, 2022

I like to check out new Web search systems. Most are little more than recycled versions of Dogpile.com, one of the most Abe Lincoln metasearch systems. A metasearch system uses hits from other search systems, possibly adds a bit of Vivisimo-type special sauce, and outputs results and rather crazy marketing materials.

The write up “This Badass Tool Makes Advanced YouTube Searches a Breeze” states:

This tool also allows you to perform advanced search on Google, DuckDuckGo, Twitter, and Reddit.

But the article is over the moon about the utility of the system when searching for content in Newton Minow’s nightmare, YouTube. I learned:

I [the author of the article] think this cool tool is better suited to YouTube.

Let’s try to find the system using its name, ä1. Try plugging the ä1 into Google, and what do you get? I received hits for services wildly unrelated to search and retrieval:

image

What about Bing, the Microsofties’ wonderful, but small, search system:

image

Yep, childhood disease.

What about Yandex? No joy.

image

Let’s search for the ä1 site on the ä1 site. What do we get? Google results and no ä1 search overlay or service.

Net net: Innovators, use names which can be searched. (Not everyone knows how to put the a with acne into a search box. Besides, most search systems discard such silliness as dots, checks, and circumflexes. Intellectual niceties are not part of the plan.) Pain in the a$$, not bad a$$ in my opinion.
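For the curious, here is a minimal sketch of the kind of folding that makes a name like ä1 vanish. It assumes the search system strips combining marks when it normalizes a query; I have not seen the inside of any particular engine, and the function names are mine, not anyone's product.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Drop combining marks (dots, checks, circumflexes) after Unicode decomposition."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_query(text: str) -> str:
    """Hypothetical query normalizer: fold marks, ignore case, trim whitespace."""
    return fold_diacritics(text).casefold().strip()


if __name__ == "__main__":
    # Under this assumption the umlaut is gone before the index is ever consulted.
    print(normalize_query("ä1"))
```

If a system behaves this way, the query for ä1 competes with every page about a1, which is consistent with the unrelated hits shown above.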

If you want to try out the all-in-one “system” yourself, here’s the url: https://ä1.com.

Tip: How about a findable name?

Stephen E Arnold, July 7, 2022

Search the Web: Maybe Find a Nugget or Two for Intrepid Researchers?

June 21, 2022

“A Look at Search Engines with Their Own Indexes” has been updated. The article provides a rundown of systems and services which offer Web search services.

Some of the factoids in the article are ones often overlooked by many of the “search experts” generating information about how to find information via open sources. Here are a few which deserve more attention from students of search:

  1. Bing is the most promiscuous supporter of metasearch
  2. YaCy is included in the “unusable” category; however, it is not. YaCy has some properties of interest to cyber sleuths
  3. Neeva’s index is exposed as a mix of some original crawl content with Bing results. (Where’s the Google love for a former Googler’s search system?)
  4. Qwant is exposed for using Bing data
  5. Exalead, arguably better than Pertimm which influenced Qwant, takes some bullets. But Dassault is into other, more lucrative businesses than “search”
  6. Kagi is a for fee service which uses its own index and, like other metasearch systems, taps results from Bing and Google. (Is Google excited yet?)
  7. The Thunderstone service is noted. (How long has Thunderstone been around? Answer: A long time.)

Worth noting the links. Perhaps someone will create a list of the services indexing content for specialized software applications and government agencies. There are hundreds of “data aggregators” but how does one search them for useful results?

I addressed the findability issue in my recent OSINT lecture for the National Cyber Crime Conference attendees and in a follow up session for the Mass. Asso. of Crime Analysts.

Stephen E Arnold, June 21, 2022

Decentralized Presearch Moves from Testnet to Mainnet

June 15, 2022

Yet another new platform hopes to rival the king of the search-engine hill. We think this is one to watch, though, for its approach to privacy, performance, and scope of indexing. PCMagazine asks, “The Next Google? Decentralized Search Engine ‘Presearch’ Exits Testing Phase.” The switch from its Testnet at Presearch.org to the Mainnet at Presearch.com means the platform’s network of some 64,000 volunteer nodes will be handling many more queries. They expect to process more than five million searches a day at first but are prepared to scale to hundreds of millions. Writer Michael Kan tells us:

“Presearch is trying to rival Google by creating a search engine free of user data collection. To pull this off, the search engine is using volunteer-run computers, known as ‘nodes,’ to aggregate the search results for each query. The nodes then get rewarded with a blockchain-based token for processing the search results. The result is a decentralized, community-run search engine, which is also designed to strip out the user’s private information with each search request. Anyone can also volunteer to turn their home computer or virtual server into a node. In a blog post, Presearch said the transition to the Mainnet promises to make the search engine run more smoothly by tapping more computing power from its volunteer nodes. ‘We now have the ability for node operators to contribute computing resources, be rewarded for their contributions, and have the network automatically distribute those resources to the locations and tasks that require processing,’ the company said.”

The blog post referenced above compares this decentralized approach to traditional search-engine infrastructure. An interesting Presearch feature is the row of alternative search options. One can perform a straightforward search in the familiar query box or click a button to directly search sources like DuckDuckGo, YouTube, Twitter, and, yes, Google. Reflecting its blockchain connection, the page also supplies buttons to search Etherscan, CoinGecko, and CoinMarketCap for related topics. Presearch gained 3.8 million registered users between its Testnet launch in October 2020 and the shift to its Mainnet. We are curious to see how fast it will grow from here.

Cynthia Murrell, June 15, 2022

Are There Google Wolves in Stealth Privacy Clothing?

June 8, 2022

A growing number of search engines are cropping up that purport to protect one’s privacy. Lukol is one of these. A brief entry at The New Leaf Journal questions that site’s privacy promises in, “Lukol Search Engine Shows Up in Logs.” New Leaf editor Nicholas A Ferrell noticed a paradox: though Lukol bills itself as an “anonymous search engine,” it is also “powered by Google Search.” Further investigation revealed this paragraph in the site’s privacy policy:

“We use cookies to personalise content and ads, and to analyse our traffic. We also share information about your use of our site with our advertising and analytics partners who may combine it with other information you’ve provided to them or they’ve collected from your use of their services. If you wish to opt out of Google cookies you may do so by visiting the Google privacy policy page.”

It seems the word “privacy” does not mean what Lukol thinks it means. Ferrell comments:

“So this anonymous search engine stores cookies on your computer to serve you with personalized ‘content and ads’ and it shares information about your use of the site with ‘advertising and analytics partners.’ It then directs you to Google’s privacy policy page for information about how to opt out of Google cookies. While I struggle to see how Lukol is privacy-friendly (much less anonymous), it is a great example for why it is important to look behind catchy promises about privacy and anonymity.”

Agreed. Lukol is basically Google Search with some added manipulations. None of which appear to protect user privacy. Let the searcher beware.

Cynthia Murrell, June 8, 2022

DuckDuckGo: A Duck May Be Plucked

May 25, 2022

Metasearch engines are not understood by most Internet users. Here’s my simplified take: A company thinks it can add value to the results output from an ad-supported search engine. Maybe the search engine is a for-fee outfit? Either way, the metasearch system gets the okay to send queries and get results. The results stream back to the metasearch outfit and the value-adding takes place.

One of the better metasearch systems was the pre-IBM Vivisimo. This outfit sent out queries to an ad-supported search engine, accepted the results, and then clustered them. The results appeared to the Vivisimo user as a results list with some folders in a panel. The idea was that the user could scan the folders and the results list. The user could decide to click on a folder and see what results it contained or just click on a link. The magic, as I understood it, was that the clustering took place in near real time. Plus, the query on the original Vivisimo pre-IBM system could send the user’s query to multiple Web search engines. The results from each search system would be de-duplicated. An interesting factoid from the 2000s is that search systems returned overlapping results 70 percent or more of the time. Dumping the duplicates was helpful. There were other interesting metasearch systems as well, but I am just using Vivisimo as an example of a pretty good one.
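The de-duplication step can be sketched in a few lines. This is my illustration, not Vivisimo's method: it assumes each engine hands back a ranked list of URLs, that two URLs are "the same" when host and path match after trivial cleanup, and that a round-robin merge is good enough. The function names and the sample URLs are made up.

```python
from urllib.parse import urlsplit


def canonical_url(url: str) -> str:
    """Comparison key: lowercase host without 'www.', path without trailing slash, plus query."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(result_lists):
    """Interleave ranked lists from several engines, keeping the first copy of each URL."""
    seen = set()
    merged = []
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = canonical_url(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


def overlap_ratio(a, b):
    """Share of list a's URLs that also show up in list b."""
    if not a:
        return 0.0
    keys_b = {canonical_url(url) for url in b}
    return sum(canonical_url(url) in keys_b for url in a) / len(a)


if __name__ == "__main__":
    engine_a = ["https://www.example.com/a/", "https://example.org/b"]
    engine_b = ["http://example.com/a", "https://example.net/c"]
    print(merge_results([engine_a, engine_b]))
    print(overlap_ratio(engine_a, engine_b))
```

The clustering into folders is a separate and harder job; nothing here attempts it.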

Privacy, like security, is a tricky concept to explain.

Using privacy to sell a free Web search system raises a number of questions; for example:

  1. What’s privacy in the specific context of the metasearch engine mean?
  2. Where is the money coming from to keep the lights on at the metasearch outfit?
  3. What about log files?
  4. What about legal orders to reveal data about users?
  5. What’s the quid pro quo with the search engine or engines whose results the metasearch system uses?
  6. What part of the search chain captures data, inserts trackers, bugs, cookies, etc. into the user’s query?

None of these questions catch the attention of the real news folks nor do most users know what the questions require to answer. The metasearch engines typically do not become chatty Cathies when someone like me shows up to gather information about metasearch systems. I recall the nervousness of the New York City wizard who cooked up Ixquick and the evasiveness of the owner of the Millionshort services.

Now we come to the notion that a duck can be plucked. My hunch is that plucking a duck is a messy affair for both duck and duck plucker.

“DuckDuckGo Browser Allows Microsoft Trackers Due to Search Agreement” presents information which appears to suggest that the “privacy” oriented DuckDuckGo metasearch system is not so private as some believed. The cited article states:

The privacy-focused DuckDuckGo browser purposely allows Microsoft trackers on third-party sites due to an agreement in their syndicated search content contract between the two companies.

You can read the cited article to get more insight into the assertion that DuckDuckGo has been plucked in the feathered hole of privacy.

Am I surprised? No. Search is without a doubt one of the most remarkable business segments for soft fraud. How do I know? My partners and I created The Point in 1994, and even though you don’t remember it, I sure remember what I learned about finding information online. Lycos (CMGI) bought our curated search business, and I wrote several books about search. You know what? No one wants to think about search and soft fraud. Maybe more people should?

Net net: Free comes at a cost. One does not know what one does not know.

Stephen E Arnold, May 25, 2022

Does Google Have Search Fear?

May 16, 2022

I can hear the Googlers at a search engine optimization conference saying this:

Our recent investments in search are designed to provide a better experience for our users. Our engineers are always seeking interesting, new, and useful ways to make the world’s information more accessible.

What these code words mean to me is:

Yep, the ancient Larry and Sergey thing. Not working. Oh, my goodness. What are we going to do? Buy Neeva, Kagi, Seekr, and Wecript? Let’s let Alphabet invest and we can learn and maybe earn before more people figure out our results are not as good as Bing and DuckDuckGo’s.

Even Slashdot is running items which make clear that Google and search do not warrant the title of “search giant.”

image

Source: Slashdot at https://bit.ly/3PkBOGt

I crafted this imaginary dialog when I read “This Germany-based AI Startup is Developing the Next Enterprise Search Engine Fueled by NLP and Open-Source.” That write up said:

Deepset, a German startup, is working to add to Natural Language Processing by integrating a language awareness layer into the business tech stack, allowing users to access and interact with data using language. Its flagship product, Haystack, is an open-source NLP framework that enables developers to create pipelines for a variety of search use-cases.

But here’s the snappy part of the article:

The Haystack-based NLP is typically implemented over a text database like Elasticsearch or Amazon’s OpenSearch branch and then connects directly with the end-user application through a REST API. It already has thousands of users and over 100 contributors. It uses transformer models to let developers create a variety of applications, such as production-ready question answering (QA), semantic document search, and summarization. The company has also introduced Deepset Cloud, an end-to-end platform for integrating customized and high-performing NLP-powered search systems into your application.

In theory, this is an open source, cloud centric super app, a meta play, a roll up of what’s needed to make finding information sort of work.

The kicker in the story is this statement:

The Berlin-based company has raised $14M in Series A funding led by GV, Alphabet’s venture capital arm.

Yep, the Google is investing. Why? Check that which applies:

( ) Its own innovation engines are the equivalent of a Ford Pinto racing a Tesla Model S Plaid? Google search is no longer the world’s largest Web site?

( ) Amazon gets more product searches than Google does?

( ) Users are starting to complain about how Google ignores what users key in the search box?

( ) Large sites are not being spidered in a comprehensive or timely manner?

( ) All of the above.

Stephen E Arnold, May 16, 2022

Kyndi: Advanced Search Technology with Quanton Methods. Yes, Quanton

April 29, 2022

One of my newsfeeds spit out this story: “Kyndi Unveils the Kyndi Natural Language Search Solution – Enables Enterprises to Discover and Deliver the Most Relevant and Precise Contextual Business Information at Unprecedented Speed.” The Kyndi founders appear to be business oriented, not engineering focused. The use of jargon like natural language understanding, contextual information, artificial intelligence, software robots, explainable artificial intelligence, and others is now almost automatic as if generated by smart software, not people who have struggled to make content processing and information retrieval work for users.

The firm’s Web site does not provide much detail about the technical plumbing for the company’s search and retrieval system. I took a quick look at the firm’s patents and noted these. I have added bold face to highlight some of the interesting words in these documents.

  • A method using Birkhoff polytopes and Landau numbers. See US11205135 “Quanton [sic] Representation for Emulating Quantum-like Computation on Classical Processors,” granted December 21, 2021. Inventor: Arun Majumdar, possibly in Alexandria, Virginia.
  • A method employing combinatorial hyper maps. See US10985775 “System and Method of Combinatorial Hypermap Based Data Representations and Operations,” granted April 20, 2021. Inventor: Arun Majumdar, possibly in Alexandria, Virginia. (As a point of interest the document includes the word bijectively.)
  • A method making use of Q-Medoids and Q-Hashing. See US10747740 “Cognitive Memory Graph Indexing, Storage and Retrieval,” granted August 18, 2020. Inventor: Arun Majumdar, possibly in San Mateo, California.
  • A method using Semantic Boundary Indices and a variant of the VivoMind* Analogy Engine. See US10387784 “Technical and Semantic Signal Processing in Large, Unstructured Data Fields,” granted August 20, 2019. Inventor: Arun Majumdar, possibly in Alexandria, Virginia. *VivoMind was a company started by Arun Majumdar prior to his relationship with Kyndi.
  • A method using Rvachev functions and transfinite interpolations. See US10372724 “Relativistic Concept Measuring System for Data Clustering,” granted August 6, 2019. Inventor: Arun Majumdar, possibly in Alexandria, Virginia.
  • A method using Clifford algebra. See US10120933 “Weighted Subsymbolic Data Encoding,” granted November 6, 2018. Inventor: Arun Majumdar, possibly in Alexandria, Virginia.

The inventor is not listed on the firm’s Web site. Mr. Majumdar’s contributions are significant. The chief technology officer is Dan Gartung, who is a programmer and entrepreneur. However, there does not seem to be an observable link among the founders, the current CTO, and Mr. Majumdar.

The company will have to work hard to capture mindshare from companies like Algolia (now working to reinvent enterprise search), Mindbreeze, Yext, and X1 (morphing into an eDiscovery system it seems), among others. Kyndi has absorbed more than $20 million in venture funding, but a competitor like Lucidworks has captured in the neighborhood of $200 million.

It is worth noting that one facet of the firm’s marketing is to hire the whiz kids from a couple of mid tier consulting firms to explain the firm’s approach to search. It might be a good idea for the analysts from these firms to read the Kyndi patents and determine how the VivoMind methods have been updated and applied to the Kyndi product. A bit of benchmarking might be helpful. For example, my team uses a collection of Google patents and indexes them, runs test queries, and analyzes the result sets. Almost incomprehensible specialist terminology is one thing, but solid, methodical analysis of a system’s real life performance is another. Precision and recall scores remain helpful, particularly for certain content; for example, pharma research, engineered materials, and nuclear physics.
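For analysts who have not computed the scores by hand, here is a minimal sketch. It assumes someone has already judged which documents in the test collection are relevant to a query; the document identifiers below are invented for illustration and are not from my team's test set.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical result set and relevance judgments for a single test query.
    retrieved = ["p1", "p2", "p3", "p4"]
    relevant = ["p2", "p4", "p7"]
    precision, recall = precision_recall(retrieved, relevant)
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

Run the same queries against two systems, compare the numbers, and the jargon matters a good deal less.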

Stephen E Arnold, April 29, 2022

Web Search Alternatives Compete with Gusto

April 22, 2022

Search and information blog DKB shares a roundup of interesting search systems in, “The Next Google.” Are we confident any of these will be the next Google? Nope. But there are several our readers might find useful. While relatively popular Google alternatives like DuckDuckGo and Bing are based on the Google model, the apps on this list take their own paths. The write-up tells us:

“The next Google can’t just be an input box that spits out links. We need new thinking to create something much better than what came before. In the last few years, different groups of people came to the same conclusion, and started working on the next generation of search engines. For this new generation, privacy is necessary, and invasive ads are not an option. But that’s where the commonalities end. Beyond that, they’ve all taken the idea of a search engine in very different directions. … This new wave of search engines is only just getting started. Many of them have only recently launched. Even if they aren’t perfect yet, the paths they’re exploring can lead to promising new innovation in the stagnant search space.”

First is Kagi, which emphasizes customization. Users decide how they want information presented and can refine the sources the search taps into. Then there is Neeva, which takes searches beyond the web and into one’s personal resources, like email and a wide array of online file storage systems. You.com tries to match each query with the source most relevant to the type of question, while Andi takes a little time to pinpoint the best answer and deliver it with the feel of a real conversation. Finally, Brave Search boasts its own independent index that does not rely on Google or Bing for results, an unusual achievement indeed. See the write-up for more information on each of these systems. No, Google is not going to be replaced across the Web any time soon. But some readers may find an option here that could replace it in their own browsers, at least some of the time.

Cynthia Murrell, April 22, 2022

Nuclia: The Solution to the Enterprise Search Problem?

April 21, 2022

I read an interesting article called “Spanish Startup Nuclia Gets $5.4M to Advance Unstructured Data Search.” The article includes an illustration, presumably provided by Nuclia, which depicts search as a super app accessed via APIs.

image

Source: Silicon Angle and possibly Nuclia.com. Consult the linked story to see the red lines zip around without bottlenecks. (What? Bottlenecks in content processing, index updating, and query processing. Who ever heard of such a thing?)

Here are some of the highlights — assertions is probably a better word — about the Nuclia technology:

  • The system is “AI powered.”
  • Nuclia can “connect to any data source and automatically index its content regardless of what format or even language it is in.”
  • The system can “discover semantic results, specific paragraphs in text and relationships between data. These capabilities can be integrated in any application with ease.”
  • Nuclia can “detect images within unstructured datasets.”
  • The cloud-based service can “say one video is X% similar to another one, and so on.”

What makes the Nuclia approach tick? There are two main components:

  • The Nuclia vector database which is available via GitHub
  • The application programming interface.

The news hook for the search story is that investors have input $5.4 million in seed funding to the company.

Algolia wants to reinvent search. Maybe Nuclia has? Google is search, but it may be intrigued with the assertions about vector embeddings and finding similarities which may be otherwise overlooked. The idea is that the ad for Liberty Mutual might be displayed in YouTube videos about seized yachts by business wizards on one or more lists of interesting individuals. Elastic may want to poke around Nuclia in a quest for adding some new functionality to its search system.
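What might “X% similar” mean in practice? One common reading, and it is only my assumption about what sits behind the claim, is cosine similarity between two embedding vectors. The sketch below uses toy three-number vectors; real embeddings run to hundreds of dimensions, and nothing here comes from Nuclia's code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def percent_similar(a, b):
    """Express cosine similarity as a percentage rounded to one decimal place."""
    return round(100 * cosine_similarity(a, b), 1)


if __name__ == "__main__":
    # Toy stand-ins for the embeddings of two videos.
    video_one = [0.2, 0.7, 0.1]
    video_two = [0.25, 0.6, 0.2]
    print(percent_similar(video_one, video_two))
```

A vector database earns its keep by finding the nearest vectors among millions without comparing every pair, which is where the bottlenecks tend to appear.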

Enterprise search seems to be slightly less dormant than it has been.

Stephen E Arnold, April 21, 2022
