DuckDuckWent: Can a Search System Float in the Same Content Stream Again?

March 11, 2022

I read “DuckDuckGo Ends Neutrality, Will Down-Rank Sites Associated with Russian Disinformation.” Recognizing disinformation can be tricky. Using the word Russian may make the job easier.

I am not going to get into a philosophical discussion.

For me the important point of DuckDuckGo’s decision to have an editorial policy (often called censorship) is captured in this passage from the source document:

A change in direction.

I would like to see DuckDuckGo be upfront about:

  1. The source of its search index
  2. The number of content objects compared to the indexes of Swisscows, Google, and Brave Search
  3. How deduplication works

Responding to Russia is a waddle, but more steps are needed. Waddle along, DuckDuck, please.

Stephen E Arnold, March 11, 2022

Yandex: Is It Time to Say Hello, Goodbye?

March 9, 2022

For about 80 to 90 percent of the people in North America and Western Europe, “search” means Googzilla’s service. Is it useful? Legions will say, “Google’s search service is the bestest ever.” Others are more comfortable running queries on Exalead Search, Swisscows, and one of the new kids on the block like Kagi or Wecript, among others.

My personal plan of attack, as I shared with the founder of Kagi, is to run specific queries across a group of selected search engines. (Sorry, I don’t provide those in this unloved and mostly ignored free blog. However, if you attend my 2022 National Cyber Crime Conference lecture on finding information, you will get a list of about 500 useful search/content services.)

Why am I talking about “free” or ad-supported Web search? Three reasons:

  1. Today’s search “experts” don’t pay much attention to the lack of overlap in results. Hey, reading pages of results and cross checking them is too annoying. Doing the TikTok thing is the way to go.
  2. Web search engines do not disclose what I call the “editorial policy.” How often does Googzilla update results eight links deep on the Department of Energy’s public facing Web site? Or, where does DuckDuckGo get its results? Or, why doesn’t IxQuick/StartPage disclose which search systems generate its results? Or why are Gigablast results for images not really images? If one discloses an editorial policy, then the shallowness, freshness, and bias of the spidering mechanisms are disclosed. Who wants that? Certainly not the Web search outfits.
  3. Serious or professional Web search systems charge money and deliver high value results simply not obtainable via free Web search systems. Why don’t these outfits market to the users of free Web search systems? These outfits don’t want to end up in an RV at the Israel River Campground in the White Mountains. A low profile is a prudent profile.
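The cross checking mentioned in the first point can be made less annoying with a few lines of code. This is a minimal sketch, assuming one has already collected the top results for the same query from several engines; the engine names and URL strings are hypothetical placeholders, not real output. It computes pairwise Jaccard overlap: the share of URLs two result lists have in common.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of URLs two result lists have in common (intersection over union)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty lists are treated as identical
    return len(a & b) / len(a | b)


def overlap_report(results_by_engine):
    """Pairwise Jaccard overlap for every pair of engines."""
    return {
        (e1, e2): jaccard(results_by_engine[e1], results_by_engine[e2])
        for e1, e2 in combinations(sorted(results_by_engine), 2)
    }


# Hypothetical top-5 result lists for one query; not real engine output.
results = {
    "engine_a": ["u1", "u2", "u3", "u4", "u5"],
    "engine_b": ["u1", "u2", "u6", "u7", "u8"],
    "engine_c": ["u9", "u10", "u11", "u12", "u13"],
}

for pair, score in overlap_report(results).items():
    print(pair, round(score, 2))
```

Low scores across the board are the point: each engine is showing a different slice of the Web for the same query.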

I noted this article “Russian Tech Giant Yandex Says Might Default” on Friday, March 3, 2022. I have no idea if the information in the write up is accurate, but it is suggestive. I learned that the Russian Web search engine, which is “free”, may be a goner. I noted this passage:

…the company, often called the “Russian Google” for its size and breadth of services, said that if it is suspended for more than five trading days, owners of certain bonds will legally be able to redeem their debt with interest. “The Yandex group as a whole does not currently have sufficient resources to redeem the notes in full,” the company said.

The terms “suspended” and “sufficient resources” are, to my way of thinking, a flashing yellow light. Could that light go red?

Yandex might be hauled off to the Web search system graveyard. How will this affect Googzilla? Not at all. However, start-up Web search outfits may be in a position to hit up funding sources for more cash in order to provide Yandex users with a viable option.

That sounds like a slide deck phrase, doesn’t it?

Stephen E Arnold, March 9, 2022

As Privacy Concerns Grow, So Do Search Alternatives

February 23, 2022

Google is sure to remain king of the online search hill for the foreseeable future, but Spark takes a look at a couple of burgeoning alternatives in the post, “Search Engines Try to Rival Google by Offering Fewer Ads, More Privacy.” Writer Jonathan Ore begins with Neeva, founded by ex-Googler Sridhar Ramaswamy.

“[Ramaswamy] bills Neeva as an ad-free, private search engine. Results won’t include advertisements, and the company says any information it does collect from users isn’t shared with third parties. That ad-free experience does come with a cost, however: a subscription fee of $5 US per month, after a three-month trial period. Ramaswamy argues that no search engine is truly free, as users end up paying with all the advertisements and affiliate links clogging up search results, making it harder to find the things they actually want.”

That is one way to look at it. We would add that Neeva does have a free version, but naturally hopes users will be enticed to upgrade. Ore notes that, though Neeva emphasizes privacy, it does collect certain user data—like one’s email address, IP address, location data, browser, and OS. The platform uses this information to improve function and performance, states its privacy policy, but promises not to share any of it with third parties.

Next, the write-up takes a look at You.com, a platform that seems tailored to younger audiences. We learn:

“Rather than a mostly-linear list of results sorted in order of relevance or accuracy, You.com displays search results in a grid-like format. It also lets users ‘upvote’ and ‘downvote’ individual results, directly affecting their rankings in future searches. That added flexibility comes at the cost of simplicity, though; The Verge’s Adi Robertson said its layout can appear ‘overwhelming and sort of cluttered’ to anyone used to Google’s linear approach. Co-founder Richard Socher said, however, that he found younger users used to other social media platforms like Instagram or TikTok, which display content in tiles both vertically and horizontally, were able to quickly acclimate themselves to You.com’s unique layout.”

Like Neeva, You.com also emphasizes privacy and refuses to sell user data to advertisers. Can such search platforms really take out Google? Don’t be silly—of course not. But the write-up cites DuckDuckGo as an example of success. That privacy-centric service, launched in 2008, now processes tens of millions of searches daily and employs over 140 workers. Is ad-addicted Google bothered? Probably not. It can well afford to lose such small slices of the search pie and remain decisively in the lead.

Cynthia Murrell, February 23, 2022

Algolia: A New Approach to Relevance?

February 21, 2022

Algolia is a company providing search and retrieval services to a number of companies. A call for résumés provides some interesting assertions about the company, its philosophy, and its goal.

The goal interests me. The posting on the Algolia Web site says:

Our mission is to return relevant search results at all times and give companies the ability to tailor those results to their own specific needs. We are tasked with reimagining a core piece of Algolia’s technology: how search results are ranked. To that end, we are still early in our building, and we are looking for someone who can help us perform experiments and manage the technical aspects of our pilot program, including building clients for users to test our work and tools to evaluate the impact of our changes.

I like the idea of tailoring “search,” which is certainly okay if the person searching knows what he or she is looking for. I like the idea of ranking because relevance is, to some people, helpful. I like the idea that the company is “early in building.” The right person with the right stuff will make an impact. I like the idea of measuring results, which works reasonably well when the people doing the searching know what they need to find.

There are several challenges in delivering or finding better ways to rank search results.

First, today the idea of knowing the corpus and using old-fashioned techniques like precision and recall is not as sexy as capsule network (CapsNet) methods.
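For those who have not met the old-fashioned pair, here is a minimal sketch of set-based precision and recall for a single query. The document ids and relevance judgments are made up for illustration. Note that recall requires knowing every relevant document in the corpus, which is exactly the knowing-the-corpus problem.

```python
def precision_recall(retrieved, relevant):
    """Classic set-based precision and recall for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents in the corpus
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: the system returned 4 documents; the corpus holds 5 relevant ones.
retrieved = ["d1", "d2", "d3", "d7"]
relevant = ["d1", "d2", "d4", "d5", "d6"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.40
```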

Second, users who want to formulate complex search queries, like those required to extract semi-useful information from Google or a Dataminr feed of social media, are rare birds. I heard at one big search outfit that fewer than three percent of queries use complex search statements; for example, site: or filetype:. Serving experts, analysts, and intelligence professionals is different from serving the ingredients for a Sicilian pizza.
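Measuring that share is simple if one has a query log. This sketch assumes a plain list of query strings and a hand-picked set of operators (site:, filetype:, intitle:, inurl:). The log is hypothetical, and the regular expression is a rough heuristic, not how any search outfit actually classifies queries.

```python
import re

# Illustrative, not exhaustive: operator name, a colon, then a non-space value.
OPERATOR_PATTERN = re.compile(r"\b(site|filetype|intitle|inurl):\S+", re.IGNORECASE)


def uses_operator(query):
    """True if the query contains at least one operator:value token."""
    return OPERATOR_PATTERN.search(query) is not None


def operator_share(queries):
    """Fraction of queries in a log that use at least one advanced operator."""
    if not queries:
        return 0.0
    return sum(uses_operator(q) for q in queries) / len(queries)


# Hypothetical query log; real logs are far larger and far less operator-heavy.
log = [
    "sicilian pizza recipe",
    "pizza near me",
    "site:energy.gov annual report",
    "filetype:pdf precision recall",
]
print(operator_share(log))  # 0.5
```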

Third, the now threadbare truism of lots of data, changing rapidly, and incorporating different content types and a veritable fun house of metadata requires some innovation. So far the best efforts of some bright folks have led to outright failure (Autonomy, Fast Search & Transfer, et al.) or to endless recycling, with minor variations, of the functionality of everyone’s favorite fighter of Amazon, Elastic.

I noted some interesting supportive information in the write up; for instance, the candidate with the right stuff must have grit (the sort of effort required to get an advanced degree from MINES ParisTech or Université Paris Saclay, or the toughness required to deal with a wealthy family or a generational link to the Capetians). Other ingredients in the “right stuff” trois étoiles cannelés of Bordeaux:

  • Trust
  • Care
  • Candor
  • Humility

I am eager to explore the new approach to relevance. But I harbor an abiding affection for a clear explanation of the content indexed and good old Boolean logic. Snorkels, caps nets, and a 21st century approach to relevance? Meh.
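For the record, good old Boolean logic over an inverted index fits in a few lines. This is a minimal sketch with a hypothetical three-document corpus and simple whitespace tokenization; the function names are illustrative, and real systems add a great deal more.

```python
from collections import defaultdict


def build_index(docs):
    """Inverted index: each lowercase term maps to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def postings(index, term):
    """Posting set for a term; empty if the term never occurs (no key is created)."""
    return index.get(term.lower(), set())


def and_(a, b):
    """Documents in both posting sets."""
    return a & b


def or_(a, b):
    """Documents in either posting set."""
    return a | b


def and_not(a, b):
    """Documents in the first posting set but not the second."""
    return a - b


# Hypothetical three-document corpus, small enough to know completely.
docs = {
    1: "Autonomy Bayesian search",
    2: "Boolean search engine",
    3: "Bayesian spam filter",
}
index = build_index(docs)

print(sorted(and_(postings(index, "bayesian"), postings(index, "search"))))      # [1]
print(sorted(and_not(postings(index, "search"), postings(index, "bayesian"))))  # [2]
print(sorted(or_(postings(index, "boolean"), postings(index, "spam"))))         # [2, 3]
```

Every result can be explained by pointing at the terms in the query. That is the appeal.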

Stephen E Arnold, February 21, 2022

A Google Dork for Everyone

February 21, 2022

In my lectures about open source intelligence for law enforcement and other government professionals, I mention Google Dorks. I won’t go into detail, but “dork” is a fancy way of saying that an information professional with a knowledge of specialized commands can get semi-on-point results from the online ad outfit. See for example this link. Do Googlers wear T-shirts emblazoned with the phrase “Don’t be evil”? I saw such a shirt with the message “Don’t be Google,” but I may have misread.

What’s interesting is that Google Dorking is finding its way into the mainstream of the people who perceive themselves as “experts in online research.” Yep, the expertise is often similar to mastering an automatic teller machine, but that’s possibly a characteristic of our Covid era.

“Google Search Is Dying” has undergone a number of updates. The write up states:

Google still gives decent results for many other categories, especially when it comes to factual information. You might think that Google results are pretty good for you, and you have no idea what I’m talking about. What you don’t realize is that you’ve been self-censoring yourself from searching most of the things you would have wanted to search. You already know subconsciously that Google isn’t going to return a good result.

The punch line is “Google is dying.” Yeah, no kidding. When the wizard from Verity and Yahoo got involved, it was not dying. It was gifted a MOAB (that is, the mother of all bombs, or a disconnect between a query and stuff like precision and recall).

So what’s the fix?

A Google Dork.

Enter a query and stick “reddit” in the query. The idea is that some entity (bot or humanoid) will have posted more useful, authentic, relevant information on that service. One can be sporty and try “wiki” at the end of a query as well.
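Appending “reddit” is a manual trick, but it can be scripted. This sketch assumes Google’s familiar public results URL pattern (https://www.google.com/search?q=...); the function name dork_url is hypothetical. A stricter variant passes the site: operator instead of the bare word.

```python
from urllib.parse import urlencode


def dork_url(query, suffix="reddit"):
    """Build a Google results URL for the query with an extra term appended.

    urlencode() escapes the query, turning spaces into '+' and ':' into '%3A'.
    """
    return "https://www.google.com/search?" + urlencode({"q": f"{query} {suffix}"})


print(dork_url("best budget laptop"))
# https://www.google.com/search?q=best+budget+laptop+reddit

print(dork_url("best budget laptop", "site:reddit.com"))  # restrict to the site itself
```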

Google Dorking for everyone, even the self-proclaimed experts in online information search and retrieval! The challenge is that Google advertising is pumping cash, and that, plus the bonuses for senior management, is what makes Google search the outstanding service it is.

Stephen E Arnold, February 21, 2022

Google Observations: A Hoot and Maybe a Bit Frightening If Statements Are Accurate

February 4, 2022

I read an item on Hacker News which “tells” about an issue/observation. The comment points out that certain queries generate links on a search result page which point to questionable content. Interesting, but news? Not in Harrods Creek, the technology centroid of the world.

What is quite fascinating in the short article? The comments. Yep, the comments. There are quite a few gems scattered in the trollite outcrops.

Here are a few examples with the “names” of the entity generating the output. Remember. I am just sharing. These are not my observations, comments, or ideas. In fact, we think the current version of the Google is a heck of a lot better than Version 2.0 which I wrote a monograph about many years ago.

  1. “nobody gets promoted in Google for doing their job well. Only for inventing a new job to do.” – reaperducer
  2. “It was my mistake. I trusted Google.” — Silisili
  3. “I work for Google Search. We are looking into this.” – SullilvanDanny
  4. “My wife recently received, in her inbox, a spoofed email from her own email address on Gmail.” – andrewmcwatters
  5. “There is no end to The Greed.” – JayTaylor

Stephen E Arnold, February 4, 2022

Mike Lynch: Going to America?

January 29, 2022

I noted the Beeb’s article “Mike Lynch: Priti Patel Approves Extradition of Autonomy Founder.” The write up states:

Home Secretary Priti Patel has approved the extradition of a British tech tycoon to the US to face criminal fraud charges. The decision comes after Mike Lynch, the founder of Autonomy, lost a multibillion-dollar fraud action in London on Friday.

Welp.

A Home Office spokesperson said: “Under the Extradition Act 2003, the secretary of state must sign an extradition order if there are no grounds to prohibit the order being made. Extradition requests are only sent to the home secretary once a judge decides it can proceed after considering various aspects of the case. On 28 January, following consideration by the courts, the extradition of Dr Michael Lynch to the US was ordered.”

The Beeb’s write up includes some biographical information:

Cambridge graduate Mr Lynch, 56, built Autonomy up to be one of the top 100 UK public companies. In 2006, he was awarded an OBE for services to enterprise. A fellow of the Royal Society, Mr Lynch, who lives in Suffolk, previously advised the government and sat on the boards of the British Library and the BBC.

The brief summary omits some interesting information; for example, the Bayesian influence and the architecture of a system that would shape decades of content processing systems. More information is available on my Xenky.com site at this link: https://bit.ly/3IQTwgz

Stephen E Arnold, January 29, 2022

Hewlett Packard Autonomy: A Decision of Sorts

January 28, 2022

I read “HPE Has Substantially Succeeded in Its £3.3bn Fraud Trial against Autonomy’s Mike Lynch – Judge.” The write up reports that buyer beware is not a legal argument. It appears that more litigation awaits Mike Lynch in the US. I noted one interesting statement in the very good summary of the UK legal activities:

Autonomy, which told the market it was a “pure play” software company, accounted for its substantial hardware sales by burying them inside its sales and marketing revenue instead of breaking them out separately.

I am delighted I am not an attorney. I am a retired knowledge worker who has some familiarity with the general technology used by Autonomy and I did some work for the company years ago.

My uninformed view is that Hewlett Packard was looking for a home run when Léo Apotheker (formerly of SAP, the owner of the TREX search technology) ignored realities about the search and content processing revenue ceilings. Hewlett Packard, it seems to me, pushed forward, ignored inputs, and paid what might be called the Ford Bronco surcharge.

What happens when a used vehicle sales professional explains the sidewalk guarantee to the buyer? Nothing. Buyers often do not do their homework, are too excited about the deal, or just don’t care about the future until it arrives. Oh, oh. Are there lemon laws for content processing platforms? I suppose the question will be answered US style in the coming months.

Stephen E Arnold, January 28, 2022

How about That Subscription Web Search Model?

January 24, 2022

Former Googlers Sridhar Ramaswamy and Vivek Raghunathan are refining their paid, privacy-centric search platform Neeva. We have followed this development from the 2020 beta through the 2021 official launch. Now we learn from The Next Web’s piece, “How a Couple of Ex-Googlers Are Trying to Fix What’s Wrong with Search Engines,” that Neeva has added a free tier. It appears not enough users are (yet) willing to pay the low, low price of $4.95 per month for search, and the team is looking to upsell about 5% of those who sign on for free. It might be a good bet: Ramaswamy reports that a third of folks who sampled the free trial have subscribed. Even he was surprised users cited the peaceful, ad-free screen as their favorite feature. Reporter Ivan Mehta writes:

“[Neeva] will offer ad-free search with customizations, and integration to accounts such as Gmail, Microsoft Office, and Dropbox. People who’re paying for Neeva’s services will get all of this, a leading third-party VPN and a password manager service, and advanced features, like a monthly Q&A. As far as search engine features go, Neeva offers customizations, such as being able to see particular sites in results more or less. You can also ‘skip’ an ecommerce site in results, or get the whole recipe for a dish without having to visit a site. What’s more, the new search engine lets your look through your email right from the search bar. And if you install Neeva’s extension, it also blocks ad trackers that are collecting your browsing data. Last October, Neeva also launched a 1-click Fasttap search geared towards mobile where users just need to type a phrase to get accurate search results. It’s like Google auto-complete on steroids.”
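Neeva does not say how its “more or less” and “skip” settings work under the hood. The following is only a hypothetical sketch of one simple way per-site preferences could be applied: multiply each result’s score by a per-domain weight, drop skipped domains, and re-sort. The domains, scores, and weights are invented.

```python
from urllib.parse import urlparse


def rerank(results, preferences, skip=frozenset()):
    """Re-order (url, score) pairs using per-domain multipliers.

    preferences: domain -> multiplier (>1 shows a site more, <1 shows it less).
    skip: domains removed from the results entirely.
    """
    adjusted = []
    for url, score in results:
        domain = urlparse(url).netloc.lower()
        if domain in skip:
            continue  # the user asked never to see this site
        adjusted.append((url, score * preferences.get(domain, 1.0)))
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)


# Hypothetical baseline results with made-up relevance scores.
results = [
    ("https://shop.example/blender", 0.90),
    ("https://recipes.example/soup", 0.80),
    ("https://forum.example/thread", 0.70),
]
ranked = rerank(
    results,
    preferences={"forum.example": 1.5, "recipes.example": 0.5},
    skip={"shop.example"},
)
print([url for url, _ in ranked])
```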

The write-up includes a few screenshots of Neeva features in action. Regarding the how-to behind it all, Mehta tells us:

“On the technological side, while Neeva is aggregating some search results from Bing, the company is building its own crawler and looking at billions of pages every day. But as Raghunathan pointed out in his FastCompany interview earlier this month, crawling the web to create a new index while maintaining privacy standards is hard.”

Perhaps if anyone is up to the task, it is these two Xooglers. As of yet, Neeva is only available in the US, but the company hopes to become global. The plan is to expand into India and Western Europe “soon.” One tactic it is using to compete against the likes of privacy-focused DuckDuckGo and Brave is its partnership with news rating agency NewsGuard, which is helping it assess the accuracy of information. We wonder whether such features plus the free-tier offering will help Neeva reach its stated goal: to become the primary search engine for millions of privacy-centered users in the next two years.

Are there monetization options? The Point team is available to offer some ideas. Just write benkent2020 at yahoo dot com. We’ve been there and know the subscription method was a loser decades ago.

Cynthia Murrell, January 24, 2022

New Search Platform Focuses on Protecting Intellectual Property

January 21, 2022

Here is a startup offering a new search engine, now in beta. Huski uses AI to help companies big and small reveal anyone infringing on their intellectual property, be it text or images. It also promises solutions for title optimization and even legal counsel. The platform was developed by a team of startup engineers and intellectual property litigation pros who say they want to support innovative businesses from the planning stage through protection and monitoring. The Technology page describes how the platform works:

“* Image Recognition: Our deep learning-based image recognition algorithm scans millions of product listings online to quickly and accurately find potentially infringing listings with images containing the protected product.

* Natural Language Processing: Our machine learning algorithm detects infringements based on listing information such as price, product description, and customer reviews, while simultaneously improving its accuracy based on patterns it finds among confirmed infringements.

* Largest Knowledge Graph in the Field: Our knowledge graph connects entities such as products, trademarks, and lawsuits in an expansive network. Our AI systems gather data across the web 24/7 so that you can easily base decisions on the most up-to-date information.

* AI-Powered Smart Insights: What does it mean to your brands and listings when a new trademark pops out? How about when a new infringement case pops out? We’ll help you discover the related insights that you may never know otherwise.

* Big Data: All of the above intelligence is being derived from the data universe of the eCommerce, intellectual property, and trademark litigation. Our data engine is the biggest ‘black hole’ in that universe.”

Founder Guan Wang and his team promise a lot here, but only time will tell if they can back it up. Launched in the challenging year of 2020, Huski.ai is based in Silicon Valley but it looks like it does much of its work online. The niche is not without competition, however. Perhaps a Huski will cause the competition to run away?

Cynthia Murrell, January 21, 2022
