DuckDuckGo and Filtering

April 18, 2022

I read “DuckDuckGo Removes Pirate Websites from Search Results: No More YouTube-dl?” The main thrust of the story is:

The private search engine, DuckDuckGo, has decided to remove pirate websites from its official search results.

DuckDuckGo is a metasearch engine. These are systems which may do some focused original spidering, but may send a user’s query to partner indexes. Then the results are presented to the user (which may be a human or a software robot). Some metasearch systems like Vivisimo invested some intellectual cycles in de-duplicating the results. (A helpful rule of thumb is to assume a 50 to 70 percent overlap in results from one Web search system to another.) IBM bought Vivisimo, and I have to admit that I have no idea what happened to the de-duplicating technology because … IBM.
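To make the de-duplication step concrete, here is a minimal sketch of how a metasearch system might merge ranked lists from partner indexes. Everything in it is an assumption for illustration: the function names `normalize` and `merge_results`, the sample URLs, and the choice to treat two URLs as duplicates when host and path match. It is not how DuckDuckGo or Vivisimo actually did it.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Build a comparison key: lowercase host without 'www.', path without a trailing slash.

    Assumption: scheme and query string are ignored when deciding what counts as a duplicate.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked lists round-robin, keeping the first copy of each normalized URL."""
    seen = set()
    merged = []
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical result lists from two partner indexes for the same query.
partner_a = ["https://www.example.com/pizza/", "https://example.org/menu"]
partner_b = ["http://example.com/pizza", "https://example.net/delivery"]

merged = merge_results([partner_a, partner_b])
```

With these sample lists, the second partner's pizza link is dropped as a duplicate of the first partner's, and three results survive. Real systems need fuzzier matching, for example near-duplicate page content under different URLs.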

There are more advanced metasearch systems. One example is Silobreaker, a system influenced by some Swedish wizards. The difference between a DuckDuckGo and an industrial strength system, in my opinion, is significant. Web search is an opaque service. Many behind-the-scenes actions take place, and some of the most important are not publicly disclosed in a way that makes sense to a person looking for pizza.

My questions: “Is DuckDuckGo actively filtering?” And, “Why did this take so long?” And, “Is DuckDuckGo virtue signaling after its privacy misstep, or is the company snagged in a content marketing bramble?”

I don’t know. My thoughts are:

  1. The editorial policies of metasearch systems should be disclosed; that is, we do this and we do that.
  2. Metasearch systems should disclose that many results are recycled and that the provenance, age, and accuracy of the results are unknown to the metasearch provider.
  3. Metasearch systems should make clear exactly what the benefits of using the metasearch system are and why some search results are not as beneficial to the user; for example, which result is an ad (explicit or implicit), sponsored, etc.

Will metasearch systems embrace some of these thoughts? Nah. Those who use “free” Web search systems are in a cloud of unknowing.

Stephen E Arnold, April 18, 2022

DuckDuckGo Metasearch Service Causes Quacks

April 12, 2022

DuckDuckGo followed its technology brethren by rescinding most of its services in Russia due to the unfortunate invasion of Ukraine. The unbiased search engine’s CEO, Gabriel Weinberg, stated on March 9 that it would down rank Russian Web sites that spread disinformation. Much to DuckDuckGo’s surprise (as well as that of many others), the search engine was attacked by right-leaning, pro-free-speech supporters. The privacy search engine unintentionally attracted these supporters but did not discourage them.

Recode via Vox has the entire story: “The Free Speech Search Engine That Never Was.”

Weinberg tweeted his support for user privacy, but conservative supporters who used DuckDuckGo to search for content without Big Tech censorship were angry. They did not like that DuckDuckGo was demoting Russian propaganda Web sites. Oddly enough, these people also supported Putin’s invasion of Ukraine.

Right-wing supporters flocked to DuckDuckGo because it was supposedly free of the censorship that plagues other search engines like Google. These conservatives believe that information relating to their political and social beliefs was censored in all search engines except DuckDuckGo. These conservative supporters are more of the alt-right, conspiracy theorist type, i.e., anti-vaccination, DC Capitol insurrection. DuckDuckGo was okay with this:

“So DuckDuckGo surely knew what many of its new fans were coming to it for. They leaned into it a bit, too. Weinberg told Fox News and Quartz that Google’s search results were biased because Google collects data on users, which it then uses to target results to them. That, he said, created filter bubbles that further polarized society. Because DuckDuckGo didn’t collect data, its results were unbiased and searchers were free from Google’s echo chamber. This was a bit of a dodge; conservatives accused Google of intentionally keeping conservative sites and content off of its results, not just returning results influenced by a searchers’ interests. But it was an answer that seemed to satisfy users of all political persuasions.”

The alt-right wingers do not like that, but DuckDuckGo explained they are doing what search engines should be doing: “ensure that users were getting the best results for their searches.”

DuckDuckGo is one of many platforms that the alt-right adopted: Rumble, MeWe, Telegram, Substack. These platforms did not shy away from the users, because it meant more investments. We like the idea of a metasearch service protecting user privacy and, as a byproduct, false propaganda. Now how about better results?

Whitney Grace, April 12, 2022

Agolo: Making Government Sales and Landing VC Money

April 4, 2022

Apple, Google, and Microsoft might be search experts, but they continue to get things wrong. These are big companies, so they cannot concentrate solely on the search function the way a dedicated company can. TechCrunch explains how Agolo specializes in search and improves upon what the tech giants do: “Agolo Summary-Powered Search Brings In Government Work And Fresh Funding.”

Agolo developed a powerful summary search engine that dissects articles and presents users with shorter versions that preserve key points. Agolo identified two types of search tools: dumb and smart. Dumb engines are not good at locating context or extracting data, while the smart functions are decent at finding relevance and ordering items. Both are limited in their capabilities. Agolo designed a smarter search function and described it as this:

“Agolo co-founder and CEO Sage Wohns gave the example of searching for ibuprofen. Any ordinary search engine only understands ibuprofen as a term people generally search for in order to learn more about the medicine, and that’s the way it’s reflected in the index. Even if you deploy that search tech on a domain-specific corpus, like research papers, it doesn’t magically gain better understanding. But a medical researcher searching through pandemic-related papers for ibuprofen already knows what it is — what they need is an ordered presentation of how ibuprofen appears in the literature, what other drugs and effects it is most tightly correlated with, what institutions and authors are associated with studying it.”

Agolo digests terabytes of data and summarizes them in usable knowledge graphs. The summary search tool is capable of handling long documents, and the US federal government uses it. Agolo does not sell an out of the box search solution; instead, it partners with enterprise system designers like Google and Microsoft. This is interesting because in a recent round of funding, Google and Microsoft invested in Agolo:

“The company’s A round was just closed, led by Lytical Ventures, plus returning investors Microsoft M12, Google’s Assistant Investment Group, Tensility Venture Partners, Ridgeline Partners and Thomson Reuters. The company has raised over $18 million in total to date.”

Are Agolo and Kyndi examples of a mini-enterprise search renascence? Autonomy, Delphi, Entopia, and Fast Search & Transfer have faded from customers’, investors’, and innovators’ memories.

Whitney Grace, April 4, 2022

Google: Grade A Search Baloney

March 31, 2022

I have been involved in online information for more than 50 years. Yep, folks, that’s more than half a century. Those early days involved using big clunky computers to locate a word in a Latin corpus. Then there were the glory days of commercial online products like Business Dateline, the Health Reference Center, and others. The Internet was a source of online craziness that trumped the wackiness of Ev Brenner and his vision for petrochemical data. Against this richly colored tapestry of marketing fabrications, overpromising and under delivering, and the bizarre fantasies of the “old” Information Industry Association, I read “Google Search Is Actually Getting Better at Giving You What You Need.”

The write up channels a marketing person at the Google and mixes the search wizard’s recycling of Google truisms with some pretty crazy assertions about finding information in 2022.

Let’s take a look at three points and then step back and put these online advertising charged assertions in a broader context; namely, the outcomes of a system which is a de facto information monopoly.

Here are the points I noted in the write up:

Big, baby, big.

The first idea is that Google processes a great deal of information. Plus, Google tests to tackle the challenge of “search quality.” By the way, what does “quality” mean? When you combine big with quality, you get really good outputs from the Google system. Just try it. Do a search for pizza via Google on a mobile device. See what you get? Pizza information. Perfect. So big and quality means good. Do you buy that?

The second idea is that Google, like little beavers or little Googzillas, works to improve quality. The idea is that yesterday’s Google was not bad; it needs improvement. Many improvements mean that quality goes up. Okay, let’s try it. Say you want information about a loss of coolant accident. You know. Chernobyl, Fukushima, et al. Type in loca and you get Shakira’s video. Type in “nuclear loca” and you get links to a loss of coolant accident. Type in site:nrc.gov loca and you get results specific to a loss of coolant incident. Note what’s needed to get Google to produce something about a loss of coolant accident. The user must specify a context; otherwise, Google delivers lowest common denominator results. One can use Google Dorks to work around the Shakira problem, but let’s face it, very few people are into Google Dorks. (I include them in my OSINT lecture at the National Cyber Crime Conference in April 2022, but I know from experience that not even trained investigators are into Google Dorks.)
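For readers who have not tried this, here is a small sketch of how the three queries above are assembled. The helper name `scoped_query` is made up for illustration; the only operator used is the `site:` restriction mentioned in the paragraph. The last line shows how the query string would be percent-encoded for use in a URL, without assuming any particular search endpoint.

```python
from urllib.parse import quote_plus


def scoped_query(terms, site=None):
    """Prefix the search terms with a site: restriction when a domain is given."""
    if site:
        return f"site:{site} {terms}"
    return terms


bare = scoped_query("loca")                     # no context: lowest common denominator
with_context = scoped_query("nuclear loca")     # context supplied as an extra term
with_site = scoped_query("loca", site="nrc.gov")  # context supplied as a domain restriction

# Percent-encode the query for a URL query string (endpoint deliberately not shown).
encoded = quote_plus(with_site)
```

The point of the sketch is the same as the point of the paragraph: the user, not the search system, supplies the context.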

The third idea is that Google is embracing artificial intelligence. That makes sense because there are not enough people to process today’s flows of information in the old fashioned subject matter expert way. One must reduce costs in order to deliver “quality.” Does that seem an unusual pairing of improvements and search results? Think about it, please.

Now let’s step back. Here are some observations I jotted on a 4×6 notecard:

  1. Google uses people looking for online information to generate revenue from ads. That which produces more ad revenue is valued. The “quality” is a repurposing of a useful concept to the need to generate revenue. Shakira is the correct result for the “loca” query. That’s quality.
  2. The notion of testing is interesting. What’s the objective? The answer is generating revenue. Thus, the notion of testing is little more than steering or tuning search results to generate more revenue. The adjustments operate on several levels: Shaping understanding via filtering and producing revenue from search results. Simple, just not exactly what a user of an ad supported system thinks about when running a query for pizza.
  3. Smart software is the number one way for Google to [a] reduce costs, [b] deflect legal challenges to its search result shaping with the statement “The algorithm does it, not a human,” and [c] create the illusion that Google search results are really smart. Use Google and you will be smarter too.

Believe these assertions? You’re the ideal Google user. Have doubts? You are not Googley. Don’t apply for a job at the Google and, for heaven’s sake, don’t expect the Google outputs to be objective. Just accept that some information is unfindable by design.

Google Dorks exist for a reason. Google has made finding relevant information more difficult than at any time in my professional career. And every year, the Google system becomes more detached from what most people believe fuels Google’s responses to what Google users need.

Yep, need. Sell ads. Reduce costs. Generate feedback into the system from users who have biases. Why are government agencies pushing back on outfits like Google? The quest for quality? Nope. The pushback reflects a growing awareness of disinformation, manipulation, and behavior that stifles options, in my opinion.

Stephen E Arnold, March 31, 2022

Microsoft Search: Getting Better and Better

March 30, 2022

In early versions of Windows operating systems, the search function stank worse than rotting garbage in summer. Since the initial Windows deployment, Microsoft has improved the search function, and as technology advances there are still upgrades to be made, says Make Use Of in “Microsoft Is Making Windows 11’s Search Function Better Than Ever.” In a refreshing take on its past mistakes, Microsoft admits that its former search tools were not the best. When it comes to Windows 11, Microsoft revamped the search into a quality tool and does not plan to rest on its laurels.

One of the best upgrades with the newest Windows 11 patch is that search will be streamlined between work/business accounts. The search function will locate items on all accounts. Microsoft is also adding lifestyle widgets to make the OS more entertaining, such as a “word of the day” and alerting users to Microsoft Reward offerings. Search will also take the place of Facebook and inform users of important dates, such as birthdays, anniversaries, and holidays. Whenever Microsoft releases a new Windows version, it does its best to get users to adopt the new OS:

“When Microsoft releases a new operating system, it always faces the same challenge. Users and businesses are comfortable with their operating system of choice, and now the Redmond tech giant has to convince them to upgrade to the newer one. The best way to do that is to make an operating system that improves upon the old one’s formula. As such, Microsoft’s touch-ups to Windows 11’s Search tool may be an effort to convince people to leave Windows 10 behind and adopt the newer, shinier system.”

Microsoft has a poor track record when it comes to system upgrades. It has a pattern of every other OS being a bad one. Windows users might want to stick with Windows 10 a little longer and wait until Windows 12. It would be nice if Microsoft also added database search options like specific date, file name, Boolean, etc.

Whitney Grace, March 30, 2022

Google: A Redefinition of Relevance

March 21, 2022

It begins with the author’s search for a new toaster. That is the example The New Yorker’s Kyle Chayka cites as he discusses “What Google Search Isn’t Showing You.” Of course, we know that Google sells ads. It does not deliver objective search. If you want objective search, you have to do actual research, not query free services which have to make money selling user data, ads, and analyses. That is why Chayka’s initial toaster hunt produced a dissatisfying, “cluttered onslaught of homogenous e-commerce options,” as he put it.

When Google Search launched in 1998, it was free of advertising and dedicated to supplying users with the best results. At the time, co-founders Sergey Brin and Lawrence Page wrote that advertising would interfere with that goal. Even so, they introduced ads two years later; their original hypothesis was, as it turns out, correct. Then there is the entire SEO racket that has developed around gaming Google’s algorithm. And let us not forget Google’s growing willingness to push its own interests to the top of results. Chayka writes:

“Decades of search-engine optimization have resulted in content that is formulated not to inform readers but to rank prominently on Google pages. That might be one reason that my toaster results felt so redundant: each site is attempting to solve the same algorithmic equation. Gabriel Weinberg, the C.E.O. of the privacy-focused search-engine company DuckDuckGo, cited three other sources of dissatisfaction with Google Search. The first is the company’s practice of tracking user behavior, which drives the kind of creepy, chasing-you-around-the-Internet advertising that Google profits from. The second is Google prioritizing its own services in search results, by, for instance, answering a travel query with Quick Answers pulled from Google Places instead of from a richer, more social source such as Tripadvisor. Lastly, Weinberg argued, users are simply tired of Google’s dominance over their experience of the Internet. Google is reportedly paying Apple upward of fifteen billion dollars a year to remain the default search engine on iPhones. On Google’s own Android phone, changing one’s preferred search engine requires a cumbersome settings adjustment, and pop-up messages along the way urge the user to switch back to Google.”

To say the company is taking advantage of its near-monopoly is an understatement. (Google Search makes up about 85% of the search market.) Besides DuckDuckGo’s Weinberg, the article shares comments from two other alternative-search champions, Marginalia founder Viktor Lofgren and Are.na co-founder Daniel Pianetti. See the write-up for those perspectives. When applied to Chayka’s toaster queries, both these niche platforms returned unexpected results. The author found them interesting if not particularly helpful for the online shopper. We do not know where Chayka finally decided to purchase a new toaster, but his tangent reminds us how far Google has veered from its original philosophy.

Cynthia Murrell, March 21, 2022

Microsoft: Fun Search

March 17, 2022

We have censorship. We have discriminatory spidering. We have sites which are no longer indexed. And now if ZDNet’s “real” news team is on the money, we have search fun or fun search. You pick.

“Microsoft Is About to Add More Fun to Your Windows Search” reports:

… the Windows 10’s taskbar search box and search home pane will now feature content “including fun illustrations, which help you discover more, be connected, and stay productive. Search highlights will present notable and interesting moments of what’s special about each day – like holidays, anniversaries, and other educational moments in time both globally and in your region.”

Great. How about that Windows search? Do you have a Drobo or similar storage device? I bet that Windows search will make that “fun.” What about a desire to locate an actual file on the C: or boot drive? I bet Microsoft will make that fun too. And I could go on. For example, don’t you love Microsoft search syntax? And let’s not forget “unfindable” files. Yeah, that’s a winner too!

How about search that just works, includes Boolean, and provides one click access to sample syntax? That would be fun too.

Stephen E Arnold, March 17, 2022

Dashworks Promises To Be The Best Enterprise Search System

March 16, 2022

Search remains a fundamental component not only of working environments, but also of daily life. Quickly locating information is essential, but if a search engine returns low-quality results, it clogs up routines. TechCrunch dives into the background of a robust enterprise search system: “Dashworks Is A Search Engine For Your Company’s Sprawling Internal Knowledge.”

Dashworks promises to be a comprehensive search system that scours everything from Slack threads to Dropbox files. It wants to be an organization’s one stop search solution for internal knowledge through one centralized hub. While its homepage is helpful with FAQs and bookmarks, its cross-tool search is the real selling feature:

“More impressive, though, is its cross-tool search. With backgrounds in natural language processing at companies like Facebook and Cresta, co-founders Prasad Kawthekar and Praty Sharma are building a tool that allows you to ask Dashworks questions and have them answered from the knowledge it’s gathered across all of those aforementioned Slack threads, or Jira tickets, or Dropbox files. It’ll give you a search results page of relevant files across the services you’ve hooked in — but if it thinks it knows the answer to your question, it’ll just bubble that answer right to the top of the page, Google Snippets style.”

Dashworks is compatible with over thirty popular services and more are being added all the time. Dashworks does require access to all the services, devices, and applications within an organization, which might be alarming but necessary for cross-tool search.

Dashworks is an excellent idea, but if an employee uses their own device, will it engage with platforms that should remain personal? But a promise? Hmmm.

Whitney Grace, March 16, 2022

DuckDuckWent: Can a Search System Float in the Same Content Stream Again?

March 11, 2022

I read “DuckDuckGo Ends Neutrality, Will Down-Rank Sites Associated with Russian Disinformation.” Recognizing disinformation can be tricky. Using the word Russian may make the job easier.

I am not going to get into a philosophical discussion.

For me the important point of DuckDuckGo’s decision to have an editorial policy (often called censorship) is captured in this passage from the source document:

A change in direction.

I would like to see DuckDuckGo be upfront about:

  1. The source of its search index
  2. The number of content objects compared to the indexes of Swisscows, Google, and Brave Search
  3. How deduplication works

Responding to Russia is a waddle but more steps are needed. Waddle along, DuckDuck, please.

Stephen E Arnold, March 11, 2022

Yandex: Is It Time to Say Hello, Goodbye?

March 9, 2022

For about 80 to 90 percent of the people in North America and Western Europe, “search” means Googzilla’s service. Is it useful? Legions will say, “Google’s search service is the bestest ever.” Others are more comfortable running queries on Exalead Search, Swisscows, and one of the new kids on the block like Kagi or Wecript, among others.

My personal plan of attack, as I shared with the founder of Kagi, is to run specific queries across a group of selected search engines. (Sorry, I don’t provide those in this unloved, and mostly ignored free blog. However, if you attend my 2022 National Cyber Crime Conference lecture on finding information, you will get a list of about 500 useful search/content services.)

Why am I talking about “free” or ad-supported Web search? Three reasons:

  1. Today’s search “experts” don’t pay much attention to the lack of overlap in results. Hey, reading pages of results and cross checking them is too annoying. “Let’s do the TikTok thing” is the way to go.
  2. Web search engines do not disclose what I call the “editorial policy.” How often does Googzilla update results eight links deep on the Department of Energy’s public facing Web site? Or, where does DuckDuckGo get its results? Or, why doesn’t IxQuick/StartPage disclose which search systems generate its results? Or why are Gigablast results for images not really images? If one discloses an editorial policy, then the shallowness, freshness, and bias of the spidering mechanisms are disclosed. Who wants that? Certainly not the Web search outfits.
  3. Serious or professional Web search systems charge money and deliver high value results simply not obtainable via free Web search systems. Why don’t these outfits market to the users of free Web search systems? These outfits don’t want to end up in an RV at the Israel River Campground in the White Mountains. A low profile is a prudent profile.
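The cross checking mentioned in the first point can be partly automated. Here is a minimal sketch, assuming the result URLs from two engines have already been collected for the same query; the function name `overlap` and the sample URLs are hypothetical, and the measure used is the Jaccard index (shared URLs divided by all distinct URLs), which is one reasonable choice among several.

```python
def overlap(results_a, results_b):
    """Return the Jaccard index of two result lists: shared URLs / distinct URLs."""
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 0.0  # two empty lists share nothing to compare
    return len(a & b) / len(a | b)


# Hypothetical top results from two engines for the same query.
engine_one = [
    "https://example.com/1",
    "https://example.com/2",
    "https://example.com/3",
    "https://example.com/4",
]
engine_two = [
    "https://example.com/3",
    "https://example.com/4",
    "https://example.com/5",
    "https://example.com/6",
]

score = overlap(engine_one, engine_two)  # 2 shared URLs out of 6 distinct ones
```

A low score for a query is a hint that running it on only one engine leaves results on the table.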

I noted this article “Russian Tech Giant Yandex Says Might Default” on Friday, March 3, 2022. I have no idea if the information in the write up is accurate, but it is suggestive. I learned that the Russian Web search engine, which is “free”, may be a goner. I noted this passage:

…the company, often called the “Russian Google” for its size and breadth of services, said that if it is suspended for more than five trading days, owners of certain bonds will legally be able to redeem their debt with interest. “The Yandex group as a whole does not currently have sufficient resources to redeem the notes in full,” the company said.

The words “suspended” and “sufficient resources” are, to my way of thinking, a flashing yellow light. Could that light go red?

Yandex might be hauled off to the Web search system grave yard. How will this affect Googzilla? Not at all. However, start up Web search outfits may be in a position to hit up funding sources for more cash in order to provide Yandex users with a viable option.

That sounds like a slide deck phrase, doesn’t it?

Stephen E Arnold, March 9, 2022
