Search Engines: Bias, Filters, and Selective Indexing

I read “It’s Not Just a Social Media Problem: How Search Engines Spread Misinformation.” The write up begins with a Venn diagram. My hunch is that quite a few people interested in search engines will struggle with the visual. Then there is the concept that typing in a search term returns results that are like loaded dice in a Manhattan craps game in Union Square.

The reasons, according to the write up, that search engines fall off the rails are:

  • Relevance feedback or the Google-borrowed CLEVER method from IBM Almaden’s patent
  • Fake stories which are picked up, indexed, and displayed as value infused

The write up points out that people cannot differentiate between accurate, useful, or “factual” results and crazy information.

Okay, here’s my partial list of why Web search engines return flawed results:

  1. Stop words. Control the stop words and you control the info people can find.
  2. Stored queries. Type what you want but get the results already bundled and ready to display.
  3. Selective spidering. The idea is that any index is a partial representation of the possible content. Instruct spiders to skip Web sites with information about peanut butter, and, bingo, no peanut butter information.
  4. Spidering depth. Is the bad stuff deep in a Web site? Just limit the crawl to fewer links.
  5. Spider within a span. Is a marginal Web site linking to sites with info you want killed? Don’t follow links off a domain.
  6. Delete the past. Who looks at historical info? A better question, “What advertiser will pay to appear on old content?” Kill the backfile. Web indexes are not archives no matter what thumbtypers believe.
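Several of the methods in the list can be sketched in a few lines. What follows is a toy illustration, not a description of how any real search engine works: the `PAGES` data, the `example` URLs, and the function names `crawl` and `strip_stop_words` are all hypothetical and invented for this sketch. It shows how a blocked topic, a depth limit, and a stay-on-domain rule each shrink what ends up in an index, and how a stop word list changes the query the index actually sees.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical toy "Web": URL -> (page text, outgoing links). Not real sites.
PAGES = {
    "https://a.example/": ("welcome home page",
                           ["https://a.example/food", "https://b.example/"]),
    "https://a.example/food": ("peanut butter recipes",
                               ["https://a.example/food/deep"]),
    "https://a.example/food/deep": ("jam archive", []),
    "https://b.example/": ("the other site", []),
}


def crawl(seed, max_depth, stay_on_domain, blocked_terms):
    """Breadth-first crawl; returns the URLs the policy lets into the index."""
    seed_host = urlparse(seed).netloc
    indexed = []
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        url, depth = queue.popleft()
        text, links = PAGES.get(url, ("", []))
        # Selective spidering: a page on a blocked topic is neither
        # indexed nor used as a source of further links.
        if any(term in text for term in blocked_terms):
            continue
        indexed.append(url)
        # Spidering depth: stop following links past max_depth.
        if depth >= max_depth:
            continue
        for link in links:
            # Spider within a span: ignore links that leave the seed domain.
            if stay_on_domain and urlparse(link).netloc != seed_host:
                continue
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return indexed


def strip_stop_words(query, stop_words):
    """Drop stop words from a query before it reaches the index."""
    return [word for word in query.lower().split() if word not in stop_words]


if __name__ == "__main__":
    seed = "https://a.example/"
    print(crawl(seed, 5, False, []))                 # permissive policy
    print(crawl(seed, 5, False, ["peanut butter"]))  # topic blocked
    print(crawl(seed, 1, True, []))                  # shallow, one domain
    print(strip_stop_words("The peanut butter", {"the", "peanut"}))
```

In this sketch the user never sees a refusal. The same seed simply yields a smaller index, and a query for “peanut” silently becomes a query for something else.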

There are other methods available as well; for example, objectionable info can be placed in near line storage so that results from questionable sources display with enough latency to cause the curious user to click away.

To sum up, some discussions of Web search are not complete or accurate.

Stephen E Arnold, March 15, 2021


DarkCyber for June 9, 2020, Is Now Available: AI and Music Composition

The DarkCyber for June 9, 2020, presents a critical look at music generated by artificial intelligence. The focus is the award-winning song in the Eurovision AI 2020 competition. The interview discusses the characteristics of AI-generated music, its impact on music directors, how professional musicians deal with machine-created music, and the implications of non-human music. The program is a criticism of the state of the art for smart software. Instead of focusing on often over-hyped start ups and large companies making increasingly exaggerated claims, the Australian song and the two musicians make clear that AI is a work in progress. You can view the video at https://vimeo.com/427227666.

Kenny Toth, June 9, 2020
