Search Engines: Bias, Filters, and Selective Indexing

I read “It’s Not Just a Social Media Problem: How Search Engines Spread Misinformation.” The write up begins with a Venn diagram. My hunch is that quite a few people interested in search engines will struggle with the visual. Then there is the concept that typing in a search term returns results which are like loaded dice in a Manhattan craps game in Union Square.

The reasons, according to the write up, that search engines fall off the rails are:

  • Relevance feedback or the Google-borrowed CLEVER method from IBM Almaden’s patent
  • Fake stories which are picked up, indexed, and displayed as value infused.

The write up points out that people cannot differentiate between accurate, useful, or “factual” results and crazy information.

Okay, here’s my partial list of why Web search engines return flawed results:

  1. Stop words. Control the stop words and you control the info people can find.
  2. Stored queries. Type what you want but get the results already bundled and ready to display.
  3. Selective spidering. The idea is that any index is a partial representation of the possible content. Instruct spiders to skip Web sites with information about peanut butter, and, bingo, no peanut butter information.
  4. Spidering depth. Is the bad stuff deep in a Web site? Just limit the crawl to fewer links.
  5. Spider within a span. Is a marginal Web site linking to sites with info you want killed? Don’t follow links off a domain.
  6. Delete the past. Who looks at historical info? A better question, “What advertiser will pay to appear on old content?” Kill the backfile. Web indexes are not archives no matter what thumbtypers believe.
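For the code-minded, here is a minimal sketch of how items 1, 3, 4, and 5 might look inside a toy indexing pipeline. Everything in it is an assumption made for illustration: the names `strip_stop_words` and `should_crawl`, the stop word list, the blocked topic, and the depth limit are hypothetical and do not describe how any real search engine is built.

```python
from urllib.parse import urlparse

# Hypothetical policy values, chosen only for illustration.
STOP_WORDS = {"the", "a", "of", "peanut"}
BLOCKED_TOPICS = {"peanut butter"}
MAX_DEPTH = 2


def strip_stop_words(query, stop_words=STOP_WORDS):
    """Item 1: drop stop words from a query before it reaches the index."""
    return [term for term in query.lower().split() if term not in stop_words]


def should_crawl(url, depth, seed_url, page_text="",
                 max_depth=MAX_DEPTH, blocked_topics=BLOCKED_TOPICS):
    """Return True only if the toy spider would fetch and index this page."""
    # Item 4: spidering depth. Pages too many links from the seed are skipped.
    if depth > max_depth:
        return False
    # Item 5: spider within a span. Links that leave the seed's host are skipped.
    if urlparse(url).netloc != urlparse(seed_url).netloc:
        return False
    # Item 3: selective spidering. Pages mentioning a blocked topic are skipped.
    text = page_text.lower()
    if any(topic in text for topic in blocked_topics):
        return False
    return True
```

With these made-up settings, a query for “the price of peanut butter” reaches the index as just “price” and “butter,” and a page three links deep never gets fetched. The user sees none of this.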

There are other methods available as well; for example, objectionable info can be placed in near line storage so that results from questionable sources display slowly enough to cause the curious user to click away.

To sum up, some discussions of Web search are not complete or accurate.

Stephen E Arnold, March 15, 2021


DarkCyber for June 9, 2020, Is Now Available: AI and Music Composition

The DarkCyber for June 9, 2020, presents a critical look at music generated by artificial intelligence. The focus is the award-winning song in the Eurovision AI 2020 competition. The interview discusses the characteristics of AI-generated music, its impact on music directors, how professional musicians deal with machine-created music, and the implications of non-human music. The program is a criticism of the state of the art for smart software. Instead of focusing on often over-hyped start ups and large companies making increasingly exaggerated claims, the Australian song and the two musicians make clear that AI is a work in progress. You can view the video at https://vimeo.com/427227666.

Kenny Toth, June 9, 2020
