A Look at Web Search: Useful for Some OSINT Work

February 22, 2024

This essay is the work of a dumb dinobaby. No smart software required.

I read “A Look at Search Engines with Their Own Indexes.” For me, the most useful part of the 6,000-word article is the list of identified search systems. The author, a person with the identity Seirdy, has gathered in one location a reasonably complete list of Web search systems. Pulling such a list together takes time and reflects well on Seirdy’s attention to a difficult task. There are some omissions; for example, the iSeek education search service (recently repositioned) and Biznar.com, developed by one of the founders of Verity. I am not identifying problems; I just want to underscore that tracking down, verifying, and describing Web search tools is a difficult task. For a person involved in OSINT, the list may surface a number of search services which could prove useful; for example, the Chinese and Vietnamese systems.


A new search vendor explains the advantages of a used convertible driven by an elderly person to take a French bulldog to the park once a day. The clueless fellow behind the wheel wants to buy a snazzy set of wheels. The son in the yellow shirt loves the vehicle. What does that car sales professional do? Some might suggest that certain marketers lie, sell useless add-ons, patch up problems, and fiddle the interest rate financing. Could this be similar to search engine cheerleaders and the experts who explain them? Thanks, ImageFX. A good enough illustration with just a touch of bias.

I do want to offer several observations:

  1. Google dominates Web search. There is an important distinction not usually discussed when some experts analyze Google; that is, Google delivers “search without search.” The idea is simple. A person uses a Google service, of which there are many. Take, for example, Google Maps. The Google runs queries when users take non-search actions; for example, clicking on another part of a map. That’s a search for restaurants, fuel services, etc. Sure, much of the data are cached, but this is an invisible search. Competitors and would-be competitors often forget that Google search is not limited to the Google.com search box. That’s why Google’s reach is going to be difficult to erode quickly. Google has other search tricks up its very high-tech ski jacket’s sleeve. Think about search-enabled applications.
  2. There is an important difference between building one’s own index of Web content and sending queries to other services. The original Web indexers have become like rhinos and white tigers. It is faster, easier, and cheaper to create a search engine which just uses other people’s indexes. This is called metasearch. I have followed the confusion between search and metasearch for many years. Most people do not understand or care about the difference in approaches. This list illustrates how Web search is perceived by many people.
  3. Web search is expensive. Years ago when I was an advisor to Bear Stearns (an estimable outfit indeed), my client and I were on a conference call with Prabhakar Raghavan (then a Yahoo senior “search” wizard). He told me and my client, “Indexing the Web costs only $300,000 US.” Sorry, Dr. Raghavan (now the Googler who made the absolutely stellar Google Bard presentation in France after MSFT and OpenAI caught Googzilla with its gym shorts around its ankles in early 2023), you were wrong. That’s why most “new” search systems look for shortcuts. These range from recycling open source indexes to ignoring pesky robots.txt files to paying some money to use assorted also-ran indexes.
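For readers who want the index-versus-metasearch distinction from point 2 in concrete terms, here is a minimal Python sketch. It is a toy under stated assumptions, not how any real search system works: the page data, the URLs, and the names `build_index`, `search_index`, `metasearch`, `engine_one`, and `engine_two` are all hypothetical and invented for illustration. The first approach builds and queries its own tiny inverted index; the second merely forwards the query to other engines and merges what comes back.

```python
from collections import defaultdict


def build_index(pages):
    """Own-index approach: map each word to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search_index(index, query):
    """Return the URLs that contain every query word, sorted for stable output."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


def metasearch(query, upstreams):
    """Metasearch approach: forward the query to other engines and merge
    their result lists, dropping duplicate URLs and keeping first-seen order."""
    merged, seen = [], set()
    for engine in upstreams:
        for url in engine(query):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical toy "crawl"; a real indexer has to fetch and store the pages itself.
PAGES = {
    "https://example.com/a": "web search index crawler",
    "https://example.com/b": "metasearch forwards queries",
    "https://example.com/c": "web crawler robots",
}
own_index = build_index(PAGES)


def engine_one(query):
    """Stand-in for an upstream engine that owns an index."""
    return search_index(own_index, query)


def engine_two(query):
    """Stand-in for a second upstream engine returning canned results."""
    return ["https://example.com/c", "https://example.org/z"]


print(search_index(own_index, "web crawler"))
print(metasearch("web crawler", [engine_one, engine_two]))
```

The point of the sketch is where the work sits. The index builder has to crawl, store, and update the pages; the metasearch function does none of that and depends entirely on what the upstream engines choose to return.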

Net net: Web search is a complex, fast-moving, and little-understood business. People who know now do other things. The Google means overt search, embedded search, and AI-centric search. Why? That is a darned good question which I have tried to answer in my different writings. No one cares. Just Google it.

PS. Download the article. It is a useful reference point.

Stephen E Arnold, February 22, 2024
