Web Search: In Flux
May 17, 2021
I listened to an interview in which the host of the Big Technology podcast talked with Sridhar Ramaswamy, the Xoogler who was in charge of Google Advertising for a number of years. Mr. Ramaswamy’s new venture is a subscription Web search engine. The interview was interesting, but I somehow missed the definition of the “Web” content the system will index. I brought up this “missing question” at lunch today because the “Web” can mean different things to different searchers. Does the system search dynamic sites like those built on Shopify? Does it index forums and public discussion groups? Does it index password-protected but no-cost sites like Nextdoor.com? You get the idea without my tossing in videos, audio, and tabular data on government Web sites.
What the interview did not touch upon was the Infinity search system. You can get information about this $5.00 US per month service at this link. The system seems to be a combination of metasearch and proprietary indexing. Our tests, prior to its becoming a subscription service, were mixed. Overall, the results were not as useful as those retrieved from Swisscows.com, for example. The value proposition of the Xoogler’s subscription search service and Infinity seemed similar.
I want to mention that Yippy, the Web search component of Vivisimo, seems to have gone offline. I thought the Vivisimo service was interesting even though the company focused on selling itself to IBM and becoming a cog in the IBM Big Data Watson world. The on-the-fly clustering was as good as, if not better than, the original version of Northern Light clustering. As I listened to the explanation of why the time is right for subscription search of the Web (whatever that means), I wondered why Yippy did not push aggressively for subscription revenues. Perhaps subscription services make sense when plugging assumptions into an Excel model? In real life, subscriptions are difficult.
The reality of Web (whatever that means) search is that costs go up. The brutal fact is that once content is indexed, that content must be revisited and changes discerned. Indexing changed content keeps the information in the index for those sites fresh. Also, the flows of new content mean that wonky new sites like those tallied by Product Hunt have to be identified, indexed, and then passed to the update queue. The users are often indifferent to indexing update cycles. Web search engines have to allocate their resources among a number of different demands; for example, which sites get updated in near real time? What sites get indexed every six months like the US government Railroad Retirement Board site? What sites get a look every couple of months?
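To make the revisit problem concrete, here is a minimal sketch of an update queue. It is my own illustration, not a description of how any search engine named here works. The tier names, the revisit intervals, the UpdateQueue class, and the fetch function are all assumptions made up for the example. Each site gets a revisit tier, a change is detected by comparing content hashes, and every visited site goes back into the queue with a new due date.

```python
import hashlib
import heapq
from dataclasses import dataclass, field

# Hypothetical revisit intervals in days. The tiers echo the questions above:
# near real time, every couple of months, every six months.
REVISIT_DAYS = {"near_real_time": 0.01, "bimonthly": 60, "semiannual": 180}


@dataclass(order=True)
class CrawlJob:
    due_day: float                                  # only field used for ordering
    url: str = field(compare=False)
    tier: str = field(compare=False)
    last_hash: str = field(compare=False, default="")


def content_hash(body: str) -> str:
    """Fingerprint a page so a later visit can tell whether it changed."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class UpdateQueue:
    """Illustrative update queue: the site with the earliest due day comes first."""

    def __init__(self):
        self._heap = []

    def add_site(self, url, tier, today=0.0):
        # A newly identified site is due immediately.
        heapq.heappush(self._heap, CrawlJob(today, url, tier))

    def run_due(self, today, fetch):
        """Revisit every site due by `today`; return URLs whose content changed.

        `fetch` is a caller-supplied function (an assumption of this sketch)
        that takes a URL and returns the page body as a string.
        """
        changed = []
        revisited = []
        while self._heap and self._heap[0].due_day <= today:
            job = heapq.heappop(self._heap)
            new_hash = content_hash(fetch(job.url))
            if new_hash != job.last_hash:
                changed.append(job.url)             # the index needs a refresh
                job.last_hash = new_hash
            job.due_day = today + REVISIT_DAYS[job.tier]
            revisited.append(job)
        for job in revisited:
            heapq.heappush(self._heap, job)         # back into the update queue
        return changed
```

Even in this toy version the cost shows up: every site in the queue must be fetched again on its schedule whether or not anything changed, and the bill grows with each new site added.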
And what about the rich media? The discussion groups? The Web sites which change their method of presenting content so that a crawler just skips the site? How deep does the crawler go? What happens to images? What about sites which require users to do something to get access; for example, a user name, a password, and then authentication on a smartphone?
Net net: The world of Web search is in flux. It is more difficult than at any time in my professional life to locate specific information. Maybe subscription services will do the trick? My hunch is that the lessons of the DataStars and Dialcoms and Lycoses will be helpful to today’s innovators.
What? You don’t remember DataStar? That’s one of the issues experts in search and retrieval face: Learning from yesterday’s innovators.
Stephen E Arnold, May 17, 2021