A Non Search Person Explains Why Search Is a Lost Cause

December 16, 2013

The author of “2013: the Year ‘the Stream’ Crested” is focused on tapping into flows of data. Twitter and real time “Big Data” streams are the subtext for the essay. I liked the analysis. In one 2,500 word write up, the severe weaknesses of enterprise and Web search systems are exposed.

The main point of the article is that “the stream”—that is, flows of information and data—is what people want. The flow is of sufficient volume that making sense of it is difficult. Therefore, an opportunity exists for outfits like The Atlantic to provide curation, perspective, and editorial filtering. The write up’s code for this higher-value type of content process is “the stock.”

The article asserts:

This is the strange circumstance that obtained in 2013, given the volume of the stream. Regular Internet users only had three options: 1) be overwhelmed 2) hire a computer to deploy its logic to help sort things 3) get out of the water.

The take away for me is that the article makes clear that search and retrieval just don’t work. Something “new” is needed. Perhaps this frustration with search is the trigger behind the interest in “artificial intelligence” and “machine learning”? Predictive analytics may have a shot at solving the problem of finding and identifying needed information, but from what I have seen, there is a lot of talk about fancy math and little evidence that it works at low cost in a manner that makes sense to the average person. Data scientists are not a dime a dozen. Average folks are.

Will the search and content processing vendors step forward and provide concrete facts that show a particular system can solve a Big Data problem for Everyman and Everywoman? We know Google is shifting to an approach to search that yields revenue. Money, not precision and recall, is increasingly important. The search and content vendors who toss around the word “all” have not been able to deliver unless the content corpus is tightly defined and constrained.

Isn’t it obvious that processing infinite flows and changes to “old” content is likely to cost a lot of money? Google, Bing, and Yandex search are not particularly “good.” Each is becoming a system designed to support other functions. In fact, looking for information that is only five or six years “old” is an exercise in frustration. Where has that document “gone”? What other data are not in the index? The vendors are not talking.

In the enterprise, the problem is almost as hopeless. Vendors invent new words to describe a function that seems to convey high value. Do you remember this catchphrase: “One step to ROI”? How do you think that company performed? The founders were able to sell the company and some of the technology lives on today, but the limitations of the system remain painfully evident.

Search and retrieval is complex, expensive to implement in an effective manner, and stuck in a rut. Giving away a search system seems to reduce costs. But are license fees the major expense? Embracing fancy math seems to deliver high value answers. But are the outputs accurate? Users just assume these systems work.

Kudos to The Atlantic for helping to make clear that in today’s data world, something new is needed. For now, changing the words used to describe such out-of-favor functions as “editorial policy,” controlled terms, scheduled updates, and the like is more popular than innovation.

Stephen E Arnold, December 16, 2013


One Response to “A Non Search Person Explains Why Search Is a Lost Cause”

  1. Paul T. Jackson on December 17th, 2013 1:15 pm

    A couple of things. The Google index, used by many search engines, is actually updated on the fly, i.e., it never stops, so there are no “scheduled” updates to Google.

    It is correct that the search engines are doing more than searching for content. They are now erroneously searching for content that they think we want to find, using our past searches as biases toward what we look at or what we might be buying…then passing that information on to us: a packaged, consumer-specific result, but not necessarily what is needed or wanted. This is why free search engines are pretty much useless. Whether it’s a product on Amazon or information on Google, you will never get the same results twice. You can’t go back with the same terms and be likely to find the same things.

    The above issues are why we hear this all the time: “looking for information that is only five or six years ‘old’ is an exercise in frustration. Where has that document ‘gone.’ What other data are not in the index.” The information is still there; it’s just that the algorithms don’t think you want to see it.

    Yes, there are things which are taken down. I found that a collection of PDF and Word files was no longer available on the internet…although the link I had did indeed lead to the page where they had been. I was able to find out from the site’s owner that the information was now incorporated into a book of oral history.

    I’ve found that reference sites have been taken down by the owners, but some can still be found at the Wayback Machine (Internet Archive, archive.org), should you have enough information for the search.

    There are ways to write webbots or spiders which will go out and find specific targets, but only a few will know how to do this. And some sites won’t let those webbots in to bring the information back to you.

    To say search just doesn’t work is a bit of overkill. It has never worked for some things. No search engine I know of seems to know the difference between music and recordings of music. It can’t tell the difference between “drum part,” i.e., music, and “drum part,” i.e., a tension rod or similar hardware used to put drums together.

    To some extent it’s true…search doesn’t work. Although it does work for many things, it may not be useful for all retrieval. I’ve been hearing about natural language searching for years; I guess the same thing is now called semantic search. As we get further along, it is getting more frustrating not to be able to go back and find what you saw a few minutes before. No one seems to want to fix that. As Eli Pariser has written in The Filter Bubble, search engines are only interested in monetizing you and giving you what they think you want…thus not allowing you to see what others see.

