The Perils of Searching in a Hurry

November 1, 2011

I read the Computerworld story “How Google Was Tripped Up by a Bad Search.” I assume that it is pretty close to events as the “real” reporter summarized them.

Let me say that I am not too concerned about the fact that Google was caught in a search trip wire. I am concerned with a larger issue, and one that is quite important as search becomes indexing, facets, knowledge, prediction, and apps. The case reported by Computerworld applies to much of “finding” information today.

Legal matters are rich with examples of big outfits fumbling a procedure or making an error under the pressure of litigation or even contemplating litigation. The Computerworld story describes an email which may be interpreted as having a bright LED to shine on the Java in Android matter. I found this sentence fascinating:

Lindholm’s computer saved nine drafts of the email while he was writing it, Google explained in court filings. Only to the last draft did he add the words “Attorney Work Product,” and only on the version that was sent did he fill out the “to” field, with the names of Rubin and Google in-house attorney Ben Lee.

Ah, the issue of versioning. How many content management experts have ignored this issue in the enterprise? When search systems index, does one want every version indexed or just the “real” version? Oh, and what is the “real” version? A person has to investigate and then make a decision. Software and azure chip consultants, governance and content management experts, and busy MBAs and contractors are often too busy to perform this work. Grunt work, I believe, is how some may describe it.

What I am considering is the confluence of people who assume “search” works, the lack of time Outlook and iCalendar “priority one” people face, and the reluctance to sit down and work through documents in a thorough manner. This is part of the “problem” with search, and software is not going to resolve the problem quickly, if ever.


What struck me is how people in a hurry, assumptions about search, and legal procedures underscore a number of problems in findability. But the key paragraph in the write up, in my opinion, was:

It’s unclear exactly how the email drafts slipped through the net, and Google and two of its law firms did not reply to requests for comment. In a court filing, Google’s lawyers said their “electronic scanning tools” — which basically perform a search function — failed to catch the documents before they were produced, because the “to” field was blank and Lindholm hadn’t yet added the words “attorney work product.” But documents produced for opposing counsel should normally be reviewed by a person before they go out the door, said Caitlin Murphy, a senior product manager at AccessData, which makes e-discovery tools, and a former attorney herself. It’s a time-consuming process, she said, but it was “a big mistake” for the email to have slipped through.

What did I think when I read this?

First, all the baloney (yep, the right word, folks) about search, facets, metadata, indexing, clustering, governance, and analytics underscores something I have been saying for a long, long time. Search is not working as lots of people assume it does. You can substitute “eDiscovery,” “text mining,” or “metatagging” for search. The statement holds water for each.

The algorithms will work within limits, but the problem with search has to do with language. Software, no matter how sophisticated, gets fooled by missing data elements, versions, and words themselves. It is high time that the people yapping about how wonderful automated systems are stop and ask themselves this question, “Do I want to go to jail because I assumed a search or content processing system was working?” I know my answer.

Second, in the Computerworld write up, the user’s system dutifully saved multiple versions of the document. Okay, SharePoint lovers, here’s a question for you: Does your search system make clear which antecedent version is which and which document is the best and final version? We know from the Computerworld write up that the Google system did not make this distinction. My point is that the nifty sounding yap about how “findable” a document is remains mostly baloney. Azure chip consultants and investment banks can convince themselves and the widows from whom money is derived that a new search system works wonderfully. I think the version issue makes clear that most search and content processing systems still have problems with multiple instances of documents. Don’t believe me? Go look for the drafts of your last PowerPoint. Now to whom did you email a copy? From whom did you get inputs? Which set of slides was the one on the laptop you used for the briefing? What is the “correct” version of the presentation? If you cannot answer the question, how will software?
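For the programmers in the audience, here is a minimal Python sketch of the version problem. It is my own illustration, not the scanning tool described in the court filing. The `Email` record, the example address, the sample text, and the 0.8 similarity threshold are all assumptions made up for the example. The screen flags a message only when the “to” field names an attorney or the body contains the marker phrase, so an unaddressed draft without the phrase passes straight through. A second pass that compares the missed documents against the flagged ones can at least put the near-duplicate draft in front of a person.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Email:
    doc_id: str
    to: str
    body: str


# Hypothetical values, chosen only for this illustration.
ATTORNEYS = {"in-house-counsel@example.com"}
MARKER = "attorney work product"


def naive_screen(email: Email) -> bool:
    """Flag a message using field and phrase matching alone."""
    return email.to.lower() in ATTORNEYS or MARKER in email.body.lower()


def near_duplicates(flagged, candidates, threshold=0.8):
    """Return candidates whose body closely resembles any flagged message."""
    hits = []
    for candidate in candidates:
        if any(
            SequenceMatcher(None, candidate.body, f.body).ratio() >= threshold
            for f in flagged
        ):
            hits.append(candidate)
    return hits


# Invented sample text: an early draft and the version that was sent.
base = "Notes on the open question, prepared for review by counsel."
draft = Email("draft-1", "", base)
sent = Email("sent", "in-house-counsel@example.com", "Attorney Work Product. " + base)

emails = [draft, sent]
flagged = [e for e in emails if naive_screen(e)]
missed = [e for e in emails if not naive_screen(e)]

# Drafts that the screen missed but that look like a flagged message
# are routed to a human reviewer rather than released automatically.
for_review = near_duplicates(flagged, missed)

print("flagged by screen:", [e.doc_id for e in flagged])
print("needs human review:", [e.doc_id for e in for_review])
```

The second pass does not decide anything. It only builds a pile for a person to read, which is the grunt work no software removes.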

Third, it is time to quit wallowing in marketing mud and time to focus on the realities of search, eDiscovery, and business intelligence. Assuming these systems work as advertised is, if the Computerworld article is accurate, probably not such a hot idea. Rushing, cost cutting, and having to get a hair cut are wonderful excuses sent via a text message by a 30 something.

The problem is that rushing and a lack of diligence undermine search and content processing. I am not letting search vendors off the hook, but, gentle reader, the carelessness of the “in a hurry” youngsters is sort of a problem at least in this Computerworld example.

So what?

Four observations:

  1. Search is software and software is flawed, often deeply. Humans, also flawed, have to spend time preparing, indexing, reviewing, and searching for relevant content. There’s no app for that, at least not one that will keep a Gen Y out of an orange jump suit.
  2. Speed, or the assumption that going fast is the ideal mode for work, is pretty silly. Hurrying causes lots of problems. Then when the tuna goes into the grinder, the sight is not pretty and it can take lots of time to remediate. Speed in work may indeed cause an accident. An expensive accident.
  3. Ignoring the grunt work is not a good idea. Finding, analyzing, reviewing, and preparing data are a problem in many, many organizations.
  4. Search remains a difficult task. Those who ignore its complexity are likely to enjoy some exciting times in their work career. In short, a bright smile is no substitute for doing one’s information and analysis job.

I can hear the comments at lunch today. “You are too old.” “You don’t know the full story.” “You are ignoring the benefits of next generation search systems.”

I don’t care.

Computerworld has given us what may be a semi-accurate story. But it makes clear how rushing, assumptions, and carelessness with regard to search have very real and quite significant implications.

No seminar, lecture, or azure chip home economics major turned tech expert will change the problems that lousy search, lousy work habits, and lousy time management create when looking for information.

Stephen E Arnold, November 1, 2011

