Promise Best Practices: Encouraging Theoretical Innovation in Search

March 29, 2013

The photo below shows the goodies I got for giving my talk at Cebit in March 2013. I was hoping for a fat honorarium, expenses, and a dinner. I got a blue bag, a pen, a notepad, a 3.72 gigabyte thumb drive, and numerous long walks. The questionable hotel in which I stayed had no shuttle. Hitchhiking looked quite dangerous. Taxis were as rare as an educated person in Harrod’s Creek, and I was in the same city as Leibniz Universität. Despite my precarious health, I hoofed it to the venue, which was eerily deserted. I think only 40 percent of the available space was used by Cebit this year. The hall in which I found myself reminded me of an abandoned subway stop in Manhattan, only with fewer signs.

[Image]

The PPromise goodies. Stuffed in my bag were hard copies of various PPromise documents. The bulkiest of these in terms of paper were also on the 3.72 gigabyte thumb drive. Redundancy is a virtue, I think.

Finally on March 23, 2013, I got around to snapping the photo of the freebies from the PPromise session and reading a monograph with this moniker:

Promise Participative Research Laboratory for Multimedia and Multilingual Information Systems Evaluation. FP7 ICT 2009.4.3, Intelligent Information Management. Deliverable 2.3: Best Practices Report.

The acronym should be “PPromise,” not “Promise.” The double “P” makes searching for the group’s information much easier in my opinion.

If one takes the first letters of “Promise Participative Research Laboratory for Multimedia and Multilingual Information Systems Evaluation,” one gets PPromise. I suppose the single “P” was an editorial decision. I personally like “PP,” but I live in a rural backwater where my neighbors shoot squirrels with automatic weapons and some folks manufacture and drink moonshine. Some people in other places shoot knowledge blanks and talk about moonshine. That’s what makes search experts and their analyses so darned interesting.

As an illustration of the vagaries of information retrieval, my search for a publicly accessible version of the PPromise document returned a somewhat surprising result.

[Image: the surprising search result]

A couple more queries did the trick. You can get a copy of the document without the blue bag, the pen, the notepad, the 3.72 gigabyte thumb drive, and the long walk at http://www.promise-noe.eu/documents/10156/086010bb-0d3f-46ef-946f-f0bbeef305e8.

So what’s in the Best Practices Report? Right away you might not realize that the focus of the whole PPromise project is search and retrieval. Indexing, anyone?

Let me explain what PPromise is or was, dive into the best practices report, and then wrap up with some observations about governments in general and enterprise search in particular.

What’s a PPromise?

There are four links on the PPromise Web site to answer this question. Here’s a snippet so you can prep yourself for a deeper dive. I am okay with the wading thing and dipping my toes in the PPromise pool:

Large-scale worldwide experimental evaluations provide fundamental contributions to the advancement of state-of-the-art techniques through common evaluation procedures, regular and systematic evaluation cycles, comparison and benchmarking of the adopted approaches, and spreading of knowledge. In the process, vast amounts of experimental data are generated that beg for analysis tools to enable interpretation and thereby facilitate scientific and technological progress.

PROMISE (sic) will provide a virtual laboratory for conducting participative research and experimentation to carry out, advance and bring automation into the evaluation and benchmarking of such complex information systems, by facilitating management and offering access, curation, preservation, re-use, analysis, visualization, and mining of the collected experimental data. PROMISE (sic) will:

  • foster the adoption of regular experimental evaluation activities;
  • bring automation into the experimental evaluation process;
  • promote collaboration and re-use over the acquired knowledge-base;
  • stimulate knowledge transfer and uptake.

Europe is unique: a powerful economic community that politically and culturally strives for equality in its languages and an appreciation of diversity in its citizens. New Internet paradigms are continually extending the media and the task where multiple language based interaction must be supported. PROMISE (sic) will direct a world-wide research community to track these changes and deliver solutions so that Europe can achieve one of its most cherished goals.

PPromise is into evaluation, which suggests looking at what is available. Since this is a Euro-centric project, I assume the touchstones will be Autonomy-type, Fast Search-type, and Sphinx Search-type systems. PPromise also hits the buzzwords which I often have difficulty understanding when my own researchers bandy them about. The PPromise approach embraces access, visualization, etc. Finally, PPromise wants to kick in the collaboration gene, which suggests Facebook-type services. I could not locate the PPromise Facebook page. My fault, I assume.

[Image]

A diagram of the information access cycle. Seems simple enough. Why do so many search installations generate user dissatisfaction rates of 55 percent or higher? Searches via Google are even simpler. The user doesn’t have to do anything to get information. Google predicts what the user needs and queues it up. Simple. Maybe that’s why Google’s share of the search market in Europe is over 90 percent across the European Community countries. Only Russia, with Yandex, is going against the Google flow.
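At its core, the cycle in the diagram reduces to indexing content, accepting a query, and returning a ranked list, with the user judging the results and trying again. Here is a toy, purely illustrative Python sketch of that loop; the three-document collection and the term-overlap scoring are invented for this example, and production engines of the Autonomy, Fast, or Sphinx variety layer language analysis, ranking models, and connectors on top.

```python
from collections import defaultdict

# Toy three-document collection; the texts are invented for this sketch.
DOCS = {
    "d1": "multilingual information retrieval evaluation",
    "d2": "benchmarking search systems with test collections",
    "d3": "information systems for multimedia search",
}

# Index step: map each term to the set of documents containing it.
index = defaultdict(set)
for doc_id, text in DOCS.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def search(query):
    """Retrieve documents sharing terms with the query, ranked by overlap count."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Query step; in the full access cycle the user inspects the list and reformulates.
print(search("information search"))  # e.g. [('d3', 2), ('d1', 1), ('d2', 1)]
```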

I noted the statement “Europe is unique.” I would agree. The financial issues associated with Portugal, Italy, Ireland, and Greece are distress beacons. Each week’s issue of the Economist (the newspaper which sure looks like a glossy magazine here in Kentucky) offers such supplementary crisis updates as “The Eurozone Crisis.” Kentucky is in equally dire straits. However, Kentucky is in the middle of nowhere and populated by folks with zero pretensions to financial acuity or much knowledge of best practices. Exceptions include horse racing, bourbon, gambling, and basketball.

What’s in the best practices report? The summary does a far better job than I can in crunching down the 45 pages of tables and bibliographies.

This report presents best practice recommendations for information retrieval (IR) system developers, IR application implementers and IR application maintainers. It covers the main aspects of IR systems and applications, as well as recommendations for the user interface and evaluation. The best practices presented are the result of a distillation of academic IR output, taken mainly from experiments conducted within the confines of the CLEF evaluation campaigns, but also from additional sources. Elaboration was carried out both as a manual, intellectual effort, but also using semi-automatic, statistical methods that provided additional evidence for validation. Information retrieval technology is today used for very diverse purposes, supporting a range from “classical” search engines to applications such as topic detection or recommender systems. It is thus important to provide context to the individual recommendations. The report proposes a structure for the different best practice recommendations that states limitations and qualifications for different use case domains, and is prepared to include direct links to experiments and tested configurations in the future.

If you want a checklist of actions a search system should be able to perform, you are in business.
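The summary leans on the CLEF evaluation campaigns without saying what the underlying arithmetic looks like. For readers who have never run such an experiment, here is a minimal, hypothetical Python sketch of the core measurement step: comparing one system’s ranked output for a query against human relevance judgments (qrels) and computing precision at k and average precision. The document identifiers and judgments are invented for illustration; real campaigns use standardized topic sets and pooled judgments.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are judged relevant."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k

def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at each rank where a relevant document appears."""
    if not relevant_ids:
        return 0.0
    score, hits = 0.0, 0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            score += hits / rank
    return score / len(relevant_ids)

# Hypothetical ranked output for one query and the matching relevance judgments.
run = ["doc3", "doc7", "doc1", "doc9", "doc2"]
qrels = {"doc1", "doc3", "doc4"}
print("P@5:", precision_at_k(run, qrels, 5))            # 0.4
print("AP :", round(average_precision(run, qrels), 3))  # 0.556
```

Average the per-topic values over a topic set and you get the mean average precision figures that populate campaign result tables of the sort this report distills.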

What’s the significance of the “best practices”? I am not sure how to answer the question. Let me come at it in this way.

  1. For a government procurement team, the list of best practices is a way to formulate future grant proposals, enhance statements of work, and stimulate a mostly abstract discussion of what search should be.
  2. For an academic, the list of “best practices” will keep graduate students busy for decades. In the quest for “good enough” search, why not “good enough” research?
  3. For a vendor, the best practices report is a crib sheet for figuring out how to comment about such topics as “general retrieval” in a “retrieval paradigm” or explain a “recall oriented retrieval scenario.” MBA marketers will have a field day generating more sales gobbledygook.
  4. For an investment firm, the best practices are a reminder of how difficult it is to make money with search, retrieval, content processing, and allied disciplines. The issues associated with Autonomy-type, Fast Search-type, and Sphinx Search-type systems are not from the dust bin of history. The problems are “now” issues. More troubling, the “now” issues are the same ones which have plagued information retrieval for decades.

I was going to write this commentary using my PPromise pen. But the ink had dried up. The USB thumb drive is okay, but too limited for the information I cart around with me on my jaunts around the world. The bag has the interesting characteristic of allowing the contents to spill out. The paper notepad is a keeper, but I have most of my ephemera in the cloud.

My hunch is that the promise of cracking the code for search and retrieval remains unmet in the US, Europe, and elsewhere. The fix? Fund another round of government studies. This approach to innovation works like gangbusters here in the USA.

In the meantime, search is broken. Users just accept what the system outputs. Critical thinking does not play a major role in the majority of online searching based on the data I have reviewed.

Most of the US and European vendors are desperate for some way to make a buck. The notion of “search” has been buried under big data, predictive analytics, sentiment analysis, and other fancy sounding buzzwords.

Organizations are looking at bundles, discounted products, and freebies. Open source search is disruptive, and the dust-up between a certain European search vendor and a US company does not bolster confidence in some circles.

In my opinion organizations and users are stuck with findability challenges. Progress, promises, and precision—elusive.

Stephen E Arnold, March 29, 2013
