IBM’s Vertical Search Engine for Research Papers: As Disappointing as IBM Planetwide Search
June 26, 2008
I want to pick up the thread of my discussion of IBM’s Planetwide search system. IBM offers a vertical search system for its research publications. If you are not familiar with this system, you can access it at http://domino.research.ibm.com/library/cyberdig.nsf/index.html.
The default search page features fields. I assume that IBM believes that anyone looking for IBM research information feels comfortable with specifying authors, reports by geographic region, and the notion of narrowing a query to a title or abstract.
The first query I ran was “dataspace”, an approach to data management that dates from the 1990s. The query returned a null set just like my query for a WebFountain document on IBM Planetwide. No suggestions. No “did you mean”. No training wheels in the form of “See Also” references.
The second query was one of my favorites, programmable search engine. IBM did quite a bit of research related to this technical notion in 2004 to 2005. Again, a null set.
My third query was for Ramanathan Guha, one of the wizards involved in defining bits and pieces of the semantic Web. Again, a null set. Zero hits. I was surprised because Ramanathan Guha worked at IBM Almaden before he went to Google and promptly filed five patents on the same day in 2005.
My fourth query was for “Semantic Web.” I was not too hopeful. I was zero for three in the basic query department. The system generated a page of results.
When I scanned this list, I noticed three quirks:
- I could not figure out the relevance logic in this list. The first hit does not have “semantic web” in its title but the phrase appears in the abstract. The date is 2005. The paper references the Semantic Web, yet its focus is on two IBM-emmy notions, Model-Driven Architecture (MDA) and Ontology Definition Metamodel (ODM).
- Newer documents appeared deep in the result list; for example, Kamal Bhattacharya, Cagdas Gerede, Richard Hull, Rong Liu, Jianwen Su (2007). “Towards formal analysis of artifact-centric business process models” in RC24282. I could not find a way to sort by date.
- A document that I thought was relevant was even deeper in the result list. The title, the abstract, and the paper itself contained numerous references to semantics and concepts germane to the query. After examining the paper, I wondered if the IBM system was putting the most relevant documents at the foot of the results list, not the top. Furthermore, there were no 2008 documents on this subject, and I could not figure out exactly what was in this collection.
I clicked on the hot link for recent news. The most recent news was dated 2007 but the system offered me a hot link to 2008 news. I was expecting the news to be displayed in reverse chronological order with the most recent news at the head of the page and the older news at the foot. Nope. I clicked on the hot link for 2008 news and the system displayed this page:
At this point, I lost enthusiasm for running queries for papers from IBM research using the search system that one search pundit described to me as “quite good”.
I navigated to Google and entered this query: IBM Almaden research +“Ramanathan Guha”. Google responded in 0.23 seconds with 78 hits. The first three were:
My searching skills are not too good. I am getting old. I eat squirrel stew. My logo is a silly goose. I wear bunny rabbit ears before erudite audiences in New York. Nevertheless, the IBM search system for its research papers is not too useful. I will stick with Googzilla. IBM may want to try Google’s free custom search engine and at least deliver pretty good results instead of the disappointments I experienced. IBM-ers, agree or disagree? Search pundits weigh in. Maybe I am missing something. Time to go shoot squirrels with my water pistol. More productive than trying to find information with the IBM research vertical search engine.
Stephen Arnold, June 26, 2008