IBM Search: A Trial of Patience for Customers

June 25, 2008

A quick question. What is the URL for IBM’s public Web search? Ah, you did not know that IBM had a Web search system. I did. IBM’s crawler paid a quick visit to my Web site years ago. You can use this service yourself. Navigate to http://www.ibm.com/search. The service is called the IBM Planetwide Web.

Let us run a test query. My favorite test query is for an IBM server called the PC704. I once owned two of these four-processor Pentium Pro machines. For years I wanted to upgrade the memory to a full gigabyte, so I became a regular Sherlock Holmes as I tried to find memory I could afford.

Here are the results for the query PC704.

[Screen shot: IBM results for the query PC704]

The screen shot is difficult to read, but there is one result–a reference in an IBM technical manual. Let us click on the link. We get a link to a manual about storage subsystems. I know that IBM discontinued the PC704, but the fact that there is no archive of technical information about this system is only slightly less baffling than the link to the storage documentation.

Let’s try another query. Navigate to http://www.ibm.com. We are greeted with a different splash screen with an option to “sign in” and a search box. Let’s run a new query “text mining”. The system responds with a laundry list of results. The first five hits are primarily research documents. The second page of the results has links to two IBM text mining systems, IBM TAKMI and IBM Text Mining Server. TAKMI is another research link and the Text Mining Server is on the IBM developer Web site.

I don’t know about you, but I received one hit for PC704 and quite a few research hits for text mining. Where is the product information?

Let us persist. I know that IBM had a product called WebFountain. I want information about that product. I enter the single word, WebFountain, and the IBM system responds with 152 results. The documentation links figure prominently as well as pointers to information about a WebFountain appliance and architecture for a large-scale text analytics system.

Result 13 seemed to be on target. Here is what the Planetwide system showed me:

IBM – WebFountain – United States
WebFountain is a new text analytics technology from IBM’s Research division that analyzes millions of pages of data weekly.
URL: http://www-304.ibm.com/jct03004c/businesscenter/vent…

And here is the Web page this link displays.

[Screen shot: the Web page displayed for the WebFountain result]

Stepping Back

What have these three queries revealed?

  1. Despite the cratering of prices for storage devices, IBM does not maintain an archive of information about its older systems. The single hit for the string PC704 was to a book about storage. The string PC704 probably appears in this technical manual, but the system’s precision and recall disappointed me.
  2. The second query for text mining generated more than 3,000 hits. My inspection of the results suggested to me that IBM was indexing technical information. Some of the documents appeared to be as old as the PC704 that was not available in the index. The results provided no context for the bound phrase, and, to me, precision was unsatisfactory. Recall was better than the single hit for PC704, however. To me, irrelevant hits are not much better than one hit.
  3. The third query for an IBM product called WebFountain generated hits to research reports, documentation, and a Web site about WebFountain. Unfortunately, the link was active but there were no data displayed on the Web page.

All in all, IBM’s Planetwide search is pretty lousy for me. Your mileage may vary, of course.
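
Since I keep leaning on precision and recall, here is a rough sketch of how the two measures are usually computed for a single query. The function name and the document labels are made up for illustration. I did not tally the relevant documents for the queries above, so the sample values say nothing about IBM’s actual scores.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: documents the search system returned (hypothetical labels)
    relevant:  documents a human judged to be on target (hypothetical labels)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # returned documents that are also relevant

    # Precision: what fraction of the returned documents were relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: what fraction of the relevant documents were returned?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative case only: one off-target hit, three relevant documents missed.
p, r = precision_recall(
    ["storage-manual"],
    ["pc704-manual", "pc704-parts", "pc704-memory"],
)
print(p, r)
```

A single off-target hit scores zero on both measures, and a long list of off-target hits scores no better on precision, which is the point of my remark about irrelevant hits.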

Google

Now, let us run these three queries on Google’s Web search system.

The first query for PC704 returned 1,904 hits to the string PC704. A quick review of the results showed some false drops because other companies have products with those letters and numbers in their name. However, I was able to locate resellers with server parts available and links to documentation for the PC704. Definitely better than IBM’s Planetwide system.

The second query for text mining caused Google to identify more than two million results. I reran the query as “IBM text mining”, and the system returned in 0.21 seconds links to IBM research papers specifically about text mining, the TEMIS system now spun out of IBM, and the link to UIMA and DB2 Intelligent Miner, a current product available, one presumes, from IBM. Google hit the ball out of the park. These were highly relevant and useful results.

The third query for WebFountain displayed in 0.20 seconds and returned more than 20,000 results. The Google results were directly on target. Here are the first three hits:

IBM WebFountain – Wikipedia, the free encyclopedia

WebFountain is an Internet analytics engine implemented by IBM for the study of unstructured data on the World Wide Web. IBM describes WebFountain as:
en.wikipedia.org/wiki/IBM_WebFountain – 19k – Cached – Similar pages

IBM’s WebFountain Launched–The Next Big Thing?

IBM has launched a service named WebFountain that applies an elaborate mesh of software called text mining or text analytics to spidered data from across
www.infotoday.com/newsbreaks/nb030922-1.shtml – 41k – Cached – Similar pages

How to build a WebFountain: An architecture for very large-scale

WebFountain is a platform for very large-scale text analytics applications. …. However, architecturally WebFountain is a distributed architecture based on
www.research.ibm.com/journal/sj/431/gruhl.html – 71k – Cached – Similar pages

Google’s search system delivers more relevant and useful results than IBM’s Planetwide system.

Why This Drill?

Not long ago, a search wizard took exception to my suggestion that IBM’s Web site search system was not very useful. In fact, it is pretty difficult to find information. The site has weird blank result pages. The system reminds me of the days when Excite or Lycos could not process content that was retrieved by the systems’ crawlers. Very 1998.

I find this remarkable. IBM acquired iPhrase, a fancy content processing system. IBM developed UIMA so integrating content processing systems is easier than ever. IBM has its own homegrown technology from its various research labs, which seem to be in the publishing business, not the search business. IBM has Lucene, the open source search system in the free IBM Yahoo system. IBM somewhere has a variant of STAIRS III, now called Server Master or something along this line. (Go ahead and search for this product name. I do not want to run any more queries on the sluggish Planetwide system.) There is the search function that ships with DB2. Illustra has a search data blade. Goodness knows how many search partners like Autonomy, Endeca, and others IBM has. Why not install a system that sort of works?

Still, the IBM Web site search is awful in my opinion. I am certain the wizard who wrote me can make Planetwide work like an expensive Swiss watch. I can’t, and I think anyone who tries to locate information on www.ibm.com will find the effort required above average. Compared to Google, IBM’s Web site search is slightly better than a manual literature search in one of IBM’s technical libraries, which used to run on BRS Search, not IBM STAIRS III.

Observations

Let me offer some observations:

  1. IBM does not care about the efficacy of the search system; otherwise, IBM would make an effort to implement some of the whizzy IBM Almaden technologies to improve precision, recall, performance, and the user experience.
  2. IBM makes it difficult to locate information about specific products. The company rolls out a new initiative and turns on its PR machine. When the product goes nowhere, the information is orphaned. The result is the weird blank pages that I encounter in many queries.
  3. IBM does not retain technical information about older products. Perhaps IBM thinks this policy will stimulate people like me to retire my PC704 or my two NetFinity 5500s. What it does is motivate me to use white box servers and avoid the pain of trying to figure out how to find FRU (field replaceable unit) numbers, technical documentation for ServeRAID versions, and disassembly instructions when jumpers are below multilayer motherboards, requiring the technician to tear down an entire machine to change the monitor resolution.

I may be off base here, but I am confident the pundit who told me that IBM search was wonderful will set me straight. IBM should license the Google Search Appliance and call it a day.

Stephen Arnold, June 25, 2008

Comments

2 Responses to “IBM Search: A Trial of Patience for Customers”

  1. Vincent McBurney on June 25th, 2008 8:52 am

    I blog about the IBM Information Server so I trawl through IBM.com a lot looking for new technical documents, press releases, software etc. I’ve found the IBM search has got better in recent times when you search for single word popular products but it struggles with multiple word searches or niche products. I don’t think IBM have enough search users to refine the search results for the long tail like Google or Yahoo can and they don’t have enough visibility over links from the outside world to identify popular pages.

    I’ve found much better results entering the IBM website via Google. For example I can google ~Install IBM Information Server~ and find the right IBM web page right away but if I IBM search the same term the results are completely useless. The multiple word search is picky and semantic. “Install Information Server” won’t work but install “information server” will get better results. In Google you don’t need to worry about inverted commas.

    The Google “site:” search can be a useful way to dig into ibm.com if you get to know the domains like IBM developerworks and IBM downloads.

  2. Stephen E. Arnold on June 25th, 2008 10:58 am

    Vincent, thanks for taking the time to add a comment. I agree that Google makes IBM’s Web site usable. The real issue that put a burr under my saddle was a comment that IBM’s search is quite good. That is not true for me. IBM, of course, is all knowing because it is a consulting firm selling almost anything that will fit in a peddler’s satchel. I think IBM might use some of its expertise to make its own content findable.

    Stephen Arnold, June 25, 2008
