X1 Extracts Search Patent

June 26, 2008

X1 Technologies in Pasadena, Calif., has been “innovating” (X1’s word, not mine) search solutions since 2003. The company has patented an enterprise search solution, touted as quick and efficient, that uses an advanced technique to find search results even as you’re typing your query. X1 calls the innovation “fast-as-you-type” search, and it narrows results as you keep typing. With this patent, X1 wants to make its mark and assert itself in the enterprise search market with its X1 Enterprise Search Suite. The suite searches more than 400 application file types across PCs and servers alike. The company has more patents pending that build on this idea; the firm’s plan is to change how search is “done”, not just to make search faster. You can download 7,370,035, Methods and Systems for Search Indexing, here.
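To make “narrows results as you keep typing” concrete, here is a minimal sketch of prefix narrowing over a sorted word index. It is not X1’s patented method, which this post does not describe; the function names and the sample file names are invented for illustration.

```python
from bisect import bisect_left


def build_index(titles):
    """Return a sorted list of (lowercased word, title) pairs."""
    pairs = set()
    for title in titles:
        for word in title.lower().split():
            pairs.add((word, title))
    return sorted(pairs)


def search_as_you_type(index, prefix):
    """Return titles that contain a word starting with the typed prefix."""
    prefix = prefix.lower()
    if not prefix:
        return []
    # Binary search to the first entry whose word is >= the prefix.
    start = bisect_left(index, (prefix, ""))
    hits = []
    for word, title in index[start:]:
        if not word.startswith(prefix):
            break  # sorted order: no later word can match
        if title not in hits:
            hits.append(title)
    return hits


# Hypothetical file names, not taken from any real system.
documents = [
    "Budget 2008.xls",
    "Budget memo.doc",
    "Business plan.doc",
    "Patent filing.pdf",
]
index = build_index(documents)
for typed in ["b", "bu", "bud"]:
    print(typed, "->", search_as_you_type(index, typed))
```

Each extra keystroke can only shrink the candidate list, which is why the results appear to narrow while the user is still typing.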

Jess Bratcher, June 26, 2008

Nstein: Finding Greener Pastures

June 26, 2008

Nstein Technologies scored big by signing a deal with Bonnier Magazine Group, one of the largest magazine publishing companies in the United States. You can read the Nstein news release here. Bonnier will use Nstein’s content management products to digitize, sort, tag, and spruce up all its online material and digital data assets, and Nstein gets big business from the company that owns Time, Popular Science, and Parenting. Once a content processing company, Nstein is morphing into a solution for data problems. Will other search and content processing vendors follow? Beyond Search’s call: solutions will be easier to sell than search, a function that baffles some senior managers.

Jess Bratcher, June 26, 2008

IBM’s Vertical Search Engine for Research Papers: As Disappointing as IBM Planetwide Search

June 26, 2008

I want to pick up the thread of my discussion of IBM’s Planetwide search system. IBM offers a vertical search system for its research publications. If you are not familiar with this system, you can access it here http://domino.research.ibm.com/library/cyberdig.nsf/index.html.

The default search page features fields. I assume that IBM believes that anyone looking for IBM research information feels comfortable with specifying authors, reports by geographic region, and the notion of narrowing a query to a title or abstract.

[Screenshot: IBM research search form]

The first query I ran was “dataspace”, an approach to data management that dates from the 1990s. The query returned a null set just like my query for a WebFountain document on IBM Planetwide. No suggestions. No “did you mean”. No training wheels in the form of “See Also” references.

The second query was one of my favorites, programmable search engine. IBM did quite a bit of research related to this technical notion in 2004 to 2005. Again, a null set.

My third query was for Ramanathan Guha, one of the wizards involved in defining bits and pieces of the semantic Web. Again a null set. Zero hits. I was surprised because Ramanathan Guha worked at IBM Almaden before he went to Google and promptly filed five patents on the same day in 2005.

My fourth query was for “Semantic Web.” I was not too hopeful. I was zero for three in the basic query department. The system generated a page of results.

[Screenshot: IBM research results for “Semantic Web”]

When I scanned this list, I noticed three quirks:

  1. I could not figure out the relevance logic in this list. The first hit does not have “semantic web” in its title but the phrase appears in the abstract. The date is 2005. The paper references the Semantic Web, yet its focus is on two IBM-emmy notions, Model-Driven Architecture (MDA) and Ontology Definition Metamodel (ODM).
  2. Newer documents appeared deep in the result list; for example, Kamal Bhattacharya, Cagdas Gerede, Richard Hull, Rong Liu, Jianwen Su (2007). “Towards formal analysis of artifact-centric business process models” in RC24282. I could not find a way to sort by date.
  3. A document that I thought was relevant was even deeper in the result list. The title, the abstract, and the paper itself contained numerous references to semantics and concepts germane to the query. After examining the paper, I wondered if the IBM system was putting the most relevant documents at the foot of the results list, not the top. Furthermore, there were no 2008 documents on this subject, and I could not figure out exactly what was in this collection.

I clicked on the hot link for recent news. The most recent news was dated 2007 but the system offered me a hot link to 2008 news. I was expecting the news to be displayed in reverse chronological order with the most recent news at the head of the page and the older news at the foot. Nope. I clicked on the hot link for 2008 news and the system displayed this page:

[Screenshot: IBM 2008 news page]

At this point, I lost enthusiasm for running queries for papers from IBM research using the search system that one search pundit described to me as “quite good”.

I navigated to Google and entered this query: IBM Almaden research +“Ramanathan Guha”. Google responded in 0.23 seconds with 78 hits. The first three were:

[Screenshot: Google results for IBM research and Guha]

My searching skills are not too good. I am getting old. I eat squirrel stew. My logo is a silly goose. I wear bunny rabbit ears before erudite audiences in New York. Nevertheless, the IBM search system for its research papers is not too useful. I will stick with Googzilla. IBM may want to try Google’s free custom search engine and at least deliver pretty good results instead of the disappointments I experienced. IBM-ers, agree or disagree? Search pundits weigh in. Maybe I am missing something. Time to go shoot squirrels with my water pistol. More productive than trying to find information with the IBM research vertical search engine.

Stephen Arnold, June 26, 2008

Business Objects: Number One in Business Intelligence… for Now

June 26, 2008

Business intelligence–along with content management and enterprise search–is a mid-sized blob of marketing mercury. The big names in the US are SPSS and SAS Institute. Both work hard to get colleges and universities to teach eager math students how to make these proprietary systems make data walk on their hind legs, roll over, and sit on command. Business Objects, a sales-oriented company, has made inroads into the SPSS and SAS client base, and now the Gartner Group has named Business Objects the number one business intelligence outfit.

You can read SearchDataManagement.com’s summary of the Gartner research here. You can read the Business Objects news release here. Let’s get to the meat of the Gartner study. For me this was the key point:

Combined, SAP and Business Objects controlled 26.3% of the global BI platform market in 2007, nearly double their nearest competitors. IBM and Cognos held 14.7% market share, followed by the SAS Institute at 14.5%.

So, “combined” makes Business Objects number one. Chop out the SAP part and Business Objects posts nearly $1.0 billion in revenues. Will Business Objects be able to maintain its revenues? Will the company be able to make Inxight Software into more than a content utility? Will superplatforms such as IBM, Microsoft, and Oracle bundle business intelligence with higher value systems, sucking the air out of Business Objects’ growth?

For me, Business Objects means excellent sales management. Could its success come from its competitors’ lack of marketing and sales management expertise, not from its technology?

Stephen Arnold, June 26, 2008

One Reason Why Microsoft May Not Make Search a Success

June 26, 2008

The Bill Gates “noise” echoed in Kentucky. I read PCMag.com’s “Exclusive: The Bill Gates Exit Interview” here. The interview merits your attention. I zoned out with references to “the platform” and choked when I encountered this comment: “Everything in computer science is to just write less code.” And I was baffled by references to a “natural user interface”. But I am a Kentucky hillbilly.

I tried to avoid reading about “Bill Gates’ Web Experience”. Michael Krigsman does work I enjoy, but I was hooked. Mr. Krigsman pulled the best bits from a PDF of an email exchange here. I discovered that this was a “flame” among Microsofties. You can read SeattlePI.com’s take on the exchange and learn why the PDF has confidential stamped on it.

I read the emails and ignored the complaints about Mr. Gates’s problems using Windows XP. What’s new?

The email put the PCMag.com interview into perspective for me. Here is the key line in the email thread. One Microsoftie writes, “I am owning the website issues.” [sic].

Now, for me the telling comment is in a response to this person’s attempt to provide leadership, accept responsibility for the mess, and fix the problem. Ready? Here is what a Microsoft employee identified as Mike Beckerman wrote: “I don’t know what it means to ‘own website issues…’”. I have added the emphasis.

Now my observations:

  1. I am no leader, but I recognize that the person stepping forward to assume responsibility is walking and talking like a leader. Leaders are good because good leaders make things happen. For a colleague not to know what it means to “own Web site issues” is snide. In some organizations, the comment would be close to insubordinate.
  2. When colleagues cannot cede control, preferring to keep the status quo, the management process is in danger of veering off track. The email exchange took place in 2003 and now it is 2008. The Yahoo deal flopped. Vista is an issue for some. The enterprise search and Web search initiatives are spinning their wheels. I would assert that these are examples of flawed management and a refusal for colleagues to sort out their differences and find a leader to guide them forward.
  3. Google may have some challenges ahead. But if this email exchange is accurate (it may be a hoax for all I know), Microsoft may have some trouble closing the gap with Google in advertising, search, and cloud-based services. Google is a great many things, but so far it has avoided the headwind caused by employees who disregard a plea for changes from the fellow who founded the company.

Hopefully, I won’t have to read any more about Mr. Gates’s retirement, which, I believe, has him on the Microsoft campus two or three days a week. Oh, the problems identified in the 2003 “flame” emails are still around. No one was able to fix them. Well, there is always next year, which is what IBM said about OS/2.

Stephen Arnold, June 26, 2008

Google: Snuggling with OCLC

June 25, 2008

Digital Document Quarterly, Volume 7, Number 2, 2008 provided this item:

OCLC and Google have agreed to exchange book discovery data. Google will link from Google Book Search to WorldCat, which will drive traffic to online library services. Google will also share digitized book data. WorldCat will represent OCLC member library collections and link books scanned by Google. A user who finds a book in Google Book Search will be able to use WorldCat to find local library copies.

You can read the DDQ at http://home.pacbell.net/hgladney/ddq_7_2.htm. I recommend the publication if you have an interest in the library side of online information and digital documents.

My view of this is that slowly, ever so slowly, Google is encroaching on the traditional database world. I am confident the management gurus at ProQuest, Ebsco Electronic Publishing, Newsbank, and the other firms servicing this important but shrinking market have a GPS device on Googzilla.

A happy quack to H.M. Gladney from the Beyond Search goose.

Stephen Arnold, June 25, 2008

Hosted SharePoint

June 25, 2008

Tired of trying to figure out where SharePoint put a file? Relief is available from an outfit called SharePoint 360, a company offering cloud-based SharePoint. You can read about the company here. SharePoint 360 is a Microsoft Gold Certified partner. If this service takes off, Microsoft will move forward with more software as a service offerings. Details about the hosted SharePoint service are here.

The company says:

Our approach allows for even the most non-technical users to quickly get started and feel as comfortable working with Microsoft SharePoint as they do with Word or Excel.

The service warrants a test drive.

Stephen Arnold, June 25, 2008

IBM Search: A Trial of Patience for Customers

June 25, 2008

A quick question. What is the url for IBM’s public Web search? Ah, you did not know that IBM had a Web search system. I did. IBM’s crawler once paid a quick visit to my Web site years ago. You can use this service yourself. Navigate to http://www.ibm.com/search. The service is called the IBM Planetwide Web.

Let us run a test query. My favorite test query is for an IBM server called the PC704. I once owned two of these four processor Pentium Pro machines. For years I wanted to upgrade the memory to a full gigabyte, so I became a regular Sherlock Holmes as I tried to find memory I could afford.

Here are the results for this query PC704.

[Screenshot: IBM Planetwide results for PC704]

The screen shot is difficult to read, but there is one result–a reference in an IBM technical manual. Let us click on the link. We get a link to a manual about storage sub systems. I know that IBM discontinued the PC704, but the fact that there is no archive of technical information about this system is only slightly less baffling than the link to the storage documentation.

Let’s try another query. Navigate to http://www.ibm.com. We are greeted with a different splash screen with an option to “sign in” and a search box. Let’s run a new query “text mining”. The system responds with a laundry list of results. The first five hits are primarily research documents. The second page of the results has links to two IBM text mining systems, IBM TAKMI and IBM Text Mining Server. TAKMI is another research link and the Text Mining Server is on the IBM developer Web site.

I don’t know about you, but I received one hit for PC704 and quite a few research hits for text mining. Where is the product information?

Let us persist. I know that IBM had a product called WebFountain. I want information about that product. I enter the single word, WebFountain, and the IBM system responds with 152 results. The documentation links figure prominently as well as pointers to information about a WebFountain appliance and architecture for a large-scale text analytics system.

Result 13 seemed to be on target. Here is what the Planetwide system showed me:

IBM – WebFountain – United States
WebFountain is a new text analytics technology from IBM’s Research division that analyzes millions of pages of data weekly.
URL: http://www-304.ibm.com/jct03004c/businesscenter/vent…

And here is the Web page this link displays.

[Screenshot: WebFountain result page]

Stepping Back

What have these three queries revealed?

  1. Despite the cratering of prices for storage devices, IBM does not maintain an archive of information about its older systems. The single hit for the string PC704 was to a book about storage. The string PC704 probably appears in this technical manual, but the system’s precision and recall disappointed me.
  2. The second query for text mining generated more than 3,000 hits. My inspection of the results suggested to me that IBM was indexing technical information. Some of the documents appeared to be as old as the PC704 that was not available in the index. The results provided no context for the bound phrase, and to me the results delivered unsatisfactory precision. Recall was better than the single hit for PC704, however. To me, irrelevant hits are not much better than one hit.
  3. The third query for an IBM product called WebFountain generated hits to research reports, documentation, and a Web site about WebFountain. Unfortunately, the link was active but there were no data displayed on the Web page.

All in all, IBM’s Planetwide search is pretty lousy for me. Your mileage may vary, of course.
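For readers who want the two measures pinned down: precision is the share of returned documents that are relevant, and recall is the share of relevant documents that were returned. The sketch below shows the arithmetic on made-up document IDs. The numbers are hypothetical and are not measurements of IBM’s system.

```python
def precision_recall(returned, relevant):
    """Compute (precision, recall) for one query from two collections of IDs."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    # Guard against division by zero when either set is empty.
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: ten documents returned, four relevant documents exist,
# and only two of the relevant ones appear in the result list.
returned = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
relevant = ["d1", "d2", "d11", "d12"]

precision, recall = precision_recall(returned, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A long result list can raise recall while dragging precision down, which is the trade-off behind the complaint that irrelevant hits are not much better than one hit.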


Management Views Search as a Side Issue

June 25, 2008

Dave Valiante’s “Enterprise Search a High Priority for Most Users, But Not for Companies” is an important essay. You can read it on the Wall Street Technology Web site here. The url is a tricky one: www.wallstreetandtech.com.

He reports on a study that says “many businesses [are] unaware of the importance of findability.” His write up contains a number of interesting statistics from the report, which is based on a survey of 500 business users. The AIIM study triggered a flurry of news items about user dissatisfaction with search, but Mr. Valiante’s essay digs a bit deeper into the results.

The one finding that jumped out at me was:

The survey states that most organizations do not have a strategic approach for enterprise search and shows that 49 percent of respondents have “no formal goal” for enterprise findability within their own organizations.

What a remarkable finding. With search an essential first step in performing work today, the idea that organizations have “no formal goal” is intriguing. Let’s assume that the finding is spot on. Half of the organizations surveyed view search and retrieval as a non-issue. If true, this explains why point solutions for customer support, litigation support, and business intelligence sell throughout an organization. Licensees are either uninterested in systems already installed or, even more likely, indifferent to them so long as they get a system that meets very specific needs. Silos are not aberrations. Isolated systems, often containing content already processed by another system in the organization, are standard operating procedure.

No wonder an organization’s information technology department often shows little enthusiasm for a search or content processing system. With systems flowering, existing technical resources may be stretched to the limit. Another related thought I had, again assuming the finding is accurate, is that vendors have little incentive to change their marketing and sales strategies. A vendor can jump from market sector to market sector looking for customers who have a specific problem.

My research reveals user dissatisfaction with search and retrieval. The information in Mr. Valiante’s write up tells me that dissatisfaction is likely to be the norm in many organizations until management understanding matures. Agree? Disagree? Use the comment sections to share your views.

Stephen Arnold, June 25, 2008

SharePoint and Lotus Notes: Deeper and Wider Challenges

June 24, 2008

Oliver Marks’s “Microsoft Office SharePoint Server: A Next Generation of Deeper, Wider Content Silos?” stopped me in my tracks. You may want to read the complete essay here. Mr. Marks has done a nice piece of work. The seed from which this analysis germinated was a discussion at the Enterprise 2.0 conference during which Microsoft and IBM each demonstrated their respective products, SharePoint and Lotus Notes.

I have some lame duck experience with both systems, and I have to admit, I am not exactly sure what product category is appropriate for either product. Mr. Marks nailed the issue squarely in two of his observations.

First, with regard to IBM and Lotus Notes, he writes:

…It’s not too hard to see where that supertanker is sailing: over time enterprises whose backbone is Lotus Notes will eventually upgrade to Lotus Connections to take advantage of adequate collaboration capabilities.

Second, he observes:

The road ahead for SharePoint users is less clear. The partners and front end providers for Microsoft Office SharePoint Server (MOSS), which is built on top of Windows SharePoint Services (WSS) continue to build, with some excellent contextual products signing on…Partners are seeing an opportunity to create a view into otherwise impenetrable SharePoint silos.

I agree with his assessment that Microsoft has a number of “disparate products” and uniting them will be interesting to watch.

As I thought about his metaphors “deeper, wider content silos”, several thoughts swirled through my mind.

First, SharePoint and Lotus Notes are what I think of as software that can be dressed in a costume to assume a large number of guises. In the US government, Lotus Notes means email. True, there are “spaces” for shared documents, calendars, and collaboration tools. But email is the fuel that powers many of the agencies with which I am familiar. Lotus Notes is defined by its users. SharePoint can be a document manager or, as one consultant told me, “a next-generation operating system for the enterprise”. I think the fuzzy boundaries are a clear indication that both IBM and Microsoft want a class of software that can be sold anywhere, anytime, to do anything. Fuzzy makes it difficult for competitors to pin down exactly what feature set is appropriate for a particular organization. It is like playing cards against a person who can change cards at will.

Second, both SharePoint and Notes create repositories and data stores that can be difficult to normalize, deduplicate, and index so users can find a specific document. I recall a situation in one company where a needed attachment was shared in a workspace. In this particular organization, the originator of the document worked in the unit that was anchored in Notes. Several colleagues were from a group relying on Microsoft Exchange. In the span of seven days between virtual meetings in a shared space, the attachment was copied, modified, emailed, and transferred across and within each of the environments. A query for the document produced an unusable list of “hits”. The only way to find the particular version needed by the group was to inspect each instance manually. Mr. Marks’s “deeper and wider” allusion evoked in my mind a flood of murky brown water in Cedar Rapids. What a problem for residents and what a mess the flood creates.
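One common first step toward deduplication is to group copies by a hash of their content. The sketch below is only an illustration under stated assumptions: the locations and file contents are invented, and neither SharePoint nor Notes is claimed to work this way. It also shows the limit of the approach, because a modified copy hashes differently and still needs a human to sort out.

```python
import hashlib
from collections import defaultdict


def group_by_content(files):
    """Map SHA-256 digest -> list of locations holding byte-identical content."""
    groups = defaultdict(list)
    for location, content in files:
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append(location)
    return dict(groups)


# Hypothetical locations and contents, invented for illustration.
files = [
    ("notes://team/plan.doc", b"Draft v1"),
    ("exchange://inbox/plan.doc", b"Draft v1"),
    ("exchange://sent/plan-edited.doc", b"Draft v1 with edits"),
]

groups = group_by_content(files)
# Only groups with more than one location are exact duplicates.
duplicates = [locs for locs in groups.values() if len(locs) > 1]
print(duplicates)
```

Exact-match hashing collapses the untouched copies, but the edited attachment lands in its own group, which is why the manual inspection described above is so hard to avoid.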

Finally, the problem of managing information within and across boundaries of polymorphic software systems like SharePoint and Lotus Notes is growing. Most users of these systems do the best they can to create documents and share them. The flood of digital information combined with users’ willingness to distribute copies forwarded hither and yon, make changes to attachments, and create local stores with unique file names is the reality in most organizations. The management tools provided with SharePoint and Lotus Notes have not kept pace with the data management challenge.

What’s my take? I think that both of these systems create time bombs for system administrators; specifically:

  • The cost of figuring out what is in these systems, and then deducing like Sherlock Holmes what is what, is hidden but sucks up scarce resources.
  • The administrative tasks necessary to index information in these systems and make it findable are getting larger and more time consuming by the day. One question that concerns me is, “How do I know I have located each relevant document for this particular matter?” I just do not know what I have missed.
  • The legal vulnerability of organizations with these systems is ratcheting upwards. Email is a challenge. Who has seen what? Where is a particular document? What is the lineage or family tree of a document with an important change?

Agree that SharePoint and Lotus Notes are great as they are? Let me know. Do you believe that these polymorphic systems have some rough edges? Share your viewpoints in the comments section to this Web log.

Stephen Arnold, June 24, 2008
