Zoomii: Interesting Interactive Interface

June 23, 2008

A person who remembers old-fashioned bookstores but prefers the Amazon-type experience may want to look at Zoomii Books here. The service allows you to enter a word or phrase or explore a visual representation of a bookstore’s shelves. The notion of browsing in a book store is useful. The service runs on Amazon’s EC2 and S3 services. If you want to learn more about the company, write talk at zoomii.com. This is worth a look. I can think of applications in enterprise search where this approach would be appropriate and helpful. Too 20-something or useful to a person in assisted living? What do you think?

Stephen Arnold, June 23, 2008

Microsoft: New Management Line Up

June 23, 2008

InfoWorld, a print publication that went all digital, contains a useful round up of Microsoft’s current senior management team. You can read the essay “Microsoft’s Post-Gates Management Team” here. The pop ups and quirky search engine make it difficult to locate some material on the InfoWorld Web site, so click this link while it is still fresh (June 20, 2008).

The important point for me in Elizabeth Montalbano’s story is that no one is identified as having responsibility for search, text analysis, and content processing. I find this strange because search is the killer application for anyone working with information today.

My thoughts on this strange omission are:

  1. I am not smart enough to understand that search is the obvious responsibility of one of these senior managers, possibly Stephen Elop, president of the Microsoft Business Division.
  2. Microsoft believes that other areas are more important than search, assuming that those in lower management ranks will be able to deal with Google’s dominance of this application space.
  3. Microsoft is confident that the new Live.com initiatives and the enterprise Web part for SharePoint are the key steps needed to catch up with Google and then leap frog over it.

If you have other thoughts on the “owner” of search at Microsoft, let me know via the comments section.

Stephen Arnold, June 23, 2008

Update 1, June 23, 2008, 6:30 pm Eastern: A reader provided us with this link. John Lervik is a corporate vice president, Microsoft Enterprise Search Group, in the Microsoft Business Division. Microsoft’s information page is here. A happy quack to the reader who took the time to provide this item.

Google: Friction More Powerful than Google

June 23, 2008

Om Malik’s GigaOM flagged a vulnerability that I had overlooked–friction. His essay “Delayed: Android aka Google Phone” is a summary of a Wall Street Journal story, but he pushes beyond the WSJ with this statement:

…Whimsical wishes of carriers, endless customization, software delays and of course, executive reshuffling–these are facts of life for mobile start-ups. Welcome to the club, Google.

Please, click here and read his take on the Google Android delay.

What struck me as important was that in my listing of Google’s vulnerabilities in my The Google Legacy (2005) and Google Version 2.0 (2007), I overlooked friction. GigaOM makes it clear that environmental factors such as bureaucracy and work procedures can slow most companies, including Google.

After thinking about this point, I want to suggest that as Google grows larger, the friction the company faces will go up. For competitors, evidence of “friction” hampering Google is good news. Kudos to GigaOM for making this point clear to me.

Stephen Arnold, June 23, 2008

Update 1, June 23, 2008, 6:45 pm Eastern: CNet has published “Google Android Success: I’ll Believe It When I See It” by Don Reisinger. Mr. Reisinger identifies four issues with the mobile initiative. Worth your reading time.

Update 2, June 24, 2008: Ars Technica reports that Google Android is on track. Jacqui Cheng’s “Google Says Android Still on Schedule” is here. Google’s reaction to the seed story has been fast and forceful. One wonders how Google could have missed the opportunity to provide a clearer signal to the Wall Street Journal before the negative Android story broke. Could Google’s PR mechanism be part of the problem?

Text Analytics ROI Case: Intuit

June 23, 2008

Case studies with teeth are useful. Jeff Kelly, news editor for SearchDataManagement.com, provides information about Intuit’s use of text analytics. His essay “Calculating Text Analytics ROI: Start Small and Focus on Customer Data” is here. The most interesting point for me was this statement:

Companies seeking to establish the ROI of a text analytics project should start their deployment with a specific, targeted set of unstructured data to prove the business case.

Those involved in search and text processing often try to swing for the fences with a procurement. The requirements include every feature and function the team can identify. The result is a mess. The common sense approach is to define a problem, figure out the content needed to give users what’s needed to make a decision, and deploy a system that boils an egg, not boils the ocean.

Mr. Kelly’s essay makes this obvious point clearly. Return on investment is easier to figure out when the project is bounded and narrowed intentionally.

Stephen Arnold, June 23, 2008

Text Analytics: Search Fractures Identified

June 23, 2008

A quite interesting essay by Frank Diana popped into my news reader. Mr. Diana’s essay “The 4th Annual Text Analytics Summit” is here. The most interesting part of his summary of the conference is this list:

  • Fraud detection
  • Voice of the Employee, Customer, Community and Market
  • Patient Safety, Drug Discovery, Clinical Analysis
  • Law Enforcement, Intelligence Analysis
  • Litigation Support / eDiscovery
  • SOX Compliance, Corporate Governance
  • Investment Analysis
  • Marketing Campaign Analysis, Advertising Analysis
  • Claims Analysis, Warranty Analysis
  • Product Innovation
  • Reputation Management
  • Intelligent Messaging.

Mr. Diana suggests that these are areas in which text analytics will play a role. I agree, but I would like to offer several observations.

Each of these niches requires tailored components to address the specific information requirements of each market. Some functions will be common such as entity extraction and email threading. Other requirements will be highly particularized such as those for law enforcement, financial applications, and health and medical applications.

The list underscores why “one size fits all search and content processing systems” continue to disappoint many users. In an organization, each user needs a particular type of information under different circumstances. Some of these may be predictable because a work process requires that an employee have access to information about a customer’s history with the company. Other information needs may be unpredictable, so interfaces and access methods have to be tailored to meet these needs.

In theory, an information platform can be customized to meet the needs of a group of users or a single user. In reality, the cost and complexity of building personalization from a common framework may tax an organization’s resources. To economize, a “good enough” system is provided. Users find the system disappointing. This situation triggers more spending and creates an inefficient information environment.

What this list suggests is that the type of laser focus that some vendors are bringing to specific markets may be the key to success in the highly competitive text processing market. Agree or disagree? Let me know via the comments section of this Web log.

Stephen Arnold, June 23, 2008

Connectors: A Big Deal for Enterprise Search

June 22, 2008

In my travels on June 17 to June 20, 2008, I participated in three conversations about connectors. A connector is a program that converts a source document in one format to some other format. The idea is that a connector makes it possible for a content processing system to ingest content with a minimum of computational hassle.
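The definition above can be made concrete with a minimal sketch. It assumes a content processing system that wants every document reduced to the same plain record. The names `html_connector`, `json_connector`, `CONNECTORS`, and `ingest` are hypothetical and come from no vendor's product; real connectors also handle security, scheduling, and repository access.

```python
import json
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())


def html_connector(raw: str) -> dict:
    """Hypothetical connector: HTML in, normalized record out."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return {"format": "html", "text": " ".join(parser.parts)}


def json_connector(raw: str) -> dict:
    """Hypothetical connector: assumes a flat JSON object of field/value pairs."""
    record = json.loads(raw)
    return {"format": "json", "text": " ".join(str(v) for v in record.values())}


# One connector per source format; the indexing side sees only the common record.
CONNECTORS = {"html": html_connector, "json": json_connector}


def ingest(raw: str, source_format: str) -> dict:
    """Route a source document to the connector registered for its format."""
    if source_format not in CONNECTORS:
        raise ValueError(f"no connector for format: {source_format}")
    return CONNECTORS[source_format](raw)
```

Adding a new source format in this sketch means writing one more function and registering it; the system that consumes the records does not change.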

For some reason, connectors are the topic of the moment in the circles in which I move. In 2005, I discovered a useful white paper on the Web site of Persistent. I had to dig the link out in order to send it to the people with whom I met last week, and I thought some readers of Beyond Search might find it useful as well.

The Persistent document is “Unified Connectors for Enterprise Search Softwares.” You can download the document here. An individual is not identified as the author. I found the write up useful.

Persistent is a firm providing software consulting, engineering, and outsourcing. Persistent makes additional information available at its Web site here.

Stephen Arnold, June 22, 2008

Connectors: Rounding Up Some Definitions

June 22, 2008

I received an email this morning (June 22, 2008). The writer asked, “Are connectors the same as filters?” As I walked the world’s most wonderful dog, I considered this question. This short essay is a summary of my thoughts. If you have other concepts and definitions to add, please, use the comments section to share them with me and the three other readers of this Web log. Ooops. There may be four readers. I sent a link to my father and he often looks at what I write.

Connectors

Let us look at what Google provides as a definition. Enter the query “define:connectors” and Google returns nine definitions. A quick scan of the links and the text snippets provides a useful starting point; specifically:

A growing collection of libraries that abstract the interfaces of specific hardware or enterprise integration methods. (More here)

Google offers a number of related phrases to assist me, but none of these seem to relate to enterprise search, content processing, or text analytics.

Is this gold or fool’s gold? Without a formal method for testing, even experienced rock hounds may not know what the substance is. Software can deliver valuable functions or deliver a lower value operation.

File Conversion

Language is tricky, and business English is a slippery type of language. I know, for example, that some companies provide file conversion tools. So, what is the meaning of file conversion? For a definition, I turn to one of the vendors offering file conversion software. I enter the query “stellent file conversion” into Google and get a pointer to Stellent’s “Dynamic Conversion Process”. This is a function that takes content from a Web page or other source and makes it Web viewable.

I recalled licensing software under the name “Outside In”, and my recollection was that Stellent bought this company and continued to sell the product. My recollection is that Stellent’s software components could take a file in one format such as XyWrite III+ and convert it to Microsoft Rich Text Format. Few people today need to convert XyWrite files, but I have heard that the US House of Representatives still has some XyWrite files kicking around.

I consulted my digital archive and located this explanation of the software. I am going to paraphrase the description to pull out the key point: The technology gives developers the ability to “view, filter and convert more than 225 file formats without using native applications.”

After a bit of poking around I located a description of Outside In on the Oracle Web site here. Oracle purchased Stellent, and you can license the software from Oracle. The most recent version of the Outside In software performs a number of functions important to file conversion, which seems to be the main thrust of Oracle’s description of the Outside In technology; specifically Oracle says:

  • Clean Content—Identifies and scrubs risky hidden data from Microsoft Office documents
  • Content Access—Extracts text and metadata from more than 400 file types
  • File ID—Quickly and accurately identifies file types
  • HTML Export—Converts files into HTML rendering embedded graphics as a GIF, JPEG, or PNG
  • Image Export—Converts files into TIFF, JPEG, BMP, GIF, or PNG images
  • PDF Export—Converts files into PDF without native applications or 3rd party libraries
  • Search Export—Converts files into one of four formats designed specifically for search
  • Viewer—Renders high-fidelity views of files and allows printing, copy/paste, and annotations
  • XML Export—Converts and normalizes files into XML that defines properties, content, and structure

Oracle’s checklist provides a good round up of the bits and pieces that comprise file conversion functions. It seems that we have a definition of sorts. Note: Oracle provides a useful 2007 white paper to help you navigate through the sub concepts embedded in the Outside In system here.

File conversion–Software that performs a number of separate operations to change a file in one format to another format. The purpose of file conversion is to eliminate the need to open a native application such as XyWrite to export a file in a different format.
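Two of those separate operations, identifying a file type and exporting to another format, can be sketched in a few lines. This is a toy illustration and not how Outside In works. The function names `identify` and `text_to_html` are hypothetical, and the signature table covers only a handful of well-known leading byte sequences.

```python
import html

# Leading "magic" bytes for a few common formats. A real file identification
# component knows hundreds of signatures; this short table is illustrative.
MAGIC = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"{\\rtf", "rtf"),
    (b"PK\x03\x04", "zip"),
]


def identify(data: bytes) -> str:
    """Guess a file type from its first bytes without opening a native application."""
    for signature, name in MAGIC:
        if data.startswith(signature):
            return name
    return "unknown"


def text_to_html(text: str, title: str = "Untitled") -> str:
    """Export plain text as minimal HTML, one <p> element per blank-line-separated block."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return (
        f"<html><head><title>{html.escape(title)}</title></head>"
        f"<body>\n{body}\n</body></html>"
    )
```

The point of the sketch is the division of labor: identification happens first, and each export is a separate step that never launches the application that created the file.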

But what about information in a database like IBM’s DB2, Microsoft’s SQL Server, and Oracle’s database? Well, these file types are widely used in organizations, and it is easy for a database administrator to export a relational database as a comma separated value file or in what is called the CSV format. Also, many systems can “read” database files or database reports. But I have heard that these features do not work on certain types of information stored in a database; for example, the database contains row and column headings that are not plain English or the cells in the database are filled with numerical strings that are codes.

One work around is to write a report, query the database, save the answer to the query as HTML or XML and then process those HTML or XML files as individual documents. But that seems like a great deal of work. What happens to those cryptic row and column headings? What does the report do to make the values in the cells understandable to a human?
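A minimal sketch of that work around follows, with loud assumptions. An in-memory SQLite table stands in for DB2, SQL Server, or Oracle. The table `acct`, the column names, and the two lookup dictionaries are invented for illustration. Someone who understands the data still has to supply the lookups, and that is where the work is.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical data dictionary: cryptic column headings and coded cell values
# mapped to labels a person can read. In practice a DBA would maintain these.
COLUMN_LABELS = {"cst_nm": "customer_name", "st_cd": "status"}
CODE_LABELS = {"st_cd": {"01": "active", "02": "closed"}}


def rows_to_xml_documents(conn, query):
    """Run a query and return one small XML document (as a string) per row."""
    cursor = conn.execute(query)
    columns = [description[0] for description in cursor.description]
    documents = []
    for row in cursor:
        doc = ET.Element("document")
        for column, value in zip(columns, row):
            # Use the readable label when one exists, otherwise keep the original name.
            field = ET.SubElement(doc, COLUMN_LABELS.get(column, column))
            decoded = CODE_LABELS.get(column, {}).get(value, value)
            field.text = "" if decoded is None else str(decoded)
        documents.append(ET.tostring(doc, encoding="unicode"))
    return documents


# Stand-in database so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acct (cst_nm TEXT, st_cd TEXT)")
conn.execute("INSERT INTO acct VALUES ('Acme Corp', '01')")
docs = rows_to_xml_documents(conn, "SELECT cst_nm, st_cd FROM acct")
```

Each string in `docs` can be handed to a content processing system as an individual document. Without the lookup tables, the output carries the same cryptic headings and codes as the database.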

We don’t need file conversion. We need another process. What is it called?

Text Analytics Summit Summary Sparks UIMA Thoughts

June 22, 2008

Seth Grimes posted a useful series of links about the Text Analytics Summit, held in Boston the week of June 16, 2008. You can read his take on the conference here. I was not at the conference. I was on the other side of the country at the Gilbane shin dig. To make up for my non attendance, I have been reading about the summit.

From what I can deduce from the Web log posts, the conference attracted the Babe Ruths and Ty Cobbs of text analysis, a market that nestles between enterprise search and business intelligence. I am not too certain about the boundaries of either of these markets, but text analytics is polymorphic and can appear searchy or business intelligency depending upon the context.

I clicked through the links Mr. Grimes provides, and I recommend that you spend a few minutes with each of the presentations. I learned a great deal. Please, review his short essay.

One point stuck in my mind. The purpose of this essay is to call your attention to this comment and offer several observations about its implications for those who want to move beyond key word retrieval. Keep in mind that I am offering my opinion.

Here’s the comment. Mr. Grimes writes:

I’ll conclude with one disappointing surprise on the technical front, that UIMA — the Unstructured Information Management Architecture, an integration framework created by IBM and released several years ago as open source to the Apache — has not been more broadly accepted. IBM software architect Thomas Hampp spoke about his company’s use of the framework in the OmniFind Analytics edition, but Technology Panel participants said that their companies — Attensity (David Bean), Business Objects (Claire Thomas), Clarabridge (Justin Langseth), Jodange (Larry Levy), and SPSS (Olivier Jouve) — simply do not perceive user demand for the interoperability that UIMA can offer.

My understanding of this statement and the supporting evidence in the form of high profile industry executives is that an open standard developed by IBM has little, if any, market traction. In short, if the UIMA standard were gasoline, your automobile would not run or would just sputter along.

Let us assume that this lack of UIMA demand is accurate. Now I know this is a big assumption, and I am confident that an IBM wizard will tell me that I am wrong. Nevertheless, I want to follow this assumption in the next part of the essay.

Possible Causes

[Please, keep in mind that I am offering my opinion in a free Web log. If you have not read the editorial policy for this Web log, click on the About link on any page of Beyond Search. Some readers forget that I am using this Web log as a journal and a container for the information that does not appear in my for fee reports and my paid writings such as my monthly column in KMWorld. Some folks are reading my musings and ignoring or forgetting what I am trying to capture for myself in these posts. Check out the disclaimer here.]

What might be causing the lack of interest in UIMA, which as you know is an open source framework to allow different software gizmos to talk to one another? For a more precise definition of UIMA, you can give the IBM search engine a whirl or click this Wikipedia link, http://en.wikipedia.org/wiki/UIMA.

Here is my short list of the causes for the UIMA excitement void. I am not annoyed with IBM. I own IBM servers, but I want to pick up Mr. Grimes’s statement and perform a thought experiment. If this type of writing troubles you, please, click away from Beyond Search. Also, I am reacting to a comment about IBM, but I want to use IBM as an example of any large company’s standards or open source initiative.

First, IBM is IBM. IBM has an obligation to its shareholders to deliver growth. Therefore, IBM’s promulgating a standard is, in some way large or small, a way to sell IBM products and services. Maybe potential UIMA users are not interested in the potential upsell that may follow.

Second, open source and standards have proven to be incredibly useful. Maybe IBM needs to put more effort into educating partners, vendors, and customers about UIMA? Maybe IBM has invested in UIMA and found that marketing did not produce the expected results, so IBM has moved on.

Third, maybe today IBM lacks clout in the search and content processing sector. In 1960, IBM could dictate what was hot and what was not. UIMA’s underwhelming penetration might be evidence that the IBM of today lacks the moxie the company enjoyed almost a half century ago.

A fourth possibility is that no one really wants to embrace UIMA. Enterprise software is not a level playing field. The vendor wants to own the customer, locking out any other vendor who might suck dollars from the company owning a customer. IBM and other enterprise vendors want to build walls, not create open doors.

I have several other thoughts on my list, but these four provide insight into my preliminary thinking.

Observations

Now let’s consider the implications of these four points, assuming, of course, that I am correct.

  1. Big companies and standards do not blend as well as a peanut butter and jelly sandwich. The two ingredients may not yet be fully in harmony. Big companies want money and open standards do not have the revenue to risk ratio that makes financial officers comfortable.
  2. Open source is hard to control. Vendors and buyers want control. Vendors want to control the technology. Buyers want to control risk. Open source may reduce the vendor’s control over a system and buyers lose control over the risk a particular open source system introduces into an enterprise.
  3. Open source appeals to those willing to break with traditional information technology behavior. IBM, despite its sporty standards garb, is a traditional vendor selling traditional solutions. Open source is making headway, but it is most successful when youthful blood flows through the enterprise. Maybe UIMA needs more time for the old cows to leave the stock pen?

What is your view? Is your organization ready to embrace UIMA, big company standards, and open source? Agree? Disagree? Let me know.

Stephen Arnold, June 22, 2008

Microsoft Fast: A 45-Day Innovation Cycle Yields a Web Part

June 21, 2008

Microsoft announced Fast ESP Search Web Parts for SharePoint. You can read the full announcement here. Microsoft said, “Using these Web Parts and Site Template SharePoint administrators will be able to quickly and easily build FAST ESP-based search sites inside SharePoint 2007 by simply dropping in and configuring the appropriate components.” You can download the Web parts from www.codeplex.com/espwebparts.

The announcement says, “Some of the FAST ESP search capabilities that can be exposed within SharePoint Server 2007 using these Web Parts include:

  • Search Box Web Part — Search box for query term submission and includes “did you mean” functionality for query correction
  • Result List Web Part — Displays search results and supports sorting, pagination, and navigator-based filtering
  • Navigator Web Part — Displays dynamic navigators that profile search results across a set of pre-defined dimensions and allow users to refine the search through navigation clicks
  • Breadcrumb Web Part — Displays the search term(s) and list of navigators used to obtain the current result set.”

Can your mom integrate Fast ESP with SharePoint? Probably not. Will more innovations flow from the Microsoft and Fast ESP teams? Almost certainly.

Microsoft will have to kick its enterprise search activity up a notch. SharePoint is popular but it is creating information access challenges that Microsoft Certified Gold partners stand ready to remediate. In fact, for ease of use, assisted navigation, and better scaling, Microsoft Fast has to leap frog Coveo, Exalead, ISYS Search Software, and other firms with snap in solutions. Most of these outfits are more nimble than Microsoft. So winning in search does not mean killing Google. Microsoft Fast must swat annoying up and coming vendors who are quick and clever, a challenging combination for a firm that releases a mostly pre-existing Web part in a month and a half.

Stephen Arnold, June 21, 2008

When Search News Isn’t News, It’s Disinformation

June 21, 2008

One of my short essays triggered a number of anonymous emails (the best kind), nasty phone calls (an interesting diversion for me) and Web log comments of varying clarity. The flash point item is here.

I have another project to complete, but this reaction to my short news item about Autonomy winning a search contract from a library in Lyon, France, kept nagging at me. As far as I can tell, Autonomy issued a news release about a renewed license agreement, not a new win.

I want to step back. I am not interested in Autonomy’s deal in Lyon. I am interested in the broader topic of what is new in enterprise search. In a conference call this afternoon, one of the people on the call asked me, “What did you learn from the 18 interviews in the Search Wizards Speak series?”

The answer I gave was, “There were only three or four innovations that struck me as new.”

Let’s consider my comment. I talked with 18 “search wizards” over a period of four months. I can identify only three or four new developments.

I know that the companies for whom these wizards work have generated news releases. Some pump out publicity every two weeks. Others punch the PR button six or seven times a year. What are these companies announcing that I don’t consider news?

Here is my short list:

  1. New software versions. This is an item of interest, but it is not going to be picked up by Computerworld.
  2. Added features or functions. A popular innovation is “social”. The idea is fuzzy but seems to mean that a system user can add index tags or attach a note for any other person who accesses a document or report.
  3. Deal wins. The vendor lands a contract and issues a news release saying, “We won this big deal.” Shareholders and competitors have more interest in these than I do.
  4. New hires. This is legitimate news, but the enterprise search industry lacks a Wall Street Journal to gather the executive changes which light the marketing fires at Booz, Allen & Hamilton and McKinsey & Company. These firms write a new hire and say, “Congratulations. We can help you be successful.” Competitors and insurance sales people salivate over such announcements.

What happens when this type of news is diluted with multiple releases of the same information? What is the impact of pumped up version announcements which contain only bug fixes and a couple of add ons? What is the cumulative effect of repeated executive hiring announcements?

My thought is that enterprise search vendors are engaging in disinformation. I don’t think the consumers of the disinformation are potential purchasers, stakeholders, or employees of the company issuing the release.

Nope.

The folks who gobble up enterprise search information are the executives at other enterprise search companies. The search industry, which is under siege by customers and companies offering higher value solutions, is talking to itself.

I grew up in a small town. Information circulated quickly and was chock full of gossip, half truths, and insinuations. The intelligence was parochial; that is, the small town’s thought processes were honed for baloney processing.

When hard data from the “outside world” arrived, few knew how to interpret it or put it in its appropriate context. I think the enterprise search sector is close to becoming the equivalent of a dead end town on the edge of the prairie about 150 miles from a city with a million people.

Enterprise search vendors’ efforts to make sales, build buzz, differentiate themselves, and puff up their achievements are the equivalent of a digital peacock spreading its tail feathers. Other peacocks notice but no one else knows what the heck the squawks and flash mean.

Enterprise search is drifting close to disinformation. The marketing is filled with metaphors and homiletic assurances. The news is often not news; it is the peacock squawk and tail shaking. The licensees are wising up. Users of enterprise search systems are grousing. IT departments are unable to deal with some of the search systems because they are too complex. Options are now available.

What’s the fix? I don’t think there is a magic wand that can address disinformation. Customers will decide. Vendors may be too busy news releasing to one another to notice that the buyers have licensed enterprise applications with search baked in or settled for a plug-and-play solution. When the search vendors’ conversation lapses, their world may have changed without their noticing.

Stephen Arnold, June 21, 2008
