Coveo: Pushing Beyond Search

April 26, 2008

I’ve been briefed on the Coveo technology. I also labor for cash for the owners of CRM Magazine. Nevertheless, I want to point out that the “Rising Star” Award underscores an interesting shift in search and retrieval. You can read about the award here.

Coveo, one of the companies known for its “snap in” solution to the woes of Microsoft SharePoint’s built-in search system, has been recognized for its customer relationship management services. CRM is the godfather of the self-service customer support movement. The idea is that customers can help themselves solve problems if they can find the information. Coveo’s system does that well. On the flip side, the people manning the customer support toll-free lines and digging through the email need technology to find answers as well. CRM Magazine’s award underscores Coveo’s ability to deliver on that front too.

Coveo has been successful in moving “beyond search” with its interface and assisted-search interface. But the company has also won key accounts where vendors such as RightNow, Oracle, and others have long held sway. Coveo, based in frosty Québec City, Québec, continues to innovate despite the long winters and endless hockey season.

Stephen Arnold, April 25, 2008

Microsoft Chomps and Swallows Fast

April 26, 2008

It’s official. On April 24, 2008, Fast Search & Transfer became part of the Microsoft operation. You can read the details at Digital Trends here, the InfoWorld version here, or Examiner.com’s take here.

John Lervik, the Fast Search CEO, will become a corporate vice president at Microsoft. He will report to Jeff Teper, the corporate vice president for the Office Business Platform at Microsoft. The idea–based on my understanding of the set up–is that Dr. Lervik will develop a comprehensive group of search products and services. The offerings will involve Microsoft Search Server 2008 Express, search for the Microsoft Office SharePoint Server 2007, and the Fast Enterprise Search Platform. Despite my age, I think the idea is to create a single enterprise search platform. Lucky licensees of Fast Search’s technology prior to the buy out will not be orphaned. Good news indeed, assuming the transition verbiage sets like hydrated lime, pozzolana, and aggregate. Some Roman concrete has been solid for two thousand years.

romanconcrete

This is an example of Roman concrete. The idea of “set in stone” means that change is difficult. Microsoft has some management procedures that resist change.

A Big Job

The job is going to be a complicated one for Microsoft’s and Fast Search’s wizards.

First, Microsoft has encouraged partners to develop search solutions for its operating system, servers, and applications. The effort has been wildly successful. For example, if you are one of the more than 80 million SharePoint users, you can use search solutions from specialists like Interse in Denmark to add zip to the metadata functions of SharePoint, dtSearch to deliver lightning-fast performance with a natural language processing option, and Coveo for clustering and seamless integration. You can dial into SurfRay’s snap in replacement for the native SharePoint search. You can turn to the ISYS Search System, which delivers fast performance, entity extraction, and other “beyond search” features. In short, there are dozens of companies that have developed solutions to address some of the native search weaknesses in SharePoint. So, one job will be handling the increased competition as the Fast Search team digs in while keeping “certified gold partners” reasonably happy.

immortals

This is a ceramic rendering of two of the “10,000 Immortals”. The idea is that when one Immortal is killed, another one takes his place. Microsoft’s certified gold partners–if shut out of the lucrative SharePoint aftermarket for search–may fight to keep their customers like the “10,000 Immortals”. The competitors will just keep coming until Microsoft emerges victorious.

Read more

Federation: Big Need, Still a Challenge

April 25, 2008

In May 2001, I gave a talk at one of the first Web Search Universities. The audience was baffled by my talk, which I called “Vertical Search Engines: System-Initiated Information Retrieval”. I recall that no one knew what I was talking about. Sigh. Story of my life.

Organizational Reality

Here’s the core diagram from this talk:

silo

This is a clip art silo and it is a basic feature of the enterprise. This silo does not hold corn; it is a metaphor for the information technology department. IT operates in its own world or space. The engineers and computer wizards stick to themselves, use their own jargon, and occasionally snort at the antics of a 20-something in the marketing department.

Here’s another diagram from my 2001 lecture. This diagram shows a company as a collection of silos. I know that people in organizations are part of one big family, everyone is on the same team, and everyone is in the same fox hole. This all-too-common set up of a company appears below:

a company of silos

Each of these silos has its own information. Even in organizations with an effective IT infrastructure, there are nooks and crannies stuffed with digital information. It may be a laptop that a manager carries back and forth, a USB drive, or a Google Search Appliance tucked in a corner of the marketing department where “competitive intelligence” is kept for the use of the marketing mavens.

Read more

Mondeca: A Semantic Technology Company

April 25, 2008

Twice in the last two days I’ve been asked about Mondeca, based in Paris. If you are not familiar with the company, it has been involved in semantic content processing for almost a decade. The company describes itself in this way:

Mondeca provides software solutions that leverage semantics to help organizations obtain maximum return from their accumulated knowledge, content and software applications. Its solutions are used by publishing, media, industry, tourism, sustainable development and government customers worldwide.

The company made a splash in professional publishing with its work for some of the largest scientific, technical, legal, and business publishers. Its customers include Novartis, the Thomson Corporation, LexisNexis, and Strabon.

Mondeca makes a goodly amount of information available on its Web site. You can learn more about the company’s technology, solutions, and management team by working through the links on the Web site.

Indexing by the Book: Automatic Functions Plus Human Interaction

Semantic technology or semantic content analysis can carry different freights of meaning. My understanding is that Mondeca has been a purist when it comes to observing standards, enforcing the rules for well-formed taxonomies, and assembling internally consistent and user-friendly controlled term lists. If you are not familiar with the specifics of a rigorous approach to controlled terms and taxonomies, take a look at this screen shot of Mondeca’s subject matter expert interface. Be aware that I pulled this from my files, so the interface shipping today may differ from this approach. The principal features and functions will remain behind the digital drapery, however. My recollection is that this is the interface used by Wolters Kluwer for some of its legal content.

Interface

What is obvious to me is that Mondeca and a handful of other companies involved in semantic technology take an “old school” approach with no short cuts. Alas, some of the more jejune pundits in the controlled vocabulary and taxonomy game can sometimes be more relaxed. Without training in the fine art of thesauri, a quick glance makes it difficult for an observer to see the logical problems and inconsistencies in a thesaurus or taxonomy. However, after the user runs some queries that deliver more chaff than wheat, the quick-and-dirty approach is like one of those sugar-free and fat-free cookies. There’s just not enough substance to satisfy the user’s information craving.

Read more

Autonomy on Track to Break $400 Million in Revenue in 2008

April 24, 2008

Autonomy announced on April 24, 2008, its first quarter financial results. You can read the highlights and the details here.

The key part of the announcement for me is that the company’s revenues for the first quarter of 2008 were $105.1 million, up 61 percent from $65.5 million for the first quarter of 2007. The turbo booster was strong organic growth and the contribution from ZANTAZ, the email and eDiscovery unit of Autonomy. In the first quarter of 2008, revenue from North America was $63.9 million, representing 61 percent of the company’s total revenues. Autonomy’s adjusted gross profits were $93.5 million, up 56 percent from $60.0 million in the first quarter of 2007. Adjusted gross margins were 89 percent in the first quarter of 2008, compared to 92 percent in the first quarter of 2007. Gross profits for the first quarter of 2008 were $88.2 million, up 52 percent from $58.1 million in the first quarter of 2007. Gross margins for the first quarter of 2008 were 84 percent, compared to 89 percent in the first quarter of 2007. The company said that its gross margins decreased in the third quarter of 2007 following the acquisition of ZANTAZ in July 2007, but have increased as planned in each subsequent quarter as a result of the integration of ZANTAZ and the transition of the core ZANTAZ business to higher margin sales.

A financial analyst might niggle the company about its earnings. For me, the big news is that if my estimates of Google’s revenue from its enterprise division are on target, Autonomy may match or exceed in 2008 the estimated $400 million Google generated in calendar 2007 from its enterprise search and enterprise applications sales. Google, according to my sources, became the number one vendor of enterprise search and retrieval systems on the strength of its more than 8,500 installations of the Google Search Appliance, the Google enterprise geospatial business, and expanding uptake of its cloud services for organizations.
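For readers who want to check the arithmetic, here is a minimal sketch in Python. It uses only the rounded figures quoted above, so the results can differ from the published percentages by a point or so. The run-rate line assumes, naively, that each remaining quarter of 2008 simply matches the first; that is my simplification, not Autonomy’s guidance, and the $400 million Google figure is my estimate rather than a reported number.

```python
# Figures in millions of US dollars, as quoted in the post above.
q1_2008_revenue = 105.1
q1_2007_revenue = 65.5
q1_2008_gross_profit = 88.2
q1_2008_adjusted_gross_profit = 93.5
google_enterprise_2007_estimate = 400.0  # the author's estimate, not a reported figure


def percent_change(new: float, old: float) -> float:
    """Year-over-year change expressed as a percentage."""
    return (new - old) / old * 100


def margin(profit: float, revenue: float) -> float:
    """Profit expressed as a percentage of revenue."""
    return profit / revenue * 100


revenue_growth = percent_change(q1_2008_revenue, q1_2007_revenue)
gross_margin = margin(q1_2008_gross_profit, q1_2008_revenue)
adjusted_gross_margin = margin(q1_2008_adjusted_gross_profit, q1_2008_revenue)

# Naive assumption: every remaining quarter of 2008 matches the first quarter.
annual_run_rate = q1_2008_revenue * 4

print(f"Revenue growth, Q1 2008 vs Q1 2007: {revenue_growth:.1f}%")
print(f"Gross margin, Q1 2008: {gross_margin:.1f}%")
print(f"Adjusted gross margin, Q1 2008: {adjusted_gross_margin:.1f}%")
print(f"Flat-quarter run rate for 2008: ${annual_run_rate:.1f} million")
print(f"Run rate tops the Google estimate: {annual_run_rate > google_enterprise_2007_estimate}")
```

Even with zero sequential growth, the flat-quarter run rate gives a quick sense of how close the $400 million mark is.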

One question is, “Can Autonomy surpass Google to regain the crown as the leader in enterprise search?” An equally intriguing question is, “Will Google’s strong growth continue and keep the company in the lead as the number one vendor of enterprise search?”

Stephen Arnold, April 25, 2008

Newspapers: Hastening Their Own Demise

April 24, 2008

I dreamed of Darwin. I think my semiconscious was mulling over survival and adaptation. The financial news from the newspaper publishing world was interesting. Losses at Gannett, McClatchy, and the New York Times suggest continued worsening of their financial weather. You can point and click through the remarkable financial picture by running this query on Google News.

To add insult to injury, Moody’s Investors Service, according to CNN.com, downgraded the New York Times Company’s senior unsecured ratings to “Baa3” from “Baa1” and its commercial paper rating to “Prime-3” from “Prime-2”. This is the difference between a premier league soccer team and a third-division squad playing for beer. The news story I read reported that Moody’s said the New York Times had a “stable” financial outlook. If the first quarter results are stable, I must not have a good grasp of how financial whiz kids think. (Please, read this story quickly. These CNN.com links disappear quickly.)

Enterprise search systems can ingest news and information from third-parties. Some news organizations sell live feeds directly into companies. The information is then indexed and made available to employees within the enterprise search system. Over the last few years, I’ve seen an increase in the use of news on Internet sites first as a supplement to commercial vendors’ news and now as a replacement in some organizations. Are commercial news vendors, newspapers, and legitimate commercial aggregators losing their grip in this important market?

I think newspapers are. It may be too soon to tell if outfits like the Associated Press or giant combines will be affected as well. The digitally adept may be able to deal with Darwinian forces. Others won’t be so fortunate.

Every few months I bump into an executive from a New York publishing company. Some of these titans of information work for media companies with newspapers; others labor within the multi-national combines that own professional publishing companies. A few ride the air currents rising from the burning piles of unsold books, magazines, peer-reviewed journals, and controlled-circulation publications.

Viewed as a group, the financial picture is clear. Consolidation is inevitable. I dropped my subscription to the Financial Times because I was getting three deliveries a week, not six. The FT’s hard copy distribution system was incapable of delivering the paper on a daily basis to my redoubt in rural Kentucky. No apologies and no explanations were forthcoming after three years of complaining to my elusive delivery person. My emails to the FT customer center went unheeded. At a trade show, a chipper Financial Times booth worker tried to give me a tan baseball cap with an embroidered “FT” logo. I returned the hat to the young person saying, “No, thanks. I have a Google cap and that is already broken in.”

Three Sources of “Real” News

I want to steer clear of the well-worn theme that Web logs provide an alternative to “real” journalism. The best Web logs from my point of view are those written by individuals who were or could have been crackerjack journalists. I worked at the Courier-Journal & Louisville Times in its salad days. I also worked for the fellow once described to me as “the most hated man in New York publishing,” the sharp-as-a-tack Bill Ziff. Mr. Ziff created three media conglomerates and sold each at the peak of its valuation. He would still be working his magic if age and illness had not sidelined him. The best Web log writers could have found a home at either the CJ or at Ziff when these outfits were firing on all cylinders.

I want to take a look at three exemplary news services in a cursory way and then offer some observations about why the newspaper publishers who are losing money are probably going to continue losing money for the foreseeable future. If Rupert Murdoch’s legal eagles are reading this essay, calm down. I am not discussing News Corp., the Wall Street Journal, or the likely takeover of Newsday.

First, navigate to a site called Newsnow. I haven’t kept up with the company after speaking with executives a couple of years ago. The service provides a series of links to news grouped by categories. The center panel presents headlines and one-sentence summaries of the major stories. When I visited the site this morning (April 24, 2008), I had a tidy line up of items relating to the mortgage crisis affecting Europe. An important point is that even on my really lousy Verizon high-speed, use-it-anywhere wireless service, Newsnow loads quickly and is not annoying.

Newsnow

Read more

Microsoft Releases New Version of Desktop Search

April 24, 2008

Microsoft has available a “preview” version of its desktop search system for Windows XP and Vista at the company’s download center. (These links can go dead without warning, so get your copy now.)

In addition to bug fixes, the “preview”, according to Microsoft’s Web site:

… lets you perform an instant search of your computer. WS4 helps you find and preview documents, e-mail messages, music files, photos, and other items on the computer.

InfoWorld’s story about WS4 reports that performance has been improved.

Microsoft’s activity in search and retrieval has been increasing. The company is in the process of acquiring Fast Search & Transfer and Yahoo. Fast Search does not offer a desktop version of its system, but the company does have a robust Internet indexing capability now used by Yahoo.

Yahoo does not have a single search technology. The company uses technology from such vendors as InQuira, Fast Search & Transfer, plus the search systems that are used for the Flickr service. Yahoo acquired Stata Labs for its email search capability.

With a wealth of search technologies in its future, Microsoft will have many ways to solve its customers’ information access problems.

Stephen Arnold, April 24, 2008

Semantic Query for Microsoft Dynamics CRM

April 23, 2008

Semantra describes itself as a “developer of conversational analytics software”. The company announced Semantra 2.0 for Microsoft Dynamics CRM. The system can retrieve specific information from back end databases.

A user can “turn critical questions into precise and actionable information by entering familiar business terms into a search box.”

Microsoft Dynamics is a work in progress for the Redmond giant. The company is working to create an online version of the CRM system to provide functions similar to those available from Salesforce.com.

Semantra offers semantic and NLP (natural language processing) content processing systems. The company has been among the first in the text processing sector to make BI (business intelligence) a key part of the company’s system.

More information about Semantra is available on the company’s Web site.

Stephen Arnold, April 23, 2008

Calais: Free Semantic Tagger

April 22, 2008

If you want to see how cloud-based software can perform rich metatagging, you will want to give the free Calais service a whirl. Navigate to the Calais Gallery and scroll down to the Capability Demonstrations and select the Calais Document Viewer. If you don’t see the link, click here.

Now cut a document and paste it into the window. The system will display this type of result:

calais_parser

The tags the ClearForest system automatically identifies are highlighted. The left-hand column of the display shows the types of tags identified; for example, city, company, person, etc. A single click opens a drop down list of what the system found. The system worked well, with no “false drops” in the sample document. Performance showed some latency, but that’s not unusual with a cloud-based service and some fancy text crunching taking place on remote servers.

More about Calais

For now Calais is working to build a community to extend the Semantic Web. Without tools like Calais, the Semantic Web is likely to remain a great idea that failed because people don’t want to do tagging. When tagging is done, it’s lousy. I’m supposed to know how to index, and the tags for my Web log are pretty miserable. The reasons may be broader than just my own approach. First, indexing to be useful must use a body of terms that the average user can hit upon and remember. So neologisms are out and weird jargon won’t work at all. Second, writing for a Web site or a Web log like this one is supposed to be disciplined, but it’s not. I have other research work that commands my primary attention. The Web log, while important, comes second, maybe third on some busy days. Finally, I’m not sure what I will write about. I react to information people send me in email, stories in my RSS reader, and comments made–often off the cuff–on a phone call. It is difficult for me to create a controlled term list because I’m not sure what the topics will be. Therefore, lousy tagging.

Calais asserts that its technologies can address my three failings and probably yours as well. You can download developer tools, upload content to Calais, or use the functions on the Calais Web site. Reuters-ClearForest has posted some useful documentation about Calais here. If you’re a bit nerdy, you can do some integration of Calais and your application. The best way to get a sense of what’s possible is to explore the sample applications on the Calais Web site.

More about ClearForest

ClearForest was founded by text mining guru Ronen Feldman. You can get the inside scoop on this wizard’s approach to squeezing information from text in his 2006 book, The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data (Cambridge University Press ISBN 13: 9780521836579).

The ClearForest technology performs “discovery”; that is, the system processes text and identifies important information. The company found a ready market wherever executives wanted to find the “hidden” information in text. I recall attending a presentation by Dr. Feldman in which he showed the ClearForest system processing auto warranty data in the written comments from customer support reps and owners who sent email about their vehicles.

The ClearForest system processed these comments and displayed important discoveries in easy-to-understand reports. One example concerned a flawed component that the ClearForest system pinpointed as the cause of problems previously overlooked by the automobile manufacturer. The kicker to this example was that the manufacturer was able to make a change to the affected component and take preemptive action to save significant amounts of warranty cost and avoid customer complaints.

The earlier versions of the ClearForest system made use of rules. Some of these required hand-tweaking or a ClearForest-adept programmer to set up, tune, and deploy. Over the years, ClearForest, like other companies looking for ways to leverage “smart” software, has injected more automation into its system.

Read more

Text Processing Vendors as Shark Bait

April 22, 2008

I’m nervous around attorneys or, as the English say, solicitors. The Content Wrangler offers a useful discussion of why legal eagles are circling content processing and text mining vendors. Please, read the original post “Automated Intelligent Document Classification, Data Extraction and Search Tools for Legal Pros”.

The essay appears to be a contribution from A2iA. The tag paragraph for the essay says that A2iA “is the worldwide leading developer of natural handwriting recognition, Intelligent Word Recognition (IWR) and Intelligent Character Recognition (ICR) technologies and products for the payment, mail, document and forms processing markets”. Note: the company’s Web site plays music, so you may want to mute your speakers before navigating to the landing page.

The key point in the essay, from my point of view, is:

Any automated solution must be at least as accurate and error-free as the manual processes it replaces. As accuracy is equally important to successful discovery, due diligence, investigatory or redaction efforts, this is an important advantage of DocumentReaders [nb, A2iA’s product] automated solution.

A defendant facing a life sentence would probably agree that accuracy is sometimes helpful. On the other hand, some legal professionals may disagree under certain circumstances. Speed is also important. Overworked lawyers need ways to pack more billable hours into the work day.

Recommind, Stratify (now a unit of Iron Mountain), and other content processing vendors have found ways to tap into the rivers of money that flow through the mahogany corridors of major law firms. I’m open minded about A2iA. You will want to check out systems available from Brainware and ZyLAB, both companies that made the final 24 in the Beyond Search round up.

I did not include A2iA in Beyond Search, nor did I have an entry for it in my list of companies in the text and content processing business. I have now added A2iA, however. If you want a useful run down of why counselors have a thirst for systems that can make sense of large quantities of text, tuck this discussion in your files.

My research suggests that legal matters generate so much information that even well-padded clients are reluctant to pay for real-live people to read, analyze, and annotate paper. Digital systems, therefore, are not the first choice, but the only choice due to quantity and costs.

Vendors peddling to law firms may want to check their shark cage. Flawed technology can bite back in interesting ways.

Stephen Arnold, April 22, 2008
