Kroll Ontrack (A Marsh & McLennan Company)

An Interview with David Chaplin

David Chaplin of Kroll.

You may not be familiar with Marsh & McLennan, Kroll, or David Chaplin. Let's tackle each in turn. Marsh & McLennan is a diversified financial services firm. Its interests span a number of disciplines. The company is keenly interested in business information and business intelligence. Several years ago, Marsh & McLennan acquired a company specializing in financial and business analysis. Most companies crunching numbers and reading journal articles are colorless. Kroll's experts, struggling to keep pace with the rapidly increasing flow of digital information, took a logical step.

In late 2006, Kroll acquired Engenium, one of the firms specializing in squeezing meaning from text. Engenium, at the time of the acquisition, was pushing "beyond search" with meaning-based search, clustering, and robust semantic analysis functions.

The founder of Engenium is the soft-spoken and quite engaging David Chaplin, now a vice president of advanced search technologies at Kroll Ontrack (a unit of Marsh & McLennan). I talked him into spending some time with me after explaining my goal of getting first-hand accounts about the changes taking place in search, content processing, and related fields.

We sat in the booming public area at the Boston Convention Center. Despite the Kafka-like setting, he was interested in sharing his ideas. I agreed to provide a transcript of the interview to him, so he could "coordinate" with the firm's attorneys.

The full text of the interview appears below:

Engenium was one of the first "value added content processing" companies to be acquired. What were you doing that came to the attention of the Marsh & McLennan companies?

When acquired, Engenium’s strategic technology was being used by Kroll Ontrack, a legal discovery and data recovery company in the Marsh & McLennan portfolio. Kroll Ontrack is the leader in electronic discovery, and as such, I think the company recognized the convergence of content management, document management, electronic discovery, document holds, litigation readiness and enterprise information access (the list could go on and on).

Engenium was, I think, a strategic purchase intended to address the growing demand for cost-effective methods to locate relevant information from large volumes of data in support of investigations, litigation and regulatory compliance matters.

You came on my radar in 2003. When did you start the company and what was the motivation for you to tackle this "value added text processing" sector?

Engenium was founded in 1998. Our first search product was released in late 2000. I was a manager at KPMG and had been involved in implementing search technologies such as PLS (Personal Library Software purchased by AOL) and Verity. What became clear to me through these experiences was that standard keyword based search was not an adequate approach to information access and retrieval.

I recall that you offered concept-based searching, simultaneous key word and parametric searching, and a number of other features that were quite a departure from the standard type a word in a search box and hit the Enter key. What was the engineering approach you took to deliver so many features well ahead of your competitors?

At the onset, Engenium had the advantage of being a small, private company without venture capital. Furthermore, the company began with a key group of self-motivated and self-directed individuals that broke most of the established rules on how to create products.

Each of us had a major stake in the company and was empowered to make decisions and move quickly. We did not hesitate to get software into people’s hands early, so that we could garner feedback speedily.

I recall that we began with conceptual search only, and due to positive responses, we immediately added keyword and parametric capabilities.

What do you mean by parametric?

Oh, values, numbers--the type of query that you once could do with a SQL query. We were among the first to offer both key word and the structured, parametric search in a single interface.
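To make the distinction concrete, here is a minimal Python sketch of a single query that combines a key word match with parametric (numeric) filters. It is an illustration only, not Engenium's implementation; the `Resume` record, its fields, and the `search` function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Resume:
    # Hypothetical record: free text plus structured, numeric fields.
    text: str
    years_experience: int
    salary: int


def search(resumes, keywords, min_years=0, max_salary=None):
    """Return resumes that contain every keyword AND pass the numeric filters."""
    hits = []
    for resume in resumes:
        words = set(resume.text.lower().split())
        # Key word part: plain string matching on the text.
        keyword_match = all(k.lower() in words for k in keywords)
        # Parametric part: the kind of range test a SQL WHERE clause would do.
        parametric_match = resume.years_experience >= min_years and (
            max_salary is None or resume.salary <= max_salary
        )
        if keyword_match and parametric_match:
            hits.append(resume)
    return hits


resumes = [
    Resume("senior java developer", 8, 120000),
    Resume("junior java developer", 1, 60000),
    Resume("senior python developer", 9, 110000),
]
senior_java = search(resumes, ["java", "developer"], min_years=5)
```

In this sketch only the first record satisfies both the key word test and the `min_years` filter.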

Then what happened?

This move proved to be very beneficial because it forced us to be an information access company and not a conceptual search company. Our philosophy is to create the best tool for getting the job done well.

When I saw your résumé project the first time, I recalled an early attempt, maybe Resumix or something like that. Yours seemed appropriate to more sophisticated operations related to personal backgrounds? Was your focus HR or did you want to create a more general purpose tool?

Our software has always been agnostic, but certain market verticals were and still are more aggressive in recognizing the need for and implementing cutting edge software.

We obtained traction with applicant tracking vendors and staffing companies very early on due to our ability to process an entire job requisition as a query. This eliminated the need for a recruiter to form a “good query.”

While this helped us gain clout with HR personnel, this capability exists regardless of the application. Companies such as Vurv, ResumeMirror and Virtual Edge were just quicker to see the value.

Can you give me a snapshot of the product line up today?

Our Search Results Clustering release is the first product we've developed that does not require indexing the information, and as such, will be able to enhance search results from any search engine. If the desired search result is not on page one of the results we will bring all the results onto page one and provide a well organized and labeled folder structure to navigate to the best result.

At the end of the day, we have two basic products: the query-based conceptual, keyword, and parametric search and the non-query-based automatic information clustering.

I thought you offered per-megabyte pricing. Did that work for you? What's the pricing model for the Engenium-based products at Kroll?

Quite a few of our customers offer per-megabyte pricing, but we do not use this pricing model.

We offer a standard per server perpetual licensing, subscription licensing and capacity-based licensing. Historically, we have been embedded in other vendors' products or services, so we like to be able to license in a way that doesn't conflict with their pricing models.

Your background included some work in real-time news processing. I think you also worked in litigation support. How did these experiences influence your idea for Engenium, now the Kroll Ontrack service?

My real-life experiences with KPMG have been instrumental in the success of Engenium. Implementing search and supporting the information retrieval needs of the KPMG community really impacted my thinking.

I found the existing technologies did not bring intelligence to the information retrieval process. I implemented a real-time news solution at KPMG and eventually went to work for the service we used. The real-time processing and information distribution experience further highlighted the need for better information processing.

In real estate the three most important factors are location, location, and location.

In information access, the three most important factors are relevance, relevance, and relevance.

Getting the best information to the right people without the need for a query language is the goal, and I believe we've achieved that.

What innovations have you included in the most recent version of Engenium / Ontrack?

We believe that improving the way people yield search results is critical to the development of better information access. As such, we have spent an entire development cycle on our conceptual search and automatic clustering products.

Specifically, we focused on the necessities of search and improving results, as well as search and indexing speed improvements. Ultimately, these changes help provide better ways to search within and interact with results after an initial query, and implement additional text analytics to once again improve search results.

Are you supporting other vendors' systems or are you a stand-alone solution?

Oh, no. We offer the usual integration tools. We want to make it easy for our customers to tap our functions.

Let me give you an example. We are slated to release our new Search Results Clustering product and will soon after release a Microsoft SharePoint Search Results Clustering extension.

As a business, the release of this new product is an important innovation in that we bring our expertise in conceptual search and automatic clustering and now provide a product that can enhance existing search implementations. I believe that this product will enhance the search experience users have when utilizing systems that are based on traditional, non-intelligent, keyword search. At the end of the day, we strive to never lose sight of the fact that it is always about being faster, more efficient and more accurate.

With Autonomy's buy out of Verity, the outright failure of Entopia, and the Microsoft - Fast deal, what's your view of the volatility in the content processing sector?

I don’t believe that the volatility will decrease. I do believe there are not very many big moves to be made right now. I believe there are some big guys out there who want to make a move in this space.

An underlying factor is that I do not believe corporate America believes that they are getting what they need from search and they are finding an increasing number of employees go to the Internet first before even checking their internal systems.

Search - Information Access done right still has a big upside, so I do think there will be some movement.

So, are you saying there are still opportunities in search and content processing?

Yes. Despite the consolidation, there's room for innovation.

Google has been working to embed some advanced processes in its system; for example, there's Ramanathan Guha's Programmable Search Engine, which is a semantic system and method. But semantics have been slow to catch on. What do you see happening going forward with semantic technology?

Semantics will play a large role in the future of search. It is inconceivable to think that search and information access can advance without taking context into account. Words and how they are used matter, and a document is much more than a bag of words.

What do you mean "bag of words"?

That's a short hand way to refer to a key word index. String matching is an important function, but as I learned at KPMG almost a decade ago, key word searching is not enough.
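A small Python sketch shows why a bag of words falls short. It assumes the simplest possible representation, a count of each word with word order discarded; the function name `bag_of_words` and the two sample sentences are illustrative only.

```python
from collections import Counter


def bag_of_words(text):
    """Count each word in the text; word order is thrown away."""
    return Counter(text.lower().split())


first = "the dog bit the man"
second = "the man bit the dog"

# Both sentences reduce to the same counts, although they mean different things.
same_bag = bag_of_words(first) == bag_of_words(second)
```

Here `same_bag` is `True`: a pure key word index cannot tell the two sentences apart, which is the gap that context-aware analysis is meant to close.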

Are there other issues?

Lots of them. Let me give you an example. Term mismatch must be dealt with. Because two people will use the same term to describe something less than 20 percent of the time, great search results are dependent on analyzing context.

Likewise, in most cases relevance is in the eye of the beholder, so it naturally makes sense that the search systems will need to take into account the user and his/her need to manipulate relevancy algorithms or get customized search results.

What's on the agenda in the future for the Engenium / Kroll product line?

We will be releasing a new product this month that clusters the search results from any search engine. To create this product, we applied our expertise in conceptual search and automatic information clustering into a product that makes existing search more valuable and productive.

Search Results Clustering will be our third product. It takes the value of information clustering and applies it to search results, providing labeled folders with similar documents organized in a way that allows you to navigate a hit list without paging through 200 results.
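The idea of turning a flat hit list into labeled folders can be sketched in a few lines of Python. This is a deliberately naive illustration, not the Kroll Ontrack algorithm: it assumes each result is just a title string, and it labels a folder with the non-query term that the most results share. The names `cluster_results`, `terms`, and `STOPWORDS` are hypothetical.

```python
from collections import Counter, defaultdict

# Assumed minimal stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def terms(title):
    """Lowercased, stopword-free set of words in a result title."""
    return {w for w in title.lower().split() if w not in STOPWORDS}


def cluster_results(titles, query_terms=()):
    """Group result titles into folders labeled by their most widely shared term."""
    skip = {q.lower() for q in query_terms}
    # Count in how many results each candidate label appears.
    doc_freq = Counter()
    for title in titles:
        doc_freq.update(terms(title) - skip)
    folders = defaultdict(list)
    for title in titles:
        candidates = terms(title) - skip
        if not candidates:
            folders["other"].append(title)
            continue
        # Pick the term shared by the most results; break ties alphabetically.
        label = min(candidates, key=lambda t: (-doc_freq[t], t))
        folders[label].append(title)
    return dict(folders)


hits = [
    "Jaguar car review",
    "Jaguar car dealers",
    "Jaguar habitat in rainforest",
    "Jaguar rainforest conservation",
]
folders = cluster_results(hits, query_terms=["jaguar"])
```

With these four sample hits, the sketch produces one folder labeled "car" and one labeled "rainforest", each holding two results. Note that it works on the hit list alone and needs no index of the underlying documents.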

The number of new companies entering the search and content processing "space" is increasing. What's your view on too many hungry mouths and too few chocolate chip cookies?

I'm guessing this will be like any other industry: some will make it and some won't. There can be a real advantage to entering an established market with a new idea or approach. However, making it work and getting traction is not easy.

The market is growing, and, at this point, I would say enough mouths are not satisfied yet with what they're being fed to give some start-ups a chance.

The VC aspect actually makes it tougher though because it can take away nimbleness and require results in a time period that thwarts innovation.

There are lists of trends in search and content processing. Will you go out on a limb and identify the three or four major trends in search and content processing that you see in the last half of 2008 and in the first half of 2009?

I'll jump around a bit on this one and hopefully not be saying what everyone else says. I believe that there will continue to be the clash of the titans at the enterprise search, enterprise information access, or search and information infrastructure level.

The Microsoft purchase of FAST shows its commitment. Autonomy, IBM, and others are all vying for the ubiquitous unifying search across all information.

I believe that you will continue to see a bifurcation in that one size fits all search is not always very satisfying, which creates a growing demand for search derivative solutions as well as improved search within critical business systems. Many times search needs to be made more useable and effective within the critical business systems. This allows for niche and specialized search vendors to provide important value and improvement to information access in a knowledge economy. It really comes down to the right tool for the job.

I believe you will see a continued move toward improving the way one interacts with search and search results. Examples include more transparency to relevancy algorithms, more intelligence in understanding the query, visualization of results, and visualization of the corpus of information available to you before you query.

The final trend will be the continued and growing recognition of conceptual search as the necessary and evolutionary advancement of traditional keyword / Boolean search.

ArnoldIT Comment

The Kroll Ontrack line up of products is among the most robust in the business intelligence sector. A licensee can use these for eDiscovery, litigation support, fraud detection, and other specialized functions. Kroll, like other companies in the business intelligence business, keeps a low profile. You can learn more about the company by exploring the links on the firm's "About the Company" Web page here. Kroll positions itself as "the world's leading risk consulting company". You can contact the company by navigating to this form and providing the requested information. Unlike some of the search vendors who rely on Madison Avenue to generate business, Kroll gets work via referral.

Stephen E. Arnold, April 28, 2008
