Exclusive Interview with Kathleen Dahlgren, Cognition Technologies

February 18, 2009

Cognition Technologies’ Kathleen Dahlgren spoke with Harry Collier about her firm’s search and content processing system. Cognition’s core technology, Cognition’s Semantic NLP™, is the outgrowth of ideas and development work that began more than 23 years ago at IBM, where Cognition’s founder and CTO, Kathleen Dahlgren, Ph.D., led a research team to create the first prototype of a “natural language understanding system.” In 1990, Dr. Dahlgren left IBM and formed a new company called Intelligent Text Processing (ITP). ITP applied for and won an innovative research grant from the Small Business Administration. This funding enabled the company to develop a commercial prototype of what would become Cognition’s Semantic NLP. That work won a Small Business Innovation Research (SBIR) award for excellence in 1995. In 1998, ITP was awarded a patent on a component of the technology.

Dr. Dahlgren is one of the featured speakers at the Boston Search Engine Meeting. This conference is the world’s leading venue for substantive discussions about search, content processing, and semantic technology. Attendees have an opportunity to hear talks by recognized leaders in information retrieval and then speak with these individuals, ask questions, and engage in conversations with other attendees. You can get more information about the Boston Search Engine Meeting here.

The full text of Mr. Collier’s interview with Dr. Dahlgren, conducted on February 13, 2009, appears below:

Will you describe briefly your company and its search / content processing technology?
CognitionSearch uses linguistic science to analyze language and provide meaning-based search. Cognition has built the largest semantic map of English, with morphology (word stems such as catch-caught, baby-babies, communication, intercommunication), word senses (“strike” meaning hit, “strike” as a state in baseball, etc.), synonymy (“strike” meaning hit, “beat” meaning hit, etc.), hyponymy (“vehicle”-“motor vehicle”-“car”-“Ford”), meaning contexts (“strike” means a game state in the context of “baseball”) and phrases (“bok choy”). The semantic map enables CognitionSearch to unravel the meaning of text and queries, with the result that search performs with over 90% precision and 90% recall.
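To make the idea of a semantic map concrete, here is a minimal, hypothetical sketch in Python. The structure and entries are invented for illustration; they are not Cognition's actual data or code.

```python
# Illustrative sketch only: a toy "semantic map" covering the kinds of
# relations described above (word senses, synonyms, hyponyms, contexts).
# The structure and the entries are hypothetical, not Cognition's data.

SEMANTIC_MAP = {
    "strike": [
        {"sense": "strike/hit",      "synonyms": ["hit", "beat"],   "contexts": ["fight", "boxing"]},
        {"sense": "strike/baseball", "synonyms": ["called strike"], "contexts": ["baseball", "pitcher"]},
    ],
}
HYPONYMS = {"vehicle": ["motor vehicle"], "motor vehicle": ["car"], "car": ["Ford"]}

def disambiguate(word, context_words):
    """Pick the sense whose context cues overlap the surrounding words most."""
    senses = SEMANTIC_MAP.get(word, [])
    return max(senses,
               key=lambda s: len(set(s["contexts"]) & set(context_words)),
               default=None)

print(disambiguate("strike", ["baseball", "pitcher"])["sense"])  # -> strike/baseball
```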

What are the three major challenges you see in search / content processing in 2009?

That’s a good question. The three challenges in my opinion are:

  1. Too much irrelevant material retrieved – poor precision
  2. Too much relevant material missed – poor recall
  3. Getting users to adopt new ways of searching that are available with advanced search technologies.  NLP semantic search offers users the opportunity to state longer queries in plain English and get results, but users are currently accustomed to keywords, so some adaptation will be required of them to take advantage of the new advanced technology.

With search / content processing decades old, what have been the principal barriers to resolving these challenges in the past?

Poor precision and poor recall are due to the use of pattern-matching and statistical search software.  As long as meaning is not recovered, the current search engines will produce mostly irrelevant material.  Statistics on popularity boost many of the relevant results to the top, but as a measure across all retrievals, precision is under 30%.  Poor recall means that sometimes there are no relevant hits, even though there may be many hits.  This is because the alternative ways of expressing the user’s intended meaning in the query are not understood by the search engine.  If engines add synonyms without first determining meaning, recall can improve, but at the expense of extremely poor precision.  This is because all the synonyms of an ambiguous word, in all of its meanings, are used as search terms.  Most of these are off target.  While the ambiguous words in a language are relatively few, they are among the most frequent words.  For example, the seventeen thousand most frequent words of English tend to be ambiguous.
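For readers who want the standard definitions behind these figures, the toy calculation below shows how precision and recall are computed. The numbers are invented for illustration; they are not measurements of any particular engine.

```python
# Standard IR definitions of precision and recall, with a tiny worked example.
# The counts below are made up; they do not measure any real system.

def precision(relevant_retrieved, retrieved):
    return relevant_retrieved / retrieved

def recall(relevant_retrieved, relevant_total):
    return relevant_retrieved / relevant_total

# Suppose a query retrieves 100 documents, 28 of which are relevant,
# while the collection actually contains 70 relevant documents.
print(precision(28, 100))  # 0.28 -> "precision under 30%"
print(recall(28, 70))      # 0.40 -> most relevant material is still missed
```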

What is your approach to problem solving in search and content processing?

Cognition focuses on improving search by improving the underlying software and making it mimic human linguistic reasoning in many respects.  CognitionSearch first determines the meanings of words in context and then searches on the particular meanings of search terms, their synonyms (also disambiguated) and hyponyms (more specific word meanings in a concept hierarchy or ontology).  For example, given a search for “mental disease in kids”, CognitionSearch first determines that “mental disease” is a phrase and synonymous with an ontological node, and that “kids” has the stem “kid” and means “human child”, not a type of “goat”.  It then finds documents with sentences containing “mental disease” or “OCD” or “obsessive compulsive disorder” or “schizophrenia”, etc., and “kid” (meaning human child) or “child” (meaning human child) or “young person” or “toddler”, etc.
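A rough sketch of this disambiguate-then-expand technique follows, using a tiny invented vocabulary. It only illustrates the general idea; it is not CognitionSearch's ontology or code.

```python
# A rough sketch of "disambiguate first, then expand only with synonyms and
# hyponyms of the chosen sense". Vocabulary is invented for illustration.

SENSES = {
    "kid": {
        "human child": ["child", "toddler", "young person"],
        "young goat":  ["goatling"],
    },
}
ONTOLOGY = {"mental disease": ["OCD", "obsessive compulsive disorder", "schizophrenia"]}

def expand(term, sense=None):
    """Expand a term with synonyms of its chosen sense and its hyponyms."""
    synonyms = SENSES.get(term, {}).get(sense, [])
    hyponyms = ONTOLOGY.get(term, [])
    return [term] + synonyms + hyponyms

# "kids" is stemmed to "kid" and disambiguated to "human child" before
# expansion, so goat-related synonyms never enter the query.
query_terms = expand("mental disease") + expand("kid", "human child")
print(query_terms)
```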

Multi core processors provide significant performance boosts. But search / content processing often faces bottlenecks and latency in indexing and query processing. What’s your view on the performance of your system or systems with which you are familiar?

Natural language processing systems have been notoriously challenged by scalability.  Recent massive upgrades in computer power have now made NLP a possibility in web search.  CognitionSearch has sub-second response time and is fully distributed to as many processors as desired for both indexing and search.  Distribution is one solution to scalability.  Another, which CognitionSearch implements, is to compile all reasoning into the index, so that any delays caused by reasoning are not experienced by the end user.

Google has disrupted certain enterprise search markets with its appliance solution. The Google brand creates the idea in the minds of some procurement teams and purchasing agents that Google is the only or preferred search solution. What can a vendor do to adapt to this Google effect? Is Google a significant player in enterprise search, or is Google a minor player?

Google’s search appliance highlights the weakness of popularity-based searching.  On the web, with Google’s vast history of searches, popularity is effective in positioning the more desired sites at the top of the relevance ranking.  Inside the enterprise, popularity is ineffective and Google performs as a plain pattern-matcher.  Competitive vendors need to explain this to clients, and even show them with head-to-head comparisons of search with Google and search with their software on the same data.  Google brand allegiance is a barrier to sales in enterprise search.

Information governance is gaining importance. Search / content processing is becoming part of eDiscovery or internal audit procedures. What’s your view of the role of search / content processing technology in these specialized sectors?

Intelligent search in eDiscovery can dig up the “smoking gun” of violations within an organization.  For example, in the recent mortgage crisis, buyers were lent money without proper proof of income.  Terms for this were “stated income only”, “liar loan”, “no-doc loan”, “low-documentation loan”.  In eDiscovery, intelligent search such as CognitionSearch would find all mentions of that concept, regardless of the way it was expressed in documents and email.  Full exhaustiveness in search empowers lawyers analyzing discovery documents to find absolutely everything that is relevant or responsive.  Likewise, intelligent search empowers corporate oversight personnel, and corporate staff in general, to find the desired information without being inundated with irrelevant hits (retrievals).  Dedicated systems for eDiscovery and corporate search need only house the indices, not the original documents.  It should be possible to host a company-wide secure Web site for internal search at low cost.

As you look forward, what are some new features / issues that you think will become more important in 2009? Where do you see a major break-through over the next 36 months?

Semantics and the semantic web have attracted a great deal of interest lately.  One type of semantic search involves tagging of documents and Web sites, and relating them to each other in a hierarchy expressed in the tags.  This type of semantic search enables taggers to control precisely the reasoning applied to the various documents or sites, but it is labor-intensive.  Another type of semantic search runs on free text, is fully automatic, and uses semantically based software to automatically characterize the meaning of documents and sites, as with CognitionSearch.

Mobile search is emerging as an important branch of search / content processing. Mobile search, however, imposes some limitations on presentation and query submission. What are your views of mobile search’s impact on more traditional enterprise search / content processing?

Mobile search heightens the need for improved precision, because the devices don’t have space to display millions of results, most of which are irrelevant.

Where can I find more information about your products, services, and research?

http://www.cognition.com

Harry Collier, Infonortics, Ltd., February 18, 2009

Interview with Janus Boye: New Search Tutorial

February 18, 2009

In the last three years, Janus Boye, Managing Director of JBoye in Denmark, has been gaining influence as a conference organizer. In 2009, Mr. Boye is expanding to the United Kingdom and the United States. Kenny Toth, ArnoldIT.com, spoke with Mr. Boye on February 17, 2009. The full text of the interview appears below.

Why are you sponsoring a tutorial with the two enterprise search experts, Martin White and Stephen Arnold?

Personally, I’m very fascinated with search, as it is one of the complex challenges of the web that remains essentially unsolved. For a while I’ve wanted to create a seminar on search that would cover technology, implementation and management to really assist our many community of practice members who regularly tell me that search is broken. Some of them have invested heavily in software and found that even the most expensive software products do not deliver successful search on their own. Some of them have also seen their vendor either go bankrupt or become acquired. Beyond vendors, many members have underestimated the planning required to make search work. Martin White and Stephen Arnold have recently published a new report on Successful Enterprise Search Management, which the seminar is modeled after.


Janus Boye. http://www.jboye.com

What will the attendees learn in the tutorial?

My goal is that at the end of the seminar, attendees will understand the business and management issues that impact a successful implementation. The attendees will learn about how the marketplace is shifting, what skills you need in your team, what can go wrong and how you avoid it, and how you get the most out of your consultants and vendors.

Isn’t search a stale subject? What will be new and unusual about this tutorial?

Search is far from a stale subject. If you are among those who use SharePoint every day, you know that search still has a long way to go. Come to the seminar and learn about the larger trends driving the market as well as recent developments, such as the Microsoft FAST roadmap.

Will these be lectures or will there be interactivity between the experts and the audience?

The agenda for the seminar is designed so that there will be plenty of room for interactivity. The idea is that delegates can get answers to their burning questions. There will be room for Q & A, and some sessions are also divided into sub-groups so that delegates can discuss their challenges in smaller groups.

If I attend, what will be the three or four takeaways from this show?

There will be several takeaways at the seminar, in particular around themes such as content, procurement, implementation, security, social search, language and the vendor marketplace.

Where is the tutorial and what are the details?

The tutorial will be held in London, UK. See http://www.jboye.co.uk/events/workshop-successful-enterprise-search-management-q209/ for more.

Kenny Toth, February 18, 2009

Exclusive Interview with David Milward, CTO, Linguamatics

February 16, 2009

Stephen Arnold and Harry Collier interviewed David Milward, the chief technical officer of Linguamatics, on February 12, 2009. Mr. Milward will be one of the featured speakers at the April 2009 Boston Search Engine Meeting. You will find minimal search “fluff” at this important conference. The focus is upon search, information retrieval, and content processing. You will find no staffed trade show booths, no multi-track programs that distract, and no search engine optimization sessions. The Boston Search Engine Meeting is focused on substance from informed experts. More information about the premier search conference is here. Register now.

The full text of the interview with David Milward appears below:

Will you describe briefly your company and its search / content processing technology?

Linguamatics’ goal is to enable our customers to obtain intelligent answers from text – not just lists of documents.  We’ve developed agile natural language processing (NLP)-based technology that supports meaning-based querying of very large datasets. Results are delivered as relevant, structured facts and relationships about entities, concepts and sentiment.
Linguamatics’ main focus is solving knowledge discovery problems faced by pharma/biotech organizations. Decision-makers need answers to a diverse range of questions from text, both published literature and in-house sources. Our I2E semantic knowledge discovery platform effectively treats that unstructured and semi-structured text as a structured, context-specific database they can query to enable decision support.

Linguamatics was founded in 2001 and is headquartered in Cambridge, UK, with US operations in Boston, MA. The company is privately owned, profitable and growing, with I2E deployed at most top-10 pharmaceutical companies.
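As a rough illustration of the general idea of treating text as a queryable, structured fact base (the concept described above, not I2E's actual implementation), the sketch below stores extracted relationships as triples and queries them directly. The schema and facts are invented.

```python
# Illustrative sketch only: store relationships extracted from text as
# structured facts, then query them like a small database. The extraction,
# schema, and facts are invented; this is not Linguamatics' I2E.

FACTS = [
    # (subject, relation, object, source sentence)
    ("gene_ABC", "inhibits", "protein_X", "Gene ABC inhibits protein X in vitro."),
    ("drug_Y",   "targets",  "protein_X", "Drug Y targets protein X."),
]

def query(relation=None, obj=None):
    """Return facts matching the requested relation and/or object."""
    return [f for f in FACTS
            if (relation is None or f[1] == relation)
            and (obj is None or f[2] == obj)]

# "What acts on protein X?" -- answered as structured facts with provenance,
# not as a list of documents to read.
for subject, rel, target, sentence in query(obj="protein_X"):
    print(subject, rel, target, "|", sentence)
```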


What are the three major challenges you see in search / content processing in 2009?

The obvious challenges I see include:

  • The ability to query across diverse high volume data sources, integrating external literature with in-house content. The latter content may be stored in collaborative environments such as SharePoint, and in a variety of formats including Word and PDF, as well as semi-structured XML.
  • The need for easy and affordable access to comprehensive content such as scientific publications, and being able to plug content into a single interface.
  • The demand by smaller companies for hosted solutions.

With search / content processing decades old, what have been the principal barriers to resolving these challenges in the past?

People have traditionally been able to do simple querying across multiple data sources, but there has been an integration challenge in combining different data formats, and typically the rich structure of the text or document has been lost when moving between formats.

Publishers have tended to develop their own tools to support access to their proprietary data. There is now much more recognition of the need for flexibility to apply best of breed text mining to all available content.

Potential users were reluctant to trust hosted services when queries are business-sensitive. However, hosting is becoming more common, and a considerable amount of external search is already happening using Google and, in the case of life science researchers, PubMed.

What is your approach to problem solving in search and content processing?

Our approach encompasses all of the above. We want to bring the power of NLP-based text mining to users across the enterprise – not just the information specialists.  As such we’re bridging the divide between domain-specific, curated databases and search, by providing querying in context. You can query diverse unstructured and semi-structured content sources, and plug in terminologies and ontologies to give the context. The results of a query are not just documents, but structured relationships which can be used for further data mining and analysis.

Multi core processors provide significant performance boosts. But search / content processing often faces bottlenecks and latency in indexing and query processing. What’s your view on the performance of your system or systems with which you are familiar?

Our customers want scalability across the board – both in terms of the size of the document repositories that can be queried and also appropriate querying performance.  The hardware does need to be compatible with the task.  However, our software is designed to give valuable results even on relatively small machines.

People can have an insatiable demand for finding answers to questions – and we typically find that customers quickly want to scale to more documents, harder questions, and more users. So any text mining platform needs to be both flexible and scalable to support evolving discovery needs and maintain performance.  In terms of performance, raw CPU speed is sometimes less of an issue than network bandwidth especially at peak times in global organizations.

Information governance is gaining importance. Search / content processing is becoming part of eDiscovery or internal audit procedures. What’s your view of the role of search / content processing technology in these specialized sectors?

Implementing a proactive e-Discovery capability, rather than reacting to issues when they arise, is becoming a strategy to minimize potential legal costs. The forensic abilities of text mining are highly applicable to this area and have an increasing role to play in both eDiscovery and auditing. In particular, the ability to search for meaning and to detect even weak signals connecting information from different sources, along with provenance, is key.

As you look forward, what are some new features / issues that you think will become more important in 2009? Where do you see a major break-through over the next 36 months?

Organizations are still challenged to maximize the value of what is already known – both in internal documents or in published literature, on blogs, and so on.  Even in global companies, text mining is not yet seen as a standard capability, though search engines are ubiquitous. This is changing and I expect text mining to be increasingly regarded as best practice for a wide range of decision support tasks. We also see increasing requirements for text mining to become more embedded in employees’ workflows, including integration with collaboration tools.

Graphical interfaces and portals (now called composite applications) are making a comeback. Semantic technology can make point and click interfaces more useful. What other uses of semantic technology do you see gaining significance in 2009? What semantic considerations do you bring to your product and research activities?

Customers recognize the value of linking entities and concepts via semantic identifiers. There’s effectively a semantic engine at the heart of I2E and so semantic knowledge discovery is core to what we do.  I2E is also often used for data-driven discovery of synonyms, and association of these with appropriate concept identifiers.

In the life science domain commonly used identifiers such as gene ids already exist.  However, a more comprehensive identification of all types of entities and relationships via semantic web style URIs could still be very valuable.

Where can I find more information about your products, services, and research?

Please contact Susan LeBeau (susan.lebeau@linguamatics.com, tel: +1 774 571 1117) and visit www.linguamatics.com.

Stephen Arnold (ArnoldIT.com) and Harry Collier (Infonortics, Ltd.), February 16, 2009

Arnold Interviewed in Content Matters

February 12, 2009

A feather-preening item. Barry Graubart, who works for the hot Alacra and edits the Content Matters Web log, interviewed Stephen E. Arnold on February 11, 2009. The full text of the interview appears here. I read what I said and found it coherent, a radical change from most of my previous interview work for KDKA in Pittsburgh and a gig with a Charlotte 50,000 watt station. For me, the most interesting comment in the column was Mr. Graubart’s unexpected editorial opinion. Mr. Graubart graciously described me as the “author of the influential Beyond Search blog.” A happy quack for Barry Graubart.

Stephen Arnold, February 12, 2009

Francisco Corella, Pomcor, an Exclusive Interview

February 11, 2009

Another speaker on the program at Infonortics’ Boston Search Engine Meeting agreed to be interviewed by Harry Collier, the founder of the premier search and content processing event. Francisco Corella is one of the senior managers of Pomcor. The company’s Noflail search system leverages open source and Yahoo’s BOSS (Build your Own Search Service). Navigate to the Infonortics.com Web site and sign up for the conference today. In Boston, you can meet Mr. Corella and other innovators in information retrieval.

The full text of the interview appears below:

Will you describe briefly your company and its search technology?

Pomcor is dedicated to Web technology innovation.  In the area of search we have created Noflail Search, a search interface that runs on the Flex platform.  Search results are currently obtained from the Yahoo BOSS API, but this may change in the future.   Noflail Search helps the user solve tough search problems by prefetching the results of related queries, and supporting the simultaneous browsing of the result sets of multiple queries.  It sounds complicated, but new users find the interface familiar and comfortable from the start.  Noflail Search also lets users save useful queries—yes, queries, not results.  This is akin to bookmarking the queries, but a lot more practical.

What are the three major challenges you see in search / content processing in 2009?

First challenge: what I call the indexable unit problem.  A Web page is often not the desired indexable unit.  If you want to cook sardines with triple sec (after reading Thurber) and issue a query [sardines “triple sec”] you will find pages that have a recipe with sardines and a recipe with triple sec.  If there is a page with a recipe that uses both sardines and triple sec, it may be buried too deep for you to find.  In this case the desired indexable unit is the recipe, not the page.  Other indexable units: articles in a catalog, messages in an email archive, blog entries, news.  There are ad-hoc solutions for blog entries and news, but no general-purpose solutions.
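To illustrate the point, the sketch below indexes recipe-level units within a page so that a query only matches when all of its terms co-occur in one unit. The data format is invented for illustration; it is not a proposed standard or Pomcor code.

```python
# Hypothetical illustration of "indexable units": treat each recipe within a
# page as its own unit, so a query matches terms co-occurring in one recipe
# rather than merely somewhere on the same page.

page = {
    "url": "http://example.com/recipes",
    "units": [  # each unit is a candidate indexable unit (a recipe)
        {"id": "r1", "text": "Grilled sardines with lemon and olive oil"},
        {"id": "r2", "text": "Margarita with triple sec and lime"},
        {"id": "r3", "text": "Sardines in triple sec sauce"},
    ],
}

def match_units(page, terms):
    """Return units containing *all* query terms, unlike page-level matching."""
    return [u["id"] for u in page["units"]
            if all(t in u["text"].lower() for t in terms)]

print(match_units(page, ["sardines", "triple sec"]))  # ['r3'] -- the true recipe
```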

Second challenge: what I call the deep API problem.  Several search engines offer public Web APIs that enable search mashups.  Yahoo, in particular, encourages developers to reorder search results and merge results from different sources.  But no search API provides more than the first 1000 results from any result set, and you cannot reorder a set if you only have a tiny subset of its elements.  What’s needed is a deep API that lets you build your own index from crawler raw data or by combining multiple sources.

Third challenge: incorporate semantic technology into mainstream search engines.

With search processing decades old, what have been the principal barriers to resolving these challenges in the past?

The three challenges have not been resolved for different reasons. Indexable units require a new standard to specify the units within a page, and a restructuring of the search engines; hence a lot of inertia stands in the way of a solution.  The need for a deep API is new and not widely recognized yet.  And semantics are inherently difficult.

What is your approach to problem solving in search and content processing? Do you focus on smarter software, better content processing, improved interfaces, or some other specific area?

Noflail Search is a substantial improvement on the traditional search interface.  Nothing more, nothing less.  It may be surprising that such an improvement is coming now, after search engines have been in existence for so many years.  Part of the reason for this may be that Google has a quasi-monopoly in Web search, and monopolies tend to stifle innovation.  Our innovations are a direct result of the appearance of public Web APIs, which lower the barrier to entry and foster innovation.

With the rapid change in the business climate, how will the increasing financial pressure on information technology affect search / content processing?

The crisis may have both negative and positive effects on search innovation.  Financial pressure causes consolidation, which reduces innovation.  But the urge to reduce cost could also lead to the development of an ecosystem where different players solve different pieces of the search puzzle.  Some could specialize in crawler software, some in index construction, some in user interface improvements, some in various aspects of semantics, some in various vertical markets.

A technological ecosystem materialized in the 1980s for the PC industry, and resulted in amazing cost reduction.  Will this happen again for search?  Today we are seeing mixed signals.  We see reasons for hope in the emergence of many alternative search engines, and the release by Microsoft of Live Search API 2.0 with support for revenue sharing. On the other hand, Amazon recently dropped Alexa, and Yahoo is now changing the rules of the game for Yahoo BOSS, reneging on its promise of free API access with revenue sharing.

Multi core processors provide significant performance boosts. But search / content processing often faces bottlenecks and latency in indexing and query processing. What’s your view on the performance of your system or systems with which you are familiar? Is performance a non issue?

Noflail Search is computationally demanding.  When the user issues a query, Noflail Search precomputes the result sets of up to seven related queries in addition to the result set of the original query, and prefetches the first page of each result set.  If the query has no results (which may easily happen in a search restricted to a particular Web site), it determines the most specific subqueries (queries formed by dropping some of the terms) that do produce results; this requires traversing the entire subgraph of subqueries with zero results and its boundary, computing the result set of each node.  All this is perfectly feasible and actually takes very little real time.

How do we do it? 

Since Noflail Search is built on the Flex platform, the code runs on the Flash plug-in in the user’s computer and obtains search results directly from the Yahoo BOSS API.  Furthermore, the code exploits the inherent parallelism of any Web API.  Related queries are all run simultaneously.  And the algorithm for traversing the zero-result subgraph is carefully designed to maximize concurrency.
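The sketch below illustrates the zero-result analysis in general terms: enumerate subqueries from the largest down and keep the first level that returns results. It is a toy with a fake result counter, not Pomcor's algorithm; a real implementation would issue the API calls concurrently, as described above.

```python
# A rough sketch of zero-result analysis: try subqueries (the original query
# with some terms dropped) from most specific to least, and return the first
# level that still produces results. The "index" is faked for illustration.

from itertools import combinations

def count_results(terms):
    """Stand-in for a search API call; returns a fake hit count."""
    fake_index = {frozenset(["sardines"]): 120, frozenset(["triple", "sec"]): 40}
    return fake_index.get(frozenset(terms), 0)

def best_subqueries(terms):
    for size in range(len(terms) - 1, 0, -1):          # most specific first
        hits = [sub for sub in combinations(terms, size) if count_results(sub) > 0]
        if hits:
            return hits                                 # largest subqueries with results
    return []

print(best_subqueries(["sardines", "triple", "sec"]))   # [('triple', 'sec')]
```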

Yahoo, however, has just announced that they will be charging fees for API queries instead of sharing ad revenue.  If we continue to use Yahoo BOSS, it may not be economically feasible to prefetch the results of related queries or analyze zero results as we do now. Thus, although performance is a non-issue technically, demands on computational power have financial implications.

As you look forward, what are some new features / issues that you think will become more important in 2009?

Obviously we think that the new user interface features in Noflail Search are important and hope they’ll become widely used in 2009.  We have of course filed patent applications on the new features, but we are very willing to license the inventions to others. As for a breakthrough over the next 36 months, as a consumer of search, I very much hope that the indexable unit problem will be solved.  This would increase search accuracy and make life easier for everybody.

Where can I find more information about your products, services, and research?

Noflail Search is available at http://noflail.com/, and white papers on the new features can be found on the Search Technology page (http://www.pomcor.com/search_technology.html) of the Pomcor Web site (http://www.pomcor.com/).

Harry Collier, Infonortics Ltd., February 11, 2009

Semantic Engines’ Dmitri Soubbotin: Exclusive Interview

February 10, 2009

Semantics are booming. Daily I get spam from the trophy generation touting the latest and greatest in semantic technology. A couple of eager folks are organizing a semantic publishing system and gearing up for a semantic conference. I think these efforts are admirable, but I think that the trophy crowd confuses public relations with programming on occasion. Not Dmitri Soubbotin, one of the senior managers at Semantic Engines. Harry Collier and I were able to get the low-profile wizard to sit down and talk with us. Mr. Soubbotin’s interview with Harry Collier (Infonortics Ltd.) and me appears below.

Please keep in mind that Dmitri Soubbotin is one of the world-class search, content processing, and semantic technology experts who will be speaking at the April 2009 Boston Search Engine Meeting. Unlike fan-club conferences or SEO programs designed for marketers, the Boston Search Engine Meeting tackles substantive subjects in an informed way. The opportunity to talk with Mr. Soubbotin or any other speaker at this event is a worthwhile experience. The interview with Mr. Soubbotin makes clear the approach taken by the conference committee for the Boston Search Engine Meeting. Substance, not marketing hyperbole, is the focus for the two-day program. For more information and to register, click here.

Now the interview:

Will you describe briefly your company and its search / content processing technology?

Semantic Engines is mostly known for its search engine SenseBot (www.sensebot.net). The idea of it is to provide search results for a user’s query in the form of a multi-document summary of the most relevant Web sources, presented in a coherent order. Through text mining, the engine attempts to understand what the Web pages are about and extract key phrases to create a summary.

So instead of giving a collection of links to the user, we serve an answer in the form of a summary of multiple sources. For many informational queries, this obviates the need to drill down into individual sources and saves the user a lot of time. If the user still needs more detail, or likes a particular source, he may navigate to it right from the context of the summary.

Strictly speaking, this is going beyond information search and retrieval – to information synthesis. We believe that search engines can do a better service to the users by synthesizing informative answers, essays, reviews, etc., rather than just pointing to Web sites. This idea is part of our patent filing.
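A generic extractive sketch of the "summary instead of links" idea follows: it scores sentences from several sources against the query and stitches the best ones together. It is only a toy illustration of the concept, not SenseBot's text mining.

```python
# Generic extractive sketch of "answer as a summary of multiple sources":
# score sentences from several pages by overlap with the query, then stitch
# the best ones into one summary that keeps links back to the sources.

def score(sentence, query_terms):
    return len(set(sentence.lower().split()) & set(query_terms))

def summarize(sources, query_terms, max_sentences=3):
    scored = []
    for url, text in sources.items():
        for sent in text.split(". "):
            scored.append((score(sent, query_terms), url, sent))
    top = sorted(scored, reverse=True)[:max_sentences]
    return [(url, sent) for _, url, sent in top]   # summary keeps source links

sources = {
    "a.com": "Semantic search derives meaning from text. It is gaining adoption.",
    "b.com": "Keyword search matches strings. Semantic search matches meaning.",
}
print(summarize(sources, ["semantic", "search", "meaning"]))
```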

Other things that we do are Web services for B2B that extract semantic concepts from texts, generate text summaries from unstructured content, etc. We also have a new product for bloggers and publishers called LinkSensor. It performs in-text content discovery to engage the user in exploring more of the content through suggested relevant links.

What are the three major challenges you see in search / content processing in 2009?

There are many challenges. Let me highlight three that I think are interesting:

First, Relevance: Users spend too much time searching and not always finding. The first page of results presumably contains the most relevant sources. But unless search engines really understand the query and the user intent, we cannot be sure that the user is satisfied. Matching words of the query to words on Web pages is far from an ideal solution.

Second, Volume: The number of results matching a user’s query may be well beyond human capacity to review them. Naturally, the majority of searchers never venture beyond the first page of results – exploring the next page is often seen as not worth the effort. That means that a truly relevant and useful piece of content that happens to be number 11 on the list may become effectively invisible to the user.

Third, Shallow content: Search engines use a formula to calculate page rank. SEO techniques allow a site to improve its ranking through the use of keywords, often propagating a rather shallow site up on the list. The user may not know if the site is really worth exploring until he clicks on its link.

With search / content processing decades old, what have been the principal barriers to resolving these challenges in the past?

Not understanding the intent of the user’s query, and matching words syntactically rather than by their sense, are the key barriers preventing search engines from serving more relevant results. NLP and text mining techniques can be employed to understand the query and the Web pages’ content, and come up with an acceptable answer for the user. Analyzing Web page content on the fly can also help in distinguishing whether a page has value for the user or not. Of course, the infrastructure requirements would be higher when semantic analysis is used, raising the cost of serving search results. This may have been another barrier to broader use of semantics by major search engines.

What is your approach to problem solving in search and content processing? Do you focus on smarter software, better content processing, improved interfaces, or some other specific area?

Smarter, more intelligent software. We use text mining to parse Web pages and pull out the most representative text extracts of them, relevant to the query. We drop the sources that are shallow on content, no matter how high they were ranked by other search engines. We then order the text extracts to create a summary that ideally serves as a useful answer to the user’s query. This type of result is a good fit for an informational query, where the user’s goal is to understand a concept or event, or to get an overview of a topic. The closer together the source documents are (e.g., in a vertical space), the higher the quality of the summary.

Search / content processing systems have been integrated into such diverse functions as business intelligence and customer support. Do you see search / content processing becoming increasingly integrated into enterprise applications?

More and more, people expect to have the same features and user interface when they search at work as they get from home. The underlying difference is that behind the firewall the repositories and taxonomies are controlled, as opposed to the outside world. On one hand, it makes it easier for a search application within the enterprise as it narrows its focus and the accuracy of search can get higher. On the other hand, additional features and expertise would be required compared to the Web search. In general, I think the opportunities in the enterprise are growing for standalone search providers with unique value propositions.

As you look forward, what are some new features / issues that you think will become more important in 2009? Where do you see a major break-through over the next 36 months?

I think the use of semantics and intelligent processing of content will become more ubiquitous in 2009 and further. For years, it has been making its way from academia to “alternative” search engines, occasionally showing up in the mainstream. I think we are going to see much higher adoption of semantics by major search engines, first of all Google. Things have definitely been in the works, showing as small improvements here and there, but I expect a critical mass of experimenting to accumulate and overflow into standard features at some point. This will be a tremendous shift in the way search is perceived by users and implemented by search engines. The impact on the SEO techniques that are primarily keyword-based will be huge as well. Not sure whether this will happen in 2009, but certainly within the next 36 months.

Graphical interfaces and portals (now called composite applications) are making a comeback. Semantic technology can make point and click interfaces more useful. What other uses of semantic technology do you see gaining significance in 2009? What semantic considerations do you bring to your product and research activities?

I expect to see higher proliferation of Semantic Web and linked data. Currently, the applications in this field mostly go after the content that is inherently structured although hidden within the text – contacts, names, dates. I would be interested to see more integration of linked data apps with text mining tools that can understand unstructured content. This would allow automated processing of large volumes of unstructured content, making it semantic web-ready.

Where can we find more information about your products, services, and research?

Our main sites are www.sensebot.net and www.semanticengines.com. LinkSensor, our tool for bloggers/publishers, is at www.linksensor.com. A more detailed explanation of our approach with examples can be found in the following article: http://www.altsearchengines.com/2008/Q7/22/alternative-search-results/.

Stephen Arnold (Harrod’s Creek, Kentucky) and Harry Collier (Tetbury, Glou.), February 10, 2009

Daniel Tunkelang: Co-Founder of Endeca Interviewed

February 9, 2009

As other search conferences gasp for the fresh air of energizing speakers, Harry Collier’s Boston Search Engine Meeting (more information is here) has landed another thought-leader speaker. Daniel Tunkelang is one of the founders of Endeca. After the implosion of Convera and the buyouts of Fast Search and Verity, Endeca is one of the two flagship vendors of search, content processing, and information management systems recognized by most information technology professionals. Dr. Tunkelang writes an informative Web log, The Noisy Channel, here.


Dr. Daniel Tunkelang. Source: http://www.cs.cmu.edu/~quixote/dt.jpg

You can get a sense of Dr. Tunkelang’s views in this exclusive interview conducted by Stephen Arnold with the assistance of Harry Collier, Managing Director, Infonortics Ltd. If you want to hear and meet Dr. Tunkelang, attend the Boston Search Engine Meeting, which is focused on search and information retrieval. The Boston Search Engine Meeting is the show you may want to consider attending. All beef, no filler.


The speakers, like Dr. Tunkelang, will challenge you to think about the nature of information and the ways to deal with substantive issues, not antimacassars slapped on a problem. We interviewed Dr. Tunkelang on February 5, 2009. The full text of this interview appears below.

Tell us a bit about yourself and about Endeca.

I’m the Chief Scientist and a co-founder of Endeca, a leading enterprise search vendor. We are the largest organically grown company in our space (no preservatives or acquisitions!), and we have been recognized by industry analysts as a market and technology leader. Our hundreds of clients include household names in retail (Wal*Mart, Home Depot); manufacturing and distribution (Boeing, IBM); media and publishing (LexisNexis, World Book), financial services (ABN AMRO, Bank of America), and government (Defense Intelligence Agency, National Cancer Institute).

My own background: I was an undergraduate at MIT, double majoring in math and computer science, and I completed a PhD at CMU, where I worked on information visualization. Before joining Endeca’s founding team, I worked at the IBM T. J. Watson Research Center and AT&T Bell Labs.

What differentiates Endeca from the field of search and content processing vendors?

In web search, we type a query in a search box and expect to find the information we need in the top handful of results. In enterprise search, this approach too often breaks down. There are a variety of reasons for this breakdown, but the main one is that enterprise information needs are less amenable to the “wisdom of crowds” approach at the heart of PageRank and related approaches used for web search. As a consequence, we must get away from treating the search engine as a mind reader, and instead promote bi-directional communication so that users can effectively articulate their information needs and the system can satisfy them. The approach is known in the academic literature as human computer information retrieval (HCIR).

Endeca implements an HCIR approach by combining a set-oriented retrieval with user interaction to create an interactive dialogue, offering next steps or refinements to help guide users to the results most relevant for their unique needs. An Endeca-powered application responds to a query with not just relevant results, but with an overview of the user’s current context and an organized set of options for incremental exploration.
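The following minimal sketch shows the set-oriented "results plus refinements" pattern in general terms, with refinement options computed as facet counts over the current result set. It illustrates the interaction style described above, not Endeca's engine; the data and field names are invented.

```python
# Minimal sketch of set-oriented retrieval with refinements (faceted search):
# every response carries both the matching set and the next refinement options.

from collections import Counter

DOCS = [
    {"title": "Red running shoe", "brand": "Acme", "category": "shoes"},
    {"title": "Blue running shoe", "brand": "Zoom", "category": "shoes"},
    {"title": "Running socks",     "brand": "Acme", "category": "socks"},
]

def search(query, filters=None):
    filters = filters or {}
    hits = [d for d in DOCS
            if query in d["title"].lower()
            and all(d.get(k) == v for k, v in filters.items())]
    # refinement options: counts of facet values over the current result set
    facets = {field: Counter(d[field] for d in hits) for field in ("brand", "category")}
    return {"hits": hits, "refinements": facets}

resp = search("running")
print(resp["refinements"])                         # overview of the current context
print(search("running", {"brand": "Acme"})["hits"])  # one incremental refinement step
```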

What do you see as the three major challenges facing search and content processing in 2009 and beyond?

There are so many challenges! But let me pick my top three:

Social Search. While the word “social” is overused as a buzzword, it is true that content is becoming increasingly social in nature, both on the consumer web and in the enterprise. In particular, there is much appeal in the idea that people will tag content within the enterprise and benefit from each other’s tagging. The reality of social search, however, has not lived up to the vision. In order for social search to succeed, enterprise workers need to supply their proprietary knowledge in a process that is not only as painless as possible, but demonstrates the return on investment. We believe that our work at Endeca, on bootstrapping knowledge bases, can help bring about effective social search in the enterprise.

Federation.  As much as an enterprise may value its internal content, much of the content that its workers need resides outside the enterprise. An effective enterprise search tool needs to facilitate users’ access to all of these content sources while preserving value and context of each. But federation raises its own challenges, since every repository offers different levels of access to its contents. For federation to succeed, information repositories will need to offer more meaningful access than returning the top few results for a search query.

Search is not a zero-sum game. Web search engines in general–and Google in particular–have promoted a view of search that is heavily adversarial, thus encouraging a multi-billion dollar industry of companies and consultants trying to manipulate result ranking. This arms race between search engines and SEO consultants is an incredible waste of energy for both sides, and distracts us from building better technology to help people find information.

With the rapid change in the business climate, how will the increasing financial pressure on information technology affect search and content processing?

There’s no question that information technology purchase decisions will face stricter scrutiny. But, to quote Rahm Emanuel, “Never let a serious crisis go to waste…it’s an opportunity to do things you couldn’t do before.” Stricter scrutiny is a good thing; it means that search technology will be held accountable for the value it delivers to the enterprise. There will, no doubt, be an increasing pressure to cut costs, from price pressure on vendors to substituting automated techniques for human labor. But that is how it should be: vendors have to justify their value proposition. The difference in today’s climate is that the spotlight shines more intensely on this process.

Search / content processing systems have been integrated into such diverse functions as business intelligence and customer support. Do you see search / content processing becoming increasingly integrated into enterprise applications? If yes, how will this shift affect the companies providing stand alone search / content processing solutions? If no, what do you see the role of standalone search / content processing applications becoming?

Better search is a requirement for many enterprise applications–not just BI and Call Centers, but also e-commerce, product lifecycle management, CRM, and content management.  The level of search in these applications is only going to increase, and at some point it just isn’t possible for workers to productively use information without access to effective search tools.

For stand-alone vendors like Endeca, interoperability is key. At Endeca, we are continually expanding our connectivity to enterprise systems: more connectors, leveraging data services, etc.  We are also innovating in the area of building configurable applications, which let businesses quickly deploy the right set of features for their users.  Our diverse customer base has driven us to support the diversity of their information needs, e.g., customer support representatives have very different requirements from those of online shoppers. Most importantly, everyone benefits from tools that offer an opportunity to meaningfully interact with information, rather than being subjected to a big list of results that they can only page through.

Microsoft acquired Fast Search & Transfer. SAS acquired Teragram. Autonomy acquired Interwoven and Zantaz. In your opinion, will this consolidation create opportunities or shut doors. What options are available to vendors / researchers in this merger-filled environment?

Yes!  Each acquisition changes the dynamics in the market, both creating opportunities and shutting doors at the same time.  For SharePoint customers who want to keep the number of vendors they work with to a minimum, the acquisition of FAST gives them a better starting point over Microsoft Search Server.  For FAST customers who aren’t using SharePoint, I can only speculate as to what is in store for them.

For other vendors in the marketplace, the options are:

  • Get aligned with (or acquired by) one of the big vendors and get more tightly tied into a platform stack like FAST;
  • Carve out a position in a specific segment, like we’re seeing with Autonomy and e-Discovery, or
  • Be agnostic, and serve a number of different platforms and users like Endeca or Google do.  In this group, you’ll see some cases where functionality is king, and some cases where pricing is more important, but there will be plenty of opportunities here to thrive.

Multi core processors provide significant performance boosts. But search / content processing often faces bottlenecks and latency in indexing and query processing. What’s your view on the performance of your system or systems with which you are familiar? Is performance a non issue?

Performance is absolutely a consideration, even for systems that make efficient use of hardware resources. And it’s not just about using CPU for run-time query processing: the increasing size of data collections has pushed on memory requirements; data enrichment increases the expectations and resource requirements for indexing; and richer capabilities for query refinement and data visualization present their own performance demands.

Multicore computing is the new shape of Moore’s Law: this is a fundamental consequence of the need to manage power consumption on today’s processors, which contain billions of transistors. Hence, older search systems that were not designed to exploit data parallelism during query evaluation will not scale up as hardware advances.

While tasks like content extraction, enrichment, and indexing lend themselves well to today’s distributed computing approaches, the query side of the problem is more difficult–especially in modern interfaces that incorporate faceted search, group-bys, joins, numeric aggregations, et cetera. Much of the research literature on query parallelism from the database community addresses structured, relational data, and most parallel database work has targeted distributed memory models, so existing techniques must be adapted to handle the problems of search.
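As a generic illustration of data-parallel query evaluation (not any particular vendor's implementation), the sketch below has each index partition answer locally and then merges the partial facet counts. The data and partitioning are invented.

```python
# Generic sketch of data-parallel query evaluation for faceted search: each
# index partition answers locally, then partial facet counts are merged.

from collections import Counter
from concurrent.futures import ThreadPoolExecutor

PARTITIONS = [
    [{"text": "red shoe", "brand": "Acme"}, {"text": "blue shoe", "brand": "Zoom"}],
    [{"text": "red sock", "brand": "Acme"}],
]

def search_partition(partition, term):
    hits = [d for d in partition if term in d["text"]]
    return len(hits), Counter(d["brand"] for d in hits)

def parallel_search(term):
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda p: search_partition(p, term), PARTITIONS))
    total = sum(count for count, _ in partials)
    facets = sum((c for _, c in partials), Counter())   # merge partial counts
    return total, facets

print(parallel_search("red"))   # (2, Counter({'Acme': 2}))
```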

Google has disrupted certain enterprise search markets with its appliance solution. The Google brand creates the idea in the minds of some procurement teams and purchasing agents that Google is the only or preferred search solution. What can a vendor do to adapt to this Google effect? Is Google a significant player in enterprise search, or is Google a minor player?

I think it is a mistake for the higher-end search vendors to dismiss Google as a minor player in the enterprise. Google’s appliance solution may be functionally deficient, but Google’s brand is formidable, as is its positioning of the appliance as a simple, low-cost solution. Moreover, if buyers do not understand the differences among vendor offerings, they may well be inclined to decide based on the price tag, particularly in a cost-conscious economy. It is thus more incumbent than ever on vendors to be open about what their technology can do, as well as to build a credible case for buyers to compare total cost of ownership.

Mobile search is emerging as an important branch of search / content processing. Mobile search, however, imposes some limitations on presentation and query submission. What are your views of mobile search’s impact on more traditional enterprise search / content processing?

A number of folks have noted that the design constraints of the iPhone (and of mobile devices in general) lead to an improved user experience, since site designers do a better job of focusing on the information that users will find relevant. I’m delighted to see designers striving to improve the signal-to-noise ratio in information seeking applications.

Still, I think we can take the idea much further. More efficient or ergonomic use of real estate boils down to stripping extraneous content (a good idea, but hardly novel), and making sites vertically oriented (i.e., no horizontal scrolling) is still a cosmetic change. The more interesting question is how to determine what information is best to present in the limited space; that is the key to optimizing interaction. Indeed, many of the questions raised by small screens also apply to other interfaces, such as voice. Ultimately, we need to reconsider the extreme inefficiency of ranked lists, compared to summarization-oriented approaches. Certainly the mobile space opens great opportunities for someone to get this right on the web.

Semantic technology can make point and click interfaces more useful. What other uses of semantic technology do you see gaining significance in 2009? What semantic considerations do you bring to your product and research activities?

Semantic search means different things to different people, but broadly falls into two categories: using linguistic and statistical approaches to derive meaning from unstructured text, and using semantic web approaches to represent meaning in content and query structure. Endeca embraces both of these aspects of semantic search.

From early on, we have developed an extensible framework for enriching content through linguistic and statistical information extraction. We have developed some groundbreaking tools ourselves, but have achieved even better results by combining other vendors’ document analysis tools with our unique ability to improve their results through corpus analysis.

The growing prevalence of structured data (e.g., RDF) with well-formed ontologies (e.g., OWL) is very valuable to Endeca, since our flexible data model is ideal for incorporating heterogeneous, semi-structured content. We have done this in major applications for the financial industry, media/publishing, and the federal government.

It is also important to note that semantic search is not just about the data. In the popular conception of semantic search, the computer is wholly responsible for deriving meaning from the unstructured input. Endeca’s philosophy, as per the HCIR vision, is that humans determine meaning, and that our job is to give them clues using all of the structure we can provide.

Where can I find more information about your products, services, and research?

Endeca’s web site is http://endeca.com/. I also encourage you to read my blog, The Noisy Channel (http://thenoisychannel.com/), where I share my ideas (as do a number of other people!) on improving the way that people interact with information.

Stephen Arnold, February 9, 2009

Lexalytics’ Jeff Caitlin on Sentiment and Semantics

February 3, 2009

Editor’s Note: Lexalytics is one of the companies that is closely identified with analyzing text for sentiment. When a flow of email contains a negative message, Lexalytics’ system can flag that email. In addition, the company can generate data that provides insight into how people “feel” about a company or product. I am simplifying, of course. Sentiment analysis has emerged as a key content processing function, and like other language-centric tasks, the methods are of increasing interest.

Jeff Caitlin will speak at what has emerged as the “must attend” search and content processing conference in 2009. Infonortics’ Boston Search Engine Meeting features speakers who have an impact on sophisticated search, information processing, and text analytics. Other conferences respond to public relations; the Infonortics conference emphasizes substance.

If you want to attend, keep in mind that attendance at the Boston Search Engine Meeting is limited. To get more information about the program, visit the Infonortics Ltd. Web site at www.infonortics.com or click here.

The exclusive interview with Jeff Caitlin took place on February 2, 2009. Here is the text of the interview conducted by Harry Collier, managing director of Infonortics and the individual who created this content-centric conference more than a decade ago. Beyond Search has articles about Lexalytics here and here.

Will you describe briefly your company and its search / content processing technology?

Lexalytics is a Text Analytics company that is best known for our ability to measure the sentiment or tone of content. We plug in on the content processing side of the house, and take unstructured content and extract interesting and useful metadata that applications like Search Engines can use to improve the search experience. The types of metadata typically extracted include: Entities, Concepts, Sentiment, Summaries and Relationships (Person to Company for example).
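To make the kinds of metadata concrete, here is a hypothetical example of what an enriched document record might look like before it reaches a search index. The field names and values are invented for illustration; this is not Lexalytics' actual output format or API.

```python
# Hypothetical sketch of metadata a text analytics step might attach to a
# document so a search engine can filter, facet, and rank on it.

document = "Acme's new phone is excellent, says CEO Jane Doe."

enriched = {
    "text": document,
    "entities": [
        {"text": "Acme", "type": "Company"},
        {"text": "Jane Doe", "type": "Person"},
    ],
    "relationships": [("Jane Doe", "CEO of", "Acme")],   # Person-to-Company
    "sentiment": 0.8,                                     # positive tone
    "summary": "Acme's new phone praised by its CEO.",
}

# The search application can then answer "positive documents mentioning Acme":
if enriched["sentiment"] > 0 and any(e["text"] == "Acme" for e in enriched["entities"]):
    print("matches: positive + Acme")
```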

With search / content processing decades old, what have been the principal barriers to resolving these challenges in the past?

The simple fact that machines aren’t smart like people and don’t actually “understand” the content they are processing… or at least they haven’t to date. The new generation of text processing systems have advanced grammatical parsers that are allowing us to tackle some of the nasty problems that have stymied us in the past. One such example is anaphora resolution, sometimes referred to as “pronominal reference”, which is a bunch of big, confusing-sounding words to describe the understanding of pronouns. Take the sentence, “John Smith is a great guy, so great that he’s my kids’ godfather and one of the nicest people I’ve ever met.” For people this is a pretty simple sentence to parse and understand, but for a machine it has given us fits for decades. Now, with grammatical parsers, we understand that “John Smith” and “he” are the same person, and we also understand who the speaker is and what the subject is in this sentence. This enhanced level of understanding is going to improve the accuracy of text parsing and allow for a much deeper analysis of the relationships in the mountains of data we create every day.
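For illustration only, the toy heuristic below links a pronoun to the most recently mentioned person name. Real grammatical parsers use syntactic structure and far richer evidence; this sketch just shows what "resolving" a pronoun means in practice.

```python
# Deliberately crude illustration of pronoun (anaphora) resolution: link
# "he"/"she" to the most recent person name seen so far. Not a real parser.

import re

PERSON_NAMES = {"John Smith"}          # assume entity extraction already ran

def resolve_pronouns(text):
    resolved, last_person = [], None
    for token in re.findall(r"John Smith|\w+", text):
        if token in PERSON_NAMES:
            last_person = token
        if token.lower() in {"he", "she"} and last_person:
            token = f"{token}[{last_person}]"
        resolved.append(token)
    return " ".join(resolved)

print(resolve_pronouns("John Smith is a great guy, so great that he's my kids' godfather."))
# -> ... so great that he[John Smith] s my kids godfather
```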

What is your approach to problem solving in search and content processing? Do you focus on smarter software, better content processing, improved interfaces, or some other specific area?

Lexalytics is definitely on the better content processing side of the house; our belief is that you can only go so far by improving the search engine… eventually you’re going to have to make the data better to improve the search experience. This is 180 degrees apart from Google, which focuses exclusively on the search algorithms. This works well for Google in the web search world, where you have billions of documents at your disposal, but it hasn’t worked as well in the corporate world, where finding information isn’t nearly as important as finding the right information and helping users understand why it’s important and who understands it. Our belief is that metadata extraction is one of the best ways to learn the “who” and “why” of content so that enterprise search applications can really improve the efficiency and understanding of their users.

With the rapid change in the business climate, how will the increasing financial pressure on information technology affect search / content processing?

For Lexalytics the adverse business climate has altered the mix of our customers, but to date it has not affected the growth in our business (Q1 2009 should be our best ever). What has clearly changed is the mix of customers investing in Search and Content Processing; we typically run about 2/3 small companies and 1/3 large companies. In this environment we are seeing a significant uptick in large companies looking to invest as they seek to increase their productivity. At the same time, we’re seeing a significant drop in the number of smaller companies looking to spend on Text Analytics and Search. The net-net of this is that, if anything, Search appears to be one of the areas that will do well in this climate, because data volumes are going up and staff sizes are going down.

Microsoft acquired Fast Search & Transfer. SAS acquired Teragram. Autonomy acquired Interwoven and Zantaz. In your opinion, will this consolidation create opportunities or shut doors. What options are available to vendors / researchers in this merger-filled environment?

As one of the vendors that works closely with two of the three major Enterprise Search vendors, we see these acquisitions as a good thing. FAST, for example, seems to be a well-run organization under Microsoft, and they seem to be very clear on what they do and what they don’t do. This makes it much easier for both partners and smaller vendors to differentiate their products and services from those of the larger players. As an example, we are seeing a significant uptick in leads coming directly from the Enterprise Search vendors, which are looking to us for help in providing sentiment/tone measurement for their customers. Though these mergers have been good for us, I suspect that won’t be the case for all vendors. We work with the enterprise search companies rather than against them; if you compete with them, these mergers may make it even harder to be considered.

As you look forward, what are some new features / issues that you think will become more important in 2009? Where do you see a major break-through over the next 36 months?

The biggest change is going to be the move away from entities that are explicitly stated within a document toward a more “fluffy” approach. This encompasses extracting directly stated relationships, such as “Joe works at Big Company Inc”, but it also encompasses inferring the same relationship from a less direct statement: “Joe got in his car and drove, like he did every day, to his job at Big Company Inc.” It also covers things like processing reviews and understanding that sound quality is a feature of an iPod from the context of the document, rather than from a predefined list. And it encompasses things of a more semantic nature, such as understanding that a document talking about Congress is also talking about Government, even though Government might not be explicitly stated.
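
The Congress-to-Government example is a case of concept generalization. A minimal sketch of the idea follows, using a tiny hand-written broader-concept map as a stand-in for the large taxonomy or semantic network a production system would actually use; none of this is Lexalytics’ method, it only illustrates the mechanics.

```python
# Toy broader-concept map; a production system would draw on a large taxonomy
# or semantic network, not a hand-written dictionary.
BROADER_CONCEPT = {
    "congress": "government",
    "senate": "government",
    "sound quality": "audio feature",
}

def expand_concepts(terms):
    """Add broader concepts so a query for 'government' also matches 'Congress'."""
    expanded = set(t.lower() for t in terms)
    for term in list(expanded):
        broader = BROADER_CONCEPT.get(term)
        if broader:
            expanded.add(broader)
    return expanded

print(sorted(expand_concepts(["Congress", "hearing"])))
# ['congress', 'government', 'hearing']
```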

Graphical interfaces and portals (now called composite applications) are making a comeback. Semantic technology can make point and click interfaces more useful. What other uses of semantic technology do you see gaining significance in 2009? What semantic considerations do you bring to your product and research activities?

One of the key uses of semantic understanding in the future will be in understanding what people are asking about or complaining about in content. It’s one thing to measure the sentiment for an item that you’re interested in (say a digital camera), but it’s quite another to understand the items that people are complaining about while reviewing a camera and noting that “the battery life sucks”. We believe that joining the subject of a discussion to the tone of that discussion will be one of the key advancements in semantic understanding over the next couple of years.
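
This pairing of subject and tone is often called aspect-level sentiment. The toy heuristic below is only a sketch of the idea, not Lexalytics’ technique: it spots known feature phrases in a review and attaches a crude tone label derived from the surrounding words. The feature and tone word lists are invented for the example.

```python
FEATURES = {"battery life", "sound quality", "zoom"}
POSITIVE = {"great", "excellent", "crisp"}
NEGATIVE = {"sucks", "terrible", "poor"}

def aspect_sentiment(review):
    """Pair each product feature mentioned in a review with a crude tone label."""
    text = review.lower()
    words = {w.strip(".,!") for w in text.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    tone = "negative" if score < 0 else "positive"
    return {feature: tone for feature in FEATURES if feature in text}

print(aspect_sentiment("Nice camera overall, but the battery life sucks."))
# {'battery life': 'negative'}
```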

Where can I find out more about your products, services and research?

Lexalytics can be found on the web at www.lexalytics.com. Our Web log discusses our thoughts on the industry: www.lexalytics.com/lexablog. A downloadable trial is available here. We also have prepared a white paper, and you can get a copy here.

Harry Collier, February 3, 2009

Frank Bandach, Chief Scientist, eeggi on Semantics and Search

February 2, 2009

An Exclusive Interview by Infonortics Ltd. and Beyond Search

Harry Collier, managing director and founder of the influential Boston Search Engine Meeting, interviewed Frank Bandach, chief scientist of eeggi, a semantic search company, on January 27, 2009. eeggi has maintained a low profile. The interview with Mr. Bandach is among the first public descriptions of the company’s view of the fast-changing semantic search sector.

The full text of the interview appears below.

Will you describe briefly your company and its search technology?

We are a small new company implementing our very own new technology. Our technology is framed in a rather controversial theory of natural language, exploiting the idea that language itself is a predetermined structure; as we grow, we simply feed in new words to increase its capabilities and its significance. In other words, our brains did not learn to speak; we were rather destined to speak. Scientifically speaking, eeggi is a mathematical clustering structure which models natural language and, therefore, some portions of rationality itself. Objectively speaking, eeggi is a linguistic reasoning and rationalizing analysis engine. As a linguistic reasoning engine, it is then only natural that we find ourselves cultivating search, but also other technological fields such as speech recognition, concept analysis, responding, irrelevance removal, and others.

What are the three major challenges you see in search in 2009?

The way I perceive this, many of the challenges facing search in 2009 (irrelevance, nonsense, and ambiguity) are the same ones that were faced in previous years. I think that our awareness and demands are simply increasing, and thus call for smarter and more accurate results. This is, after all, the history of evolution.

With search decades old, what have been the principal barriers to resolving these challenges in the past?

These problems (irrelevance, nonsense, and ambiguity) have to date been addressed through Artificial Intelligence. However, AI branches into many areas and disciplines, and AI is also currently evolving and changing. Our approach is unique and follows a completely different attitude, or if I may say, a different spirit from that of current AI disciplines.

What is your approach to problem solving in search? Do you focus on smarter software, better content processing, improved interfaces, or some other specific area?

Our primary approach is machine intelligence focused on zero irrelevance, while allowing for synonyms and similarities, rational disambiguation of homonyms and multi-conceptual words, treatment of collocations as unit concepts, grammar, rationality, and finally information discovery.
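
One of the listed ideas, treating collocations as unit concepts, can be illustrated with a small sketch. This is not eeggi’s engine, whose inventory and method are not public; it only shows the general mechanic of merging known multiword expressions into single tokens before further analysis, using an invented collocation list.

```python
# Toy inventory of collocations to treat as single unit concepts; eeggi's
# actual inventory and method are not public, so this is only an illustration.
COLLOCATIONS = {("hot", "dog"), ("real", "estate"), ("strike", "zone")}

def merge_collocations(tokens):
    """Join known multiword expressions into single unit-concept tokens."""
    merged, i = [], 0
    while i < len(tokens):
        pair = tuple(t.lower() for t in tokens[i:i + 2])
        if pair in COLLOCATIONS:
            merged.append("_".join(tokens[i:i + 2]))
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

print(merge_collocations(["the", "strike", "zone", "was", "wide"]))
# ['the', 'strike_zone', 'was', 'wide']
```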

With the rapid change in the business climate, how will the increasing financial pressure on information technology affect search?

The immediate impact of a weak economy affects all industries, but before long the impact will be absorbed and will disappear. The future belongs to technology. This is indeed the principle that was ignited long ago with the industrial revolution. It is true, the world faces many challenges ahead, but technology is the reflection of progress, and technology is uniting us day by day, allowing, and at times forcing us, to understand, accept, and admit our differences. For example, unlike ever before, the United States and India are now becoming virtual neighbors thanks to the Internet.

Search systems have been integrated into such diverse functions as business intelligence and customer support. Do you see search becoming increasingly integrated into enterprise applications? If yes, how will this shift affect the companies providing stand alone search / content processing solutions? If no, what do you see the role of standalone search / content processing applications becoming?

From our standpoint, search, translation, speech recognition, machine intelligence, and, for all matters, language itself all fall under a single umbrella, which we identify through a Linguistic Reasoning and Rationalization Analysis engine we call eeggi.

Is that an acronym?

Yes. eeggi is shorthand for “engineered, encyclopedic, global and grammatical identities”.

As you look forward, what are some new features / issues that you think will become more important in 2009? Where do you see a major break-through over the next 36 months?

I truly believe that users will become more and more critical of irrelevance and the quality of their results. New generations will be, and already are, more aware and demanding of machine performance. For example, while in my youth two little bars and a square in the middle represented a tennis match, and it was an exciting experience, by today’s standards presenting the same scenario to a kid would be a laughing matter. As newer generations move in, foolish results will not form part of their minimum expectations.

Mobile search is emerging as an important branch of search. Mobile search, however, imposes some limitations on presentation and query submission. What are your views of mobile search’s impact on more traditional enterprise search / content processing?

This is a very interesting question. The functionalities and applications of several machines inevitably begin to merge the very instant that technology permits miniaturization, or when a single machine can efficiently evolve and support the applications of the others. Most of the time, it is the smallest machine that wins. It is going to be very interesting to see how cell phones move, more and more, into fields that were reserved exclusively for computers. It is true that cell phones, by nature, need to integrate small screens, but new folding screens and even projection technologies could make for much larger screens, and as Artificial Intelligence takes on challenges that before were only possible through user performance, screens themselves may move into a secondary function. After all, you and I are now talking without any visual aid or, for that matter, any screen.

Where can I find more information about your products, services, and research?

We are still a bit in stealth mode, but we have a Web site (eeggi.com) that displays and discusses some basic information. We hope that by November 2009 we will have built sufficient linguistic structures to allow eeggi to move into automatic learning of other languages with little, or possibly no, aid from native speakers or other human help.

Thank you.

Harry Collier, Managing Director, Infonortics Ltd.

BA-Insight Points to Strong 2009

February 2, 2009

In an exclusive interview for the Search Engine Wizards series, Guy Mounier, one of the senior managers at BA-Insight, looks for a strong 2009. The company grew rapidly in 2008. Although the company is privately held, Mr. Mounier said, “We are profitable and have been experiencing rapid growth.” You can read the full text of this interview here.

One of the most interesting comments made by Mr. Mounier was:

BA-Insight is the top Enterprise Search ISV Partner of Microsoft. We are a Managed Partner, a status reserved for 200 MS Partners worldwide, and a Global Alliance Member of Microsoft Technology Centers. We are also part of the Google Compete Team. Our software extends the Microsoft Enterprise Search platform; it does not replace it. In fact, our software is not a search engine. That is a critical differentiator from other ISVs in the information access sector. We focus exclusively on plug-and-play connectors to enterprise systems and an advanced search experience on top of MS Enterprise Search and MS SharePoint. We will support FAST in the Office 14 time frame.

BA-Insight has found a lucrative niche. The company adds a turbo boost to its clients’ Microsoft systems. With its support for Google systems, BA-Insight is poised to take advantage of that company’s push into organizations as well.

Mr. Mounier told Search Wizards Speak:

Our next major release is scheduled for end of 2009 and will target the next version of SharePoint (Office 14). We will add significant improvements in the form of automatic metadata extraction, dynamic data visualization, and on-the-fly document assembly.

On the subject of having Microsoft as a partner, Mr. Mounier said:

Microsoft is actually a great company to partner with. Their Solution Sales Professionals, responsible for technical solution sales, always reach out to the partner ecosystem, SIs or ISVs, to put forth a solution to the customer on top of the Microsoft platform. Microsoft is a significant contributor to our sales pipeline. We conduct regular webinars and other events with their field sales force to stay top of mind, as many partners are competing for their attention. This has been rather easy as of late, as search becomes increasingly strategic to them. The other benefit of being a top partner of Microsoft is that we get visibility into their product pipeline, typically 18 months or more out, that our competitors do not have. We know of their future product investments and can make sure we stay aligned with their roadmap, adding new features that don’t collide with theirs.

For more information about BA-Insight, navigate to the company’s Web site at www.ba-insight.com or click here.

Stephen Arnold, February 2, 2009
