Daniel Tunkelang: Co-Founder of Endeca Interviewed

February 9, 2009

As other search conferences gasp for the fresh air of invigorating speakers, Harry Collier's Boston Search Engine Conference (more information is here) has landed another thought-leader speaker. Daniel Tunkelang is one of the founders of Endeca. After the implosion of Convera and the buyouts of Fast Search and Verity, Endeca is one of the two flagship vendors of search, content processing, and information management systems recognized by most information technology professionals. Dr. Tunkelang writes an informative Web log, The Noisy Channel, here.

Dr. Daniel Tunkelang. Source: http://www.cs.cmu.edu/~quixote/dt.jpg

You can get a sense of Dr. Tunkelang's views in this exclusive interview conducted by Stephen Arnold with the assistance of Harry Collier, Managing Director, Infonortics Ltd. If you want to hear and meet Dr. Tunkelang, attend the Boston Search Engine Meeting, which is focused on search and information retrieval. This is the show to consider attending: all beef, no filler.

The speakers, like Dr. Tunkelang, will challenge you to think about the nature of information and the ways to deal with substantive issues, not antimacassars slapped on a problem. We interviewed Dr. Tunkelang on February 5, 2009. The full text of this interview appears below.

Tell us a bit about yourself and about Endeca.

I'm the Chief Scientist and a co-founder of Endeca, a leading enterprise search vendor. We are the largest organically grown company in our space (no preservatives or acquisitions!), and we have been recognized by industry analysts as a market and technology leader. Our hundreds of clients include household names in retail (Wal*Mart, Home Depot); manufacturing and distribution (Boeing, IBM); media and publishing (LexisNexis, World Book); financial services (ABN AMRO, Bank of America); and government (Defense Intelligence Agency, National Cancer Institute).

My own background: I was an undergraduate at MIT, double majoring in math and computer science, and I completed a PhD at CMU, where I worked on information visualization. Before joining Endeca’s founding team, I worked at the IBM T. J. Watson Research Center and AT&T Bell Labs.

What differentiates Endeca from the field of search and content processing vendors?

In web search, we type a query in a search box and expect to find the information we need in the top handful of results. In enterprise search, this approach too often breaks down. There are a variety of reasons for this breakdown, but the main one is that enterprise information needs are less amenable to the "wisdom of crowds" approach at the heart of PageRank and related approaches used for web search. As a consequence, we must get away from treating the search engine as a mind reader, and instead promote bi-directional communication so that users can effectively articulate their information needs and the system can satisfy them. This approach is known in the academic literature as human-computer information retrieval (HCIR).

Endeca implements an HCIR approach by combining set-oriented retrieval with user interaction to create an interactive dialogue, offering next steps or refinements to help guide users to the results most relevant for their unique needs. An Endeca-powered application responds to a query with not just relevant results, but with an overview of the user's current context and an organized set of options for incremental exploration.
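
To make the set-oriented retrieval and guided refinement idea concrete, here is a minimal Python sketch of faceted refinement. It illustrates the general HCIR pattern Dr. Tunkelang describes, not Endeca's actual engine; the records, field names, and facet choices are invented for the example.

```python
from collections import Counter

# Toy catalog; in a real application these records would come from an index.
RECORDS = [
    {"title": "Trail Running Shoe", "brand": "Acme", "category": "Footwear", "price_band": "$50-$100"},
    {"title": "Road Running Shoe", "brand": "Zenith", "category": "Footwear", "price_band": "$100-$150"},
    {"title": "Running Jacket", "brand": "Acme", "category": "Apparel", "price_band": "$50-$100"},
]

def search(keyword, filters=None):
    """Return the matching set plus facet counts for the next refinement step."""
    filters = filters or {}
    matches = [
        r for r in RECORDS
        if keyword.lower() in r["title"].lower()
        and all(r.get(field) == value for field, value in filters.items())
    ]
    # Facet counts summarize the current result set and become clickable refinements.
    facets = {
        field: Counter(r[field] for r in matches)
        for field in ("brand", "category", "price_band")
        if field not in filters
    }
    return matches, facets

results, facets = search("running")                       # initial query
print(facets)                                             # overview of the current context
results, facets = search("running", {"brand": "Acme"})    # user clicks a refinement
print([r["title"] for r in results])
```

Each query returns both the matching set and facet counts over that set, so the next click narrows the result set instead of forcing the user to guess a better keyword.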

What do you see as the three major challenges facing search and content processing in 2009 and beyond?

There are so many challenges! But let me pick my top three:

Social Search. While the word "social" is overused as a buzzword, it is true that content is becoming increasingly social in nature, both on the consumer web and in the enterprise. In particular, there is much appeal in the idea that people will tag content within the enterprise and benefit from each other's tagging. The reality of social search, however, has not lived up to the vision. For social search to succeed, enterprise workers need to supply their proprietary knowledge in a process that is not only as painless as possible but also demonstrates the return on investment. We believe that our work at Endeca on bootstrapping knowledge bases can help bring about effective social search in the enterprise.

Federation. As much as an enterprise may value its internal content, much of the content that its workers need resides outside the enterprise. An effective enterprise search tool needs to facilitate users' access to all of these content sources while preserving the value and context of each. But federation raises its own challenges, since every repository offers a different level of access to its contents. For federation to succeed, information repositories will need to offer more meaningful access than returning the top few results for a search query.

Search is not a zero-sum game. Web search engines in general–and Google in particular–have promoted a view of search that is heavily adversarial, thus encouraging a multi-billion dollar industry of companies and consultants trying to manipulate result ranking. This arms race between search engines and SEO consultants is an incredible waste of energy for both sides, and distracts us from building better technology to help people find information.
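
Dr. Tunkelang's federation point lends itself to a small sketch: query each repository through its own connector, keep the source and its native fields with every hit, and merge. The connectors and records below are invented stand-ins, not any vendor's actual API; they only show why a repository that exposes richer, structured access is more useful to a federator than one that returns bare top-ranked titles.

```python
def wiki_connector(query):
    """Stand-in for an intranet wiki that returns titles only."""
    return [{"title": "Search architecture notes", "url": "wiki/123"}]

def crm_connector(query):
    """Stand-in for a CRM system that returns richer, structured records."""
    return [{"account": "Acme", "status": "open", "summary": "Search upgrade inquiry"}]

CONNECTORS = {"wiki": wiki_connector, "crm": crm_connector}

def federate(query, limit_per_source=3):
    """Merge top results from each repository, preserving source and native fields."""
    merged = []
    for source, connector in CONNECTORS.items():
        for hit in connector(query)[:limit_per_source]:
            merged.append({"source": source, **hit})   # keep provenance and context
    return merged

for hit in federate("search"):
    print(hit)
```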

With the rapid change in the business climate, how will the increasing financial pressure on information technology affect search and content processing?

There's no question that information technology purchase decisions will face stricter scrutiny. But, to quote Rahm Emanuel, "Never let a serious crisis go to waste… it's an opportunity to do things you couldn't do before." Stricter scrutiny is a good thing; it means that search technology will be held accountable for the value it delivers to the enterprise. There will, no doubt, be increasing pressure to cut costs, from price pressure on vendors to substituting automated techniques for human labor. But that is how it should be: vendors have to justify their value proposition. The difference in today's climate is that the spotlight shines more intensely on this process.

Search / content processing systems have been integrated into such diverse functions as business intelligence and customer support. Do you see search / content processing becoming increasingly integrated into enterprise applications? If yes, how will this shift affect the companies providing stand alone search / content processing solutions? If no, what do you see the role of standalone search / content processing applications becoming?

Better search is a requirement for many enterprise applications: not just BI and call centers, but also e-commerce, product lifecycle management, CRM, and content management. The level of search in these applications is only going to increase, and beyond a certain point it just isn't possible for workers to use information productively without access to effective search tools.

For stand-alone vendors like Endeca, interoperability is key. At Endeca, we are continually expanding our connectivity to enterprise systems: more connectors, leveraging data services, and so on. We are also innovating in the area of building configurable applications, which let businesses quickly deploy the right set of features for their users. Our diverse customer base has driven us to support the diversity of their information needs; for example, customer support representatives have very different requirements from those of online shoppers. Most importantly, everyone benefits from tools that offer an opportunity to meaningfully interact with information, rather than being subjected to a big list of results that they can only page through.

Microsoft acquired Fast Search & Transfer. SAS acquired Teragram. Autonomy acquired Interwoven and Zantaz. In your opinion, will this consolidation create opportunities or shut doors? What options are available to vendors / researchers in this merger-filled environment?

Yes! Each acquisition changes the dynamics in the market, both creating opportunities and shutting doors at the same time. For SharePoint customers who want to keep the number of vendors they work with to a minimum, the acquisition of FAST gives them a better starting point than Microsoft Search Server. For FAST customers who aren't using SharePoint, I can only speculate as to what is in store for them.

For other vendors in the marketplace, the options are:

  • Get aligned with (or acquired by) one of the big vendors and get more tightly tied into a platform stack like FAST;
  • Carve out a position in a specific segment, like we’re seeing with Autonomy and e-Discovery, or
  • Be agnostic, and serve a number of different platforms and users like Endeca or Google do.  In this group, you’ll see some cases where functionality is king, and some cases where pricing is more important, but there will be plenty of opportunities here to thrive.

Multicore processors provide significant performance boosts. But search / content processing often faces bottlenecks and latency in indexing and query processing. What's your view on the performance of your system or systems with which you are familiar? Is performance a non-issue?

Performance is absolutely a consideration, even for systems that make efficient use of hardware resources. And it’s not just about using CPU for run-time query processing: the increasing size of data collections has pushed on memory requirements; data enrichment increases the expectations and resource requirements for indexing; and richer capabilities for query refinement and data visualization present their own performance demands.

Multicore computing is the new shape of Moore’s Law: this is a fundamental consequence of the need to manage power consumption on today’s processors, which contain billions of transistors. Hence, older search systems that were not designed to exploit data parallelism during query evaluation will not scale up as hardware advances.

While tasks like content extraction, enrichment, and indexing lend themselves well to today’s distributed computing approaches, the query side of the problem is more difficult–especially in modern interfaces that incorporate faceted search, group-bys, joins, numeric aggregations, et cetera. Much of the research literature on query parallelism from the database community addresses structured, relational data, and most parallel database work has targeted distributed memory models, so existing techniques must be adapted to handle the problems of search.
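
As a toy illustration of the data parallelism issue, the Python sketch below fans a query out over in-memory index partitions and then merges hit lists and facet counts. It is a minimal map-reduce over invented shards, not a description of Endeca's query engine or of any particular parallel database technique.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Toy index partitions; a real engine would hold inverted indexes per shard.
SHARDS = [
    [{"title": "red running shoe", "brand": "Acme"},
     {"title": "blue running shoe", "brand": "Zenith"}],
    [{"title": "running jacket", "brand": "Acme"},
     {"title": "walking shoe", "brand": "Acme"}],
]

def query_shard(shard, keyword):
    """Map step: match within one partition and count facet values locally."""
    hits = [doc for doc in shard if keyword in doc["title"]]
    return hits, Counter(doc["brand"] for doc in hits)

def query_all(keyword):
    """Fan the query out, then reduce the per-shard hits and facet counts."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda s: query_shard(s, keyword), SHARDS))
    all_hits = [hit for hits, _ in partials for hit in hits]
    facet_counts = sum((counts for _, counts in partials), Counter())
    return all_hits, facet_counts

hits, brands = query_all("running")
print(len(hits), dict(brands))   # 3 {'Acme': 2, 'Zenith': 1}
```

The reduce step is where faceted search diverges from plain ranked retrieval: each shard must return not only its top hits but also its local aggregates, and those aggregates have to be merged consistently.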

Google has disrupted certain enterprise search markets with its appliance solution. The Google brand creates the idea in the minds of some procurement teams and purchasing agents that Google is the only or preferred search solution. What can a vendor do to adapt to this Google effect? Is Google a significant player in enterprise search, or is Google a minor player?

I think it is a mistake for the higher-end search vendors to dismiss Google as a minor player in the enterprise. Google's appliance solution may be functionally deficient, but Google's brand is formidable, as is its positioning of the appliance as a simple, low-cost solution. Moreover, if buyers do not understand the differences among vendor offerings, they may well be inclined to decide based on the price tag, particularly in a cost-conscious economy. It is thus more incumbent than ever on vendors to be open about what their technology can do, as well as to build a credible case for buyers to compare total cost of ownership.

Mobile search is emerging as an important branch of search / content processing. Mobile search, however, imposes some limitations on presentation and query submission. What are your views of mobile search’s impact on more traditional enterprise search / content processing?

A number of folks have noted that the design constraints of the iPhone (and of mobile devices in general) lead to an improved user experience, since site designers do a better job of focusing on the information that users will find relevant. I’m delighted to see designers striving to improve the signal-to-noise ratio in information seeking applications.

Still, I think we can take the idea much further. More efficient or ergonomic use of real estate boils down to stripping extraneous content, which is a good idea but hardly novel, and making sites vertically oriented (i.e., no horizontal scrolling) is still a cosmetic change. The more interesting question is how to determine what information is best to present in the limited space; that is the key to optimizing interaction. Indeed, many of the questions raised by small screens also apply to other interfaces, such as voice. Ultimately, we need to reconsider the extreme inefficiency of ranked lists compared to summarization-oriented approaches. Certainly the mobile space opens great opportunities for someone to get this right on the web.

Semantic technology can make point and click interfaces more useful. What other uses of semantic technology do you see gaining significance in 2009? What semantic considerations do you bring to your product and research activities?

Semantic search means different things to different people, but it broadly falls into two categories: using linguistic and statistical approaches to derive meaning from unstructured text, and using semantic web approaches to represent meaning in content and query structure. Endeca embraces both of these aspects of semantic search.

From early on, we have developed an extensible framework for enriching content through linguistic and statistical information extraction. We have developed some groundbreaking tools ourselves, but have achieved even better results by combining other vendors' document analysis tools with our unique ability to improve their results through corpus analysis.

The growing prevalence of structured data (e.g., RDF) with well-formed ontologies (e.g., OWL) is very valuable to Endeca, since our flexible data model is ideal for incorporating heterogeneous, semi-structured content. We have done this in major applications for the financial industry, media/publishing, and the federal government.
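
To make the RDF point concrete, here is a small sketch that parses a few Turtle triples with the open source rdflib package and pivots them into field-value records of the kind a faceted engine could index. The sample data and the pivoting step are invented for illustration; this is not Endeca's ingestion pipeline.

```python
from collections import defaultdict

from rdflib import Graph  # pip install rdflib

TURTLE = """
@prefix ex: <http://example.org/> .
ex:acme   ex:sector "Retail" ;     ex:headquarters "Chicago" .
ex:zenith ex:sector "Publishing" ; ex:headquarters "Boston" .
"""

graph = Graph()
graph.parse(data=TURTLE, format="turtle")

# Pivot subject/predicate/object triples into one record per subject,
# using the local part of each predicate URI as a field name.
records = defaultdict(dict)
for subject, predicate, obj in graph:
    field = predicate.split("/")[-1]
    records[str(subject)][field] = str(obj)

for subject, fields in records.items():
    print(subject, fields)
```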

It is also important to note that semantic search is not just about the data. In the popular conception of semantic search, the computer is wholly responsible for deriving meaning from the unstructured input. Endeca's philosophy, as per the HCIR vision, is that humans determine meaning, and that our job is to give them clues using all of the structure we can provide.

Where can I find more information about your products, services, and research?

Endeca’s web site is http://endeca.com/. I also encourage you to read my blog, The Noisy Channel (http://thenoisychannel.com/), where I share my ideas (as do a number of other people!) on improving the way that people interact with information.

Stephen Arnold, February 9, 2009

Microsoft Rolls Out Low Cost Search to Counter Google in the Enterprise

February 8, 2009

Computerworld's "Microsoft Unveils Enterprise Search Products" here makes public one of the worst-kept secrets in the enterprise search and content processing arena, in my experience. The Computerworld story reports that the enterprise search market is "at a tipping point." (More about this idea below.) In response, Microsoft will release "a pair of low-cost enterprise search products." The announcement will be given a big hoo-ha at the Fast Forward Conference, where the believers in the $1.23 billion Fast ESP search system gather each year.

The Computerworld story trots down a familiar path. Autonomy's products are expensive. Google offers a low-priced search solution that costs less than $2,000. Microsoft is now responding to an opportunity because Microsoft has a habit of playing "me too" games. Microsoft has lots of partners. The company has put up a new Web site to explain the low-cost solutions. It is located at www.microsoft.com/enterprisesearch.

Now I don’t want to be a wet blanket, but I have to capture the thoughts that flapped through my mind after reading this Computerworld story. I was walking my two technical advisors, Kenlico’s Canadian Mist (a former show dog) and my rescued, hopelessly confused boxer, Tess. She, as you know, tracks Microsoft search technology for me.

The thoughts are jejune and only my opinion, but I value them this morning:

  1. Any enterprise search system is expensive, not just six figures but often seven figures. The reason is that low cost solutions have to be matched to specific user needs and the context of the organization. Whether the tailoring is done by the licensee's staff or by consultants, time is the killer here. The research I have done over the past decades about the cost of behind-the-firewall search suggests that vendors have to be cautious about their assertions regarding customization. A quick example surfaces in the event of a legal matter. The cost of the basic behind-the-firewall search system becomes an issue when that system cannot meet the needs of eDiscovery. A single system could be free like Lucene or SOLR, but the cost of behind-the-firewall search skyrockets overnight. Similar cost spikes occur when non-text content must be processed, when the company must comply with a government's security guidelines, when the company acquires another outfit and has to "integrate" the systems for information retrieval, and so on.
  2. Google is not cheap. The cheap system is a way to let an organization kick the tires. A Google system with several hot spare GB 7007s costs more than Autonomy and Endeca in similar configurations. Why is this? People want Google and are price insensitive, in my experience. Sure, Google's resellers can wheel and deal some, but most competitors are clueless about the cost of the Google solutions and what the Google solutions do to disrupt the existing information technology systems and procedures. The GOOG for its part lets the customers figure out Google functions on their own, so the penetration of Google and its costs move at a snail's pace. The Google snail, however, is pretty smart and has a bigger picture in mind than a couple of GB 7007s.
  3. Enterprise search, what I rather pedantically call "behind the firewall search," is hugely complex. A simple solution reveals the hot button for the licensee. Once in the crucible of an operating organization, the notion of search quickly yields to entity extraction, semantic analysis, reports, answering questions, and search without search. For basic search and retrieval, there are many options, including Microsoft's. But for more significant methods of giving organizations what users need to keep the company in business, simple search won't do the job. Companies ranging from Attivio to Relegence exist to deliver a different type of solution. In my experience, there is no single enterprise search solution.
  4. Where does the Fast Search & Transfer, Linux-centric suite of search technology fit into this free offering? If Fast ESP is now free or discounted, what will SharePoint administrators do when confronted with the need to assemble, customize, script, and babysit a quite complex code assembly? Fast ESP consists of original code, acquired technology, licensed technology, and other pieces and parts created by different Fast engineers in different Fast offices. SharePoint is complex. Making SharePoint more complex is good for job security but not good for the organization's debugging and maintenance budget line item, in my opinion.

I think the significance of these announcements is that price pressure will be put on the vendors who offer snap-in search and content processing systems for SharePoint. I like the products and services from a number of vendors in this space. The functionality of BA-Insight, Coveo, Exalead, or ISYS Search Software, among others, may offer SharePoint licensees more options than either the Microsoft solutions or the Google solutions. This announcement will lead some Microsoft faithful to say, "Well, Microsoft's solution is good enough and cheap. Let's do it." But that will not be sufficient to stop the bubbling up of the Google approach, which gives Microsoft itself wet feet and chills.

Metadata Extraction

February 8, 2009

A happy quack to the reader who sent me a link to "Automate Metadata Extraction for Corporate Search and Mashups" by Dan McCreary here. The write up focuses on the UIMA framework and the increasing interest in semantics, not just keyword indexing. I found the inclusion of code snippets useful. The goslings here at Beyond Search are urged to copy, cut, and paste before writing original scripts. Why reinvent the wheel? The snippets may not be the exact solution one needs, but a quick web-footed waddle through them revealed some useful items. Mr. McCreary has added a section about classification, and he used the phrase "faceted search," which may agitate the boffins at Endeca and other firms where facets are as valuable as a double eagle silver dollar. I was less enthusiastic about the discussion of Eclipse, but you may find it just what you need to chop down some software costs.

The write up is in several parts. Here are the links to each section: Part 1, Part 2, and Part 3. I marked this article for future reference. Quite useful, if a bit pro-IBM.
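
The snippets in Mr. McCreary's article target UIMA, which is primarily a Java framework. As a language-neutral illustration of the same annotator idea, here is a toy Python sketch in which each annotator adds typed metadata to a shared document record; the annotators, patterns, and vocabulary are invented for the example and are not UIMA code.

```python
import re

def date_annotator(doc):
    """Tag ISO-style dates (a deliberately narrow, illustrative pattern)."""
    doc["dates"] = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", doc["text"])

def money_annotator(doc):
    """Tag dollar amounts."""
    doc["amounts"] = re.findall(r"\$\d[\d,]*(?:\.\d{2})?", doc["text"])

def keyword_annotator(doc):
    """Tag a few subject keywords to support faceted search."""
    vocabulary = {"search", "metadata", "mashup"}
    doc["keywords"] = sorted({w for w in re.findall(r"[a-z]+", doc["text"].lower())
                              if w in vocabulary})

PIPELINE = [date_annotator, money_annotator, keyword_annotator]

doc = {"text": "On 2009-02-08 the metadata mashup project was funded for $12,500."}
for annotate in PIPELINE:
    annotate(doc)
print({k: v for k, v in doc.items() if k != "text"})
```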

Stephen Arnold, February 6, 2009

Great Bit Faultline: IT and Legal Eagles

February 6, 2009

The legal conference LegalTech generates quite a bit of information and disinformation about search, content processing, and text mining. Vendors with attorneys on the marketing and sales staff are often more cautious in their wording, even though these professionals are not the school-president-type personalities some vendors prefer. Other vendors are "all sales all the time," and this crowd surfs the trend waves.

You will have to decide whose news release to believe. I read an interesting story in Centre Daily Times here called "Continuing Disconnect between IT and Legal Greatly Hindering eDiscovery Efforts, Recommind Survey Finds". The article makes a point for which I have only anecdotal information; namely, information technology wizards know little about the eDiscovery game. IT wonks want to keep systems running, restore files, and prevent users from mucking up the enterprise systems. eDiscovery, on the other hand, wants to pore through data, suck it into a system that prevents spoliation (a fancy word for deleting or changing documents), and create a purpose-built system that attorneys can use to fight for truth, justice, and the American way.

Now, Recommind, one of the many firms claiming leadership in the eDiscovery space, reports the results of a survey. (Without access to the sample selection method, the details of the analytic tools, the questionnaire itself, and the folks who did the analysis, I'm flying blind.) The article asserts:

Recommind’s survey demonstrates that there is significant work remaining to achieve this goal: only 37% of respondents reported that legal and IT are working more closely together than a year before. This issue is compounded by the fact that only 21% of IT respondents felt that eDiscovery was a “very high” priority, in stark contrast with the overwhelming importance attached to eDiscovery by corporate legal departments. Furthermore, there remains a significant disconnect between corporate accountability and project responsibility, with legal “owning” accountability for eDiscovery (73% of respondents), records management (47%) and data retention (50%), in spite of the fact that the IT department actually makes the technology buying decisions for projects supporting these areas 72% of the time. Exacerbating these problems is an alarming shortage of technical specifications for eDiscovery-related projects. Only 29% of respondents felt that IT truly understood the technical requirements of eDiscovery. The legal department fared even worse, with only 12% of respondents indicating that legal understood the requirements. Not surprisingly, this disconnect is leading to a lack of confidence in eDiscovery project implementation, with only 27% of respondents saying IT is very helpful during eDiscovery projects, and even fewer (16%) believing legal is.

My reaction to these alleged findings was, “Well, makes sense.” You will need to decide for yourself. My hunch is that IT and legal departments are a little like the Hatfields and the McCoys. No one knows what the problem is, but there is a problem.

What I find interesting is that enterprise search and content processing systems are generally inappropriate for the rigors of eDiscovery and other types of legal work. What’s amusing is a search vendor trying to sell to a lawyer who has just been surprised in a legal action. The lawyer has some specific needs, and most enterprise search systems don’t meet these. Equally entertaining is a purpose built legal system being repackaged as a general purpose enterprise search system. That’s a hoot as well.

As the economy continues its drift into the financial Bermuda Triangle, I think everyone involved in legal matters will become more, not less, testy. Stratify, for example, began life as Purple Yogi, an intelligence-centric tool. Now Stratify is a more narrowly defined system with a clutch of legal functions. Does an IT department understand a Stratify? Nope. Does an IT department understand a general purpose search system like Lucene? Nope. Generalists have a tough time understanding the specific methods of experts who require a point solution.

In short, I think the numbers in the Recommind study may be subject to questions, but the overall findings seem to be generally on target.

Stephen Arnold, February 6, 2009

Google’s Medical Probe

February 5, 2009

Yikes, a medical probe. Quite an image for me. In New York City at one of Alan Brody's events in early 2007, I described Google's "I'm feeling doubly lucky" invention. The idea was search without search. One example I used to illustrate search without search was a mobile device that could monitor a user's health. The "doubly lucky" metaphor appears in a Google open source document and suggests that a mobile device can react to information about a user. In one use case, I suggested, Google could identify a person with a heart problem and summon assistance. No search required. The New York crowd sat silent. One person from a medical company asked, "How can a Web search and advertising company play a role in health care?" I just said, "You might want to keep your radar active." In short, my talk was a bust. No one had a clue that Google could do mobile, let alone mobile medical devices. Those folks probably don't remember my talk. I live in rural Kentucky and clearly am a bumpkin. But I think when some of the health care crowd read "Letting Google Take Your Pulse" in the oh-so-sophisticated Forbes Magazine on February 5, 2009, those folks will have a new pal at trade shows. Googzilla is in the remote medical device monitoring arena. You can read the story here, just a couple of years after Google disclosed the technology in a patent application. No sense in rushing toward understanding the GOOG when you are a New Yorker, is there? For me, the most interesting comment in the Forbes write up was:

For IBM, the new Google Health functions are also a dress rehearsal for “smart” health care nationwide. The computing giant has been coaxing the health care industry for years to create a digitized and centrally stored database of patients’ records. That idea may finally be coming to fruition, as President Obama’s infrastructure stimulus package works its way through Congress, with $20 billion of the $819 billion fiscal injection aimed at building a new digitized health record system.

Well, better to understand too late than never. Next week I will release a service to complement Oversight to allow the suave Manhattanites an easy way to monitor Google’s patent documents. The wrong information at the wrong time can be hazardous to a health care portfolio in my opinion.

Stephen Arnold, February 5, 2009

Lexalytics’ Jeff Caitlin on Sentiment and Semantics

February 3, 2009

Editor's Note: Lexalytics is one of the companies that is closely identified with analyzing text for sentiment. When a flow of email contains a negative message, Lexalytics' system can flag that email. In addition, the company can generate data that provides insight into how people "feel" about a company or product. I am simplifying, of course. Sentiment analysis has emerged as a key content processing function, and, as with other language-centric tasks, the methods are of increasing interest.

Jeff Caitlin will speak at what has emerged as the "must attend" search and content processing conference in 2009. The Infonortics Boston Search Engine Meeting features speakers who have an impact on sophisticated search, information processing, and text analytics. Other conferences respond to public relations; the Infonortics conference emphasizes substance.

If you want to attend, keep in mind that attendance at the Boston Search Engine Meeting is limited. To get more information about the program, visit the Infonortics Ltd. Web site at www.infonortics.com or click here.

The exclusive interview with Jeff Caitlin took place on February 2, 2009. Here is the text of the interview conducted by Harry Collier, managing director of Infonortics and the individual who created this content-centric conference more than a decade ago. Beyond Search has articles about Lexalytics here and here.

Will you describe briefly your company and its search / content processing technology?

Lexalytics is a text analytics company best known for our ability to measure the sentiment or tone of content. We plug in on the content processing side of the house, taking unstructured content and extracting interesting and useful metadata that applications like search engines can use to improve the search experience. The types of metadata typically extracted include entities, concepts, sentiment, summaries, and relationships (person to company, for example).

With search / content processing decades old, what have been the principal barriers to resolving these challenges in the past?

The simple fact that machines aren't smart like people and don't actually "understand" the content they are processing… or at least they haven't to date. The new generation of text processing systems has advanced grammatical parsers that are allowing us to tackle some of the nasty problems that have stymied us in the past. One such example is anaphora resolution, sometimes referred to as "pronominal reference," which is a bunch of big, confusing-sounding words for the understanding of pronouns. Take the sentence, "John Smith is a great guy, so great that he's my kids' godfather and one of the nicest people I've ever met." For people this is a pretty simple sentence to parse and understand, but for a machine it has given us fits for decades. Now with grammatical parsers we understand that "John Smith" and "he" are the same person, and we also understand who the speaker is and what the subject is in this sentence. This enhanced level of understanding is going to improve the accuracy of text parsing and allow for a much deeper analysis of the relationships in the mountains of data we create every day.
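
As a toy illustration of the pronoun problem Mr. Caitlin describes, the Python sketch below links "he" or "she" to the most recent two-word capitalized name. It is a deliberately naive heuristic, not Lexalytics' grammatical parser; real anaphora resolution depends on parsing, gender and number agreement, and discourse structure.

```python
import re

PRONOUNS = {"he", "she", "his", "her"}

def resolve_pronouns(text):
    """Link third-person pronouns to the most recent two-word capitalized name."""
    tokens = re.findall(r"[A-Za-z']+", text)
    last_person = None
    links = []
    i = 0
    while i < len(tokens):
        # Treat a run of two capitalized tokens (e.g. "John Smith") as a person mention.
        if (tokens[i][0].isupper() and i + 1 < len(tokens)
                and tokens[i + 1][0].isupper() and tokens[i + 1].lower() not in PRONOUNS):
            last_person = f"{tokens[i]} {tokens[i + 1]}"
            i += 2
            continue
        # Strip contractions ("he's" -> "he") before checking for a pronoun.
        if tokens[i].split("'")[0].lower() in PRONOUNS and last_person:
            links.append((tokens[i], last_person))
        i += 1
    return links

sentence = ("John Smith is a great guy, so great that he's my kids' godfather "
            "and one of the nicest people I've ever met.")
print(resolve_pronouns(sentence))   # [("he's", 'John Smith')]
```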

What is your approach to problem solving in search and content processing? Do you focus on smarter software, better content processing, improved interfaces, or some other specific area?

Lexalytics is definitely on the better content processing side of the house; our belief is that you can only go so far by improving the search engine… eventually you're going to have to make the data better to improve the search experience. This is 180 degrees apart from Google, which focuses exclusively on the search algorithms. That works well for Google in the web search world, where you have billions of documents at your disposal, but it hasn't worked as well in the corporate world, where finding information isn't nearly as important as finding the right information and helping users understand why it's important and who understands it. Our belief is that metadata extraction is one of the best ways to learn the "who" and "why" of content so that enterprise search applications can really improve the efficiency and understanding of their users.

With the rapid change in the business climate, how will the increasing financial pressure on information technology affect search / content processing?

For Lexalytics the adverse business climate has altered the mix of our customers, but to date it has not affected the growth in our business (Q1 2009 should be our best ever). What has clearly changed is the mix of customers investing in search and content processing; we typically run about 2/3 small companies and 1/3 large companies. In this environment we are seeing a significant uptick in large companies looking to invest as they seek to increase their productivity. At the same time, we're seeing a significant drop in the number of smaller companies looking to spend on text analytics and search. The net-net of this is that, if anything, search appears to be one of the areas that will do well in this climate, because data volumes are going up and staff sizes are going down.

Microsoft acquired Fast Search & Transfer. SAS acquired Teragram. Autonomy acquired Interwoven and Zantaz. In your opinion, will this consolidation create opportunities or shut doors? What options are available to vendors / researchers in this merger-filled environment?

As one of the vendors that works closely with two of the three major enterprise search vendors, we see these acquisitions as a good thing. FAST, for example, seems to be a well-run organization under Microsoft, and they seem to be very clear on what they do and what they don't do. This makes it much easier for both partners and smaller vendors to differentiate their products and services from all the larger players. As an example, we are seeing a significant uptick in leads coming directly from the enterprise search vendors that are looking to us for help in providing sentiment/tone measurement for their customers. Though these mergers have been good for us, I suspect that won't be the case for all vendors. We work with the enterprise search companies rather than against them; if you compete with them, this consolidation may make it even harder to be considered.

As you look forward, what are some new features / issues that you think will become more important in 2009? Where do you see a major break-through over the next 36 months?

The biggest change is going to be the move away from entities that are explicitly stated within a document to a more 'fluffy' approach. While this encompasses inferring directly stated relationships ("Joe works at Big Company Inc" is a directly stated relationship), it also encompasses being able to infer the same information from a less direct statement, such as "Joe got in his car and drove, like he did every day, to his job at Big Company Inc." It also covers things like processing reviews and understanding from the context of the document that sound quality is a feature of an iPod, rather than relying on a specific list. And it encompasses things of a more semantic nature, such as understanding that a document talking about Congress is also talking about government, even though government might not be explicitly stated.
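
A rough Python sketch of the difference between a directly stated and an inferred relationship appears below. The regular-expression patterns are invented for illustration and are nowhere near what a production extractor would use; they only show that the indirect phrasing requires a weaker, more error-prone cue.

```python
import re

NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"

# A directly stated relationship: "<Person> works at <Company>".
DIRECT = re.compile(rf"({NAME}) works at ({NAME})")

# A weaker, indirect cue: "<Person> ... his|her job at <Company>".
INDIRECT = re.compile(rf"({NAME}),?.*?\b(?:his|her) job at ({NAME})")

def extract_employment(sentence):
    """Return a person/company pair and whether it was stated directly."""
    match = DIRECT.search(sentence) or INDIRECT.search(sentence)
    if match:
        person, company = match.groups()
        return {"person": person, "works_at": company,
                "stated_directly": bool(DIRECT.search(sentence))}
    return None

print(extract_employment("Joe works at Big Company Inc."))
print(extract_employment("Joe got in his car and drove, like he did every day, "
                         "to his job at Big Company Inc."))
```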

Graphical interfaces and portals (now called composite applications) are making a comeback. Semantic technology can make point and click interfaces more useful. What other uses of semantic technology do you see gaining significance in 2009? What semantic considerations do you bring to your product and research activities?

One of the key uses of semantic understanding in the future will be in understanding what people are asking about or complaining about in content. It's one thing to measure the sentiment for an item that you're interested in (say it's a digital camera), but it's quite another to understand the items that people are complaining about while reviewing a camera and noting that "the battery life sucks." We believe that joining the subject of a discussion to the tone of that discussion will be one of the key advancements in semantic understanding that takes place in the next couple of years.
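
Here is a minimal sketch of the "join the subject to the tone" idea: pair an aspect term with the opinion words that occur in the same sentence. The lexicons are tiny and invented; production sentiment systems such as Lexalytics' rely on much richer linguistic machinery.

```python
import re

ASPECTS = {"battery life", "lens", "screen"}
OPINIONS = {"sucks": -1, "great": 1, "sharp": 1, "dim": -1}

def aspect_sentiment(review):
    """Return (aspect, score) pairs for aspects and opinion words in the same sentence."""
    pairs = []
    for sentence in re.split(r"[.!?]", review.lower()):
        found_aspects = [a for a in ASPECTS if a in sentence]
        scores = [OPINIONS[w] for w in re.findall(r"[a-z]+", sentence) if w in OPINIONS]
        for aspect in found_aspects:
            pairs.append((aspect, sum(scores)))
    return pairs

print(aspect_sentiment("The lens is great and sharp. But the battery life sucks!"))
# [('lens', 2), ('battery life', -1)]
```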

Where can I find out more about your products, services and research?

Lexalytics can be found on the web at www.lexalytics.com. Our Web log discusses our thoughts on the industry: www.lexalytics.com/lexablog. A downloadable trial is available here. We also have prepared a white paper, and you can get a copy here.

Harry Collier, February 3, 2009

Digital Textbook Start Up

February 2, 2009

A textbook start up seems unrelated to search. It's not. You can read about Flatworld, an open source variant set up to make money with educational materials, here. The company wants to offer online books for free. Hard copies carry a price tag. Glyn Moody, who wrote "Flatworld: Open Textbooks" for Open…, made this interesting comment:

It’s too early to tell how this particular implementation will do, but I am absolutely convinced this open textbook approach will do to academic publishing what open source has done to software.

Dead tree educational publishers take note. Change is coming and really fast. Gutenberg gave printing a boost. Online gives a new publishing medium a similar shove.

Stephen Arnold, February 2, 2009

Frank Bandach, Chief Scientist, eeggi on Semantics and Search

February 2, 2009

An Exclusive Interview by Infonortics Ltd. and Beyond Search

Harry Collier, managing director and founder of the influential Boston Search Engine Meeting, interviewed Frank Bandach, chief scientist of eeggi, a semantic search company, on January 27, 2009. eeggi has maintained a low profile. The interview with Mr. Bandach is among the first public descriptions of the company's view of the fast-changing semantic search sector.

The full text of the interview appears below.

Will you describe briefly your company and its search technology?

We are a small new company implementing our very own new technology. Our technology is framed in a rather controversial theory of natural language, exploiting the idea that language itself is a predetermined structure, and that as we grow, we simply feed it new words to increase its capabilities and its significance. In other words, our brains did not learn to speak; we were destined to speak. Scientifically speaking, eeggi is a mathematical clustering structure which models natural language, and therefore, some portions of rationality itself. Objectively speaking, eeggi is a linguistic reasoning and rationalizing analysis engine. As a linguistic reasoning engine, it is then only natural that we find ourselves cultivating search, but also other technological fields such as speech recognition, concept analysis, responding, irrelevance removal, and others.

What are the three major challenges you see in search in 2009?

The way I perceive this is that many of the challenges facing search in 2009 (irrelevance, nonsense, and ambiguity) are the same ones that were faced in previous years. I think that our awareness and demands are simply increasing, and thus require smarter and more accurate results. This is, after all, the history of evolution.

With search decades old, what have been the principal barriers to resolving these challenges in the past?

These problems (irrelevance, nonsense, and ambiguity) have to date been addressed through artificial intelligence. However, AI is branched into many areas and disciplines, and AI is also currently evolving and changing. Our approach is unique and follows a completely different attitude, or, if I may say, spirit, from that of current AI disciplines.

What is your approach to problem solving in search? Do you focus on smarter software, better content processing, improved interfaces, or some other specific area?

Our primary approach is machine intelligence focused on zero irrelevance, while allowing for synonyms, similarities, rational disambiguation of homonyms and multi-conceptual words, treatment of collocations as unit concepts, grammar, rationality, and, finally, information discovery.

With the rapid change in the business climate, how will the increasing financial pressure on information technology affect search?

The immediate impact of a weak economy affects all industries, but before long the impact will be absorbed and will disappear. The future belongs to technology. This is indeed the principle that was ignited long ago with the industrial revolution. It is true, the world faces many challenges ahead, but technology is the reflection of progress, and technology is uniting us day by day, allowing, and at times forcing us, to understand, accept, and admit our differences. For example, unlike ever before, the United States and India are now becoming virtual neighbors thanks to the Internet.

Search systems have been integrated into such diverse functions as business intelligence and customer support. Do you see search becoming increasingly integrated into enterprise applications? If yes, how will this shift affect the companies providing stand alone search / content processing solutions? If no, what do you see the role of standalone search / content processing applications becoming?

From our standpoint, search, translation, speech recognition, machine intelligence… for all matters of language, all fall under a single umbrella, which we identify through a Linguistic Reasoning and Rationalization Analysis engine we call eeggi.

Is that an acronym?

Yes. eeggi is shorthand for "engineered, encyclopedic, global and grammatical identities".

As you look forward, what are some new features / issues that you think will become more important in 2009? Where do you see a major break-through over the next 36 months?

I truly believe that users will become more and more critical of irrelevance and the quality of their results. New generations will be, and are, more aware and demanding of machine performance. For example, in my youth two little bars and a square in the middle represented a tennis match, and it was an exciting experience; by today's standards, presenting the same scenario to a kid would be a laughing matter. As newer generations move in, foolish results will not form part of their minimum expectations.

Mobile search is emerging as an important branch of search. Mobile search, however, imposes some limitations on presentation and query submission. What are your views of mobile search’s impact on more traditional enterprise search / content processing?

This is a very interesting question… The functionalities and applications of several machines inevitably begin to merge the very instant that technology permits miniaturization, or when a single machine can efficiently evolve and support the applications of the others. Most of the time, it is the smallest machine that wins. It is going to be very interesting to see how cell phones move, more and more, into fields that were reserved exclusively for computers. It is true that cell phones, by nature, need to integrate small screens, but new folding screens and even projection technologies could make for much larger screens, and as artificial intelligence takes on challenges previously available only through user performance, screens themselves may move into a secondary function. After all, you and I are now talking without any visual aid or, for that matter, a screen.

Where can I find more information about your products, services, and research?

We are still a bit in stealth mode. But we have a Web site (eeggi.com) that displays and discusses some basic information. We hope that by November of 2009 we will have built sufficient linguistic structures to allow eeggi to move into automatic learning of other languages with little, or possibly no, aid from natural speakers or human help.

Thank you.

Harry Collier, Managing Director, Infonortics Ltd.

eeggi Founder Interviewed

February 2, 2009

Frank Bandach is Chief Scientist at eeggi (the acronym stands for "engineered, encyclopedic, global and grammatical identities"), a semantic search system with a mathematical foundation. You can view demonstrations and get more information here. eeggi has kept a low profile, but Mr. Bandach will deliver one of the presentations at the Infonortics Boston Search Engine Meeting in April 2009. You can get more information about the conference at www.infonortics.com or click here.

Beyond Search will post Mr. Bandach's interview conducted by Harry Collier on February 1, 2009. In the interval before the April Boston Search Engine Meeting, other interviews and information will be posted here as well. Mr. Collier, managing director of Infonortics, has granted permission to ArnoldIT.com to post the interviews as part of the Search Wizards Speak Web series here.

The Boston Search Engine Meeting is the premier event for search, content processing, and text analytics. If you attend one search-centric conference in 2009, the Boston Search Engine Meeting is the one for your to-do list. Other conferences tackle search without the laser focus of the Infonortics program committee. In fact, outside of highly technical events sponsored by the ACM, most search conferences wobble across peripheral topics and Web 2.0 trends. Not the Boston Search Engine Meeting. As the interview with eeggi's senior manager reveals, Infonortics tackles search and content processing with speakers who present useful insights and information.

Unlike other events, the Infonortics Boston Search Engine Meeting limits attendance. The program recognizes speakers for excellence with the Ev Brenner award, selected by such search experts as Dr. Liz Liddy (Dean, Syracuse University), Dr. David Evans (Justsystem, Tokyo), and Sue Feldman (IDC's vice president of search technology research). Some conferences use marketers, journalists, or search newbies to craft a conference program. Not Mr. Collier. You meet attendees and speakers who have a keen interest in search technology, innovations, and solutions. Experts in search engine marketing find the Boston Meeting foreign territory.

Click here for the interview with Frank Bandach, eeggi.

Stephen Arnold, February 1, 2009

SurfRay Management

January 31, 2009

I did some poking around on Friday, January 30, 2009. One of the more interesting items was confirmation that SurfRay president Bill Cobb resigned from SurfRay earlier this month. There has been some chatter that he was forced out of the company. That's not exactly on the money. I spoke with Mr. Cobb, and he is interested in exploring leadership opportunities. With regard to the future of SurfRay, I can say with confidence that the company is indeed sailing in rough waters against a headwind. If an investment firm is poised to acquire the assets, that deal will have to be made quickly. In my opinion, time is running out for SurfRay.

Stephen Arnold, January 31, 2009
