Selected Quotes from the Study
Semantic content processing is experiencing a resurgence. In 2007, Google’s inventions, which are examples of semantic technologies, appeared in a series of five patent applications published by the USPTO. Page 41
The volume of unstructured information is significant, and it seems to be growing at double digits each month. So a typical organization that starts a calendar year with 100 gigabytes of unstructured information will finish the year with 250 gigabytes or more data. However, much of this may become structured data, which is now growing rapidly due to the surge of interest in XML. Page 44
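As a rough check on the figures in the quote above, here is a minimal Python sketch. It assumes growth compounds monthly at a constant rate, which is an assumption; the study does not say how its numbers were derived. The function names `year_end_volume` and `implied_monthly_rate` are illustrative and do not come from the study.

```python
def year_end_volume(start_gb: float, monthly_rate: float, months: int = 12) -> float:
    """Volume after `months` of compound growth at `monthly_rate` (0.10 = 10%).

    Assumes a constant monthly rate, which the study does not state.
    """
    return start_gb * (1.0 + monthly_rate) ** months


def implied_monthly_rate(start_gb: float, end_gb: float, months: int = 12) -> float:
    """Constant monthly rate that takes `start_gb` to `end_gb` in `months`."""
    return (end_gb / start_gb) ** (1.0 / months) - 1.0


if __name__ == "__main__":
    # Lowest "double digit" monthly rate, starting from 100 GB.
    print(f"10% per month: {year_end_volume(100, 0.10):.1f} GB at year end")
    # Monthly rate implied by growing from 100 GB to 250 GB in one year.
    print(f"100 GB to 250 GB: {implied_monthly_rate(100, 250) * 100:.2f}% per month")
```

Under that assumption, 10 percent monthly growth takes 100 gigabytes to roughly 314 gigabytes in a year, which is consistent with "250 gigabytes or more". Reaching exactly 250 gigabytes would take a monthly rate just under 8 percent.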
One of the better-known pioneers in LSI is Ramana Rao, formerly at Xerox PARC and later at Inxight Software (now a unit of Business Objects), who told me several years ago: “No one has to teach a human to recognize the glint from a tooth in a dark hedge. Our ancestors gave us the ability to spot important information very, very quickly. Reading is orders of magnitude slower than this innate ability to spot what’s significant.” Page 45
There is no such thing as “enterprise search.” The phrase is one of those marketing buzzwords that became widely used and rarely considered. Page 55
Endeca takes a high-end consulting firm approach to its consulting business. One licensee said, “Endeca is similar to a Booz, Allen & Hamilton or a McKinsey. Its focus is on the strategic use of information. We didn’t think we needed this type of support. What we found out was that our second-year costs for customizing our installation and addressing performance issues jumped significantly.” Endeca does not provide a price list, but comments from licensees suggest that the first-year fees start at $500,000 and go up. Page 61
If we consider Google processing its existing information in a dataspace, a user could run one query and get results from Web logs, Web sites, news, books, and other sources. Google’s universal search is a step in the right direction, but dataspaces would take federated searching much, much further. Page 79
Social search can have unexpected consequences. Companies in this sector include Tacit Software Inc., Eurekster, and the previously mentioned Autonomy, and Fast Search & Transfer. The leader in this segment is Tacit Software. Page 83
Classification, entity extraction, and point-and-click access to related content are quickly becoming “must have” features. However, many organizations find themselves unable to afford the seven-figure price tags of some of the higher-profile systems. Page 89
We believe Attensity offers one of the more sophisticated text mining systems available today. Attensity is one of the leaders in next-generation text mining. Page 109
Bitext illustrates the interest in NLP and linguistic search in Europe. Along with Exalead, PolySpot, and Sine Qua Non, entrepreneurial activity in rich text processing is increasing. The Bitext system can add NLP to almost any search system. Page 114
I would suggest that law firms, analysts, and organizations dealing with problem content test the Brainware system. Page 121
Google and Exalead appear to have somewhat similar philosophies regarding banks of commodity servers running Linux with some special tweaks. Like Google, Exalead is a mathematics-centric company. There are some linguistic operations, but the core of Exalead is algorithmic. Page 148