New IBM Redbook: IBM Watson Enterprise Search and Analytics

October 12, 2014

The Redbook is free. You can download it from this IBM link for now. The full title is “IBM Watson Content Analytics: Discovering Actionable Insight from Your Content.”

The Redbook weighs in with 598 pages of Watson goodness. If you follow the IBM content analytics products, you may know that the previous version was known as IBM Content Analytics with Enterprise Search (ICAwES).

The Redbook presents some philosophical content. IBM has a tradition to uphold. In addition, the Redbook provides information about facets (yep, good old metadata), some mathy features that make analytics analytical, and sentiment analysis.

ICAwES does not operate as an island. The sprawling system can hook into IBM’s semi-automatic classification system, Cognos, and interface tools.

Is ICAwES an “enterprise search” system? I would say, “Sure is.” You will have to work through the Redbook and draw your own conclusions. You will also want to identify the Watson component. Watson is Lucene with IBM scripts and wrappers, but IBM has far more colorful lingo for describing the system. After all, IBM Watson is supposed to generate $1 billion in a snappy manner. If IBM’s plan bears revenue fruit, in five or six years, Watson will be a $10 billion per year business. That’s quite a goal, considering Autonomy required 13 years to push into $800 million in revenue territory and IBM has been offering information retrieval systems since the days of STAIRS.
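The “Lucene plus wrappers” claim is easier to evaluate with the core mechanism in view: Lucene-style engines are built around an inverted index that maps each term to the documents containing it. A toy Python sketch of that idea (my illustration, not IBM or Lucene code):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """AND-style keyword search: docs containing every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

docs = {
    1: "Watson enterprise search analytics",
    2: "enterprise content analytics",
    3: "microfilm indexing",
}
index = build_inverted_index(docs)
print(search(index, "enterprise analytics"))  # {1, 2}
```

Real engines add tokenization, stemming, ranking, and storage layers on top, but the retrieval core is this lookup-and-intersect operation.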

The July 2014 edition of the Redbook adds a chapter containing some carefully selected case studies. There is a new chapter called “Enterprise Search” to which I will return in a moment. Also, the many authors of the Redbook have expanded the discussion of Cognos, one of IBM’s business intelligence systems. Finally, the Redbook provides some helpful suggestions for “customizing and extending the content analytics miner.”

I urge you to work through this volume because it provides a useful yardstick for measuring the IBM Watson marketing and public relations explanations against the reality, limitations, and complexity of the IBM Content Analytics system. Is the Redbook describing a product or a collection of components that an IBM implementation team will use to craft a customized solution?

The chapter on Enterprise Search begins on page 445 and continues to page 486. The solution is a two part affair. On one hand, processed content outputs data about the entities, word frequencies, and similar metrics in the corpus and updates to the corpus. On the other hand, ICAwES is a search and retrieval system. Many vendors take this approach today; however, certain types of content cannot be comprehensively processed by the system. Examples include video content, engineering drawings, digital imagery, and certain types of ephemeral content such as text messages sent via an ad hoc Bluetooth mesh network. One can code up a fix, but that is likely to be more hassle than many licensees will tolerate.

The Redbook shows some ready-to-use interfaces. These can, of course, be modified. The sample in the screenshot below looks quite a bit like the original Fulcrum Technologies presentation of information processed by the system. A more modern implementation would be Amazon’s recent JSON-centric system for content.


ICAwES Redbook, Copyright IBM 2014.

The illustration shows a record viewed by tags; for example, categories. Items can be tallied in a chart that provides a summary of how many content objects share a particular index term. The illustration shows ICAwES identifying terms in a user’s query, identifying entities like IBM Lotus Domino, and other features associated with Autonomy IDOL or Endeca style systems. Both of these date from the late 1990s, so IBM is not pushing too far from the dirt path carved out of the findability woods by former leaders in enterprise search.
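The tally-by-tag chart described above is plain facet counting. A minimal sketch, assuming documents carry a list of category tags (the field name and sample data are invented, not ICAwES structures):

```python
from collections import Counter

def facet_counts(docs, facet_field):
    """Count how many documents carry each value of a facet field."""
    counts = Counter()
    for doc in docs:
        for value in doc.get(facet_field, []):
            counts[value] += 1
    return counts

docs = [
    {"title": "Q3 report", "category": ["finance", "quarterly"]},
    {"title": "Hiring plan", "category": ["hr"]},
    {"title": "Budget", "category": ["finance"]},
]
print(facet_counts(docs, "category"))
# finance: 2, quarterly: 1, hr: 1
```

Commercial systems compute these counts over the result set of a query rather than the whole collection, but the tally itself is this simple.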

IBM provides the information needed to implement query expansion. Yes, a dictionary lurks within the system, and an interface is provided so the licensee can be like Noah Webster. The system is rules based, and a specialist is needed to create or edit rules. As you may know, rules based systems suffer from several drawbacks. Rules have to be maintained, subject matter experts or programmers are usually required to make the proper judgments, and rules can drift out of phase with the users’ queries unless the system is monitored with above average rigor. As with Autonomy IDOL, skimp on monitoring and tuning, and the system can generate some interesting results.
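Dictionary-driven query expansion of the sort described can be sketched in a few lines; the synonym table below is a stand-in for the dictionary a specialist would maintain, not IBM’s actual rules format:

```python
# Hypothetical synonym dictionary; in a rules-based system a
# specialist must keep entries like these current by hand.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "laptop": ["notebook"],
}

def expand_query(query, synonyms):
    """Expand each query term with its dictionary synonyms."""
    expanded = []
    for term in query.lower().split():
        expanded.append(term)
        expanded.extend(synonyms.get(term, []))
    return expanded

print(expand_query("car rental", SYNONYMS))
# ['car', 'automobile', 'vehicle', 'rental']
```

The maintenance burden the Redbook implies is visible here: every new vocabulary shift in user queries means another manual edit to the table.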

The provided user interface looks like this:


ICAwES Redbook, Copyright IBM 2014.

With many users wanting a “big red button” to simplify information access, this interface brings forward the high density displays associated with TeraText and similar legacy systems. The density seems to include hints of Attivio and BA Insight user interfaces as well. There are many choices available to the user. However, without special training, it is unlikely that a marketing professional using ICAwES will be able to make full use of query trees, category trees, and the numerous icons that appear in four different locations. I can hear the user now, “I want this system to be just like Google. I want to type in three words and scan the results.”

Net net. If you are working in an organization that favors IBM solutions, this system is likely to be what senior management licenses. Keep in mind that ICAwES will require the ministrations of IBM professional services, probably additional headcount, and on-going work to keep the system delivering useful results to users and decision makers.

The system delivers keyword search, rich indexing, and basic metrics about the content. IBM offers more robust analytic tools in its SPSS product line. For more comprehensive text analysis, take a look at IBM i2 and Cybertap solutions if your organization has appropriate credentials for these somewhat more sophisticated information access and analysis systems.

After working through the Redbook, I had one question, “Where’s Watson?”

Stephen E Arnold, October 12, 2014

The AIIM Enterprise Search Study 2014

October 10, 2014

I worked through the 34 page report “Industry Watch. Search and Discovery. Exploiting Knowledge, Minimizing Risk.” The report is based on a sampling of 80,000 AIIM community members. The explanation of the process states:

Graphs throughout the report exclude responses from organizations with less than 10 employees, and suppliers of ECM products and services, taking the number of respondents to 353.

The demographics of the sample were tweaked to discard responses from organizations with fewer than 10 employees. The sample included respondents from North America (67 percent), Europe (18 percent) and “rest of world” (15 percent).

Some History for the Young Reader of Beyond Search

AIIM has roots in imaging (photographic and digital imaging). Years ago I spent an afternoon with Betty Steiger, a then well known executive with a high profile in Washington, DC’s technology community. She explained that the association wanted to reach into the then somewhat new technology for creating digital content. Instead of manually indexing microfilm images, AIIM members would use personal computers. I think we connected in 1982 at her request. My work included commercial online indexing, experiments in full text content online, a CD ROM produced in concert with Predicasts and Lotus, and automated indexing processes invented by Howard Flank, a sidekick of mine for a very long time. (Mr. Flank received the first technology achievement award from the old Information Industry Association, now the SIIA.)

AIIM had its roots in the world of microfilm. And the roots of microfilm reached back to University Microfilms at the close of World War II. After the war, innovators wanted to take advantage of the marvels of microimaging and silver-based film. The idea was to put lots of content on a new medium so users could “find” answers to questions.

The problem for AIIM (originally the National Micrographics Association) was indexing. Because I was an officer at a company considered in the 1980s to be one of the leaders in online and semi automated indexing methods, Ms. Steiger and I had a great deal to discuss.

But AIIM evokes for me:

Microfilm —> Finding issues —> Digital versions of microfilm —> CD ROMs —> On premises online access —> Finding issues.

I find the trajectory of microfilm leading to pronouncements about enterprise search, content processing, and eDiscovery fascinating. The story of AIIM parallels the challenges of the traditional publishing industry (what I call the “dead tree method”), which has, like Don Quixote, galloped into battle with ones and zeros.

Asking a trade association’s membership for insights about electronic information is a convenient idea. What’s wrong with sampling the membership and others in the AIIM database, discarding those who belong to organizations with fewer than 10 employees, and tallying up the survey “votes”? For most of those interested in search, absolutely nothing. And that may be part of the challenge for those who want to get smart about search, findability, and content processing.

Let’s look at three findings from the 30 plus page study. (I have had to trim because the number of comments and notes I wrote when reading the report is too massive for Beyond Search.)

Finding: 25 percent have no advanced or dedicated search tools. 13 percent have five or more [advanced or dedicated search tools].

Talk about good news for vendors of findability solutions. If one thinks about the tens of millions of organizations in the US, one just discards the 10 percent with 10 or fewer employees, and there are apparently quite a large percentage with simplistic tools. (Keep in mind that there are more small businesses than large businesses by a very wide margin. But that untapped market is too expensive for most companies to penetrate with marketing messages.) The study encourages the reader to conclude that a bonanza awaits the marketer who can identify these organizations and convince them to acquire an advanced or dedicated search tool. There is a different view. The research Arnold IT (owner of Beyond Search) has conducted over the last couple of decades suggests that this finding conveys some false optimism. For example, in the organizations and samples with which we have worked, we found almost 90 percent saturation of search. The one on one interviews reveal that many employees were unaware of the search functions available for the organization’s database system or specialized tools like those used for inventory, the engineering department with AutoCAD, or customer support. So, search systems with advanced features are in fact in most organizations. A survey of a general population reveals a market that is quite different from what the chief financial officer perceives when he or she tallies up the money spent for software that includes a search solution. But content such as the engineering department’s drawings and specifications, the legal department’s confidential documents, the HR unit’s employee health data, and the Board of Directors’ documents revealing certain financial and management topics has to remain in silos. There is, we have found, neither an appetite to gather these data nor the money to figure out how to make images and other types of data searchable from a single system.
Far better to use a text oriented metasearch system and ignore data from proprietary systems, images, videos, mobile messages, etc. We know that most organizations have search systems about which most employees know nothing. When an organization learns about these systems and then gets an estimate for creating one big federated system, the motivation drains from those who write the checks. In our research, senior management perceives aggregation of content as increasing risk and putting an information time bomb under the president’s leather chair.
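A text oriented metasearch layer of the kind mentioned simply fans the query out to each silo’s existing search function and merges the hits. A sketch with invented stand-ins for two silo search APIs (the backends, ids, and scores are illustrative, not any vendor’s interface):

```python
def search_hr(query):
    # Stand-in for the HR system's own search API.
    return [("hr-7", 0.9)] if "benefits" in query else []

def search_eng(query):
    # Stand-in for the engineering document system's search API.
    return [("eng-3", 0.7)] if "benefits" in query else []

def metasearch(query, backends):
    """Fan out to each silo, merge hits, keep best score per doc."""
    merged = {}
    for backend in backends:
        for doc_id, score in backend(query):
            merged[doc_id] = max(score, merged.get(doc_id, 0.0))
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

print(metasearch("employee benefits", [search_hr, search_eng]))
# [('hr-7', 0.9), ('eng-3', 0.7)]
```

The hard part in practice is not the fan-out but making the per-silo relevance scores comparable, which is one reason federated projects stall once the cost estimate arrives.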

Finding:  47% feel that universal search and compliant e-discovery is becoming near impossible given the proliferation of cloud share and collaboration apps, personal note systems and mobile devices. 60% are firmly of the view that automated analytics tools are the only way to improve classification and tagging to make their content more findable.

The thrill of an untapped market fades when one considers the use of the word “impossible.” AIIM is correct in identifying the Sisyphean tasks vendors face when pitching “all” information available via a third party system. Not only do the technical problems stretch the wizards at Google; the cost of generating meaningful “unified” search results is a tough nut to crack even for intelligence and law enforcement entities. In general, some of these groups have motivation, money, and expertise. Even with these advantages, the hoo hah that many search and eDiscovery vendors pitch is increasing potential customers’ skepticism. The credibility of over-hyped findability solutions is squandered. Therefore, for some vendors, marketing efforts are making it more difficult to close deals and causing a broader push back against solutions that prospects know to be a waste of money. Yikes. How does a trade association help its members with this problem? Well, I have some ideas. But as I recall, Ms. Steiger was not too thrilled to learn about the nitty gritty of shifting from micrographics to digital. Does the same characteristic exist within AIIM today? I don’t know.

Read more

LucidWorks Takes Bullets, Mid Tier Consultant Gets Some Info Upside Down and Backward

September 6, 2014

Navigate to “Trouble at LucidWorks: Lawsuits, Lost Deals, & Layoffs Plague the Search Startup Despite Funding.”

Another search vendor struggling for survival is not a surprise. What is interesting is that the write up identifies that venture money was needed to stay afloat, a youthful whiz kid cannot deliver revenues, and that former staff say some pretty negative things.

What struck me as interesting was the information smashed into some sentences from a mid tier consulting firm’s search expert. Did you know that Microsoft gives away Fast Search & Transfer technology? This is the same code that received high marks in a magic quadrant and contributed to a jail sentence for the founder of Fast Search. Did you know that the Google Search Appliance was a low cost search option? I did not. In fact, if you look up prices on the US government’s site, the GSA is a pretty expensive solution. Did you know that making money in open source search is not easy? Maybe not easy, but it seems as if RedHat is doing okay.

Why do I ask these questions? I enjoy pointing out that what looks like reasonable statements from an expert may be “out of square.” For color on this reference, see this Beyond Search article.

What about LucidWorks? The company struggled with creating revenue around a layer of software that interacts with Lucene. There were squabbles, turnover in senior management, and pivots.

What is important is that even when a search and content processing company minimizes these and other issues, search is a darned tough software segment from which to spin cash.

LucidWorks may survive. But in the larger context of information retrieval, the long shadows cast by Autonomy and Fast Search & Transfer are reminders that painting word pictures about complex technology may be much easier than building a search company with sustainable revenues.

Stephen E Arnold, September 6, 2014

The Knowledge Quotient Saucisson Link: Back to Sociology in the 1970s

August 5, 2014

I have mentioned recent “expert analyses” of the enterprise search and content marketing sector. In my view, these reports are little more than gussied up search engine optimization (SEO) and content marketing plays. See, for example, this description of the IDC report about “knowledge quotient”. Sounds good, right? So does most content marketing and PR generated by enterprise search vendors trying to create sustainable revenue and sufficient profits to keep the investors on their boats, in their helicopters, and on the golf course. Disappointing revenues are not acceptable to those with money who worry about risk and return, not their mortgage payment.

Some content processing vendors are in need of sales leads. Others are just desperate for revenue. The companies with venture money in their bank accounts have to deliver a return. Annoyed funding sources may replace company presidents. This type of financial blitzkrieg has struck BA Insight and LucidWorks. Other search vendors are in legal hot water; for example, one Fast Search & Transfer executive and two high profile Autonomy Corp. professionals. Other companies tap dance from buzzword to catchphrase in the hopes of avoiding the fate of Convera, Delphes, or Entopia. The marketing beat goes on, but revenues for search solutions remain a challenge. How will IBM hit $10 billion in Watson revenues in five or six years? Good question, but I know the answer. Perhaps accounting procedures might deliver what looks like a home run for Watson. Perhaps the Jeopardy winner will have to undergo Beverly Hills-style plastic surgery? Will the new Watson look like today’s Watson? I would suggest that some artificiality could be discerned.

Last week, one of my two or three readers wrote to inform me that the phrase “knowledge quotient” is a registered trademark. One of my researchers told me that when one uses the phrase “knowledge quotient,” one should include the appropriate symbol. Omission can mean many bad things, mostly involving attorneys.


Another one of the goslings picked up the vaporous “knowledge quotient” and poked around for other uses of the phrase. Remember. I encountered this nearly meaningless quasi academic jargon in the title of an IDC report about content processing, authored by the intrepid expert Dave Schubmehl.

According to one of my semi reliable goslings, the phrase turned up in a Portland State University thesis. The authors were David Clitheroe and Garrett Long.


The trademark was registered in 2004 by Penn State University. Yep, that’s the university which I associate with an unfortunate management “issue.” According to Justia, the person registering the phrase “knowledge quotient” was a Penn State employee named Gene V J Maciol.

So we are considering a chunk of academic jargon cooked up to fulfill a requirement to get an advanced degree in sociology in 1972. That was more than 40 years ago. I am not familiar with sociology or the concept of a knowledge quotient.

I printed out the 111 page document and read it. I do have some observations about the concept and its relationship to search and content processing. Spoiler alert: Zero, none, zip, nada, zilch.

The topic of the sociology paper is helping kids in trouble. I bristled at the assumptions implicit in the write up. Some cities had sufficient resources to help children. Certain types of faculties are just super. I assume neither of the study’s authors was in a reformatory, orphanage, or insane asylum.

Anyway, the phrase “knowledge quotient” is toothless. It means, according to page 31:

the group’s awareness and knowledge of the [troubled youth or orphan] home.

And the “quotient” part? Here it is in all its glory:

A knowledge quotient reflects the group’s awareness and knowledge of the home.

Read more

More Knowledge Quotient Silliness: The Florida Gar of Search Marketing

August 1, 2014

I must be starved for intellectual Florida Gar. Nibble on this fish’s lateral line and get nauseous or dead. Knowledge quotient as a concept applied to search and retrieval is like a largish Florida gar. Maybe a Florida gar left too long in the sun.


Lookin’ yummy. Looks can be deceiving in fish and fishing for information. A happy quack to

I ran a query on one of the search systems that I profile in my lectures for the police and intelligence community. With a bit of clicking, I unearthed some interesting uses of the phrase “knowledge quotient.”

What surprised me is that the phrase is a favorite of some educators. The use of the term as a synonym for plain old search seems to be one of those marketing moments of magic. A group of “experts” with degrees in home economics, early childhood education, or political science sit around and try to figure out how to sell a technology that is decades old. Sure, the search vendors make “improvements” with ever increasing speed. As costs rise and sales fail to keep pace, the search “experts” gobble a cinnamon latte and innovate.

In Dubai earlier this year, I saw a reference to a company engaged in human resource development. I think this means “body shop,” “lower cost labor,” or “mercenary registry,” but I could be off base. The company is called Knowledge Quotient FZ LLC. If one tries to search for the company, the task becomes onerous. Google is giving some love to the recent IDC study by an “expert” named Dave Schubmehl. As you may know, this is the “professional” who used my information and then sold it on Amazon until July 2014 without paying me for my semi-valuable name. For more on this remarkable approach to professional publishing, see

Also, in Dubai is a tutoring outfit called Knowledge Quotient which delivers home tutoring to the children of parents with disposable income. The company explains that it operates a place where learning makes sense.

Companies in India seem to be taken with the phrase “knowledge quotient.” Consider Chessy Knowledge Quotient Private Limited. In West Bengal, one can find one’s way to Mukherjee Road and engage the founders with regard to an “effective business solution.” Please do not confuse Chessy with KnowledgeQ, the company operating as Knowledge Quotient Education Services India Pvt Ltd. in Bangalore.

What’s the relationship between these companies operating as “knowledge quotient” vendors and search? For me, the appropriation of names and applying them to enterprise search contributes to the low esteem in which many search vendors are held.

Why is Autonomy IDOL such a problem for Hewlett Packard? This is a company that bought a mobile operating system and stepped away from it. This is a company that brought out a tablet and abandoned it in a few months. This is a company that wrote off billions and then blamed the seller for not explaining how the business worked. In short, Autonomy, which offers a suite of technology that performs as well as or better than any other search system, has become a bit of Florida gar in my view. Autonomy is not a fish. Autonomy is a search and content processing system. When properly configured and resourced, it works as well as any other late 1990s search system. I don’t need meaningless descriptions like “knowledge quotient” to understand that the “problem” with IDOL is little more than HP’s expectations exceeding what a decades old technology can deliver.

Why is Fast Search & Transfer an embarrassment to many who work in the search sector? Perhaps the reason has to do with the financial dealings of the company. In addition to fines and jail terms, the Fast Search system drifted from its roots in Web search into publishing, smart software, and automatic functions. The problem was that when customers did not pay, the company did not suck it up, fix the software, and renew its efforts to deliver effective search. Nah, Fast Search became associated with a quick sale to Microsoft, subsequent investigations by Norwegian law enforcement, and the culminating decision to ban one executive from working in search. Yep, that is a story few want to analyze. Search marketers promised, and the technology did not deliver, could not deliver given Fast Search’s circumstances.

What about Excalibur/Convera? This company managed to sell advanced search and retrieval to Intel and the NBA. In a short time, both of these companies stepped away from Convera. The company then focused on a confection called “vertical search” based on indexing the Internet for customers who wanted narrow applications. Not even the financial stroking of Allen & Co. could save Convera. In an interesting twist, Fast Search purchased some of Convera’s assets in an effort to capture more US government business. Who digs into the story of Excalibur/Convera? Answer: No one.

What passes for analysis in enterprise search, information retrieval, and content processing is the substitution of baloney for fact-centric analysis. What is the reason that so many search vendors need multiple injections of capital to stay in business? My hunch is that companies like Antidot, Attivio, BA Insight, Coveo, Sinequa, and Palantir, among others, are in the business of raising money, spending it in an increasingly intense effort to generate sustainable revenue, and then going once again to capital markets for more money. When the funding sources dry up or just cut off the company, what happens to these firms? They fail. A few are rescued like Autonomy, Exalead, and Vivisimo. Others just vaporize as Delphes, Entopia, and Siderean did.

When I read a report from a mid tier consulting firm, I often react as if I had swallowed a chunk of Florida gar. An example in my search file is basic information about “The Knowledge Quotient: Unlocking the Hidden Value of Information.” You can buy this outstanding example of ahistorical analysis from IDC, the employer of Dave Schubmehl. (Yep, the same professional who used my research without bothering to issue me a contract or to get permission from me to fish with my identity. My attorney, if I understand his mumbo jumbo, says this action was not identity theft, but Schubmehl’s actions between May 2012 and July 2014 strike me as untoward.)

Net net: I wonder if any of the companies using the phrase “knowledge quotient” are aware of brand encroachment. Probably not. That may be due to the low profile search enjoys in some geographic regions where business appears to be more healthy than in the US.

Can search marketing be compared to Florida gar? I want to think more about this.

Stephen E Arnold, August 1, 2014

Gartner and Enterprise Search 2014

July 31, 2014

At lunch yesterday, several search aware people discussed a July 2014 Gartner study. One of the folks had a crumpled image of the July 2014 “magic quadrant.” This is, I believe, report number G00260831. Like other mid tier consulting firms, Gartner works hard to find something that will hook customers’ and prospects’ attention. The Gartner approach is focused on companies that purport to have enterprise search systems. From my vantage point, the Gartner approach is miles ahead of the wild and illogical IDC report about knowledge, a “quotient,” and “unlocking” hidden value. Now I have not fallen in love with Gartner. The situation is more like my finding my content and my name for sale on Amazon. You can see what my attorney complained about via this link. I think I was “schubmehled,” not outwitted.

I am the really good looking person.

Where the IDC report lacks comprehensiveness with regard to vendors, Gartner mentions quite a few companies allegedly offering enterprise search solutions. You must chase down your local Gartner sales person for more details. I want to summarize the points that surfaced in our lunch time pizza fest.

First, the Gartner “study” includes 18 or 19 vendors. Recommind is on the Gartner list even though a supremely confident public relations “professional” named Laurent Ionta insisted that Recommind was not in the July 2014 Gartner report. I called her attention to report number G00260831 and urged her to use her “bulldog” motivation to contact her client and Gartner’s experts to get the information from the horse’s mouth as it were. (Her firm is purported to be the Digital Agency of the Year and on the Inc. 5000 list of the fastest growing companies in America.) I am impressed with the accolades she included in her emails to me. The fact that this person who may work on the Recommind account was unaware that Gartner pegged Recommind as a niche player seemed like a flub of the first rank. When it comes to search, not even those in the search sector may know who’s on first or among the chosen 19.

To continue with my first take away from lunch, there were several companies that those at lunch thought should be included in the Gartner “analysis.” As I recall, the companies to which my motley lunch group wanted Gartner to apply their considerable objective and subjective talents were:

  • ElasticSearch. This is, in my view, the Big Dog in enterprise search at the moment. The sole reason is that ElasticSearch has received an injection of another $70 million to complement the $30-odd million it had previously gathered. Oh, ElasticSearch is a developer magnet. Other search vendors should be so popular with the community crowd.
  • Oracle. This company owns Endeca and seems to offer Endeca solutions along with RightNow/InQuira natural language processing for enterprise customer support, the fading Secure Enterprise Search system, and the still popping and snapping Oracle Text. I did not mention to the lunch crowd that Oracle also owns Artificial Linguistics and Triple Hop technology. This information was, in my view, irrelevant to my lunch mates.
  • SphinxSearch. This system is still getting love from the MySQL contingent. Imagine no complex structured query language syntax to find information tucked in a cell.

There are some other information retrieval outfits that I thought of mentioning, but again, my free lunch group does not know what it does not know. Like many folks who discuss search with me, learning details about search systems is not even on the menu. Even when the information is free, few want to confuse fantasy with reality.

The second takeaway is that the rationale for putting most vendors in the niche category puzzled me. If a company really has an enterprise search solution, how is that solution a niche? The companies identified as those who can see where search is going are, as I heard, labeled “visionaries.” The problem is that I am not sure what a search visionary is; for example, how does a French aerospace and engineering firm qualify as a visionary? Was HP a visionary when it bought Autonomy, wrote off $8 billion, and initiated litigation against former colleagues? How does this Google supplied definition apply to enterprise search:

able to see visions in a dream or trance, or as a supernatural apparition?

The final takeaway for me was the failure to include any search system from China, Germany, or Russia. Interesting. Even my down on their heels lunch group was aware of Yandex and its effort in enterprise search via a Yandex appliance. Well, internationalization only goes so far I suppose.

I recall hearing one of my luncheon guests say that IBM was, according to the “experts” at Gartner, a niche player. Gentle reader, I can describe IBM many ways, but I am not sure it is a niche player like Exorbyte (eCommerce mostly) and MarkLogic (XML data management). Nope, IBM’s search embraces winning Jeopardy, creating recipes with tamarind, and curing assorted diseases. And IBM offers plain old search as part of DB2 and its content management products plus some products obtained via acquisition. Cybertap search, anyone? When someone installs what used to be OmniFind, I thought IBM was providing an enterprise class information retrieval solution. Guess I am wrong again.

Net net: Gartner has prepared the ground for a raft of follow on analyses. I would suggest that you purchase a copy of the July 2014 Gartner search report. You may be able to get your bearings so you can answer these questions:

  1. What are the functional differences among the enterprise search systems?
  2. How does the HP Autonomy “solution” compare to the pre-HP Autonomy solution?
  3. What is the cost of a Google Search Appliance compared to a competing product from Maxxcat or Thunderstone? (Yep, two more vendors not in the Gartner sample.)
  4. What causes a company to move from being a challenger in search to a niche player?
  5. What makes both a printer company and a Microsoft-centric solution qualified to match up with Google and HP Autonomy in enterprise search?
  6. What are the licensing costs, customizing costs, optimizing costs, and scaling costs of each company’s enterprise search solution? (You can find the going rate for the Google Search Appliance. The other 18? Good luck.)

I will leave you to your enterprise search missions. Remember: Gartner, unlike some other mid-tier consulting firms, makes an effort to talk about what its consultants perceive as concrete aspects of information retrieval. Other outfits, not so much. That’s why I remain confused about the IDC KQ (knowledge quotient) thing, the meaning of hidden value, and unlocking. Is information like a bike padlock?

Stephen E Arnold, July 31, 2014

IHS Enterprise Search: Semantic Concept Lenses Are Here

July 29, 2014

I pointed out that IDC, a mid-tier consulting firm that has marketed my information without permission on Amazon of all places, has rolled out a new report about content processing. The academic-sounding title is “The Knowledge Quotient: Unlocking the Hidden Value of Information.” Conflating knowledge and information is not logically satisfying to me. But you may find the two words dusted with “value” just the ticket to career success.

I have not read the report, but I did see a list of the “sponsors” of the study. The list, as I pointed out, was an eclectic group, including huge firms struggling for credibility (HP and IBM) down to consulting firms offering push ups for indexers.

One company on my list caused me to go back through my archive of search information. The firm that sparked my interest is Information Handling Services (IHS). The company is publicly traded and turning a decent profit, with revenue moving toward $2 billion. If the global economy perks up and the defense sector is funded at pre-drawdown levels, IHS could push past the $2 billion mark.

IHS is a company with an interesting history and extensive experience with structured and unstructured search. Few of those with whom I interacted when I was working full time considered IHS a competitor to the likes of Autonomy, Endeca, and Funnelback.

In the 2013 10-K on page 20, IHS presents its “cumulative total return” in this way:


The green line looks like money. Another slant on the company’s performance can be seen in a chart available from Google Finance.

The Google chart shows that revenue is moving upward, but operating margins are drifting downward and operating income is suppressed. Like Amazon, IHS finds that the costs of operating an information-centric company are difficult to control. Amazon seems to have thrown in the towel. IHS is managing like the Dickens to maintain a profit for its stakeholders. For stakeholders, the hope is that hefty profits will be forthcoming.


Source: Google Finance

My initial reaction was, “Is IHS trying to find new ways to generate higher margin revenue?”

Like Thomson Reuters and Reed Elsevier, IHS required different types of content processing plumbing to deliver its commercial databases. Technical librarians and the competitive intelligence professionals monitoring the defense sector are likely to know about IHS’ different products. The company provides access to standards documents, regulatory information, and Jane’s military hardware information services. (Yep, Jane’s still has access to retired naval officers with mutton chop whiskers and interesting tweed outfits. I observed these experts when I visited the company in England prior to IHS’s purchase of the outfit.)

The standard descriptions of IHS peg the company’s roots with a trade magazine outfit called Rogers Publishing. My former boss at Booz, Allen & Hamilton loved some of the IHS technical services. He was, prior to joining Booz, Allen, the head of research at Martin Marietta, an IHS customer in the 1970s. Few remember that IHS was once tied in with Thyssen Bornemisza. (For those with an interest in history, there are some reports about the Baron that are difficult to believe.)

Large professional publishing companies were early, if somewhat reluctant, supporters of SGML and XML. Running a query against a large collection of structured textual information could be painfully slow when one relied on traditional relational database management systems in the late 1980s. Without SGML/XML, repurposing content required humans. With scripts hammering on SGML/XML, creating new information products like directories and reports eliminated the expensive humans for the most part. Fewer expensive humans in the professional publishing business reduces costs…for a while at least.
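The “scripts hammering on SGML/XML” workflow can be sketched in a few lines. This is a hypothetical illustration, not IHS or Rogers Publishing code; the record set and element names are invented for the example.

```python
import xml.etree.ElementTree as ET

# A toy XML record set standing in for a tagged content repository.
# The ids, titles, and element names here are invented.
RECORDS = """
<standards>
  <standard id="MIL-STD-810"><title>Environmental Engineering</title><year>1975</year></standard>
  <standard id="ISO-9001"><title>Quality Management</title><year>1987</year></standard>
</standards>
"""

def build_directory(xml_text: str) -> str:
    """Repurpose tagged records into a directory listing without human re-keying."""
    root = ET.fromstring(xml_text)
    lines = []
    for rec in root.findall("standard"):
        lines.append(f'{rec.get("id")}: {rec.findtext("title")} ({rec.findtext("year")})')
    return "\n".join(lines)

print(build_directory(RECORDS))
```

Once the content is tagged, the same records can feed a directory today and a report tomorrow; the script, not a human, does the repurposing.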

IHS climbed on the SGML/XML diesel engine and began working to deliver snappy online search results. As profit margins for professional publishers were pressured by increasing marketing and technology costs, IHS followed the path of other information-centric companies. IHS began buying content and services companies that, in theory, would give the professional publishing company a way to roll out new, higher margin products. Even secondary players in the professional publishing sector like Ebsco Electronic Publishing wanted to become billion-dollar operations and then get even bigger. Rah, rah.

These growth dreams electrify many information companies’ executives. The thought that every professional publishing company and every search vendor is chasing finite or constrained markets does not get much attention. Moving from dreams to dollars is getting more difficult, particularly in the professional publishing and content processing businesses.

My view is that packaging up IHS content and content processing technology got a boost when IHS purchased the Invention Machine in mid-2012.

Years ago I attended a briefing by the founders of the Invention Machine. The company demonstrated that an engineer looking for a way to solve a problem could use the Invention Machine search system to identify candidate systems and methods from the processed content. I recall that the original demonstration data set was US patents and patent applications. My thought was that an engineer looking for a way to implement a particular function could, if the Invention Machine system worked as presented, generate a patent result set. That result set could be scanned to eliminate any patents still in force. The resulting set of patents might yield a procedure that the person looking for a method could implement without having to worry about an infringement allegation. The original demonstration was okay, but like most “new” search technologies, Invention Machine faced funding, marketing, and performance challenges. IHS acquired Invention Machine, its technologies, and its Eastern European developers, and embraced the tagging, searching, and reporting capabilities of the Invention Machine.

The Goldfire idea is that an IHS client can license certain IHS databases (called “knowledge collections”) and then use Goldfire / Invention Machine search and analytic tools to get the knowledge “nuggets” needed to procure a missile guidance component.

The jargon for this finding function is “semantic concept lenses.” If the licensee has content in a form supported by Goldfire, the licensee can search and analyze IHS information along with information the client has from its own sources. A bit more color is available from IHS.

The IHS search system is described in terms familiar to a librarian and a technical analyst; for example, here are the attributes for Goldfire “cloud” from an IHS 2013 news release:

  • “Patented semantic search technology providing precise access to answers in documents. [Note: IHS has numerous patents but it is not clear what specific inventions or assigned inventions apply directly to the search and retrieval solution(s)]
  • Access to more than 90 million scientific and technical “must have” documents curated by IHS. This aggregated, pre-indexed collection spans patents, premium IHS content sources, trusted third-party content providers, and the Deep Web.
  • The ability to semantically index and research across any desired web-accessible information such as competitive or supplier websites, social media platforms and RSS feeds – turning these into strategic knowledge assets.
  • More than 70 concept lenses that promote rapid research, browsing and filtering of related results sets thus enabling engineers to explore a concept’s definitions, applications, advantages, disadvantages and more.
  • Insights into consumer sentiment giving strategy, product management and marketing teams the ability to recognize customer opinions, perceptions, attitudes, habits and expectations – relative to their own brands and to those of their partners’ and competitors’ – as expressed in social media and on the Web.”

Most of these will resonate with those familiar with the assertions of enterprise search and content processing vendors. The spin, which I find notable, is that IHS delivers both content and information retrieval. Most enterprise search vendors provide technology for finding and analyzing data. The licensee has to provide the content unless the enterprise search vendor crawls the Web or other sources, creates an archive or a basic index, and then provides an interface that is usually positioned as indexing “all content” for the user.

According to Virtual Strategy Magazine (which presumably does not cover “real” strategy), I learned that US Patent 8,666,730:

covers the semantic concept “lenses” that IHS Goldfire uses to accelerate research. The lenses correlate with the human knowledge system, organizing and presenting answers to engineers’ or scientists’ questions – even questions they did not think to ask. These lenses surface concepts in documents’ text, enabling users to rapidly explore a concept’s definitions, applications, advantages, disadvantages and more.

The key differentiator is claimed to move IHS Goldfire up a notch. The write up states:

Unlike today’s textual, question-answering technologies, which work as meta-search engines to search for text fragments by keyword and then try to extract answers similar to the text fragment, the IHS Goldfire approach is entirely unique – providing relevant answers, not lists of largely irrelevant documents. With IHS Goldfire, hundreds of different document types can be parsed by a semantic processor to extract semantic relationships like subject-action-object, cause-and-effect and dozens more. Answer-extraction patterns are then applied on top of the semantic data extracted from documents and answers are saved to a searchable database.
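The quoted pipeline (extract subject-action-object and cause-and-effect relationships from documents, then save the answers to a searchable store) can be approximated with a deliberately naive sketch. Nothing here is Goldfire code: the regular expression, sentences, and function names are invented, and a production system would use a semantic parser rather than a pattern match.

```python
import re

# Naive cause-and-effect pattern; a real system would parse, not pattern-match.
CAUSE_PATTERN = re.compile(r"(.+?) causes (.+?)\.", re.IGNORECASE)

def extract_relations(text: str):
    """Pull (cause, effect) pairs out of free text."""
    return [(c.strip(), e.strip()) for c, e in CAUSE_PATTERN.findall(text)]

def build_answer_store(docs):
    """Index extracted answers so a query returns answers, not documents."""
    store = {}
    for doc in docs:
        for cause, effect in extract_relations(doc):
            store.setdefault(effect.lower(), []).append(cause)
    return store

docs = [
    "Metal fatigue causes turbine blade failure.",
    "Thermal cycling causes solder joint cracking.",
]
store = build_answer_store(docs)
print(store["turbine blade failure"])  # an answer, not a list of documents
```

The point of the design is visible even in the toy: the expensive relation extraction happens once at indexing time, and the query hits a database of pre-extracted answers.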

According to Igor Sovpel, chief scientist for IHS Goldfire:

“Today’s engineers and technical professionals are underserved by traditional Internet and enterprise search applications, which help them find only the documents they already know exist,” said Igor Sovpel, chief scientist for IHS Goldfire. “With this patent, only IHS Goldfire gives users the ability to quickly synthesize optimal answers to a variety of complex challenges.”

Is IHS’ new marketing push in “knowledge” and related fields likely to have an immediate and direct impact on the enterprise search market? Perhaps.

There are several observations that occurred to me as I flipped through my archive of IHS, Thyssen, and Invention Machine information.

First, IHS has strong brand recognition in what I would call the librarian and technical analyst for engineering demographic. Outside of lucrative but quite niche markets for petrochemical information or silhouettes and specifications for the SU 35, IHS suffers the same problem as Thomson Reuters and Wolters Kluwer: most senior managers are not familiar with the company or its many brands. Positioning Goldfire as an enterprise search or enterprise technical documentation/data analysis tool will require a heck of a lot of effective marketing. Will positioning IHS cheek by jowl with IBM and a consulting firm that teaches indexing address this visibility problem? The odds could be long.

Second, search engine optimization folks can seize on the name Goldfire and create some dissonance for IHS in the public Web search indexes. I know that companies like Attivio and Microsoft use the phrase “beyond search” to attract traffic to their Web sites. I can see the same thing happening here. IHS competes with other professional publishing companies looking for a way to address their own marketing problems. A good SEO name like “Goldfire” could come under attack, and quickly. I can envision lesser competitors usurping IHS’ value claims, which may delay some sales or further confuse an already uncertain prospect.

Third, enterprise search and enterprise content analytics are proving to be a difficult market from which to wring profitable, sustainable revenue. If IHS is successful, the third-party licensees of IHS data who resell that information to their online customers might take steps to renegotiate contracts for revenue sharing. IHS would then have to ramp up its enterprise search revenues to keep pace with or outpace revenues from third-party licensees. Addressing this problem can be interesting for those managers responsible for the negotiations.

Finally, enterprise search has a lot of companies planning on generating millions or billions from search. There can be only one prom queen and a small number of “close but no cigar” runners-up. Which company will snatch the crown?

This IHS search initiative will be interesting to watch.

Stephen E Arnold, July 29, 2014

I2E Semantic Enrichment Unveiled by Linguamatics

July 21, 2014

The article titled “Text Analytics Company Linguamatics Boosts Enterprise Search with Semantic Enrichment” on MarketWatch discusses the launch of I2E Semantic Enrichment from Linguamatics. The new release allows for the mining of a variety of texts, from scientific literature to patents to social media. It promises faster, more relevant search for users. The article states,

“Enterprise search engines consume this enriched metadata to provide a faster, more effective search for users. I2E uses natural language processing (NLP) technology to find concepts in the right context, combined with a range of other strategies including application of ontologies, taxonomies, thesauri, rule-based pattern matching and disambiguation based on context. This allows enterprise search engines to gain a better understanding of documents in order to provide a richer search experience and increase findability, which enables users to spend less time on search.”
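A minimal sketch of what such enrichment looks like in practice, assuming a hand-built taxonomy rather than Linguamatics’ actual ontologies and NLP machinery: documents are tagged with canonical concepts at indexing time, and the search layer matches on the enriched metadata instead of raw keywords. The terms, ids, and function names are invented for illustration.

```python
# Hypothetical mini-taxonomy: surface terms mapped to a canonical concept.
TAXONOMY = {
    "aspirin": "nsaid",
    "ibuprofen": "nsaid",
    "acetylsalicylic acid": "nsaid",
}

def enrich(doc: dict) -> dict:
    """Attach concept metadata found by (here, trivial) term matching."""
    text = doc["text"].lower()
    concepts = {concept for term, concept in TAXONOMY.items() if term in text}
    return {**doc, "concepts": concepts}

def concept_search(docs, concept):
    """Query the enriched metadata, not the raw keywords."""
    return [d["id"] for d in docs if concept in d["concepts"]]

docs = [enrich(d) for d in [
    {"id": 1, "text": "Aspirin trial results"},
    {"id": 2, "text": "Acetylsalicylic acid dosage study"},
    {"id": 3, "text": "Steel corrosion report"},
]]
print(concept_search(docs, "nsaid"))  # both drug documents, despite different wording
```

The payoff the quote claims is visible here: a query for the concept finds the acetylsalicylic acid paper even though the user never typed that phrase.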

Whether they are spinning semantics for search, or search spun for semantics, Linguamatics has made its technology available to tens of thousands of enterprise search users. Company representative John M. Brimacombe was straightforward in his comments about the disappointment surrounding enterprise search, but optimistic about I2E. It is currently being used by many top organizations, as well as the Food and Drug Administration.

Chelsea Kerwin, July 21, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

13 Career Jeopardizing Enterprise Search Issues

July 6, 2014

The ArnoldIT team has combed through our archive of enterprise search data. We have identified the top 13 surprises that enterprise search delivers to licensees. Get hit with several of these surprises and you might find yourself seeking a future in a different line of work.

13 Users don’t like the new system or the old system, for that matter. Dissatisfaction with enterprise search, regardless of vendor, runs at 55 to 70 percent. See Successful Enterprise Search Management.

12 No one pays attention to search costs until the CFO conducts an audit. Cost overruns plague nine out of 10 enterprise search deployments. The reason is that a comprehensive compilation of costs is not part of an enterprise search deployment. When a system crashes, the costs for emergency and rush work come from a different line item. Customization is usually taken from a different budget allocation. Consultants and contractors are paid from another budget allocation. When the costs are added up, everyone seems surprised at the money spent for a system few find satisfactory. The hunt for a scapegoat is on.

11 Open source search and proprietary search solutions differ little in cost. Aside from the initial licensing fee for a proprietary system, the costs for customization, optimization, contractors, programming, and enhancement are essentially the same.

10 Many major enterprise search systems are now based on open source software. The reason is that the costs for the basic functions are rising and are difficult to control. Therefore, vendors use open source and concentrate on extra-cost add-ons.

9 Every enterprise search system struggles to process content without human intervention, additional “connectors,” extract-transform-load activities, and original scripting. When content cannot be acquired, the few who notice will squawk, often loudly.

8 Latency creates a problem because new or changed content imposes significant costs on the licensee, who must process content in near real time and then refresh the indexes, whether these are state-of-the-art in-memory systems or old-style spinning discs and cached methods. When a user asks, “Why is the system so slow?”, it may be difficult to make improvements when budgets are constrained.

7 Modern systems include add-ons that permit faceting, query expansion, and linguistic functions. Unless these are “tuned” by subject matter experts or analysts, the add-ons can generate irrelevant or off-query outputs. The notion of “smart, automatic” search and retrieval is often a chimera.

6 Users typically do not conduct a thorough, research-librarian type of investigation of query results. Enterprise search systems that generate laundry lists of stale or irrelevant results will still be used by the person running the query. The assumption that online systems are “correct” is held by 95 percent of an enterprise system’s users. Only when a user cannot find a document that is supposed to be in the search index will the user realize that the system is not working as assumed. If the issue arises during a crunch, the prudent search manager will polish that résumé.

5 Enterprise search scaling is expensive and complex. The idea that scaling is seamless and economical is false. Improving the “performance” of an enterprise search system requires correct identification of the particular factor or factors creating latency. More frequent updates may not be possible without re-engineering an enterprise search system’s infrastructure. How much has Google’s core method changed in 14 years? What about Amazon? What about Autonomy, Endeca, Lucene, etc.? The answer is, “Not too much.” Search is very, very complicated.

4 Interface changes do not improve the precision and recall of a search system. Interface and cosmetic design changes are easy to talk about and “more fun” to work on than figuring out how to process content more quickly and update the searchable indexes with significantly less latency. If users grouse, an interface change won’t silence the critics or slow the proliferation of bootleg systems in units that are dissatisfied with the search status quo.

3 Search systems with text mining functions often rely on standard methods and algorithm configurations that licensees cannot modify without specialist training. As a result, many systems output results that may be based on assumptions not germane to the licensee’s content. Hence, outputs purporting to provide insight into business intelligence or predictions may be incorrect. Search is not text mining. Search is not a silver bullet for Big Data. Search is pretty much type a query and get a laundry list of stuff that must be reviewed by a human. Automatic reports are often off point.

2 Search appliances are not money savers. The Google Search Appliance costs as much as Autonomy or Endeca to deploy. The cloud is not the big money saver marketers want me to believe it is. Cloud search solutions reduce the need for capital expense, but the ongoing costs are comparable to on-premises solutions. A search appliance may be like handcuffs. The cloud may be overly complicated. No highway leads to the Magic Kingdom for search. If you think search is a slam dunk, you are misinformed.

1 Enterprise search systems are more alike than different. The reason is that computational methods have not changed much since the first commercial systems became available in the late 1960s. The differences are created by marketing, not by significantly different numerical recipes. Most users cannot differentiate between or among systems. The concepts of precision and recall are unknown to them. Users believe that search systems are right almost all the time. Yikes.

Stephen E Arnold, July 6, 2014

Presentation by a NoSQL Leader

July 4, 2014

The purported father of NoSQL, Norman T. Kutemperor, made an appearance at this year’s Enterprise Search & Discovery conference, we learn from “Scientel Presented Advanced Big Data Content Management & Search With NoSQL DB at Enterprise Search Summit in NY on May 13” at IT Business Net. The press release states:

“Norman T. Kutemperor, President/CEO of Scientel, presented on Scientels Enterprise Content Management & Search System (ECMS) capabilities using Scientels Gensonix NoSQL DB on May 13 at the Enterprise Search & Discovery 2014 conference in NY. Mr. Kutemperor, who has been termed the Father of NoSQL, was quoted as saying, When it comes to Big Data, advanced content management and extremely efficient searchability and discovery are key to gaining a competitive edge. The presentation focused on: The Power of Content – More power in a NoSQL environment.”

According to the write-up, Kutemperor spoke about the growing need to manage multiple types of unstructured data within a scalable system, noting that users now expect drag-and-drop functionality. He also asserted that any NoSQL system should automatically extract text and build an index that can be searched by both keywords and sentences. Of course, no discussion of databases would be complete without a note about the importance of security, and Kutemperor emphasized that point as well.

The veteran info-tech company Scientel has been in business since 1977. These days, the company focuses on NoSQL database design; however, it should be noted that it also designs and produces optimized, high-end servers to go with its enterprise Gensonix platform. The company makes its home in Bingham Farms, Michigan.

Cynthia Murrell, July 04, 2014

Sponsored by ArnoldIT.com, developer of Augmentext
