Surprising Sponsored Search Report and Content Marketing

July 28, 2014

Content marketing hath embraced the mid tier consulting firms. IDC, an outfit that used my information without my permission from 2012 until July 2014, has published a study about “knowledge.” I was not able to view the entire report, but the executive summary was available for download at http://bit.ly/1l10sGH. (Verified at 11 am, July 25, 2014) If you have some extra money, you may want to pay an IDC scale fee to learn about “the knowledge quotient.”

I am looking forward to the full IDC report, which promises to be as amusing as a recent Gartner report about search. The idea of rigorous, original research and an endorsement from a company like McKinsey or Boston Consulting Group is a Holy Grail of marketing. McKinsey and BCG (what I call blue chip firms), while not perfect, produce client smiles for most of their engagements.

Consulting, however, does not have an American Bar Association or other certification process to “certify” a professional’s capabilities. In fact, at Booz, Allen I learned that Halliburton NUS, a nuclear consulting and services shop, was in the eyes of Booz, Allen a “grade C.” Booz, Allen, like Bain and SRI, were grade A firms. I figured if I were hired at Booz, Allen I could pick up some A-level attributes. Consultants not trained by one of the blue chip firms had to work harder, smarter, and more effectively. Slack off, and a consulting firm lower on the totem pole was unlikely to claw its way to the top. When a consulting firm has been a grade C for decades, it is highly unlikely that the blue chip outfits will worry too much about these competitors.

Who funded this particular IDC report, 249643ES? The fact that I was able to download the report from one of the companies listed as a “sponsor” suggests that Smartlogic and nine other companies were underwriting the rigorous research. You can download the report (verified at 2:30 pm, July 25, 2014) at this link. Hasten to do it, please.

In the consulting arena, multi-client studies come in different flavors or variants. At Booz, Allen & Hamilton, the 1976 Study of World Economic Change was paid for by a number of large banks. We did not write about these banks. We delivered previously uncollected information in a Booz, Allen package. The boss was William Simon, former secretary of the US Treasury. He brought a certain mindset and credibility to our project.

The authors of the IDC report are Dave Schubmehl and Dan Vesset. Frankly I don’t know enough about these “experts” to compare them to William Simon. My hunch is that Mr. Simon’s credentials might have had a bit more credibility. We supplemented the Booz, Allen team with specialists from Claremont College, where Peter Drucker was grooming some quite bright business analysts. In short, the high caliber Booz, Allen professionals, the Claremont College whiz kids, and William Simon combined to generate a report with a substantive information payload.

Based on my review of the Executive Summary of “The Knowledge Quotient,” direct comparisons with the Booz, Allen report or even reports from some of the mid tier firms’ analyses in my files are difficult to make. I can, however, highlight a handful of issues that warrant further consideration. Let’s look at three areas where the information highway may be melting in the summer heat.

1. A Focus on Knowledge and the Notion of a Quotient

I do a for fee column for Knowledge Management Magazine. I want to be candid. I am not sure that I have a solid understanding of what the heck “knowledge” is. I know that a quotient is the result obtained by dividing one number by another number.  I am not able to accept that an intangible like “knowledge” can be converted to a numeric output. Lard on some other abstractions like “value” and the entire premise of the report is difficult to take seriously.


Well, quite a few companies did take the idea seriously, and we need to look at the IDC material to get a feel for what a survey of 2,155 organizations and in depth interviews with 11 organizations “discovered.” The fact that there are 11 sponsors and 11 in depth interviews suggests that the sample is not an objective one as far as the interviews are concerned. But I may be wrong. Is that a signal that this IDC report is a marketing exercise dressed up as an objective report?

2. The Old Chestnut Makes an Appearance

A second clue is the inclusion of a matrix that reminded me of an unimaginative variation on the Boston Consulting Group’s 1970 tool. The BCG approach used market share or similar “hard” data about products and business units. A version of the BCG quadrant appears below:

[image: a version of the BCG quadrant]

IDC’s “experts” may be able to apply numbers to nebulous concepts. I would not want to try to pull off this legerdemain. The Schubmehl and Vesset version for IDC strikes me as somewhat spongy; for example, how does one create a quotient for knowledge when parameterizing “socialization” or “culture”? Is the association with New Age and pop culture intentional?

3. The Sponsors: An Eclectic Group United by Sponsoring IDC?

The third tip off to the focus of the report is the sponsors themselves. The 11 companies are an eclectic group, including a giant computer services firm (IBM), a handful of small companies with little or no corporate profile, and an indexing company that delivers training, services, and advice.

4. A Glimpse of the Takeaways

Fourth, the Executive Summary highlights what appear to be important takeaways from the year long research effort. For example, KQ leaders have their expectations exceeded presumably because these KQ savvy outfits have licensed one or more of the study sponsors’ products. The Executive Summary references a number of case studies. As you may know, positive case studies about search and content processing are not readily available. IDC promises a clutch of cases.

And IDC on pages iv and v of the Executive Summary uses a bullet list and some jargon to give a glimpse of high KQ outfits’ best practices. The idea is that if content is indexed and searchable, there are some benefits to the companies.

After 50 years, I assume IDC has this type of work nailed. I would point out that IDC used my information in its for fee reports from August 2012 until July 2014. My attorney was successful in getting IDC to stop connecting my name and that of my researchers with one of IDC’s top billing analysts. I find surfing on my content and name untoward. But again there are substantive differences between blue chip consulting firms and those lower on the for fee services totem pole.

I wonder if the full report will contain positive profiles of the sponsoring organizations. Be prepared to pay a lot for this “knowledge quotient” report. On the other hand, some of the sponsors may provide you with a copy if you have a gnawing curiosity about the buzzwords and jargon the report embraces; for example, analytics.

Some potential readers will have to write a big check. For example, to get one of the IDC reports with my name on it from 2012 to July 2014, the per report price was $3,500. I would not be surprised if the sticker for this KQ report is even higher. Based on the Executive Summary, KQ looks like a content marketing play. The “inclusions” are the profiles of the sponsors.

I will scout around for the Full Monty, and I hope it is fully clothed and buttoned up. Does IDC have a William Simon to ride herd on its “experts”? From my experience, IDC’s rigorousness is quite different. For example, IDC’s Dave Schubmehl used my information and attached himself to my name. Is this the behavior of a blue chip?

Stephen E Arnold, July 28, 2014

Pre Oracle InQuira: A Leader in Knowledge Assessment?

July 28, 2014

Oracle purchased InQuira in 2011. One of the writers for Beyond Search reminded me that Beyond Search covered the InQuira knowledge assessment marketing ploy in 2009. You can find that original article at http://bit.ly/WYYvF7.

InQuira’s technology is an option in the Oracle RightNow customer support system. RightNow was purchased by Oracle in 2011. For those who are the baseball card collectors of enterprise search, you know that RightNow purchased Q-Go technology to make its customer support system more intuitive, intelligent, and easier to use. (Information about Q-Go is at http://bit.ly/1nvyW8G.)

InQuira’s technology is not cut from a single chunk of Styrofoam. InQuira was formed in 2002 by fusing the Answerfriend, Inc. and Electric Knowledge, Inc. systems. InQuira was positioned as a question answering system. For years, Yahoo relied on InQuira to deliver answers to Yahooligans seeking help with Yahoo’s services. InQuira also provided the plumbing to www.honda.com. InQuira hopped on the natural language processing bandwagon and beat the drum until it layered on “knowledge” as a core functionality. The InQuira technology was packaged as a “semantic processing engine.”

InQuira used its somewhat ponderous technology along with AskJeeves’ type short cuts to improve the performance of its system. The company narrowed its focus from “boil the ocean search” to a niche focus. InQuira wanted to become the go to system for help desk applications.

InQuira’s approach involved vocabularies. These were similar to the “knowledge bases” included with some versions of Convera. InQuira, according to my files, used the phrase “loop of incompetence.” I think the idea was that traditional search systems did not allow a customer support professional to provide an answer that would make customers happy the majority of the time. InQuira before Oracle emphasized that its system would provide answers, not a list of Google style hits.

The InQuira system can be set up to display a page of answers in the form of sentences snipped from relevant documents. The idea is that the InQuira system eliminates the need for a user to review a laundry list of links.
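
To make the snippet style answer display concrete, here is a minimal sketch of the general idea: score each sentence by its term overlap with the query and surface the best matches. The function and sample data are invented for illustration; this is not InQuira’s actual method, which layers on linguistic analysis and curated vocabularies.

    import re

    def best_sentences(query, documents, top_n=3):
        """Return the sentences sharing the most terms with the query.

        A crude stand-in for answer extraction. Real systems add
        linguistic analysis, vocabularies, and ranking models.
        """
        query_terms = set(re.findall(r"\w+", query.lower()))
        scored = []
        for doc in documents:
            for sentence in re.split(r"(?<=[.!?])\s+", doc):
                overlap = len(query_terms & set(re.findall(r"\w+", sentence.lower())))
                if overlap:
                    scored.append((overlap, sentence))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [sentence for _, sentence in scored[:top_n]]

    docs = ["Reset the router by holding the button for ten seconds. The light blinks twice.",
            "Billing questions are handled by the accounts team."]
    print(best_sentences("how do I reset the router", docs))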

The word lists and knowledge bases require maintenance. Some tasks can be turned over to scripts, but other tasks require the ministrations of a human who is a subject matter expert or a trained indexer. The InQuira concept knowledge bases also require care and feeding to deliver on point results. I would point out that this type of knowledge care is more expensive than a nursing home for a 90 year old parent. A failure to maintain the knowledge bases usually results in indexing drift and frustrated users. In short, the systems are perceived as not working “like Google.”

Why is this nitty gritty important? InQuira shifted from fancy buzzwords as the sharp end of its marketing spear to the more fuzzy notion of knowledge. The company, beginning in late 2008, put knowledge first and the complex, somewhat baffling technology second. To generate sales leads, InQuira’s marketers hit on the idea of a “knowledge assessment.”

The outcome of the knowledge marketing effort was the sale of the company to Oracle in mid 2011. At the time of the sale, InQuira had an adaptor for Oracle Siebel. Oracle appears to have had a grand plan to acquire key customer support search and retrieval functionality. Armed with technology that was arguably better than the ageing Oracle SES system, Oracle could create a slam dunk solution for customer support applications.

Since the acquisition, many search vendors have realized that some companies were not ready to write a Moby Dick sized check for customer support search. Search vendors adopted the lingo of InQuira and set out to make sales to organizations eager to reduce the cost of customer support and avoid the hefty license fees some vendors levied.

What I find important about InQuira is this:

  1. It is one of the first search engines to be created by fusing two companies that were individually not able to generate sustainable revenue.
  2. InQuira’s tactic to focus on customer support and then add other niche markets brought more discipline to the company’s message than the “one size fits all” that was popular with Autonomy and Fast Search.
  3. InQuira figured out that search was not a magnetic concept. The company was one of the first to explain its technology, benefits, and approach in terms of a nebulous concept; that is, knowledge. Who knows what knowledge is, but it does seem important, right?
  4. The outcome of InQuira’s efforts made it possible for stakeholders to sell the company to Oracle. Presumably this exit was a “success” for those who divided up Oracle’s money.

Net net: Shifting search and content processing to knowledge is a marketing tactic. Will it work in 2014 when search means Google? Some search vendors who have sold their soul to venture capitalists in exchange for millions of jump start dollars hope so.

My thought is that knowledge won’t sell information retrieval. Once a company installs a search system, users can find what they need or not. Fuzzy does not cut it when users refuse to use a system, scream for a Google Search Appliance, or create a work around for a doggy system.

Stephen E Arnold, July 28, 2014

Sponsors of Two Content Marketing Plays

July 27, 2014

I saw some general information about allegedly objective analyses of companies in the search and content processing sector.

The first report comes from the Gartner Group. The company has released its “magic quadrant” which maps companies by various allegedly objective methods into leaders, challengers, niche players, and visionaries.

The most recent analysis includes these companies:

Attivio
BA Insight
Coveo
Dassault Exalead
Exorbyte
Expert System
Google
HP Autonomy IDOL
IBM
IHS
Lucid Works
MarkLogic
Mindbreeze
Perceptive ISYS Search
PolySpot
Recommind
Sinequa

There are several companies in the Gartner pool whose inclusion surprises me. For example, Exorbyte is primarily an eCommerce company with a very low profile in the US compared to Endeca or New Zealand based SLI Systems. Expert System is a company based in Italy. This company provides semantic software which I associated with mobile applications. IHS (Information Handling Services) provides technical information and a structured search system. MarkLogic is a company with XML data management software that has landed customers in publishing and the US government. With an equally low profile is Mindbreeze, a home brew search system funded by Microsoft-centric Fabasoft. Dassault Exalead, PolySpot, and Sinequa are French companies offering what I call “information infrastructure.” Search is available, but the approach is digital information plumbing.

The IDC report, also allegedly objective, is sponsored by nine companies. These outfits are:

Attivio
Coveo
Earley & Associates
HP Autonomy IDOL
IBM
IHS
Lexalytics
Sinequa
Smartlogic

This collection of companies is also eclectic. For example, Earley & Associates does indexing training and consulting and does not have a deep suite of enterprise software. IHS (Information Handling Services) appears in the IDC report as a knowledge centric company. I think I understand the concept. Technical information in Extensible Markup Language and a mainframe-style search system allow an engineer to locate a specification or some other technical item like the SU 25. Lexalytics is a sentiment analysis company. I do not consider figuring out if a customer email is happy or sad the same as Coveo’s customer support search system. Smartlogic is interesting because the company provides tools that permit unstructured content to be indexed. Some French vendors call this process “fertilization.” I suppose that for purists, indexing might be just as good a word.

What unifies these two lists are the companies that appear in both allegedly objective studies:

Attivio
Coveo
HP
IBM
IHS (Information Handling Services)
Sinequa

My hunch is that the six companies appearing in both lists are in full bore, pedal to the metal marketing mode.

Attivio and Coveo have ingested tens of millions in venture funding. At some point, investors want a return on their money. The positioning of these two companies’ technologies as search and the somewhat unclear knowledge quotient capability suggest that implicit endorsement by mid tier consulting firms will produce sales.

The appearance of HP and IBM on each list is not much of a surprise. The fact that Oracle Endeca is not in either report suggests that Oracle has other marketing fish to fry. Also, the fact that Elasticsearch, arguably the game changer in search and content processing, is in neither pool may be evidence that Elasticsearch is too busy to pursue “expert” analysts laboring in the search vineyard. On the other hand, Elasticsearch may have its hands full dealing with demands of developers, prospects, and customers.

IHS has not had a high profile in either search or content processing. The fact that Information Handling Services appears signals that the company wants to market its mainframe style and XML capable system to a broader market. Sinequa appears comfortable with putting forth its infrastructure system as both search and a knowledge engine.

I have not seen the full reports from either mid tier consulting firm. My initial impression of the companies referenced in the promotional material for these recent studies is that lead generation is the hoped for outcome of inclusion.

Other observations I noted include:

  1. The need to generate leads and make sales is putting multi-company reports back on the marketing agenda. The revenue from these reports will be welcomed at IDC and Gartner, I expect. The vendors who are on the hook for millions in venture funding are hopeful that inclusion in these reports will shake the money trees from Boston to Paris.
  2. The language used to differentiate and describe the companies referenced in these two studies is unlikely to clarify the differences between similar companies or make clear the similarities. From my point of view, there are few similarities among the companies referenced in the marketing collateral for the IDC and Gartner studies.
  3. The message of the two reports appears to be “these companies are important.” My thought is that because IDC and Gartner assume their brand conveys a halo of excellence, the companies in these reports are, therefore, excellent in some way.

Net net: Enterprise search and content processing has a hurdle to get over: Search means Google. The companies in these reports have to explain why Google is not the de facto choice for enterprise search and then explain how a particular vendor’s search system is better, faster, cheaper, etc.

For me, a marketer or search “expert” can easily stretch search to various buzzwords. For some executives, customer support is not search. Customer support uses search. Sentiment analysis is not search. Sentiment analysis is a signal for marketers or call center managers. Semantics for mobile phones, indexing for SharePoint content, and search for a technical data sheet are quite different from eCommerce, business intelligence, and business process engineering.

A fruit cake is a specific type of cake. Each search and content processing system is distinct and, in my opinion, not easily fused into the calorie rich confection. A collection of systems is a lumber room stuffed with different objects that don’t have another place in a household.

The reports seem to make clear that no one in the mid tier consulting firms or the search companies knows exactly how to position, explain, and verify that content processing is the next big thing. Is it?

Maybe a Google Search Appliance is the safe choice? IBM Watson does recipes, and HP Autonomy connotes high profile corporate disputes.

Elasticsearch, anyone?

Stephen E Arnold, July 27, 2014

Search and Data-Starved Case Studies

July 19, 2014

LinkedIn discussions fielded a question about positive search and content processing case studies. I posted a link to a recent paper from Italy (you can find the url at this link).

My Overflight system spit out another case study. The publisher is Hewlett Packard and the example involves Autonomy. The problem concerns the UK’s National Health Service and its paperless future. You can download the four page document at http://bit.ly/1wIsifS.

The Italian case study focuses on cheerleading for the Google Search Appliance. The HP case study promotes the Autonomy IDOL system applied to medical records.

The HP Autonomy document caught my attention because it uses a buzzword I first heard at Booz, Allen & Hamilton in 1978. Harvey Poppel, then a BAH partner, coined the phrase “paperless office.” The idea caught on. Mr. Poppel, who built a piano, snagged some ink in Business Week. That was a big deal in the late 1970s. Years later I met Alan Siegel, a partner at a New York design firm. He was working on promotion of the Federal government’s paperless initiative. About 10 years ago, I spent some time with Forrest (Woody) Horton, who was a prominent authority on the paperless office. Across the decades, talk about paperless offices generated considerable interest. These interactions about paperless environments have spanned 36 years. Paper seems to be prevalent wherever I go.

When I read the HP Autonomy case study, I thought about the efforts of some quite bright individuals directed at eliminating hard copy documents. There are reports, studies, and analyses about the problems of finding information in paper. I expected a reference to hard data, or at least some hard data in the document itself. The context for the paperless argument would have captured my attention.

The HP Autonomy case study talks about an integrator’s engineers using IDOL to build a solution. The product is called Evolve and:

It used 28 years of information management expertise to improve efficiency, productivity and regulatory compliance. The IDOL analytics engine was co-opted into Evolve because it automatically ingests and segments medical records and documents according to their content and concepts, making it easier to find and analyze specific information.

The wrap up of the case study is a quote that is positive about the Kainos Evolve system. No big surprise.

After reading the white paper, three thoughts crossed my mind.

First, the LinkedIn member seeking positive search and content processing case studies might not find the IDOL case study particularly useful. The information reads more like an essay from an ad agency’s in-house magazine.

Second, the LinkedIn person wondered why there were so few positive case studies about successful search and content processing installations. I think there are quite a few white papers, case studies, and sponsored content marketing articles crafted along the lines of the HP Autonomy case study. The desire to give the impression that the product encounters no potholes scrubs out the details so useful to a potential licensee.

Third, the case study describes a mandated implementation. So the Evolve product is in marketing low gear. The enthusiasm for implementing a new product shines brightly. Does the glare from the polish obscure a closer look?

At a minimum, I would have found the following information helpful even if presented in bullet points or tabular form:

  1. What was the implementation time? What days, weeks, or months of professional work were required to get the system up and running?
  2. What was the project’s initial budget? Was the project completed within the budget parameters?
  3. What is the computing infrastructure required for the installation? Was the infrastructure on premises, cloud, or hybrid?
  4. What is the latency in indexing and query processing?
  5. What connectors were used “as is”? Were new connectors required? If yes, how long did it take to craft a functioning connector?
  6. What training did users of the system require?

Information at this level of detail is difficult to obtain. In my experience, most search and content processing systems require considerable attention to detail. Take a short cut, and the likelihood of an issue rises sharply.

Obviously neither the vendor nor the licensee wants information about schedule shifts, cost over- or under-runs, and triage expenses to become widely known. The consequence of this jointly enforced fact void helps create case studies that are little more than MBA jargon.

Little wonder the LinkedIn member’s plea went mostly ignored. Paper is unlikely to disappear because lawyers thrive on hard copies. When litigation ensues, the paperless office and the paperless medical practice becomes a challenge.

Stephen E Arnold, July 19, 2014

What Most Search Vendors Cannot Pull Off

July 19, 2014

I recently submitted an Information Today column that reported on Antidot’s tactical play to enter the US market. One of the fact checkers for the write up alerted me that most of the companies I identified were unknown to US readers. Test yourself. How many of these firms do you recognize? How many of them provide information retrieval services?

  • A2ia
  • Albert (originally AMI Albert and AMI does not mean friend)
  • Dassault Exalead
  • Datops
  • EZ2Find
  • Kartoo
  • Lingway
  • LUT Technologies
  • Pertimm
  • Polyspot
  • Quaero
  • Questel
  • Sinequa

How did you do? The point is that French vendors of information retrieval and content processing technology find themselves in a crowded boat. Most of the enterprise search vendors have flamed out or resigned themselves to pitching to venture capitalists that their technology is the Next Big Thing. A lucky few sell out and cash in; for example, Datops. Others are ignored or forgotten.

The same situation exists for vendors of search technology in other countries. Search is a tough business, even for former Googlers: Marissa Mayer was the boss when Yahoo’s share of the Web search market sagged below 10 percent. In the same time period, Microsoft increased Bing’s share to about 14 percent. Google dogpaddled and held steady. Other Web search providers make up the balance of the market players. Business Insider reported:

This is a big problem for Yahoo since its search business is lucrative. While Yahoo’s display ad business fell 7% last quarter, revenue from search was up 6% on a year-over-year basis. Revenue from search was $428 million compared to $436 million from its display ad business.

Now enterprise search vendors have been trying to use verbal magic to unlock consistently growing revenue. So far only two vendors have been able to find a way to open the revenue vault’s lock. Autonomy tallied more than $800 million in revenue at the time of its sale to Hewlett Packard. The outcome of that deal was a multi-billion dollar write off and many legal accusations. One thing is clear through the murky rhetoric the deal produced. Hewlett Packard had zero understanding of search and has been looking for a scapegoat to slaughter for its corporate decision. This is not helping the search vendors chasing deals.

Google converted Web search into a $60 billion revenue stream. The core idea for online advertising originated with the pay-to-play company GoTo, which morphed into Overture, which was THEN acquired by Yahoo. Think of the irony. Yahoo had the technology that makes Google a one trick, but very lucrative revenue pony. But, to be fair, Google Web search is not the enterprise search needed to locate a factoid for a marketing assistant. Feed this query “show me the versions of the marketing VP’s last product road map” to a Google appliance and check the results. The human has to do some old fashioned human-type work. To find this information with a Google Search Appliance or any other information retrieval engine for that matter is tricky. Basic indexing cannot do the job, so most marketing assistants hunt manually through files, folders, and hard copies looking for the Easter egg.

Many of the pioneering search engines tried explaining their products and services using euphemisms. There was question answering, content intelligence, smart content, predictive retrieval, entity extraction, and dozens and dozens of phrases that sound fine but are very difficult to define; for example, knowledge management and the phrase “enterprise search” itself or “image recognition” or “predictive analytics”, among others.

I had a hearty chuckle when I read “Don’t Sell a Product, Sell a Whole New Way of Thinking.” Search has been available for at least 50 years. Think RECON, Orbit, Fulcrum Technologies, BASIS, Teratext, and other artifacts of search and retrieval. Smart folks even cooked up the computationally challenged Delphes system, the metasearch system Vivisimo, and the essentially unknown Quertle.

A romp through these firms’ marketing collateral, PowerPoints, and PDFs makes clear that no buzzword has been left untried. Buyers did not and do not know what the systems actually delivered. This is evidence that search vendors have not been able to “sell a whole new way of thinking.”

No kidding. The synonyms search marketers have used in order to generate interest and hopefully a sale are a catalog of information technology jargon. Here is a short list of some of the terms from the 1990s:

  • Business intelligence
  • Competitive intelligence
  • Content governance
  • Content management
  • Customer support then customer relationship management.
  • Knowledge management
  • Neurodynamics
  • Text analytics

If I accept the Harvard analysis, the failing of enterprise search is not financial fiddling and jargon. As you may recall, Microsoft paid $1.2 billion for Fast Search & Transfer. The investigation into allegations of financial fancy dancing was resolved recently with one executive facing a possible jail term and employment restrictions. There are other companies that tried to blend search with content only to find that the combination was not quite like peanut butter and jelly. Do you use Factiva or Ebsco? Did I hear a “what?” Other companies embraced slick visualizations to communicate key information at a glance. Do you remember Grokker? There was semantic search. Do you recollect Siderean Software?

One success story was Oingo, renamed Applied Semantics. Google understood the value of mapping words to ads and purchased the company to further its non search goals of generating ad revenue.

According to the HBR:

To find the shift, ask yourself a few questions. What was the original insight that led to the innovation? Where do you feel people “don’t get it” about your solution? What is the “aha” moment when someone turns from disinterested to enthusiastic?

Those who code up search systems are quite bright. Is this pat formula of shifting thinking the solution to the business challenges these firms face? Consider these vendors:

Attivio. Founded by Fast Search & Transfer alums, the company has ingested more than $35 million in venture funding. The company’s positioning is “an actionable 360 degree view of anything you need.” Okay. Dassault Exalead used the same line several years ago.

Coveo. The company has tapped venture firms for more than $30 million since the firm’s founding in 2004. Coveo uses the phrase “enterprise search” and wraps it in knowledge workers, customer service, engineering, and CRM. The idea is that Coveo delivers solutions tailored to specific business functions and employee roles.

SRCH2. This is a Xoogler founded company that, like Perfect Search before it, emphasizes speed. The pitch is that its alternative is better than open source search solutions.

Lucid Works. Like Vivisimo, Lucid Works has embraced Big Data and the cloud. The only slow downs Lucid has encountered have been turnover in CEOs, marketing, and engineering professionals. The most recent hurdle to trip up Lucid is the interest in ElasticSearch, fat with almost $100 million in venture funding and developers from the open source community.

IBM Watson. Based on open source and home grown technology, IBM’s marketers have showcased Watson on Jeopardy and garnered headlines for the $1 billion investment IBM is making in its “smart” information processing system. The most recent demonstration of Watson was producing a recipe for Bon Appetit readers.

Amazon’s search approach is to provide it as a service to those using Amazon Web Services. Search is, in my mind, just a utility for Amazon. Amazon’s search system on its eCommerce site is not particularly good. Want to NOT out (exclude) books not yet available on the system? Well, good luck with that query.

After I stopped chuckling, I realized that the Harvard article is less concerned with precision and recall than advocating deception, maybe cleverness. No enterprise search vendor has approached Autonomy’s revenues with the sole exception of Google’s licensing of the wildly expensive Google Search Appliance. At the time of its sale to Oracle, Endeca was chugging along at an estimated $150 million in revenue. Oracle paid about $1 billion for Endeca. With that benchmark, name another enterprise search vendor or eCommerce search vendor that has raced past Endeca. For the majority of enterprise search vendors, revenues of $3 to $10 million represent very significant achievements.

An MBA who takes over an enterprise search company may believe that wordsmithing will make sales. Sure, some sales may result, but will the revenue be sustainable? Most enterprise search sales are a knee jerk reaction to problems with the incumbent search system.

Without concrete positive case studies, talking about search is sophistry. There are comparatively few specific return on investment analyses for enterprise search installations. I provided a link to a struggling LinkedIn person about an Italian library’s shift from the 1960s BASIS system to a Google Search Appliance.

Is enterprise search an anomaly in business software? Will the investment firms get their money back from their investments in search and retrieval?

Ask a Harvard MBA steeped in the lore of selling a whole new way of thinking. Ignore 50 years of search history. Success in search is difficult to achieve. Duplicity won’t do the job.

Stephen E Arnold, July 19, 2014

Jepsen-Testing Elasticsearch for Safety and Data Loss

July 18, 2014

The article titled Call Me Maybe: Elasticsearch on Aphyr explores potential issues with Elasticsearch. Jepsen is a section of Aphyr that tests the behaviors of different technology and software under types of network failure. Elasticsearch is built on the solid Java indexing library Apache Lucene. The article begins with an overview of how Elasticsearch scales through sharding and replication.

“The document space is sharded–sliced up–into many disjoint chunks, and each chunk allocated to different nodes. Adding more nodes allows Elasticsearch to store a document space larger than any single node could handle, and offers quasilinear increases in throughput and capacity with additional nodes. For fault-tolerance, each shard is replicated to multiple nodes. If one node fails or becomes unavailable, another can take over…Because index construction is a somewhat expensive process, Elasticsearch provides a faster database backed by a write-ahead log.”
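
For readers who want to see where those knobs live, here is a minimal sketch of setting shard and replica counts at index creation time through the standard Elasticsearch REST API. The host, index name, and counts are placeholders, not values from the Aphyr tests.

    import json
    import requests

    # Create an index whose document space is cut into five primary shards,
    # each shard carrying one replica for fault tolerance. Host and index
    # name are illustrative; adjust for your own cluster.
    settings = {"settings": {"number_of_shards": 5, "number_of_replicas": 1}}

    response = requests.put(
        "http://localhost:9200/documents",
        data=json.dumps(settings),
        headers={"Content-Type": "application/json"},
    )
    print(response.json())  # {'acknowledged': True} on success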

Over a series of tests (with results summarized by delightful Barbie and Ken doll memes), the article decides that while version control may be considered a “lost cause,” Elasticsearch handles inserts superbly. For more information on how Elasticsearch behaved through speed bumps, building a nemesis, nontransitive partitions, needless data loss, random and fixed transitive partitions, and more, read the full article. It ends with recommendations for Elasticsearch and for users, and concedes that the post provides far more information on Elasticsearch than anyone would ever desire.

Chelsea Kerwin, July 18, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Will Germany Scrutinize Google Web Search More Closely?

July 14, 2014

Several years ago, I learned a hard-to-believe factoid. In Denmark, 99 percent of referrals to a major financial service firm’s Web site came via Google. Figuring prominently was Google.de. My contact mentioned that the same traffic flow characterized the company’s German affiliate; that is, if an organization wanted Web traffic, Google was then the only game in town.

I no longer follow the flips and flops of Euro-centric Google killers like Quaero. I have little or no interest in assorted German search revolutions whether from the likes of the Weitkämper Clustering Engine or the Intrafind open source play or the Transinsight Enterprise Semantic Intelligence system. Although promising at one time, none of these companies offers an information retrieval system that could supplant Google for German language search. Toss in English and the other languages Google supports, and the likelihood of a German Google killer decreases.

I read “Germany Is Looking to Regulate Google and Other Technology Giants.” I found the write up interesting and thought provoking. I spend some time each day contemplating the search and content processing sectors. I don’t pay much attention to the wider world of business and technology.

The article states:

German officials are planning to clip the wings of technology giants such as Google through heavier regulation.

That seems cut and dried. I also noted this statement:

The German government has always been militant in matters of data protection. In 2013, it warned consumers against using Microsoft’s Windows 8 operating system due to perceived security risks, suggesting that it provided a back door for the US National Security Agency (NSA). Of course, this might have had something to do with the fact that German chancellor Angela Merkel was one of the first high-profile victims of NSA surveillance, with some reports saying that the NSA hacked her mobile phone for over a decade.

My view is that search and content processing may be of particular interest. After all, who wants to sit and listen to a person’s telephone calls? I would convert the speech to text and hit the output with one of the many tools available to attach metadata, generate relationship maps, and tug out entities like code words and proper names. Then I would browse the information using an old fashioned tabular report. I am not too keen on the 1959 Cadillac tail fin visualizations that 20 somethings find helpful, but to each his or her own I say.

Scrutiny of Google’s indexing might reveal some interesting things to the team assigned to ponder Google from macro and micro levels. The notion of timed crawls, the depth of crawls, the content parsed and converted to a Guha type semantic store, the Alon Halevy dataspace, and other fascinating methods of generating meta-information might be of interest to the German investigate-the-US-vendors team.

My hunch is that scrutiny of Google is likely to lead to increased concern about Web indexing in general. That means even the somewhat tame Bing crawler and the other Web indexing systems churning away at “public” sites’ content may be of interest.

When it comes to search and retrieval, ignorance and bliss are bedfellows. Once a person understands the utility of the archives, the caches, and the various “representations” of the spidered and parsed source content, bliss may become FUD (a version of IBM’s fear, uncertainty and doubt method). FUD may create some opportunities for German search and retrieval vendors. Will these outfits be able to respond or will the German systems remain in the province of Ivory Tower thinking?

In the short term, life will be good for the law firms representing some of the non German Web indexing companies. I wonder, “Is the Google Germany intercept matter included in the young attorneys’ legal education in Germany?”

Stephen E Arnold, July 14, 2014

Search, Not Just Sentiment Analysis, Needs Customization

July 11, 2014

One of the most widespread misperceptions in enterprise search and content processing is “install and search.” Anyone who has tried to get a desktop search system like X1 or dtSearch to do what the user wants with his or her files and network shares knows that fiddling is part of the desktop search game. Even a basic system like Sow Soft’s Effective File Search requires configuring the targets to query for every search in multi-drive systems. The work arounds are not for the casual user. Just try making a Google Search Appliance walk, talk, and roll over without the ministrations of an expert like Adhere Solutions. Don’t take my word for it. Get your hands dirty with information processing’s moving parts.

Does it not make sense that a search system destined for serving a Fortune 1000 company requires some additional effort? How much more time and money will an enterprise class information retrieval and content processing system require than a desktop system or a plug-and-play appliance?

How much effort is required for these tasks? There is work to get the access controls working as the ever alert security manager expects. Then there is the work needed to get the system to access, normalize, and process content for the basic index. Then there is work for getting the system to recognize, acquire, index, and allow a user to access the old, new, and changed content. Then one has to figure out what to tell management about rich media, content for which additional connectors are required, the method for locating versions of PowerPoints, Excels, and Word files. Then one has to deal with latencies, flawed indexes, and dependencies among the various subsystems that a search and content processing system includes. There are other tasks as well like interfaces, work flow for alerts, yadda yadda. You get the idea of the almost unending stream of dependent, serial “thens.”

When I read “Why Sentiment Analysis Engines need Customization”, I felt sad for licensees fooled by marketers of search and content processing systems. Yep, sad as in sorrow.

Is it not obvious that enterprise search and content processing is primarily about customization?

Many of the so called experts, advisors, and vendors illustrate these common search blind spots:

ITEM: Consulting firms that sell my information under another person’s name, ensuring that clients are likely to get a wild and wooly view of reality. Example: Check out IDC’s $3,500 version of information based on my team’s work. Here’s the link for those who find that big outfits help themselves to expertise and then identify a person with a fascinating employment and educational history as the AUTHOR.

[image: Amazon listing for the $3,500 IDC report based on my team’s work]

See http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=idc%20attivio

In this example from http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=idc%20attivio, notice that my work is priced at seven times that of a former IDC professional. Presumably Mr. Schubmehl recognized that my value was greater than that of an IDC sole author and priced my work accordingly. Fascinating because I do not have a signed agreement giving IDC, Mr. Schubmehl, or IDC’s parent company the right to sell my work on Amazon.

This screen shot makes it clear that my work is identified as that of a former IDC professional, a fellow from upstate New York, an MLS on my team, and a Ph.D. on my team.

[image: Amazon screen shot identifying the authors]

See http://amzn.to/1ner8mG.

I assume that IDC’s expertise embraces the level of expertise evident in the TechRadar article. Should I trust a company that sells my content without a formal contract? Oh, maybe I should ask this question, “Should you trust a high profile consulting firm that vends another person’s work as its own?” Keep that $3,500 price in mind, please.

ITEM: The TechRadar article is written by a vendor of sentiment analysis software. His employer is Lexalytics / Semantria (once a unit of Infonics). He writes:

High quality NLP engines will let you customize your sentiment analysis settings. “Nasty” is negative by default. If you’re processing slang where “nasty” is considered a positive term, you would access your engine’s sentiment customization function, and assign a positive score to the word. The better NLP engines out there will make this entire process a piece of cake. Without this kind of customization, the machine could very well be useless in your work. When you choose a sentiment analysis engine, make sure it allows for customization. Otherwise, you’ll be stuck with a machine that interprets everything literally, and you’ll never get accurate results.

When a vendor describes “natural language processing” with the phrase “high quality” I laugh. NLP is a work in progress. But the stunning statement in this quoted passage is:

Otherwise, you’ll be stuck with a machine that interprets everything literally, and you’ll never get accurate results.

Amazing, a vendor wrote this sentence. Unless a licensee of a “high quality” NLP system invests in customizing, the system will “never get accurate results.” I quite like that categorical never.
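
Customization itself is not mysterious. A toy version of the idea, a lexicon based scorer with per domain overrides, fits in a few lines. This is a generic sketch of the technique, not Lexalytics / Semantria’s API; the lexicon and scores are invented.

    # Baseline lexicon: "nasty" reads as negative by default (invented scores).
    BASE_LEXICON = {"nasty": -1.0, "great": 1.0, "broken": -1.0}

    def sentiment_score(text, overrides=None):
        """Sum word scores; the overrides dict is where domain tuning happens."""
        lexicon = dict(BASE_LEXICON)
        if overrides:
            lexicon.update(overrides)
        return sum(lexicon.get(word, 0.0) for word in text.lower().split())

    review = "that bass line is nasty"
    print(sentiment_score(review))                  # -1.0: literal reading
    print(sentiment_score(review, {"nasty": 1.0}))  # +1.0: slang praise after tuning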

ITEM: Sentiment analysis is a single, usually complex component of a search or content processing system. A person on the LinkedIn enterprise search group asked the few hundred “experts” in the discussion group for examples of successful enterprise search systems. If you are a member in good standing of LinkedIn, you can view the original query at this link. [If the link won’t work, talk to LinkedIn. I have no idea how to make references to my content on the system work consistently over time.] I pointed out that enterprise search success stories are harder to find than reports of failures. Whether the flop is at the scale of the HP/Autonomy acquisition or a more modest termination like Overstock’s dumping of a big name system, the “customizing” issue is often present. Enterprise search and content processing is usually:

  • A box of puzzle pieces that requires time, expertise, and money to assemble in a way that attracts and satisfies users and the CFO
  • A work in progress that must be made to work so users are happy, and in a manner that does not force another search procurement cycle, the firing of the person responsible for the search and content processing system, or the legal fees related to the invoices submitted by the vendor whose system does not work. (Slow or no payment of license and consulting fees to a search vendor can be fatal to the search firm’s health.)
  • A source of friction among those contending for infrastructure resources. What I am driving at is that a misconfigured search system makes some computing work S-L-O-W. Note: the performance issue must be addressed for appliance-based, cloud, or on premises enterprise search.
  • Money. Don’t forget money, please. Remember the CFO’s birthday. Take her to lunch. Be really nice. The cost overruns that plague enterprise search and content processing deployments and operations will need all the goodwill you can generate.

If sentiment analysis requires customizing and money, take out your pencil and estimate how much it will cost to make NLP and sentiment analysis work. Now do the same calculation for relevancy tuning, index tuning, optimizing indexing and query processing, etc.

The point is that folks who get a basic keyword search and retrieval system to work pile on the features and functions. Vendors whip up some wrapper code that makes it possible to do a demo of customer support search, eCommerce search, voice search, and predictive search. Once the licensee inks the deal, the fun begins. The reason one major Norwegian search vendor crashed and burned is that licensees balked at paying bills for a next generation system that was not what the PowerPoint slides described. Why has IBM embraced open source search? Is one reason to trim the cost of keeping the basic plumbing working reasonably well? Why are search vendors embracing every buzzword that comes along? I think that search as an enterprise function has become a very difficult thing to sell, make work, and turn into an evergreen revenue stream.

The TechRadar article underscores the danger for licensees of over hyped systems. The consultants often surf on the expertise of others. The vendors dance around the costs and complexities of their systems. The buzzwords obfuscate.

What makes this article by the Lexalytics’ professional almost as painful as IDC’s unauthorized sale of my search content is this statement:

You’ll be stuck with a machine that interprets everything literally, and you’ll never get accurate results.

I agree with this statement.

Stephen E Arnold, July 11, 2014

Information Manipulation: Accountability Pipe Dream

July 5, 2014

I read an article with what I think is the original title: “What does the Facebook Experiment Teach us? Growing Anxiety About Data Manipulation.” I noted that the title presented on Techmeme was “We Need to Hold All Companies Accountable, Not Just Facebook, for How They Manipulate People.” In my view, this mismatch of titles is a great illustration of information manipulation. I doubt that the writer of the improved headline is aware of the irony.

The ubiquity of information manipulation is far broader than Facebook twirling the dials of its often breathless users. Navigate to Google and run this query:

cloud word processing

Note anything interesting in the results list displayed for me on my desktop computer:

[image: Google results for the query “cloud word processing”]

The number one ad is for Google. In the first page of results, Google’s cloud word processing system is listed three more times. I did not spot Microsoft Office in the cloud except in item eight: Is Google Docs Making Microsoft Word Redundant?

For most Google search users, the results are objective. No distortion evident.

Here’s what Yandex displays for the same query:

[image: Yandex results for the same query]

No Google word processing and no Microsoft word processing whether in the cloud or elsewhere.

When it comes to searching for information, the notion that a Web indexing outfit is displaying objective results is silly. The Web indexing companies are in the forefront of distorting information and manipulating users.

Flash back to the first year of the Bush administration when Richard Cheney was vice president. I was in a meeting where we considered a request to make sure that the vice president’s office Web site would appear prominently in FirstGov.gov hits. This, gentle reader, is a request that calls for hit boosting. The idea is to write a script or configure the indexing plumbing to make darned sure a specific url or series of documents appears when and where they are required. No problem, of course. We created a stored query for the Fast Search & Transfer search system and delivered what the vice president wanted.

This type of results manipulation is more common than most people accept. Fiddling Web search, like shaping the flow of content on a particular semantic vector, is trivial. Search engine optimization is a fools’ game compared with the tried and true methods of weighting or just buying real estate on a search results page or a Web site from a “real” company.
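
The mechanics are not exotic. The sketch below re-ranks a result list so a favored url floats to the top. It is a generic illustration with made up scores and urls, not the stored query we built in Fast Search & Transfer.

    def boost(results, favored_url, bonus=100.0):
        """Re-rank (score, url) pairs, padding the score of the favored url."""
        adjusted = [(score + bonus if url == favored_url else score, url)
                    for score, url in results]
        return sorted(adjusted, reverse=True)

    hits = [(12.4, "http://example.gov/energy"),
            (9.1, "http://example.gov/vp-office"),
            (7.7, "http://example.gov/forms")]
    print(boost(hits, "http://example.gov/vp-office"))  # vp-office now ranks first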

The notion that disinformation, reformation, and misinformation will be identifiable, rectified, and used to hold companies accountable is not just impossible. The notion itself reveals how little awareness there is of how the actual methods of digital content injection work.

How much of the content on Facebook, Twitter, and other widely used social networks is generated by intelligence professionals, public relations “professionals,” and folks who want to be perceived as intellectual luminaries? Whatever your answer, what data do you have to back up your number? At a recent intelligence conference in Dubai, one specialist estimated that half of the traffic on social networks is shaped or generated by law enforcement and intelligence entities. Do you believe that? Probably not. So good for you.

Amusing, but as someone once told me, “Ignorance is bliss.” So, hello, happy idealists. The job is identifying, interpreting, and filtering. Tough, time consuming work. Most of the experts prefer to follow the path of least resistance and express shock that Facebook would toy with its users. Be outraged. Call for action. Invent an algorithm to detect information manipulation. Let me know how that works out when you look for a restaurant and it is not findable from your mobile device.

Stephen E Arnold, July 5, 2014

AeroText: A New Breakthrough in Entity Extraction

June 30, 2014

I returned from a brief visit to Europe to an email asking about Rocket Software’s breakthrough technology AeroText. I poked around in my archive and found a handful of nuggets about the General Electric Laboratories’ technology that migrated to Martin Marietta, then to Lockheed Martin, and finally in 2008 to the low profile Rocket Software, an IBM partner.

When did the text extraction software emerge? Is Rocket Software AeroText a “new kid on the block”? The short answer is that AeroText is pushing 30, maybe 35 years young.

Digging into My Archive of Search Info

As far as my archive goes, it looks as though the roots of AeroText are anchored in the 1980s. Yep, that works out to an innovation about the same age as the long in the tooth ISYS Search system, now owned by Lexmark. Over the years, the AeroText “product” has evolved, often in response to US government funding opportunities. The precursor to AeroText was an academic exercise at General Electric. Keep in mind that GE makes jet engines, so GE at one time had a keen interest in anything its aerospace customers in the US government thought was a hot tamale.


The AeroText interface circa mid 2000. On the left is the extraction window. On the right is the document window. From “Information Extraction Tools: Deciphering Human Language,” IT Pro, November-December 2004, page 28.

The GE project, according to my notes, appeared as NLToolset, although my files contained references to different descriptions such as Shogun. GE’s team of academics and “real” employees developed a bundle of tools for its aerospace activities and in response to Tipster. (As a side note, in 2001, there were a number of Tipster related documents in the www.firstgov.gov system. But the new www.usa.gov index does not include that information. You will have to do your own searching to unearth these text processing jump start documents.)

The aerospace connection is important because the Department of Defense in the 1980s was trying to standardize on markup for documents. Part of this effort was processing content like technical manuals and various types of unstructured content to figure out who was named, what part was what, and what people, places, events, and things were mentioned in digital content. The utility of NLToolset type software lay in the cost reduction associated with documents and the intelligence value of processed information.

The need for a markup system that worked without 100 percent human indexing was important. GE got with the program and appears to have assigned some then-young folks to the project. The government speak for this type of content processing involves terms like “message understanding” or MU, “entity extraction,” and “relationship mapping.” The outputs of an NLToolset system were intended for use in other software subsystems that could count, process, and perform other operations on the tagged content. Today, this class of software would be packaged under a broad term like “text mining.” GE exited the business, which ended up in the hands of Martin Marietta. When the technology landed at Martin Marietta, the suite of tools was used in what was called, in the late 1980s and early 1990s, the Louella Parsing System. When Lockheed and Martin merged to form the giant Lockheed Martin, Louella was renamed AeroText.

Over the years, the AeroText system competed with LingPipe, SRA’s NetOwl and Inxight’s tools. In the heyday of natural language processing, there were dozens and dozens of universities and start ups competing for Federal funding. I have mentioned in other articles the importance of the US government in jump starting the craziness in search and content processing.

In 2005, I recall that Lockheed Martin released AeroText 5.1 for Linux, but I have lost track of the open source versions of the system. The point is that AeroText is not particularly new, and as far as I know, the last major upgrade took place in 2007 before Lockheed Martin sold the property to Rocket Software. At the time of the sale, AeroText incorporated a number of subsystems, including a useful time plotting feature. A user could see tagged events on a timeline, a function long associated with the original version of i2’s Analyst’s Notebook. A US government buyer can obtain AeroText via the GSA because Lockheed Martin seems to be a reseller of the technology. Before the sale to Rocket, Lockheed Martin followed SAIC’s push into Australia. Lockheed signed up NetMap Analytics to handle Australia’s appetite for US government accepted systems.

AeroText Functionality

What does AeroText purport to do that caused the person who contacted me to see a 1980s technology as the next best thing to sliced bread?

AeroText is an extraction tool; that is, it has capabilities to identify and tag entities at somewhere between 50 percent and 80 percent accuracy. (See NIST 2007 Automatic Content Extraction Evaluation Official Results for more detail.)

The AeroText approach uses knowledgebases, rules, and patterns to identify and tag pre-specified types of information. AeroText references patterns and templates, both of which assume the licensee knows beforehand what is needed and what will happen to processed content.
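
To show what a rule and knowledgebase approach looks like in miniature, here is a skeletal tagger: a gazetteer of known names plus surface patterns. This is a hypothetical sketch of the general technique, not AeroText’s code; the entries and patterns are invented, and production knowledgebases are far larger and hand curated.

    import re

    # Gazetteer: pre-specified names mapped to entity types (invented entries).
    GAZETTEER = {"Lockheed Martin": "ORGANIZATION", "General Electric": "ORGANIZATION"}

    # Rules: surface patterns for entity types the licensee expects to find.
    RULES = [
        (re.compile(r"\b(?:Mr|Ms|Dr)\.\s+[A-Z][a-z]+"), "PERSON"),
        (re.compile(r"\b\d{1,2}\s+(?:January|February|March|April|May|June|July|"
                    r"August|September|October|November|December)\s+\d{4}\b"), "DATE"),
    ]

    def tag(text):
        entities = [(name, label) for name, label in GAZETTEER.items() if name in text]
        for pattern, label in RULES:
            entities.extend((m.group(0), label) for m in pattern.finditer(text))
        return entities

    print(tag("Dr. Smith joined Lockheed Martin on 12 March 2008."))
    # [('Lockheed Martin', 'ORGANIZATION'), ('Dr. Smith', 'PERSON'), ('12 March 2008', 'DATE')]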

In my view, the licensee has to know what he or she is looking for in order to find it. This is a problem captured in the famous snippet, “You don’t know what you don’t know” and the “unknown unknowns” variation popularized by Donald Rumsfeld. Obviously without prior knowledge the utility of an AeroText-type of system has to be matched to mission requirements. AeroText pounded the drum for the semantic Web revolution. One of AeroText’s key functions was its ability to perform the type of markup the Department of Defense required of its XML. The US DoD used a variant called DAML, or DARPA Agent Markup Language. Natural language processing, Louella, and AeroText collected the dust of SPARQL, unifying logic, RDF, OWL, ontologies, and other semantic baggage as the system evolved through time.

Also, staff (headcount) and on-going services are required to keep a Louella/AeroText-type system generating relevant and usable outputs. AeroText can find entities, figure out relationships like person to person and person to organization, and tag events like a merger or an arrest “event.” In one briefing about AeroText I attended, I recall that the presenter emphasized that AeroText did not require training. (The subtext for those in the know was that Autonomy required training to deliver actionable outputs.) The presenter did not dwell on the need for manual fiddling with AeroText’s knowledgebases, and I did not raise this issue.

