LinkedIn Enterprise Search: Generalizations Abound
November 11, 2014
Three or four days ago I received a LinkedIn message that a new thread had been started on the Enterprise Search Engine Professionals group. You will need to be a member of LinkedIn and do some good old fashioned brute force search to locate the thread with this headline, “Enterprise Search with Chinese, Spanish, and English Content.”
The question concerned a LinkedIn user information vacuum job. A member of the search group wanted recommendations for a search system that would deliver “great results with content outside of English.” Most of the intelligence agencies have had this question in play for many years.
The job hunters, consultants, and search experts who populate the forum do not step forth with intelligence agency type responses. In a decision-making environment where inputs in a range of languages are the norm for risk-averse organizations, the suggestions offered to the LinkedIn member struck me as wide of the mark. I wouldn’t characterize the answers as incorrect. Uninformed or misinformed are candidate adjectives, however.
One suggestion offered to the questioner was a request to define “great.” Like love and trust, great is fuzzy and subjective. The definition of “great,” according to the expert asking the question, boils down to “precision, mainly that the first few results strike the user as correct.” Okay, the user must perceive results as “correct.” But as ambiguous as this answer remains, the operative term is precision.
In search, precision is not fuzzy. Precision has a definition that many students of information retrieval commit to memory and then include in various tests, papers, and public presentations. For a workable definition, see Wikipedia’s take on the concept or L. Egghe’s “The Measures Precision, Recall, Fallout, and Miss As a Function of the Number of Retrieved Documents and Their Mutual Interrelations,” Universiteit Antwerp, 2000.
In simple terms, the system matches the user’s query. The results are those that the system determines to contain terms identical or statistically close to the terms of the user’s query. Old school brute force engines relied on string matching. Think RECON. More modern search systems toss in term matching after truncation, nearness of the terms used in the user query to the occurrence of terms in the documents, and dozens of other methods to determine likely relevant matches between the user’s query and the document set’s index.
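For readers who have not committed the definitions to memory, precision and recall reduce to a few lines of code. This is a minimal sketch; the document IDs and relevance judgments are hypothetical examples, not data from any system mentioned here.

```python
# Precision: fraction of retrieved documents that are relevant.
# Recall: fraction of relevant documents that were retrieved.
# Both assume a known corpus with relevance judgments, as in the
# ABI/INFORM style testing described below.

def precision(retrieved, relevant):
    """Fraction of the result set judged relevant."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(retrieved)

def recall(retrieved, relevant):
    """Fraction of the known relevant documents that came back."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)

retrieved = ["doc1", "doc2", "doc3", "doc4"]   # what the engine returned
relevant = ["doc2", "doc4", "doc7"]            # judged relevant in the test corpus

print(precision(retrieved, relevant))  # 0.5 (2 of 4 retrieved are relevant)
print(recall(retrieved, relevant))     # 0.666... (2 of 3 relevant retrieved)
```

The catch, as the next paragraphs explain, is that computing these numbers requires knowing what is in the index and which documents are relevant. Absent that, “precision” collapses into user perception.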
With a known corpus like ABI/INFORM in the early 1980s, a trained searcher testing search systems can craft queries for that known result set. Then as the test queries are fed to the search system, the results can be inspected and analyzed. Running test queries was an important part of our analysis of a candidate search system; for example, the long-gone DIALCOM system or a new incarnation of the European Space Agency’s system. Rigorous testing and analysis makes it easy to spot dropped updates or screw ups that routinely find their way into bulk file loads.
Our rule of thumb was that if an ABI/INFORM index contained a term, a high precision result set on SDC ORBIT would include a hit with that term in the respective hit. If the result set did not contain a match, it was pretty easy to pinpoint where the indexing process started dropping files.
However, when one does not know what’s been indexed, precision drifts into murkier areas. After all, how can one know if a result is on point if one does not know what’s been indexed? One can assume that a result set is relevant via inspection and analysis, but who has time for that today? That’s the danger in defining precision as what the user perceives. The user may not know what he or she is looking for. The user may not know the subject area or the entities associated consistently with the subject area. Should anyone be surprised when the user of a system has no clue what a system output “means,” whether the results are accurate, or whether the content is germane to the user’s understanding of the information needed?
Against this somewhat drab backdrop, the suggestions offered to the LinkedIn person looking for a search engine that delivers precision over non-English content or more accurately content that is not the primary language of the person doing a search are revelatory.
Here are some responses I noted:
- Hire an integrator (Artirix, in this case) and let that person use the open source Lucene based Elasticsearch system to deliver search and retrieval. Sounds simplistic. Yep, it is a simple answer that ignores source language translation, connectors, index updates, and methods for handling the pesky issues related to how language is used. Figuring out what a source document says in a language in which the user is not fluent is fraught with challenges. Forget dictionaries. Think about the content processing pipeline. Search is almost the caboose at the end of a very long train.
- Use technology from LinguaSys. This is a semantic system that is probably not well known outside of a narrow circle of customers. This is a system with some visibility within the defense sector. Keep in mind that it performs some of the content processing functions. The technology has to be integrated into a suitable information retrieval system. LinguaSys is the equivalent of adding a component to a more comprehensive system. Another person mentioned BASIS Technologies, another company providing multi language components.
- Rely on LucidWorks. This is an open source search system based on Solr. The company has spun the management revolving door a number of times.
- License Dassault’s Exalead system. The idea is worth considering, but how many organizations are familiar with Exalead or willing to embrace the cultural approach of France’s premier engineering firm? After years of effort, Exalead is not widely known in some pretty savvy markets. But the Exalead technology is not 100 percent Exalead. Third party software delivers the goods, so Exalead is an integrator in my view.
- Embrace the Fast Search & Transfer technology, now incorporated into Microsoft SharePoint. Unmentioned is the fact that Fast Search relied on a herd of human linguists in Germany and elsewhere to keep its 1990s multi lingual system alive and well. Fast Search, like many other allegedly multi lingual systems, relies on rules, and these have to be written, tweaked, and maintained.
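The “content processing pipeline” point in the first suggestion deserves a concrete illustration: before anything reaches the index, each document has to be assigned a language so the right tokenization and translation steps run. The toy below identifies language by stopword overlap. The word lists are tiny illustrative samples I made up, not a production approach and not how any of the vendors named above actually work.

```python
# Toy language identification by stopword overlap. This is the kind of
# decision that must happen long before search: the assigned language
# drives tokenization, translation, and indexing downstream.
# The stopword lists are small, hypothetical samples for illustration.

STOPWORDS = {
    "english": {"the", "and", "of", "is", "in"},
    "spanish": {"el", "la", "de", "es", "en", "y", "los"},
}

def detect_language(text):
    """Return the language whose stopword list overlaps the text most."""
    words = set(text.lower().split())
    scores = {lang: len(words & sw) for lang, sw in STOPWORDS.items()}
    return max(scores, key=scores.get)

print(detect_language("the results of the query"))         # english
print(detect_language("la calidad de los resultados es"))  # spanish
```

Multiply this one decision by truncation rules, synonym lists, entity extraction, and connector quirks per language, and the caboose metaphor starts to look generous.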
So what did the LinkedIn member learn? The advice offers one popular approach: Hire an integrator and let that company deliver a “solution.” One can always fire an integrator, sue the integrator, or go to work for the integrator when the CFO tries to cap the cost of a system that must please a user who may not know the meaning of nus in Japanese from a now almost forgotten unit of Halliburton.
The other approach is to go open source. Okay. Do it. But as my analysis of the Danish Library’s open source search initiative in Online suggested, the work is essentially never done. Only a tolerant government and lax budget oversight make this avenue feasible for many organizations with a search “problem.”
The most startling recommendation was to use Fast Search technology. My goodness. Are there not other multi lingual capable search systems dating from the 1990s available? Autonomy, anyone?
Net net: The LinkedIn enterprise search threads often underscore one simple fact:
Enterprise search is assumed to be one system, an app if you will.
One reason for the frequent disappointment with enterprise search is this desire to buy an iPad app, not engineer a constellation of systems that solve quite specific problems.
Stephen E Arnold, November 11, 2014
Enterprise Search: Essentially Marginalized to Good Enough
November 9, 2014
I use Google Trends to see what’s hot and what’s not in the world of information retrieval. If you want to use the free version of Google Trends, navigate to http://www.google.com/trends/ and explore. That’s some of what Google does to make decisions about how much of Larry Page’s “wood” to put behind the Google Search Appliance eight ball.
I plugged in “enterprise search.” When one allows Google to output its version of the popularity of the term, you get this graph. It shows a downward trend but the graph is without much context. The pale lettering does not help. Obviously Googlers do not view the world through trifocals with 70 year old eyes. Here’s the Trends’ output for “enterprise search”:
Now let’s add some context. From the “enterprise search” Trends’ output, click the pale blue plus and add this with quotes: “big data.” Here’s the output for this two factor analysis:
One does not have to be an Ivy League data scientist to see the difference between the hackneyed “enterprise search” and the more zippy but meaningless “Big Data.” I am not saying Big Data solutions actually work. What’s clear is that pushing enterprise search is not particularly helpful when the Trends data reveal a flat line for years, not hours, not days, not months–years.
I think it is pretty clear why I can assert with confidence that “enterprise search” appears to be a non starter. I know why search vendors persist in telling me what “enterprise search” is. The vendors are desperate to find the grip that a Tupinambis lizard possesses. Instead of clinging to a wall in the sun at 317 R. Dr. Emílio Ribas (Cambui) (where I used to live in Campinas, SP), the search vendors are clinging to a chimera. The goal is to make sales, but if the Google data are even sort of correct, enterprise search is flat lining.
Little wonder that consultant reports like those from the mid tier crowd try to come up with verbiage that will create sales leads for the research sponsors; case in point, knowledge quotient. See Meme of the Moment for a fun look at IDC’s and search “expert” Dave Schubmehl’s most recent attempt to pump up the music.
The question is, “What is generating revenue?” In a sense, excitement surrounds vendors who deliver solutions. These include search, increasingly supplied by open source software. Elasticsearch is zipping along, but search is not the main dish. Search is more like broccoli or carrots.
The good news is that there is a group of companies, numbering about 30, which have approached search differently. As a result, many of these companies are growing and charting what I call “next generation search.”
Want to know more? Well, that’s good. Watch for my coverage of this sector in the weeks and months ahead. I will toss a small part of our research into my November Information Today column. A tiny chunk. Keep that in mind.
In the meantime, think critically about the craziness flowing from many mid tier or azure chip consulting firms. Those “outputs” are marketing, self aggrandizing, and, for me, downright silly. What’s that term for doing trivial actions again and again?
Stephen E Arnold, November 9, 2014
Enterprise Search: Is It Really a Loser?
November 5, 2014
I read “Enterprise Search: Despite Benefits, Few Organizations Use Enterprise Search.” The headline caught my attention. In my experience, most organizations have information access systems. Let me give you several recent examples:
- US government agency. This agency licenses technology from a start up called Red Owl Analytics. That system automatically gathers and makes available information pertinent to the licensing agency. One of the options available to the licensee is to process information that is available within the agency. The system generates outputs and there are functions that allow a user to look for information. I am reasonably confident that the phrase “enterprise search” would not be applied to this company’s information access system. Because Red Owl fits into a process for solving a business problem, the notion of “enterprise search” would be inappropriate.
- Small accounting firm. This company uses Microsoft Windows 7. The six person staff uses a “workgroup” method that is easy to set up and maintain. The Windows 7 user can browse the drives to which access has been granted by the part time system administrator. When a person needs to locate a document, the built in search function is used. The solution is good enough. I know that when Windows-centric, third party solutions were made known to the owner, the response was, “Windows 7 search is good enough.”
- Large health care company with dozens of operating units. The company has been working to integrate certain key systems. The largest on-going project is deploying an electronic health care system. Each of the units has legacy search technology. The most popular search systems are those built into applications used every day. Database access is provided by these applications. One unit experimented with a Google Appliance and found that it was useful to the marketing department. Another unit has a RedDot content management system and has an Autonomy stub. The company has no plans, as I understand it, to make federated enterprise search a priority. There is no single reason. Other projects have higher priority and include a search function.
If my experience is representative (and I am not suggesting what I have encountered is what you will encounter), enterprise search is a tough sell. When I read this snippet, I was a bit surprised:
Enterprise search tools are expected to improve and that may improve uptake of the technology. Steven Nicolaou, Principal Consultant at Microsoft, commented that “enterprise search products will become increasingly and more deeply integrated with existing platforms, allowing more types of content to be searchable and in more meaningful ways. It will also become increasingly commoditized, making it less of a dark art and more of a platform for discovery and analysis.”
What this means is that when a company provides “good enough” search baked into an operating system (think Windows) or an application (think about the search function in an electronic health record), there will be little room for a third party to land a deal in most cases.
The focus in enterprise search has been off the mark for many years. In fact, today’s vendors are recycling the benefits and features hawked 30 years ago. I posted a series of enterprise search vendor profiles at www.xenky.com/vendor-profiles. If you work through that information, you will find that the marketing approaches today are little more than demonstrations of recycling.
The opportunity in information access has shifted. The companies making sales and delivering solid utility to licensees are NOT the companies that beat the drum for customer support, indexing, and federated search.
The future belongs to information access systems that fit into mission critical business processes. Until the enterprise search vendors embrace a more innovative approach to information access, their future looks a bit cloudy.
In cooperation with Telestrategies, we may offer a seminar that talks about new directions in information access. The key is automation, analytics, and outputs that alert, not the old model of fiddling with a query until the index is unlocked and potentially useful information is available.
If you want more information about this invitation only seminar, write me at seaky2000 at yahoo dot com, and I will provide more information.
Stephen E Arnold, November 5, 2014
New IBM Redbook: IBM Watson Enterprise Search and Analytics
October 12, 2014
The Redbook is free. You can download it from this IBM link for now. The full title is “IBM Watson Content Analytics. Discovering Actionable Insight from Your Content.”
The Redbook weighs in with 598 pages of Watson goodness. If you follow the IBM content analytics products, you may know that the previous version was known as IBM Content Analytics with Enterprise Search (ICAwES).
The Redbook presents some philosophical content. IBM has a tradition to uphold. In addition, the Redbook provides information about facets (yep, good old metadata), some mathy features that make analytics analytical, and sentiment analysis.
ICAwES does not operate as an island. The sprawling system can hook into IBM’s semi automatic classification system, Cognos, and interface tools.
Is ICAwES an “enterprise search” system? I would say, “Sure is.” You will have to work through the Redbook and draw your own conclusions. You will also want to identify the Watson component. Watson is Lucene with IBM scripts and wrappers, but IBM has far more colorful lingo for describing the system. After all, IBM Watson is supposed to generate $1 billion in a snappy manner. If IBM’s plan bears revenue fruit, in five or six years, Watson will be a $10 billion per year business. That’s quite a goal, considering Autonomy required 13 years to push into $800 million in revenue territory and IBM has been offering information retrieval systems since the days of STAIRS.
The new information in the July 2014 edition of the Redbook adds a chapter containing some carefully selected case studies. There is a new chapter called “Enterprise Search” to which I will return in a moment. Also, the many authors of the Redbook have added to the discussion of Cognos, one of IBM’s business intelligence systems. Finally, the Redbook provides some helpful suggestions for “customizing and extending the content analytics miner.”
I urge you to work through this volume because it provides a useful yardstick against which to measure the IBM Watson marketing and public relations explanations against the reality, limitations, and complexity of the IBM Content Analytics system. Is the Redbook describing a product or a collection of components that an IBM implementation team will use to craft a customized solution?
The chapter on Enterprise Search begins on page 445 and continues to page 486. The solution is a two part affair. On one hand, processed content will output data about the entities, word frequencies, and similar metrics in the corpus and updates to the corpus. On the other hand, ICAwES is a search and retrieval system. Many vendors take this approach today; however, certain types of content cannot be comprehensively processed by the system. Examples include video content, engineering drawings, digital imagery, and certain types of ephemeral content such as text messages sent via an ad hoc Bluetooth mesh network. One can code up a fix, but that is likely to be more hassle than many licensees will tolerate.
The Redbook shows some ready-to-use interfaces. These can, of course, be modified. The sample in the screenshot below looks quite a bit like the original Fulcrum Technologies’ presentation of information processed by the system. A more modern implementation would be Amazon’s recent JSON centric system for content.
ICAwES Redbook, Copyright IBM 2014.
The illustration shows a record viewed by tags; for example, categories. Items can be tallied in a chart that provides a summary of how many content objects share a particular index term. The illustration shows ICAwES identifying terms in a user’s query, identifying entities like IBM Lotus Domino, and other features associated with Autonomy IDOL or Endeca style systems. Both of these date from the late 1990s, so IBM is not pushing too far from the dirt path carved out of the findability woods by former leaders in enterprise search.
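The facet tally is conceptually simple. This sketch shows the idea behind that kind of chart: count how many content objects carry each index term. The documents and category labels are hypothetical; ICAwES naturally does far more under the hood.

```python
from collections import Counter

# Hypothetical processed documents, each carrying index terms (facets)
# assigned during content processing.
docs = [
    {"id": 1, "categories": ["software", "ibm"]},
    {"id": 2, "categories": ["software"]},
    {"id": 3, "categories": ["hardware", "ibm"]},
]

# Tally how many content objects share each index term, the way a
# facet chart summarizes a result set.
tally = Counter(term for d in docs for term in d["categories"])
print(tally.most_common())  # [('software', 2), ('ibm', 2), ('hardware', 1)]
```

The hard part is not the counting; it is producing consistent index terms in the first place, which is where the rules-based machinery discussed next comes in.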
IBM provides information needed to implement query expansion. Yes, a dictionary lurks within the system, and an interface is provided so the licensee can be like Noah Webster. The system is rules based, and a specialist is needed to create or edit rules. As you may know, rules based systems suffer from several drawbacks. Rules have to be maintained, subject matter experts or programmers are usually required to make the proper judgments, and rules can drift out of phase with the users’ queries unless the system is monitored with above average rigor. Like Autonomy IDOL, skimp on monitoring and tuning, and the system can generate some interesting results.
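Dictionary-driven query expansion can be sketched in a few lines. The synonym entries below are hypothetical examples; in a rules-based system like the one the Redbook describes, a specialist maintains such entries through an interface, and letting them drift is exactly the maintenance risk noted above.

```python
# Minimal sketch of dictionary-driven query expansion. The dictionary
# entries are hypothetical; a real system's specialist-maintained rules
# would be far larger and would need ongoing curation to stay in phase
# with users' queries.

SYNONYMS = {
    "notebook": ["laptop", "portable"],
    "car": ["automobile"],
}

def expand_query(terms):
    """Append dictionary synonyms after each query term."""
    expanded = []
    for term in terms:
        expanded.append(term)
        expanded.extend(SYNONYMS.get(term.lower(), []))
    return expanded

print(expand_query(["notebook", "battery"]))
# ['notebook', 'laptop', 'portable', 'battery']
```

Note what happens when the dictionary goes stale: “notebook” may start matching paper products once the user base shifts vocabulary, and only above average monitoring catches the drift.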
The provided user interface looks like this:
ICAwES Redbook, Copyright IBM 2014.
With many users wanting a “big red button” to simplify information access, this interface brings forward the high density displays associated with TeraText and similar legacy systems. The density seems to include hints of Attivio and BA Insight user interfaces as well. There are many choices available to the user. However, without special training, it is unlikely that a marketing professional using ICAwES will be able to make full use of query trees, category trees, and the numerous icons that appear in four different locations. I can hear the user now, “I want this system to be just like Google. I want to type in three words and scan the results.”
Net net. If you are working in an organization that favors IBM solutions, this system is likely to be what senior management licenses. Keep in mind that ICAwES will require the ministrations of IBM professional services, probably additional headcount, and on-going work to keep the system delivering useful results to users and decision makers.
The system delivers key word search, rich indexing, and basic metrics about the content. IBM offers more robust analytic tools in its SPSS product line. For more comprehensive text analysis, take a look at IBM i2 and Cybertap solutions if your organization has appropriate credentials for these somewhat more sophisticated information access and analysis systems.
After working through the Redbook, I had one question, “Where’s Watson?”
Stephen E Arnold, October 12, 2014
The AIIM Enterprise Search Study 2014
October 10, 2014
I worked through the 34 page report “Industry Watch. Search and Discovery. Exploiting Knowledge, Minimizing Risk.” The report is based on a sampling of 80,000 AIIM community members. The explanation of the process states:
Graphs throughout the report exclude responses from organizations with less than 10 employees, and suppliers of ECM products and services, taking the number of respondents to 353.
The demographics of the sample were tweaked to discard responses from organizations with fewer than 10 employees. The sample included respondents from North America (67 percent), Europe (18 percent) and “rest of world” (15 percent).
Some History for the Young Reader of Beyond Search
AIIM has roots in imaging (photographic and digital imaging). Years ago I spent an afternoon with Betty Steiger, a then well known executive with a high profile in Washington, DC’s technology community. She explained that the association wanted to reach into the then somewhat new technology for creating digital content. Instead of manually indexing microfilm images, AIIM members would use personal computers. I think we connected in 1982 at her request. My work included commercial online indexing, experiments in full text content online, a CD ROM produced in concert with Predicasts and Lotus, and automated indexing processes invented by Howard Flank, a sidekick of mine for a very long time. (Mr. Flank received the first technology achievement award from the old Information Industry Association, now the SIIA.)
AIIM had its roots in the world of microfilm. And the roots of microfilm reached back to University Microfilms at the close of World War II. After the war, innovators wanted to take advantage of the marvels of microimaging and silver-based film. The idea was to put lots of content on a new medium so users could “find” answers to questions.
The problem for AIIM (originally the National Micrographics Association) was indexing. As an officer at a company considered in the 1980s as one of the leaders in online and semi automated indexing methods, Ms. Steiger and I had a great deal to discuss.
But AIIM evokes for me:
Microfilm —> Finding issues —> Digital versions of microfilm —> CD ROMs —> On premises online access —> Finding issues.
I find the trajectory of a microfilm leading to pronouncements about enterprise search, content processing, and eDiscovery fascinating. The story of AIIM is a parallel for the challenges the traditional publishing industry (what I call the “dead tree method”) has, like Don Quixote, galloped, galloped into battle with ones and zeros.
Asking a trade association’s membership for insights about electronic information is a convenient idea. What’s wrong with sampling the membership and others in the AIIM database, discarding those who belong to organizations with fewer than 10 employees, and tallying up the survey “votes”? For most of those interested in search, absolutely nothing. And that may be part of the challenge for those who want to get smart about search, findability, and content processing.
Let’s look at three findings from the 30 plus page study. (I have had to trim because the number of comments and notes I wrote when reading the report is too massive for Beyond Search.)
Finding: 25 percent have no advanced or dedicated search tools. 13 percent have five or more [advanced or dedicated search tools].
Talk about good news for vendors of findability solutions. If one thinks about the tens of millions of organizations in the US, one just discards the 10 percent with 10 or fewer employees, and there are apparently quite a large percentage with simplistic tools. (Keep in mind that there are more small businesses than large businesses by a very wide margin. But that untapped market is too expensive for most companies to penetrate with marketing messages.) The study encourages the reader to conclude that a bonanza awaits the marketer who can identify these organizations and convince them to acquire an advanced or dedicated search tool.

There is a different view. The research Arnold IT (owner of Beyond Search) has conducted over the last couple of decades suggests that this finding conveys some false optimism. For example, in the organizations and samples with which we have worked, we found almost 90 percent saturation of search. The one on one interviews reveal that many employees were unaware of the search functions available for the organization’s database system or specialized tools like those used for inventory, the engineering department with AutoCAD, or customer support. So search systems with advanced features are in fact in most organizations. A survey of a general population reveals a market that is quite different from what the chief financial officer perceives when he or she tallies up the money spent for software that includes a search solution.

But content like the engineering department’s drawings and specifications, the legal department’s confidential documents, the HR unit’s employee health data, and the Board of Directors’ documents revealing certain financial and management topics has to remain in silos. There is, we have found, neither an appetite to gather these data nor the money to figure out how to make images and other types of data searchable from a single system.
Far better to use a text oriented metasearch system and dismiss data from proprietary systems, images, videos, mobile messages, etc. We know that most organizations have search systems about which most employees know nothing. When an organization learns about these systems and then gets an estimate to creating one big federated system, the motivation drains from those who write the checks. In our research, senior management perceives aggregation of content as increasing risk and putting an information time bomb under the president’s leather chair.
Finding: 47% feel that universal search and compliant e-discovery is becoming near impossible given the proliferation of cloud share and collaboration apps, personal note systems and mobile devices. 60% are firmly of the view that automated analytics tools are the only way to improve classification and tagging to make their content more findable.
The thrill of an untapped market fades when one considers the use of the word “impossible.” AIIM is correct in identifying the Sisyphean tasks vendors face when pitching “all” information available via a third party system. Not only are the technical problems stretching the wizards at Google, the cost of generating meaningful “unified” search results is a tough nut to crack for intelligence and law enforcement entities. In general, some of these groups have motivation, money, and expertise. Even with these advantages, the hoo hah that many search and eDiscovery vendors pitch is increasing potential customers’ skepticism. The credibility of over-hyped findability solutions is squandered. Therefore, for some vendors, their marketing efforts are making it more difficult for them to close deals and causing a broader push back against solutions that are known by the prospects to be a waste of money. Yikes. How does a trade association help its members with this problem? Well, I have some ideas. But as I recall, Ms. Steiger was not too thrilled to learn about the nitty gritty of shifting from micrographics to digital. Does the same characteristic exist within AIIM today? I don’t know.
LucidWorks Takes Bullets, Mid Tier Consultant Gets Some Info Upside Down and Backward
September 6, 2014
Navigate to “Trouble at LucidWorks: Lawsuits, Lost Deals, & Layoffs Plague the Search Startup Despite Funding.”
Another search vendor struggling for survival is not a surprise. What is interesting is that the write up identifies that venture money was needed to stay afloat, a youthful whiz kid cannot deliver revenues, and that former staff say some pretty negative things.
What struck me as interesting was the information smashed into some sentences from a mid tier consulting firm’s search expert. Did you know that Microsoft gives away Fast Search & Transfer technology? This is the same code that received high marks in a magic quadrant and contributed to a jail sentence for the founder of Fast Search. Did you know that the Google Search Appliance is a low cost search option? I did not. In fact, if you look up prices on the US government’s GSAadvantage.gov site, the GSA is a pretty expensive solution. Did you know that making money in open source search is not easy? Maybe not easy, but it seems as if RedHat is doing okay.
Why do I ask these questions? I enjoy pointing out that what looks like reasonable statements from an expert may be “out of square.” For color on this reference, see this Beyond Search article.
What about LucidWorks? The company struggled with creating revenue around a layer of software that interacts with Lucene. There were squabbles, turnover in senior management, and pivots.
What is important is that even when a search and content processing company minimizes these and other issues, search is a darned tough software segment to make spin cash.
LucidWorks may survive. But in the larger context of information retrieval, the long shadows cast by Autonomy and Fast Search & Transfer are reminders that painting word pictures about complex technology may be much easier than building a search company with sustainable revenues.
Stephen E Arnold, September 6, 2014
The Knowledge Quotient Saucisson Link: Back to Sociology in the 1970s
August 5, 2014
I have mentioned recent “expert analyses” of the enterprise search and content marketing sector. In my view, these reports are little more than gussied up search engine optimization (SEO) and content marketing plays. See, for example, this description of the IDC report about “knowledge quotient.” Sounds good, right? So does most content marketing and PR generated by enterprise search vendors trying to create sustainable revenue and sufficient profits to keep the investors on their boats, in their helicopters, and on the golf course. Disappointing revenues are not acceptable to those with money who worry about risk and return, not their mortgage payment.
Some content processing vendors are in need of sales leads. Others are just desperate for revenue. The companies with venture money in their bank account have to deliver a return. Annoyed funding sources may replace company presidents. This type of financial blitzkrieg has struck BA Insight and LucidWorks. Other search vendors are in legal hot water; for example, one Fast Search & Transfer executive and two high profile Autonomy Corp. professionals. Other companies tap dance from buzzword to catchphrase in the hopes of avoiding the fate of Convera, Delphes, or Entopia. The marketing beat goes on, but revenue for search solutions remains a challenge. How will IBM hit $10 billion in Watson revenues in five or six years? Good question, but I know the answer. Perhaps accounting procedures might deliver what looks like a home run for Watson. Perhaps the Jeopardy winner will have to undergo Beverly Hills-style plastic surgery? Will the new Watson look like today’s Watson? I would suggest that some artificiality could be discerned.
Last week, one of my two or three readers wrote to inform me that the phrase “knowledge quotient” is a registered trademark. One of my researchers told me that when one uses the phrase “knowledge quotient,” one should include the appropriate symbol. Omission can mean many bad things, mostly involving attorneys.
Another one of the goslings picked up the vaporous “knowledge quotient” and poked around for other uses of the phrase. Remember. I encountered this nearly meaningless quasi academic jargon in the title of an IDC report about content processing, authored by the intrepid expert Dave Schubmehl.
According to one of my semi reliable goslings, the phrase turned up in a Portland State University thesis. The authors were David Clitheroe and Garrett Long.
The trademark was registered in 2004 by Penn State University. Yep, that’s the university which I associate with an unfortunate management “issue.” According to Justia, the person registering the phrase “knowledge quotient” was a Penn State employee named Gene V J Maciol.
So we are considering a chunk of academic jargon cooked up to fulfill a requirement to get an advanced degree in sociology in 1972. That was about 40 years ago. I am not familiar with sociology or the concept of a knowledge quotient.
I printed out the 111-page document and read it. I do have some observations about the concept and its relationship to search and content processing. Spoiler alert: Zero, none, zip, nada, zilch.
The topic of the sociology paper is helping kids in trouble. I bristled at the assumptions implicit in the write up. Some cities had sufficient resources to help children. Certain types of facilities are just super. I assume neither of the study’s authors was in a reformatory, orphanage, or insane asylum.
Anyway, the phrase “knowledge quotient” is toothless. It means, according to page 31:
the group’s awareness and knowledge of the [troubled youth or orphan] home.
And the “quotient” part? Here it is in all its glory:
A knowledge quotient reflects the group’s awareness and knowledge of the home.
More Knowledge Quotient Silliness: The Florida Gar of Search Marketing
August 1, 2014
I must be starved for intellectual Florida Gar. Nibble on this fish’s lateral line and get nauseous or dead. Knowledge quotient as a concept applied to search and retrieval is like a largish Florida gar. Maybe a Florida gar left too long in the sun.
Lookin’ yummy. Looks can be deceiving in fish and fishing for information. A happy quack to https://www.flmnh.ufl.edu/fish/Gallery/Descript/FloridaGar/FloridaGar.html
I ran a query on one of the search systems that I profile in my lectures for the police and intelligence community. With a bit of clicking, I unearthed some interesting uses of the phrase “knowledge quotient.”
What surprised me is that the phrase is a favorite of some educators. The use of the term as a synonym for plain old search seems to be one of those marketing moments of magic. A group of “experts” with degrees in home economics, early childhood education, or political science sit around and try to figure out how to sell a technology that is decades old. Sure, the search vendors make “improvements” with ever increasing speed. As costs rise and sales fail to keep pace, the search “experts” gobble a cinnamon latte and innovate.
In Dubai earlier this year, I saw a reference to a company engaged in human resource development. I think this means “body shop,” “lower cost labor,” or “mercenary registry,” but I could be off base. The company is called Knowledge Quotient FZ LLC. If one tries to search for the company, the task becomes onerous. Google is giving some love to the recent IDC study by an “expert” named Dave Schubmehl. As you may know, this is the “professional” who used my information and then sold it on Amazon until July 2014 without paying me for my semi-valuable name. For more on this remarkable approach to professional publishing, see http://wp.me/pf6p2-auy.
Also, in Dubai is a tutoring outfit called Knowledge Quotient which delivers home tutoring to the children of parents with disposable income. The company explains that it operates a place where learning makes sense.
Companies in India seem to be taken with the phrase “knowledge quotient.” Consider Chessy Knowledge Quotient Private Limited. In West Bengal, one can find one’s way to Mukherjee Road and engage the founders with regard to an “effective business solution.” See http://chessygroup.co.in. Please, do not confuse Chessy with KnowledgeQ, the company operating as Knowledge Quotient Education Services India Pvt Ltd. in Bangalore. See http://www.knowledgeq.org.
What’s the relationship between these companies operating as “knowledge quotient” vendors and search? For me, the appropriation of names and applying them to enterprise search contributes to the low esteem in which many search vendors are held.
Why is Autonomy IDOL such a problem for Hewlett Packard? This is a company that bought a mobile operating system and stepped away from it. This is a company that brought out a tablet and abandoned it in a few months. This is a company that wrote off billions and then blamed the seller for not explaining how the business worked. In short, Autonomy, which offers a suite of technology that performs as well as or better than any other search system, has become a bit of Florida gar in my view. Autonomy is not a fish. Autonomy is a search and content processing system. When properly configured and resourced, it works as well as any other late 1990s search system. I don’t need meaningless descriptions like “knowledge quotient” to understand that the “problem” with IDOL is little more than HP’s expectations exceeding what a decades old technology can deliver.
Why is Fast Search & Transfer an embarrassment to many who work in the search sector? Perhaps the reason has to do with the financial dealings of the company. In addition to fines and jail terms, the Fast Search system drifted from its roots in Web search into publishing, smart software, and automated functions. The problem was that when customers did not pay, the company did not suck it up, fix the software, and renew its efforts to deliver effective search. Nah, Fast Search became associated with a quick sale to Microsoft, subsequent investigations by Norwegian law enforcement, and the culminating decision to ban one executive from working in search. Yep, that is a story that few want to analyze. Search marketers promised and the technology did not deliver, could not deliver given Fast Search’s circumstances.
What about Excalibur/Convera? This company managed to sell advanced search and retrieval to Intel and the NBA. In a short time, both of these companies stepped away from Convera. The company then focused on a confection called “vertical search” based on indexing the Internet for customers who wanted narrow applications. Not even the financial stroking of Allen & Co. could save Convera. In an interesting twist, Fast Search purchased some of Convera’s assets in an effort to capture more US government business. Who digs into the story of Excalibur/Convera? Answer: No one.
What passes for analysis in enterprise search, information retrieval, and content processing is the substitution of baloney for fact-centric analysis. What is the reason that so many search vendors need multiple injections of capital to stay in business? My hunch is that companies like Antidot, Attivio, BA Insight, Coveo, Sinequa, and Palantir, among others, are in the business of raising money, spending it in an increasingly intense effort to generate sustainable revenue, and then going once again to capital markets for more money. When the funding sources dry up or just cut off the company, what happens to these firms? They fail. A few are rescued like Autonomy, Exalead, and Vivisimo. Others just vaporize as Delphes, Entopia, and Siderean did.
When I read a report from a mid tier consulting firm, I often react as if I had swallowed a chunk of Florida gar. An example in my search file is basic information about “The Knowledge Quotient: Unlocking the Hidden Value of Information.” You can buy this outstanding example of ahistorical analysis from IDC.com, the employer of Dave Schubmehl. (Yep, the same professional who used my research without bothering to issue me a contract or get permission from me to fish with my identity. My attorney, if I understand his mumbo jumbo, says this action was not identity theft, but Schubmehl’s actions between May 2012 and July 2014 strike me as untoward.)
Net net: I wonder if any of the companies using the phrase “knowledge quotient” are aware of brand encroachment. Probably not. That may be due to the low profile search enjoys in some geographic regions where business appears to be more healthy than in the US.
Can search marketing be compared to Florida gar? I want to think more about this.
Stephen E Arnold, August 1, 2014
Gartner and Enterprise Search 2014
July 31, 2014
At lunch yesterday, several search aware people discussed a July 2014 Gartner study. One of the folks had a crumpled image of the July 2014 “magic quadrant.” This is, I believe, report number G00260831. Like other mid tier consulting firms, Gartner works hard to find something that will hook customers’ and prospects’ attention. The Gartner approach is focused on companies that purport to have enterprise search systems. From my vantage point, the Gartner approach is miles ahead of the wild and illogical IDC report about knowledge, a “quotient,” and “unlocking” hidden value. See http://bit.ly/1rpQymz. Now I have not fallen in love with Gartner. The situation is more like my finding my content and my name for sale on Amazon. You can see what my attorney complained about via this link, http://bit.ly/1k7HT8k. I think I was “schubmehled,” not outwitted.
I am the really good looking person. Image source: http://bit.ly/1rPWjN3
Where the IDC report lacks comprehensiveness with regard to vendors, the Gartner report mentions quite a few companies allegedly offering enterprise search solutions. You must chase down your local Gartner sales person for more details. I want to summarize the points that surfaced in our lunch time pizza fest.
First, the Gartner “study” includes 18 or 19 vendors. Recommind is on the Gartner list even though a supremely confident public relations “professional” named Laurent Ionta insisted that Recommind was not in the July 2014 Gartner report. I called her attention to report number G00260831 and urged her to use her “bulldog” motivation to contact her client and Gartner’s experts to get the information from the horse’s mouth as it were. (Her firm is www.lewispr.com, and it is reported to be the Digital Agency of the Year and on the Inc 5000 list of the fastest growing companies in America.) I am impressed with the accolades she included in her emails to me. The fact that this person who may work on the Recommind account was unaware that Gartner pegged Recommind as a niche player seemed like a flub of the first rank. When it comes to search, not even those in the search sector may know who’s on first or among the chosen 19.
To continue with my first take away from lunch, there were several companies that those at lunch thought should be included in the Gartner “analysis.” As I recall, the companies to which my motley lunch group wanted Gartner to apply their considerable objective and subjective talents were:
- ElasticSearch. This in my view is the Big Dog in enterprise search at the moment. The sole reason is that ElasticSearch has received an injection of another $70 million to complement the $30-odd million it had previously gathered. Oh, ElasticSearch is a developer magnet. Other search vendors should be so popular with the community crowd.
- Oracle. This company owns and seems to offer Endeca solutions along with RightNow/InQuira natural language processing for enterprise customer support, the fading Secure Enterprise Search system, and still popping and snapping Oracle Text. I did not mention to the lunch crowd that Oracle also owns Artificial Linguistics and Triple Hop technology. This information was, in my view, irrelevant to my lunch mates.
- SphinxSearch. This system is still getting love from the MySQL contingent. Imagine no complex structured query language syntax to find information tucked in a cell.
There are some other information retrieval outfits that I thought of mentioning, but again, my free lunch group does not know what it does not know. For many folks who discuss search with me, learning details about search systems is not even on the menu. Even when the information is free, few want to confuse fantasy with reality.
The second takeaway is that the rationale for putting most vendors in the niche category puzzled me. If a company really has an enterprise search solution, how is that solution a niche? The companies identified as those who can see where search is going are, as I heard, labeled “visionaries.” The problem is that I am not sure what a search visionary is; for example, how does a French aerospace and engineering firm qualify as a visionary? Was HP a visionary when it bought Autonomy, wrote off $8 billion, and initiated litigation against former colleagues? How does this Google supplied definition apply to enterprise search:
able to see visions in a dream or trance, or as a supernatural apparition?
The final takeaway for me was the failure to include any search system from China, Germany, or Russia. Interesting. Even my down on their heels lunch group was aware of Yandex and its effort in enterprise search via a Yandex appliance. Well, internationalization only goes so far I suppose.
I recall hearing one of my luncheon guests say that IBM was, according to the “experts” at Gartner, a niche player. Gentle reader, I can describe IBM many ways, but I am not sure it is a niche player like Exorbyte (eCommerce mostly) and MarkLogic (XML data management). Nope, IBM’s search embraces winning Jeopardy, creating recipes with tamarind, and curing assorted diseases. And IBM offers plain old search as part of DB2 and its content management products plus some products obtained via acquisition. Cybertap search, anyone? When someone installed what used to be OmniFind, I thought IBM was providing an enterprise class information retrieval solution. Guess I am wrong again.
Net net: Gartner has prepared the ground for a raft of follow on analyses. I would suggest that you purchase a copy of the July 2014 Gartner search report. You may be able to get your bearings so you can answer these questions:
- What are the functional differences among the enterprise search systems?
- How does the HP Autonomy “solution” compare to the pre-HP Autonomy solution?
- What is the cost of a Google Search Appliance compared to a competing product from Maxxcat or Thunderstone? (Yep, two more vendors not in the Gartner sample.)
- What causes a company to move from being a challenger in search to a niche player?
- What makes both a printer company and a Microsoft-centric solution qualified to match up with Google and HP Autonomy in enterprise search?
- What are the licensing costs, customizing costs, optimizing costs, and scaling costs of each company’s enterprise search solution? (You can find the going rate for the Google Search Appliance at www.gsaadvantage.gov. The other 18? Good luck.)
I will leave you to your enterprise search missions. Remember. Gartner, unlike some other mid-tier consulting firms, makes an effort to try to talk about what its consultants perceive as concrete aspects of information retrieval. Other outfits not so much. That’s why I remain confused about the IDC KQ (knowledge quotient) thing, the meaning of hidden value, and unlocking. Is information like a bike padlock?
Stephen E Arnold, July 31, 2014
IHS Enterprise Search: Semantic Concept Lenses Are Here
July 29, 2014
I pointed out in http://bit.ly/X9d219 that IDC, a mid tier consulting firm that has marketed my information without permission on Amazon of all places, has rolled out a new report about content processing. The academic sounding title is “The Knowledge Quotient: Unlocking the Hidden Value of Information.” Conflating knowledge and information is not logically satisfying to me. But you may find the two words dusted with “value” just the ticket to career success.
I have not read the report, but I did see a list of the “sponsors” of the study. The list, as I pointed out, was an eclectic group, including huge firms struggling for credibility (HP and IBM) down to consulting firms offering push ups for indexers.
One company on my list caused me to go back through my archive of search information. The firm that sparked my interest is Information Handling Services, or IHS. The company is publicly traded and turning a decent profit. IHS revenue has moved toward $2 billion. If the global economy perks up and the defense sector is funded at pre-drawdown levels, IHS could become a $2 billion company.
IHS is a company with an interesting history and extensive experience with structured and unstructured search. Few of those with whom I interacted when I was working full time considered IHS a competitor to the likes of Autonomy, Endeca, and Funnelback.
In the 2013 10-K, on page 20, IHS presents its “cumulative total return” as a chart.
The green line looks like money. Another slant on the company’s performance can be seen in a chart available from Google Finance.
The Google chart shows that revenue is moving upwards, but operating margins are drifting downward and operating income is suppressed. Like Amazon, IHS finds that the costs of operating an information centric company are difficult to control. Amazon seems to have thrown in the towel. IHS is managing like the Dickens to maintain a profit for its stakeholders. For stakeholders, the hope is that hefty profits will be forthcoming.
Source: Google Finance
My initial reaction was, “Is IHS trying to find new ways to generate higher margin revenue?”
Like Thomson Reuters and Reed Elsevier, IHS required different types of content processing plumbing to deliver its commercial databases. Technical librarians and the competitive intelligence professionals monitoring the defense sector are likely to know about IHS’s various products. The company provides access to standards documents, regulatory information, and Jane’s military hardware information services. (Yep, Jane’s still has access to retired naval officers with mutton chop whiskers and interesting tweed outfits. I observed these experts when I visited the company in England prior to IHS’s purchase of the outfit.)
The standard descriptions of IHS peg the company’s roots to a trade magazine outfit called Rogers Publishing. My former boss at Booz, Allen & Hamilton loved some of the IHS technical services. He was, prior to joining Booz, Allen, the head of research at Martin Marietta, an IHS customer in the 1970s. Few remember that IHS was once tied in with Thyssen Bornemisza. (For those with an interest in history, there are some reports about the Baron that are difficult to believe. See http://bit.ly/1qIylne.)
Large professional publishing companies were early, if somewhat reluctant, supporters of SGML and XML. Running a query against a large collection of structured textual information could be painfully slow when one relied on traditional relational database management systems in the late 1980s. Without SGML/XML, repurposing content required humans. With scripts hammering on SGML/XML, creating new information products like directories and reports eliminated the expensive humans for the most part. Fewer expensive humans in the professional publishing business reduces costs…for a while at least.
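The repurposing step described above can be sketched in a few lines of script. The snippet below is a minimal illustration of the general idea, not IHS’s or any publisher’s actual pipeline; the element names (`standards`, `standard`, `title`) and the sample records are invented for the example.

```python
# Minimal sketch: turn structured XML records into a directory listing
# without human re-keying. Element names here are hypothetical; real
# publisher DTDs and schemas are far richer.
import xml.etree.ElementTree as ET

SAMPLE = """
<standards>
  <standard id="MIL-STD-810G"><title>Environmental Engineering Considerations</title></standard>
  <standard id="ISO-9001"><title>Quality Management Systems</title></standard>
</standards>
"""

def build_directory(xml_text):
    """Extract (id, title) pairs and format them as sorted directory lines."""
    root = ET.fromstring(xml_text)
    entries = [(s.get("id"), s.findtext("title")) for s in root.iter("standard")]
    return ["{}: {}".format(sid, title) for sid, title in sorted(entries)]

if __name__ == "__main__":
    for line in build_directory(SAMPLE):
        print(line)
```

The same tagged source can feed a directory, a report, or an index with a different script per product, which is the cost advantage SGML/XML gave publishers over manual repurposing.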
IHS climbed on the SGML/XML diesel engine and began working to deliver snappy online search results. As profit margins for professional publishers were pressured by increasing marketing and technology costs, IHS followed the path of other information centric companies. IHS began buying content and services companies that, in theory, would give the professional publishing company a way to roll out new, higher margin products. Even secondary players in the professional publishing sector like Ebsco Electronic Publishing wanted to become billion dollar operations and then get even bigger. Rah, rah.
These growth dreams electrify many information companies’ executives. The thought that every professional publishing company and every search vendor are chasing finite or constrained markets does not get much attention. Moving from dreams to dollars is getting more difficult, particularly in professional publishing and content processing businesses.
My view is that packaging up IHS content and content processing technology got a boost when IHS purchased the Invention Machine in mid 2012.
Years ago I attended a briefing by the founders of the Invention Machine. The company demonstrated that an engineer looking for a way to solve a problem could use the Invention Machine search system to identify candidate systems and methods from the processed content. I recall that the original demonstration data set was US patents and patent applications. My thought was that an engineer looking for a way to implement a particular function could — if the Invention Machine system worked as presented — generate a patent result set. That result set could be scanned to eliminate any patents still in force. The resulting set of patents might yield a procedure that the person looking for a method could implement without having to worry about an infringement allegation. The original demonstration was okay, but like most “new” search technologies, Invention Machine faced funding, marketing, and performance challenges. IHS acquired Invention Machine, its technologies, and its Eastern European developers, and embraced the tagging, searching, and reporting capabilities of the Invention Machine.
The Goldfire idea is that an IHS client can license certain IHS databases (called “knowledge collections”) and then use Goldfire / Invention Machine search and analytic tools to get the knowledge “nuggets” needed to procure a missile guidance component.
The jargon for this finding function is “semantic concept lenses.” If the licensee has content in a form supported by Goldfire, the licensee can search and analyze IHS information along with information the client has from its own sources. A bit more color is available at http://bit.ly/WLA2Dp.
The IHS search system is described in terms familiar to a librarian and a technical analyst; for example, here are the attributes for the Goldfire “cloud” from an IHS 2013 news release:
- “Patented semantic search technology providing precise access to answers in documents. [Note: IHS has numerous patents but it is not clear what specific inventions or assigned inventions apply directly to the search and retrieval solution(s)]
- Access to more than 90 million scientific and technical “must have” documents curated by IHS. This aggregated, pre-indexed collection spans patents, premium IHS content sources, trusted third-party content providers, and the Deep Web.
- The ability to semantically index and research across any desired web-accessible information such as competitive or supplier websites, social media platforms and RSS feeds – turning these into strategic knowledge assets.
- More than 70 concept lenses that promote rapid research, browsing and filtering of related results sets thus enabling engineers to explore a concept’s definitions, applications, advantages, disadvantages and more.
- Insights into consumer sentiment giving strategy, product management and marketing teams the ability to recognize customer opinions, perceptions, attitudes, habits and expectations – relative to their own brands and to those of their partners’ and competitors’ – as expressed in social media and on the Web.”
Most of these will resonate with those familiar with the assertions of enterprise search and content processing vendors. The spin, which I find notable, is that IHS delivers both content and information retrieval. Most enterprise search vendors provide technology for finding and analyzing data. The licensee has to provide the content unless the enterprise search vendor crawls the Web or other sources, creates an archive or a basic index, and then provides an interface that is usually positioned as indexing “all content” for the user.
According to Virtual Strategy Magazine (which presumably does not cover “real” strategy), I learned that US patent 8,666,730:
covers the semantic concept “lenses” that IHS Goldfire uses to accelerate research. The lenses correlate with the human knowledge system, organizing and presenting answers to engineers’ or scientists’ questions – even questions they did not think to ask. These lenses surface concepts in documents’ text, enabling users to rapidly explore a concept’s definitions, applications, advantages, disadvantages and more.
The key differentiator is claimed to move IHS Goldfire up a notch. The write up states:
Unlike today’s textual, question-answering technologies, which work as meta-search engines to search for text fragments by keyword and then try to extract answers similar to the text fragment, the IHS Goldfire approach is entirely unique – providing relevant answers, not lists of largely irrelevant documents. With IHS Goldfire, hundreds of different document types can be parsed by a semantic processor to extract semantic relationships like subject-action-object, cause-and-effect and dozens more. Answer-extraction patterns are then applied on top of the semantic data extracted from documents and answers are saved to a searchable database.
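The “subject-action-object” extraction the write up describes can be illustrated with a toy example. What follows is a naive sketch of the general technique, not the IHS Goldfire parser; real semantic processors use full syntactic analysis rather than the single hand-listed verb set below, and the sample sentences are invented.

```python
# Naive subject-action-object (SVO) extraction: split a sentence at a
# known relation verb. Real systems parse grammar; this toy version only
# handles flat "subject VERB object" sentences. The verb list is invented.
KNOWN_VERBS = {"causes", "reduces", "increases", "prevents"}

def extract_svo(sentence):
    """Return a (subject, action, object) triple, or None if no known verb."""
    words = sentence.rstrip(".").split()
    for i, word in enumerate(words):
        if word.lower() in KNOWN_VERBS:
            return (" ".join(words[:i]), word.lower(), " ".join(words[i + 1:]))
    return None

if __name__ == "__main__":
    for s in ["Corrosion causes structural fatigue.",
              "A heat shield reduces thermal load."]:
        print(extract_svo(s))
```

In a production pipeline the extracted triples would be stored in a searchable index so that answer-style queries can match relationships instead of keywords, which is the behavior the news release claims for Goldfire.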
The news release quotes Igor Sovpel:
“Today’s engineers and technical professionals are underserved by traditional Internet and enterprise search applications, which help them find only the documents they already know exist,” said Igor Sovpel, chief scientist for IHS Goldfire. “With this patent, only IHS Goldfire gives users the ability to quickly synthesize optimal answers to a variety of complex challenges.”
Is IHS’ new marketing push in “knowledge” and related fields likely to have an immediate and direct impact on the enterprise search market? Perhaps.
There are several observations that occurred to me as I flipped through my archive of IHS, Thyssen, and Invention Machine information.
First, IHS has strong brand recognition in what I would call the librarian and technical analyst for engineering demographic. Outside of lucrative but quite niche markets for petrochemical information or silhouettes and specifications for the SU 35, IHS suffers the same problem as Thomson Reuters and Wolters Kluwer. Most senior managers are not familiar with the company or its many brands. Positioning Goldfire as an enterprise search or enterprise technical documentation/data analysis tool will require a heck of a lot of effective marketing. Will positioning IHS cheek by jowl with IBM and a consulting firm that teaches indexing address this visibility problem? The odds could be long.
Second, search engine optimization folks can seize on the name Goldfire and create some dissonance for IHS in the public Web search indexes. I know that companies like Attivio and Microsoft use the phrase “beyond search” to attract traffic to their Web sites. I can see the same thing happening. IHS competes with other professional publishing companies looking for a way to address their own marketing problems. A good SEO name like “Goldfire” could come under attack and quickly. I can envision lesser competitors usurping IHS’ value claims which may delay some sales or further confuse an already uncertain prospect.
Third, enterprise search and enterprise content analytics is proving to be a difficult market from which to wring profitable, sustainable revenue. If IHS is successful, the third party licensees of IHS data who resell that information to their online customers might take steps to renegotiate contracts for revenue sharing. IHS will then have to ramp up its enterprise search revenues to keep or outpace revenues from third party licensees. Addressing this problem can be interesting for those managers responsible for the negotiations.
Finally, enterprise search has a lot of companies planning on generating millions or billions from search. There can be only one prom queen and a small number of “close but no cigar” runners-up. Which company will snatch the crown?
This IHS search initiative will be interesting to watch.
Stephen E Arnold, July 29, 2014