Enterprise Search: Fee Versus Free

November 25, 2014

I read a pretty darned amazing article “Is Free Enterprise Search a Game Changer?” My initial reaction was, “Didn’t the game change with the failures of flagship enterprise search systems?” And “Didn’t the cost and complexity of many enterprise search deployments fuel the emergence of the free and open source information retrieval systems?”

Many proprietary vendors are struggling to generate sustainable revenues and pay back increasingly impatient stakeholders. The reality is that the proprietary enterprise search “survivors” fear meeting the fate of Convera, Delphes, Entopia, Perfect Search, Siderean Software, TREX, and other proprietary vendors. These outfits went away.


Many vendors of proprietary enterprise search systems have left behind an environment in which revenues are simply not sustainable. Customers learned some painful lessons after licensing brand name enterprise search systems and discovering the reality of their costs and functionality. A happy quack to http://bit.ly/1AMHBL6 for this image of desolation.

Other vendors, faced with mounting costs and zero growth in revenues, sold their enterprise search companies. The spate of sell outs that began in the mid 2000s was stark evidence that delivering information retrieval systems to commercial and governmental organizations was difficult to make work.

Consider these milestones:

Autonomy sold to Hewlett Packard. HP promptly wrote off billions of dollars and launched a fascinating lawsuit that blamed Autonomy for the deal. HP quickly discovered that Autonomy, like other complex content processing companies, was difficult to sell, difficult to support, and difficult to turn into a billion dollar baby.

Convera, the product of Excalibur’s scanning legacy and ConQuest Software, captured some big deals in the US government and with outfits like the NBA. When the system did not perform like a circus dog, the company wound down. One upside for Convera alums was that they were able to set up a consulting firm to keep other companies from making the Convera-type mistakes. The losses were measured in the tens of millions.


Enterprise Search: Essentially Marginalized to Good Enough

November 9, 2014

I use Google Trends to see what’s hot and what’s not in the world of information retrieval. If you want to use the free version of Google Trends, navigate to http://www.google.com/trends/ and explore. That’s some of what Google does to make decisions about how much of Larry Page’s “wood” to put behind the Google Search Appliance eight ball.


I plugged in “enterprise search.” When one allows Google to output its version of the popularity of the term, you get this graph. It shows a downward trend but the graph is without much context. The pale lettering does not help. Obviously Googlers do not view the world through trifocals with 70 year old eyes. Here’s the Trends’ output for “enterprise search”:


Now let’s add some context. From the “enterprise search” Trends’ output, click the pale blue plus and add this with quotes: “big data.” Here’s the output for this two factor analysis:


One does not have to be an Ivy League data scientist to see the difference between the hackneyed “enterprise search” and more zippy but meaningless “Big Data.” I am not saying Big Data solutions actually work. What’s clear is that pushing enterprise search is not particularly helpful when the Trends data reveal a flat line for years, not hours, not days, not months–years.

I think it is pretty clear why I can assert with confidence that “enterprise search” appears to be a non starter. I know why search vendors persist in telling me what “enterprise search” is. The vendors are desperate to find the grip that a Tupinambis lizard possesses. Instead of clinging to a wall in the sun at 317 R. Dr. Emílio Ribas (Cambui) (where I used to live in Campinas, SP), the search vendors are clinging to a chimera. The goal is to make sales, but if the Google data are even sort of correct, enterprise search is flat lining.

Little wonder that consultant reports like those from the mid tier crowd try to come up with verbiage that will create sales leads for the research sponsors; case in point, knowledge quotient. See Meme of the Moment for a fun look at IDC’s and search “expert” Dave Schubmehl’s most recent attempt to pump up the music.

The question is, “What is generating revenue?” In a sense, excitement surrounds vendors who deliver solutions. These include search, increasingly supplied by open source software. Elasticsearch is zipping along, but search is not the main dish. Search is more like broccoli or carrots.

The good news is that there is a group of companies, numbering about 30, which have approached search differently. As a result, many of these companies are growing and charting what I call “next generation search.”

Want to know more? Well, that’s good. Watch for my coverage of this sector in the weeks and months ahead. I will toss a small part of our research into my November Information Today column. A tiny chunk. Keep that in mind.

In the meantime, think critically about the craziness flowing from many mid tier or azure chip consulting firms. Those “outputs” are marketing, self aggrandizing, and, for me, downright silly. What’s that term for doing trivial actions again and again?

Stephen E Arnold, November 9, 2014

Enterprise Search, Knowledge Management, & Customer Service: Some of the Study Stuff Ups Evident?

October 27, 2014

One of my two or three readers sent me a link to “The 10 Stuff Ups We All Make When Interpreting Research.” The article walks through some common mistakes individuals make when “interpreting research.” I don’t agree with the “all” in the title.

This article arrived as I was reading a recent study about search. As an exercise on a surprisingly balmy Sunday afternoon in Kentucky, I jotted down the 10 “stuff ups” presented in the Interpreting Research article. Here they are in my words, paraphrased to sidestep plagiarism, copyright, and Google duplication finder issues:

  1. One study, not a series of studies. In short, an anomaly report.
  2. One person’s notion of what is significant may be irrelevant.
  3. Mixing up risk and the Statistics 101 notion of “number needed to treat” gets the cart before the horse.
  4. Trends may not be linear.
  5. Humans find what they want to find; that is, pre existing bias or cooking the study.
  6. Ignore the basics and layer cake the jargon.
  7. Numbers often require context. Context in the form of quotes in one on one interviews requires numbers.
  8. Models and frameworks do not match reality; that is, a construct is not what is.
  9. Specific situations do matter.
  10. Inputs from colleagues may not identify certain study flaws.

To test the article’s premises, I turned to a study sent to me by a persona named Alisa Lipzen. Its title is “The State of Knowledge Management: 2014. Growing role & Value of Unified Search in Customer Service.” (If the link does not work for you, you will have to contact either of the sponsors, the Technology Services Industry Association or Coveo, an enterprise search vendor based in Canada.) You may have to pay for the report. My copy was free. Let’s do a quick pass through the document to see if it avoids the “stuff ups.”

First, the scope of the report is broad:

1. Knowledge management. Although I write a regular column for KMWorld, I must admit that I am not able to define exactly what this concept means. Like many information access buzzwords, the shotgun marriage of “knowledge” and “management” glues together two abstractions. In most usages, knowledge management refers to figuring out what a person “knows” and making that information available to others in an organization. After all, when a person quits, having access to that person’s “knowledge” has a value. But “knowledge” is as difficult to nail down as “management.” I suppose one knows it when one encounters it.

2. Unified search. The second subject is “unified search.” This is the idea that a person can use a single system to locate information germane to a query from a single search box. Unified suggests that widely disparate types of information are presented in a useful manner. For me, the telling fact is that Google, arguably the best resourced information access company, has been unable to deliver unified search. Note that Google calls its goal “universal search.” In the 1980s, Fulcrum Technologies (Ottawa, Canada) offered a version of federated search. In 2014, Google requires that a user run a query across different silos of information; for example, if I require information about NGFW I have to run the query across Google’s Web index, Google scholarly articles, Google videos, Google books, Google blogs, and Google news. This is not very universal. Most “unified” search solutions are marketing razzle dazzle for financial, legal, technical, and other reasons. Therefore, organizations have to have different search systems.

3. Customer service. This is a popular bit of jargon. The meaning of customer service, for me, boils down to cost savings. Few companies have the appetite to pay for expensive humans to deal with the problems paying customers experience. Last week, I spent one hour on hold with an outfit called Wellcare. The insurance company’s automated system reassured me that my call was important. The call was never answered. What did I learn? Neither my call nor my status as a customer was important. Most information access systems applied to “customer service” are designed to drive the cost of support and service as low as possible.


“Get rid of these expensive humans,” says the MBA. “I want my annual bonus.”

I was not familiar with the TSIA. What is its mission? According to the group’s Web site:

TSIA is organized around six major service disciplines that address the major service businesses found in a typical technology company.

Each service discipline has its own membership community led by a seasoned research executive. Additionally, each service discipline has the following:

In addition, we have a research practice on Service Technology that spans across all service discipline focus areas.

My take is that TSIA is a marketing-oriented organization for its paying members.

Now let’s look at some of the report’s key findings:

The people, process, and technology components of technology service knowledge management (KM) programs. This year’s survey examined core metrics and practices related to knowledge capture, sharing, and maintenance, as well as forward-looking elements such as video, crowd sourcing, and expertise management. KM is no longer just of interest to technical support and call centers. The survey was open to all TSIA disciplines, and 50% of the 400-plus responses were from groups other than support services, including 24% of responses from professional services organizations.


The AIIM Enterprise Search Study 2014

October 10, 2014

I worked through the 34 page report “Industry Watch. Search and Discovery. Exploiting Knowledge, Minimizing Risk.” The report is based on a sampling of 80,000 AIIM community members. The explanation of the process states:

Graphs throughout the report exclude responses from organizations with less than 10 employees, and suppliers of ECM products and services, taking the number of respondents to 353.

The demographics of the sample were tweaked to discard responses from organizations with fewer than 10 employees. The sample included respondents from North America (67 percent), Europe (18 percent) and “rest of world” (15 percent).

Some History for the Young Reader of Beyond Search

AIIM has roots in imaging (photographic and digital imaging). Years ago I spent an afternoon with Betty Steiger, a then well known executive with a high profile in Washington, DC’s technology community. She explained that the association wanted to reach into the then somewhat new technology for creating digital content. Instead of manually indexing microfilm images, AIIM members would use personal computers. I think we connected in 1982 at her request. My work included commercial online indexing, experiments in full text content online, a CD ROM produced in concert with Predicasts and Lotus, and automated indexing processes invented by Howard Flank, a sidekick of mine for a very long time. (Mr. Flank received the first technology achievement award from the old Information Industry Association, now the SIIA.)

AIIM had its roots in the world of microfilm. And the roots of microfilm reached back to University Microfilms at the close of World War II. After the war, innovators wanted to take advantage of the marvels of microimaging and silver-based film. The idea was to put lots of content on a new medium so users could “find” answers to questions.

The problem for AIIM (originally the National Micrographics Association) was indexing. Because I was an officer at a company considered in the 1980s to be one of the leaders in online and semi automated indexing methods, Ms. Steiger and I had a great deal to discuss.

But AIIM evokes for me:

Microfilm —> Finding issues —> Digital versions of microfilm —> CD ROMs —> On premises online access —> Finding issues.

I find the trajectory from microfilm to pronouncements about enterprise search, content processing, and eDiscovery fascinating. The story of AIIM parallels the challenges of the traditional publishing industry (what I call the “dead tree method”), which has, like Don Quixote, galloped into battle with ones and zeros.

Asking a trade association’s membership for insights about electronic information is a convenient idea. What’s wrong with sampling the membership and others in the AIIM database, discarding those who belong to organizations with fewer than 10 employees, and tallying up the survey “votes”? For most of those interested in search, absolutely nothing. And that may be part of the challenge for those who want to get smart about search, findability, and content processing.

Let’s look at three findings from the 30 plus page study. (I have had to trim because the number of comments and notes I wrote when reading the report is too massive for Beyond Search.)

Finding: 25 percent have no advanced or dedicated search tools. 13 percent have five or more [advanced or dedicated search tools].

Talk about good news for vendors of findability solutions. If one thinks about the tens of millions of organizations in the US, one just discards the 10 percent with 10 or fewer employees, and there is apparently quite a large percentage with simplistic tools. (Keep in mind that there are more small businesses than large businesses by a very wide margin. But that untapped market is too expensive for most companies to penetrate with marketing messages.) The study encourages the reader to conclude that a bonanza awaits the marketer who can identify these organizations and convince them to acquire an advanced or dedicated search tool.

There is a different view. The research Arnold IT (owner of Beyond Search) has conducted over the last couple of decades suggests that this finding conveys some false optimism. For example, in the organizations and samples with which we have worked, we found almost 90 percent saturation of search. The one on one interviews reveal that many employees were unaware of the search functions available for the organization’s database system or specialized tools like those used for inventory, the engineering department with AutoCAD, or customer support. So, the search systems with advanced features are in fact in most organizations. A survey of a general population reveals a market that is quite different from what the chief financial officer perceives when he or she tallies up the money spent for software that includes a search solution.

But the problems of providing one system to handle the engineering department’s drawings and specifications, the legal department’s confidential documents, the HR unit’s employee health data, and the Board of Directors’ documents revealing certain financial and management topics mean that these collections have to remain in silos. There is, we have found, neither an appetite to gather these data nor the money to figure out how to make images and other types of data searchable from a single system. Far better to use a text oriented metasearch system and dismiss data from proprietary systems, images, videos, mobile messages, etc. We know that most organizations have search systems about which most employees know nothing. When an organization learns about these systems and then gets an estimate for creating one big federated system, the motivation drains from those who write the checks. In our research, senior management perceives aggregation of content as increasing risk and putting an information time bomb under the president’s leather chair.

Finding:  47% feel that universal search and compliant e-discovery is becoming near impossible given the proliferation of cloud share and collaboration apps, personal note systems and mobile devices. 60% are firmly of the view that automated analytics tools are the only way to improve classification and tagging to make their content more findable.

The thrill of an untapped market fades when one considers the use of the word “impossible.” AIIM is correct in identifying the Sisyphean tasks vendors face when pitching “all” information available via a third party system. Not only are the technical problems stretching the wizards at Google, the cost of generating meaningful “unified” search results is a tough nut to crack for intelligence and law enforcement entities. In general, some of these groups have motivation, money, and expertise. Even with these advantages, the hoo hah that many search and eDiscovery vendors pitch is increasing potential customers’ skepticism. The credibility of over-hyped findability solutions is squandered. Therefore, for some vendors, their marketing efforts are making it more difficult for them to close deals and causing a broader push back against solutions that are known by the prospects to be a waste of money. Yikes. How does a trade association help its members with this problem? Well, I have some ideas. But as I recall, Ms. Steiger was not too thrilled to learn about the nitty gritty of shifting from micrographics to digital. Does the same characteristic exist within AIIM today? I don’t know.


IDC Tweets, IBM, and Content Marketing

September 29, 2014

Some Backstory

In 2012 and 2013, IDC sold my content with my name and Dave Schubmehl’s. These were nifty IDC “official” reports. The only hitch in the git along is that IDC did not trouble itself to issue a contract, get my permission, or tell me what they were doing with research my team prepared. The deal was witnessed by a law librarian, and I have a stack of emails about my research into such open source companies as Attivio, ElasticSearch (one of the disruptors of the enterprise search market), IBM (the subject of the IDC twit storm), Lucid Imagination (now Lucid Works which I write when I feel playful as Lucid works, really?), and eight other companies.

Hit by a twit storm. Rough seas ahead. Image from www.qsl.net.

In 2012, I had the open source research. IDC wanted the open source content to use in a monograph. So in front of a law librarian, IDC’s search “expert” thought the exchange of my information for open source intelligence, money, and stuff to sell was a great idea. (I have a file of email from IDC to me about what IDC wanted, but I never got a contract. But IDC had my research. Ah, those administrative delays.) IDC, however, was organized enough to ask for additions to my company research like an open source industry overview.

In an odd approach to copyright, IDC did not produce a contract but it produced reports about four open source companies. Mr. Schubmehl and IDC just went about producing what were recycled company reports and trying to sell them at $3,500 a whack. Is that value or an example of the culture of narcissism? It may come as a surprise to you, gentle reader, but I sell research for money. I have a business model and it has worked for about 40 years. When an outfit uses the research without issuing a contract, I have to start thinking about such issues as fairness, integrity, copyright, and name surfing. Call me idiosyncratic, but when my name is used without my permission, I wonder how a big and allegedly respected organization can operate like a Bear Stearns-type senior executive.

Then, the straw that broke the proverbial camel’s back: a librarian told me that IDC was selling a report with my name and Mr. Schubmehl’s on Amazon. Wow, Amazon, the Wal-Mart for the digital age. The reports, now removed from Amazon’s blue light special shelf, cost $3,500. Not bad for eight pages of information based on my year long research investment into the wild and volatile world of open source search and content processing. Surf’s up for Mr. Schubmehl.

Well, IDC after some prodding by my very gentle legal gerbil stopped selling my work. We received a proposal that offered me a pittance for a guarantee that I would not talk or write about this name surfing, unauthorized resale of my information on Amazon, and the flubs of Mr. Schubmehl.

My legal gerbil rejected IDC’s lawyer crafted “deal,” and I am now converting my IDC misadventure into a metaphor for some of the deeper issues associated with “experts” and certain professional services firms. My legal gerbil suggested a significantly higher fee, but, like many of that ilk, the gerbil broke my heart.

Hence, IDC and Mr. Schubmehl’s tweets and twit storm are on my fragile ship’s radar. Let’s review the IBM IDC Schubmehl twit storm on just one day in September 2014. Trigger warning: Do not emulate the IDC Schubmehl method for your content marketing program. One day of tweets only generates a lot of twit.

Now to the Twit Storm Unleashed on September 16, 2014

Using my Overflight system, I monitor IDC tweets. Quite an interesting series of tweets appears on September 16, 2014. Mr. Schubmehl posted 25 tweets about IBM Watson.

Here are three examples of the Watson content to which his name was attached:

  • September 16, 2014. #WatsonAnalytics uses Watson cognitive technologies to ingest structured data and find relationships – Robin Grosset & Dan Wolfson
  • September 16, 2014. Combo of cognitive with cloud analytics improves process, analysis and decision making – cognitive will change all mkts #WatsonAnalytics
  • September 16, 2014. #WatsonAnalytics will be using a freemium model….first time for IBM…

Obviously there is nothing wrong with a tweet about an IBM product. What’s one more twit emission in a flow of several hundred thousand 140 character text outputs?

There is nothing illegal about two dozen tweets about IBM. What two dozen tweets do is make me laugh and see this content marketing effort as fodder for corporate weirdness.

Also, this IBM twit storm is not on the Miley Cyrus or Lady Gaga scale, but it is notable because it is a one day twit storm quite unlike the Jeopardy journey. Quite a marketing innovation: getting an alleged “expert” to craft  16 “original” tweets in one day and issue seven retweets of tweets from others who are fans of Big Blue. A few Schubmehl tweets on the 16th illustrated diversity; for example, “The FBI’s Facial Recognition System Is Here.” Hmm. The FBI and facial recognition. I wonder why one is interested in this development.

The terms mentioned in these IBM centric tweets on September 16, 2014, reveal the marketing jargon that IBM is using to generate revenue from the game show winning technology. My list of buzzwords from the tweets reads like a who’s who of blogosphere and venture oriented yak:

  • Automated data cleansing
  • Analytics (cloud based)
  • Big Data
  • Cognitive (system and capabilities)
  • Data explorer
  • Democratizing
  • Freemium
  • Natural Language Computing
  • Natural Language Query.

From this list of buzzwords my favorites are “cognitive,” “Big Data,” and the number one silly word “Freemium.” Imagine. Freemium from IBM. Imagine.

My Interpretation of the Twit Storm

Let me capture several preliminary observations:

First, the Schubmehl Twitter activity on September 16, 2014 focuses mostly on IBM’s challenged Watson business development effort. The cluster of tweets on the 16th suggests a somewhat ungainly and down-market content marketing play.

Did Mr. Schubmehl wake up on the 16th of September and decide to crank out Watson centric tweets? Did IBM pay IDC and Mr. Schubmehl to do some content marketing like thousands of PR firms do each day? We even have these outfits in Harrod’s Creek, Kentucky to flog auto sales, bourbon, and cheesy festivals in Middletown, Kentucky.

Here’s a question: “How many tweets does a McKinsey or Bain type of consulting firm issue on a single day for a single product that seems to be struggling for revenue?” If you know, please, use the comments section of this blog to provide some factoids.

Second, the tweets provide the reader with a list of what seem to be IBM Watson aficionados or employees who have the job of making the shotgun marriage of open source code, legacy Almaden technology, and proprietary scripts into a billion dollar revenue producer soon, very soon, gentle reader. The individuals mentioned in the September 16, 2014, tweets include:

  • Steve Gold, Baylor University
  • Robin Grosset, Distinguished engineer Watson Analytics.
  • Dan Wolfson, IBM Distinguished Engineer
  • Bob Picciano, Senior vice president, IBM information and analytics group.

Perhaps Mr. Gold is objective? I ask, “Are the other three IBM wizards looking at the world through IBM tinted spectacles when reading their business objectives for the current fiscal year?” I asked myself, “Should I trust these individuals who presumably are also “experts” in all things related to Watson?” My preliminary answer is, “Not for an objective view of the game show winning Watson.”

Third, what’s the payoff of this twit storm for IBM? Did IBM expect me to focus on the Schubmehl twit storm and convert the information into my idea of a 10 minute stand up comedy routine to deliver at the upcoming intelligence and law enforcement conference in nine days? Is it possible that “doing social media” looks good on a weekly report when an executive does not have juicy revenue numbers to present? The value of the effort strikes me as modest. In fact, viewed as a group, the tweets could be interpreted as an indicator of IBM’s slide into desperation marketing.

What about consulting firms and their ability to pump out high margin revenue?

Outfits like Gerson Lehrman Group have put the squeeze on mid tier consulting firms. The bottom feeders with their middle school teacher and poet contingent are not likely to sell to the IBMs of the world. GLG type companies are also nipping at the low end business of the blue chip outfits like Bain, Boston Consulting, and even McKinsey.

But GLG can deliver to a client retired professionals from blue chip firms and on point experts. As a result, GLG has made life very, very tough for the mid tier outfits. Why pay $50,000 for an unproven “expert” when you can buy a person with a pedigree for an hour and pay a few hundred bucks when you need a factoid or an opinion? I consider IDC’s move to content marketing indicative of a fundamental shift in the character of a consulting firm’s business. The shift to low level PR work seems out of character for a professional services firm with a commitment to intellectual rigor.

Every few days I learn that something called TopSEOs.com generates a list of content marketing leaders. Will IDC appear on this list?

For those who depend on lower- or mid tier consulting firms for professional counsel, how would you answer these questions:

  1. What is the intellectual substance behind pronouncements? Is there original research underpinning pronouncements and projections, or are the data culled from secondary sources and discussions with paying customers?
  2. What is the actual relationship between a mid tier consulting firm and the companies discussed in “authoritative” reports? Are these reports and projects inclusions (a fancy word for ads) or are they objective discussions of companies?
  3. Are the experts presented as “experts” actually experts or are they individuals who want to hit revenue goals while keeping costs as low as possible?

I don’t have definitive answers to these questions. Perhaps one day I can use a natural language query to tap into Big Data and rely on cognitive methods to provide answers.

For now, a one day twit storm is a wonderful example of how not to close deals, build reputations, and stimulate demand for advanced technology offered via a “Freemium” model. What the heck does that mean anyway?

Stephen E Arnold, September 29, 2014

New York Times Online: An Inside View

September 24, 2014

Check out the presentation “The Surprising Path to a Faster NYTimes.com.”

I was surprised at some of the information in the slide deck. First, I thought the New York Times was first online in the 1970s via LexisNexis.


This is not money. See http://bit.ly/1rus9y8

I thought that was an exclusive deal and reasonably profitable for both LexisNexis and the New York Times. When the newspaper broke off that exclusive to do its own thing, the revenue hit on the New York Times was immediate. In addition, the decision had significant cost implications for the newspaper.

The New York Times needed to hire people who could allegedly create an online system. The newspaper had to license software, write code, hire consultants, and maintain computers not designed to set type and organize circulation. The New York Times had to learn on the fly about converting content for online content processing. Learning that one does not know anything after thinking one knew everything is a very, very inefficient way to get into the online business. In short, the blow off of the LexisNexis deal added significant initial and then ever increasing on-going costs to the New York Times Co. I don’t think anyone at the New York Times has ever sat down to figure out the cost of that decision to become the Natty Bumppo of the newspaper publishing world.

I had heard that in the 1970s the newspaper raked in seven figures a year while LexisNexis did the heavy lifting. Yep, that included figuring out how to put the newspaper content on tape into a suitable form for LexisNexis’ mainframe system. Figuring this out inside the New York Times in the early 1990s made this sound: Crackle, crackle, whoosh. That is the sound of a big company burning money not for a few months but for DECADES, folks. DECADES.


Photo from US Fish and Wildlife.

When the newspaper decided that it could do an online service itself and presumably make more money, the newspaper embarked on the technical path discussed in the slide deck. Few recall that the fellow who set up the journal Online worked on the online version of the newspaper. I recall speaking to that person shortly after he and the newspaper parted ways. He did not seem happy with budgets, technology, or vision. But, hey, that was decades ago.


How some information companies solve common problems with new tools. Image thanks to Englishrussia.com at http://bit.ly/1ps0MPF.

In the slide deck, we get an insider’s view of trying to deal with the problem of technical decisions made decades ago. What’s interesting is that the cost of the little adventure by the newspaper does not reflect the lost revenue from the LexisNexis exclusive. The presentation does illustrate quite effectively how effort cannot redress technical decisions made in the past.

This is an infrastructure investment problem. Unlike a physical manufacturing facility, an information centric business is difficult to re-engineer. There is the money problem. It costs a lot to rip and replace or put up a new information facility and then cut it over when it is revved and ready. But information centric businesses have another problem. Most succeed by virtue of luck. The foundation technology is woven into the success of the business, but in ways that are often non replicable.

The New York Times killed off the LexisNexis money flow. Then it had to figure out how to replicate that LexisNexis money flow and generate a bigger profit. What happened? The New York Times spent more money creating the various iterations of the Times Online, lost the LexisNexis money, and became snared in the black hole of trying to figure out how to make online information generate lots of dough. I am suggesting that the New York Times may be kidding itself with the new iteration of the Times Online service.

Read more

Autumn Approaches: Time for Realism about Search

September 1, 2014

Last week I had a conversation with a publisher who has a keen interest in software that “knows” what content means. Armed with that knowledge, a system can then answer questions.

The conversation was interesting. I mentioned my presentations for law enforcement and intelligence professionals about the limitations of modern and computationally expensive systems.

Several points crystallized in my mind. One of these is addressed, in part, in a diagram created by a person interested in machine learning methods. Here’s the diagram created by the scikit-learn project:


The diagram is designed to help a developer select from different methods of performing estimation operations. The author states:

Often the hardest part of solving a machine learning problem can be finding the right estimator for the job. Different estimators are better suited for different types of data and different problems. The flowchart below is designed to give users a bit of a rough guide on how to approach problems with regard to which estimators to try on your data.

First, notice that there is a selection process for choosing a particular numerical recipe. Now who determines which recipe is the right one? The answer is the coding chef. A human exercises judgment about a particular sequence of operations that will be used to fuel machine learning. Is that sequence of actions the best one, the expedient one, or the one that seems to work for the test data? The answer to these questions determines a key threshold for the resulting “learning system.” Stated another way, “Does the person licensing the system know if the numerical recipe is the most appropriate for the licensee’s data?” Nah. Does a mid tier consulting firm like Gartner, IDC, or Forrester dig into this plumbing? Nah. Does it matter? Oh, yeah. As I point out in my lectures, the “accuracy” of a system’s output depends on this type of plumbing decision. Unlike a backed up drain, flaws in smart systems may never be discerned. For certain operational decisions, financial shortfalls or the loss of an operation team in a war theater can be attributed to one of many variables. As decision makers chase the Silver Bullet of smart, thinking software, who really questions the output in a slick graphic? In my experience, darned few people. That includes cheerleaders for smart software, azure chip consultants, and former middle school teachers looking for a job as a search consultant.
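To make the plumbing point concrete, here is a minimal sketch, not anything taken from the diagram or from any vendor’s system. It assumes a made-up, one-dimensional data set and two deliberately simple recipes (a nearest centroid rule and a single nearest neighbor rule). The data, the labels, and the function names are all hypothetical. The only purpose is to show that two reasonable recipes can label the same item differently.

```python
# Toy illustration only: two simple "numerical recipes" applied to the same data.
# The training data, labels, and function names are hypothetical.
from statistics import mean


def nearest_centroid(train, point):
    """Label a point by the class whose average value is closest."""
    centroids = {}
    for label in {lbl for _, lbl in train}:
        centroids[label] = mean(x for x, lbl in train if lbl == label)
    return min(centroids, key=lambda lbl: abs(centroids[lbl] - point))


def nearest_neighbor(train, point):
    """Label a point by the single closest training example."""
    _, label = min(train, key=lambda pair: abs(pair[0] - point))
    return label


# Hypothetical scores for six documents, each hand-labeled by a human.
TRAIN = [
    (1.0, "relevant"), (2.0, "relevant"), (9.0, "relevant"),
    (5.0, "irrelevant"), (6.0, "irrelevant"), (7.0, "irrelevant"),
]


def compare(point):
    """Run both recipes on one new item and report each recipe's label."""
    return {
        "nearest_centroid": nearest_centroid(TRAIN, point),
        "nearest_neighbor": nearest_neighbor(TRAIN, point),
    }


if __name__ == "__main__":
    print(compare(1.5))  # the two recipes agree on this item
    print(compare(8.5))  # the two recipes disagree on this item
```

For the item scored 8.5, the centroid recipe says “irrelevant” and the neighbor recipe says “relevant.” Neither output carries a warning label. The person who picked the recipe made the call, and the person reading the slick graphic never sees it.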

Second, notice the reference to a “rough guide.” The real guide is an understanding of how specific numerical recipes work on a set of data that allegedly represents what the system will process when operational. Furthermore, there are plenty of mathematical methods available. The problem is that some of the more interesting procedures lead to increased computational cost. In a worst case, the more interesting procedures cannot be computed on available resources. Some developers know about P=NP and Big O. Others know to use the same nine or ten mathematical procedures taught in computer science classes. After all, why worry about math based on mereology if the machine resources cannot handle the computations within time and budget parameters? This means that most modern systems are based on a set of procedures that are computationally affordable, familiar, and convenient. Does this similarity of procedures matter? Yep. The generally squirrely outputs from many very popular systems are perceived as completely reliable. Unfortunately, the systems are performing within a narrow range of statistical confidence. Stated more harshly, the outputs are just not particularly helpful.

In my conversation with the publisher, I asked several questions:

  1. Is there a smart system like Watson that you would rely upon to treat your teenaged daughter’s cancer? Or, would you prefer the human specialist at the Mayo Clinic or comparable institution?
  2. Is there a smart system that you want directing your only son in an operational mission in a conflict in a city under ISIS control? Or, would you prefer the human-guided decision near the theater about the mission?
  3. Is there a smart system you want managing your retirement funds in today’s uncertain economy? Or, would you prefer the recommendations of a certified financial planner relying on a variety of inputs, including analyses from specialists in whom your analyst has confidence?

When I asked these questions, the publisher looked uncomfortable. The reason is that the massive hyperbole and marketing craziness about fancy new systems creates what I call the Star Trek phenomenon. People watch Captain Kirk talking to devices, transporting himself from danger, and traveling between far flung galaxies. Because a mobile phone performs some of the functions of the fictional communicator, it sure seems as if many other flashy sci-fi services should be available.

Well, this Star Trek phenomenon does help direct some research. But in terms of products that can be used in high risk environments, the sci-fi remains a fiction.

Believing and expecting are different from working with products that are limited by computational resources, expertise, and informed understanding of key factors.

Humans, particularly those who need money to pay the mortgage, ignore reality. The objective is to close a deal. When it comes to information retrieval and content processing, today’s systems are marginally better than those available five or ten years ago. In some cases, today’s systems are less useful.

Read more

The Knowledge Quotient Saucisson Link: Back to Sociology in the 1970s

August 5, 2014

I have mentioned recent “expert analyses” of the enterprise search and content marketing sector. In my view, these reports are little more than gussied up search engine optimization (SEO), content marketing plays. See, for example, this description of the IDC report about “knowledge quotient”. Sounds good, right? So does most content marketing and PR generated by enterprise search vendors trying to create sustainable revenue and sufficient profits to keep the investors on their boats, in their helicopters, and on the golf course. Disappointing revenues are not acceptable to those with money who worry about risk and return, not their mortgage payment.

Some content processing vendors are in need of sales leads. Others are just desperate for revenue. The companies with venture money in their bank account have to deliver a return. Annoyed funding sources may replace company presidents. This type of financial blitzkrieg has struck BA Insight and LucidWorks. Other search vendors are in legal hot water; for example, one Fast Search & Transfer executive and two high profile Autonomy Corp. professionals. Other companies tap dance from buzzword to catchphrase in the hopes of avoiding the fate of Convera, Delphes, or Entopia. The marketing beat goes on, but the revenues for search solutions remain a challenge. How will IBM hit $10 billion in Watson revenues in five or six years? Good question, but I know the answer. Perhaps accounting procedures might deliver what looks like a home run for Watson. Perhaps the Jeopardy winner will have to undergo Beverly Hills-style plastic surgery? Will the new Watson look like today’s Watson? I would suggest that some artificiality could be discerned.

Last week, one of my two or three readers wrote to inform me that the phrase “knowledge quotient” is a registered trademark. One of my researchers told me that when one uses the phrase “knowledge quotient,” one should include the appropriate symbol. Omission can mean many bad things, mostly involving attorneys:


Another one of the goslings picked up the vaporous “knowledge quotient” and poked around for other uses of the phrase. Remember. I encountered this nearly meaningless quasi academic jargon in the title of an IDC report about content processing, authored by the intrepid expert Dave Schubmehl.

According to one of my semi reliable goslings, the phrase turned up in a Portland State University thesis. The authors were David Clitheroe and Garrett Long.


The trademark was registered in 2004 by Penn State University. Yep, that’s the university which I associate with an unfortunate management “issue.” According to Justia, the person registering the phrase “knowledge quotient” was a Penn State employee named Gene V J Maciol.

So we are considering a chunk of academic jargon cooked up to fulfill a requirement to get an advanced degree in sociology in 1972. That was about 40 years ago. I am not familiar with sociology or the concept knowledge quotient.

I printed out the 111 page document and read it. I do have some observations about the concept and its relationship to search and content processing. Spoiler alert: Zero, none, zip, nada, zilch.

The topic of the sociology paper is helping kids in trouble. I bristled at the assumptions implicit in the write up. Some cities had sufficient resources to help children. Certain types of faculties are just super. I assume neither of the study’s authors was in a reformatory, orphanage, or insane asylum.

Anyway the phrase “knowledge quotient” is toothless. It means, according to page 31:

the group’s awareness and knowledge of the [troubled youth or orphan] home.

And the “quotient” part? Here it is in all its glory:

A knowledge quotient reflects the group’s awareness and knowledge of the home.

Read more

Are HP, Google and IDC “Out of Square”?

August 2, 2014

Editor’s note: These three companies are involved in search and content processing. The opinion piece considers the question, “Is management unable to keep standard business processes working in some businesses today?” Links have been inserted to open source information that puts some of the author’s comments in context. Comments about this essay may be posted using the Comments function for this blog.

Forgetting to Put Postage on Lots of Letters

I read “HP to Pay $32.5 Million to Settle Claims of Overbilling USPS.” (Keep in mind you may have to pony up some cash to access this article. Mr. Murdoch needs cash to buy more media properties. Do your part!)

The main point of the story, told by “real” journalists, is that the company failed “to comply with pricing terms.” The “real” news story asserts:

The DOJ also alleged H-P made misrepresentations during the negotiation of the contract with the USPS regarding its pricing and its plans to ensure it would provide the required most favored customer pricing.

I suppose any company can overlook putting postage on an envelope. When that happened to me in my day of snail mail activity, my local postmistress Claudette would give me a call and I would go to the Harrod’s Creek post office and buy a stamp.

I am no big time manager, but I understood that snail mail required a stamp. If you are a member of the House or Senate, the rules are different, but even the savvy Congressperson makes sure the proper markings appear on the absolutely essential missives.

My mind, which I admit is not as agile as it was when I worked at Halliburton Nuclear Utility Services, drew a dotted line between this seemingly trivial matter of goofing on an administrative procedure and the fantastic events still swirling around Hewlett Packard’s purchase of Autonomy, a vendor of search and content processing software.

A number of questions flapped slowly across my mind:

  1. Is HP management becoming careless with trivial matters like paying $11 billion for a company generating about $800 million in revenue and forgetting to pay the US post office?
  2. Is the thread weaving together such HP events as the mobile operating system affair, the HP tablet, the fumbling of the Alta Vista opportunity, and the apparent administrative goofs like the Autonomy purchase and this alleged postage stamp licking a matter of flawed administrative processes?
  3. What do the stamp sticking, Autonomy litigating, and alleged eavesdropping say about the company’s “git ‘er done” approach?

Larry the Cable Guy for President!!!

The attitude may apply to confident senior managers with incentives to produce revenue. Image source: http://profileengine.com/groups/profile/420722222/larry-the-cable-guy-for-president

I don’t think too much about Hewlett Packard. I do wonder if HP is an isolated actor or if companies with search interests are focusing on priorities that seem to be orthogonal to what I understand to be appropriate corporate behavior. One isolated event is highly suggestive.

But what do similar events suggest? In this short essai, I want to summarize two events. Both of these are interesting. For me, I see a common theme connecting the HP stamp licking and the two macro events. The glue fixing these in my mind is what seems to be a failure of management to pay attention to details.

But first, let’s go back in time for a modest effort penned by Edmund Spenser.
Read more

Gartner and Enterprise Search 2014

July 31, 2014

At lunch yesterday, several search aware people discussed a July 2014 Gartner study. One of the folks had a crumpled image of the July 2014 “magic quadrant.” This is, I believe, report number G00260831. Like other mid tier consulting firms, Gartner works hard to find something that will hook customers’ and prospects’ attention. The Gartner approach is focused on companies that purport to have enterprise search systems. From my vantage point, the Gartner approach is miles ahead of the wild and illogical IDC report about knowledge, a “quotient,” and “unlocking” hidden value. See http://bit.ly/1rpQymz. Now I have not fallen in love with Gartner. The situation is more like my finding my content and my name for sale on Amazon. You can see what my attorney complained about via this link, http://bit.ly/1k7HT8k. I think I was “schubmehled,” not outwitted.

I am the really good looking person. Image source: http://bit.ly/1rPWjN3

Where the IDC report lacks comprehensiveness with regard to vendors, Gartner mentions quite a few companies allegedly offering enterprise search solutions. You must chase down your local Gartner sales person for more details. I want to summarize the points that surfaced in our lunch time pizza fest.

First, the Gartner “study” includes 18 or 19 vendors. Recommind is on the Gartner list even though a supremely confident public relations “professional” named Laurent Ionta insisted that Recommind was not in the July 2014 Gartner report. I called her attention to report number G00260831 and urged her to use her “bulldog” motivation to contact her client and Gartner’s experts to get the information from the horse’s mouth as it were. (Her firm is www.lewispr.com and it is supposed to be the Digital Agency of the Year and on the Inc 5000 list of the fastest growing companies in America.) I am impressed with the accolades she included in her emails to me. The fact that this person who may work on the Recommind account was unaware that Gartner pegged Recommind as a niche player seemed like a flub of the first rank. When it comes to search, not even those in the search sector may know who’s on first or among the chosen 19.

To continue with my first take away from lunch, there were several companies that those at lunch thought should be included in the Gartner “analysis.” As I recall, the companies to which my motley lunch group wanted Gartner to apply their considerable objective and subjective talents were:

  • ElasticSearch. This in my view is the Big Dog in enterprise search at the moment. The sole reason is that ElasticSearch has received an injection of another $70 million to complement the $30 odd million it had previously gathered. Oh, ElasticSearch is a developer magnet. Other search vendors should be so popular with the community crowd.
  • Oracle. This company owns and seems to offer Endeca solutions along with RightNow/InQuira natural language processing for enterprise customer support, the fading Secure Enterprise Search system, and still popping and snapping Oracle Text. I did not mention to the lunch crowd that Oracle also owns Artificial Linguistics and Triple Hop technology. This information was, in my view, irrelevant to my lunch mates.
  • SphinxSearch. This system is still getting love from the MySQL contingent. Imagine no complex structured query language syntax to find information tucked in a cell.

There are some other information retrieval outfits that I thought of mentioning, but again, my free lunch group does not know what it does not know. Like many folks who discuss search with me, learning details about search systems is not even on the menu. Even when the information is free, few want to confuse fantasy with reality.

The second take away is that the rationale for putting most vendors in the niche category puzzled me. If a company really has an enterprise search solution, how is that solution a niche? The companies identified as those who can see where search is going are, as I heard, labeled “visionaries.” The problem is that I am not sure what a search visionary is; for example, how does a French aerospace and engineering firm qualify as a visionary? Was HP a visionary when it bought Autonomy, wrote off $8 billion, and initiated litigation against former colleagues? How does this Google supplied definition apply to enterprise search:

able to see visions in a dream or trance, or as a supernatural apparition?

The final takeaway for me was the failure to include any search system from China, Germany, or Russia. Interesting. Even my down on their heels lunch group was aware of Yandex and its effort in enterprise search via a Yandex appliance. Well, internationalization only goes so far I suppose.

I recall hearing one of my luncheon guests say that IBM was, according to the “experts” at Gartner, a niche player. Gentle reader, I can describe IBM many ways, but I am not sure it is a niche player like Exorbyte (eCommerce mostly) and MarkLogic (XML data management). Nope, IBM’s search embraces winning Jeopardy, creating recipes with tamarind, and curing assorted diseases. And IBM offers plain old search as part of DB2 and its content management products plus some products obtained via acquisition. Cybertap search, anyone? When someone installs what used to be OmniFind, I thought IBM was providing an enterprise class information retrieval solution. Guess I am wrong again.

Net net: Gartner has prepared the ground for a raft of follow on analyses. I would suggest that you purchase a copy of the July 2014 Gartner search report. You may be able to get your bearings so you can answer these questions:

  1. What are the functional differences among the enterprise search systems?
  2. How does the HP Autonomy “solution” compare to the pre-HP Autonomy solution?
  3. What is the cost of a Google Search Appliance compared to a competing product from Maxxcat or Thunderstone? (Yep, two more vendors not in the Gartner sample.)
  4. What causes a company to move from being a challenger in search to a niche player?
  5. What makes both a printer company and a Microsoft-centric solution qualified to match up with Google and HP Autonomy in enterprise search?
  6. What are the licensing costs, customizing costs, optimizing costs, and scaling costs of each company’s enterprise search solution? (You can find the going rate for the Google Search Appliance at www.gsaadvantage.gov. The other 18? Good luck.)

I will leave you to your enterprise search missions. Remember. Gartner, unlike some other mid-tier consulting firms, makes an effort to try to talk about what its consultants perceive as concrete aspects of information retrieval. Other outfits not so much. That’s why I remain confused about the IDC KQ (knowledge quotient) thing, the meaning of hidden value, and unlocking. Is information like a bike padlock?

Stephen E Arnold, July 31, 2014
