Artificial Intelligence: Duh? What?

December 13, 2014

I have been following the “AI will kill us” theme, the landscape of machine intelligence craziness, and “Artificial Intelligence Isn’t a Threat—Yet.”

The most recent big thinking on this subject appears in the Wall Street Journal, an organization in need of any type of intelligence: Machine, managerial, fiscal, online, and sci-fi.

Harsh? Hmm. The Wall Street Journal has been running full page ads for Factiva. If you are not familiar with this for-fee service, think 1981. The system gathers “high value” content and makes it available to humans clever enough to guess the keywords that unlock, not answers, but a list of documents presumably germane to the keyword query. There are wrappers that make Factiva more fetching. But NGIA systems (what I call next generation information access systems) use the Factiva methods perfected 40 years ago as a utility.
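The 1981-style retrieval model described above can be sketched in a few lines: an inverted index that maps keywords to document ids, so a query returns not answers but a list of documents. The documents and ids below are invented for illustration.

```python
from collections import defaultdict

# Invented mini-corpus standing in for a Factiva-style collection.
docs = {
    1: "high value content for analysts",
    2: "keyword query returns a list of documents",
    3: "high value keyword indexing service",
}

# Build an inverted index: term -> set of document ids.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def lookup(term):
    """Guess the right keyword and you get document ids; guess wrong, nothing."""
    return sorted(index.get(term.lower(), set()))

print(lookup("keyword"))  # documents containing the term
print(lookup("answers"))  # no answers here, only document lists
```

Guess the wrong keyword and the system returns an empty list, which is the user experience the passage describes.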

These are Cheetos. Nutritious, right? Will your smart kitchen let you eat these when it knows you are 30 pounds overweight, have consumed a quart of alcohol infused beverages, and ate a Snickers for lunch? Duh? What?

NGIA systems are sort of intelligent. The most interesting systems recurse through the previous indexes as the content processing system ingests data from users happily clicking, real time content streaming to the collection service, and threshold adjustments made either by savvy 18 year olds or some numerical recipes documented by Google’s Dr. Norvig in the standard text Artificial Intelligence: A Modern Approach.

So should we be looking forward to the outputs of a predictive system pumping directly into an autonomous unmanned aerial vehicle? Will a nifty laser weapon find and do whatever the nifty gizmo does to a target? Will the money machine figure out why I need $300 for concrete repairs and decline to give it to me because the ATM “knows” the King of Concrete could not lie down in a feather bed? Forget real concrete.

The Wall Street Journal write up offers up this titbit:

Read more

Pi in the Sky: HP and IBM Race to Catch Up with NGIA Leaders

December 7, 2014

I read “HP Takes Analytics to the Cloud in Comeback to IBM’s Watson.” The write up is darned interesting. Working through the analysis reminded me that HP does not realize that Autonomy’s 1999 customer BAE Systems has been working with analytics from the cloud for—what?—15 years? What about Recorded Future, SAIC, and dozens of other companies running successful businesses with this strategy?


The article points out that two large and somewhat pressured $100 billion companies are innovating like all get out. I learned:

Although it [Hewlett Packard] may not win any trivia contests in the foreseeable future, the hardware maker’s entry into the world of end-to-end analytics does hold up to Watson where the rubber meets the road in the enterprise…But the true equalizer for the company is IDOL, the natural language processing and search it obtained through the $11.7 billion acquisition of Autonomy Corp. PLC in 2011, which reduces the gap between human and machine interaction in a similar fashion to IBM’s cognitive computing platform.

Okay. IBM offers Watson, which was supposed to generate a billion or more by 2015 and then surge to $10 billion in revenue in another four or five years. What is Watson? As I understand it, Watson is open source code, some bits and pieces from IBM’s research labs, and wrappers that convert search into a towering giant of artificial intelligence. Why doesn’t IBM focus on its next generation information access units that are exciting and delivering services that customers want? i2 does not produce recipes incorporating tamarind. Cybertap does not help sick teenagers.

HP, on the other hand, owns the Autonomy Dynamic Reasoning Engine (DRE) and the Intelligent Data Operating Layer (IDOL). These incorporate numerical recipes based on the work of Bayes, Laplace, and Markov, among others. The technology is not open source. Instead, IDOL is a black box. HP spent $11 billion for Autonomy, figured out that it overpaid, wrote off $5 billion or so, and launched a global scorched earth policy for its management methods. Recently, HP has migrated DRE and IDOL to the cloud. Okay, but HP is putting more effort into accusing Autonomy of fooling HP. Didn’t HP buy Autonomy after experts reviewed the deal, the technology, and the financial statements? HP has lost years in an attempt to redress a perceived wrong. But HP decided to buy Autonomy.

Read more

Enterprise Search: Gritters Are Ready, Enterprise Info Highway Is Resistant

December 3, 2014

In UK talk, a gritter is a giant machine that dumps sand (grit) on a highway to make it less slippery. Enterprise search gritters are ready to dump sand on my forthcoming report about next generation information access.

The reason is that enterprise search is running on a slippery surface. The potential customers are coated in Teflon. The dust up between HP and Autonomy, the indictment of a former Fast Search & Transfer executive, and the dormancy of some high flying vendors (Dieselpoint, Hakia, Siderean Software, et al)—these are reasons why enterprise customers are looking for something that takes the company into information access realms that are beyond search. Here’s an example: “Accounting Differences, Not Fraud, Led to HP’s Autonomy Write Down.” True or false, the extensive coverage of the $11 billion deal and the subsequent billions in write down has not built confidence in the blandishments of the enterprise search vendors.


Enter the gritters. Enterprise search vendors are prepping to dump no skid bits on their prospects. Among the non skid silica will be pages from mid tier consultants’ reports about fast movers and three legged rabbits. There will be conference talks that pummel the audience with assertions about the primacy of search. There will be recycled open source technology and “Fast” think packaged as business intelligence. There will be outfits that pine for the days of libraries with big budgets pitching rich metadata to trucking companies and small medical clinics who rightly ask, “What’s metadata?”

Read more

Enterprise Search: Fee Versus Free

November 25, 2014

I read a pretty darned amazing article “Is Free Enterprise Search a Game Changer?” My initial reaction was, “Didn’t the game change with the failures of flagship enterprise search systems?” And “Didn’t the cost and complexity of many enterprise search deployments fuel the emergence of the free and open source information retrieval systems?”

Many proprietary vendors are struggling to generate sustainable revenues and pay back increasingly impatient stakeholders. The reality is that the proprietary enterprise search “survivors” fear meeting the fate of  Convera, Delphes, Entopia, Perfect Search, Siderean Software, TREX, and other proprietary vendors. These outfits went away.


Many vendors of proprietary enterprise search systems have left behind an environment in which revenues are simply not sustainable. Customers learned some painful lessons after licensing brand name enterprise search systems and discovering the reality of their costs and functionality.

Other vendors, faced with mounting costs and zero growth in revenues, sold their enterprise search companies. The spate of sell outs that began in the mid 2000s was stark evidence that delivering information retrieval systems to commercial and governmental organizations was difficult to make work.

Consider these milestones:

Autonomy sold to Hewlett Packard. HP promptly wrote off billions of dollars and launched a fascinating lawsuit that blamed Autonomy for the deal. HP quickly discovered that Autonomy, like other complex content processing companies, was difficult to sell, difficult to support, and difficult to turn into a billion dollar baby.

Convera, the product of Excalibur’s scanning legacy and ConQuest Software, captured some big deals in the US government and with outfits like the NBA. When the system did not perform like a circus dog, the company wound down. One upside for Convera alums was that they were able to set up a consulting firm to keep other companies from making the Convera-type mistakes. The losses were measured in the tens of millions.

Read more

Enterprise Search: Essentially Marginalized to Good Enough

November 9, 2014

I use Google Trends to see what’s hot and what’s not in the world of information retrieval. If you want to use the free version of Google Trends, navigate to and explore. That’s some of what Google does to make decisions about how much of Larry Page’s “wood” to put behind the Google Search Appliance eight ball.


I plugged in “enterprise search.” When one allows Google to output its version of the popularity of the term, one gets this graph. It shows a downward trend, but the graph is without much context. The pale lettering does not help. Obviously Googlers do not view the world through trifocals with 70 year old eyes. Here’s the Trends’ output for “enterprise search”:


Now let’s add some context. From the “enterprise search” Trends’ output, click the pale blue plus and add this with quotes: “big data.” Here’s the output for this two factor analysis:


One does not have to be an Ivy League data scientist to see the difference between the hackneyed “enterprise search” and the more zippy but meaningless “Big Data.” I am not saying Big Data solutions actually work. What’s clear is that pushing enterprise search is not particularly helpful when the Trends data reveal a flat line for years, not hours, not days, not months–years.
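One detail worth keeping in mind when reading a two-term Trends chart: Google Trends rescales the compared series jointly, so the single highest point across all terms becomes 100 and everything else is expressed relative to it. A small sketch with invented numbers (not real Trends data) shows why one term can look like a flat line on the floor:

```python
def normalize_like_trends(*series):
    """Jointly rescale raw interest series to 0-100, as Google Trends
    does for a multi-term comparison: the single highest point across
    ALL compared series becomes 100."""
    peak = max(max(s) for s in series)
    return [[round(100 * v / peak) for v in s] for s in series]

# Invented monthly counts, purely for illustration.
enterprise_search = [12, 11, 10, 10, 9, 9]
big_data = [40, 55, 70, 85, 95, 120]

es, bd = normalize_like_trends(enterprise_search, big_data)
print(bd)  # the zippy term climbs to 100
print(es)  # the hackneyed term hugs the bottom of the chart
```

The flat line is partly an artifact of joint scaling, but a term that never rises above single digits relative to its comparison is flat by any reading.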

I think it is pretty clear why I can assert with confidence that “enterprise search” appears to be a non starter. I know why search vendors persist in telling me what “enterprise search” is. The vendors are desperate to find the grip that a Tupinambis lizard possesses. Instead of clinging to a wall in the sun at 317 R. Dr. Emílio Ribas (Cambui) (where I used to live in Campinas, SP), the search vendors are clinging to a chimera. The goal is to make sales, but if the Google data are even sort of correct, enterprise search is flat lining.

Little wonder that consultant reports like those from the mid tier crowd try to come up with verbiage that will create sales leads for the research sponsors; case in point, knowledge quotient. See Meme of the Moment for a fun look at IDC’s and search “expert” Dave Schubmehl’s most recent attempt to pump up the music.

The question is, “What is generating revenue?” In a sense, excitement surrounds vendors who deliver solutions. These include search, increasingly supplied by open source software. Elasticsearch is zipping along, but search is not the main dish. Search is more like broccoli or carrots.

The good news is that there is a group of companies, numbering about 30, which have approached search differently. As a result, many of these companies are growing and charting what I call “next generation search.”

Want to know more? Well, that’s good. Watch for my coverage of this sector in the weeks and months ahead. I will toss a small part of our research into my November Information Today column. A tiny chunk. Keep that in mind.

In the meantime, think critically about the craziness flowing from many mid tier or azure chip consulting firms. Those “outputs” are marketing, self aggrandizing, and, for me, downright silly. What’s that term for doing trivial actions again and again?

Stephen E Arnold, November 9, 2014

Enterprise Search, Knowledge Management, & Customer Service: Some of the Study Stuff Ups Evident?

October 27, 2014

One of my two or three readers sent me a link to “The 10 Stuff Ups We All Make When Interpreting Research.” The article walks through some common mistakes individuals make when “interpreting research.” I don’t agree with the “all” in the title.

This article arrived as I was reading a recent study about search. As an exercise on a surprisingly balmy Sunday afternoon in Kentucky, I jotted down the 10 “stuff ups” presented in the Interpreting Research article. Here they are in my words, paraphrased to sidestep plagiarism, copyright, and Google duplication finder issues:

  1. One study, not a series of studies. In short, an anomaly report.
  2. One person’s notion of what is significant may be irrelevant.
  3. Mixing up risk and the Statistics 101 notion of “number needed to treat” gets the cart before the horse.
  4. Trends may not be linear.
  5. Humans find what they want to find; that is, pre existing bias or cooking the study.
  6. Ignore the basics and layer cake the jargon.
  7. Numbers often require context. Context in the form of quotes from one on one interviews requires numbers.
  8. Models and frameworks do not match reality; that is, a construct is not what is.
  9. Specific situations do matter.
  10. Inputs from colleagues may not identify certain study flaws.

To test the article’s premises, I turned to a study sent to me by a person named Alisa Lipzen. Its title is “The State of Knowledge Management: 2014. Growing role & Value of Unified Search in Customer Service.” (If the link does not work for you, you will have to contact either of the sponsors, the Technology Services Industry Association or Coveo, an enterprise search vendor based in Canada.) You may have to pay for the report. My copy was free. Let’s do a quick pass through the document to see if it avoids the “stuff ups.”

First, the scope of the report is broad:

1. Knowledge management. Although I write a regular column for KMWorld, I must admit that I am not able to define exactly what this concept means. Like many information access buzzwords, the shotgun marriage of “knowledge” and “management” glues together two abstractions. In most usages, knowledge management refers to figuring out what a person “knows” and making that information available to others in an organization. After all, when a person quits, having access to that person’s “knowledge” has a value. But “knowledge” is as difficult to nail down as “management.” I suppose one knows it when one encounters it.

2. Unified search. The second subject is “unified search.” This is the idea that a person can use a single system to locate information germane to a query from a single search box. Unified suggests that widely disparate types of information are presented in a useful manner. For me, the telling fact is that Google, arguably the best resourced information access company, has been unable to deliver unified search. Note that Google calls its goal “universal search.” In the 1980s, Fulcrum Technologies (Ottawa, Canada) offered a version of federated search. In 2014, Google requires that a user run a query across different silos of information; for example, if I require information about NGFW, I have to run the query across Google’s Web index, Google scholarly articles, Google videos, Google books, Google blogs, and Google news. This is not very universal. Most “unified” search solutions are marketing razzle dazzle; for financial, legal, technical, and other reasons, organizations have to have different search systems.

3. Customer service. This is a popular bit of jargon. The meaning of customer service, for me, boils down to cost savings. Few companies have the appetite to pay for expensive humans to deal with the problems paying customers experience. Last week, I spent one hour on hold with an outfit called Wellcare. The insurance company’s automated system reassured me that my call was important. The call was never answered. What did I learn? Neither my call nor my status as a customer was important. Most information access systems applied to “customer service” are designed to drive the cost of support and service as low as possible.
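The “unified search” of point 2 above is, in practice, usually federation: the same query fanned out to separate silos, with the hits merged and labeled by origin rather than truly unified. A toy sketch (silo names and contents are invented) makes the distinction concrete:

```python
# Invented silos standing in for separate indexes (Web, news, videos).
SILOS = {
    "web":    ["NGFW market overview", "NGFW vendor comparison"],
    "news":   ["NGFW startup raises funding"],
    "videos": ["NGFW firewall demo"],
}

def federated_search(query):
    """Fan the same query out to every silo; merge hits, tagged by origin.
    The user still sees silo labels -- federation, not unification."""
    hits = []
    for silo, items in SILOS.items():
        for item in items:
            if query.lower() in item.lower():
                hits.append((silo, item))
    return hits

results = federated_search("ngfw")
print(results)
```

Even in this toy, ranking across silos, shared security models, and non-text content are left unsolved, which is roughly where the real vendors are stuck too.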


“Get rid of these expensive humans,” says the MBA. “I want my annual bonus.”

I was not familiar with the TSIA. What is its mission? According to the group’s Web site:

TSIA is organized around six major service disciplines that address the major service businesses found in a typical technology company.

Each service discipline has its own membership community led by a seasoned research executive. Additionally, each service discipline has the following:

In addition, we have a research practice on Service Technology that spans across all service discipline focus areas.

My take is that TSIA is a marketing-oriented organization for its paying members.

Now let’s look at some of the report’s key findings:

The people, process, and technology components of technology service knowledge management (KM) programs. This year’s survey examined core metrics and practices related to knowledge capture, sharing, and maintenance, as well as forward-looking elements such as video, crowd sourcing, and expertise management. KM is no longer just of interest to technical support and call centers. The survey was open to all TSIA disciplines, and 50% of the 400-plus responses were from groups other than support services, including 24% of responses from professional services organizations.

Read more

The AIIM Enterprise Search Study 2014

October 10, 2014

I worked through the 34 page report “Industry Watch. Search and Discovery. Exploiting Knowledge, Minimizing Risk.” The report is based on a sampling of the 80,000-member AIIM community. The explanation of the process states:

Graphs throughout the report exclude responses from organizations with less than 10 employees, and suppliers of ECM products and services, taking the number of respondents to 353.

The demographics of the sample were tweaked to discard responses from organizations with fewer than 10 employees. The sample included respondents from North America (67 percent), Europe (18 percent) and “rest of world” (15 percent).

Some History for the Young Reader of Beyond Search

AIIM has roots in imaging (photographic and digital imaging). Years ago I spent an afternoon with Betty Steiger, a then well known executive with a high profile in Washington, DC’s technology community. She explained that the association wanted to reach into the then somewhat new technology for creating digital content. Instead of manually indexing microfilm images, AIIM members would use personal computers. I think we connected in 1982 at her request. My work included commercial online indexing, experiments in full text content online, a CD ROM produced in concert with Predicasts and Lotus, and automated indexing processes invented by Howard Flank, a sidekick of mine for a very long time. (Mr. Flank received the first technology achievement award from the old Information Industry Association, now the SIIA.)

AIIM had its roots in the world of microfilm. And the roots of microfilm reached back to University Microfilms at the close of World War II. After the war, innovators wanted to take advantage of the marvels of microimaging and silver-based film. The idea was to put lots of content on a new medium so users could “find” answers to questions.

The problem for AIIM (originally the National Micrographics Association) was indexing. As an officer at a company considered in the 1980s to be one of the leaders in online and semi automated indexing methods, I had a great deal to discuss with Ms. Steiger.

But AIIM evokes for me:

Microfilm —> Finding issues —> Digital versions of microfilm —> CD ROMs —> On premises online access —> Finding issues.

I find the trajectory from microfilm to pronouncements about enterprise search, content processing, and eDiscovery fascinating. The story of AIIM parallels that of the traditional publishing industry (what I call the “dead tree method”), which has, like Don Quixote, galloped into battle with ones and zeros.

Asking a trade association’s membership for insights about electronic information is a convenient idea. What’s wrong with sampling the membership and others in the AIIM database, discarding those who belong to organizations with fewer than 10 employees, and tallying up the survey “votes”? For most of those interested in search, absolutely nothing. And that may be part of the challenge for those who want to get smart about search, findability, and content processing.

Let’s look at three findings from the 30 plus page study. (I have had to trim because the number of comments and notes I wrote when reading the report is too massive  for Beyond Search.)

Finding: 25 percent have no advanced or dedicated search tools. 13 percent have five or more [advanced or dedicated search tools].

Talk about good news for vendors of findability solutions. If one thinks about the tens of millions of organizations in the US, one just discards the 10 percent with 10 or fewer employees, and there are apparently quite a large percentage with simplistic tools. (Keep in mind that there are more small businesses than large businesses by a very wide margin. But that untapped market is too expensive for most companies to penetrate with marketing messages.) The study encourages the reader to conclude that a bonanza awaits the marketer who can identify these organizations and convince them to acquire an advanced or dedicated search tool.

There is a different view. The research Arnold IT (owner of Beyond Search) has conducted over the last couple of decades suggests that this finding conveys some false optimism. For example, in the organizations and samples with which we have worked, we found almost 90 percent saturation of search. The one on one interviews reveal that many employees were unaware of the search functions available for the organization’s database system or specialized tools like those used for inventory, the engineering department with AutoCAD, or customer support. So search systems with advanced features are in fact in most organizations. A survey of a general population reveals a market that is quite different from what the chief financial officer perceives when he or she tallies up the money spent for software that includes a search solution.

But the engineering department’s drawings and specifications, the legal department’s confidential documents, the HR unit’s employee health data, and the Board of Directors’ documents revealing certain financial and management topics have to remain in silos. There is, we have found, neither an appetite to gather these data nor the money to figure out how to make images and other types of data searchable from a single system. Far better to use a text oriented metasearch system and dismiss data from proprietary systems, images, videos, mobile messages, etc. We know that most organizations have search systems about which most employees know nothing. When an organization learns about these systems and then gets an estimate for creating one big federated system, the motivation drains from those who write the checks. In our research, senior management perceives aggregation of content as increasing risk and putting an information time bomb under the president’s leather chair.

Finding:  47% feel that universal search and compliant e-discovery is becoming near impossible given the proliferation of cloud share and collaboration apps, personal note systems and mobile devices. 60% are firmly of the view that automated analytics tools are the only way to improve classification and tagging to make their content more findable.

The thrill of an untapped market fades when one considers the use of the word “impossible.” AIIM is correct in identifying the Sisyphean tasks vendors face when pitching “all” information available via a third party system. Not only are the technical problems stretching the wizards at Google; the cost of generating meaningful “unified” search results is a tough nut to crack even for intelligence and law enforcement entities. In general, some of these groups have motivation, money, and expertise. Even with these advantages, the hoo hah that many search and eDiscovery vendors pitch is increasing potential customers’ skepticism. The credibility of over-hyped findability solutions is squandered. Therefore, for some vendors, their marketing efforts are making it more difficult for them to close deals and causing a broader push back against solutions that are known by the prospects to be a waste of money. Yikes. How does a trade association help its members with this problem? Well, I have some ideas. But as I recall, Ms. Steiger was not too thrilled to learn about the nitty gritty of shifting from micrographics to digital. Does the same characteristic exist within AIIM today? I don’t know.

Read more

IDC Tweets, IBM, and Content Marketing

September 29, 2014

Some Backstory

In 2012 and 2013, IDC sold my content with my name and Dave Schubmehl’s. These were nifty IDC “official” reports. The only hitch in the git along is that IDC did not trouble itself to issue a contract, get my permission, or tell me what they were doing with research my team prepared. The deal was witnessed by a law librarian, and I have a stack of emails about my research into such open source companies as Attivio, ElasticSearch (one of the disruptors of the enterprise search market), IBM (the subject of the IDC twit storm), Lucid Imagination (now Lucid Works which I write when I feel playful as Lucid works, really?), and eight other companies.

Hit by a twit storm. Rough seas ahead.

In 2012, I had the open source research. IDC wanted the open source content to use in a monograph. So in front of a law librarian, IDC’s search “expert” thought the exchange of my information for open source intelligence, money, and stuff to sell was a great idea. (I have a file of email from IDC to me about what IDC wanted, but I never got a contract. But IDC had my research. Ah, those administrative delays.) IDC, however, was organized enough to make additions to my company research, such as an open source industry overview.

In an odd approach to copyright, IDC did not produce a contract, but it produced reports about four open source companies. Mr. Schubmehl and IDC just went about producing what were recycled company reports and trying to sell them at $3,500 a whack. Is that value or an example of the culture of narcissism? It may come as a surprise to you, gentle reader, but I sell research for money. I have a business model and it has worked for about 40 years. When an outfit uses the research without issuing a contract, I have to start thinking about such issues as fairness, integrity, copyright, and name surfing. Call me idiosyncratic, but when my name is used without my permission, I wonder how a big and allegedly respected organization can operate like a Bear Stearns-type senior executive.

Then, the straw that broke the proverbial camel’s back: a librarian told me that IDC was selling a report with my name and Mr. Schubmehl’s on Amazon. Wow, Amazon, the Wal-Mart for the digital age. The reports, now removed from Amazon’s blue light special shelf, cost $3,500. Not bad for eight pages of information based on my year long research investment into the wild and volatile world of open source search and content processing. Surf’s up for Mr. Schubmehl.

Well, IDC, after some prodding by my very gentle legal gerbil, stopped selling my work. We received a proposal that offered me a pittance for a guarantee that I would not talk or write about this name surfing, unauthorized resale of my information on Amazon, and the flubs of Mr. Schubmehl.

My legal gerbil rejected IDC’s lawyer crafted “deal,” and I am now converting my IDC misadventure  into a metaphor for some of the deeper issues associated with “experts” and certain professional services firms. My legal gerbil suggested a significantly higher fee, but, like many of that ilk, the gerbil broke my heart.

Hence, IDC and Mr. Schubmehl’s tweets and twit storm are on my fragile ship’s radar. Let’s review the IBM IDC Schubmehl twit storm on just one day in September 2014. Trigger warning: Do not emulate the IDC Schubmehl method for your content marketing program. One day of tweets only generates a lot of twit.

Now to the Twit Storm Unleashed on September 16, 2014

Using my Overflight system, I monitor IDC tweets. Quite an interesting series of tweets appears on September 16, 2014. Mr. Schubmehl posted 25 tweets about IBM Watson.

Here are three examples of the Watson content to which his name was attached:

  • September 16, 2014. #WatsonAnalytics uses Watson cognitive technologies to ingest structured data and find relationships – Robin Grosset & Dan Wolfson
  • September 16, 2014 Combo of cognitive with cloud analytics improves process, analysis and decision making – cognitive will change all mkts #WatsonAnalytics
  • September 16, 2014 #WatsonAnalytics will be using a freemium model….first time for IBM…

Obviously there is nothing wrong with a tweet about an IBM product. What’s one more twit emission in a flow of several hundred thousand 140 character text outputs?

There is nothing illegal with two dozen tweets about IBM. What two dozen tweets do is make me laugh and see this content marketing effort as fodder for corporate weirdness.

Also, this IBM twit storm is not on the Miley Cyrus or Lady Gaga scale, but it is notable because it is a one day twit storm quite unlike the Jeopardy journey. Quite a marketing innovation: getting an alleged “expert” to craft  16 “original” tweets in one day and issue seven retweets of tweets from others who are fans of Big Blue. A few Schubmehl tweets on the 16th illustrated diversity; for example, “The FBI’s Facial Recognition System Is Here.” Hmm. The FBI and facial recognition. I wonder why one is interested in this development.

The terms mentioned in these IBM centric tweets on September 16, 2014, reveal the marketing jargon that IBM is using to generate revenue from the game show winning technology. My list of buzzwords from the tweets read like a who’s who of blogosphere and venture oriented yak:

  • Automated data cleansing
  • Analytics (cloud based)
  • Big Data
  • Cognitive (system and capabilities)
  • Data explorer
  • Democratizing
  • Freemium
  • Natural Language Computing
  • Natural Language Query.

From this list of buzzwords my favorites are “cognitive,” “Big Data,” and the number one silly word “Freemium.” Imagine. Freemium from IBM. Imagine.

My Interpretation of the Twit Storm

Let me capture several preliminary observations:

First, the Schubmehl Twitter activity on September 16, 2014 focuses mostly on IBM’s challenged Watson business development effort. The cluster of tweets on the 16th suggests a somewhat ungainly and down-market content marketing play.

Did Mr. Schubmehl wake up on the 16th of September and decide to crank out Watson centric tweets? Did IBM pay IDC and Mr. Schubmehl to do some content marketing like thousands of PR firms do each day? We even have these outfits in Harrod’s Creek, Kentucky to flog auto sales, bourbon, and cheesy festivals in Middletown, Kentucky.

Here’s a question: “How many tweets does a McKinsey or Bain type of consulting firm issue on a single day for a single product that seems to be struggling for revenue?” If you know, please, use the comments section of this blog to provide some factoids.

Second, the tweets provide the reader with a list of what seem to be IBM Watson aficionados or employees who have the job of making the shotgun marriage of open source code, legacy Almaden technology, and proprietary scripts into a billion dollar revenue producer soon, very soon, gentle reader. The individuals mentioned in the September 16, 2014, tweets include:

  • Steve Gold, Baylor University
  • Robin Grosset, distinguished engineer, Watson Analytics
  • Dan Wolfson, IBM distinguished engineer
  • Bob Picciano, senior vice president, IBM information and analytics group

Perhaps Mr. Gold is objective? I ask, “Are the other three IBM wizards looking at the world through IBM tinted spectacles when reading their business objectives for the current fiscal year?” I asked myself, “Should I trust these individuals who presumably are also ‘experts’ in all things related to Watson?” My preliminary answer is, “Not for an objective view of the game show winning Watson.”

Third, what’s the payoff of this twit storm for IBM? Did IBM expect me to focus on the Schubmehl twit storm and convert the information into my idea of a 10 minute stand up comedy routine to deliver at the upcoming intelligence and law enforcement conference in nine days? Is it possible that “doing social media” looks good on a weekly report when an executive does not have juicy revenue numbers to present? The value of the effort strikes me as modest. In fact, viewed as a group, the tweets could be interpreted as an indicator of IBM’s slide into desperation marketing.

What about consulting firms and their ability to pump out high margin revenue?

Outfits like Gerson Lehrman Group have put the squeeze on mid tier consulting firms. The bottom feeders, with their middle school teacher and poet contingents, are not likely to sell to the IBMs of the world. GLG-type companies are also nipping at the low end business of the blue chip outfits like Bain, Boston Consulting, and even McKinsey.

But GLG can deliver to a client retired professionals from blue chip firms and on point experts. As a result, GLG has made life very, very tough for the mid tier outfits. Why pay $50,000 for an unproven “expert” when you can buy a person with a pedigree for an hour and pay a few hundred bucks when you need a factoid or an opinion? I consider IDC’s move to content marketing indicative of a fundamental shift in the character of a consulting firm’s business. The shift to low level PR work seems out of character for a professional services firm with a commitment to intellectual rigor.

Every few days I learn that some outfit or other generates a list of content marketing leaders. Will IDC appear on this list?

For those who depend on lower tier or mid tier consulting firms for professional counsel, how would you answer these questions:

  1. What is the intellectual substance behind pronouncements? Is there original research underpinning pronouncements and projections, or are the data culled from secondary sources and discussions with paying customers?
  2. What is the actual relationship between a mid tier consulting firm and the companies discussed in “authoritative” reports? Are these reports and projects inclusions (a fancy word for ads) or are they objective discussions of companies?
  3. Are the experts presented as “experts” actually experts or are they individuals who want to hit revenue goals while keeping costs as low as possible?

I don’t have definitive answers to these questions. Perhaps one day I can use a natural language query to tap into Big Data and rely on cognitive methods to provide answers.

For now, a one day twit storm is a wonderful example of how not to close deals, build reputations, and stimulate demand for advanced technology offered via a “Freemium” model. What the heck does that mean anyway?

Stephen E Arnold, September 29, 2014

New York Times Online: An Inside View

September 24, 2014

Check out the presentation “The Surprising Path to a Faster”

I was surprised at some of the information in the slide deck. First, I thought the New York Times was first online in the 1970s via LexisNexis.


This is not money.

I thought that was an exclusive deal and reasonably profitable for both LexisNexis and the New York Times. When the newspaper broke off that exclusive to do its own thing, the revenue hit on the New York Times was immediate. In addition, the decision had significant cost implications for the newspaper.

The New York Times needed to hire people who could allegedly create an online system. The newspaper had to license software, write code, hire consultants, and maintain computers not designed to set type or organize circulation. The New York Times had to learn on the fly about converting content for online content processing. Learning that one does not know anything after thinking one knew everything is a very, very inefficient way to get into the online business. In short, the blow off of the LexisNexis deal added significant initial and then ever increasing on-going costs to the New York Times Co. I don’t think anyone at the New York Times has ever sat down to figure out the cost of that decision to become the Natty Bumppo of the newspaper publishing world.

I had heard that the newspaper raked in seven figures a year in the 1970s while LexisNexis did the heavy lifting. Yep, that included figuring out how to put the newspaper content on tape into a suitable form for LexisNexis’ mainframe system. Figuring this out inside the New York Times in the early 1990s made this sound: Crackle, crackle, whoosh. That is the sound of a big company burning money not for a few months but for DECADES, folks. DECADES.


Photo from US Fish and Wildlife.

When the newspaper decided that it could do an online service itself and presumably make more money, the newspaper embarked on the technical path discussed in the slide deck. Few recall that the fellow who set up the journal Online worked on the online version of the newspaper. I recall speaking to that person shortly after he and the newspaper parted ways. He did not seem happy with budgets, technology, or vision. But, hey, that was decades ago.


How some information companies solve common problems with new tools.

In the slide deck, we get an insider’s view of trying to deal with the problem of technical decisions made decades ago. What’s interesting is that the cost of the little adventure by the newspaper does not reflect the lost revenue from the LexisNexis exclusive. The presentation does illustrate quite effectively how effort cannot redress technical decisions made in the past.

This is an infrastructure investment problem. Unlike a physical manufacturing facility, an information centric business is difficult to re-engineer. There is the money problem. It costs a lot to rip and replace or put up a new information facility and then cut it over when it is revved and ready. But information centric businesses have another problem. Most succeed by virtue of luck. The foundation technology is woven into the success of the business, but in ways that are often non replicable.

The New York Times killed off the LexisNexis money flow. Then it had to figure out how to replicate that LexisNexis money flow and generate a bigger profit. What happened? The New York Times spent more money creating the various iterations of the Times Online, lost the LexisNexis money, and became snared in the black hole of trying to figure out how to make online information generate lots of dough. I am suggesting that the New York Times may be kidding itself with the new iteration of the Times Online service.


Autumn Approaches: Time for Realism about Search

September 1, 2014

Last week I had a conversation with a publisher who has a keen interest in software that “knows” what content means. Armed with that knowledge, a system can then answer questions.

The conversation was interesting. I mentioned my presentations for law enforcement and intelligence professionals about the limitations of modern and computationally expensive systems.

Several points crystallized in my mind. One of these is addressed, in part, in a diagram created by a person interested in machine learning methods. Here’s the diagram created by SciKit:


The diagram is designed to help a developer select from different methods of performing estimation operations. The author states:

Often the hardest part of solving a machine learning problem can be finding the right estimator for the job. Different estimators are better suited for different types of data and different problems. The flowchart below is designed to give users a bit of a rough guide on how to approach problems with regard to which estimators to try on your data.

First, notice that there is a selection process for choosing a particular numerical recipe. Now who determines which recipe is the right one? The answer is the coding chef. A human exercises judgment about the particular sequence of operations that will be used to fuel machine learning. Is that sequence of actions the best one, the expedient one, or the one that seems to work for the test data? The answer to these questions determines a key threshold for the resulting “learning system.” Stated another way, “Does the person licensing the system know if the numerical recipe is the most appropriate for the licensee’s data?” Nah. Does a mid tier consulting firm like Gartner, IDC, or Forrester dig into this plumbing? Nah. Does it matter? Oh, yeah.

As I point out in my lectures, the “accuracy” of a system’s output depends on this type of plumbing decision. Unlike a backed up drain, flaws in smart systems may never be discerned. For certain operational decisions, financial shortfalls or the loss of an operational team in a war theater can be attributed to any one of many variables. As decision makers chase the Silver Bullet of smart, thinking software, who really questions the output in a slick graphic? In my experience, darned few people. That includes cheerleaders for smart software, azure chip consultants, and former middle school teachers looking for a job as a search consultant.
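To make the point concrete, here is a toy Python sketch of the kind of human-authored branching the flowchart encodes. This is my own illustration, not scikit-learn’s actual code; the thresholds and category labels are assumptions chosen for the example. The point is that a human, not the machine, hard-codes which recipe gets tried.

```python
def suggest_recipe(n_samples: int, labeled: bool, predicting_quantity: bool) -> str:
    """Return the family of estimator a developer might reach for.

    A hypothetical decision tree in the spirit of the SciKit flowchart:
    every branch below reflects a human judgment call, made before the
    system ever sees the licensee's data.
    """
    if n_samples < 50:
        return "get more data"
    if labeled:
        if predicting_quantity:
            return "regression"       # e.g. ridge regression, SVR
        return "classification"       # e.g. linear SVC, naive Bayes
    return "clustering"               # e.g. k-means

# The "learning system" a customer receives is downstream of this choice.
print(suggest_recipe(10_000, labeled=True, predicting_quantity=False))  # → classification
```

If the developer’s assumptions about the data are wrong, every branch below them is wrong too, and nothing in the output flags it.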

Second, notice the reference to a “rough guide.” The real guide is an understanding of how specific numerical recipes behave on a set of data that allegedly represents what the system will process when operational. Furthermore, there are plenty of mathematical methods available. The problem is that some of the more interesting procedures carry increased computational cost. In a worst case, the more interesting procedures cannot be computed on available resources. Some developers know about P=NP and Big O. Others know to use the same nine or ten mathematical procedures taught in computer science classes. After all, why worry about math based on mereology if the machine resources cannot handle the computations within time and budget parameters? This means that most modern systems are based on a set of procedures that are computationally affordable, familiar, and convenient. Does this sameness of procedures matter? Yep. The generally squirrely outputs from many very popular systems are perceived as completely reliable. Unfortunately, the systems are performing within a narrow range of statistical confidence. Stated in a more harsh way, the outputs are just not particularly helpful.
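The cost argument can be seen in a back-of-the-envelope sketch. The operation counts below are textbook approximations for two generic families of procedure, not measurements of any real system; the point is only how quickly the gap opens as data volume grows.

```python
import math

def pairwise_ops(n: int) -> int:
    """Comparisons for a brute-force all-pairs method: n(n-1)/2, i.e. O(n^2)."""
    return n * (n - 1) // 2

def sort_style_ops(n: int) -> int:
    """Rough count for an n log n method, e.g. a sort-based approach."""
    return int(n * math.log2(n))

# At small n the "interesting" quadratic method is tolerable;
# at operational scale it is the method that quietly gets dropped.
for n in (1_000, 100_000):
    print(f"n={n:>7}: all-pairs {pairwise_ops(n):>13,} vs n log n {sort_style_ops(n):>10,}")
```

At a thousand records the two are within shouting distance; at a hundred thousand the quadratic method needs roughly three thousand times more work, which is why the affordable, familiar recipes win.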

In my conversation with the publisher, I asked several questions:

  1. Is there a smart system like Watson that you would rely upon to treat your teenaged daughter’s cancer? Or, would you prefer the human specialist at the Mayo Clinic or comparable institution?
  2. Is there a smart system that you want directing your only son in an operational mission in a conflict in a city under ISIS control? Or, would you prefer the human-guided decision near the theater about the mission?
  3. Is there a smart system you want managing your retirement funds in today’s uncertain economy? Or, would you prefer the recommendations of a certified financial planner relying on a variety of inputs, including analyses from specialists in whom your analyst has confidence?

When I asked these questions, the publisher looked uncomfortable. The reason is that the massive hyperbole and marketing craziness about fancy new systems creates what I call the Star Trek phenomenon. People watch Captain Kirk talking to devices, transporting himself from danger, and traveling between far flung galaxies. Because a mobile phone performs some of the functions of the fictional communicator, it sure seems as if many other flashy sci-fi services should be available.

Well, this Star Trek phenomenon does help direct some research. But in terms of products that can be used in high risk environments, the sci-fi remains a fiction.

Believing and expecting are different from working with products that are limited by computational resources, expertise, and informed understanding of key factors.

Humans, particularly those who need money to pay the mortgage, ignore reality. The objective is to close a deal. When it comes to information retrieval and content processing, today’s systems are marginally better than those available five or ten years ago. In some cases, today’s systems are less useful.

