Sponsors of Two Content Marketing Plays

July 27, 2014

I saw some general information about allegedly objective analyses of companies in the search and content processing sector.

The first report comes from the Gartner Group. The company has released its “magic quadrant” which maps companies by various allegedly objective methods into leaders, challengers, niche players, and visionaries.

The most recent analysis includes these companies:

BA Insight
Dassault Exalead
Expert System
HP Autonomy IDOL
Lucid Works
Perceptive ISYS Search

There are several companies in the Gartner pool whose inclusion surprises me. For example, Exorbyte is primarily an eCommerce company with a very low profile in the US compared to Endeca or New Zealand based SLI Systems. Expert System is a company based in Italy. This company provides semantic software which I associated with mobile applications. IHS (Information Handling Services) provides technical information and a structured search system. MarkLogic is a company with XML data management software that has landed customers in publishing and the US government. With an equally low profile is Mindbreeze, a home brew search system funded by Microsoft-centric Fabasoft. Dassault Exalead, PolySpot, and Sinequa are French companies offering what I call “information infrastructure.” Search is available, but the approach is digital information plumbing.

The IDC report, also allegedly objective, is sponsored by nine companies. These outfits are:

Earley & Associates
HP Autonomy IDOL

This collection of companies is also eclectic. For example, Earley & Associates does indexing training and consulting and does not have a deep suite of enterprise software. IHS (Information Handling Services) appears in the IDC report as a knowledge centric company. I think I understand the concept. Technical information in Extensible Markup Language and a mainframe-style search system allow an engineer to locate a specification or some other technical item like the SU 25. Lexalytics is a sentiment analysis company. I do not consider figuring out if a customer email is happy or sad the same as Coveo’s customer support search system. Smartlogic is interesting because the company provides tools that permit unstructured content to be indexed. Some French vendors call this process “fertilization.” I suppose that for purists, indexing might be just as good a word.

What unifies these two lists are the companies that appear in both allegedly objective studies:

IHS (Information Handling Services)

My hunch is that the five companies appearing in both lists are in full bore, pedal to the metal marketing mode.

Attivio and Coveo have ingested tens of millions in venture funding. At some point, investors want a return on their money. The positioning of these two companies’ technologies as search and the somewhat unclear knowledge quotient capability suggest that implicit endorsement by mid tier consulting firms will produce sales.

The appearance of HP and IBM on each list is not much of a surprise. The fact that Oracle Endeca is not in either report suggests that Oracle has other marketing fish to fry. Also, the fact that Elasticsearch, arguably the game changer in search and content processing, is not in either pool may be evidence that Elasticsearch is too busy to pursue “expert” analysts laboring in the search vineyard. On the other hand, Elasticsearch may have its hands full dealing with demands of developers, prospects, and customers.

IHS has not had a high profile in either search or content processing. The fact that Information Handling Services appears signals that the company wants to market its mainframe style and XML capable system to a broader market. Sinequa appears comfortable with putting forth its infrastructure system as both search and a knowledge engine.

I have not seen the full reports from either mid tier consulting firm. My initial impression of the companies referenced in the promotional material for these recent studies is that lead generation is the hoped for outcome of inclusion.

Other observations I noted include:

  1. The need to generate leads and make sales is putting multi-company reports back on the marketing agenda. The revenue from these reports will be welcomed at IDC and Gartner I expect. The vendors who are on the hook for millions in venture funding are hopeful that inclusion in these reports will shake the money trees from Boston to Paris.
  2. The language used to differentiate and describe the companies referenced in these two studies is unlikely to clarify the differences between similar companies or make clear the similarities. From my point of view, there are few similarities among the companies referenced in the marketing collateral for the IDC and Gartner studies.
  3. The message of the two reports appears to be “these companies are important.” My thought is that because IDC and Gartner assume their brand conveys a halo of excellence, the companies in these reports are, therefore, excellent in some way.

Net net: Enterprise search and content processing has a hurdle to get over: Search means Google. The companies in these reports have to explain why Google is not the de facto choice for enterprise search and then explain how a particular vendor’s search system is better, faster, cheaper, etc.

For me, a marketer or search “expert” can easily stretch search to various buzzwords. For some executives, customer support is not search. Customer support uses search. Sentiment analysis is not search. Sentiment analysis is a signal for marketers or call center managers. Semantics for mobile phones, indexing for SharePoint content, and search for a technical data sheet are quite different from eCommerce, business intelligence, and business process engineering.

A fruit cake is a specific type of cake. Each search and content processing system is distinct and, in my opinion, not easily fused into the calorie rich confection. A collection of systems is a lumber room stuffed with different objects that don’t have another place in a household.

The reports seem to make clear that no one in the mid tier consulting firms or the search companies knows exactly how to position, explain, and verify that content processing is the next big thing. Is it?

Maybe a Google Search Appliance is the safe choice? IBM Watson does recipes, and HP Autonomy connotes high profile corporate disputes.

Elasticsearch, anyone?

Stephen E Arnold, July 27, 2014

Honk Tracks Search Marketing Memes

July 26, 2014

The Honk page for Beyond Search now tracks information retrieval marketing memes. The information at http://bit.ly/1uqWxfA now includes a discussion of a coinage designed to sell “search” without using the word “search.” Is the approach likely to reverse the fortunes of search vendors who face increasingly intense uphill battles to generate substantive revenue? The Honk “Meme of the Moment” updates will keep you posted.

Stephen E Arnold, July 26, 2014

What SEO, Google, and Marketing Hath Wrought

July 24, 2014

“Myths and Misreporting About Malaysia Airlines Flight 17” is an interesting article. I found the examples of misinformation, disinformation, and reformation thought provoking. The write up spotlights a few examples of fake or distorted information about an airline’s doomed flight.

As I considered the article and its appearance in a number of news alerting services, I shifted from the cleverness of the content to a larger and more interesting issue. From the revelations about software that can alter inputs to an online survey (see this link) to fake out “real” news, determining what’s sort of accurate from what’s totally bogus is becoming more and more difficult. I have professional researchers, librarians, and paralegals at my disposal. Most people do not. No longer surprising to me is the email from one of the editors working to fact check my for fee columns. The questions range from “Did IBM Watson invent a recipe with tamarind in its sauce?” to “Do you have a source for the purchase price of Vivisimo?” Now I include online links for the facts and let the editors look up my source without the intermediating email. Even then, there is a sense of wonderment when an editor expresses surprise that what he or she believed is, in fact, either partially true, bogus, or unexpected. Example: “Why do French search vendors feel compelled to throw themselves at the US market despite the historically low success rates?” The answer is anchored in [a] French tax regulations, [b] French culture, particularly when a scruffy entrepreneur from the wrong side of the educational tracks tries to connect with a French money source from the right side of the educational tracks, [c] the lousy financial environment for certain high technology endeavors, and [d] selling to the big US markets looks like a slam dunk, at least for a while.

The reason for the disconnect between factoids and information manipulation boils down to a handful of factors. Let me highlight several:

First, the need for traffic to Web sites (desktop, mobile, app instances, etc.) is climbing up the hierarchy of business / personal needs. You want traffic today? The choices are limited. Pay Google $25,000 or more a month. Pay an SEO (search engine optimization) “expert” whatever you can negotiate. Create content, do traditional marketing, and trust that the traffic follows the “if you build it they will come” pipedream. Most folks just whack at getting traffic and use increasingly SEOized headlines as a low cost way of attracting attention. Think headlines from the National Enquirer in the 1980s.

Second, Google has to pump lots of money into plumbing, infrastructure, moon shots, and operational costs (three months at the Stanford Psych unit, anyone?). At the same time, mobile is getting hot. Two problems plague the sunny world of the GOOG. [a] Revenue from mobile ads is less than from traditional ads. Therefore, Google has to find a way to keep that 2006 style revenue flowing. Because there is a systemic shift, the GOOG needs money. One way to get it is to think about Adwords as a machine that needs tweaking. How does one sell Adwords to those who do not buy enough today? You ponder the question, but it involves traffic to a Web site. [b] Google gets bigger so the “think cheap” days of yore are easier to talk about than deliver. A 15 year old company is getting more and more expensive to run. The upcoming battles with Amazon and Samsung will not be cheap. The housing developments, the Loon balloons, the jet fleet, smart people, and other oddments of the company are money pits. If the British government can fiddle traffic, is it possible that others have this capability too?

Third, marketing, an easy whipping boy or girl as the case may be. After spending lots and lots on Web sites and apps, some outfits’ CFOs are asking, “What do we get for this spending?” In order to “prove” their worth and stop the whipping, marketers have kicked into overdrive. Baloney, specious, half baked, crazy, and recycled content is generated by the terabyte drive. The old fashioned ideas about verification, accuracy, and provenance are kicked to the side of the road.

Net net: running a query on a search engine, accepting the veracity of a long form article, or just finding out what happened at an event is very difficult. The fixes are not palatable to some people. Others are content to believe that their Internet or Internet search engine dispenses wisdom like the oracle at Delphi. Who knew the “oracles” relied on confusing entrances, various substances, and stage tricks to get their story across?


We now consult digital Delphis. How is that working out when you search for information to address a business problem, find a person who can use finger manipulation to relax a horse’s muscle, or determine if a company is what its Web site says it is?

Stephen E Arnold, July 24, 2014

Is New Math Really New Yet?

July 21, 2014

I read “Scientific Data Has Become So Complex, We Have to Invent New Math to Deal With It.” My hunch is that this article will become Google spider food with a protein punch.

In my lectures for the police and intelligence community, I review research findings from journals and my work that reveal a little appreciated factoid; to wit: The majority of today’s content processing systems use a fairly narrow suite of numerical recipes that have been embraced for decades by vendors, scientists, mathematicians, and entrepreneurs. Due to computational constraints and limitations of even the slickest of today’s modern computers, processing certain data sets is a very difficult job that is expensive in humans, programming, and machine time.

Thus, the similarity among systems comes from several factors.

  1. The familiar is preferred to the onerous task of finding a slick new way to compute k-means or perform one of the other go-to functions in information processing.
  2. Systems have to deliver certain types of functions in order to make it easy for a procurement team or venture oriented investor to ask, “Does your system cluster?” Answer: Yes. Venture oriented investor responds, “Check.” The procedure accounts for the sameness of the feature lists between Palantir, Recorded Future, and similar systems. When the similarities make companies nervous, litigation results. Example: Palantir versus i2 Ltd. (now a unit of IBM).
  3. Alternative methods of addressing tasks in content processing exist, but they are tough to implement in today’s computing systems. The technical reason for the reluctance to use some fancy math from my uncle Vladimir Ivanovich Arnold’s mentor Andrey Kolmogorov is that in many applications the computing system cannot complete the computation. The buzzword for this is P=NP? Here’s MIT’s 2009 explanation.
  4. Savvy researchers have to find a way to get from A to B that works within the constraints of time, confidence level required, and funding.
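For readers who have not bumped into k-means, the go-to clustering function named in the first item, here is a minimal sketch in Python. It is an illustration only, not the code of any vendor mentioned in this post. The function name `kmeans`, the six sample points, and the two starting centroids are all made up for the example.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) on 2-D points.

    Starting centroids are supplied by the caller; both the data and
    the starting positions below are hypothetical.
    """
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

The recipe is decades old and easy to compute, which is one reason it turns up on so many feature lists. The cost grows with the number of points, clusters, and passes, so large or constantly changing data sets push the arithmetic into expensive territory.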

The Wired article identifies other hurdles; for example, the need for constant updating. A system might be able to compute a solution using fancy math on a right sized data set. But toss in constantly updating information and the computing resources often just keep getting hungrier for more storage, bandwidth, and computational power. Then the bigger the data, the more the computing system has to shove that data around. As fast as an iPad or modern Dell notebook seems, the friction adds latency to a system. For some analyses, delays can have significant repercussions. Most Big Data systems are not the fleetest of foot.

The Wired article explains how fancy math folks cope with these challenges:

Vespignani uses a wide range of mathematical tools and techniques to make sense of his data, including text recognition. He sifts through millions of tweets looking for the most relevant words to whatever system he is trying to model. DeDeo adopted a similar approach for the Old Bailey archives project. His solution was to reduce his initial data set of 100,000 words by grouping them into 1,000 categories, using key words and their synonyms. “Now you’ve turned the trial into a point in a 1,000-dimensional space that tells you how much the trial is about friendship, or trust, or clothing,” he explained.
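The word grouping trick in the quoted passage can be sketched in a few lines of Python. This is my own toy illustration, not DeDeo’s code. The three categories, their key words, and the sample sentence are invented; the real project reportedly used about 1,000 categories.

```python
from collections import Counter

# Hypothetical category lexicon: each category lists key words and synonyms.
CATEGORIES = {
    "friendship": {"friend", "companion", "mate"},
    "trust": {"trust", "honest", "faith"},
    "clothing": {"coat", "hat", "shirt"},
}


def category_vector(text, categories=CATEGORIES):
    """Turn a text into one number per category: the share of matched words."""
    tokens = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    # Reverse lookup from each key word to its category.
    word_to_category = {w: c for c, words in categories.items() for w in words}
    counts = Counter(word_to_category[t] for t in tokens if t in word_to_category)
    total = sum(counts.values())
    # One coordinate per category, in the order the categories were defined.
    return [counts[c] / total if total else 0.0 for c in categories]


vec = category_vector("He stole a coat and a hat from his friend.")
print(dict(zip(CATEGORIES, vec)))
```

With 1,000 categories instead of three, each document becomes a point in a 1,000-dimensional space. The payoff is that a 100,000 word vocabulary shrinks to something today’s machines can handle. The cost is that everything depends on who built the word lists and how.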

Wired labels this approach as “piecemeal.”

The fix? Wired reports:

the big data equivalent of a Newtonian revolution, on par with the 17th century invention of calculus, which he [Yalie mathematician Ronald Coifman] believes is already underway.

Topological analyses and sparsity may offer a path forward.

The kicker in the Wired story is the use of the phrase “tractable computational techniques.” The notion of “new math” is an appealing one.

For the near future, the focus will be on optimization of methods that can be computed on today’s gizmos. One widely used method in Autonomy, Recommind, and many other systems originates with the Reverend Thomas Bayes who died in 1761. My relative died in 2010. I understand there were some promising methods developed after Kolmogorov died in 1987.
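The Bayes method is simple enough to show in a few lines of Python. The sketch below is only the textbook rule with invented probabilities. It is not a description of how Autonomy, Recommind, or any other vendor implements its system, and the function name `posterior` is my own.

```python
def posterior(prior, likelihood, likelihood_if_not):
    """Bayes' rule for one piece of evidence.

    prior             : P(relevant) before looking at the document
    likelihood        : P(term appears | relevant)
    likelihood_if_not : P(term appears | not relevant)
    Returns P(relevant | term appears).
    """
    evidence = likelihood * prior + likelihood_if_not * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 10 percent of documents are relevant; the term shows
# up in 80 percent of relevant documents and 20 percent of the others.
p = posterior(prior=0.1, likelihood=0.8, likelihood_if_not=0.2)
print(round(p, 4))
```

The arithmetic is cheap, which helps explain why an 18th century idea still runs on today’s gizmos.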

Inventing new math is underway. The question is, “When will computing systems become available to use these methods without severe sampling limitations?” In the meantime, Big Data keep on rolling in, possibly mis-analyzed and contributing to decisions with unacceptable levels of risk.

Stephen E Arnold, July 21, 2014

Consulting Content Marketing: The Value of a Name

July 20, 2014

One of my readers sent me a link to this IDC report on Amazon. If you cannot read the image, here’s the link verified on July 20, 2014.


Now check out the price of $500. The author is a former IDC expert, Sue Feldman.

Now check out this IDC report on Amazon and note that the price for my work and that of my researchers is $3,500. Notice that Ms. Feldman’s name is on the report. I don’t know if she was employed at IDC when my work was posted on Amazon without my permission. There is one new IDC “expert” name: Dave Schubmehl, a former OpenText and Janya executive. Also, my name is listed almost as an extra.


This is an archived article. IDC removed the report from the Amazon Web site shortly before this update was written.

I wonder if my name and my team’s contribution delivered up to 7X value or were Dave Schubmehl’s contributions the reason for the price boost. What’s clear is that IDC is taking content, using my name, selling reports with my name, and then deleting documents in a stepwise manner.


In any event, thanks to my reader and a pointed reminder to anyone purchasing consulting firm content marketing, find out who provided the information. I would suggest that my team obviously has some value because the former IDC professional’s work was a comparative bargain at $500.

Contracts for reuse of another’s work? No.

Permission to resell my research on Amazon? No.

Payments, sales reports, follow through? No.

What’s that say about well known consulting firm behavior? Exploiting a 70 year old and his research team is one more example of a lapse in common sense, fair play, and corporate governance. Does this seem like a smaller scale version of the Google X Labs’ Forrest Hayes’ matter? I leave you to consider the question and your answer.

Stephen E Arnold, July 20, 2014

Search and Data-Starved Case Studies

July 19, 2014

LinkedIn discussions fielded a question about positive search and content processing case studies. I posted a link to a recent paper from Italy (you can find the url at this link).

My Overflight system spit out another case study. The publisher is Hewlett Packard and the example involves Autonomy. The problem concerns the UK’s National Health Service and its paperless future. You can download the four page document at http://bit.ly/1wIsifS.

The Italian case study focuses on cheerleading for the Google Search Appliance. The HP case study promotes the Autonomy IDOL system applied to medical records.

The HP Autonomy document caught my attention because it uses a buzzword I first heard at Booz, Allen & Hamilton in 1978. Harvey Poppel, then a BAH partner, coined the phrase. The idea caught on. Mr. Poppel, who built a piano, snagged some ink in Business Week. That was a big deal in the late 1970s. Years later I met Alan Siegel, a partner at a New York design firm. He was working on promotion of the Federal government’s paperless initiative. About 10 years ago, I spent some time with Forrest (Woody) Horton, who was a prominent authority on the paperless office. Across the decades, talk about paperless offices generated considerable interest. These interactions about paperless environments have spanned 36 years. Paper seems to be prevalent wherever I go.

When I read the HP Autonomy case study, I thought about the efforts of some quite bright individuals directed at eliminating hard copy documents. There are reports, studies, and analyses about the problems of finding information in paper. I expected a reference to hard data or some hard data. The context for the paperless argument would have captured my attention.

The HP Autonomy case study talks about an integrator’s engineers using IDOL to build a solution. The product is called Evolve and:

It used 28 years of information management expertise to improve efficiency, productivity and regulatory compliance. The IDOL analytics engine was co-opted into Evolve because it automatically ingests and segments medical records and documents according to their content and concepts, making it easier to find and analyze specific information.

The wrap up of the case study is a quote that is positive about the Kainos Evolve system. No big surprise.

After reading the white paper, three thoughts crossed my mind.

First, the LinkedIn member seeking positive search and content processing case studies might not find the IDOL case study particularly useful. The information is more of an essay from an ad agency generated in-house magazine.

Second, the LinkedIn person wondered why there were so few positive case studies about successful search and content processing installations. I think there are quite a few white papers, case studies, and sponsored content marketing articles crafted along the lines of the HP Autonomy case study. The desire to give the impression that the product encounters no potholes scrubs out the details so useful to a potential licensee.

Third, the case study describes a mandated implementation. So the Evolve product is in marketing low gear. The enthusiasm for implementing a new product shines brightly. Does the glare from the polish obscure a closer look?

At a minimum, I would have found the following information helpful even if presented in bullet points or tabular form:

  1. What was the implementation time? What days, weeks, or months of professional work were required to get the system up and running?
  2. What was the project’s initial budget? Was the project completed within the budget parameters?
  3. What is the computing infrastructure required for the installation? Was the infrastructure on premises, cloud, or hybrid?
  4. What is the latency in indexing and query processing?
  5. What connectors were used “as is”? Were new connectors required? If yes, how long did it take to craft a functioning connector?
  6. What training did users of the system require?

Information at this level of detail is difficult to obtain. In my experience, most search and content processing systems require considerable attention to detail. Take a short cut, and the likelihood of an issue rises sharply.

Obviously neither the vendor nor the licensee wants information about schedule shifts, cost overruns or underruns, and triage expenses to become widely known. This jointly enforced fact void helps create case studies that are little more than MBA jargon.

Little wonder the LinkedIn member’s plea went mostly ignored. Paper is unlikely to disappear because lawyers thrive on hard copies. When litigation ensues, the paperless office and the paperless medical practice become a challenge.

Stephen E Arnold, July 19, 2014

What Most Search Vendors Cannot Pull Off

July 19, 2014

I recently submitted an Information Today column that reported about Antidot’s tactical play to enter the US market. One of the fact checkers for the write up alerted me that most of the companies I identified were unknown to US readers. Test yourself. How many of these firms do you recognize? How many of them provide information retrieval services?

  • A2ia
  • Albert (originally AMI Albert and AMI does not mean friend)
  • Dassault Exalead
  • Datops
  • EZ2Find
  • Kartoo
  • Lingway
  • LUT Technologies
  • Pertimm
  • Polyspot
  • Quaero
  • Questel
  • Sinequa

How did you do? The point is that French vendors of information retrieval and content processing technology find themselves in a crowded boat. Most of the enterprise search vendors have flamed out or resigned themselves to pitching to venture capitalists that their technology is the Next Big Thing. A lucky few sell out and cash in; for example, Datops. Others are ignored or forgotten.

The same situation exists for vendors of search technology in other countries. Search is a tough business. Former Googler Marissa Mayer was the boss when Yahoo’s share of the Web search market sagged below 10 percent. In the same time period, Microsoft increased Bing’s share to about 14 percent. Google dogpaddled and held steady. Other Web search providers make up the balance of the market players. Business Insider reported:

This is a big problem for Yahoo since its search business is lucrative. While Yahoo’s display ad business fell 7% last quarter, revenue from search was up 6% on a year-over-year basis. Revenue from search was $428 million compared to $436 million from its display ad business.

Now enterprise search vendors have been trying to use verbal magic to unlock consistently growing revenue. So far only two vendors have been able to find a way to open the revenue vault’s lock. Autonomy tallied more than $800 million in revenue at the time of its sale to Hewlett Packard. The outcome of that deal was a multi-billion dollar write off and many legal accusations. One thing is clear through the murky rhetoric the deal produced. Hewlett Packard had zero understanding of search and has been looking for a scapegoat to slaughter for its corporate decision. This is not helping the search vendors chasing deals.

Google converted Web search into a $60 billion revenue stream. The core idea for online advertising originated with the pay-to-play company GoTo, which then morphed into Overture, which THEN was acquired by Yahoo. Think of the irony. Yahoo has the technology that makes Google a one trick, but very lucrative revenue pony. But, to be fair, Google Web search is not the enterprise search needed to locate a factoid for a marketing assistant. Feed this query “show me the versions of the marketing VP’s last product road map” to a Google appliance and check the results. The human has to do some old fashioned human-type work. To find this information with a Google Search Appliance or any other information retrieval engine for that matter is tricky. Basic indexing cannot do the job, so most marketing assistants hunt manually through files, folders, and hard copies looking for the Easter egg.

Many of the pioneering search engines tried explaining their products and services using euphemisms. There was question answering, content intelligence, smart content, predictive retrieval, entity extraction, and dozens and dozens of phrases that sound fine but are very difficult to define; for example, knowledge management and the phrase “enterprise search” itself or “image recognition” or “predictive analytics”, among others.

I had a hearty chuckle when I read “Don’t Sell a Product, Sell a Whole New Way of Thinking.” Search has been available for at least 50 years. Think RECON, Orbit, Fulcrum Technologies, BASIS, Teratext, and other artifacts of search and retrieval. Smart folks cooked up even the computationally challenged Delphes system, the metasearch system Vivisimo, and the essentially unknown Quertle.

A romp through these firms’ marketing collateral, PowerPoints, and PDFs makes clear that no buzzword has been left untried. Buyers did and do not know what the systems actually delivered. This is evidence that search vendors have not been able to “sell a whole new way of thinking.”

No kidding. The synonyms search marketers have used in order to generate interest and hopefully a sale are a catalog of information technology jargon. Here is a short list of some of the terms from the 1990s:

  • Business intelligence
  • Competitive intelligence
  • Content governance
  • Content management
  • Customer support then customer relationship management
  • Knowledge management
  • Neurodynamics
  • Text analytics

If I accept the Harvard analysis, the failing of enterprise search is not financial fiddling and jargon. As you may recall, Microsoft paid $1.2 billion for Fast Search & Transfer. The investigation into allegations of financial fancy dancing was resolved recently with one executive facing a possible jail term and employment restrictions. There are other companies that tried to blend search with content only to find that the combination was not quite like peanut butter and jelly. Do you use Factiva or Ebsco? Did I hear a “what?” Other companies embraced slick visualizations to communicate key information at a glance. Do you remember Grokker? There was semantic search. Do you recollect Siderean Software?

One success story was Oingo, renamed Applied Semantics. Google understood the value of mapping words to ads and purchased the company to further its non search goals of generating ad revenue.

According to the HBR:

To find the shift, ask yourself a few questions. What was the original insight that led to the innovation? Where do you feel people “don’t get it” about your solution? What is the “aha” moment when someone turns from disinterested to enthusiastic?

Those who code up search systems are quite bright. Is this pat formula of shifting thinking the solution to the business challenges these firms face:

Attivio. Founded by Fast Search & Transfer alums, the company has ingested more than $35 million in venture funding. The company’s positioning is “an actionable 360 degree view of anything you need.” Okay. Dassault Exalead used the same line several years.

Coveo. The company has tapped venture firms for more than $30 million since the firm’s founding in 2004. Coveo uses the phrase “enterprise search” and wraps it in knowledge workers, customer service, engineering, and CRM. The idea is that Coveo delivers solutions tailored to specific business functions and employee roles.

SRCH2. This is a Xoogler founded company that, like Perfect Search before it, emphasizes speed. The alternative is better than open source search solutions.

Lucid Works. Like Vivisimo, Lucid Works has embraced Big Data and the cloud. The only slow downs Lucid has encountered have been turnover in CEOs, marketing, and engineering professionals. The most recent hurdle to trip up Lucid is the interest in Elasticsearch, fat with almost $100 million in venture funding and developers from the open source community.

IBM Watson. Watson is based on open source and home grown technology. IBM’s marketers have showcased Watson on Jeopardy and garnered headlines for the $1 billion investment IBM is making in its “smart” information processing system. The most recent demonstration of Watson was producing a recipe for Bon Appetit readers.

Amazon’s search approach is to provide it as a service to those using Amazon Web services. Search is, in my mind, just a utility for Amazon. Amazon’s search system on its eCommerce site is not particularly good. Want to NOT out books not yet available on the system? Well, good luck with that query.

After I stopped chuckling, I realized that the Harvard article is less concerned with precision and recall than advocating deception, maybe cleverness. No enterprise search vendor has approached Autonomy’s revenues with the sole exception of Google’s licensing of the wildly expensive Google Search Appliance. At the time of its sale to Oracle, Endeca was chugging along at an estimated $150 million in revenue. Oracle paid about $1 billion for Endeca. With that benchmark, name another enterprise search vendor or eCommerce search vendor that has raced past Endeca. For the majority of enterprise search vendors, revenues of $3 to $10 million represent very significant achievements.

An MBA who takes over an enterprise search company may believe that wordsmithing will make sales. Sure, some sales may result, but will the revenue be sustainable? Most enterprise search sales are a knee jerk to problems with the incumbent search system.

Without concrete positive case studies, talking about search is sophistry. There are comparatively few specific return on investment analyses for enterprise search installations. I provided a link to a struggling LinkedIn person about an Italian library’s shift from the 1960s BASIS system to a Google Search Appliance.

Is enterprise search an anomaly in business software? Will the investment firms get their money back from their investments in search and retrieval?

Ask a Harvard MBA steeped in the lore of selling a whole new way of thinking. Ignore 50 years of search history. Success in search is difficult to achieve. Duplicity won’t do the job.

Stephen E Arnold, July 19, 2014

IBM: Hitting Numbers by Chasing Medium Sized Fish, Not Whales

July 11, 2014

I scanned my false drop stuffed Yahoo Alert a moment ago (5:04 am Eastern time). I clicked a link with the fetching headline “Enterprise Search Adoption among Midsize Firms.” The core of the story is a reference to an allegedly accurate survey from another publisher. I learned “nearly 40 percent of IT departments reported that they have already invested or plan to invest in enterprise search solutions.” Yikes. That means that 60 percent of midsize firms cannot locate information. Looks like a great opportunity to license an enterprise search system. I wondered who was at the root of this article and had such confidence in a market that probably is expensive to convince to pump big bucks into a Google Search Appliance (starts at $50,000 or so), an Autonomy IDOL hosted service or Amazon Search service with no cap on costs, or sign up for a bargain basement hosted search system until the ministrations of an expensive consultant are required. Most organizations use one of the default, utility search systems already included with other applications; for example, Microsoft’s search feature or a freeware system like Effective File Search or an open source system like Sphinx Search or Searchdaimon.

After clicking a few links I was directed to the éminence grise behind this article. Guess who? IBM. The link pointed me to http://www.ibm.com/midmarket/us/en/?lnk=mhso&CE=ISM0124. Yep, IBM wants to recover the billion tossed into Watson (really helpful for a midsize business wanting to win a game show or develop a recipe) or the $3 billion extending Moore’s Law.

I know from industry chatter at the trade shows I attend that there is concern about the future of IBM. This does not come just from those customers who pine for the good old days when IBM engineers delivered expensive but top notch service. Nope. The laments come from IBM professionals. I think I heard words like “lost its way,” “chaotic,” and “floundering.”

Several observations:

ITEM: Selling big buck enterprise search services to midsize firms is expensive, slow, and difficult. If these firms were able to float the boats of other search vendors, the vendors would be in high cotton. The middle market already has search and that’s why 60 percent of the outfits in the allegedly accurate survey are not buying standalone systems. Almost every piece of software includes a finding function. These are either good enough or are not used because users have found workarounds.

ITEM: IBM fees are going to cause even “large” midsize businesses (oxymoronic, right?) to pause. Imagine the cost impact of paying IBM sales people to pitch a product/service that a potential customer does not want, cannot afford, or already has available. Losses mount. Seems obvious to me.

ITEM: The clumsy content marketing ploy of creating a content free article and then pitching IBM as a generic solution is silly. Navigate to the IBM Small and Medium Business Solution page. IBM is offering “customized solutions.”

I don’t think the solution is on point. I don’t think the marketing approach is particularly useful. I don’t think the midsize business will beat a path to the door of a company known to sell expensive services while funding billion dollar pipe dreams.

You can, however, sign up for Forward View, an eMagazine. Yep, helpful.

Call me skeptical.

Stephen E Arnold, July 11, 2014

Swimming in a Hadoop Data Lake

July 8, 2014

I read an interview conducted by the consulting firm PWC. The interview appeared with the title “Making Hadoop Suitable for Enterprise Data Science.” The interview struck me as important for two reasons. First, the questioner and the interview subject introduce a number of buzzwords and business generalizations that will be bandied about in the near future. Second, the interview provides a glimpse of the fish with sharp teeth that swim in what seems to be a halcyon data lake. With Hadoop goodness replenishing the “data pond,” Big Data is a life sustaining force. That’s the theory.

The interview subject is Mike Lang, the CEO of Revelytix. (I am not familiar with Revelytix, and I don’t know how to pronounce the company’s name.) The interviewer is one of those tag teams that high end consulting firms deploy to generate “real” information. Big time consulting firms publish magazines, emulating the McKinsey Quarterly. The idea is that Big Ideas need to be explained so that MBAs can convert information into anxiety among prospects. The purpose of these bespoke business magazines is to close deals and highlight technologies that may be recommended to a consulting firm’s customers. Some quasi consulting firms borrow other people’s work. For an example of this short cut approach, see the IDC Schubmehl write up.

Several key buzzwords appear in the interview:

  • Nimble. Once data are in Hadoop, the Big Data software system has to be quick and light in movement or action. Sounds very good, especially for folks dealing with Big Data. So with Hadoop one has to use “nimble analytics.” Also, sounds good. I am not sure what a “nimble analytic” is, but, hey, do not slow down generality machines with details, please.
  • Data lakes. These are “pools” of data from different sources. Once data are in a Hadoop “data lake”, every water or data molecule is the same. It’s just like chemistry, sort of…maybe.
  • A dump. This is a mixed metaphor, but it seems that PWC wants me to put my heterogeneous data, which is now like water molecules, in a “dump”. Mixed metaphor, is it not? Again. A mere detail. A data lake has dumps or a dump has data lakes. I am not sure which has what. Trivial and irrelevant, of course.
  • Data schema. To make data fit a schema with an old fashioned system like Oracle, it takes time. With a data lake and a dump, someone smashes up data and shapes it. Here’s the magic: “They might choose one table and spend quite a bit of time understanding and cleaning up that table and getting the data into a shape that can be used in their tool. They might do that across three different files in HDFS [Hadoop Distributed File System]. But, they clean it as they’re developing their model, they shape it, and at the very end both the model and the schema come together to produce the analytics.” Yep, magic.
  • Predictive analytics, not just old boring statistics. The idea is that with a “large scale data lake”, someone can make predictions. Here’s some color on predictive analytics: “This new generation of processing platforms focuses on analytics. That problem right there is an analytical problem, and it’s predictive in its nature. The tools to help with that are just now emerging. They will get much better about helping data scientists and other users. Metadata management capabilities in these highly distributed big data platforms will become crucial—not nice-to-have capabilities, but I-can’t-do-my-work-without-them capabilities. There’s a sea of data.”

My take is that PWC is going to bang the drum for Hadoop. Never mind that Hadoop may not be the Swiss Army knife that some folks want it to be. I don’t want to rain on the parade, but Hadoop requires some specialized skills. Fancy math requires more specialized skills. Interpretation of the outputs from data lakes and predictive systems requires even more specialized skills.

No problem as long as the money lake is sufficiently deep, broad, and full.

The search for a silver bullet continues. That’s what makes search and content processing so easy. Unfortunately the buzzwords may not deliver the type of results that inform decisions. Fill that money lake because it feeds the dump.

Stephen E Arnold, July 7, 2014

Keeping Up with IBM: A New Daily Paper.li Is Available

July 1, 2014

I try to keep up with Watson, the billion dollar bet that is much loved by gourmets at Bon Appétit. If you want daily IBM news and information for free, navigate to the Paper.li “The THINQ Magazine Daily.” I think the THINQ is a modern version of the IBM sign I saw in the Federal Systems’ offices in 1973. That sign said, “Think.” Spelling aside, the Paper.li algorithm harvests information from Web sites and presents it in a zippy format. Vivisimo, now an IBM company involved in Big Data, offered a Paper.li service about enterprise search.

The issue I am viewing today (July 1, 2014) covers stuff I never heard of; for example, Bluemix and “world class analytics.” There are some stories with which I am familiar; for example, Watson crafts recipes. I wrote about tamarind as an ingredient not long ago too. The THINQ content includes links to IBM videos. I was not familiar with what is labeled “theCUBE.” I am not into videos because it takes too much time to watch talking heads and PowerPoint slides. Reading is quicker and easier for me, but I am old fashioned.

There is a selection of photos. Some of these come from sources other than IBM. I assume these are snapshots from IBM partners. A number of the pictures show really happy people looking at computing devices and somewhat baffling images with text asking me, “Why do you love social media?” I don’t love social media, but for certain types of law enforcement work, social media is darned useful. Facebook users often post snaps of themselves at crime scenes or capture their thoughts moments before taking some action I find disturbing.

There is an article telling small businesses how these small outfits can use Big Data. The link points to Inc. Magazine and an article with the title “What 3 Small Businesses Learned From Big Data.” The THINQ title does not quite capture what the Inc. article actually says, but I assume that most THINQ visitors will not pay much attention to the meaning adjustment.

If you are curious about IBM, take a look at THINQ. I will stick to my own system for monitoring the exciting world of IBM.

Stephen E Arnold, July 1, 2014
