Open Source Search: Just Like Good Old Proprietary Search

April 21, 2014

The last few days have given me some food for thought. I read “Splunk Exec Defects to Tech Disruptor ElasticSearch.” The article points out:

Elasticsearch co-founder and chief technology officer, Shay Banon, said the company focus was all about products. “Elasticsearch is building something bigger than any one technology and so I’m excited to have someone like Gaurav [a former Googler] on board, who shares our vision and is going to play an instrumental role in taking our products to the next level,” he said. In the past four months, the company launched its first commercial product, Elasticsearch Marvel…. Bloomberg, The New York Times, Facebook, GitHub, Netflix, Yelp, Verizon, McGraw-Hill, WordPress, Atlassian and SoundCloud all use Elasticsearch to store, search and analyze any type of data in real time.

Poor Splunk. The company offers tools to help licensees “listen to their data.” First, Lucid leaves one writer with the impression that felonious behavior is coming down the Information Highway. Splunk was the target of some enthusiastic writer at the IDC combine who apparently became entangled in some Mad Men type of advertising. That article appeared in InfoWorld as “LucidWorks Preps Solr Stack as Splunk Killer.” Now ElasticSearch has allegedly hired a Splunk wizard to herd products down the busy digital trail.

What I find interesting is that open source search is starting to look more like the good old proprietary enterprise search sector. Me-too products and executive churn mix with MBA think. The lingering effects of past search controversies, like those swirling around Fast Search and Autonomy, remain fresh in my mind.

Will ElasticSearch and Lucid Works become the new combatants in the search sector? Today both companies have chosen Splunk as the punching bag.

The more search changes, the more it remains the same, it seems. Come to think of it: Most of today’s vendors are following the scripts written for Fulcrum Technologies and Verity, which stomped around the C suite in the 1980s. Is the search sector running an endless loop?

Stephen E Arnold, April 21, 2014

Printed Information: The Burden of Adding Value

April 14, 2014

Navigate to your local news vendor (well, there aren’t many here in Harrod’s Creek) and buy a copy of the printed edition of the New York Times. Turn to page B3 of the April 14, 2014 edition and read “Leaner and More Efficient, British Printers Push Forward in Digital Age.” You may be able to find it online, but no guarantees from the goose’s free blog.

The article contained a fascinating statement. I quote a passage attributed to Mr. Kingston of Wyndeham, a printing company (a surviving printing company) in England:

The same applies to books and magazines, Mr. Kingston said. “We can now make a bespoke edition of any magazine; we can bind it in a different way and use special colors. We can personalize it and send it. There is much higher added value there.”

This search for added value is, I assume, a lever with which to reverse these factoids in the write up:

  • Printing has become a “peopleless business” which means that employment has cratered from 350 in one plant to 114
  • In Britain, printing employment fell from “around 200,000” in 2001 to about 125,000 when the New York Times went to print a day or so ago
  • Revenues? Ouch. “The industry’s revenue is projected to shrink to about 10 billion pounds, or approximately $17 billion, by 2017, down from more than £15 billion in the 1990s…”
  • Cheaper labor puts the squeeze on UK printers: “So for things that are time-sensitive like magazines and have to be done in the region, the best deal might be outside of the U.K. — and you can have your products here overnight.”

The write up mentions other factors as well.

My view is that personalizing a magazine about Godzilla will put a load on “adding value’s” shoulders. Perhaps a video would be more appropriate or a social media stream, two channels not highlighted in the New York Times’ article? Stop the presses. Well, spike that.

Stephen E Arnold, April 14, 2014

Hewlett Packard: Foreign Bribes and Search

April 10, 2014

I was disappointed with the news stories about Hewlett Packard’s recent hitch in its git-along. For example, I read “Hewlett Packard Agrees to $108 Million Fine for Foreign Bribes” and saw not one reference to information retrieval, search, and content processing technology. In my view, had HP used the Autonomy technology to process its internal information, IDOL and the Digital Reasoning Engine would have generated some outputs that pointed to anomalies like those the investigators found.

Apparently “findability” is more difficult than it appears even when the company in the spotlight owns one of the go-to search systems. I assumed that it would be trivial to run a few queries and produce documents and “big data” that would show Hewlett Packard what was cooking in its subsidiaries or with non-US deals.

Search apparently was not up to the task because allegations had to be “resolved by third parties.” Apparently it required attorneys and government folks to figure out that HP was taking some short cuts. Here’s a passage I noted:

“Hewlett-Packard subsidiaries created a slush fund for bribe payments, set up an intricate web of shell companies and bank accounts to launder money, employed two sets of books to track bribe recipients, and used anonymous email accounts and prepaid mobile telephones to arrange covert meetings to hand over bags of cash,” said Deputy Assistant Attorney General Bruce Swartz in the Justice Department statement.

Business actions like those mentioned in the Silicon Beat write up make it clear that HP management may not know what is going on or may not be paying attention to existing information about company activities.

Is this an anomaly?

I can’t answer the question, but when investigators from various countries are able to find useful factoids, it raises one question:

What does HP’s much hyped information retrieval system do for company executives?


Was important management information not available to HP’s senior executives? If so, who filtered the digital content?

This $108 million fine comes on the heels of HP’s paying $57 million to settle a shareholder lawsuit that accused the “personal computer maker’s former management of defrauding shareholders by abandoning a business model it had long touted.”

The persistent HP business model seems to be one that does not engender my confidence in the company.

I am not sure the IDOL search system is at fault. Does HP use Autonomy’s fraud detection components? Why not index content, run queries, and make decisions based on the heterogeneous types of information that Autonomy can process, usually with some effectiveness?

The jury’s still out on search at HP. Two big fines in a short period of time are unsettling to me because both are germane to the effective use of information retrieval technology.

Stephen E Arnold, April 10, 2014

Content Management: A $12 Billion Market in 2019!

April 8, 2014

Now I enjoy crazy numbers. I recall that someone at Yahoo allegedly said to a New York Times reporter:

Yahoo estimates that it would cost $300 million to build a search service from scratch. [See New York Times, July 10, 2008, page C5. I wrote a story about this estimate.]

Crazy number. Three hundred million would not buy a Web search system in 2008. Today it may cover the cost of jet fuel for Google’s fleet of airplanes.

But crazy numbers get traction and create “real news.”

I read “Enterprise Content Management Market worth $12.32 Billion by 2019.” Now that is an interesting estimate. The calculation surprised me for three reasons:

  1. The outfit promulgating the good “news” is selling a report, presumably to those in the content management sector who need reassurance.
  2. There was no mention of WordPress- and SquareSpace-type outfits, which seem to be moving ahead of the pack of name brand vendors.
  3. The assumption that I actually know what content management or CMS means.

Like search, the CMS vendors have been looking for a way to become more relevant. The implementations of Broadvision, Documentum, Interwoven, Vignette, and other well known CMS systems have had some successes and failures.

The “real” news about this report mentions some aspects of CMS that are similar to the scope creep visible in enterprise search. Here are some examples of what CMS embraces:

enterprise document management, enterprise document imaging and capture, enterprise web content management, enterprise records management, enterprise document collaboration, enterprise digital rights management, content analytics, rich media management, advanced case management, enterprise document output management, enterprise workflow management, and other solutions; by type of emerging applications: social content management, mobile content management, big data management, and cloud content management; by type of deployments: hosted and on-premises; by verticals: academia and education, banking, financial services and insurance (BFSI), consumer goods and retail, energy and power, government and defense, life science and healthcare, manufacturing, media and entertainment, telecom and IT, transportation, tourism, and hospitality, and other verticals; and by regions: North America (NA), Asia Pacific including Japan (APAC), Europe (EU), Middle East and Africa (MEA), and Latin America (LA).

This list is not helpful to me. I think the collection of jargon, buzzwords, and impressive sounding concepts is designed for Web indexing systems and to give a marginalized type of software some strap on muscles.

If information about the magnitude of the CMS market requires this type of verbal legerdemain, how credible is the report, the estimate, and maybe content management itself?

My personal view is that the buzzword content management, like knowledge management, is tough to define and may ultimately lack relevance in today’s business environment. The notion that a specious estimate adds value to those laboring in the CMS sector is amusing. The puffery, apologias, and jargon generated by those trying to sell systems that “manage” content cause me to chortle. Estimates of the volume of Big Data seem to fly in the face of “content management.” Even Google’s robots are struggling to keep pace with content proliferation based on my test queries.

At a time when organizations struggle to figure out what information is in their possession, CMS seems to have failed in its “mission”: Managing content.

CMS’ weakness is the notion of management itself. Since “management” is tough to define, content management sounds like a discipline cooked up by MBA hopefuls in an innovation study group.

Stephen E Arnold, April 7, 2014

HP and Its Business Direction

April 2, 2014

I read “HP Agrees to Pay $57 Million to Settle Shareholder Lawsuit,” a Reuters’ story. I wonder, “Is this an April Fool’s joke?” Maybe.

The “real journalist” write up reports that HP has to pay lots of money for refocusing “the company on business services and products.” Okay, I remember that. Part of that effort was the purchase of Autonomy for $10 or $11 billion.

Here is the passage I noted:

In the midst of a multi-year turnaround effort intended to revive growth, HP is trying to reduce its reliance on personal computers and move toward computing equipment and networking gear for enterprises.

The problem is that either the “real journalist”, HP, or gremlins see HP without the cloud, Autonomy as a service, and other voodoo HP was working on.

When HP is at the controls, I am never sure where the farm tractor will go. When “real journalists” explain a company, I want to make sure I am not in the hemp field.

Stephen E Arnold, April 1, 2014

Facebook: All the News Socially

March 27, 2014

I read “STUDY: Facebook’s Role In Pew Research Center’s ‘State Of The News Media 2014.’” The source is a research project from Pew Research Center. The sample, well, who knows? The finding is fascinating, particularly to advertisers, news professionals, and old people like me sitting around the cast iron stove in Harrod’s Creek, Kentucky.

Here it is:

30 percent of the sample get their [sic] news from Facebook.

The survey seems to have been completed in mid 2013, which may be important in the wake of Facebook’s interest in virtual reality.

The write up highlights six “Facebook-related findings.” I don’t want to spoil your fun by listing the listicle of the six factoids. I want to point out three of these insights:

  1. Three out of every 10 US adults get “some news while on Facebook.”
  2. The news is “shared by friends.”
  3. The demographics of the Facebook news consumers “were high earners and college educated.”

My thought is that social news is not something a traditional newspaper like my former employer the Courier Journal & Louisville Times considers a native habitat. The idea that social news is news is fascinating. With tools to generate disinformation, misinformation, and reformation, figuring out what’s accurate may be difficult for a Facebooker.

I assume that a Walter Cronkite of social media news will emerge. Advertisers are likely to sniff the edges of the Pew information and conclude, “Opportunity.” Experiencing Facebook as news is a facet of the service that has the potential to be disruptive. Which traditional network will run the Facebook news hour? Will Thomson Reuters and the BBC add a Facebook stream? Opportunities abound.

Stephen E Arnold, March 27, 2014

Google and Pricing: High Stakes WalMarting

March 26, 2014

I read a number of write ups about the new Google cloud pricing. The main idea that unifies the different reports is, in my opinion, “Everybody loves a bargain.” Consider “Google Slashes Cloud Prices: Google vs AWS Price Comparison.”

The essay-editorial begins with the invocation of the Google-Amazon joust:

Google threw down the gauntlet to challenge AWS public cloud supremacy by announcing significant price reductions across its Google Cloud Platform. The eye-opening price cuts covered compute (32-percent reduction), storage (68-percent reduction), and BigQuery (85-percent reduction). Google also signaled that future reductions could follow Moore’s Law — citing that historically public cloud prices have dropped only 6 to 8 percent annually as compared to 20- to 30-percent reductions in hardware prices.

The fact that neither Amazon nor Google provides much detail about its actual costs, profits, number of customers, and goals for its cloud services is not of much interest. Explanations of how pricing thresholds operate and migrate excite little curiosity.

Google, playing the Google Search Appliance card, seems to suggest that Amazon’s pricing is complicated. Yep, it is and it is very difficult to pin down with confidence what something will cost until the bits have been chomped and the Amazon accounting system processes its inputs and bills the customer. There is chatter about “sustained use” pricing, on demand pricing, and heavy reserved instance pricing, and in the article I have used as a pivot point for my comments, a cheer for RightScale’s services. These will help the cloud customer figure out what cloud computing costs.



Several observations:

First, the pricing is an example of the WalMarting of technical services. Doesn’t the entire world want lower prices? Once a market has been “won,” what happens? Creative destruction? I refer you, gentle reader, to WalMart’s challenges to rekindle (pun intended) that Sam Walton fire. The profit flat line is not good news to some WalMart stakeholders. But the Google pricing is little more than an old-fashioned price war in a Walton-like march for market share.

Second, Amazon has a bit of a cost problem. The murky Amazon financials, the hard to figure out side companies, and the blurring of revenues from product and services lines are tough to parse. Amazon is working overtime to generate no friction revenue (Prime pricing) and constrain costs. The results are a robust top line and growing pressure on expenses at “everyone’s favorite” online store. Google is cutting prices at a time when Amazon is maybe less than prepared for a price war.


Autonomy Deconstructed

March 24, 2014

Autonomy has been broken up! According to the InfoWorld article “HP Breaks Autonomy IDOL Into Discrete Services,” developers will be able to add advanced text processing capabilities to their applications with a new PaaS option. HP Autonomy’s IDOL used to be only a software package, but the company sees that it has to adapt.

“ ‘If we want to be successful as a platform today, we have to do more than create a large installable product. We have to enable it so developers can use it,’ said Robert Youngjohns, senior vice president and general manager of HP Autonomy.”

The new discrete services make the IDOL features available to enterprise developers, so they can augment their own applications and programs without writing the base code or using third-party libraries. The IDOL services are divided into two categories: stateless APIs and the ones that remain in the HP Cloud. IDOL 10.5 is the basis for the new offering. Eventually Youngjohns wants all of IDOL’s features exposed for developers.

Mike Lynch took a “one product approach”; now HP disaggregates IDOL’s applications. Was Dr. Lynch wrong about how to make money from IDOL? We’ll find out when the next HP quarterly report’s financials become available.

Whitney Grace, March 24, 2014
Sponsored by, developer of Augmentext

Google and Pals: Redefining Hiring along with Search

March 23, 2014

I read “Emails From Google’s Eric Schmidt And Sergey Brin Show A Shady Agreement Not To Hire Apple Workers.” Let’s assume the emails suggest that certain outfits agreed to adopt certain hiring practices. In the happy land of Silicon Valley, big companies have to have some freedom. In the context of Bernie Madoff, the Bear Stearns misstep, and the actions of vendors responsible for the roll out site, what’s the big deal?

The logic of marketing-oriented, big buck business operates in a way different from the gas station in Harrod’s Creek, Kentucky. I am delighted that some companies can agree to respect some informal guidelines. I admire a search vendor who can have as pals some executives at other giant, with-it firms.

Name a person who has been disadvantaged by the present whiz whiz approach to business. I noted this passage in the cited write up, which may be completely off base:

These emails will make you angry if you believe that companies ought to compete instead of fix prices. They’ll make you even angrier if you believe that workers have the right to sell their labor at the maximum price the market will bear — because Jobs, Schmidt and Brin appear to have spent years instructing their recruiters and HR staff to avoid hiring staff from each others’ companies, according to Pando:

… what began as a secret cartel agreement between Apple’s Steve Jobs and Google’s Eric Schmidt to illegally fix the labor market for hi-tech workers, expanded within a few years to include companies ranging from Dell, IBM, eBay and Microsoft, to Comcast, Clear Channel, Dreamworks, and London-based public relations behemoth WPP. All told, the combined workforces of the companies involved totals well over a million employees.

My hunch is that MBAs are okay with this approach to business. Engineers, although important, are not business people, just cube dwellers. Googlers and Apple fans are just doing what is natural in a Darwinian world where nature may be red in tooth and claw.

Good business is what’s important. Forget such analog notions as fairness, ethical behavior, and integrity. It’s 2014 and search is just fine as long as one does not expect comprehensiveness, precision, and recall. Search for something meaningful and click on those ads or buy a product with questionable provenance. It’s 2014. New rules apply it seems.

Stephen E Arnold, March 23, 2014

HP: Deconstructing IDOL

March 12, 2014

Michael Lynch did what no other founder of a search-and-retrieval company was able to achieve. He operated a company that grew from a couple of government contracts into an $800 million plus giant in 15 years.

My analyses of the pre-Hewlett Packard Autonomy emphasize several facets of Mr. Lynch’s achievement. Competitors were not able to match Autonomy’s marketing. Whether it was the “Portal in a Box” or the augmented reality system Aurasma, competitors had to catch up with Mr. Lynch’s products, features, and benefits. As other search vendors played musical CEOs, Autonomy built a stable senior management team. With each change in leadership, competitors lost time with reorganizations and relearning. Autonomy’s management capabilities have been ignored. Mr. Lynch figured out that growth from search required acquisitions. Once the financing was in place, Autonomy gobbled up companies and its revenues soared.

Companies like Fast Search & Transfer and Endeca labored to close the revenue and marketing gap with Autonomy. Both failed. Fast Search resorted to accounting tricks, and Microsoft has been “investing” in Fast Search technology to make it fit with today’s enterprise. Endeca hit a glass ceiling at about $140 million in annual revenue despite evangelists, fancy MBAs, and a clever partnering method. Oracle is marketing Endeca as a business intelligence system and eCommerce system, not a search system. Other companies with promise just failed. These include Convera, Delphes, and Entopia. TeraText retreated to the government sector. IBM abandoned its in house search technology and just adopted Lucene, an open source toolkit. Other vendors remained essentially invisible like Albert, dtSearch, Lextek, and EPI Thunderstone, among others. Exalead disappeared into an engineering firm that is struggling with its core business.

Autonomy, like it or not, emerged after 15 years as the major brand in search, content processing, and a number of closely related fields.

Despite the changes in the search sector and in Autonomy’s technology line up, Autonomy delivered one product: IDOL, the integrated data operating layer, and its DRE, the digital reasoning engine. One product name persisted for 15 years. One technology, the DRE, powered the famous “black box” at the heart of every Autonomy product or service, whether developed in house or acquired. Once Autonomy bought a company, it IDOLized the product or service.

I read “HP Breaks Autonomy IDOL into Discrete Services.” The write up smacks of the “real journalism” from the azure chip outfit IDC. The story reported in cheerleader fashion:

The service will expose most of the IDOL features as discrete services, accessible through APIs (application programming interfaces). HP is hoping that enterprise developers use the service to embed IDOL functionality into their own applications.

At first glance, this is no big deal. Exalead was moving in this direction before it was purchased by Dassault. Elasticsearch offers a compelling open source and lower cost alternative as well.

In my view, HP has a big job ahead of it. The company has to generate enough revenue from Autonomy licenses to pay back its purchase price, now deeply discounted to several billion dollars. Considering that it took Autonomy 15 years to nose toward $900 million, the HP sales professionals have to get in gear. After all, HP needs to turn Autonomy into a net producer of revenue and profit.

In addition, HP has to make certain that its deconstruction of IDOL does not lose the famous Autonomy magic. Without magic, I am not confident that 1996 technology can cope with the challenges of today’s information processing needs. (Google is also a late 1990s company faced with similar problems of ageing technology and concepts.) Good enough search is available from open source repositories. Lower cost options are available from upstarts like Elasticsearch and Searchdaimon. Once the magic is gone, magic is tough to recapture.

HP has to find a way to make Autonomy’s services usable to those customers who want to download an app and have it work. Autonomy reaches back to the 1990s. Today’s information technology professionals are into a different type of computing experience. Of course, there are organizations that have the money, time, and appetite to tackle Bayesian methods infused with Monte Carlo and Markov Chain methods, seasoned with Laplacian techniques. My hunch is that complexity has the potential to add friction to the chopped up mini-IDOLs and DREs.

Net net: HP has to find a way to make big money flow in a market which is coveted by IBM Watson, Microsoft, and numerous other vendors.

Would Michael Lynch have chopped up IDOL? I don’t think he will be available to answer this question. The squabble about HP’s purchase price generates considerable noise at a time when HP needs focus, clarity, and numerous sales.

Worth watching.

Stephen E Arnold, March 12, 2014
