Enterprise Search Ignorance Can Be Costly
May 20, 2013
Why What You Do Not Know Can Bite Your Pocketbook. Marketers Have Their Interests Front and Center, Not the Customers’ Interests
A few days ago, I sat through several presentations about enterprise search. The systems struck me as quite similar. The emphasis was placed on providing basic information access to users. For the purpose of this short essay, I will not make distinctions among search vendors which position themselves as providers of analytics, business intelligence, discovery, and Big Data access, among other synonyms for search and information retrieval.
The missing pieces of the cost puzzle can make budget deficits a reality. A happy quack to Vermont’s Department of Information and Innovation. See the discussion to drive down the cost of doing business. States are paragons of fiscal probity.
However, the talks caused me to reflect on what the vendors left out of their presentations.
Here’s a checklist of the omissions in commercial systems which are now being marketed as an alternative to the high profile and expensive solutions available from Dassault, Hewlett Packard, Lexmark, Microsoft, and Oracle. Each of these large enterprise software vendors acquired one or more search systems. Each has taken steps to integrate search with other enterprise software solutions.
The gap left by the acquisition of such companies as Autonomy, Exalead, and others is now filled by smaller and less well known vendors of search. I don’t want to mention these companies by name, but a quick search of Bing or Google will surface many of the firms vying to become the next $100 million vendor of enterprise search systems.
The first omission is a component which can acquire, normalize, and present textual content in a form the search system can process. For newcomers to enterprise search, the content acquisition process can add significantly to the cost of deploying an enterprise search system. Connectors are available from a number of specialist vendors. Most of the search vendors provide some basic tools for acquiring content. Depending on the organization, the vendor provided tools may be adequate for acquiring documents in text or Web pages in HTML. Other document types may be more problematic. A vendor offering a system which requires documents to be in a supported XML format often emphasizes the system’s ability to slice, dice, parse, and perform certain operations with alacrity. What’s omitted is the time, cost, technical expertise, and work flows required to get content into the search system. Cloud based enterprise search solutions and certain lower cost enterprise search systems leave content to the licensee or offer for fee consulting services to assist with these often complex activities.
Bloomberg and Alleged Two Way Systems
May 11, 2013
Just a small thing, the Bloomberg privacy breach allegations. There are far weightier matters in search; for example, are evaluations and ratings of search vendors objective? Someone on the LinkedIn Enterprise Search Engine Professional Group even raised the possibility that vendors “pay” for coverage in some consultants’ evaluations of technology.
Well, on to the smaller thing which is labeled this way in the New York Times: “Privacy Breach on Bloomberg’s Data Terminals.” You can locate the story in the May 11, 2013, edition of the newspaper. If you look online at http://goo.gl/oeMqA you may be able to view the news story. (Google, no promises because I know how you want every blog post to have continuously updated links, but that’s another issue.)
The main idea seems to have originated with a real journalism operation called The New York Post. This point appears in paragraph six, so it is definitely a subordinate point.
As I understand the allegation, Bloomberg tradition terminals had a function which allowed “journalists to monitor subscribers were promptly disabled.” I think that Bloomberg terminals generate some sort of report which allegedly allowed a journalist to determine if someone had used the terminal. The idea is that no use of a terminal suggests that the person has either moved on, lost his or her hands, or experienced an opportunity to find his / her future elsewhere.
How secure are secure systems? Image source: Sandia.gov at http://goo.gl/NaEBE. Modern methods for accessing digital information are difficult to depict. Paper is tangible. Digital data are just “out there.” Humans assume that if it cannot be seen, the problems associated with what’s “out there” are no big deal. Is this an informed viewpoint?
The Atlantic Wire covered the alleged breach in a story called “Why Billions Are at Stake in the Bloomberg Terminal Privacy Problem.” What I found interesting was that the Atlantic Wire pointed out that the breach allegedly allowed a journalist to determine the “news habits” of Bloomberg terminal users. Is this similar to the type of information which online services extract from users’ Web search histories?
A Fresh Look at Big Data
May 8, 2013
Next week I am doing an invited talk in London. My subject is search and Big Data. I will be digging into this notion in this month’s Honk newsletter and adding some business intelligence related comments at an Information Today conference in New York later this month. (I have chopped the number of talks I am giving this year because at my age air travel and the number of 20 somethings at certain programs make me jumpy.)
I want to highlight one point in my upcoming London talk; namely, the financial challenge which companies face when they embrace Big Data and then want to search the information in the system and search the Big Data system’s outputs.
Here are the simplified curves:
Notice that precision and recall have not improved significantly over the last 30 years. I anticipate that many search vendors will tell me that their systems deliver excellent precision and recall. I am not convinced. The data which I have reviewed show that over a period of 10 years most systems hit the 80 to 85 percent precision and recall level for content which is about a topic. Content collections composed of scientific, technical, and medical information where the terminology is reasonably constrained can do better. I have seen scores above 90 percent. However, for general collections, precision and recall have not been improving relative to the advances in other disciplines; for example, converting structured data outputs to fancy graphics.
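For readers who want the arithmetic behind those percentages, here is a minimal sketch in Python. Precision is the share of returned documents which are relevant; recall is the share of relevant documents which were returned. The document identifiers and the relevance judgments below are invented for illustration and are not data from any vendor's system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: document ids the search system returned
    relevant:  document ids a human judged relevant (hypothetical here)
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents the system actually found
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: the system returns 5 documents, 4 of them relevant,
# while the collection holds 8 relevant documents in total.
returned = ["d1", "d2", "d3", "d4", "d9"]
judged_relevant = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]

p, r = precision_recall(returned, judged_relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.80 recall=0.50
```

In this made-up case the system looks respectable on precision and weak on recall, which is why a single "85 percent" claim from a vendor deserves a follow-up question about which of the two numbers is meant.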
HP, Autonomy, and a Context Free Expert Output about Search: The Bet on a Horse Approach to Market Analysis
May 4, 2013
I don’t think too much about:
- Azure chip consultants. You know, these are the firms which make a living from rah rahs, buzzwording, and pontification to sell reports. (I know. I labored at a non-azure chip outfit for what seems like decades. Experience is a good instructor. Oh, if you are a consultant, please, complain about my opinion using the comments section of this free blog.)
- Hewlett Packard. I recall that the company used to make lab equipment which was cool. Now I think the firm is in some other businesses but as quickly as I latch on to one like the Treo and mobile, HP exits the business. The venerable firm confuses my 69 year old mind.
- Autonomy. I think I did some work for the outfit but I cannot recall. Age and the lifestyle in rural Kentucky take a toll on the memory I admit.
Nevertheless, I read “HP’s Autonomy Could Face Uphill Battle In Data Market.” There were some gems in the write up which I found amusing and illustrative of the problems which azure chip consulting firms and their experts have when tackling certain business issues.
The main idea of the write up for “investors” is that HP faces “challenges.” Okay. That’s a blinding insight. As you may recall, HP bought Autonomy for $11 billion and then a few months later roiled the “investors” by writing off billions on the deal. That was the mobile phone model, wasn’t it?
The write up then pointed out:
HP wanted Autonomy to jump-start its move into software and cloud-based computing. Autonomy is the No. 1 provider of search and retrieval software that companies use to find and share files and other information on their websites and document management systems.
Okay. But that too seems obvious.
Now here comes the kicker. The expert outfit providing inputs to the reporter doing the bull dog grip on this worn out bone is quoted as saying:
“Software license revenue (in this market) isn’t growing at the same rate as before, and we are beginning to see the rise of some new technologies, specifically content analytics and unified information access,” Schubmehl said. These new types of software can be used with types of business analytics software, business intelligence software and other software to help enterprises do a better job of locating specific information, he says, which is the job of search retrieval software.
I don’t know much about IDC but what strikes me from this passage is that there are some assertions in this snippet which may warrant a tiny bit of evaluation.
Will context free analyses deliver a winner? Will there be a Gamblers Anonymous for those who bet on what journalists and mid tier (second string) consultancies promulgate? For more about Gamblers Anonymous navigate to http://www.gamblersanonymous.org/ga/
Here goes:
The Loan, Own, Bankrupt Model for Publishing
April 27, 2013
Author’s Note: I was not going to make a big deal about the death of my father. He had a long, productive life. He was a pal to some heavy hitters in Illinois politics. He had a couple of good jobs. He worked hard. My family, lawyers, and advisors made the tasks associated with this life event less burdensome. There was one problem, however. The Peoria Journal Star failed to publish my father’s obituary on time. I was not going to discuss this procedural failure on the part of GateHouse Media’s newspaper until I read the article in the Wall Street Journal about companies which allegedly loan themselves money and then own the company to which the money was loaned. When the company in the loan-own mode gets into trouble, some financial tap dancing is in order. After reading the Wall Street Journal story, I decided to capture some thoughts. What’s this have to do with search? Two things: First, trying to find an obituary when it is not in a system is tough. Second, the modern approach to management often leaves the customer adrift. In my opinion, this is not good. Feel free to skip the write up.
Part I: The Financialing
On most days, I don’t think too much about textbook publishing, newspaper publishing, or loaning myself money and then declaring myself bankrupt. Publishing once was an interesting business, but putting ink on paper seems a bit retro for me. Isn’t digital where it is at?
A publisher who hits upon the clever idea of loaning oneself money, spending it, and then going broke is very 2013ish. The notion strikes me as an idea crafted by a couple of MBAs, a handful of attorneys, and a person suffering from sleep deprivation. I thought publishers published. Not now if the Wall Street Journal’s story is anywhere near accurate.
I read “Buyout Firm Gathers Cengage Debt” in the April 27, 2013 Wall Street Journal, page B2. If you are a savvy MBA you may be able to locate the story at this link. If not, be prepared to pay up. I did. (No, gentle reader of my personal blog, I do not update links in order to curry favor with the Google. You want good links, go elsewhere, please.)
The main point of the story is that an investment/financial type of firm is “both an owner and senior creditor of Cengage.” If you know your history of professional publishing and its wheeling and dealing, Cengage was once a separate firm and then once a chunk of the Thomson Reuters’ outfit. Thomson Reuters is interesting because it has run a number of senior managers through its executive suites and maintained flat revenues and modest profits for several years.
I think the way this owner and creditor thing works is that one part of a big investment/financial outfit buys a stake in something and another part of the big investment/financial outfit loans the recipient of the money some cash. In short, the big investment/financial outfit both owns the recipient of the money and is a creditor who wants money back.
Got that? I sort of do.
Part II: The Unveilingness
But what’s fascinating to me is a series of comments in the original write up and a reference to a publishing company with which I did business on April 9, 2013. Let’s look at these two blips on my aging radar screen.
Here are the quotes I marked in my dead tree copy of the venerable Wall Street Journal, which I think was or is a Rupert Murdoch property:
- “[APAX] the private equity-firm’s potentially conflicting roles as an owner and creditor”. I like the potentially conflicting, don’t you?
- “Some lawyers say wearing both owner and creditor hats can undercut the goals of bankruptcy law.” What are those goals, by the way? Undercut is an interesting word too.
- “By tacking on a debt investment, a private-equity owner can keep control of a company while sometimes using bankruptcy or other means to cut jobs, cancel contracts, or offload pensions and other obligations.” Seems reasonable to take these steps. Hey, it’s just business in 2013.
On April 9, 2013, I arranged for a mortuary to place an obituary for my father in a company mentioned in this April 27, 2013, Wall Street Journal story. The Peoria Journal Star is owned by GateHouse Media, Inc. GateHouse I learned is one of the outfits which may be both a creditor and an owner of properties which are facing financial headwinds. GateHouse Media’s tag line on its Web site says, “We can.”
Well, maybe the company can be listed in the owner-creditor story in the Wall Street Journal. The company may be able to chop staff. And, for sure, the company can mishandle an obituary like nobody’s business.
The Wall Street Journal article pointed out that some firms use various methods to lessen the financial pain. One of them, cited above, is dumping employees. The elegant phrase is “cutting jobs.”
Part III: The Pay Downing
How does this work out in real life?
Well, the Peoria Journal Star’s obituary desk is staffed on what struck me as somewhat loose hours. One person whom I finally reached by calling the City Desk at the Peoria Journal Star said, “I think someone will come in around 9 am.” Another person told me, “There have been many staff reductions at the paper. Most people wear two or three hats.” A bit of Web surfing revealed that the Peoria Journal Star’s obituary “desk” is actually part of the advertising department and that members of the family could not submit obituaries. As it turns out, the obituary, which left a member of the family wondering why it did not appear on April 22, 2013, was published only after the mortuary contacted the Peoria Journal Star’s obituary desk a second time on Monday, April 22.
Swinging for the Fences and Search
April 22, 2013
I have been reading—actually time traveling to an economics class in graduate school—David Stockman’s The Great Deformation. I follow the argument. No problem, but I am skeptical of blame from those who were involved in the events. I have been in quite a few crazy meetings, and I avoid discussing the subjects of most of those stories for two reasons: [a] In the midst of events, I had zero clue about the larger, political forces at work in which the meeting was a grain of sand in the larger dust storm and [b] I focus on search and retrieval, a subject definitely not part of the more interesting meetings in which I have participated over the last 40 years.

What impact does the “big bet” approach to investing have on search, content, and analytics vendors?
However, the “deformation” arguments triggered some thinking after I read “Google Investors Say Yes to Big Bets.” I have been looking at some of the reviews of the book. In the Kirkus Review a theme surfaced:
fiscal math hit the shoals,” leaving a legacy of permanent “massive deficit finance” and the legend that “deficits didn’t matter.”
What’s this have to do with search? Well, that is a good question. I took a moment and looked up the venture money which has flowed into a handful of search and content processing companies. Here’s the table in which I captured my result. The link points to the source (maybe a good source, maybe a lousy source).
| Company | Venture Funding | Year Founded |
| --- | --- | --- |
| Attivio | $48.2 million | 2007 |
| BA Insight | $10.5 million | 2004 |
| Coveo | $34.7 million | 2004 |
| Digital Reasoning* | $5.2 million | 2000 |
| Palantir ** | $301 million | 2004 |
| Vivisimo | $4 million | 2008 |

* The Digital Reasoning number includes In-Q-Tel funding but excludes friends, angels, and family funding
** I included Palantir because in one briefing the system was presented as having a robust search function available to analyst users.
If I total these numbers, I get $403.6 million. Tossing out the astounding $301 million for Palantir, the more “searchy” vendors’ funding in this sample totals $102.6 million.
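Anyone who wants to check my addition can do so with a few lines of Python. This is just a sketch of the sums; the dollar figures are the ones in the table, expressed in millions, and the variable names are my own.

```python
# Venture funding from the table, in millions of US dollars.
funding_musd = {
    "Attivio": 48.2,
    "BA Insight": 10.5,
    "Coveo": 34.7,
    "Digital Reasoning": 5.2,
    "Palantir": 301.0,
    "Vivisimo": 4.0,
}

total = sum(funding_musd.values())
# Drop Palantir to see what the more "searchy" vendors raised.
without_palantir = total - funding_musd["Palantir"]

print(f"All six companies: ${total:.1f} million")             # $403.6 million
print(f"Without Palantir:  ${without_palantir:.1f} million")  # $102.6 million
```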
Several questions rose in my mind:
First, in today’s economy, how will these firms return to investors their money, interest, and a profit?
Game Over Mode: Consumers and Searching
April 4, 2013
This morning I read “As Web Search Goes Mobile, Competitors Chip at Google’s Lead.” (Keep in mind that when the link goes dead you will need the paper edition of the story on pages A 1 and A 4 of the April 4, 2013 issue or a for fee password to the New York Times’s online service.)
The main point is that mobile is surging. For many reasons, mobile search does not work the way desktop search and Web surfing worked when Backrub was bubbling toward Google. The article identifies the geolocation trend where coordinates coupled with some data about user behavior can deliver a place to buy coffee.
The article then says:
No longer do consumers want to search the Web like the index of a book — finding links at which a particular keyword appears. They expect new kinds of customized search, like that on topical sites such as Yelp, TripAdvisor or Amazon, which are chipping away at Google’s hold. Google and its competitors are trying to develop the knowledge and comprehension to answer specific queries, not just point users in the right direction.
The story then points out that there are 30 trillion Web addresses, which is definitely quite a few places to index content. Searching a massive index with 2.5 words just does not work for “consumers.”
The story identifies social systems which put a person closer to someone or some information from someone which answers the user’s question. The wrap up to the article quotes a Google “fellow” who correctly states a Google truism:
“Most people have this very strong Google habit,” he said. “I go there every day and it gives me information I want, so it’s a self-reinforcing cycle. Not anyone can come in and just do those things.”
So what exactly is happening in consumer search? Outfits like Amazon and LinkedIn look like they are growing and presumably taking traffic from Google. On the other hand, Google seems confident that its market share and its remarkable diversity of ways to present information to users are in pretty good shape. Is this a chess-type draw, a paradox, or an analysis which makes search almost impossible to discuss without getting lost in clicks, segments, traffic, and user behavior data?
My view is that search has become a word which is acceptable in some circles and the equivalent of a curse word in others. Consumers want answers to questions, and according to some experts, answers to questions the user does not yet know she has formulated. Vendors want revenue. Advertisers want people to buy their products and services. Teens want whatever teens want. Each tiny grouping of online users which can be labeled has search needs.
The problem is that figuring out exactly what the “need” is in a specific context is a field where further research and innovation are needed.
Promise Best Practices: Encouraging Theoretical Innovation in Search
March 29, 2013
The photo below shows the goodies I got for giving my talk at Cebit in March 2013. I was hoping for a fat honorarium, expenses, and a dinner. I got a blue bag, a pen, a notepad, a 3.72 gigabyte thumb drive, and numerous long walks. The questionable hotel in which I stayed had no shuttle. Hitchhiking looked quite dangerous. Taxis were as rare as an educated person in Harrod’s Creek, and I was in the same city as Leibnitz Universität. Despite my precarious health, I hoofed it to the venue which was eerily deserted. I think only 40 percent of the available space was used by Cebit this year. The hall in which I found myself reminded me of an abandoned subway stop in Manhattan with fewer signs.
The PPromise goodies. Stuffed in my bag were hard copies of various PPromise documents. The most bulky of these in terms of paper were also on the 3.72 gigabyte thumb drive. Redundancy is a virtue I think.
Finally on March 23, 2013, I got around to snapping the photo of the freebies from the PPromise session and reading a monograph with this moniker:
Promise Participative Research Laboratory for Multimedia and Multilingual Information Systems Evaluation. FP7 ICT 20094.3, Intelligent Information Management. Deliverable 2.3 Best Practices Report.
The acronym should be “PPromise,” not “Promise.” The double “P” makes searching for the group’s information much easier in my opinion.
If one takes the first letter of “Promise Participative Research Laboratory for Multimedia and Multilingual Information Systems Evaluation” one gets PPromise. I suppose the single “P” was an editorial decision. I personally like “PP” but I live in a rural backwater where my neighbors shoot squirrels with automatic weapons and some folks manufacture and drink moonshine. Some people in other places shoot knowledge blanks and talk about moonshine. That’s what makes search experts and their analyses so darned interesting.
To point out the vagaries of information retrieval, my search for a publicly accessible version of the PPromise document returned a somewhat surprising result.
A couple more queries did the trick. You can get a copy of the document without the blue bag, the pen, the notepad, the 3.72 gigabyte thumb drive, and the long walk at http://www.promise-noe.eu/documents/10156/086010bb-0d3f-46ef-946f-f0bbeef305e8.
So what’s in the Best Practices Report? Straightaway you might not know that the focus of the whole PPromise project is search and retrieval. Indexing, anyone?
Let me explain what PPromise is or was, dive into the best practices report, and then wrap up with some observations about governments in general and enterprise search in particular.
Thomson Reuters: The Pointy End of a Business Sector
March 28, 2013
Thomson Reuters has been a leader in professional publishing for many years. I lost track of the company after the management shake up which accompanied the departure of Michael Brown and some other top executives. Truth be told I was involved in work for the US government, and it was new, exciting, and relevant. My work for publishing companies trying to surf the digital revolution reminded me of my part time job air hammering slag at Keystone Steel & Wire Company.
I read “Data Don’t Add Up for Thomson Reuters.” (This online link can go dead or to a pay wall without warning, and I don’t have an easy way to update links in this free blog. So, there you go.) You can find the story in the printed version of the newspaper or online if you have a subscription. The printed version appears on page C-10, March 28, 2013 edition.
The main point is that Thomson Reuters has not been able to grow organically by selling more information to professionals or by buying promising companies and surfing on surging revenue streams. This is an important point, and I will return to it in a moment. The Wall Street Journal story said:
Shares of Thomson Reuters remain 13% below where they were when the deal closed in April 2008, partly reflecting difficulty integrating two large, international companies.
The article runs through other challenges which range from Bloomberg to Dow Jones, from ProQuest to LexisNexis. The article is short, so the list of challenges has been truncated to a handful of big names.
Do the professional publishing companies have access to talent on a par with Julius Caesar’s capabilities? In my opinion, without management of exceptional skill, professional publishing companies will be sucked through the rip in the fabric of credibility which Thomson Reuters’ pointed spear has created: Flat earnings, more wrenching cost cutting, and products which confuse customers and do not increase revenue and profits. Image from Wikipedia Vercingetorix write up.
But let’s set aside Thomson Reuters. I want to look at the Thomson Reuters’ situation as the pointy end of a spear. The idea is that Thomson Reuters has worked hard for 20 or 30 years to be the best managed, smartest, and most technologically adept company in the professional publishing sector. With hundreds of brands and almost total saturation of certain markets like trademark and patent information, legal information, and data for wheeler dealers—Thomson Reuters has been trying hard, very hard, to make the right moves. Is time running out?
Look at the professional publishing sector, which includes outfits as diverse as Cambridge Scientific Abstracts, Ebsco Electronic Publishing, Elsevier, and Wolters Kluwer to name a few outfits with hundreds of millions in revenue. Each of these companies shares some components:
- Information is “must have” as opposed to nice to have
- Information is for-fee, not free
- Customer segments are not spending in the way the analysts predicted
- Deals have not delivered significant new revenue
- Management shifts replace executives with similar, snap in type people. Innovative and disruptive folks find themselves sitting alone at company meetings.
Search Evaluation in the Wild
March 26, 2013
If you are struggling with search, you may be calling your search engine optimization advisor. I responded to a query from an SEO expert who needed information about enterprise search. His clients, as I understood the question, were seeking guidance from a person with expertise in spoofing the indexing and relevance algorithms used by public Web search vendors. (The discussion appeared in the Search-Based Applications (SBA) and Enterprise Search group on LinkedIn. Note that you may need to be a member of LinkedIn to view the archived discussion.)
The whole notion of turning search into marketing has interested me for a number of years. Our modern technology environment creates a need for faux information. The idea, as Jacques Ellul pointed out in Propaganda, is that modern man needs something to fill a void.
How can search deliver easy, comfortable, and good enough results? Easy. Don’t let the user formulate a query. A happy quack to Resistance Quotes.
It, therefore, makes perfect sense that a customer who is buying relevance in a page of free Web results would expect an SEO expert to provide similar functionality for enterprise search. Not surprisingly, the notion of controlling search results based on an externality like key word stuffing or content flooding is a logical way to approach enterprise search.
Precision, recall, hard metrics about indexing time, and the other impedimenta of the traditional information retrieval expert are secondary to results. Like the metrics about Web traffic, a number is better than no number. Even if the number’s flaws are not understood, the number is better than nothing. In fact, the entire approach to search as marketing is based on results which are good enough. One can see the consequences of this thinking when one runs a query on Bing or on systems which permit users’ comments to influence relevancy. Vivisimo activated this type of value adding years ago and it still is a good example of trying to make search useful. The laundry list of results, which forces the user to work through the document list and determine what is useful, is gone. If a document has internal votes of excellence, that document is the “right” one. Instead of precision and recall, modern systems are delivering “good enough” results. The user sees one top hit and assumes that the system has made more informed decisions.
There is a downside to the good enough approach to search, which delivers a concrete result which, like Web traffic statistics, looks so solid, so meaningful. That downside is that the user consumes information which may not be accurate, germane, or timely. In the quest for better search, good enough trumps the mentally exhausting methods of the traditional precision and recall crowd.
To get a better feel for the implications of this “good enough” line of thinking, you may find useful the September 2012 “deliverable” from Promise, whose acronym should be spelled PPromise in my opinion, “Tutorial on Evaluation in the Wild.” The abstract for the document does not emphasize the “good enough” angle, stating:
The methodology estimates the user perception based on a wide range of criteria that cover four categories, namely indexing, document matching, the quality of the search results and the user interface of the system. The criteria are established best practices in the information retrieval domain as well as advancements for user search experience. For each criterion a test script has been defined that contains step-by-step instructions, a scoring schema and adaptations for the three PROMISE use case domains.
The idea is that by running what strikes me as subjective data collection from users of systems, an organization can gain insight into the search system’s “performance” and “all aspects of his or her behavior.” (The “all” is a bit problematic to me.)







