Promise Best Practices: Encouraging Theoretical Innovation in Search
March 29, 2013
The photo below shows the goodies I got for giving my talk at Cebit in March 2013. I was hoping for a fat honorarium, expenses, and a dinner. I got a blue bag, a pen, a notepad, a 3.72 gigabyte thumb drive, and numerous long walks. The questionable hotel in which I stayed had no shuttle. Hitchhiking looked quite dangerous. Taxis were as rare as an educated person in Harrod’s Creek, and I was in the same city as Leibniz Universität. Despite my precarious health, I hoofed it to the venue, which was eerily deserted. I think only 40 percent of the available space was used by Cebit this year. The hall in which I found myself reminded me of an abandoned subway stop in Manhattan with fewer signs.
The PPromise goodies. Stuffed in my bag were hard copies of various PPromise documents. The most bulky of these in terms of paper were also on the 3.72 gigabyte thumb drive. Redundancy is a virtue I think.
Finally on March 23, 2013, I got around to snapping the photo of the freebies from the PPromise session and reading a monograph with this moniker:
Promise Participative Research Laboratory for Multimedia and Multilingual Information Systems Evaluation. FP7 ICT 20094.3, Intelligent Information Management. Deliverable 2.3 Best Practices Report.
The acronym should be “PPromise,” not “Promise.” The double “P” makes searching for the group’s information much easier in my opinion.
If one takes the first letter of “Promise Participative Research Laboratory for Multimedia and Multilingual Information Systems Evaluation” one gets PPromise. I suppose the single “P” was an editorial decision. I personally like “PP” but I live in a rural backwater where my neighbors shoot squirrels with automatic weapons and some folks manufacture and drink moonshine. Some people in other places shoot knowledge blanks and talk about moonshine. That’s what makes search experts and their analyses so darned interesting.
To point out the vagaries of information retrieval, my search for a publicly accessible version of the PPromise document returned a somewhat surprising result.
A couple more queries did the trick. You can get a copy of the document without the blue bag, the pen, the notepad, the 3.72 gigabyte thumb drive, and the long walk at http://www.promise-noe.eu/documents/10156/086010bb-0d3f-46ef-946f-f0bbeef305e8.
So what’s in the Best Practices Report? Straightaway you might not know that the focus of the whole PPromise project is search and retrieval. Indexing, anyone?
Let me explain what PPromise is or was, dive into the best practices report, and then wrap up with some observations about governments in general and enterprise search in particular.
Thomson Reuters: The Pointy End of a Business Sector
March 28, 2013
Thomson Reuters has been a leader in professional publishing for many years. I lost track of the company after the management shake up which accompanied the departure of Michael Brown and some other top executives. Truth be told I was involved in work for the US government, and it was new, exciting, and relevant. My work for publishing companies trying to surf the digital revolution reminded me of my part time job air hammering slag at Keystone Steel & Wire Company.
I read “Data Don’t Add Up for Thomson Reuters.” (This online link can go dead or to a pay wall without warning, and I don’t have an easy way to update links in this free blog. So, there you go.) You can find the story in the printed version of the newspaper or online if you have a subscription. The printed version appears on page C-10, March 28, 2013 edition.
The main point is that Thomson Reuters has not been able to grow organically by selling more information to professionals or by buying promising companies and surfing on surging revenue streams. This is an important point, and I will return to it in a moment. The Wall Street Journal story said:
Shares of Thomson Reuters remain 13% below where they were when the deal closed in April 2008, partly reflecting difficulty integrating two large, international companies.
The article runs through other challenges which range from Bloomberg to Dow Jones, from ProQuest to LexisNexis. The article is short, so the list of challenges has been truncated to a handful of big names.
Do the professional publishing companies have access to talent on a par with Julius Caesar’s capabilities? In my opinion, without management of exceptional skill, professional publishing companies will be sucked through the rip in the fabric of credibility which Thomson Reuters’ pointed spear has created: Flat earnings, more wrenching cost cutting, and products which confuse customers and do not increase revenue and profits. Image from Wikipedia Vercingetorix write up.
But let’s set aside Thomson Reuters. I want to look at the Thomson Reuters’ situation as the pointy end of a spear. The idea is that Thomson Reuters has worked hard for 20 or 30 years to be the best managed, smartest, and most technologically adept company in the professional publishing sector. With hundreds of brands and almost total saturation of certain markets like trademark and patent information, legal information, and data for wheeler dealers—Thomson Reuters has been trying hard, very hard, to make the right moves. Is time running out?
Consider the professional publishing sector, which includes outfits as diverse as Cambridge Scientific Abstracts, Ebsco Electronic Publishing, Elsevier, and Wolters Kluwer, to name a few with hundreds of millions in revenue. Each of these companies shares some components:
- Information is “must have” as opposed to nice to have
- Information is for-fee, not free
- Customer segments are not spending in the way the analysts predicted
- Deals have not delivered significant new revenue
- Management shifts replace executives with similar, snap in type people. Innovative and disruptive folks find themselves sitting alone at company meetings.
Search Evaluation in the Wild
March 26, 2013
If you are struggling with search, you may be calling your search engine optimization advisor. I responded to a query from an SEO expert who needed information about enterprise search. His clients, as I understood the question, were seeking guidance from a person with expertise in spoofing the indexing and relevance algorithms used by public Web search vendors. (The discussion appeared in the Search-Based Applications (SBA) and Enterprise Search group on LinkedIn. Note that you may need to be a member of LinkedIn to view the archived discussion.)
The whole notion of turning search into marketing has interested me for a number of years. Our modern technology environment creates a need for faux information. The idea, as Jacques Ellul pointed out in Propaganda, is that modern man needs something to fill a void.
How can search deliver easy, comfortable, and good enough results? Easy. Don’t let the user formulate a query. A happy quack to Resistance Quotes.
It, therefore, makes perfect sense that a customer who is buying relevance in a page of free Web results would expect an SEO expert to provide similar functionality for enterprise search. Not surprisingly, the notion of controlling search results based on an externality like key word stuffing or content flooding is a logical way to approach enterprise search.
Precision, recall, hard metrics about indexing time, and the other impedimenta of the traditional information retrieval expert are secondary to results. Like the metrics about Web traffic, a number is better than no number. If the number’s flaws are not understood, the number is better than nothing. In fact, the entire approach to search as marketing is based on results which are good enough. One can see the consequences of this thinking when one runs a query on Bing or on systems which permit users’ comments to influence relevancy. Vivisimo activated this type of value adding years ago, and it still is a good example of trying to make search useful. The laundry list of results which forces the user to work through the document list and determine what is useful is gone. If a document has internal votes of excellence, that document is the “right” one. Instead of precision and recall, modern systems are delivering “good enough” results. The user sees one top hit and assumes that the system has made the more informed decision.
There is a downside to the good enough approach to search, which delivers a concrete result that, like Web traffic statistics, looks so solid, so meaningful. That downside is that the user consumes information which may not be accurate, germane, or timely. In the quest for better search, good enough trumps the mentally exhausting methods of the traditional precision and recall crowd.
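For readers who have not bumped into the precision and recall crowd, here is a minimal sketch of what those two numbers measure. The document identifiers and the relevance judgments are hypothetical, made up for illustration; nothing here comes from a real search system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical judgments: the documents a human assessor marked as germane.
relevant = {"doc7", "doc2", "doc4", "doc8"}

# A "good enough" system that shows the user one top hit.
top_hit_only = ["doc7"]

# A traditional laundry list the user has to work through.
laundry_list = ["doc7", "doc2", "doc9", "doc4", "doc1"]

for label, results in (("top hit", top_hit_only), ("laundry list", laundry_list)):
    p, r = precision_recall(results, relevant)
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
```

In this made-up case the single top hit scores perfect precision while leaving most of the germane documents unseen. That gap is what a good enough result hides.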
To get a better feel for the implications of this “good enough” line of thinking, you may find useful the September 2012 “deliverable” from Promise, whose acronym should be spelled PPromise in my opinion: “Tutorial on Evaluation in the Wild.” The abstract for the document does not emphasize the “good enough” angle, stating:
The methodology estimates the user perception based on a wide range of criteria that cover four categories, namely indexing, document matching, the quality of the search results and the user interface of the system. The criteria are established best practices in the information retrieval domain as well as advancements for user search experience. For each criterion a test script has been defined that contains step-by-step instructions, a scoring schema and adaptations for the three PROMISE use case domains.
The idea is that by running what strikes me as subjective data collection from users of systems, an organization can gain insight into the search system’s “performance” and “all aspects of his or her behavior.” (The “all” is a bit problematic to me.)
Cengage: Time to Disengage?
March 25, 2013
Thomson Reuters in “Cengage Learning Hires Restructuring Advisers” reported that a former Thomson property is arranging a modest infusion of cash. “Modest” in this context is about $430 million, which is nothing when compared to the cost of a modern text book. (See “Textbook Prices Are Inflating Even Faster Than Tuition Prices: New Boston University Classifieds for Students Makes Buying Textbooks More Affordable.”)
Cengage used to be Thomson Learning, a sprawling collection of publishing companies. Some of the firms had traditional textbooks; others had combinations of traditional textbooks and electronic versions. My recollection is that the technical infrastructure of the original Thomson Learning was quite diverse. “Diverse” publishing infrastructures in the same organization add significantly to the costs of doing business. “Diverse” is also a stuck brake on innovation because repurposing content is time consuming and labor intensive. Prior to spinning off Thomson Learning to Apax Partners and Omers Capital Partners, Thomson’s senior management were focusing their considerable talents on cost efficiencies. I assume that the technical infrastructure issues have been resolved.
Debt can be a burden as this illustration from Shape Home Loans suggests. Does debt enhance agility or is it a financial play disconnected from structural changes such as those described in my “Gadzooks, It’s MOOCs: The Fuss over Open Source Learning” article?
One item in the Thomson Reuters news release caught my attention:
…the company said it had borrowed $430 million, almost all of its remaining credit facility to ensure its businesses have the cash they need. Stamford, Connecticut-based Cengage has a $1.5 billion term loan that matures next year and a total of $5.3 billion of debt as of Dec. 31.
Several observations:
First, this type of cash crunch in publishing is likely to become more common. I wrote a story for Online Searcher about the impact of online learning. There is also a chorus of “if you are smart, you can skip college” echoing around Kentucky. What if the online learning and the “you don’t have to go to college” blend? Companies depending upon the traditional purchasing patterns in education may find that new revenues are not sufficient to keep up with old revenue losses.
Second, the spillover from a Cengage-type of problem will have cascading effects. Examples which come to mind are revenues flowing to such organizations as Ebsco Electronic Publishing, ProQuest, and Wolters Kluwer. These companies are in the education food chain. If Cengage flu becomes contagious, these firms will face some additional financial challenges.
Third, the authors who provide content to the textbook giants have to be paid. With the shift to online courses, some of these authors may take their “fame” and their content and go a new direction. It is now possible for some textbook superstar authors to try to become celebrities. If Google needs knowledge, the company just hires the superstar. Won’t the same approach become possible in the online learning space? Maybe an existing textbook company will corner this market? I am not sure traditional textbook companies have the agility necessary to pull off a slam dunk.
Fourth, the online services like Thomson Reuters’ WestLaw and Reed Elsevier’s LexisNexis may also feel the impact of a shift. On one hand, these systems could gain new content from disaffected textbook publishers and, therefore, more revenue pulling information. On the other hand, traditional online services have been caught flatfooted by the surge in online educational content and may be too late to ride the new revenue train.
Net net: Is it time for customers of Cengage to disengage? A larger question is, “Will the professional publishing and professional online services be able to adjust to yet another sapping of their life blood?” Changes are coming. Many of these shifts will not be gentle, kind, or slow I fear.
Stephen E Arnold, March 25, 2013
Navigation Misses the Point of Search and Retrieval
March 18, 2013
How does one become a sheeple? One answer is, “Accept search outputs without critical thinking.”
I don’t want to get into a squabble with the thinkers at Nielsen Norman Group. I suggest you read “Converting Search into Navigation” and then reflect on the fact that this was the basic premise of Endeca and then almost every other search vendor on the planet since the late 1990s. The idea is that users prefer clicking to typing queries or, better yet, having the system just tell the user what he or she wants without having to do so much as make a click.
Humans want information and most humans don’t want to expend much, if any, effort getting “answers.” In the late 1970s, I worked on a Booz, Allen & Hamilton study which revealed that managers in that pre-Internet Dark Age got information by asking the first person encountered in the hall, a person whom an executive could get on the phone, or by flipping through the old school trade magazines which once flowed into in boxes.
A happy quack to http://red-pill.org/are-you-one-of-the-sheeple-take-the-quiz/
What’s different today? According to the write up, as I understand it, not too much. The article asserts:
Users are incredibly bad at finding and researching things on the web. A few years ago, I characterized users’ research skills as “incompetent,” and they’ve only gotten worse over time. “Pathetic” and “useless” are words that come to mind after this year’s user testing.
There you go. When top quality minds like those Booz, Allen & Hamilton tried to hire took the path of least resistance almost 50 years ago, is it a big surprise that people are clueless when it comes to finding information?
The point of the article is that people who make interfaces have to design for mediocre searchers. Mediocre? How about terrible, clueless, inept, or naive? The article says:
… you should redirect users from a normal SERP to a category page only when their query is unambiguous and exactly matches the category. A search for “3D TV” could go to the subcategory page for these products, but a search for “3D” should generate a regular SERP. (Costco does this correctly, including both 3D televisions and other products relevant to the query.) Until people begin to grasp the complexities of search and develop skills accordingly, businesses that take such extra steps to help users find what they need will improve customer success — and the bottom line.
My view is just a little bit different and not parental like the preceding paragraph.
Come Here, Watson. I Want a Cusp of Commercialization
February 28, 2013
For a moment, I thought I was reading a sitcom script. You judge for yourself. Navigate to “And Now, from IBM, It’s Chef Watson.” If you have an environmentally unfriendly version of the New York Times, you can find the script—sorry, real news story—on page B1 of the February 28, 2013, edition.
Let me highlight several phrases and sentences which I found amusing and somewhat troubling for those trying to convince people to license next generation search systems. Keep in mind that the point of the story is Watson, IBM’s next generation Jeopardy winning search system. The peripatetic Watson has done education, insurance, and cancer cracking. Now, Watson and its formidable technical amalgamation of open source and proprietary code is prepping for the Food Network.
IBM’s Watson is hunting for revenues and finding publicity. Can a $100 billion entity find money in search, content processing, and analytics with a silicon Watson? Someday perhaps.
Here are the items I noted, highlighted in dark red and bold to make the words easy to spot:
First, this phrase, “…tries to expand its [IBM’s] artificial intelligence technology and turn Watson into something that actually makes commercial sense.” Reading this statement in the context of Hewlett Packard’s interesting commercial activities related to the write down of the spectacular $11 billion purchase of Autonomy is ripe with irony, probably unintentional too.
Second, I found the phrase “on the cusp of commercialization.” Interesting. The Jeopardy show aired in early 2011. A “cusp,” according to one of the online dictionaries, is “A transitional point or time, as between two astrological signs.” Yep, I believe in astrology.
Autonomy: An Anomaly or Bellwether for Search?
February 24, 2013
I don’t pay much attention to the corporate calisthenics at Hewlett Packard. I noted the chatter about layoffs at Autonomy. (See “Layoffs, Hiring to Come at HP’s Autonomy Unit.”) I chuckled at the notion that HP’s management team would write off billions and then try to sell Autonomy. (See HP: “Jefferies Analyst Says CEO Whitman Unlikely To Sell Autonomy, EDS.”)
Allegedly HP is in profit making mode, has its act together, and now sees Android as a way into the booming mobile market. Too late? No, never too late for a giant company which has tilled the ground for generations of MBA students to analyze and discuss. Few companies are quite the case study breeder reactor which HP has become.
The larger question is, “Is Autonomy an anomaly or a bellwether for search, analytics, and content processing?”
A happy quack to this outstanding surprise image from Jokeroo. See http://www.jokeroo.com/pictures/funny/very-unpleasant-surprise.html
Let’s look at the upside. Some folks at HP obviously perceived Autonomy’s technology, industry stature, and customer list as having value. The dollar amount assigned to the “value” is a subject of discussion. The point is that search as such looked tempting and too good to pass up. HP talked to wizards, gurus, and poobahs. The information added up to $10 or $11 billion for the deal. The number, after the oddball write off, should have been closer to $2 billion. One cannot argue with the powerful enervating effect talk about the payoff from search and line extensions causes among “rational” managers.
eDiscovery: A Source of Thrills and Reduced Costs?
February 2, 2013
When I hear the phrase “eDiscovery”, I don’t get chills. I suppose some folks do. I read after dinner last night (February 1, 2013) “Letter From LegalTech: The Thrills of E-Discovery.” The author addresses the use of search and content processing technology to figure out which documents are most germane to a legal matter. Once the subset has been identified, eDiscovery provides outputs which “real” attorneys (whether in Bangalore or Binghamton) can use to develop their “logical” arguments.
One interesting factoid bumps into my rather sharp assessment of the “size” of the enterprise search market generated by an azure chip outfit. The number was about $1.5 billion. In the eDiscovery write up, the author says:
Nobody seems to know how large the e-discovery market is — estimates range from 1.2 to 2.8 billion dollars — but everyone agree it’s not going anywhere. We’re never going back to sorting through those boxes of documents in that proverbial warehouse.
I like the categorical affirmative “nobody.” The point is that sizing any of the search and content processing markets is pretty much like asking Bernie Madoff type professionals, “How much in liquid assets do you have?” The answer is situational, enhanced by marketing, and believed without a moment’s hesitation.
I know the eDiscovery market is out there because I get lots of PR spam about various breakthroughs, revolutions, and inventions which promise to revolutionize figuring out which email will help a legal eagle win a case with his or her “logical” argument. I wanted to use the word “rational” in the manner of John Ralston Saul, but the rational attorneys are leaving the field and looking for work as novelists, bloggers, and fast food workers.
One company, an outfit called Catalyst Repository Systems, flooded me with PR email spam about its products. I called the company on January 31, 2013. I was treated in an offhand, suspicious manner by a tense, somewhat defensive young man named Mark, Monk, Matt, or Mump. At age 69, I have a tough time figuring out Denver accents. Mark, Monk, Matt, or Mump took my name and phone number. He assured me that his boss would call me back to answer my questions about PR spam and the product which struck me as a “me too.” I did learn that he had six years of marketing experience and that he just “push the send button.” When I suggested that he might want to know to whom he is sending messages multiple times, he said, “You are being too aggressive.” I pointed out that I was asking a question just like the lawyers who, one presumes, gobble up the Catalyst products. He took my name, did not ask how to spell it, wrote down my direct line and did not bother to repeat it back to me, and left me with the impression that I was out of bounds and annoying. That was amusing because I was trying hard to be a regular type caller.
A happy quack to Bitter Lawyer which has information about the pressures upon some in the legal profession. See http://www.bitterlawyer.com/i%E2%80%99m-unemployed-and-feel-ripped-off-by-my-ttt-law-school/
Mark, Monk, Matt, or Mump may have delivered the message and the Catalyst top dog was too busy to give me a jingle. Another possibility is that Mark, Monk, Matt, or Mump never took the note. He just wanted to get a person complaining about PR spam off the phone. Either way, Catalyst qualifies as an interesting example of what’s happening in eDiscovery. Desperation marketing has infected other subsectors of the information retrieval market. Maybe this is an attempt to hit in reality revenues of $1.5 billion?
Thoughts about Commercial Databases: 2013
January 29, 2013
After the dress rehearsal for my weaponized information webinar, a couple of librarians and I were talking about the commercial database business. I narrowed the focus to the commercial outfits selling primary and secondary information to libraries and other professionals; namely, to the legal and health care sectors.
In a nutshell, the digital future does not look too bright for companies such as:
- Ebsco Electronic Publishing (everything but the kitchen sink coverage)
- Elsevier (scientific and technical with Fast Search in its background)
- ProQuest (everything but the kitchen sink coverage plus Dialog)
- Thomson Reuters (multiple disciplines, including financial real time info)
- Wolters Kluwer (mostly legal and medical and a truckload of individual brands)
I just reread “Why Acquisitions Fail: The Five Main Factors” by Pearson Education. This outfit has a long and storied past. The irony of Pearson Education explaining the problems of making an acquisition work is interesting but not germane to the main points in the write up. The fact that this item was available to me without charge via the Internet is amusing to me as well. Here’s what the Pearson analyst suggests about the causes of failure:
Survey after survey has proclaimed that most acquisitions fail. Denzil Rankine’s Executive Briefing on Why Acquisitions Fail (FT Prentice Hall) examines why. There are five key factors, which we will examine below:
- Flawed business logic
- Flawed understanding of the new business
- Flawed deal management
- Flawed integration management
- Flawed corporate development
No argument from me. The business model for these firms has been built on selling “must have” information to markets who need the information to do their job. The reason for the stress on this group of companies is that the traditional customers are strapped for cash or have lower cost alternatives.
If one of these outfits buys a company, the likelihood that the acquisition will be a home run revenue success is low. These five companies are bottom-line oriented, so the acquisitions will have to perform. The idea of massive investment to realize the promise of the purchase is not in the game plan.
So big traditional commercial database companies have to find a way to work around the Pearson Education hurdles. Let me consider some of the options available to the Ebscos, Elseviers, ProQuests, Thomsons, and Wolters Kluwers of the world. (Yes, there are oligopolies in a number of other countries, not just the US and Western Europe.)
The Hail, Mary Deal
This is the option which makes investment bankers’ and deal brokers’ hearts go pitty patter. We know how that approach works.
Buy One Another
The idea is that no other outfit wants to buy commercial database companies. Ergo: These outfits buy one another in some combination. Good for the investment bankers but long term, the customers may not be able to cope with ever increasing prices. Librarians, lawyers, and accountants are not exactly in a GEICO made of money mode.
The Microsoft Dell Variant
The idea is that a third party like Google buys one or more commercial database companies and monetizes the content with ads. (I would lobby for this if I were attached to a giant money machine like the Google.)
Fire Sale
I think that Thomson Reuters’ effort to get out of the health fraud business makes clear that the price offered kills the deals. Nevertheless, some of the commercial database publishers may be forced to chop off fingers and toes to keep the core alive. Highly probable path opine I.
Raise Prices and Innovate from Within
This option keeps the Board of Directors engaged. The reality is that such innovation goes nowhere. Ah, I am looking forward to annoyed vice presidents asserting, “I am innovative. We do innovate.” Okay, okay.
Net net?
Big changes are coming for commercial database producers, access to curated content, and the quality of the commercial information. Lawyers are looking to cut costs. No good for Lexis and West. Librarians are under severe financial pressure. Accountants? Accountants don’t want to spend their own money.
Looks like the future is moving in directions different from where these traditional, commercial database producers are going. I suppose after a couple of decades of evolution, the arrival of the End of Times is tough to accept.
Disagree? Agree? Surprise me. Keep in mind that I don’t have a stake in these companies and find myself baffled by the management challenges each has created for itself.
Stephen E Arnold, January 29, 2013
Sponsored by Dumante.com
Forrester Fills the Gap in Search Market Size Estimates
January 25, 2013
I used to enjoy the search market size estimates of IDC (the time it takes to find info group), Gartner (the magic quad folks), Forrester (yep, the “wave” people), and Ovum (we do it all experts), among others.
I read “Growth of Big Data in Businesses Intensifies Global Demand for Enterprise Search Solutions, Finds Frost & Sullivan” and found several items of interest in the brief news story which arrived via Germany. Is Germany a leader in enterprise search? I heard that 99 percent of Germany’s search means Google. The numerous open source players are not setting the non-German world on fire, but I could be wrong. Check out GoPubMed for an example of an interesting system which has a modest profile.
Now to the size of the search market.
The first thing I noticed was the nod to Big Data, which is certainly the hook on which many dreams for Big Money hang. With enterprise search vendors looking for a way to gain traction in a market which has been caught in awkward positions when licensing and deploying “search,” new words and new Velcro patches are needed. I won’t mention the Hewlett Packard Autonomy matter nor the Fast Search & Transfer matter nor the millions pumped into traditional search vendors with little chance of paying back the investments. No. No. No.
I want to quote this statement from the news story:
The growth of Big Data across verticals presents the enterprise search solutions market with further opportunities. Since newer data types are not confined to a relational database within an organization, solutions that can search information outside the scope of these relational frameworks are widely accepted. Demand for personalized search tools that operate in a pool of unlimited data from internal servers, the Internet, or third-party sources is also growing.
Ah, but how does one crawfish away from exaggeration? Easy. I noted:
However, the disparity between customer expectations and actual search outcomes could dissuade future investments. Customers expect a single query to retrieve the right results immediately. Therefore, search providers must offer timely and relevant results, taking into account the continuous addition of new data to repositories.
But “How big is the market?” my inner child yelps. The answer: