March 9, 2016
I read “The Search Continues: A History of Search’s Unsatisfactory Progress.” I noted some points which, in my opinion, underscore why enterprise search has been problematic and why the menagerie of experts and marketers have put search and retrieval on the path to enterprise irrelevance. The word that came to mind when I read the article was “revisionism” for the millennials among us.
The write up ignores the fact that enterprise search dates back to the early 1970s. One can argue that IBM’s Storage and Information Retrieval System (STAIRS) was the first significant enterprise search system. The point is that enterprise search as a productized service has a history of more than 40 years of over promising and under delivering.
Customers said they wanted to “find” information. What those individuals meant was to have access to information that provided the relevant facts, documents, and data needed to deal with a problem.
Because providing on point information was and remains a very, very difficult problem, the vendors interpreted “find” to mean a list of indexed documents that contained the users’ search terms. But there was a problem. Users were not skilled in crafting queries which were essentially computer instructions between words the index actually contained.
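A minimal sketch of what that vendor interpretation of “find” amounts to: an index that maps each word to the documents containing it, and a query that returns only the documents containing every search term. The sample documents, the `build_index` and `search` names, and the whitespace tokenizer are assumptions for illustration, not the design of STAIRS or any other product.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    matches = set(index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= index.get(term, set())
    return sorted(matches)


# Hypothetical three-document corpus.
docs = {
    "memo-1": "quarterly revenue forecast for the sales team",
    "memo-2": "sales forecast revised after the pricing change",
    "memo-3": "earnings outlook for next quarter",
}
index = build_index(docs)

hits = search(index, "sales forecast")        # both words appear in memo-1 and memo-2
missed = search(index, "revenue projection")  # "projection" is not in the index
```

The second query comes back empty even though memo-1 and arguably memo-3 are on point, which is the gap between matching words and finding information.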
After STAIRS came other systems, many other systems which have been documented reasonably well in Bourne and Bellardo-Hahn’s A History of Online Information Services 1963-1976. (The period prior to 1970 describes for-fee research centric online systems. STAIRS was among the most well known early enterprise information retrieval systems.) I provided some history in the first three editions of the Enterprise Search Report, published from 2003 to 2007. I have continued to document enterprise search in the Xenky profiles and in this blog.
The history makes painful reading for those who invested in many search and retrieval companies and for the executives who experienced the crushing of their dreams and sometimes career under the buzz saw of reality.
In a nutshell, enterprise search vendors heard what prospects, workers overwhelmed with digital and print information, and unhappy users of those early systems were saying.
The disconnect was that enterprise search vendors parroted back marketing pitches that assured enterprise procurement teams of these functions:
- Easy to use
- “All” information instantly available
- Answers to business questions
- Faster decision making
- Access to the organization’s knowledge.
The result was a steady stream of enterprise search product launches. Some of these, like Verity, were funded by US government money. Sure, the company struggled with the cost of infrastructure the Verity system required. The work arounds were okay as long as the infrastructure could keep pace with the new and changed word-centric documents. Tossing in other types of digital information, making the system perform ever faster indexing, and keeping the Verity system responding quickly were another kettle of fish.
Research oriented information retrieval experts looked at the Verity type system and concluded, “We can do more. We can use better algorithms. We can use smart software to eliminate some of the costs and indexing delays. We can [ fill in the blank ].”
The cycle of describing what an enterprise search system could actually deliver was disconnected from the promises the vendors made. As one moves through the decades from 1973 to the present, the failures of search vendors made it clear that:
- Companies and government agencies would buy a system, discover it did not do the job users needed, and buy another system.
- New search vendors picked up the methods taught at Cornell, Stanford, and other search-centric research centers and wrapped on additional functions like semantics. The core of most modern enterprise search systems is unchanged from what STAIRS implemented.
- Search vendors like Convera came, failed, and went away. Some hit revenue ceilings and sold to larger companies looking for a search utility. The acquisitions hit a high water mark with the sale of Autonomy (a 1990s system) to HP for $11 billion.
What about Oracle, as a representative outfit? Oracle database has included search as a core system function since the day Larry Ellison envisioned becoming a big dog in enterprise software. The search language was Oracle’s version of the structured query language. But people found that difficult to use. Oracle purchased Artificial Linguistics in order to make finding information more intuitive. Oracle continued to try to crack the find information problem through the acquisitions of Triple Hop, its in-house Secure Enterprise Search, and some other odds and ends until it bought in rapid succession InQuira (a company formed from the failure of two search vendors), RightNow (technology from a Dutch outfit RightNow acquired), and Endeca. Where is search at Oracle today? Essentially search is a utility and it is available in Oracle applications: customer support, ecommerce, and business intelligence. In short, search has shifted from the “solution” to a component used to get started with an application that allows the user to find the answer to business questions.
I mention the Oracle story because it illustrates the consistent pattern of companies which are actually trying to deliver information that the user of a search system needs to answer a business or technical question.
I don’t want to highlight the inaccuracies of “The Search Continues.” Instead I want to point out the problem buzzwords create when trying to understand why search has consistently been a problem and why today’s most promising solutions may relegate search to a permanent role of necessary evil.
In the write up, the notions of answering questions, analytics, federation (that is, running a single query across multiple collections of content and file types), the cloud, and system performance are the conclusion.
The use of open source search systems means that good enough is the foundation of many modern systems. Palantir-type outfits, essentially enterprise search vendors describing themselves as “intelligence” providing systems, use open source technology in order to reduce costs and shift bug chasing to a community. The good enough core is wrapped with subsystems that deal with the pesky problems of video, audio, data streams from sensors or similar sources. Attivio, formed by professionals who worked at the infamous Fast Search & Transfer company, delivers active intelligence but uses open source to handle the STAIRS-type functions. These companies have figured out that open source search is a good foundation. Available resources can be invested in visualizations, generating reports instead of results lists, and graphical interfaces which involve the user in performing tasks smart software at this time cannot perform.
For a low cost enterprise search system, one can download Lucene, Solr, SphinxSearch, or any one of a number of open source systems. There are low cost (keep in mind that costs of search can be tricky to nail down) appliances from vendors like Maxxcat and Thunderstone. One can make do with the craziness of the search included with Microsoft SharePoint.
For a serious application, enterprises have many choices. Some of these are highly specialized like BAE NetReveal and Palantir Metropolitan. Others are more generic like the Elastic offering. Some are free like the Effective File Search system.
The point is that enterprise search is not what users wanted in the 1970s when IBM pitched the mainframe centric STAIRS system, in the 1980s when Verity pitched its system, in the 1990s when Excalibur (later Convera) sold its system, in the 2000s when Fast Search shifted from Web search to enterprise search and put the company on the road to improper financial behavior, and in the efflorescence of search sell offs (Dassault bought Exalead, IBM bought iPhrase and other search vendors, and Lexmark bought Brainware and ISYS Search Software).
Where are we today?
Users still want on point information. The solutions on offer today are application and use case centric, not the silly one-size-fits-all approach of the period from 2001 to 2011 when Autonomy sold to HP.
Open source search has helped create an opportunity for vendors to deliver information access in interesting ways. There are cloud solutions. There are open source solutions. There are small company solutions. There are more ways to find information than at any other time in the history of search as I know it.
Unfortunately, the same problems remain. These are:
- As the volume of digital information goes up, so does the cost of indexing and accessing the sources in the corpus
- Multimedia remains a significant challenge for which there is no particularly good solution
- Federation of content requires considerable investment in data grooming and normalizing
- Multi-lingual corpuses require humans to deal with certain synonyms and entity names
- Graphical interfaces still are stupid and need more intelligence behind the icons and links
- Visualizations have to be “accurate” because a bad decision can have significant real world consequences
- Intelligent systems are creeping forward but crazy Watson-like marketing raises expectations and erodes the credibility of enterprise search’s capabilities.
I am okay with history. I am not okay with analyses that ignore some very real and painful lessons. I sure would like some of the experts today to know a bit more about the facts behind the implosions of Convera, Delphis, Entopia, and many other companies.
I also would like investors in search start ups to know a bit more about the risks associated with search and content processing.
In short, for a history of search, one needs more than 900 words mixing up what happened with what is.
Stephen E Arnold, March 9, 2016
December 4, 2015
I thought I knew. Time wasting, distractive behavior, and non productive behavior.
Wrong again. I read “What Happens Online Every Minute?” The document is an infographic which reveals a number of factoids. (Who knows if these are accurate or a 20 something daydream.)
- Every minute Vine users play 1,041,666 videos. I like the precision of this number. The happenstance of the sign of the devil is a delight. Remember? 666.
- In 60 seconds Alphabet Googlers who can probably spell “video” nine out of ten times upload 300 hours of new video. The idea is that in one minute, you have the opportunity to fritter away 300 hours of couch potato time whether in a Google self driving car, in your own car, or standing on a line to buy a slice in Manhattan.
- In 1/60th of an hour, Twitter users send 347,222 tweets. How many of these are from marketers? No info. But again the precision of the number is outstanding. I like the 222 number which connotes faith. I have faith in Twitter. Also, 222 is a strobogrammatic number. Nifty, eh?
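A quick back-of-the-envelope check, assuming (the infographic does not say so) that the per-minute figures came from dividing round daily totals by the 1,440 minutes in a day. The daily totals below are guesses, not numbers from the source.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440

# Per-minute figures quoted in the infographic.
per_minute = {"Vine plays": 1_041_666, "tweets": 347_222}

# Assumed round daily totals (hypothetical, not from the infographic).
assumed_daily = {"Vine plays": 1_500_000_000, "tweets": 500_000_000}

# Integer division drops the fractional part of each per-minute rate.
derived = {name: total // MINUTES_PER_DAY for name, total in assumed_daily.items()}

for name, quoted in per_minute.items():
    print(f"{name}: quoted {quoted:,}, derived {derived[name]:,}")
```

If the assumption holds, the trailing 666 and 222 are artifacts of the division rather than of measurement.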
View the original. There will be a factoid to make your day or at least a few seconds so you can get back to viewing the video goodness.
Stephen E Arnold, December 4, 2015
January 7, 2015
Social search was supposed to integrate social media and regular semantic search to create a seamless flow of information. This was one of the major search points for a while, yet it has not come to fruition. So what happened? TechCrunch reports that it is “Good Riddance To Social Search” and with good reason, because the combination only cluttered up search results.
TechCrunch explains that Google tried Social Search back in 2009, using its regular search engine and Google+. Now the search engine mogul is not putting forth much effort in promoting social search. Bing tried something by adding more social media features, but it is not present in most of its search results today.
Why did this endeavor fail?
“I think one of the reasons social search failed is because our social media “friendships” don’t actually represent our real-life tastes all that well. Just because we follow people on Twitter or are friends with old high school classmates on Facebook doesn’t mean we like the same restaurants they do or share the politics they do. At the end of the day, I’m more likely to trust an overall score on Yelp, for example, than a single person’s recommendation.”
It makes sense considering how many people feel their social media feeds are filled with too much noise. Having search results free of the noise makes them more accurate and helpful to users.
March 21, 2014
The Appen explanation titled Query Relevance delves into the work that the language, search and social technology company has done recently to improve natural language search. Linguist PhD Julie Vonwiller founded the company in 1996 with her engineer husband Chris Vonwiller. In 2010, Appen merged with Butler Hill Group and began making strides in language resources, search, and text. The article explores the issues at hand when it comes to natural language search,
“Even a query as seemingly simple as the word “blue” could be looking for any of the following: a description or picture of the color, a television show, a credit card, a misspelling of an electronic cigarette brand, or a rap artist. By analyzing what the most likely user intent is and returning valid and appropriate results in the correct order of relevance, we encourage a relationship whereby the user will return again and again to our client’s search engine.”
Appen has established a “global network” of locals who are trained experts in the language and local culture. This team allows for the most accurate interpretations of queries from regional users. The company is continually working to improve their processes, both through collaboration with users and advances in the program to provide the best possible results.
Chelsea Kerwin, March 21, 2014
May 29, 2013
Why Are We Still Waiting for Natural Language Processing, an article on The Chronicle of Higher Education, explores the failure of the 21st century to produce Natural Language Processing, or NLP. This would mean the ability of computers to process natural human language. The steps required are explained in the article,
“In the 1980s I was convinced that computers would soon be able to simulate the basics of what (I hope) you are doing right now: processing sentences and determining their meanings.
To do this, computers would have to master three things. First, enough syntax to uniquely identify the sentence; second, enough semantics to extract its literal meaning; and third, enough pragmatics to infer the intent behind the utterance, and thus discerning what should be done or assumed given that it was uttered.”
Currently, typing a question into Google can result in exactly the opposite information from what you are seeking. This is because it is unable to infer, since natural conversation is full of gaps and assumptions that we are all trained to leap through without failure. According to the article, the one company that seemed to be coming close to this technology was Powerset in 2008. After making a deal with Microsoft, however, their site now only redirects to Bing, a Google clone. Maybe NLP like Big Data, business intelligence, and predictive analytics is just a buzzword with marketing value.
Chelsea Kerwin, May 29, 2013
May 3, 2013
My Overflight for search vendors generated an odd “recent” update. The item originated from Chrlettestuvv’s Blog. The story pointed to an item called “SAIC’s TeraText Solutions Signs Strategic Alliance Agreement with HP.” The source was an article from Software Industry Report, August 1, 2005.
HP apparently needed something more than TeraText, which shared some similarities with the now forgotten iPhrase and anticipated features in MarkLogic Server today. I find these search- and content-processing related tie ups interesting.
Each time I recall one or some glitch in the Internet surfaces a partner factoid, I am more confident that search vendors and some growth hungry large corporations move from speed dating to speed dating activity. Do the engagements lead to marriages? Sometimes I suppose. Other times the companies, like boy friends and girl friends in high school, just drift apart.
Search, however, remains mostly unchanged.
Stephen E Arnold, May 3, 2013
Sponsored by Augmentext
October 18, 2012
I have been amusing myself with the various analyses of the Google missteps. I just got off the phone with one of my clients who asked me, “What has happened to Google?” My answer, “Nothing.”
For context, check out “Google Reports Profit, Sales That Miss Analysts’ Estimates.” The estimable Bloomberg said:
The company earlier this year spent $12.4 billion on Motorola Mobility Holdings, pushing it further into the hardware market and stepping up its rivalry with Apple Inc. (AAPL) Third-quarter total revenue, including the acquisition of Motorola Mobility, rose 45 percent from a year earlier, while expenses rose 71 percent over the same time period. Motorola Mobility contributed sales of $2.58 billion for the period. Net income declined to $2.18 billion, or $6.53 a share, from $2.73 billion, or $8.33, a year earlier.
Everything looks pretty good considering the dilution of ad precision, the lousy economy, and the lost voice of Larry Page.
Let me highlight two other points. First, the Motorola deal is going to be exciting and expensive. Second, Google operates via controlled chaos. The approach may work like a champ among rocket scientists. Among lesser souls, management is a bit more tricky. Glue together without a clamp Motorola Mobility and controlled chaos, and I think we have a pivot point for the happy crowd in Mountain View.
One more thing: those pesky regulators are not going quietly into the good night.
Stephen E Arnold, October 18, 2012
April 30, 2012
We learned about Million Short. The idea is that a user can NOT out a block of Web sites. The use case is that a query is passed to the system, and the system filters out the top 1,000 or 100,000 Web sites. We think you will want to check it out. Once you have run some sample queries, consider these questions:
- When a handful of Web sites attract the most traffic, is popularity the road to comprehensive information retrieval?
- When sites are NOTted out, what do you gain? What do you lose?
- How do you know what is filtered from any Web search index? Do you run test queries and examine the results?
Enjoy. If you know a librarian, why not involve that person in your test queries?
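The NOT-out idea can be sketched in a few lines, assuming a popularity-ranked list of sites and a set of raw results. The `not_out` function, the site ranking, and the URLs are invented for illustration; how Million Short actually ranks and filters sites is not described here.

```python
from urllib.parse import urlparse


def not_out(results, ranked_sites, top_n):
    """Drop results whose host is among the top_n most popular sites."""
    blocked = set(ranked_sites[:top_n])
    return [url for url in results if urlparse(url).hostname not in blocked]


# Hypothetical popularity ranking, most popular first.
ranked_sites = ["bigsite.example", "megaportal.example", "smallblog.example"]

# Hypothetical raw results for one query.
results = [
    "https://bigsite.example/page",
    "https://smallblog.example/post",
    "https://library.example/catalog",
]

# Remove the two most popular sites from the results list.
filtered = not_out(results, ranked_sites, top_n=2)
```

Running the same query with different `top_n` values and comparing the lists is one way to see what is gained and what is lost.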
Stephen E Arnold, May 1, 2012
Sponsored by PolySpot
March 30, 2012
Stochastic Technologies’ Stavros Korokithakis has some very harsh words for Google’s AppEngine in “Going from Loving AppEngine to Hating it in 9 Days.” Is the Google shifting its enterprise focus?
Stochastic’s service Dead Man’s Switch got a huge publicity boost from its recent Yahoo article, which drove thousands of new visitors to the site. Preparing for just such a surge, the company turned months ago to Google’s AppEngine to manage potential customers. At first, AppEngine worked just fine. The hassle-free deployments while rewriting and the free tier were just what the company needed at that stage.
Soon after the Yahoo piece, Stochastic knew they had to move from the free quota to a billable status. There was a huge penalty, though, for one small mistake: Korokithakis entered the wrong credit card number. No problem, just disable the billing and re-enable it with the correct information, right? Wrong. Billing could not be re-enabled for another week.
Things only got worse from there. Korokithakis attempted to change settings from Google Wallet, but all he could do was cancel the payment. He then found that, while he was trying to correct his credit card information, the AppEngine Mail API had reached its daily 100-recipient email limit. The limit would not be removed until the first charge cleared, which would take a week. The write up laments:
At this point, we had five thousand users waiting for their activation emails, and a lot of them were emailing us, asking what’s wrong and how they could log in. You can imagine our frustration when we couldn’t really help them, because there was no way to send email from the app! After trying for several days to contact Google, the AppEngine team, and the AppEngine support desk, we were at our wits’ end. Of all the tens of thousands of visitors that had come in with the Yahoo! article, only 100 managed to actually register and try out the site. The rest of the visitors were locked out, and there was nothing we could do.
Between sluggish payment processing and a bug in the Mail API, it actually took nine days before the Stochastic team could send emails and register users. The company undoubtedly lost many potential customers to the delay. In the meantime, to add charges to injury, the AppEngine task queue kept retrying to send the emails and ran up high instance fees.
It is no wonder that Stochastic is advising us all to stay away from Google’s AppEngine. Our experiences with Google have been positive. Perhaps this is an outlier’s experience?
Cynthia Murrell, March 30, 2012
Sponsored by Pandia.com
February 5, 2012
If it seems like a step backward, that’s because it is: Network Computing declares, “Fat Apps Are Where It’s At.” At least for now.
Writer Mike Fratto makes the case that, in the shift from desktop to mobile, we’re getting ahead of ourselves. Cloud-based applications that run only the user interface on mobile devices are a great way to save space– if you can guarantee constant wireless access to the Web. That’s not happening yet. Wi-Fi is unreliable, and wireless data plans with their data caps can become very expensive very quickly.
There isn’t the screen real estate available on mobile devices–certainly not on phones–to populate menus and pull downs. . . . But that is how desktop apps are designed. Lots of features displayed for quick access because you have the room to do it while still providing enough screen space to write a document or work on a spreadsheet. Try using Excel as a thin app on your phone or tablet. See how long it takes for you to get frustrated.
So, Fratto proposes “fat apps” as the temporary alternative, applications designed for mobile use with local storage that let you continue to work without a connection. Bloatware is back, at least until we get affordable, universal wireless access worked out.
I am getting some app fatigue. What’s the next big thing?
Cynthia Murrell, February 5, 2012
Sponsored by Pandia.com