Finding an Enterprise Search Case Study Is Like Hunting for Digital Truffles

July 15, 2014

The job hunters, experts, and consultants in the LinkedIn enterprise search discussion groups have been looking for positive use cases related to enterprise search. Finding a success story that one can verify is similar to hunting truffles. Keep your eye on the pig, or the truffle will disappear.

I did come across one use case published in the Italian Journal of Library and Information Science. The title of the paper is “Using a Google Search Appliance (GSA) to search digital library collections: a case study of the INIS Collection Search.” The problem search system was BASISPlus, now a product marketed and mostly “frozen” by OpenText.

The original version of BASIS was created at Battelle Memorial Institute in the late 1960s. Battelle spun out BASIS and Information Dimensions was the result. In 1998, OpenText bought BASIS, and I don’t think there has been much modernization of the system in the last couple of decades.

Yep, that’s an old school mainframey type system. A colleague and I used BASIS when it was an Information Dimensions product to provide data management, reporting, and search functionality to a Bell Communications Research (a chunk of what was Bell Labs) system that was used by the seven Baby Bells for a number of years.

My team and I loved big iron and FORTRAN. We stuffed the IBM MVS TSO system with some tasty BASIS sausage in 1983.

Well, the use case explains that BASIS was not the solution today’s users required. The fix was to license the Google Search Appliance. Look into the GSA’s license and fail over cluster costs, and prepare yourself for sticker shock.

Keep in mind that “positive” has a spectrum of meanings determined by the reader’s context. The solution is the Google Search Appliance. You know this product as a search toaster…sort of. The advantages and disadvantages section of the use case hammers on the good parts and tiptoes around the thorns.

Stephen E Arnold, July 15, 2014

Funnelback Demonstration: Australian Government Grants

July 15, 2014

I saw a link to what seems to be an implementation of the Funnelback search system. Some folks see Funnelback as an alternative to the Google Search Appliance (a comparison that eludes me) or Elasticsearch (a little closer to the mark in my opinion).

Navigate to the site and enter a query. I used the term “aboriginal.” The results demonstrate that Funnelback has implemented some features that I associate with the 1998-2001 version of Endeca and a Google style results list.

Here’s the variant of what Endeca called “Guided Navigation”:


Here’s the Google style results list:


For a discussion of how one can integrate Squiz Matrix (a content management system) with Funnelback, navigate to the Squizsuite discussion board.

Years ago, I learned that Funnelback began as a university/Australian government project. Funnelback popped out of its incubator program and became part of Squiz in 2009. Even though Squiz flies the open source flag, Funnelback is a commercial product.

My Overflight archive shows that when I provided a profile of Funnelback to Commonwealth Scientific and Industrial Research Organization, I received inputs from someone. When the profile was published, the wizard responsible for Funnelback complained that the profile did not reflect his view of the system.

Since that initial interaction with Funnelback and its resident wizard, I have kept the system on my back burner. Getting one’s ducks in a row can be helpful when a third party is writing about a search system for inclusion in a monograph about information retrieval.

My recommendation is to talk with licensees and then, if possible, use the system and run some tests. Accepting a statement that Funnelback is an alternative to the Google Search Appliance is a stretch based on my experience. Is Funnelback comparable to Elasticsearch? The answer is that Elasticsearch has about $100 million in venture funding, outfits that make access to Elasticsearch a cloud solution that requires less fiddling than an on premises solution, and developers coming out of the woodwork. See, for example, this Search Wizards Speak interview.

Marketing does not equal employee satisfaction with a search system. Testing and analysis are often useful, not the baloney generated by some of the wizards who advise potential licensees. One outfit is selling my work via Amazon without my permission, without a valid contract with me, and without sharing the fee for a report based on my work. When “wizards” run companies, caution is advised.

Stephen E Arnold, July 15, 2014

Is it Possible, A Search Engine for Apps?

July 15, 2014

This cannot be true: a single search engine that works across multiple apps? Technology Review says that Quixey is working on such a beast in the article “Search Startup Quixey Aims To Be The Google Of The App Era.” More people are using applications than spending time on the Web, and they’re using multiple apps as entryways to Internet content. It is bothersome to open one app to find one speck of information, and then do the same in another. Quixey wants to be the Google of applications and they have $74 million, 150 employees, and an old appliance store in Mountain View, CA to work on the idea.

Quixey wants the search box on a device to take queries and actively take the user to the exact information and action in an app that answers them. They have released a demo that searches for food and drink information using Yelp and Urbanspoon.

“ ‘The way people interact with the third-party apps installed on their phones is broken,’ says Liron Shapira, a cofounder and CTO of the company. It doesn’t make sense for people to hunt through a clutter of icons to find the app they need and to have to memorize how to navigate inside each one, he says. ‘A search bar is the better way to use third party apps—and the Quixey vision is to put that search bar on every device.’ That approach should be able to offer broader functionality than voice operated assistants such as Apple’s Siri or Microsoft’s Cortana, he claims.”

Quixey’s search engine is different from Web search engines. Instead of building a Web site index, its index is built on app store information, review sites, and deep links. Deep links are a hyperlink type that points to a specific place or function inside a mobile app. Deep links are mostly used for advertising, but Quixey wants to use them for function over advertisement. The company is still working out the project’s kinks and is competing with Google, but Quixey is moving into territory where Google is lacking.

Whitney Grace, July 15, 2014

Sponsored by, developer of Augmentext

Will Germany Scrutinize Google Web Search More Closely?

July 14, 2014

Several years ago, I learned a hard-to-believe factoid. In Denmark, 99 percent of referrals to a major financial service firm’s Web site came via Google. My contact mentioned that the same traffic flow characterized the company’s German affiliate; that is, if an organization wanted Web traffic, Google was then the only game in town.

I no longer follow the flips and flops of Euro-centric Google killers like Quaero. I have little or no interest in assorted German search revolutions whether from the likes of the Weitkämper Clustering Engine or the Intrafind open source play or the Transinsight Enterprise Semantic Intelligence system. Although promising at one time, none of these companies offers an information retrieval system that could supplant Google for German language search. Toss in English and the other languages Google supports, and the likelihood of a German Google killer decreases.

I read “Germany Is Looking to Regulate Google and Other Technology Giants.” I found the write up interesting and thought provoking. I spend some time each day contemplating the search and content processing sectors. I don’t pay much attention to the wider world of business and technology.

The article states:

German officials are planning to clip the wings of technology giants such as Google through heavier regulation.

That seems cut and dried. I also noted this statement:

The German government has always been militant in matters of data protection. In 2013, it warned consumers against using Microsoft’s Windows 8 operating system due to perceived security risks, suggesting that it provided a back door for the US National Security Agency (NSA). Of course, this might have had something to do with the fact that German chancellor Angela Merkel was one of the first high-profile victims of NSA surveillance, with some reports saying that the NSA hacked her mobile phone for over a decade.

My view is that search and content processing may be of particular interest. After all, who wants to sit and listen to a person’s telephone calls? I would convert the speech to text and hit the output with one of the many tools available to attach metadata, generate relationship maps, and tug out entities like code words and proper names. Then I would browse the information using an old fashioned tabular report. I am not too keen on the 1959 Cadillac tail fin visualizations that 20 somethings find helpful, but to each his or her own I say.

Scrutiny of Google’s indexing might reveal some interesting things to the team assigned to ponder Google from macro and micro levels. The notion of timed crawls, the depth of crawls, the content parsed and converted to a Guha type semantic store, the Alon Halevy dataspace, and other fascinating methods of generating meta-information might be of interest to the German investigate-the-US-vendors team.

My hunch is that scrutiny of Google is likely to lead to increased concern about Web indexing in general. That means even the somewhat tame Bing crawler and the other Web indexing systems churning away at “public” sites’ content may be of interest.

When it comes to search and retrieval, ignorance and bliss are bedfellows. Once a person understands the utility of the archives, the caches, and the various “representations” of the spidered and parsed source content, bliss may become FUD (a version of IBM’s fear, uncertainty and doubt method). FUD may create some opportunities for German search and retrieval vendors. Will these outfits be able to respond or will the German systems remain in the province of Ivory Tower thinking?

In the short term, life will be good for the law firms representing some of the non German Web indexing companies. I wonder, “Is the Google Germany intercept matter included in the young attorneys’ legal education in Germany?”

Stephen E Arnold, July 14, 2014

Useful Glossary to Search Short Text

July 13, 2014

The Daily Mail published a list of 60 new abbreviations. If you have access to short message content, the list may be helpful for some queries. My faves allow me to say, “DGAF about the vendor’s OOTD.” Very classy stuff.

Stephen E Arnold, July 13, 2014

YouTube: What Does Google Need? Money?

July 13, 2014

I read an exclusive to Thomson Reuters. I must admit I was a bit confused about what Google is or is not doing with YouTube.

You can find the “exclusive” (for the time being) at “YouTube Weighs Funding Efforts to Boost Premium Content—Sources.” This is, because it carries the Reuters logo, a “real” news story I presume.

The story jumps out of the gate with the suggestion that Google needs money. Digital video is the new living room for couch potatoes. If Google needs money, is the firm’s ad revenue flow insufficient to realize Hollywood-style fancies?

Here’s a passage I marked:

YouTube is by far the world’s most popular location for video streaming, with more than 1 billion unique visitors a month, far surpassing Netflix Inc and Amazon. But it is trying to lure more marketers for premium video advertising, boosting margins as overall prices for Google’s advertising declines.

There you go. But we learn that the special channel investment was a less than stellar success:

YouTube set aside an estimated $100 million in late 2011 to bankroll some 100 channels, though it never confirmed amounts spent or other details. Beneficiaries of that largesse included Madonna and ESPN, as well as lesser-known creators. Reuters was one of the companies that received funds for a channel. But few of those have garnered much mainstream attention

Is it possible that the write up suggests that when Thomson Reuters tried out the dedicated channel thing with YouTube, the test was a belly flop?

I find video ads are sort of an annoyance. In fact, I can’t figure out how to make them go away. My solution is to not look at the video. I browsed some videos of the SU 27 and did not encounter ads one day. Try this query on YouTube and on Google Video:


Here’s what I saw today.



Variable ads. Errors. Then a few videos of the only fighter aircraft that can do a cobra. Unfamiliar with the move? Ask around for a fighter pilot up on slick moves.

I was baffled. Is Google hunting for investments or is Google just doing Google moon shot thinking? My take on the write up is that Google is flipping rocks, looking for money.


When the online ad world shifts more aggressively from online search ads to other types of marketing, Google has to find a way to deal with its looming crossover of revenue and costs. Amazon is struggling with the same issue. I find giant, dominant, digital entities interesting. One is never sure of their motives whether it is a “real” journalism outfit or an online ad company.

What’s happened to search? Oh, right, I forgot. The new Google was Google Plus and social search. How did that approach to search (text and video) work out? Why are there two video search systems available? Is Google in sync with the couch potato market and the hot buttons of Hollywood moguls? I don’t know.

Stephen E Arnold, July 13, 2014

Search, Not Just Sentiment Analysis, Needs Customization

July 11, 2014

One of the most widespread misperceptions in enterprise search and content processing is “install and search.” Anyone who has tried to get a desktop search system like X1 or dtSearch to do what the user wants with his or her files and network shares knows that fiddling is part of the desktop search game. Even a basic system like Sow Soft’s Effective File Search requires configuring the targets to query for every search in multi-drive systems. The work arounds are not for the casual user. Just try making a Google Search Appliance walk, talk, and roll over without the ministrations of an expert like Adhere Solutions. Don’t take my word for it. Get your hands dirty with information processing’s moving parts.

Does it not make sense that a search system destined for serving a Fortune 1000 company requires some additional effort? How much more time and money will an enterprise class information retrieval and content processing system require than a desktop system or a plug-and-play appliance?

How much effort is required for these tasks? There is work to get the access controls working as the ever alert security manager expects. Then there is the work needed to get the system to access, normalize, and process content for the basic index. Then there is work for getting the system to recognize, acquire, index, and allow a user to access the old, new, and changed content. Then one has to figure out what to tell management about rich media, content for which additional connectors are required, and the method for locating versions of PowerPoints, Excels, and Word files. Then one has to deal with latencies, flawed indexes, and dependencies among the various subsystems that a search and content processing system includes. There are other tasks as well like interfaces, work flow for alerts, yadda yadda. You get the idea of the almost unending stream of dependent, serial “thens.”

When I read “Why Sentiment Analysis Engines need Customization”, I felt sad for licensees fooled by marketers of search and content processing systems. Yep, sad as in sorrow.

Is it not obvious that enterprise search and content processing is primarily about customization?

Many of the so called experts, advisors, and vendors illustrate these common search blind spots:

ITEM: Consulting firms that sell my information under another person’s name assuring that clients are likely to get a wild and wooly view of reality. Example: Check out IDC’s $3,500 version of information based on my team’s work. Here’s the link for those who find that big outfits help themselves to expertise and then identify a person with a fascinating employment and educational history as the AUTHOR.



In this example, notice that my work is priced at seven times that of a former IDC professional. Presumably Mr. Schubmehl recognized that my value was greater than that of an IDC sole author and priced my work accordingly. Fascinating because I do not have a signed agreement giving IDC, Mr. Schubmehl, or IDC’s parent company the right to sell my work on Amazon.

This screen shot makes it clear that my work is identified as that of a former IDC professional, a fellow from upstate New York, an MLS on my team, and a Ph.D. on my team.



I assume that IDC’s expertise embraces the level of expertise evident in the TechRadar article. Should I trust a company that sells my content without a formal contract? Oh, maybe I should ask this question, “Should you trust a high profile consulting firm that vends another person’s work as its own?” Keep that $3,500 price in mind, please.

ITEM: The TechRadar article is written by a vendor of sentiment analysis software. His employer is Lexalytics / Semantria (once a unit of Infonics). He writes:

High quality NLP engines will let you customize your sentiment analysis settings. “Nasty” is negative by default. If you’re processing slang where “nasty” is considered a positive term, you would access your engine’s sentiment customization function, and assign a positive score to the word. The better NLP engines out there will make this entire process a piece of cake. Without this kind of customization, the machine could very well be useless in your work. When you choose a sentiment analysis engine, make sure it allows for customization. Otherwise, you’ll be stuck with a machine that interprets everything literally, and you’ll never get accurate results.

When a vendor describes “natural language processing” with the phrase “high quality” I laugh. NLP is a work in progress. But the stunning statement in this quoted passage is:

Otherwise, you’ll be stuck with a machine that interprets everything literally, and you’ll never get accurate results.

Amazing, a vendor wrote this sentence. Unless a licensee of a “high quality” NLP system invests in customizing, the system will “never get accurate results.” I quite like that categorical never.
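The customization the vendor describes can be pictured with a toy lexicon scorer. This is a minimal sketch, not the Lexalytics / Semantria API: the `DEFAULT_LEXICON` table, the `score` function, and the score values are all assumptions made up for illustration. It shows only the mechanism in the quoted passage, in which a word that is negative by default is reassigned a positive score for slang content.

```python
# Hypothetical default lexicon; words and scores are illustrative assumptions,
# not values taken from any real sentiment engine.
DEFAULT_LEXICON = {"nasty": -1.0, "useless": -1.0, "great": 1.0}


def score(text, overrides=None):
    """Sum per-word sentiment scores, letting overrides replace defaults."""
    # Overrides win over the defaults for any word present in both tables.
    lexicon = {**DEFAULT_LEXICON, **(overrides or {})}
    # Crude tokenizer: split on whitespace, trim punctuation, lowercase.
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    return sum(lexicon.get(w, 0.0) for w in words)


sentence = "That bass line is nasty!"
default_score = score(sentence)                           # literal reading
slang_score = score(sentence, overrides={"nasty": 1.0})   # customized reading
print(default_score, slang_score)
```

Even in this toy version, someone has to know the content well enough to decide which words need overrides. That is the customization work, and it does not come in the box.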

ITEM: Sentiment analysis is a single, usually complex component of a search or content processing system. A person on the LinkedIn enterprise search group asked the few hundred “experts” in the discussion group for examples of successful enterprise search systems. If you are a member in good standing of LinkedIn, you can view the original query at this link. [If the link won’t work, talk to LinkedIn. I have no idea how to make references to my content on the system work consistently over time.] I pointed out that enterprise search success stories are harder to find than reports of failures. Whether the flop is at the scale of the HP/Autonomy acquisition or a more modest termination like Overstock’s dumping of a big name system, the “customizing” issue is often present. Enterprise search and content processing is usually:

  • A box of puzzle pieces that requires time, expertise, and money to assemble in a way that attracts and satisfies users and the CFO
  • A work in progress that must be made to work so users are happy, and in a manner that does not force another search procurement cycle, the firing of the person responsible for the search and content processing system, and the legal fees related to the invoices submitted by the vendor whose system does not work. (Slow or no payment of license and consulting fees to a search vendor can be fatal to the search firm’s health.)
  • A source of friction among those contending for infrastructure resources. What I am driving at is that a misconfigured search system makes some computing work S-L-O-W. Note: the performance issue must be addressed for appliance-based, cloud, or on premises enterprise search.
  • Money. Don’t forget money, please. Remember the CFO’s birthday. Take her to lunch. Be really nice. The cost overruns that plague enterprise search and content processing deployments and operations will need all the goodwill you can generate.

If sentiment analysis requires customizing and money, take out your pencil and estimate how much it will cost to make NLP and sentiment work. Now do the same calculation for relevancy tuning, index tuning, optimizing indexing and query processing, etc.

The point is that folks who get a basic key word search and retrieval system to work then pile on the features and functions. Vendors whip up some wrapper code that makes it possible to do a demo of customer support search, eCommerce search, voice search, and predictive search. Once the licensee inks the deal, the fun begins. The reason one major Norwegian search vendor crashed and burned is that licensees balked at paying bills for a next generation system that was not what the PowerPoint slides described. Why has IBM embraced open source search? Is one reason to trim the cost of keeping the basic plumbing working reasonably well? Why are search vendors embracing every buzzword that comes along? I think that search as an enterprise function has become a very difficult thing to sell, make work, and turn into an evergreen revenue stream.

The TechRadar article underscores the danger for licensees of over hyped systems. The consultants often surf on the expertise of others. The vendors dance around the costs and complexities of their systems. The buzzwords obfuscate.

What makes this article by the Lexalytics professional almost as painful as IDC’s unauthorized sale of my search content is this statement:

You’ll be stuck with a machine that interprets everything literally, and you’ll never get accurate results.

I agree with this statement.

Stephen E Arnold, July 11, 2014

IBM: Hitting Numbers by Chasing Medium Sized Fish, Not Whales

July 11, 2014

I scanned my false drop stuffed Yahoo Alert a moment ago (5:04 am Eastern time). I clicked a link with the fetching headline “Enterprise Search Adoption among Midsize Firms.” The core of the story is a reference to an allegedly accurate survey from another publisher. I learned “nearly 40 percent of IT departments reported that they have already invested or plan to invest in enterprise search solutions.” Yikes. That means that 60 percent of midsize firms cannot locate information. Looks like a great opportunity to license an enterprise search system. I wondered who was at the root of this article and had such confidence in a market that probably is expensive to convince to pump big bucks into a Google Search Appliance (starts at $50,000 or so), an Autonomy IDOL hosted service or Amazon Search service with no cap on costs, or sign up for a bargain basement hosted search system until the ministrations of an expensive consultant are required. Most organizations use one of the default, utility search systems already included with other applications; for example, Microsoft’s search feature or a freeware system like Effective File Search or an open source system like Sphinx Search or Searchdaimon.

After clicking a few links I was directed to the éminence grise behind this article. Guess who? IBM. Yep, IBM wants to recover the billion tossed into Watson (really helpful for a midsize business wanting to win a game show or develop a recipe) or the $3 billion extending Moore’s Law.

I know from industry chatter at the trade shows I attend that there is concern about the future of IBM. This does not come just from those customers who pine for the good old days when IBM engineers delivered expensive but top notch service. Nope. The laments come from IBM professionals. I think I heard words like “lost its way,” “chaotic,” and “floundering.”

Several observations:

ITEM: Selling big buck enterprise search services to midsize firms is expensive, slow, and difficult. If these firms were able to float the boats of other search vendors, the vendors would be in high cotton. The middle market already has search and that’s why 60 percent of the outfits in the allegedly accurate survey are not buying standalone systems. Almost every piece of software includes a finding function. These are either good enough or are not used because users have found workarounds.

ITEM: IBM fees are going to cause even “large” midsize businesses (oxymoronic, right?) to pause. Imagine the cost impact of paying IBM sales people to pitch a product/service that a potential customer does not want, cannot afford, or already has available. Losses mount. Seems obvious to me.

ITEM: The clumsy content marketing ploy of creating a content free article and then pitching IBM as a generic solution is silly. Navigate to the IBM Small and Medium Business Solution page. IBM is offering “customized solutions.”

I don’t think the solution is on point. I don’t think the marketing approach is particularly useful. I don’t think the midsize business will beat a path to the door of a company known to sell expensive services while funding billion dollar pipe dreams.

You can, however, sign up for Forward View, an eMagazine. Yep, helpful.

Call me skeptical.

Stephen E Arnold, July 11, 2014

Comparing Apples and a Bunch of Grapes: A Common Misunderstanding about DuckDuckGo

July 11, 2014

Over at OS News, Thom Holwerda disagrees with a recent, positive review of search engine DuckDuckGo in “Review: DuckDuckGo Compared to Google, Bing, Yandex.” A user going by “sb56637” had found that:

“In many respects the tiny DuckDuckGo holds its own against the giant that is Google, and even more so if the user is willing to slightly manipulate the search query to work around DuckDuckGo’s temperamental intelligence layer. So it is heartening to see that DuckDuckGo is a viable alternative to Google by its own merits.”

As our readers may know, usage of DuckDuckGo has grown heartily as people have become more interested in not being tracked. That’s why sb56637 was so happy to call the site a “top-notch search engine.” Holwerda, however, did not have similar success when he tried to substitute the Duck for Google. He writes:

“I tried the ‘new’ DDG as well since it came out, setting it as my default search engine. Sadly, my experience wasn’t as positive – it simply didn’t find the things I was looking for about 80% of the time. Within a few days, I got into the habit of simply adding !g to every search query to go straight to Google anyway since that gave me the results I was looking for.”

Perhaps that is because DuckDuckGo is a metasearch engine, while the rest are not. (Metasearch engines mix results from several search engines.) Recall that reviewer sb56637 noted that having to adjust to DuckDuckGo’s “temperamental intelligence layer” is kind of a pain. It seems those willing to do some research and make the adjustments, though, can have both (comparatively better) privacy and good results.
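To make “mixing results from several search engines” concrete, here is a minimal sketch of one simple merging approach: a round-robin interleave that drops duplicate URLs. The function name `merge_results` and the sample URLs are hypothetical, and nothing here is claimed to reflect how DuckDuckGo actually combines its sources.

```python
from itertools import zip_longest


def merge_results(*ranked_lists):
    """Round-robin interleave of ranked URL lists, keeping first occurrences."""
    merged, seen = [], set()
    # zip_longest walks rank 1 of every engine, then rank 2, and so on,
    # padding shorter lists with None.
    for tier in zip_longest(*ranked_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical result lists from two upstream engines.
engine_a = ["a.example/1", "shared.example", "a.example/2"]
engine_b = ["shared.example", "b.example/1"]
merged = merge_results(engine_a, engine_b)
print(merged)
```

A merged list is only as good as its upstream sources and the merging rule, which may be one reason two users running similar queries come away with different impressions.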

Cynthia Murrell, July 11, 2014


AMI: From Albert Search to Market Intelligence

July 10, 2014

(Editor’s Note: This is information that did not make Stephen E Arnold’s bylined article in Information Today. That forthcoming Information Today story is about French search and content processing companies entering the US market. Spoiler alert: The revenue opportunities and taxes appear to be better in the US than in France. Maybe a French company will be the Next Big Thing in search and content processing. Few French companies have gained significant search and retrieval traction in the US in the last few years. Arguably, the most successful firm is the image recognition outfit called A2iA. It seems that for French information retrieval companies, cracking the US market has been lengthy, expensive, and difficult. One French company is trying a different approach, and that’s the core of the Information Today story.)

In 1999, I learned about a Swiss enterprise search system. The working name, according to my Overflight archive, was AMI Albert. The “AMI” did not mean friend. AMI is shorthand for Automatic Message Interpreter.

Flash forward to 2014. Note that a Google query for “AMI” may return hits for AMI International, a defense oriented company, as well as hits to American Megatrends, Advanced Metering Infrastructure, ambient intelligence, the Association Montessori International, and dozens of other organizations sharing the acronym. In an age of Google, finding a specific company can be a challenge and may inhibit some potential customers’ ability to locate a specific vendor. (This is a problem shared by Thunderstone, for example. The game company makes it tough to locate information about the search appliance vendor.)


Basic search interface as of 2011.

Every time I update my files, I struggle to get specific information. Invariably I get an email from an AMI Software sales person telling me, “Yes, we are growing. We are very much a dynamic force in market intelligence.”

The firm has a UK Web site, a French language Web site, and an English language version of the French Web site. The company also has a blog, but the content is stale. The most recent update as of July 7, 2014, is from December 2013. The company seems to have shifted its dissemination of news to LinkedIn, where more than 30 AMI employees have a LinkedIn presence. The blog is in French. The LinkedIn postings are in English. Most of the AMI videos are in French as well.


Advanced Search Interface as of 2011.

The Managing Director is Alain Beauvieux. The person in charge of products is Eric Fourboul. The UK sales manager is Mike Alderton.

Mr. Beauvieux is a former IBMer and worked at LexiQuest, which was formerly Erli, S.A. LexiQuest (Clementine) was acquired by SPSS. SPSS was, in turn, acquired by IBM, joining other long-in-the-tooth technologies marketed today by IBM.

Eric Fourboul is a former Dassault professional, and he has some Microsoft DNA in his background.
