Enterprise Search: The Valiant Fight On

May 17, 2016

I read “VirtualWorks and Language Tools Announce Merger.” I ran across Language Tools several years ago. The company was working to create components for Elasticsearch’s burgeoning user base. The firm espoused natural language processing as a core technology. NLP is useful, but it imposes computational burdens on certain content processing functions. Elasticsearch works pretty well, and a number of companies are optimizing, integrating, and creating widgets to make life with Elasticsearch better, faster, and presumably more impressive than the open source system alone.

This news release highlights the fact that VirtualWorks and Language Tools have merged. The financial details are not explicit, and it appears that VirtualWorks, a Florida-based company founded by a wizard from Citrix, will make Language Tools the R&D hub for its operation.

According to the story:

The combined organization brings together best of breed core technologies in the areas of enterprise search, data management, text analytics, discovery techniques and analytics to enable the development of new and exciting next generation applications in the business intelligence space.

VirtualWorks is or was a SharePoint centric solution. Like other search vendors, the company uses connectors to suck data into a central indexing point. Users then search the content and have access to the content without having to query separate systems.

This idea has fueled enterprise search since the days of Verity, Autonomy, Fast Search, Convera, et al. The real money today seems to be in the consulting and engineering services required to make enterprise search useful.

SharePoint is certainly widely used, and it is fraught with interesting challenges. Will the lash up of these two firms generate the type of revenue once associated with Autonomy and Fast Search & Transfer?

My hunch is that enterprise search continues to be a tough market. There are functional solutions to locating information available as open source or at comparatively modest license fees. I am thinking of dtSearch and Maxxcat. Both of these work well within Microsoft centric environments.

Stephen E Arnold, May 17, 2016

Facebook and Humans: Reality Is Not Marketing

May 16, 2016

I read “Facebook News Selection Is in Hands of Editors Not Algorithms, Documents Show.” The main point of the story is that Facebook uses humans to do work. The idea is that algorithms do not seem to be a big part of picking out what’s important.

The write up comes from a “real” journalism outfit. The article points out:

The boilerplate about its [Facebook’s]  news operations provided to customers by the company suggests that much of its news gathering is determined by machines: “The topics you see are based on a number of factors including engagement, timeliness, Pages you’ve liked and your location,” says a page devoted to the question “How does Facebook determine what topics are trending?”

After reading this, I thought of Google’s poetry created by its artificial intelligence system. Here’s the line which came to mind:

I started to cry. (Source: Quartz)

I vibrate with the annoyance bubbling under the surface of the newspaper article. Imagine. Facebook has great artificial intelligence. Facebook uses smart software. Facebook open sources its systems and methods. The company says it is at the cutting edge of replacing humans with objective procedures.

The article’s belief in baloney is fried and served cold on stale bread. Facebook uses humans. The folks at real journalism outfits may want to work through articles like “Different Loci of Semantic Interference in Picture Naming vs. Word-Picture Matching Tasks” to get a sense of why smart systems go wandering.

So what’s new? Palantir Technologies uses humans to index content. The “smart” software does some useful work on its own, but humans are part of the work flow process.

Other companies use humans too. But the marketing collateral and the fizzy presentations at fancy conferences paint a picture of a world in which cognitive, artificially intelligent, smart systems do the work that subject matter experts used to do. Humans, like indexers and editors, are no longer needed.

Now reality pokes its rose tinted fingertips into the real world.

Let me be clear. One reason I am not happy with the verbiage generated about smart software is a simple fact.

Most of the smart software systems require humans to fiddle at the beginning when a system is set up, while the system operates to deal with exceptions, and after an output is produced to figure out what’s what. In short, smart software is not that smart yet.

There are many reasons but the primary one is that the math and procedures underpinning many of the systems with which I am familiar are immature. Smart software works well when certain caveats are accepted. For example, the vaunted Watson must be trained. Watson, therefore, is not that much different from the training Autonomy baked into its IDOL system in the mid 1990s. Palantir uses humans for one simple reason. Figuring out what’s important to a team under fire with software works much better if the humans with skin in the game provide indexing terms and identify important points like local names for stretches of highway where bombs can be placed without too much hassle. Dig into any of the search and content processing systems and you find expenditures for human work. Companies licensing smart systems which index automatically face significant budget overruns, operational problems because of lousy outputs, and piles of exceptions to either ignore or deal with. The result is that the smoke and mirrors of marketers speaking to people who want a silver bullet are not exactly able to perform like the carefully crafted demonstrations. IBM i2 Analyst’s Notebook requires humans. Fast Search (now an earlobe in SharePoint) requires humans. Coveo’s system requires humans. Attivio’s system requires humans. OpenText’s suite of search and content processing requires humans. Even Maxxcat benefits from informed set up and deployment. Out of the box, dtSearch can index, but one needs to know how to set it up and make it work in a specific Microsoft environment. Every search and content processing system that asserts that it is automatic is spackling flawed wallboard.

For years, I have given a lecture about the essential sameness of search and content processing systems. These systems use the same well known and widely taught mathematical procedures. The great breakthroughs at SRCH2 and similar firms amount to optimization of certain operations. But the whizziest system is pretty much like other systems. As a result, these systems perform in a similar manner. These systems require humans to create term lists, look up tables of aliases for persons of interest, hand craft taxonomies to represent the chunk of reality the system is supposed to know about, and other “libraries” and “knowledgebases.”

Watson is a source of amusement to me precisely because the human effort required to make a smart system work is never converted to cost and time statements. People assume Watson won Jeopardy because it was smart. People assume Google knows what ads to present because Google’s software is so darned smart. People assume Facebook mines its data to select news for an individual. Sure, there is automation of certain processes, but humans are needed. Omit the human and you get the crazy Microsoft Tay system which humans taught to be crazier than some US politicians.

For decades I have reminded those who listened to my lectures not to confuse what they see in science fiction films with reality. Progress in smart software is evident. But the progress is very slow, hampered by the computational limits of today’s hardware and infrastructure. Just like real time, the concept is easy to say but quite expensive and difficult to implement in a meaningful way. There’s a reason millisecond access to trading data costs so much that only certain financial operations can afford the bill. Smart software is the same.

How about less outrage from those covering smart software and more critical thinking about what’s required to get a system to produce a useful output? In short, more info and less puffery, more critical thinking and less sawdust. Maybe I imagined it but both the Google and Tesla self driving vehicles have crashed, right? Humans are essential because smart software is not as smart as those who believe in unicorns assume. Demos, like TV game shows, require pre and post production, gentle reader.

What happens when humans are involved? Isn’t bias part of the territory?

Stephen E Arnold, May 16, 2016

Enterprise Search Revisionism: Can One Change What Happened?

March 9, 2016

I read “The Search Continues: A History of Search’s Unsatisfactory Progress.” I noted some points which, in my opinion, underscore why enterprise search has been problematic and why the menagerie of experts and marketers has put search and retrieval on the path to enterprise irrelevance. The word that came to mind when I read the article was “revisionism” for the millennials among us.

The write up ignores the fact that enterprise search dates back to the early 1970s. One can argue that IBM’s Storage and Information Retrieval System (STAIRS) was the first significant enterprise search system. The point is that enterprise search as a productized service has a history of more than 40 years of over promising and under delivering.

[Image: Enterprise search with a touch of Stalinist revisionism.]

Customers said they wanted to “find” information. What those individuals meant was access to information that provided the relevant facts, documents, and data needed to deal with a problem.

Because providing on point information was and remains a very, very difficult problem, the vendors interpreted “find” to mean a list of indexed documents that contained the users’ search terms. But there was a problem. Users were not skilled in crafting queries, which were essentially computer instructions placing operators between words the index actually contained.
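To make the exact-match problem concrete, here is a minimal Python sketch of the kind of term matching described above. It is an illustration under stated assumptions, not STAIRS code or any vendor’s method: the two sample memos, the whitespace tokenizer, and the helper names `build_index` and `boolean_and` are hypothetical. A document comes back only if it contains every query word exactly as indexed.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Hypothetical tokenizer: lowercase and split on whitespace only.
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)


def boolean_and(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every query term (exact match only)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        # Intersect: a document must contain this term too.
        result &= index.get(term, set())
    return result


# Hypothetical sample content.
docs = {
    "memo1": "Quarterly revenue forecast for the widget division",
    "memo2": "Widget sales projections for next quarter",
}
index = build_index(docs)
print(boolean_and(index, "widget revenue"))    # {'memo1'}
print(boolean_and(index, "widget forecast"))   # {'memo1'}
print(boolean_and(index, "widget forecasts"))  # set()
```

Both memos deal with the same subject, but only the one that happens to use the word “forecast” is returned, and the plural “forecasts” matches nothing at all. That is the gap between what users meant by “find” and what a term index delivers.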

After STAIRS came other systems, many other systems, which have been documented reasonably well in Bourne and Bellardo-Hahn’s A History of Online Information Services, 1963-1976. (The period prior to 1970 describes for-fee research centric online systems. STAIRS was among the best known early enterprise information retrieval systems.) I provided some history in the first three editions of the Enterprise Search Report, published from 2003 to 2007. I have continued to document enterprise search in the Xenky profiles and in this blog.

The history makes painful reading for those who invested in many search and retrieval companies and for the executives who experienced the crushing of their dreams and sometimes their careers under the buzz saw of reality.

In a nutshell, enterprise search vendors heard what prospects, workers overwhelmed with digital and print information, and unhappy users of those early systems were saying.

The disconnect was that enterprise search vendors parroted back marketing pitches that assured enterprise procurement teams of these functions:

  • Easy to use
  • “All” information instantly available
  • Answers to business questions
  • Faster decision making
  • Access to the organization’s knowledge.

The result was a steady stream of enterprise search product launches. Some of these, like Verity, were funded with US government money. Sure, the company struggled with the cost of the infrastructure the Verity system required. The work arounds were okay as long as the infrastructure could keep pace with the new and changed word-centric documents. Tossing in other types of digital information, making the system index ever faster, and keeping the Verity system responding quickly was another kettle of fish.

Research oriented information retrieval experts looked at the Verity type system and concluded, “We can do more. We can use better algorithms. We can use smart software to eliminate some of the costs and indexing delays. We can [fill in the blank].”

The cycle of describing what an enterprise search system could actually deliver was disconnected from the promises the vendors made. As one moves through the decades from 1973 to the present, the failures of search vendors made it clear that:

  1. Companies and government agencies would buy a system, discover it did not do the job users needed, and buy another system.
  2. New search vendors picked up the methods taught at Cornell, Stanford, and other search-centric research centers and wrapped on additional functions like semantics. The core of most modern enterprise search systems is unchanged from what STAIRS implemented.
  3. Search vendors like Convera came, failed, and went away. Some hit revenue ceilings and sold to larger companies looking for a search utility. The acquisitions hit a high water mark with the sale of Autonomy (a 1990s system) to HP for $11 billion.

What about Oracle, as a representative outfit? Oracle database has included search as a core system function since the day Larry Ellison envisioned becoming a big dog in enterprise software. The search language was Oracle’s version of the structured query language. But people found that difficult to use. Oracle purchased Artificial Linguistics in order to make finding information more intuitive. Oracle continued to try to crack the find information problem through the acquisitions of Triple Hop, its in-house Secure Enterprise Search, and some other odds and ends until it bought in rapid succession InQuira (a company formed from the failure of two search vendors), RightNow (technology from a Dutch outfit RightNow acquired), and Endeca. Where is search at Oracle today? Essentially search is a utility and it is available in Oracle applications: customer support, ecommerce, and business intelligence. In short, search has shifted from the “solution” to a component used to get started with an application that allows the user to find the answer to business questions.

I mention the Oracle story because it illustrates the consistent pattern of companies which are actually trying to deliver information that the user of a search system needs to answer a business or technical question.

I don’t want to highlight the inaccuracies of “The Search Continues.” Instead I want to point out the problem buzzwords create when trying to understand why search has consistently been a problem and why today’s most promising solutions may relegate search to a permanent role of necessary evil.

The write up concludes with the notion of answering questions, analytics, federation (that is, running a single query across multiple collections of content and file types), the cloud, and system performance.

Wrong.

The use of open source search systems means that good enough is the foundation of many modern systems. Palantir-type outfits, essentially enterprise search vendors describing themselves as providers of “intelligence” systems, use open source technology in order to reduce costs and shift bug chasing to a community. The good enough core is wrapped with subsystems that deal with the pesky problems of video, audio, and data streams from sensors or similar sources. Attivio, formed by professionals who worked at the infamous Fast Search & Transfer company, delivers active intelligence but uses open source to handle the STAIRS-type functions. These companies have figured out that open source search is a good foundation. Available resources can be invested in visualizations, generating reports instead of results lists, and graphical interfaces which involve the user in performing tasks smart software at this time cannot perform.

For a low cost enterprise search system, one can download Lucene, Solr, SphinxSearch, or any one of a number of open source systems. There are low cost (keep in mind that costs of search can be tricky to nail down) appliances from vendors like Maxxcat and Thunderstone. One can make do with the craziness of the search included with Microsoft SharePoint.

For a serious application, enterprises have many choices. Some of these are highly specialized like BAE NetReveal and Palantir Metropolitan. Others are more generic like the Elastic offering. Some are free like the Effective File Search system.

The point is that enterprise search is not what users wanted in the 1970s when IBM pitched the mainframe centric STAIRS system, in the 1980s when Verity pitched its system, in the 1990s when Excalibur (later Convera) sold its system, in the 2000s when Fast Search shifted from Web search to enterprise search and put the company on the road to improper financial behavior, and in the efflorescence of search sell offs (Dassault bought Exalead, IBM bought iPhrase and other search vendors, and Lexmark bought Brainware and ISYS Search Software).

Where are we today?

Users still want on point information. The solutions on offer today are application and use case centric, not the silly one-size-fits-all approach of the period from 2001 to 2011 when Autonomy sold to HP.

Open source search has helped create an opportunity for vendors to deliver information access in interesting ways. There are cloud solutions. There are open source solutions. There are small company solutions. There are more ways to find information than at any other time in the history of search as I know it.

Unfortunately, the same problems remain. These are:

  1. As the volume of digital information goes up, so does the cost of indexing and accessing the sources in the corpus
  2. Multimedia remains a significant challenge for which there is no particularly good solution
  3. Federation of content requires considerable investment in data grooming and normalizing
  4. Multi-lingual corpuses require humans to deal with certain synonyms and entity names
  5. Graphical interfaces still are stupid and need more intelligence behind the icons and links
  6. Visualizations have to be “accurate” because a bad decision can have significant real world consequences
  7. Intelligent systems are creeping forward, but crazy Watson-like marketing raises expectations and undermines the credibility of enterprise search’s capabilities.

I am okay with history. I am not okay with analyses that ignore some very real and painful lessons. I sure would like some of the experts today to know a bit more about the facts behind the implosions of Convera, Delphis, Entopia, and many other companies.

I also would like investors in search start ups to know a bit more about the risks associated with search and content processing.

In short, for a history of search, one needs more than 900 words mixing up what happened with what is.

Stephen E Arnold, March 9, 2016

Google Search Appliance: Like Glass It Broke

February 8, 2016

I read “So Long Google Search Appliance.” Farewell, happy yellow and blue boxes. So long integrators who have been supporting these wildebeests for a decade. Au revoir easy-as-pie search.

According to the write up:

The tech giant told its reseller and consulting partners the news via email on Thursday, noting that they can continue to sell one-year license renewals for existing hardware customers through 2017, but that they will be unable to sell new hardware. Renewals will end in 2018.

I recall writing about the Google Search Appliance when I was reporting about enterprise search for specialist publications. I was the first or one of the first to run down the pricing for the wonky boxes. I pointed out that a redundant multi million document system would ring the Google cash register in the high six figures with seven figures not out of sight. I thought I mentioned that the number of engineers supporting the GSA had dwindled to a couple of folks. I thought I pointed out that the assumption a Web search system would work like a champ on corporate content was a wild and crazy notion.

Like so many others who assumed enterprise search was not a tough problem, the Alphabet Google thing has bailed. Google essentially failed to revolutionize enterprise search. Cheaper and more usable appliances are available, including products from Maxxcat and Thunderstone. There are reasonable cloud solutions. And there is a cornucopia of outfits offering repackaged open source systems. Heck, if one pokes around long enough, a bold enterprise can license a system from companies with proprietary information access systems: 3RDi, Fabasoft, Lexmark, etc.

What will organizations do without the Google Search Appliance? Yard sale, Goodwill?

Stephen E Arnold, February 8, 2016

Weekly Watson: The Internet of Things

December 17, 2015

Yep, there is not a buzzword, trend, or wave which IBM’s public relations professionals ignore. I read “IBM Is Bringing Its Watson Supercomputer to IoT.” The headline puzzled me. I thought that Watson was:

  • Open source software like Lucene
  • Home brew scripts
  • Acquired technology.

The hardware part is moving to the cloud. IBM is reveling in a US government supercomputing contract which may involve quantum computing.

But Watson runs on hardware. If Watson is a supercomputer, I see some parallels with the Google and Maxxcat search appliances.

The write up reports:

IBM has announced today it is bringing the power of its Watson supercomputer to the Internet of Things, in a bid to extend the power of cognitive computing to the billions of connected devices, sensors and systems that comprise the IoT.

Will the Watson Internet of Things be located in Manhattan? Nope. I learned:

the company announced that the new initiative, the Watson Internet of Things, will be headquartered in Munich, Germany. The facility will serve as the first European Watson innovation super centre, built to drive collaboration between IBM experts and clients. This will be complemented by eight Watson IoT Client Experience Centers spread across Europe, Asia and the Americas.

Why Germany? IBM has a partner, Siemens.

Will the IoT venture use the shared desk approach? According to EndicottAlliance.org Comment 12/10/15, this approach to work has some consequences:

I wouldn’t get too excited about the new “Agile Workspace” in RTP. Basically it is management forcing workers back to the office and into a tense, continuously monitored environment with no privacy. It will be loud, you’ll have no space of your own, and it will be difficult to think. Mood marbles? Better be sure you always choose the light-colored ones! And make sure your discussion card is always flipped to the green side. What humiliation! The environment will be great for loud-mouthed managers, terrible for workers who do all the work. Worse than cubicles.

From cookbooks to cancer, IBM Watson seems to be where the buzzwords are. I wonder if the Watson revenues will reverse the revenue downturns IBM has experienced for 14 consecutive quarters.

Stephen E Arnold, December 17, 2015

Enterprise Search: Search No Longer Big Enough

September 22, 2015

I read the news on LinkedIn. (Registration may be required, gentle reader.) A post by a forum moderator raised the question, “Should we expand enterprise search?” There are other signs of trouble in search land as well. The Paper.Li enterprise search curated newsletter is about Big Data, analytics, education, and—almost as an afterthought—enterprise search in the form of endlessly recycled references to mid tier consulting reports based on what are in my opinion subjective criteria.

Is the implosion of enterprise search complete? Has the shockwave of the Fast Search financial charade caught up with today’s vendors? Has the shadow of the billion dollar bust that was HP’s acquisition of Autonomy/Verity been the straw which broke the camel’s back? Was it the mid tier consulting firm’s enterprise search report which ignored the major player in open source information access? Was it the constant repositioning, faux news releases, and posturing on webinars that delivered the karate chop across the throat of search marketers?

I don’t know.

From my point of view, there are high value solutions to the challenge of providing employees with access to certain types of content. One can use the appliance approach of Maxxcat. There is Elasticsearch. The Blossom Software solution is pretty darned good. Specialist solutions are available for parts. There are even semi automated systems to help a user make sense of the noise filled streams of social media content. Think Recorded Future.

Gentle reader, starting in 2003 when I began work on the Enterprise Search Report, sponsored by, of all things, a content management specialist, there were some brand leaders. But these have fallen into disgrace, been absorbed into larger firms with little incentive to invest in research, or crashed and burned as a result of failed implementations.

What remains today are some grim facts:

  • Search is perceived by many information technology professionals as a problem. Enterprise search implementations are often doomed from the git go because few want to hook their careers to projects which have for decades failed to keep users happy and been unable to provide useful results without constant infusions of money, computing cycles, and whiz kids.
  • Open source solutions are available, and they are pretty good. Large companies have the time, staff, resources, and incentive to get away from the proprietary solutions which limit what the licensee can do with the system.
  • Search is included in the most advanced systems. Consider Recorded Future, Diffeo, or any other cyber centric, next generation system. Search is available, but these systems solve specific problems. Search is sort of an apple pie, mother, and love type solution. These generalizations are tough to apply in a business like manner in organizations struggling to pay their bills. Most organizations just use what’s available. Even AutoCAD includes a search function. Oracle, bless its proprietary heart, provides a database licensee with a good enough solution. For those wanting a more robust solution, the Secure Enterprise Search system is available without charge. Yikes.

In my own experience, the sins of the earliest enterprise search vendors like Fulcrum Technologies and Verity have bulldozed a highway built on quicksand. Today’s vendors talk about search in terms of buzzwords like these:

  • Customer support. The idea is a variation of ClearForest’s pitch that one can find answers to customer issues by indexing text.
  • Big Data. I am confident that when I look for information in a Big Data set, I want to use search as a secondary tool. Enterprise search vendors offer analytic routines as add ons or as a spin on counting terms which have been extracted.
  • Taxonomy. I love this concept. A company needs to index its content. Nothing improves search, which has not been improved too much in the last 50 years, like machine indexing. Just don’t pin down the vendor on the amount of human intervention that is required to keep the automated system on track.
  • Natural language processing, semantics, and artificial intelligence. The idea is that a search system with smart software can figure out what a human generated document means and make it available to a user easily or, in some cases, BEFORE the user knows she needs access to the information in that document.

There are three problems which vendors and their customers have to wrestle into submission.

First, vendor and customer have to agree on exactly what the information access system is supposed to do. In my experience, this is an important step which is usually given modest, if any, attention. The reason is that instead of narrowing the focus to a specific problem, the problem gets defined in ever widening circles of functionality. The result is cost overruns and disaffected users.

Second, the vendors’ marketing argues that certain functions and benefits are a consequence of installing their software. The flaw is that marketing is easy; implementing a search system which the customer can afford to maintain is very hard. Add to this disconnect the tendency of some vendors to sell software which is half baked, or, in some cases, not even completed. A certain vendor was kicked off a government procurement list for getting caught with software that did not work.

Third, the customers know that finding information is important. Most enterprise search vendors cannot provide access to the type of content which is growing rapidly and gaining importance with each passing day. I am talking about indexing audio, video, social media generated by employees and contractors, and digital images. The focus has been for a half century on text. That does not work particularly well if one does not select a solution from a handful of vendors with solutions that actually work. Need I repeat Blossom, Elastic, and Maxxcat?

What about today’s flagship vendors? If one embraces the analysis of the mid tier consulting firms, the solutions are ones that are proprietary and have some profile and money due to the ministrations of addled venture capital players looking for the next Google.

There are solutions. Until the LinkedIn pool of job hunters and consultants comes to grips with software that works in a reliable manner, it is unlikely that the enterprise search discussion on LinkedIn will rise above thinly veiled marketing.

Search, gentle reader, is important. There are solutions which work. The problem is that in today’s go go world, those with a veneer of knowledge and expertise are guided by individuals who may be failed webmasters, unemployed journalists, English majors, and self appointed experts.

I have no solution to the crisis in enterprise search. Google muffed the bunny. Microsoft has its Powerset and Fast Search technologies. IBM offers Watson.

Maybe these solutions will work for you. They won’t work for me. Search experts, crisis time. My vantage point is from rural Kentucky. The experts in Manhattan and San Francisco have a much better view. What they see, however, is quite different from what I observe. Just make search bigger. The problems will just fade away, right? Grass is easy to grow in scorched earth, correct?

Stephen E Arnold, September 22, 2015

Suggestions for Developers to Improve Functionality for Search

September 2, 2015

The article on SiteCrafting titled Maxxcat Pro Tips lays out some guidelines for improved functionality when it comes to deep search. Limiting your Crawls is the first suggestion. Since all links are not created equally, it is wise to avoid runaway crawls on links where there will always be a “Next” button. The article suggests hand-selecting the links you want to use. The second tip is Specify Your Snippets. The article explains,

“When MaxxCAT returns search results, each result comes with four pieces of information: url, title, meta, and snippet (a preview of some of the text found at the link). By default, MaxxCAT formulates a snippet by parsing the document, extracting content, and assembling a snippet out of that content. This works well for binary documents… but for webpages you wanted to trim out the content that is repeated on every page (e.g. navigation…) so search results are as accurate as possible.”
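The snippet tip can be illustrated with a rough Python sketch. This is not MaxxCAT’s API or the SiteCrafting procedure; the heuristic (treat lines that appear on every crawled page as navigation boilerplate and drop them before building a snippet), the function names `strip_shared_lines` and `make_snippet`, and the sample pages are assumptions for illustration only.

```python
from collections import Counter


def strip_shared_lines(pages: dict[str, str], threshold: float = 1.0) -> dict[str, str]:
    """Drop lines found on at least `threshold` (a fraction) of all pages.

    Hypothetical heuristic: text repeated on every page, such as a
    navigation bar, is treated as boilerplate rather than content.
    """
    n = len(pages)
    counts = Counter()
    for text in pages.values():
        # Count each distinct non-empty line once per page.
        counts.update({line.strip() for line in text.splitlines() if line.strip()})
    shared = {line for line, c in counts.items() if c / n >= threshold}
    cleaned = {}
    for url, text in pages.items():
        kept = [line.strip() for line in text.splitlines()
                if line.strip() and line.strip() not in shared]
        cleaned[url] = "\n".join(kept)
    return cleaned


def make_snippet(text: str, max_chars: int = 80) -> str:
    """Collapse whitespace and truncate to a short preview."""
    flat = " ".join(text.split())
    return flat if len(flat) <= max_chars else flat[:max_chars].rstrip() + "..."


# Hypothetical crawled pages sharing one navigation line.
pages = {
    "/widgets": "Home | Products | Contact\nOur widget ships in three sizes.",
    "/support": "Home | Products | Contact\nSupport hours are 9 to 5, Monday to Friday.",
}
cleaned = strip_shared_lines(pages)
for url, text in cleaned.items():
    print(url, "->", make_snippet(text))
```

One design caveat: with a single page, every line counts as “shared,” so a real site would need many pages, or a threshold below 1.0, before this kind of heuristic makes sense.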

The third suggestion is to Implement Meta-Tag Filtering. Each suggestion is followed up with step-by-step instructions. These handy tips come from a partnership between SiteCrafting, a web design company founded in 1995 by Brian Forth, and Maxxcat, a company acknowledged for its achievements in high performance search since 2007.
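The article’s own step-by-step instructions for meta-tag filtering are specific to MaxxCAT and are not reproduced here. As a generic illustration only, the Python sketch below reads `<meta name="..." content="...">` tags with the standard library’s `html.parser` module and keeps pages whose tag matches a requested value. The tag name `section`, the sample pages, and the helper names are hypothetical.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        values = dict(attrs)
        name, content = values.get("name"), values.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def meta_tags(html: str) -> dict:
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return parser.meta


def filter_by_meta(pages: dict, name: str, value: str) -> list:
    """Return sorted URLs whose meta tag `name` equals `value` exactly."""
    return sorted(url for url, html in pages.items()
                  if meta_tags(html).get(name.lower()) == value)


# Hypothetical pages; the "section" tag is an illustrative choice.
pages = {
    "/manual": '<html><head><meta name="section" content="docs"></head></html>',
    "/blog": '<html><head><meta name="Section" content="news"/></head></html>',
    "/about": "<html><head><title>About</title></head></html>",
}
print(filter_by_meta(pages, "section", "docs"))  # ['/manual']
```

Matching on the tag name is case-insensitive here while the content value must match exactly; that choice is arbitrary and would depend on how a given site tags its pages.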

Chelsea Kerwin, September 2, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

More Enterprise Search Revisionism: Omitted Companies Are the Major News

August 24, 2015

A flurry of news items hit my Overflight system in the last couple of days. Gartner, one of the expert for hire mid tier consulting firms, issued a “Gartner’s Magic Quadrant for Enterprise Search.” I am not sure if you can access the report. I had to log in to LinkedIn and work through various screens until this gem presented itself to me.

I followed the link and learned that the “Magic Quadrant for 2015” includes these firms:

The Challengers. To me a challenger means a person or thing that engages in any contest, as of skill, strength, etc.

  • LucidWorks, founded in 2007
  • Mindbreeze, a unit of Microsoft centric Fabasoft in Austria. The search unit fired up a decade ago.
  • Google, ah, dear old Google and its pricey Google Search Appliances. You can find the license fees for some devices via the GSAAdvantage service. Google has been sort of selling GSAs for a decade.
  • Dassault Systèmes, yep, the French engineering outfit working to convert Exalead’s ageing technology into a product component solution. Exalead dates from 2000. Yikes, that makes the technology 15 years old, an aeon in technology time.

The good news is that LucidWorks has its roots in open source. The other three outfits are proprietary technology.

The second group is Niche Players. The companies in this sector are:

  • Expert System. An outfit which opened its doors in 1992 and whose stock is publicly traded. The share price on August 23, 2015, was $2.13.
  • Recommind, founded in 2000, is a legal system whose technical approach often reminds me of Autonomy’s systems and methods. According to this story, the firm now has $70 million in revenue.
  • Squiz, which is, by golly, not an open source solution despite its origins in the 2001 P@noptic academic/research setting in Australia. Just try searching for that spelling “P@noptic.”

The third group is Visionaries which to me means “given to or characterized by fanciful, not presently workable, or unpractical ideas, views, or schemes.” The dictionary entry here also points out these clarifications: unreal, imaginary, idealistic, impractical, and unrealizable. Here are the search outfits in this category:

  • BA Insight. This is a company founded in 2004. The founder raised some venture money and then found himself looking for his future elsewhere. In the presentations I have heard, BA Insight is [a] an enterprise search system replacement for whatever you have running, [b] a business intelligence system, [c] a metatagging machine, [d] some combination of these functions.
  • IBM. Ah, dear, old IBM. The company does the home grown thing with scripts and algorithms from its research labs. IBM was founded in 1911, so the company has had a long time to figure out what to do since the STAIRS III and Web Fountain days. It does the open source thing too: now IBM search means use of open source, community supported, free Lucene. Plus, it does the acquisition thing with SPSS Clementine (remember that, gentle reader), Vivisimo, i2, and Cybertap, among other information access companies IBM has purchased. At the end of the day, I am not sure what search means because IBM has been promoting the heck out of Watson. You remember Watson. It was a TV game show winner. Watson wrote a cookbook. Watson is curing cancer. Watson is doing all sorts of wonderful things. I suppose that’s why it is a visionary with 13 consecutive quarters of revenue decline.
  • IHS (Information Handling Services). IHS leverages technology from Invention Machine (founded in 1992), an acquisition whose technology was built to locate systems and methods in patent documents. The IHS search system is called Goldfire and is positioned as an enterprise search system. IHS, according to Attivio, licenses the Fast Search & Transfer influenced UIA technology platform. IHS for me is a publishing company, but I suppose that doesn’t matter in today’s fluid world.

The final group of search vendors is labeled leaders. So what’s a leader? According to my online dictionary, a leader is a person or thing that leads. And “lead” means to go before or with to show the way; conduct or escort. No, I will not refer to Ashley Madison, gentle reader. I will play this straight. The leaders are:

  • Attivio, founded in 2007. It must be a leader because a “visionary” uses the Attivio technology to be a visionary. Is that self referential like articles about Google’s right to be forgotten which must be forgotten?
  • Coveo, founded in 2004. This company has been, like Attivio, successful in attracting venture capital. The company once focused on Microsoft Windows as did BA Insight. Now the firm is into customer support, but the mid tier consultants remember the good old days of enterprise search.
  • Hewlett Packard. Ah, HP, the company wrote a check for $11 billion in 2011, promptly wrote off billions, and embarked on a much loved legal challenge to Dr. Michael Lynch and some other favored individuals. HP, like IBM, has been racking up declining revenues for five consecutive quarters and is in the process of dividing itself into two separate companies. Does this suggest that HP has some challenges? Keep in mind Autonomy was founded in the mid 1990s.
  • Lexmark. This is a relative newcomer to enterprise search. The company bought Brainware of trigram fame. Lexmark bought the 1980s search darling ISYS Search Software, which was founded in 1988. The company also snagged Kofax, which got into the content processing game with its acquisition of Kapow. I did hear that Lexmark is looking at some shortfalls related to search and content processing. I reported on the chopping of 500 jobs a couple of months ago. But leaders must expect some setbacks like Hewlett Packard. Perhaps Lexmark will reveal the shortfall from its “search related” endeavors. I would peg the number somewhere in the $75 to $80 million range in the last 18 months.
  • Sinequa. This marketing centric, social media maven was founded in 2002. The company has some big European clients, but I am not certain that the push into the US has met with the “name in lights” success some French stakeholders expected. Sinequa is obviously a leader in search. I classify the company as a business process outfit, but the mid tier consultants are more informed than an old guy in rural Kentucky.

My view of the enterprise search sector is different. The companies in this list are oldies, a couple dating from the late 1980s and early 1990s. Let’s see. In Internet time, that pegs some technology as prehistoric.

There is a notable omission too. The list of companies identified by the mid tier outfit has missed the company which has been driving a bulldozer through deals.

What company is that?

Elastic, gentle reader. This outfit is in the process of providing the folks at Goldman Sachs with some information access love. The company has shoved aside the LucidWorks outfit, which is scrambling to reposition itself as a Big Data Spark something. There are cloud versions of Elastic available for a darned reasonable price. Check out SearchBlox, for example. Keep in mind that Elasticsearch was a second act to Compass, another search system.

A question which I asked myself is, “Why has a mid tier outfit which is so darned expert in enterprise search overlooked the big dog?” Frankly, I have no evidence other than the odd little grid in the LinkedIn post. I assume that the experts at the mid tier firms don’t know much about what’s happening in search. Another thought is that the Elastic folks don’t buy much third party expert input about search. Whatever the reason, I suggest you, gentle reader, become familiar with Elasticsearch in the free or for fee variant.

Another gap I noticed is the omission of the appliance folks. Right off the bat, I think Index Engines, Maxxcat, and Thunderstone deserve a tiny footnote. Maxxcat, for example, is pretty good in the enterprise content indexing arena. Buy a box and plug it in. Index Engines does a great job making some specialized content instantly accessible. And Thunderstone? Well, the company has some darned good technology.

A third lacuna is the omission of the wild and crazy, Fast Search & Transfer tinged SharePoint search. There are upwards of 150 million SharePoint installations. Like it or not, Microsoft also shoves search down my throat each time I use Windows 10. Yikes. The system may have a legacy of considerable interest, but the darned thing is out there. Maybe a teeny tiny footnote? I would suggest that the mid tier outfit identify the vendor which sells more search into Microsoft installations than any other vendor. Nope. I won’t identify this outfit. The president agreed to a Search Wizards Speak interview and then backed out. Too bad for him. No life preserver from me again.

What’s the value of this league table or grid thing from the mid tier consulting firm?

First, it allows the companies in the list to issue a news release. I have already seen references to some of the companies. This post was inspired by the junk mail LinkedIn shoots at me on a regular basis. There’s nothing like PR which gets a company’s name in front of a bunch of red hot prospects.

Second, the mid tier consulting firm can visit with each company. I can imagine that on those visits, the mid tier consulting firm might just mention the firm’s strategic and tactical for fee services. Hey, if I worked for a mid tier consulting firm, I would be sure to explain why retaining me was the best darned thing since sliced bread. Oh, wait. I worked at Booz, Allen & Hamilton before it drifted into Snowden drifts. I responded to requests; I don’t recall making sales calls. Life is different now I suppose.

Third, the mid tier reports practically force me to write blog posts. I am delighted to be spurred into action.

Fourth, how much does it cost to use these systems? Why not make a table which presents the name of the company and the search system name so that I know what IBM asserts actually performs enterprise search and what HP calls its cloud stuff with Autonomy made ever so easy? Why not state that such and such a search system begins at $X for the license fee and $Y for the ongoing support, upgrades, and maintenance? Why not present average hourly engineering and technical service fees? Hey, even the best of the systems in this animal shelter fail. Did I say crash? Did I say flame out? Did I say deliver irrelevant results? Well, often in my experience.

To wrap up, the Visionaries, the Challengers, the Leaders, and the Niche Players can output news releases. Some may try to dismiss my observations, which is just peachy keen with me. I assume that failed webmasters, thwarted academicians, and unemployed home economics majors will explain that the best of the best appear in the league table.

Present reality any way one wants. I don’t have to make this stuff work anymore. I don’t have to explain to the CFO why the costs associated with enterprise search will continue to go up until the system is removed from the company. I will no longer have to attend a conference filled with cheerleaders for a utilitarian technology which most companies have learned is pretty much the same as it has been since the days of Fulcrum and Verity.

Remember. This is 2015. Most of the technology presented in the mid tier report is getting old. The world wants mobile. The world wants predictive outputs. The world wants search which actually delivers relevant results.

Maybe that is secondary today?

Will I read the complete report if a copy becomes available to me?

Nah. Marketing stuff bores me.

Stephen E Arnold, August 24, 2015

Enterprise Search: You Cannot Do It Yourself, People.

July 31, 2015

I love write ups like “Don’t Settle When It Comes to Enterprise Search Platforms.” These articles are designed to provide consulting firms with the marketing flim flam which positions each as an “expert” in enterprise information access. I would not be surprised to find copies of this article in the peddler kit of search sales professionals.

The main point of the write up is that enterprise search is a “platform.” Because there are options, no self respecting company will try to implement search without the equivalent of the F Troop in mid tier or below consultants.

I noted:

Let’s look at two very common workarounds some have tried, and then we will talk about why you must go with a reputable developer when you make your final decision.

When I read this, I wondered if the “expert” were familiar with the Maxxcat line of enterprise search systems or the Blossom hosted solution.

The write up dismisses an open source solution, apparently unaware of the work by Diomidis Spinellis and Vaggelis Giannikas published in the Journal of Systems and Software, March 2012, pages 666 to 682. That’s okay. My hunch is that those finding the “Don’t Settle” article compelling are not likely to be interested in researchy type stuff.

One of the more interesting segments in the write up is the assertion that scalability is a “given.” Hmmm. In my experience, there are some ongoing enterprise search challenges. Scalability is one facet of a nest of vipers which includes my favorite reptile, indexing latency.

The article states:

Open source platforms are only as scalable as their code allows, so if the person who first made it didn’t have your company’s needs in mind, you’ll be in trouble. Even if they did, you could run into a problem where you find out that scaling up actually reveals some issues you hadn’t encountered before. This is the exact kind of event you want to avoid at all costs.

I don’t want to rain on this parade of “information,” but every enterprise search system which I have had the pleasure of procuring, managing, investigating, and analyzing has scalability problems.

The reason is simple: The volume of changed information and the flow of new information goes up. Whatever one starts with is rather rapidly choked. The solutions are painful: Spend more or index less.
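The “spend more or index less” trade-off can be put in rough numbers. The sketch below is a deliberately simple queue model with made-up, hypothetical rates, not measurements from any system: when new and changed documents arrive faster than the system can index them, the unindexed backlog grows every hour, and the wait before a fresh document becomes searchable (indexing latency) grows with it.

```python
def backlog_after(hours: int, arrivals_per_hour: int, indexed_per_hour: int) -> int:
    """Documents still waiting to be indexed after `hours` of steady load."""
    backlog = 0
    for _ in range(hours):
        # The backlog can shrink to zero but never below it.
        backlog = max(0, backlog + arrivals_per_hour - indexed_per_hour)
    return backlog


def latency_hours(backlog: int, indexed_per_hour: int) -> float:
    """Rough wait before a newly arrived document reaches the index (FIFO queue)."""
    return backlog / indexed_per_hour


# Hypothetical load: 1,200 new or changed documents per hour, capacity 1,000.
day = backlog_after(24, 1200, 1000)
print(day, latency_hours(day, 1000))  # 4800 4.8
print(backlog_after(24, 1200, 1300))  # spend more (add capacity): 0
print(backlog_after(24, 900, 1000))   # index less (exclude content): 0
```

In this toy model, a 20 percent shortfall in capacity leaves a fresh document waiting nearly five hours after one day, and the wait keeps growing. Only raising capacity or cutting the volume indexed brings it back to zero.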

I am not confident that one who follows the advice of certain experts will find his or her enterprise search journey pleasant. On the other hand, one can always pursue opportunities as an Uber driver.

Stephen E Arnold, July 31, 2015

Short Honk: Open Semantic Search Appliance

July 17, 2015

Several people have asked me about Open Semantic Search. I sent a couple of emails to the professional identified on the DNS record as the contact point. No response yet from our inquiry emails, but this is not unusual. People are so darned busy today.

The Open Semantic Search organization is offering an open semantic search appliance. The appliance is not a box like the much loved Google Search Appliance or the Maxxcat solutions. The appliance is virtual.

The explanation of the data enriching system is located at this link. The resources required are modest and, based on the information I scanned, the open semantic search appliance is a solution to many information access woes.

I will be able to search, explore, and analyze. Give the system a whirl. We will add it to our list of tasks. We assume it will present the same exciting challenges as other Lucene/Solr solutions. The addition of semantics will add a new wrinkle or two.

If you are into semantics and open source, the system may be for you.

Stephen E Arnold, July 17, 2015
