MaxxCAT Stresses Speed and Openness

November 17, 2010

MaxxCAT has released a new version of their enterprise search solution that allows developers and programmers to customize the platform for their own needs. The press release “MaxxCAT, the World’s Fastest Search Appliance, Now the Most Open Too” details the reasons why MaxxCAT took this step. According to the write up, “The MaxxCAT Lynx Connector framework is an open specification with dedicated kernel support on the appliance that will allow developers, integrators and connector foundries to tap into MaxxCAT’s extreme performance search hardware.”

Users have apparently been requesting this feature and MaxxCAT has launched this product to meet their needs. However, we are concerned at the proliferation of search appliances on the market. Developers have to babysit these appliances and constantly monitor them, taking precious time away from other vital activities. In addition, these appliances are expensive and it’s tough to keep up with the technology that each appliance uses.

We are cautious about hardware or software that is billed as the “fastest.” We are also careful about “open.” Oracle and Apple seem to be nibbling away at certain open software and at some of developers’ assumptions. You can get more information about MaxxCAT at

Laura Amos, November 17, 2010


Exclusive Interview with MaxxCat

April 15, 2009

I spoke with Jim Jackson on April 14, 2009. Maxxcat is a search and content processing vendor delivering appliance solutions. The full text of the interview appears below:

Why another appliance to address a search and content processing problem?

At MaxxCat, we believe that from the performance and cost perspectives, appliance-based computing provides the best overall value. The GSA and Google Mini are the market leaders, but provide only moderate performance at an expensive price point. We believe that by continuously obsessing about performance in the three major dimensions of search (volume of data, speed of retrieval, and crawl/indexing times), our appliances will continue to improve. Software-only solutions cannot match the performance of our appliances. Nor can software-only or general purpose hardware approaches provide the scaling, high availability or ease of use of a gray-box appliance. From an overall cost perspective, even free software such as Lucene may end up being more expensive than our drop-in-and-use appliance.


Jim Jackson, Maxxcat

A second factor that is growing more important is the ease of integration of the appliance. Many of our customers have found unique and unexpected uses for our appliances that would have been very difficult to implement with black box architectures like Google’s. Our entry level appliance can be set up in 3 minutes, comes with a quick start guide that is only 12 pages long, and can be administered from two simple browser pages. That’s it! Conversely, software such as Lucene has to be downloaded, configured, installed, understood, and matched with suitable hardware. This is typically followed by a steep learning curve and consulting fees from experts who are involved in getting a working solution, which sometimes doesn’t work, or won’t scale.

But just because the appliance features easy integration, this does not mean that complex tasks cannot be accomplished with it.  To aid our customers in integrating our appliances with their computing environments, we expose most of the features of the appliance to a web API.  The appliance can be started, stopped, backed up, queried, pointed at content, SNMP monitored, and reported upon by external applications.   This greatly eases the burden on developers who wish to customize the output, crawl behavior and integration points with our appliance.  Of course this level of customization is available with open source software solutions, but at what price?  And most other hardware appliances do not expose the hardware and operating system to manipulation.

Throughput becomes an issue eventually. What are the scaling options you offer?

Throughput is our major concern. Even our entry level appliance offers impressive performance using, for the most part, general purpose hardware.  We have developed a micro-kernel architecture that scales from our SB-250 appliance all the way through our 6 enterprise models.  Our clustering technology has been built to deliver performance over a wide range of the three dimensions that I mentioned before.  Some customers have huge volumes of data that are updated and queried relatively infrequently.  Our EX-5700 appliance runs the MaxxCAT kernel in a horizontal, front-facing cluster mode sitting on top of our proprietary SAN; storage heavy, adequate performance for retrieval.  Other customers may have very high search volumes on relatively smaller data sets (< 1 Exabyte). In this case, the MaxxCAT kernel runs the nodes in a stacked cluster for maximum parallelism of retrieval.  Same operating system, same search hardware, same query language, same configuration files etc, but two very different applications. Both heavy usage cases, but heavy in different dimensions.  So I guess the point I am trying to make is that you can say a system scales, but does it scale well in all dimensions, or can you just throw storage on it?  The MaxxCAT is the only appliance that we know of that offers multiple clustering paradigms from a single kernel.  And by the way, with the simple flick of a switch on one of the two administration screens I mentioned before, the clusters can be converted to H/A, with symmetric load balancing, automatic fault detection, recovery and fail over.

Where did the idea for the MaxxCat solution originate?

Maxxcat was inspired by the growing frustration with the intrinsic limitations of the GSA and Google Mini.  We were hearing lamentations in the market place with respect to pricing, licensing, uptime, performance and integration.  So…we seized the opportunity to build a very fast, inexpensive enterprise search capability that was much more open, and easier to integrate using the latest web technologies and general purpose hardware.  Originally, we had conceived it as a single stand alone appliance, but as we moved from alpha development to beta we realized that our core search kernel and algorithms would scale to much more complex computing topologies. This is why we began work on the clustering, H/A and SAN interfaces that have resulted in the EX-5000 series of appliances.

What’s a use case for your system?

I am going to answer your question twice, for the same price. One of our customers had an application in which they had to continuously scan literally hundreds of millions of documents for certain phrases as part of a service that they were providing to their customers, and marry that data with a structured database. The solution they had in place before working with us was a cobbled together mishmash of SQL databases, expensive server platforms and proprietary software. They were using MS SQL Server to do full text searching, which is a performance disaster. They had queries running on very high end Dell quad core servers maxed out with memory that were taking 22 hours to process. Our entry level enterprise appliance is now doing those same queries in under 10 minutes, but the excitement doesn’t stop there. Because our architecture is so open, they were able to structure the output of the MaxxCAT into SQL statements that were fed back into their application and eliminate 6 pieces of hardware and two databases. And now, for the free, second answer. We are working with a consortium of publishers who all have very large volumes of data, but in widely varying formats, locations and platforms. By using a MaxxCAT cluster, we are able to provide these customers, not customers from different divisions of the same company, but different companies, with unified access to their pooled data. So the benefits in both of these cases are performance, economy, time to market, and ease of implementation.

Where did the name “MaxxCat” come from?

There are three (at least) versions of the story, and I do not feel empowered to arbitrate between the factions. The acronym comes from Content Addressable Technology, an old CS/EE term. Most computer memories work by presenting the memory with an address, and the memory retrieves the content. Our system works in reverse: the system is presented with content, and the addresses are found. A rival group, consisting primarily of Guitar Hero players, claims that the name evokes a double-x fast version of the Unix ‘cat’ command (wouldn’t MaxxGrep have been more appropriate?). And the final faction, consisting primarily of our low level programmers, claims that the name came from a very fast female cat named Max who sometimes shows up at our offices. I would make as many friends as enemies were I to reveal my own leanings. Meow.

What’s the product line up today?

Our entry level appliance is the SB-250, and starts at a price point of $1,995.  It can handle up to several million web pages or documents, depending upon size.  None of our appliances have artificial license restrictions based upon silly things like document counts.  We then have 6 models of our EX-5000 enterprise appliances that are configured in ever increasing numbers of nodes, storage, and throughput.  We really try to understand a customer’s application before making a recommendation, and prefer to do proofs of concept with the customer’s actual data, because, as any good search practitioner can tell you, the devil is in the data.

What is the technical approach of your search and content processing system?

We are most concerned with performance, scalability and ease of use. First of all, we try to keep things as simple as possible, and if complexity is necessary, we try to bury it in the appliance, rather than making the customer deal with it. A note on performance: our approach has been to start with general purpose hardware and a basic Linux configuration. We then threw out most of Linux and built our own operating system that attempts to take advantage of every small detail we know about search. A general purpose Linux machine has been designed to run databases, run graphics applications, handle network routing and sharing, and interface with a wide range of devices and so forth. It is sort of good at all of them, but not built from the ground up for any one of them. This fact is part of the beauty of building a hardware appliance dedicated to one function — we can throw out most of the operating system that does things like network routing, process scheduling, user accounting and so forth, and make the hardware scream through only the things that are pertinent to search. We are also obsessive about what may seem to be picayune details to most other software developers. We have meetings where each line of code is reviewed and developers are berated for using one more byte or one more cycle than necessary. If you watch the picoseconds, the microseconds will take care of themselves.

A lot of our development methodology would be anathema to other software firms. We could not care less about portability or platform independence. Object oriented is a wonderful idea, unless it costs one extra byte or cycle. We literally have search algorithms so obscure that they take advantage of the endianness of the platform. When we want to do something fast, we go back to Knuth, Salton and Hartmanis, rather than reading about the latest greatest on the net. We are very focused on keeping things small, fast, and tight. If we have a choice between adding a feature or taking one out, it is nearly unanimous to take it out. We are all infected with the joy of making code fast and small. You might ask, “Isn’t that what optimizing compilers do?” You would be laughed out of our building. Optimizing compilers are not aware of the meta algorithms, the operating system threading, the file system structure and the like. We consider an assembler a high level programming tool, sort of. Unlike Microsoft operating systems, which keep getting bigger and slower, we are on a quest to make ours smaller and faster. We are not satisfied yet, and maybe we won’t ever get there. Hardware is changing really fast too, so the opportunities continue.

How has the Google Search Appliance affected the market for your firm’s appliance?

I think that the marketing and demand generation done by Google for the GSA is helping to create demand and awareness for enterprise search, which helps us.  Usually, especially on the higher end of the spectrum, people who are considering a GSA will shop a little, or when they come back with the price tag, their boss will tell them “What??? Shop This!”. They are very happy when they find out about us.  What we share with Google is a belief in box based search (they advocate a totally closed black box, we have a gray box philosophy where we hide what you don’t need to know about, but expose what you do).  Both of our companies have realized the benefits of dedicating hardware to a special task using low cost, mass produced components to build a platform.   Google offers massive brand awareness and a giant company (dare I say bureaucracy).  We offer our customers a higher performing, lower cost, extensible platform that makes it very easy to do things that are very difficult with the Google Mini or GSA.

What hooks / services does your API offer?

Every function that is available from the browser based user interface is exported through the API. In fact, our front end runs on top of the API, so customers so inclined could rewrite or re-organize the management console. Using the API, detailed machine status can be obtained: core temperature, queries per minute, available disk space, current crawl stats, errors and console logs are all at the user’s fingertips. Furthermore, collections can be added, dropped, scheduled and downloaded through the API. Our configuration and query languages are simple, text based protocols, and users can use text editors or software to generate and manipulate the control structures. Don’t like how fast the MaxxCAT is crawling your intranet, or when? Control it with external scheduling software. We don’t want to build that and make you learn how to use it. Use Unix cron for that if that’s what you like and are used to. For security reasons, do you want to suspend query processing during non-business hours? No problem. Do it from a browser or do it from a mainframe.
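As a concrete illustration of the kind of external scripting Jackson describes, here is a minimal sketch of driving an appliance’s web API from a cron-scheduled script. The host name, endpoint paths, and parameter names here are hypothetical placeholders, not MaxxCAT’s documented API:

```python
import json
import urllib.parse
import urllib.request

APPLIANCE = "http://maxxcat.example.local"  # hypothetical appliance address


def api_url(action, **params):
    """Build a URL for a hypothetical appliance API call."""
    query = urllib.parse.urlencode(params)
    return f"{APPLIANCE}/api/{action}?{query}" if query else f"{APPLIANCE}/api/{action}"


def machine_status():
    """Fetch machine status (core temperature, queries/minute, disk, crawl stats)."""
    with urllib.request.urlopen(api_url("status")) as resp:
        return json.load(resp)


def suspend_queries():
    """Suspend query processing, e.g. outside business hours."""
    with urllib.request.urlopen(api_url("queries", state="suspended")) as resp:
        return resp.status == 200

# A cron entry could then invoke this script nightly, as the interview suggests:
#   0 19 * * 1-5  python3 suspend_queries.py
```

The point of the sketch is the design choice the interview emphasizes: because control is plain HTTP, any external scheduler or monitoring tool can drive the appliance without appliance-specific tooling.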

We also offer a number of protocol connectors to talk to external systems — HTTP, HTTPS, NFS, FTP, ODBC.  And we can import the most common document formats, and provide a mechanism for customers to integrate additional format connectors.  We have licensed a very slick technology for indexing ODBC databases. A template can be created to create pages from the database and the template can be included in the MaxxCAT control file.  When it is time to update say, the invoice collection, the MaxxCAT can talk directly to the legacy system and pull the required records (or those that have changed or any other SQL selectable parameters), and format them as actual documents prior to indexing.  This takes a lot of work off of the integration team.  Databases are traditionally tricky to index, but we really like this solution.
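The template idea can be sketched in a few lines: rows pulled from a legacy database are formatted as plain documents before indexing. The template syntax and field names below are invented for illustration; MaxxCAT’s actual control-file format is not shown here:

```python
import string

# Hypothetical template: one indexable "page" per database row.
INVOICE_TEMPLATE = string.Template(
    "Invoice $invoice_id\nCustomer: $customer\nAmount: $amount\n"
)


def rows_to_documents(rows):
    """Format SQL result rows as documents prior to indexing,
    in the spirit of the template-based ODBC extractor described above."""
    return [INVOICE_TEMPLATE.substitute(row) for row in rows]


# Rows as they might come back from an ODBC query against the invoice table.
rows = [
    {"invoice_id": "1001", "customer": "Acme", "amount": "250.00"},
    {"invoice_id": "1002", "customer": "Globex", "amount": "980.50"},
]
docs = rows_to_documents(rows)
```

Each generated document then indexes like any crawled page, which is why this approach takes work off the integration team: the search side never needs to know the database schema.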

With respect to customizing output, we emit a standard JSON object that contains the result and provide a simple templating language to format those results.  If users want to integrate the results with SSIs or external systems, it is very straightforward to pass this data around, and to manipulate it.  This is one area where we excel against Google, which only provides a very clunky XML output format that is server based, and hard to work with.  Our appliance can literally become a sub-routine in somebody else’s system.
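To show how straightforward passing a JSON result object around can be, here is a toy consumer. The result shape (a `results` list with `title`, `url`, and `score` fields) is an assumption for illustration, not the appliance’s documented schema:

```python
import json

# A stand-in for the JSON object a search appliance might emit.
raw = json.dumps({
    "query": "turbine blades",
    "results": [
        {"title": "Blade fatigue report", "url": "http://intranet/r1", "score": 0.97},
        {"title": "Turbine maintenance",  "url": "http://intranet/r2", "score": 0.84},
    ],
})


def render(raw_json, template="{score:.2f}  {title}  <{url}>"):
    """Apply a simple line template to each hit, in the spirit of a
    result-templating language."""
    data = json.loads(raw_json)
    return [template.format(**hit) for hit in data["results"]]


lines = render(raw)
# lines[0] → "0.97  Blade fatigue report  <http://intranet/r1>"
```

Because the payload is plain JSON, the same object can feed an SSI, a downstream database, or another application, which is the “sub-routine in somebody else’s system” idea.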

What are new features and functions added since the last point release of your product?

Our 3.2 OS (not yet released) will provide improved indexing performance, a handful of new API methods, and most exciting for us, a template based ODBC extractor that should make pulling data out of SQL databases a breeze for our customers.   We also have scheduled toggle-switch H/A, but that may take a little more time to make it completely transparent to the users.

Consolidation and vendors going out of business like SurfRay seem to be a feature of the search sector. How will these business conditions affect your company?

Another strange thing about MaxxCAT, in addition to our iconoclastic development methods, is our capital structure. Unlike most technology companies, especially young ones, we live off of revenue, not equity infusions. And we carry no debt. So we are somewhat insulated from the current downturn in the capital markets, and intend to survive on customers, not investors. Our major focus is to make our appliances better and faster. Although we like to be involved in the evaluation process with our customers, in all but the most difficult of cases, we prefer to hand off the implementation to partners who are familiar with our capabilities and who can bring in-depth enterprise search know how into the mix.

Where do I go to get more information?

Visit or email

Stephen Arnold, April 15, 2009

Maxxcat: Search Appliance Challenger

January 30, 2009

Maxxcat Corp. has released an enterprise search appliance to compete with Google’s. The Maxxcat XB-250 was designed with simplicity and speed in mind, and it offers all sorts of bells and whistles such as clustering and mirroring, customizable rankings, scripts and real time edits, field-based indexing, and remote support for diagnostics. They’ve posted a fact sheet at comparing the XB-250 and the GSA Mini (info available at, and the more complex MaxxCat EX-5000 versus the Google™ GB-1001. For more specific details, check out Maxxcat says its appliance is 16 times faster, takes three minutes to install, and is only a quarter of the cost for its basic model. You can also take the two systems for a head-to-head performance test at Added to our watch list. More later.

Jessica W. Bratcher, January 30, 2009

Another Crazy Enterprise Search Report

October 18, 2020

“Enterprise Search Market Investment Analysis | Dassault Systemes, Oracle, HP Autonomy, Expert System Inc.” may be a knock out report, but its presentation of the company’s nuanced understanding is like hitting an egg with a feather. The effort appears to be there, but the result is an intact egg.

You can learn about this omelet of a report at this link. The publisher is PRnewsleader, which seems to be one of the off brand SEO centric content outputters.

The first thing I noticed about this report was the list of vendors in the document; to wit:

  • Coveo Corp.
  • Dassault Systèmes
  • Esker Software
  • Expert System
  • HP Autonomy
  • IBM Corp.
  • Perceptive Software
  • Polyspot and Sinequa
What jumped out at me was the inclusion of Polyspot and Sinequa. Polyspot was acquired years ago by an outfit called oppScience. The company offers Bee4Sense and lists information retrieval as a solution. As far as I know, oppScience is a company based in Paris, not on a street once known for fish sales. Sinequa is a separate company. True, it once positioned itself as an enterprise search developer. That core capability has been wrapped in buzzwordery; for example, “insight platform.” Therefore, listing two companies incorrectly as one illustrates a minor slip-up.

I also noticed the inclusion of Esker Software. This company is a process automation outfit, and it says that it has an artificial intelligence capability. (Doesn’t every company today?) Esker is into the cloud, and its search technology is a bullet point, not the white paper/journal article/rah rah approach used by Lucidworks.

And what about Elasticsearch? What about Algolia (with former Dassault Exalead DNA, I heard)? What about Voyager Search? What about Maxxcat? And there are other vendors.

What’s amusing is that the authors of this report are able to set forth:

forecasts for Enterprise Search investments till 2029.

Okay, that’s almost a decade in the Era of the Rona. I am not sure what’s going on tomorrow. Predicting search in 2029 is Snow Crash territory. But I am confident the authors of this report are intrepid researchers who just happened to overlook the Polyspot Sinequa mistake. What else has been overlooked?

Stephen E Arnold, October 18, 2020

Enterprise Search: Not Exactly Crazy but Close

April 13, 2020

I think I started writing the first of three editions of the Enterprise Search Report in 2003. I had been through the Great Search Procurement competition for the US government’s search system. The original name for the service was FirstGov (the idea was that the service was the “first” place to look for public facing government information). The second name was, and the service was different from FirstGov because the search results were pulled from an ad supported Web index.

The highlight of the competition was Google’s losing the contract to Fast Search & Transfer. (Note: The first index exposed to the public was the work of Inktomi, a company mostly lost in the purple mists of Yahoo and time.) Google was miffed because Fast Search & Transfer had teamed with AT&T and replied to the SOW with some of the old fervor that characterized the company before Judge Greene changed the game. I recall one sticking point: truncation. In fact, one of the Google founders argued with me about truncation at a search conference. I pointed out that Google had to do truncation whether the founders wanted to or not. My hunch is that you don’t know much about truncation and what it contributes. I won’t get into the weeds, but the function is important. Think stemming, inflections, etc.
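For readers unfamiliar with the function: truncation lets a query term like `index*` match every word in the index vocabulary that shares the prefix. A minimal sketch, using an invented toy vocabulary:

```python
def truncation_matches(term, vocabulary):
    """Expand a truncated query term like 'index*' to every
    vocabulary word sharing the prefix."""
    if term.endswith("*"):
        prefix = term[:-1]
        return sorted(w for w in vocabulary if w.startswith(prefix))
    # No wildcard: the term matches only itself, if indexed.
    return [term] if term in vocabulary else []


vocab = {"index", "indexes", "indexing", "indexed", "inquiry", "stem"}
hits = truncation_matches("index*", vocab)
# hits → ['index', 'indexed', 'indexes', 'indexing']
```

Stemming attacks the same problem from the other side, normalizing “indexing” and “indexed” to a common stem at index time; truncation expands the query at search time. A production engine has to handle one or the other, which was the point of the argument.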

I examined more than 60 “enterprise” search systems, including the chemical structure search systems, the not-so-useful search tools in engineering design systems like AutoCAD, and a number of search systems now long forgotten like Delphis and Entopia, among others.

I have also written “The New Landscape of Search” published by Pandia and “Successful Enterprise Search Management” with Martin White, who is still chugging along with his brand of search expertise. Of course, I follow search and retrieval even though I have narrowed my focus to what I call intelware and policeware. These are next-generation systems which address the numerous shortcomings of the oversold, over-hyped, and misunderstood software allowing a commercial enterprise to locate specific items of interest from its hotchpotch of content.

In this blog, Beyond Search/DarkCyber I write about some enterprise search systems. In general, I remain very critical of the technologies and the mostly unfounded assertions about what a search-and-retrieval system can deliver to an organization.

With this background, I reacted to “Enterprise Search Software Comparison” with sadness. I was not annoyed by the tone or the desire to compare some solutions to enterprise content finding. My response was based on my realization of how far behind the write up was: in its grasp of enterprise search’s upsides and downsides, in its sense of the gap between next-generation information retrieval systems and the “brand” names, and in its somewhat shallow understanding of the challenges enterprise search poses for licensees, vendors, and users.

The write up “compares” these systems as listed in the order each is discussed in the source article cited above:

  • IBM Watson Discovery
  • Salesforce Einstein Search
  • Microsoft Search
  • Google Cloud Search
  • Amazon Kendra
  • Lucidworks
  • AlphaSense.

Each of these systems merits a couple of paragraphs. For comparison, the discussion of systems in the Enterprise Search Report typically required 15 or more pages. In CyberOSINT, I needed four pages for each system described. I had to cut the detail to meet the page limit for the book. A paragraph may be perfect for the thumb typing crowd, but detail does matter. The reason is that a misstep in selecting enterprise software can cost time, money, and jobs. The people usually fired are those serving on the enterprise search system procurement team. Why? CFOs get very angry when the triage to make a system work costs more than the original budget for the system. Users get angry when the system is slow (try waiting 120 seconds to find a document in a content management system, only to learn the document has not been indexed). Stakeholders get angry when the investment in search cannot be recovered without tricks, often illegal ones. And there are similar serious issues.

Let’s look at each of these systems described in the write up. I am going to move forward in alphabetical order. The listing in the source implies best to worst, and I want to avoid that. Also, at the end of this post, I will identify a few other systems which anyone seeking an enterprise search system may want to learn about. I post free profiles at The newer profiles cost money, and you can contact me at benkent2020 at yahoo dot com. No, I won’t give you a free copy. The free stuff is on my Web site.

AlphaSense. This is a venture backed company focused on making search the sharp end of a business intelligence initiative. The company is influenced by Eric Schmidt, the controversial Xoogler. The firm has raised about $100 million. The idea is to process disparate information and allow users to identify gems of information. AlphaSense competes with next-generation information services like DataWalk, Voyager Labs, and dozens of other forward looking firms. Will AlphaSense handle video, audio, time series data, and information stored on a remote workers’ laptop? Yeah. To sum up: Not an enterprise search solution; it is a variant of intelware. That’s no problem. AlphaSense is a me too of a different category of software.

Amazon Kendra. Amazon has a number of search solutions. This is Lucene. Yes, Lucene can deliver enterprise search; however, the system requires a commitment. Amazon’s approach is to put enterprise search into AWS. There’s nothing quite like the security of AWS in the hands of individuals who have not been “trained” in the ways of Amazon and Lucene.

Google Cloud Search. This is the spirit of the ill fated Google Search Appliance. The problems of GSA are ameliorated by putting content into the Google Cloud. What’s Google’s principal business? Yep, advertising. Those Googlers are trustworthy: Infidelity among senior managers raises this question, “Can we trust you to keep your body parts out of our private data?” You have to answer that question for yourself. (Sorry. Can’t say. Legal eagles monitor me still.)

IBM Watson Discovery. Okay, this is Lucene, home brew, and acquired technology like Vivisimo. Does it work? Why not ask Watson? IBM does have robust next-generation search, but that technology, like IBM CyberTap, is not available to the author of the article or to most commercial organizations. So IBM has training-wheels search which requires oodles of IBM billable hours. Plus the company has next-generation information access. Which is it? Why not ask Watson? (If you used ITRC in the 1980s, you experienced my contribution to Big Blue. Plus I took money. None of that J5 stuff either.)

Salesforce Einstein Search. If a company puts its sales letters and contacts into this system, one can find the prospect and the email a salesperson sent that individual. Why do companies want Salesforce search? When a salesperson quits, the company wants to make sure it has the leads, the sales story, etc. There are alternatives to Salesforce’s search system. Why? Maybe there are sufficient numbers of Salesforce customers who want to control what’s indexed and what employees can see? Just a thought.

Microsoft Search. I would like to write about Microsoft Search. (Yep, did a small thing for this outfit.)  I would like to identify the acquisitions Microsoft completed to “improve” search. I would like to point out that Microsoft is changing Windows 10 search again. But that’s the story. One flavor of Microsoft Search is Fast Search & Transfer. It is so wonderful that a competitive solution is available from outfits like Surfray, EPI Server, and even Coveo (yep, the customer support and kitchen sink vendor). Why? Microsoft Search is very similar to the Google search: Young people fooling around in order to justify their salaries and sense of self worth. The result? I particularly like the racist chat bot and the fact that Microsoft bought Fast Search & Transfer as the criminal case for financial fraud was winding through Norway’s court system. Yep, criminal behavior. Why? Check out my previous write ups about Fast Search & Transfer.

Lucidworks. Okay, I did some small work for this outfit when it was called Lucid Imagination. Then the revolving door started to spin. The Lucene/Solr system collected many, many millions and started its journey to … wait for it… digital commerce and just about anything that could be slapped on open source software. Can one “do” enterprise search with Solr? Sure. Just make sure you have money and time. Lucidworks’ future is not exactly one that will thrill its funding sources. But there is hope for an acquisition or maybe an IPO. Is Lucidworks a way to get “faceted search” like Endeca offered in 1998? Sure, but why not license Endeca from Oracle? Endeca has some issues, of course, but I wanted to put a time mark in this essay so the “age” of Lucidworks’ newest ideas are anchored with a me-too peg.

What vendors are not mentioned who can implement enterprise search?

I will highlight four briefly, just to make clear the distortion of the enterprise search market that this article presents to a thumb typing millennial procurement professional:

  1. Exalead spawned a number of interesting content companies. One of them is Algolia. It works and has some Exalead DNA.
  2. SearchIT is an outfit in Europe. It delivers what I consider a basic enterprise search system.
  3. Maxxcat produces a search appliance which is arguably a bit more modern than the Thunderstone appliance.
  4. Elastic Elasticsearch. This is the better Compass. How many outfits use Elasticsearch? Lots. There’s a free version and for-fee help when fans of Shay Bannon get stuck. Check out this how to.
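For Elasticsearch in particular, the barrier to a first experiment is low. A sketch of building the query DSL body for a basic full-text match query, which would be POSTed to an index’s `_search` endpoint; the index name, field name, and node address are placeholders:

```python
import json


def match_query(field, text, size=10):
    """Build an Elasticsearch query DSL body for a simple match query."""
    return {
        "size": size,
        "query": {"match": {field: {"query": text}}},
    }


body = match_query("content", "enterprise search appliance", size=5)
payload = json.dumps(body)
# POST payload to http://localhost:9200/docs/_search (assuming a local node)
```

That a usable query is a dozen lines of JSON is much of why Elasticsearch displaced heavier alternatives, though, as with raw Lucene, scaling and securing a cluster is where the free lunch ends.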

There are others, of course, but my point is that mixing apples and oranges gives one a peculiar view of what is in the enterprise search orchard. It is better to categorize, compare and contrast systems that perform “enterprise search” functions. What are these? It took me 400 pages to explain what users expect, what systems can deliver, and the cost/engineering assumptions required to deliver a solution that is actually useful.

Search is hard. The next-generation systems point the way forward. Enterprise search has, in my opinion, not advanced very far beyond the original SMART system or IBM STAIRS III.

PS. Notice I did not use the jargon natural language processing, semantics, text analytics, and similar hoo haa. Why? Search has a different meaning for each worker in quite distinct business units. Do you expect a chemical engineer looking for Hexamethylene triperoxide diamine to use a word or a chemical structure? What about a marketing person seeking a video of a sales VP’s presentation at a client meeting yesterday? What about that intern’s Instagram post of a not-yet-released product prototype? What about the information on that sales VP’s laptop as he returns to his home office after a news story appeared about his or her talk? What about those human resource personnel data files? What about the eDiscovery material occupying the company’s legal team? What about the tweet a contractor sent to a big client about the cost of a fix to a factory robot that trashed a day’s production? What about the emails between an executive and a sex worker related to heroin? (A real need at a certain vendor of enterprise search!) Yeah! Enterprise search.

Stephen E Arnold, April 13, 2014

New Enterprise Search Market Study

August 1, 2017

Don Quixote and Solving Death: No Problem, Amigo

I read “Global Enterprise Search Market 2017-2022.” I was surprised that a consulting firm would invest time and energy in writing about a market sector which has not been thriving. Now don’t start sending me email about my lack of cheerfulness about enterprise search. The sector is surviving, but it is doing so with approaches that are disguised as applications which deliver something other than inflated expectations, business closures, and lawsuits.

I will slay the beast that is enterprise search. “Hold still, you knave!”

First, let’s look at what the report covers; then I will tackle some of the issues I think about as the author of the Enterprise Search Report and a number of search-related articles and analyses. (The articles are available from the estimable Information Today Web site, and the free analyses may be located at

The write up told me that enterprise search boils down to these companies:

Coveo Corp
Dassault Systemes
IBM Corp

Coveo is a fork of Copernic. Yep, it’s a proprietary system which originally was focused on providing search for Microsoft. Now the company has spread its wings to include a raft of functions which range from the cloud to customer support / help desk services.

Dassault Systèmes is the owner of Exalead. Since the acquisition, Exalead as a brand has faded. The desktop search system was killed, and its proprietary technology lives on mostly as a replacement for Dassault’s internal search system which was based on Autonomy. Most of the search wizards have left, but the Exalead technology was good before Dassault learned that selling search was indeed a challenge.

IBM offers a number of products which include open source Lucene, acquired technology like Vivisimo’s clustering engine, and home-brew code from its IBM wizards. (Did you know that the precursor of PageRank was an IBM “invention”?) The key is that IBM uses search to sell services which have higher margins than providing a free version of brute force information access.


Five Years in Enterprise Search: 2011 to 2016

October 4, 2016

Before I shifted from worker bee to Kentucky dirt farmer, I attended a presentation in which a wizard from Findwise explained enterprise search in 2011. In my notes, I jotted down the companies the maven mentioned (love that alliteration) in his remarks:

  • Attivio
  • Autonomy
  • Coveo
  • Endeca
  • Exalead
  • Fabasoft
  • Google
  • IBM
  • ISYS Search
  • Microsoft
  • Sinequa
  • Vivisimo

There were nodding heads as the guru listed the key functions of enterprise search systems in 2011. My notes contained these items:

  • Federation model
  • Indexing and connectivity
  • Interface flexibility
  • Management and analysis
  • Mobile support
  • Platform readiness
  • Relevance model
  • Security
  • Semantics and text analytics
  • Social and collaborative features

I recall that I was confused about the source of the information in the analysis. At the time, the murky family tree seemed important. Five years later, I am less interested in who sired what child than in the historical nuggets buried in this simple list and its collection of pretty fuzzy, downright crazy characteristics of search. I am not too sure what “analysis” and “analytics” mean. The notion that an index is required is okay, but the blending of indexing and “connectivity” seems a wonky way of referencing file filters or a network connection. With the Harvard Business Review pointing out that collaboration is a bit of a problem, it is an interesting footnote that a buzzword can grow into a time sink.


There are some notable omissions; for example, open source search options do not appear in the list. That’s interesting because Attivio was, I heard, poking its toe into open source search at that time. IBM was a fan of Lucene five years ago. Today the IBM marketing machine beats the Watson drum, but inside the Big Blue system resides that free and open source Lucene. I assume that the gurus and the mavens working on this list ignored open source because what consulting revenue results from free stuff? What happened to Oracle? In 2011, Oracle still believed in Secure Enterprise Search, only to recant with the purchases of Endeca, InQuira, and RightNow. There are other glitches in the list, but let’s move on.


Attensity: A Big 404 in Text Analytics

October 1, 2016

Search vendors can save their business by embracing text analytics. Sounds like a wise statement, right? I would point out that our routine check of search and content processing companies turned up this inspiring Web page for Attensity, the Xerox PARC love child and once-hot big dog in text analysis:


Attensity joins a long list of search-related companies which have had to reinvent themselves.

The company pulled in $90 million from a “mystery investor” in 2014. A pundit tweeted in 2015:


In February 2016, Attensity morphed into Sematell GmbH, a company offering interaction solutions.

I mention this arabesque because it underscores:

  1. No single add-on to enterprise search will “save” an information access company.
  2. Enterprise search has become a utility function. Witness the shift to cloud-based services like SearchBlox, appliances like Maxxcat, and open source options. Who will go out on a limb for a proprietary utility when open source variants are available and improving?
  3. Pundits who champion a company often have skin in the game. Self-appointed experts for cognitive computing, predictive analytics, or semantic link analysis are tooting a horn without other instruments.

Attensity is a candidate to join the enterprise search Hall of Fame. In the shrine are Delphes, Entopia, et al. I anticipate more members, and I have a short list of “who is next” taped on my watch wall.

Stephen E Arnold, October 1, 2016

Enterprise Search Vendors: A Partial List

June 24, 2016

I spoke with a confused and unbudgeted worker bee at a giant outfit this weekend. The stellar professional was involved in figuring out what to do about enterprise search. The story is one I have heard many times in the last 40 years. The system doesn’t meet the needs of the users. The system is over budget. The system does not index in real time. Yadda yadda yadda.

The big question was, “What enterprise search vendors offer a system which actually works and does not produce downtime, cost overruns, and user outrage?” Note that this is not the word “outage.” The word is “outrage.”

I don’t know of such a system. As a helpful 72-year-old, I rattled off a list of vendors who purport to offer Big Data-capable, next-generation semantic-linguistic-NLP systems. True to form, I repeated the list twice. I thought he would cry.

For those of you who want to know the vendors I plucked from my list of outfits in the search and content processing game, I reproduce the list. If you want upsides, downsides, license fees, gotchas, and other assorted details, I will provide the information. But since you are not likely to buy me dinner this evening, you will have to pay for my thoughts.

Here’s the selected list. Reader, start your browser:

  • Attivio
  • Coveo
  • dtSearch
  • Elasticsearch (Lucene)
  • Fabasoft Mindbreeze
  • IBM Omnifind
  • IHS Goldfire
  • Lookeen
  • Lucidworks (Solr)
  • MarkLogic
  • Maxxcat
  • PolySpot
  • Sinequa
  • Solcara
  • Squiz Funnelback
  • Thunderstone
  • X1
  • Yippy

There are quite a few other outfits whose systems do search, Palantir among them, but I trimmed the list for my worried pal.

What’s interesting is that most of these outfits explain that their systems are much, much more than search and retrieval. Believe it or not, as Mr. Ripley used to say.

Factoid: Most of these outfits have been around for quite a few years. Only Elasticsearch has managed to become a “brand” in the search space. What happened to Autonomy, Convera, Endeca, Fast Search & Transfer, and Verity since I wrote the first three editions of the Enterprise Search Report between 2003 and 2007? Ugly for some.

Search is a tough problem and has yet to deliver what users expect. Remember Google killed its search appliance. Ads are a better business because they spell money for Alphabet.

Stephen E Arnold, June 24, 2016

Yippy for Vivisimo

June 6, 2016

I read “Yippy Buys MC+A, a Veteran Google Search Appliance Partner.” Yippy, if memory serves, is a variant of Vivisimo. In the good old days, before Vivisimo sold to IBM and suddenly became a Big Data company, Yippy did search and retrieval. I assume the old document limits were lifted. I also assume that the wild and crazy config file editing has been streamlined. I also assume that Yippy is confident it can zoom past Maxxcat and Thunderstone, two outfits also in the search appliance business. Buying a Google reseller provides some insight into, and maybe sales leads about, which companies embraced the GSA solution. When the GSA was first demonstrated to me, I noted the locked-down relevance system. There were other interesting “enhancements” the Googlers included to eliminate the complexity of enterprise search. I recall working on the training materials for a DC reseller of the GSA. Customization was like a Google interview question.

The Fortune write up is one of those reinventions of enterprise search which I enjoy. I circled this comment:

Google Search Appliance was a great idea for companies that deploy a welter of different applications, so important data can be scattered about in different file systems and repositories. It also gave Google a toehold in corporate server rooms, which is why some wondered why Google would cut the product at a time when it’s trying to sell more cloud services to these very companies.

I was unaware that Alphabet Google was in the search business. I thought it was an online advertising outfit. Who at Google wanted to work on the wonky GSA products? For years Google relied on resellers and outfits like Dell to make the overpriced gizmos.

Long live Vivisimo. I mean Yippy. If you cannot pin down an integrator, why not buy one?

Stephen E Arnold, June 6, 2016
