Lucidworks: The Future of Search Which Has Already Arrived

August 24, 2017

I am pushing 74, but I am interested in the future of search. The reason is that with each passing day I find it more and more difficult to locate the information I need in my routine research for my books and other work. I was anticipating a juicy read when I requested a copy of “Enterprise Search in 2025.” The “book” is a nine-page PDF. After two years of effort and much research, my team and I were able to squeeze the basics of Dark Web investigative techniques into about 200 pages. I assumed that a nine-page book would deliver a high-impact payload comparable to one of the chapters in one of my books like CyberOSINT or Dark Web Notebook.

I was surprised that a nine-page document was described as a “book.” I was even more surprised by Lucidworks’ description of the future. For me, Lucidworks is describing information access already available to me and most companies from established vendors.

The book’s main idea in my opinion is as understandable as this unlabeled, data-free graphic which introduces the text content assembled by Lucidworks.

image

However, the pamphlet’s text does not make this diagram understandable to me. I noted these points as I worked through the basic argument that client-server search is on the downturn. Okay. I think I understand, but the assertion “Solr killed the client-server stars” was interesting. I read this statement and highlighted it:

Other solutions developed, but the Solr ecosystem became the unmatched winner of the search market. Search 1.0 was over and Solr won.

In the world of open source search, Lucene and Solr have gained adherents. Based on the information my team gathered when we were working on an IDC open source search project, the dominant open source search system was Lucene. If our data were accurate when we did the research, Elastic’s Elasticsearch had emerged as the go-to open source search system. The alternatives like Solr and Flaxsearch have their users and supporters, but Elastic, founded by Shay Banon, was a definite step up from his earlier search service called Compass.

In the span of two and a half years, Elastic had garnered more than $100 million in funding by 2014 and expanded into a number of adjacent information access market sectors. Reports I have received from those attending Elastic meetings were that Elastic was putting considerable pressure on proprietary search systems and a bit of a squeeze on Lucidworks. Google’s withdrawing its odd duck Google Search Appliance may have been, in small part, due to the rise of Elasticsearch and the changes made by organizations trying to figure out how to make sense of the digital information to which their staff had access.

But enough about the Lucene-Solr and open source versus proprietary search yin and yang tension.

Lucidworks believes that the future is data from multiple sources. Here in Harrod’s Creek, the future has already arrived. For us and for many involved in information access, Lucidworks’ big idea is an old one. I recall that Fulcrum Technologies included in its presentations in the 1980s its system’s ability to index different types of content normalized when part of the incredibly overwrought SharePoint which Microsoft asserted was its enterprise application for the future. Remember. This was in the 1980s. Vivisimo pitched “federated content.” Other vendors made the same claim. Today dozens of vendors deliver federated information access.

Anyone who has been involved in large-scale federation projects knows that there are some sticky wickets between a user and information tucked into proprietary systems, locked behind poorly maintained access protocols, publishers’ copyright, and digital content which flows from mobile devices as images and video. The task is to make sure the content processing system can deal with content, often in the form of well-formed XML. Then once in that form, search and analytic tools can be used to figure out what information is available to answer a user’s or a software system’s query.

Lucidworks sees one problem as storage. I don’t. Some key issues were not at all related to storage. Consider infrastructure, bandwidth, updating indexes in use in near real time or real real time, data integrity, editorial controls to know who added what and when, and basic computational challenges. Think Big O. Some of these issues remain complicated to solve in an acceptable manner. Others like human subject matter experts fiddling with rules and tuning “smart” software are essentially as complicated today as they were in 1950 when mainframe search was available with STAIRS III as the plumbing.

Even in best-case implementations of enterprise search, hurdles exist; for example, the challenges of nonstandard data types, flows of real-time information from monitoring systems, audio, video, images, chat bot transcripts, and engineering drawings which combine vector data with database information about suppliers and parts remain. These problems persist to this day in most organizations using the most sophisticated cyber tools available.

Toss in access controls and integration across distributed, disparate information technology systems with content in multiple languages, especially in companies which grow via acquisition. What have you to solve? A quite complicated set of problems. Money and time along with technology are the reasons enterprise search which tries to be more than keyword matching earns a justified reputation as a problem among some information technology professionals. (Investors are another group concerned about enterprise search. Their money is at stake, and the winners in enterprise search have been few compared with the large number of companies which set forth to become an Autonomy, Microsoft Fast, or Exalead. Keep in mind that the “success” of these vendors may have some footnotes attached.)

Even the mighty Google does not deliver search results from Web data, Google Books, and Google Analytics. The reasons are not new ones: Cost and complexity pose hurdles even for the most savvy of online information access companies.

Lucidworks seems oblivious to these issues, preferring to explain the future in an earthworm-like listing of assertions; for instance:

  • Search can deal with Big Data
  • Data can be personalized
  • Solutions will be “cloud hybrid”
  • Voice interfaces allow one to talk to a computing device
  • Search will be predictive
  • Search will be everywhere

In my two most recent books, CyberOSINT: Next Generation Information Access and the Dark Web Notebook, I describe a number of companies which are delivering information access, analytic functions, and the functions which Lucidworks says will characterize 2025.

If my team’s research is accurate, tomorrow’s functions are available today and can be licensed from companies ranging from the giant BAE Systems to the Google-backed Recorded Future. In fact, I profile more than 25 companies delivering 2025 solutions today. Yes, you can send a letter to any of these companies’ sales departments and get access quickly.

What’s disappointing about Lucidworks’ nine-page book is that it offers a view of a future that has already arrived.

Instead of setting up a Palantir trial or getting a Recorded Future demo, Lucidworks recommends these steps to execute to get ready for the future; for example:

  • Use software that can work on premises or from the cloud
  • Avoid pitfalls
  • Have the “right” expertise
  • Create a “permanent technology refresh plan”
  • Deploy new capabilities
  • Use “smart” a/b testing
  • Work on data quality.

I think that someone at Lucidworks looked at a page of recommendations from a second-class consulting firm and practiced “green” thinking. Recycling is useful for plastic, but I am not too sure it is applicable to a complex concept like information access. I am tempted to make a schoolmaster comment about the suggestion to “avoid pitfalls.” I will content myself with mentioning that Secretary Rumsfeld’s observation about known knowns, known unknowns, and unknown unknowns makes clear the difficulty of “avoiding pitfalls.” Many victims of glib enterprise search marketers can attest to that. Also, there is the “right” expertise, as if a person knows who is and who is not an expert able to handle a specific information access issue. Plus, there is the notion of “working on data quality” when “quality” is a bit of a slippery fish.

To me, these statements are an attempt to express “wisdom” without evidence of authority or hard facts.

So what?

I think that the notion of offering a book with nine pages is an SEO-type gimmick. The gray type may cause some who gain access to the “book” to focus only on the readable headlines and the orange highlights. But despite the lack of contrast on the page, I labored through the text and formulated three probably obvious observations:

  1. Elastic, a Lucidworks nemesis, may interpret the “book” as a document which nudges those interested in open source search toward Elasticsearch and its suite of tools and services
  2. Lucidworks has not yet found a way to explain why the company’s solution is the best one based on verifiable metrics like time to deploy, cost to deploy, specific use cases, and similar factual information
  3. The “book” can be interpreted as evidence of the company’s lack of sophistication about where search is now and what search might become in eight years.

In short, for me the future of search in 2025 is here today. Predicting the future is difficult, but making information available to those who need answers is also difficult. Facts, not MBA generalizations, have been more useful in my experience.

I still don’t know what the graphic means. Maybe a way to depict a lack of focus? Entropy? Just another pretty but meaningless visualization?

Stephen E Arnold, August 24, 2017
