Understanding Intention: Fluffy and Frothy with a Few Factoids Folded In

October 16, 2017

Introduction

One of my colleagues forwarded me a document called “Understanding Intention: Using Content, Context, and the Crowd to Build Better Search Applications.” To get a copy of the collateral, one has to register at this link. My colleague wanted to know what I thought about this “book” by Lucidworks. That’s what Lucidworks calls its 25-page marketing brochure. I read the PDF file and was surprised at what I perceived as fluff, not facts or a cohesive argument.


The topic was of interest to my colleague because we completed a five-month review and analysis of “intent” technology. In addition to writing two white papers about using smart software to figure out and tag (index) content, we had to immerse ourselves in computational linguistics, multi-language content processing technology, and semantic methods for “making sense” of text.

The Lucidworks document purports to explain intent in terms of content, context, and the crowd. The company explains:

With the challenges of scaling and storage ticked off the to-do list, what’s next for search in the enterprise? This ebook looks at the holy trinity of content, context, and crowd and how these three ingredients can drive a personalized, highly-relevant search experience for every user.

The presentation of “intent” was quite different from what I expected. The details of figuring out what content “means” were sparse. The focus was not on methodology but on selling integration services. I found this interesting because I have Lucidworks in my list of open source search vendors. These are companies which repackage open source technology, create some proprietary software, and assist organizations with engineering and integrating services.

The book was an explanation anchored in buzzwords, not the type of detail we expected. After reading the text, I was not sure how Lucidworks would go about figuring out what an utterance might mean. The intent-centric systems we reviewed over the course of five months followed several different paths.

Some companies relied upon statistical procedures. Others used dictionaries and pattern matching. A few combined multiple approaches in a content pipeline. Our client, a firm based in Madrid, focused on computational linguistics plus a series of procedures which combined proprietary methods with “modules” to perform specific functions. The goal of this approach was to reduce errors in intent identification, raising accuracy from the 65 to 80 percent range to levels approaching and often exceeding 90 percent. For text processing in multi-language corpora, the Spanish company’s approach was a breakthrough.

I was disappointed but not surprised that Lucidworks’ approach was breezy. One of my colleagues used the word “frothy” to describe the information in the “Understanding Intention” document.

As I read the document, which struck me as a shotgun marriage of generalizations and examples of use cases in which “intent” was important, I made some notes.

Let me highlight five of the observations I made. I urge you to read the original Lucidworks document so you can judge Lucidworks’ arguments for yourself.

Imitation without Attribution

My first reaction was that Lucidworks had borrowed conceptually from ideas articulated by Dr. Gregory Grefenstette in his book Search-Based Applications: At the Confluence of Search and Database Technologies. You can purchase this 2011 book on Amazon at this link. Lucidworks’ approach, unlike Dr. Grefenstette’s, borrowed some of the analysis but omitted the detail which supports the increasing importance of using search as a utility within larger information access solutions. Without that detail, the Lucidworks document struck me as a description of the type of solutions that a company like Tibco is now offering its customers.

Jargonizing

Lucidworks’ “book” includes a number of terms which are not defined. The most obvious examples are content, context, and crowd. But the most egregious omission is the failure to explain “intent.” (In our analysis of intent-centric methods, we included examples of specific ways to figure out what a content object expressed. Content is no longer just text. The mechanisms for making sense of audio, video, compound digital constructs like engineering drawings with product part data, and other types of enterprise data pose challenges for systems identifying intent. Indeed, implementing an intent identification system involves a number of complicated systems and methods.) The reader of the Lucidworks “book” learns nothing beyond the jargon. I was not sure how Lucidworks would discern intent. Context is important, but it goes hand in hand with content. Our research revealed that context remains a difficult problem.

Thus, systems which fail to identify context correctly frequently return off-point data to the system and ultimately to the user who wants relevant information. “Pizza” spoken to a mobile device while driving an automobile is different from a query for pizza issued by a cook preparing dinner. How does Lucidworks deal with this issue? The book does not provide any clues. Finally, the crowd presents a problem. The “crowd” works for large populations. For information access in an organization, a disproportionate number of queries are unique; that is, not many employees ask the same question. The Lucidworks explanation of the crowd ignores or confuses the difference between mass social data, like that available from a Twitter data reseller, and the query log files of an enterprise with 200 employees.

Hand Waving

When I read the Lucidworks “Understanding Intention” document, the Equifax security “problem” was making headlines. However, I know from my work for certain government agencies that there are specialist firms which have purpose-built solutions for fraud and security applications. I do not recall encountering Lucidworks on the lists of vendors who provide certified fraud detection solutions. Lucidworks is on slightly firmer ground when it suggests that it can create applications for an organization’s customers to use. However, details are simply not provided. Lucidworks points out that it can deliver enterprise search. I know that Solr can be used for enterprise search, but the question I had after reading the terse statements in the “Understanding Intention” document was, “Why use Solr instead of Elastic’s Elasticsearch?” Both are based on Lucene. Both are open source. The big question I had was ignored. I found that interesting and somewhat disconcerting.

The View from Harrod’s Creek: Froth and Fluff

Let me share with you some of my personal opinions about this “book”.

In my opinion, Lucidworks is trying to position itself as a thought leader in search-based applications. That’s okay, but the idea is not new, and the failure to include specifics makes what might have been a useful extension of Dr. Grefenstette’s work little more than a marketing and sales lead generation effort.


Usually verbal froth is not based on facts in Harrod’s Creek.

The quite patient investors who have pumped millions into Lucidworks since 2007 may want more than generalizations and frothy assertions. My hunch is that an IPO, sustainable revenues, and meaningful profit are important. A brochure? That’s a cost, not money in the investor’s hand. My personal opinion is that the nasty dust-up between Search Technologies and Lucidworks is one incident that hints at the pressure building on Lucidworks’ management to generate fungible returns for the investors and the banks providing the $6 million in debt financing.

Organizations are struggling with digital information. No one needs to be told about the security challenges online systems face. Multi-language content is a fact of life in today’s world. Mobile access is redefining how work is done and how information is obtained. These are environmental factors turbocharged by wider use of “smart software.” The “Understanding Intention” document purports to explain “intent” but does little more than recycle obvious issues. That’s okay as long as there are meaningful details, high-value analysis, and concrete case examples. Without bedrock, the foundation of Lucidworks seems to be unstable. But that’s just my opinion.

Perhaps Lucidworks will invest time and effort in a white paper that uses facts, data, and specific use case details to answer these questions I think are important:

  • What’s the difference between Elasticsearch and Lucidworks?
  • Which is better for basic search, Elasticsearch or Lucidworks?
  • Which is better for log file analysis and business intelligence, Elasticsearch or Lucidworks?
  • What are the facts backing each nuts-and-bolts comparison?
  • What are the constraints on each system, the Lucene-based Elasticsearch or the Lucene-based Solr? Is Lucene itself an “issue”?
  • Which is less costly to deploy in an organization with 1,0000 employees (users), Elasticsearch or Lucidworks?
  • Which is less costly to operate in years two through five, Elasticsearch or Lucidworks? What drives the costs of the “less expensive” system? Lucene itself, technology, the client’s requirements, other factors?
  • Which is less costly to customize, Elasticsearch or Lucidworks?
  • Which is less vulnerable to an open source community “action” (fork, defection, shift to another project), Elasticsearch or Lucidworks?

For Lucidworks’ “books” to deliver the content payload I want, the company has to scoop off its froth and focus on facts. I think Milton Friedman would agree even as some of his cherished precepts suffer from the fires raging in and around Silicon Valley.

Stephen E Arnold, October 15, 2017

Comments

One Response to “Understanding Intention: Fluffy and Frothy with a Few Factoids Folded In”

  1. Edwin Stauthamer on October 20th, 2017 9:00 am

    You are comparing Elasticsearch with LucidWorks (Fusion?). One can only compare Elasticsearch to Solr.

    LucidWorks Fusion is a suite of solutions that enhance “vanilla” Solr drastically. It is meant to create search driven applications and gives you connectors, index- and query pipelines, analytics, NLP, machine learning etc.
