The AIIM Enterprise Search Study 2014

October 10, 2014

I worked through the 34 page report “Industry Watch. Search and Discovery. Exploiting Knowledge, Minimizing Risk.” The report is based on a sampling of 80,000 AIIM community members. The explanation of the process states:

Graphs throughout the report exclude responses from organizations with less than 10 employees, and suppliers of ECM products and services, taking the number of respondents to 353.

The demographics of the sample were tweaked to discard responses from organizations with fewer than 10 employees. The sample included respondents from North America (67 percent), Europe (18 percent) and “rest of world” (15 percent).

Some History for the Young Reader of Beyond Search

AIIM has roots in imaging (photographic and digital imaging). Years ago I spent an afternoon with Betty Steiger, a then well known executive with a high profile in Washington, DC’s technology community. She explained that the association wanted to reach into the then somewhat new technology for creating digital content. Instead of manually indexing microfilm images, AIIM members would use personal computers. I think we connected in 1982 at her request. My work included commercial online indexing, experiments in full text content online, a CD ROM produced in concert with Predicasts and Lotus, and automated indexing processes invented by Howard Flank, a sidekick of mine for a very long time. (Mr. Flank received the first technology achievement award from the old Information Industry Association, now the SIIA.)

AIIM had its roots in the world of microfilm. And the roots of microfilm reached back to University Microfilms at the close of World War II. After the war, innovators wanted to take advantage of the marvels of microimaging and silver-based film. The idea was to put lots of content on a new medium so users could “find” answers to questions.

The problem for AIIM (originally the National Micrographics Association) was indexing. As an officer at a company considered in the 1980s to be one of the leaders in online and semi automated indexing methods, I had a great deal to discuss with Ms. Steiger.

But AIIM evokes for me:

Microfilm —> Finding issues —> Digital versions of microfilm —> CD ROMs —> On premises online access —> Finding issues.

I find the trajectory from microfilm to pronouncements about enterprise search, content processing, and eDiscovery fascinating. The story of AIIM parallels the challenges of the traditional publishing industry (what I call the “dead tree method”), which has, like Don Quixote, galloped, galloped into battle with ones and zeros.

Asking a trade association’s membership for insights about electronic information is a convenient idea. What’s wrong with sampling the membership and others in the AIIM database, discarding those who belong to organizations with fewer than 10 employees, and tallying up the survey “votes”? For most of those interested in search, absolutely nothing. And that may be part of the challenge for those who want to get smart about search, findability, and content processing.

Let’s look at three findings from the 30 plus page study. (I have had to trim because the comments and notes I wrote while reading the report are too massive for Beyond Search.)

Finding: 25 percent have no advanced or dedicated search tools. 13 percent have five or more [advanced or dedicated search tools].

Talk about good news for vendors of findability solutions. If one thinks about the tens of millions of organizations in the US, one just discards the 10 percent with fewer than 10 employees, and apparently quite a large percentage of the rest have simplistic tools. (Keep in mind that there are more small businesses than large businesses by a very wide margin. But that untapped market is too expensive for most companies to penetrate with marketing messages.) The study encourages the reader to conclude that a bonanza awaits the marketer who can identify these organizations and convince them to acquire an advanced or dedicated search tool.

There is a different view. The research ArnoldIT (owner of Beyond Search) has conducted over the last couple of decades suggests that this finding conveys some false optimism. For example, in the organizations and samples with which we have worked, we found almost 90 percent saturation of search. The one on one interviews reveal that many employees were unaware of the search functions available in the organization’s database system or in specialized tools like those used for inventory, the engineering department’s AutoCAD installation, or customer support. So search systems with advanced features are in fact in most organizations. A survey of a general population reveals a market that is quite different from what the chief financial officer perceives when he or she tallies up the money spent for software that includes a search solution.

But the engineering department’s drawings and specifications, the legal department’s confidential documents, the HR unit’s employee health data, and the board of directors’ documents on certain financial and management topics have to remain in silos; providing one system to handle them all is the problem. There is, we have found, neither an appetite to gather these data nor the money to figure out how to make images and other types of data searchable from a single system. Far better to use a text oriented metasearch system and dismiss data from proprietary systems, images, videos, mobile messages, etc. (a sketch of this approach appears below). We know that most organizations have search systems about which most employees know nothing. When an organization learns about these systems and then gets an estimate for creating one big federated system, the motivation drains from those who write the checks. In our research, senior management perceives aggregation of content as increasing risk and putting an information time bomb under the president’s leather chair.
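For readers who want something concrete, here is a minimal sketch of what a text oriented metasearch looks like in practice: send one query to each silo’s existing search interface and merge whatever text hits come back. The silo endpoints and response handling are illustrative assumptions, not any vendor’s actual API.

```python
import concurrent.futures
import requests  # assumption: each silo exposes some HTTP search endpoint

# Hypothetical silo search endpoints; real systems each have their own API.
SILOS = {
    "document_mgmt": "https://dms.example.internal/search",
    "crm":           "https://crm.example.internal/api/search",
    "engineering":   "https://plm.example.internal/query",
}

def query_silo(name: str, url: str, terms: str) -> list[dict]:
    """Send the user's query to one silo and normalize the hits it returns."""
    try:
        response = requests.get(url, params={"q": terms}, timeout=5)
        response.raise_for_status()
        hits = response.json().get("results", [])
        return [{"source": name, "title": h.get("title", ""), "link": h.get("url", "")} for h in hits]
    except requests.RequestException:
        return []  # a silo that is down simply contributes nothing

def metasearch(terms: str) -> list[dict]:
    """Fan the query out to every silo and merge the text hits into one list."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(query_silo, name, url, terms) for name, url in SILOS.items()]
        merged = []
        for future in concurrent.futures.as_completed(futures):
            merged.extend(future.result())
    return merged

if __name__ == "__main__":
    for hit in metasearch("vendor contract 2014"):
        print(hit["source"], "-", hit["title"])
```

Note what the sketch does not do: it ignores images, video, and proprietary binary formats, which is exactly the trade off described above.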

Finding:  47% feel that universal search and compliant e-discovery is becoming near impossible given the proliferation of cloud share and collaboration apps, personal note systems and mobile devices. 60% are firmly of the view that automated analytics tools are the only way to improve classification and tagging to make their content more findable.

The thrill of an untapped market fades when one considers the use of the word “impossible.” AIIM is correct in identifying the Sisyphean tasks vendors face when pitching “all” information available via a third party system. Not only do the technical problems stretch the wizards at Google, but the cost of generating meaningful “unified” search results is a tough nut to crack even for intelligence and law enforcement entities. In general, some of these groups have motivation, money, and expertise. Even with these advantages, the hoo hah that many search and eDiscovery vendors pitch is increasing potential customers’ skepticism. The credibility of over-hyped findability solutions is squandered. For some vendors, marketing efforts are making it more difficult to close deals and causing a broader push back against solutions that prospects know to be a waste of money. Yikes. How does a trade association help its members with this problem? Well, I have some ideas. But as I recall, Ms. Steiger was not too thrilled to learn about the nitty gritty of shifting from micrographics to digital. Does the same characteristic exist within AIIM today? I don’t know.


Google Blamed for Problems in Enterprise Search

October 7, 2014

Google, believe it or not, is responsible in part for the problems with enterprise search. The idea is advanced in “Why the ‘Google Paradigm’ Has Damaged Enterprise Search.” The core of the argument is that people use Google for Web search. The resulting perception is that “enterprise search is as easy as Google web search, and that a central index of an enterprise is the right way to do enterprise search.”

Google’s entrance into enterprise search was one of the company’s earliest attempts to enter a market in which revenue came from a subscription or license, not a fee for advertising. The Google Search Appliance was a server loaded with a version of Google’s Web search system. Based on our work with the first GSA, it was clear that, like many other Google products and services from the 2001 to 2004 period, the appliance rested on some Googley assumptions; for example:

  • Google assumed that a version of its Web search system stripped of its ad matching was good enough for finding textual content in an organization.
  • The company assumed that Autonomy, Endeca, and Fast Search & Transfer, the dominant enterprise search vendors at the time, were too complex for most technical staff in an organization. The time and complexity of deploying these systems contributed to high user dissatisfaction, and the high cost of these industry leaders’ systems contributed to management grousing about search.
  • Google assumed that it could disintermediate traditional information technology departments and deal directly with end users.

Google crafted a server that was positioned as a “search toaster.” The basic unit was priced at less than $2,000 and sported an interface that required the licensee to plug in basic information and click a button to start the indexing process.

By 2007 the Google Search Appliance had an estimated 50,000 licensees. At that time, the product line had expanded, but the locked down nature of the Google Search Appliance and the key word approach of the system were creating sales opportunities for other search appliance vendors; namely, Thunderstone, Maxxcat, and Index Engines.

Google added features and fiddled with the license fees, hardening the GSA product line with hot backups, connectors, and extensibility via licensed vendors. Few analysts paid much attention to the product licensing fees for the various “GB” or Google boxes. If you want to get a sense of the costs for building out a GSA system that can process 100 million documents, navigate to www.GSAadvantage.gov and search for Google’s search appliances. The costs work out to be comparable to or slightly higher than a similar installation from Autonomy, Endeca, or Fast. The high prices remain today.

Google learned from the GSA experience. Instead of offering an enterprise cloud solution, the company has left a limited and pricey GSA product line in the market and provided a modest commitment to this enterprise search solution. Google’s cloud solution manifests itself in Google’s site search features. I am waiting for Google either to kill the product line or amp up its commitment. In my opinion, the GSA is in no man’s land at this time. It appears that not even Google can respond to the needs of enterprise findability users. If any company could crack the code, would it not be Google or a Xoogler’s start up?

As the GSA emerged as a placeholder product, professionals became more and more dependent on Google’s Web search. In Europe, for example, Google’s Web search commands a market share in excess of 80 percent. In Denmark, Google’s share of Web search is north of 90 percent. In the US, Google has a 65 to 75 percent share of Web search, depending on which consultancy’s numbers one uses.

The word “search” became synonymous with Google. Enterprise search vendors began to use jargon other than search. This step was a natural reaction to hearing from prospects, “We want a search that works just like Google.” What the prospects meant was a system that was easy to use and seemed to deliver useful results in the hits displayed at the top of a results list, a page of images, or a map showing a location.

Google Web search, not the Google Search Appliance, reflected a broader shift in the information access market. Users of Web search and enterprise systems wanted and still want:

  1. Systems that do not require the user to invest much time and effort in getting an answer
  2. Systems that can produce useful outputs whether text, images, or maps with data displayed on them
  3. Systems that deliver “answers” without the delays (latency) many enterprise systems force on users.

Google’s response to this enterprise demand has been ineffectual. As other Web search vendors have found, key word retrieval does not solve the problems basic search systems spawn. The GSA is evidence that Google does not have the key to unlock the revenue vault for enterprise search.

What Google search has done (inadvertently, I might add) has been to make crystal clear that users do not want to work hard for information they perceive as useful. Precision and recall are irrelevant because voting and advertisers influence Google Web search results. Users love Google’s outputs.

In the organization, procurement teams, individual users, and senior management boil their needs down to one simple statement: “We want search that is just like Google.”

That’s a big, big problem for search and content processing vendors. Google Web search is not about relevance, objective information, or accuracy. Google is easy and “good enough.” In an organization, people want easy. But in an organization the results have to be timely, comprehensive in terms of what information is available to an organization, and accurate.

On the Web Google can skip content that is malformed or stored on a server that does not respond to a Google spider quickly enough. In an organization, the content has to be available. On the Web, the advertisers and the users’ own behavioral data pay the bills. In an organization, the organization has to pay the bills. Google has more money from a different business model than most organizations. Google pumps money into plumbing to deliver the service that makes money. Organizations want to cap the amount spent on search, and the funds are not infinite.

For search vendors, the problem of Google’s dominance in Web search makes product differentiation difficult. Google’s business model creates challenges for vendors who have to justify the “value” and hence the “cost” of their search systems. For traditional search vendors, ease of use is very, very difficult because of the nature of the questions enterprise system users have.

Google is a mirror in which societal, cultural, and intellectual changes in information access are reflected. For many years, I have called attention to the verbal push ups search vendors use to try and make sales. The struggles Hewlett Packard has had with Autonomy provide a glimpse of how difficult “value” can be to change into hard cash. Microsoft’s Delve illustrates that search for Office 365 is a combination of contacts, alerts, and personalization, not key word search. The dependence of enterprise search companies on cash from venture capital sources illustrates that traditional search is a very, very tough business to make into something sustainable and profitable without financial life support. The expectation that Watson will become a $10 billion business in 60 months is disconnected from the experience of other smart companies. In the history of enterprise search, only Autonomy reported revenues of more than $800 million from enterprise licenses. IBM projects more than 10 X this revenue in 60 months. It took Autonomy more than a decade to hit $500 million.

The reality is that Google is not the problem. Google is a metaphor for what users want when it comes to information access.

The write up asserts:

The Google paradigm also ignores the challenge of scalability.  Indexing the enterprise for a centralized enterprise search capability requires major investment.  In addition, centralization runs counter to the realities of the working world where information must be distributed globally across a variety of devices and applications.  The amount of information we create is overwhelming and the velocity with which that information moves increases daily.

Interesting statement. For me, the problem of the Google paradigm is that it is another bit of jargon that sidesteps what information retrieval must deliver in today’s business environment. Whoever cracks the code can make money. My hunch is that Google probed the enterprise search market and is trying to figure out how to make it pay off in a significant way. Google may be trapped in the same problem space through which other enterprise search and content processing vendors slog. The question may be, “Is there a way out of the swamp and into a land of milk, honey, sustainable revenue, and healthy margins?”

Stephen E Arnold, October 7, 2014

New Spin for Unstructured Content

October 6, 2014

I read “Objective Announces Partnership with Active Navigation, Helping Organisations Reduce Unstructured Content by 30%.” The angle is different from some content marketing outfits’ approach to search. Instead of dragging out the frequently beaten horse “all available information,” the focus is trimming the fat.

According to the write up:

The Active Navigation and Objective solution, allows organisations to quickly identify the important corporate data from their ROT information, then clean and migrate it into a fully compliant repository

How does the system work? I learned:

Following the Objective Information Audit, in which all data will be fully cleansed, an organisation’s content will be free of ROT information and policy violations. Immediate results from an Objective Information Audit typically deliver storage reductions of 30% to 50%, lowering the total cost of ownership for storage, while improving the performance of enterprise search, file servers, email servers and information repositories.

Humans plus smart software do the trick. In my view, this is an acknowledgement that the combination of subject matter experts plus software delivers a useful solution. The approach does work as long as the humans do not suffer “indexing fatigue” or budget cuts. Working time can also present a challenge. Management can lose patience.
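The write up does not explain the mechanics, but a minimal sketch of the redundant, obsolete, trivial (ROT) triage such an audit performs might look like the following. The file paths, age threshold, and size threshold are my assumptions, not Objective’s or Active Navigation’s rules.

```python
import hashlib
import os
import time
from collections import defaultdict

STALE_AFTER_DAYS = 3 * 365  # assumption: untouched for 3+ years counts as "obsolete"
TRIVIAL_BYTES = 1024        # assumption: files under 1 KB count as "trivial"

def audit(root):
    """Walk a repository and flag Redundant, Obsolete, or Trivial files."""
    seen = defaultdict(list)  # content hash -> paths already seen (redundancy check)
    flagged = []
    now = time.time()
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            stat = os.stat(path)
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            reasons = []
            if seen[digest]:
                reasons.append("redundant duplicate of " + seen[digest][0])
            seen[digest].append(path)
            if (now - stat.st_mtime) > STALE_AFTER_DAYS * 86400:
                reasons.append("obsolete (not modified in 3+ years)")
            if stat.st_size < TRIVIAL_BYTES:
                reasons.append("trivial (under 1 KB)")
            if reasons:
                flagged.append((path, reasons))
    return flagged

if __name__ == "__main__":
    for path, reasons in audit("/shared/filestore"):
        print(path, "->", "; ".join(reasons))
```

The hard part, as the vendors themselves concede, is the human review of what the script flags; software alone does not know which “obsolete” contract still matters.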

Will organizations embrace an approach familiar to those who used decades old systems? Since some state of the art automated systems have delivered mixed results, perhaps a shift in methods is worth a try.

Stephen E Arnold, October 6, 2014

Palantir: Now an Enterprise App Developer

September 30, 2014

I read “Hush Hush Data Firm Palantir Snags ICE Case Tracking Deal.” Palantir may be moving from supporting intelligence agencies to the market sector dominated by government contractors like SRA, Booz Allen Hamilton, and CACI.

The article states:

Immigration and Customs Enforcement has awarded secretive data-mining firm Palantir a $42 million contract to redo the investigation agency’s failed case filing system.

The challenge will be to make a case management system work in a manner that satisfies the statement of work. Other case management efforts have crashed and burned.

Palantir appears to be working with a tough mandate: On time and on budget delivery. As you may know, the notion of on time and on budget is only valid until the first scope change rolls down the timeline.

Are flaws in case management systems unusual? Nah. The article reveals:

The Justice Department inspector general last week released a report on the FBI’s new case management system, Sentinel, assailing its searching and indexing features for slowing the investigations of special agents and the productivity levels of evidence technicians.

Why are case management systems problematic? I can identify a number of reasons, but it will be more entertaining if I wait for news about the Palantir project’s path.

Stephen E Arnold, September 30, 2014

Google-News Corp. Marvelous, He Said, She Said

September 26, 2014

I don’t have a dog in this hunt. I think both Google and News Corp. are wonderful. Weaknesses, none. Both companies just have strengths. Google has its Washington, DC lobbying effort and News Corp. has Fox News. Google has its ups and downs with the privacy issue (except at Stanford University). News Corp. has that alleged telephone tapping matter. Google has legions of users in Europe. News Corp. has fingers clutching newspapers, eyeballs watching television, and some Web users.

One big difference. Google is a 15 year old adolescent. News Corp. is an aged information company.

The two, like a May December romance gone sour, will face nothing but irreconcilable differences. Need an example? Check out this blog post from the Google charmingly labeled “Dear Rupert.” See, Google does have a sense of humor.

I don’t have the energy to walk through the arguments and counter arguments. I do want to highlight one point and comment about it. News Corp. leaves a door open with its comment: Google’s “power” makes it hard for people to “access information independently and meaningfully.” Google is “willing to exploit [its] dominant market position to stifle competition.”

The Google response is wonderful. I believe that Commodore Vanderbilt, Jay Gould, John D. Rockefeller (oops sorry. He’s apoplectic about his descendants’ dumping holdings in fossil fuels), and JP Morgan (you remember: the fellow whose portrait makes it appear he is holding a knife as he starts to push himself from a chair) could not have collectively inked a better response:

With the Internet, people enjoy greater choice than ever before — and because the competition is just one click away online, barriers to switching are very, very low.

Well, I sort of enjoy the one click notion, but the reality for online users is that once a habit is formed, users have a tough time breaking it. Google is a habit with a market share only a drug lord could envy: 95 percent of users in Denmark, 66 percent of users in the US, 95 percent of users in France (gasp, France, home of Exalead, the former Quaero brain trust centroid, and numerous search vendors), etc. For more data see http://bit.ly/1nwgGrw.

Add to the monopoly position Google search controls the fact that competition is few and far between. Bing.com is just not able to gain significant market share from the GOOG. The hot ticket search engines, according to search “experts,” are Ixquick.com and DuckDuckGo.com. Er, these are metasearch engines and need access to other vendors’ indexes. As search vendors doing primary indexing bite the dust, these metasearch outfits face some tough choices if they want to stay in business. The little known Exalead search offers a tiny fraction of the GOOG’s coverage. And Yandex? Well, Mr. Putin may make it difficult for that outfit to remain in business without picking up and heading to a new campground.

One click? Nope. As users shift to mobile devices, the information access mode shifts to applications or apps. Maybe News Corp. can tackle Google in this new space. I am not sure, however, if those who know how to do intercepts focus much on cutting Google off at the knees with an appealing online ad platform.

You can work through the rest of the arguments. Remember, you are one click away from finding a new search engine.

Stephen E Arnold, September 26, 2014

Concept Searching: More Smart Content Rah Rah

September 23, 2014

I read “Concept Searching Taxonomy Workflow Tool solving Migration, Security, and Records Management Challenges.” This is a news release and it can disappear at any time. Don’t hassle me if it is a goner. The write up walks me rapidly into the smart content swamp. The idea is that content without indexing is dumb content. Okay. Lots of folks are pitching the smart versus dumb content idea now.

The fix? Concept Searching provides a smart tool to make content intelligent; that is, include index terms. For the youngster at heart, “indexing” is the old school word for metadata.

The company’s news announcement asserts:

conceptTaxonomyWorkflow serves as a strategic tool, managing enterprise metadata to drive business processes at both the operational and tactical levels. It provides administrators with the ability to independently manage access, information management, information rights management, and records management policy application within their respective business units and functional areas, without the need for IT support or access to enterprise-wide servers. By effectively and accurately applying policy across applications and content repositories, conceptTaxonomyWorkflow enables organizations to significantly improve their compliance and information governance initiatives.

The product name is indeed a unique string in the Google index. The company asserts that the notion of a workflow is strategic. Not only is workflow strategic, it is also tactical. For some, this is a two for one deal that may be hard to resist. The tool allows administrators to perform what appear to be the tasks I think of as “editorial policy” or, as the young at heart say, information governance.
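The announcement stays at the marketing level. A minimal sketch of what taxonomy driven policy application boils down to appears below; the taxonomy terms, policies, and routing rule are hypothetical illustrations, not Concept Searching’s actual implementation.

```python
# Sketch: documents carrying taxonomy terms get routed to a records/security policy.
# Terms, repositories, and the "most restrictive wins" rule are illustrative assumptions.

TAXONOMY_POLICIES = {
    "personnel-record": {"repository": "hr-archive",  "retention_years": 7,  "access": "HR only"},
    "contract":         {"repository": "legal-vault", "retention_years": 10, "access": "Legal"},
    "marketing":        {"repository": "general",     "retention_years": 2,  "access": "All staff"},
}

def apply_policy(document: dict) -> dict:
    """Pick the most restrictive policy whose taxonomy term appears on the document."""
    matches = [TAXONOMY_POLICIES[t] for t in document.get("taxonomy_terms", [])
               if t in TAXONOMY_POLICIES]
    if not matches:
        return {"repository": "quarantine", "reason": "no recognized taxonomy term"}
    # Assumption: longer retention == more restrictive; real products encode policy precedence differently.
    return max(matches, key=lambda p: p["retention_years"])

doc = {"id": "D-1042", "taxonomy_terms": ["contract", "marketing"]}
print(apply_policy(doc))  # routes to legal-vault with 10-year retention
```

The sketch makes the dependency obvious: the routing is only as good as the taxonomy terms assigned to the document in the first place, which is the editorial policy problem discussed below.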

The only issue for me is that the organizations with which I am familiar have pretty miserable information governance methods. What I find is that organizations have Balkanized methods for dealing with digital information. Examples of poor information governance fall readily to hand. The US court system removed public documents only to reinstate them. The IRS in the US cannot locate email. And when the IRS finds an archive of the email, the email cannot be searched. And, of course, there is Mr. Snowden. How many documents did he remove from NSA servers?

The notion that the CTW tool makes it possible to “apply policy across applications and content repositories” sounds absolutely fantastic to a person with indexing experience. There is a problem. Many organizations do not understand an editorial policy or are unwilling to do much more than react when something goes off the tracks. The reality is that the appetite for meaningful action is often absent in commercial enterprises and government entities. Budgets remain tight. Reducing information technology budgets is often a more important goal than improving information technology.

What’s this mean?

My hunch is that Concept Searching is offering a product for an organization that [a] has an editorial policy in place or [b] wants to appear to be taking meaningful steps toward useful information governance.

The president of Concept Searching is taking a less pragmatic approach to selling this tool. Martin Garland, according to the company story, states:

Managing metadata and auto-classifying to taxonomies provides high value in applications such as search, text analytics, and business social. But many forward thinking organizations are now looking to leverage their enterprise metadata and use it to improve business processes aligned with compliance and information governance initiatives. To accomplish this successfully, technologies such as conceptTaxonomyWorkflow must be able to qualify metadata and process the content based on enterprise policies. A key benefit of the product is its ease of use and rapid deployment. It removes the lengthy application development cycle and can be used by a large community of business specialists as well as IT.

The key benefit, for me, is that a well conceived and administered information policy eliminates risks of an information misstep. I would suggest that the Snowden matter was a rather serious misstep.

One assumes that companies have information policies, stand behind them, and keep them current. This strikes me as a quite significant assumption.

A similar message is now being pushed by Smartlogic, TEMIS, WAND, and other “indexing” companies.

Are these products delivering essentially similar functionality? Is any system indexing with less than a 10 percent error rate? Are those with responsibility for figuring out what to do with the flood of digital information equipped to enforce organization wide policies? And once installed, will the organization continue to commit resources to support tools that manage indexing? What happens if Microsoft Azure Search and Delve deliver good enough indexing and controls?

These are difficult questions to answer. Based on the pivoting content processing vendors are doing, most companies selling information solutions are trying to find a way to boost revenues in an exhausting effort to maintain stable cash flows.

Does anyone make an information governance tool that keeps track of what information retrieval companies market?

Stephen E Arnold, September 23, 2014

Mondeca: Content IQ

September 23, 2014

I reacted strongly to the IDC report about the knowledge quotient. IDC, as you know, is the home of the fellow who sold my content on Amazon without written permission. I learned that Mondeca is using a variant of “knowledge quotient.” This company’s approach taps the idea of the intelligence quotient of content.

I interpret content with a high IQ in a way that is probably not what Mondeca intended. Smart content is usually content that conveys information that I find useful. Mondeca, like other purveyors of indexing software, uses the IQ to refer to content that is indexed in a meaningful way. Remember: if users do not use the index terms, assigning those terms to a document does not help. Effective indexing helps the user find content. In the good old days of specialist indexing, users had to learn the indexing vocabulary and conventions. Today users just pump 2.7 words into a search box and feel lucky.

As with other vendors of automated indexing systems and software, humans have to get into the mix.

One twist Mondeca brings to the content IQ notion is a process that helps a potential licensee answer the question, “How smart is your content?” For me, poorly indexed content is not smart. The content is simply poorly indexed.

I navigated to the “more information” link on the Content IQ page and learned that answering the question costs 5,000 euros, roughly $6,000.

Like the knowledge quotient play, smart content and allied jargon make an effort to impart a halo of magic around a pretty obvious function. I suppose that in today’s market, clarity is not important. Marketing magic is needed to create a demand for indexing.

I believe professionally administered indexing is important. I was one of the people responsible for creating the ABI/INFORM controlled vocabulary revision and the reindexing of the database in 1981. Our effort involved controlled terms, company name fields, and a purpose built classification system.

Some day automated systems will be able to assign high value index terms without humans. I don’t think that day has arrived. To create smart content, have smart people write it. Then get smart, professional indexers to index it. If a software system can contribute to the effort, I support that effort. I am just not comfortable with the “smart software” trend that is gaining traction.

Stephen E Arnold, September 23, 2014

Luxid: Positioning Adjustments

September 23, 2014

Luxid, based in Paris, offers an automatic indexing service. The company has focused on the publishing sector as well as a number of other verticals. The company uses the phrase “semantic content enrichment” to describe its indexing. The more trendy phrase is “metatagging,” but I prefer the older term.

The company also uses the term “ontology” along with references to semantic jargon like “triples.” The idea is that a licensee can select a module that matches an industry sector. WAND, a competitor, offers a taxonomy library. The idea is that much of the expensive and intellectually demanding work needed to build a controlled vocabulary from scratch is sidestepped.
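For the reader who has not bumped into the jargon, a “triple” is simply a subject-predicate-object statement. A minimal sketch of what semantic content enrichment output might look like follows; the document identifier, concepts, and predicate names are illustrative, not Luxid’s actual vocabulary.

```python
# Illustrative sketch of semantic enrichment expressed as triples.
# Document ID, ontology concepts, and predicates are hypothetical examples.

document_text = "The new oncology drug received FDA approval after a phase III trial."

# An indexing engine resolving text against a life-sciences ontology might emit
# subject-predicate-object statements ("triples") such as:
triples = [
    ("doc:12345", "hasConcept",           "onto:Oncology"),
    ("doc:12345", "mentionsOrganization", "onto:FDA"),
    ("doc:12345", "hasConcept",           "onto:ClinicalTrialPhaseIII"),
    ("doc:12345", "inDomain",             "onto:LifeSciences"),
]

# Downstream systems can then answer "find documents about oncology" by matching
# concepts rather than literal keywords.
for subject, predicate, obj in triples:
    print(f"{subject} {predicate} {obj}")
```

Whether one calls this enrichment, metatagging, or plain indexing, the output is index terms attached to a document.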

The positioning that I find interesting is that Luxid delivers “NLP enabled ontology management workflow.” The idea is that once the indexing system is installed, the licensee can maintain the taxonomy using the provided interface. This is another way of saying that administrative tools are included. Another competitor, Smartlogic, uses equally broad and somewhat esoteric terms to describe what are essential indexing operations.

Like other search and content processing vendors, Luxid invokes the magic of Big Data. Luxid asserts, “Streamlined, Big Data architecture offers improved scalability and robust integration options.” The point at which indexing processes often stub their toes is the amount of human effort and machine processing time required to keep an index updated and to populate new content across already compiled indexes. Scalability can be addressed with more resources. More resources often mean increased costs, a challenge for any indexing system that deals with regular content, not just Big Data.

Will the revised positioning generate more inquiries and sales leads? Possibly. I find the wordsmithing content processing vendors use fascinating. The technology, despite the academic jargon, has been around since the days of Data Harmony and other aging methods.

The key point, in my view, is that Luxid offers a story that makes sense. The catnip may be the jargon, the push into publishing, which is loath to spend for humans to create indexes, and the packaging of vocabularies into “Skill Cartridges.”

I anticipate that some of Luxid’s competitors will emulate the Luxid terminology. For many years, much of the confusion about which content processing system does what can be traced to widespread use of jargon.

Stephen E Arnold, September 23, 2014

Lucid Works: Really?

September 21, 2014

Editor’s Note: This amusing open letter to Chrissy Lee at Launchsquad Public Relations points out some of the challenges Lucid Imagination (now Lucid Works) faces. Significant competition exists from numerous findability vendors. The market leader in open source search is, in Beyond Search’s view, ElasticSearch.

Dear Ms. Lee,

I sent you an email on September 18, 2014, referring you to my response to Stacy Wechsler at Hired Gun public relations. I told you I would create a prize for the news release you sent me. I am retired, but I do not have much time to write for PR “professionals” who send me spam, fail to do research about my background, and do not understand the topic addressed in their own emails.

Some history: I recall the first contact I had from Lucid Imagination in 2008. A fellow named Anil Uberoi sent me an email. He and I had a mutual connection, Mark Krellenstein, who was the CTO for Northern Light when it was a search vendor.

I wrote a for-fee report for Mr. Uberoi, who shortly thereafter left Lucid for an outfit called Kitana. His replacement was a fellow named David. He left and migrated to another company as well. Then a person named Nancy took over marketing and quickly left for another outfit. My recollection is that in a span of 24 months, Lucid Imagination churned through technical professionals, marketers, and presidents. Open source search, it seemed, was beyond the management expertise of the professionals at Lucid.

Then co-founder Mark Krellenstein cut his ties with the firm. I wondered how Mr. Krellenstein could deliver the innovative folders function for Northern Light and flop at Lucid. Odd.

Recently I have been the recipient of several emails sent to my two major email accounts. For me, this is an indication of spam. I knew about the appointment of another president. I read “Trouble at Lucid Works: Lawsuits, Lost Deals, and Layoffs Plague the Search Startup Despite Funding.” Like other pundit-fueled articles, this one probably contains some truth, some exaggeration, and some errors. The overall impression left on me by the write up is that Lucid Works seems to be struggling.

Your emails to me indicate that you perceive me as a “real” journalist. Call me quirky, but I do not like it when a chipper young person writes me, uses my first name, and then shovels baloney at me. You are the purveyor of search silliness for your employer Launchsquad, which seems to be Lucid Works’ biggest fan and current content marketing agent. Not surprisingly, the new Lucid Fusion product is the Popeil pocket fisherman of search. Fusion slices, dices, chops, and grates. Here’s what Lucid Works allegedly delivers via Lucene/Solr and proprietary code:

  • Modular integration. Sorry, Ms. Lee, I don’t know what this means.
  • Big Data Discovery Engine. Ms. Lee, Lucid has a search and retrieval system, not a Cybertap, Palantir, or Recorded Future type system.
  • Connector Framework. Ms. Lee, licensees want connectors included. Salesforce bought Entropy Soft to meet this need. Oracle bought Outside In for the same reason. Even Microsoft includes some connectors with the quite fragile Delve system for Office 365.
  • Intelligent Search Services. Ms. Lee, I suggest you read my forthcoming article in KMWorld about smart software. Today, most search services use the word intelligent when the technology in use has been available for decades.
  • Signals Processing. Ms. Lee, I suggest you provide some facts for signals processing. I think in terms of SIGINT, not crude click log file data.
  • Advanced Analytics. Ms. Lee, I lecture at several intelligence and law enforcement conferences about “analytics.” The notion of “advanced” analytics is at odds with the standard numerical recipes that most vendors use. The reason “advanced” is not a good word is that there are mathematical methods that can deliver significant returns. Unfortunately, today’s computer systems cannot get around the computational barriers that bring x86 architectures to their knees.
  • Natural Language Search. Ms. Lee, I have been hearing about NLP for many years. Perhaps you have not experimented with the voice search functions on Apple and Android devices? You should. Software does a miserable job of figuring out what a human “means.”

So what?

Frankly I am not confident that Lucid Works can close the gap between your client and ElasticSearch. Furthermore, I don’t think Lucid Works can deliver the type of performance available from Searchdaimon or ElasticSearch. The indexing and query processing gap between Lucid Works and Blossom Software spans orders of magnitude. How do I know? Well, my team tested Lucid Works’ performance against these systems. Why don’t you know this when you write directly to the person who ran the tests? I sent a copy of the test results to one of Lucid Works’ many presidents.

Do I care about Ms. Lee, the new management team, the investors, or the “new” Lucid?

Nope.

The sun has begun to set on vendors and their agents who employ meaningless jargon to generate interest from potential licensees.

What’s my recommendation? I suggest a person interested in Lucid navigate to my Search Wizards Speak series and read the Lucid Imagination and Lucid Works interviews. Notice how the story drifts. You can find these interviews at www.arnoldit.com/search-wizards-speak.

Why does Lucid illustrate “pivoting”? It is easy to sit around and dream about what software could do. It is another task to deliver software that matches products and services from industry leaders and consistent innovators.

For open source search, I suggest you pay attention to www.Flax.co.uk, www.Searchdaimon.com, www.sphinxsearch.com, and www.elasticsearch.com for starters. Keep in mind that other competitors like IBM and Attivio use open source search technology too.

You will never have the opportunity to work directly for me. I can offer one small piece of advice: Do your homework before writing about search to me.

Your pal,

Stephen E Arnold, September 21, 2014

AI Is Learning To Read

September 19, 2014

Machines know how to read because they have been programmed to understand letters and numbers. They, however, do not comprehend what they are “reading” and cannot regurgitate it for users. Google’s Research Blog post “Teaching Machines To Read Between The Lines (And A New Corpus With Entity Salience Annotations)” describes how the search engine giant is using the New York Times Annotated Corpus to teach machines entity salience. Entity salience basically means machines can comprehend what they are “reading,” locate required information, and use it. The New York Times Corpus is a large dataset with 1.8 million articles spanning twenty years. If a machine can learn salience from anything, it would be this collection.

Entity salience is determined by term ratios and indexing signals backed by the Knowledge Graph. For each article, the annotations record a salience indicator, byte offsets, an entity index, the mention count of each entity as determined by a coreference system, and other information used to digest the document.

The system does work better with proper nouns:

“Since our entity resolver works better for named entities like WNBA than for nominals like “coach” (this is the notoriously difficult word sense disambiguation problem, which we’ve previously touched on), the annotations are limited to names.”
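The blog post does not publish a formula. A naive sketch of a salience score built only from mention frequency and first-mention position appears below; this is my simplification, and Google’s annotations rely on coreference resolution and Knowledge Graph signals rather than simple string matching.

```python
import re

def naive_salience(text: str, entities: list[str]) -> dict[str, float]:
    """Score entities by how often and how early they are mentioned.

    A toy heuristic for illustration only: half the weight comes from the
    mention ratio, half from how early the first mention appears.
    """
    lowered = text.lower()
    tokens = lowered.split()
    scores = {}
    for entity in entities:
        pattern = re.compile(re.escape(entity.lower()))
        mentions = pattern.findall(lowered)
        first_pos = lowered.find(entity.lower())
        if not mentions or first_pos < 0:
            scores[entity] = 0.0
            continue
        frequency = len(mentions) / max(len(tokens), 1)     # crude "term ratio"
        earliness = 1.0 - first_pos / max(len(text), 1)     # earlier mention -> higher score
        scores[entity] = round(0.5 * frequency + 0.5 * earliness, 4)
    return scores

article = ("The WNBA announced a new coach on Monday. The coach previously led "
           "a college team. The WNBA said the appointment takes effect in June.")
print(naive_salience(article, ["WNBA", "coach", "June"]))
```

Even this toy version shows why named entities are easier: a string like “WNBA” matches cleanly, while a nominal like “coach” needs word sense disambiguation to count correctly.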

On a similar note, on the Team Leada blog people can ask Google’s Director of Research Peter Norvig questions. He was asked:

“What is one of the most-often overlooked things in machine learning that you wished more people would know about or would study more? What are some of the most interesting data science projects Google is working on?”

Norvig responded that there are many overlooked problems, depending on the project you are working on, and that Google is doing a lot of data science projects, but he named nothing specific.

Machine learning and machine reading are works in progress. In short, machines are going to school.

Whitney Grace, September 19, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
