Why Enterprise Search Fails

July 12, 2016

I participated in a telephone call before the US holiday break. The subject was the likelihood that a potential investment in an enterprise search technology would be a winner. I listened for most of the 60-minute call. I offered a brief example of the over-promise and under-deliver problems which plagued Convera and Fast Search & Transfer, and several of the people on the call asked, “What’s a Convera?” I knew then that today’s whiz kids are essentially reinventing the wheel.

I wanted to capture three ideas which I jotted down during that call. My thought is that at some future time, a person wanting to understand the incredible failures that enterprise search vendors have tallied will have three observations to consider.

No background is necessary. You don’t need to read about throwing rocks at the Google bus, search engine optimization, or any of the craziness about search making Big Data a little pussycat.

Enterprise Search: Does a Couple of Things Well When Users Expect Much More

Enterprise search systems ship with filters or widgets which convert source text into a format that the content processing module can index. The problem is that images, videos, audio files, content from wonky legacy systems, or proprietary file formats like IBM i2’s ANB files do not lend themselves to indexing by a standard enterprise search system. The buyers or licensees of the enterprise search system do not understand this one-trick-pony nature of text retrieval. Therefore, when the system is deployed, consternation follows confusion when content is not “in” the enterprise search system and, therefore, cannot be found. There are systems which can deal with a wide range of content, but these systems are marketed in a different way and often cost millions of dollars a year to set up, maintain, and operate.
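To make the “not in the system” point concrete, here is a minimal sketch in Python. The filter functions, the `FILTERS` table, and the sample file names are hypothetical illustrations, not any vendor’s actual conversion code. The idea is simply that a file with no matching filter never reaches the index, so no query can return it.

```python
from pathlib import Path

# Hypothetical filters: each one turns raw bytes into plain text for the indexer.
def text_filter(raw):
    return raw.decode("utf-8", errors="replace")

def csv_filter(raw):
    # Flatten comma-separated cells into space-separated words.
    return raw.decode("utf-8", errors="replace").replace(",", " ")

# Assumed registry of supported formats; anything else is skipped.
FILTERS = {".txt": text_filter, ".csv": csv_filter}

def ingest(documents):
    """Return (index, skipped): searchable text per file, plus files no filter could handle."""
    index, skipped = {}, []
    for name, raw in documents.items():
        handler = FILTERS.get(Path(name).suffix.lower())
        if handler is None:
            skipped.append(name)
        else:
            index[name] = handler(raw)
    return index, skipped

def search(index, term):
    """Case-insensitive substring match over the indexed text only."""
    term = term.lower()
    return sorted(name for name, text in index.items() if term in text.lower())

corpus = {
    "memo.txt": b"Quarterly fraud review for the Acme account",
    "ledger.csv": b"account,Acme,balance,1200",
    "network.anb": b"\x00\x01 binary link chart mentioning Acme",
    "briefing.mp4": b"\x00\x00 video bytes",
}
index, skipped = ingest(corpus)
print(search(index, "acme"))  # only the two text-based files can match
print(skipped)                # the files users will later complain are "missing"
```

The link chart mentions Acme, but a user searching for “acme” will never see it. Surfacing the skipped list to the licensee up front is one way to head off the confusion.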


Net net: Vendors do not explain the limitations of text search. Licensees do not take the time or have the desire to understand what an enterprise search system can actually do. Marketers obfuscate in order to close the deal. Failure is a natural consequence.

Data Management Needed

The disconnect boils down to what digital information the licensee wants to search. Once the universe is defined, the system into which the data will be placed must be resolved. No data management, no enterprise search. The reason is that licensees and the users of an enterprise search system assume that “all” or “everything” (from maps to web content, from email to outputs from an AS/400 Ironside) is available any time. Baloney. Few organizations have the expertise or the appetite to deal with figuring out what is where, how much there is, how frequently each type of data changes, and the formats used. I can hear you saying, “Hey, we know what we have and what we need. We don’t need a stupid, time consuming, expensive inventory.” There you go. Failure is a distinct possibility.
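A first pass at such an inventory does not have to be elaborate. The sketch below is an assumption-laden illustration: it treats a single directory tree as the whole universe and uses the file extension as a stand-in for format. For each format it tallies how many files exist, how many bytes they occupy, and when one last changed. The `inventory` function and the demo files are hypothetical.

```python
import tempfile
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path

def inventory(root):
    """Tally file count, total bytes, and most recent change per file extension."""
    stats = defaultdict(lambda: {"files": 0, "bytes": 0, "last_modified": 0.0})
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        info = path.stat()
        entry = stats[path.suffix.lower() or "(none)"]
        entry["files"] += 1
        entry["bytes"] += info.st_size
        entry["last_modified"] = max(entry["last_modified"], info.st_mtime)
    return dict(stats)

# Demo on a throwaway directory standing in for a file share.
with tempfile.TemporaryDirectory() as share:
    (Path(share) / "memo.txt").write_text("hello")
    (Path(share) / "notes.TXT").write_text("world!")
    (Path(share) / "reports").mkdir()
    (Path(share) / "reports" / "q2.pdf").write_bytes(b"%PDF-1.4")
    report = inventory(share)

for ext, entry in sorted(report.items()):
    changed = datetime.fromtimestamp(entry["last_modified"], tz=timezone.utc)
    print(f"{ext}: {entry['files']} files, {entry['bytes']} bytes, last changed {changed:%Y-%m-%d}")
```

A real inventory would also have to cover email stores, databases, and legacy system outputs, which is where the expense comes from. Even this crude tally shows which formats dominate and which ones have no filter.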


Net net: Hope springs eternal. When problems arise, few know what’s where, who’s on first, and why I don’t know is on third.


VirtualWorks Purchases Natural Language Processing Firm

July 8, 2016

Another day, another merger. PR Newswire released a story, VirtualWorks and Language Tools Announce Merger, which covers VirtualWorks’ purchase of Language Tools. With Language Tools, VirtualWorks will inherit computational linguistics and natural language processing technologies. VirtualWorks is an enterprise search firm. Erik Baklid, Chief Executive Officer of VirtualWorks, is quoted in the article:

“We are incredibly excited about what this combined merger means to the future of our business. The potential to analyze and make sense of the vast unstructured data that exists for enterprises, both internally and externally, cannot be understated. Our underlying technology offers a sophisticated solution to extract meaning from text in a systematic way without the shortcomings of machine learning. We are well positioned to bring to market applications that provide insight, never before possible, into the vast majority of data that is out there.”

This is another case of a company positioning itself as a leader in enterprise search. Is it anything special? Well, the news release mentions that several core technologies will be bolstered by the merger: text analytics, data management, and discovery techniques. We will have to wait and see what the future holds for the company in the enterprise search and business intelligence sector it seeks to lead.

Megan Feil, July 8, 2016
Sponsored by, publisher of the CyberOSINT monograph


Enterprise Search Is Stuck in the Past

July 4, 2016

Enterprise search is one of the driving forces behind an enterprise system because the entire purpose of the system is to encourage collaboration and quickly find information. While enterprise search is an essential tool, according to Computer Weekly’s article “Beyond Keywords: Bringing Initiative To Enterprise Search,” the feature is stuck in the past.

Enterprise search is due for an upgrade. The amount of enterprise data has increased, but the underlying information management system remains the same. Structured data is easy to fit into the standard information management system; it is the unstructured data, however, that holds the most valuable information. Unstructured information is hard to categorize, but natural language processing is being used to add context. Ontotext combined natural language processing with a graph database, allowing the content indexing to make more nuanced decisions.

We need to level up the basic keyword searching to something more in-depth:

“Search for most organisations is limited: enterprises are forced to play ‘keyword bingo’, rephrasing their question multiple times until they land on what gets them to their answer. The technologies we’ve been exploring can alleviate this problem by not stopping at capturing the keywords, but by capturing the meaning behind the keywords, labeling the keywords into different categories, entities or types, and linking them together and inferring new relationships.”

In other words, enterprise search needs the addition of semantic search in order to add context to the keywords. A basic keyword search returns every result that matches the keyword phrase, but a context-driven search actually adds intuition behind the keyword phrases. This is really not anything new when it comes to enterprise or any kind of search. Semantic search is context-driven search.
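The difference between keyword matching and the “label, link, and infer” approach described in the quote can be sketched in a few lines of Python. Everything here is a toy assumption: the `ENTITIES` table, the `MADE_BY` relationship, and the four documents are hand-built for illustration, and the substring matching is far cruder than real natural language processing or a graph database.

```python
# Hypothetical hand-built lookup: surface form -> (canonical entity, type).
ENTITIES = {
    "ibm": ("IBM", "company"),
    "big blue": ("IBM", "company"),
    "international business machines": ("IBM", "company"),
    "watson": ("Watson", "product"),
}
# Hypothetical relationship used to infer extra matches (product -> maker).
MADE_BY = {"Watson": "IBM"}

DOCS = {
    "d1": "IBM opened a new lab in Zurich.",
    "d2": "Analysts expect Big Blue to cut costs.",
    "d3": "Watson was demonstrated at the conference.",
    "d4": "The cafeteria menu changes on Monday.",
}

def keyword_search(query):
    """Plain keyword search: a document matches only if it contains the literal string."""
    q = query.lower()
    return sorted(d for d, text in DOCS.items() if q in text.lower())

def tag(text):
    """Label text with canonical entities (naive substring matching, for illustration only)."""
    lowered = text.lower()
    return {ENTITIES[s][0] for s in ENTITIES if s in lowered}

def entity_search(query):
    """Match on entities, including ones inferred through the MADE_BY relationship."""
    targets = tag(query)
    hits = []
    for d, text in DOCS.items():
        found = tag(text)
        found |= {MADE_BY[e] for e in found if e in MADE_BY}
        if targets & found:
            hits.append(d)
    return sorted(hits)

print(keyword_search("IBM"))  # literal matches only
print(entity_search("IBM"))   # also finds the nickname and the related product
```

The keyword search forces the “keyword bingo” the quote complains about: the user has to guess “Big Blue” and “Watson” separately. The entity version finds all three documents from one query, at the cost of building and maintaining the lookup tables.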


Whitney Grace, July 4, 2016

Voyager Search: New Release Available

July 1, 2016

Voyager Search is a vendor of search and retrieval based on Lucene. I was not familiar with the company until I read “Voyager Search Improves Search Capabilities and Overall Usability With More Than 150 Updates to Its Version 1.9.8.” According to the write up:

In the new version, Voyager makes it easier to configure content in Navigo, its modern web app, extends its spatial content search, and improves the usability of its Navigo processing tools. Managing content in Navigo can now be done through the new personalized ‘My Voyager’ customization page, which allows customers to share saved searches and update display configurations through a drag and drop interface.

One point in the write up I noted was this statement: “An improved spatial search interface now includes the ability to draw and buffer points, lines and polygons.” The idea is that geo-spatial operations appear to be supported by the system.
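For readers unfamiliar with the term, “buffering” a point means defining a zone of a given radius around it and finding what falls inside. The sketch below is a generic illustration in Python, not Voyager’s implementation: it assumes a spherical Earth, uses the haversine great-circle distance, and works on made-up record names and coordinates given as (latitude, longitude) pairs.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean radius; a spherical Earth is an assumption

def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))

def within_buffer(center, radius_km, records):
    """Return the names of records whose location lies inside the buffered point."""
    return sorted(name for name, pos in records.items()
                  if haversine_km(center, pos) <= radius_km)

# Hypothetical geotagged records.
RECORDS = {
    "survey_a": (0.0, 0.5),  # roughly 56 km east of the center
    "survey_b": (0.0, 2.0),  # roughly 222 km east
    "survey_c": (0.3, 0.0),  # roughly 33 km north
}
print(within_buffer((0.0, 0.0), 100.0, RECORDS))
```

Buffering lines and polygons follows the same idea but needs real geometry handling, which is the sort of work a spatial search product takes on.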

I also highlighted this comment:

Voyager Search is a leading global provider of geospatial, enterprise search tools that connect, find and deliver more than 1,800 different file formats.

In my experience, support for more than 1,000 file formats suggests a large number of conversion widgets.

The company bills itself as the “only install and go Solr/Lucene search engine.” Information about the company is available at this link. A demo is available here.

Stephen E Arnold, July 1, 2016

Enterprise Search Vendors: A Partial List

June 24, 2016

I spoke with a confused and unbudgeted worker bee at a giant outfit this weekend. The stellar professional was involved in figuring out what to do about enterprise search. The story is one I have heard many times in the last 40 years. The system doesn’t meet the needs of the users. The system is over budget. The system does not index in real time. Yadda yadda yadda.

The big question was, “What are the enterprise search vendors offering a system which actually works and does not experience downtime, cost overruns, and user outrage?” Note that this is not the word “outage.” The word is “outrage.”

I don’t know of such a system. As a helpful 72 year old, I rattled off a list of vendors who purport to offer Big Data capable, next generation semantic-linguistic-NLP systems. True to form, I repeated the list twice. I thought he would cry.

For those of you who want to know the vendors I plucked from my list of outfits in the search and content processing game, I reproduce the list. If you want upsides, downsides, license fees, gotchas, and other assorted details, I will provide the information. But since you are not likely to buy me dinner this evening, you will have to pay for my thoughts.

Here’s the selected list. Reader, start your browser:

  • Attivio
  • Coveo
  • dtSearch
  • Elasticsearch (Lucene)
  • Fabasoft Mindbreeze
  • IBM Omnifind
  • IHS Goldfire
  • Lookeen
  • Lucid Works (Solr)
  • Marklogic
  • Maxxcat
  • Polyspot
  • Sinequa
  • Solcara
  • Squiz Funnelback
  • Thunderstone
  • X1
  • Yippy

There are quite a few outfits, like Palantir, whose systems do search, but I trimmed the list for my worried pal.

What’s interesting is that most of these outfits explain that their systems are much, much more than search and retrieval. Believe it or not, as Mr. Ripley used to say.

Factoid: Most of these outfits have been around for quite a few years. Only Elasticsearch has managed to become a “brand” in the search space. What happened to Autonomy, Convera, Endeca, Fast Search & Transfer, and Verity since I wrote the first three editions of the Enterprise Search Report between 2003 and 2007? Ugly for some.

Search is a tough problem and has yet to deliver what users expect. Remember Google killed its search appliance. Ads are a better business because they spell money for Alphabet.

Stephen E Arnold, June 24, 2016

The Paradox of Marketing and Anonymity

June 22, 2016

While Dark Web users understand the perks of anonymity, especially for those involved with illicit activity, consistency in maintaining that anonymity appears to be challenging. A published article showcases how one drug dealer revealed his identity while trying to promote his brand: Drug dealer busted after trying to trademark his dark web username. David Ryan Burchard of Merced, California reportedly made $1.25 million by selling marijuana and cocaine on the Dark Web before he trademarked the username he used to sell drugs, “caliconnect”. The article summarizes,

“He started out on Silk Road and moved on to other shady marketplaces in the wake of its highly-publicized shutdown. Burchard wound up on Homeland Security’s list of top sellers, though they were having trouble establishing a rock-solid connection between him and his online persona. They knew that Burchard was accumulating a large Bitcoin stash and that there didn’t appear to be a legitimate source. Then, finally, investigators got the break they were looking for. It seems that Burchard decided that his personal brand was worth protecting, and he filed paperwork to trademark “caliconnect.””

Whether this points to the human proclivity to self-promote or the egoism of one person in a specific situation, it seems that all those covering the story are drawing attention to this self-defeating move as a preventable mistake on Burchard’s part. Look no further than the title of a recent Motherboard article: Pro-Tip: If You’re a Suspected Dark Web Drug Dealer, Don’t Trademark Your #Brand. The nature of promotions and marketing on the Dark Web will be an interesting area to watch unfold.


Megan Feil, June 22, 2016


Enterprise Search Vendor Sinequa Partners with MapR

June 8, 2016

In the world of enterprise search and analytics, everyone wants in on the clients who have flocked to Hadoop for data storage. Virtual Strategy shared an article announcing Sinequa Collaborates With MapR to Power Real-Time Big Data Search and Analytics on Hadoop. A firm specializing in big data, Sinequa, has become certified with the MapR Converged Data Platform. The interoperation of Sinequa’s solutions with MapR will enable actionable information to be gleaned from data stored in Hadoop. We learned,

“By leveraging advanced natural language processing along with universal structured and unstructured data indexing, Sinequa’s platform enables customers to embark on ambitious Big Data projects, achieve critical in-depth content analytics and establish an extremely agile development environment for Search Based Applications (SBA). Global enterprises, including Airbus, AstraZeneca, Atos, Biogen, ENGIE, Total and Siemens have all trusted Sinequa for the guidance and collaboration to harness Big Data to find relevant insight to move business forward.”

Beyond all the enterprise search jargon in this article, the collaboration between Sinequa and MapR appears to offer an upgraded service to customers. As we all know at this point, unstructured data indexing is key to data intake. However, when it comes to output, technological solutions that can support informed business decisions will be unparalleled.


Megan Feil, June 8, 2016



Mindbreeze Breaks into Slovak Big Data Market Through Partnership with Medialife

April 18, 2016

The article titled Mindbreeze and MEDIALIFE Launch Strategic Partnership on BusinessWire discusses what the partnership means for the Slovak and Czech Republic enterprise search market. MediaLife emphasizes its concentrated approach to document management systems for Slovak customers in need of large systems for the management, processing, and storage of documents. The article details,

“Based on this partnership, we provide our customers innovative solutions for fast access to corporate data, filtering of relevant information, data extraction and their use in automated sorting (classification)… Powerful enterprise search systems for businesses must recognize relationships among different types of information and be able to link them accordingly. Mindbreeze InSpire Appliance is easy to use, has a high scalability and shows the user only the information which he or she is authorized to view.”

Daniel Fallmann, founder and CEO of Mindbreeze, complimented himself on his selection of a partner in MediaLife and licked his chops at the prospect of the new Eastern European client base opened to Mindbreeze through the partnership. Other Mindbreeze partners exist in Italy, the UK, Germany, Mexico, Canada, and the USA, as the company advances its mission to supply enterprise search appliances as well as big data and knowledge management technologies.


Chelsea Kerwin, April 18, 2016


RAVN ACE Can Help Financial Institutions with Regulatory Compliance

March 31, 2016

Increased regulations in the financial field call for tools that can gather certain information faster and more thoroughly. Bobsguide points to a solution in “RAVN Systems Releases RAVN ACE for Automated Data Extraction of ISDA Documents Using Artificial Intelligence.” For those who are unaware, ISDA stands for International Swaps and Derivatives Association, and a CSA is a Credit Support Annex. The press release informs us:

“RAVN’s ground-breaking technology, RAVN ACE, joins elements of Artificial Intelligence and information processing to deliver a platform that can read, interpret, extract and summarise content held within ISDA CSAs and other legal documents. It converts unstructured data into structured output, in a fraction of the time it takes a human – and with a higher degree of accuracy. RAVN ACE can extract the structure of the agreement, the clauses and sub-clauses, which can be very useful for subsequent re-negotiation purposes. It then further extracts the key definitions from the contract, including collateral data from tabular formats within the credit support annexes. All this data is made available for input to contract or collateral management and margining systems or can simply be provided as an Excel or XML output for analysis. RAVN ACE also provides an in-context review and preview of the extracted terms to allow reviewing teams to further validate the data in the context of the original agreement.”

The write-up tells us the platform can identify high-credit-risk relationships and detail the work required to repaper those accounts (that is, to re-draft, re-sign, and re-process paperwork). It also notes that even organizations that have a handle on their contracts can benefit, because the platform can compare terms in actual documents with those that have been manually abstracted.

Based in London, enterprise search firm RAVN tailors its solutions to the needs of each industry it serves. The company was founded in 2011.


Cynthia Murrell, March 31, 2016



Allegedly Secretive Palantir Technologies Getting Chatty?

March 22, 2016

Many of the articles I read about Palantir Technologies describe the company as secretive. I am not sure that is 100 percent accurate. The company has videos on YouTube, for goodness’ sake.

I noted “How Palantir Uses Big Data to Find Missing Kids.” This article came hard on the heels of “Is Morgan Stanley Wrong about Big Palantir Valuation Markdown?”

The missing kids story emphasizes Palantir’s social “good” work. I noted this passage:

Lucky for Palantir, big data challenges are just as common in the nonprofit world as in the for-profit sector. Recently, the company, which started out partnering with the U.S. intelligence and defense communities in antiterrorism efforts, has turned its attention to one of the biggest current problems: The Syrian civil war and subsequent refugee crisis, via a collaboration with The Carter Center. “We’re a company that focuses on the world’s hardest problems,” says Karin Knox, head of Palantir’s philanthropy engineering team. “Right now we probably have a hand in all of them.”


Stephen E Arnold, March 22, 2016
