Enterprise Search Needs To Do Its Core Function

December 24, 2020

Enterprise search is still one of those buzzwords tossed around by tech experts to make themselves sound smart, but with good reason. Inside Big Data discusses enterprise search’s future in the article: “Enterprise Search In The Age Of AI.” Enterprise search used to be one of the most important buzzwords in the tech industry. It meant a more intuitive and customizable way to search data and actually find desired information.

Enterprise search evolved into more advanced facets of enterprise systems, and with AI-powered big data systems on the scene it appears it might not be relevant anymore. The article, however, states enterprise search is still important. Here is the extraordinary insight:

“My opinion is that, if Enterprise Search is to regain a significant share of the business tools market, it can only do so by refocusing on its core value proposition: search. When it comes to the public web, we might feel that there’s little room left for improvement in the search space, but I believe that there’s a lot more ground to explore on the enterprise side of things. Part of the reason for claiming this comes from the insight that our needs seem to almost universally follow Pareto’s law, at least when it comes to the public web. For the most part, we keep searching for the same things by posing similar queries and land on the same websites. The fact that the corpus of all web documents is immense presents more of a problem than an opportunity, as most of it is irrelevant to us. Google understands this well, which is why, over the last decade, it hasn’t been investing in expanding its search experience, but instead slowly reducing it to merely providing the “one true answer,” personalized for each user.”

Why does this need to be explained? Even with all the powerful AI systems, users still need to locate information. Users want precise, quick, and relevant search tools that return the required data. How much simpler can it get? Why not develop an AI-powered enterprise search tool? I know the answer. Too difficult. Marketing hype and consulting baloney are much easier.

Whitney Grace, December 24, 2020

Sinequa: A Logical Leap

December 21, 2020

The French have contributed significantly to logic. One may not agree with the precepts of Peter Abelard, the enlightened René Descartes, or the mathiness of Jean-Yves Girard. A rational observer of the disciplines of search and retrieval may want to inspect the reasoning of “How Apple’s Pending Search Engine Hints at a Rise in Enterprise Search.”

The jumping off point for this essay is the vaporware emitted by heavy breathing thumb typers that Apple will roll out a Web search engine. The idea is an interesting one, but, as I write this, Apple is busy with a number of tasks. But vaporware is a proven fungible among those engaged in enterprise search. The idea of finding just the information one needs when working in a dynamic company is a bit like looking for the end of a rainbow. One can see it; therefore, there must be an end. Even better, mothers have informed their precocious progeny that there is a pot of gold at the terminus.

What can one do with the assumption that an Apple Web search engine will manifest itself?

The answer is probably one which will set a number of French logicians spinning in their graves.

According to the write up from an “expert” at the French enterprise search firm Sinequa:

So, if Apple is spending (most likely) billions of dollars recreating a tool that effortlessly finds us the global sum of human knowledge, then isn’t it about time we improve the tools that knowledge workers have to do their jobs?

That’s quite a leap, particularly for a discipline which dates from the pre-STAIRS era. But from a company founded in 2002, the leap is nothing out of the ordinary.

But enterprise search is a big job; for example:

The complication is that enterprise data is more heterogeneous in nature than internet data, which is homogeneous by comparison. As a result, enterprise data tends to reside in silos, so if we need to find a document, we can narrow down where we look to a couple of places – for instance, in our email or on a particular SharePoint. However, further complication arises when we don’t know where to look – or worse still, we don’t know what we’re looking for. A siloed approach works fairly well but at some point, we start to lose track of where to look. According to recent Sinequa research, knowledge workers currently have to access an average of around six different systems when looking for information – that’s potentially six individual searches you need to make to find something.

And why has enterprise search as a discipline failed to deliver exactly what an employee needs to do his or her job at a particular point in time?

That’s a good question which the logical confection does not address. No problem. Vendors of enterprise search have dodged the question for more than half a century.

Here’s how the essay nails down its stunning analysis:

It’s only a matter of time before enterprise search reaches a similar tipping point. There will be a time when the silos become too many or the time taken to search them becomes too great. The question is whether the reason for enterprise to take search seriously is because a lack of search is seen as an existential threat, or an opportunity to differentiate.

Okay, 50 years and counting.

Do you hear that buzzing sound? I surmise that it is René Descartes trying to contact Jacques Ellul to discuss how French logic fell off the wine cart.

My hunch is that Messrs. Descartes and Ellul will realize that providing access to information in response to a particular business need is a digital version of running toward the end of the rainbow. Some exercise, d’accord, but the journey may end in disappointment.

Par for the course for a company whose product pricing begins at $0.01 if Sourceforge is to be believed. Yep, $0.01. Logical? Sure. It’s marketing consistent with the hundreds of companies which have flogged enterprise search for decades.

Rainbows. Pots of gold. Yep.

Stephen E Arnold, December 20, 2020

Can Enterprise Search Improve Governance? Security?

December 10, 2020

I thought about this question after I read “BA Insight Delivers Internet-Like Search for Egnyte Customers.” The write up is a content marketing item with some jazzy jargon; for example:

AI-driven enterprise search
Connector-driven software portfolio
Intelligent recommendations
Machine learning
Natural Language
User behavior
User productivity.

What is, I ask myself, AI driven enterprise search? I don’t know what AI means, and I still have not figured out what “enterprise search” means after writing The New Landscape of Search and a number of other books and monographs on this subject.

My recollection is that Attivio has been wrapping layers of functionality around Lucene, but maybe my recollection is faulty. I do recall the interesting business intelligence application which pivoted on baseball data.

But that was in 2007 when former Fast Search & Transfer professionals pivoted from ESP (enterprise search platform) to Attivio. Attivio’s founder told me “attivio” was an Italian-like word which implies forward movement. Today a jaunty MBA would call this “kinetic branding.” Whatever.

The focus of the marketing collateral is a deal with an outfit involved in resolving content chaos and delivering information cohesion. I am not exactly sure what this means, but here is the description offered by Attivio’s partner / licensee Egnyte:

Your files contain your most critical data, but, more than ever, they’re sprawled across disconnected systems, devices, locations, and apps. Egnyte enables you to gain visibility and control across a hybrid content stack while also improving employee experience and driving business advantage.

Egnyte is in the compliance business, the data governance business, the risk reduction business, and the cyber security business. But the key value proposition seems to be:

Unified multi cloud content search


Egnyte is the only all-in-one platform that combines data-centric security and governance, AI for real-time and predictive insights, and the flexibility to connect with the content sources and applications your business users know and love – on any device, anywhere, without friction.

The words “only” and “all” are blinking yellow lights to me. Categorical affirmatives are tough for me to accept. These types of “make a case” statements are, however, popular with the millennials and thumbtypers in marketing departments.

I took a look at one of the buzzwords used to describe the Egnyte system powered in part by Attivio and learned that these are the functions the platform delivers:

  • Breach reporting
  • Classification policies (for GDPR compliance, CCPA, HIPAA, etc.)
  • Content lifecycle management
  • Content safeguards
  • Custom keyword classification
  • Data subject access requests
  • Issue detection and alerting
  • Insider threat and ransomware detection
  • Multi-repository governance.

The combination of cyber security and search is interesting. However, the cyber security sector seems to have some explaining to do. Cyber crime, particularly insider threats and phishing, is experiencing a bad actor gold rush. Adding to the woe are reports of a cyber security firm’s inability to prevent a crippling cyber attack; specifically, “U.S. Cybersecurity Firm FireEye Discloses Breach, Theft of Hacking Tools.” What this means is that cyber security super stars are not secure. Thus, a firm which is a relative newcomer to cyber security and equipped with “only” and “all” assertions may face some interesting questions about the security of Egnyte and Attivio systems. I know I would ask some questions and carefully consider the responses. Insider threats and phishing are topics of interest to me.

Several observations:

  • Search vendors are indeed working overtime to find markets for what is a downloadable utility function
  • Partnerships are one way to generate sales leads and revenue from technical services and training
  • Organizations, regardless of type, face significant findability, security, and regulatory challenges.

Interesting play, but “only” and “all” are big concepts, particularly when Amazon AWS, to cite one example, offers technology to deliver a similar solution directly or via its extensive partner network.

Stephen E Arnold, December 10, 2020

OpenText: A Cyber Graphic Points to Its Future

December 9, 2020

When I think of OpenText, here’s what flashes through my mind:

  • BRS (Livelink)
  • Fulcrum
  • Hummingbird
  • InQuery
  • nQuire
  • Recommind
  • SGML search.

My recollection is that there may be a Web search engine, a search system for law firm email, and a database from Information Dimensions. I cannot recall, but the message seems clear:

OpenText is a company deeply involved in search and retrieval.

When I read “Mark J. Barrenechea Keynote: The Future of Cyber Resilience”, I realized that I am thinking about the “old” OpenText. What do I mean by “old”? That “old” OpenText was an enterprise search vendor wrapped in search-based applications like eDiscovery and content management.

Not any more.

Here’s the new OpenText:


Yep, the Rona, cyber security, health, and “agility, flexibility, and trust.” Who knew? Ice skaters call this a counter turn.

Stephen E Arnold, December 9, 2020

LinkedIn Reveals Disinterest in Search and Retrieval

December 7, 2020

LinkedIn does quite a bit of info-ramming when either one of my team or I log in to the Microsoft social media system. Here’s the graphic displayed when we were checking to see if our automated posts from this blog were appearing:


The eight “cards” tell me about LinkedIn Groups in which I may have an interest. The little boxes reveal a small amount of information about the content access topics in which the unemployed, the consultants cruising for gigs, and the self-promoters have an interest.

The table below presents some of the data in this graphic in tabular form. No, I did not use Excel 365 connected to Teams. Sorry, Mother Microsoft. I still recall Bob. (You remember Bob, don’t you, gentle reader?)

LinkedIn Group Name                                | Number of LinkedIn Followers
Data Science Central                               | 374,694
Association for Intelligent Information Management | 27,861
Scientific, Technical, Medical Publishing Group    | 12,253
Data & Text Analytics Professionals                | 12,503
Special Libraries Asso.                            | 15,191
Semantic Web                                       | 15,098
Semantic Technologies Group                        | 3,772
Enterprise Search & Discovery                      | 624

LinkedIn does not reveal the hard count for its total number of registered humans, the number of human users who log on to the system once per week, or the number of paying human users. Hence, figuring out the percentage of LinkedIn members interested in these groups is a difficult task akin to predicting the share price of Palantir Technologies on January 1, 2022.

An outfit called Oberlo reports with confidence that LinkedIn has 660 million users. Close enough for horseshoes.

The table below presents the percentage of these LinkedIn users interested in each of the groups suggested to me:

LinkedIn Group Name                                | Percentage of LinkedIn Members Interested in These Topics
Data Science Central                               | 0.0567718182%
Association for Intelligent Information Management | 0.0042213636%
Scientific, Technical, Medical Publishing Group    | 0.0018565152%
Data & Text Analytics Professionals                | 0.0018943939%
Special Libraries Asso.                            | 0.0023016667%
Semantic Web                                       | 0.0022875758%
Semantic Technologies Group                        | 0.0005715152%
Enterprise Search & Discovery                      | 0.0000945455%
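For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes Oberlo's 660 million figure is the right denominator (a big assumption) and takes the follower counts from the first table. The names GROUP_FOLLOWERS, TOTAL_LINKEDIN_USERS, and share_of_members are made up for this illustration.

```python
# Follower counts as listed in the first table of this post.
GROUP_FOLLOWERS = {
    "Data Science Central": 374_694,
    "Association for Intelligent Information Management": 27_861,
    "Scientific, Technical, Medical Publishing Group": 12_253,
    "Data & Text Analytics Professionals": 12_503,
    "Special Libraries Asso.": 15_191,
    "Semantic Web": 15_098,
    "Semantic Technologies Group": 3_772,
    "Enterprise Search & Discovery": 624,
}

# Assumption: Oberlo's estimate of total LinkedIn users, not an official count.
TOTAL_LINKEDIN_USERS = 660_000_000


def share_of_members(followers: int, total: int = TOTAL_LINKEDIN_USERS) -> float:
    """Return a group's followers as a percentage of the total user base."""
    return followers / total * 100


if __name__ == "__main__":
    for name, followers in GROUP_FOLLOWERS.items():
        # Ten decimal places, matching the precision used in the table.
        print(f"{name:<52} {share_of_members(followers):.10f}%")
```

Change the denominator to whatever user count you prefer; the ranking of the groups stays the same, and Enterprise Search & Discovery stays at the bottom.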

Eyeballing my math, surely there are errors. How can such a compelling subject as Enterprise Search & Discovery appeal to 0.0000945455 percent of the LinkedIn members?

What’s interesting is that an astounding 0.0042213636 percent of the LinkedIn membership are pulled to the Association for Intelligent Information Management.

And the semantic topics. Magnetic indeed.

What does the analysis suggest? Anyone looking for a job in enterprise search may want to spin their expertise a different way.

Stephen E Arnold, December 7, 2020

Fess Up: Elasticsearch Is a Threat to Proprietary Search and Retrieval

December 1, 2020

We have been poking around the world of Elasticsearch-based information retrieval systems. There are some interesting plays; that is, entrepreneurs use Elasticsearch (Shay Banon’s open source system) as a platform.

Fess provides Elasticsearch for personal use, although one can employ the system for an organization. The system is:

Fess is Elasticsearch-based search server, but knowledge/experience about Elasticsearch is NOT needed because of All-in-One Enterprise Search Server. Fess provides Administration GUI to configure the system on your browser. Fess also contains a crawler, which can crawl documents on Web/File System/DB and support many file formats, such as MS Office, pdf and zip.

Fess became available in 2019. The CEO of the N2SM, Inc. company is Masaharu Manabe. Demonstrations and links to the code are available at this link. A fee-based version of the software is provided under the name N2 Search. More information about the for-fee version is here. A discussion forum is available at this link.

Observation: The Elasticsearch ecosystem is providing alternatives to the proprietary search systems. Beyond Search thinks that some vendors of proprietary search software are likely to see Elasticsearch as digital kudzu. Good news or bad news for the Coveos, Fabasofts, and Microsoft Fast type folks? That’s a question some of these types of vendors’ stakeholders may be asking as they beat the bushes for deals in customer service, chatbots, business intelligence, and smart software services.

Stephen E Arnold, December 1, 2020

OpenText: The New Equilibrium. Think How? What?

November 27, 2020

I read a weird content-marketing, predicting-the-future article called “OpenText CEO: Organizations Must Rethink Approach to Business, Technology.” OpenText is interesting for a number of reasons. It is a Canadian outfit. The company owns more search and retrieval systems than one can remember. Fulcrum, BRS, Dr. Tim Bray’s SGML search, and others. There are content management systems which once shipped with an Autonomy stub. I dimly recall that OpenText was into Hummingbird and maybe Information Dimensions too.


That a company which ostensibly sells content management is now suggesting that there is a “new equilibrium” on deck for 2021 is fascinating. I am not sure about the old equilibrium which seemed slightly crazy to me, but, hey, I am just reading what a Canadian outfit sees coming. I would prefer that the said Canadian outfit invest in enhancing the technologies it has, but I am flawed. That’s probably part of the old equilibrium.

The write up reports that the new equilibrium is part of the great rethink:

We are going through the fastest technology disruption in the history of the world. The shift to Industry 4.0 had already resulted in a huge increase in connectivity, automation, AI, and computing power. The response to COVID-19 has accelerated this process and forever changed the business environment.

Okay. How is that working out?

The pandemic has also forced a huge shift in time-to-value. Five years ago, companies would wait two years to deploy an ERP system. Now, the expectation is that you will have a solution in weeks, or even days.

Ah, ha. New system deployments have to be done faster. Is this an insight? I thought James Gleick’s Faster explained this process 20 years ago. It seems as if the OpenText insight has moved slowly through the great Canadian intellectual winter. Where is the management guru who lived on a sailboat in Canada when one needs him?

The new equilibrium for OpenText sounds a whole lot like Amazon Web Services or the Microsoft Azure “blue” thing. I noted:

These cloud solutions enable businesses to re-invent processes and seize emerging opportunities faster, easier, and more cost-effectively. Developer Cloud is particularly exciting. It will provide a platform for developers to create custom solutions to manage information, and will help build a community of innovators working together to create better enterprise applications.

From my point of view, this content marketing fluff has not changed my perception of OpenText which is:

OpenText software applications manage content or unstructured data for large companies, government agencies, and professional service firms.

Services, new equilibrium, rethink. Got it. Enterprise search. Jargon.

Stephen E Arnold, November 27, 2020

Elastic: The Add Value to Open Source Outfit Bounces Along

November 25, 2020

Elastic Adds New Features to Enterprise Search, Observability, and Security Solutions

Search and data-management firm Elastic has some new features to crow about. BusinessWire posts “Elastic Announces Innovations Across its Solutions to Optimize Search and Enhance Performance and Monitoring Capabilities.” One new tool is Kibana Lens, a visual data analysis tool with a drag-and-drop interface described as intuitive. There is also a beta launch of searchable snapshots, billed as an efficient way to manage data storage tiers. The press release tells us:

“New expanded Elastic Observability features, including user experience monitoring and synthetics, give developers new tools to test, measure, and optimize end-user website experiences. The launch of a new dedicated User Experience app in Kibana provides Elastic customers with an enhanced view and understanding of how end users experience their websites. In addition, Elastic customers can use the new user experience monitoring feature to review Core Web Vitals, helping website developers interpret digital experience signals. Elastic users can also leverage a dev preview release of synthetic monitoring in Elastic Uptime to simulate complex user flows, measure performance, and optimize new interaction paths without impact to a website’s end users. The combination of these two new observability features gives Elastic customers a deeper view of their customers’ digital experience before and after a site update is deployed.”

See the write-up for its list of specific updates and features to Elastic’s Enterprise Search, Observability, Security, Stack, and Cloud products. Built around open source software, the company prides itself on its user-friendly products that have been adopted by major organizations around the world, from Cisco to Verizon. Elastic began as Elasticsearch Inc. in 2012, simplified its name in 2015, and went public in 2018. The company is based in Mountain View, California, and maintains offices around the world.

Cynthia Murrell, November 25, 2020

Enterprise Search: Still Crazy after All These Years

November 20, 2020

This is not old wine in new bottles. This is wine in those weird clay jars with the nifty moniker “amphora” filled with Oak Leaf Vineyards Sauvignon Blanc White Wine. Cough, cough.

CMS Wire gets it correct when it declares, “Scanning and Selecting Enterprise Search Results: Not as Easy as it Looks.” The article doesn’t even approach the formation of a query—finding the right wording then tweaking filters and facets to produce a manageable list. Here we are only looking at the next step. Though the task seems simple on its surface—scan a list of results and select the most relevant ones—writer Martin White explains why it is not so straightforward.

First is scanning results. Users’ perceptual speed differs, so for some folks (like those who are dyslexic, for example) the process can be so tedious as to make searching pointless. White tells us that inconvenient fact is often overlooked in the discussion of search functionality. Also under-considered is the issue of snippet length. A bit of research has been performed, but it involved web pages, which are themselves more easily scanned and assessed than content found in enterprise databases. Those documents are often several hundred pages long, so ranking algorithms often have trouble picking out a helpful snippet. Some platforms serve up a text sequence that contains the query term, others create computer-generated summaries of documents, and others reproduce the first few lines of each document. Each of these approaches is imperfect. Still others produce a thumbnail of a whole page that contains the search term, and that probably helps many users. However, there are accessibility problems with that method.
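To make the trade-off concrete, here is a minimal Python sketch of two of the snippet approaches mentioned above: reproducing the first lines of a document, and showing the text around the query term. It is not White's method or any vendor's implementation. The function names and the character windows are assumptions chosen for illustration, and matching is a plain case-insensitive substring search.

```python
def leading_snippet(text: str, max_chars: int = 160) -> str:
    """Reproduce the opening of the document, cut at a word boundary."""
    collapsed = " ".join(text.split())  # normalize whitespace and line breaks
    if len(collapsed) <= max_chars:
        return collapsed
    # Drop the last, possibly partial, word before appending the ellipsis.
    cut = collapsed[:max_chars].rsplit(" ", 1)[0]
    return cut + " ..."


def keyword_snippet(text: str, query: str, window: int = 60) -> str:
    """Return the text surrounding the first match of the query term."""
    collapsed = " ".join(text.split())
    pos = collapsed.lower().find(query.lower())
    if pos == -1:
        # No literal match: fall back to the opening lines.
        return leading_snippet(collapsed, 2 * window)
    start = max(0, pos - window)
    end = min(len(collapsed), pos + len(query) + window)
    prefix = "... " if start > 0 else ""
    suffix = " ..." if end < len(collapsed) else ""
    return prefix + collapsed[start:end].strip() + suffix
```

On a document several hundred pages long, the leading snippet almost never contains the query term. The keyword snippet does, but it shows only the first match, which may not be the passage the user needs. Both are imperfect, which is the point the column makes.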

White concludes:

“We know from recent research that people may make different decisions from the information they perceive initially as relevant based on their expertise. Equally, most search metrics are based around the notional relevance of the results being presented in response to a query. If the true value of relevance cannot be well judged from the snippet, that calls any metrics associated with query performance (especially precision) into question.

“There are no easy solutions to the issues raised in this column. In the quest for achieving an acceptable user experience the points to consider are:

  • Are the techniques used by the search application to create snippets appropriate to the types of content being searched?
  • Can the format of snippets be customized by the user?
  • How easy is it to scan and assess results from a federated search?

“In the final analysis, it doesn’t matter how sophisticated the search technology is (in terms of semantic analysis, etc.). What matters is if the user can make an informed judgment of which piece of content in the results serves their information requirement, reinforces their trust in the application and maintains the highest possible level of overall search satisfaction.”

Sigh. It seems the more developers work on enterprise search, the more complicated it is to effectively operate. The field has been at it for 50 years, and is still trying to deliver something useful. Still crazy after all these years too.

PS. Our esteemed check writer (Stephen E Arnold) wrote a book about enterprise search with the author of the source document. No wonder this essay seemed weirdly familiar. I had to proofread what turned out to be prose that made the Oak Leaf stuff welcome at the end of an editing day. Cough, cough, eeep. 

Cynthia Murrell, November 20, 2020

Survey Says Data Governance Is Important. But What Is Data Governance?

November 20, 2020

Here’s what the Google says governance means: The action or manner of governing. Okay, but what exactly is governing? Google says: Having authority to conduct the policy, actions, and affairs of a state, organization, or people.

Okay, now let’s add the magic word “data,” which is a plural, not a single thing. (That’s what datum means, right?)

Google says: Facts and statistics collected together for reference or analysis.

Let’s put the information together, shall we?

An organization uses authority to conduct policy, actions and affairs to deal with facts and statistics for reference or analysis.

Why care? The answer is found in “Businesses Positive about Data Governance but Still Struggle with Privacy Concerns.”

Okay, now we have linked dealing with information and privacy. This is getting interesting or is it? I go with the “not interesting,” but let’s plod forward in the write up.

A vendor of search and retrieval software sponsored a research project conducted by Standard & Poor 451 Research. Note: That report is titled “Pathfinder Report Market Intelligence: Information Driven Compliance and Insight. Two Sides of the Same Coin.” I am not sure about the “coin” metaphor, compliance, insight, and pathfinding. But no one ever accused me of understanding mid-tier consulting firms, sponsored research, and 18-year-old vendors of proprietary search and retrieval software.

The 451 outfit tapped its pool of “survey responders” and discovered:

72 percent of enterprises believe data governance is an enabler of business value rather than a cost center.

Okay, that’s a lot of enterprises, assuming the sample was statistically valid, the questions not shaped, and the data analysis of the survey responses was performed on the up and up. But sponsored research is different from the often wonky academic research churned out by professors and work-from-home students. That’s better, right? 

I learned:

  • One in four organizations have more than 50 distinct data silos
  • 37 percent of respondents say having relevant information automatically displayed, when the team needs it, would benefit them the most in the pursuit of automation
  • Budget, privacy issues, and expertise are barriers.

How does one deal with data silos, which I assume is “governance”? How does one deal with security? Privacy? How does an enterprise search company cope with the assorted sixes and sevens of data in an organization; for example, tweets, encrypted messages, images, geospatial data, videos, and information which must be kept isolated from the grubby “let’s federate information” crowd? (Why must some data be isolated? Find an attorney. Ask her what happens if information in a legal matter is out of her span of control.)

What’s the net net of the mid-tier consulting outfit’s report? Here it is:

Success requires alignment of business objectives by looking for common-denominator requirements across business units.

Let me be clear: Enterprise search is not the solution to problems with an “authority to conduct policy, actions and affairs to deal with facts and statistics for reference or analysis.”

Enterprise search is information retrieval, not data governance, no matter how much a marketer wishes it were. Enterprise search vendors have been struggling for relevance because Lucene/Solr are good enough and users want information to address right-now business issues. Library style lists of stuff to read or look up may not ring the chimes of a thumb typing user.

Want the full report? Go here. Please, keep marketing and governance separate. Statistics 101 offered some useful guidelines. Some, however, did not pay attention. You will have to register. Marketing is still marketing.

Stephen E Arnold, November 20, 2020
