The AWS Bulldozer and Elasticsearch: Can the Rubber Trees Grow Back?
January 22, 2021
In 1955 or 1956, I lived in Campinas, Brazil. My father worked for RG LeTourneau. He had the delightful job of setting up a factory to produce what were then called sheep foot rollers. Most people are not aware of the function of a sheep’s foot roller. Let me explain.
Hook a D9 or other comparable bulldozer to two or more sheep foot rollers. Drive the bulldozer, scraper, or other heavy duty machine through a grassy field, a jungle or grassland. Crush and smash the trees, plants, and animals. What’s in the wake of the snorting and roaring yellow beast is a surface almost ready for paving. That’s right. The sheep foot rollers made the Trans-Amazon highway a reality.
What did the fleets of earth moving machinery do to the Hevea brasiliensis, a species of rubberwood? Well, in the case of highway deforestation, the elastic plants did not fare particularly well.
What does this slice of my life have to do with search, retrieval, log file analysis, information access, and other content related activities?
“Stepping Up for a Truly Open Source Elasticsearch” reminded me of the impact of the bulldozers and the sheep foot roller combos. The write up explains:
We launched Open Distro for Elasticsearch in 2019 to provide customers and developers with a fully featured Elasticsearch distribution that provides all of the freedoms of ALv2-licensed software. Open Distro for Elasticsearch is a 100% open source distribution that delivers functionality practically every Elasticsearch user or developer needs, including support for network encryption and access controls. In building Open Distro, we followed the recommended open source development practice of “upstream first.”
Who is the “we” driving what I think of as a digital bulldozer? Why none other than Amazon.
I wrote about Elastic’s difficult decision to try to stave off the building of an information superhighway directly over the Elastic NV buildings in Amsterdam. You can find that essay in “Enterprise Search: Flexible and Stretchy. Er, No.”
I think my observation was that it was too late for Elastic NV. Perhaps the company can find a way to avoid the Bezos bulldozer. The sentiments about the virtues of open source software echo through the Amazon blog post and the Elastic NV explanation of its decision to be a different flavor of open source goodness.
Put that handwaving aside.
The function of the bulldozer and the sheep foot roller is to build a new trail. That trail leads to Amazon AWS revenues, service offerings, and integrated functionality.
Vrrooom. Too bad about those hyacinth macaws. My father and Mr. LeTourneau were not environmentalists. Neither was particularly elastic either. Both loved the results of big yellow machines dragging sheep foot rollers across the virgin landscape.
There’s a lesson here. The Trans-Amazon highway is visible from the International Space Station. The rubber trees and other trivialities are not.
Stephen E Arnold, January 22, 2021
Enterprise Search: Flexible and Stretchy. Er, No.
January 21, 2021
Enterprise search, the utility service, thrills users and information technology professionals alike. There are quite a few search and retrieval vendors chasing revenue. Frankly I have given up trying to keep track of outfits like Luigi’s Box, Yext (yes, enterprise search!), and quite a few repackagers of Lucene; e.g., IBM, Attivio, Voyager Search, and more. There are some proprietary outfits as well.
Then there is the Compass Search sibling Elastic and its Elasticsearch. Open source search makes a great deal of sense to:
- Companies wanting a no cost or low cost way to provide search and retrieval-type functionality to an application
- Penny pinchers who want “the community” to fix bugs so that cash is freed up to lease fancy cars, receive bonuses, and focus on more important software features which can be offered for a fee and a license handcuff
- Competitors who want to provide a familiar environment to those with cash to spend and wave the magic wand of open source in front of young believers who think proprietary software is a crime against humanity.
The history of Elasticsearch and Amazon reaches back to the era when Lucid Works (né Lucid Imagination) lost some staff to Amazon’s Burlingame, California, office. That was the bell which sounded when the Bezos bulldozer decided A9 was not enough. Sure, A9 works but the folks from the Lucene/Solr outfit would map the route from A9 to a more open, folksy world of open source search.
The open source version of Lucene was the beating heart of Elastic, the now public company.
Then Amazon did what Amazon does: The company shifted the bulldozer into gear and went for open source search developers who could seamlessly (sort of) move into the newly blazed path to AWS. Once inside, the fruits of the thousand plus services, features, and functions were just a click away. Policeware vendors, start ups, and some big outfits followed the Bezos bulldozer. The updated IBM slogan reads, “Nobody gets fired for buying AWS.”
Elastic was upset.
“Amazon: NOT OK – Why We Had to Change Elastic Licensing” picks up this story and explains where Elastic fits into the world crafted by the Bezos bulldozer.
The write up explains:
Our license change is aimed at preventing companies from taking our Elasticsearch and Kibana products and providing them directly as a service without collaborating with us.
Elastic’s essay notes:
We think that Amazon’s behavior is inconsistent with the norms and values that are especially important in the open source ecosystem. Our hope is to take our presence in the market and use it to stand up to this now so others don’t face these same issues in the future.
The essay concludes:
I believe in the core values of the Open Source Community: transparency, collaboration, openness. Building great products to the benefit of users across the world. Amazing things have been built and will continue to be built using Elasticsearch and Kibana. And to be clear, this change most likely has zero effect on you, our users. And no effect on our customers that engage with us either in cloud or on premises.
Several observations:
- Commercial behemoths like Amazon use open source the way my neighbor burns firewood, old carpets, and newspapers. The goal is to optimize available cash.
- Amazon’s move into Elastic’s territory began more than five years ago. Amazon does kill off loser products like health and food delivery but it keeps others in tall cotton when it pays off.
- Those completing [a] Amazon certification, [b] partner indoctrination, or [c] inputs from free or low cost Amazon training arrive ready to do the search thing Amazon’s way.
Net net: Beyond Search understands Elastic’s anguish and actions. Perhaps the license shift and the assumptions about open source are unlikely to stand up to the Bezos bulldozer? Open source Elasticsearch is a bargain. It may be tough to compete with free plus discounts for AWS goodies and other Amazon benefits. Legal or illegal, fair or unfair, open source or closed source — the bulldozer grinds forward.
Stephen E Arnold, January 21, 2021
Enterprise Search: Blasting Away at Feet, Walls, and Partners
January 18, 2021
I read a very good write up called “Is Elasticsearch No Longer Open Source Software?” The write up contains a helpful summary of the history of Elastic and its Lucene-based search solution. Plus the inhospitable territory of open source licensing gets a review as well. To boil down the write up does not do it justice, so navigate to the source document and read it first hand.
I noted a couple of passages which I found suggestive.
First, here’s a comment which strikes me as relevant to the Bezos bulldozer’s approach to low or no cost, high utility software:
if you want to provide Elasticsearch on a SaaS basis, you have to release any code that you use to do this: in Amazon’s case this could mean all the management layers that go into providing Elasticsearch on Amazon Web Services (AWS), so I doubt this is going to happen.
My view is that Elastic and its management team want to put some sand in the bulldozer’s diesel fuel. The question is, “WWAD” or “What will Amazon do?” Some of the options available to Amazon are likely to be interesting. The specific series of actions Amazon pursues will be particularly thrilling.
Second, another passage I circled was:
Smaller SaaS providers without Amazon’s resources will have to decide whether to do a deal with Elastic or Amazon to continue to offer a hosted Elasticsearch.
Based on my limited understanding of the legal hoo-hah with open source legal nuances, I think a customer will have to make a choice. Ride the bulldozer or go with the Son of Compass search. (Yep, that would be Elastic.)
For me, my meanderings through open source and enterprise search sparked these thoughts:
- In a competitive arena, open source will become closed. Too much money is at stake for the “leaders.”
- Open source provides a low cost, low friction way to add functionality or enable an open source “play.” Once up and running, the company using open source wants to make sure the costs of R&D, bug fixes, and other enhancements are “free”; that is, not an expense to the company using open source software.
- Forks or code released to open source are competitive moves motivated by financial and marketing considerations.
Open source, open code, open anything: Sounds too good to be true. For some situations, enterprise search’s DNA will surface and the costs can be tricky enough to make an accountant experience heartburn. And the lawyers? Those folks send invoices. The users? Search is a utility. The companies appropriating and making their solution proprietary? Mostly happy campers. And the open source “developers”? Yikes.
Stephen E Arnold, January 18, 2021
Rah Rah Rah for Enterprise Search
January 8, 2021
The founder and CEO of enterprise search firm Mindbreeze, Daniel Fallmann, has penned quite an advertisement for enterprise search in “Employ Your AI as a Smart Partner: Intelligent Ways to Leverage Knowledge” posted by Forbes. For Fallmann, the advantage of AI is the ability to serve up the right information at the right time in rapidly changing business environments. He advises us that any knowledge management system worth its salt will have these technologies: AI, machine learning, natural language processing, natural language question answering, and semantic content processing. He emphasizes:
“Making the relevance of information personalized for each individual is what makes successful search results for employees. This is achieved by observing user behavior (assuming their consent, of course) and learning from it. Various factors that are analyzed include the role of the activity, the actions that were taken in the past in connection with certain information, specific search behavior and even the emotions that users associate with information — a topic closely related to customer experience or the experience economy.”
Looking ahead, Fallmann sees three significant developments in his field: X analytics, multimedia sources included in search results; weak supervision, a process that allows systems to learn independently and improve with use; and explainable AI (XAI), a way for systems to express their logic in a way humans can understand and manage. We’re told:
“Thanks to these new developments in intelligent systems like those used for enterprise search and knowledge management, workers no longer have to manage newly automated processes. Instead, they can combine their experience with artificial intelligence. This can generate a great opportunity to see ROI with reductions in the time it takes to complete tasks and eliminate repetitive tasks. This can help people play to more distinctively human strengths like social interactions, creativity and tact. And best of all, it can help workers spend their time on more impactful activities like strategy, innovation and problem-solving.”
No doubt, Mr. Fallmann would recommend Mindbreeze’s InSpire platform as the ideal solution. With headquarters in Chicago and in Linz, Austria, that company was founded in 2015 and is connected to a Microsoft reseller.
Cynthia Murrell, January 8, 2021
A Beefed Up Elasticsearch Presages an Interesting Future
December 31, 2020
The write up “Elasticsearch New Features: 2020 Year in Review” makes several “enterprise search” issues clear:
- Key word retrieval is not enough
- Additions to basic search signals that Elasticsearch is following the Entopia, FAST Search & Transfer, and other proprietary systems down the path of exponential complexity
- Specialists in the time series and geospatial sector have cause to rejoice and be worried.
The article provides a summary of the feature landscape for Elasticsearch. It is worth pointing out that many commercial vendors rely on Elasticsearch or its cousin Lucene for information retrieval functions.
The article illustrates why. A single firm lacks the resources to build, enhance, and support what now is a retrieval and analysis platform. What’s interesting is how few vendors report their open source roots. Most prefer to concentrate on their proprietary add ons. These are the differentiators, but I must admit that most of these commercial vendors appear to me like iguanas in a Caribbean iguana farm pen. I can no longer tell them apart. When I encounter a “new” enterprise or specialized search system positioned as a problem solver for the enterprise, I see iguanas. I suppose each iguana has a quite distinct personality, but I am not smart enough to perceive the difference.
Net net: Enterprise search is a utility. As an information service accretes features and functions, the basics become less important. At some point, the enterprise search system, whether free or proprietary, bangs straight into the accounting department’s Zoom meeting.
The results are not pretty. Complexity, triage costs, customization costs, and special add ons set the stage for more Delphes, Fulcrums, SMARTs and STAIRS. Will vendors of enterprise search figure out how to get off this pathway to a Dante-like digital netherworld?
My prediction for 2021? Nah.
Stephen E Arnold, December 31, 2020
Enterprise Search Needs To Do Its Core Function
December 24, 2020
Enterprise search is still one of those buzzwords tossed around by tech experts to make themselves sound smart, but with good reason. Inside Big Data discusses enterprise search’s future in the article: “Enterprise Search In The Age Of AI.” Enterprise search used to be one of the most important buzzwords in the tech industry. It meant a more intuitive and customizable way to search data and actually find desired information.
Enterprise search evolved into more advanced facets of enterprise systems and it appears with AI-powered big data systems it might not be relevant anymore. The article, however, states enterprise search is still important. Here is the extraordinary insight:
“My opinion is that, if Enterprise Search is to regain a significant share of the business tools market, it can only do so by refocusing on its core value proposition: search. When it comes to the public web, we might feel that there’s little room left for improvement in the search space, but I believe that there’s a lot more ground to explore on the enterprise side of things. Part of the reason for claiming this comes from the insight that our needs seem to almost universally follow Pareto’s law, at least when it comes to the public web. For the most part, we keep searching for the same things by posing similar queries and land on the same websites. The fact that the corpus of all web documents is immense presents more of a problem than an opportunity, as most of it is irrelevant to us. Google understands this well, which is why, over the last decade, it hasn’t been investing in expanding its search experience, but instead slowly reducing it to merely providing the “one true answer,” personalized for each user.”
Why does this need to be explained? With all the powerful AI systems users still need to locate information. Users want precise, quick, and relevant search tools that return the required data. How much simpler can it get? Why not develop an AI-powered enterprise search tool? I know the answer. Too difficult. Marketing hype and consulting baloney are much easier.
Whitney Grace, December 24, 2020
Sinequa: A Logical Leap
December 21, 2020
The French have contributed significantly to logic. One may not agree with the precepts of Peter Abelard, the enlightened René Descartes, or the mathiness of Jean-Yves Girard. A rational observer of the disciplines of search and retrieval may want to inspect the reasoning of “How Apple’s Pending Search Engine Hints at a Rise in Enterprise Search.”
The jumping off point for this essay is the vaporware emitted by heavy breathing thumb typers that Apple will roll out a Web search engine. The idea is an interesting one, but, as I write this, Apple is busy with a number of tasks. But vaporware is a proven fungible among those engaged in enterprise search. The idea of finding just the information one needs when working in a dynamic company is a bit like looking for the end of a rainbow. One can see it; therefore, there must be an end. Even better, mothers have informed their precocious progeny that there is a pot of gold at the terminus.
What can one do with the assumption that an Apple Web search engine will manifest itself?
The answer is probably one which will set a number of French logicians spinning in their graves.
According to the write up from an “expert” at the French enterprise search firm Sinequa:
So, if Apple is spending (most likely) billions of dollars recreating a tool that effortlessly finds us the global sum of human knowledge, then isn’t it about time we improve the tools that knowledge workers have to do their jobs?
That’s quite a leap, particularly for a discipline which dates from the pre-STAIRS era. But from a company founded in 2002, the leap is nothing out of the ordinary.
But enterprise search is a big job; for example:
The complication is that enterprise data is more heterogeneous in nature than internet data, which is homogeneous by comparison. As a result, enterprise data tends to reside in silos, so if we need to find a document, we can narrow down where we look to a couple of places – for instance, in our email or on a particular SharePoint. However, further complication arises when we don’t know where to look – or worse still, we don’t know what we’re looking for. A siloed approach works fairly well but at some point, we start to lose track of where to look. According to recent Sinequa research, knowledge workers currently have to access an average of around six different systems when looking for information – that’s potentially six individual searches you need to make to find something.
And why has enterprise search as a discipline failed to deliver exactly what an employee needs to do his or her job at a particular point in time?
That’s a good question which the logical confection does not address. No problem. Vendors of enterprise search have dodged the question for more than half a century.
Here’s how the essay nails down its stunning analysis:
It’s only a matter of time before enterprise search reaches a similar tipping point. There will be a time when the silos become too many or the time taken to search them becomes too great. The question is whether the reason for enterprise to take search seriously is because a lack of search is seen as an existential threat, or an opportunity to differentiate.
Okay, 50 years and counting.
Do you hear that buzzing sound? I surmise that it is René Descartes trying to contact Jacques Ellul to discuss how French logic fell off the wine cart.
My hunch is that Messrs. Descartes and Ellul will realize that providing access to information in response to a particular business need is a digital version of running toward the end of the rainbow. Some exercise, d’accord, but the journey may end in disappointment.
Par for the course for a company whose product pricing begins at $0.01 if Sourceforge is to be believed. Yep, $0.01. Logical? Sure. It’s marketing consistent with the hundreds of companies which have flogged enterprise search for decades.
Rainbows. Pots of gold. Yep.
Stephen E Arnold, December 20, 2020
Can Enterprise Search Improve Governance? Security?
December 10, 2020
I thought about this question after I read “BA Insight Delivers Internet-Like Search for Egnyte Customers.” The write up is a content marketing item with some jazzy jargon; for example:
- AI-driven enterprise search
- Connector-driven software portfolio
- Intelligent recommendations
- Machine learning
- Natural Language
- User behavior
- User productivity.
What is, I ask myself, AI driven enterprise search? I don’t know what AI means, and I still have not figured out what “enterprise search” means after writing The New Landscape of Search and a number of other books and monographs on this subject.
My recollection is that Attivio has been wrapping layers of functionality around Lucene, but maybe my recollection is faulty. I do recall the interesting business intelligence application which pivoted on baseball data.
But that was in 2007 when former Fast Search & Transfer professionals pivoted from ESP (enterprise search platform) to Attivio. Attivio’s founder told me “attivio” was an Italian-like word which implies forward movement. Today a jaunty MBA would call this “kinetic branding.” Whatever.
The focus of the marketing collateral is a deal with an outfit involved in resolving content chaos and delivering information cohesion. I am not exactly sure what this means, but here is the description offered by Attivio’s partner / licensee Egnyte:
Your files contain your most critical data, but, more than ever, they’re sprawled across disconnected systems, devices, locations, and apps. Egnyte enables you to gain visibility and control across a hybrid content stack while also improving employee experience and driving business advantage.
Egnyte is in the compliance business, the data governance business, the risk reduction business, and the cyber security business. But the key value proposition seems to be:
Unified multi cloud content search
Specifically:
Egnyte is the only all-in-one platform that combines data-centric security and governance, AI for real-time and predictive insights, and the flexibility to connect with the content sources and applications your business users know and love – on any device, anywhere, without friction.
The words “only” and “all” are blinking yellow lights to me. Categorical affirmatives are tough for me to accept. These types of “make a case” statements are, however, popular with the millennials and thumbtypers in marketing departments.
I took a look at one of the buzzwords used to describe the Egnyte system powered in part by Attivio and learned that these are the functions the platform delivers:
- Breach reporting
- Classification policies (for GDPR compliance, CCPA, HIPAA, etc.)
- Content lifecycle management
- Content safeguards
- Custom keyword classification
- Data subject access requests
- Issue detection and alerting
- Insider threat and ransomware detection
- Multi-repository governance.
The combination of cyber security and search is interesting. However, the cyber security sector seems to have some explaining to do. Cyber crime, particularly insider threats and phishing, is experiencing a bad actor gold rush. Adding to the woe are reports of a cyber security firm’s inability to prevent a crippling cyber attack; specifically, “U.S. Cybersecurity Firm FireEye Discloses Breach, Theft of Hacking Tools.” What this means is that cyber security super stars are not secure. Thus, a firm which is a relative newcomer to cyber security and equipped with “only” and “all” assertions may face some interesting questions about the security of Egnyte and Attivio systems. I know I would ask some questions and carefully consider the responses. Insider threats and phishing are topics of interest to me.
Several observations:
- Search vendors are indeed working overtime to find markets for what is a downloadable utility function
- Partnerships are one way to generate sales leads and revenue from technical services and training
- Organizations, regardless of type, face significant findability, security, and regulatory challenges.
Interesting play, but “only” and “all” are big concepts, particularly when Amazon AWS, to cite one example, offers technology to deliver a similar solution directly or via its extensive partner network.
Stephen E Arnold, December 10, 2020
OpenText: A Cyber Graphic Points to Its Future
December 9, 2020
When I think of OpenText, here’s what flashes through my mind:
- BRS (Livelink)
- Fulcrum
- Hummingbird
- InQuery
- nQuire
- Recommind
- SGML search.
My recollection is that there may be a Web search engine, a search system for law firm email, and a database from Information Dimensions. I cannot recall, but the message seems clear:
OpenText is a company deeply involved in search and retrieval.
When I read “Mark J. Barrenechea Keynote: The Future of Cyber Resilience”, I realized that I am thinking about the “old” OpenText. What do I mean by “old”? That “old” OpenText was an enterprise search vendor wrapped in search-based applications like eDiscovery and content management.
Not any more.
Here’s the new OpenText:
Yep, the Rona, cyber security, health, and “agility, flexibility, and trust.” Who knew? Ice skaters call this a counter turn.
Stephen E Arnold, December 9, 2020
LinkedIn Reveals Disinterest in Search and Retrieval
December 7, 2020
LinkedIn does quite a bit of info-ramming when either one of my team or I log in to the Microsoft social media system. Here’s the graphic displayed when we were checking to see if our automated posts from this blog were appearing:
The eight “cards” tell me about LinkedIn Groups in which I may have an interest. The little boxes reveal a small amount of information about the content access topics in which the unemployed, the consultants cruising for gigs, and the self-promoters have an interest.
The table below presents some of the data in this graphic in tabular form. No, I did not use Excel 365 connected to Teams. Sorry, Mother Microsoft. I still recall Bob. (You remember Bob, don’t you, gentle reader?)
LinkedIn Group Name | Number of LinkedIn Followers |
Data Science Central | 374,694 |
Association for Intelligent Information Management | 27,861 |
Scientific, Technical, Medical Publishing Group | 12,253 |
Data & Text Analytics Professionals | 12,503 |
Special Libraries Asso. | 15,191 |
Semantic Web | 15,098 |
Semantic Technologies Group | 3,772 |
Enterprise Search & Discovery | 624 |
LinkedIn does not reveal the hard count for its total number of registered humans, the number of human users who log on to the system once per week, or the number of paying human users. Hence, figuring out the percentage of LinkedIn members interested in these groups is a difficult task akin to predicting the share price of Palantir Technologies on January 1, 2022.
An outfit called Oberlo reports with confidence that LinkedIn has 660 million users. Close enough for horseshoes.
The table below presents the percentage of these LinkedIn users interested in each of the groups suggested to me:
LinkedIn Group Name | Percentage of LinkedIn Members Interested in These Topics |
Data Science Central | 0.0567718182% |
Association for Intelligent Information Management | 0.0042213636% |
Scientific, Technical, Medical Publishing Group | 0.0018565152% |
Data & Text Analytics Professionals | 0.0018943939% |
Special Libraries Asso. | 0.0023016667% |
Semantic Web | 0.0022875758% |
Semantic Technologies Group | 0.0005715152% |
Enterprise Search & Discovery | 0.0000945455% |
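For anyone who wants to check the arithmetic, here is a minimal sketch of how the percentages can be reproduced. It assumes the follower counts from the first table and the 660 million user figure attributed to Oberlo; the names `followers`, `TOTAL_USERS`, and `share_percent` are made up for this illustration and are not part of any LinkedIn tool.

```python
# Follower counts copied from the first table in this post.
followers = {
    "Data Science Central": 374_694,
    "Association for Intelligent Information Management": 27_861,
    "Scientific, Technical, Medical Publishing Group": 12_253,
    "Data & Text Analytics Professionals": 12_503,
    "Special Libraries Asso.": 15_191,
    "Semantic Web": 15_098,
    "Semantic Technologies Group": 3_772,
    "Enterprise Search & Discovery": 624,
}

# Assumption: the 660 million total is the Oberlo estimate, not a LinkedIn-reported count.
TOTAL_USERS = 660_000_000


def share_percent(count, total=TOTAL_USERS):
    """Return the group's followers as a percentage of all users."""
    return count / total * 100


for name, count in followers.items():
    # Ten decimal places, matching the precision used in the table.
    print(f"{name} | {share_percent(count):.10f}% |")
```

Swap in a different estimate for `TOTAL_USERS` and every percentage scales accordingly; the ranking of the groups does not change.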
Eyeballing my math, surely there are errors. How can such a compelling subject as Enterprise Search & Discovery appeal to 0.0000945455 percent of the LinkedIn members?
What’s interesting is that an astounding 0.0042213636 percent of the LinkedIn membership are pulled to the Association for Intelligent Information Management.
And the semantic topics. Magnetic indeed.
What does the analysis suggest? Anyone looking for a job in enterprise search may want to spin their expertise a different way.
Stephen E Arnold, December 7, 2020