Open Source Software: The Community Model in 2021

January 25, 2021

I read “Why I Wouldn’t Invest in Open-Source Companies, Even Though I Ran One.” I became interested in open source search when I was assembling the first of three editions of Enterprise Search Report in the early 2000s. I debated whether to include Compass Search, the precursor to Shay Banon’s Elasticsearch reprise. Over the years, I have kept my eye on open source search and retrieval. I prepared a report for the outfit IDC, which happily published sections of the document and offered my write ups for $3,000 on Amazon. Too bad IDC had no agreement with me, managers who made Daffy Duck look like a model for MBAs, and a keen desire to find a buyer. Ah, the book still resides on one of my backup drives, and it contains a run down of where open source was getting traction. I wrote the report in 2011 before getting the shaft-a-rama from a mid tier consulting firm. Great experience!

The report included a few nuggets which in 2011 not many experts in enterprise search recognized; for instance:

  1. Large companies were early and enthusiastic adopters of open source search; for example, Lucene. Why? Reduce costs and get out of the crazy environment which put Fast Search & Transfer-type executives in prison for violating some rules and regulations. The phrase I heard in some of my interviews was, “We want to get out of the proprietary software handcuffs.” Plus, big outfits had plenty of information technology resources to throw at balky open source software.
  2. Developers saw open source in general and contributing to open source information retrieval projects as a really super duper way to get hired. For example, IBM — an early enthusiast for a search system which mostly worked — used the committers as feedstock. The practice became popular among other outfits as well.
  3. Venture outfits stuffed with oh-so-technical MBAs realized that consulting services could be wrapped around free software. Sure, there were legal niceties in the open source licenses, but these were not a big deal when Silicon Valley super lawyers were just a text message away.

There were other findings as well, including the initiatives underway to embed open source search, content processing, and related functions into commercial products. Attivio (formed by former super star managers from Fast Search & Transfer), Lucid Works, IBM, and other bright lights adopted open source software to [a] reduce costs, [b] eliminate the R&D required to implement certain new features, and [c] develop expensive, proprietary components, training, and services.


Google: One Trial Balloon Up and Another Launched

January 25, 2021

I read “Alphabet Loon Internet Balloon “Other Bet” Gets Grounded Forever.” Unlike the Graf Zeppelin’s performance, no one appears to have been killed by the Loon balloon.


I quite like the idea of airships; however, unpredictable weather and all-too-predictable smart software make balloons bouncing Internet signals a somewhat unusual idea. Puerto Rico, Sri Lanka, and odd spots in the US once were on the globe floating Loon’s itinerary. Not now. I learned:

Project Loon will be winding down operations and its remaining balloons in the coming months while employees are shuffled across Alphabet, Google, and X. It’s definitely disappointing news to hear, especially given how Loon Internet played critical roles in some natural disasters in the past two to three years.

But, rejoice. There is another Google balloon which may be trialed in Australia. “Google Threatens to Remove Search from Australia over New Law” contains the company-versus-country news:

Google on Friday threatened to disable its search engine function in Australia if the government passes new regulations that would force large tech companies to negotiate with news organizations to present the content they produce.

France and Google have reached some agreement about news and money. Australia is the testing ground for a less fromage-and-wine centric rapprochement. But Australia has sheep and coal. The write up noted:

Australian Prime Minister Scott Morrison responded during a press conference later Friday, stating, “We don’t respond to threats.” “Let me be clear: Australia makes our rules for things you can do in Australia,” he said. “That’s done in our Parliament. It’s done by our government and that’s how things work here in Australia.”

If a country will not meet Google’s demands, then Google search won’t find anything for the kangaroo crowd.


Google does not find anything when some people run queries.

It is clear that Google does not want fromage-and-wine deals with other countries. The costs would be too much for the Google to stomach. Lamb chops versus fromage and wine? Is this a fair contest?

Beyond Search believes that Google’s policy of threat is a trial balloon. Will that policy fly like the Loon?

Stephen E Arnold, January 25, 2021

The Building Blocks of Smart Software: Combine Them to Meet Your Needs

January 25, 2021

I have a file of listicles. One called “Top 10 Algorithms in Data Mining” appeared in 2007. I spotted another list which is, not surprisingly, quite like the Xindong Wu et al write up. The most recent listing is “All Machine Learning Algorithms You Should Know in 2021.” And note the “all.” I included a short item about a book of business intelligence algorithms in the DarkCyber for January 26, 2021, at this link. That book had more than 600 pages, and I am reasonably confident that the authors did not use the word “all” to describe their effort.

What’s the line up of “all” you ask? In the table below, I present the list from 2008 in the left column and the list from 2021 in the right column.

#   2008 Xindong Wu et al                 2021 “All” KDNuggets
1   Decision Trees                        Linear regression
2   k-means                               Logistic regression
3   Support Vector Machines               k nearest neighbor
4   Apriori                               Naive Bayes
5   Expectation-Maximization (EM)         Support vector machines
6   PageRank (voting)                     Decision trees
7   AdaBoost                              Random forest
8   k nearest neighbor classification     AdaBoost
9   Naive Bayes                           Gradient boost
10  Classification and Regression trees   XGBoost

The KDNuggets’ opinion piece also includes LightGBM (a variation of XGBoost) and CatBoost (a more efficient gradient boost). Hence, I have focused on 10 algorithms. I performed a similar compression with Xindong Wu et al’s labored discussion of rules and cases grouped under “decision trees” in the table above.

Several observations are possible from these data:

  1. “All” is misleading in the KDNuggets’ title. Why not skip the intellectually shallow “all”?
  2. In the 14 years between the scholarly article and the enthusiastic “all” paper, the tools of the smart software crowd have not advanced if the data in these two write ups are close enough for horse shoes.
  3. Modern systems’ similarity in overall approaches is understandable because a limited set of tools is used by energetic “inventors” of smart software.
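The overlap between the two lists can be checked with a few lines of Python. This is a minimal sketch, not part of either write up: the names come from the table above, and the hand normalization (lower case, “k nearest neighbor classification” shortened to “k nearest neighbor”) is an assumption made so that like entries compare as equal.

```python
# Algorithm names from the two lists in the table above.
# Spelling is normalized by hand (an assumption of this sketch).
LIST_2008 = {
    "decision trees", "k-means", "support vector machines", "apriori",
    "expectation-maximization", "pagerank", "adaboost",
    "k nearest neighbor", "naive bayes",
    "classification and regression trees",
}
LIST_2021 = {
    "linear regression", "logistic regression", "k nearest neighbor",
    "naive bayes", "support vector machines", "decision trees",
    "random forest", "adaboost", "gradient boost", "xgboost",
}

shared = LIST_2008 & LIST_2021      # entries on both lists
only_2021 = LIST_2021 - LIST_2008   # entries new to the 2021 list
# Jaccard similarity: shared entries divided by all distinct entries.
jaccard = len(shared) / len(LIST_2008 | LIST_2021)

print("Shared:", sorted(shared))
print("Only in 2021:", sorted(only_2021))
print("Jaccard similarity:", round(jaccard, 2))
```

With this normalization, five of the ten entries appear on both lists. A different grouping of the tree and boosting methods would shift the count.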

Net net: The mathematical recipes are evolving in terms of efficiency due to more machine horsepower and more data.

How about the progress in accuracy? Did IBM Watson uncover a drug to defeat Covid? How are those Google search results working for you? What about the smart cyber security software which appears to have missed entirely the SolarWinds’ misstep?

Why? Knowing algorithms is not the same as developing systems which mostly work. Marketers, however, can seize on these mathy names and work miracles. Too bad the systems built with them don’t.

Stephen E Arnold, January 25, 2021

The Reality of Machine Learning in a Thumb Typing World

January 25, 2021

I spotted a question on Hacker News: “How Many % of Machine Learning Projects Fail?” I want to call attention to a response to this question from an entity allegedly named Jurgurtha. This individual provides information about the machine learning procedures to which he has been exposed. Unlike the rah rah Webinars, the weird made-for-mobile Web pages of machine learning vendors, and the volumes of marketing collateral from outfits with enough venture funding to repurpose a friend’s daughter who is a major in art history as a search engine optimization expert, the Jurgurtha entity offers some interesting information. Here are a handful of interesting phrases from his “answer” to the question posed by a user of Hacker News:

  1. Academics
  2. Data set difficulty
  3. Cooperation and the lack thereof
  4. Coordination expressed as “not all stakeholders are aligned”
  5. Annoying taps on one’s shoulder.

Credible write up. Oh, taps on one’s shoulder seems to mean, “Help me.” Yep, help.

Stephen E Arnold, January 25, 2021

More Pix Online: 700K Images from the Rijksmuseum

January 22, 2021

We spotted this news item: “Over 700,000 Paintings from the Rijksmuseum Online Copyright Free.” These, according to the write up, are copyright free. The source of the money for this project was BankGiro Lottery, which is a culture lottery. I love that phrase “culture lottery.” I wonder if Russian individuals of character will implement similar terminology? You can access the service at this link.

The value of any image collection is one’s ability to locate a picture by artist, date, subject, and hopefully the name of the individual who made the painting possible for the museum to acquire. Art, like yachts, often has a fascinating back-story.

I ran this query: Canal boats.

The system displayed:


I clicked on Canals Boats and Ships. Notice that my “canal” was expanded to include “canals.” The term “boats” was matched exactly.
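The behavior described above, where “canal” also matches “canals,” can be imitated with a naive singular and plural expansion. This is a minimal sketch under stated assumptions: the function names `expand_query` and `matches` are hypothetical, the add-or-strip “s” rule is a stand-in for real stemming, and nothing here is the Rijksmuseum’s actual method.

```python
def expand_query(query):
    """Return one set of variants per query term (naive singular/plural)."""
    expanded = []
    for term in query.lower().split():
        variants = {term}
        if term.endswith("s"):
            variants.add(term[:-1])   # "boats" also matches "boat"
        else:
            variants.add(term + "s")  # "canal" also matches "canals"
        expanded.append(variants)
    return expanded


def matches(title, expanded):
    """True when every query term, or one of its variants, is in the title."""
    words = set(title.lower().replace(",", " ").split())
    return all(variants & words for variants in expanded)


expanded = expand_query("Canal boats")
print(matches("Canals Boats and Ships", expanded))
```

A production system would more likely use a stemmer or lemmatizer tuned to the collection’s languages; the sketch only shows why an exact-match engine would have missed the category.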

That’s a step forward considering the issues I have encountered with Internet Archive, Google Life collection, Library of Congress, and other image services.

My family was forced out of Amsterdam in 1605. Perhaps by getting image search to mostly work, the citizens are extending an olive branch to the remaining Arnolds.

Stephen E Arnold, January 22, 2021

The AWS Bulldozer and Elasticsearch: Can the Rubber Trees Grow Back?

January 22, 2021

In 1955 or 1956, I lived in Campinas, Brazil. My father worked for RG LeTourneau. He had the delightful job of setting up a factory to produce what were then called sheep foot rollers. Most people are not aware of the function of a sheep’s foot roller. Let me explain.

Hook a D9 or other comparable bulldozer to two or more sheep foot rollers. Drive the bulldozer, scraper, or other heavy duty machine through a grassy field, a jungle or grassland. Crush and smash the trees, plants, and animals. What’s in the wake of the snorting and roaring yellow beast is a surface almost ready for paving. That’s right. The sheep foot rollers made the Trans-Amazon highway a reality.


What did the fleets of earth moving machinery do to the Hevea brasiliensis, a species of rubberwood? Well, in the case of highway deforestation, the elastic plants did not fare particularly well.


What does this slice of my life have to do with search, retrieval, log file analysis, information access, and other content related activities?

“Stepping Up for a Truly Open Source Elasticsearch” reminded me of the impact of the bulldozers and the sheep foot roller combos. The write up explains:

We launched Open Distro for Elasticsearch in 2019 to provide customers and developers with a fully featured Elasticsearch distribution that provides all of the freedoms of ALv2-licensed software. Open Distro for Elasticsearch is a 100% open source distribution that delivers functionality practically every Elasticsearch user or developer needs, including support for network encryption and access controls. In building Open Distro, we followed the recommended open source development practice of “upstream first.”

Who is the “we” driving what I think of as a digital bulldozer? Why none other than Amazon.

I wrote about Elasticsearch’s difficult decision to try to stave off the building of an information superhighway directly over the Elastic NV buildings in Amsterdam. You can find that essay in “Enterprise Search: Flexible and Stretchy. Er, No.”

My observation was that it was too late for Elastic NV. Perhaps the company can find a way to avoid the Bezos bulldozer. The sentiments about the virtues of open source software echo through the Amazon blog post and the Elastic NV explanation of its decision to be a different flavor of open source goodness.

Put that handwaving aside.

The function of the bulldozer and the sheep foot roller is to build a new trail. That trail leads to Amazon AWS revenues, service offerings, and integrated functionality.

Vrrooom. Too bad about those hyacinth macaws. My father and Mr. LeTourneau were not environmentalists. Neither was particularly elastic either. Both loved the results of big yellow machines dragging sheep foot rollers across the virgin landscape.

There’s a lesson here. The Trans-Amazon highway is visible from the international space station. The rubber trees and other trivialities are not.

Stephen E Arnold, January 22, 2021

Post SolarWinds: Enhanced Security Methods. Er, What?

January 22, 2021

I find it interesting that the SolarWinds’ security misstep has faded. I assumed (the old ass of you and me saw is applicable) that after a teeny little security breach, information technology professionals would exert a teeny little effort to make sure obvious security lapses were remediated. Was I incorrect? Absolutely, gentle reader.

I noted the Beeb’s article “Malware Found on Laptops Given Out by Government”. The “government” is the United Kingdom’s Brexit capable entity. I learned:

Some of the laptops given out in England to support vulnerable children home-schooling during lockdown contain malware….The malware, which they said appeared to be contacting Russian servers, is believed to have been found on laptops given to a handful of schools.

I love the “some” and the “handful.” Ho ho ho.

Like the SolarWinds’ misstep, numbers in which one can be confident are not readily available. What is available is the indifference organizations show toward the risks and threats malware on school laptops and educational computers pose. Think about human trafficking and child pornography. Distasteful for sure, but these “government” computers may provide information useful for other pursuits; for example, blackmail, extortion, and parent or guardian financial information.

One source for the tolerant Beeb allegedly said:

“We believe this is not widespread.”

Right, 18,000 organizations compromised via the SolarWinds’ misstep should be ignored.

Let’s hear it for security well implemented. Wait. I don’t hear any rah rah. Must be an intercepted Internet stream which does not happen in the UK.

Stephen E Arnold, January 22, 2021

Law Enforcement Content Acquisition Revealed

January 22, 2021

Everything you do with a computer, smartphone, wearable, smart speaker, or tablet is recorded. In order to catch bad actors, law enforcement issues warrants to technology companies often asking for users who searched for specific keywords or visited certain Web sites in a specific time frame. Wired explains how private user information is still collected despite big tech promising to protect their users in the article, “How Your Digital Trails Wind Up In The Police’s Hands.”

Big tech companies continue to host apps and sell technology that provides user data to law enforcement. Apple attempted to combat the unauthorized use of user information by requiring all developers to have a “nutritional label” on their apps. The label will disclose privacy policies. It is not, however, a blanket solution.

Big tech companies pledge their dedication to ending law enforcement using unlawful surveillance, but their actions are hypocritical. Amazon is committed to racial equity, but they saw an uptick in police requests for user information. Google promises the same equity commitment with Google Doodles and donations, but they provide police with geofence warrants.

Law makers and political activists argue that these actions violate people’s civil rights and the Fourth Amendment. While there are people who are rallying to protect the average user, the bigger problem rests with users’ lack of knowledge. How many users are aware of the breadcrumbs they are leaving around the Internet? How many users actually read privacy policies or terms of service agreements? Very few!

“The solution isn’t simply for people to stop buying IoT devices or for tech companies to stop sharing data with the government. But “equity” demands that users be aware of the digital bread crumbs they leave behind as they use electronic devices and how state agents capitalize on both obscure systems of data collection and our own ignorance.”

Perhaps organizations should concentrate on educating the public or require big tech companies to have more transparent privacy policies in shorter, readable English? With thumb typing and illiteracy prevalent in the US, ignorance pays data dividends.

Whitney Grace, January 22, 2021

Computing: Things Go Better with Light

January 22, 2021

Electricity is too slow at matrix math for IBM. Now, announces ZDNet, “IBM Is Using Light, Instead of Electricity, to Create Ultra-Fast Computing.” The shift could be especially important to the future of self-driving automobiles, where ultra-fast processing is needed to avoid collisions at high travel speeds. Reporter Daphne Leprince-Ringuet writes:

“Although the device has only been tested at a small scale, the report suggests that as the processor develops, it could achieve one thousand trillion multiply-accumulate (MAC) operations per second and per square-millimeter – according to the scientists, that is two to three orders more than ‘state-of-the-art AI processors’ that rely on electrical signals.”
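For readers unfamiliar with the term, a multiply-accumulate (MAC) operation is one multiplication plus one addition into a running total, the inner step of the matrix math mentioned above. The following is a minimal sketch in plain Python, not IBM’s method: the function name `matvec_mac` is hypothetical, and the rate constant is simply the projected figure quoted in the article, not a measured result.

```python
# Projected rate quoted in the article: one thousand trillion MACs
# per second per square millimeter. A projection, not a measurement.
PROJECTED_MACS_PER_SECOND = 1e15


def matvec_mac(matrix, vector):
    """Multiply a matrix by a vector and count the MAC operations used."""
    result = []
    mac_count = 0
    for row in matrix:
        acc = 0
        for weight, x in zip(row, vector):
            acc += weight * x  # one multiply-accumulate
            mac_count += 1
        result.append(acc)
    return result, mac_count


result, macs = matvec_mac([[1, 2], [3, 4]], [5, 6])
print(result, macs)
# Idealized time for this many MACs at the projected rate.
print(macs / PROJECTED_MACS_PER_SECOND, "seconds")
```

A matrix with m rows and n columns needs m × n MACs per vector, which is why deep-learning workloads are usually sized in MACs per second.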

IBM researchers have been working toward this goal for some time. Last year, the company demonstrated the tech’s potential through in-memory computing with devices that performed computational tasks using light. Now they have created what they call a photonic tensor core they say is particularly suited for deep-learning applications. The article continues:

“The most significant advantage that light-based circuits have over their electronic counterparts is never-before-seen speed. Leveraging optical physics, the technology developed by IBM can run complex operations in parallel in a single core, using different optical wavelengths for each calculation. Combined with in-memory computing, IBM’s scientists achieved ultra-low latency that is yet to be matched by electrical circuits. For applications that require very low latency, therefore, the speed of photonic processing could make a big difference. … With its ability to perform several operations simultaneously, the light-based processor developed by IBM also requires much less compute density.”

That is another consideration for self-driving vehicles—the smaller the hardware the better. But this technology is far from ready for the road. IBM still must evaluate how it can be integrated for end-to-end performance. The potential to trade electricity for light is an interesting development; we are curious to see how this unfolds.

Cynthia Murrell, January 22, 2021

More Google Smart Software Ethics Excitement

January 21, 2021

Generally speaking, the publicity swirling around the Timnit Gebru matter has not been flattering to Googzilla. If the write up “Google’s New Union is Outraged As the Firm Investigates a Second AI Ethicist: Being Targeted by One of the World’s Largest Corporations Is Terrifying” is accurate, the mom and pop company should be pleased to be labeled “one of the world’s largest corporations.” Google is facing tough competition from DuckDuckGo, an outfit that uses other companies’ search outputs, and Ecosia, a green search engine. Yep, green.

According to the write up, the notion of ethics seems to irritate Googzilla. I learned:

… the Alphabet Workers Union, which went public earlier this month, has hit out at the decision to suspend Mitchell, branding the move an “attack on the people who are trying to make Google’s technology more ethical.”

I am not going to get into the debate about whether this particular “union” is a real Jimmy Hoffa style operation.

The main point is that where ethics are an issue, the online advertising store, which faces intense competition from companies with which it may have an understanding, takes actions which underscore the excellence of management’s deft touch.

Useful management insight. But a mom and pop outfit is expected to have some management tools which baffle outsiders. Perhaps an employee handbook will surface which explains the rules of the information highway promulgated by the most sensitive company in Silicon Valley?

Stephen E Arnold, January 22, 2021
