Open Source and Open Doors. Bad Actors, Come On In
May 13, 2024
Open source code is awesome, because it allows developers to create projects without paying proprietary fees and it inspires innovation. Open source code, however, has problems especially when bad actors know how to exploit it. OpenSSF shares how a recent open source back door left many people vulnerable: “Open Source Security (OpenSSF) And OpenJS Foundations Issue Alert For Social Engineer Takeovers Of Open Source Projects.”
The OpenJS Foundation hosts billions of JavaScript websites. The foundation recently discovered a social engineering takeover attempt dubbed XZ Utilz backdoor, similar to another hack in the past. The OpenJS Foundation and the Open Source Security Foundation are alerting developers about the threat.
The OpenJS received a series of suspicious emails from various GitHub emails that advised project administrators to update their JavaScript. The update description was vague and wanted the administrators to allow the bad actors access to projects. The scam emails are part of the endless bag of tricks black hat hackers use to manipulate administrators, so they can access source code.
The foundations are warning administrators about the scams and sharing tips about how to recognize scams. Bad actors exploit open source developers:
“These social engineering attacks are exploiting the sense of duty that maintainers have with their project and community in order to manipulate them. Pay attention to how interactions make you feel. Interactions that create self-doubt, feelings of inadequacy, of not doing enough for the project, etc. might be part of a social engineering attack.
Social engineering attacks like the ones we have witnessed with XZ/liblzma were successfully averted by the OpenJS community. These types of attacks are difficult to detect or protect against programmatically as they prey on a violation of trust through social engineering. In the short term, clearly and transparently sharing suspicious activity like those we mentioned above will help other communities stay vigilant. Ensuring our maintainers are well supported is the primary deterrent we have against these social engineering attacks.”
These scams aren’t surprising. There needs to be more organizations like OpenJS and Open Source Security, because their intentions are to protect the common good. They’re on the side of the little person compared to politicians and corporations.
Whitney Grace, May 13, 2024
Open Source Software: Fool Me Once, Fool Me Twice, Fool Me Once Again
April 1, 2024
This essay is the work of a dumb dinobaby. No smart software required.
Open source is shoved in my face each and every day. I nod and say, “Sure” or “Sounds on point”. But in the back of my mind, I ask myself, “Am I the only one who sees open source as a way to demonstrate certain skills, a Hail, Mary, in a dicey job market, or a bit of MBA fancy dancing. I am not alone. Navigate to “Software Vendors Dump Open Source, Go for Cash Grab.” The write up does a reasonable job of explaining the open source “playbook.”
The write up asserts:
A company will make its program using open source, make millions from it, and then — and only then — switch licenses, leaving their contributors, customers, and partners in the lurch as they try to grab billions.
Yep, billions with a “B”. I think that the goal may be big numbers, but some open source outfits chug along ingesting venture funding and surfing on assorted methods of raising cash and never really get into “B” territory. I don’t want to name names because as a dinobaby, the only thing I dislike more than doctors is a legal eagle. Want proper nouns? Sorry, not in this blog post.
Thanks, MSFT Copilot. Where are you in the open source game?
The write up focuses on Redis, which is a database that strikes me as quite similar to the now-forgotten Pinpoint approach or the clever Inktomi method to speed up certain retrieval functions. Well, Redis, unlike Pinpoint or Inktomi is into the “B” numbers. Two billion to be semi-exact in this era of specious valuations.
The write up says that Redis changed its license terms. This is nothing new. 23andMe made headlines with some term modifications as the company slowly settled to earth and landed in a genetically rich river bank in Silicon Valley.
The article quotes Redis Big Dogs as saying:
“Beginning today, all future versions of Redis will be released with source-available licenses. Starting with Redis 7.4, Redis will be dual-licensed under the Redis Source Available License (RSALv2) and Server Side Public License (SSPLv1). Consequently, Redis will no longer be distributed under the three-clause Berkeley Software Distribution (BSD).”
I think this means, “Pay up.”
The author of the essay (Steven J. Vaughan-Nichols) identifies three reasons for the bait-and-switch play. I think there is just one — money.
The big question is, “What’s going to happen now?”
The essay does not provide an answer. Let me fill the void:
- Open source will chug along until there is a break out program. Then whoever has the rights to the open source (that is, the one or handful of people who created it) will look for ways to make money. The software is free, but modules to make it useful cost money.
- Open source will rot from within because “open” makes it possible for bad actors to poison widely used libraries. Once a big outfit suffers big losses, it will be hasta la vista open source and “Hello, Microsoft” or whoever the accountants and lawyers running the company believe care about their software.
- Open source becomes quasi-commercial. Options range from Microsoft charging for GitHub access to an open source repository becoming a membership operation like a digital Mar-A-Lago. The “hosting” service becomes the equivalent of a golf course, and the people who use the facilities paying fees which can vary widely and without any logic whatsoever.
Which of these three predictions will come true? Answer: The one that affords the breakout open source stakeholders to generate the maximum amount of money.
Stephen E Arnold, April 1, 2024
Commercial Open Source: Fantastic Pipe Dream or Revenue Pipe Line?
March 26, 2024
This essay is the work of a dumb dinobaby. No smart software required.
Open source is a term which strikes me as au courant. Artificial intelligence software is often described as “open source.” The idea has a bit of “do good” mixed with the idea that commercial software puts customers in handcuffs. (I think I hear Kumbaya playing faintly in the background.) Is it possible to blend the idea of free and open software with the principles of commercial software lock in? Notable open source entrepreneurs have become difficult to differentiate from a run-of-the-mill technology company. Examples include RedHat, Elastic, and OpenAI. Ooops. Sorry. OpenAI is a different type of company. I think.
Will open source software, particularly open source AI components, end up like this private playground? Thanks, MSFT Copilot. You are into open source, aren’t you? I hope your commitment is stronger than for server and cloud security.
I had these open source thoughts when I read “AI and Data Infrastructure Drives Demand for Open Source Startups.” The source of the information is Runa Capital, now located in Luxembourg. The firm publishes a report called the Runa Open Source Start Up Index, and it is a “rosy” document. The point of the article is that Runa sees open source as a financial opportunity. You can start your exploration of the tables and charts at this link on the Runa Capital Web site.
I want to focus on some information tucked into the article, just not presented in bold face or with a snappy chart. Here’s the passage I noted:
Defining what constitutes “open source” has its own inherent challenges too, as there is a spectrum of how “open source” a startup is — some are more akin to “open core,” where most of their major features are locked behind a premium paywall, and some have licenses which are more restrictive than others. So for this, the curators at Runa decided that the startup must simply have a product that is “reasonably connected to its open-source repositories,” which obviously involves a degree of subjectivity when deciding which ones make the cut.
The word “reasonably” invokes an image of lawyers negotiating on behalf of their clients. Nothing is quite so far from the kumbaya of the “real” open source software initiative as lawyers. Just look at the licenses for open source software.
I also noted this statement:
Thus, according to Runa’s methodology, it uses what it calls the “commercial perception of open-source” for its report, rather than the actual license the company attaches to its project.
What is “open source”? My hunch it is whatever the lawyers and courts conclude.
Why is this important?
The talk about “open source” is relevant to the “next big thing” in technology. And what is that? ANSWER: A fresh set of money making plays.
I know that there are true believers in open source. I wish them financial and kumbaya-type success.
My take is different: Open source, as the term is used today, is one of the phrases repurposed to breathe life in what some critics call a techno-feudal world. I don’t have a dog in the race. I don’t want a dog in any race. I am a dinobaby. I find amusement in how language becomes the Teflon on which money (one hopes) glides effortlessly.
And the kumbaya? Hmm.
Stephen E Arnold, March 26, 2024
AI Hermeneutics: The Fire Fights of Interpretation Flame
March 12, 2024
This essay is the work of a dumb dinobaby. No smart software required.
My hunch is that not too many of the thumb-typing, TikTok generation know what hermeneutics means. Furthermore, like most of their parents, these future masters of the phone-iverse don’t care. “Let software think for me” would make a nifty T shirt slogan at a technology conference.
This morning (March 12, 2024) I read three quite different write ups. Let me highlight each and then link the content of those documents to the the problem of interpretation of religious texts.
Thanks, MSFT Copilot. I am confident your security team is up to this task.
The first write up is a news story called “Elon Musk’s AI to Open Source Grok This Week.” The main point for me is that Mr. Musk will put the label “open source” on his Grok artificial intelligence software. The write up includes an interesting quote; to wit:
Musk further adds that the whole idea of him founding OpenAI was about open sourcing AI. He highlighted his discussion with Larry Page, the former CEO of Google, who was Musk’s friend then. “I sat in his house and talked about AI safety, and Larry did not care about AI safety at all.”
The implication is that Mr. Musk does care about safety. Okay, let’s accept that.
The second story is an ArXiv paper called “Stealing Part of a Production Language Model.” The authors are nine Googlers, two ETH wizards, one University of Washington professor, one OpenAI researcher, and one McGill University smart software luminary. In short, the big outfits are making clear that closed or open, software is rising to the task of revealing some of the inner workings of these “next big things.” The paper states:
We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI’s ChatGPT or Google’s PaLM-2…. For under $20 USD, our attack extracts the entire projection matrix of OpenAI’s ada and babbage language models.
The third item is “How Do Neural Networks Learn? A Mathematical Formula Explains How They Detect Relevant Patterns.” The main idea of this write up is that software can perform an X-ray type analysis of a black box and present some useful data about the inner workings of numerical recipes about which many AI “experts” feign total ignorance.
Several observations:
- Open source software is available to download largely without encumbrances. Good actors and bad actors can use this software and its components to let users put on a happy face or bedevil the world’s cyber security experts. Either way, smart software is out of the bag.
- In the event that someone or some organization has secrets buried in its software, those secrets can be exposed. One the secret is known, the good actors and the bad actors can surf on that information.
- The notion of an attack surface for smart software now includes the numerical recipes and the model itself. Toss in the notion of data poisoning, and the notion of vulnerability must be recast from a specific attack to a much larger type of exploitation.
Net net: I assume the many committees, NGOs, and government entities discussing AI have considered these points and incorporated these articles into informed policies. In the meantime, the AI parade continues to attract participants. Who has time to fool around with the hermeneutics of smart software?
Stephen E Arnold, March 12, 2024
Open Source: Free, Easy, and Fast Sort Of
February 29, 2024
This essay is the work of a dumb dinobaby. No smart software required.
Not long ago, I spoke with an open source cheerleader. The pros outweighed the cons from this technologist’s point of view. (I would like to ID the individual, but I try to avoid having legal eagles claw their way into my modest nest in rural Kentucky. Just plug in “John Wizard Doe”, a high profile entrepreneur and graduate of a big time engineering school.)
I think going up suggests a problem.
Here are highlights of my notes about the upside of open source:
- Many smart people eyeball the code and problems are spotted and fixed
- Fixes get made and deployed more rapidly than commercial software which of works on an longer “fix” cycle
- Dead end software can be given new kidneys or maybe a heart with a fork
- For most use cases, the software is free or cheaper than commercial products
- New functions become available; some of which fuel new product opportunities.
There may be a few others, but let’s look at a downside few open source cheerleaders want to talk about. I don’t want to counter the widely held belief that “many smart people eyeball the code.” The method is grab and go. The speed angle is relative. Reviving open source again and again is quite useful; bad actors do this. Most people just recycle. The “free” angle is a big deal. Everyone like “free” because why not? New functions become available so new markets are created. Perhaps. But in the cyber crime space, innovation boils down to finding a mistake that can be exploited with good enough open source components, often with some mileage on their chassis.
But the one point open source champions crank back on the rah rah output. “Over 100,000 Infected Repos Found on GitHub.” I want to point out that GitHub is a Microsoft, the all-time champion in security, owns GitHub. If you think about Microsoft and security too much, you may come away confused. I know I do. I also get a headache.
This “Infected Repos” API IRO article asserts:
Our security research and data science teams detected a resurgence of a malicious repo confusion campaign that began mid-last year, this time on a much larger scale. The attack impacts more than 100,000 GitHub repositories (and presumably millions) when unsuspecting developers use repositories that resemble known and trusted ones but are, in fact, infected with malicious code.
The write up provides excellent information about how the bad repos create problems and provides a recipe for do this type of malware distribution yourself. (As you know, I am not too keen on having certain information with helpful detail easily available, but I am a dinobaby, and dinobabies have crazy ideas.)
If we confine our thinking to the open source champion’s five benefits, I think security issues may be more important in some use cases.The better question is, “Why don’t open source supporters like Microsoft and the person with whom I spoke want to talk about open source security?” My view is that:
- Security is an after thought or a never thought facet of open source software
- Making money is Job #1, so free trumps spending money to make sure the open source software is secure
- Open source appeals to some venture capitalists. Why? RedHat, Elastic, and a handful of other “open source plays”.
Net net: Just visualize a future in which smart software ingests poisoned code, and programmers who rely on smart software to make them a 10X engineer. Does that create a bit of a problem? Of course not. Microsoft is the security champ, and GitHub is Microsoft.
Stephen E Arnold, February 29, 2024
Map Data: USGS Historical Topos
February 20, 2024
This essay is the work of a dumb dinobaby. No smart software required.
The ESRI blog published “Access Over 181,000 USGS Historical Topographic Maps.” The map outfit teamed with the US Geological Survey to provide access to an additional 1,745 maps. The total maps in the collection is now 181,008.
The blog reports:
Esri’s USGS historical topographic map collection contains historical quads (excluding orthophoto quads) dating from 1884 to 2006 with scales ranging from 1:10,000 to 1:250,000. The scanned maps can be used in ArcGIS Pro, ArcGIS Online, and ArcGIS Enterprise. They can also be downloaded as georeferenced TIFs for use in other applications.
These data are useful. Maps can be viewed with ESRI’s online service called the Historical Topo Map Explorer. You can access that online service at this link.
If you are not familiar with historical topos, ESRI states in an ARCGIS post:
The USGS topographic maps were designed to serve as base maps for geologists by defining streams, water bodies, mountains, hills, and valleys. Using contours and other precise symbolization, these maps were drawn accurately, made mathematically correct, and edited carefully. The topographic quadrangles gradually evolved to show the changing landscape of a new nation by adding symbolization for important highways; canals; railroads; and railway stations; wagon roads; and the sites of cities, towns and villages. New and revised quadrangles helped geologists map the mineral fields, and assisted populated places to develop safe and plentiful water supplies and lay out new highways. Primary considerations of the USGS were the permanence of features; map symbolization and legibility; and the overall cost of compiling, editing, printing and distributing the maps to government agencies, industry, and the general public. Due to the longevity and the numerous editions of these maps they now serve new audiences such as historians, genealogists, archeologists, and people who are interested in the historical landscape of the U.S.
This public facing data service is one example of extremely useful information gathered by US government entities can be made more accessible via a public-private relationship. When I served on the board of the US National Technical Information Service, I learned that other useful information is available, just not easily accessible to US citizens.
Good work, ESRI and USGS! Now what about making that volcano data a bit easier to find and access in real time?
Stephen E Arnold, February 20, 2024
AI Coding: Better, Faster, Cheaper. Just Pick Two, Please
January 29, 2024
This essay is the work of a dumb dinobaby. No smart software required.
Visual Studio Magazine is not on my must-read list. Nevertheless, one of my research team told me that I needed to read “New GitHub Copilot Research Finds “Downward Pressure on Code Quality.” I had no idea what “downward pressure” means. I read the article trying to figure out what the plain English meaning of this tortured phrase meant. Was it the downward pressure on the metatarsals when a person is running to a job interview? Was it the deadly downward pressure exerted on the OceanGate submersible? Was it the force illustrated in the YouTube “Hydraulic Press Channel”?
A partner at a venture firms wants his open source recipients to produce more code better, faster, and cheaper. (He does not explain that one must pick two.) Thanks MSFT Copilot Bing thing. Good enough. But the green? Wow.
Wrong.
The writeup is a content marketing piece for a research report. That’s okay. I think a human may have written most of the article. Despite the frippery in the article, I spotted several factoids. If these are indeed verifiable, excitement in the world of machine generated open source software will ensue. Why does this matter? Well, in the words of the SmartNews content engine, “Read on.”
Here are the items of interest to me:
- Bad code is being created and added to the GitHub repositories.
- Code is recycled, despite smart efforts to reduce the copy-paste approach to programming.
- AI is preparing a field in which lousy, flawed, and possible worse software will flourish.
Stephen E Arnold, January 29, 2024
Open Source Software: Free Gym Shoes for Bad Actors
January 15, 2024
This essay is the work of a dumb dinobaby. No smart software required.
Many years ago, I completed a number of open source projects. Although different clients hired my team and me, the big question was, “What’s the future of open source software as an investment opportunity and as a substitute for commercial software. Our work focused on two major points:
- Community support for a widely-used software once the original developer moved on
- A way to save money and get rid of the “licensing handcuffs” commercial software companies clamped on their customers
- Security issues resulting from poisoned code or obfuscated “special features.:
My recollection is that the customers focused on one point, the opportunity to save money. Commercial software vendors were in the “lock in” game, and open source software for database, utility, and search and retrieval.
Today, a young innovator may embrace an open source solution to the generative smart software approach to innovation. Apart from the issues embedded in the large language model methods themselves, building a product on other people’s code available a open source software looks like a certain path to money.
An open source game plan sounds like a winner. Then upon starting work, the path reveals its risks. Thanks, MSFT Copilot, you exhausted me this morning. Good enough.
I thought about our work in open source when I read “So, Are We Going to Talk about How GitHub Is an Absolute Boon for Malware, or Nah?” The write up opines:
In a report published on Thursday, security shop Recorded Future warns that GitHub’s infrastructure is frequently abused by criminals to support and deliver malware. And the abuse is expected to grow due to the advantages of a “living-off-trusted-sites” strategy for those involved in malware. GitHub, the report says, presents several advantages to malware authors. For example, GitHub domains are seldom blocked by corporate networks, making it a reliable hosting site for malware.
Those cost advantages can be vaporized once a security issue becomes known. The write up continues:
Reliance on this “living-off-trusted-sites” strategy is likely to increase and so organizations are advised to flag or block GitHub services that aren’t normally used and could be abused. Companies, it’s suggested, should also look at their usage of GitHub services in detail to formulate specific defensive strategies.
How about a risk round up?
- The licenses vary. Litigation is a possibility. For big companies with lots of legal eagles, court battles are no problem. Just write a check or cut a deal.
- Forks make it easy for bad actors to exploit some open source projects.
- A big aggregator of open source like MSFT GitHub is not in the open source business and may be deflect criticism without spending money to correct issues as they are discovered. It’s free software, isn’t it.
- The “community” may be composed of good actors who find that cash from what looks like a reputable organization becomes the unwitting dupe of an industrialized cyber gang.
- Commercial products integrating or built upon open source may have to do some very fancy dancing when a problem becomes publicly known.
There are other concerns as well. The problem is that open source’s appeal is now powered by two different performance enhancers. First, is the perception that open source software reduces certain costs. The second is the mad integration of open source smart software.
What’s the fix? My hunch is that words will take the place of meaningful action and remediation. Economic pressure and the desire to use what is free make more sense to many business wizards.
Stephen E Arnold, January 15, 2024
Smart Software for Cyber Security Mavens (Good and Bad Mavens)
November 17, 2023
This essay is the work of a dumb humanoid. No smart software required.
One of my research team (who wishes to maintain a low profile) called my attention to the “Awesome GPTs (Agents) for Cybersecurity.” The list on GitHub says:
The "Awesome GPTs (Agents) Repo" represents an initial effort to compile a comprehensive list of GPT agents focused on cybersecurity (offensive and defensive), created by the community. Please note, this repository is a community-driven project and may not list all existing GPT agents in cybersecurity. Contributions are welcome – feel free to add your own creations!
Open source cyber security tools and smart software can be used by good actors to make people safe. The tools can be used by less good actors to create some interesting situations for cyber security professionals, the elderly, and clueless organizations. Thanks, Microsoft Bing. Does MSFT use these tools to keep people safe or unsafe?
When I viewed the list, it contained more than 30 items. Let me highlight three, and invite you to check out the other 30 at the link to the repository:
- The Threat Intel Bot. This is a specialized GPT for advanced persistent threat intelligence
- The Message Header Analyzer. This dissects email headers for “insights.”
- Hacker Art. The software generates hacker art and nifty profile pictures.
Several observations:
- More tools and services will be forthcoming; thus, the list will grow
- Bad actors and good actors will find software to help them accomplish their objectives.
- A for fee bundle of these will be assembled and offered for sale, probably on eBay or Etsy. (Too bad fr0gger.)
Useful list!
Stephen E Arnold, November 17, 2023
xx
test
Open Source Companies: Bet on Expandability and Extendibility
October 12, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
Naturally, a key factor driving adoption of open source software is a need to save money. However, argues Lago co-founder Anh-Tho Chuong, “Open Source Does Not Win by Being Cheaper” than the competition. Not just that, anyway. She writes:
“What we’ve learned is that open-source tools can’t rely on being an open-source alternative to an already successful business. A developer can’t just imitate a product, tag on an MIT license, and call it a day. As awesome as open source is, in a vacuum, it’s not enough to succeed. … [Open-source companies] either need a concrete reason for why they are open source or have to surpass their competitors.”
One caveat: Chuong notes she is speaking of businesses like hers, not sponsored community projects like React, TypeORM, or VSCode. Outfits that need to turn a profit to succeed must offer more than savings to distinguish themselves, she insists. The post notes two specific problems open-source developers should aim to solve: transparency and extensibility. It is important to many companies to know just how their vendors are handling their data (and that of their clients). With closed software one just has to trust information is secure. The transparency of open-source code allows one verify that it is. The extensibility advantage comes from the passion of community developers for plugins, which are often merged into the open-source main branch. It can be difficult for closed-source engineering teams to compete with the resulting extendibility.
See the write-up for examples of both advantages from the likes of MongoDB, PostHog, and Minio. Chuong concludes:
“Both of the above issues contribute to commercial open-source being a better product in the long run. But by tapping the community for feedback and help, open-source projects can also accelerate past closed-source solutions. … Open-source projects—not just commercial open source—have served as a critical driver for the improvement of products for decades. However, some software is going to remain closed source. It’s just the nature of first-mover advantage. But when transparency and extensibility are an issue, an open-source successor becomes a real threat.”
Cynthia Murrell, October 12, 2023