AI Hermeneutics: The Fire Fights of Interpretation Flame

March 12, 2024

This essay is the work of a dumb dinobaby. No smart software required.

My hunch is that not too many of the thumb-typing, TikTok generation know what hermeneutics means. Furthermore, like most of their parents, these future masters of the phone-iverse don’t care. “Let software think for me” would make a nifty T-shirt slogan at a technology conference.

This morning (March 12, 2024) I read three quite different write ups. Let me highlight each and then link the content of those documents to the problem of interpreting religious texts.


Thanks, MSFT Copilot. I am confident your security team is up to this task.

The first write up is a news story called “Elon Musk’s AI to Open Source Grok This Week.” The main point for me is that Mr. Musk will put the label “open source” on his Grok artificial intelligence software. The write up includes an interesting quote; to wit:

Musk further adds that the whole idea of him founding OpenAI was about open sourcing AI. He highlighted his discussion with Larry Page, the former CEO of Google, who was Musk’s friend then. “I sat in his house and talked about AI safety, and Larry did not care about AI safety at all.”

The implication is that Mr. Musk does care about safety. Okay, let’s accept that.

The second story is an ArXiv paper called “Stealing Part of a Production Language Model.” The authors are nine Googlers, two ETH wizards, one University of Washington professor, one OpenAI researcher, and one McGill University smart software luminary. In short, the big outfits are making clear that closed or open, software is rising to the task of revealing some of the inner workings of these “next big things.” The paper states:

We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI’s ChatGPT or Google’s PaLM-2…. For under $20 USD, our attack extracts the entire projection matrix of OpenAI’s ada and babbage language models.
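The linear algebra behind one step of this attack, as I read the paper, is simple: if a model’s last layer computes logits as a matrix W times an h-dimensional hidden vector, every logit vector the API returns lives in an h-dimensional subspace. Stack enough of them, measure the rank, and the hidden width pops out. Here is a toy, noise-free Python sketch under stated assumptions: the “model” is a made-up random integer matrix, the attacker is assumed to see the full logit vector (production APIs expose less, and the paper describes ways around that), and exact rational Gaussian elimination stands in for the SVD a real attack on noisy floating-point outputs would use.

```python
import random
from fractions import Fraction


def matrix_rank(rows):
    """Exact rank of an integer matrix via Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(n_rows):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == n_rows:
            break
    return rank


class ToyModel:
    """Hypothetical black box: logits = W @ hidden, W is vocab_size x hidden_dim."""

    def __init__(self, vocab_size, hidden_dim, seed=0):
        self.rng = random.Random(seed)
        self.hidden_dim = hidden_dim  # the "secret" the attacker wants to learn
        self.W = [[self.rng.randint(-5, 5) for _ in range(hidden_dim)]
                  for _ in range(vocab_size)]

    def query(self):
        """Return the full logit vector for a fresh (random) prompt."""
        hidden = [self.rng.randint(-5, 5) for _ in range(self.hidden_dim)]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.W]


def estimate_hidden_dim(model, n_queries):
    """Stack n_queries logit vectors; their rank is at most the hidden dimension."""
    return matrix_rank([model.query() for _ in range(n_queries)])


model = ToyModel(vocab_size=50, hidden_dim=8)
print(estimate_hidden_dim(model, n_queries=20))
```

With more queries than the hidden width, the printed rank equals `hidden_dim` unless the random draw happens to be rank deficient, which is vanishingly unlikely here. The 50-token vocabulary reveals a width of 8 without anyone opening the box.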

The third item is “How Do Neural Networks Learn? A Mathematical Formula Explains How They Detect Relevant Patterns.” The main idea of this write up is that software can perform an X-ray type analysis of a black box and present some useful data about the inner workings of numerical recipes about which many AI “experts” feign total ignorance.

Several observations:

  1. Open source software is available to download largely without encumbrances. Good actors and bad actors can use this software and its components to let users put on a happy face or bedevil the world’s cyber security experts. Either way, smart software is out of the bag.
  2. In the event that someone or some organization has secrets buried in its software, those secrets can be exposed. Once the secret is known, the good actors and the bad actors can surf on that information.
  3. The notion of an attack surface for smart software now includes the numerical recipes and the model itself. Toss in the notion of data poisoning, and the notion of vulnerability must be recast from a specific attack to a much larger type of exploitation.

Net net: I assume the many committees, NGOs, and government entities discussing AI have considered these points and incorporated these articles into informed policies. In the meantime, the AI parade continues to attract participants. Who has time to fool around with the hermeneutics of smart software?

Stephen E Arnold, March 12, 2024

Open Source: Free, Easy, and Fast Sort Of

February 29, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Not long ago, I spoke with an open source cheerleader. The pros outweighed the cons from this technologist’s point of view. (I would like to ID the individual, but I try to avoid having legal eagles claw their way into my modest nest in rural Kentucky. Just plug in “John Wizard Doe”, a high profile entrepreneur and graduate of a big time engineering school.)


I think going up suggests a problem.

Here are highlights of my notes about the upside of open source:

  1. Many smart people eyeball the code and problems are spotted and fixed
  2. Fixes get made and deployed more rapidly than with commercial software, which works on a longer “fix” cycle
  3. Dead end software can be given new kidneys or maybe a heart with a fork
  4. For most use cases, the software is free or cheaper than commercial products
  5. New functions become available; some of which fuel new product opportunities.

There may be a few others, but let’s look at a downside few open source cheerleaders want to talk about. I don’t want to counter the widely held belief that “many smart people eyeball the code.” The method is grab and go. The speed angle is relative. Reviving open source again and again is quite useful; bad actors do this. Most people just recycle. The “free” angle is a big deal. Everyone likes “free” because why not? New functions become available, so new markets are created. Perhaps. But in the cyber crime space, innovation boils down to finding a mistake that can be exploited with good enough open source components, often with some mileage on their chassis.

But there is one point that makes open source champions crank back on the rah rah output: “Over 100,000 Infected Repos Found on GitHub.” I want to point out that Microsoft, the all-time champion in security, owns GitHub. If you think about Microsoft and security too much, you may come away confused. I know I do. I also get a headache.

This “Infected Repos” article from Apiiro asserts:

Our security research and data science teams detected a resurgence of a malicious repo confusion campaign that began mid-last year, this time on a much larger scale. The attack impacts more than 100,000 GitHub repositories (and presumably millions) when unsuspecting developers use repositories that resemble known and trusted ones but are, in fact, infected with malicious code.

The write up provides excellent information about how the bad repos create problems and provides a recipe for doing this type of malware distribution yourself. (As you know, I am not too keen on having certain information with helpful detail easily available, but I am a dinobaby, and dinobabies have crazy ideas.)
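On the defensive side, the campaign as described depends on repositories that resemble trusted ones. A minimal Python sketch of one check a team might run before pulling in a dependency: flag any “owner/name” whose repository name matches or nearly matches a trusted repository while the owner differs. The trusted list, the candidate names, and the two-edit threshold are illustrative assumptions, and a name check will not catch poisoned code inside a repository you already trust.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]


def flag_lookalikes(candidates, trusted, max_distance=2):
    """Return (candidate, trusted_match) pairs for repos that imitate a trusted one.

    Both lists hold "owner/name" strings. A candidate is flagged when its owner
    differs from a trusted repo's owner but its name is identical or within
    `max_distance` edits of the trusted name.
    """
    trusted_set = {t.lower() for t in trusted}
    flagged = []
    for cand in candidates:
        if cand.lower() in trusted_set:
            continue  # exact match to a trusted repo
        c_owner, c_name = cand.lower().split("/", 1)
        for t in trusted:
            t_owner, t_name = t.lower().split("/", 1)
            if c_owner != t_owner and edit_distance(c_name, t_name) <= max_distance:
                flagged.append((cand, t))
                break
    return flagged


# Illustrative names only.
trusted = ["psf/requests", "pallets/flask"]
candidates = ["psf/requests", "psf-dev/requests", "evil/reqeusts",
              "pa11ets/flask", "someone/numpy"]
for cand, original in flag_lookalikes(candidates, trusted):
    print(f"{cand} imitates {original}")
# psf-dev/requests imitates psf/requests
# evil/reqeusts imitates psf/requests
# pa11ets/flask imitates pallets/flask
```

Comparing the owner separately from the name matters: an exact copy under a new account has an edit distance of zero on the name, which is exactly the case a simple spelling check would miss.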

If we confine our thinking to the open source champion’s five benefits, I think security issues may be more important in some use cases. The better question is, “Why don’t open source supporters like Microsoft and the person with whom I spoke want to talk about open source security?” My view is that:

  1. Security is an afterthought or a never-thought facet of open source software
  2. Making money is Job #1, so free trumps spending money to make sure the open source software is secure
  3. Open source appeals to some venture capitalists. Why? RedHat, Elastic, and a handful of other “open source plays”.

Net net: Just visualize a future in which smart software ingests poisoned code and programmers rely on that smart software to make them 10X engineers. Does that create a bit of a problem? Of course not. Microsoft is the security champ, and GitHub is Microsoft.

Stephen E Arnold, February 29, 2024

Map Data: USGS Historical Topos

February 20, 2024

This essay is the work of a dumb dinobaby. No smart software required.

The ESRI blog published “Access Over 181,000 USGS Historical Topographic Maps.” The map outfit teamed with the US Geological Survey to provide access to an additional 1,745 maps. The collection now totals 181,008 maps.


The blog reports:

Esri’s USGS historical topographic map collection contains historical quads (excluding orthophoto quads) dating from 1884 to 2006 with scales ranging from 1:10,000 to 1:250,000. The scanned maps can be used in ArcGIS Pro, ArcGIS Online, and ArcGIS Enterprise. They can also be downloaded as georeferenced TIFs for use in other applications.
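“Georeferenced” means each TIF carries a rule for turning a pixel position into map coordinates. Many GIS tools express that rule as a six-number affine geotransform; this sketch assumes the GDAL ordering of those six numbers. The coefficient values are invented for illustration, not read from any USGS quad, and map units are assumed to be meters. The second helper shows what the quoted 1:10,000 to 1:250,000 scale range means on the ground.

```python
def pixel_to_map(geotransform, col, row):
    """Map a pixel (col, row) to map coordinates with a GDAL-style geotransform.

    geotransform = (x_origin, pixel_width, row_rotation,
                    y_origin, column_rotation, pixel_height)
    For north-up images the rotation terms are 0 and pixel_height is negative.
    """
    x0, px_w, rot_r, y0, rot_c, px_h = geotransform
    x = x0 + col * px_w + row * rot_r
    y = y0 + col * rot_c + row * px_h
    return x, y


def ground_distance_m(map_distance_cm, scale_denominator):
    """Ground distance in meters for a distance measured on a 1:scale_denominator map."""
    return map_distance_cm * scale_denominator / 100.0


# Hypothetical north-up geotransform: 2.5 m pixels, upper-left corner at (500000, 4200000).
gt = (500000.0, 2.5, 0.0, 4200000.0, 0.0, -2.5)
print(pixel_to_map(gt, 100, 200))    # (500250.0, 4199500.0)
print(ground_distance_m(1, 10_000))   # 100.0, so 1 cm on the map is 100 m
print(ground_distance_m(1, 250_000))  # 2500.0, so 1 cm on the map is 2.5 km
```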

These data are useful. Maps can be viewed with ESRI’s online service called the Historical Topo Map Explorer. You can access that online service at this link.

If you are not familiar with historical topos, ESRI states in an ArcGIS post:

The USGS topographic maps were designed to serve as base maps for geologists by defining streams, water bodies, mountains, hills, and valleys. Using contours and other precise symbolization, these maps were drawn accurately, made mathematically correct, and edited carefully. The topographic quadrangles gradually evolved to show the changing landscape of a new nation by adding symbolization for important highways; canals; railroads; and railway stations; wagon roads; and the sites of cities, towns and villages. New and revised quadrangles helped geologists map the mineral fields, and assisted populated places to develop safe and plentiful water supplies and lay out new highways. Primary considerations of the USGS were the permanence of features; map symbolization and legibility; and the overall cost of compiling, editing, printing and distributing the maps to government agencies, industry, and the general public. Due to the longevity and the numerous editions of these maps they now serve new audiences such as historians, genealogists, archeologists, and people who are interested in the historical landscape of the U.S.

This public-facing data service is one example of how extremely useful information gathered by US government entities can be made more accessible via a public-private relationship. When I served on the board of the US National Technical Information Service, I learned that other useful information is available, just not easily accessible to US citizens.

Good work, ESRI and USGS! Now what about making that volcano data a bit easier to find and access in real time?

Stephen E Arnold, February 20, 2024

AI Coding: Better, Faster, Cheaper. Just Pick Two, Please

January 29, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Visual Studio Magazine is not on my must-read list. Nevertheless, one of my research team told me that I needed to read “New GitHub Copilot Research Finds ‘Downward Pressure on Code Quality.’” I had no idea what “downward pressure” means. I read the article trying to figure out the plain English meaning of this tortured phrase. Was it the downward pressure on the metatarsals when a person is running to a job interview? Was it the deadly downward pressure exerted on the OceanGate submersible? Was it the force illustrated in the YouTube “Hydraulic Press Channel”?


A partner at a venture firm wants his open source recipients to produce more code, better, faster, and cheaper. (He does not explain that one must pick two.) Thanks MSFT Copilot Bing thing. Good enough. But the green? Wow.

Wrong.

The writeup is a content marketing piece for a research report. That’s okay. I think a human may have written most of the article. Despite the frippery in the article, I spotted several factoids. If these are indeed verifiable, excitement in the world of machine generated open source software will ensue. Why does this matter? Well, in the words of the SmartNews content engine, “Read on.”

Here are the items of interest to me:

  1. Bad code is being created and added to the GitHub repositories.
  2. Code is recycled, despite smart efforts to reduce the copy-paste approach to programming.
  3. AI is preparing a field in which lousy, flawed, and possibly worse software will flourish.
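The copy-paste point is the easiest of the three to check on one’s own code. Here is a minimal Python sketch, not the methodology behind the research the article promotes: strip whitespace, drop blank lines, slide a fixed-size window over each file, and report any window of lines that shows up in more than one place. The three-line window and the sample “files” are arbitrary illustrative choices.

```python
from collections import defaultdict


def find_duplicate_blocks(files, window=3):
    """Return {block_text: [(filename, start_line), ...]} for repeated line windows.

    `files` maps a filename to its source text. Lines are stripped of surrounding
    whitespace and blank lines are dropped before windowing, so reindented copies
    still match. start_line is 1-based in the original file.
    """
    seen = defaultdict(list)
    for name, text in files.items():
        lines = [(i, ln.strip()) for i, ln in enumerate(text.splitlines(), start=1)
                 if ln.strip()]
        for k in range(len(lines) - window + 1):
            chunk = lines[k:k + window]
            key = "\n".join(src for _, src in chunk)
            seen[key].append((name, chunk[0][0]))
    return {key: locs for key, locs in seen.items() if len(locs) > 1}


files = {
    "a.py": "def area(w, h):\n    if w < 0 or h < 0:\n        raise ValueError\n    return w * h\n",
    "b.py": "def volume(w, h, d):\n  if w < 0 or h < 0:\n    raise ValueError\n  return w * h\n",
}
for block, locations in find_duplicate_blocks(files).items():
    print(locations)  # [('a.py', 2), ('b.py', 2)]
```

Normalizing whitespace is a deliberate choice: pasted code is often reindented, and an exact-text match would miss it.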

Stephen E Arnold, January 29, 2024

Open Source Software: Free Gym Shoes for Bad Actors

January 15, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Many years ago, I completed a number of open source projects. Although different clients hired my team and me, the big question was, “What’s the future of open source software as an investment opportunity and as a substitute for commercial software?” Our work focused on three major points:

  1. Community support for a widely-used software once the original developer moved on
  2. A way to save money and get rid of the “licensing handcuffs” commercial software companies clamped on their customers
  3. Security issues resulting from poisoned code or obfuscated “special features.”

My recollection is that the customers focused on one point, the opportunity to save money. Commercial software vendors were in the “lock in” game, and open source software for database, utility, and search and retrieval tasks offered a way out.

Today, a young innovator may embrace an open source solution to the generative smart software approach to innovation. Apart from the issues embedded in the large language model methods themselves, building a product on other people’s code available as open source software looks like a certain path to money.


An open source game plan sounds like a winner. Then upon starting work, the path reveals its risks. Thanks, MSFT Copilot, you exhausted me this morning. Good enough.

I thought about our work in open source when I read “So, Are We Going to Talk about How GitHub Is an Absolute Boon for Malware, or Nah?” The write up opines:

In a report published on Thursday, security shop Recorded Future warns that GitHub’s infrastructure is frequently abused by criminals to support and deliver malware. And the abuse is expected to grow due to the advantages of a “living-off-trusted-sites” strategy for those involved in malware. GitHub, the report says, presents several advantages to malware authors. For example, GitHub domains are seldom blocked by corporate networks, making it a reliable hosting site for malware.

Those cost advantages can be vaporized once a security issue becomes known. The write up continues:

Reliance on this “living-off-trusted-sites” strategy is likely to increase and so organizations are advised to flag or block GitHub services that aren’t normally used and could be abused. Companies, it’s suggested, should also look at their usage of GitHub services in detail to formulate specific defensive strategies.
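Acting on that advice starts with knowing which GitHub hosts an organization actually touches. A minimal Python sketch, assuming you already have destination hostnames from proxy or DNS logs. The allowlist and sample hostnames are hypothetical, and the suffix list is a partial, illustrative set of GitHub-operated domains, not an authoritative inventory.

```python
# Partial, illustrative list of GitHub-operated domain suffixes (an assumption; extend as needed).
GITHUB_SUFFIXES = ("github.com", "githubusercontent.com", "github.io")


def is_github_host(host):
    """True if host equals, or is a subdomain of, one of GITHUB_SUFFIXES."""
    host = host.lower().rstrip(".")
    return any(host == s or host.endswith("." + s) for s in GITHUB_SUFFIXES)


def unusual_github_hosts(observed_hosts, allowed_hosts):
    """Return GitHub hosts seen in traffic that are not on the organization's allowlist."""
    allowed = {h.lower().rstrip(".") for h in allowed_hosts}
    return sorted({h.lower().rstrip(".") for h in observed_hosts
                   if is_github_host(h) and h.lower().rstrip(".") not in allowed})


observed = ["github.com", "api.github.com", "raw.githubusercontent.com",
            "gist.github.com", "example.org", "notgithub.com"]
allowed = ["github.com", "api.github.com"]
print(unusual_github_hosts(observed, allowed))
# ['gist.github.com', 'raw.githubusercontent.com']
```

The leading dot in the suffix test keeps look-alike domains such as `notgithub.com` from being treated as GitHub.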

How about a risk round up?

  1. The licenses vary. Litigation is a possibility. For big companies with lots of legal eagles, court battles are no problem. Just write a check or cut a deal.
  2. Forks make it easy for bad actors to exploit some open source projects.
  3. A big aggregator of open source like MSFT GitHub is not in the open source business and may deflect criticism without spending money to correct issues as they are discovered. It’s free software, isn’t it?
  4. The “community” may include good actors who accept cash from what looks like a reputable organization and become the unwitting dupes of an industrialized cyber gang.
  5. Commercial products integrating or built upon open source may have to do some very fancy dancing when a problem becomes publicly known.

There are other concerns as well. The problem is that open source’s appeal is now powered by two different performance enhancers. The first is the perception that open source software reduces certain costs. The second is the mad integration of open source smart software.

What’s the fix? My hunch is that words will take the place of meaningful action and remediation. Economic pressure and the desire to use what is free make more sense to many business wizards.

Stephen E Arnold, January 15, 2024

Smart Software for Cyber Security Mavens (Good and Bad Mavens)

November 17, 2023

This essay is the work of a dumb humanoid. No smart software required.

One of my research team (who wishes to maintain a low profile) called my attention to the “Awesome GPTs (Agents) for Cybersecurity.” The list on GitHub says:

The "Awesome GPTs (Agents) Repo" represents an initial effort to compile a comprehensive list of GPT agents focused on cybersecurity (offensive and defensive), created by the community. Please note, this repository is a community-driven project and may not list all existing GPT agents in cybersecurity. Contributions are welcome – feel free to add your own creations!


Open source cyber security tools and smart software can be used by good actors to make people safe. The tools can be used by less good actors to create some interesting situations for cyber security professionals, the elderly, and clueless organizations. Thanks, Microsoft Bing. Does MSFT use these tools to keep people safe or unsafe?

When I viewed the list, it contained more than 30 items. Let me highlight three and invite you to check out the rest at the link to the repository:

  1. The Threat Intel Bot. This is a specialized GPT for advanced persistent threat intelligence
  2. The Message Header Analyzer. This dissects email headers for “insights.”
  3. Hacker Art. The software generates hacker art and nifty profile pictures.
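For readers wondering what “dissects email headers” involves, here is a minimal Python sketch using the standard library’s `email` package. It is not how the listed GPT works. It pulls the sender fields, the Received chain (mail servers prepend these, so the bottom entry is nearest the origin), and the SPF verdict from any Authentication-Results line. The sample message, addresses, and IPs are invented.

```python
import re
from email import message_from_string
from email.utils import parseaddr

RAW = """\
Received: from mx.example.net (mx.example.net [192.0.2.10])
 by inbound.example.com; Tue, 12 Mar 2024 09:15:02 +0000
Received: from sender-host.example.org (unknown [198.51.100.7])
 by mx.example.net; Tue, 12 Mar 2024 09:15:00 +0000
Authentication-Results: inbound.example.com; spf=fail smtp.mailfrom=example.org
From: "Help Desk" <helpdesk@example.com>
Return-Path: <bounce@example.org>
Subject: Password reset
To: user@example.com

Body text.
"""


def _domain(addr):
    return addr.rpartition("@")[2].lower()


def summarize_headers(raw_message):
    """Extract a few fields an analyst typically checks first."""
    msg = message_from_string(raw_message)
    # Unfold multi-line headers into single lines.
    received = [" ".join(h.split()) for h in msg.get_all("Received", [])]
    from_addr = parseaddr(msg.get("From", ""))[1]
    return_path = parseaddr(msg.get("Return-Path", ""))[1]
    spf = re.search(r"\bspf=(\w+)", msg.get("Authentication-Results", ""))
    # The bottom Received header was added first, so it is closest to the origin.
    origin_ip = None
    if received:
        m = re.search(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]", received[-1])
        origin_ip = m.group(1) if m else None
    return {
        "hops": len(received),
        "origin_ip": origin_ip,
        "from": from_addr,
        "return_path": return_path,
        "domain_mismatch": bool(from_addr and return_path)
        and _domain(from_addr) != _domain(return_path),
        "spf": spf.group(1) if spf else None,
    }


print(summarize_headers(RAW))
```

For the sample, the summary shows two hops, an origin of 198.51.100.7, a From domain that does not match the Return-Path domain, and a failed SPF check: the kind of combination a header analyzer is meant to surface.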

Several observations:

  • More tools and services will be forthcoming; thus, the list will grow
  • Bad actors and good actors will find software to help them accomplish their objectives.
  • A for-fee bundle of these tools will be assembled and offered for sale, probably on eBay or Etsy. (Too bad fr0gger.)

Useful list!

Stephen E Arnold, November 17, 2023

Open Source Companies: Bet on Expandability and Extendibility

October 12, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

Naturally, a key factor driving adoption of open source software is a need to save money. However, argues Lago co-founder Anh-Tho Chuong, “Open Source Does Not Win by Being Cheaper” than the competition. Not just that, anyway. She writes:

“What we’ve learned is that open-source tools can’t rely on being an open-source alternative to an already successful business. A developer can’t just imitate a product, tag on an MIT license, and call it a day. As awesome as open source is, in a vacuum, it’s not enough to succeed. … [Open-source companies] either need a concrete reason for why they are open source or have to surpass their competitors.”

One caveat: Chuong notes she is speaking of businesses like hers, not sponsored community projects like React, TypeORM, or VSCode. Outfits that need to turn a profit to succeed must offer more than savings to distinguish themselves, she insists. The post notes two specific problems open-source developers should aim to solve: transparency and extensibility. It is important to many companies to know just how their vendors are handling their data (and that of their clients). With closed software one just has to trust information is secure. The transparency of open-source code allows one to verify that it is. The extensibility advantage comes from the passion of community developers for plugins, which are often merged into the open-source main branch. It can be difficult for closed-source engineering teams to compete with the resulting extendibility.

See the write-up for examples of both advantages from the likes of MongoDB, PostHog, and Minio. Chuong concludes:

“Both of the above issues contribute to commercial open-source being a better product in the long run. But by tapping the community for feedback and help, open-source projects can also accelerate past closed-source solutions. … Open-source projects—not just commercial open source—have served as a critical driver for the improvement of products for decades. However, some software is going to remain closed source. It’s just the nature of first-mover advantage. But when transparency and extensibility are an issue, an open-source successor becomes a real threat.”

Cynthia Murrell, October 12, 2023

Python Algorithms? Hello, Excel

September 27, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

Believe it or not, the T-Mobile WiFi worked on an eight-hour Delta flight from Europe to the Atlanta airport on September 24, 2023. Who knew?

On that flight I came across a page on GitHub called “TheAlgorithms” (sic). I clicked and browsed and was quite impressed with 40 categories and the specific algorithms within each. The “Other” category had two dozen algorithms ranging from a doomsday algorithm to a method to replace flake8 with ruff.

The individual categories include some AI magnets like “Neural Network” and “Machine Learning.” Remember there are more than 35 additional baskets. There’s only one Python routine for “Genetic Algorithms” but categories like “Physics” and “Searches” seem particularly useful.

The collection has a disclaimer; to wit:

The algorithms are implemented in Python for education purpose only. These are just for demonstration purpose.

Some Excel jockeys may find some of them useful. My hunch is that second semester computer science majors may find “inspiration” in this collection.
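The doomsday algorithm in the collection’s “Other” category is a fair sample of what is on offer: John Conway’s mental-arithmetic method for finding the day of the week for any date. Here is a minimal Python sketch of that method for Gregorian dates. It is my own version, not the repository’s file, and it is cross-checked against the standard library’s `datetime`.

```python
import datetime

# Day of the month that always falls on the year's "doomsday", for Jan..Dec.
# January and February shift by one day in leap years.
DOOMSDAY_DATES = [3, 28, 14, 4, 9, 6, 11, 8, 5, 10, 7, 12]
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]


def is_leap(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def day_of_week(year, month, day):
    """Return 0=Sunday .. 6=Saturday for a Gregorian date using Conway's method."""
    century_anchor = (5 * (year // 100 % 4) + 2) % 7
    y = year % 100
    doomsday = (century_anchor + y // 12 + y % 12 + (y % 12) // 4) % 7
    reference = DOOMSDAY_DATES[month - 1]
    if is_leap(year) and month <= 2:
        reference += 1
    return (doomsday + day - reference) % 7


print(DAY_NAMES[day_of_week(2024, 3, 12)])  # Tuesday

# Cross-check: isoweekday() is 1=Monday .. 7=Sunday, so % 7 maps Sunday to 0.
d = datetime.date(2024, 3, 12)
assert day_of_week(d.year, d.month, d.day) == d.isoweekday() % 7
```

The whole trick is that a dozen easy-to-remember dates (4/4, 6/6, 8/8, 10/10, 12/12, and so on) share one weekday each year, so only the year’s anchor has to be computed.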

Stephen E Arnold, September 27, 2023

Llama Beans? Is That the LLM from Zuckbook?

August 4, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

We love open-source projects. Camelids that masquerade as such, not so much. According to The Register, “Meta Can Call Llama 2 Open Source as Much as It Likes, but That Doesn’t Mean It Is.” The company asserts its new large language model is open source because it is freely available for research and (some) commercial use. Are Zuckerberg and his team of Meta marketers fuzzy on the definition of open source? Writer Steven J. Vaughan-Nichols builds his case with quotes from several open source authorities. First up:

“As Erica Brescia, a managing director at RedPoint, the open source-friendly venture capital firm, asked: ‘Can someone please explain to me how Meta and Microsoft can justify calling Llama 2 open source if it doesn’t actually use an OSI [Open Source Initiative]-approved license or comply with the OSD [Open Source Definition]? Are they intentionally challenging the definition of OSS [Open Source Software]?'”

Maybe they are trying. After all, open source is good for business. And being open to crowd-sourced improvements does help the product. However, as the post continues:

“The devil is in the details when it comes to open source. And there, Meta, with its Llama 2 Community License Agreement, falls on its face. As The Register noted earlier, the community agreement forbids the use of Llama 2 to train other language models; and if the technology is used in an app or service with more than 700 million monthly users, a special license is required from Meta. It’s also not on the Open Source Initiative’s list of open source licenses.”

Next, we learn OSI’s executive director Stefano Maffulli directly states Llama 2 does not meet his organization’s definition of open source. The write-up quotes him:

“While I’m happy that Meta is pushing the bar of available access to powerful AI systems, I’m concerned about the confusion by some who celebrate Llama 2 as being open source: if it were, it wouldn’t have any restrictions on commercial use (points 5 and 6 of the Open Source Definition). As it is, the terms Meta has applied only allow some commercial use. The keyword is some.”

Maffulli further clarifies Meta’s license specifically states Amazon, Google, Microsoft, Bytedance, Alibaba, and any startup that grows too much may not use the LLM. Such a restriction is a no-no in actual open source projects. Finally, Software Freedom Conservancy executive Karen Sandler observes:

“It looks like Meta is trying to push a license that has some trappings of an open source license but, in fact, has the opposite result. Additionally, the Acceptable Use Policy, which the license requires adherence to, lists prohibited behaviors that are very expansively written and could be very subjectively applied.”

Perhaps most egregious for Sandler is the absence of a public drafting or comment process for the Llama 2 license. Llamas are not particularly speedy creatures.

Cynthia Murrell, August 4, 2023

The Future of Open Source: Appropriation and Indifference

March 1, 2023

Big companies love open source software. There are zero or minimal license fees and other people fix the bugs. Not surprisingly the individuals who create open source software face some challenges.

The essay “Open Source Is Broken: The Sad Story of Denis Pushkarev (Core-js)” explains how one developer got the shaft. What’s the fix? Here’s part of the conclusion to the essay:

We often hear that open-source is great, good, ethical compared to close-source and all the typical woo-woo. But in the real world, this isn’t enough. You don’t live and pay bills by doing good things: you need to have some business skills. This doesn’t make you a bad person: if you don’t have enough motivation to work on your open-source project, it simply won’t last.  You need to promote yourself and your open-source project.

I read this as saying, “More, better marketing.”

Why not suggest non-profit consortia able to fund certain projects? Why not suggest commercial enterprises embrace a kinder, gentler approach to code appropriation? Why not suggest a healthier balance between profit seeking and ethical behavior?

I know.

No one cares. Makes one proud to incorporate open source software into a commercial environment and charge people to use the work of an individual or team who wanted to do “good,” doesn’t it? Blindspot? I think it depends on whom one asks.

Stephen E Arnold, March 1, 2023
