Map Data: USGS Historical Topos

February 20, 2024

This essay is the work of a dumb dinobaby. No smart software required.

The ESRI blog published “Access Over 181,000 USGS Historical Topographic Maps.” The map outfit teamed with the US Geological Survey to provide access to an additional 1,745 maps. The total number of maps in the collection is now 181,008.


The blog reports:

Esri’s USGS historical topographic map collection contains historical quads (excluding orthophoto quads) dating from 1884 to 2006 with scales ranging from 1:10,000 to 1:250,000. The scanned maps can be used in ArcGIS Pro, ArcGIS Online, and ArcGIS Enterprise. They can also be downloaded as georeferenced TIFs for use in other applications.

These data are useful. Maps can be viewed with ESRI’s online service called the Historical Topo Map Explorer. You can access that online service at this link.

If you are not familiar with historical topos, ESRI states in an ArcGIS post:

The USGS topographic maps were designed to serve as base maps for geologists by defining streams, water bodies, mountains, hills, and valleys. Using contours and other precise symbolization, these maps were drawn accurately, made mathematically correct, and edited carefully. The topographic quadrangles gradually evolved to show the changing landscape of a new nation by adding symbolization for important highways; canals; railroads; and railway stations; wagon roads; and the sites of cities, towns and villages. New and revised quadrangles helped geologists map the mineral fields, and assisted populated places to develop safe and plentiful water supplies and lay out new highways. Primary considerations of the USGS were the permanence of features; map symbolization and legibility; and the overall cost of compiling, editing, printing and distributing the maps to government agencies, industry, and the general public. Due to the longevity and the numerous editions of these maps they now serve new audiences such as historians, genealogists, archeologists, and people who are interested in the historical landscape of the U.S.

This public-facing data service is one example of how extremely useful information gathered by US government entities can be made more accessible via a public-private relationship. When I served on the board of the US National Technical Information Service, I learned that other useful information is available, just not easily accessible to US citizens.

Good work, ESRI and USGS! Now what about making that volcano data a bit easier to find and access in real time?

Stephen E Arnold, February 20, 2024

AI Coding: Better, Faster, Cheaper. Just Pick Two, Please

January 29, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Visual Studio Magazine is not on my must-read list. Nevertheless, one of my research team told me that I needed to read “New GitHub Copilot Research Finds ‘Downward Pressure on Code Quality.’” I had no idea what “downward pressure” means. I read the article trying to figure out the plain English meaning of this tortured phrase. Was it the downward pressure on the metatarsals when a person is running to a job interview? Was it the deadly downward pressure exerted on the OceanGate submersible? Was it the force illustrated in the YouTube “Hydraulic Press Channel”?


A partner at a venture firm wants his open source recipients to produce more code better, faster, and cheaper. (He does not explain that one must pick two.) Thanks, MSFT Copilot Bing thing. Good enough. But the green? Wow.

Wrong.

The writeup is a content marketing piece for a research report. That’s okay. I think a human may have written most of the article. Despite the frippery in the article, I spotted several factoids. If these are indeed verifiable, excitement in the world of machine generated open source software will ensue. Why does this matter? Well, in the words of the SmartNews content engine, “Read on.”

Here are the items of interest to me:

  1. Bad code is being created and added to the GitHub repositories.
  2. Code is recycled, despite smart efforts to reduce the copy-paste approach to programming.
  3. AI is preparing a field in which lousy, flawed, and possibly worse software will flourish.

Stephen E Arnold, January 29, 2024

Open Source Software: Free Gym Shoes for Bad Actors

January 15, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Many years ago, I completed a number of open source projects. Although different clients hired my team and me, the big question was, “What’s the future of open source software as an investment opportunity and as a substitute for commercial software?” Our work focused on three major points:

  1. Community support for a widely-used software once the original developer moved on
  2. A way to save money and get rid of the “licensing handcuffs” commercial software companies clamped on their customers
  3. Security issues resulting from poisoned code or obfuscated “special features.”

My recollection is that the customers focused on one point, the opportunity to save money. Commercial software vendors were in the “lock in” game, and open source software for database, utility, and search and retrieval functions offered a way out.

Today, a young innovator may embrace an open source solution to the generative smart software approach to innovation. Apart from the issues embedded in the large language model methods themselves, building a product on other people’s code available as open source software looks like a certain path to money.


An open source game plan sounds like a winner. Then upon starting work, the path reveals its risks. Thanks, MSFT Copilot, you exhausted me this morning. Good enough.

I thought about our work in open source when I read “So, Are We Going to Talk about How GitHub Is an Absolute Boon for Malware, or Nah?” The write up opines:

In a report published on Thursday, security shop Recorded Future warns that GitHub’s infrastructure is frequently abused by criminals to support and deliver malware. And the abuse is expected to grow due to the advantages of a “living-off-trusted-sites” strategy for those involved in malware. GitHub, the report says, presents several advantages to malware authors. For example, GitHub domains are seldom blocked by corporate networks, making it a reliable hosting site for malware.

Those cost advantages can be vaporized once a security issue becomes known. The write up continues:

Reliance on this “living-off-trusted-sites” strategy is likely to increase and so organizations are advised to flag or block GitHub services that aren’t normally used and could be abused. Companies, it’s suggested, should also look at their usage of GitHub services in detail to formulate specific defensive strategies.
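What might “flag GitHub services that aren’t normally used” look like in practice? Here is a minimal sketch, not anything from the Recorded Future report. It assumes a list of URLs pulled from a proxy log and a hypothetical allowlist of the GitHub hosts an organization actually relies on. The GitHub domain list is illustrative, not exhaustive.

```python
from urllib.parse import urlparse

# Illustrative, not exhaustive: domains GitHub uses for repositories,
# raw files, gists, release assets, and Pages sites.
GITHUB_SUFFIXES = (
    "github.com",
    "githubusercontent.com",
    "github.io",
)


def is_github_host(host: str) -> bool:
    """Return True if host is a GitHub domain or a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == s or host.endswith("." + s) for s in GITHUB_SUFFIXES)


def flag_unusual_github(urls, allowed_hosts):
    """Return the URLs that hit GitHub hosts missing from the allowlist."""
    allowed = {h.lower() for h in allowed_hosts}
    flagged = []
    for url in urls:
        host = urlparse(url).hostname or ""
        if is_github_host(host) and host not in allowed:
            flagged.append(url)
    return flagged


if __name__ == "__main__":
    proxy_log = [
        "https://github.com/acme/widgets",
        "https://raw.githubusercontent.com/someone/tool/main/payload.ps1",
        "https://gist.github.com/someone/abc123",
        "https://example.com/index.html",
    ]
    # Hypothetical allowlist: this organization only browses github.com itself.
    for url in flag_unusual_github(proxy_log, allowed_hosts={"github.com"}):
        print("FLAG:", url)
```

The design choice is an allowlist rather than a blocklist: anything GitHub-hosted that the organization has not explicitly approved gets a second look, which matches the “living-off-trusted-sites” concern.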

How about a risk round up?

  1. The licenses vary. Litigation is a possibility. For big companies with lots of legal eagles, court battles are no problem. Just write a check or cut a deal.
  2. Forks make it easy for bad actors to exploit some open source projects.
  3. A big aggregator of open source like MSFT GitHub is not in the open source business and may deflect criticism without spending money to correct issues as they are discovered. It’s free software, isn’t it?
  4. The “community” may include good actors who accept cash from what looks like a reputable organization and become the unwitting dupes of an industrialized cyber gang.
  5. Commercial products integrating or built upon open source may have to do some very fancy dancing when a problem becomes publicly known.

There are other concerns as well. The problem is that open source’s appeal is now powered by two different performance enhancers. The first is the perception that open source software reduces certain costs. The second is the mad integration of open source smart software.

What’s the fix? My hunch is that words will take the place of meaningful action and remediation. Economic pressure and the desire to use what is free make more sense to many business wizards.

Stephen E Arnold, January 15, 2024

Smart Software for Cyber Security Mavens (Good and Bad Mavens)

November 17, 2023

This essay is the work of a dumb humanoid. No smart software required.

One of my research team (who wishes to maintain a low profile) called my attention to the “Awesome GPTs (Agents) for Cybersecurity.” The list on GitHub says:

The "Awesome GPTs (Agents) Repo" represents an initial effort to compile a comprehensive list of GPT agents focused on cybersecurity (offensive and defensive), created by the community. Please note, this repository is a community-driven project and may not list all existing GPT agents in cybersecurity. Contributions are welcome – feel free to add your own creations!


Open source cyber security tools and smart software can be used by good actors to make people safe. The tools can be used by less good actors to create some interesting situations for cyber security professionals, the elderly, and clueless organizations. Thanks, Microsoft Bing. Does MSFT use these tools to keep people safe or unsafe?

When I viewed the list, it contained more than 30 items. Let me highlight three and invite you to check out the rest at the link to the repository:

  1. The Threat Intel Bot. This is a specialized GPT for advanced persistent threat intelligence.
  2. The Message Header Analyzer. This dissects email headers for “insights.”
  3. Hacker Art. The software generates hacker art and nifty profile pictures.

Several observations:

  • More tools and services will be forthcoming; thus, the list will grow.
  • Bad actors and good actors will find software to help them accomplish their objectives.
  • A for-fee bundle of these will be assembled and offered for sale, probably on eBay or Etsy. (Too bad, fr0gger.)

Useful list!

Stephen E Arnold, November 17, 2023


Open Source Companies: Bet on Expandability and Extendibility

October 12, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

Naturally, a key factor driving adoption of open source software is a need to save money. However, argues Lago co-founder Anh-Tho Chuong, “Open Source Does Not Win by Being Cheaper” than the competition. Not just that, anyway. She writes:

“What we’ve learned is that open-source tools can’t rely on being an open-source alternative to an already successful business. A developer can’t just imitate a product, tag on an MIT license, and call it a day. As awesome as open source is, in a vacuum, it’s not enough to succeed. … [Open-source companies] either need a concrete reason for why they are open source or have to surpass their competitors.”

One caveat: Chuong notes she is speaking of businesses like hers, not sponsored community projects like React, TypeORM, or VSCode. Outfits that need to turn a profit to succeed must offer more than savings to distinguish themselves, she insists. The post notes two specific problems open-source developers should aim to solve: transparency and extensibility. It is important to many companies to know just how their vendors are handling their data (and that of their clients). With closed software, one just has to trust that information is secure. The transparency of open-source code allows one to verify that it is. The extensibility advantage comes from the passion of community developers for plugins, which are often merged into the open-source main branch. It can be difficult for closed-source engineering teams to compete with the resulting extendibility.

See the write-up for examples of both advantages from the likes of MongoDB, PostHog, and Minio. Chuong concludes:

“Both of the above issues contribute to commercial open-source being a better product in the long run. But by tapping the community for feedback and help, open-source projects can also accelerate past closed-source solutions. … Open-source projects—not just commercial open source—have served as a critical driver for the improvement of products for decades. However, some software is going to remain closed source. It’s just the nature of first-mover advantage. But when transparency and extensibility are an issue, an open-source successor becomes a real threat.”

Cynthia Murrell, October 12, 2023

Python Algorithms? Hello, Excel

September 27, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

Believe it or not, the T-Mobile WiFi worked on an eight-hour Delta flight from Europe to the Atlanta airport on September 24, 2023. Who knew?

On that flight I came across a page on GitHub called “TheAlgorithms” (sic). I clicked and browsed and was quite impressed with 40 categories and the specific algorithms within each. The “Other” category had two dozen algorithms ranging from a doomsday algorithm to a method to replace flake8 with ruff.

The individual categories include some AI magnets like “Neural Network” and “Machine Learning.” Remember there are more than 35 additional baskets. There’s only one Python routine for “Genetic Algorithms,” but categories like “Physics” and “Searches” seem particularly useful.

The collection has a disclaimer; to wit:

The algorithms are implemented in Python for education purpose only. These are just for demonstration purpose.

Some Excel jockeys may find some of them useful. My hunch is that second semester computer science majors may find “inspiration” in this collection.

Stephen E Arnold, September 27, 2023

Llama Beans? Is That the LLM from Zuckbook?

August 4, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

We love open-source projects. Camelids that masquerade as such, not so much. According to The Register, “Meta Can Call Llama 2 Open Source as Much as It Likes, but That Doesn’t Mean It Is.” The company asserts its new large language model is open source because it is freely available for research and (some) commercial use. Are Zuckerberg and his team of Meta marketers fuzzy on the definition of open source? Writer Steven J. Vaughan-Nichols builds his case with quotes from several open source authorities. First up:

“As Erica Brescia, a managing director at RedPoint, the open source-friendly venture capital firm, asked: ‘Can someone please explain to me how Meta and Microsoft can justify calling Llama 2 open source if it doesn’t actually use an OSI [Open Source Initiative]-approved license or comply with the OSD [Open Source Definition]? Are they intentionally challenging the definition of OSS [Open Source Software]?'”

Maybe they are trying. After all, open source is good for business. And being open to crowd-sourced improvements does help the product. However, as the post continues:

“The devil is in the details when it comes to open source. And there, Meta, with its Llama 2 Community License Agreement, falls on its face. As The Register noted earlier, the community agreement forbids the use of Llama 2 to train other language models; and if the technology is used in an app or service with more than 700 million monthly users, a special license is required from Meta. It’s also not on the Open Source Initiative’s list of open source licenses.”

Next, we learn OSI’s executive director Stefano Maffulli directly states Llama 2 does not meet his organization’s definition of open source. The write-up quotes him:

“While I’m happy that Meta is pushing the bar of available access to powerful AI systems, I’m concerned about the confusion by some who celebrate Llama 2 as being open source: if it were, it wouldn’t have any restrictions on commercial use (points 5 and 6 of the Open Source Definition). As it is, the terms Meta has applied only allow some commercial use. The keyword is some.”

Maffulli further clarifies Meta’s license specifically states Amazon, Google, Microsoft, Bytedance, Alibaba, and any startup that grows too much may not use the LLM. Such a restriction is a no-no in actual open source projects. Finally, Software Freedom Conservancy executive Karen Sandler observes:

“It looks like Meta is trying to push a license that has some trappings of an open source license but, in fact, has the opposite result. Additionally, the Acceptable Use Policy, which the license requires adherence to, lists prohibited behaviors that are very expansively written and could be very subjectively applied.”

Perhaps most egregious for Sandler is the absence of a public drafting or comment process for the Llama 2 license. Llamas are not particularly speedy creatures.

Cynthia Murrell, August 4, 2023

The Future of Open Source: Appropriation and Indifference

March 1, 2023

Big companies love open source software. There are zero or minimal license fees and other people fix the bugs. Not surprisingly the individuals who create open source software face some challenges.

The essay “Open Source Is Broken: The Sad Story of Denis Pushkarev (Core-js)” explains how one developer got the shaft. What’s the fix? Here’s part of the conclusion to the essay:

We often hear that open-source is great, good, ethical compared to close-source and all the typical woo-woo. But in the real world, this isn’t enough. You don’t live and pay bills by doing good things: you need to have some business skills. This doesn’t make you a bad person: if you don’t have enough motivation to work on your open-source project, it simply won’t last.  You need to promote yourself and your open-source project.

I read this as saying, “More, better marketing.”

Why not suggest non-profit consortia able to fund certain projects? Why not suggest commercial enterprises embrace a kinder, gentler approach to code appropriation? Why not suggest a healthier balance between profit seeking and ethical behavior?

I know.

No one cares. Makes one proud to incorporate open source software into a commercial environment and charge people to use the work of an individual or team who wanted to do “good,” doesn’t it? Blindspot? I think it depends on whom one asks.

Stephen E Arnold, March 1, 2023

Google Points Out That ChatGPT Has a Core Neural Disorder: LSD or Spoiled Baloney?

February 16, 2023

I am an old-fashioned dinobaby. I have a reasonably good memory for great moments in search and retrieval. I recall when Danny Sullivan told me that search engine optimization improves relevance. In 2006, Prabhakar Raghavan on a conference call with a Managing Director of a so-so financial outfit explained that Yahoo had semantic technology that made Google’s pathetic effort look like outdated technology.


Hallucinating pizza courtesy of the super smart AI app Craiyon.com. The art, not the write up it accompanies, was created by smart software. The article is the work of the dinobaby, Stephen E Arnold. Looks like pizza to me. Close enough for horseshoes like so many zippy technologies.

Now that SEO and its spawn are scrambling to find a way to fiddle with increasingly weird methods for making software return results the search engine optimization crowd’s customers demand, Google’s head of search Prabhakar Raghavan is opining about the oh, so miserable work of OpenAI and its now TikTok trend ChatGPT. May I remind you, gentle reader, that OpenAI availed itself of some Googley open source smart software and consulted with some Googlers as it ramped up to the tsunami of PR ripples? May I remind you that Microsoft said, “Yo, we’re putting some OpenAI goodies in PowerPoint.” The world rejoiced and Reddit plus Twitter kicked into rave mode.

Google responded with a nifty roll out in Paris. February is not April, but maybe it should have been in April 2023, not in the dead of winter?

I read with considerable amusement “Google Vice President Warns That AI Chatbots Are Hallucinating.” The write up states as rock solid George Washington I cannot tell a lie truth the following:

Speaking to German newspaper Welt am Sonntag, Raghavan warned that users may be delivered complete nonsense by chatbots, despite answers seeming coherent. “This type of artificial intelligence we’re talking about can sometimes lead to something we call hallucination,” Raghavan told Welt Am Sonntag. “This is then expressed in such a way that a machine delivers a convincing but completely fictitious answer.”

LSD or just the Google code relied upon? Was it the Googlers of whom OpenAI asked questions? Was it reading the gems of wisdom in Google patent documents? Was it coincidence?

I recall that Dr. Timnit Gebru and her co-authors of the Stochastic Parrots paper suggested that life on the Google island was not palm trees and friendly natives. Nope. Disagree with the Google and your future elsewhere awaits.

Now we have the hallucination issue. The implication is that smart software like Google-infused OpenAI is addled. It imagines things. It hallucinates. It is living in a fantasy land with bean bag chairs, Foosball tables, and memories of Odwalla juice.

I wrote about the after-the-fact yip yap from Google’s Chair Person of the Board. I mentioned the Father of the Darned Internet’s post-ChatGPT PR blasts. Now we have the head of search’s observation about screwed up neural networks.

Yep, someone from Verity should know about flawed software. Yep, someone from Yahoo should be familiar with using PR to mask spectacular failure in search. Yep, someone from Google is definitely in a position to suggest that smart software may be somewhat unreliable because of fundamental flaws in the systems and methods implemented at Google and probably other outfits loving the Tensor T shirts.

Stephen E Arnold, February 16, 2023

Secrets Patterns Database

February 15, 2023

One of my researchers called my attention to “Secrets Patterns Database.” If you are interested in finding “secrets,” you may want to take a look. The data and scripts are available on GitHub… for now. Among its features are:

  • “Over 1600 regular expressions for detecting secrets, passwords, API keys, tokens, and more.
  • Format agnostic. A Single format that supports secret detection tools, including Trufflehog and Gitleaks.
  • Tested and reviewed Regular expressions.
  • Categorized by confidence levels of each pattern.
  • All regular expressions are tested against ReDos attacks.”
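How does a regex-based secrets detector work? Here is a minimal sketch, assuming a tiny hand-picked set of patterns. These patterns are illustrative and are NOT copied from the Secrets Patterns Database; the “high”/“low” confidence labels are a hypothetical stand-in for the database’s confidence categories. The AWS key and GitHub token shapes are commonly cited formats, not guarantees.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SecretPattern:
    name: str
    regex: re.Pattern
    confidence: str  # "high" or "low"; a hypothetical labeling scheme


# Illustrative patterns only; not taken from the Secrets Patterns Database.
PATTERNS = [
    SecretPattern("AWS access key ID", re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "high"),
    SecretPattern("GitHub personal access token",
                  re.compile(r"\bghp_[A-Za-z0-9]{36}\b"), "high"),
    # Bounded quantifier and no nested repetition, which helps avoid
    # catastrophic backtracking (ReDoS) on hostile input.
    SecretPattern("Generic password assignment",
                  re.compile(r"(?i)\bpassword\s*[:=]\s*['\"]?([^\s'\"]{8,64})"), "low"),
]


def scan(text: str, min_confidence: str = "low"):
    """Yield (line number, pattern name, matched text) for every hit."""
    wanted = {"high"} if min_confidence == "high" else {"high", "low"}
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in PATTERNS:
            if pattern.confidence not in wanted:
                continue
            for match in pattern.regex.finditer(line):
                yield lineno, pattern.name, match.group(0)


if __name__ == "__main__":
    sample = 'aws_key = "AKIAABCDEFGHIJKLMNOP"\npassword: hunter2hunter2\n'
    for hit in scan(sample):
        print(hit)
```

Filtering by confidence is the point of the labels: a “high” pass catches well-structured tokens with few false alarms, while the “low” pass casts a wider, noisier net.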

Links to the author’s Web site and LinkedIn profile appear in the GitHub notes.

Stephen E Arnold, February 20, 2023
