Unified Data Across Governments? How Useful for a Non-Participating Country
February 18, 2025
A dinobaby post. No smart software involved.
I spoke with a person whom I have known for a long time. The individual lives and works in Washington, DC. He mentioned “disappeared data.” I did some poking around and, sure enough, certain US government public facing information had been “disappeared.” Interesting. For a short period of time I made a few contributions to what was FirstGov.gov, now USA.gov.
For those who don’t remember or don’t know about President Clinton’s Year 2000 initiative, the idea was interesting. At that time, access to public-facing information on US government servers was via the Web search engines. In order to locate a tax form, one would navigate to an available search system. On Google one would just slap in IRS or IRS and the form number.
Most of the US government public-facing Web sites were reasonably straightforward. Others were fairly difficult to use. The US Marine Corps’ Web site had poor response times. I think it was hosted on something called Server Beach, and the would-be recruit would have to wait for the recruitment station data to appear. The Web page worked but it was slow.
President Clinton, or someone in his administration, wanted the problem fixed with a search system for US government public-facing content. After a bit of work, the system went online in September 2000. The system morphed into a US government portal a bit like the Yahoo.com portal model.
I thought about the information in “Oracle’s Ellison Calls for Governments to Unify Data to Feed AI.” The write up reports:
Oracle Corp.’s co-founder and chairman Larry Ellison said governments should consolidate all national data for consumption by artificial intelligence models, calling this step the “missing link” for them to take full advantage of the technology. Fragmented sets of data about a population’s health, agriculture, infrastructure, procurement and borders should be unified into a single, secure database that can be accessed by AI models…
Several questions arise; for instance:
- What country or company provides the technology?
- Who manages what data are added and what data are deleted?
- What are the rules of access?
- What about public data which are not available for public access; for example, the “disappeared” data from US government Web sites?
- What happens to commercial or quasi-commercial government units which repackage public data and sell it at a hefty mark up?
Based on my brief brush with the original Clinton project, I think the idea is interesting. But I have one other question in mind: What happens when non-participating countries get access to the aggregated public-facing data? Digital information is a tricky resource to secure. In fact, once data are digitized and connected to a network, they are fair game. Someone, somewhere will figure out how to access, obtain, exfiltrate, and benefit from aggregated data.
The idea is, in my opinion, a bit of grandstanding like Google’s quantum supremacy claims. But US high technology wizards are ready and willing to think big thoughts and take even bigger actions. We live in interesting times, but I am delighted that I am old.
Stephen E Arnold, February 18, 2025
A Vulnerability Bigger Than SolarWinds? Yes.
February 18, 2025
No smart software. Just a dinobaby doing his thing.
I read an interesting article from WatchTowr Labs. (The spelling is what the company uses, so the url is labs.watchtowr.com.) On February 4, 2025, the company reported that it discovered what one can think of as orphaned or abandoned-but-still-alive Amazon S3 “buckets.” The discussion of the firm’s research and what it revealed is presented in “8 Million Requests Later, We Made The SolarWinds Supply Chain Attack Look Amateur.”
The company explains that it was curious if what it calls “abandoned infrastructure” on a cloud platform might yield interesting information relevant to security. We worked through the article and created what in the good old days would have been called an abstract for a database like ABI/INFORM. Here’s our summary:
The article from WatchTowr Labs describes a large-scale experiment where researchers identified and took control of about 150 abandoned Amazon Web Services S3 buckets previously used by various organizations, including governments, militaries, and corporations. Over two months, these buckets received more than eight million requests for software updates, virtual machine images, and sensitive files, exposing a significant vulnerability. WatchTowr explains that bad actors could have injected malicious content. Abandoned infrastructure could be used for supply chain attacks like SolarWinds. Had this happened, the impact would have been significant.
Several observations are warranted:
- Does Amazon Web Services have administrative functions to identify orphaned “buckets” and take action to minimize the attack surface?
- With companies’ information technology teams abandoning infrastructure, how will these organizations determine if other infrastructure vulnerabilities exist and remediate them?
- What can cyber security vendors’ software and systems do to identify and neutralize these “shoot yourself in the foot” vulnerabilities?
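One modest first step toward answering the second question is an inventory: find every S3 bucket name that an organization's own configuration files, installers, and scripts still point to. The sketch below is my illustration, not WatchTowr's method. It assumes the two common S3 URL layouts (bucket name in the host, bucket name in the path), and the bucket names and config text are hypothetical. It only extracts names; confirming whether each bucket still exists and who owns it is a separate step not shown here.

```python
import re

# Virtual-hosted style: <bucket>.s3.amazonaws.com or <bucket>.s3.<region>.amazonaws.com
# (also matches the older <bucket>.s3-<region>.amazonaws.com form).
VHOST_STYLE = re.compile(
    r"https?://([a-z0-9][a-z0-9.-]{1,61}[a-z0-9])\.s3[.-](?:[a-z0-9-]+\.)?amazonaws\.com",
    re.IGNORECASE,
)

# Path style: s3.amazonaws.com/<bucket> or s3.<region>.amazonaws.com/<bucket>
PATH_STYLE = re.compile(
    r"https?://s3(?:[.-][a-z0-9-]+)?\.amazonaws\.com/([a-z0-9][a-z0-9.-]{1,61}[a-z0-9])",
    re.IGNORECASE,
)


def find_bucket_names(text):
    """Return the sorted, de-duplicated S3 bucket names referenced in text."""
    names = set()
    for pattern in (VHOST_STYLE, PATH_STYLE):
        for match in pattern.finditer(text):
            # Bucket names are lowercase; normalize in case a URL was typed oddly.
            names.add(match.group(1).lower())
    return sorted(names)


# Hypothetical configuration text; none of these buckets is taken from the article.
sample_config = """
download_url = https://acme-updates.s3.amazonaws.com/agent/latest.zip
image = https://s3.us-east-1.amazonaws.com/acme-vm-images/base.ova
mirror = http://Acme-Updates.s3-us-west-2.amazonaws.com/agent/old.zip
docs = https://example.com/not-a-bucket
"""

for name in find_bucket_names(sample_config):
    print(name)
```

The resulting list is what a team would then compare against the buckets it actually controls. Any name that appears in shipped software but not in the team's own account is a candidate for the kind of takeover the article describes.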
One of the most compelling statements in the WatchTowr article, in my opinion, is:
… we’d demonstrated just how held-together-by-string the Internet is and at the same time point out the reality that we as an industry seem so excited to demonstrate skills that would allow us to defend civilization from a Neo-from-the-Matrix-tier attacker – while a metaphorical drooling-kid-with-a-fork-tier attacker, in reality, has the power to undermine the world.
Is WatchTowr correct? With government and commercial organizations leaving S3 buckets available, perhaps WatchTowr should have included gum, duct tape, and grade-school white glue in its description of the Internet?
Stephen E Arnold, February 18, 2025
Real AI News? Yes, with Fact Checking, Original Research, and Ethics Too
February 17, 2025
This blog post is the work of a real-live dinobaby. No smart software involved.
This is “real” news… if the story is based on fact checking, original research, and those journalistic ethics pontifications. Let’s assume that these conditions of old-fashioned journalism apply. This means that the story “New York Times Goes All-In on Internal AI Tools” pinpoints a small shift in how “real” news will be produced.
The write up asserts:
The New York Times is greenlighting the use of AI for its product and editorial staff, saying that internal tools could eventually write social copy, SEO headlines, and some code.
Yep, some. There’s ground truth (that’s an old-fashioned journalism concept) in blue-chip consulting. The big money maker is what’s called scope creep. Stated simply, one starts small like a test or a trial. Then if the sky does not fall as quickly as some companies’ revenue, the small gets a bit larger. You check to make sure the moon is in the sky and the revenues are not falling, hopefully as quickly as before. Then you expand. At each step there are meetings, presentations, analyses, and group reassurances from others in the deciders category. Then — like magic! — the small project is the rough equivalent of a nuclear-powered aircraft carrier.
Ah, scope creep.
Understate what one is trying. Watch it. Scale it. End up with an aircraft carrier scale project. Yes, it is happening at an outfit like the New York Times if the cited article is accurate.
What scope creep stage setting appears in the write up? Let’s look:
- Staff will be trained. Your job, one assumes, is safe. (Ho ho ho)
- AI will help uncover “the truth.” (Absolutely)
- More people will benefit (Don’t forget the stakeholders, please)
What’s the write up presenting as actual factual?
The world’s greatest newspaper will embrace hallucinating technology, but only a little bit.
Scope creep begins, and it won’t change a thing, but that information will appear once the cost savings, revenue, and profit data become available at the speed of newspaper decision making.
Stephen E Arnold, February 17, 2025
Sam Altman: The Waffling Man
February 17, 2025
Another dinobaby commentary. No smart software required.
Chaos is good. Flexibility is good. AI is good. Sam Altman, whom I reference as “Sam AI-Man,” has some explaining to do. OpenAI is a consumer of cash. The Chinese PR push suggests that Deepseek has found a way to do OpenAI-type computing like Shein and Temu do gym clothes.
I noted “Sam Altman Admits OpenAI Was On the Wrong Side of History in Open Source Debate.” The write up does not come out and state, “OpenAI was stupid when it embraced proprietary software’s approach” to meeting user needs. To be frank, Sam AI-Man was not particularly clear either.
The write up says that Sam AI-Man said:
“Yes, we are discussing [releasing model weights],” Altman wrote. “I personally think we have been on the wrong side of history here and need to figure out a different open source strategy.” He noted that not everyone at OpenAI shares his view and it isn’t the company’s current highest priority. The statement represents a remarkable departure from OpenAI’s increasingly proprietary approach in recent years, which has drawn criticism from some AI researchers and former allies, most notably Elon Musk, who is suing the company for allegedly betraying its original open source mission.
My view is that Sam AI-Man wants to emulate other super techno leaders and get whatever he wants. Not surprisingly, other super techno leaders have their own ideas. I would suggest that the objective of these AI jousts is power, control, and money.
“What about the users?” a faint voice asks. “And the investors?” another bold soul queries.
Who?
Stephen E Arnold, February 17, 2025
Software Is Changing and Not for the Better
February 17, 2025
I read a short essay “We Are Destroying Software.” What struck me about the write up was the author’s word choice. For example, here’s a simple frequency count of the terms in the essay:
- The two most popular words in the essay are “destroying” and “software” with 15 occurrences each.
- The word “complex” is used three times.
- The words “systems,” “dependencies,” “reinventing,” “wheel,” and “work” are used twice each.
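For anyone who wants to run the same sort of count, here is a minimal sketch in Python. It is not the tool I used, and the tokenizing rule (lowercase words, optional apostrophe) is an assumption that will shift the numbers slightly compared with other counters. The first sample sentence is the one quoted from the essay; the second is made up for illustration.

```python
import re
from collections import Counter


def term_frequencies(text):
    """Count how often each lowercase word appears in text."""
    # Assumption: a "word" is a run of letters, optionally with one apostrophe part.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(words)


sample = (
    "We are destroying software claiming that code comments are useless. "
    "We are destroying software with complex systems."  # hypothetical sentence
)

counts = term_frequencies(sample)
for word in ("destroying", "software", "complex"):
    print(word, counts[word])
```

Feeding the full text of the essay to `term_frequencies` and calling `counts.most_common(10)` would produce the kind of list shown above.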
The structure of the essay is a series of declarative statements like this:
We are destroying software claiming that code comments are useless.
I quite like the essay.
Several observations:
- The author is passionate about his subject. “Destroy” is not a neutral word.
- “Complex” appears to be a particular concern. This makes sense. Some systems like those in use at the US Internal Revenue Service may be difficult, if not impossible, to remediate within available budgets and resources. Gradual deterioration seems to be a characteristic of many systems today, particularly when computer technology interfaces with workers.
- The notion of “joy” of hacking comes across, not as a solution to a problem, but as the reason the author was motivated to capture his thoughts.
Interesting stuff. Tough to get around entropy, however. Who is the “we” by the way?
Stephen E Arnold, February 17, 2025
IBM Faces DOGE Questions?
February 17, 2025
Simon Willison reminded us of the famous IBM internal training document that reads: “A Computer Can Never Be Held Accountable.” The document is also relevant for AI algorithms. Unfortunately the document has a mysterious history and the IBM Corporate Archives don’t have a copy of the presentation. A Twitter user with the name @bumblebike posted the original image. He said he found it when he went through his father’s papers. Unfortunately, the presentation with the legendary statement was destroyed in a 2019 flood.
I believe the image was first shared online in this tweet by @bumblebike in February 2017. Here’s where they confirm it was from 1979 internal training.
Here’s another tweet from @bumblebike from December 2021 about the flood:
“Unfortunately destroyed by flood in 2019 with most of my things. Inquired at the retirees club zoom last week, but there’s almost no one the right age left. Not sure where else to ask.”
We don’t need the actual IBM document to know that IBM hasn’t done well when it comes to search. IBM, like most firms, tried and sort of fizzled. (Remember Data Fountain or CLEVER?) IBM also moved into content management. Yep, the semi-Xerox, semi-information thing. But the good news is that a time sharing solution called Watson is doing pretty well. It’s not winning Jeopardy! but it is chugging along.
Now IBM professionals in DC have to answer the DOGE nerd squad questions? Why not give OpenAI a whirl? The old Jeopardy! winner is kicking back. DOGE wants to know.
Whitney Grace, February 17, 2025
Sweden Embraces Books for Students: A Revolutionary Idea
February 14, 2025
Yep, another dinobaby emission. No smart software required.
Doom scrolling through the weekend’s newsfeeds, I spotted “Sweden Swapped Books for Computers in 2009. Now, They’re Spending Millions to Bring Them Back.” Sweden has some challenges. The problems with kinetic devices are not widely known in Harrod’s Creek, Kentucky, and probably not in other parts of the US. Malmo bears some passing resemblance to parts of urban enclaves like Detroit or Las Vegas. To make life interesting, the country has a keen awareness of everyone’s favorite leader in Russia.
The point of the write up is that Sweden’s shift from old-fashioned dinobaby books to those super wonderful computers and tablets has become unpalatable. The write up reports:
The Nordic country is reportedly exploring ways to reintroduce traditional methods of studying into its educational system.
The reason for the shift to books? The write up observes:
…experts noted that modern, experiential learning methods led to a significant decline in students’ essential skills, such as reading and writing.
Does this statement sound familiar?
Most teachers and parents complain that their kids have increasingly started relying on these devices instead of engaging in classrooms.
Several observations:
- Nothing worthwhile comes easy. Computers became a way to make learning easy. The downside is that for most students, the negatives have lifelong consequences.
- Reversing gradual loss of the capability to concentrate is likely to be a hit-and-miss undertaking.
- Individuals without skills like reading become the new market for talking to a smartphone because writing is too much friction.
How will these individuals, regardless of country, be able to engage in lifelong learning? The answer is one that may make some people uncomfortable: They won’t. These individuals demonstrate behaviors not well matched to independent, informed thinking.
This dinobaby longs for a time when tiny dinobabies had books, not gizmos. I smell smoke. Oh, I think that’s just some informed mobile phone users burning books.
Stephen E Arnold, February 14, 2025
Who Knew? AI Makes Learning Less Fun
February 14, 2025
Bill Gates was recently on the Jimmy Fallon show to promote his biography. In the interview, Gates shared his views on AI, stating that AI will replace a lot of jobs. Fallon hoped that TV show hosts wouldn’t be replaced, and he probably doesn’t have anything to worry about. Why? Because he’s entertaining and interesting.
Humans love to be entertained, but AI just doesn’t have the capability of pulling it off. Media And Learning shared one teacher’s experience with AI-generated learning videos: “When AI Took Over My Teaching Videos, Students Enjoyed Them Less But Learned The Same.” Media and Learning conducted an experiment to see whether students would learn more from teacher-made or AI-generated videos. Here’s how the experiment went:
“We used generative AI tools to generate teaching videos on four different production management concepts and compared their effectiveness versus human-made videos on the same topics. While the human-made videos took several days to make, the analogous AI videos were completed in a few hours. Evidently, generative AI tools can speed up video production by an order of magnitude.”
The AI videos used ChatGPT-written video scripts, MidJourney for illustrations, and HeyGen for teacher avatars. The teacher-made videos were made in the traditional manner of teachers writing scripts, recording themselves, and editing the video in Adobe Premiere.
When it came to students retaining and testing on the educational content, both videos yielded the same results. Students, however, enjoyed the teacher-made videos over the AI ones. Why?
“The reduced enjoyment of AI-generated videos may stem from the absence of a personal connection and the nuanced communication styles that human educators naturally incorporate. Such interpersonal elements may not directly impact test scores but contribute to student engagement and motivation, which are quintessential foundations for continued studying and learning.”
Media And Learning suggests that AI could be used to complement instruction time, freeing teachers up to focus on personalized instruction. We’ll see what happens as AI becomes more competent, but we can rest easy for now that human engagement is more interesting than algorithms. Or at least Jimmy Fallon can.
Whitney Grace, February 14, 2025
What Happens When Understanding Technology Is Shallow? Weakness
February 14, 2025
Yep, a dinobaby wrote this blog post. Replace me with a subscription service or a contract worker from Fiverr. See if I care.
I like this question. Even more satisfying is that a big name seems to have answered it. I refer to an essay by Gary Marcus in “The Race for “AI Supremacy” Is Over — at Least for Now.”
Here’s the key passage in my opinion:
China caught up so quickly for many reasons. One that deserves Congressional investigation was Meta’s decision to open source their LLMs. (The question that Congress should ask is, how pivotal was that decision in China’s ability to catch up? Would we still have a lead if they hadn’t done that? Deepseek reportedly got its start in LLMs retraining Meta’s Llama model.) Putting so many eggs in Altman’s basket, as the White House did last week and others have before, may also prove to be a mistake in hindsight. … The reporter Ryan Grim wrote yesterday about how the US government (with the notable exception of Lina Khan) has repeatedly screwed up by placating big companies and doing too little to foster independent innovation
The write up is quite good. What’s missing, in my opinion, is the linkage of a probe to determine how a technology innovation released as a not-so-stealthy open source project can affect the US financial markets. The result was satisfying to the Chinese planners.
Also, the write up does not put the probe or “foray” in a strategic context. China wants to make certain its simple message “China smart, US dumb” gets into the world’s communication channels. That worked quite well.
Finally, the write up does not point out that the US approach to AI has given China an opportunity to demonstrate that it can borrow and refine with aplomb.
Net net: I think China is doing Shein and Temu in the AI and smart software sector.
Stephen E Arnold, February 14, 2025
Hauling Data: Is There a Chance of Derailment?
February 13, 2025
Another dinobaby write up. The only smart software is the lousy train illustration.
I spotted some chatter about US government Web sites going off line. Since I stepped away from the “index the US government” project, I don’t spend much time poking around the content at dot gov and in some cases dot com sites operated by the US government. Let’s assume that some US government servers are now blocked and the content has gone dark to a user looking for information generated by US government entities.
If libraries chug chug down the information railroad tracks to deliver data, what does the “Trouble on the Tracks” sign mean? Thanks, You.com. Good enough.
The fix in most cases is to use Bing.com. My recollection is that a third party like Bing provided the search service to the US government. A good alternative is to use Google.com, the qualifier site: command, and a bit of obscenity. The obscenity causes the Google AI to just generate a semi relevant list of links. In a pinch, you could poke around for a repository of US government information. Unfortunately the Library of Congress is not that repository. The Government Printing Office does not do the job either. The Internet Archive is a hit-and-miss archive operation.
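For readers who have not used the site: qualifier mentioned above, it simply restricts results to one domain. A small sketch follows, assuming Google's ordinary search URL with a `q` parameter. The function name and the example query are mine, chosen for illustration.

```python
from urllib.parse import urlencode


def site_query_url(terms, domain):
    """Build a Google search URL limited to one domain via the site: qualifier."""
    query = f"{terms} site:{domain}"
    # urlencode handles the escaping of spaces and the colon.
    return "https://www.google.com/search?" + urlencode({"q": query})


# Hypothetical example: look for a tax form on the IRS site only.
print(site_query_url("form 1040", "irs.gov"))
```

Pasting the printed URL into a browser is the same as typing the terms plus `site:irs.gov` into the search box.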
Is there another alternative? Yes. Harvard University announced its Data.gov archive. The institution’s Library Innovation Lab Team said on February 6, 2025:
Today we released our archive of data.gov on Source Cooperative. The 16TB collection includes over 311,000 datasets harvested during 2024 and 2025, a complete archive of federal public datasets linked by data.gov. It will be updated daily as new datasets are added to data.gov.
I like this type of archive, but I am a dinobaby, not a forward leaning, “with it” thinker. Information in my mind belongs in a library. A library, in general, should provide students and those seeking information with a place to go to obtain information. The only advertising I see in a library is an announcement about a bake sale to raise funds for children’s reading material.
Will the Harvard initiative and others like it collide with something on the train tracks? Will the money to buy fuel for the engine’s power plant be cut off? Will the train drivers be forced to find work at Shake Shack?
I have no answers. I am glad I am old, but I fondly remember when the job was to index the content on US government servers. The quaint idea formulated by President Clinton was to make US government information available. Now one has to catch a train.
Stephen E Arnold, February 13, 2025