Old Code, New Code: Can You Make It Work Again… Sort Of?

March 18, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Even hippy dippy super slick AI start ups have a technical debt problem. It is, in my opinion, no different from the “costs” imposed on outfits like JPMorgan Chase or (heaven help us) AMTRAK. Software which mostly works is subject to two environmental problems. First, the people who wrote the code or made it work that last time catastrophe struck (hello, AT&T, how are those pushed updates working for you now?) move on, quit, or whatever. Second, the technical options for remediating the problem are evolving (how are those security hot fixes working out, Microsoft?).


The helpful father asks a question the aspiring engineer cannot answer. Thus it was when the wizard was a child, and thus it is when the wizard is working on a modern engineering project. Buildings tip; aircraft lose doors and wheels. Software updates kill computers. Self-driving cars cannot drive themselves. Thanks, MSFT Copilot. Did you get your model airplane to fly when you were a wee lad? I think I know the answer.

I thought about this problem of the cost of remediating, fixing, redoing, or upgrading code, or whatever term fast-talking sales engineers use in their Zooms and PowerPoints, as I read “The High-Risk Refactoring.” The write up does a good job of explaining in a gentle way what happens when suits authorize making old code like new again. (The suits do not know the agonies of the original developers, but why should “history” intrude on a whiz bang GenX or GenY management type?)

The article says:

it’s highly important to ensure the system works the same way after the swap with the new code. In that regard, immediately spotting when something breaks throughout the whole refactoring process is very helpful. No one wants to find that out in production.

No kidding.

In most cases, there are insufficient skilled people and money to create a new or revamped system, get it up and running in parallel for an appropriate period of time, identify the problems, remediate them, and then make the cut over. People buy cars this way, but that’s not how most organizations, regardless of size, “do” software. Okay, the take-the-car-in, buy-a-new-one, drive-off approach will not work in today’s business environment.
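The parallel-run idea is simple enough to sketch, even if funding it is not. Here is a minimal illustration in Python; every name in it (old_system, new_system, the billing rule) is an invented stand-in, not anyone's actual system:

    # Hypothetical parallel-run harness: feed identical inputs to the legacy
    # and replacement code paths and log any divergence before cutover.
    import logging

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("parallel-run")

    def old_system(record):
        # Stand-in for the legacy calculation.
        return record["amount"] * 1.05

    def new_system(record):
        # Stand-in for the refactored calculation; it should match the old one.
        return record["amount"] * 1.05

    def parallel_run(records):
        mismatches = 0
        for record in records:
            expected = old_system(record)
            actual = new_system(record)
            if abs(expected - actual) > 1e-9:  # tolerance for float noise
                mismatches += 1
                log.warning("divergence on %s: old=%s new=%s", record, expected, actual)
        log.info("checked %d records, %d mismatches", len(records), mismatches)
        return mismatches == 0

    sample = [{"amount": 100.0}, {"amount": 250.50}]
    print("safe to cut over:", parallel_run(sample))

Run both systems side by side long enough, and the mismatch log tells you whether the cut over is safe. Few organizations pay for that luxury.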

The write up focuses on what most organizations do; that is, write or fix new code and stick it into a system. There may or may not be resources for a staging server, but the result is the same. The old software has been “fixed” and the documentation is “sort of written” and people move on to other work or in the case of consulting engineering firms, just get replaced by a new, higher margin professional.

The write up takes a different approach and concludes with four suggestions or questions to ask. I quote:

“Refactor if things are getting too complicated, but stop if [you] can’t prove it works.

Accompany new features with refactoring for areas you foresee to be subject to a change, but copy-pasting is ok until patterns arise.

Be proactive in finding new ways to ensure refactoring predictability, but be conservative about the assumption QA will find all the bugs.

Move business logic out of busy components, but be brave enough to keep the legacy code intact if the only argument is “this code looks wrong”.

These are useful points. I would like to suggest some bright white lines for those who have to tackle an IRS-mainframe- or AT&T-billing system type of challenge as well as tweaking an artificial intelligence solution to respond to those wonky multi-ethnic images Google generated in order to allow the Sundar & Prabhakar Comedy Team to smile sheepishly and apologize again for lousy software.

Are you ready? Let’s go:

  1. Fixes add to the complexity of the code base. As time goes stumbling forward, the complexity of the software becomes greater. The cost of making sure the fix works and does not create exciting dependency behavior goes up. Thus, small fixes “cost” more, and these costs are tough to control.
  2. The safest fixes are “wrappers”; that is, no one in his or her right mind wants to change software written in 1978 for a machine no longer in production by the manufacturer. Therefore, new software is written to interact in a “safe” way with the original software. The new code “fixes up” the problem without screwing up what grandpa programmer wrote almost half a century ago. (A small sketch of the wrapper idea appears after this list.) The problem is that “wrappers” tend to slow stuff down. The fix is to say one will optimize the system while one looks for a new project or job.
  3. The software used for “fixing” a problem is becoming the equivalent of repairing an aircraft component with Dawn dish detergent. The “fix” is cheap, easy to use, and good enough. The problem with the software equivalent of this Dawn solution is that it will not stand the test of time. Instead of code crafted in good old COBOL or Assembler, we have some Fancy Dan tools which may fall out of favor in a matter of months, not decades.
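Since point two names the wrapper tactic, here is the promised minimal sketch, assuming a hypothetical legacy routine nobody dares touch. The function names and the 1978-style quirk are invented for illustration:

    # Hypothetical wrapper: the legacy logic stays untouched; the new code
    # adapts inputs and patches up a known quirk at the boundary.

    def legacy_calculate_charge(acct_no, cents):
        # Pretend this is grandpa's 1978 code: integer cents in and out,
        # with a quirk of returning -1 for accounts it has never seen.
        if acct_no < 0:
            return -1
        return cents + cents // 10  # a 10 percent surcharge, 1978 style

    def calculate_charge(account_id, amount_dollars):
        # Modern interface: adapts types and handles the quirk without
        # editing the original routine. The cost: every call pays for the
        # translation layer, which is why wrappers slow stuff down.
        acct_no = int(account_id) if account_id.isdigit() else -1
        cents = round(amount_dollars * 100)
        result = legacy_calculate_charge(acct_no, cents)
        if result == -1:
            raise ValueError("unknown account: " + account_id)
        return result / 100.0

    print(calculate_charge("1234", 10.00))  # 11.0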

Many projects promise better, faster, and cheaper. The reminder “Pick two” is helpful.

Net net: Fixing up lousy or flawed software is going to increase risks and costs. The question asked by bean counters is, “How much?” The answer is, “No one knows until the project is done … if ever.”

Stephen E Arnold, March 18, 2024

Thomson Reuters Is Going to Do AI: Run Faster

March 11, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Thomson Reuters, a mostly low profile outfit, is going to do AI. Why’s this interesting to law schools, lawyers, accountants, special librarians, libraries, and others who “pay” for “real” information? There are three reasons:

  1. Money
  2. Markets
  3. Mania.

Thomson Reuters has been a tech talker for decades. The company created skunk works. It hired quirky MIT wizards. It bought businesses with information technology. But underneath the professional publishing clear coat, the firm is the creation of Lord Thomson of Fleet. The firm has a track record of being able to turn a profit on its $7 billion in revenues. But the future, if news reports are accurate, is artificial intelligence or smart software.


The young publishing executive says, “I have got to get ahead of this AI bus before it runs over me.” Thanks, MSFT Copilot. Working on security today?

But wait! What makes Thomson Reuters different from the New York Times or (heaven forbid the question) Rupert Murdoch’s confections? The answer, in my opinion: Thomson Reuters does the trust thing and is a professional publisher. I don’t want to explain that in the world of Lord Thomson of Fleet, publishing is publishing. Nope. Not going there. Thomson Reuters is a custom-made billiard cue, not one of those bar pool cheapos.

As appropriate to today’s Thomson Reuters, the news appeared in Thomson’s own news releases first; for example, “Thomson Reuters Profit Beats Estimates Amid AI Push.” Yep, AI drives profits. That’s the “m” in money. Plus, late last year this article found its way to the law firm market (yep, that’s the second “m,” markets): “Morgan Lewis and Thomson Reuters Enter into Partnership to Put Law Firms’ Needs at the Heart of AI Development.”

Now the third “m” or mania. Here’s a representative story, “Thomson Reuters to Invest US$8 billion in a Substantial AI-Focused Spending Initiative.” You can also check out the Financial Times’s report at this link.

Thomson Reuters is a $7 billion corporation. If the $8 billion number is on the money, the venerable news outfit is going to spend the equivalent of one year’s revenue acquiring and investing in smart software. In terms of professional publishing, this chunk of change is roughly the equivalent of Sam AI-Man’s need for trillions of dollars for his smart software business.

Several thoughts struck me as I was reading about the $8 billion investment in smart software:

  1. In terms of publishing or more narrowly professional publishing, $8 billion will take some time to spend. But time is not on the side of publishing decision making processes. When the check is written for an AI investment, there may be some who ask, “Is this the correct investment? After all, aren’t we professional publishers serving lawyers, accountants, and researchers?”
  2. The US legal processes are interesting. But the minor challenge of Crown copyright adds a bit of spice to certain investments. The UK government itself is reluctant to push into some AI areas due to concerns that certain information may not be available unless the red tape about copyright has been trimmed, rolled, and put on the shelf. Without being disrespectful, Thomson Reuters could find that some of the $8 billion ends up in its clients’ pockets as legal challenges make their way through courts in Britain, Canada, and the US and probably some frisky EU states.
  3. The game for AI seems to be breaking into two parts of what a former Greek minister calls the techno-feudal set up. On one hand, there are giant technology-centric companies (of which Thomson Reuters is not one of the club members). These are Google- and Microsoft-scale outfits with infrastructure, data, customers, and multiple business models. On the other hand, there are the Product Watch outfits which are using open source and APIs to create “new” and “important” AI businesses, applications, and solutions. In short, there are some barons and a whole grab-bag of lesser folk. Is Thomson Reuters going to be able to run with the barons? Remember, please, the barons are riding stallions. Thomson Reuters-type firms either walk or ride donkeys.

Net net: If Thomson Reuters spends $8 billion on smart software, how many lawyers, accountants, and researchers will be put out of work? The risks are not just bad AI investments. The threat may be the gutting of the billing power of the paying customers for Thomson Reuters’ content. This will be entertaining to watch.

PS. The third “m”? It is mania, AI mania.

Stephen E Arnold, March 11, 2024


The Internet as a Library and Archive? Ho Ho Ho

March 8, 2024

This essay is the work of a dumb dinobaby. No smart software required.

I know that I find certain Internet-related items a knee slapper. Here’s an example: “Millions of Research Papers at Risk of Disappearing from the Internet.” A surprising number of individuals, the young at heart and allegedly informed seniors alike, think the “Internet” is a library or, better yet, an archive like the Library of Congress’ collection of “every” book.


A person deleting data with some degree of fierceness. Yep, thanks, MSFT Copilot. After three tries, this is the best of the lot for a prompt asking for an illustration of data being deleted from a personal computer. Not even good enough, but I like the weird orange coloration.

Here are some basics of how “Internet” services work:

  1. Every year, the costs of storing old and usually never or rarely accessed data go up. A bean counter calls a meeting and asks, “Do we need to keep paying for ping, power, and pipes?” Someone points out, “Usage of the data described as ‘old’ is 0.0003 percent,” or whatever number the bright young sprout has guess-timated. The decision is, as you might guess, dump the old files and reduce other costs immediately. (A back-of-envelope sketch of this arithmetic appears after this list.)
  2. Doing “data” or “online” is expensive, and the costs associated with each are very difficult, if not impossible, to control. Neither government agencies, non-governmental outfits, the United Nations, a library in Cleveland, nor the estimable Harvard University has sufficient money to make information available or keep it at hand. Thus, stuff disappears.
  3. Well-intentioned outfits like the Internet Archive or Project Gutenberg are in the same accountant ink pot. Not every Web site is indexed and archived comprehensively. Not every book can be digitized and converted to a format someone thinks will last “forever.” As a result, one has a better chance of discovering new information browsing through donated manuscripts at the Vatican Library than running an online query.
  4. If something unique is online “somewhere,” that item may be unfindable. Hey, what about Duke University’s collection of “old” books from the 17th century? Who knew?
  5. Will a government agency archive digital content in a comprehensive manner? Nope.
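The arithmetic behind point one is easy to sketch. Every figure below is an invented placeholder, not anyone's actual rate:

    # Back-of-envelope sketch of the "roll off the old data" decision.
    # All numbers are hypothetical.
    STORAGE_COST_PER_TB_MONTH = 20.00  # assumed ping, power, and pipes cost
    OLD_DATA_TB = 500                  # assumed volume of rarely touched data
    ACCESS_RATE = 0.0003               # fraction of the old data touched per year

    annual_cost = STORAGE_COST_PER_TB_MONTH * OLD_DATA_TB * 12
    print(f"annual cost of keeping the old data: ${annual_cost:,.2f}")
    print(f"fraction of that data anyone touches: {ACCESS_RATE:.4%}")
    # $120,000.00 a year for data almost nobody reads. The meeting is short,
    # and the old files get dumped.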

The article about “risks of disappearing” is a hoot. Notice this passage:

“Our entire epistemology of science and research relies on the chain of footnotes,” explains author Martin Eve, a researcher in literature, technology and publishing at Birkbeck, University of London. “If you can’t verify what someone else has said at some other point, you’re just trusting to blind faith for artefacts that you can no longer read yourself.”

I like that word “epistemology.” Just one small problem: Trust. Didn’t the president of Stanford University have an opportunity to find his future elsewhere due to some data wonkery? Google wants to earn trust. Other outfits don’t fool around with trust; these folks gather data, exploit it, and resell it. Archiving and making it findable to a researcher or law enforcement? Not without friction, lots and lots of friction. Why verify? Estimates of non-reproducible research range from 15 percent to 40 percent of scientific, technical, and medical peer reviewed content. Trust? Hello, it’s time to wake up.

Many estimate how much new data are generated each year. I would suggest that data falling off the back end of online systems has been an active process. The first time an accountant hears the IT people say, “We can just roll off the old data and hold storage stable” is right up there with avoiding an IRS audit, finding a life partner, and billing an old person for much more than the accounting work is worth.

After 25 years, there is “risk.” Wow.

Stephen E Arnold, March 8, 2024

ACM: Good Defense or a Business Play?

March 8, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Professional publishers want to use the trappings of peer review, standards, tradition, and quasi academic hoo-hah to add value to their products; others want a quasi-monopoly. Think public legal filings and stuff in a high school chemistry book. The customers of professional publishers are typically not the folks at the pizza joint on River Road in Prospect, Kentucky. The business of professional publishing is an interesting one, but in the wild and crazy world of collapsing next-gen publishing, professional publishing is often ignored. A publisher conference aimed at professional publishers is quite different from the Jazz Age South by Southwest shindig.


Yep, free. Thanks, MSFT Copilot. How’s that security today?

But professional publishers have been in the news. Examples include the dust up about academics making up data. The big time president of the much-honored Stanford University took intellectual short cuts and quit late last year. Then there was the nasty issue about data and bias at the esteemed Harvard University. Plus, a number of bookish types have guess-timated that a hefty percentage of research studies contain made-up data. Hey, you gotta publish to get tenure or get a grant, right?

But there is an intruder in the basement of the professional publishing club. The intruder positions itself in the space between the making up of some data and the professional publishing process. That intruder is ArXiv, an open-access repository of electronic preprints and postprints (known as e-prints) approved for posting after moderation, according to Wikipedia. (Wikipedia is the cancer which killed the old-school encyclopedias.) Plus, there are services which offer access to professional content without paying for the right to host the information. I won’t name these services because I have no desire to have legal eagles circle about my semi-functioning head.

Why do I present this grade-school level history? I read “CACM Is Now Open Access.” Let’s let the Association of Computing Machinery explain its action:

For almost 65 years, the contents of CACM have been exclusively accessible to ACM members and individuals affiliated with institutions that subscribe to either CACM or the ACM Digital Library. In 2020, ACM announced its intention to transition to a fully Open Access publisher within a roughly five-year timeframe (January 2026) under a financially sustainable model. The transition is going well: By the end of 2023, approximately 40% of the ~26,000 articles ACM publishes annually were being published Open Access utilizing the ACM Open model. As ACM has progressed toward this goal, it has increasingly opened large parts of the ACM Digital Library, including more than 100,000 articles published between 1951–2000. It is ACM’s plan to open its entire archive of over 600,000 articles when the transition to full Open Access is complete.

The decision was not an easy one. Money issues rarely are.

I want to step back and look at this interesting change from a different point of view:

  1. Getting a degree today is less of a must have than when I was a wee dinobaby. My parents told me I was going to college. Period. I learned how much effort was required to get my hands on academic journals. I was a master of knowing that Carnegie-Mellon had new but limited bound volumes of certain professional publications. I knew what journals were at the University of Pittsburgh. I used these resources when the Duquesne Library was overrun with the faithful. Now “researchers” can zip online and whip up astonishing results. Google-type researchers prefer the phrase “quantumly supreme results.” This social change is one factor influencing the ACM.
  2. Stabilizing revenue streams means pulling off a magic trick. Sexy conferences and special events complement professional association membership fees. Reducing costs means knocking off the now, very very expensive printing, storing, and shipping of physical journals. The ACM seems to have figured out how to keep the lights on and the computing machine types spending.
  3. ACM members can use ACM content the way they use a pirate library’s holdings or the feel-good ArXiv outfit’s preprints. The move helps neutralize discontent among the membership, and it is good PR.

These points raise a question; to wit: In today’s world, how relevant will a professional association and its professional publications be going forward? The ACM states:

By opening CACM to the world, ACM hopes to increase engagement with the broader computer science community and encourage non-members to discover its rich resources and the benefits of joining the largest professional computer science organization. This move will also benefit CACM authors by expanding their readership to a larger and more diverse audience. Of course, the community’s continued support of ACM through membership and the ACM Open model is essential to keeping ACM and CACM strong, so it is critical that current members continue their membership and authors encourage their institutions to join the ACM Open model to keep this effort sustainable.

Yep, surviving in a world of faux expertise.

Stephen E Arnold, March 8, 2024

Engineering Trust: Will Weaponized Data Patch the Social Fabric?

March 7, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Trust is a popular word. Google wants me to trust the company. Yeah, I will jump right on that. Politicians want me to trust their attestations that citizen interests are important. I worked in Washington, DC, for too long. Nope, I just have too much first-hand exposure to the way “things work.” What about my bank? It wants me to trust it. But isn’t the institution the subject of a couple of government investigations? Oh, not important. And what about the images I see when I walk gingerly between the guard rails? I trust them, right? Ho ho ho.

In our post-Covid, pre-US-national-election world, the word “trust” is carrying quite a bit of freight. Whom do I trust? Not too many people. What about good old Socrates, who was an Athenian when Greece was not yet a collection of ferocious football teams and sun seekers? As you may recall, he trusted fellow residents of Athens. He ended up dead: either a lousy snack bar meal and beverage did him in, or his friends did.

One of his alleged precepts from his pre-artificial-intelligence world was:

“We cannot live better than in seeking to become better.” — Socrates

Got it, Soc.


Thanks, MSFT Copilot, provider of PC “moments.” Good enough.

I read “Exclusive: Public Trust in AI Is Sinking across the Board.” Then I thought about Socrates being convicted for corruption of youth. See? Education does not bring unlimited benefits. Apparently Socrates asked annoying questions which opened him to charges of impiety. (Side note: Hey, Socrates, go with the flow. Just pray to the carved mythical beast, okay?)

A loss of public trust? Who knew? I thought it was common courtesy, a desire to discuss and compromise, not whip out a weapon and shoot, bludgeon, or stab someone to death. In the case of Haiti, a twist is that a victim is bound and then barbequed in a steel drum. Cute, and to me a variation of stacking seven tires in a pile, dousing them with gasoline, inserting a person, and igniting the combo. I noted a variation in Ukraine. Elderly women make cookies laced with poison and provide them to special operation fighters. Subtle and effective due to troop attrition, I hear. Should I trust US Girl Scout cookies? No thanks.

What’s interesting about the write up is that it provides statistics to back up this brilliant and innovative insight about modern life; its focus is artificial intelligence. Let me pluck several examples from the dot-point-filled write up:

  1. “Globally, trust in AI companies has dropped to 53%, down from 61% five years ago.”
  2. “Trust in AI is low across political lines. Democrats trust in AI companies is 38%, independents are at 25% and Republicans at 24%.”
  3. “Eight years ago, technology was the leading industry in trust in 90% of the countries Edelman studies. Today, it is the most trusted in only half of countries.”

AI is trendy; crunchy click bait is highly desirable even for an estimable survivor of Silicon Valley style news reporting.

Let me offer several observations which may either be troubling or typical outputs from a dinobaby working in an underground computer facility:

  1. Close knit groups are more likely to have some concept of trust. The exception, of course, is the behavior of the Hatfields and McCoys.
  2. Outsiders are viewed with suspicion. Often for no reason, a newcomer becomes the default bad entity.
  3. In my lifetime, I have watched institutions take actions which erode trust on a consistent basis.

Net net: Old news. AI is not new. Hyperbole and click obsession are factors which illustrate the erosion of social cohesion. Get used to it.

Stephen E Arnold, March 7, 2024

Philosophy and Money: Adam Smith Remains Flexible

March 6, 2024

This essay is the work of a dumb dinobaby. No smart software required.

In the early twenty-first century, China was slated to overtake the United States as the world’s top economy. Unfortunately for the “sleeping dragon,” China’s economy has tanked due to many factors. The country, however, still remains a strong spot for technology development such as AI and chips. The Register explains why China is still doing well in the tech sector: “How Did China Get So Good At Chips And AI? Congressional Investigation Blames American Venture Capitalists.”

Venture capitalists are always interested in increasing their wealth and subverting anything preventing that. While the US government has choked China’s semiconductor industry and denied it the use of tools to develop AI, venture capitalists are funding those sectors. The US House Select Committee on the Chinese Communist Party (CCP) shared that five venture capital firms are funneling billions into these two industries: Walden International, Sequoia Capital, Qualcomm Ventures, GSR Ventures, and GGV Capital. Chinese semiconductor and AI businesses are linked to human rights abuses and the People’s Liberation Army. These five venture capitalist firms don’t appear interested in respecting human rights or preventing the spread of communism.

The House Select Committee on the CCP discovered that $1.9 billion went to AI companies that support China’s mega-surveillance state and aided in the Uyghur genocide. The US blacklisted these AI-related companies. The committee also found that $1.2 billion was sent to 150 semiconductor companies.

The committee also accused the VCs of sharing more than funding with China:

“The committee also called out the VCs for "intangible" contributions – including consulting, talent acquisition, and market opportunities. In one example highlighted in the report, the committee singled out Walden International chairman Lip-Bu Tan, who previously served as the CEO of Cadence Design Systems. Cadence develops electronic design automation software which Chinese corporates, like Huawei, are actively trying to replicate. The committee alleges that Tan and other partners at Walden coordinated business opportunities and provided subject-matter expertise while holding board seats at SMIC and Advanced Micro-Fabrication Equipment Co. (AMEC).”

Sharing knowledge and business connections is as bad as (if not worse than) funding China’s tech sector. It’s like providing instructions and resources on how to build a nuclear weapon. If China had only the resources and not the know-how, it wouldn’t be as frightening.

Whitney Grace, March 6, 2024

Synthetic Data: From Science Fiction to Functional Circumscription

March 4, 2024

This essay is the work of a dumb humanoid. No smart software required.

Synthetic data are information produced by algorithms, not by real-world events. They are created using real-world data and numerical recipes. The appeal is that they are easier than collecting real life information, cheaper than dealing with data from real life, and faster than fooling around with surveys, monitoring devices, and lawsuits. In theory, synthetic data is one promising way of skirting the expense of getting humans involved.

“What Is [a] Synthetic Sample – And Is It All It’s Cracked Up to Be?” tackles the subject of a synthetic sample, a topic which is one slice of the synthetic data universe. The article seeks “to uncover the truth behind artificially created qualitative and quantitative market research data.” I am going to avoid the question, “Is synthetic data useful?” because the answer is, “Yes.” Bean counters and those looking to find a way out of the pickle barrel filled with expensive brine are going to chase after the magic of algorithms producing data to do some machine learning magic.


In certain situations, fake flowers are super. Other times, the faux blooms are just creepy. Thanks, MSFT Copilot Bing thing. Good enough.

Are synthetic data better than real world data? The answer from my vantage point is, “It depends.” Fancy math can prove that for some use cases, synthetic data are “good enough”; that is, the data produce results close enough to what a “real” data set provides. Therefore, just use synthetic data. But for other applications, synthetic data might throw some sand in the well-oiled marketing collateral describing the wonders of synthetic data. (Some university research labs are quite skilled in PR speak, but the reality of their methods may not line up with the PowerPoints used to raise venture capital.)

This essay discusses a research project to figure out if a synthetic sample works or, in my lingo, if the synthetic sample is good enough. The idea is that as long as the synthetic data is within a specified error range, the synthetic sample can be used and may produce “reliable” or useful results. (At least one hopes this is the case.)

I want to focus on one portion of the cited article and invite you to read the complete Kantar explanation.

Here’s the passage which snagged my attention:

… right now, synthetic sample currently has biases, lacks variation and nuance in both qual and quant analysis. On its own, as it stands, it’s just not good enough to use as a supplement for human sample. And there are other issues to consider. For instance, it matters what subject is being discussed. General political orientation could be easy for a large language model (LLM), but the trial of a new product is hard. And fundamentally, it will always be sensitive to its training data – something entirely new that is not part of its training will be off-limits. And the nature of questioning matters – a highly ’specific’ question that might require proprietary data or modelling (e.g., volume or revenue for a particular product in response to a price change) might elicit a poor-quality response, while a response to a general attitude or broad trend might be more acceptable.

These sentences present several thorny problems in academic speak. Let’s look at them in the vernacular of rural Kentucky where I live.

First, we have the issue of bias. Training data can be unintentionally or intentionally biased. Sample radical trucker posts on Telegram, and use those messages to train a model like Reor. That output is going to express views that some people might find unpalatable. Therefore, building a synthetic data recipe which includes this type of Telegram content is going to be oriented toward truck driver views. That’s good and bad.

Second, a synthetic sample may require mixing in data from a “real” sample. That’s a common sense approach which reduces some costs. But will the outputs be good enough? The question then becomes, “Good enough for what applications?” Big, general questions about how a topic is presented might be close enough for horseshoes. Other topics, like those focusing on a specific technical issue, might warrant more caution or outright avoidance of synthetic data. Do you want your child or wife to die because the synthetic data about a treatment regimen was close enough for horseshoes? But in today’s medical structure, that may be what the future holds.

Third, many years ago, one of the early “smart” software companies was Autonomy, founded by Mike Lynch. In the 1990s, Bayesian methods were known but some — believe it or not — were classified and, thus, not widely known. Autonomy packed up some smart software in the Autonomy black box. Users of this system learned that the smart software had to be retrained because new terms and novel ideas not in the original training set were not findable by the neuro-linguistic program’s engine. Yikes. Retraining requires human content curation of data sets, time to retrain the system, and the expense of redeploying the brains of the black boxes. Clients did not like this, and some, to be frank, did not understand why a product did not work like an MG sports car. Synthetic data has to be trained to “know” about new terms and avoid the “certain blindness” probability-based systems possess.
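That certain blindness is easy to demonstrate with a toy. The following is a minimal bag-of-words scorer, my own illustration and emphatically not Autonomy's actual technology:

    # Toy illustration of probability-style blindness to novel terms:
    # a word absent from the training set contributes nothing at all.
    from collections import Counter

    training_docs = ["stock price rises", "stock market gains", "price gains today"]
    vocabulary = Counter(word for doc in training_docs for word in doc.split())

    def score(query):
        # Crude relevance score: sum of training-set frequencies of query words.
        return sum(vocabulary.get(word, 0) for word in query.split())

    print(score("stock price"))     # 4 -> the engine "knows" these terms
    print(score("cryptocurrency"))  # 0 -> novel term, invisible until retraining

Until a human curates new training data and the model is retrained, the new term simply does not exist for the system.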

Fourth, the topic of “proprietary data modeling” means big bucks. The idea behind synthetic data is that it is cheaper. Building proprietary training data and keeping it current is expensive. Is it better? Yeah, maybe. Is it faster? Probably not when humans are doing the curation, cleaning, verifying, and training.

The write up states:

But it’s likely that blended models (human supplemented by synthetic sample) will become more common as LLMs get even more powerful – especially as models are finetuned on proprietary datasets.
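To picture what such a blend might look like, here is a minimal sketch, assuming numeric survey responses and a crude mean-drift check. Real market research validation would be far more involved:

    # Sketch of blending a human sample with a synthetic one and sanity
    # checking that the blend stays near the human baseline. Illustrative only.
    import random
    import statistics

    random.seed(42)
    human_sample = [random.gauss(7.0, 1.5) for _ in range(200)]  # e.g., 0-10 ratings

    # Synthetic responses drawn from parameters fitted to the human data.
    mu = statistics.mean(human_sample)
    sigma = statistics.stdev(human_sample)
    synthetic_sample = [random.gauss(mu, sigma) for _ in range(800)]

    blended = human_sample + synthetic_sample
    drift = abs(statistics.mean(blended) - mu)
    print(f"human mean={mu:.2f}, blended mean={statistics.mean(blended):.2f}, drift={drift:.3f}")
    if drift > 0.25:  # arbitrary "good enough" tolerance
        print("blend drifts too far from the human baseline; investigate")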

Net net: Synthetic data warrants monitoring. Some may want to invest in synthetic data set companies like Kantar, for instance. I am a dinobaby, and I like the old-fashioned Stone Age approach to data. The fancy math embodies sufficient risk for me. Why increase risk? Remember my reference to a dead loved one? That type of risk.

Stephen E Arnold, March 4, 2024

Open Source: Free, Easy, and Fast Sort Of

February 29, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Not long ago, I spoke with an open source cheerleader. The pros outweighed the cons from this technologist’s point of view. (I would like to ID the individual, but I try to avoid having legal eagles claw their way into my modest nest in rural Kentucky. Just plug in “John Wizard Doe”, a high profile entrepreneur and graduate of a big time engineering school.)


I think going up suggests a problem.

Here are highlights of my notes about the upside of open source:

  1. Many smart people eyeball the code and problems are spotted and fixed
  2. Fixes get made and deployed more rapidly than commercial software, which often works on a longer “fix” cycle
  3. Dead end software can be given new kidneys or maybe a heart with a fork
  4. For most use cases, the software is free or cheaper than commercial products
  5. New functions become available; some of which fuel new product opportunities.

There may be a few others, but let’s look at a downside few open source cheerleaders want to talk about. I don’t want to counter the widely held belief that “many smart people eyeball the code.” The method is grab and go. The speed angle is relative. Reviving open source again and again is quite useful; bad actors do this. Most people just recycle. The “free” angle is a big deal. Everyone likes “free” because why not? New functions become available, so new markets are created. Perhaps. But in the cyber crime space, innovation boils down to finding a mistake that can be exploited with good enough open source components, often with some mileage on their chassis.

But on one point, open source champions crank back on the rah rah output: security. Consider “Over 100,000 Infected Repos Found on GitHub.” I want to point out that Microsoft, the all-time champion in security, owns GitHub. If you think about Microsoft and security too much, you may come away confused. I know I do. I also get a headache.

This “Infected Repos” article from the security firm Apiiro asserts:

Our security research and data science teams detected a resurgence of a malicious repo confusion campaign that began mid-last year, this time on a much larger scale. The attack impacts more than 100,000 GitHub repositories (and presumably millions) when unsuspecting developers use repositories that resemble known and trusted ones but are, in fact, infected with malicious code.

The write up provides excellent information about how the bad repos create problems and provides a recipe for doing this type of malware distribution yourself. (As you know, I am not too keen on having certain information with helpful detail easily available, but I am a dinobaby, and dinobabies have crazy ideas.)
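I will not repeat the recipe either. A defensive sketch seems fair game, though: pin dependencies to an explicit allowlist of vetted owner/name pairs instead of trusting search results or near-miss names. The allowlist entries below are invented examples:

    # Defensive sketch against repo confusion: clone only repositories whose
    # owner/name pair has been vetted. Entries are hypothetical examples.
    import subprocess

    KNOWN_GOOD = {
        "psf/requests",
        "pallets/flask",
    }

    def safe_clone(repo):
        if repo not in KNOWN_GOOD:
            raise ValueError("refusing to clone unvetted repo: " + repo)
        subprocess.run(["git", "clone", "https://github.com/" + repo + ".git"], check=True)

    safe_clone("psf/requests")     # allowed
    # safe_clone("psf/requestss")  # typosquat lookalike -> ValueError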

If we confine our thinking to the open source champion’s five benefits, I think security issues may be more important in some use cases. The better question is, “Why don’t open source supporters like Microsoft and the person with whom I spoke want to talk about open source security?” My view is that:

  1. Security is an afterthought or a never-thought facet of open source software
  2. Making money is Job #1, so free trumps spending money to make sure the open source software is secure
  3. Open source appeals to some venture capitalists. Why? RedHat, Elastic, and a handful of other “open source plays”.

Net net: Just visualize a future in which smart software ingests poisoned code, and programmers rely on that smart software to make them 10X engineers. Does that create a bit of a problem? Of course not. Microsoft is the security champ, and GitHub is Microsoft.

Stephen E Arnold, February 29, 2024

The Google: A Bit of a Wobble

February 28, 2024

This essay is the work of a dumb humanoid. No smart software required.

Check out this snap from Techmeme on February 28, 2024. The folks commenting about Google Gemini’s very interesting picture generation system are confused. Some think that Gemini makes clear that the Google has lost its way. Others just find the recent image gaffes as one more indication that the company is too big to manage and the present senior management is too busy amping up the advertising pushed in front of “users.”


I wanted to take a look at what Analytics India Magazine had to say. Its article is “Aal Izz Well, Google.” The write up — from a nation state with some nifty drone technology and so-so relationships with its neighbors — offers this statement:

In recent weeks, the situation has intensified to the extent that there are calls for the resignation of Google chief Sundar Pichai. Helios Capital founder Samir Arora has suggested a likelihood of Pichai facing termination or choosing to resign soon, in the aftermath of the Gemini debacle.

The write up offers:

Google chief Sundar Pichai, too, graciously accepted the mistake. “I know that some of its responses have offended our users and shown bias – to be clear, that’s completely unacceptable and we got it wrong,” Pichai said in a memo.

The author of the Analytics India article is Siddharth Jindal. I wonder if he will talk about Sundar’s and Prabhakar’s most recent comedy sketch. The roll out of Bard in Paris was a hoot, and it too had gaffes. That was a year ago. Now it is a year later, and what has Google accomplished?

Analytics India emphasizes that “Google is not alone.” My team and I know that smart software is the next big thing. But Analytics India is particularly forgiving.

The estimable New York Post takes a slightly different approach. “Google Parent Loses $70B in Market Value after Woke AI Chatbot Disaster” reports:

Google’s parent company lost more than $70 billion in market value in a single trading day after its “woke” chatbot’s bizarre image debacle stoked renewed fears among investors about its heavily promoted AI tool. Shares of Alphabet sank 4.4% to close at $138.75 in the week’s first day of trading on Monday. The Google’s parent’s stock moved slightly higher in premarket trading on Tuesday [February 28, 2024, 941 am US Eastern time].

As I write this, I turned to Google’s nemesis, the Softies in Redmond, Washington. I asked for a dinosaur looking at a meteorite crater. Here’s what Copilot provided:


Several observations:

  1. This is a spectacular event. Sundar and Prabhakar will have a smooth explanation, I believe. Smooth may be their core competency.
  2. The fact that a Code Red has become a Code Dead makes clear that communications at Google requires a tune up. But if no one is in charge, blowing $70 billion will catch the attention of some folks with sharp teeth and a mean spirit.
  3. The adolescent attitudes of a high school science club appear to inform the management methods at Google. A big time investigative journalist told me that Google did not operate like a high school science club planning a bus trip to the state science fair. I stick by my HSSCMM or high school science club management method. I won’t repeat her phrase because it is similar to Google’s quantumly supreme smart software: Wildly off base.

Net net: I love this rationalization of management, governance, and technical failure. Everyone in the science club gets a failing grade. Hey, wizards and wizardettes, why not just stick to selling advertising?

Stephen E Arnold, February 28, 2024

10X: The Magic Factor

February 27, 2024

This essay is the work of a dumb dinobaby. No smart software required.

The 10X engineer. The 10X payout. The 10X advertising impact. The 10X factor can apply to money, people, and processes. Flip to the inverse, and one can use smart software to replace the engineers who are not 10X or — more optimistically — lift those expensive digital humanoids to a higher level. It is magical: Win either way, provided you are a top dog or a one percenter. Applied to money, 10X means winner. End up with $0.10, and the hapless investor is a loser. For processes, figure out a 10X trick, and you are a winner, although one who is misunderstood. Money matters more than machine efficiency to some people.


In pursuit of a 10X payoff, will the people end up under water? Thanks, ImageFX. Good enough.

These are my 10X thoughts after I read “Groq, Gemini, and 10X Improvements.” The essay focuses on things technical. I am going to skip over what the author offers as a well-reasoned, dispassionate commentary on 10X. I want to zip to one passage which I think is quite fascinating. Here it is:

We don’t know when increasing parameters or datasets will plateau. We don’t know when we’ll discover the next breakthrough architecture akin to Transformers. And we don’t know how good GPUs, or LPUs, or whatever else we’re going to have, will become. Yet, when you consider that Moore’s Law held for decades… suddenly Sam Altman’s goal of raising seven trillion dollars to build AI chips seems a little less crazy.

The way I read this is that unknowns exist with AI, money, and processes. For me, the unknowns are somewhat formidable. For many, charging into the unknown does not cause sleepless nights. And we are talking about raising trillions of dollars, which is a very large pile of silver dollars.
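As arithmetic, the Moore's Law comparison is easy to check. At one doubling every two years, a 10X gain takes about six and a half years:

    # How long does a 10X gain take if capability doubles every two years?
    # Solve 2 ** (t / 2) = 10 for t.
    import math

    doubling_period_years = 2.0
    t = doubling_period_years * math.log2(10)
    print(f"years to 10X at one doubling per {doubling_period_years:g} years: {t:.1f}")
    # About 6.6 years, which is why decades of Moore's Law compound so dramatically.

Decades of that compounding are exactly what makes trillion-dollar chip bets look merely ambitious instead of crazy.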

One must take the $7 trillion and Sam AI-Man seriously. In June 2023, Sam AI-Man met the boss of Softbank. Today (February 22, 2024) rumors about a deal related to raising the trillions required for OpenAI to build chips and fulfill its promise have reached my research team. If true, will there be a 10X payoff, which noses into spitting distance of 15 zeros? If that goes inverse, that’s going to create a bad day for someone.

Stephen E Arnold, February 27, 2024
