Synthetic Data: From Science Fiction to Functional Circumscription

March 4, 2024

This essay is the work of a dumb humanoid. No smart software required.

Synthetic data are information produced by algorithms, not by real-world events. They are created using real-world data and numerical recipes. The appeal is that generating them is easier than collecting real-life information, cheaper than dealing with data from real life, and faster than fooling around with surveys, monitoring devices, and lawsuits. In theory, synthetic data are one promising way of skirting the expense of getting humans involved.

“What Is [a] Synthetic Sample – And Is It All It’s Cracked Up to Be?” tackles the subject of a synthetic sample, a topic which is one slice of the synthetic data universe. The article seeks “to uncover the truth behind artificially created qualitative and quantitative market research data.” I am going to avoid the question, “Is synthetic data useful?” because the answer is, “Yes.” Bean counters and those looking to find a way out of the pickle barrel filled with expensive brine are going to chase after the magic of algorithms producing data to do some machine learning magic.

image

In certain situations, fake flowers are super. Other times, the faux blooms are just creepy. Thanks, MSFT Copilot Bing thing. Good enough.

Are synthetic data better than real world data? The answer from my vantage point is, “It depends.” Fancy math can prove that for some use cases, synthetic data are “good enough”; that is, the data produce results close enough to what a “real” data set provides. Therefore, just use synthetic data. But for other applications, synthetic data might throw some sand in the well-oiled marketing collateral describing the wonders of synthetic data. (Some university research labs are quite skilled in PR speak, but the reality of their methods may not line up with the PowerPoints used to raise venture capital.)

This essay discusses a research project to figure out if a synthetic sample works or, in my lingo, if the synthetic sample is good enough. The idea is that as long as the synthetic data is within a specified error range, the synthetic sample can be used and may produce “reliable” or useful results. (At least one hopes this is the case.)
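Here is a back-of-the-envelope sketch of what “within a specified error range” might look like. The function names, the 10 percent tolerance, and the ratings are my assumptions for illustration, not anything taken from the Kantar write up. The sketch compares the mean and the spread of a synthetic sample with those of a real one.

```python
from statistics import mean, pstdev


def relative_gap(real_value, synthetic_value):
    """Relative distance between a real statistic and its synthetic counterpart."""
    # Assumes real_value is not zero; a hypothetical helper for illustration.
    return abs(synthetic_value - real_value) / abs(real_value)


def good_enough(real, synthetic, tolerance=0.10):
    """Check the mean and the spread separately against the same tolerance."""
    return {
        "mean_ok": relative_gap(mean(real), mean(synthetic)) <= tolerance,
        "spread_ok": relative_gap(pstdev(real), pstdev(synthetic)) <= tolerance,
    }


# Hypothetical 1-to-10 ratings: six humans versus six model-generated answers.
real_scores = [7, 8, 6, 9, 7, 8]
synthetic_scores = [7, 7, 7, 8, 8, 8]

result = good_enough(real_scores, synthetic_scores)
print(result)
```

With these made-up numbers the averages match, but the synthetic answers are bunched together, so the spread check fails. A single “error range” on the average can hide a lack of variation.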

I want to focus on one portion of the cited article and invite you to read the complete Kantar explanation.

Here’s the passage which snagged my attention:

… right now, synthetic sample currently has biases, lacks variation and nuance in both qual and quant analysis. On its own, as it stands, it’s just not good enough to use as a supplement for human sample. And there are other issues to consider. For instance, it matters what subject is being discussed. General political orientation could be easy for a large language model (LLM), but the trial of a new product is hard. And fundamentally, it will always be sensitive to its training data – something entirely new that is not part of its training will be off-limits. And the nature of questioning matters – a highly ’specific’ question that might require proprietary data or modelling (e.g., volume or revenue for a particular product in response to a price change) might elicit a poor-quality response, while a response to a general attitude or broad trend might be more acceptable.

These sentences present several thorny problems in academic speak. Let’s look at them in the vernacular of rural Kentucky where I live.

First, we have the issue of bias. Training data can be unintentionally or intentionally biased. Sample radical trucker posts on Telegram, and use those messages to train a model like Reor. That output is going to express views that some people might find unpalatable. Therefore, building a synthetic data recipe which includes this type of Telegram content is going to be oriented toward truck driver views. That’s good and bad.

Second, a synthetic sample may require mixing data from a “real” sample. That’s a common sense approach which reduces some costs. But will the outputs be good enough? The question then becomes, “Good enough for what applications?” Big, general questions about how a topic is presented might be close enough for horseshoes. Other topics like those focusing on dealing with a specific technical issue might warrant more caution or outright avoidance of synthetic data. Do you want your child or wife to die because the synthetic data about a treatment regimen was close enough for horseshoes? But in today’s medical structure, that may be what the future holds.

Third, many years ago, one of the early “smart” software companies was Autonomy, founded by Mike Lynch. In the 1990s, Bayesian methods were known but some, believe it or not, were classified and, thus, not widely known. Autonomy packed up some smart software in the Autonomy black box. Users of this system learned that the smart software had to be retrained because new terms and novel ideas not in the original training set were not findable by the neuro linguistic program’s engine. Yikes, retraining requires human content curation of data sets, time to retrain the system, and the expense of redeploying the brains of the black boxes. Clients did not like this and some, to be frank, did not understand why a product did not work like an MG sports car. Synthetic data has to be trained to “know” about new terms and avoid the “certain blindness” probability-based systems possess.

Fourth, the topic of “proprietary data modeling” means big bucks. The idea behind synthetic data is that it is cheaper. Building proprietary training data and keeping it current is expensive. Is it better? Yeah, maybe. Is it faster? Probably not when humans are doing the curation, cleaning, verifying, and training.

The write up states:

But it’s likely that blended models (human supplemented by synthetic sample) will become more common as LLMs get even more powerful – especially as models are finetuned on proprietary datasets.

Net net: Synthetic data warrants monitoring. Some may want to invest in synthetic data set companies like Kantar, for instance. I am a dinobaby, and I like the old-fashioned Stone Age approach to data. The fancy math embodies sufficient risk for me. Why increase risk? Remember my reference to a dead loved one? That type of risk.

Stephen E Arnold, March 4, 2024

Open Source: Free, Easy, and Fast Sort Of

February 29, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Not long ago, I spoke with an open source cheerleader. The pros outweighed the cons from this technologist’s point of view. (I would like to ID the individual, but I try to avoid having legal eagles claw their way into my modest nest in rural Kentucky. Just plug in “John Wizard Doe”, a high profile entrepreneur and graduate of a big time engineering school.)

image

I think going up suggests a problem.

Here are highlights of my notes about the upside of open source:

  1. Many smart people eyeball the code and problems are spotted and fixed
  2. Fixes get made and deployed more rapidly than for commercial software, which often works on a longer “fix” cycle
  3. Dead end software can be given new kidneys or maybe a heart with a fork
  4. For most use cases, the software is free or cheaper than commercial products
  5. New functions become available; some of which fuel new product opportunities.

There may be a few others, but let’s look at a downside few open source cheerleaders want to talk about. I don’t want to counter the widely held belief that “many smart people eyeball the code.” The method is grab and go. The speed angle is relative. Reviving open source again and again is quite useful; bad actors do this. Most people just recycle. The “free” angle is a big deal. Everyone likes “free” because why not? New functions become available so new markets are created. Perhaps. But in the cyber crime space, innovation boils down to finding a mistake that can be exploited with good enough open source components, often with some mileage on their chassis.

But on one point, open source champions crank back the rah rah output: “Over 100,000 Infected Repos Found on GitHub.” I want to point out that Microsoft, the all-time champion in security, owns GitHub. If you think about Microsoft and security too much, you may come away confused. I know I do. I also get a headache.

This “Infected Repos” Apiiro article asserts:

Our security research and data science teams detected a resurgence of a malicious repo confusion campaign that began mid-last year, this time on a much larger scale. The attack impacts more than 100,000 GitHub repositories (and presumably millions) when unsuspecting developers use repositories that resemble known and trusted ones but are, in fact, infected with malicious code.

The write up provides excellent information about how the bad repos create problems and provides a recipe for doing this type of malware distribution yourself. (As you know, I am not too keen on having certain information with helpful detail easily available, but I am a dinobaby, and dinobabies have crazy ideas.)
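On the defensive side, the “repositories that resemble known and trusted ones” idea can be illustrated with a look-alike name check. This is a minimal sketch using Python’s standard difflib module; the trusted list, the 0.8 threshold, and the function name are my assumptions, not the method described in the cited article.

```python
from difflib import SequenceMatcher

# Hypothetical allow-list of names a team actually trusts.
TRUSTED = ["requests", "numpy", "flask"]


def lookalike_of(name, trusted=TRUSTED, threshold=0.8):
    """Return the trusted name that `name` closely resembles, or None."""
    candidate = name.lower()
    if candidate in trusted:
        return None  # An exact match is the real thing, not an imitation.
    for known in trusted:
        # ratio() is 1.0 for identical strings and falls toward 0.0 as they diverge.
        if SequenceMatcher(None, candidate, known).ratio() >= threshold:
            return known
    return None


print(lookalike_of("reqeusts"))  # two letters swapped
print(lookalike_of("requests"))  # the genuine name
```

A check like this only flags suspicious names. It says nothing about whether the code inside a repo is malicious, so it is a tripwire, not a fix.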

If we confine our thinking to the open source champion’s five benefits, I think security issues may be more important in some use cases. The better question is, “Why don’t open source supporters like Microsoft and the person with whom I spoke want to talk about open source security?” My view is that:

  1. Security is an after thought or a never thought facet of open source software
  2. Making money is Job #1, so free trumps spending money to make sure the open source software is secure
  3. Open source appeals to some venture capitalists. Why? RedHat, Elastic, and a handful of other “open source plays”.

Net net: Just visualize a future in which smart software ingests poisoned code, and programmers rely on smart software to make them 10X engineers. Does that create a bit of a problem? Of course not. Microsoft is the security champ, and GitHub is Microsoft.

Stephen E Arnold, February 29, 2024

The Google: A Bit of a Wobble

February 28, 2024

This essay is the work of a dumb humanoid. No smart software required.

Check out this snap from Techmeme on February 28, 2024. The folks commenting about Google Gemini’s very interesting picture generation system are confused. Some think that Gemini makes clear that the Google has lost its way. Others just find the recent image gaffes as one more indication that the company is too big to manage and the present senior management is too busy amping up the advertising pushed in front of “users.”

image

I wanted to take a look at what Analytics India Magazine had to say. Its article is “Aal Izz Well, Google.” The write up, from a nation state with some nifty drone technology and so-so relationships with its neighbors, offers this statement:

In recent weeks, the situation has intensified to the extent that there are calls for the resignation of Google chief Sundar Pichai. Helios Capital founder Samir Arora has suggested a likelihood of Pichai facing termination or choosing to resign soon, in the aftermath of the Gemini debacle.

The write up offers:

Google chief Sundar Pichai, too, graciously accepted the mistake. “I know that some of its responses have offended our users and shown bias – to be clear, that’s completely unacceptable and we got it wrong,” Pichai said in a memo.

The author of the Analytics India article is Siddharth Jindal. I wonder if he will talk about Sundar’s and Prabhakar’s most recent comedy sketch. The roll out of Bard in Paris was a hoot, and it too had gaffes. That was a year ago. Now it is a year later, and what has Google accomplished?

Analytics India emphasizes that “Google is not alone.” My team and I know that smart software is the next big thing. But Analytics India is particularly forgiving.

The estimable New York Post takes a slightly different approach. “Google Parent Loses $70B in Market Value after Woke AI Chatbot Disaster” reports:

Google’s parent company lost more than $70 billion in market value in a single trading day after its “woke” chatbot’s bizarre image debacle stoked renewed fears among investors about its heavily promoted AI tool. Shares of Alphabet sank 4.4% to close at $138.75 in the week’s first day of trading on Monday. The Google’s parent’s stock moved slightly higher in premarket trading on Tuesday [February 28, 2024, 9:41 am US Eastern time].

As I wrote this, I turned to Google’s nemesis, the Softies in Redmond, Washington. I asked for a dinosaur looking at a meteorite crater. Here’s what Copilot provided:

image

Several observations:

  1. This is a spectacular event. Sundar and Prabhakar will have a smooth explanation, I believe. Smooth may be their core competency.
  2. The fact that a Code Red has become a Code Dead makes clear that communications at Google require a tune up. But if no one is in charge, blowing $70 billion will catch the attention of some folks with sharp teeth and a mean spirit.
  3. The adolescent attitudes of a high school science club appear to inform the management methods at Google. A big time investigative journalist told me that Google did not operate like a high school science club planning a bus trip to the state science fair. I stick by my HSSCMM or high school science club management method. I won’t repeat her phrase because it is similar to Google’s quantumly supreme smart software: Wildly off base.

Net net: I love this rationalization of management, governance, and technical failure. Everyone in the science club gets a failing grade. Hey, wizards and wizardettes, why not just stick to selling advertising?

Stephen E Arnold, February 28, 2024

10X: The Magic Factor

February 27, 2024

This essay is the work of a dumb dinobaby. No smart software required.

The 10X engineer. The 10X payout. The 10X advertising impact. The 10X factor can apply to money, people, and processes. Flip to the inverse and one can use smart software to replace the engineers who are not 10X or, more optimistically, lift those expensive digital humanoids to a higher level. It is magical: Win either way, provided you are a top dog, a one percenter. Applied to money, 10X means winner. End up with $0.10, and the hapless investor is a loser. For processes, figure out a 10X trick, and you are a winner, although one who is misunderstood. Money matters more than machine efficiency to some people.

image

In pursuit of a 10X payoff, will the people end up under water? Thanks, ImageFX. Good enough.

These are my 10X thoughts after I read “Groq, Gemini, and 10X Improvements.” The essay focuses on things technical. I am going to skip over what the author offers as a well-reasoned, dispassionate commentary on 10X. I want to zip to one passage which I think is quite fascinating. Here it is:

We don’t know when increasing parameters or datasets will plateau. We don’t know when we’ll discover the next breakthrough architecture akin to Transformers. And we don’t know how good GPUs, or LPUs, or whatever else we’re going to have, will become. Yet, when you consider that Moore’s Law held for decades… suddenly Sam Altman’s goal of raising seven trillion dollars to build AI chips seems a little less crazy.

The way I read this is that unknowns exist with AI, money, and processes. For me, the unknowns are somewhat formidable. For many, charging into the unknown does not cause sleepless nights. We are talking about raising trillions of dollars, which is a large pile of silver dollars.

One must take the $7 trillion and Sam AI-Man seriously. In June 2023, Sam AI-Man met the boss of Softbank. Today (February 22, 2024) rumors about a deal related to raising the trillions required for OpenAI to build chips and fulfill its promise have reached my research team. If true, will there be a 10X payoff, which noses into spitting distance of 15 zeros? If that goes inverse, that’s going to create a bad day for someone.

Stephen E Arnold, February 27, 2024

Qualcomm: Its AI Models Pour Gasoline on a Raging Fire

February 26, 2024

This essay is the work of a dumb humanoid. No smart software required.

Qualcomm’s announcements at the Mobile World Congress pour gasoline on the raging AI fire. The chip maker aims to enable smart software on mobile devices, new gear, gym shoes, and more. Venture Beat’s “Qualcomm Unveils AI and Connectivity Chips at Mobile World Congress” does a good job of explaining the big picture. The online publication reports:

Generative AI functions in upcoming smartphones, Windows PCs, cars, and wearables will also be on display with practical applications. Generative AI is expected to have a broad impact across industries, with estimates that it could add the equivalent of $2.6 trillion to $4.4 trillion in economic benefits annually.

Qualcomm, primarily associated with chips, has pushed into what it calls “AI models.” The listing of the models appears on the Qualcomm AI Hub Web page. You can find this page at https://aihub.qualcomm.com/models. To view the available models, click on one of the four model domains, shown below:

image

Each domain will expand and present the name of the model. Note that the domain with the most models is computer vision. The company offers 60 models. These are grouped by function; for example, image classification, image editing, image generation, object detection, pose estimation, semantic segmentation (tagging objects), and super resolution.

The image below shows a model which analyzes data and then predicts related values. In this case, the position of the subject’s body is presented. The predictive functions of a company like Recorded Future suddenly appear to be behind the curve in my opinion.

image

There are two models for generative AI. These are image generation and text generation. Models are available for audio functions and for multimodal operations.

Qualcomm includes brief descriptions of each model. These descriptions include some repetitive phrases like “state of the art”, “transformer,” and “real time.”

Looking at the examples and following the links to supplemental information suggests, at first glance:

  1. Qualcomm will become a company of interest to investors
  2. Competitive outfits have their marching orders to develop comparable or better functions
  3. Supply chain vendors may experience additional interest and uplift from investors.

Can Qualcomm deliver? Let me answer the question this way. Whether the company experiences an nVidia moment or not, other companies have to respond, innovate, cut costs, and become more forward leaning in this chip sector.

I am in my underground computer lab in rural Kentucky, and I can feel the heat from Qualcomm’s AI announcement. Those at the conference not working for Qualcomm may have had their eyebrows scorched.

Stephen E Arnold, February 26, 2024

What Techno-Optimism Seems to Suggest (Oligopolies, a Plutocracy, or Utopia)

February 23, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Science and mathematics are comparable to religion. These fields of study attract acolytes who study and revere associated knowledge and shun nonbelievers. The advancement of modern technology is its own subset of religious science and mathematics combined with philosophical doctrine. Tech Policy Press discusses the changing views on technology-based philosophy in: “Parsing The Political Project Of Techno-Optimism.”

Rich venture capitalists Marc Andreessen and Ben Horowitz are influential in Silicon Valley. While they’ve shaped modern technology with their investments, they also tried drafting a manifesto about how technology should be handled in the future. They “creatively” labeled it the “techno-optimist manifesto.” It promotes an ideology that favors rich people increasing their wealth by investing in politicians that will help them achieve this.

Techno-optimism is not the new mantra of Silicon Valley. The manifesto didn’t go over well. Andreessen wrote:

“Techno-Optimism is a material philosophy, not a political philosophy…We are materially focused, for a reason – to open the aperture on how we may choose to live amid material abundance.”

He also labeled this section, “the meaning of life.”

Techno-optimism is a revamped version of the Californian ideology that reigned in the 1990s. It preached that the future should be shaped by engineers, investors, and entrepreneurs without governmental influence. Techno-optimism wants venture capitalists to be untaxed with unregulated portfolios.

Horowitz added his own Silicon Valley-type titbit:

“‘…will, for the first time, get involved with politics by supporting candidates who align with our vision and values specifically for technology. (…) [W]e are non-partisan, one issue voters: if a candidate supports an optimistic technology-enabled future, we are for them. If they want to choke off important technologies, we are against them.’”

Horowitz and Andreessen are giving the world what some might describe as “a one-finger salute.” These venture capitalists want to do whatever they want wherever they want with governments in their pockets.

This isn’t a new ideology or a philosophy. It’s a rebranding of socialism and fascism and communism. There’s an even better word that describes techno-optimism: Plutocracy. I am not sure the approach will produce a Utopia. But there is a good chance that some giant techno feudal outfits will reap big rewards. But another approach might be to call techno optimism a religion and grab the benefits of a tax exemption. I wonder if someone will create a deep fake of Jim and Tammy Faye? Interesting.

Whitney Grace, February 23, 2024

Security Debt: So Just Be a Responsible User / Developer

February 15, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Security appears to be one of the next big things. Smart software strapped onto cyber safeguard systems is a no-lose proposition for vendors. Does it matter that bolted on AI may not work? Nope. The important point is to ride the opportunity wave.

What’s interesting is that security is becoming a topic discussed at 75-something bridge groups and at lunch gatherings in government agencies concerned about fish and trees. Can third-party security services, grandmothers chasing a grand slam, or an expert in river fowl address security problems? I would suggest that the idea that security is the user’s responsibility is an interesting way to dodge responsibility. The estimable 23andMe tried this play, and I am not too sure that it worked.

image

Can security debt become the invisible hand creating opportunities for bad actors? Has the young executive reached the point of no return for a personal debt crisis? Thanks, MSFT Pilot Bing for a good enough illustration.

Who can address the security issues in the software people and organizations use today? “Why Software Security Debt Is Becoming a Serious Problem for Developers” states:

Over 70% of organizations have software containing flaws that have remained unfixed for longer than a year, constituting security debt,

Plus, the article asserts:

46% of organizations were found to have persistent, high-severity flaws that went unaddressed for over a year
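The definition in these quotes (a flaw left unfixed for longer than a year counts as security debt) is simple enough to sketch in a few lines. The flaw IDs, the dates, and the 365-day reading of “a year” are my assumptions for illustration; the cited article does not publish data in this form.

```python
from datetime import date, timedelta

ONE_YEAR = timedelta(days=365)  # assumption: "a year" means 365 days


def security_debt(open_flaws, today):
    """Return the flaws that have stayed unfixed for longer than a year."""
    return [flaw for flaw in open_flaws if today - flaw["found"] > ONE_YEAR]


# Hypothetical inventory of unfixed flaws; IDs and dates are invented.
open_flaws = [
    {"id": "FLAW-1", "found": date(2022, 1, 10)},
    {"id": "FLAW-2", "found": date(2023, 11, 1)},
    {"id": "FLAW-3", "found": date(2022, 6, 30)},
]

debt = security_debt(open_flaws, today=date(2024, 2, 15))
print([flaw["id"] for flaw in debt])
```

Counting the debt is the easy part. Deciding who pays it down is the question the rest of this essay worries about.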

Security issues exist. But the question is, “Who will address these flaws, gaps, and mistakes?”

The article cites an expert who opines:

“The further that you shift [security testing] to the developer’s desktop and have them see it as early as possible so they can fix it, the better, because number one it’s going to help them understand the issue more and [number two] it’s going to build the habits around avoiding it.”

But who is going to fix the security problems?

In-house developers may not have the expertise or access to the uncompiled code to identify and remediate. Open source and other third-party software can change without notice because why not do what’s best for those people or the bad actors manipulating open source software and “approved” apps available from a large technology company’s online store.

The article offers a number of suggestions, but none of these strike me as practical for some or most organizations.

Here’s the problem: Security is not a priority until a problem surfaces. Then when a problem becomes known, the delay between compromise, discovery, and public announcement can be — let’s be gentle — significant. Once a cyber security vendor “discovers” the problem or learns about it from a customer who calls and asks, “What has happened?”, the PR machines grind into action.

The “fixes” are typically rush jobs for these reasons:

  1. The vendor and the developer who made the zero a one do not earn money by fixing old code. Another factor is that the person or team responsible for the misstep is long gone, working as an Uber driver, or sitting in a rocking chair in a warehouse for the elderly.
  2. The complexity of “going back” and making a fix may create other problems. These dependencies are unknown, so a fix just creates more problems. Writing a shim or wrapper code may be good enough to get the angry dogs to calm down and stop barking.
  3. The security flaw may be unfixable; that is, the original approach includes and may need flaws for performance, expediency, or some quite revenue-centric reason. No one wants to rebuild a Pinto that explodes in a rear end collision. Let the lawyers deal with it. When it comes to code, lawyers are definitely equipped to resolve security problems.

The write up contains a number of statistics, but it makes one major point:

Security debt is mounting.

Like a young worker who lives by moving credit card debt from vendor to vendor, getting out of the debt hole may be almost impossible. But, hey, it is that individual’s responsibility, not the system. Just be responsible. That is easy to say, and it strikes me as somewhat hollow.

Stephen E Arnold, February 15, 2024

Developers, AI Will Not Take Your Jobs… Yet

February 15, 2024

This essay is the work of a dumb dinobaby. No smart software required.

It seems programmers are safe from an imminent AI jobs takeover. The competent ones, anyway. LeadDev reports, “Researchers Say Generative AI Isn’t Replacing Devs Any Time Soon.” Generative AI tools have begun to lend developers a helping hand, but nearly half of developers are concerned they might lose their jobs to their algorithmic assistants.

image

Another MSFT Copilot completely original Bing thing. Good enough but that fellow sure looks familiar.

However, a recent study by researchers from Princeton University and the University of Chicago suggests they have nothing to worry about: AI systems are far from good enough at programming tasks to replace humans. Writer Chris Stokel-Walker tells us the researchers:

“… developed an evaluation framework that drew nearly 2,300 common software engineering problems from real GitHub issues – typically a bug report or feature request – and corresponding pull requests across 12 popular Python repositories to test the performance of various large language models (LLMs). Researchers provided the LLMs with both the issue and the repo code, and tasked the model with producing a workable fix, which was tested after to ensure it was correct. But only 4% of the time did the LLM generate a solution that worked.”

Researcher Carlos Jimenez notes these problems are very different from those LLMs are usually trained on. Specifically, the article states:

“The SWE-bench evaluation framework tested the model’s ability to understand and coordinate changes across multiple functions, classes, and files simultaneously. It required the models to interact with various execution environments, process context, and perform complex reasoning. These tasks go far beyond the simple prompts engineers have found success using to date, such as translating a line of code from one language to another. In short: it more accurately represented the kind of complex work that engineers have to do in their day-to-day jobs.”

Will AI someday be able to perform that sort of work? Perhaps, but the researchers consider it more likely we will never find AI coding independently. Instead, we will continue to need human developers to oversee algorithms’ work. The algorithms will, however, continue to make programmers’ jobs easier. If Jimenez and company are correct, developers everywhere can breathe a sigh of relief.

Cynthia Murrell, February 15, 2024

A Xoogler Explains AI, News, Inevitability, and Real Business Life

February 13, 2024

This essay is the work of a dumb dinobaby. No smart software required.

I read an essay providing a tiny bit of evidence that one can take the Googler out of the Google, but that Xoogler still retains some Googley DNA. The item appeared in the Bezos bulldozer’s estimable publication with the title “The Real Wolf Menacing the News Business? AI.” Absolutely. Obviously. Who does not understand that?

image

A high-technology sophist explains the facts of life to a group of listeners who are skeptical about artificial intelligence. The illustration was generated after three tries by Google’s own smart software. I love the miniature horse and the less-than-flattering representation of a sales professional. That individual looks like one who would be more comfortable eating the listeners than convincing them about AI’s value.

The essay contains a number of interesting points. I want to highlight three and then, as I quite enjoy doing, I will offer some observations.

The author is a Xoogler who served from 2017 to 2023 as the senior director of news ecosystem products. I quite like the idea of a “news ecosystem.” But ecosystems, as some who follow the impact of man on environments know, can be destroyed or pushed to the edge of catastrophe. In the aftermath of devastation coming from indifferent decision makers, greed-fueled entrepreneurs, or rhinoceros poachers, landscapes are often transformed.

First, the essay writer argues:

The news publishing industry has always reviled new technology, whether it was radio or television, the internet or, now, generative artificial intelligence.

I love the word “revile.” It suggests that ignorant individuals are unable to grasp the value of certain technologies. I also like the very clever use of the word “always.” Categorical affirmatives make the world of zeros and ones so delightfully absolute. We’re off to a good start I think.

Second, we have a remarkable argument which invokes another zero and one type of thinking. Consider this passage:

The publishers’ complaints were premised on the idea that web platforms such as Google and Facebook were stealing from them by posting — or even allowing publishers to post — headlines and blurbs linking to their stories. This was always a silly complaint because of a universal truism of the internet: Everybody wants traffic!

I love those universal truisms. I think some at Google honestly believe that their insights, perceptions, and beliefs are the One True Path Forward. Confidence is good, but the implication that a universal truism exists strikes me as information about a psychological and intellectual aberration. Consider this truism offered by my uneducated great grandmother:

Always get a second opinion.

My great grandmother used the logically troublesome word “always.” The idea seems reasonable, but the action may not be possible. Does Google get second opinions when it decides to kill one of its services, modify algorithms in its ad brokering system, or reorganize its contentious smart software units? “Always” opens the door to many issues.

Publishers (I assume “all” publishers) want traffic. May I demonstrate the frailty of the Xoogler’s argument? I publish a blog called Beyond Search. I have done this since 2008. I do not care if I get traffic or not. My goal was and remains to present commentary about the antics of high-technology companies and related subjects. Why do I do this? First, I want to make sure that my views about such topics as Google search exist. Second, I have set up my estate so the content will remain online long after I am gone. I am a publisher, and I don’t want traffic, or at least the type of traffic that Google provides. One exception causes an argument like the Xoogler’s to be shown as false, even if it is self-serving.

Third, the essay points its self-righteous finger at “regulators.” The essay suggests that elected officials pursued “illegitimate complaints” from publishers. I noted this passage:

Prior to these laws, no one ever asked permission to link to a website or paid to do so. Quite the contrary, if anyone got paid, it was the party doing the linking. Why? Because everybody wants traffic! After all, this is why advertising businesses — publishers and platforms alike — can exist in the first place. They offer distribution to advertisers, and the advertisers pay them because distribution is valuable and seldom free.

Repetition is okay, but I am able to recall one of the key arguments in this Xoogler’s write up: “Everybody wants traffic.” Since it is false, I am not sure the essay’s argumentative trajectory is on the track of logic.

Now we come to the guts of the essay: Artificial intelligence. What’s interesting is that AI magnetically pulls regulators back to the casino. Smart software companies face techno-feudalists in a high-stakes game. I noted this passage about the difference between grounding statements via verification and just training algorithms:

The courts might or might not find this distinction between training and grounding compelling. If they don’t, Congress must step in. By legislating copyright protection for content used by AI for grounding purposes, Congress has an opportunity to create a copyright framework that achieves many competing social goals. It would permit continued innovation in artificial intelligence via the training and testing of LLMs; it would require licensing of content that AI applications use to verify their statements or look up new facts; and those licensing payments would financially sustain and incentivize the news media’s most important work — the discovery and verification of new information — rather than forcing the tech industry to make blanket payments for rewrites of what is already long known.

Who owns the casino? At this time, I would suggest that lobbyists and certain non-governmental entities exert considerable influence over some elected and appointed officials. Furthermore, some AI firms are moving as quickly as reasonably possible to convert interest in AI into revenue streams with moats. The idea is that if regulations curtail AI companies, consumers would not be well served. No 20-something wants to read a newspaper. That individual wants convenience and, of course, advertising.

Now several observations:

  1. The Xoogler author believes in AI going fast. The technology serves users / customers what they want. The downsides are bleats and shrieks from an outmoded sector; that is, those engaged in news.
  2. The logic of the technologist is not the logic of a person who prefers nuances. The broad statements are false to me, for example. But to the Xoogler, these are self-evident truths. Get with our program or get left to sleep on cardboard in the street.
  3. The schism smart software creates is palpable. On one hand, there are those who “get it.” On the other hand, there are those who fight a meaningless battle with the inevitable. There’s only one problem: Technology is not delivering better, faster, or cheaper social fabrics. Technology seems to have some downsides. Just ask a journalist trying to survive on YouTube earnings.

Net net: The attitude of the Xoogler suggests that one cannot shake the sense of being right, entitlement, and logic associated with a Googler even after leaving the firm. The essay makes me uncomfortable for two reasons: [1] I think the author means exactly what is expressed in the essay. News is going to be different. Get with the program or lose big time. And [2] the attitude is one which I find destructive because technology is assumed to “do good.” I am not too sure about that because the benefits of AI are not known and neither are AI’s downsides. Plus, there’s the “everybody wants traffic.” Monopolistic vendors of online ads want me to believe that obvious statement is ground truth. Sorry. I don’t.

Stephen E Arnold, February 13, 2024

Sam AI-Man Puts a Price on AI Domination

February 13, 2024

This essay is the work of a dumb dinobaby. No smart software required.

AI start ups may want to amp up their fund raising. Optimism and confidence are often perceived as positive attributes. As a dinobaby, I think in terms of finding a deal at the discount supermarket. Sam AI-Man (actually Sam Altman) thinks big. Forget the $5 million investment in a semi-plausible AI play. “Think a bit bigger” is the catchphrase for OpenAI.

Thinking billions? You silly goose. Think trillions. Thanks, MidJourney. Close enough, close enough.

How does seven followed by 12 zeros strike you? A reasonable figure. Well, Mr. AI-Man estimates that’s the cost of building world AI dominating chips, content, and assorted impedimenta in a quest to win the AI dust ups in assorted global markets. “OpenAI Chief Sam Altman Is Seeking Up to $7 TRILLION (sic) from Investors Including the UAE for Secretive Project to Reshape the Global Semiconductor Industry” reports:

Altman is reportedly looking to solve some of the biggest challenges faced by the rapidly-expanding AI sector — including a shortage of the expensive computer chips needed to power large-language models like OpenAI’s ChatGPT.

And where does one locate entities with this much money? The news report says:

Altman has met with several potential investors, including SoftBank Chairman Masayoshi Son and Sheikh Tahnoun bin Zayed al Nahyan, the UAE’s head of security.

To put the figure in context, the article says:

It would be a staggering and unprecedented sum in the history of venture capital, greater than the combined current market capitalizations of Apple and Microsoft, and more than the annual GDP of Japan or Germany.

Several observations:

  • The ante for big time AI has gone up.
  • The argument for people and content has shifted to chip facilities to fabricate semiconductors.
  • The fund-me tour is a newsmaker.

Net net: How about those small search-and-retrieval oriented AI companies? Heck, what about outfits like Amazon, Facebook, and Google?

Stephen E Arnold, February 13, 2024
