Philosophy and Money: Adam Smith Remains Flexible

March 6, 2024

This essay is the work of a dumb dinobaby. No smart software required.

In the early twenty-first century, China was slated to overtake the United States as the world’s top economy. Unfortunately for the “sleeping dragon,” China’s economy has tanked due to many factors. The country, however, remains a strong spot for technology development such as AI and chips. The Register explains why China is still doing well in the tech sector: “How Did China Get So Good At Chips And AI? Congressional Investigation Blames American Venture Capitalists.”

Venture capitalists are always interested in increasing their wealth and subverting anything that prevents it. While the US government has choked China’s semiconductor industry and denied it the tools to develop AI, venture capitalists are funding those sectors. The US House Select Committee on the Chinese Communist Party (CCP) reported that five venture capital firms are funneling billions into these two industries: Walden International, Sequoia Capital, Qualcomm Ventures, GSR Ventures, and GGV Capital. Chinese semiconductor and AI businesses are linked to human rights abuses and the People’s Liberation Army. These five venture capital firms don’t appear interested in respecting human rights or preventing the spread of communism.

The House Select Committee on the CCP found that $1.9 billion went to AI companies that support China’s mega-surveillance state and aided in the Uyghur genocide. The US blacklisted these AI-related companies. The committee also found that $1.2 billion was sent to 150 semiconductor companies.

The committee also accused the venture capital firms of sharing more than funding with China:

“The committee also called out the VCs for "intangible" contributions – including consulting, talent acquisition, and market opportunities. In one example highlighted in the report, the committee singled out Walden International chairman Lip-Bu Tan, who previously served as the CEO of Cadence Design Systems. Cadence develops electronic design automation software which Chinese corporates, like Huawei, are actively trying to replicate. The committee alleges that Tan and other partners at Walden coordinated business opportunities and provided subject-matter expertise while holding board seats at SMIC and Advanced Micro-Fabrication Equipment Co. (AMEC).”

Sharing knowledge and business connections is as bad as (if not worse than) funding China’s tech sector. It’s like providing instructions and resources on how to build a nuclear weapon. If China had only the money, it wouldn’t be as frightening.

Whitney Grace, March 6, 2024

Poohbahs Poohbahing: Just Obvious Poohbahing

March 6, 2024

This essay is the work of a dumb dinobaby. No smart software required.

We’re already feeling the effects of AI technology in deepfake videos, soundbites, and generative text. While we are only at the beginning of the AI era, so-called experts are already claiming AI has gone bananas. The Verge, a popular Silicon Valley news outlet, released a new podcast episode in which it declares that, “The AIs Are Officially Out Of Control.”

AI-generated images and text aren’t 100% accurate. AI images are prone to include extra limbs, false representations of people, and even entirely miss the prompt. AI-generated text is about as accurate as a Wikipedia article, so you need to double-check and edit the response. Unfortunately, AI models are only as smart as the datasets used to train them. AIs have been called “racist” and “sexist” due to limited data. Google Gemini has also gone too far on diversity and inclusion, returning images that aren’t historically accurate.

The podcast panelists made an obvious point when they said that the quality of Google’s results has declined. Bad SEO, crappy content, and paid results pollute search. They claim that the best results Google returns come from Reddit posts. Reddit is a catch-all online forum with which Google recently negotiated a deal to use its content to train AI. That’s a great idea, especially when Reddit is going public on the stock market.

The problem is that Reddit is full of trolls who do things for %*^ and giggles. While Reddit is a brilliant source of information because it is created by real people, the bad actors will train the AI-chatbots to be “racist” and “sexist” like previous iterations. The worst incident involves ethnically diverse Nazis:

“Google has apologized for what it describes as “inaccuracies in some historical image generation depictions” with its Gemini AI tool, saying its attempts at creating a “wide range” of results missed the mark. The statement follows criticism that it depicted specific white figures (like the US Founding Fathers) or groups like Nazi-era German soldiers as people of color, possibly as an overcorrection to long-standing racial bias problems in AI.”

I am not sure which is the problem: uninformed generalizations, flawed AI technology capable of zapping billions in market value in a few hours, or minimum viable products that are the equivalent of a blue jay fouling a sparrow’s nest. Chirp. Chirp. Chirp.

Whitney Grace, March 6, 2024

The RCMP: Monitoring Sparks Criticism

March 5, 2024

This essay is the work of a dumb dinobaby. No smart software required.

The United States and United Kingdom get a bad rap for monitoring their citizens’ Internet usage. Thankfully it is not as bad as China, Russia, and North Korea. Canada, the “hat” of the United States, is hardly criticized for anything, but even it has its foibles. Canada’s Royal Canadian Mounted Police (RCMP) is in water hot enough to melt all its snow, says The Madras Tribune: “RCMP Slammed For Private Surveillance Used To Trawl Social Media, ‘Darknet’.”

It’s been known that the RCMP has used private surveillance tools to monitor public-facing information on social media and other sources since 2015. The Office of the Privacy Commissioner of Canada (OPC) revealed that when the RCMP was collecting this information, the police force failed to comply with privacy laws. The RCMP doesn’t agree with the OPC’s suggestions to make its monitoring activities with third-party vendors more transparent. It also argued that because it was using third-party vendors, it wasn’t required to ensure that information was collected according to Canadian law.

The Mounties’ non-compliance began in 2014 after three police officers were shot. The RCMP launched an information-monitoring initiative called Project Wide Awake, which involved Babel X, software from Babel Street, a US threat intelligence company. Babel X allowed the RCMP to search social media accounts, including private ones, as well as information from third-party data brokers.

Despite the backlash, the RCMP will continue to use Babel X:

“ ‘Despite the gaps in (the RCMP’s) assessment of compliance with Canadian privacy legislation that our report identifies, the RCMP asserted that it has done enough to review Babel X and will therefore continue to use it,’ the report noted. ‘In our view, the fact that the RCMP chose a subcontracting model to pay for access to services from a range of vendors does not abrogate its responsibility with respect to the services that it receives from each vendor.’”

Canada might be the politest country in North America, but behind that facade its government is as dedicated to law enforcement surveillance as the US.

Whitney Grace, March 5, 2024

Just One Big Google Zircon Gemstone for March 5, 2024

March 5, 2024

This essay is the work of a dumb dinobaby. No smart software required.

I have a folder stuffed with Google gems for the week of February 26 to March 1, 2024. I have write-ups capturing more Australians stranded by following Google Maps’ representation of a territory, Google getting tangled in another publisher lawsuit, Google figuring out how to deliver better search even when the user’s network connection sucks, Google firing 43 unionized contractors while in the midst of a legal action, and more.


The brilliant and very nice wizard adds, “Yes, we have created a thing which looks valuable, but it is laboratory-generated. And it is a gem, albeit a deeply flawed one, not something we can use to sell advertising yet.” Thanks, MSFT Copilot Bing thing. Good enough, and I liked the unasked-for ethnic nuance.

But there is just one story: Google nuked billions in market value and created the meme of the week by making many images the heart and soul of diversity. Pundits wanted one half of the Sundar and Prabhakar comedy show yanked off the stage. Check out Stratechery’s view of Google management’s grasp of leading the company in a positive manner in “Gemini and Google’s Culture.” The screw-up was so bad that even the world’s favorite expert in aircraft refurbishment and modern gas-filled airships spoke up. (Yep, that’s the estimable Sergey Brin!)

In the aftermath of this brilliant PR move, CNBC ran a story yesterday that summed up the February 26 to March 1 Google experience. The title was “Google Co-Founder Sergey Brin Says in Rare Public Appearance That Company ‘Definitely Messed Up’ Gemini Image Launch.” What an incisive comment from one of the fathers of “clever” methods of determining relevance. The article includes this brilliant analysis:

He also commented on the flawed launch last month of Google’s image generator, which the company pulled after users discovered historical inaccuracies and questionable responses. “We definitely messed up on the image generation,” Brin said on Saturday. “I think it was mostly due to just not thorough testing. It definitely, for good reasons, upset a lot of people.”

That’s the Google “gem.” Amazing.

Stephen E Arnold, March 5, 2024

Techno Bashing from Thumb Typers. Give It a Rest, Please

March 5, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Every generation says that the latest cultural and technological advancements make people stupider. Novels were trash, the horseless carriage ruined traveling, radio encouraged wanton behavior, and the list continues. Everything changed with the arrival of television, aka the boob tube. Too much television does cause cognitive degradation. In layman’s terms, the brain slips into passive functioning rather than actively thinking. It is almost a Zen moment. Addiction is fun for some.

The introduction of videogames, computers, and mobile devices accelerated the decline of brain function. The combination of AI chatbots and screens, however, might prove to be the ultimate dumbing down of humans. APA PsycNet posted a new study by Umberto León-Domínguez called “Potential Cognitive Risks Of Generative Transformer-Based AI-Chatbots On Higher Order Executive Thinking.”

Psychologists have already discovered that spending too much time on a screen (playing videogames, watching TV or YouTube, browsing social media, etc.) increases the risk of depression and anxiety. When screens are paired with AI chatbots, programs designed to replicate the human mind, humans rely on the algorithms to think for them.

León-Domínguez wondered if too much AI chatbot consumption impaired cognitive development. In his abstract he leans on some handy terms:

“The “neuronal recycling hypothesis” posits that the brain undergoes structural transformation by incorporating new cultural tools into “neural niches,” consequently altering individual cognition. In the case of technological tools, it has been established that they reduce the cognitive demand needed to solve tasks through a process called “cognitive offloading.”

“Cognitive offloading” perfectly describes younger generations and screen addicts. “Cultural tools into neural niches” also reflects how older crowds view new-fangled technology, coupled with how different parts of the brain are affected by technological advancements. The modern human brain works differently from a human brain in the 18th century or two thousand years ago.

He found:

“The pervasive use of AI chatbots may impair the efficiency of higher cognitive functions, such as problem-solving. Importance: Anticipating AI chatbots’ impact on human cognition enables the development of interventions to counteract potential negative effects. Next Steps: Design and execute experimental studies investigating the positive and negative effects of AI chatbots on human cognition.”

Are we doomed? No. Do we need to find ways to counteract stupidity? Yes. Do we know how it will be done? No.

Isn’t tech fun?

Whitney Grace, March 5, 2024

SearXNG: A New Metasearch Engine

March 4, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Internet browsers and search engines are two of the top applications used on computers. Search engine giants like Bing and Google don’t respect users’ privacy; they track everything. They create individual user profiles, then sell and use the information for targeted ads. The search engines also demote controversial information and return biased search results. On his blog, FlareXes shares a solution that protects privacy and embraces metasearch: “Build Your Own Private Search Engine With SearXNG.”

SearXNG is an open source, customizable metasearch engine that returns search results from multiple sources and respects privacy. It was originally built from another open source project, SearX. SearXNG has an extremely functional user interface. It also aggregates information from over seventy search engines, including DuckDuckGo, Brave Search, Bing, and Google.

The best thing about SearXNG, according to the write-up, is how it protects user privacy: “But perhaps the best thing about SearXNG is its commitment to user privacy. Unlike some search engines, SearXNG doesn’t track users or generate personalized profiles, and it never shares any information with third parties.”

Because SearXNG is a metasearch engine, it supports organic search results. This allows users to review information that would otherwise go unnoticed. That doesn’t mean the results are guaranteed to be unbiased. The idea is that SearXNG returns better results than a revenue juggernaut:

“SearXNG aggregates data from different search engines; that doesn’t mean this could be biased. There is no way for Google to create a profile about you if you’re using SearXNG. Instead, you get high-quality results like Google or Bing. SearXNG also randomizes the results, so no SEO or top-ranking is gonna work. You can also enable independent search engines like Brave Search, Mojeek etc.”
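For the tinkerers: a self-hosted SearXNG instance can also be queried programmatically. Below is a minimal sketch in Python. The local address and port are hypothetical, and the JSON output format must be enabled in the instance’s settings, so treat this as an illustration rather than an out-of-the-box recipe:

```python
import requests

# Query a self-hosted SearXNG instance (hypothetical local install at
# http://localhost:8080 with the JSON output format enabled in settings).
response = requests.get(
    "http://localhost:8080/search",
    params={"q": "private metasearch", "format": "json"},
    timeout=10,
)
response.raise_for_status()

# Print the first few aggregated results: title, source engine, and URL.
for result in response.json().get("results", [])[:5]:
    print(result.get("title"), "|", result.get("engine"), "|", result.get("url"))
```

Because SearXNG fans results out to many engines, a script like this sees the blended, randomized output rather than any single engine’s ranking.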

If you want a search engine that doesn’t collect your personal data and returns better search results, SearXNG warrants a test drive. The installation may require some tech fiddling.

Whitney Grace, March 4, 2024

Synthetic Data: From Science Fiction to Functional Circumscription

March 4, 2024

This essay is the work of a dumb humanoid. No smart software required.

Synthetic data are information produced by algorithms, not by real-world events. They are created using real-world data and numerical recipes. The appeal is that synthetic data are easier than collecting real-life information, cheaper than dealing with data from real life, and faster than fooling around with surveys, monitoring devices, and lawsuits. In theory, synthetic data are one promising way of skirting the expense of getting humans involved.
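To make the “numerical recipe” idea concrete, here is a minimal sketch in Python: fit a simple distribution to a batch of real observations, then sample new, synthetic records from the fitted model. The log-normal choice and the numbers are illustrative assumptions, not anyone’s production method:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for "real" data: 1,000 observed purchase amounts (illustrative).
real = rng.lognormal(mean=3.0, sigma=0.5, size=1_000)

# The numerical recipe: estimate the distribution's parameters from the
# real data, then draw as many synthetic records as needed from the fit.
mu, sigma = np.log(real).mean(), np.log(real).std()
synthetic = rng.lognormal(mean=mu, sigma=sigma, size=10_000)

# No synthetic record copies a real one, but the overall shape is preserved.
print(f"real mean: {real.mean():.2f}, synthetic mean: {synthetic.mean():.2f}")
```

That is the whole trick in miniature: no surveys, no monitoring devices, no lawsuits, just a model standing in for the world.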

“What Is [a] Synthetic Sample – And Is It All It’s Cracked Up to Be?” tackles the subject of a synthetic sample, a topic which is one slice of the synthetic data universe. The article seeks “to uncover the truth behind artificially created qualitative and quantitative market research data.” I am going to avoid the question, “Is synthetic data useful?” because the answer is, “Yes.” Bean counters and those looking to find a way out of the pickle barrel filled with expensive brine are going to chase after the magic of algorithms producing data to do some machine learning magic.


In certain situations, fake flowers are super. Other times, the faux blooms are just creepy. Thanks, MSFT Copilot Bing thing. Good enough.

Are synthetic data better than real world data? The answer from my vantage point is, “It depends.” Fancy math can prove that for some use cases, synthetic data are “good enough”; that is, the data produce results close enough to what a “real” data set provides. Therefore, just use synthetic data. But for other applications, synthetic data might throw some sand in the well-oiled marketing collateral describing the wonders of synthetic data. (Some university research labs are quite skilled in PR speak, but the reality of their methods may not line up with the PowerPoints used to raise venture capital.)

This essay discusses a research project to figure out if a synthetic sample works, or, in my lingo, if the synthetic sample is good enough. The idea is that as long as the synthetic data are within a specified error range, the synthetic sample can be used and may produce “reliable” or useful results. (At least one hopes this is the case.)
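One way to operationalize “within a specified error range” is an acceptance test: compare a few summary statistics of the synthetic sample against the real one, and reject the sample if any drift past a tolerance. A hypothetical sketch follows; the 5% tolerance and the chosen statistics are my assumptions, not Kantar’s criteria:

```python
import numpy as np

def good_enough(real: np.ndarray, synthetic: np.ndarray, tol: float = 0.05) -> bool:
    """Accept the synthetic sample only if each summary statistic stays
    within a relative tolerance of the real sample's value."""
    for stat in (np.mean, np.std, np.median):
        r, s = stat(real), stat(synthetic)
        if abs(s - r) > tol * abs(r):
            return False
    return True

# With the samples from the earlier sketch: good_enough(real, synthetic)
```

Note what this does not check: bias, nuance, and everything else the essay worries about below.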

I want to focus on one portion of the cited article and invite you to read the complete Kantar explanation.

Here’s the passage which snagged my attention:

… right now, synthetic sample currently has biases, lacks variation and nuance in both qual and quant analysis. On its own, as it stands, it’s just not good enough to use as a supplement for human sample. And there are other issues to consider. For instance, it matters what subject is being discussed. General political orientation could be easy for a large language model (LLM), but the trial of a new product is hard. And fundamentally, it will always be sensitive to its training data – something entirely new that is not part of its training will be off-limits. And the nature of questioning matters – a highly ’specific’ question that might require proprietary data or modelling (e.g., volume or revenue for a particular product in response to a price change) might elicit a poor-quality response, while a response to a general attitude or broad trend might be more acceptable.

These sentences present several thorny problems in academic speak. Let’s look at them in the vernacular of rural Kentucky where I live.

First, we have the issue of bias. Training data can be unintentionally or intentionally biased. Sample radical trucker posts on Telegram, and use those messages to train a model like Reor. That output is going to express views that some people might find unpalatable. Therefore, building a synthetic data recipe which includes this type of Telegram content is going to be oriented toward truck driver views. That’s good and bad.

Second, a synthetic sample may require mixing in data from a “real” sample. That’s a common sense approach which reduces some costs. But will the outputs be good enough? The question then becomes, “Good enough for what applications?” Big, general questions about how a topic is presented might be close enough for horseshoes. Other topics, like those focusing on a specific technical issue, might warrant more caution or outright avoidance of synthetic data. Do you want your child or wife to die because the synthetic data about a treatment regimen were close enough for horseshoes? But in today’s medical structure, that may be what the future holds.

Third, many years ago, one of the early “smart” software companies was Autonomy, founded by Mike Lynch. In the 1990s, Bayesian methods were known, but some — believe it or not — were classified and, thus, not widely known. Autonomy packed up some smart software in the Autonomy black box. Users of this system learned that the smart software had to be retrained because new terms and novel ideas not in the original training set were not findable by the neurolinguistic engine. Yikes. Retraining requires human curation of data sets, time to retrain the system, and the expense of redeploying the brains of the black boxes. Clients did not like this, and some, to be frank, did not understand why a product did not work like an MG sports car. Synthetic data have to be retrained to “know” about new terms and avoid the “certain blindness” probability-based systems possess.

Fourth, the topic of “proprietary data modeling” means big bucks. The idea behind synthetic data is that it is cheaper. Building proprietary training data and keeping it current is expensive. Is it better? Yeah, maybe. Is it faster? Probably not when humans are doing the curation, cleaning, verifying, and training.

The write up states:

But it’s likely that blended models (human supplemented by synthetic sample) will become more common as LLMs get even more powerful – especially as models are finetuned on proprietary datasets.

Net net: Synthetic data warrants monitoring. Some may want to invest in synthetic data set companies like Kantar, for instance. I am a dinobaby, and I like the old-fashioned Stone Age approach to data. The fancy math embodies sufficient risk for me. Why increase risk? Remember my reference to a dead loved one? That type of risk.

Stephen E Arnold, March 4, 2024

Technology Becomes Detroit

March 4, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Have you ever heard of technical debt? Technical debt accrues when an IT team prioritizes speedy delivery of a product over creating a feasible, quality product. Technology history is full of technical debt. Some of the more famous cases are the E.T. videogame for the Atari, Windows Vista, and the Samsung Galaxy Gear. Technical debt is an ongoing issue for IT departments and tech companies, and it’s apparently getting worse. ITPro details the current problems with technical debt in “IT Leaders Need To Accept They’ll Never Escape Technical Debt, But That Doesn’t Mean They Should Down Tools.”

Gordon Haff is a senior leader at Red Hat and a technology evangelist. Haff told ITPro that tech experts will remain hindered as they continue to deal with technical debt and skill shortages. Tech experts want to advance their field with transformative projects, but they’re held back by the same aforementioned issues. Haff stressed that as soon as one project is complete, tech experts build the next project on the existing architecture. It creates a technical debt infrastructure.

Haff provided an example using a band-aid metaphor:

“Haff pointed toward application modernization as a prime example of this rinse and repeat trend. Many enterprises, he said, deliberately choose to not tinker with certain applications due to the fact they still worked nominally.

Fast forward several years later, these applications are overhauled and modernized, then are left to their own devices – to some extent – and reassessed during the next transformation cycle.

‘If you go back 10 years, we had this sort of bimodal IT, or fast-slow IT, that was kind of the thing,” he explained. “The idea was ‘we’ll leave that old stuff, we’ll shove that off into the corner and not worry about it’ and the cool kids can work on all this greenfield, often new customer-facing applications.

‘But by and large, it’s then a case of ‘oh we actually need to deal with this core business stuff’ and these older applications.’”

Haff suggests that IT experts shouldn’t approach their work with a “one and done” mindset. They should realize their work is constantly evolving, learn to go with the flow, and build legacy systems that don’t devolve into large messes. There’s a reason videogame companies run beta tests, restaurants have soft openings, and musicals have previews: they test things to deliver quality products. Technical debt leads to technical rot.

Whitney Grace, March 4, 2024

Forget the Words. Do Short-Form Video by Hiring a PR Professional

March 1, 2024

This essay is the work of a dumb humanoid. No smart software required.

I think “Everyone’s a Sellout Now” runs about 4,000 words. The main idea is that traditional publishing is roached. Artists and writers must learn to do video editing or have enough of mommy and daddy’s money to pay someone to promote the creator’s output. The essay is well written; however, I am not sure it conveys a TikTok fact unknown or hiding in the world of BlueSky-type services.


This bright young student should have used a ChatGPT-type service. Thanks, MSFT Copilot. At least you are outputting which is more than I can say for your fierce but lagging competitor.

I noted this passage:

Because self-promotion sucks.

I think I agree, but why not hire an “output handler.” The OH does the PR.

Here’s another quote to note:

The problem is that America more or less runs on the concept of selling out.

Is there a fix for the gasoline of America? Yes. The essay asserts:

author-content creators succeed by making the visually uninteresting labor of typing on a laptop worthwhile to watch.

The essay concludes with this less-than-uplifting comment:

To achieve the current iteration of the American dream, you’ve got to shout into the digital void and tell everyone how great you are. All that matters is how many people believe you.

Downer? Yes, and what makes it fascinating is that the author gets paid for writing. I think this is a “real job.”

Several observations:

  1. I think smart software is going to do more than write wacko stuff for SmartNews-type publications.
  2. Readers of “downer” essays are likely to go more “down”; that is, become less positive and increasingly antagonistic to what makes the US of A tick.
  3. The essay delivers the news about the importance of TikTok without pointing out that the service is China-affiliated and provides content not permitted for consumption in China.

Net net: Hire a gig worker to do the OH. Pay for PR. Quit complaining or complain in fewer words.

PS. The categorical affirmative “everyone” is disproved with a single example. As I have pointed out in an essay about a grousing Xoogler, I operate differently. Therefore, the “everyone” is like a fuzzy antecedent. Sloppy.

Stephen E Arnold, March 1, 2024

Bad News Delivered via Math

March 1, 2024

This essay is the work of a dumb humanoid. No smart software required.

I am not going to kid myself. Few people will read “Hallucination is Inevitable: An Innate Limitation of Large Language Models” with their morning donut and cold brew coffee. Even fewer will believe what the three amigos of smart software at the National University of Singapore explain in their ArXiv paper. Hard on the heels of Sam AI-Man’s ChatGPT mastering Spanglish, the financial payoffs are just too massive for anyone to pay much attention to wonky outputs from smart software. Hey, use these methods in Excel and exclaim, “This works really great.” I would suggest that the AI buggy drivers slow the Kremser down.


The killer corollary. Source: Hallucination is Inevitable: An Innate Limitation of Large Language Models.

The paper explains that large language models will be reliably incorrect. The paper includes some fancy and not so fancy math to make this assertion clear. Here’s what the authors present as their plain English explanation. (Hold on. I will give the dinobaby translation in a moment.)

Hallucination has been widely recognized to be a significant drawback for large language models (LLMs). There have been many works that attempt to reduce the extent of hallucination. These efforts have mostly been empirical so far, which cannot answer the fundamental question whether it can be completely eliminated. In this paper, we formalize the problem and show that it is impossible to eliminate hallucination in LLMs. Specifically, we define a formal world where hallucination is defined as inconsistencies between a computable LLM and a computable ground truth function. By employing results from learning theory, we show that LLMs cannot learn all of the computable functions and will therefore always hallucinate. Since the formal world is a part of the real world which is much more complicated, hallucinations are also inevitable for real world LLMs. Furthermore, for real world LLMs constrained by provable time complexity, we describe the hallucination-prone tasks and empirically validate our claims. Finally, using the formal world framework, we discuss the possible mechanisms and efficacies of existing hallucination mitigators as well as the practical implications on the safe deployment of LLMs.
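For those who want the gist of the learning-theory step without the paper’s machinery, the argument has a diagonalization flavor. Here is a compressed paraphrase (mine, not the authors’ exact construction):

```latex
% Sketch: enumerate the computable LLMs h_1, h_2, \dots and pick a
% distinct input x_i for each. A computable ground truth f can be
% defined to disagree with each model on its designated input:
\[
  f(x_i) \neq h_i(x_i) \quad \text{for every } i \in \mathbb{N}.
\]
% Every LLM in the enumeration then produces at least one output
% inconsistent with the ground truth, i.e., at least one hallucination.
```

In other words, no matter which computable model you pick, there is a legitimate ground truth it must get wrong somewhere.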

Here’s my take:

  1. The map is not the territory. LLMs are a map; the territory is human utterance. The map is small and striving. The territory simply is.
  2. Fixing the problem requires some as-yet unworked-out fancier math. When will that happen? Probably never, because no set can contain itself as an element.
  3. “Good enough” may indeed be acceptable for some applications, just not all applications, because “all” is a slippery fish when it comes to models and training data. Are you really sure you have accounted for all errors, variables, and data? Yes is easy to say; it is probably tough to deliver.

Net net: The bad news is that smart software is now the next big thing, and the math behind it is not of much interest, which is a bit of a problem in my opinion.

Stephen E Arnold, March 1, 2024
