Dust Up: Social Justice and STEM Publishing

June 28, 2023

Are you familiar with “social justice warriors”? These are people who take it upon themselves to police the world for their moral causes, usually from a self-righteous standpoint. Social justice warriors are also known by the acronym SJWs and can cross over into the infamous Karen zone. Unfortunately, Heterodox STEM reports that SJWs have invaded the science community, and Anna Krylov and Jay Tanzman discussed the issue in their paper “Critical Social Justice Subverts Scientific Publishing.”

SJWs advocate for the politicization of science, injecting an ideology known as critical social justice (CSJ) into research. CSJ upends the true purpose of science, which is to help and advance humanity, by adding censorship, scholarship suppression, and social engineering.

Krylov and Tanzman’s paper was presented at the Perils for Science in Democracies and Authoritarian Countries conference, and they argue that CSJ harms scientific research more than it helps it. They compare CSJ to Orwell’s fictional Ministry of Love, although real-life examples such as Josef Goebbels’s Nazi Ministry of Propaganda, the USSR’s Department for Agitation and Propaganda, and China’s authoritarian regime work better. CSJ is the opposite of the Enlightenment, which liberated human psyches from religious and royal dogmas. The Enlightenment engendered critical thinking, the scientific process, philosophy, and discovery. The world became more tolerant, wealthier, better educated, and healthier as a result.

CSJ creates censorship and paranoia akin to tyrannical regimes:

“According to CSJ ideologues, the very language we use to communicate our findings is a minefield of offenses. Professional societies, universities, and publishing houses have produced volumes dedicated to “inclusive” language that contain long lists of proscribed words that purportedly can cause offense and—according to the DEI bureaucracy that promulgates these initiatives—perpetuate inequality and exclusion of some groups, disadvantage women, and promote patriarchy, racism, sexism, ableism, and other isms. The lists of forbidden terms include “master database,” “older software,” “motherboard,” “dummy variable,” “black and white thinking,” “strawman,” “picnic,” and “long time no see” (Krylov 2021: 5371, Krylov et al. 2022: 32, McWhorter 2022, Paul 2023, Packer 2023, Anonymous 2022). The Google Inclusive Language Guide even proscribes the term “smart phones” (Krauss 2022). The Inclusivity Style Guide of the American Chemical Society (2023)—a major chemistry publisher of more than 100 titles—advises against using such terms as “double blind studies,” “healthy weight,” “sanity check,” “black market,” “the New World,” and “dark times”…”

New meanings that cause offense are projected onto benign words, and their use is taken out of context. At this rate, everything people say will be considered offensive, including the most uncontroversial topic: the weather.

Science must be free not only from CSJ ideologies but also from corporate ideologies that promote profit margins. Examples from American history include Big Tobacco, sugar manufacturers, and Big Pharma.

Whitney Grace, June 28, 2023

Google: I Promise to Do Better. No, Really, Really Better This Time

June 27, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

The UK online publication The Register made available this article: “Google Accused of Urging Android Devs to Mislabel Apps to Get Forbidden Kids Ad Data.” The write up is not about TikTok. The subject is Google and an interesting alleged action by the online advertising company.


The high school science club member who pranked the principal says when caught: “Listen to me, Mr. Principal. I promise I won’t make that mistake again. Honest. Cross my heart and hope to die. Boy scout’s honor. No, really. Never, ever, again.” The illustration was generated by the plagiarism-free MidJourney.

The write up reports this “actual factual” behavior by the company:

The complaint says that both Google and app developers creating DFF apps stood to gain by not applying the strict “intended for children” label. And it claims that Google incentivized this mislabeling by promising developers more advertising revenue for mixed-audience apps.

The idea is that intentionally assigned metadata made it possible for Google to acquire information about a child’s online activity.

My initial reaction was, “What’s new? Google says one thing and then demonstrates its adolescent sense of cleverness via a workaround.”

After a conversation with my team, I formulated a different hypothesis; specifically, Google has institutionalized mechanisms to make it possible for the company’s actual behavior to be whatever the company wants its behavior to be.

One can hope this was a one-time glitch. My “different hypothesis” points to a cultural and structural policy to make it possible for the company to do what’s necessary to achieve its objective.

Stephen E Arnold, June 27, 2023

The New Ethics: Harvard Innovates Again

June 26, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

I have no idea if the weird orange newspaper’s story “Harvard Dishonesty Expert Accused of Dishonesty” is on the money. I find it amusing and a useful insight into the antics of Ivory Tower professor behavior. As an old dinobaby, I have seen a number of examples of what one of Tennessee Williams’ well-adjusted characters called mendacity. And this Harvard confection is a topper.


The snagged wizard, in my mental theater, said, “I did not mean to falsify data, plagiarize, or concoct a modest amount of twaddle like the president of Stanford University. I apologize. I really am sorry. May I buy you a coffee? I could also write your child a letter of recommendation to Harvard admissions.” This touching and now all-too-common scene has been visualized by the really non-imitative MidJourney system.

The core of the “real news” story is captured in this segment of the article:

A high-profile expert on ethics and dishonesty is facing allegations of dishonesty in her own work and has taken administrative leave from Harvard Business School.

The “real news” article called attention to the behavior of the high-profile expert; to wit:

In 2021, a 2012 paper on dishonesty by Gino, behavioral economist Dan Ariely and other co-authors was retracted from the journal Proceedings of the National Academy of Sciences after the Data Colada team suggested there was fraud in one of the experiments involved. [Ah, Data Colada, the apologizing professor’s pals.]

If true, the professor attacked the best-selling author and others for not being on the up and up. And that mud slinger from the dusty Wild West of Harvard’s ethics unit allegedly fudged information. That’s a slick play in my book.

What’s this say about the ethical compass of the professor, about Harvard’s hiring and monitoring processes, and about the failure of the parties to provide a comment to the weird orange newspaper?

Ah, no comment. A wise lawyer’s work possibly. An ethical wise lawyer.

Stephen E Arnold, June 26, 2023

The Future from the Masters of the Obvious

June 26, 2023

The last few years have seen many societal changes that, among other things, affect business operations. Gartner corrals these seismic shifts into six obvious considerations in its article, “6 Macro Factors Reshaping Business this Decade.” Contributor Jordan Turner writes:

“Executives will continue to grapple with a host of challenges during the 2020s, but from the maelstrom that was their first few years, new business opportunities will arise. ‘As we entered the 2020s, economies were already on the edge,’ says Mark Raskino, Distinguished VP Analyst at Gartner. ‘A decade-long boom, generated substantially from inexpensive finance and lower-cost energy, led to structural stresses such as highly leveraged debt, crumbling international alliances and bubble-like asset prices. We were overdue for a reckoning.’ Six macro factors that will reshape business this decade. The pandemic coincided with and catalyzed societal shifts, spurring a strategy reset for many industries. Executive leaders must acknowledge these six changes to reconsider how business will get done.”

Their list includes the threat of recession, systemic mistrust, poor economic productivity, sustainability, a talent shortage, and emerging technologies. See the write-up for details on each. Not surprisingly, the emerging technologies list includes adaptive AI alongside the metaverse, platform engineering, sustainable technology, and superapps. Unfortunately, the Gartner wizards omitted replacing consultants and analysts with smart software. That may be the most cost-effective transition for businesses yet the most detrimental to workers. We wonder why they left it out.

And grapple? Yes, grapple. I wonder if Gartner will have a special presentation and a conference about these. Attendees can grapple. Like Musk and Zuck?

Cynthia Murrell, June 26, 2023

Canada Bill C-18 Delivers a Victory: How Long Will the Triumph Pay Off in Cash Money?

June 23, 2023

News outlets make or made most of their money selling advertising. The idea was — when I worked at a couple of big news publishing companies — the audience for the content would attract those who wanted to reach the audience. I worked at the Courier-Journal & Louisville Times Co. before it dissolved into a Gannett marvel. If a used car dealer wanted to sell a 1980 Corvette, the choice was the newspaper or a free ad in what was called AutoTrader. This was a localized, printed collection of autos for sale. Some dealers advertised, but in the 1980s, individuals looking for a cheap or free way to pitch a vehicle loved AutoTrader. Despite a free option, the size of the readership and the sports news, comics, and obituaries made the Courier-Journal the must-have for a motivated seller.


Hannibal and his war elephant Zuckster survey the field of battle after Bill C-18 passes. MidJourney was the digital wonder responsible for this confection.

When I worked at the Ziffer in Manhattan, we published Computer Shopper. The biggest Computer Shopper had about 800 pages. It could have been bigger, but there were paper and press constraints, if I recall correctly. But I smile when I remember that 85 percent of those pages were paid advertisements. We had an audience, and those in the burgeoning computer and software business wanted to reach our audience. How many Ziffers remember the way publishing used to work?

When I read the National Post article titled “Meta Says It’s Blocking News on Facebook, Instagram after Government Passes Online News Bill,” I thought about the Battle of Cannae. The Romans had the troops, the weapons, and the psychological advantage. But Hannibal showed up and, if historical records are as accurate as a tweet, killed Romans and mercenaries. I think it may have been estimated that Roman whiz kids lost 40,000 troops and 5,000 cavalry along with the Roman strategic wizards Paulus, Servilius, and Atilius.

My hunch is that those who survived paid with labor or money to be allowed to survive. Being a slave in peak Rome was a dicey gig. Having a fungible skill like painting zowie murals was good. Having minimal skills? Well, someone has to work for nothing in the fields or quarries.

What’s the connection? The publishers are similar to the Roman generals. The bad guys are the digital rebels who are like Hannibal and his followers.

Back to the cited National Post article:

After the Senate passed the Online News Act Thursday, Meta confirmed it will remove news content from Facebook and Instagram for all Canadian users, but it remained unclear whether Google would follow suit for its platforms.  The act, which was known as Bill C-18, is designed to force Google and Facebook to share revenues with publishers for news stories that appear on their platforms. By removing news altogether, companies would be exempt from the legislation.

The idea is that US online services which touch most online users (maybe 90 or 95 percent in North America) will block news content. This means:

  1. Cash-gushing Facebook- and Google-type companies will not pay for news content. (This has some interesting downstream consequences, but for this short essay, I want to focus on the “not paying” for news.)
  2. The publishers will experience a decline in traffic. Why? Without a “finding and pointing” mechanism, how would I find this “real news” article published by the National Post? (FYI: I think of this newspaper as Canada’s USAToday, which was a Gannett crown jewel. How is that working out for Gannett today?)
  3. Rome triumphed only to fizzle out again. And Hannibal? He’s remembered for the elephants-through-the-Alps trick. Are man’s efforts ultimately futile?

Consider what happens when the clicks stop accruing to the publishers’ Web sites. How will the publishers generate traffic? SEO? Yeah, good luck with that.

Is there an alternative?

Yes, buy Facebook and Google advertising. I call this pay to play.

The Canadian news outlets will have to pay for traffic. I suppose companies like Tyler Technologies, which has an office in Vancouver, I think, could sell ads for the National Post’s stories, but that seems to be a stretch. Similarly, the National Post could buy ads on the Embroidery Classics & Promotions (Calgary) Web site, but that may not produce too many clicks for the Canadian news outfits. I estimate one or two a month.

Bill C-18 may not have the desired effect. Facebook and Facebook-type outfits will want to sell advertising to the Canadian publishers, in my opinion. And without high-impact, consistent, and relevant online advertising, state-of-the-art marketing, and juicy content, the publishers may find themselves either impaled on their digital hopes or placed in servitude to the Zuck and his fellow travelers.

Are these publishers able to pony up the cash and make the appropriate decisions to generate revenues like the good old days?

Sure, there’s a chance.

But it’s a long shot. I estimate the chances as similar to King Charles’ horse winning the King George V Stakes in 2024; that is, 18 to 1. But Desert Hero pulled it off. Who is rooting for the Canadian publishers?

Stephen E Arnold, June 23, 2023

High School Redux: Dust Up in the Science Club

June 22, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

One cannot make up certain scenarios. Let me illustrate.

Navigate to “Google Accuses Microsoft of Anticompetitive Cloud Practices in Complaint to FTC.” You will have to pony up to read the article. The main point is that the Google “filed a complaint to the U.S. Federal Trade Commission.” Why? Microsoft is acting in an unfair manner. Is the phrase “Holy cow” applicable? Two quasi or at least almost monopolies are at odds. Amazing.


MidJourney’s wealth of originality produced this image of two adolescents threatening one another. Is the issue a significant other? A dented bicycle? A solution to a tough math problem like those explained by PreMath? Nope. The argument is about more weighty matters: Ego. Will one of these mature wizards call their mom? A more likely outcome is to let loose a flurry of really macho legal eagles and/or a pride of PR people.

But the next item is even more fascinating. Point your click-monitoring, data-sucking browser at “Send Me Location: Mark Zuckerberg Says He’s Down to Fight Elon Musk in a Cage Match.” Visualize, if you will, Elon Musk and Mark Zuckerberg entering the ring at a Streetbeefs’ venue. The referee is the ever-alert Anomaly. Scarface is in the ring just in case some real muscle is needed to separate the fighters.

Let’s step back: Google wants to be treated fairly because Microsoft is using its market power to make sure the Google is finding it difficult to expand its cloud business. What’s the fix? Google goes to court. Yeah, bold. What about lowering prices, improving service, and providing high value functionality? Nah, just go to court. Is this like two youngsters arguing in front of their lockers and one of them telling the principal that Mr. Softie is behaving badly?

And the Musk – Zuckerberg drama? An actual physical fight? No proxies. Just no-holds-barred fisticuffs? Apparently that’s the implication of the cited story. That social media territory is precious by golly.

Several observations:

  1. Life is surprising.
  2. Alleged techno-giants are oblivious to the concept of pettiness.
  3. Adolescent behavior, not sophisticated management methods, guides certain firms.

Okay, ChatGPT, beat these examples for hallucinatory content. Not even smart software can out-think how high school science club members process information and behave in front of those not in the group.

Stephen E Arnold, June 22, 2023

News Flash about SEO: Just 20 Years Too Late but, Hey, Who Pays Attention?

June 21, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

I read an article which would have been news a couple of decades ago. But I am a dinobaby (please, see anigif bouncing in an annoying manner) and I am hopelessly out of touch with what “real news” is.


An entrepreneur who just learned that in order to get traffic to her business Web site, she will have to spend big bucks and do search engine optimization, make YouTube videos (long and short), and follow Google’s implicit and explicit rules. Sad. An MBA, I believe. The Moping Mistress of the Universe is a construct generated by the ever-innovative MidJourney and its delightful Discord interface.

The write up catching my attention is — hang on to your latte — “A Storefront for Robots: The SEO Arms Race Has Left Google and the Web Drowning in Garbage Text, with Customers and Businesses Flailing to Find Each Other.” I wondered if the word “flailing” is a typographic error or misspelling of “failing.” Failing strikes me as a more applicable word.

The thesis of the write up is that the destruction of precision and recall as useful for relevant online search and retrieval is not part of the Google game plan.

The write up asserts:

The result is SEO chum produced at scale, faster and cheaper than ever before. The internet looks the way it does largely to feed an ever-changing, opaque Google Search algorithm. Now, as the company itself builds AI search bots, the business as it stands is poised to eat itself.

Ah, ha. Garbage in, garbage out! Brilliant. The write up is about 4,000 words and makes clear that ecommerce requires generating baloney for Google.

To sum up, if you want traffic, do search engine optimization. The problem with the write up is that it is incorrect.

Let me explain. Navigate to “Google Earned $10 Million by Allowing Misleading Anti-Abortion Ads from Fake Clinics, Report Says.” What’s the point of this report? The answer is, “Google ads.” And money from a controversial group of supporters and detractors. Yes! An arms race of advertising.

Of course, SEO won’t work. Why would it? Google’s business is selling advertising. If you don’t believe me, just go to a conference, find any Googler — including those wearing “Ivory Tower Worker” pins — and ask, “How important is Google’s ad business?” But you know what most Googlers will say, don’t you?

For decades, Google has cultivated the SEO ploy for one reason. Failed SEO campaigns end up one place: Google Advertising.

Why?

If you want traffic, like the abortion ad buyers, pony up the cash. The Google will punch the Pay to Play button, and traffic results. One change kicked in after 2006. The mom-and-pop ad buyers were not as important as the “brand” advertisers. Small advertisers were left to the SEO experts, who could then sell “small” ad campaigns when the hapless user learned that no one on the planet could locate the financial advisory firm named “Financial Specialist Advisors.” Ah, then there was Google Local, a Googley spin on the Yellow Pages. And there have been other innovations to make it possible for advertisers of any size to get traffic, though not much, because small advertisers spend small money. But ad dollars are what keep Googzilla alive.

Net net: Keep in mind that Google wants to be the Internet. (AMP that up, folks.) Google wants people to trust the friendly beastie. The Googzilla is into responsibility. The Google is truth, justice, and the digital way. Is the criticism of the Google warranted? Sure, constructive criticism is a positive for some. The problem I have is that it is 20 years too late. Who cares? The EU seems to have an interest.

Stephen E Arnold, June 21, 2023

The Famous Google Paper about Attention, a Code Word for Transformer Methods

June 20, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

Wow, many people are excited about a Bloomberg article called “The AI Boom Has Silicon Valley on Another Manic Quest to Change the World: A Guide to the New AI Technologies, Evangelists, Skeptics and Everyone Else Caught Up in the Flood of Cash and Enthusiasm Reshaping the Industry.”

In the tweets and LinkedIn posts, one small factoid is omitted from the second-hand content. If you want to read the famous DeepMind-centric paper which doomed the Google Brain folks to watch their future from the cheap seats, you can find “Attention Is All You Need” branded with the imprimatur of the Neural Information Processing Systems Conference held in 2017. Here’s the link to the paper.
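For anyone who has not opened the paper, the operation it names “attention” is compact enough to sketch. Below is a minimal, illustrative pure-Python rendering of the paper’s scaled dot-product attention, softmax(QKᵀ/√d_k)V; the toy matrices are my own for demonstration, not drawn from the paper:

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention from 'Attention Is All You Need':
    softmax(Q K^T / sqrt(d_k)) V, computed with plain Python lists."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Score each key against the query, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # The output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

# One query attending over two key/value pairs. The query matches the first
# key more strongly, so the output leans toward the first value vector.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(attention(Q, K, V))
```

The three questions below about “attention,” training cost, and steering all refer to variations on this one weighted-average computation.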

For those who read the paper, I would like to suggest several questions to consider:

  1. What economic gain does Google derive from proliferation of its transformer system and method; for example, the open sourcing of the code?
  2. What does “attention” mean for [a] the cost of training and [b] the ability to steer the system and method? (Please, consider the question from the point of view of the user’s attention, the system and method’s attention, and a third-party meta-monitoring system such as advertising.)
  3. What other tasks of humans, software, and systems can benefit from the use of the Transformer system and methods?

I am okay with excitement for a 2017 paper, but including a link to the foundation document might be helpful to some, not many, but some.

Net net: Think about Google’s use of the words “trust” and “responsibility” when you answer the three suggested questions.

Stephen E Arnold, June 20, 2023

Google: Smart Software Confusion

June 19, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

I cannot understand. Not only am I old; I am a dinobaby. Furthermore, I am like one of William James’s straw men: Easy to knock down or set on fire. Bear with me this morning.

I read “Google Skeptical of AI: Google Doesn’t Trust Its Own AI Chatbots, Asks Employees Not to Use Bard.” The write up asserts as “real” information:

It seems that Google doesn’t trust any AI chatbot, including its own Bard AI bot. In an update to its security measures, Alphabet Inc., Google’s parent company has asked its employees to keep sensitive data away from public AI chatbots, including their own Bard AI.

The go-to word for the Google in the last few weeks is “trust.” The quote points out that Google doesn’t “trust” its own smart software. Does this mean that Google does not “trust” that which it created and is making available to its “users”?


MidJourney, an interesting but possibly insecure and secret-filled smart software system, generated this image of Googzilla as a gatekeeper. Are gatekeepers in place to make money, control who does what, and record the comings and goings of people, data, and content objects?

As I said, I am a dinobaby, and I think I am dumb. I don’t follow the circular reasoning; for example:

Google is worried that human reviewers may have access to the chat logs that these chatbots generate. AI developers often use this data to train their LLMs more, which poses a risk of data leaks.

Now the ante has gone up. The issue is one of protecting itself from its own software. Furthermore, if the statement is accurate, I take the words to mean that Google’s Mandiant-infused, super duper, security trooper cannot protect Google from itself.

Can my interpretation be correct? I hope not.

Then I read “This Google Leader Says ML Infrastructure Is Conduit to Company’s AI Success.” The “this” refers to an entity called Nadav Eiron, a Stanford PhD and Googley wizard. The use of the word “conduit” baffles me because I thought “conduit” was a noun, not a verb. That goes to support my contention that I am a dumb humanoid.

Now let’s look at the text of this write up about Google’s smart software. I noted this passage:

The journey from a great idea to a great product is very, very long and complicated. It’s especially complicated and expensive when it’s not one product but like 25, or however many were announced that Google I/O. And with the complexity that comes with doing all that in a way that’s scalable, responsible, sustainable and maintainable.

I recall someone telling me when I worked at a Fancy Dan blue chip consulting firm, “Stephen, two objectives are zero objectives.” Obviously Google is orders of magnitude more capable than the bozos at the consulting company. Google can do 25 objectives. Impressive.

I noted this statement:

we created the OpenXLA [an open-source ML compiler ecosystem co-developed by AI/ML industry leaders to compile and optimize models from all leading ML frameworks] because the interface into the compiler in the middle is something that would benefit everybody if it’s commoditized and standardized.

I think this means that Google wants to be the gatekeeper or man in the middle.

Now let’s consider the first article cited. Google does not want its employees to use smart software because it cannot be trusted.

Is it logical to conclude that Google and its partners should use software which is not trusted? Should Google and its partners not use smart software because it is not secure? Given these constraints, how does Google make advances in smart software?

My perception is:

  1. Google is not sure what to do.
  2. Google wants to position its untrusted and insecure software as the industry standard.
  3. Google wants to preserve its position in a workflow to maximize its profit and influence in markets.

You may not agree. But when articles present messages which are alarming and clearly focused on market control, I turn my skeptic control knob. By the way, the headline should be “Google’s Nadav Eiron Says Machine Learning Infrastructure Is a Conduit to Facilitate Google’s Control of Smart Software.”

Stephen E Arnold, June 19, 2023

Is Smart Software Above Navel Gazing: Nope, and It Does Not Care

June 15, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

Synthetic data. Statistical smoothing. Recursive methods. When we presented our lecture “OSINT Blindspots” at the 2023 National Cyber Crime Conference, the audience perked up. The terms might have been familiar, but our framing caught the more than 100 investigators’ attention. The problem my son (Erik) and I described was butt simple: Faked data will derail a prosecution if an expert witness explains that machine-generated output may be wrong.

We provided some examples, including a respected executive who obfuscates his “real” business behind a red-herring business. We profiled how information about a fervid Christian adherence to God’s precepts overshadowed a Ponzi scheme. We explained how an American living in Eastern Europe openly flouts social norms in order to distract authorities from an encrypted email business set up to allow easy, seamless communication for interesting people. And we included more examples.


An executive at a big time artificial intelligence firm looks over his domain and asks himself, “How long will it take for the boobs and boobettes to figure out that our smart software is wonky?” The illustration was spit out by the clever bits and bytes at MidJourney.

What’s the point in this blog post? Who cares besides analysts, lawyers, and investigators who have to winnow facts which are verifiable from shadow or ghost information activities?

It turns out that a handful of academics seem to have an interest in information manipulation. Their angle of vision is broader than my team’s. We focus on enforcement; the academics focus on tenure or getting grants. That’s okay. Different points of view lead to interesting conclusions.

Consider the academic and probably tough-to-figure-out illustration in “The Curse of Recursion: Training on Generated Data Makes Models Forget.”

A less turgid summary of the researchers’ findings appears at this location.

The main idea is that gee-whiz methods like Snorkel and small language models have an interesting “feature”: they forget. That is, as these models ingest fake data, they drift, get lost, or go off the rails. Synthetic shirts, unlike natural cotton T-shirts, look like shirts. But on a hot day, those super duper modern fabrics can cause a person to perspire and probably emit unusual odors.

The authors introduce and explain “model collapse.” I am no academic. My interpretation of the glorious academic prose is that the numerical recipes, systems, and methods don’t work like the nifty demonstrations. In fact, over time, the models degrade. The hapless humanoids who depend on these systems lack the means to figure out what’s on point and what’s incorrect. The danger, obviously, is that clueless and lazy users of smart software make more mistakes in judgment than they might otherwise.
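A toy numerical sketch, my own and far cruder than anything in the paper, shows the flavor of the effect: if each generation of a “model” reproduces only the typical samples of its training data and drops the rare tail values, the spread of the data collapses within a few generations.

```python
import statistics

def next_generation(data):
    """A toy 'generative model': it keeps only the typical samples
    (within one standard deviation of the mean) and drops the tails,
    mimicking how generative models underweight rare events."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)
    return [x for x in data if abs(x - mu) <= sigma]

# Start with 'human' data spread evenly across [-3, 3].
data = [i / 100 for i in range(-300, 301)]

# Train each generation on the previous generation's output and
# record how the spread (standard deviation) shrinks.
spreads = []
for _ in range(6):
    spreads.append(statistics.pstdev(data))
    data = next_generation(data)

print([round(s, 3) for s in spreads])
```

Real model collapse is stochastic and far subtler, but the direction is the same: each generation trained on the previous generation’s output sees less of the original distribution’s tails, which is why the paper stresses preserving access to the original human-generated data.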

The paper includes fancy mathematics and more charts which do not exactly deliver on the promise that a picture is worth a thousand words. Let me highlight one statement from the journal article:

Our evaluation suggests a “first mover advantage” when it comes to training models such as LLMs. In our work we demonstrate that training on samples from another generative model can induce a distribution shift, which over time causes Model Collapse. This in turn causes the model to mis-perceive the underlying learning task. To make sure that learning is sustained over a long time period, one needs to make sure that access to the original data source is preserved and that additional data not generated by LLMs remain available over time. The need to distinguish data generated by LLMs from other data raises questions around the provenance of content that is crawled from the Internet: it is unclear how content generated by LLMs can be tracked at scale. One option is community-wide coordination to ensure that different parties involved in LLM creation and deployment share the information needed to resolve questions of provenance. Otherwise, it may become increasingly difficult to train newer versions of LLMs without access to data that was crawled from the Internet prior to the mass adoption of the technology, or direct access to data generated by humans at scale.

Bang on.

What the academics do not point out are some “real world” business issues:

  1. Solving this problem costs money; the point of synthetic and machine-generated data is to reduce costs. Cost reduction wins.
  2. Furthermore, fixing up models takes time. In order to keep indexes fresh, delays are not part of the game plan for companies eager to dominate a market which Accenture pegs as worth trillions of dollars. (See this wild and crazy number.)
  3. Fiddling around to improve existing models is secondary to capturing the hearts and minds of those eager to worship a few big outfits’ approach to smart software. No one wants to see the problem because that takes mental effort. Those inside one of the firms vying to own information framing don’t want to be the nail that sticks up. Not only do the nails get pounded down, they are forced to leave the platform. I call this the Dr. Timnit Gebru effect.

Net net: Good paper. Nothing substantive will change in the short or near term.

Stephen E Arnold, June 15, 2023
