Stanford University: Trust Us. We Can Rank AI Models… Well, Because
October 19, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
“Maybe We Will Finally Learn More about How A.I. Works” is a report about Stanford University’s effort to score AI vendors the way the foodies at the Michelin Guide rate restaurants. The difference is that a Michelin Guide worker can actually eat the Salade Niçoise and the escargots de Bourgogne. Rating AI models relies on marketing collateral, comments from those managing the projects, and fairy dust, among other inputs.
Keep in mind, please, that Stanford graduates are often laboring in the AI land of fog and mist. Also, the former president of Stanford University departed from the esteemed institution when allegations that he fabricated data for his peer-reviewed papers circulated in the mists of Palo Alto. Therefore, why not believe what Stanford says?
The analysts labor away, intent on their work. Analyzing AI models using 100 factors is challenging work. Thanks, MidJourney. Very original.
The New York Times reports:
To come up with the rankings, researchers evaluated each model on 100 criteria, including whether its maker disclosed the sources of its training data, information about the hardware it used, the labor involved in training it and other details. The rankings also include information about the labor and data used to produce the model itself, along with what the researchers call “downstream indicators,” which have to do with how a model is used after it’s released. (For example, one question asked is: “Does the developer disclose its protocols for storing, accessing and sharing user data?”)
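To make the tallying arithmetic concrete, here is a minimal sketch of how a disclosure-based score might be computed. The criteria names and the sample vendor data below are hypothetical placeholders, not items from Stanford’s actual 100-point rubric.

```python
# Hypothetical sketch: scoring a model on binary disclosure criteria.
# Criteria names are illustrative, not Stanford's actual 100-item list.

CRITERIA = [
    "training_data_sources_disclosed",
    "hardware_details_disclosed",
    "labor_practices_disclosed",
    "user_data_storage_protocols_disclosed",
]

def transparency_score(disclosures: dict) -> float:
    """Return the percentage of criteria a vendor satisfies."""
    met = sum(1 for c in CRITERIA if disclosures.get(c, False))
    return 100.0 * met / len(CRITERIA)

# Example: a made-up vendor that discloses two of the four placeholder items.
vendor = {
    "training_data_sources_disclosed": True,
    "hardware_details_disclosed": False,
    "labor_practices_disclosed": True,
    "user_data_storage_protocols_disclosed": False,
}
print(transparency_score(vendor))  # 50.0
```

The sketch only shows the mechanics: a checklist of yes/no disclosures rolled up into a single number, which is roughly the kind of index the ranking describes.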
Sounds thorough, doesn’t it? The only pothole on the Information Superhighway is that those working on some AI implementations are not sure what the model is doing. The idea of an audit trail for each output causes wrinkles to appear on the face of the person charged with monitoring the costs of these algorithmic confections. Complexity and cost add up to few experts knowing exactly how a model moved from A to B, often making up data via hallucinations, lousy engineering, or someone putting a thumb on the scale to alter outputs.
The write up from the Gray Lady included this assertion:
Foundation models are too powerful to remain so opaque, and the more we know about these systems, the more we can understand the threats they may pose, the benefits they may unlock or how they might be regulated.
What do I make of these Stanford-centric assertions? I am not able to answer until I get input from the former Stanford president. Whom can one trust at Stanford? Marketing or methodology? Is there a brochure and a peer-reviewed article?
Stephen E Arnold, October 19, 2023
Teens Watching Video? What about TikTok?
October 16, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
What an odd little report about an odd little survey. Google wants to be the new everything, including the alternative to Netflix maybe? My thought is that the Google is doing some search engine optimization.
Two young people ponder one of life’s greatest questions, “Do we tell them we watch more YouTube than TikTok?” Thanks, MidJourney. Keep sliding down the gradient.
When a person searches for Netflix, by golly, Google is going to show up: in the search results, in the images, and next to any information about Netflix. Google wants, it seems to me, to become Quantumly Supreme in the Netflix “space.”
“YouTube Passes Netflix As Top Video Source for Teens” reports:
Teenagers in the United States say they watch more video on YouTube than Netflix, according to a new survey from investment bank Piper Sandler.
My question: What about TikTok? The “leading investment bank” may not have done Google a big favor. Consider this: The report from a “bank” called Piper Sandler is available at this link. TikTok does warrant a mention toward the tail end of the “leading investment bank’s” online summary:
The iPhone continues to reign as 87% of teens own one and 88% expect the iPhone to be their next mobile device. TikTok improved by 80 bps [basis points] compared to spring 2023 as the favorite social platform among teens along with Snap Inc. ranking second and Instagram ranking third.
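For reference, 80 basis points works out to 0.80 percentage points, since one basis point equals 0.01 percentage point.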
Interesting. And the Android device? What about the viewing of TikTok videos compared to consumption of YouTube and Netflix?
For a leading investment bank in the data capital of Minnesota, the omission of the TikTok-to-YouTube comparison strikes me as peculiar. In 2021, TikTok overtook YouTube in minutes viewed, according to the BBC. It is 2023; how is the YouTube versus TikTok battle going?
Obviously something is missing in this shaped data report. That something is TikTok and its impact on what many consume and how they obtain information.
Stephen E Arnold, October 16, 2023
Israeli Intelware: Is It Time to Question Its Value?
October 9, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
In 2013, I believe that was the year, I attended an ISS TeleStrategies Conference. A friend of mine wanted me to see his presentation, and I was able to pass the Scylla and Charybdis-inspired security process and listen to the talk. (Last week I referenced that talk and quoted a statement posted on a slide for everyone in attendance to view. Yep, a quote from 2013, maybe earlier.)
After the talk, I walked quickly through the ISS exhibit hall. I won’t name the firms exhibiting because some of these are history (failures), some are super stealthy, and others have been purchased by other outfits as the intelware roll ups continue. I do recall a large number of intelware companies with their headquarters in or near Tel Aviv, Israel. My impression, as I recall, was that Israel’s butt-kicking software could make sense of social media posts, Dark Web forum activity, Facebook craziness, and Twitter disinformation. These Israeli outfits were then the alpha vendors. Now? Well, maybe a bit less alpha drifting to beta or gamma.
One major to another: “Do you think our intel was wrong?” The other officer says, “I sat in a briefing teaching me that our smart software analyzed social media in real time. We cannot be surprised. We have the super duper intelware.” The major says, jarred by an explosion, “Looks like we were snookered by some Madison Avenue double talk. Let’s take cover.” Thanks, MidJourney. You do understand going down in flames. Is that because you are thinking about your future?
My impression was that the Israeli-developed software shared a number of functional and visual similarities. I asked people at the conference if they had noticed the dark themes, the similar if not identical timeline functions, and the fondness for maps on which data were plotted and projected. “Peas in a pod,” my friend, a former NATO officer told me. Are not peas alike?
The reason — and no one has really provided this information — is that the developers shared a foxhole. The government entities in Israel train people with the software and systems proven over the years to be useful. The young trainees carry their learnings forward in their careers. Then, when mustered out, a few bright sparks form companies or join intelware giants like Verint and continue to enhance existing tools or build new ones. The idea is that life in the foxhole imbues those who experience it with similar mental furniture. The ideas, myths, and software experiences form the muddy floor and dirt walls of the foxhole. I suppose one could call this “digital bias,” which later manifests itself in the dozens of Tel Aviv-based intelware, policeware, and spyware companies’ products and services.
Why am I mentioning this?
The reason is that I was shocked and troubled by what was, apparently, a surprise attack. If you want to follow the activity, navigate to X.com and search that somewhat crippled system for #OSINT. Skip the “Top” results and go to the “Latest” tab.
Several observations:
- Are the Israeli intelware products (many of which are controversial and expensive) flawed? Obviously excellent software processing “signals” was blind to the surprise attack, right?
- Are the Israeli professionals operating the software unable to use it to prevent surprise attacks? Obviously excellent software in the hands of well-trained professionals flags signals and allows action to be taken when warranted. Did that happen? Has Israeli intel training fallen short of its goal of protecting the nation? Hmmm. Maybe, yes.
- Have those who hype intelware and the excellence of a particular system and method been fooled, falling into the dark pit of OSINT blind spots like groupthink and “reasoning from anecdote, not fact”? I am leaning toward a “yes”, gentle reader.
The time for a critical look at what works and what doesn’t is what the British call “from this day” work. The years of marketing craziness are one thing, but when either the system or the method allows people to be killed without warning or cause, that failure broadcasts one message: “Folks, something is very, very wrong.”
Perhaps certification of these widely used systems is needed? Perhaps a hearing in an appropriate venue is warranted?
Blind spots can cause harm. Marketers can cause harm. Poorly trained operators can cause harm. Even foxholes require tidying up. Technology for intelligence applications is easy to talk about, but it is now clear to everyone engaged in making sense of signals that one country’s glammed-up systems missed the wicket.
Stephen E Arnold, October 9, 2023
Cognitive Blind Spot 2: Bandwagon Surfing or Do What May Be Fashionable
October 6, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
Humans are into trends. The NFL and Taylor Swift appear to be a trend. A sporting money machine and a popular music money machine. Jersey sales increase. Ms. Swift’s music sales go up. New eyeballs track a certain football player. The question is, “Who is exploiting whom?”
Which bandwagon are you riding? Thank you, MidJourney. Gloom seems to be part of your DNA.
Think about large language models and smart software. A similar dynamic may exist. Late in 2022, the natural language interface became the next big thing. Students and bad actors figured out that using a ChatGPT-type service could expedite certain activities. Students could generate 500-word essays in less than a minute. Bad actors could generate snippets of code in seconds. In short, many people were hopping on the LLM bandwagon decorated with smart software logos.
Now a bandwagon powered by healthy skepticism may be heading toward main street. Wired Magazine published a short essay titled “Chatbot Hallucinations Are Poisoning Web Search.” The foundational assumption is that Web search was better before ChatGPT-type incursions. I am not sure that idea is valid, but for the purposes of illustrating bandwagon surfing, it will pass unchallenged. Wired’s main point is that as AI-generated content proliferates, the results delivered by Google and a couple of other but vastly less popular search engines will deteriorate. I think this is a way to assert that lousy LLM output will make Web search worse. “Hallucination” is jargon for made up or just incorrect information.
Consider this essay “Evaluating LLMs Is a Minefield.” The essay and slide deck are the work of two AI wizards. The main idea is that figuring out whether a particular LLM or a ChatGPT-service is right, wrong, less wrong, more right, biased, or a digital representation of a 23 year old art history major working in a public relations firm is difficult.
I am not going to take the side of either referenced article. The point is that the hyperbolic excitement about “smart software” seems to be giving way to LLM criticism. From software for Every Man, the services are becoming tools for improving productivity.
To sum up, the original bandwagon has been pushed out of the parade by a new bandwagon filled with poobahs explaining that smart software, LLM, et al are making the murky, mysterious Web worse.
The question becomes, “Are you jumping on the bandwagon with the banner that says ‘LLMs are really bad,’ or are you sticking with the rah-rah crowd?” The point is that information at one point was good. Now information is less good. Imagine how difficult it will be to determine what’s right or wrong, biased or unbiased, or acceptable or unacceptable.
Who wants to do the work to determine provenance or answer questions about accuracy? Not many people. That, rather than lousy Web search, may be more important to some professionals. But it does not solve the problem of the time and resources required to deal with accuracy and other issues.
So which bandwagon are you riding? The NFL or Taylor Swift? Maybe the tension between the two?
Stephen E Arnold, October 6, 2023
Is Google Setting a Trap for Its AI Competition?
October 6, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
The litigation about the use of Web content to train smart generative software is ramping up. Outfits like OpenAI, Microsoft, and Amazon and its new best friend will be snagged in the US legal system.
But what big outfit will be ready to offer those hungry to use smart software without legal risk? The answer is the Google.
How is this going to work?
Simple. Google is beavering away with its synthetic data. Some real data are used to train sophisticated stacks of numerical recipes. The idea is that these algorithms will be “good enough”; thus, the need for “real” information is obviated. And Google has another trick up its sleeve. The company has coveys of coders working on trimmed-down systems and methods. The idea is that using less information will produce more and better results than the crazy idea of indexing content from wherever in real time. The small data can be licensed while the competitors are spending their days with lawyers.
How do I know this? I don’t, but Google is providing tantalizing clues in marketing collateral like “Researchers from the University of Washington and Google have Developed Distilling Step-by-Step Technology to Train a Dedicated Small Machine Learning Model with Less Data.” The author is a student who provides sources for the information about the “less is more” approach to smart software training.
And, may the Googlers sing her praises, she cites Google technical papers. In fact, one of the papers is described by the fledgling Googler as “groundbreaking.” Okay.
What’s really being broken is the approach of some of Google’s most formidable competition.
When will the Google spring its trap? It won’t. But as the competitors get stuck in legal mud, the Google will be an increasingly attractive alternative.
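For the curious, here is a minimal, generic sketch of the teacher-student idea behind the “less is more” pitch. This is plain-vanilla knowledge distillation with made-up tensors and hyperparameters, not the rationale-based Distilling Step-by-Step method the cited paper describes.

```python
# Generic knowledge-distillation sketch (illustrative, not Google's method):
# a small student model is trained to match a large teacher's softened output
# distribution in addition to the ground-truth labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: KL divergence between softened teacher and student outputs.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage with random tensors standing in for real model outputs.
student_logits = torch.randn(4, 10)   # small model's raw scores
teacher_logits = torch.randn(4, 10)   # large model's raw scores
labels = torch.randint(0, 10, (4,))   # ground-truth class ids
print(distillation_loss(student_logits, teacher_logits, labels))
```

The design point is that the student learns from the teacher’s output distribution rather than from mountains of raw, litigation-bait Web data.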
The last line of the Google marketing piece says:
Check out the Paper and Google AI Article. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our 30k+ ML SubReddit, 40k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.
Get that young marketer a Google mouse pad.
Stephen E Arnold, October 6, 2023
What Type of Employee? What about Those Who Work at McKinsey & Co.?
October 5, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
Yes, I read When McKinsey Comes to Town: The Hidden Influence of the World’s Most Powerful Consulting Firm by Walt Bogdanich and Michael Forsythe. No, I was not motivated to think happy thoughts about the estimable organization. Why? Oh, I suppose the image of the opioid addicts in southern Indiana, Kentucky, and West Virginia rained on the parade.
I did scan a “thought piece” written by McKinsey professionals, probably a PR person, certainly an attorney, and possibly a partner who owned the project. The essay’s title is “McKinsey Just Dropped a Report on the 6 Employee Archetypes. Good News for Some Organizations, Terrible for Others. What Type of Dis-Engaged Employee Is On Your Team?” The title was the tip-off that a PR person was involved. My hunch is that the McKinsey professionals want to generate some bookings for employee assessment studies. What better way than converting some proprietary McKinsey information into a white paper and then getting the white paper in front of an editor at an “influence center”? The answer to the question, obviously, is to hire McKinsey, and the firm will tell you whom to cull.
Inc. converts the white paper into an article and McKinsey defines the six types of employees. From my point of view, this is standard blue chip consulting information production. However, there was one comment which caught my attention:
Approximately 4 percent of employees fall into the “Thriving Stars” category, representing top talent that brings exceptional value to the organization. These individuals maintain high levels of well-being and performance and create a positive impact on their teams. However, they are at risk of burnout due to high workloads.
Now what type of company hires these four percenters? Why, blue chip consulting companies like McKinsey, Bain, BCG, Booz Allen, etc. And what are the contributions these firms’ professionals make to society? Jump back to When McKinsey Comes to Town. One of the highlights of that book is the discussion of the consulting firm’s role in the opioid epidemic.
That’s an achievement of which to be proud. Oh, and the other five types of employees? Don’t bother to apply for a job at the blue chip outfits.
Stephen E Arnold, October 4, 2023
Microsoft Claims to Bring Human Reasoning to AI with New Algorithm
September 20, 2023
Has Microsoft found the key to meld the strengths of AI reasoning and human cognition? Decrypt declares, “Microsoft Infuses AI with Human-Like Reasoning Via an ‘Algorithm of Thoughts’.” Not only does the Algorithm of Thoughts (AoT for short) come to better conclusions, it also saves energy by streamlining the process, Microsoft promises. Writer Jose Antonio Lanz explains:
“The AoT method addresses the limitations of current in-context learning techniques like the ‘Chain-of-Thought’ (CoT) approach. CoT sometimes provides incorrect intermediate steps, whereas AoT guides the model using algorithmic examples for more reliable results. AoT draws inspiration from both humans and machines to improve the performance of a generative AI model. While humans excel in intuitive cognition, algorithms are known for their organized, exhaustive exploration. The research paper says that the Algorithm of Thoughts seeks to ‘fuse these dual facets to augment reasoning capabilities within LLMs.’ Microsoft says this hybrid technique enables the model to overcome human working memory limitations, allowing more comprehensive analysis of ideas. Unlike CoT’s linear reasoning or the ‘Tree of Thoughts’ (ToT) technique, AoT permits flexible contemplation of different options for sub-problems, maintaining efficacy with minimal prompting. It also rivals external tree-search tools, efficiently balancing costs and computations. Overall, AoT represents a shift from supervised learning to integrating the search process itself. With refinements to prompt engineering, researchers believe this approach can enable models to solve complex real-world problems efficiently while also reducing their carbon impact.”
Wowza! Lanz expects Microsoft to incorporate AoT into its GPT-4 and other advanced AI systems. (Microsoft has partnered with OpenAI and invested billions into ChatGPT; it has an exclusive license to integrate ChatGPT into its products.) Does this development bring AI a little closer to humanity? What is next?
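To make the distinction concrete, here is a minimal prompt-construction sketch. The example strings and the call_llm placeholder are my own illustrative assumptions, not Microsoft’s actual AoT prompts or code.

```python
# Hypothetical sketch contrasting Chain-of-Thought (CoT) prompting with an
# Algorithm-of-Thoughts (AoT)-style prompt that embeds a worked search trace.
# call_llm is a placeholder; wire in whatever model client you actually use.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("Plug in your preferred LLM client here.")

QUESTION = "Using the numbers 4, 6, 8, 2 once each, make 24."

# CoT: ask for linear, step-by-step reasoning.
cot_prompt = (
    f"{QUESTION}\n"
    "Think step by step and show your reasoning before the final answer."
)

# AoT-style: show an in-context example that explores candidate branches,
# backtracks on failures, and only then commits to an answer.
aot_example = (
    "Example: Using 3, 5, 7, 1 once each, make 24.\n"
    "Try 3 * 5 = 15; 15 + 7 + 1 = 23 -> too low, backtrack.\n"
    "Try 7 * 3 = 21; 21 + 5 - 1 = 25 -> too high, backtrack.\n"
    "Try (5 + 7) = 12 and (3 - 1) = 2; 12 * 2 = 24 -> done.\n"
)
aot_prompt = (
    f"{aot_example}\n"
    f"Now solve: {QUESTION}\n"
    "Explore options, backtrack when a branch fails, then give the answer."
)

# Either prompt goes out in a single model call, e.g.:
# answer = call_llm(aot_prompt)
```

The only point of the sketch is that the AoT prompt ships a worked search trace, so the model imitates exploration and backtracking instead of producing one linear chain.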
Cynthia Murrell, September 20, 2023
Search: The Moonshot for Alphabet Google YouTube Etc. May Be Off by Miles
September 6, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
Google is now 25. Yep, a quarter century. If you want to read a revisionist history of the beloved firm, point your Chrome browser (yep, it is part of the alleged monopoly) at “Questions, Shrugs and What Comes Next: A Quarter Century of Change.” The cited article appears in the Google blog (does anyone remember Blogger or what about the Google blog search?). The idea is that Sundar Pichai logged into a Google Workspace (yep, that’s the software system intended to deal Microsoft a mortal blow) and wrote the piece himself.
I just wanted to hire a normal clown. It was not possible. The clown search became a monster. Let’s have fun! Thanks, MidJourney, you old gradient descent entity.
What does Mr. Pichai write, allegedly without the assistance of other Googlers, advisors, and legal eagles?
One of his statements is:
Search is still at the core of our mission, and it’s still our biggest moonshot with so much more to do.
Okay, I want to stop there. I wanted to find a service in Louisville, Kentucky, that sends clowns to birthday parties. Pretty simple, right? I entered the query “Louisville Kentucky clowns birthday parties.” I expected to see a list of people or companies in the clown rental business. Wrong. I received this output from the “biggest moonshot” outfit:
The top hit was to The Bash, a service which lists clowns. That link pointed me to Bunny Bear Entertainment and Salem Sisters 502. No phone number, just a link to get a free quote. Okay, that looks good. Click on the link and what happens? A form appears and someone will contact me. Yeah, I wanted to talk to a person.
The second hit was to Phillips (presenting itself as kiddyskingdom.com) at a toll free number. Same deal. A referral service. No, I could not talk to a human in Louisville.
The third hit was to About Faces. Yep, another SEO-ized reseller of clown services. No phone number for me to call to talk to a real live clown.
Several observations:
- Google search (the moonshot) is not useful. It creates work; it does not provide what I wanted.
- Google’s business is selling ads which are funding Google Cloud ambitions to break out of the one-trick-pony pejorative aimed at the company by the Softie Steve Ballmer a long time ago.
- The blog post is a marketing pitch for Google’s smart software.
Net net: Vintage Google operating without regard to regulatory scrutiny, allegations that the company is a monopoly, or managing people in a way that is what I hoped the clown company would provide to me: Laughs.
A “healthy disregard for the impossible.” Sure. I trust Google. I believe the Google. But delivering on-point search results? I don’t need a clown for that. I have one.
Stephen E Arnold, September 6, 2023
Gartner Hype Cycle: Some Pointed Criticism from Analytics India
September 5, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
I read the Analytics India article “Gartner’s Hype Cycle is a Waste of Time.” The “hype cycle” is a graph designed to sell consulting services. My personal perception is that Gideon Gartner or a close compatriot riffed on the Boston Consulting Group’s version of General Eisenhower’s two-by-two matrix. Here’s an example of the Eisenhower Matrix:
I want to credit Chris Adams’ article about the Eisenhower Matrix. You can find Mr. Adams’ write up at this link. I don’t know what General Eisenhower’s inspiration was, but the BCG adaptation was consulting marketing genius. Here’s an example of the BCG variant:
This illustration comes from Business to You at this link.
Before looking at a Gartner graph goodie, I want to point out that the BCG innovation was to tie the icons to numbers. BCG pointed to the “dogs” icon and then showed the numbers, like market share, product costs, etc., that converted an executive in love with the status quo to consider rehoming the dogs or just putting a beloved pet down. In the lingo of one blue chip outfit, the dog could find its future elsewhere.
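As a concrete illustration of the numbers-behind-the-icons point, here is a minimal sketch that sorts products into the four BCG quadrants. The share and growth cut-offs are common textbook conventions, and the portfolio data are made up for the example.

```python
# Hypothetical sketch of the BCG growth-share logic: each product lands in a
# quadrant based on relative market share and market growth rate.
# The 1.0 share and 10% growth cut-offs are illustrative defaults only.

def bcg_quadrant(relative_share: float, growth_rate: float,
                 share_cutoff: float = 1.0, growth_cutoff: float = 0.10) -> str:
    if relative_share >= share_cutoff and growth_rate >= growth_cutoff:
        return "star"
    if relative_share >= share_cutoff:
        return "cash cow"
    if growth_rate >= growth_cutoff:
        return "question mark"
    return "dog"

# Made-up portfolio data: (product, relative market share, annual growth).
portfolio = [
    ("Product A", 1.8, 0.22),
    ("Product B", 1.4, 0.03),
    ("Product C", 0.4, 0.18),
    ("Product D", 0.2, 0.01),
]
for name, share, growth in portfolio:
    print(name, "->", bcg_quadrant(share, growth))
# A -> star, B -> cash cow, C -> question mark, D -> dog
```

That is the whole trick: the icons are backed by measurable inputs, which is what gives the executive something to argue with.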
I did a Bing image search for Gartner hype cycle and found a cornucopia of outputs. Here’s one I selected because it looked better than some of the others:
If you want to view a readable version, navigate to this Medium post by Compassionate Technologies of which I have zero knowledge. (But do the words “technology” and “compassion” go together?)
The key point about the Gartner graphs is that they all look alike; that is, the curves don’t change, which is the point, I guess. A technology begins at point 0,0, moves up a hockey stick curve (maybe the increasing hype), and then appears to flatten out. I am confident that the Gartner experts are not gathering technology market and investment data and thinking in terms of linear regression, standard deviation, or a calculator on a mobile phone.
The client says, “Your team’s report strikes me as filled with unsupported assertions. My company cannot accept the analysis. We won’t pay the fee for this type of work.” Oh, oh. Thanks, MidJourney, close to my prompt but close only counts in horse shoes.
The difference is that the BCG graph uses numbers to explain the “dogs,” “stars,” etc. The Gartner graph is a marketing vehicle. Those who have read my essays over the years know that I view the world with some baked-in biases; for example, the BCG graph is great marketing which leads to substantive consulting. This is one characteristic of a blue chip consulting firm. The Gartner graph is subjective or impressionistic, a bit like a Van Gogh night sky. Sure, there are stars, but those puppies don’t look like swirlies to me. Thus, Gartner is, to me, a mid tier consulting firm. Some consumers of these types of marketing graphs use them to justify certain actions; for instance, selecting a particular type of software. When the software goes off the rails, the data-starved impressionistic chart leaves some hungry for more data. When another project comes along, the firm may seek a blue chip outfit even if its work is more expensive.
Now back to the Analytics India article cited above.
The author makes a statement with which I agree:
The Gartner Hype Cycle is not science, but Gartner presents it as an established law.
Exactly. This is marketing, not the BCG’s analytics-centric take on the Eisenhower 2×2 matrix.
Here’s another passage from the write up (originally from Michael Mullany):
Many technologies simply fade away with time or die. According to Michael Mullany, an additional 20% of all technologies that were tracked for multiple years on the Hype Cycle became obsolete before reaching any kind of mainstream success. The Gartner Hype Cycle is not science, but Gartner presents it as an established natural law. Expressing similar sentiments, a user on Hacker News wrote, “Why do people think the Gartner Hype Cycle is a law of Physics?” when in fact, the Hype Cycle lacks empirical backing and fails to consider technologies that deviate from its prescribed path.
Yep, marketing.
Do I care? Not any more. When I was doing consulting to buy cheap fuel for my Pinto (the kind that would explode if struck from behind), I did care. The blue chip outfit at which I worked was numbers oriented. That was a good thing.
Stephen E Arnold, September 5, 2023
Microsoft Pop Ups: Take Screen Shots
August 31, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
I read “Microsoft Is Using Malware-Like Pop-Ups in Windows 11 to Get People to Ditch Google.” Kudos to the wordsmiths at TheVerge.com for avoiding the term “po*n storm” to describe the Windows 11 alleged pop ups.
A person in the audience says, “What’s that pop up doing up there?” Thanks, MJ. Another so so piece of original art.
The write up states:
I have no idea why Microsoft thinks it’s ok to fire off these pop-ups to Windows 11 users in the first place. I wasn’t alone in thinking it was malware, with posts dating back three months showing Reddit users trying to figure out why they were seeing the pop-up.
What? Pop-ups for three months? I love “real” news when it is timely.
The article includes this statement:
Microsoft also started taking over Chrome searches in Bing recently to deliver a canned response that looks like it’s generated from Microsoft’s GPT-4-powered chatbot. The fake AI interaction produced a full Bing page to entirely take over the search result for Chrome and convince Windows users to stick with Edge and Bing.
How can this be? Everyone’s favorite software company would not use these techniques to boost Credge’s market share, would it?
My thought is that Microsoft’s browser woes began a long time ago in an operating system far, far away. As a result, Credge is lagging behind Googzilla’s browser. Unless Google shoots itself in both feet and fires a digital round into the beastie’s heart, the ad monster will keep on sucking data and squeezing out alternatives.
The write up does not seem to be aware that Google wants to control digital information flows. Microsoft will need more than pop-ups to prevent the Chrome browser from becoming the primary access mechanism to the World Wide Web. Despite Microsoft’s market power, users don’t love the Microsoft Credge thing. Hey, Microsoft, why not pay people to use Credge?
Stephen E Arnold, August 31, 2023