Great Moments in Smart Software: IBM Watson Gets to Find Its Future Elsewhere Again

June 19, 2024

This essay is the work of a dinobaby. Unlike some folks, no smart software improved my native ineptness.

The smart software game is a tough one. Whip up some compute, download the models, and go go go. Unfortunately, artificial intelligence is artificial and often not actually intelligent. I read an interesting article in Time Magazine (who knew it was still in business?). The story has a clickable title: “McDonald’s Ends Its Test Run of AI Drive-Throughs With IBM.” The juicy word AI, the big brand McDonald’s, and the pickle on top: IBM.


A college student tells the smart software system at a local restaurant that his order was misinterpreted. Thanks, MSFT Copilot. How’s your “recall” today? What about system security? Oh, that’s too bad.

The write up reports with the glee of a kid getting a happy meal:

McDonald’s automated order taker with IBM received scores of complaints in recent years, for example — with many taking to social media to document the chatbot misunderstanding their orders.

Consequently, the IBM fast food service has been terminated.

Time’s write up included a statement from Big Blue too:

In an initial statement, IBM said that “this technology is proven to have some of the most comprehensive capabilities in the industry, fast and accurate in some of the most demanding conditions,” but did not immediately respond to a request for further comment about specifics of potential challenges.

IBM suggested its technology could help fight cancer in Houston a few years ago. How did that work out? That smart software worker also had an opportunity to find its future elsewhere. The career trajectory, at first glance, runs from medicine to grilling burgers. An interesting employment path, and one that seems to be heading down to Sleepy Town.

What’s the future of the IBM smart software test? The write up points out:

Both IBM and McDonald’s maintained that, while their AI drive-throughs partnership was ending, the two would continue their relationship on other projects. McDonald’s said that it still plans to use many of IBM’s products across its global system.

But Ronald McDonald has to be practical. The article adds:

In December, McDonald’s launched a multi-year partnership with Google Cloud. In addition to moving restaurant computations from servers into the cloud, the partnership is also set to apply generative AI “across a number of key business priorities” in restaurants around the world.

Google’s smart software has been snagged in some food controversies too. The firm’s smart system advised users to put glue on pizza to make the cheese topping stick better. Yum.

Several observations seem to be warranted:

  1. Practical and money-saving applications of IBM’s smart software do not have the snap, crackle, and pop of OpenAI’s PR coup with Microsoft in January 2023. Time is writing about IBM, but the case example is not one that makes me crave this particular application. Customers want a sandwich, not something they did not order.
  2. Examples of reliable smart software which must react spontaneously to people ordering food or asking basic questions are difficult to find. Very narrow applications do produce positive case examples; in some law enforcement software (what I call policeware), automated functions such as the automatic report generation in the Shadowdragon Horizon system work well.
  3. Big companies spend money, catch attention, and then have to spend more money to remediate and clean up the negative publicity.

Net net: More small-scale testing and less publicity chasing seem to be two items to add to the menu. And, Watson, keep on trying. Google is.

Stephen E Arnold, June 19, 2024


Who Is On First? It Is a Sacrifice Play, Sports Fans

June 19, 2024

This essay is the work of a dinobaby. Unlike some folks, no smart software improved my native ineptness.

Apologies to Abbott and Costello. Who is on first when it comes to running “search” at Google? I thought it was Prabhakar Raghavan, that master of laughs and one half of the Sundar & Prabhakar Comedy Show. But I was, it seems, once again wrong. “Google’s Head of Search Tells Employees That AI Will Keep Making Absurd Mistakes, but They’re Gonna Keep Pushing It Out” contains several shockers to my worn-out dinobaby systems.


The comedian tells a joke about AI and then reveals the punch line. “It’s ad money.” Thanks, MSFT Copilot. Good enough.

First, forget Prabhakar, that master of the comedy demonstrations. “Hey, it is only a fact. So what if it is wrong, you user.” The head of search is Liz Reid. I know. You may be asking, “Who?” Ms. Reid has been employed at Google for decades. But don’t fret, Comedy Central fans, Prabhakar is in charge, according to the pooh-bah at The Verge. Whew. That’s a relief.

Second, the crazy outputs from Google’s smart software are nothing to get excited about. The write up reports Ms. Reid said:

“I don’t think we should take away from this that we shouldn’t take risks,” Reid said during the meeting. “We should take them thoughtfully. We should act with urgency. When we find new problems, we should do the extensive testing but we won’t always find everything and that just means that we respond.”

That’s the spirit. A Minimum Viable Product.

Third, Google’s real love is advertising. While this head-of-search, don’t-worry-about-AI dust up swirls, please, ignore Google’s “new” advertising network. If you must learn about what Google is doing behind the dust cloud of AI, navigate to “Google Is Putting Unskippable In-Stream Ads in Your Free TV Channels.” The AI stuff is interesting, but the Googzilla is definitely interested in creating new video advertising streams. AI, meh. Ads, yeah, let’s go.

The head of search articulates what I would call the “good enough” and Minimum Viable Product attitude. The Absurd Mistakes article reports:

When reached by CNBC, a defensive Google spokesperson said the “vast majority” of AI Overview responses were accurate and that upon its own internal testing, the company found issues on “less than one in every 7 million unique queries on which AI Overviews appeared.”

Is there another character in the wings ready to take over the smart software routine? Sure. Sundar & Prabhakar are busy with the ad play. That will make it to Broadway. AI can open in Pittsburgh or Peoria.

Stephen E Arnold, June 19, 2024

Ah, Google, Great App Screening

June 19, 2024

Doesn’t Google review apps before putting them in its online store? If so, apparently not very well. Mashable warns, “In Case You Missed It: Bank Info-Stealing Malware Found in 90+ Android Apps with 5.5M Installs.” Some of these apps capture this sensitive data with the help of an advanced trojan called Anatsa. Reporter Cecily Mauran writes:

“As of Thursday [May 30], Google has banned the apps identified in the report, according to BleepingComputer. Anatsa, also known as ‘TeaBot,’ and other malware in the report, are dropper apps that masquerade as PDF and QR code readers, photography, and health and fitness apps. As the outlet reported, the findings demonstrate the ‘high risk of malicious dropper apps slipping through the cracks in Google’s review process.’ Although Anatsa only accounts for around two percent of the most popular malware, it does a lot of damage. It’s known for targeting over 650 financial institutions — and two of its PDF and QR code readers had both amassed over 70,000 downloads at the time the report was published. Once installed as a seemingly legitimate app, Anatsa uses advanced techniques to avoid detection and gain access to banking information. The two apps mentioned in the report were called ‘PDF Reader and File Manager’ by Tsarka Watchfaces and ‘QR Reader and File Manager’ by risovanul. So, they definitely have an innocuous look to unsuspecting Android users.”

The article reports Anatsa and other malware were found in these categories: file managers, editors, translators, photography, productivity, and personalization apps. It is possible Google caught all the Anatsa-carrying apps, but one should be careful just in case.

Cynthia Murrell, June 19, 2024

DeepMind Is Going to Make Products, Not Science

June 18, 2024

This essay is the work of a dinobaby. Unlike some folks, no smart software improved my native ineptness.

Crack that Google leadership whip. DeepMind is going to make products. Yes, just like that. I am easily confused. I thought Google consolidated its smart software efforts. I thought Dr. Jeffrey Dean did a lateral arabesque to make way for new leadership. The company had new marching orders, issued under the calming light of a hair-on-fire Red Alert: OpenAI and Microsoft would be the new Big Dogs.


From Google DeepMind to greener pastures. Thanks, OpenAI art thing.

Now I learn from “Google’s DeepMind Shifting From Research Powerhouse To AI Product Giant, Redefining Industry Dynamics”:

Alphabet Inc’s subsidiary Google DeepMind has decided to transition from a research lab to an AI product factory. This move could potentially challenge the company’s long-standing dominance in foundational research… Google DeepMind, has merged its two AI labs to focus on developing commercial services. This strategic change could potentially disrupt the company’s traditional strength in fundamental research.

From wonky images of the US founding fathers to weird outputs about making cheese stick to pizza, the company seems to be struggling. To further complicate matters, Google’s management finesse created this interesting round of musical chairs:

…the departure of co-founder Mustafa Suleyman to Microsoft in March adds another layer of complexity to DeepMind’s journey. Suleyman’s move to Microsoft, where he has described his experience as “truly transformational,” indicates the competitive and dynamic nature of the AI industry.

Several observations:

  1. Microsoft seems to be suffering the AI wobblies. The more it tries to stabilize its AI activities, the more unstable the company seems to be.
  2. Who is in charge of AI at Google?
  3. Has Google turned off the blinking red and yellow alert lights to operate in what might be called low-lumen normalcy?

However, Google’s thrashing may not matter. OpenAI cannot get its system to stay online. Microsoft has a herd of AI organizations to manage and has managed to create a huge PR gaffe with its “smart” Recall feature. Apple deals in “to be” smart products and wants to work with everyone, just without paying.

Net net: Is Google representative of the unraveling of the Next Big Thing?

Stephen E Arnold, June 18, 2024


Free AI Round Up with Prices

June 18, 2024

This essay is the work of a dinobaby. Unlike some folks, no smart software improved my native ineptness.

EWeek (once PCWeek and a big fat Ziff publication) has published what seems to be a mash up of MBA-report writing, a bit of smart software razzle dazzle, and two scoops of Gartner Group-type “insight.” The report is okay, and its best feature is that it is free. Why pay a blue-chip or mid-tier consulting firm to assemble a short monograph? Just navigate to “21 Best Generative AI Chatbots.”


A lecturer shocks those in the presentation with a hard truth: Human-generated reports are worse than those produced by a “leading” smart software system. Is this the reason a McKinsey professional told interns, “Prompts are the key to your future”? Thanks, MSFT Copilot. Good enough.

The report consists of:

A table with the “leading” chatbots presented in random order. Forget that alphabetization baloney. Sorting by “leading” chatbot name is so old timey. The table presents these evaluative/informative factors:

  • Best for use case; that is, when one would use a specific chatbot, in the opinion of the EWeek “experts” I assume
  • Query limit. This is baffling since recyclers of generative technology are eager to sell a range of special plans.
  • Language model. This column is interesting because it makes clear that 12 of the “leading” chatbots are anchored in OpenAI’s “solutions”; Claude turns up three times, and Llama twice. A few vendors mention the use of multiple models, but the “report” does not talk about AI layering or the specific ways in which different systems contribute to the “use case” for each system. Did I detect a sameness in the “leading” solutions? Yep.
  • The baffling Chrome “extension.” I think the idea is that the “leading” solution with a Chrome extension runs in the Google browser. Five solutions do run as a Chrome extension. The other 16 don’t.
  • Pricing. Now prices are slippery. My team pays for ChatGPT, but since the big 4o, the service seems to be free. We use a service not on the list, and each time I access the system, the vendor begs — nay, pleads — for more money. One vendor charges $2,500 per month paid annually. Now, that’s a far cry from Bing Chat Enterprise at $5 per month, which is not exactly the full six pack.

The bulk of the report is a subjective score for each service’s feature set, its ease of use, the quality of output (!), and support. The report never defines these categories. Hey, everyone knows about “quality,” right? And support? Have you tried to contact a whiz-bang leading AI vendor? Let me know how that works out. The screenshots vary slightly, but the underlying sameness struck me. Each write up includes what I would call a superficial or softball listing of pros and cons.

The most stunning aspect of the report is the explanation of “how” the EWeek team evaluated these “leading” systems. Gee, a list of which systems were excluded and why would have been helpful in my opinion. Let me quote the explanation of quality:

To determine the output quality generated by the AI chatbot software, we analyzed the accuracy of responses, coherence in conversation flow, and ability to understand and respond appropriately to user inputs. We selected our top solutions based on their ability to produce high-quality and contextually relevant responses consistently.

Okay, how many queries? How were queries analyzed across systems, assuming similar systems received the same queries? Which systems hallucinated or made up information? What queries caused one or more systems to fail? What were the qualifications of those “experts” evaluating the system responses? Ah, so many questions. My hunch is that EWeek just skipped the academic baloney and went straight to running queries, plugging in a guess-ti-mate, and heading to Starbucks. I do hope I am wrong, but I worked at the Ziffer in the good old days of the big fat PCWeek. There was some rigor, but today? Let’s hit the gym?
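
For contrast, here is what a minimal, reproducible harness might look like. This is a hypothetical Python sketch, not anything EWeek describes: the same prompts go to every system, a blind rubric-based scorer judges the answers, and the raw counts get reported.

    # Hypothetical benchmarking sketch, not EWeek's method. chatbots maps a
    # system name to a callable that returns its answer; judge() is a blind,
    # rubric-based scorer that returns True for an acceptable answer.
    def benchmark(chatbots, prompts, judge):
        results = {name: {"correct": 0, "total": 0} for name in chatbots}
        for prompt in prompts:
            for name, ask in chatbots.items():
                answer = ask(prompt)
                results[name]["total"] += 1
                if judge(prompt, answer):
                    results[name]["correct"] += 1
        return results

Publish the prompts, the rubric, and the counts, and readers can check the work. That is the sort of rigor the old PCWeek labs at least attempted.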

What is the conclusion for this report about the “leading” chatbot services? Here it is:

Determining the “best” generative AI chatbot software can be subjective, as it largely depends on a business’s specific needs and objectives. Chatbot software is enormously varied and continuously evolving, and new chatbot entrants may offer innovative features and improvements over existing solutions. The best chatbot for your business will vary based on factors such as industry, use case, budget, desired features, and your own experience with AI. There is no “one size fits all” chatbot solution.

Yep, definitely worth the price of admission.

Stephen E Arnold, June 18, 2024

Palantir: Fear Is Good. Fear Sells.

June 18, 2024

President Eisenhower may not have foreseen AI when he famously warned of the military-industrial complex, but certain software firms certainly fit the bill. One of the most successful, Palantir, is pursuing Madison Avenue-type marketing with a message of alarm. The company’s co-founder, Alex Karp, is quoted in the fear-mongering post at right-wing Blaze Media, “U.S. Prepares for War Amid Growing Tensions that China Could Invade Taiwan.”

After several paragraphs of panic over tensions between China and Taiwan, writer Collin Jones briefly admits “It is uncertain if and when the Chinese president will deploy an attack against the small country.” He quickly pivots to the scary AI arms race, intimating Palantir and company can save us as long as we let (fund) them. The post concludes:

“Palantir’s CEO and co-founder Alex Karp said: ‘The way to prevent a war with China is to ramp up not just Palantir, but defense tech startups that produce software-defining weapons systems that scare the living F out of our adversaries.’ Karp noted that the U.S. must stay ahead of its military opponents in the realm of AI. ‘Our adversaries have a long tradition of being not interested in the rule of law, not interested in fairness, not interested in human rights and on the battlefield. It really is going to be us or them. You do not want a world order where our adversaries try to define new norms. It would be very bad for the world, and it would be especially bad for America,’ Karp concluded.”

Wow. But do such scare tactics work? Of course they do. For instance, we learn from DefenseScoop, “Palantir Lands $480M Army Contract for Maven Artificial Intelligence Tech.” That article reports on not one but two Palantir deals: the titular Maven expansion and, we learn:

“The company was recently awarded another AI-related deal by the Army for the next phase of the service’s Tactical Intelligence Targeting Access Node (TITAN) ground station program, which aims to provide soldiers with next-generation data fusion and deep-sensing capabilities via artificial intelligence and other tools. That other transaction agreement was worth $178 million.”

Those are just two recent examples of Palantir’s lucrative government contracts, ones that have not, as of this writing, been added to this running tally. It seems the firm has found its winning strategy. Ramping up tensions between world powers is a small price to pay for significant corporate profits, apparently.

Cynthia Murrell, June 18, 2024

The Gray Lady Tap Dances

June 17, 2024

This essay is the work of a dinobaby. Unlike some folks, no smart software improved my native ineptness.

The collision of myth, double talk, technology, and money produces some fascinating tap dancing. Tip tap tip tap. Tap tap. That’s the sound of the folks involved with explaining that technology is no big deal. Drum roll. Then the coda. Tip tap tip tap. Tap tap tap. It is not money. Tip tap tip tap. Tap tap.

I think quite a few business decisions are about money; specifically, getting a bonus or a hefty raise because “efficiency” improves “quality.” One can dance around the dead horse, but at some point that horse needs to be relocated.


The “real” Mona Lisa. Can she be enhanced, managed, and populated with metadata without a human art director? Yep. Thanks, MSFT Copilot. Good enough.

I read “New York Times Union Urges Management to Reconsider 9 Art Department Cuts as Paper Ramps Up AI Tools | Exclusive.” The write up weaves a number of themes together. There is the possibility of management waffling, a common practice these days. Recall, an incident, Microsoft? The ever-present next big thing makes an appearance. Plus, there is the Gray Lady, working hard to maintain its position as the newspaper for the USA today. (That sounds familiar, doesn’t it?)

The main point of the write up is that the NYT’s art department might lose staff. The culprit is not smart software. Money is not the issue. Quality will not suffer. Yada yada. The write up says:

The Times denies that the reductions are in any way related to the newspaper’s AI initiatives.

And the check is in the mail.

I also noted:

A spokesman for the Times said the affected employees are being offered a buyout, and have nothing to do with the use of AI. “Last month, The Times’s newsroom made the difficult decision to reduce the size of its art production team with workflow changes to make photo toning and color correction work more efficient,” Charlie Stadtlander told TheWrap. “On May 30th, we offered generous voluntary buyouts for 9 employees to accept. These changes involve the adoption of new workflows and the expanded use of industry-standard tools that have been in use for years — they are not related to The Times’s AI efforts.”

Nope. Never. Impossible. Unthinkable.

What is the smart software identified as a staff reducer? It is Claro, but that is not the name of the company. The current name of the service is Pixometry, which is a mashup of Claro and Elpical. So what does this controversial smart software do? The firm’s Web site says:

Pixometry is the latest evolution of Claro, the leading automated image enhancement platform for Publishers and Retailers around the globe. Combining exceptional software with outstanding layered AI services, Pixometry delivers a powerful image processing engine capable of creating stunning looking images, highly accurate cut-outs and automatic keywording in seconds. Reducing the demands upon the Photoshop teams, Pixometry integrates seamlessly with production systems and prepares images for use in printed and digital media.

The Pixometry software delivers:

Cloud based automatic image enhancement & visual asset management solutions for publishers & retail business.

Its functions include:

  • Automatic image “correction” because “real” is better than real
  • Automatic cut outs and key wording (I think a cut out is a background remover so a single image can be plucked from a “real” photo)
  • Consistent, high quality results. None of that bleary art director eye contact.
  • Multi-channel utilization. The software eliminates telling a Photoshop wizard I need a high-res image for the magazine and then a 96 spot-per-inch version for the Web. How long will that take? What? I need the images now. (A toy sketch of this multi-channel idea appears after this list.)
  • Applied AI image intelligence. Hey, no hallucinations here. This is “real” image enhancement and better than what those Cooper Union space cadets produce when they are not wandering around looking for inspiration or whatever.
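
For the curious, the multi-channel bullet boils down to a loop a script can run. Here is a toy sketch using the Pillow imaging library; it is not Pixometry’s code, and the file names, sizes, and DPI values are invented for illustration.

    # Toy sketch of automated multi-channel image preparation, assuming the
    # Pillow library. Not Pixometry's code; names and numbers are made up.
    from PIL import Image, ImageOps

    def prepare_renditions(path):
        image = Image.open(path)
        # Crude stand-in for automatic "correction": stretch the contrast.
        enhanced = ImageOps.autocontrast(image)
        # High-resolution rendition for the magazine.
        enhanced.save("print_" + path, dpi=(300, 300))
        # Smaller, 96 dpi rendition for the Web. thumbnail() resizes in place.
        web = enhanced.copy()
        web.thumbnail((1200, 1200))
        web.save("web_" + path, dpi=(96, 96))

    prepare_renditions("mona_lisa.jpg")

No waiting, no Photoshop wizard, no vacays. Those are the economics the union is facing.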

Does that sound like reality shaping or deep fake territory? Hmmm. That’s a question none of the hair-on-fire write ups addresses. But if you are a Photoshop and Lightroom wizard, the software means hasta la vista in my opinion. Smart software may suck at office parties but it does not require vacays, health care (just minor system updates), or unions. Software does not argue, wear political buttons, or sit around staring into space because of a late night at the “library.”

Pretty obscure unless you are a Photoshop wizard. The Pixometry Web site explains that it provides a searchable database of images and what looks like one click enhancement of images. Hey, every image needs a bit of help to be “real”, just like “real” news and “real” management explanations. The Pixometry Web site identifies some organizations as “loving” Pixometry; for example, the star-crossed BBC, News UK, El Mercurio, and the New York Times. Yes, love!

Let’s recap. Most of the reporting about this use of applied smart software gets the name of the system wrong. None of the write ups point out that art director functions in the hands of a latte guzzling professional are not quick, easy, or without numerous glitches. Furthermore, the humans in the “art” department must be managed.

The NYT is, it appears, trying to do the two-step around software that is better, faster, and cheaper than the human-powered options. Other observations are:

  1. The fast-talking is not going to change the economic benefit of smart software.
  2. The notion of a newspaper fixing up photos underscores that deep fakes have permeated institutions which operate as if it were 1923 skidoo time
  3. The skilled and semi-skilled workers in knowledge industries may taste blood when the titanium snake of AI bites them on the ankle. Some bites will be fatal.

Net net: Being up front may have some benefits. Skip the old soft shoe, please.

Stephen E Arnold, June 17, 2024

Wow, Criticism from Moscow

June 17, 2024

dinosaur30a_thumb_thumbThis essay is the work of a dinobaby. Unlike some folks, no smart software improved my native ineptness.

I read “Edward Snowden Eviscerates OpenAI’s Decision to Put a Former NSA Director on Its Board: This Is a Willful, Calculated Betrayal of the Rights of Every Person on Earth.” The source is the interesting public figure Edward Snowden. He rose to fame by violating the secrecy requirements imposed by the US government on individuals with access to sensitive, classified, or top secret information. He then ended his dalliance with “truth” by relocating to Russia. From that bastion of truth and justice, he gives speeches and works (allegedly) at a foundation. He is a symbol of modern something. I find him a fascinating character, complete with the on-again, off-again glasses and his occasional comments about security. He is an expert on secrets it seems.

image

Thanks, MSFT Copilot.

Fortune Magazine obviously views him as a way to get clicks, sell subscriptions, and cement its position as a source of high-value business information. I am not sure my perception of Fortune is congruent with that statement. Let’s look and see what Mr. Snowden’s “news” is telling Fortune to tell us, news that caused me to waste a perfectly good Saturday (June 14, 2024) morning writing about an individual who willfully broke the law and decamped to that progressive nation state so beloved by its neighbors in Eastern Europe.

Fortune reports:

“Do not ever trust OpenAI or its products,” the NSA employee turned whistleblower wrote on X Friday morning, after the company announced retired U.S. Army Gen. Paul Nakasone’s appointment to the board’s new safety and security committee. “There’s only one reason for appointing [an NSA director] to your board. This is a willful, calculated betrayal of the rights of every person on earth. You have been warned.”

Okay, I am warned. Several observations:

  1. Telegram, allegedly linked in financial and technical ways to Russia, recently began censoring the flow of information from Ukraine into Russia. Does Mr. Snowden have an opinion about that interesting development? Telegram told Tucker Carlson that it embraced freedom. Perhaps OpenAI is simply being pragmatic in the Telegram manner?
  2. Why should Mr. Snowden’s opinion warrant coverage in Fortune Magazine? Oh, sorry. I answered that already. Fortune wants clicks, money, and to be perceived as relevant. News flash: Publishing has changed. Please, tape the memo to your home office wall.
  3. Is Mr. Snowden correct? I am neither hot nor cold when it comes to Sam AI Man, the Big Dog at OpenAI. My thought is that OpenAI might be taking steps to understand how much value its information can deliver to the US government once the iPhone magic moves from “to be” to reality. Most Silicon Valley outfits are darned clumsy in their response to warrants. Maybe OpenAI’s access to someone who knows interesting information can be helpful to the company and ultimately to its users who reside in the US?

Since 2013, the “Snowden thing” has created considerable ripples. If one accepts Mr. Snowden’s version of events, he is a hero. As such, shouldn’t he be living in the US, interacting with journalists directly, not virtually, and presenting his views to the legal eagles who want to have a chat with him? Mr. Snowden’s response is to live in Moscow. It is okay in the spring and early summer. The rest of the year can be brutal. But there’s always Sochi for a much-needed vacay and the wilds of Siberia for a bit of prison camp exploration.

Moscow has its charms and an outstanding person like Mr. Snowden. Thanks, Fortune, for reminding me how important his ideas and laptop stickers are. I like the “every person on earth.” That will impress people in Latvia.

Stephen E Arnold, June 17, 2024

Hallucinations in the Courtroom: AI Legal Tools Add to Normal Wackiness

June 17, 2024

Law offices are eager to lighten their humans’ workload with generative AI. Perhaps too eager. Stanford University’s HAI reports, “AI on Trial: Legal Models Hallucinate in 1 out of 6 (or More) Benchmarking Queries.” Close enough for horseshoes, but for justice? And that statistic is with improved, law-specific software. We learn:

“In one highly-publicized case, a New York lawyer faced sanctions for citing ChatGPT-invented fictional cases in a legal brief; many similar cases have since been reported. And our previous study of general-purpose chatbots found that they hallucinated between 58% and 82% of the time on legal queries, highlighting the risks of incorporating AI into legal practice. In his 2023 annual report on the judiciary, Chief Justice Roberts took note and warned lawyers of hallucinations.”

But that was before tailor-made retrieval-augmented generation tools. The article continues:

“Across all areas of industry, retrieval-augmented generation (RAG) is seen and promoted as the solution for reducing hallucinations in domain-specific contexts. Relying on RAG, leading legal research services have released AI-powered legal research products that they claim ‘avoid’ hallucinations and guarantee ‘hallucination-free’ legal citations. RAG systems promise to deliver more accurate and trustworthy legal information by integrating a language model with a database of legal documents. Yet providers have not provided hard evidence for such claims or even precisely defined ‘hallucination,’ making it difficult to assess their real-world reliability.”
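
To make the claim concrete, here is a minimal sketch of the RAG pattern the quote describes. It is not LexisNexis’s or Thomson Reuters’s implementation; embed, vector_store, and generate are hypothetical stand-ins for an embedding model, a legal document index, and a language model.

    # Minimal RAG sketch, not any vendor's code. The callables passed in are
    # hypothetical stand-ins for an embedding model, a document index, and
    # a large language model.
    def rag_answer(question, embed, vector_store, generate, top_k=5):
        # 1. Retrieve: find the passages most similar to the question.
        query_vector = embed(question)
        passages = vector_store.search(query_vector, top_k=top_k)
        # 2. Augment: ground the prompt in the retrieved text, with citations.
        context = "\n\n".join(f"[{p.citation}] {p.text}" for p in passages)
        prompt = (
            "Answer using only the sources below, and cite them.\n\n"
            f"Sources:\n{context}\n\nQuestion: {question}"
        )
        # 3. Generate: retrieval narrows the model's inputs, but the model
        #    can still answer wrongly or lean on an irrelevant source.
        return generate(prompt)

Note that nothing in step 3 guarantees the answer is grounded in the retrieved text, which is exactly the weakness the Stanford team measured.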

So the Stanford team tested three of the RAG systems for themselves: Lexis+ AI from LexisNexis, plus Westlaw AI-Assisted Research and Ask Practical Law AI from Thomson Reuters. The authors note they are not singling out LexisNexis or Thomson Reuters for opprobrium. On the contrary, these tools are less opaque than their competition and so more easily examined. They found that these systems are more accurate than general-purpose models like GPT-4. However, the authors write:

“But even these bespoke legal AI tools still hallucinate an alarming amount of the time: the Lexis+ AI and Ask Practical Law AI systems produced incorrect information more than 17% of the time, while Westlaw’s AI-Assisted Research hallucinated more than 34% of the time.”

These hallucinations come in two flavors. Many responses are flat out wrong. Others are misgrounded: they are correct about the law but cite irrelevant sources. The authors stress this second type of error is more dangerous than it may seem, for it may lure users into a false sense of security about the tool’s accuracy.

The post examines challenges particular to RAG-based legal AI systems and discusses responsible, transparent ways to use them, if one must. In short, it recommends public benchmarking and rigorous evaluations. Will law firms listen?

Cynthia Murrell, June 17, 2024

A Fancy Way of Saying AI May Involve Dragons

June 14, 2024

This essay is the work of a dinobaby. Unlike some folks, no smart software improved my native ineptness.

The essay “What Apple’s AI Tells Us: Experimental Models” makes clear that pinning down artificial intelligence is proving to be more difficult than some anticipated in January 2023, when Google’s Red Alert squawked and many people said, “AI is the silver bullet I want for my innovation cannon.”


Image source: https://www.geographyrealm.com/here-be-dragons/

Here’s a sentence I found important in the One Useful Thing essay:

What is worth paying attention to is how all the AI giants are trying many different approaches to see what works.

The write up explains the different approaches to AI that the author has identified. These are:

  1. Apps
  2. Business models with subscription fees

The essay concludes with a specter “haunting AI.” The write up says:

I do not know if AGI [artificial general intelligence] is achievable, but I know that the mere idea of AGI being possible soon bends everything around it, resulting in wide differences in approach and philosophy in AI implementations.

Today’s smart software environment has an upside other than the money churn the craziness vortices generate:

Having companies take many approaches to AI is likely to lead to faster adoption in the long term. And, as companies experiment, we will learn more about which sets of models are correct.

Several observations are warranted.

First, the confessions of McKinsey’s AI team make it clear that smart outfits may not know what they are doing. The firms just plunge forward and then after months of work recycle the floundering into lessons. Presumably these lessons are “hire McKinsey.” See my write up “What Is McKinsey & Co. Telling Its Clients about AI?”

Second, another approach is to use AI in the hopes that staff costs can be reduced. I think this is the motivation of some AI enthusiasts. PwC (I am not sure if it is a consulting firm, an accounting firm, or some 21st century mutation) fell in lust with OpenAI. Not only did the firm kick OpenAI’s tires, PwC signed up to be what’s called an “enterprise reseller.” A client pays PwC to just make something work. In this case, PwC becomes the equivalent of a fix it shop with a classy address and workers with clean fingernails. The motivation, in my opinion, is cutting staff. “PwC Is Doing Quiet Layoffs. It’s a Brilliant Example of What Not to Do” says:

This is PwC in the U.K., and obviously, they operate under different laws than we do here in the United States. But in case you’re thinking about following this bad example, I asked employment attorney Jon Hyman for advice. He said, "This request would seem to fall under the umbrella of ‘protected concerted activity’ that the NLRB would take issue with. That said, the National Labor Relations Act does not apply to supervisors — defined as one with the authority to make personnel decisions using independent judgment. "Thus," he continues, "whether this specific PwC request runs afoul of the NLRA’s legal protections for employees to engage in protected concerted activity would depend on whether the laid-off employees were ‘supervisors’ under the Act."

I am a simpler person. The quiet layoffs complement the AI initiative. Quiet helps keep staff from making the connection I am suggesting. But consulting firms keep one eye on expenses and the other on partners’ profits. AI is a catalyst, not a technology.

Third, more AI fatigue write ups are appearing. One example, “The AI Fatigue: Are We Getting Tired of Artificial Intelligence?”, reports:

Hema Sridhar, Strategic Advisor for Technological Futures at the University of Auckland, says that there is a lot of “noise on the topic” so it is clear that “people are overwhelmed”. “Almost every company is using AI. Pretty much every app that you’re currently using on your phone has recently released some version with some kind of AI-feature or AI-enhanced features,” she adds. “Everyone’s using it and [it’s] going to be part of day-to-day life, so there are going to be some significant improvements in everything from how you search for your own content on your phone, to more improved directions or productivity tools that just fundamentally change the simple things you do every day that are repetitive.”

Let me reference Apple Intelligence to close this write up. Apple did not announce hardware. It talked about “to be” services. Instead of doing the Meta open source thing, the Google wrong answers with historically flawed images, or the MSFT on-again, off-again roll outs — Apple just did “to be.”

My hunch is that Apple is not cautious; its professionals know that AI products and services may be like those old maps which say, “Here be dragons.” Sailing close to the shore makes sense.

Stephen E Arnold, June 14, 2024
