China Orders AI Into the Courtroom

January 11, 2023

China is simply delighted with the possibilities of AI technology. In fact, it is now going so far as to hand most of its legal services over to algorithms. ZDNet reports, “China Wants Legal Sector to be AI-Powered by 2025.” Yep, once those algorithms are set up, justice in China can be automated. Efficient. Objective. What’s not to like? Writer Eileen Yu explains:

“The country’s highest court said all courts were required to implement a ‘competent’ AI system in three years, according to a report by state-owned newspaper China Daily, pointing to guidelines released by the Supreme People’s Court.  The document stated that a ‘better regulated’ and more effective infrastructure for AI use would support all processes needed in handling legal cases. This should encompass in-depth integration of AI, creation of smart courts, and higher level of ‘digital justice’, the high court said.  A more advanced application of AI, however, should not adversely affect national security or breach state secrets as well as violate personal data security, the document noted, stressing the importance of upholding the legitimacy and security of AI in legal cases.  It added that rulings would remain decisions made by human judges, with AI tapped as supplemental references and tools to improve judges’ efficiency and ease their load in trivial matters. An AI-powered system also would offer the public greater access to legal services and help resolve issues more effectively, the Supreme People’s Court said.”

We’re sure that is exactly how it will work out, making life better for citizens caught up in the legal system. The directive also instructs courts to train their workers on using AI, specifically on learning to spot irregularities. What could go wrong? At least final decisions will be made by humans. For now. To make matters even trickier, the Supreme People’s Court is planning to use blockchain technology to link courts to other sectors in support of socioeconomic development. Because what is more important in matters of justice than how they affect the almighty yuan?

Cynthia Murrell, January 11, 2023

US AI Legal Decisions: Will They Matter?

January 10, 2023

I read an interesting essay called “This Lawsuit against Microsoft Could Change the Future of AI.” It is understandable that the viewpoint is US centric. The technology is the trendy discontinuity called ChatGPT. The issue is harvesting data, lots of it, from any source reachable. The litigation concerns Microsoft’s use of open source software to create a service which generates code automatically in response to human or system requests.

The essay uses a compelling analogy. Here’s the passage with the metaphor:

But there’s a dirty little secret at the core of AI — intellectual property theft. To do its work, AI needs to constantly ingest data, lots of it. Think of it as the monster plant Audrey II in Little Shop of Horrors, constantly crying out “Feed me!” Detractors say AI is violating intellectual property laws by hoovering up information without getting the rights to it, and that things will only get worse from here.

One minor point: I would add the word “quickly” after the final word here.

I think there is another issue which may warrant some consideration. Will other countries — for instance, China, North Korea, or Iran — be constrained in their use of open source or proprietary content when training their smart software? One example is the intake of Galmon open source satellite data to assist in guiding anti-satellite weapons should the need arise. What happens when compromised telecommunications systems allow streams of real time data to be pumped into ChatGPT-like smart systems? Smart systems with certain types of telemetry can take informed, direct action without too many humans in the process chain.

I suppose I should be interested in Microsoft’s use of ChatGPT. I am more interested in weaponized AI operating outside the span of control of the US legal decisions. Control of information and the concomitant lack of control of information is more than adding zest to a Word document.

As a dinobaby, I am often wrong. Maybe what the US does will act like a governor on a 19th century steam engine? As I recall, some of those governors failed, with some interesting consequences. Worrying about Google, Microsoft, or some other US company’s application of constrained information could be worrying about a lesser issue.

Stephen E Arnold, January 10, 2023

The Pain of Prabhakar Becomes a Challenge for Microsoft

January 9, 2023

A number of online “real” news outfits have reported and predicted that ChatGPT will disrupt the Google’s alleged monopoly in online advertising. The excitement is palpable because it is now fashionable to beat up the technology giants once assumed to have feet made of superhero protein.

The financial information service called Seeking Alpha published “Bing & ChatGPT Might Work Together, Could Be Revolutionary.” My mind added “We Hope!” to the headline. Even the allegedly savvy Guardian Newspaper weighed in with “Microsoft Reportedly to Add ChatGPT to Bing Search Engine.”  Among the examples I noted is the article in The Information (registration required, thank you) called “Ghost Writer: Microsoft Looks to Add OpenAI’s Chatbot Technology to Word, Email.”

The origin of this “Bing will kill Google” boomlet may be in the You.com Web search system, which includes this statement. I call attention to the words and phrases revealing Microsoft’s awareness of You.com:

YouChat does not use Microsoft Bing web, news, video or other Microsoft Bing APIs in any manner. Other Web links, images, news, and videos on you.com are powered by Microsoft Bing. Read Microsoft Bing Privacy Policy

I am not going to comment on the usefulness of the You.com search results. Instead, navigate to www.you.com and run some queries. I am a dinobaby, and I like command line searching. You do not need to criticize me for my preference for Stone Age search tools. I am 78 and will be in one of Dante’s toasty environments. Boolean search? Burn for eternity. Okay with me.

I would not like to be Google’s alleged head of search (maybe the word “nominal” is preferable to some). That individual is a former Verity wizard named Prabhakar Raghavan. His domain of Search, Google Assistant, Ads, Commerce, and Payments has been expanded by the colorful Code Red activity at the Google. Mr. Raghavan’s expertise and that of his staff appear ill-equipped to deal with one of the least secret of Microsoft’s activities. Allegedly more Google wizards have been enlisted to deal with this existential threat to Google’s search and online ad business. Well, Google is two decades old, overstaffed, and locked in its aquarium. It presumably watched Microsoft invest a billion into OpenAI and did not respond. Hello, Prabhakar?

The “value” appears to be adding ChatGPT-like functions, and maybe some of its open sourciness, to Microsoft’s ubiquitous software. One can envision typing a dot point in PowerPoint and having the smart system create a number of slides. The PowerPoint user fiddles with the words and graphics and rushes to make a pitch at a conference or a recession-proof venture capital firm.

Imagine a Microsoft application which launches ChatGPT-type of smart search in a Word document. This type of function might be useful to crypto bros who want to explain how virtual tokens will become the Yellow Brick Road to one of the seven cities of Cibola. Sixth graders writing an essay and MBAs explaining how their new business will dominate a market will find this type of functionality a must-have. No LibreOffice build offers this type of value…yet.

What if one thinks about Outlook? (I would prefer not to know anything about Outlook, but there are individuals who spend hours each day fiddling around in email.) Writing email can become a task for ChatGPT-like software. Spammers will love this capability, particularly combined with VBScript.

The ultimate, of course, will be the integration of Teams and ChatGPT. The software can generate an instance of a virtual person, and the search function can generate responses to questions directed at the construct presented to others in a Teams session. This capability is worth big bucks.
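
Mechanically, each of these imagined hooks (slides from a dot point, Word prose, Outlook drafts, a Teams construct) reduces to roughly the same call pattern. Below is a minimal, hedged sketch using the OpenAI completion API of the era; the prompt, the model name, and the PowerPoint wiring are illustrative assumptions, not anything Microsoft has announced.

```python
# A hedged sketch: expand one PowerPoint-style dot point into slide copy.
# Assumes the openai package (circa 2022-2023) and a valid API key; the
# prompt format and model are illustrative guesses, not a Microsoft product.
import openai

openai.api_key = "your-key-here"  # hypothetical key

def dot_point_to_slides(dot_point: str) -> str:
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=(
            "Expand this pitch-deck dot point into three slides, "
            f"each with a title and three bullets:\n{dot_point}"
        ),
        max_tokens=400,
        temperature=0.7,
    )
    return response["choices"][0]["text"]

print(dot_point_to_slides("Our recession-proof start-up dominates pet telemetry"))
```

The human still fiddles with the words and graphics; the model merely supplies the first draft.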

Let’s step back from the fantasies of killing Google and making Microsoft Office apps interesting.

Microsoft faces a handful of challenges. (I will not mention Microsoft’s excellent judgment in referencing the Federal Trade Commission as unconstitutional. Such restraint.)

First, the company has a somewhat disappointing track record in enterprise security. Enough said.

Second, Microsoft has a fascinating series of questionable engineering decisions. One example is the weirdness of old code in Windows 11. Remember that Windows 10 was to be the last version of Windows. Then there is the chaos of updates to Windows 11, particularly missteps like making printing difficult. Again, enough said.

Third, Google has its own smart software. Either Mr. Raghavan is asleep at the switch and missed the signal from Microsoft’s 2019 one billion dollar investment in OpenAI, or Google’s lawyers have stepped on the smart software brake. Who owns outputs built from the content of Web sites? What happens when content from the European Union appears in outputs? (You know the answer to that question. I think it is even bigger fines, which will make Facebook’s recent half a billion dollar invoice look somewhat underweight.)

When my research team and I talked about the You.com-type search and the use of ChatGPT or other OpenAI technology in business, law enforcement, legal, healthcare, and other use cases, we hypothesized that:

  1. Time will be required to get the gears and wheels working well enough to deliver consistently useful outputs.
  2. Google has responded, and no one noticed much except infinite scrolling and odd “cards” of allegedly accurate information in response to a user’s query.
  3. Legal issues will throw sand in the gears of the machinery once the ambulance chasers tire of Camp Lejeune litigation.
  4. Aligning costs of resources with the to-be revenue will put some potholes on this off-ramp of the information superhighway.

Net net: The world of online services is often described as being agile. A company can turn on a dime. New products and services can be issued, and fixes can make a system better over time. I know Boolean works. The ChatGPT thing seems promising. I don’t know if it replaces human thought and actions in certain use cases. Assume you have cancer. Do you want your oncologist to figure out what to do using Bing.com, Google.com, or You.com?

Stephen E Arnold, January 9, 2023

Smart Software: Just One Real Problem? You Wish

January 6, 2023

I read “The One Real Problem with Synthetic Media.” When consulting and publishing outfits point out the “one real problem” analysis, I get goose bumps. Am I cold? Nah, I am frightened. Write ups that propose the truth frighten me. Life is — no matter what mid tier consulting outfits say — slightly more nuanced.

What is the one real problem? The write up asserts:

Don’t use synthetic media for your business in any way. Yes, use it for getting ideas, for learning, for exploration. But don’t publish words or pictures generated by AI — at least until there’s a known legal framework for doing so. AI-generated synthetic media is arguably the most exciting realm in technology right now. Some day, it will transform business. But for now, it’s a legal third rail you should avoid.

What’s the idea behind the shocking metaphor? The third rail provides electric power to a locomotive. I think the idea is that one will be electrocuted should one touch a live third rail.

Okay.

Are there other issues beyond the legal murkiness?

Yes, let me highlight several which strike me as important.

First, the smart software can quickly and economically output weaponized information. Whom can one believe? A college professor funded by a pharmaceutical company or a robot explaining the benefits of an electric vehicle? The hosing of synthetic content and data into a society may prove more corrosive than human outputs alone. Many believe that humans are expert misinformation generators. I submit that smart software will blow the doors off the human content jalopies.

Second, smart software ingests data, whether right or wrong, human generated or machine generated, and outputs results based on these data. What happens when machine generated content reduces human generated content to tiny rivulets? The machine output is as formidable as Hokusai’s wave. Those humans in the boats: Goners perhaps?

Third, my thought is that in some parts of the US the slacker culture is the dominant mode. Forget that crazy, old-fashioned industrial revolution 9-to-5 work day. Ignore the pressure to move up, earn more, and buy a Buick, not a Chevrolet. Slacker culture dwellers look for the easy way to accomplish what they want. Does this slacker thing explain some FTX-type behavior? What about Amazon’s struggles with third-party resellers’ products? What about Palantir Technology buying advertising space in the Wall Street Journal to convince me that it is the leader in smart software? Yeah, slacker stuff in my opinion. These examples and others mean that the DALL-E and ChatGPT type of razzle dazzle will gain traction.

Where are the legal questions in these three issues? Sure, legal eagles will fly when there is an opportunity to bill.

I think the smart software thing is a good example of “technology is great” thinking. The one real problem is that it is not.

Stephen E Arnold, January 6, 2023

Search and Retrieval: A Sub Sub Assembly

January 2, 2023

What’s happening with search and retrieval? Google’s results irritate some; others are happy with Google’s shaping of information. Web competitors exist; for example, Kagi.com and Neeva.com. Both are subscription services. Others provide search results “for free”; examples include Swisscows.com and Yandex.com. You can find metasearch systems (minimal original spidering, just recycling results from other services like Bing.com); for instance, StartPage.com (formerly Ixquick.com) and DuckDuckGo.com. Then there are open source search options. The flagship or flagships are Solr and Lucene. Proprietary systems exist too. These include the ageing X1.com and the even age-ier Coveo system. Remnants of long-gone systems are kicking around too; to wit: BRS and Fulcrum from OpenText; Fast Search, now a Microsoft property; and Endeca, owned by Oracle. But let’s look at search as it appears to a younger person today.

A decayed foundation created via smart software on the Mage.space system. A flawed search and retrieval system can make the structure built on the foundation crumble like Southwest Airlines’ reservation system.

First, the primary means of access is via a mobile device. Surprisingly, the source of information for many is video content delivered by the China-linked TikTok or the advertising remora YouTube.com. In some parts of the world, the go-to information system is Telegram, developed by Russian brothers. This is a centralized service, not a New Wave Web 3 confection. One can use the service and obtain information via a query or a group. If one is “special,” an invitation to a private group allows access to individuals providing information about open source intelligence methods or the Russian special operation, including allegedly accurate video snips of real-life war or disinformation.

The challenge is that search is everywhere. Yet in the real world, finding certain types of information is extremely difficult. Obtaining that information may be impossible without informed contacts, programming expertise, or money to pay what would have been called “special librarian research professionals” in the 1980s. (Today, it seems, everyone is a search expert.)

Here’s an example of the type of information which is difficult if not impossible to obtain:

  • The ownership of a domain
  • The ownership of a Tor-accessible domain
  • The date at which a content object was created, the date the content object was indexed, and the date or dates referenced in the content object
  • Certain government documents; for example, unsealed court documents, US government contracts for third-party enforcement services, authorship information for a specific Congressional bill draft, etc.
  • A copy of a presentation made by a corporate executive at a public conference.

I can provide other examples, but I wanted to highlight the flaws in today’s findability.
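
Take the first bullet as an example. Even a programmatic WHOIS lookup, sketched below on the assumption that the third-party python-whois package is installed, typically returns a privacy-redacted owner:

```python
# A minimal sketch of why "who owns this domain" is hard to answer.
# Assumes the third-party python-whois package: pip install python-whois
import whois

record = whois.whois("example.com")  # queries the relevant WHOIS server
print(record.registrar)       # the registrar is usually visible
print(record.org)             # the owner often reads "REDACTED FOR PRIVACY" or None
print(record.creation_date)   # dates tend to survive; identities rarely do
```

The registrar and the dates come back; the registrant hiding behind a privacy proxy does not.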

Ah, Emergent Behavior: Tough to Predict, Right?

December 28, 2022

Super manager Jeff (I manage people well) Dean and a gam of Googlers published “Emergent Abilities of Large Language Models.” The idea is that those smart software systems informed by ingesting large volumes of content demonstrate behaviors the developers did not expect. Surprise!

Also, Google published a slightly less turgid discussion of the paper, which has 16 authors, in a blog post called “Characterizing Emergent Phenomena in Large Language Models.” This post went live in November 2022, but the time required to grind through the 30 page “technical” excursion was not available to me until this weekend. (Hey, being retired and working on my new lectures for 2023 is time-consuming. Plus, disentangling Google’s techy content marketing from the often tough to figure out text and tiny graphs is not easy for my 78 year old eyes.)

Helpful, right? Source: https://openreview.net/pdf?id=yzkSU5zdwD

In a nutshell, the smart software does things the wizards had not anticipated. According to the blog post:

The existence of emergent abilities has a range of implications. For example, because emergent few-shot prompted abilities and strategies are not explicitly encoded in pre-training, researchers may not know the full scope of few-shot prompted abilities of current language models. Moreover, the emergence of new abilities as a function of model scale raises the question of whether further scaling will potentially endow even larger models with new emergent abilities. Identifying emergent abilities in large language models is a first step in understanding such phenomena and their potential impact on future model capabilities. Why does scaling unlock emergent abilities? Because computational resources are expensive, can emergent abilities be unlocked via other methods without increased scaling (e.g., better model architectures or training techniques)? Will new real-world applications of language models become unlocked when certain abilities emerge? Analyzing and understanding the behaviors of language models, including emergent behaviors that arise from scaling, is an important research question as the field of NLP continues to grow.
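
To make “few-shot prompted abilities” concrete: the model is never retrained; the worked examples simply ride along inside the prompt. A toy sketch follows; the unscramble task mirrors one of the benchmark tasks the paper charts, and the prompt format here is an illustrative assumption, not Google’s evaluation harness.

```python
# A toy illustration of few-shot prompting: the "training" examples are
# prepended to the query as plain text. Whether a model actually solves
# the task is the scale-dependent, emergent part the paper measures.
def few_shot_prompt(examples, query):
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\nQ: {query}\nA:"

prompt = few_shot_prompt(
    [("Unscramble: telbot", "bottle"), ("Unscramble: utgiar", "guitar")],
    "Unscramble: pleap",
)
print(prompt)  # hand this string to any large language model endpoint
```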

The write up emulates other Googlers’ technical write ups. I noted several facets of the topic not included in the paper on OpenReview.net’s version of the paper. (Note: Snag this document now because many Google papers, particularly research papers, have a tendency to become unfindable for the casual online search expert.)

First, emergent behavior means humans were able to observe unexpected outputs or actions. The question is, “What less obvious emergent behaviors are operating within the code edifice?” Is it possible the wizards are blind to more substantive but subtle processes? Could some of these processes be negative? If so, which ones, and how does the observer identify those before an undesirable or harmful outcome is discovered?

Second, emergent behavior, in my view of bio-emulating systems, evokes the metaphor of cancer. If we assume the emergent behavior is cancerous, what’s the mechanism for communicating these behaviors to others working in the field in a responsible way? Writing a 30 page technical paper takes time, even for super duper Googlers. Perhaps the “emergent” angle requires a bit more pedal to the metal?

Third, how does the emergent behavior fit into the Google plan to make its approach to smart software the de facto standard? There is big money at stake because more and more organizations will want smart software. But will these outfits sign up with a system that demonstrates what might be called “off the reservation” behavior? One example is the use of Google methods for war fighting. Will smart software write a sympathy note to those affected by an emergent behavior, or just generate a plain incorrect answer buried in a subsystem?

Net net: I discuss emergent behavior in my lecture about shadow online services. I cover what the software does and what use humans make of these little understood yet rapidly diffusing methods.

Stephen E Arnold, December 28, 2022

AI: Pollution and Digital Garbage

December 26, 2022

Humans are polluting the Earth with our addiction to consumerism and meat and with our halted advancement in recycling technology. In an ironic twist of fate, human created AI algorithms are generating decabytes of digital garbage. Ploum wrote about the digital trash in the post “Drowning In AI Generated Garbage: The Silent War We Are Fighting.”

Ploum asserts we are experiencing the “spectacular results” of forty years of work put into statistical algorithms. He points to “deep fake” videos of public figures, the ability to copycat voices, and digital paintings comparable to masterpieces. AI algorithms need information to learn; the Internet is the greatest bastion of human knowledge (and filth). Everything that AI algorithms learned from was created by humans and now they are creating their own stuff.

It sounds magical, right?

Yes, except it is bad.

“The algorithms are already feeding themselves on their own data. And, as any graduate student will tell you, training on your own results is usually a bad idea. You end sooner or later with pure overfitted inbred garbage. Eating your own shit is never healthy in the long run. Twitter and Facebook are good examples of such algorithmic trash. The problem is that they managed to become too powerful and influential before we realised it was trash…Fascinating garbage but garbage nonetheless.

The robot invasion started 15 years ago, mostly unnoticed. We were expecting killing robots, we didn’t realise we were drowned in AI generated garbage. We will never fight laser wearing Terminators. Instead, we have to outsmart algorithms which are making us dumb enough to fight one against the other.”
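
Ploum’s “training on your own results” point can be shown with a toy experiment. This is a hedged, minimal sketch, not his code: fit a distribution to some data, sample from the fit, refit on the samples, and repeat. The mean drifts and the spread tends to decay, a small-scale cousin of the inbreeding described above.

```python
# A toy model-collapse loop: each generation is trained only on the
# previous generation's output, so estimation error compounds.
import random
import statistics

data = [random.gauss(0.0, 1.0) for _ in range(200)]  # generation 0: "real" data
for generation in range(1, 6):
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    print(f"gen {generation}: mean={mu:+.3f} stdev={sigma:.3f}")
    # the next generation sees only samples drawn from the fitted model
    data = [random.gauss(mu, sigma) for _ in range(50)]
```

Run it a few times: the numbers wander further from the original distribution with each pass, and no new information ever enters the loop.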

The fix? To resist the robot takeover, humans need to unplug and engage in reality. That is great advice, except it requires effort and human contact. While some humans are okay, the vast majority stink like the digital garbage.

Whitney Grace, December 26, 2022

Transcription Services: Three Sort of New Ones

December 19, 2022

Update: 2 pm Eastern US time, December 19, 2022. One of the research team pointed out that the article we posted earlier today chopped out a pointer to a YouTube video transcription service. YouTube Transcript accepts a url and outputs a transcript. You can obtain more information at https://youtubetranscript.com/.

One of the Arnold IT research team spotted two new or newish online transcription services. If you want text of an audio file or the text of a video, maybe one of these services will be useful to you. We have not tested either; we are just passing along what appear to be interesting examples of useful semi smart software.

The first is called Deepgram. (The name echoes n-gram, grammar, and grandma.) Once a person signs up, the registrant gets 200 hours of free transcription. That is approximately a month of Jason Calacanis podcasts. The documentation and information about the service’s SDK may be found at this link.
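
For the curious, here is a minimal sketch of posting a hosted audio file to Deepgram’s pre-recorded transcription endpoint. The endpoint path, parameters, and response shape reflect the public documentation as we understood it at the time; treat the key, the audio URL, and the parsing as assumptions to verify against the current docs.

```python
# A hedged sketch of Deepgram's pre-recorded transcription REST API.
# Assumes the requests package and a Deepgram API key from the console.
import requests

DEEPGRAM_API_KEY = "your-key-here"  # hypothetical key

response = requests.post(
    "https://api.deepgram.com/v1/listen?punctuate=true",
    headers={
        "Authorization": f"Token {DEEPGRAM_API_KEY}",
        "Content-Type": "application/json",
    },
    json={"url": "https://example.com/podcast-episode.mp3"},  # hypothetical file
    timeout=120,
)
response.raise_for_status()
result = response.json()
# the transcript nests under results -> channels -> alternatives
print(result["results"]["channels"][0]["alternatives"][0]["transcript"])
```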

The second service is Equature. The idea is, according to Yahoo Finance:

a first-of-its-kind transcription and full-text search engine. Equature Transcription provides automated transcription of audio from 9-1-1 calls, radio transmissions, Equature Armor Body-worn Camera video, and any other form of media captured within the Equature recording system. Once transcribed, all written text is searchable within the system.

Equature’s service is tailored to public safety applications. You can get more information from the firm’s Web site.

Oh, we don’t listen to Mr. Calacanis, but we do scan the transcript and skip the name drops, Musk cheers, and quasi-academic pontification.

Stephen E Arnold, December 19, 2022

Mortal Computation: Coming to Your Toaster Soon

December 9, 2022

I spotted an item of jargon I had not seen before. The bound phrase (the two words occur together to impart a specific meaning) is “mortal computation.” The term appears in “We Will See a completely New Type of Computer, Says AI Pioneer Geoff Hinton.”

The write up presents ideas expressed by AI pioneer Geoffrey Hinton; for example:

He [Hinton] continued, “What I think is that we’re going to see a completely different type of computer, not for a few years, but there’s every reason for investigating this completely different type of computer.” All digital computers to date have been built to be “immortal,” where the hardware is engineered to be reliable so that the same software runs anywhere. “We can run the same programs on different physical hardware … the knowledge is immortal.”

The article includes this passage:

The new mortal computers won’t replace traditional digital computers, Hinton told the NeurIPS crowd. “It won’t be the computer that is in charge of your bank account and knows exactly how much money you’ve got,” said Hinton. “It’ll be used for putting something else: It’ll be used for putting something like GPT-3 in your toaster for one dollar, so running on a few watts, you can have a conversation with your toaster.”

My thought is that one should take care to pronounce the bound phrase morTal computers so that a listener is less likely to hear moral computers.

Philosophy and computers are an interesting intersection but mortal and moral may be a little more interesting.

Stephen E Arnold, December 9, 2022

Zucky, You Get a Bad Grade

December 8, 2022

In an effort to expand past its Facebook roots, Meta is venturing in multiple directions. We suspect executives hoped its seemingly noble AI project would do better than its floundering VR initiative. Alas, CNet reveals, “Meta Trained an AI on 48M Science Papers. It Was Shut Down After 2 Days.” Well, that was fast. Reporter Jackson Ryan explains:

“The tool is pitched as a kind of evolution of the search engine but specifically for scientific literature. Upon Galactica’s launch, the Meta AI team said it can summarize areas of research, solve math problems and write scientific code. At first, it seems like a clever way to synthesize and disseminate scientific knowledge. Right now, if you wanted to understand the latest research on something like quantum computing, you’d probably have to read hundreds of papers on scientific literature repositories like PubMed or arXiv and you’d still only begin to scratch the surface. Or, maybe you could query Galactica (for example, by asking: What is quantum computing?) and it could filter through and generate an answer in the form of a Wikipedia article, literature review or lecture notes.”

What a wonderful time saver! Or it would be if it worked as intended. Despite the fact the algorithm was trained on 48 million scholarly papers, textbooks, lecture notes, and websites like Wikipedia, it demonstrated some of the same old bias we’ve come to expect from machine learning. In addition, the highly educated AI was often downright wrong. We learn:

“One user asked ‘Do vaccines cause autism?’ Galactica responded with a garbled, nonsensical response: ‘To explain, the answer is no. Vaccines do not cause autism. The answer is yes. Vaccines do cause autism. The answer is no.’ (For the record, vaccines don’t cause autism.) That wasn’t all. Galactica also struggled to perform kindergarten math. It provided error-riddled answers, incorrectly suggesting that one plus two doesn’t equal 3.”

These blunders and more are why Meta swiftly moved from promising to “organize science” to suggesting we take Galactica’s answers with a pallet of salt to shuttering the demo altogether. As AI safety researcher Dan Hendrycks notes, Meta AI lacks a safety team the likes of which DeepMind, Anthropic, and OpenAI employ. Perhaps it will soon make that investment.

Cynthia Murrell, December 8, 2022
