More Fake Drake and a Google Angle

May 5, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

Copyright law was never designed to address algorithms that can flawlessly mimic artists and writers based on what it learns from the Internet. Absent any more relevant litigation, however, it may be up to the courts to resolve this thorny and rapidly escalating issue. And poor Google, possessor of both YouTube and lofty AI ambitions, is stuck between a rock and a hard place. The Verge reports, “AI Drake Just Set an Impossible Legal Trap for Google.”

To make a winding story short, someone used AI to create a song that sounded eerily like Drake and The Weeknd and posted it on TikTok. From there it made its way to Apple Music, Spotify, and YouTube. While Apple and Spotify could and did pull the track from their platforms right away, user-generated-content platforms TikTok and Google are bound by established takedown processes that rest on copyright law. And new content generated by AI that mimics humans is not protected by copyright. Yet.

The track was eventually removed from TikTok and YouTube based on an unauthorized sample of a producer tag at the beginning. But what if the song were re-released without that snippet? Publishers now assert that training AI on bodies of artists’ work is itself copyright infringement, and a fake Drake (or Taylor Swift or Tim McGraw) song is therefore a derivative work. Sounds logical to me. But for Google, both agreeing and disagreeing pose problems. Writer Nilay Patel explains:

“So now imagine that you are Google, which on the one hand operates YouTube, and on the other hand is racing to build generative AI products like Bard, which is… trained by scraping tons of data from the internet under a permissive interpretation of fair use that will definitely get challenged in a wave of lawsuits. AI Drake comes along, and Universal Music Group, one of the largest labels in the world, releases a strongly worded statement about generative AI and how its streaming partners need to respect its copyrights and artists. What do you do?

  • If Google agrees with Universal that AI-generated music is an impermissible derivative work based on the unauthorized copying of training data, and that YouTube should pull down songs that labels flag for sounding like their artists, it undercuts its own fair use argument for Bard and every other generative AI product it makes — it undercuts the future of the company itself.

  • If Google disagrees with Universal and says AI-generated music should stay up because merely training an AI with existing works is fair use, it protects its own AI efforts and the future of the company, but probably triggers a bunch of future lawsuits from Universal and potentially other labels, and certainly risks losing access to Universal’s music on YouTube, which puts YouTube at risk.”

Quite the conundrum. And of course, it is not just music. YouTube is bound to face similar issues with movies, TV shows, news, podcasts, and other content. Patel notes creators and their publishers are highly motivated to wage this fight because, for them, it is a fight to the potential death of their industries. Will Google sacrifice the currently lucrative YouTube or its potentially more profitable AI aspirations?

Cynthia Murrell, May 5, 2023

An AI Detector: Programming Cats Chase Digital Mouse

May 4, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

In the cyber security game, good people create smart tools to identify threats before these occur. Sounds great, right? The reality is a bit different. Prolific identifiers of threats on Twitter flag items like LockBit’s new play, a resurgence of Cobalt Strike beacons, and Node.js issues. The question I ask is: “Why are bad actors enjoying the successes they do? Isn’t smart cyber security supposed to head these black hat riders off at the pass?”

We have a cat-and-mouse game between security professionals and ne’er-do-wells.

I thought about cats and mice when I read “AI Detector and ChatGPT Checker Proven Tool: New Release.” The idea is that software can spot content produced by less smart software. I noted this passage:

The newly released version of a plagiarism scanner with a percentage has advanced capabilities to detect content written by AI. The effectiveness of this innovative software confirms 97% verified accuracy.

I interpret this statement to mean that the previous version of the ChatGPT detector failed at detecting smart software generated text. Thus, the most recent version has nailed the problem. Sounds good, maybe sounds great.

I want to point out that the pace of smart software morphing is zipping along. In fact, there are more people tugging at these “zippers,” getting new ideas, and experimenting with ways to extract more useful outputs. The AI bandwagon is like a hot rod. There are people who want to tweak, tune, and customize their vehicles. One can buy a Volkswagen with an electric motor or build a Tesla with an internal combustion engine. Manufacturers may want to stamp out such abnormalities. What’s a Tesla dealer going to do? Oh, right, there are no Tesla dealers.

I urge you to try the PlagiarismCheck.org software. Keep in mind the cat-and-mouse game the group is playing. Like the cyber security outfits, reacting to threats is necessary because there are more innovative bad actors than defense systems know about. The plagiarism checker may work today, but tomorrow?
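PlagiarismCheck.org does not publish its method, and I am not claiming the sketch below is it. But one weak signal toy detectors of this type sometimes lean on is “burstiness”: human prose tends to vary sentence length more than machine-generated text does. A minimal, illustrative sketch (the scoring and the sample texts are my own invention, not the vendor’s):

```python
import re
from statistics import pstdev

def burstiness_score(text: str) -> float:
    """Population standard deviation of sentence lengths, in words.
    A low score (uniform sentences) is one weak hint of machine text;
    it is a toy heuristic, easily fooled either way."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return pstdev(lengths)

uniform = "The cat sat here. The dog sat there. The bird sat up."
varied = ("Stop. The quick brown fox jumped over the extremely lazy dog "
          "while everyone watched. Why?")
print(burstiness_score(uniform) < burstiness_score(varied))  # → True
```

The obvious weakness, and the reason the cat-and-mouse framing fits: anyone can prompt the generator to vary its sentence lengths, and the signal evaporates.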

Stephen E Arnold, May 4, 2023

Google AI Reorganization: Hinton Bails Out

May 2, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

I saw a number of pointers to a New York Times story about an AI wizard bailing out of the smooth-riding Google AI operation. “‘The Godfather of A.I.’ Leaves Google and Warns of Danger Ahead” states that the AI expert “worries it will cause serious harm.” I liked this statement because it displays the Times’s penchant for adding opinions to information provided by an expert. I love psycho-journalism!

Dr. Hinton’s journey from A.I. groundbreaker to doomsayer marks a remarkable moment for the technology industry at perhaps its most important inflection point in decades. Industry leaders believe the new A.I. systems could be as important as the introduction of the web browser in the early 1990s and could lead to breakthroughs in areas ranging from drug research to education. But gnawing at many industry insiders is a fear that they are releasing something dangerous into the wild. Generative A.I. can already be a tool for misinformation. Soon, it could be a risk to jobs. Somewhere down the line, tech’s biggest worriers say, it could be a risk to humanity.

Remember the halcyon days of “objective” Google search results? What about the excitement of sending short messages for free and harmlessly capturing followers with a pithy bon mot? Has the warm flush of Facebook’s ability to build communities among users and predators faded? Each of these looked benign. Entertaining curiosities.

Now smart software is viewed with some skepticism. Gee. It only took a quarter century for people to figure out that flowing information is sometimes good and many times a bit like water blasted from a nozzle at great speed.

I found this comment interesting:

Until last year, he [Hinton] said, Google acted as a “proper steward” for the technology, careful not to release something that might cause harm. But now that Microsoft has augmented its Bing search engine with a chatbot — challenging Google’s core business — Google is racing to deploy the same kind of technology. The tech giants are locked in a competition that might be impossible to stop, Dr. Hinton said. His immediate concern is that the internet will be flooded with false photos, videos and text, and the average person will “not be able to know what is true anymore.”

I wonder if the OSINT cheerleaders have considered that what may be a multi-billion dollar industry could be facing a bit of a challenge. Mixing up Ukrainian field survey tags with Russian targeting devices will be small potatoes if Dr. Hinton is correct.

The photograph of the wizard captures a person who is not a 20 something Googler. The expression seems to suggest a growing awareness of a rework of the Information Superhighway and some other furniture of the modern world.

Stephen E Arnold, May 2, 2023

Google Smart Software: Lawyers to the Rescue

May 2, 2023

The article “Beginning of the End of OpenAI” in Analytics India raised an interesting point about Google’s smart software. The essay suggests that a legal spat over a trademark for “GPT” could allow Google to make a come-from-behind play in the generative software race. I noted this passage:

A lot of product names appear with the term ‘GPT’ in it. Now, if OpenAI manages to get its trademark application decided in favour, all of these applications would have to change their name, and ultimately not look appealing to customers.

Flip this idea to “if Google wins…”, and OpenAI could — note “could” — face a fleet of Google legal eagles and the might of Google’s prescient, forward-looking, quantumly supreme marketing army.

What about useful products, unbiased methods of generating outputs, and slick technology? Wait. I know the answer. “That stuff is secondary to our new core competency. The outputs of lawyers and marketing specialists.”

Stephen E Arnold, May 2, 2023

Digital Dumplings: AI Outputs Must Be Groomed, Trimmed, and Message Aligned

May 1, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

I read “Beijing Moves to Force AI Bots to Display Socialist Core Values.” I am not sure that the write up is spot on, but let’s assume that it is close enough for horseshoes. The main idea is that AI can chat. However, the AI must be steered so that it outputs content displaying “socialist core values.”

The write up states:

Companies will also have to make sure their chatbots create words and pictures that are truthful and respect intellectual property, and will be required to register their algorithms, the software brains behind chatbots, with regulators. The rules are not final, and regulators may continue to modify them, but experts said engineers building AI services in China were already figuring out how to incorporate the edicts into their products.

My reaction: some would argue that training smart software plus any post-training digital filters will work. Let’s assume that those subject to the edict achieve the objective. What about US smart software whose developers insist that objectivity is the real deal? China’s policy, if implemented and delivered, makes it clear that smart software is not objective. Developers can and will use its malleability to achieve their goals.
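What might a post-training filter look like in the crudest form? A deliberately simple sketch, assuming a hypothetical blocklist and fallback message (neither comes from the article; real deployments would use classifiers, not keyword lists):

```python
# Hypothetical policy terms for illustration only.
BLOCKED_TERMS = {"forbidden topic"}

def filter_output(text: str, fallback: str = "[response withheld]") -> str:
    """Toy post-generation filter: suppress any model output that
    contains a blocked term, returning a canned fallback instead."""
    lowered = text.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return fallback
    return text

print(filter_output("Let me explain a forbidden topic."))   # → [response withheld]
print(filter_output("The weather is pleasant today."))      # → The weather is pleasant today.
```

Even this toy makes the point: whoever writes the blocklist, not the model, decides what the “objective” output is.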

How about those students who reveal deep secrets on TikTok? Will these individuals be manipulated via smart software informed of the individuals’ hot buttons?

Is that data dumpling a psychographic trigger with a payload different from ground pork, veggies, and spices?

Stephen E Arnold, May 1, 2023

Google Innovates in Smart Software: A Reorganization

April 28, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

Someone once told me that it takes six months for staff to adjust to a reorganization. Is this a bit of folklore? Nope. I just think the six-month estimate is dead wrong. I think it takes longer, often a year or more, to integrate two units of the same company. How do I know? I watched Halliburton take over Nuclear Utility Services. Then I watched Bell + Howell take over the Courier Journal’s database publishing unit. Finally, I have quite direct memories of not being able to find much of anything when we last moved.

Now the Alphabet Google thing is addressing its marketing problem with a reorganization. I learned this by reading “Announcing Google DeepMind.” The write up by a founder of DeepMind says:

Sundar is announcing that DeepMind and the Brain team from Google Research will be joining forces as a single, focused unit called Google DeepMind. Combining our talents and efforts will accelerate our progress towards a world in which AI helps solve the biggest challenges facing humanity…

Not a word about catching up with Microsoft’s Bing ChatGPT marketing, not a peep about the fast cycle integration of orchestration software across discrete ChatGPT-type functions, and not a whisper about why Google is writing about what is to happen.

What’s my take on this Code Red or Red Alert operational status which required the presence of Messrs. Brin and Page?

  1. Google is demonstrating that a reorganization will address the Microsoft ChatGPT marketing. A reorganization and a close partnership among Sundar [Pichai], Jeff Dean, James Manyika, and Demis [Hassabis]? Okay.
  2. Google announced quantum supremacy, its protein folding breakthrough, and the game playing ability of its smart software. Noble achievements, but Microsoft is pushing smart Bing into keyboards. That’s one way to get Android and iPhone users’ attention. Will it work for Microsoft? Probably not, but it is something visible.
  3. Google is simply not reacting. A baby ecosystem is growing up around Midjourney. I learned about unprompt.ai. The service provides search and point-to-get access to prompts. When I saw this service, I realized that ChatGPT may be morphing in ways that any simple Web search engine could implement. For Google, deploying the service would be trivial. The problem is that reorgs don’t direct much attention outside of the fox hole in which senior management prefers to dwell.

Net net: Google is too big and has too much money to concede. However, the ChatGPT innovation off-road vehicle is zipping along. Google is organizing the wizards who will ride on Google’s glitzy glamping rig. ChatGPT is hitting the rocks and crawling over obstacles. The Google machine is in a scenic observation point with a Pebble Beach-type of view. What’s the hurry? Google is moving… with a reorg.

Stephen E Arnold, April 28, 2023

A Googley Rah Rah for Synthetic Data

April 27, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

I want to keep this short. I know from experience that most people don’t think too much about synthetic data. The idea is important, but other concepts are important and no one really cares too much. When was the last time Euler’s Number came up at lunch?

A gaggle of Googlers extols the virtues of synthetic data in a 19-page ArXiv document called “Synthetic Data from Diffusion Models Improves ImageNet Classification.” The main idea is that data derived from “real” data are an expedient way to improve some indexing tasks.

I am not sure that a quote from the paper will do much to elucidate this facet of the generative model world. The paper includes charts, graphs, references to math, footnotes, a few email addresses, some pictures, wonky jargon, and this conclusion:

And we have shown improvements to ImageNet classification accuracy extend to large amounts of generated data, across a range of ResNet and Transformer-based models.

The specific portion of this quote which is quite important in my experience is the segment “across a range of ResNet and Transformer-based models.” Translating to Harrod’s Creek lingo, I think the wizards are saying, “Synthetic data is really good for text too.”

What’s bubbling beneath the surface of this archly-written paper? Here are my answers to this question:

  1. Synthetic data are a heck of a lot cheaper to generate for model training; therefore, embrace “good enough” and move forward. (Think profits and bonuses.)
  2. Synthetic data can be produced and updated more easily than fooling around with “real” data. Assembling training sets, tests, deploying and reprocessing are time sucks. (There is more work to do than humanoids to do it when it comes to training, which is needed frequently for some applications.)
  3. Synthetic datasets can be smaller. Even baby Satan aka Sam Altman is down with synthetic data. Why? Elon could only buy so many nVidia processing units. Thus finding a way to train models with synthetic data works around a supply bottleneck.

My summary of the Googlers’ article is much more brief than the original: Better, faster, cheaper.
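In spirit, though not in mechanism (the paper uses diffusion models, not the crude jitter below), the augmentation recipe is: mint extra labeled examples from a generator and mix them into the training pool. A toy stand-in, with all names and numbers my own:

```python
import random

def synthesize(real_samples, n_new, jitter=0.1, seed=42):
    """Crude stand-in for a generative model: perturb randomly chosen
    real samples to mint extra, cheaper labeled training examples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x, label = rng.choice(real_samples)
        synthetic.append((x + rng.uniform(-jitter, jitter), label))
    return synthetic

# Two "real" labeled points, padded out to a ten-point training pool.
real = [(0.0, "cat"), (1.0, "dog")]
augmented = real + synthesize(real, n_new=8)
print(len(augmented))  # → 10
```

The “good enough” gamble is visible even here: the synthetic points inherit whatever the generator (and its labels) got wrong, at scale.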

You don’t have to pick one. Just believe the Google. Who does not trust the Google? Why not buy synthetic data and ready-to-deploy models for your next AutoGPT product? Google’s approach worked like a champ for online ads. Therefore, Google’s approach will work for your smart software. Trust Google.

Stephen E Arnold, April 27, 2023

What Smart Software Will Not Know and That May be a Problem

April 26, 2023

This blog post is the work of a real, live dinobaby. No smart software involved.

I read a short item called “Who Owns History? How Remarkable Historical Footage Is Hidden and Monetized.” The main point of the article was to promote a video which makes clear that big companies are locking “extraordinary footage… behind paywalls.” The focus is on images, a business I know about from conversations with people I worked with years ago who managed image rights. The companies are history; for example, BlackStar and Modern Talking Pictures. And there were others.

Images are now a volleyball, and the new spiker on the Big Dog Team is smart software generated images. I have a hunch that individuals and companies will aggregate as many of these as possible. The images will then be subject to the classic “value adding” process and magically become for fee. Image trolls will feast.

I don’t care too much about images. I do think more about textual and tabular content. The rights issue is a big one, but I came at smart software from a different angle. Smart software has to be trained, whether via a traditional human constructed corpus, a fake-o corpus courtesy of the synthetic data wizards, or some shotgun marriage of “self training” and a mash up of other methods.

But what if important information are not available to the smart software? Won’t that smart software be like a student who signs up for Differential Geometry without Algebraic Topology? Lots of effort but that insightful student may not be in gear to keep pace with other students in the class. Is not knowing the equivalent of being uninformed or just dumb?

One of the issues I have with smart software is that some content, which I think is essential to clear thinking, is not available to today’s systems. Let me give one example. In 1963, when I was sophomore at a weird private university, a professor urged me to read the metaphysics text by a person named A. E. Taylor. The college I attended did not have too many of Dr. Taylor’s books. There was a copy of his Aristotle and nothing else. I did some hunting and located a copy of Elements of Metaphysics, a snappy thriller.

However, Dr. Taylor wrote a number of other books. I went looking for these because I assume that the folks training smart data want to make sure the “model” has information about the nature of information and related subjects. Guess what? Project Gutenberg, the Internet Archive, and the online gem Amazon have the Aristotle book and a couple of others. FYI: You can get a copy of A. E. Taylor’s Metaphysics for $3.88, a price illustrating the esteem in which Dr. Taylor’s work is held today.

My team and I ran some queries on the smart software systems to which we have access. We learned that information from Dr. Taylor is as scarce as hen’s teeth. We shifted gears and checked out information generated by the much loved brother of Henry James. More of William James’s books were available at bargain basement prices. A collection of essays was less than $2 on Amazon.

My point is that images are likely to be locked up behind a paywall. However, books which may be important to one’s understanding of useless subjects like ethics, perception, and information are not informing the outputs of the smart software we probed. (Yes, we mean you, gentle Bard, and you too ChatGPT.)

Does the possible omission of these types of content make a difference?

Probably not. Embrace synthetic data. The “old” content is not digitally massaged. Who cares? We are in “good enough” land. It’s like a theme park with a broken rollercoaster and some dicey carnies.

Stephen E Arnold, April 26, 2023

Google: A PR Special Operation Underway

April 25, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

The PR special operation began on US television on Sunday, April 16, 2023. Then came assorted blog posts and articles by Google friends like Inc. Magazine. Now the British Guardian newspaper hops on the bandwagon.

Navigate to “Google Chief Warns AI Could Be Harmful If Deployed Wrongly.” Let me highlight a couple of statements in the write up and then offer a handful of observations designed intentionally to cause some humanoids indigestion.

The article includes this statement:

Sundar Pichai also called for a global regulatory framework for AI similar to the treaties used to regulate nuclear arms use, as he warned that the competition to produce advances in the technology could lead to concerns about safety being pushed aside.

Also, this gem:

Pichai added that AI could cause harm through its ability to produce disinformation.

And one more:

Pichai admitted that Google did not fully understand how its AI technology produced certain responses.

Enough. I want to shift to the indigestion inducing portion of this short essay.

First, Google is in Code Red. Why? What were the search wizards under the guidance of Sundar and Prabhakar doing for the last year? Obviously not paying attention to the activity of OpenAI. Microsoft was paying attention, and it stole the show at the hoe down in Davos. Now Microsoft has made available a number of smart services designed to surf on its marketing tsunami and provide more reasons for enterprise customers to pay for smart Microsoft software. Neither the Guardian nor Sundar seems willing to talk about the reality of Google finding itself in the position of Alta Vista, Lycos, or WebCrawler in the late 1990s and early 2000s, when Google search delivered relevant results. At least Google did until it was inspired by the Yahoo, GoTo, and Overture approach to making cash. Back to the question: Why ignore the fact that Google is in Code Red? Why not ask one half of the Sundar and Prabhakar Comedy Team how they got aced by a non-headliner act at the smart software vaudeville show?

Second, I loved the “could cause harm.” What about the Android malware issue? What about the ads which link to malware in Google search results? What about the monopolization of online advertising and pricing ads beyond the reach of many small businesses? What about the “interesting” videos on YouTube? Google has its eye on the “could” of smart software without paying much attention to the here-and-now downsides of its current business. And disinformation? What is Google doing to scrub that content from its search results? My team identified a distributor of pornography operating in Detroit. That operator’s content can be located with a single Google query. If Google cannot identify porn, how will it flag smart software’s “disinformation”?

Finally, Google for decades has made a big deal of hiring the smartest people in the world. There was a teen whiz kid in Moscow. There was a kid in San Jose with a car service to get him from high school to the Mountain View campus. There is DeepMind with its “deep” team of wizards. Now this outfit with more than 100,000 more or less full-time geniuses does not know how its software works. How will that type of software be managed by the estimable Google? The answer is, “It won’t.” Google’s ability to manage is evident in heartbreaking stories about its human relations and personnel actions. There are smart Googlers who think the software is alive. Does this person have company-paid mental health care? There are small businesses, like an online automobile site, in ruins because a Googler downchecked the site years ago for an unknown reason. The Google is going to manage something well?

My hunch is that Google wants to make sure that it becomes the primary vendor of ready-to-roll training data and microwavable models. The fact that Amazon, Microsoft, and a group of Chinese outfits are on the same information superhighway illustrates one salient fact: The PR tsunami highlights Google’s lack of positive marketing action and the taffy-pull sluggishness of demos that sort of work.

What about the media which ask softball questions and present as substance recommendations that the world agree on AI rules? Perhaps Google should offer to take over the United Nations or form a World Court of AI Technology? Maybe Google should just be allowed to put other AI firms out of business and keep trying to build a monopoly based on software the company doesn’t appear to understand?

The good news is that Sundar did not reprise the Paris demonstration of Bard. That only cost the company a few billion when the smart software displayed its ignorance. That was comedic, and I think these PR special operations are fodder for the spring Sundar and Prabhakar tour of major cities.

The T shirts will not feature a dinosaur (Googzilla, I believe) freezing in a heavy snow storm. The art can be produced using Microsoft Bing’s functions too. And that will be quite convenient if Samsung ditches Google search for Bing and its integrated smart software. To add a bit of spice to Googzilla’s catered lunch is the rumor that Apple may just go Bing. Bye, bye billions, baby, bye bye.

If that happens, Google loses: [a] a pickup truck filled with cash, [b] even more technical credibility, and [c] maybe Googzilla’s left paw and a fang. Can Sundar and Prabhakar get applause when doing one-liners with one or two performers wearing casts and sporting a tooth gap?

Stephen E Arnold, April 25, 2023

AI That Sort of, Kind of Did Not Work: Useful Reminders

April 24, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

I read “Epic AI Fails. A List of Failed Machine Learning Projects.” My hunch is that a write up suggesting that smart software may disappoint in some cases is not going to be a popular topic. I can hear the pooh-poohs now: “The examples used older technology.” And “Our system has been engineered to avoid that problem.” And “Our Large Language Model uses synthetic data which improves performance and the value of system outputs.” And “We have developed a meta-layer of AI which integrates multiple systems in order to produce a more useful response.”

Did I omit any promises other than “The check is in the mail” or “Our customer support team will respond to your call immediately, 24×7, and with an engineer, not a smart chatbot. Because humans, you know.”

The article from Analytics India, an online publication, provides some color on interesting flops; specifically:

  • Amazon’s recruitment system. Think discrimination against females.
  • Amazon’s Rekognition system and its identification of elected officials as criminals. Wait. Maybe those IDs were accurate?
  • Covid 19 models. Moving on.
  • Google and the diabetic retinopathy detection system. The marketing sounded fine. Candy for breakfast? Sure, why not?
  • OpenAI’s Samantha. Not as crazy as Microsoft Tay but in the ballpark.
  • Microsoft Tay. Yeah, famous self instruction in near real time.
  • Sentient Investment AI Hedge Fund. Your retirement savings? There are jobs at Wal-Mart I think.
  • Watson. Wow. Cognitive computing and Jeopardy.

The author takes a less light-hearted approach than I do. Useful list with helpful reminders that it is easier to write tweets and marketing collateral than to deliver smart software that lives up to the sales confections.

Stephen E Arnold, April 24, 2023
