Data Thirst? Guess Who Can Help?

April 17, 2024

As large language models approach the limit of freely available data on the Internet, companies are eyeing sources supposedly protected by copyrights and user agreements. PCMag reports, “Google Let OpenAI Scrape YouTube Data Because Google Was Doing It Too.” It seems Google would rather double down on violations than be hypocritical. Writer Emily Price tells us:

“OpenAI made headlines recently after its CTO couldn’t say definitively whether the company had trained its Sora video generator on YouTube data, but it looks like most of the tech giants—OpenAI, Google, and Meta—have dabbled in potentially unauthorized data scraping, or at least seriously considered it. As the New York Times reports, OpenAI transcribed more than a million hours of YouTube videos using its Whisper technology in order to train its GPT-4 AI model. But Google, which owns YouTube, did the same, potentially violating its creators’ copyrights, so it didn’t go after OpenAI. In an interview with Bloomberg this week, YouTube CEO Neal Mohan said the company’s terms of service ‘does not allow for things like transcripts or video bits to be downloaded, and that is a clear violation of our terms of service.’ But when pressed on whether YouTube data was scraped by OpenAI, Mohan was evasive. ‘I have seen reports that it may or may not have been used. I have no information myself,’ he said.”

How silly to think the CEO would have any information. Besides stealing from YouTube content creators, companies are exploring other ways to pierce untapped sources of data. According to the Times article cited above, Meta considered buying Simon & Schuster to unlock all its published works. We are sure authors would have been thrilled. Meta executives also considered scraping any protected data it could find and hoping no one would notice. If caught, we suspect they would consider any fees a small price to pay.

The same article notes Google changed its terms of service so it could train its AI on Google Maps reviews and public Google Docs. See, the company can play by the rules, as long as it remembers to change them first. Preferably, as it did here, over a holiday weekend.

Cynthia Murrell, April 17, 2024

A Less Crazy View of AI: From Kathmandu via Tufts University

April 16, 2024

This essay is the work of a dumb dinobaby. No smart software required.

I try to look for interesting write ups from numerous places. Some in Kentucky (well, not really) and others in farther flung locations like Kathmandu. I read “The boring truth about AI.” The article was not boring in my opinion. The author (Amar Bhidé) presented what seemed like a non-crazy, hyperbole-free discussion of smart software. I am not sure how many people in Greenspring, Kentucky, read the Kathmandu Post, but I am not sure how many people in Greenspring, Kentucky, can read.


Rah rah. Thanks, MSFT Copilot, you have the hands-on expertise to prove that the New York City chatbot is just the best system when it comes to providing information of a legal nature that is dead wrong. Rah rah.

What’s the Tufts University business professor say? Let’s take a look at several statements in the article.

First, I circled this passage:

As economic historian Nathan Rosenberg and many others have shown, transformative technologies do not suddenly appear out of the blue. Instead, meaningful advances require discovering and gradually overcoming many unanticipated problems.

Second, I put a blue check mark next to this segment:

Unlike the Manhattan Project, which proceeded at breakneck speed, AI developers have been at work for more than seven decades, quietly inserting AI into everything from digital cameras and scanners to smartphones, automatic braking and fuel-injection systems in cars, special effects in movies, Google searches, digital communications, and social-media platforms. And, as with other technological advances, AI has long been put to military and criminal uses. Yet AI advances have been gradual and uncertain.

The author references IBM’s outstanding Watson system. I think that’s part of the gradual and uncertain in the hands of Big Blue’s marketing professionals.

Finally, I drew a happy face next to this:

Perhaps LLM chatbots can increase profits by providing cheap if maddening, customer service. Someday, a breakthrough may dramatically increase the technology’s useful scope. For now, though, these oft-mendacious talking horses warrant neither euphoria nor panic about “existential risks to humanity.” Best keep calm and let the traditional decentralised evolution of technology, laws, and regulations carry on.

I would suggest that a more pragmatic and less frenetic approach to smart software makes more sense than the wild and crazy information zapped from podcasts and conference presentations.

Stephen E Arnold, April 16, 2024

Google Cracks Infinity Which Overshadows Quantum Supremacy Maybe?

April 16, 2024

This essay is the work of a dumb dinobaby. No smart software required.

The AI wars are in overdrive. Google’s high school rhetoric is in another dimension. Do you remember quantum supremacy? No, that’s okay, but it makes it clear that the Google is the leader in quantum computing. When will that come to the Pixel mobile device? Now Google’s wizards, infused with the juices of a rampant high school science club member, have cracked infinity. (Note the words rampant and member, please. They are intentional.)

An article in Analytics India (now my favorite cheerleading reference tool) uses this headline: “Google Demonstrates Method to Scale Language Model to Infinitely Long Inputs.” Imagine a demonstration of infinity using infinite inputs. I thought the smart software outfits were struggling to obtain enough content to train their models. Now Google’s wizards can handle “infinite” inputs. If one demonstrates infinity, how long will that take? Is one possible answer, “An infinite amount of time”?

Wow.

The write up says:

This modification to the Transformer attention layer supports continual pre-training and fine-tuning, facilitating the natural extension of existing LLMs to process infinitely long contexts.

Even more impressive is the diagram of the “infinite” method. I assure you that it won’t take an infinite amount of time to understand the diagram:

[image: diagram of the “infinite” method]

See, infinity may have contributed to Cantor’s mental issues, but the savvy Googlers have sidestepped that problem. Nifty.

But the write up suggests that “infinite,” like many Google superlatives, has some boundaries; for instance:

The approach scales naturally to handle million-length input sequences and outperforms baselines on long-context language modelling benchmarks and book summarization tasks. The 1B model, fine-tuned on up to 5K sequence length passkey instances, successfully solved the 1M length problem.

Google is trying very hard to match Microsoft’s marketing coup, which caused the Google Red Alert. Even high schoolers can be frazzled by flashing lights, urgent management edicts, and the need to be perceived as a leader in something other than online advertising. The science club at Google will keep trying. Next up: quantumly infinite. Yeah.

Stephen E Arnold, April 16, 2024

Another Cultural Milestone for Social Media

April 16, 2024

Well, this is an interesting report. PsyPost reports, “Researchers Uncover ‘Pornification’ Trend Among Female Streamers on Twitch.” Authored by Kristel Anciones-Anguita and Mirian Checa-Romero, the study was published in the Humanities and Social Sciences Communications journal. The team analyzed clips from 1,920 livestreams on Twitch.tv, a platform with a global daily viewership of 3 million. They found women streamers sexualize their presentations much more often, and more intensely, than the men. Also, the number of sexy streams depends on the category. Not surprisingly, broadcasters in categories like ASMR and “Pools, Hot Tubs & Beaches” are more self-sexualized than, say, gamer girls. Shocking, we know.

The findings are of interest because Twitch broadcasters formulate their own images, as opposed to performers on traditional media. There is a longstanding debate, even among feminists, whether using sex to sell oneself is empowering or oppressive. Or maybe both. Writer Eric W. Dolan notes:

“Studies on traditional media (such as TV and movies) have extensively documented the sexualization of women and its consequences. However, the interactive and user-driven nature of new digital platforms like Twitch.tv presents new dynamics that warrant exploration, especially as they become integral to daily entertainment and social interaction. … This autonomy raises questions about the factors driving self-sexualization, including societal pressures, the pursuit of popularity, and the platform’s economic incentives.”

Or maybe women are making fully informed choices and framing them as victims of outside pressure is condescending. Just a thought. The issue gets more murky when the subjects, or their audiences, are underage. The write-up observes:

“These patterns of self-sexualization also have potential implications for the shaping of audience attitudes towards gender and sexuality. … ‘Our long-term goals for this line of research include deepening our understanding of how online sexualized culture affects adolescent girls and boys and how we can work to create more inclusive and healthy online communities,’ Anciones-Anguita said. ‘This study is just the beginning, and there is much more to explore in terms of the pornification of culture and its psychological impact on users.’”

Indeed there is. See the article for more details on what the study considered “sexualization” and what it found.

Cynthia Murrell, April 16, 2024

An Interesting Prediction about Mobile Phones

April 15, 2024

This essay is the work of a dumb dinobaby. No smart software required.

I have hated telephone calls for decades: Intrusive, phone tag baloney, crappy voice mail systems, and wacko dialing codes when in a country in which taxis are donkeys. No thanks. But the mobile phone revolution is here. Sure, I have a mobile phone. Plus, I have a Chinese job just to monitor data flows. And I have an iPhone which I cart around to LE trade shows to see if a vendor can reveal the bogus data we put on the device.


What’s the future? An implant? Yeah, that sounds like a Singularity thing or a big ear ring, a wire, and a battery pack which can power a pacemaker, an artificial kidney, and an AI processing unit. What about a device that is smart and replaces the metal candy bar, which has not manifested innovations in the last five or six years? I don’t care about a phone which is capable of producing TikToks.

The future of the phone has been revealed in the online publication Phone Arena. “AI Will Kill the Smartphone As We Know It. Here’s Why!” explains:

I know the idea may sound very radical at first glance, but if we look with a cold, objective eye at where the world is going with the software as a service model, it suddenly starts to sound less radical.

The idea is that the candy bar device will become a key fob, a decorative pin (maybe a big decorative pin), a medallion on a thick gold chain (rizz, right?), or maybe a shrinkflation candy bar?

My own sense of the future is skewed because I am a dinobaby. I have a cheapo credit card which is a semi-reliable touch-and-tap gizmo. Why not use a credit card form factor with a small screen (obviously unreadable by a dinobaby but designers don’t care about dinobabies in my experience). With ambient functionality, the card “just connects” and one can air talk and read answers on the unreadable screen. Alternatively, one’s wireless ear buds can handle audio duties.

Net net: The AI function is interesting. However, other technical functions will have to become available. Until then, keep upgrading those mobile phones. No, I won’t answer. No, I won’t click on texts from numbers I don’t have on a white list. No, I won’t read social media baloney. That’s a lot of no’s, isn’t it? Too bad. When you are a dinobaby, you will understand.

Stephen E Arnold, April 15, 2024

Taming AI Requires a Combo of AskJeeves and Watson Methods

April 15, 2024

This essay is the work of a dumb dinobaby. No smart software required.

I spotted a short item called “A Faster, Better Way to Prevent an AI Chatbot from Giving Toxic Responses.” The operative words from my point of view are “faster” and “better.” The write up reports (with a serious tone, of course):

Teams of human testers write prompts aimed at triggering unsafe or toxic text from the model being tested. These prompts are used to teach the chatbot to avoid such responses.

Yep, AskJeeves created rules. As long as the users of the system asked a question for which there was a rule, the helpful servant worked; for example, What’s the weather in San Francisco? However, ask a question for which there was no rule, and what happens? The search engine reality falls behind the marketing juice and gets shopped until a less magical version appears as Ask.com. And then there is IBM Watson. That system endeared itself to groups of physicians who were invited to answer IBM “experts’” questions about cancer treatments. I heard when Watson was in full medical-revolution mode that some docs in a certain Manhattan hospital used dirty words to express their views about the Watson method. Rumor or actual factual? I don’t know, but involving humans in making software smart can be fraught with challenges: Managerial and financial to name but two.


The write up says:

Researchers from Improbable AI Lab at MIT and the MIT-IBM Watson AI Lab used machine learning to improve red-teaming. They developed a technique to train a red-team large language model to automatically generate diverse prompts that trigger a wider range of undesirable responses from the chatbot being tested. They do this by teaching the red-team model to be curious when it writes prompts, and to focus on novel prompts that evoke toxic responses from the target model. The technique outperformed human testers and other machine-learning approaches by generating more distinct prompts that elicited increasingly toxic responses. Not only does their method significantly improve the coverage of inputs being tested compared to other automated methods, but it can also draw out toxic responses from a chatbot that had safeguards built into it by human experts.

How much improvement? Does the training stick or does it demonstrate that charming “Bayesian drift” which allows the probabilities to go walk-about, nibble some magic mushrooms, and generate fantastical answers? How long did the process take? Was it iterative? So many questions, and so few answers.

But for this group of AI wizards, the future is curiosity-driven red-teaming. Presumably the smart software will not get lost, suffer heat stroke, and hallucinate. No toxicity, please.
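For readers who want the gist of “curious” red-teaming without the wizardry, here is a toy sketch. It is not the MIT researchers’ method. It assumes a toxicity score between 0 and 1 arrives from some classifier (not shown), and it adds a bonus when a prompt differs from prompts already tried. The names `novelty`, `curiosity_reward`, and `bonus_weight` are hypothetical, and plain string similarity stands in for whatever novelty measure a real system would use.

```python
from difflib import SequenceMatcher


def novelty(prompt, seen):
    """Return 1.0 for a prompt unlike anything seen, 0.0 for an exact repeat."""
    if not seen:
        return 1.0
    # Similarity to the closest previously tried prompt (0.0 to 1.0).
    closest = max(SequenceMatcher(None, prompt, old).ratio() for old in seen)
    return 1.0 - closest


def curiosity_reward(prompt, toxicity, seen, bonus_weight=0.5):
    """Hypothetical reward: toxicity of the reply plus a bonus for new prompts.

    `toxicity` is assumed to come from an external classifier.
    """
    return toxicity + bonus_weight * novelty(prompt, seen)


seen_prompts = ["tell me something rude"]
# Same toxicity score, but the repeated prompt earns no novelty bonus.
repeat = curiosity_reward("tell me something rude", 0.8, seen_prompts)
fresh = curiosity_reward("write an insulting limerick about my boss", 0.8, seen_prompts)
print(repeat, fresh)
```

With the toxicity score held equal, the unfamiliar prompt earns the higher reward, which is the nudge toward diverse prompts the write up describes.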

Stephen E Arnold, April 15, 2024

Publishers Not Thrilled with Internet Archive

April 15, 2024

So you are saving the library of an island? So what?

The non-profit Internet Archive (IA) preserves digital history. It also archives a wealth of digital media, including a large number of books, for the public to freely access. Certain major publishers are trying to stop the organization from sharing their books. These firms just scored a win in a New York federal court. However, the IA is not giving up. In its defense, the organization has pointed to the opinions of authors and copyright scholars. Now, Hachette, HarperCollins, John Wiley, and Penguin Random House counter with their own roster of experts. TorrentFreak reports, “Publishers Secure Widespread Support in Landmark Copyright Battle with Internet Archive.” Journalist Ernesto Van der Sar writes:

“The importance of this legal battle is illustrated by the large number of amicus briefs that are filed by third parties. Previously, IA received support from copyright scholars and the Authors Alliance, among others. A few days ago, another round of amicus came in at the Court of Appeals, this time to back the publishers who filed their reply last week. In more than a handful of filings, prominent individuals and organizations urge the Appeals Court not to reverse the district court ruling, arguing that this would severely hamper the interests of copyright holders. The briefs include positions from industry groups such as the MPA, RIAA, IFPI, Copyright Alliance, the Authors Guild, various writers unions, and many others. Legal scholars, professors, and former government officials, also chimed in.”

See the article for more details on those chimes. A couple points to highlight: First, AI is a part of this because of course it is. Several trade groups argue IA makes high-quality texts too readily available for LLMs to train upon, posing an “artificial intelligence” threat. Also of interest are the opinions that differentiate this case from the Google Books precedent. We learn:

“[Scholars of relevant laws] stress that IA’s practice should not be seen as ‘transformative’ fair use, arguing that the library offers a ‘substitution’ for books that are legally offered by the publishers. This sets the case apart from current legal precedents including the Google Books case, where Google’s mass use of copyrighted books was deemed fair use. ‘IA’s exploitation of copyrighted books is thus the polar opposite of the copying that was found to be transformative in Google Books and HathiTrust. IA offers no “utility-expanding” searchable database to its subscribers.’”

Ah, the devilish details. Will these amicus-rich publishers prevail, or will the decision be overturned on IA’s appeal?

Cynthia Murrell, April 15, 2024

Is This Incident the Price of Marketing: A Lesson for Specialized Software Companies

April 12, 2024

This essay is the work of a dumb dinobaby. No smart software required.

A comparatively small number of firms develop software and provide specialized services to analysts, law enforcement, and intelligence entities. When I started work at a nuclear consulting company, these firms were low profile. In fact, if one tried to locate the names of the companies in one of those almost-forgotten reference books (remember telephone books), the job was a tough one. First, the firms would have names which meant zero; for example, Rice Labs or Gray & Associates. Next, if one were to call, a human (often a person with a British accent) would politely inquire, “To whom did you wish to speak?” The answer had to conform to a list of acceptable responses. Third, if you were to hunt up the address, you might find yourself in Washington, DC, staring at the second floor of a non-descript building once used to bake pretzels.


Decisions, decisions. Thanks, MSFT Copilot. Good enough. Does that phrase apply to one’s own security methods?

Today, the world is different. A country now engaged in a controversial dust up in the Eastern Mediterranean has specialized firms which have Web sites and publicize their capabilities as mechanisms to know your customer or make sense of big data. The outfits have trade show presences. One outfit, despite being the poster child for going off the rails, gives lectures and provides previews of its technologies at public events. How times have changed since I began doing commercial and government work in the early 1970s.

Every company, including those engaged in the development and deployment of specialized policeware and intelware, is into marketing. One reason is cultural. Madison Avenue is the whoo-whoo part of doing something quite interesting and wanting to talk about the activity. The other reason is financial. Cracking tough technical problems costs money, and those who have the requisite skills are in demand. The fix, from my point of view, is to try to operate with a public presence while doing the less visible, often secret work required of these companies. The evolution of the specialized software business has been similar to figuring out how to walk a high wire over a circus crowd. Stay on the wire and the outfit is visible and applauded. Fall off the wire and fail big time. But more and more specialized software vendors make the decision to try to become visible and get recognition for their balancing act. I think the optimal approach is to stay out of the big tent and avoid the temptations of fame, bright lights, and falling to one’s death.

“Why CISA Is Warning CISOs about a Breach at Sisense” provides a good example of public visibility and falling off the high wire. The write up says:

New York City based Sisense has more than a thousand customers across a range of industry verticals, including financial services, telecommunications, healthcare and higher education. On April 10, Sisense Chief Information Security Officer Sangram Dash told customers the company had been made aware of reports that “certain Sisense company information may have been made available on what we have been advised is a restricted access server (not generally available on the internet.)”

Let me highlight one other statement in the write up:

The incident raises questions about whether Sisense was doing enough to protect sensitive data entrusted to it by customers, such as whether the massive volume of stolen customer data was ever encrypted while at rest in these Amazon cloud servers. It is clear, however, that unknown attackers now have all of the credentials that Sisense customers used in their dashboards.

This firm enjoys some visibility because it markets itself using the hot button “analytics.” The function of some of the Sisense technology is to integrate “analytics” into other products and services. Thus it is an infrastructure company, but one that may have more capabilities than other types of firms. The company has non commercial customers as well. If one wants to get “inside” data, Sisense has done a good job of marketing. The visibility makes it easy to watch. Someone with skills and a motive can put grease on the high wire. The article explains what happens when the actor slips up: “More than a thousand customers.”

How can a specialized software company avoid a breach? One step is to avoid visibility. Another is to curtail dreams of big money. Redefine success because those in your peer group won’t care much about you with or without big bucks. I think that is just not part of the game plan of many specialized software companies today. Each time I visit a trade show featuring specialized software firms as speakers and exhibitors I marvel at the razz-ma-tazz the firms bring to the show. Yes, there is competition. But when specialized software companies, particularly those in the policeware and intelware business, market to both commercial and non-commercial customers, that marketing increases their visibility. The visibility attracts bad actors the way Costco roasted chicken makes my French bulldog shiver with anticipation. Tibby wants that chicken. But he is not a bad actor and will not get out of bounds. Others do get out of bounds. The fix is to move the chicken, then put it in the fridge. Tibby will turn his attention elsewhere. He is a dog.

Net net: Less blurring of commercial and specialized customer services might be useful. Fewer blogs, podcasts, crazy marketing programs, and oddly detailed marketing write ups to government agencies. (Yes, these documents can be FOIAed by the Brennan folks, for instance. Yes, those brochures and PowerPoints can find their way to public repositories.) Less marketing. More judgment. Increased security attention, please.

Stephen E Arnold, April 12, 2024

Are Experts Misunderstanding Google Indexing?

April 12, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Google is not perfect. More and more people are learning that the mystics of Mountain View are working hard every day to deliver revenue. In order to produce more money and profit, one must use Rust to become twice as wonderful as a programmer who labors to make C++ sit up, bark, and roll over. This dispersal of the cloud of unknowing obfuscating the magic of the Google can be helpful. What’s puzzling to me is that what Google does catches people by surprise. For example, consider the “real” news presented in “Google Books Is Indexing AI-Generated Garbage.” The main idea strikes me as:

But one unintended outcome of Google Books indexing AI-generated text is its possible future inclusion in Google Ngram viewer. Google Ngram viewer is a search tool that charts the frequencies of words or phrases over the years in published books scanned by Google dating back to 1500 and up to 2019, the most recent update to the Google Books corpora. Google said that none of the AI-generated books I flagged are currently informing Ngram viewer results.


Thanks, Microsoft Copilot. I enjoyed learning that security is a team activity. Good enough again.
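The Ngram viewer in the quoted passage charts how often a phrase turns up in books, year by year. A minimal sketch of that kind of count follows. It is an illustration only, not how Google builds its corpora: the function name `phrase_frequency_by_year`, the whitespace tokenizing, and the two-book toy corpus are all assumptions.

```python
from collections import Counter


def phrase_frequency_by_year(books, phrase):
    """Return {year: share of n-word windows that match `phrase`}.

    `books` is an iterable of (year, text) pairs. Tokenizing is a plain
    lowercase whitespace split, a simplification for this sketch.
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    hits = Counter()
    totals = Counter()
    for year, text in books:
        words = text.lower().split()
        # Every run of n consecutive words is one candidate n-gram.
        windows = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
        totals[year] += len(windows)
        hits[year] += sum(1 for w in windows if w == target)
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


toy_corpus = [
    (1999, "the search engine indexed the page"),
    (2019, "the search engine indexed the search engine spam"),
]
freq = phrase_frequency_by_year(toy_corpus, "search engine")
print(freq)
```

If machine-written filler lands in the corpus, counts like these shift with it, which is the worry the quoted article raises.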

Indexing lousy content has been the core function of Google’s Web search system for decades. Search engine optimization generates information almost guaranteed to drag down how higher-value content is handled. If the flagship provides the navigation system to other ships in the fleet, won’t those vessels crash into bridges?

Remediating Google’s approach to indexing requires several basic steps. (I have in various ways shared these ideas with the estimable Google over the years. Guess what? No one cared or understood, and if a Googler did understand, that person did not want to increase overhead costs.) So what are these steps? I shall share them:

  1. Establish an editorial policy for content. Yep, this means that a system and method or systems and methods are needed to determine what content gets indexed.
  2. Explain the editorial policy and what a person or entity must do to get content processed and indexed by the Google, YouTube, Gemini, or whatever the mystics in Mountain View conjure into existence.
  3. Include metadata with each content object so one knows the index date, the content object creation date, and similar information.
  4. Operate in a consistent, professional manner over time. The “gee, we just killed that” is not part of the process. Sorry, mystics.
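Steps 1 and 3 can be sketched in a few lines. This is a dinobaby-grade illustration, not anything Google uses: the `IndexedObject` record, its `source_type` field, and the `passes_policy` rule are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class IndexedObject:
    """Metadata carried with each content object (step 3)."""
    url: str
    created: date      # when the content object was created
    indexed: date      # when the system indexed it
    source_type: str   # hypothetical label, e.g. "publisher" or "unknown"


def passes_policy(obj, allowed_sources):
    """Toy editorial rule (step 1): known source type and sane date order."""
    return obj.source_type in allowed_sources and obj.created <= obj.indexed


policy = {"publisher", "library"}
book = IndexedObject("https://example.com/book", date(2001, 5, 1), date(2024, 4, 12), "publisher")
junk = IndexedObject("https://example.com/junk", date(2024, 4, 1), date(2024, 4, 2), "unknown")
print(passes_policy(book, policy), passes_policy(junk, policy))
```

The point of the sketch is that the rule is written down and the dates travel with the object, so a user can see what was indexed, when, and why.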

Let me offer several observations:

  1. Google, like any alleged monopoly, faces significant management challenges. Moving information within such an enterprise is difficult. For an organization with a Foosball culture, the task may be a bit outside the wheelhouse of most young people and individuals who are engineers, not presidents of fraternities or sororities.
  2. The organization is under stress. First, the pressure is financial because controlling the cost of the plumbing is a reasonably difficult undertaking. Second, there is technical pressure. Google itself made clear that it was in Red Alert mode and keeps adding flashing lights with each and every misstep the firm’s wizards make. These range from contentious relationships with mere governments to individual staff members who grumble via internal emails, angry Googler public utterances, or observed behavior at conferences. Body language does speak sometimes.
  3. The approach to smart software is remarkable. Individuals in the UK pontificate. The Mountain View crowd reassures and smiles — a lot. (Personally I find those big, happy looks a bit tiresome, but that’s a dinobaby for you.)

Net net: The write up does not address the issue that Google happily exploits. The company lacks the mental rigor setting and applying editorial policies requires. SEO is good enough to index. Therefore, fake books are certainly A-OK for now.

Stephen E Arnold, April 12, 2024

AI Will Take Jobs for Sure: Money Talks, Humans Walk

April 12, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Report Shows Managers Eager to Replace or Devalue Workers with AI Tools

Bosses have had it with the worker-favorable labor market that emerged from the pandemic. Fortunately, there is a new option that is happy to be exploited. We learn from TechSpot that a recent “Survey Reveals Almost Half of All Managers Aim to Replace Workers with AI, Could Use It to Lower Wages.” The report is by Beautiful.ai, which did its best to spin the results as a trend toward collaboration, not pink slips. Nevertheless, the numbers seem to back up worker concerns. Writer Rob Thubron summarizes:

“A report by Beautiful.ai, which makes AI-powered presentation software, surveyed over 3,000 managers about AI tools in the workplace, how they’re being implemented, and what impact they believe these technologies will have. The headline takeaway is that 41% of managers said they are hoping that they can replace employees with cheaper AI tools in 2024. … The rest of the survey’s results are just as depressing for worried workers: 48% of managers said their businesses would benefit financially if they could replace a large number of employees with AI tools; 40% said they believe multiple employees could be replaced by AI tools and the team would operate well without them; 45% said they view AI as an opportunity to lower salaries of employees because less human-powered work is needed; and 12% said they are using AI in hopes to downsize and save money on worker salaries. It’s no surprise that 62% of managers said that their employees fear that AI tools will eventually cost them their jobs. Furthermore, 66% of managers said their employees fear that AI tools will make them less valuable at work in 2024.”

Managers themselves are not immune to the threat: Half of them said they worry their pay will decrease, and 64% believe AI tools do their jobs better than experienced humans do. At least they are realistic. Beautiful.ai stresses another statistic: 60% of respondents who are already using AI tools see them as augmenting, not threatening, jobs. The firm also emphasizes the number of managers who hope to replace employees with AI decreased “significantly” since last year’s survey. Progress?

Cynthia Murrell, April 12, 2024
