Smart Software: Reproducibility Is Not Part of the Game Plan, Thank You

March 24, 2023

Note: The essay below has been crafted by a real, still-alive dinobaby. No smart software required, thank you.

I love it when outfits suggest one thing and do another. I was tempted to write about some companies’ enthusiastic support for saving whales and their even more intense interest in blocking the ban on “forever chemicals.” But whales are one thing and smart software is another.

Specifically, the once open OpenAI is allegedly embracing the proprietary, trade-secret approach to technology. “OpenAI’s Policies Hinder Reproducible Research on Language Models” reports:

On Monday [March 20, 2023], OpenAI announced that it would discontinue support for Codex by Thursday. Hundreds of academic papers would no longer be reproducible: independent researchers would not be able to assess their validity and build on their results. And developers building applications using OpenAI’s models wouldn’t be able to ensure their applications continue working as expected.

The article elaborates on this main idea.

Several points:

  1. Reproducibility means that specific recipes have to be known and then tested. Who in Silicon Valley wants this “knowledge seeping” to take place when demand for the know-how is, as some may say, doing the hockey stick chart thing? (A minimal sketch of what breaks appears after this list.)
  2. Good intentions are secondary to money, power, and control. The person or persons who set thresholds, design filters, orchestrate what content is fed into a smart system and when, and handle similar useful things want their fingers on the buttons. Outsiders in academe or another outfit eager to pirate the expertise? Nope.
  3. Reproducibility creates opportunities for those outside the leadership outfit to catch it in bias, manipulation, and propagandizing. Who wants that other than a public relations firm?
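For the code-inclined, here is a minimal sketch of what the reproducibility fuss hinges on: a paper pins its results to a hosted Codex model, and re-checking those results means calling that exact model again. The sketch assumes the pre-1.0 openai Python SDK and uses code-davinci-002 as an example Codex model; it is illustrative, not any particular paper’s code.

```python
# Minimal sketch: why retiring a hosted model breaks reproducibility.
# Assumes the pre-1.0 openai Python SDK (pip install "openai<1.0") and a
# Codex-era model name such as "code-davinci-002"; adjust for current SDKs.
import openai

openai.api_key = "sk-..."  # placeholder

def reproduce_paper_result(prompt: str) -> str:
    """Re-run a prompt exactly as a 2022-era paper describes it."""
    response = openai.Completion.create(
        model="code-davinci-002",   # the model the paper pinned
        prompt=prompt,
        temperature=0,              # deterministic-ish settings per the paper
        max_tokens=64,
    )
    return response["choices"][0]["text"]

# Once the provider discontinues the model, this call returns an error,
# and the published numbers can no longer be independently re-checked.
```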

Net net: One cannot reproduce much flowing from today’s esteemed research outfits. Should I mention the president of Stanford University as the poster person for intellectual pogo stick hopping? Oh, I just did.

Stephen E Arnold, March 24, 2023

Useful Scholarly / Semi-Scholarly Research System with Deduplicated Results

March 24, 2023

I was delighted to receive a link to OpenAIRE Explore. The service is sponsored by a non-profit partnership established in 2018 as a legal entity. The objective is to “ensure a permanent open scholarly communication infrastructure to support European research.” (I am not sure whoever wrote the description has read “Book Publishers Won’t Stop Until Libraries Are Dead.”)

The specific service I found interesting is Explore, located at https://explore.openaire.eu. The service is described by OpenAIRE this way:

A comprehensive and open dataset of research information covering 161m publications, 58m research data, 317k research software items, from 124k data sources, linked to 3m grants and 196k organizations.

Maybe looking at that TechDirt article will be useful.

I ran a number of queries. The probably unreadable screenshot below illustrates the nice interface and the results of my query for Hopf fibrations (if this query doesn’t make sense to you, there’s not much I can do; perhaps OpenAIRE Explore is ill-suited to queries about Taylor Swift and Ticketmaster):

[Screenshot: OpenAIRE Explore interface and results for the Hopf fibration query]

The query returned 127 “hits” and identified four organizations as having people interested in the subject. (Hopf fibrations are quite important, in my opinion.) No ads, no crazy SEO baloney, but probably some non-error-checked equations. Plus, the result set was deduplicated. Imagine that. A useful Vivisimo-type function is available again.
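For those who prefer scripts to Web forms, here is a hypothetical sketch of running the same query programmatically. It assumes OpenAIRE’s public search API at api.openaire.eu with a keywords parameter and JSON output; check the current OpenAIRE developer documentation before trusting the endpoint or the field names.

```python
# Hypothetical sketch: querying OpenAIRE for "Hopf fibration" from a script
# instead of the Explore Web interface. The endpoint, the "keywords"
# parameter, and the JSON layout are assumptions based on OpenAIRE's public
# search API; verify against the current developer docs before relying on it.
import requests

def search_openaire(keywords: str, size: int = 10) -> dict:
    url = "https://api.openaire.eu/search/publications"
    params = {"keywords": keywords, "size": size, "format": "json"}
    resp = requests.get(url, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    data = search_openaire("Hopf fibration")
    # The exact JSON structure may differ; inspect the payload before parsing.
    header = data.get("response", {}).get("header", {})
    print("Reported total:", header.get("total"))
```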

Observation: Some professional publishers are likely to find the service objectionable. Four of the giants are watching their legal eagles circle the hapless Internet Archive. But soon… maybe OpenAIRE will attract some scrutiny.

For now, OpenAIRE Explore is indeed useful.

Stephen E Arnold, March 24, 2023

TikTok in Context: It Is Technology, Not the Wizards Writing Code

March 23, 2023

Note: Written by a real, still alive dinobaby. No smart software involved, thank you.

Yep, let’s focus on technology; specifically, online services and digitization. The press release / essay “MEMO: TikTok Is a Threat. So Is the Rest of Big Tech” does not name names. The generalization “technology” is a garden spray, not a disciplined Banksy can of spray paint. Yep, technology.

The write up from the Tech Oversight Project states:

Right now, lawmakers are weighing the virtues of a TikTok ban in the United States versus a forced divestiture from Chinese Communist Party-connected parent company ByteDance. Regardless of which direction lawmakers choose, focusing solely on TikTok does not fully get at the heart of the practices every platform engages in to cause so much harm.

And the people? Nope, generalizations and a handful of large companies. And the senior managers, the innovators, the individuals who happily coded the applications and services? Not on the radar.

The document does include some useful information about the behaviors of large technology-centric companies; for example (these are quotes from the cited document):

  • Facebook developed a censorship tool in an attempt to court Chinese engagement.
  • In an effort to court the Chinese market, Google developed a censored version of its platform for use in China and was forced to backtrack under pressure from human rights organizations.
  • 155 of Apple’s top 200 suppliers are based in China.

My view is that specific senior executives directly involved in okaying a specific action or policy should be named. These individuals made decisions based on their ethical and financial contexts. Those individuals should be mapped to specific decisions.

Disconnecting the people who were the “deciders” from the broad mist of “technology” and the handful of companies named is not helpful.

Responsibility accrues to an individual, and individuals are no longer in second grade where shooting a teacher incurs zero penalty. Accountability should have a shelf life akin to a pressurized can of party cheese.

Stephen E Arnold, March 23, 2023

The Mysterious Knowledge Management and Enterprise Search Magic Is Coming Back

March 23, 2023

Note: This post was written by a real, still alive dinobaby. No smart software needed yet.

In the glory days of pre-indictment enterprise search innovators, some senior managers worried that knowledge loss would cost them. The fix, according to some of the presentations I endured, was to use an enterprise search system from one of the then-pre-eminent vendors. No, I won’t name them, but you can hunt for a copy of my Enterprise Search Report (there are three editions of the tome) and check out the companies’ technology which I analyzed.

The glory days, 2nd edition, are upon us if I correctly understand “A Testing Environment for AI and Language Models.”

Not having information generates “digital friction.” I noted this passage:

According to a recent survey of 1,000 IT managers at large enterprises, 67% expressed concern over the loss of knowledge and expertise when employees leave the company. The cost of knowledge loss and inefficient knowledge sharing is significant, with IDC estimating that Fortune 500 companies lose approximately $31.5 billion each year by failing to share knowledge. This figure is particularly alarming, given the current uncertain economic climate. By improving information search and retrieval tools, a Fortune 500 company with 4,000 employees could save roughly $2 million per month in lost productivity. Intelligent enterprise search is a critical tool that can help prevent information islands and enable organizations to effortlessly find, surface, and share knowledge and corporate expertise. Seamless access to knowledge and expertise within the digital workplace is essential. The right enterprise search platform can connect workers to knowledge and expertise, as well as connect disparate information silos to facilitate discovery, innovation, and productivity.
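Numbers like those invite a sanity check. Here is my back-of-the-envelope arithmetic on the quoted claim; the loaded hourly cost is my assumption, not the survey’s.

```python
# Back-of-the-envelope check of the quoted productivity claim (my arithmetic,
# not the survey's). The loaded hourly cost is an assumption for illustration.
employees = 4_000
claimed_savings_per_month = 2_000_000          # dollars, per the quoted passage

per_employee_per_month = claimed_savings_per_month / employees   # $500
assumed_loaded_hourly_cost = 60                # dollars/hour, my assumption
hours_saved_per_employee = per_employee_per_month / assumed_loaded_hourly_cost

print(f"Implied savings: ${per_employee_per_month:.0f} per employee per month")
print(f"About {hours_saved_per_employee:.1f} hours of 'found' time per employee per month")
```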

Yes, the roaring 2000s all over again.

The questions I have: Which start-up will be the “new” Autonomy, Delphi, Entopia, Fast Search & Transfer, Grokker, Klevu, or Uniqa, et al.? Which of the 2nd generation of enterprise search systems will have an executive accused of financial Fancy Dancing? What buzzwords will the 2nd edition use to surf on AI/ML, neural nets, and deep learning?

Exciting. Will these new systems solve the problem of employees quitting and taking their know-how and “knowledge” with them? Sure. (Why should I be the one to suggest that investors’ dreams could be like Silicon Valley Bank’s risk management methods?) And what about “knowledge”? No problem, of course.

Stephen E Arnold, March 23, 2023

TikTok: Some Interesting Assertions

March 22, 2023

Note: This essay is the work of a real, still-living dinobaby. I am too dumb to use smart software.

I read the “testimony” posted by someone at the House of Representatives. No, the document did not include, “Congressman, thank you for the question. I don’t have the information at hand. I will send it to your office.” As a result, the explanation reflects hand crafting by numerous anonymous wordsmiths. Singapore. Children. Everything is Supercalifragilisticexpialidocious. The quip “NSA to go” is shorter and easier to say.

Therefore, I want to turn my attention to the newspaper in the form of a magazine. The Economist published “How TikTok Broke Social Media.” Great Economist stuff! When I worked at a blue-chip consulting outfit in the 1970s, one had to have read the publication. I looked at the help-wanted ads and the tech section, usually a page or two. The rest of the content was MBA speak, and I was up to my ears in that blather from the numerous meetings through which I suffered.

With modest enthusiasm I worked my way through the analysis of social media. I circled several paragraphs, and I noticed one big thing: the phrase “broke social media.” Social media is, in my opinion, immune to breaking. The reason is that online services are what I call “ghost like.” Sure, there is one service, which may go away. Within a short span of time, like eight-year-olds playing amoeba soccer, another gains traction, picks up users, and evolves sticky services. Killing social media is like shooting ping pong balls into a Tesla-sized blob of Jell-O, an early form of the morphing Terminator robot. In short, the Jell-O keeps on quivering, sometimes for a long, long time, judging from my mother’s ability to make one Jell-O dessert and keep serving it for weeks. Then she made another one. Thus, the premise of the write up is wrong.

I do want to highlight one statement in the essay:

The social apps will not be the only losers in this new, trickier ad environment. “All advertising is about what the next-best alternative is,” says Brian Wieser of Madison and Wall, an advertising consultancy. Most advertisers allocate a budget to spend on ads on a particular platform, he says, and “the budget is the budget”, regardless of how far it goes. If social-media advertising becomes less effective across the board, it will be bad news not just for the platforms that sell those ads, but for the advertisers that buy them.

My view is shaped by more than 50 years in the online information business. New forms of messaging and monetization are enabled by technology. One example is a thought experiment: What will an advertiser pay to influence the output of a content generator infused with smart software? I have first-hand information that one company is selling AI-generated content specifically to influence what appears when a product is reviewed. The technique involves automation, a carousel of fake personas (sockpuppets to some), and carefully shaped inputs to the content generation system. Now is this advertising like a short video? Sure, because the output can be in the form of images or a short machine-generated video using machine-generated “real” people. Is this type of “advertising” going to morph and find its way into the next Discord or Telegram public user group?

My hunch is that this type of conscious manipulation and automation is what can be conceptualized as “spawn of the Google.”

Net net: Social media is not “broken.” Advertising will find a way… because money. Heinous psychological manipulation. Exploited by big companies. Absolutely.

Stephen E Arnold, March 22, 2023

TikTok: What Does the Software Do?

March 22, 2023

A day or two ago, information reached me in rural Kentucky about Google’s Project Zero cyber team. I think the main idea is that Google’s own mobiles, Samsung’s, and those of a handful of other vendors were vulnerable. Interesting. The people who make the phones do not know exactly what flaws or data drains their own devices have. What sticks in my mind is that these are not new mobiles like the Nothing Phone.

Why do I mention this? Software can exploit these flaws. Who knew? Obviously not Google when the phones were designed, coded, manufactured, or shipped. Some Googlers use these devices, which is even more remarkable. How can a third party know exactly what functions or latent functions exist within hardware, or software for that matter?

I assume that the many cyber experts will tell me, “We know.”

Okay, you know. I am not sure I believe you. Sorry.

Now I come to the TikTok-is-good, TikTok-is-evil write up “It’s Wild That Western Governments Have Decided That TikTok Might Spy for China. The App Hasn’t Helped Itself.” The article reports:

In December, TikTok admitted that some ByteDance staff in the US and China gained access to personal data of journalists in a bid to monitor their location and expose company leaks. A spokesperson said four employees who accessed the data had been fired, CNN reported at the time. TikTok has maintained the app doesn’t spy on individuals, and has pointed to the steps it’s taking to hive off user information. Theo Bertram, TikTok’s vice president for public policy in Europe, tweeted on Thursday that the app does not “collect any more data than other apps.”

What’s my point? The Google Project Zero team did not know what was possible with its own code on its own devices. Who knows exactly what the TikTok app does and does not do? Who knows what latent capabilities reside within the app?

The Wall Street Journal published “DOJ Looking into TikTok’s Tracking of Journalists” on March 19, 2023, on page A-4. The story contained a statement attributed to a TikTok executive. The snippet I clipped whilst waiting for a third-world airline is:

TikTok’s chief executive Shou Zi Chew has said that divesting the company from its Chinese owners doesn’t offer any more protection than a multibillion-dollar plan the company has already proposed.

Now I am supposed to trust software from an allegedly China-affiliated app? What?

In the absence of sufficient information, what is a prudent path? One can compartmentalize, as I do. One can stop using the software for certain applications, as I have. One can filter the malicious app so that it is not available. One can install cyber defenses that monitor what’s going in and out and capture data about those flows.
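That last, monitoring option can be approximated without a fancy appliance. A minimal sketch follows, assuming the psutil package and a hand-maintained watch list of IP prefixes; it only inspects live connections per process and does no packet capture.

```python
# Minimal sketch of the "monitor what's going in and out" option.
# Assumes: pip install psutil; a hand-maintained watch list of IP prefixes.
# Listing all connections may require elevated privileges on some platforms.
import psutil

WATCH_PREFIXES = ("203.0.113.",)  # placeholder prefixes (TEST-NET-3), not real app servers

def flag_suspect_connections():
    for conn in psutil.net_connections(kind="inet"):
        if not conn.raddr:          # skip listening sockets
            continue
        remote_ip = conn.raddr.ip
        if remote_ip.startswith(WATCH_PREFIXES):
            try:
                proc = psutil.Process(conn.pid).name() if conn.pid else "unknown"
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                proc = "unavailable"
            print(f"{proc} -> {remote_ip}:{conn.raddr.port} ({conn.status})")

if __name__ == "__main__":
    flag_suspect_connections()
```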

The bottom line today, March 18, 2023, is that we don’t know what we don’t know. Therefore, hasta la vista, TikTok.

Stephen E Arnold, March 22, 2023

Google and Its High School Management: An HR Example

March 22, 2023

I read “Google Won’t Honor Medical Leave During Its Layoffs, Outraging Employees.” Interesting explanation of some of Google’s management methods. These specific actions strike me as similar to those made by my high school science club in 1959. We were struggling with the issue of requiring a specific academic threshold for admission. As I recall, one had to have straight A’s in math and science or no Science Club for that person. (We did admit one student who published an article in the Journal of Astronomy with his brother as co-author. He had an incomplete in calculus because he was in Hawaii fooling around with a telescope and missed the final exam. We decided to let him in. Because, well, we were the Science Club for goodness sakes!)


Scribbled Diffusion’s rendition of a Google manager (looks a bit like a clown, doesn’t it?) telling an employee he is fired and that his medical insurance has been terminated.

The article reports:

While employees’ severance packages might come with a few more months of health insurance, being fired means instantly losing access to Google’s facilities. If that’s where a laid-off Googler’s primary care doctor works, that person is out of luck, and some employees told CNBC they lost access to their doctors the second the layoff email arrived. Employees on leave also have a lot to deal with. One former Googler, Kate Howells, said she was let go by Google from her hospital bed shortly after giving birth. She worked at the company for nine years.

The highlight of the write up, however, is the Comment Section. Herewith are several items I found noteworthy:

  • Gsgrego writes, “Employees, aka expendable garbage.”
  • Chanman819 offers, “I’ve mentioned it before in one of the other layoff threads, but companies shouldn’t burn bridges when doing layoffs… departing employees usually end up at competitors, regulators, customers, vendors, or partners in the same industry. Many times, they boomerang back a few years in the future. Making sure they have an axe to grind during negotiations or when on the other side of a working relationship is exceptionally ill-advised.”
  • Ajmas says, “Termination by accounting.”
  • Asvarduil offers, “Twitter and Google are companies that I now consider radioactive to work for. Even if they don’t fail soon, they’re very clearly poorly-managed. If I had to work for someone else, they’re both companies I’d avoid.”
  • MisterJim adds, “Two thoughts: 1. Stay classy Google! 2. Google has employees? Anyone who’s tried to contact them might assume otherwise.”

High school science club lives on in the world of non-founder management.

Stephen E Arnold, March 22, 2023

Stanford: Llama Hallucinating at the Dollar Store

March 21, 2023

Editor’s Note: This essay is the work of a real, and still alive, dinobaby. No smart software involved with the exception of the addled llama.

What happens when folks at Stanford University use the output of OpenAI to create another generative system? First, a blog article appears; for example, “Stanford’s Alpaca Shows That OpenAI May Have a Problem.” Second, I am waiting for legal eagles to take flight. Some may already be aloft and circling.


A hallucinating llama which confused grazing on other wizards’ work with munching on mushrooms. The art was a creation of ScribbledDiffusion.com. The smart software suggests the llama is having a hallucination.

What’s happening?

The model, trained on OWW or Other Wizards’ Work, mostly works. The gotcha is that using OWW without any silly worrying about copyrights was cheap. According to the write up, the total (excluding wizards’ time) was $600.

The article pinpoints the issue:

Alignment researcher Eliezer Yudkowsky summarizes the problem this poses for companies like OpenAI: “If you allow any sufficiently wide-ranging access to your AI model, even by paid API, you’re giving away your business crown jewels to competitors that can then nearly-clone your model without all the hard work you did to build up your own fine-tuning dataset.” What can OpenAI do about that? Not much, says Yudkowsky: “If you successfully enforce a restriction against commercializing an imitation trained on your I/O – a legal prospect that’s never been tested, at this point – that means the competing checkpoints go up on BitTorrent.”
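The “nearly-clone” mechanics Yudkowsky describes are not exotic. Here is a hypothetical sketch of the imitation loop: harvest input/output pairs from a stronger model’s paid API, then fine-tune a small open base model on them. The code assumes the pre-1.0 openai Python SDK; it is illustrative, not Stanford’s actual Alpaca recipe.

```python
# Hypothetical sketch of the imitation loop described above: harvest
# input/output pairs from a stronger model's paid API, then fine-tune a small
# open base model on them. Illustrative only; assumes the pre-1.0 openai SDK.
import json
import openai

openai.api_key = "sk-..."  # placeholder

def query_strong_model(prompt: str) -> str:
    """Ask the stronger model for a completion (the 'Other Wizards' Work')."""
    response = openai.Completion.create(
        model="text-davinci-003",  # the model Alpaca reportedly distilled from
        prompt=prompt,
        temperature=0.7,
        max_tokens=256,
    )
    return response["choices"][0]["text"]

def build_imitation_dataset(seed_prompts, path="imitation_data.jsonl"):
    """Write instruction/output pairs to JSONL for supervised fine-tuning."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt in seed_prompts:
            f.write(json.dumps({"instruction": prompt,
                                "output": query_strong_model(prompt)}) + "\n")
    return path

# Step 2 (not shown): run a standard supervised fine-tuning script over the
# JSONL file against an open 7B-class base model. The point of the post: the
# imitator's bill is mostly API calls plus a few GPU hours, roughly $600.
```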

I love the rapid rise in smart software uptake and now the snappy shift to commoditization. The VCs counting on big smart software payoffs may want to think about why the llama in the illustration looks as if its synapses are forming new, low-cost connections. Low cost as in really cheap, I think.

Stephen E Arnold, March 21, 2023

Negative News Gets Attention: Who Knew? Err. Everyone in TV News

March 21, 2023

I love academic studies. I have a friend who worked in television news in New York before he was lured to the Courier Journal’s video operation. I asked him how news was prioritized. His answer: “If it bleeds, it leads.” I think he told me this in 1980. I called him and asked when TV news producers knew about the “lead, bleed” angle. His answer, “Since the first ratings study.”

Now I know the decades-old truism is, well, true. No film at 11 for this insight.

If you want a more professional analysis than the one from my friend who grew up in Brooklyn, navigate to “Negativity Drives Online News Consumption.” Feel free to substitute any media type for “online.”

Here’s a statement I found interesting:

Online media is important for society in informing and shaping opinions, hence raising the question of what drives online news consumption.

Ah, who knew?

My takeaway from the write up is basic: If smart software ingests that which is online or in other media, that smart software will “discover” or “recurse” to the “lead, bleed” idea. Do I hear a stochastic parrot squawking? OSINT issue? Yep.

Stephen E Arnold, March 21, 2023

Are the Image Rights Trolls Unhappy?

March 21, 2023

Imagine the money. Art aggregators like Getty Images, Alamy, and others suck up images from old books, open source repositories, and probably from kindergarteners. Then, when some blog boob uses an image, the image rights trolls leap into action. Threatening letters flood the “infringers” with a reminder that money must be paid. Who authorizes this? The law and the publishers who tell the “enforcer,” “Sure, get some money, and we will split it with you.” A great business indeed. Many “pigeons” are defeathered.

But there is a roadblock, which some image rights trolls will endeavor to remove. “AI-Generated Images from Text Can’t Be Copyrighted, US Government Rules” states:

Any images that are produced by giving a text prompt to current generative AI models, such as Midjourney or Stable Diffusion, cannot be copyrighted in the US. That’s according to the US Copyright Office (USCO), which has equated such prompts to a buyer giving directions to a commissioned artist.

There is hope. The article points out:

The US Copyright Office left open the door for protecting works with AI-generated elements.

I can hear the sighs of relief from Mr. Pigeon’s office in London to the professionals exhaling at the Higbee law firm. Hope lives! The article adds:

However, the office has left the door open to granting copyright protections to work with AI-generated elements. “The answer will depend on the circumstances, particularly how the AI tool operates and how it was used to create the final work,” it said. “This is necessarily a case-by-case inquiry. If a work’s traditional elements of authorship were produced by a machine, the work lacks human authorship and the Office will not register it.” Last month, the USCO determined that images generated by Midjourney and used in a graphic novel were not copyrightable. However, it said the text and layout of Kris Kashtanova’s Zarya of the Dawn could be afforded copyright protection.

Will blog boobs who use machine-generated images be able to illustrate their war veteran blogs and church bulletins? Will the individuals who want to celebrate flower arranging be free to create with smart software?

Maybe. I have confidence that legal eagles in the image trolling game will find a way. Where there is money to be had, creativity blooms. (Sorry flower person.)

Stephen E Arnold, March 21, 2023
