Libraries: Who Needs Them? Perhaps Everyone

May 3, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

How dare libraries try to make the works they purchase more easily accessible to their patrons! The Nation ponders, “When You Buy a Book, You Can Loan It to Anyone. This Judge Says Libraries Can’t. Why Not?” The case was brought before the U.S. District Court in Manhattan by four publishers unhappy with the Internet Archive’s (IA) controlled digital lending (CDL) program. We learn the IA does plan to appeal the decision. Writer Michelle M. Wu explains:

“At issue was whether a library could legally digitize the books it already owned and lend the digital copies in place of the print. The IA maintained that it could, as long as it lent only the same number of copies it owned and locked down the digital copies so that a borrower could not copy or redistribute them. It would be doing what libraries had always done, lend books—just in a different format. The publishers, on the other hand, asserted that CDL infringed on authors’ copyrights, making unauthorized copies and sharing these with libraries and borrowers, thereby depriving the authors and publishers of rightful e-book sales. They viewed CDL as piracy. While Judge John G. Koeltl’s opinion addressed many issues, all his reasoning was based on one assumption: that copyright primarily is about authors’ and publishers’ right to profit. Despite the pervasiveness of this belief, the history of copyright tells us something different.”

Wu recounts copyright’s evolution from a means to promote the sharing of knowledge to a way for publishers to rake in every possible dime. The shift was driven by a series of developments in technology. In the 1980s, the new ability to record content to video tape upset Hollywood studios. Apparently, being able to (re)watch a show after its initial broadcast was so beyond the pale a lawsuit was required. Later, Internet-based innovations prompted more legal proceedings. On the other hand, tools evolved that enabled publishers to enforce their interpretation of copyright, no judicial review required. Wu asserts:

“Increasing the impact on the end user, publishers—not booksellers or authors—now control prices and access. They can charge libraries multiple times what they charge an individual and bill them repeatedly for the same content. They can limit the number of copies a library buys, or even refuse to sell e-books to libraries at all. Such actions ultimately reduce the amount of content that libraries can provide to their readers.”

So that is how the original intention of copyright law has been turned on its head. And how publishers are undermining the whole purpose of libraries, which are valiantly trying to keep pace with technology. Perhaps the IA will win its appeal and the valuable CDL program will be allowed to continue. Either way, their litigious history suggests publishers will keep fighting for control over content.

Cynthia Murrell, May 3, 2023

TikTok: An App for Mind Control?

March 29, 2023

I read “TikTok Is Part of China’s Cognitive Warfare Campaign.” The write up is an opinion. Before I suggest that the write up is missing the big picture, let me highlight what I think sums up the argument:

While a TikTok ban may take out the first and fattest mole, it fails to contend with the wider shift to cognitive warfare as the sixth domain of military operations under way, which includes China’s influence campaigns on TikTok, a mass collection of personal and biometric data from American citizens and their race to develop weapons that could one day directly assault or disable human minds.

The problem for me is that I think the “mind control” angle is just one weapon in a specific application environment. The Middle Kingdom is working like Type A citizen farmers in these strike zones:

  1. Financial. The objective is to get on the renminbi bus and off the donkey cart dollar.
  2. Physical. The efficacy of certain pathogens is familiar to anyone who had an opportunity to wear a mask and stay home for a year or two.
  3. Political. The “deal” between two outstanding nation states in the Middle East is a signal I noted.
  4. Technological. The Huawei superwatch, the steady progress in microprocessor engineering, and those phone-home electric vehicles are significant developments.
  5. Social. Western democracies may not be embracing China-style methods, but some countries like India are definitely feeling the vibe for total control of the Internet.

The Guardian (may the digital overlords smile on the “real” news organization’s JavaScript, which reminds me how many Guardian articles I have read since the “bug” was placed on my computer) gets part of the story correct. Hopefully the editors will cover the other aspects of the Chinese initiative.

TikTok, not the main event. Plus, it does connect to WiFi, Congressperson.

Stephen E Arnold, March 29, 2023

Negative News Gets Attention: Who Knew? Err. Everyone in TV News

March 21, 2023

I love academic studies. I have a friend who worked in television news in New York before he was lured to the Courier Journal’s video operation. I asked him how news was prioritized. His answer: “If it bleeds, it leads.” I think he told me this in 1980. I called him and asked when TV news producers knew about the “lead, bleed” angle. His answer, “Since the first ratings study.”

Now I know the decades old truism is — well — true. No film at 11 for this insight.

If you want a more professional analysis of my friend who grew up in Brooklyn, navigate to “Negativity Drives Online News Consumption.” Feel free to substitute any media type for “online.”

Here’s a statement I found interesting:

Online media is important for society in informing and shaping opinions, hence raising the question of what drives online news consumption.

Ah, who knew?

My takeaway from the write up is basic: If smart software ingests that which is online or in other media, that smart software will “discover” or “recurse” to the “lead, bleed” idea. Do I hear a stochastic parrot squawking? OSINT issue? Yep.

Stephen E Arnold, March 21, 2023

Google and Its Puzzles: Insiders Only, Please

December 26, 2022

ProPublica made available an article of some importance in my opinion. “Porn, Piracy, Fraud: What Lurks Inside Google’s Black Box Ad Empire” walks through the intentional, quite specific engineering of its crucial advertising system to maximize revenue and befuddle (is “defraud” a synonym?) advertisers. I was asked more than a decade ago to do a presentation of my team’s research into Google’s advertising methodology. I declined. At that time, I was doing some consulting work for a company I am not permitted to name. That contract stipulated that I would not talk about a certain firm’s business technologies. I signed because… money.

The ProPublica essay does the revealing about what is presented as a duplicitous, underhanded, and probably illegal business process subsystem. I don’t have to present any of the information I have gathered over the years. I can cite this important article and point out several rocks which the capable writers at ProPublica either did not notice or flipped over before concluding, “Nah, nothing to see here.”

I urge you to do two things. First, read the ProPublica write up. Number Two: Print it out. My hunch is that it may be disappeared or become quite difficult to find at some point in the future. Why? Ah, grasshopper, that is a question easily answered by the managers who set up Foundem and who were stomped by Googzilla. Alternatively you could chase down a person at the French government tax authority and ask, “Why were French tax forms not findable via a Google search for several years?” These individuals might have the information you need. Shifting gears: Ask Magix, the software company responsible for Sony Vegas, why cracks for the software appear in YouTube videos. If you use your imagination, you will come up with ideas for gathering first person information about the lovable online advertising company’s systems and methods. Hint: Look up Dr. Timnit Gebru and inquire about her interactions with one of Google’s chief scientists. I guarantee that a useful anecdote will bubble up.

So what’s in the write up? Let me highlight a main point and then cite a handful of interesting statements in the article.

What is the main point? In my opinion, ProPublica’s write up says, “The GOOG maximizes its return at the expense of the advertisers and of the users.”

Who knew? Not me. I think the Alphabet Google YouTube DeepMind outfit is the most wonderfulest company in the world. Remember: You heard this here first. I have a priceless Google mouse pad too.

Consider these three statements from the essay. First, Google lingo is interesting:

Google spokesperson Michael Aciman said the company uses a combination of human oversight, automation and self-serve tools to protect ad buyers and said publisher confidentiality is not associated with abuse or low quality.

The idea is that Google is interested in using a hybrid method to protect ad buyers. Plus there is a difference between publishers and confidentiality. I find it interesting that instead of talking about [a] the ads themselves (porn, drugs, etc.), [b] the buyers of advertising which is a distinct industry dependent upon Google for revenue, [c] the companies who want to get their message in front of people allegedly interested in the product or service, or [d] the user of search or some other Google service, Google wants to “protect ad buyers.” And what about the others I have identified? Google doesn’t care. Logical sure but doesn’t Google have the other entities in mind? That’s a question regulators should have asked and had answered after Google settled the litigation with Yahoo over advertising technology, at the time of Google’s acquisition of Oingo (Applied Semantics), or at the time Google acquired DoubleClick. In my opinion, much of the ProPublica write up operates in a neverland of weird Google speak, not the reality of harvesting money from those largely in the dark about what’s happening in the business processes.

Second, consider this statement:

we matched 70% of the accounts in Google’s ad sellers list to one or more domains or apps, more than any dataset ProPublica is aware of. But we couldn’t find all of Google’s publisher partners. What we did find was a system so large, secretive and bafflingly complex that it proved impossible to uncover everyone Google works with and where it’s sending advertisers’ money.

The passage seems to suggest that Google’s engineers went beyond clever and ventured into the murky acreage of intentional obfuscation. It seems as if Google wanted to be able to consume advertising budgets without any entity having the ability to determine [a] if the ad were displayed in a suitable context; that is, did the advertiser’s message match the needs of the user to whom the ad was shown. And [b] was the ad appropriate even if it contained words and phrases on Google’s unofficial stop word lists. (If you have not seen these, send an email to benkent2020 at yahoo dot com and one of my team will email you some of the more interesting words that guarantee Google’s somewhat lax processes will definitely try to block. If a word is not on a Google stop list, then the messages will probably be displayed.) Remember: As Google terminates six percent of its staff, some of those humans presumably will not be able to review ads per item one above. And [c] note the word “bafflingly”. The focus of much Google engineering over the last 15 years has been to build competitive barriers, extend the monopoly function with “partners”, and double talk in order to keep regulators and curious Congressional people away. That’s my take on this passage.

Now for the third passage I will cite:

…we uncovered scores of previously unreported peddlers of pirated content, porn and fake audiences that take advantage of Google’s lax oversight to rake in revenue.

I don’t need to say much more about this statement than to suggest you look at and think about pirated content (copyright), porn (illegal content in some jurisdictions) and fake audiences (cyber fraud). Does this statement suggest that Google is a criminal enterprise? That’s a good question.

I have some high level observations about this excellent article in ProPublica. I offer these in the hope that ProPublica will explore some of these topics or an enterprising graduate student will consider the statements and do some digging.

  1. Why is Google unable to manage its staff? This is an important question because the ad behaviors described in the ProPublica article are the result of executive compensation plans and incentives. Are employees rewarded for implementing operations that further “soft” fraud or worse?
  2. How will Google operate in a more fragmented, more regulated environment? Is one possible behavior a refusal to modify the guiding hand of compensation and incentive programs away from generating more and more money within external constraints? My hunch is that Google will do whatever is necessary to build its revenue.
  3. What mechanisms exist or will be implemented to keep Google’s automated systems operating in a legal, ethical way?

Net net: Finally, after decades of craziness about how wonderful Googzilla is, more critical research is appearing. Is it too little and too late? In my view, yes.

Stephen E Arnold, December 26, 2022

The Internet: Cue the Music. Hit It, Regrets, I Have Had a Few

December 21, 2022

I have been around online for a few years. I know some folks who were involved in creating what is called “the Internet.” I watched one of these luminaries unbutton his shirt and display a tee with the message, “TCP on everything.” Cute, cute, indeed. (I had the task of introducing this individual only to watch the disrobing and the P on everything joke. Tip: It was not a joke.)

Imagine my reaction when I read “Inventor of the World Wide Web Wants Us to Reclaim Our Data from Tech Giants.” The write up states:

…in an era of growing concern over privacy, he believes it’s time for us to reclaim our personal data.

Who wants this? Tim Berners-Lee and a startup. Content marketing or a sincere effort to derail the core functionality of ad trackers, beacons, cookies which expire in 99 years, etc., etc.?

The article reports:

Berners-Lee hopes his platform will give control back to internet users. “I think the public has been concerned about privacy — the fact that these platforms have a huge amount of data, and they abuse it,” he says. “But I think what they’re missing sometimes is the lack of empowerment. You need to get back to a situation where you have autonomy, you have control of all your data.”

The idea is that Web 3 will deliver a different reality.

Do you remember this lyric:

Yes, there were times I’m sure you knew
When I bit off more than I could chew
But through it all, when there was doubt
I ate it up and spit it out
I faced it all and I stood tall and did it my way.

The my becomes big tech, and it is the information highway. There’s no exit, no turnaround, and no real chance of change before I log off for the final time.

Yeah, digital regrets. How’s that working out at Amazon, Facebook, Google, Twitter, and Microsoft among others? Unintended consequences and now the visionaries are standing tall on piles of money and data.

Change? Sure, right away.

Stephen E Arnold, December 21, 2022

TikTok Explained without Mentioning Regulation and US Education Failings

December 19, 2022

I am not into TikTok. I enjoy reading analyses of TikTok by individuals who are not engaged in law enforcement, crime analysis, and intelligence work for the US and its allies. Most of these deep dives are entertaining because they miss the obvious: Hoovering data from users for strategic and tactical information weaponization and information operations. I assume that makes me a party pooper, particularly among those who are into the mobile experience. I recall laughing out loud when I listened to a podcast featuring a Silicon Valley news type explaining that TikTok was no big deal. Ho ho ho.

I read this morning (December 17, 2022, 5:30 am US Eastern) “TikTok’s Secret Sauce.” The write up explains insights gleaned from “a project studying algorithmic amplification and distortion.” Quotes from the write up are in italic to differentiate them from my comments.

I learned:

… the average ratio of hearts to views on TikTok is roughly 5%. People are just not that predictable.

Okay, people are not predictable. May I suggest spending some time with the publicly available information on the Recorded Future Web site? Google and In-Q-Tel were early supporters of this company. The firm’s predictive analytics rely, in part, on the fact that people are creatures of habit. Useful information emerges from these types of analyses. In fact, most intelware does, and this includes specialists in other countries, including some not allied with the US.

I learned:

Exploration explains why there are an unending variety of incredibly weird niches on TikTok: the app manages to connect those creators to their niche audiences.

Let’s think in terms of unarticulated needs and desires. TikTok makes it possible for that which is not stated to emerge from user behavior. Feedback ensures that skinny girls and diets that deliver thinness get in front of certain individuals. Feedback is good, and finding content that reveals more of the user’s psychographic footprint is useful. Why? Manipulation, identification of individuals with certain behavior fingerprints, and amplification of certain messaging. Yep, useful.

I learned:

More generally, in AI applications, the sophistication of the algorithm is rarely the limiting factor.

Interesting. Perhaps the function of TikTok is just obvious. It is, in my opinion, so obvious that it is overlooked. In high school more than a half century ago, I recall our class having to read “The Purloined Letter” by that sporty writer Edgar Allan Poe. The main idea is that the obvious is overlooked.

In some countries — might TikTok’s home base be an example — certain actions are obvious and then ignored or misunderstood. TikTok is that type of product. Now, after years of availability, experts are asking questions and digging into the service.

The limiting factor is a failure to understand how online information and services can be weaponized, deliver directed harm, and be viewed as a harmless time waster. Is it too late? Maybe not, but I get a kick out of the reactions of experts to what is as clear and straightforward as driving a vehicle over a mostly clueless pedestrian or ordering spicy regional cuisine without understanding the concept of hot.

Stephen E Arnold, December 19, 2022

A Paradox at the Center of the Internet: No Big Deal

December 2, 2022

The Internet is a mess, but compared to how it was in its early decades it is way more organized. The organization of the Internet is called centralization. Gordon Brander of Unconscious wants the Internet to be decentralized. He says that will happen only after it first becomes more centralized. Read his explanation here: “Centralization Is Inevitable.” Brander says that the best way to understand the benefits of decentralization is to understand how centralization first happens.

While there are many ways to map centralization, the Internet is concentrated into different hubs or a scale-free network. The best way to define a scale-free network is:

“The defining characteristic of scale-free networks is a power law distribution with a long tail. A small number of nodes with an extremely large number of links, and an extremely large number of nodes with a small number of links. Think Twitter. Most users have a few followers, while a few influencers have millions. This power law distribution grants the biggest hubs a lot of power over the network. It also makes hubs important to the functioning of the network in ways that are not immediately obvious, like keystone species in an ecology.”

These networks emerge because they receive preferential attachment or “the rich-get-richer” scenario. Users prefer a hub/network, ergo it will receive more attention, trust, users, etc. Scale-free networks are also more efficient, because links between systems are smaller.
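The rich-get-richer mechanism can be sketched in a few lines of Python. This is a minimal illustration, not code from Brander's essay: the helper name `grow_network`, the node count, and the random seed are all assumptions made for the example. Each new node adds one link to an existing node chosen with probability proportional to how many links that node already has.

```python
import random
from collections import Counter


def grow_network(n_nodes, seed=0):
    """Grow a network by preferential attachment (hypothetical helper).

    Each new node adds one link to an existing node picked with
    probability proportional to that node's current link count.
    Returns a Counter mapping node -> number of links.
    """
    rng = random.Random(seed)
    degree = Counter({0: 1, 1: 1})  # start with two nodes joined by one link
    # Every link puts both of its endpoints in this list, so a uniform
    # draw from the list is a degree-proportional draw over nodes.
    endpoints = [0, 1]
    for new_node in range(2, n_nodes):
        target = rng.choice(endpoints)
        degree[new_node] += 1
        degree[target] += 1
        endpoints.extend([new_node, target])
    return degree


degree = grow_network(10_000)
top_hubs = degree.most_common(5)
leaf_share = sum(1 for d in degree.values() if d == 1) / len(degree)
print("largest hubs (node, links):", top_hubs)
print(f"share of nodes with a single link: {leaf_share:.0%}")
```

Run as is, the sketch should show the long-tail pattern from the quote: a handful of early nodes collect a large number of links while most nodes keep the single link they arrived with.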

Another advantage is that they are resilient to random failure, i.e. if one ordinary node fails, the rest of the system continues to run. The same structure also makes these networks more vulnerable to targeted attacks, because a well-placed virus that takes out the biggest hubs could knock out much of the network.
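The gap between losing nodes at random and losing the biggest hubs can be checked with a small simulation. What follows is a rough sketch under stated assumptions, not anything taken from Brander's essay: the helpers `build_network` and `largest_component`, the 2,000-node size, two links per new node, and the 5% removal rate are all illustrative choices.

```python
import random
from collections import deque


def build_network(n_nodes, links_per_node=2, seed=0):
    """Preferential-attachment network as an adjacency dict (hypothetical helper)."""
    rng = random.Random(seed)
    core = links_per_node + 1  # small fully linked starting core
    adj = {i: set(range(core)) - {i} for i in range(core)}
    # One list entry per link end, so uniform draws favor busy nodes.
    endpoints = [i for i in adj for _ in adj[i]]
    for new_node in range(core, n_nodes):
        targets = set()
        while len(targets) < links_per_node:
            targets.add(rng.choice(endpoints))
        adj[new_node] = set(targets)
        for t in sorted(targets):
            adj[t].add(new_node)
            endpoints.extend([new_node, t])
    return adj


def largest_component(adj, removed):
    """Size of the largest connected group once `removed` nodes are gone."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first walk over surviving nodes
            node = queue.popleft()
            size += 1
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best


adj = build_network(2_000)
k = len(adj) // 20  # knock out 5% of the nodes

rng = random.Random(1)
random_failures = set(rng.sample(sorted(adj), k))
hubs = set(sorted(adj, key=lambda n: len(adj[n]), reverse=True)[:k])

after_failure = largest_component(adj, random_failures)
after_attack = largest_component(adj, hubs)
print("largest connected group after random failures:", after_failure)
print("largest connected group after removing hubs:  ", after_attack)
```

In this toy setup, random failures mostly hit lightly linked nodes and leave nearly everything connected, while removing the same number of top hubs strands noticeably more of the network.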

Brander ends his spiel by stating the centralization and decentralization of the Internet is the circle of life: random start-ups, exponential growth, consolidation, collapse, then repeat. Someone cue The Lion King’s opening song!

Whitney Grace, December 2, 2022

WikiLeaks: Oh, Oh, Some Folks Are Not Happy

December 1, 2022

I read “WikiLeaks Website Is Struggling to Stay Online—As Millions of Documents Disappear.” If the write up is on the money, one lesson from this alleged cancel culture action is to hit Print to PDF and save a document. Assuming that online is forever is one of those weird misperceptions many online users have. Nope.

The write up says:

WikiLeaks’ website appears to be coming apart at the seams, with more and more of the organization’s content unavailable without explanation. WikiLeaks technical issues, which have been ongoing for months, have gotten worse in recent weeks as increasingly larger portions of its website no longer function.

The write up points out:

Although WikiLeaks long boasted that it released more than 10 million documents in 10 years, at current, less than 3,000 documents remain accessible, according to an analysis by the Daily Dot of the website’s leaks archive.

What’s interesting is that no one has claimed responsibility for hitting the delete key. What I find interesting is that the site has been online for many years. Now here’s a question, “Who could have taken this action?” Microsoft would say that it was 1,000 engineers working for a nation state. Others might say, “Oh, just a technical glitch.” A few might say, “Teens fooling around?” Does this list exhaust the possibilities?

Stephen E Arnold, December 1, 2022

Sesamy for Content in Small Bites

December 1, 2022

Here is good news for anyone who would like to purchase a piece of content without a long-term relationship with its host platform. The Next Web reports, “Swedish Startup Sesamy Seeks to Slaughter the Subscription Model.” It is such a good idea, we wonder whether this company will become an Amazon acquisition target. Writer Cate Lawrence tells us:

“So far, the Stockholm-based company has partnered with every major book publisher in Sweden and Denmark to offer users the option to purchase digital content as a single purchase. You can then consume it on any app or device. This means you can play Sesamy audiobooks in your favorite audio app and download watermarked ebooks to any ereader. And you actually own the book instead of renting it with a platform like Amazon Kindle. … Publishing companies are struggling to woo readers who look to cut costs, and Sesamy offers them a new business model and potential revenue source. In October, the company launched SmartID with Swedish publication Breakit, enabling publishers to monetize non-subscribed readers, without cannibalizing their existing revenues from digital subscriptions.

The software will also include built-in price optimization that suggests a fair retail cost to readers and publishers, ensuring that the platform remains competitive. And this incremental revenue may add up at a time when people are culling their subscriptions to save money.”

There must be an appetite for this sort of service—the company just raked in €3.3 million in a recent funding round. It will use this capital to make available single issues of newspapers and magazines. Yes please. Lawrence contemplates an extension to academic journal articles. They should really be free, she notes, but single-article access would be an improvement. Sesamy was founded in March 2021 by the folks behind the podcast platform Acast.

Cynthia Murrell, December 1, 2022

Smart Software: Can Humans Keep Pace with Emergent Behavior?

November 29, 2022

For the last six months, I have been poking around the idea that certain behaviors are emergent; that is, give humans a capability or a dataspace, and those humans will develop novel features and functions. The examples we have been exploring are related to methods used by bad actors to avoid take downs by law enforcement. The emergent behaviors we have noted exploit domain name registry mechanisms and clever software able to obfuscate traffic from Tor exit nodes. The result of the online dataspace is unanticipated emergent behaviors. The idea is that bad actors come up with something novel using the Internet’s furniture.

We noted “137 Emergent Abilities of Large Language Models.” If our understanding of this report is mostly accurate, large language models like those used by Google and other firms manifest emergent behavior. What’s interesting is that the write up explains that there is not one type of emergent behavior. The article identifies a Rivian truck bed full of emergent behaviors.

Here are the behaviors associated with big data sets and LaMDA 137B. (The method is a family of Transformer-based neural language models specialized for dialog. Correctly or incorrectly we associate LaMDA with Google’s smart software work. See this Google blog post.) Now here are the items mentioned in the Emergent Abilities paper:

Gender inclusive sentences German

Irony identification

Logical arguments

Repeat copy logic

Sports understanding

Swahili English proverbs

Word sorting

Word unscrambling

Another category of emergent behavior is what the paper calls “Emergent prompting strategies.” The idea is that more general prompting strategies manifest themselves. The system can perform certain functions that cannot be implemented when using “small” data sets; for example, solving multi step math problems in less widely used languages.

The paper includes links so the different types of emergent behavior can be explored. The paper wraps up with questions researchers may want to consider. One question we found suggestive was:

What tasks are language models currently not able to perform, that we should evaluate on future language models of better quality?

The notion of emergent behavior is important for two reasons: [a] Systems can manifest capabilities or possible behaviors not anticipated by developers and [b] Novel capabilities may create additional unforeseen capabilities or actions.

If one thinks about emergent behaviors in any smart, big data system, humans may struggle to understand, keep up, and manage downstream consequences in one or more dataspaces.

Stephen E Arnold, November 29, 2022
