Alphabet Spells Out YouTube Recommendations: Are Some Letters Omitted?

September 23, 2021

I have been taking a look at Snorkel (Stanford AI Labs, open source stuff, and the commercial variants). I am a dim wit. It seems to me that Google has found a diving partner and embracing some exotic equipment. The purpose of the Snorkel is to implement smart workflows. These apparently will allow better, faster, and cheaper operations; for example, classifying content for the purpose of training smart software. Are their applications of Snorkel-type thinking to content recommendation systems. Absolutely. Note that subject matter experts and knowledge bases are needed at the outset of setting up a Snorkelized system. Then, the “smarts” are componentized. Future interaction is by “engineers”, who may or may not be subject matter experts. The directed acyclic graphs are obviously “directed.” Sounds super efficient.

Now navigate to “On YouTube’s Recommendation System.” This is a lot of words for a Googler to string together: About 2,500.

Here’s the key passage:

These human evaluations then train our system to model their decisions, and we now scale their assessments to all videos across YouTube.

Now what letters are left out? Maybe the ones that spell built-in biases, stochastic drift, and Timnit Gebru? On the other hand, that could be a “Ré” of hope for cost reduction.

Stephen E Arnold, September 23, 2021

Will Life Become Directed Nudges?

September 17, 2021

I read an article with a thought provoking message. The write up is “Changing Customer Behavior in the Next New Normal.” How do these changes come about? The article is about insurance, which has seemed like a Ponzi pie dolloped with weird assurances when disaster strikes. And when disaster strikes where are the insurance companies? Some work like beavers to avoid fulfilling their end of the financial deal policy holders thought was a sure thing. House burn up in California? Ida nuke your trailer? Yeah, happy customers.

But what’s interesting about the write up is that it advocates manipulation, nudges, and weaponized digital experiences to get people to buy insurance. I learned:

The experience of living through the pandemic has changed the way people live and behave. Changes which offered positive experiences will last longer, especially the ones driven by well-being, convenience, and simplicity. Thereby, digital adoption, value-based personalized purchasing, and increased health awareness will be the customer behaviors that will shape the next new normal. This will be a game-changer for the life insurance industry and provide an opportunity for the industry to think beyond the usual, innovate, and offer granular, value-based and integrated products to meet customer needs. The focus will be on insurance offerings, which will combine risk transfer with proactive and value-added services and emerge as a differentiator.

Not even the murky writing of insurance professionals can completely hide the message. Manipulation is the digital future. If people selling death insurance have figured it out, other business sectors will as well.

The future will be full of directed experiences.

That’s super.

Stephen E Arnold, September 17, 2021

TikTok: Privacy Spotlight

September 15, 2021

There is nothing like rapid EU response to privacy matters. “TikTok Faces Privacy Investigations by EU Watchdog” states:

The watchdog is looking into its processing of children’s personal data, and whether TikTok is in line with EU laws about transferring personal data to other countries, such as China.

The data hoovering capabilities of a TikTok-type app have been known for what — a day or two or a decade? My hunch is that we are leaning toward the multi-year awareness side of the privacy fence. The write up points out:

TikTok said privacy was “our highest priority”.

Plus about a year ago an EU affiliated unit poked into the TikTok privacy matter.

However, the write up fails to reference a brilliant statement by a Swisher-type of thinker. My recollection is that the gist of the analysis of the TikTok privacy issue in the US was, “Hey, no big deal.”

We’ll see. I wait for a report on this topic. Perhaps a TikTok indifferent journalist will make a TikTok summary of the report findings.

Stephen E Arnold, September 15, 2021

Has TikTok Set Off an Another Alarm in Washington, DC?

September 9, 2021

Perhaps TikTok was hoping the recent change to its privacy policy would slip under the radar. The Daily Dot reports that “Senators are ‘Alarmed’ at What TikTok Might Be Doing with your Biometric Data.” The video-sharing platform’s new policy specifies it now “may collect biometric identifiers and biometric information,” like “faceprints and voiceprints.” Why are we not surprised? Two US senators expressed alarm at the new policy which, they emphasize, affects nearly 130 million users while revealing few details. Writer Andrew Wyrich reports,

“That change has sparked Sen. Amy Klobuchar (D-Minn.) and Sen. John Thune (R-S.D.) to ask TikTok for more information on how the app plans to use that data they said they’d begin collecting. Klobuchar and Thune wrote a letter to TikTok earlier this month, which they made public this week. In it, they ask the company to define what constitutes a ‘faceprint’ and a ‘voiceprint’ and how exactly that collected data will be used. They also asked whether that data would be shared with third parties and how long the data will be held by TikTok. … Klobuchar and Thune also asked the company to tell them whether it was collecting biometric data on users under 18 years old; whether it will ‘make any inferences about its users based on faceprints and voiceprints;’ and whether the company would use machine learning to determine a user’s age, gender, race, or ethnicity based on the collected faceprints or voiceprints.”

Our guess is yes to all three, though we are unsure whether the company will admit as much. Nevertheless, the legislators make it clear they expect answers to these questions as well as a list of all entities that will have access to the data. We recommend you do not hold your breath, Senators.

Cynthia Murrell, September 9, 3021

Change Is Coming But What about Un-Change?

September 8, 2021

My research team is working on a short DarkCyber video about automating work processes related to smart software. The idea is that one smart software system can generate an output to update another smart output system. The trend was evident more than a decade ago in the work of Dr. Zbigniew Michalewicz, his son, and collaborators. He is the author of How to Solve It: Modern Heuristics. There were predecessors and today many others following smart approaches to operations for artificial intelligence or what is called by thumbtypers AIOps. The DarkCyber video will become available on October 5, 2021. We’ll try to keep the video peppy because smart software methods are definitely exciting and mostly invisible. And like other embedded components, some of these “modules” will become components, commoditized, and just used “as is.” That’s important because who worries about a component in a larger system? Do you wonder if the microwave is operating at peak efficiency with every component chugging along up to spec? Nope and nope.

I read a wonderful example of Silicon Valley MBA thinking called “It’s Time to Say “Ok, Boomer!” to Old School Change Management.” At first glance, the ideas about efficiency and keeping pace with technical updates make sense. The write up states:

There are a variety of dated methods when it comes to change management. Tl;dr it’s lots of paper and lots of meetings. These practices are widely regarded as effective across the industry, but research shows this is a common delusion and change management itself needs to change.

Hasta la vista Messrs. Drucker and the McKinsey framework.

The write up points out that a solution is at hand:

DevOps teams push lots of changes and this is creating a bottleneck as manual change management processes struggle to keep up. But, the great thing about DevOps is that it solves the problem it creates. One of the key aspects where DevOps can be of great help in change management is in the implementation of compliance. If the old school ways of managing change are too slow why not automate them like everything else? We already do this for building, testing and qualifying, so why not change? We can use the same automation to record change events in real time and implement release controls in the pipelines instead of gluing them on at the end.

Does this seem like circular reasoning?

I want to point out that if one of the automation components operates using probability and the thresholds are incorrect, the data poisoned (corrupted by intent or chance) or the “averaging” which is a feature of some systems triggers a butterfly effect, excitement may ensue. The idea is that a small change may have a large impact downstream; for example, a wing flap in Biloxi could create a flood in the 28th Street Flatiron stop.

Several observations:

  • AIOps are already in operation at outfits like the Google and will be componentized in an AWS-style package
  • Embedded stuff, like popular libraries, are just used and not thought about. The practice brings joy to bad actors who corrupt some library offerings
  • Once a component is up and running and assumed to be okay, those modules themselves resist change. When 20 somethings encounter mainframe code, their surprise is consistent. Are we gonna change this puppy or slap on a wrapper? What’s your answer, gentle reader?

Net net: AIOps sets the stage for more Timnit Gebru shoot outs about bias and discrimination as well as the type of cautions produced by Cathy O’Neil in Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy.

Okay, thumbtyper.

Stephen E Arnold, September 8, 2021

TikTok: No Big Deal? Data Collection: No Big Deal Either

September 7, 2021

Here’s an interesting and presumably dead accurate statement from “TikTok Overtakes YouTube for Average Watch Time in US and UK.”

YouTube’s mass audience means it’s getting more demographics that are comparatively light internet users… it’s just reaching everyone who’s online.

So this means Google is number one? The write up points out:

The Google-owned video giant has an estimated two billion monthly users, while TikTok’s most recent public figures suggested it had about 700 million in mid-2020.

Absolutely. To me, it looks as if two billion is bigger than 700 million.

But TikTok has “upended the streaming and social landscape.”

How? Two billion is bigger than 700 million. Googlers like metrics, and that’s a noticeable difference.

I learned that the average time per user spent on the apps is higher for TikTok than for YouTube. TikTok has a high levels of “engagement.”

Google YouTube has more users, but TikTok users are apparently more hooked on the short form content from the quasi-China influenced outfit.

Advertisers will care. Retailers who want to hose users with product pitches via TikTok care.

Data harvesters at TikTok will definitely care. The more time spent on a monitored app provides a more helpful set of data about the users. These users can be tagged and analyzed using helpful open source tools like Bootleg.

Just a point to consider: How useful will time series data be about a TikTok user or user cluster? How useful will such data be when it comes time to identify a candidate for insider action? But some Silicon Valley wizards pooh pooh TikTok data collection. Maybe a knowledge gap for this crowd?

Stephen E Arnold, September 9, 2021

FR Is Going Far

September 6, 2021

Law enforcement officials are using facial recognition software and the array of cameras that cover the majority of the world to identify bad actors. The New York Times reports on a story that was used to track down a terroristic couple: “A Fire In Minnesota. An Arrest In Mexico. Cameras Everywhere.”

Mena Yousif is an Iranian refuge and Jose Felan is a felon. The couple were frustrated about the current state of the American law enforcement system and government, especially after George Floyd’s death. They set fire to buildings, including schools, stores, gas stations, and caused damage to over 1500. The ATF posted videos of the pair online, asking for any leads to their arrests. The ATF received tips as Felan and Yousif traveled across the US to the Mexican border. The were on the run for two weeks before they were identified outside of a motel in Texas.

Mexican authorities deployed a comprehensive facial recognition system, deployed in 2019, and it was used to find Felan and Yousif. Dahua Technology designed Mexico’s facial recognition system. Dahua is a Chinese company, one of the largest video surveillance companies in the world, and is partially owned by the its government. The US Defense and Commerce departments blacklisted Dahua for China’s treatment of Uighur Muslims and the trade war. Dahua denies the allegations and stated that it cannot control how its technology is used. Facial recognition did not catch Yousif and Felan, instead they were given a tip.

China is marketing surveillance technology to other countries, particularly in South America, Asia, and Africa, as a means to minimize crime and promote order. There are issues with the technology being perfect and the US does use it despite them:

“In the United States, facial recognition technology is widely used by law enforcement officials, though poorly regulated. During a congressional hearing in July, lawmakers expressed surprise that 20 federal agencies were using it without having fully assessed the risks of misuse or bias — some algorithms have been found to work less accurately on women and people of color, and it has led to mistaken arrests. The technology can be a powerful and effective crime-solving tool, though, placing it, for now, at a tipping point. At the start of the hearing, Representative Sheila Jackson Lee, Democrat of Texas, highlighted the challenge for Congress — or anyone — in determining the benefits and downsides to using facial recognition: It’s not clear how well it works or how widely it’s used. As Ms. Jackson Lee said, “Information on how law enforcement agencies have adopted facial recognition technology remains underreported or nonexistent.”

Many governments around the world, including the US, seem poised to their increase the amount of facial recognition and tracking technology for law and order. What is interesting is that China has been a pacesetter.

Whitney Grace, September 9, 2021

Not an Onion Report: Handwaving about Swizzled Data

August 24, 2021

I read at the suggestion of a friend “These Data Are Not Just Excessively Similar. They Are Impossibly Similar.” At first glance, I thought the write up was a column in an Onion-type of publication. Nope, someone copied the same data set and pasted it into itself.

Here’s what the write up says:

The paper’s Excel spreadsheet of the source data indicated mathematical malfeasance.

Malfeasance. Okay.

But what caught my interest was the inclusion of this name: Dan Ariley. If this is the Dan Ariely who wrote these books, that fact alone is suggestive. If it is a different person, then we are dealing with routine data dumbness or data dishonesty.


The write up contains what I call academic ducking and covering. You may enjoy this game, but I find it boring. Non reproducible results, swizzled data, and massaged numerical recipes are the status quo.

Is there a fix? Nope, not as long as most people cannot make change or add up the cost of items in a grocery basket. Smart software depends on data. And if those data are like those referenced in this Metafilter article, well. Excitement.

Stephen E Arnold, August 24, 2021

Big Data, Algorithmic Bias, and Lots of Numbers Will Fix Everything (and Your Check Is in the Mail)

August 20, 2021

We must remember, “The check is in the mail” and “I will always respect you” and “You can trust me.” Ah, great moments in the University of Life’s chapbook of factoids.

I read “Moving Beyond Algorithmic Bias Is a Data Problem”. I was heartened by the essay. First, the document has a document object identifier and a link to make checking updates easy. Very good. Second, the focus of the write up is the inherent problem of most of the Fancy Dan baloney charged big data marketing to which I have been subjected in the last six or seven years. Very, very good.

I noted this statement in the essay:

Why, despite clear evidence to the contrary, does the myth of the impartial model still hold allure for so many within our research community? Algorithms are not impartial, and some design choices are better than others.

Notice the word “myth”. Notice the word “choices.” Yep, so much for the rock solid nature of big data, models, and predictive silliness based on drag-and-drop math functions.

I also starred this important statement by Donald Knuth:

Donald Knuth said that computers do exactly what they are told, no more and no less.

What’s the real world behavior of smart anti-phishing cyber security methods? What about the autonomous technology in some nifty military gear like the Avenger drone?

Google may not be thrilled with the information in this essay nor thrilled about the nailing of the frat bros’ tail to the wall; for example:

The belief that algorithmic bias is a dataset problem invites diffusion of responsibility. It absolves those of us that design and train algorithms from having to care about how our design choices can amplify or curb harm. However, this stance rests on the precarious assumption that bias can be fully addressed in the data pipeline. In a world where our datasets are far from perfect, overall harm is a product of both the data and our model design choices.

Perhaps this explains why certain researchers’ work is not zipping around Silicon Valley at the speed of routine algorithm tweaks? The statement could provide some useful insight into why Facebook does not want pesky researchers at NYU’s Ad Observatory digging into how Facebook manipulates perception and advertisers.

The methods for turning users and advertisers into puppets is not too difficult to figure out. That’s why certain companies obstruct researchers and manufacture baloney, crank up the fog machine, and offer free jargon stew to everyone including researchers. These are the same entities which insist they are not monopolies. Do you believe that these are mom-and-pop shops with a part time mathematician and data wrangler coming in on weekends? Gee, I do.

The “Moving beyond” article ends with a snappy quote:

As Lord Kelvin reflected, “If you cannot measure it, you cannot improve it.”

Several observations are warranted:

  1. More thinking about algorithmic bias is helpful. The task is to get people to understand what’s happening and has been happening for decades.
  2. The interaction of math most people don’t understand and very simple objectives like make more money or advance this agenda is a destabilizing force in human behavior. Need an example. The Taliban and its use of WhatsApp is interesting, is it not?
  3. The fix to the problems associated with commercial companies using algorithms as monetary and social weapons requires control. The question is from whom and how.

Stephen E Arnold, August 20, 2021

Nifty Interactive Linear Algebra Text

August 17, 2021

Where was this text when I was an indifferent student in a one-cow town high school? I suggest you take a look at Dan Margalit’s and Joseph Rabinoff’s Interactive linear Algebra. The text is available online and as a PDF version. The information is presented clearly and there are helpful illustrations. Some of them wiggle and jump. This is a must-have in my opinion. Linear algebra in the age of whiz-bang smart methods? Yes. One comment: When we checked the online version, the hot links in the index did not resolve. Use the Next link.

Stephen E Arnold, August 17, 2021

Next Page »

  • Archives

  • Recent Posts

  • Meta