Google and Its Puzzles: Insiders Only, Please

December 26, 2022

ProPublica made available an article of some importance in my opinion. “Porn, Piracy, Fraud: What Lurks Inside Google’s Black Box Ad Empire” walks through the intentional, quite specific engineering of its crucial advertising system to maximize revenue and befuddle (is “defraud” a synonym?) advertisers. I was asked more than a decade ago to do a presentation of my team’s research into Google’s advertising methodology. I declined. At that time, I was doing some consulting work for a company I am not permitted to name. That contract stipulated that I would not talk about a certain firm’s business technologies. I signed because… money.

The ProPublica essay does the revealing about what is presented as a duplicitous, underhanded, and probably illegal business process subsystem. I don’t have to present any of the information I have gathered over the years. I can cite this important article and point out several rocks which the capable writers at ProPublica either did not notice or flipped over and concluded, “Nah, nothing to see here.”

I urge you to do two things. First, read the ProPublica write up. Number Two: Print it out. My hunch is that it may be disappeared or become quite difficult to find at some point in the future. Why? Ah, grasshopper, that is a question easily answered by the managers who set up Foundem and who were stomped by Googzilla. Alternatively, you could chase down a person at the French government tax authority and ask, “Why were French tax forms not findable via a Google search for several years?” These individuals might have the information you need. Shifting gears: Ask Magix, the software company responsible for Sony Vegas, why cracks for the software appear in YouTube videos. If you use your imagination, you will come up with ideas for gathering first person information about the lovable online advertising company’s systems and methods. Hint: Look up Dr. Timnit Gebru and inquire about her interactions with one of Google’s chief scientists. I guarantee that a useful anecdote will bubble up.

So what’s in the write up? Let me highlight a main point and then cite a handful of interesting statements in the article.

What is the main point? In my opinion, ProPublica’s write up says, “The GOOG maximizes its return at the expense of the advertisers and of the users.”

Who knew? Not me. I think the Alphabet Google YouTube DeepMind outfit is the most wonderfulest company in the world. Remember: You heard this here first. I have a priceless Google mouse pad too.

Consider these three statements from the essay. First, Google lingo is interesting:

Google spokesperson Michael Aciman said the company uses a combination of human oversight, automation and self-serve tools to protect ad buyers and said publisher confidentiality is not associated with abuse or low quality.

The idea is that Google is interested in using a hybrid method to protect ad buyers. Plus there is a difference between publishers and confidentiality. I find it interesting that Google wants to “protect ad buyers” instead of talking about [a] the ads themselves (porn, drugs, etc.), [b] the buyers of advertising, which is a distinct industry dependent upon Google for revenue, [c] the companies who want to get their message in front of people allegedly interested in the product or service, or [d] the user of search or some other Google service. And what about the others I have identified? Google doesn’t care. Logical, sure, but doesn’t Google have the other entities in mind? That’s a question regulators should have asked and had answered after Google settled the litigation with Yahoo over advertising technology, at the time of Google’s acquisition of Oingo (Applied Semantics), or at the time Google acquired DoubleClick. In my opinion, much of the ProPublica write up operates in a neverland of weird Google speak, not the reality of harvesting money from those largely in the dark about what’s happening in the business processes.

Second, consider this statement:

we matched 70% of the accounts in Google’s ad sellers list to one or more domains or apps, more than any dataset ProPublica is aware of. But we couldn’t find all of Google’s publisher partners. What we did find was a system so large, secretive and bafflingly complex that it proved impossible to uncover everyone Google works with and where it’s sending advertisers’ money.

The passage seems to suggest that Google’s engineers went beyond clever and ventured into the murky acreage of intentional obfuscation. It seems as if Google wanted to be able to consume advertising budgets without any entity having the ability to determine [a] if the ad were displayed in a suitable context; that is, did the advertiser’s message match the needs of the user to whom the ad was shown. And [b] was the ad appropriate even if it contained words and phrases on Google’s unofficial stop word lists. (If you have not seen these, send an email to benkent2020 at yahoo dot com and one of my team will email you some of the more interesting words that Google’s somewhat lax processes will definitely try to block. If a word is not on a Google stop list, then the messages will probably be displayed.) Remember: As Google terminates six percent of its staff, some of those humans presumably will not be able to review ads per item one above. And [c] note the word “bafflingly”. The focus of much Google engineering over the last 15 years has been to build competitive barriers, extend the monopoly function with “partners”, and double talk in order to keep regulators and curious Congressional people away. That’s my take on this passage.

Now for the third passage I will cite:

…we uncovered scores of previously unreported peddlers of pirated content, porn and fake audiences that take advantage of Google’s lax oversight to rake in revenue.

I don’t need to say much more about this statement than look at and think about pirated content (copyright), porn (illegal content in some jurisdictions) and fake audiences (cyber fraud). Does this statement suggest that Google is a criminal enterprise? That’s a good question.

I have some high level observations about this excellent article in ProPublica. I offer these in the hope that ProPublica will explore some of these topics or an enterprising graduate student will consider the statements and do some digging.

  1. Why is Google unable to manage its staff? This is an important question because the ad behaviors described in the ProPublica article are the result of executive compensation plans and incentives. Are employees rewarded for implementing operations that further “soft” fraud or worse?
  2. How will Google operate in a more fragmented, more regulated environment? Is one possible behavior a refusal to modify the guiding hand of compensation and incentive programs away from generating more and more money within external constraints? My hunch is that Google will do whatever is necessary to build its revenue.
  3. What mechanisms exist or will be implemented to keep Google’s automated systems operating in a legal, ethical way?

Net net: Finally, after decades of craziness about how wonderful Googzilla is, more critical research is appearing. Is it too little and too late? In my view, yes.

Stephen E Arnold, December 26, 2022

The Internet: Cue the Music. Hit It, Regrets, I Have Had a Few

December 21, 2022

I have been around online for a few years. I know some folks who were involved in creating what is called “the Internet.” I watched one of these luminaries unbutton his shirt and display a tee with the message, “TCP on everything.” Cute, cute, indeed. (I had the task of introducing this individual only to watch the disrobing and the P on everything joke. Tip: It was not a joke.)

Imagine my reaction when I read “Inventor of the World Wide Web Wants Us to Reclaim Our Data from Tech Giants.” The write up states:

…in an era of growing concern over privacy, he believes it’s time for us to reclaim our personal data.

Who wants this? Tim Berners-Lee and a startup. Content marketing or a sincere effort to derail the core functionality of ad trackers, beacons, cookies which expire in 99 years, etc., etc.?

The article reports:

Berners-Lee hopes his platform will give control back to internet users. “I think the public has been concerned about privacy — the fact that these platforms have a huge amount of data, and they abuse it,” he says. “But I think what they’re missing sometimes is the lack of empowerment. You need to get back to a situation where you have autonomy, you have control of all your data.”

The idea is that Web 3 will deliver a different reality.

Do you remember this lyric:

Yes, there were times I’m sure you knew
When I bit off more than I could chew
But through it all, when there was doubt
I ate it up and spit it out
I faced it all and I stood tall and did it my way.

The my becomes big tech, and it is the information highway. There’s no exit, no turnaround, and no real chance of change before I log off for the final time.

Yeah, digital regrets. How’s that working out at Amazon, Facebook, Google, Twitter, and Microsoft among others? Unintended consequences and now the visionaries are standing tall on piles of money and data.

Change? Sure, right away.

Stephen E Arnold, December 21, 2022

TikTok Explained without Mentioning Regulation and US Education Failings

December 19, 2022

I am not into TikTok. I enjoy reading analyses of TikTok by individuals who are not engaged in law enforcement, crime analysis, and intelligence work for the US and its allies. Most of these deep dives are entertaining because they miss the obvious: Hoovering data from users for strategic and tactical information weaponization and information operations. I assume that makes me a party pooper, particularly among those who are into the mobile experience. I recall laughing out loud when I listened to a podcast featuring a Silicon Valley news type explaining that TikTok was no big deal. Ho ho ho.

I read this morning (December 17, 2022, 5:30 am US Eastern) “TikTok’s Secret Sauce.” The write up explains insights gleaned from “a project studying algorithmic amplification and distortion.” Quotes from the write up are in italic to differentiate them from my comments.

I learned:

… the average ratio of hearts to views on TikTok is roughly 5%. People are just not that predictable.

Okay, people are not predictable. May I suggest spending some time with the publicly available information on the Recorded Future Web site? Google and In-Q-Tel were early supporters of this company. The firm’s predictive analytics rely, in part, on the idea that people are creatures of habit. Useful information emerges from these types of analyses. In fact, most intelware does, and this includes specialists in other countries, including some not allied with the US.

I learned:

Exploration explains why there are an unending variety of incredibly weird niches on TikTok: the app manages to connect those creators to their niche audiences.

Let’s think in terms of unarticulated needs and desires. TikTok makes it possible for that which is not stated to emerge from user behavior. Feedback ensures that skinny girls and diets that deliver thinness get in front of certain individuals. Feedback is good, and finding content that reveals more of the user’s psychographic footprint is useful. Why? Manipulation, identification of individuals with certain behavior fingerprints, and amplification of certain messaging. Yep, useful.

I learned:

More generally, in AI applications, the sophistication of the algorithm is rarely the limiting factor.

Interesting. Perhaps the function of TikTok is just obvious. It is, in my opinion, so obvious that it is overlooked. In high school more than a half century ago, I recall our class having to read “The Purloined Letter” by that sporty writer Edgar Allan Poe. The main idea is that the obvious is overlooked.

In some countries — might TikTok’s home base be an example — certain actions are obvious and then ignored or misunderstood. TikTok is that type of product. Now, after years of availability, experts are asking questions and digging into the service.

The limiting factor is a failure to understand how online information and services can be weaponized, deliver directed harm, and be viewed as a harmless time waster. Is it too late? Maybe not, but I get a kick out of the reactions of experts to what is as clear and straightforward as driving a vehicle over a mostly clueless pedestrian or ordering spicy regional cuisine without understanding the concept of hot.

Stephen E Arnold, December 19, 2022

A Paradox at the Center of the Internet: No Big Deal

December 2, 2022

The Internet is a mess, but compared to how it was in its early decades it is way more organized. The organization of the Internet is called centralization. Gordon Brander of Unconscious wants the Internet to be decentralized. He says that will happen only after it first becomes more centralized. Read his explanation here: “Centralization Is Inevitable.” Brander says that the best way to understand the benefits of decentralization is to understand how centralization first happens.

While there are many ways to map centralization, the Internet is concentrated into different hubs or a scale-free network. The best way to define a scale-free network is:

“The defining characteristic of scale-free networks is a power law distribution with a long tail. A small number of nodes with an extremely large number of links, and an extremely large number of nodes with a small number of links. Think Twitter. Most users have a few followers, while a few influencers have millions. This power law distribution grants the biggest hubs a lot of power over the network. It also makes hubs important to the functioning of the network in ways that are not immediately obvious, like keystone species in an ecology.”

These networks emerge because of preferential attachment, or “the rich-get-richer” scenario. Users prefer a hub/network, ergo it will receive more attention, trust, users, etc. Scale-free networks are also more efficient, because the paths between nodes are shorter.
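
For readers who want to watch the rich get richer, here is a minimal sketch of preferential attachment. It is not taken from Brander’s essay. The function name `grow_network`, the two-node starting network, and the one-link-per-newcomer rule are all assumptions made for illustration.

```python
import random
from collections import Counter

def grow_network(num_nodes, seed=0):
    """Grow a network one node at a time (hypothetical helper).

    Each newcomer adds a single link to an existing node, chosen with
    probability proportional to the number of links that node already has.
    """
    rng = random.Random(seed)
    # Assumed starting point: two nodes joined by one link.
    degree = Counter({0: 1, 1: 1})
    # Every node appears in this list once per link it holds, so a uniform
    # draw from the list is a degree-weighted draw over nodes.
    endpoints = [0, 1]
    for new_node in range(2, num_nodes):
        target = rng.choice(endpoints)
        degree[new_node] += 1
        degree[target] += 1
        endpoints.extend([new_node, target])
    return degree

degree = grow_network(10_000)
ranked = sorted(degree.values(), reverse=True)
print("links held by the five biggest hubs:", ranked[:5])
print("links held by the median node:", ranked[len(ranked) // 2])
```

Under these assumptions the earliest nodes should pile up links while the typical node keeps only the one it arrived with, which is the long tail described in the quote above.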

Another advantage is that they are resilient to random failure; i.e., if one node fails, the rest of the system continues to run. The same structure also makes these networks more vulnerable to targeted attacks, because a well-placed virus that hits the hubs could knock out much of the network.
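
A rough way to compare accidents with attacks is to knock out the same number of nodes two ways, once at random and once by picking the biggest hubs, and then measure the largest group of nodes still connected. The sketch below is illustrative only. The network size, the 1% knockout figure, and the helper names `grow_network` and `largest_component` are assumptions, not anything from Brander’s essay.

```python
import random
from collections import defaultdict

def grow_network(num_nodes, rng):
    """Preferential-attachment network (hypothetical helper): each new node
    links to one existing node picked in proportion to its current links."""
    neighbors = defaultdict(set)
    neighbors[0].add(1)
    neighbors[1].add(0)
    endpoints = [0, 1]  # one entry per link end, so draws are degree-weighted
    for new_node in range(2, num_nodes):
        target = rng.choice(endpoints)
        neighbors[new_node].add(target)
        neighbors[target].add(new_node)
        endpoints.extend([new_node, target])
    return neighbors

def largest_component(neighbors, removed):
    """Size of the biggest connected group once `removed` nodes are gone."""
    seen = set(removed)
    best = 0
    for start in neighbors:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:  # depth-first walk over surviving nodes
            node = stack.pop()
            size += 1
            for other in neighbors[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        best = max(best, size)
    return best

rng = random.Random(0)
net = grow_network(5_000, rng)
knocked_out = 50  # assumed: 1% of the nodes

# Random failure: any node is equally likely to go down.
random_failures = set(rng.sample(sorted(net), knocked_out))
# Targeted attack: take out the nodes holding the most links.
hub_attack = set(sorted(net, key=lambda n: len(net[n]), reverse=True)[:knocked_out])

after_random = largest_component(net, random_failures)
after_attack = largest_component(net, hub_attack)
print("largest connected group after random failures:", after_random)
print("largest connected group after hub attack:", after_attack)
```

If the sketch behaves as expected, the random failures should leave a much larger connected group than the hub attack does. A one-link-per-node network is more fragile than the real Internet, so treat the gap as a cartoon of the effect rather than a measurement.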

Brander ends his spiel by stating the centralization and decentralization of the Internet is the circle of life: random start-ups, exponential growth, consolidation, collapse, then repeat. Someone cue The Lion King’s opening song!

Whitney Grace, December 2, 2022

WikiLeaks: Oh, Oh, Some Folks Are Not Happy

December 1, 2022

I read “WikiLeaks Website Is Struggling to Stay Online—As Millions of Documents Disappear.” If the write up is on the money, one lesson from this alleged cancel culture action is to hit Print to PDF and save a document. Assuming that online is forever is one of those weird misperceptions many online users have. Nope.

The write up says:

WikiLeaks’ website appears to be coming apart at the seams, with more and more of the organization’s content unavailable without explanation. WikiLeaks technical issues, which have been ongoing for months, have gotten worse in recent weeks as increasingly larger portions of its website no longer function.

The write up points out:

Although WikiLeaks long boasted that it released more than 10 million documents in 10 years, at current, less than 3,000 documents remain accessible, according to an analysis by the Daily Dot of the website’s leaks archive.

What’s interesting is that no one has claimed responsibility for hitting the delete key. I also find it notable that the site has been online for many years. Now here’s a question, “Who could have taken this action?” Microsoft would say that it was 1,000 engineers working for a nation state. Others might say, “Oh, just a technical glitch.” A few might say, “Teens fooling around?” Does this list exhaust the possibilities?

Stephen E Arnold, December 1, 2022

Sesamy for Content in Small Bites

December 1, 2022

Here is good news for anyone who would like to purchase a piece of content without a long-term relationship with its host platform. The Next Web reports, “Swedish Startup Sesamy Seeks to Slaughter the Subscription Model.” It is such a good idea, we wonder whether this company will become an Amazon acquisition target. Writer Cate Lawrence tells us:

“[Sesamy is] So far, the Stockholm-based company has partnered with every major book publisher in Sweden and Denmark to offer users the option to purchase digital content as a single purchase. You can then consume it on any app or device. This means you can play Sesamy audiobooks in your favorite audio app and download watermarked ebooks to any ereader. And you actually own the book instead of renting it with a platform like Amazon Kindle. … Publishing companies are struggling to woo readers who look to cut costs, and Sesamy offers them a new business model and potential revenue source. In October, the company launched SmartID with Swedish publication Breakit, enabling publishers to monetize non-subscribed readers, without cannibalizing their existing revenues from digital subscriptions.

The software will also include built-in price optimization that suggests a fair retail cost to readers and publishers, ensuring that the platform remains competitive. And this incremental revenue may add up at a time when people are culling their subscriptions to save money.”

There must be an appetite for this sort of service—the company just raked in €3.3 million in a recent funding round. It will use this capital to make available single issues of newspapers and magazines. Yes please. Lawrence contemplates an extension to academic journal articles. They should really be free, she notes, but single-article access would be an improvement. Sesamy was founded in March 2021 by the folks behind the podcast platform Acast.

Cynthia Murrell, December 1, 2022

Smart Software: Can Humans Keep Pace with Emergent Behavior?

November 29, 2022

For the last six months, I have been poking around the idea that certain behaviors are emergent; that is, give humans a capability or a dataspace, and those humans will develop novel features and functions. The examples we have been exploring are related to methods used by bad actors to avoid take downs by law enforcement. The emergent behaviors we have noted exploit domain name registry mechanisms and clever software able to obfuscate traffic from Tor exit nodes. The result of the online dataspace is unanticipated emergent behaviors. The idea is that bad actors come up with something novel using the Internet’s furniture.

We noted “137 Emergent Abilities of Large Language Models.” If our understanding of this report is mostly accurate, large language models like those used by Google and other firms manifest emergent behavior. What’s interesting is that the write up explains that there is not one type of emergent behavior. The article identifies a Rivian truck bed full of emergent behaviors.

Here are the behaviors associated with big data sets and LaMDA 137B. (The method is a family of Transformer-based neural language models specialized for dialog. Correctly or incorrectly we associate LaMDA with Google’s smart software work. See this Google blog post.) Now here are the items mentioned in the Emergent Abilities paper:

Gender inclusive sentences German

Irony identification

Logical arguments

Repeat copy logic

Sports understanding

Swahili English proverbs

Word sorting

Word unscrambling

Another category of emergent behavior is what the paper calls “Emergent prompting strategies.” The idea is that more general prompting strategies manifest themselves. The system can perform certain functions that cannot be implemented when using “small” data sets; for example, solving multi step math problems in less widely used languages.

The paper includes links so the different types of emergent behavior can be explored. The paper wraps up with questions researchers may want to consider. One question we found suggestive was:

What tasks are language models currently not able to perform, that we should evaluate on future language models of better quality?

The notion of emergent behavior is important for two reasons: [a] Systems can manifest capabilities or possible behaviors not anticipated by developers and [b] Novel capabilities may create additional unforeseen capabilities or actions.

If one thinks about emergent behaviors in any smart, big data system, humans may struggle to understand, keep up, and manage downstream consequences in one or more dataspaces.

Stephen E Arnold, November 29, 2022

Are Governments Behaving Like Sheep?

November 24, 2022

North Korea, China, and possibly Russia are incarnations of Orwell’s Big Brother from the dystopian novel 1984. The US government is compared to Big Brother (and rightly so) when it attempts to block free speech. The thing about outlawing free speech is that it takes too much energy to regulate. The US government wants to limit free speech, but only when it feels like it. We also do not want that, because the government lies. Gizmodo explains why we do not want the government to be Big Brother in “You Really Don’t Want The Government To Be Your Content Moderator.”

The Department of Homeland Security is collaborating with tech firms and large businesses to repackage Bush’s “War on Terror” into a new product. They are building tools to monitor social media and combat disinformation. Why did this happen?

“In April, the Biden administration announced the launch of a Disinformation Governance Board, a new unit within DHS meant to “standardize the [government’s] treatment of disinformation” across various agencies. But the project was fumbled from the start: the unit initially failed to release a charter, leaving Americans to wonder just what exactly this shadowy new group with a creepy name was going to be doing. It didn’t take long for critics—on both the political left and right—to start referring to it as a “Ministry of Truth,” (the notorious propaganda bureau from George Orwell’s 1984). Though officials tried to salvage the effort, DHS shuttered the board in May after it had been operational for less than a month.”

Biden’s administration continued the Orwellian acts with a new organization: the Cybersecurity and Infrastructure Security Agency (CISA). Big businesses such as JPMorgan Chase and Twitter are working with the FBI and CISA to address state-sponsored disinformation campaigns. The US government also wants to address COVID-19 vaccine efficacy, US support of Ukraine, Afghanistan withdrawal, and racial justice.

Is the US government not an impartial entity, despite what politicians claim?

Whitney Grace, November 24, 2022

Estonia and e-Residency

November 21, 2022

I have been to Estonia a couple of times. Once I visited in the summer. Another time I visited in February. Here’s a tip: “Leaves of Grass” weather is preferable in my opinion.

I mention Estonia because I noted a link to the Estonian government’s e-Residency information. You can find the basics at “Become an E-Resident.”

The main idea is that one can join Estonia’s digital nation. E-Residency is open to people from other countries. The idea is that the business would be “location independent” and the company would be an EU outfit.

The benefits include:

  • Grow your business remotely
  • Minimized bureaucracy (keep in mind that this is an EU company within a Baltic state with a Russian border)
  • Joining an international community.

There are nominal fees, probably less than US$200, and a background checking process.

The idea is an interesting one. However, the e-Residency does not appear to include one of those “golden passports” available from some countries.

Are there downsides? A few, for example:

  • Explaining to a US tax authority what’s going on
  • Anticipating how the program will evolve; for example, laws passed in Estonia going forward
  • Dealing with litigation in the US, EU, and elsewhere
  • Resolving issues arising from payment to vendors and collecting money from customers.

If this approach to business appears attractive, check out the Estonia government’s Web site.

Stephen E Arnold, November 21, 2022

Confirming a Fundamental Law of Online: Centralization Is Emergent

November 17, 2022

The author of “Scaling Mastodon Is Impossible” did not set out to provide evidence of this fundamental Arnold Law of Online: Centralization is emergent. The law means that when someone creates an online service, traffic flow or whatever one calls what happens online causes centralization. The idea is that centralization is cheaper and somewhat easier to maintain than the “let many flowers bloom” approach to development. (Hello, Amazon, Facebook, Google, and Twitter. You have an advantage. Why not use it to your advantage?)

The article about Mastodon states:

Decentralization promotes an utopian view of the world that I belief fails to address actual real problems in practice. Yet on that decentralization wave a lot of projects are riding from crypto-currencies [1], defi or things such as Mastodon. All of these things have one thing in common: distrust. Some movements come from the distrust of governments or taxation, others come from the distrust of central services.

As the essay creeps to its conclusion, I spotted a gem of observation; to wit:

Wikipedia for all it’s faults shows quite well that a centralized thing can exist with the right model behind it. The software and the content is open, and if WikiMedia were to fuck up too much, then someone else could step into place and replace it. But the risk of that happening, keeps the organization somewhat in check.

If the author is correct, the future of online may look more like Wikipedia. Possibly? There is another Arnold Law of Online to consider:

Online services lead to monopolization.

This means there will be new Amazons and Googles in the future. Emergent does not mean good, however.

Stephen E Arnold, November 17, 2022
