Google Stop Words: Close Enough for the Mom and Pop Online Ad Vendor

April 15, 2021

I remember from a statistics lecture given by a fellow named Dr. Peplow maybe that fuzzy is one of the main characteristics of statistics. The idea is that a percentage is not a real entity; for example, the average number of lions in a litter is three, give or take a couple of the magnets for hunters and poachers. Depending upon the data set, the “real” number maybe 3.2 cubs in a litter. Who has ever seen a fractional lion? Certainly not me.

Why am I thinking fuzzy? Google is into data. The company collects, counts, and transform “real” data into actions. Whip in some smart software, and the company has processes which transform an advertiser’s need to reach eyeballs with some statistically validated interest in whatever the Mad Ave folks are trying to sell.

Google Has a Secret Blocklist that Hides YouTube Hate Videos from Advertisers—But It’s Full of Holes” suggests that some of the Google procedures are fuzzy. The uncharitable might suggest that Google wants to get close enough to collect ad money. Horse shoe aficionados use the phrase “close enough for horse shoes” to indicate a toss which gets a point or blocks an opponent’s effort. That seems to be one possible message from the Mark Up article.

I noted this passage in the essay:

If you want to find YouTube videos related to “KKK” to advertise on, Google Ads will block you. But the company failed to block dozens of other hate and White nationalist terms and slogans, an investigation by The Markup has found. Using a list of 86 hate-related terms we compiled with the help of experts, we discovered that Google uses a blocklist to try to stop advertisers from building YouTube ad campaigns around hate terms. But less than a third of the terms on our list were blocked when we conducted our investigation.

What seems to be happening is that Google’s methods for taking a term and then “broadening” it so that related terms are identified is not working. The idea is that related terms with a higher “score” are more directly linked to the original term. Words and phrases with lower “scores” are not closely related. The article uses the example of the term KKK.

I learned:

Google Ads suggested millions upon millions of YouTube videos to advertisers purchasing ads related to the terms “White power,” the fascist slogan “blood and soil,” and the far-right call to violence “racial holy war.” The company even suggested videos for campaigns with terms that it clearly finds problematic, such as “great replacement.” YouTube slaps Wikipedia boxes on videos about the “the great replacement,” noting that it’s “a white nationalist far-right conspiracy theory.” Some of the hundreds of millions of videos that the company suggested for ad placements related to these hate terms contained overt racism and bigotry, including multiple videos featuring re-posted content from the neo-Nazi podcast The Daily Shoah, whose official channel was suspended by YouTube in 2019 for hate speech.

It seems to me that Google is filtering specific words and phrases on a stop word list. Then the company is not identifying related terms, particularly words which are synonyms for the word on the stop list.

Is it possible that Google is controlling how it does fuzzification. In order to get clicks and advertising, Google blocks specifics and omits the term expansion and synonym identification settings to eliminate the words and phrases identified by the Mark Up’s investigative team?

These references to synonym expansion and reference to query expansion are likely to be unfamiliar to some people. Nevertheless, fuzzy is in the hands of those who set statistical thresholds.

Fuzzy is not real, but the search results are. Ad money is a powerful force in some situations. The article seems to have uncovered a couple of enlightening examples. String matching coupled with synonym expansion seem to be out of step. Some fuzzification may be helpful in the hate speech methods.

Stephen E Arnold, April 12, 2021

Comments

Comments are closed.

  • Archives

  • Recent Posts

  • Meta