SEO Semantics and the Vibrant Vivid Vees
January 29, 2021
Years ago, one of the executives at Vivisimo, which was acquired by IBM, told me about the three Vees. These were the Vees of Vivisimo’s metasearch system. The individual, who shall remain nameless, whispered: Volume, Velocity, and Variety. He smiled enigmatically. In a short time, the three Vees were popping up in the context of machine learning, artificial intelligence, and content discovery.
The three Vivisimo Vees seem to capture the magic and mystery of digital data flows. I am not on that wheezing bus in Havana.
Volume is indeed a characteristic of online information. Even if one has a trickle of Word documents to review each day, the individual reading, editing, and commenting on a report has a sense that there are more Word documents flying around than the handful in this morning’s email. But in the context of our datasphere, no one knows how much digital data exist, what they contain, who has access, etc. Volume is a fundamental characteristic of today’s datasphere. The only way to contain data is to pull the plug. That is not going to happen unless there is something larger than Google. Maybe a massive cyber attack?
The second Vee is variety. From the point of view of the Vivisimo person, variety referred to the content that text-centric systems processed. Text, unlike a tidy database file, is usually a mess. Because text lacks structure, transform and load outfits have been working for decades to convert the messy into the orderly or at least pull out certain chunks so that one can extract key words, dates, and maybe entities with reasonable accuracy. Today there is a lot of variety; however, for every new variant old ones become irrelevant. At best, the variety challenge is like a person in a raft trying to paddle to keep from being swamped with intentional and unintentional content types. How about those encrypted messages? Another hurdle for the indexing outfit: Decryption, metadata extraction and assignment, and processing throughput. So the variety Vee is handled by focusing on a subset of content. Too bad for those who think that “all” information is online.
The third Vee, velocity, is a fave among the real time crowd. The idea is that streams and flows of data in real time can be processed on the fly, patterns identified, advanced analytics applied, and high value data emitted. This notion is a good one when working in a print shop in the 17th century. Those workflows don’t make any sense when figuring out the stream of data produced by an unidentified drone which may be weaponized. Furthermore, if a monitoring device notes a several millisecond pattern before a person’s heart attack, that’s not too helpful when the afflicted individual falls over dead a second later. What is “real time”? Answer: There are many types, so the fix is to focus, narrow, winnow, and go for a high probability signal. Sometimes it works; sometimes it doesn’t.
The three Vees are a clever and memorable marketing play. A company can explain how its system manages each of these issues for a particular customer use case. The one-size-fits-all idea is not what generates information processing revenues. Service fees, subscriptions, and customization are the money spinners.
The write up “The Four V’s of Semantic Search” adds another Vee to the Vivisimo three: Veracity. I don’t want to argue “truth” because in the datasphere for every factoid on one side of an argument, even a Bing search can generate counter examples. What’s interesting is that this veracity Vee is presented as part of search engine optimization using semantic techniques. Here’s a segment I circled:
The fourth V is about how accurate the information is that you share, which speaks about your expertise in the given subject and to your honesty. Google cares about whether the information you share is true or not and real or not, because this is what Googles [sic] audience cares about. That’s why you won’t usually get search results that point to the fake news sites.
Got that. Marketing hoo hah, sloganeering, and word candy — just like the three Vivisimo Vees.
Stephen E Arnold, January 29, 2021
Common Sense via a Survey: Social Media and Kiddies. Guess the Results?
January 29, 2021
I read “Social Media Damages Teenagers’ Mental Health, Report Says.” I had a college professor who loved studies like this. Grant money. Demonstrate the obvious. Write a research paper. Get more grant money. Repeat.
The write up reports:
Teenagers’ mental health is being damaged by heavy social media use, a report has found.
Yikes! Who knew?
Here’s what the academic wizards unearthed, almost the discovery of the Twitter and Facebook era, and I quote:
- One in three girls was unhappy with their personal appearance by the age of 14, compared with one in seven at the end of primary school
- The number of young people with probable mental illness has risen to one in six, up from one in nine in 2017
- Boys in the bottom set at primary school had lower self-esteem at 14 than their peers.
I wonder if the youthful person wearing fur and horns in the US Capitol a couple of weeks ago is a manifestation of delayed youth.
On the other hand, mobile neck has become a thing. Quite surprising that social media is not the wonderland of community and positivity that some folks assumed.
Imagine that! Assume. Ass of you and me, according to another instructor in college who seemed less inclined to research common sense under a grant umbrella.
Stephen E Arnold, January 28, 2021
Twitter and the Fire Hose for Academics
January 29, 2021
I read “Enabling the Future of Academic Research with the Twitter API.” According to the official Twitter statement:
Our developer platform hasn’t always made it easy for researchers to access the data they need, and many have had to rely on their own resourcefulness to find the right information.
Understatement, of course.
The post continues:
We’ve also made improvements to help academic researchers use Twitter data to advance their disciplines, answer urgent questions during crises, and even help us improve Twitter.
Help is sometimes — well — helpful. But self help is often a positive step; for example, verifying the actual identity of a person who uses the tweeter thing. There are some software robots chugging along I believe.
Also, charging a subscription fee. The amount is probably less important than obtaining verifiable bank information. Sure, some software robots have accounts at outstanding institutions like Credit Suisse and HSBC, but whatever account data are available might be helpful under certain circumstances.
But academics? How many academics work for non governmental or governmental entities as experts, analysts, and advisors? Will the tweeter thing’s new initiative take such affiliations into account before and during usage of Twitter data?
I assume that a tweeter senior manager will offer an oracular comment like, “For sure.”
There are three hoops through which the agile academic must jump, and I quote:
- You are either a master’s student, doctoral candidate, post-doc, faculty, or research-focused employee at an academic institution or university.
- You have a clearly defined research objective, and you have specific plans for how you intend to use, analyze, and share Twitter data from your research…
- You will use this product track for non-commercial purposes….
Sounds like a plan which will make some nation states’ academics wriggle with anticipative joy.
My view is that this new initiative may unfold in interesting ways. But I am sure the high school science club managers have considered such possibilities. Why, who would hire a graduate student to access tweeter outputs to obtain actionable information for use by a country’s intelligence professionals? The answer in the twitterverse is, “Who would risk losing the trust of Twitter by doing that?” Certainly not an academic funded by an intelligence or law enforcement entity.
Right, no one. Misuse the tweeter? Inconceivable.
Stephen E Arnold, January 29, 2021
AI That Judges Us on Our Appearance
January 29, 2021
An echo of the long-debunked field of phrenology has emerged in today’s AI. American Scientist delves into “The Dark Past of Algorithms that Associate Appearance and Criminality.” It seems makers of these new technologies missed an old lesson: do not judge a book by its cover. Despite these long-accepted words of wisdom, recent algorithms have popped up that purport to identify criminals, or potential criminals, by appearance alone. We’re told several schools have installed cameras that supposedly identify cheaters and inattentive students. There is even an AI out of Stanford University that claims to be an accurate gaydar. Writer Catherine Stinson outlines not only why such algorithms are prone to error, but also the dangers they pose to certain segments of society. I recommend interested readers check out the article. Here is one excerpt that summarizes the basics:
“Complex personal traits such as a tendency to commit crimes are exceedingly unlikely to be genetically linked to appearance in such a way as to be readable from photographs. First, criminality would have to be determined to a significant extent by genes rather than environment. There may be some very weak genetic influences, but any that exist would be washed out by the much larger influence of environment. Second, the genetic markers relevant to criminality would need to be linked in a regular way to genes that determine appearance. This link could happen if genes relevant to criminality were clustered in one section of the genome that happens to be near genes relevant to face shape. For a complex social trait such as criminality, this clustering is extremely unlikely. A much more likely hypothesis is that any association that exists between appearance and criminality works in the opposite direction: A person’s appearance influences how other people treat them, and these social influences are what drives some people to commit crimes (or to be found guilty of them).”
The points about flawed data sets are also very important—consider the differences between a mug shot and selfies posted to social media. Between the historic ridicule of phrenology and more recent discussions around AI bias, it is surprising developers would even consider teaching their algorithms to make assumptions based on appearance.
Cynthia Murrell, January 29, 2021
Online Axiom: Distorted Information Is Part of the Datasphere
January 28, 2021
I read a 4,300 word post called “Nextdoor Is Quietly Replacing the Small-Town Paper” about an online social network aimed at “neighbors.” Yep, just like the one in which Mr. Rogers lived for 31 years.
A world that only exists in upscale communities, populated by down home folks with money, and alarm systems.
The write up explains:
Nextdoor is an evolution of the neighborhood listserv for the social media age, a place to trade composting tips, offer babysitting services, or complain about the guy down the street who doesn’t clean up his dog’s poop. Like many neighborhood listservs, it also has increasingly well-documented issues with racial profiling, stereotyping of the homeless, and political ranting of various stripes, including QAnon. But Nextdoor has gradually evolved into something bigger and more consequential than just a digital bulletin board: In many communities, the platform has begun to step into roles once filled by America’s local newspapers.
As I read this, I recalled that Google wants to set up its own news operation in Australia. The GOOG is signing deals with independent publishers, but maybe the mom-and-pop online advertising company should target Nextdoor. Imagine the Google Local ads which could be hosed into this service. Plus, Nextdoor already disappears certain posts and features one of the wonkiest interfaces for displaying comments and locating items offered for free or for sale. Google-ize it?
The article gathers some examples of how the at homers use Nextdoor to communicate. Information, disinformation, and misinformation complement quasi-controversial discussions. But if one gets too frisky, then the “seed” post is deleted from public view.
I have pointed out in my lectures (when I was doing them until the Covid thing) that the local and personal information is a goldmine of information useful to a number of commercial and government entities.
If you know zero about Nextdoor, check out the long, long article hiding happily behind a “register to read” paywall. On the other hand, sign up and check out the service.
Google, if you were a good neighbor, you would be looking at taking Nextdoor to Australia to complement the new play of “Google as a news publisher.” A “real” news outfit. Maybe shaped information is an online “law” describing what’s built in to interactions which are not intermediated?
Stephen E Arnold, January 28, 2021
What Makes the Web Slow? Really Slow?
January 28, 2021
I read “We Rendered a Million Web Pages to Find Out What Makes the Web Slow.” My first reaction was the East Coast Internet outage which ruined some Type A workers’ day. I can hear the howls, “Mommy, I can’t attend class, our Internet is broken again.”
Here’s a passage from the “Rendered a Million Web Pages” which I found interesting:
Internet commentators are fond of saying that correlation does not equal causation, and indeed we can’t get at causality directly with these models. Great caution should be exercised when interpreting the coefficients, particularly because a lot confounding factors may be involved. However, there’s certainly enough there to make you go “hmm”.
Yep, I went “hmm.” But for these reasons:
- Ad load times slow down my Web experiences. Don’t you love those white page hung ads on the YouTube or the wonky baloney on the Daily Mail?
- How about crappy Internet service providers?
- Are you thrilled with cache misses?
- Pages stuffed full of trackers, bugs, codes, and spammy SEO stuff.
Hmm, indeed.
Stephen E Arnold, January 28, 2021
Is Google Becoming a ‘Real’ Publisher: Moves in Australia May Signal a New Thrust for the Online Ad Outfit
January 28, 2021
I read “Google Revives Australia News Platform Launch Amid Content Payment Fight.” Feint, misstep, or gut punch? I am not sure. Australia is a long way from Harrod’s Creek, Kentucky. The write up mentions a Googler in whom I have an interest: Melanie Silva, a VP at the mom-and-pop online ad vendor Google. Ms. Silva has responsibility for managing and directing Google Australia. She has previous financial and travel sector experience and eight years at the GOOG. According to Time Auction:
She holds a Bachelor of Economics degree from Macquarie University, Diploma of Financial Planning and Diploma of Interactive and Direct Marketing from the Institute of Direct Marketing in the United Kingdom. She is a karaoke fan, amazing wife and mother of two.
Ms. Silva would be in charge of Google’s own Australian news Web service News Showcase.
Instead of paying non-Googley publishers for their content’s headlines and news extracts, Google may be jumping into “real” publishing. As I pointed out in my monograph, Google: The Digital Gutenberg, the mom-and-pop ad outfit had become the world’s largest publisher. Sorry, Facebook. But each Google search result page is a newly published item, stuffed with individualized ads and shaped content. When I wrote the monograph for a now defunct Brexit zapped publisher, no one cared. Google? A publisher? You must be kidding. Right, more craziness from Harrod’s Creek.
The idea is simple. Form partnerships with Googley outfits for content. Boom. A news site which sidesteps Australia’s keen desire to capture revenue from the mom-and-pop online ad company.
How realistic is this Googley play? The answer is, “It depends.” Google wants to avoid getting into the check writing habit for mere nation states. The French deal is not one the GOOG wants to watch diffuse like tribbles across the globe. On the other hand, maybe the wizards of Mountain View will find a way to make the “real” publisher model work. In that case, established “real” news outfits may have a problem.
What if one of the seven outfits demonstrates Google-approved qualities and publishes syndicated news just like the “real” news outfits in Australia? Well, that will keep some legal eagles in ocean front nests guarded by new BMWs happy.
As I wrote in 2008:
What sets Google’s publishing process apart is the small number of individual steps required to take in, process, and push out information. (Google: The Digital Gutenberg, Infonortics, 2009, page 24)
“Small number” means efficiencies “real” news companies cannot easily imagine or implement.
Worth watching Australia. The country may designate Google as its new red kangaroo.
Stephen E Arnold, January 28, 2021
Amazon: Dark Pattern? Of Course Not
January 28, 2021
Consumer advocates have noticed Amazon is not one to make it very simple to stop paying it money. Yahoo Finance shares the (paywalled) Bloomberg article, “Amazon Makes It Too Hard to Cancel Prime, Groups Tell FTC.” Amazon Prime is the company’s $119/year membership that allows one to get free shipping and freely stream music and videos, among other benefits. We’re reminded the program has contributed greatly to the company’s dominance of the worldwide online retail market. Writers Matt Day and Ben Brody report:
“In a letter to the Federal Trade Commission on Thursday, a group led by Public Citizen said the steps required to cancel Prime ‘are designed to unfairly and deceptively undermine the will of the consumer,’ and may violate FTC rules as well as other consumer protection laws. The letter draws on a complaint by Norway’s consumer protection agency, which on Thursday asked Norwegian regulators to determine whether Amazon violated local law. … The report by Forbrukerrådet, Norway’s state-backed consumer protection agency, documents how Amazon riddles the process with ‘dark patterns,’ or manipulative techniques, including steps that nestle the choice to leave in between other options to abort the whole process or maintain their membership. The group also produced a video that demonstrates how a user who wants to cancel Prime might accidentally click buttons that actually keep them in the program. While complaints routinely land at the FTC with little action, at least one of the parties involved in Thursday’s letter, the Center for Digital Democracy, has been able to push commissioners in the past.”
For its part, Amazon insists it is “clear and easy” to cancel the membership. Amazon is already going through a congressional antitrust investigation and probes by the FTC, European Commission, and other regulators. The shift into the new presidential administration is unlikely to help the company’s position. If Amazon suddenly makes it easier to cancel one’s Prime subscription, we need not wonder why.
Cynthia Murrell, January 28, 2021
Bitchute: Still Powering Those Ultra Bits
January 28, 2021
Republicans view Democrats with suspicion. Democrats stare back at Republicans. Both political parties have media outlets that support each of their political ideologies. The only problem for either party is the extremists (and conspiracy theorists) that haunt their ranks. That being said, welcome to BitChute, a conservative video streaming platform that allows frisky speech, conspiracy theorists, and Web 3.0 thinkers.
Mashable deep dives into the platform in: “BitChute Welcomes The Dangerous Hate Speech That YouTube Bans.” BitChute has not received as much attention as other alternative social media Web sites. British citizen Ray Vahey, a Web developer, founded BitChute as a free speech platform when Google banned certain contentious speech and extremist content on YouTube. Vahey lives in Thailand and he actively supports conspiracy theories.
BitChute is funded by donations and will start playing ads from the advertising company Criteo. Most of BitChute’s content comes from YouTube and it is not owned by the uploaders. Reuters, for example, has a channel, but Reuters does not own it. There have been takedown allegations, although they concerned copyright infringement and not community guidelines.
“As HOPE not hate’s report puts it: ‘BitChute exists to circumvent the moderation of mainstream platforms.’ BitChute really seems like the Wild West. The company lists basic community guidelines on the site, but users can easily find videos that violate them. And it’s not like there’s so much content that BitChute couldn’t moderate it all.”
There are fewer uploaders to BitChute than YouTube enjoys, but that does not limit the depth of unusual factoids shared in videos. BitChute’s guidelines state terrorism recruitment videos are not allowed, yet there are many available as well as mass shooting videos.
BitChute may be poised for growth.
Whitney Grace, January 28, 2021
Google Management: What Happens When Science Club Management Methods Emulate Secret Societies?
January 27, 2021
A secret society is one with special handshakes, initiation routines, and a code of conduct which prohibits certain behavior. Sometimes even a secret society has a trusted, respected member whose IQ and personal characteristics are what might be called an “issue.” My hunch is that the write up “Google Hired a Lawyer to Probe Bullying Claims about DeepMind Cofounder Mustafa Suleyman and Shifted His Role” may be a good example — if the real news is indeed accurate — of mostly adult judgment. [The linked document resides behind a paywall … because money.]
As I understand the information in this write up, uber wizard Mustafa Suleyman allegedly engaged in behavior the Googlers found out of bounds. Note, however, that the alleged perpetrator was not terminated. Experts in smart software are tough to locate and hire. Mr. Suleyman was given a lateral arabesque. First defined by Laurence J. Peter, the idea is that some management issues can be resolved by shifts to a comparable level of the hierarchy, just performing different management or job functions. A poor manager could be encouraged to accept a position as chief quality officer in an organization’s new office in Alert, in the Qikiqtaaluk Region, Nunavut, Canada. (Bring a Google sweater.)
DeepMind is known for crushing a human Go player, who may now be working as a delivery person for Fanji Braised Meat in Preserved Sauce on Zhubashi in Xian, China. The company developed software able to teach itself the game of checkers. Allegedly DeepMind performed magic with protein folding calculations, but it seems to have come up short on problems for solving death and providing artificial general intelligence for a user of Google calendar.
These notable technical accomplishments may have produced a sinkhole brimming with red ink. The 2019 Google financials indicate that about $1 billion in debt has been written off. Revenue appears to be a bit of a challenge for the Googlers working on technology that will generate sustainable revenue for Google’s next 20 years.
And what about those management methods channeling how high school science clubs operated in the 1950s:
- Generate fog to make it difficult to discern exactly what happened and why Google’s in house people professionals could not gather the information about alleged bullying? Why a lawyer? Why not a private investigative group? There are some darned good ones in merrie olde Angleland.
- Mixed signals are emitted. If something actionable occurred, why not let the aggrieved go through appropriate legal and employee oversight channels to resolve the matter? Answer: Let someone else have the responsibility. The science club does science, not human like stuff.
- The dodge-deflect-apologize pattern is evident to me in rural Kentucky. How long will this adolescent tactic remain functional?
To sum up, the science club did something. What is fuzzy? Why is fuzzy? Keep folks guessing maybe? What will those bright sprouts in the high school science club do next? Put a cow on top of Big Ben?
Stephen E Arnold, January 27, 2021