Facebook Wants to Help Wikipedia with Factoid Accuracy
September 12, 2022
Yes, Facebook is an arbiter of truth.
Researchers have a love-hate relationship with Wikipedia. They love that it is a constantly updated, digital encyclopedia with quick search and reference tools, but hate its inaccuracies. Singularity Hub discusses how Facebook wants to change Wikipedia’s unreliability: “Meta Is Building An AI To Fact-Check Wikipedia-All 6.5 Million Articles.”
Wikipedia’s editors wrote: “The online encyclopedia does not consider itself to be reliable as a source and discourages readers from using it in academic or research settings.” Most basic information on Wikipedia is true. Double-checking is still a good idea, but most people do not do that. Facebook, armed with its new Meta facade, is working on an AI to verify all of Wikipedia’s information.
The AI would fact-check the information in the articles, but it works differently than expected:
“Meta’s model will “understand” content not by comparing text strings and making sure they contain the same words, but by comparing mathematical representations of blocks of text, which it arrives at using natural language understanding (NLU) techniques. ‘What we have done is to build an index of all these web pages by chunking them into passages and providing an accurate representation for each passage,’ Fabio Petroni, Meta’s Fundamental AI Research tech lead manager, told Digital Trends. ‘That is not representing word-by-word the passage, but the meaning of the passage. That means that two chunks of text with similar meanings will be represented in a very close position in the resulting n-dimensional space where all these passages are stored.’”
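The contrast Petroni draws between matching words and matching meaning can be shown with a toy sketch. This is not Meta’s system: the passages, the claim, and the three-dimensional vectors are all made up for illustration, and a real model would learn vectors with hundreds of dimensions from text. The sketch only shows why a word-overlap score and a vector-similarity score can pick different passages.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def word_overlap(a, b):
    """Jaccard overlap of the lowercase word sets of two strings."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    return len(words_a & words_b) / len(words_a | words_b)

# Hypothetical passages with hand-assigned "meaning" vectors.
# A real system would compute these with a trained language model.
INDEX = {
    "The writer died as the century ended.": [0.8, 0.2, 0.3],
    "The novelist won a prize in 1999.": [0.1, 0.9, 0.2],
}
CLAIM_TEXT = "The novelist passed away in 1999."
CLAIM_VECTOR = [0.9, 0.1, 0.2]  # also made up

# String comparison favors the passage that shares the most words.
closest_by_words = max(INDEX, key=lambda p: word_overlap(CLAIM_TEXT, p))
# Vector comparison favors the passage whose vector points the same way.
closest_by_meaning = max(INDEX, key=lambda p: cosine(CLAIM_VECTOR, INDEX[p]))

print("word overlap picks:", closest_by_words)
print("vector similarity picks:", closest_by_meaning)
```

With these invented numbers, word overlap prefers the prize passage because it shares “novelist” and “1999,” while the vector comparison prefers the passage about the writer’s death. Whether a production system behaves this well depends on how good its learned representations are.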
Thankfully the AI’s learning dataset of four million Wikipedia citations is cleaner and better than what other AI have learned from in the past. The dataset is also constantly being updated. The developers are also teaching the AI how to distinguish a reliable source from a bad one, i.e. a scientific paper vs. a conspiracy theory article.
The Meta team said no one has used AI to verify Wikipedia’s information before. It is great that Facebook is doing the world a favor by fact-checking Wikipedia, but what will Facebook correct in the Facebook Wikipedia information?
Whitney Grace, September 12, 2022
Microsoft: Explaining Its Cloud Policies and Revealing Its Thought Processes
September 12, 2022
After I graduated from a so so university, some other academic entity paid me money to work on a PhD. As part of the deal, I had to teach one class in freshman composition. The students were working like pious beavers to become nuns, priests, and I suppose capable professionals in a religious bookstore or some similar line of work.
I read some wild and crazy essays: Truth: The Path to Salvation, Faith: The Rock in the Thunderstorm of Life, etc etc. I was transported back to my small apartment behind a big estate type house and correcting the type of errors Grammarly eliminates. No computers in 1967 that would fit in my roomy 700 square feet.
The essay which caught my attention is — in modern lingo — a blog post. Its title lacks the metaphorical impact of those freshman essays but the content is quite remarkable.
First, the title: “New Licensing Benefits Make Bringing Workloads and Licenses to Partners’ Clouds Easier.” The main idea is that Microsoft wants to demonstrate that it is not really a quasi-monopoly. Nope, it learned its lesson when Mr. Gates’ testimony successfully thwarted the US government decades ago. Who knew he was a gifted rhetorician or a word-meister capable of The Road Ahead?
The blog title is interesting because it talks about benefits. The idea that Microsoft wants to make life easier. You know. Just like the Windows 11 changes for the corporations who deploy the operating systems to one or two employees. No big deal. Just add annoyances and kill printing. But the payoffs addressed in the blog “essay” require some linguistic calisthenics.
Here’s a sampling:
CSP or cloud solution provider
Easier
Ecosystem
Empower
Ensure
Excited
Exciting
Flexible virtualization
Hosted
Joint success
Outsourcers
QMTH or qualified multitenant hosting
QOS or qualifying operating system
SPLA or service provider licensing agreements
Scenarios
Virtual core
Workloads
What does the word choice suggest? To me, I am suspicious. How can a giant corporation with a stellar track record of delivering software which often does not work care so much about a provider? What is a provider? A good shepherd, a rock in a storm, a beacon to salvation?
Third, I noted a fascinating but very tiny asterisk in the section title “More Flexibility and Options for Software Outsourcing.” The asterisk points to the foot of the blog essay. Listed at that point are the companies not allowed to get paid to let customers put Microsoft software on these alien, and apparently inappropriate computer systems. You want multi cloud? You want freedom to run the software for which you pay where you want to run it? Ho ho ho. Not unless a regulator shows some moxie.
Who are the dark and threatening cloudies? Here’s the list with the tiny asterisk:
- Alibaba
- Amazon Web Services
- Google
- Microsoft.
See, Microsoft puts Microsoft on its own list. How can a giant company be more fair? Impossible to out do this path to salvation.
Fourth, information which strikes me as important appears toward the end of the blog post; to wit:
At its inception, SPLA was intended to allow partners to offer hosted services from their own datacenters, not for managed service providers buying through SPLA to host on others’ datacenters. We are making changes to the SPLA program, starting in October 2022, to better align with the program’s intent, and with other commercial licensing programs.
Observations:
- Microsoft is scrambling to be on the side of its partners and customers but, to me, mostly the customers
- The European Union is likely to be confused by the language of the blog post but will muddle through and continue the crackdown on the US technology companies and their business practices
- The Microsoft partners need to generate revenue with Microsoft generating leads, engineering service opportunities, and positioning that maximizes the benefit of many happy Windows, Word, and Teams users.
Net net: Not an F, but I would score the write up as a C minus or D plus. The split infinitive in the blog post was bad. But the tiny asterisk and red lining estimable companies like Alibaba, Amazon, and Google. Clumsy clumsy.
Stephen E Arnold, September 12, 2022
Google News Provides Access to Bombshell about Google
September 9, 2022
I thought that the Google had a news deal of some type with the GOOG and its news service. If you are not familiar with Google News (the ad free thing for many years), it is available at this link. Google News included a story called “Google Pays ‘Enormous’ Sums to Maintain Search-Engine Dominance, DOJ Says.” Now this is not news here in Harrod’s Creek. Isn’t a modest payment provided to the people’s friend Apple to provide search results? Maybe? Maybe not?
What I find interesting is that locating the story on Google News required using the string “Google Search Engine Dominance.” [Note: This may require a payment to read unless one views the story via Google News. Maybe Google and Bloomberg have a special operation underway? Gee, I don’t know.] Other queries were less helpful. Interesting? Nah. Just the black box of Google News search working its magic. (Maybe that’s why Google Dorks are so darned popular among certain analysts and research-minded individuals. The information is in Google, but it can require a few cartwheels to locate in my experience.)
What was the main point of this Bloomberg story? (When I think of Bloomberg, I do associate the company with the chips on motherboards which phone home. Was this story accurate, true, grounded in verifiable data, or a confection like some social media mavens output? Again I don’t know. As I get older, I realize I don’t know much, if anything.)
The Bloomberg Google story on Google News says:
Alphabet Inc.’s Google pays billions of dollars each year to Apple Inc., Samsung Electronics Co. and other telecom giants to illegally maintain its spot as the No. 1 search engine, the US Justice Department told a federal judge Thursday [September 8, 2022].
News flash. This is not news. What is mildly interesting is that the US government after decades of finding joy in Google mouse pads, T shirts, and other tchotchkes is sort of investigating. (Why was it so darned difficult to get French income tax forms to come up in Google search results? Were those cranky folks at Foundem blowing smoke? You know the answer: I don’t know.)
The write up continues:
“Google invests billions in defaults, knowing people won’t change them,” Dintzer told Judge Amit Mehta during a hearing in Washington that marked the first major face-off in the case and drew top DOJ antitrust officials and Nebraska’s attorney general among the spectators. “They are buying default exclusivity because defaults matter a lot.” Google’s contracts form the basis of the DOJ’s landmark antitrust lawsuit, which alleges the company has sought to maintain its online search monopoly in violation of antitrust laws.
Okay, written contracts. That’s something sort of concrete I suppose.
In my opinion, the best line in the Google story on Google News from good and friendly Bloomberg is this one:
“Default exclusivity allows Google to systemically deny rivals’ data,” he said.
If true, does this mean that former Googler Eric Schmidt was off base when he said that fear of Qwant was keeping him awake at night?
Probably not. But Google seems to have been taking steps to reduce the probability of Qwant or any other search engine gaining traction somewhat seriously. Does Google know its search system is only useful when one masters the machinations of the Dorkers?
Again: I don’t know.
Stephen E Arnold, September 9, 2022
A Semantic Search Use Case: But What about General Business Content with Words and Charts?
September 9, 2022
I am okay with semantic search. The idea is that a relatively standard suite of mathematical procedures delivers “close enough for horse shoes” matches germane to a user’s query. Elastic is now combining key word with some semantic goodness. The idea is that mixing methods delivers more useful results. Is this an accurate statement?
The answer is, “It depends on the use cases.”
“How Semantic Search Improves Search Accuracy” explains a use case that is anchored in a technical corpus. Now I don’t want to get crossways with a group of search experts. I would submit that, in general, the vocabulary for scientific, medical, and technical information is more constrained. One does not expect to find “cheugy” or OG* in a write up about octonitrocubane.
In my limited experience, what happens is that a constrained corpus allows the developer of a finding system to use precise taxonomies, and some dinobabies may employ controlled vocabularies like those kicking around old-school commercial databases.
However, what happens when the finding system ingests a range of content objects from tweets, online news services, and TikTok-type content?
The write up says:
One particular advantage of semantic search is the resolution of ambiguous terminology and that all specific subtypes (“children”) of a technical term will be found without the need to mention them in the query explicitly.
Sounds good, particularly for scientific and technical content. What about those pesky charts and graphs? These are often useful, but many times are chock full of fudged data. What about the query, “Octonitrocubane invalid data”? I want to have the search system present links to content which may be in an article. Why? I want to make sure the alleged data set squares with my limited knowledge of statistical principles. Yeah, sorry.
The write up asserts:
A lexical search will deliver back all documents in which “pesticides” is mentioned as the text string “pesticides” plus variants thereof. A semantic search will, in addition to all documents containing the text string “pesticides”, also return documents that contain specific pesticides like bixafen, boscalid, or imazamox.
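The difference the write up describes can be sketched in a few lines. This is a minimal illustration, not the method from the cited article: the four documents are invented, the taxonomy is a hypothetical one-entry dictionary built from the three pesticide names in the quote, and “semantic” here means nothing more than expanding the query with the child terms of the taxonomy. A production system might use an ontology, embeddings, or both.

```python
def lexical_search(term, docs):
    """Return indexes of documents containing the term as a text string."""
    needle = term.lower()
    return [i for i, text in enumerate(docs) if needle in text.lower()]

def semantic_search(term, docs, taxonomy):
    """Lexical search for the term plus every child term in the taxonomy."""
    terms = [term] + taxonomy.get(term.lower(), [])
    hits = set()
    for candidate in terms:
        hits.update(lexical_search(candidate, docs))
    return sorted(hits)

# Hypothetical taxonomy: parent term mapped to its specific subtypes.
TAXONOMY = {"pesticides": ["bixafen", "boscalid", "imazamox"]}

# Invented documents for illustration only.
DOCS = [
    "New rules restrict pesticides near schools.",
    "Boscalid residues were detected in strawberries.",
    "Imazamox controls weeds in soybean fields.",
    "Octonitrocubane is a high-energy compound.",
]

lexical_hits = lexical_search("pesticides", DOCS)
semantic_hits = semantic_search("pesticides", DOCS, TAXONOMY)

print("lexical:", lexical_hits)
print("semantic:", semantic_hits)
```

The lexical query finds only the document that contains the string “pesticides.” The expanded query also finds the boscalid and imazamox documents. Note that the result is only as good as the taxonomy, which is easier to build for a constrained technical corpus than for tweets and TikTok-type content.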
What about a chemical structure search? I want a document with structure information. Few words, just nifty structures just like the stuff inorganic and organic chemists inhale each day. Sorry about that.
Net net: Writing about search is tough when the specific corpus, the content objects, and the presence of controlled terms in addition to strings in a content object are not spelled out. Without this information, the assertions are a bit fluffy.
And the video thing? The DoD, NIST, and other outfits are making videos. Things that go boom are based on chemistry. Can semantic search find the videos and the results of tests?
Yeah, sure. The PowerPoint deck probably says so. Hands on search experience may not. Search-enabled applications may work better than plain old search jazzed up with close enough for horse shoes methods.
Stephen E Arnold, September 9, 2022
[* OG means original gangster]
Meta: Grade School Behavior?
September 9, 2022
Despite being the domain of Baby Boomers and conspiracy theorists, Facebook is still a powerful tech company. Facebook is not afraid to sell out its users despite proclamations of loving support, but Apple Insider discusses its hypocritical behavior: “Facebook Is Fine When Punishing Others Financially, But Cries When Others Do It To Them.”
Zuckerberg and his company exhibit behaviors similar to those of an elementary school bully: they act big and tough, but when confronted and injured, Facebook runs away crying. Facebook is acting like the aforementioned bully because Apple has affected its profits. Apple changed its privacy policies, thus preventing Facebook from harvesting dollars from user data.
Despite Facebook claiming it does not sell data, it does. Apple added App Tracking Transparency to its mobile devices, so users can prevent third parties (e.g., Facebook) from sharing data. Facebook did not like that, so Zuckerberg threatened to take Apple to court, then changed his mind. He decided to ruin other companies’ bottom lines to save his own. Facebook released data that showed users are reading less, so the company will not pay publishers for news articles.
This has led large media companies to fire writers and switch over to video production:
“But Facebook had exaggerated its figures by between 150% and 900%. Facebook denies this, but it later settled a lawsuit brought by advertisers over the issue. Facebook paid out $40 million then, but some publishers who had pivoted to video simply could not move back and did not recover. While there are forces beyond Facebook that contributed to this, the University of North Carolina said that even before the coronavirus, 20 newspaper businesses were closing every month.”
Facebook has now turned to VR, but is riding on the backs of creators to drive funding for the platform. Facebook was critical of Apple’s 30% commission fee from its app store purchases, so when Apple discovered Facebook’s Meta fees they had to say something:
“’Now — Meta seeks to charge those same creators significantly more than any other platform,’[said Apple Senior Director of Corporate Communications Fred Sainz.] ‘[Meta’s] announcement lays bare Meta’s hypocrisy. It goes to show that while they seek to use Apple’s platform for free, they happily take from the creators and small businesses that use their own.’”
Would a neutral observer use the word “hypocritical” to describe some Meta actions? Sure, and may add the term “zucker squeeze.”
Whitney Grace, September 4, 2022
Is Digital Piracy Similar to the US Anti-Drug Campaign?
September 9, 2022
From the 1980s to the 2000s, American kids were subjected to DARE. The DARE program was a federal drug prevention program that was supposed to educate kids about the dangers of drugs and alcohol. It failed miserably. Instead, kids were exposed to more knowledge about drugs and alcohol. The same thing happened with anti-piracy ads: “Why Piracy PSAs Often Fail Spectacularly” says The Hustle.
Ever since the Internet allowed people to pirate everything from music to movies to software, screens were flooded with anti-piracy PSAs. The anti-piracy ads compared digital theft to stealing a car, bike, etc. The PSAs did more harm than good, like DARE, but they are entertaining as eye-rolling memes. Why did they fail?
“Many don’t see it as theft. It’s called file sharing.
Messaging is too extreme. It’s reasonable to compare downloading a movie to stealing a DVD — not to grand theft auto.
They’re not relatable. People might be deterred by malware warnings, but an Indian PSA featuring Bollywood stars — who are worth up to 200k times the nation’s annual per capita income — failed to garner sympathy.
Declaring piracy a widespread issue implies everyone’s doing it. So, why not you?”
In the United States, pirates aka file sharers are not bothered by the idea of stealing a few bucks from Hollywood. Piracy is also a white-collar crime. While there are fines and stiff penalties, the risks are minor compared to hacking, identity theft, murder, sex trafficking, and the list goes on.
No one cares unless it allows law enforcement to issue a warrant to prevent worse crimes or the moguls lose a lot of money, then they get the talking political heads involved.
Digital piracy is not new and we can thank the 1990s for the legendary rap, “Don’t Copy That Floppy.”
Whitney Grace, September 9, 2022
Google: Adulting Becomes a Thing
September 8, 2022
My goodness, it has taken more than 20 years for the Backrub-inspired search and ad company to embrace adulting. This term takes a noun like adult and converts it to a verb. This English trick is one that thrills English as a Second Language students. What I am going to do is equate “adulting” with the management precepts of Peter Drucker. Now you see why figuring out what I am saying and not saying is so darned unusual.
First, however, we need some context. That estimable source of real news (Fox) published this story: “Google CEO Sundar Pichai Looking to Improve Tech Giant’s Efficiency.” The Big Dog of the Google is participating in explainers to the tech worshipers that the time is now for adulting. The idea is that the Google is under pressure from several different hypercube vectors; for example:
- The lovable and enlightened Amazon with its newfound clicks from product search and a corresponding surge in product related advertising
- That affable crowd in Cupertino who are taking steps to make sure the walled garden does not allow Googzilla too much room in which to cause mischief
- Those with-it regulators and elected officials in governments near and far who don’t understand how making money on ads as the saloon swinging door with a charge to come in and leave works for the benefit of anyone except the Google
- Wizards who find themselves orthogonal to Google’s personnel postures. Yep, Dr. Timnit Gebru et al. “Disagree and Begone” could become a new Xoogler T shirt for diversity conference attendees
- Technical debt, which — despite Google’s mostly not talking about it — continues to incur some hefty costs. One can fire people but one cannot do much more than sell data center gear on eBay or Swappa
- High school management methods. I have explained this concept in previous posts so use the search box and read the explanation, please. The new idea is that the best high school science club members will not want to work at the Google. Yikes. Regressing toward the mean maybe?
What did the Big Dog say is the future of Google?
One big point is that the 20 percent frittered away on the dorm notion of one day a week of other stuff is over. Now Googlers have to work like a person on the Ford assembly line in 1937. Punch in, do stuff that matters, and punch out. No output, no pay. Simple. I remember reading that programmers write code about 30 minutes a day. What are these wizards going to do in the other 7.5 hours? Well, Foosball, table tennis, and volleyball may be difficult when the kid toys are removed. Google is a place for real work. What is that work? Well, Google doesn’t explain too much, but I assume it is quantifiable, good for humankind, fair, equitable, and unbiased just like Snorkel automated training data.
Another point is that the new Google sets priorities. I think priorities are useful. Why have a couple dozen messaging apps and smart software that displays ads totally unrelated to either the content of a YouTube video or to the interests of a Google customer who pays for Google services? I suppose Google has given up on solving death, which, as I understood the project, was a priority.
I also noted that Google is moving more slowly. My experience suggests that what went quickly was work blessed by the senior management. Some employees are left to their own devices to learn how Google works, snag a project, and produce something that makes money. In order to set priorities, one has to do the Drucker type work. Is that type of thinking in the Google incentive plan?
To sum up: Google is in danger of having to face life as an ageing sled dog or arthritic Googzilla. Maybe some of the “solve death” research can rejuvenate the behemoth before the snow piles up and Googzilla moves even more slowly.
Stephen E Arnold, September 8, 2022
Yacht Costs: One Trivial Omission
September 8, 2022
I read “Breaking Down the Cost of an Oligarch’s Yacht.” Interesting stuff. I noted that the rule of thumb calculation for operating a super yacht was included in the write up:
Keeping a yacht in operation costs around 10 percent of its original price.
There’s not much detail, but I think the 10 percent may be low, and it certainly does not include legal fees if the yacht sinks or gets caught playing footloose and fancy free with its transponder gizmo. Then the yacht has to be upgraded because trendy interiors look untrendy after a year or so in the bling world. Furthermore the 10 percent number does not include recovery in the event the yacht sinks or burns — accidentally, of course.
Now what about that trivial omission?
I noted that the cited article does not include the bound phrase “money laundering.” Gee, I wonder why.
Stephen E Arnold, September 8, 2022
Tweet Terror in Some Geographic Areas
September 8, 2022
While western countries are chided for controversial engagement with LGBTQ groups, that cannot compare to the staunch hatred these groups face in the Middle East. The Middle East is dominated by fundamentalist Islamic governments that criminalize homosexuality and transgender people. Unfortunately, these groups have experienced a new wave of hatred, as Euro News reported in “Arabic Anti-LGBTQ Campaign Goes On Twitter.”
The anti-LGBTQ campaign is called Fetrah, meaning “human instinct” in Arabic. Three Egyptian marketing professionals experienced in social media campaigns designed Fetrah. Fetrah promotes only two genders and rejects homosexuality, and its supporters show a blue and pink flag.
Meta deleted the Fetrah page, but supporters managed to get a different page up on Facebook as well as on Instagram. Unlike other social media platforms, Twitter does not ban hate groups like Fetrah:
“Mahsa Alimardani, a digital rights expert told the Cube that Twitter and other social media platforms should be investing more resources into fighting this harmful campaign. ‘Too much censorship and policing can actually be a problem on some platforms but with Twitter we often find that the reverse is true, especially when it comes to harassment and harmful content targeting vulnerable communities’ said Alimardani. ‘We can see here a prime example of how queer communities in the Middle Eastern and North African regions can be harmed by Twitter’s inaction. The platform has very high threshold when it comes to policing content, which can be harmful,’ she added.”
Western countries have their faults, but many people have a “live and let live” attitude when it comes to LGBTQ people. People in the Middle East are not that different, but hatred is unfortunately promoted by religious governments.
Whitney Grace, September 8, 2022
Forget Cyber Fraud. Fungible Fraud Is Missed Too
September 8, 2022
Though cybercrime continues to grab headlines, it seems the old-fashioned kind is still a thing. Eight months after a Canadian heist was executed, reveals Smithsonian Magazine, “Hotel Discovers Its Famous Churchill Portrait Was Swapped with a Fake.” Readers may recognize the Roaring Lion as the much-reproduced photograph taken by Yousuf Karsh in 1941. The iconic image even made it onto England’s five-pound note in 2016. One of Karsh’s original, signed prints was proudly displayed at Ottawa’s Fairmont Château Laurier until some grinch replaced it with a forgery around last year’s winter holidays. It was the frame that, eventually, gave the imposter away. Reporter Ella Feldman writes:
“On the night of August 19, an employee at the hotel, the Fairmont Château Laurier, noticed that the frame containing their prized print did not match the other frames on the wall. The hotel called Jerry Fielder, director of Karsh’s estate, who requested a photo of the signature. ‘I’ve seen that signature for 43 years. So it took me just one second to know that someone had tried to copy it,’ Fielder tells the Guardian’s Leyland Cecco. ‘It was a fake.’ Hotel officials say that the photograph was stolen about eight months ago. Genevieve Dumas, the hotel’s general manager, tells CTV News that based on images submitted by the public, they’ve narrowed down the date of the heist to somewhere between December 25, 2021 and January 6, 2022. The hotel is asking anyone who has images of the photograph taken around that time to send them in.”
To put the loss in economic perspective, another signed original print of the portrait sold at auction for $62,500 in 2020. But it is about much more than money for the venerable hotel, which had close ties with Karsh. The photographer held his first exhibition there in 1936 and, in 1972, moved his photography studio to the site. Eight years later the Fairmont became home to Karsh and his wife Estrellita, who gifted the original print to the hotel after her husband’s death in 2002. An investigation into the theft is under way. We hope the eight-month trail has not grown too cold.
Cynthia Murrell, September 8, 2022