Newton and Shoulders of Giants? Baloney. Is It Everyday Theft?
January 31, 2023
Here I am in rural Kentucky. I have been thinking about the failure of education. I recall learning from Ms. Blackburn, my high school algebra teacher, this statement by Sir Isaac Newton, the apple and calculus guy:
If I have seen further, it is by standing on the shoulders of giants.
Did Sir Isaac actually say this? I don’t know, and I don’t care too much. It is the gist of the sentence that matters. Why? I just finished reading — and this is the actual article title — “CNET’s AI Journalist Appears to Have Committed Extensive Plagiarism. CNET’s AI-Written Articles Aren’t Just Riddled with Errors. They Also Appear to Be Substantially Plagiarized.”
How is any self-respecting, super buzzy smart software supposed to know anything without ingesting, indexing, vectorizing, and whatever other math magic the developers have baked into the system? Did Brunelleschi wake up one day and do the Eureka! thing? Maybe he stood in line, entered the Pantheon, and looked up? Maybe he found a wasp's nest, cut it in half, and studied what the feisty insects did to build a home? Obviously intellectual theft. Never mind that the dome still stands; if it ever falls, he was an untrustworthy architect-engineer all along. Argument nailed.
The write up focuses on other ideas; namely, being incorrect and stealing content. Okay, those are interesting and possibly valid points. The write up states:
All told, a pattern quickly emerges. Essentially, CNET‘s AI seems to approach a topic by examining similar articles that have already been published and ripping sentences out of them. As it goes, it makes adjustments — sometimes minor, sometimes major — to the original sentence’s syntax, word choice, and structure. Sometimes it mashes two sentences together, or breaks one apart, or assembles chunks into new Frankensentences. Then it seems to repeat the process until it’s cooked up an entire article.
For a short (very, very brief) time I taught freshman English at a big-time university. What the Futurism article describes is how I interpreted the work process of my students. Those entitled and enquiring minds just wanted to crank out an essay that would meet my requirements and hopefully earn an A or a 10, which was a signal that Bryce or Helen was a very good student. Then off to a local hangout to talk about Heidegger? Nope, mostly about the opposite sex, music, and getting their hands on a copy of Dr. Oehling's test from last semester for European History 104. Substitute the topics you talked about to make my statement more "accurate," please.
I loved the final paragraphs of the Futurism article. Not only is a competitor tossed over the argument's wall; the Google and its outstanding relevance become targets as well. Imagine. Google. Criticized. The article's final statements are interesting; to wit:
As The Verge reported in a fascinating deep dive last week, the company’s primary strategy is to post massive quantities of content, carefully engineered to rank highly in Google, and loaded with lucrative affiliate links. For Red Ventures, The Verge found, those priorities have transformed the once-venerable CNET into an “AI-powered SEO money machine.” That might work well for Red Ventures’ bottom line, but the specter of that model oozing outward into the rest of the publishing industry should probably alarm anybody concerned with quality journalism or — especially if you’re a CNET reader these days — trustworthy information.
Do you like the word trustworthy? I do. Does Sir Isaac fit into this future-leaning analysis? Nope, he's still preoccupied with proving that the evil Gottfried Wilhelm Leibniz was tipped off about tiny rectangles and the methods thereof. Perhaps Futurism can blame smart software?
Stephen E Arnold, January 31, 2023
Smart Software: Can Humans Keep Pace with Emergent Behavior?
November 29, 2022
For the last six months, I have been poking around the idea that certain behaviors are emergent; that is, give humans a capability or a dataspace, and those humans will develop novel features and functions. The examples we have been exploring relate to methods bad actors use to avoid takedowns by law enforcement. The emergent behaviors we have noted exploit domain name registry mechanisms and clever software able to obfuscate traffic from Tor exit nodes. The result of the online dataspace is unanticipated emergent behavior: bad actors come up with something novel using the Internet's furniture.
We noted "137 Emergent Abilities of Large Language Models." If our understanding of this report is mostly accurate, large language models like those used by Google and other firms manifest emergent behavior. What's interesting is that the write up explains that there is not one type of emergent behavior. The article catalogs a Rivian truck bed full of emergent behaviors.
Here are some behaviors associated with big data sets and LaMDA 137B, as mentioned in the Emergent Abilities paper. (LaMDA is a family of Transformer-based neural language models specialized for dialog. Correctly or incorrectly, we associate LaMDA with Google's smart software work. See this Google blog post.)
- Gender-inclusive sentences (German)
- Irony identification
- Logical arguments
- Repeat copy logic
- Sports understanding
- Swahili-English proverbs
- Word sorting
- Word unscrambling
Another category of emergent behavior is what the paper calls "emergent prompting strategies." The idea is that more general prompting strategies manifest themselves as models scale. The system can perform certain functions that cannot be elicited when using "small" data sets; for example, solving multi-step math problems in less widely used languages.
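The paper's canonical example of an emergent prompting strategy is chain-of-thought prompting: show the model a worked, step-by-step solution, and past a certain scale its accuracy on multi-step problems jumps. Here is a minimal sketch of the idea; the `call_model` function is a hypothetical stand-in for whatever large language model endpoint one uses, not a real library call.

```python
# Minimal sketch of chain-of-thought prompting, one of the "emergent
# prompting strategies" the paper discusses. `call_model` is hypothetical.

def call_model(prompt: str) -> str:
    """Placeholder for a call to a large language model endpoint."""
    raise NotImplementedError("Wire this to the model of your choice.")

# Direct prompting: ask for the answer with no worked example.
direct_prompt = (
    "Q: A farm has 15 cows, buys 8 more, then sells 6. How many cows now?\nA:"
)

# Chain-of-thought prompting: include one worked example so the model emits
# intermediate reasoning before its answer. Small models gain little from
# this; large ones show a marked jump on multi-step problems.
cot_prompt = (
    "Q: A bakery makes 20 loaves, sells 12, then bakes 5 more. How many loaves?\n"
    "A: Start with 20. Selling 12 leaves 20 - 12 = 8. Baking 5 more gives "
    "8 + 5 = 13. The answer is 13.\n"
    "Q: A farm has 15 cows, buys 8 more, then sells 6. How many cows now?\n"
    "A:"
)

# answer = call_model(cot_prompt)  # expected to reason: 15 + 8 = 23, 23 - 6 = 17
```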
The paper includes links so the different types of emergent behavior can be explored. The paper wraps up with questions researchers may want to consider. One question we found suggestive was:
What tasks are language models currently not able to perform, that we should evaluate on future language models of better quality?
The notion of emergent behavior is important for two reasons: [a] Systems can manifest capabilities or possible behaviors not anticipated by developers and [b] Novel capabilities may create additional unforeseen capabilities or actions.
If one thinks about emergent behaviors in any smart, big data system, humans may struggle to understand, keep up, and manage downstream consequences in one or more dataspaces.
Stephen E Arnold, November 29, 2022
Objectivity in ALGOs: ALL GONE?
April 21, 2022
Objective algorithms? Nope. Four examples.
- Navigate to “How Anitta Megafans Gamed Spotify to Help Create Brazil’s First Global Chart-Topper.” The write up explains how the Spotify algos were manipulated.
- Check out the Washington Post story (paywall, gentle reader) “Internet Algospeak Is Changing Our Language in Real Time from Nips Nops to le Dollar Bean.” Change the words; fool the objective and too-smart algorithm.
- Now navigate to your favorite day-trading discussion group, Wall Street Bets. You can find this loose confederation at this link. You may spot some interesting humor. A few "tips" signal attempts to take advantage of a number of investing characteristics.
- Finally, pick your favorite search engine and enter the phrase search engine optimization. Scan the results.
Each of these examples signals the industrious folks who want to find, believe they have discovered, or actually have found ways to fiddle with objective algorithms.
Envision a world in which algorithms do more and more light and heavy lifting. Who are the winners? My thought is that it will be the clever people and their software agents. Am I looking forward to being an algo loser?
Sure, sounds like heaven. Oh, and what about individuals who cannot manipulate algorithms or hire the people who can? For them, a T-shirt that says "Objectivity is ALGOne." There's a hoodie available too. It says, "Manipulate me."
Stephen E Arnold, April 21, 2022
DarkCyber for January 18, 2022 Now Available: An Interview with Dr. Donna M. Ingram
January 18, 2022
The fourth series of DarkCyber videos kicks off with an interview. You can view the program on YouTube at this link. Dr. Donna M. Ingram is the author of a new book titled "Help Me Learn Statistics." The book is available on the Apple ebook store and features interactive solutions to the problems used to reinforce important concepts explained in the text. In the interview, Dr. Ingram talks about sampling, synthetic data, and a method to reduce the errors which can creep into certain analyses. Dr. Ingram's clients include financial institutions, manufacturing companies, legal subrogation customers, and specialized software companies.
Kenny Toth, January 18, 2022
Credder: A New Fake News Finder
November 5, 2021
Fake news is a pandemic that spreads as fast as COVID-19 and wreaks as much havoc. While scientists developed vaccines for the virus, it is more difficult to cure misinformation. MakeUseOf says there is a new tool to detect fake news: "How To Spot Fake News With This Handy Tool."
Credder is an online platform designed by Chris Palmieri, a professional restaurateur who decided to build Credder after seeing potential in review sites like Yelp. Unlike Yelp and other review sites, Credder does not rate physical locations or items. The platform does not host any news articles; it crawls publications for the latest news and allows users and verified journalists to rate articles.
Credder is designed to fight clickbait and ensure information accuracy. Ratings are posted below each news brief. Verified journalists comment about their ratings and users can submit new pieces to rate.
Credder spots fake news in the following ways:
“Search for relevant articles on Credder. Besides each article, you will see the Public Rating and the User Critic Rating.
• The higher the rating, the more reliable the source.
• You can click on the article, and you’ll be taken to the parent website.
• There’s also a handy search tool that you can use to find articles or authors via keywords.
• Users can also rate individual authors and outlets. In turn, each user is assigned a rating from Credder as well. This is designed to ensure the quality of ratings across the platform.”
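Credder does not publish its scoring mechanics, but the description above (separate public and critic ratings, with each rater's own track record weighting their votes) is easy to sketch. The code below is a hypothetical reconstruction for illustration; none of the names or formulas come from Credder.

```python
# Hypothetical sketch of a Credder-style dual rating. Each rating carries a
# rater-reliability weight, since the platform assigns users ratings in turn.

from dataclasses import dataclass

@dataclass
class Rating:
    score: float         # 0.0 (not credible) to 1.0 (credible)
    rater_weight: float  # the rater's own reliability on the platform
    is_critic: bool      # verified journalist vs. general public

def aggregate(ratings: list[Rating]) -> dict:
    """Compute separate critic and public scores, weighted by rater reliability."""
    def weighted_mean(group: list[Rating]):
        total = sum(r.rater_weight for r in group)
        return sum(r.score * r.rater_weight for r in group) / total if total else None

    return {
        "critic_rating": weighted_mean([r for r in ratings if r.is_critic]),
        "public_rating": weighted_mean([r for r in ratings if not r.is_critic]),
    }

print(aggregate([Rating(0.9, 1.0, True), Rating(0.4, 0.5, False), Rating(0.7, 1.2, False)]))
```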
Credder relies on crowdsourcing and honesty to rate articles. There is no system in place to verify journalist credentials, and bias creeps in when users give their favorite authors and sources high scores. Credder, however, is transparent, similar to the Web of Trust.
Fake news is a rash that will not go away, but it can be contained. A little common sense and information literacy go a long way in combating fake news. Credder should start making PSAs for YouTube, Hulu, and cable TV.
Whitney Grace, November 5, 2021
Key Words: Useful Things
October 7, 2021
In the middle of nowhere in the American Southwest, lunchtime conversation turned to surveillance. I mentioned a couple of characteristics of modern smartphones, and people put down their sandwiches. I changed the subject. Later, when a wispy LTE signal permitted, I read "Google Is Giving Data to Police Based on Search Keywords, Court Docs Show." This is an example of information which I don't think should be made public.
The write up states:
Court documents showed that Google provided the IP addresses of people who searched for the arson victim’s address, which investigators tied to a phone number belonging to Williams. Police then used the phone number records to pinpoint the location of Williams’ device near the arson, according to court documents.
I want to point out that any string could contain actionable information; to wit:
- The name or abbreviation of a chemical substance
- An address of an entity
- A slang term for a controlled substance
- A specific geographic area or a latitude and longitude designation on a Google map.
With data federation and cross correlation, some specialized software systems can knit together disparate items of information in a useful manner.
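A toy illustration of what "knit together" means in practice: independently collected record sets joined on shared keys. All of the data below is invented, and real federation platforms work across many more sources, but the chain is the same one the court documents describe (query to IP address to phone number to location).

```python
# Toy data federation: three unrelated record sets knitted together on
# shared keys. All values are invented (reserved example IPs, 555 numbers).

search_log = [
    {"ip": "203.0.113.7", "query": "victim's street address"},
    {"ip": "198.51.100.2", "query": "weather"},
]
subscriber_records = [{"ip": "203.0.113.7", "phone": "555-0100"}]
location_pings = [{"phone": "555-0100", "cell_tower": "KY-042", "time": "21:14"}]

by_ip = {r["ip"]: r for r in subscriber_records}
by_phone = {r["phone"]: r for r in location_pings}

# Chain the joins: query -> IP -> subscriber phone -> device location.
for entry in search_log:
    subscriber = by_ip.get(entry["ip"])
    if subscriber:
        ping = by_phone.get(subscriber["phone"])
        print(entry["query"], "->", subscriber["phone"],
              "->", ping["cell_tower"] if ping else "no location data")
```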
The data and the analytic tools are essential for some government activities. Careless release of such sensitive information has unanticipated downstream consequences. Old-fashioned secrecy has some upsides in my opinion.
Stephen E Arnold, October 7, 2021
Gender Biased AI in Job Searches Now a Thing
June 30, 2021
From initial search to applications to interviews, job hunters are now steered through the process by algorithms. Employers' demand for AI solutions has surged with the pandemic, but there is a problem: the approach tends to disadvantage women applicants. An article at MIT Technology Review describes one website's true bro response: "LinkedIn's Job-Matching AI Was Biased. The Company's Solution? More AI." Reporters Sheridan Wall and Hilke Schellmann also cover the responses of competing job search sites Monster, CareerBuilder, and ZipRecruiter. Citing former LinkedIn VP John Jersin, they write:
“These systems base their recommendations on three categories of data: information the user provides directly to the platform; data assigned to the user based on others with similar skill sets, experiences, and interests; and behavioral data, like how often a user responds to messages or interacts with job postings. In LinkedIn’s case, these algorithms exclude a person’s name, age, gender, and race, because including these characteristics can contribute to bias in automated processes. But Jersin’s team found that even so, the service’s algorithms could still detect behavioral patterns exhibited by groups with particular gender identities. For example, while men are more likely to apply for jobs that require work experience beyond their qualifications, women tend to only go for jobs in which their qualifications match the position’s requirements. The algorithm interprets this variation in behavior and adjusts its recommendations in a way that inadvertently disadvantages women. … Men also include more skills on their résumés at a lower degree of proficiency than women, and they often engage more aggressively with recruiters on the platform.”
Rather than, say, inject human judgment into the process, LinkedIn added new AI in 2018 designed to correct for the first algorithm’s bias. Other companies side-step the AI issue. CareerBuilder addresses bias by teaching employers how to eliminate it from their job postings, while Monster relies on attracting users from diverse backgrounds. ZipRecruiter’s CEO says that site classifies job hunters using 64 types of information, including geographical data but not identifying pieces like names. He refused to share more details, but is confident his team’s method is as bias-free as can be. Perhaps—but the claims of any of these sites are difficult or impossible to verify.
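LinkedIn has not published what its corrective AI actually does. A common post-processing approach to this kind of skew, though, is a re-rank that keeps the recommended slate's group mix close to the candidate pool's instead of trusting raw scores that may encode behavioral proxies. The sketch below illustrates that general idea under stated assumptions; it is not LinkedIn's method.

```python
# Hypothetical fairness-aware re-rank: pick a top-k slate whose group
# proportions track the full candidate pool, filling each group's quota by
# score. Raw scores may encode behavioral proxies for gender, so the k
# highest scores alone could skew the slate. Not LinkedIn's actual system.

from collections import defaultdict

def fair_rerank(candidates: list[tuple[float, str]], k: int) -> list[tuple[float, str]]:
    pool = defaultdict(list)
    for score, group in candidates:
        pool[group].append((score, group))
    for members in pool.values():
        members.sort(reverse=True)  # best-scoring members first, per group

    n = len(candidates)
    quotas = {g: round(k * len(m) / n) for g, m in pool.items()}  # proportional share

    slate = []
    for group, quota in quotas.items():
        slate.extend(pool[group][:quota])
    slate.sort(reverse=True)
    return slate[:k]

pool = [(0.9, "A"), (0.85, "A"), (0.8, "A"), (0.7, "B"), (0.6, "B"), (0.5, "B")]
print(fair_rerank(pool, 4))  # two from each group, not the four highest raw scores
```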
Cynthia Murrell, June 30, 2021
SPACtacular Palantir Tech Gets More Attention: This Is Good?
June 30, 2021
Palantir is working to expand its public-private partnership operations beyond security into the healthcare field. Some say the company has fallen short in its efforts to peddle security software to officials in Europe, so the data-rich field of government-managed healthcare is the next logical step. Apparently the pandemic gave Palantir the opening it was looking for, paving the way for a deal it made with the UK’s National Health Service to develop the NHS COVID-19 Data Store. Now however, CNBC reports, “Campaign Launches to Try to Force Palantir Out of Britain’s NHS.” Reporter Sam L. Shead states that more than 50 organizations, led by tech-justice nonprofit Foxglove, are protesting Palantir’s involvement. We learn:
“The Covid-19 Data Store project, which involves Palantir’s Foundry data management platform, began in March 2020 alongside other tech giants as the government tried to slow the spread of the virus across the U.K. It was sold as a short-term effort to predict how best to deploy resources to deal with the pandemic. The contract was quietly extended in December, when the NHS and Palantir signed a £23 million ($34 million) two-year deal that allows the company to continue its work until December 2022. The NHS was sued by political website openDemocracy in February over the contract extension. ‘December’s new, two-year contract reaches far beyond Covid: to Brexit, general business planning and much more,’ the group said. The NHS contract allows Palantir to help manage the data lake, which contains everybody’s health data for pandemic purposes. ‘The reality is, sad to say, all this whiz-bang data integration didn’t stop the United Kingdom having one of the worst death tolls in the Western world,’ said [Foxglove co-founder Cori] Crider. ‘This kind of techno solutionism is not necessarily the best way of making an NHS sustainable for the long haul.’”
Not surprisingly, privacy is the advocacy groups' main concern. The very personal data used in the project is not being truly anonymized; instead it is being "pseudo-anonymized," a reversible process in which identifying information is swapped out for an alias. Both the NHS and Palantir assure citizens re-identification will be performed only if necessary, and that the company has no interest in the patient data itself. In fact, we are told, the data remains the property of the NHS, and Palantir can only do so much with it. Those protesting the project, though, understand that money can talk louder than corporate promises; many companies would pay handsomely for the opportunity to monetize that data.
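The difference between the two terms is worth making concrete. True anonymization destroys the link between a record and a person; pseudonymization just parks that link in a lookup table, and whoever holds the table can reverse the substitution. A minimal sketch of why the process is reversible (hypothetical code with invented identifiers, not the NHS data store):

```python
# Minimal sketch of pseudonymization. The alias table is precisely what makes
# re-identification possible; true anonymization would have no such table.

import uuid

alias_table: dict[str, str] = {}  # alias -> real identifier (the re-identification key)

def pseudonymize(patient_id: str) -> str:
    """Swap identifying information out for a stable alias, keeping the mapping."""
    for alias, real in alias_table.items():
        if real == patient_id:
            return alias
    alias = uuid.uuid4().hex
    alias_table[alias] = patient_id
    return alias

def reidentify(alias: str) -> str:
    """Reverse the swap -- impossible if the data were truly anonymized."""
    return alias_table[alias]

record = {"patient": pseudonymize("943-476-5919"), "diagnosis": "J12.82"}
print(record)                         # no direct identifier in the record itself
print(reidentify(record["patient"]))  # but the table hands it right back
```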
Cynthia Murrell, June 30, 2021
Translations: A Fake News Vector?
June 29, 2021
There are many sources of misinformation online, but here is one readers may not have considered—the spread of mistranslations, whether intentional or merely inept. The Taipei Times gives an example in, “Translation a Cover for False News.” We learn that Japan kindly donated 1.24 million doses of a Covid vaccine to Taiwan. Apparently, though, not everyone in Taiwan received the news with good grace. Writer Hung Yu-jui explains:
"They try to find fault with the AstraZeneca vaccine, talk about the donation in a sarcastic tone, and express pessimism about the relationship between Taiwan and Japan. It is very sad to see. Surprisingly, a social media post has been widely shared, saying that when Japanese Minister of Foreign Affairs Toshimitsu Motegi was asked in the Japanese legislature why the country only donated 1.24 million vaccines, he allegedly answered that the Taiwanese government had only asked for enough doses to meet its needs for this month, as locally made vaccines would soon become available. Having watched the video of the session, anyone with a good understanding of Japanese would know that Motegi did not make such a statement. The post was completely fabricated — a piece of vicious and despicable fake news. … Earlier this month, Hsinchu County Deputy Commissioner Chen Chien-hsien, the director of the Chinese Nationalist Party's (KMT) chapter in the county, shared the post of Motegi's alleged vaccine statement on social media, adding some intentionally malicious comments."
When confronted, Chen explained that a friend who speaks Japanese had translated the speech for him, so he had been sure the translation was accurate. While it is good to trust one's friends, this is a case where verification would have been in order. Unless, as Hung seems to suspect, the mistranslation was intentional. Either way, this demonstrates yet another fake-news vector to watch out for.
Cynthia Murrell, June 29, 2021
On the Complexity of Ad Algorithms
June 4, 2021
It seems like advertising engineer Harikesh Nair has an innate love of intricacy—perhaps to the point of making things more complicated than they need to be. Tech Xplore interviews the algorithm aficionado in its post, “Expert Discusses the ‘Horrendously Complex Science’ Behind Online Advertising Research.” Once inspired by algorithmic innovations out of Google, Facebook, and Amazon, Nair has since settled his work in Asia, where the advertising scene is beautifully complicated by a couple of factors. First, its online marketplace is accessed almost entirely through mobile devices. Nair explains:
“With mobile apps, there’s limited real estate: Nobody browses beyond the first three pages. If you have 8 listings per page, that’s 24 products. So you have to surface 24 items out of 300 million possible product options, and you have to personalize the rankings by finding a good match between the user’s needs and what’s available. All of that has to be done really fast, within 200 microseconds [0.0002 seconds], and it has to be executed flawlessly. It’s a horrendously complex science problem.”
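Surfacing 24 items from 300 million in a fraction of a millisecond is generally done as a funnel: a cheap retrieval pass narrows the catalog to a few hundred candidates, and only those receive expensive personalized scoring. The sketch below is schematic, with invented data and scoring; production systems use approximate-nearest-neighbor indexes, feature stores, and hard per-stage latency budgets, and nothing here is JD.com's code.

```python
# Schematic two-stage funnel for Nair's problem: cheap recall over a huge
# catalog, costly personalized ranking over the survivors. All data invented.

def retrieve_candidates(query_terms, inverted_index, limit=500):
    """Stage 1: cheap lookup (an inverted index here; ANN search in practice).
    Cuts ~300M products to a few hundred candidates."""
    hits = set()
    for term in query_terms:
        hits.update(inverted_index.get(term, []))
    return sorted(hits)[:limit]

def rank(candidates, user_affinity, k=24):
    """Stage 2: expensive per-user scoring on candidates only.
    k = 3 pages x 8 listings, since nobody browses past page three."""
    scored = sorted(((user_affinity.get(item, 0.0), item) for item in candidates),
                    reverse=True)
    return [item for _, item in scored[:k]]

index = {"headphones": ["sku1", "sku2", "sku3"], "wireless": ["sku2", "sku4"]}
affinity = {"sku2": 0.9, "sku4": 0.7, "sku1": 0.2}  # stand-in for a learned model
print(rank(retrieve_candidates(["wireless", "headphones"], index), affinity))
```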
So there is that. We are told a strong social aspect to consumer behavior further complicates Asia’s marketing scene:
“Their society is more communal than ours, so consumption is more embedded in their social networks. Rather than go to a site and search for something and then click and buy and leave, which is typical American consumption behavior, people in China share products on each other’s social feeds, and by doing so they can often get a price reduction.”
Nair points out that, unlike Google, Facebook, and Amazon, which each play different roles, China’s Tencent and Alibaba each blend search, social media, and sales under one virtual roof. That is an interesting difference.
The data scientist goes on to wax poetic about how difficult it is to prove certain ads actually lead to certain purchases. The process involves elaborate, high-stakes experiments run in the background of ad auctions involving hundreds of thousands of advertisers. Then the results must be dumbed down for “people who don’t really know that much about statistics.” Mm hmm. Are we sure all this is not simply justification for confusing everyone and getting paid to do it?
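The experiments being described are, at bottom, randomized holdouts: one group of users is eligible to see an advertiser's ads, a control group is not, and the difference in purchase rates is the ad's lift. A toy version of the arithmetic, with made-up numbers:

```python
# Toy ad-lift calculation from a randomized holdout. Numbers are invented;
# the real experiments run in the background of live ad auctions across
# hundreds of thousands of advertisers.

treatment_users, treatment_buyers = 1_000_000, 12_400  # eligible to see ads
control_users, control_buyers = 1_000_000, 11_800      # holdout, never shown ads

p_t = treatment_buyers / treatment_users  # conversion rate with ads
p_c = control_buyers / control_users      # baseline conversion rate

absolute_lift = p_t - p_c
relative_lift = absolute_lift / p_c

print(f"treatment {p_t:.4%} vs. control {p_c:.4%}")
print(f"lift: {absolute_lift:.4%} absolute, {relative_lift:.1%} relative")
```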
Cynthia Murrell, June 4, 2021