False News: Are Smart Bots the Answer?

November 7, 2019

To us, this comes as no surprise—Axios reports, “Machine Learning Can’t Flag False News, New Studies Show.” Writer Joe Uchill concisely summarizes some recent studies out of MIT that should quell any hope that machine learning will save us from fake news, at least any time soon. Though we have seen that AI can be great at generating readable articles from a few bits of info, mimicking human writers, and even detecting AI-generated stories, that does not mean such systems can tell the true from the false. These studies were performed by MIT doctoral student Tal Schuster and his team of researchers. Uchill writes:

“Many automated fact-checking systems are trained using a database of true statements called Fact Extraction and Verification (FEVER). In one study, Schuster and team showed that machine learning-taught fact-checking systems struggled to handle negative statements (‘Greg never said his car wasn’t blue’) even when they would know the positive statement was true (‘Greg says his car is blue’). The problem, say the researchers, is that the database is filled with human bias. The people who created FEVER tended to write their false entries as negative statements and their true statements as positive statements — so the computers learned to rate sentences with negative statements as false. That means the systems were solving a much easier problem than detecting fake news. ‘If you create for yourself an easy target, you can win at that target,’ said MIT professor Regina Barzilay. ‘But it still doesn’t bring you any closer to separating fake news from real news.’”

Indeed. Another of Schuster’s studies demonstrates that algorithms can usually detect text written by their kin. We’re reminded, however, that an article being machine-written does not in itself mean it is false. In fact, he notes, text bots are now being used to adapt legit stories to different audiences or to generate articles from statistics. It looks like we will just have to keep verifying articles with multiple trusted sources before we believe them. Imagine that.
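To make the dataset-bias failure concrete, here is a minimal toy sketch—not the MIT team’s code—showing how a classifier trained on negation-skewed claims learns the phrasing rather than the facts:

```python
# A toy demonstration (not the MIT study's code) of the shortcut described
# above: when a training set's false claims are mostly phrased negatively,
# a classifier learns "negation means false" rather than any actual facts.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical FEVER-style claims: true ones phrased positively, false
# ones phrased with negations, mirroring the bias found in the dataset.
train_claims = [
    ("Greg says his car is blue", "SUPPORTED"),
    ("The sun rises in the east", "SUPPORTED"),
    ("Water boils at 100 degrees Celsius", "SUPPORTED"),
    ("Greg did not say his car was blue", "REFUTED"),
    ("The sun does not rise in the east", "REFUTED"),
    ("Water never boils at 100 degrees Celsius", "REFUTED"),
]
texts, labels = zip(*train_claims)

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

# A TRUE negative statement gets flagged as REFUTED purely because it
# contains negation words -- the model learned phrasing, not facts.
print(model.predict(["Greg never said his car wasn't blue"]))
```

The toy model “wins” on its training distribution while learning nothing about truth, which is Barzilay’s point about choosing an easy target.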

Cynthia Murrell, November 7, 2019

A New Private Company Directory Entering the Information Super Highway

November 1, 2019

DarkCyber spotted “Crunchbase Raises $30 Million to Go after Private Companies’ Data.” Business directories can be lucrative. Just track down an old school Dun & Bradstreet senior manager.

The approach taken by Crunchbase, which for a short period of time was a Verizon property, consists of several parts:

  • Tracking information about private companies
  • Inclusion of information that will make the directory like LinkedIn, the Microsoft job hunting and social networking site
  • A modern-day service able to host corporate Web sites (maybe a 21st century GeoCities?). The idea is to capture “partnership and careers pages.”

The write up describes Crunchbase as “one of the largest publicly accessible repositories of data about private companies.”

We learned:

Crunchbase partners with more than 4,000 data suppliers that provide it with valuable information on startup companies, such as annual revenue or burn rate.

Oracle provides a data marketplace and Amazon may be spinning up its streaming data marketplace. Will Crunchbase partner, compete, or sell to either of these companies?

Once in a while, DarkCyber looks up a company on Crunchbase. The experience is a “begging for dollars” journey. The useful information has been trimmed in order to get DarkCyber to pay hundreds of dollars to look up information about a private company that is easily findable elsewhere. Good sources include the Web sites of the outfits pumping cash into startups, tweets, and discussion groups.

Can the $30 million succeed where other directories have ended up operated by trade associations or folded into intelligence software equipped with a database of open source information?

Worth watching. We know the investors have their eyes open, as will Cengage, possibly the proud producer of Ward’s Business Directory of US Private and Public Companies.

Stephen E Arnold, November 1, 2019

Gender Bias in Old Books. Rewrite Them?

October 9, 2019

Here is an interesting use of machine learning. Salon tells us “What Reading 3.5 Million Books Tells Us About Gender Stereotypes.” Researchers led by University of Copenhagen’s Dr. Isabelle Augenstein analyzed 11 billion English words in literature published between 1900 and 2008. Not surprisingly, the results show that adjectives about appearance were most often applied to women (“beautiful” and “sexy” top the list), while men were more likely to be described by character traits (“righteous,” “rational,” and “brave” were most frequent). Writer Nicole Karlis describes how the team approached the analysis:

“Using machine learning, the researchers extracted adjectives and verbs connected to gender-specific nouns, like ‘daughter.’ Then the researchers analyzed whether the words had a positive, negative or neutral point of view. The analysis determined that negative verbs associated with appearance are used five times more for women than men. Likewise, positive and neutral adjectives relating to one’s body appearance occur twice as often in descriptions of women. The adjectives used to describe men in literature are more frequently ones that describe behavior and personal qualities.

“Researchers noted that, despite the fact that many of the analyzed books were published decades ago, they still play an active role in fomenting gender discrimination, particularly when it comes to machine learning sorting in a professional setting. ‘The algorithms work to identify patterns, and whenever one is observed, it is perceived that something is “true.” If any of these patterns refer to biased language, the result will also be biased,’ Augenstein said. ‘The systems adopt, so to speak, the language that we people use, and thus, our gender stereotypes and prejudices.’ Augenstein explained this can be problematic if, for example, machine learning is used to sift through employee recommendations for a promotion.”
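The extraction step Karlis describes can be sketched with off-the-shelf NLP tooling. Below is a minimal, hypothetical version of the idea using spaCy’s dependency parser—the noun lists are our own, and the sentiment-scoring step the researchers applied afterward is omitted:

```python
# A hedged sketch (not the Copenhagen team's pipeline) of the core idea:
# use dependency parsing to collect adjectives attached to gendered nouns.
# Assumes spaCy and its small English model are installed:
#   pip install spacy && python -m spacy download en_core_web_sm
import spacy
from collections import Counter

nlp = spacy.load("en_core_web_sm")

# Hypothetical gender-specific noun lists for illustration only.
FEMALE_NOUNS = {"woman", "girl", "daughter", "mother", "wife"}
MALE_NOUNS = {"man", "boy", "son", "father", "husband"}

def gendered_adjectives(texts):
    """Count adjectives that directly modify gender-specific nouns."""
    female, male = Counter(), Counter()
    for doc in nlp.pipe(texts):
        for token in doc:
            # "amod" links an adjective to the noun it modifies,
            # e.g. "beautiful" -> "daughter" in "his beautiful daughter".
            if token.dep_ == "amod" and token.head.lemma_ in FEMALE_NOUNS:
                female[token.lemma_] += 1
            elif token.dep_ == "amod" and token.head.lemma_ in MALE_NOUNS:
                male[token.lemma_] += 1
    return female, male

f, m = gendered_adjectives([
    "The beautiful daughter greeted the rational father.",
    "A brave man helped the sexy woman.",
])
print(f.most_common(), m.most_common())
```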

Karlis does list some caveats to the study—it does not factor in who wrote the passages, what genre they were pulled from, or how much gender bias permeated society at the time. The research does affirm previous results, like the 2011 study that found 57% of central characters in children’s books are male.

Dr. Augenstein hopes her team’s analysis will raise awareness about the impact of gendered language and stereotypes on machine learning. If they choose, developers can train their algorithms on less biased materials or program them to either ignore or correct for biased language.

Cynthia Murrell, October 9, 2019

Thomson Reuters: Getting with the Conference Crowd

October 6, 2019

DarkCyber noted “Thomson Reuters acquires FC Business Intelligence.” FCBI, according to the firm’s Web site:

Founded round a kitchen table in 1990, originally with a focus on emerging markets, the company has grown organically in size and influence ever since.

We learned:

The business will be rebranded Reuters Events and will be operated as part of the Reuters News division of Thomson Reuters.

Thomson Reuters has not delivered hockey stick growth in the last three, five, or eight years, has it?

Will conferences be the goose that lays golden eggs in the Thomson Reuters’ hen house?

What’s the motive force for a professional publishing outfit to get into conferences? DarkCyber hypothesizes that:

  • Getting more cash from traditional professional publishing markets is increasingly difficult; for example, few law firms have clients willing to pay the commercial online fees of the “good old days”
  • Conferences, despite advances in technology, continue to give the Wall Street Journal and other organizations opportunities to meet and greet, generate revenue from booth rentals, and hop on hot topics
  • It is easier to make one’s own news than to pay to merely report the news, particularly when that news comes from a high-profile conference.

Will Thomson Reuters slice and dice the content outputs in as many ways as possible? Possibly.

Worth watching as Lord Thomson of Fleet probably is from his eye in the sky.

Stephen E Arnold, October 6, 2019

An AI Tool to Identify AI-Written Text

September 19, 2019

When distinguishing human writing from AI-generated text, the secret is in the predictability. MIT Technology Review reports, “A New Tool Uses AI to Spot Text Written by AI.” We have seen how AI can produce articles that seem to us humans as if they were written by one of us, opening a new dimension in the scourge of fake news. Now, researchers have produced a tool that uses AI technology to detect AI-generated text. Writer Will Knight tells us:

“Researchers from Harvard University and the MIT-IBM Watson AI Lab have developed a new tool for spotting text that has been generated using AI. Called the Giant Language Model Test Room (GLTR), it exploits the fact that AI text generators rely on statistical patterns in text, as opposed to the actual meaning of words and sentences. In other words, the tool can tell if the words you’re reading seem too predictable to have been written by a human hand. … GLTR highlights words that are statistically likely to appear after the preceding word in the text. As shown in the passage above (from Infinite Jest), the most predictable words are green; less predictable are yellow and red; and least predictable are purple. When tested on snippets of text written by OpenAI’s algorithm, it finds a lot of predictability. Genuine news articles and scientific abstracts contain more surprises.”

See the article for that colorfully highlighted sample. Researchers enlisted Harvard students to test GLTR’s results. Without the tool, students spotted just half the AI-crafted passages. Using the highlighted results, though, they identified 72% of them. Such collaboration between the tool and human interpreters is the key to warding off fake articles, one researcher states. The article concludes with a link to try out the tool for oneself.
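Curious readers can approximate GLTR’s measurement in a few lines. This is a rough sketch of the idea, not the Harvard/MIT-IBM code; it assumes the transformers library and the small GPT-2 model:

```python
# A rough sketch of GLTR's core measurement (not the researchers' code):
# rank each real token within a language model's predicted distribution.
# Machine-generated text tends to sit in the highly predictable bands.
# Assumes: pip install torch transformers
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def token_ranks(text: str):
    """For each token, count how many tokens the model rated more likely."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits  # shape: (1, seq_len, vocab_size)
    ranks = []
    for pos in range(ids.shape[1] - 1):
        actual = ids[0, pos + 1]
        rank = int((logits[0, pos] > logits[0, pos, actual]).sum().item()) + 1
        ranks.append((tokenizer.decode([int(actual)]), rank))
    return ranks

# GLTR's color bands: green = top 10, yellow = top 100, red = top 1,000,
# purple = anything rarer than that.
for tok, rank in token_ranks("The tool highlights statistically likely words."):
    band = ("green" if rank <= 10 else "yellow" if rank <= 100
            else "red" if rank <= 1000 else "purple")
    print(f"{tok!r:>14}  rank {rank:>5}  {band}")
```

Text full of green and yellow tokens looks suspiciously machine-like; genuine prose tends to scatter reds and purples.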

Cynthia Murrell, September 19, 2019

Questionable Journals Fake Legitimacy

September 13, 2019

The problem of shoddy or fraudulent research being published as quality work continues to grow, and it is becoming harder to tell the good from the bad. Research Stash describes “How Fake Scientific Journals Are Bypassing Detection Filters.” In recent years, regulators and the media have insisted scientific journals follow certain standards. Instead of complying, however, some of these “predatory” journals have made changes that just make them look like they have mended their ways. The write-up cites a study out of the Gandhinagar Institute of Technology in India performed by Naman Jain, a student of Professor Mayank Singh. Writer Dinesh C Sharma reports:

“The researchers took a set of journals published by Omics, which has been accused of publishing predatory journals, with those published by BMC Publishing Group. Both publish hundreds of open access journals across several disciplines. Using data-driven analysis, researchers compared parameters like impact factors, journal name, indexing in digital directories, contact information, submission process, editorial boards, gender, and geographical data, editor-author commonality, etc. Analysis of this data and comparison between the two publishers showed that Omics is slowly evolving. Of the 35 criteria listed in the Beall’s list and which could be verified using the information available online, 22 criteria are common between Omics and BMC. Five criteria are satisfied by both the publishers, while 13 are satisfied by Omics but not by BMC. The predatory publishers are changing some of their processes. For example, Omics has started its online submission portal similar to well-known publishers. Earlier, it used to accept manuscripts through email. Omics dodges most of the Beall’s criteria to emerge as a reputed publisher.”
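The data-driven comparison boils down to set arithmetic over a criteria checklist. Here is a minimal sketch of that idea; the criterion names are hypothetical stand-ins, not the actual 35 Beall’s-list items the researchers verified online:

```python
# An illustrative sketch of the study's set-comparison idea. The criterion
# names below are hypothetical stand-ins for Beall's-list items.
OMICS_FLAGS = {
    "email_only_submission",   # accepts manuscripts by email
    "unverifiable_editors",    # editorial board members cannot be verified
    "fake_impact_metric",      # advertises a bogus impact factor
    "hidden_apc_fees",         # publication fees revealed after acceptance
    "broad_journal_scope",     # one journal spans unrelated disciplines
}
BMC_FLAGS = {
    "broad_journal_scope",
    "hidden_apc_fees",
}

both = OMICS_FLAGS & BMC_FLAGS        # criteria satisfied by both publishers
omics_only = OMICS_FLAGS - BMC_FLAGS  # criteria separating Omics from BMC
print(f"satisfied by both: {sorted(both)}")
print(f"Omics only:        {sorted(omics_only)}")
```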

Jain suggests we update the criteria for identifying quality research and use more data analytics to identify false and misleading articles. He offers his findings as a starting point, and we are told he plans to present his research at a conference in November.

Cynthia Murrell, September 13, 2019

Research Suggests Better Way to Foil Hate Groups

September 9, 2019

It is no secret that internet search and social media companies have a tough time containing the spread of hate groups across their platforms. Now a study from George Washington University and the University of Miami posits why. Inverse reports, “‘Global Hate Highways’ Reveal How Online Hate Clusters Multiply and Thrive.” This is my favorite quote from the article—“In it, [researchers] observe that hate spreads online like a diseased flea, jumping from one body to the next.”

The study tracked certain hate “clusters” across international borders and through different languages as they hopped from one platform to another. Current strategies for limiting the spread of such groups include the “microscopic approach” of banning individual users and the “macroscopic approach” that bans whole ideologies. Not only does the latter approach often run afoul of free speech protections, as the article points out, it is also error-prone—algorithms have trouble distinguishing conversations about hate speech from those that are hate speech (especially where parody is used). Besides, neither of these approaches has proven very effective. The study suggests another way; reporter Sarah Sloat writes:

“The mathematical mapping model used here showed that both these policing techniques can actually make matters worse. That’s because hate clusters thrive globally not on a micro or macro scale but in meso scale — this means clusters interconnect to form networks across platforms, countries, and languages and are quickly able to regroup or reshape after a single user is banned or after a group is banned from a single platform. They self-organize around a common interest and come together to remove trolls, bots, and adverse opinions. …

“A better way to curb the spread of hate, the researchers posit, would involve randomly banning a small fraction of individuals across platforms, which is more likely to cause global clusters to disconnect. They also advise platforms to send in groups of anti-hate advocates to bombard hate-filled spaces together with individual users to influence others to question their stance.

“The goal is to prevent hate-filled online pits that radicalize individuals like the Christchurch shooter, an Australian who attacked in New Zealand, covered his guns with the names of other violent white supremacists and citations of ancient European victories, and posted a 74-page racist manifesto on the website 8chan.”

The researchers’ approach does not require any data on individuals, nor does it rely on banning ideas wholesale. Instead, it is all about weakening the connections that keep online hate groups going. Can their concept help society dissipate hate?
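The “meso scale” picture lends itself to a quick experiment. Below is a static toy sandbox—our construction, not the GWU/Miami model—that builds a few dense clusters joined by bridge users and compares how much structure survives a random cross-platform ban versus a single-platform ban:

```python
# A static toy sandbox (not the researchers' model) for the meso-scale
# picture: dense clusters on separate platforms, loosely joined by a few
# bridge users, with two ban strategies to experiment with.
# Assumes: pip install networkx
import random
import networkx as nx

random.seed(42)

# Six dense 30-user clusters, each relabeled so its users are distinct.
G = nx.Graph()
for c in range(6):
    block = nx.gnp_random_graph(30, 0.3, seed=c)
    G.update(nx.relabel_nodes(block, lambda i, c=c: f"c{c}_u{i}"))
# Sparse "highways": one bridge user links each cluster to the next.
for c in range(6):
    G.add_edge(f"c{c}_u0", f"c{(c + 1) % 6}_u0")

def fragmentation(graph, banned):
    """Report component count and largest surviving component after a ban."""
    H = graph.copy()
    H.remove_nodes_from(banned)
    sizes = sorted((len(cc) for cc in nx.connected_components(H)), reverse=True)
    return len(sizes), sizes[0]

nodes = list(G.nodes)
# Strategy A: ban 10% of users chosen at random across every platform.
random_ban = random.sample(nodes, k=len(nodes) // 10)
# Strategy B: one platform bans its entire cluster.
platform_ban = [n for n in nodes if n.startswith("c0_")]

print("random cross-platform ban:", fragmentation(G, random_ban))
print("single-platform ban:     ", fragmentation(G, platform_ban))
```

A static toy only shows how much structure survives each ban; the paper’s stronger claim is dynamic—banned clusters regroup elsewhere—which is exactly why the authors argue for randomness applied across platforms.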

Cynthia Murrell, September 9, 2019

Thinking about Real News

September 7, 2019

Now that AI has gotten reasonably good at generating fake news, we have a study that emphasizes how dangerous such false articles can be. The Association for Psychological Science reports, “Fake News Can Lead to False Memories.” While the study, from University College Cork, was performed on Irish citizens ahead of a vote on an abortion referendum, its results can easily apply to voters in any emotional or partisan contest. Like, say, next year’s U.S. presidential election.

Researchers recruited 3,140 likely voters and had them read six articles relevant to the referendum, two of which were accounts of scandalous behavior that never actually happened. We learn:

“After reading each story, participants were asked if they had heard about the event depicted in the story previously; if so, they reported whether they had specific memories about it. The researchers then informed the eligible voters that some of the stories they read had been fabricated, and invited the participants to identify any of the reports they believed to be fake. Finally, the participants completed a cognitive test. Nearly half of the respondents reported a memory for at least one of the made-up events; many of them recalled rich details about a fabricated news story. The individuals in favor of legalizing abortion were more likely to remember a falsehood about the referendum opponents; those against legalization were more likely to remember a falsehood about the proponents. Many participants failed to reconsider their memory even after learning that some of the information could be fictitious. And several participants recounted details that the false news reports did not include.”

We note:

“‘This demonstrates the ease with which we can plant these entirely fabricated memories, despite this voter suspicion and even despite an explicit warning that they may have been shown fake news,’ [lead author Gillian] Murphy says.”

Indeed it does. Even those who scored high on the cognitive test were susceptible to false memories, though those who scored lower were more likely to recall stories that supported their own opinions. At least the more intelligent among us seem better able to question their own biases. Alas, not only the intelligent vote.

In addition to fake articles that can now be generated quickly and easily with the help of AI, we are increasingly subjected to convincing fake photos and videos, too. Let us hope the majority of the population learns to take such evidence with a grain of salt, and quickly. Always consider the source.

Cynthia Murrell, September 7, 2019

Incognito Mode Update Hinders Publisher Paywalls

September 3, 2019

Google’s effort to bolster the privacy of Chrome’s Incognito Mode does not sit well with one writer at BetaNews. Randall C. Kennedy insists, “Google Declares War on Private Property.” The headline seems to conflate the term “private” with “proprietary,” but never mind. The point is the fix makes it easier for dishonest readers to avoid paywalls, and that is a cause for concern. The write-up explains:

“Google has announced that it is closing a loophole that allowed website operators to detect whether someone was viewing their content under the browser’s Incognito Mode. This detection had become an important part of enforcing paywall restrictions since even tech-unsavvy visitors had learned to bypass the free per-month trial article counts at sites like nytimes.com by visiting them with Incognito Mode active (and thus disabling the sites’ ability to track how many free articles the user read via a cookie.) The content publishing community’s response to this blatant theft of property has been to simply block users from visiting their sites under Incognito Mode. And the way they detect if the mode is active is by monitoring the Chrome FileSystem API and looking for signs of private browsing. Now, with version 76, Google has closed this API ‘loophole’ and is promising to continue thwarting any future workarounds that seek to identify Incognito Mode browsing activity.”

Google says the change is to protect those who would circumvent censorship in repressive nations. However, in doing so, it thwarts publishers who desperately need, and deserve, to get paid for their work. Kennedy suspects Google’s real motivation is its own profits—if content creators cannot enforce paywalls, he reasons, their only recourse will be to display Google’s ads alongside their content. Perhaps.

Cynthia Murrell, September 3, 2019

Elsevier: Exemplary Customer Service

August 26, 2019

Academic publishers’ journals are expensive, and the publishers are notoriously protective of their content. Elsevier is the world’s largest academic publisher as well as the biggest paywall perpetrator. California is big on low cost, effective education, particularly at the University of California.

The University of California and Elsevier have butted heads over access for months, but in July 2019 Elsevier pulled the plug on recent research. The Los Angeles Times explains the details in the article, “In Act Of Brinkmanship, A Big Publisher Cuts Off UC’s Access To Its Academic Journals.”

Elsevier’s contract with UC expired in 2018. UC is willing to renegotiate a contract with Elsevier, but UC wants the new contract to include an open access clause, meaning all work produced on its campuses will be free to the public.

Academic publishers typically publish scholarly material at no cost to the author but charge expensive subscription fees to access content. UC wants to change the system so that researchers pay to have their papers published but readers do not pay for subscriptions. UC creates 10% of all published research in the US and is the largest producer of academic content to back open access.

Elsevier and other academic publishers are profit gluttons hiding behind paywalls. UC wants to continue its relationship with Elsevier, but the former agreement would have raised subscription and access costs to exorbitant levels. The University of California found its contract with Elsevier to be cost prohibitive, so it took a stand and demanded open access for UC research.

“UC isn’t the only institution to stage a frontal assault on this model. Open access has been spreading in academia and in scholarly publishing; academic consortiums in Germany and Sweden also have demanded read-and-publish deals with Elsevier, which cut them off after they failed to reach deals last year. Those researchers are still cut off, according to Gemma Hersh, Elsevier’s vice president for global policy. Smaller deals have been made in recent months with research institutions in Norway and Hungary.

We noted this statement:

“…Under the circumstances, it looks like Elsevier may have picked a fight with the wrong adversary. While the open-access movement is growing, ‘the reality is that the majority of the world’s articles are still published under the subscription model, and there is a cost associated with reading those articles,’ Hersh says.”

The academic publishing paywall seems to be under siege. There is pressure to reduce costs in higher education, and many professors and professional staff are demanding open access.

Elsevier may be perceived as mishandling its customers.

Whitney Grace, August 26, 2019
