Can an Algorithm Tame Misinformation Online?

June 23, 2017

UCLA researchers are working on an algorithmic solution to the “fake news” problem, we learn from the article, “Algorithm Reads Millions of Posts on Parenting Sites in Bid to Understand Online Misinformation” at TechRadar. Okay, it’s actually indexing and text analysis, not “reading,” but we get the idea. Reporter Duncan Geere tells us:

There’s a special logic to the flow of posts on a forum or message board, one that’s easy to parse by someone who’s spent a lot of time on them but kinda hard to understand for those who haven’t. Researchers at UCLA are working on teaching computers to understand these structured narratives within chronological posts on the web, in an attempt to get a better grasp of how humans think and communicate online.

Researchers used the hot topic of vaccinations, as discussed on two parenting forums, as their test case. Through an examination of nearly 2 million posts, the algorithm was able to come to accurate conclusions in the form of a “narrative framework.” Geere writes:

While this study was targeted at conversations around vaccination, the researchers say the same principles could be applied to any topic. Down the line, they hope it could allow for false narratives to be identified as they develop and countered by targeted messaging.

The phrase “down the line” is incredibly vague, but the sooner the better, we say (though we wonder exactly what form this “targeted messaging” will take). The original study can be found here at eHealth publisher JMIR Publications.

Cynthia Murrell, June 23, 2017

 

Academic Publisher Retracts Record Number of Papers

June 20, 2017

To the scourge of fake news we add the problem of fake research. Retraction Watch announces “A New Record: Major Publisher Retracting More Than 100 Studies from Cancer Journal over Fake Peer Reviews.” We learn that Springer has just retracted 107 papers from a single journal after discovering their peer reviews had been falsified. Faking the integrity of cancer research? That’s pretty low. The article specifies:

To submit a fake review, someone (often the author of a paper) either makes up an outside expert to review the paper, or suggests a real researcher — and in both cases, provides a fake email address that comes back to someone who will invariably give the paper a glowing review. In this case, Springer, the publisher of Tumor Biology through 2016, told us that an investigation produced “clear evidence” the reviews were submitted under the names of real researchers with faked emails. Some of the authors may have used a third-party editing service, which may have supplied the reviews. The journal is now published by SAGE. The retractions follow another sweep by the publisher last year, when Tumor Biology retracted 25 papers for compromised review and other issues, mostly authored by researchers based in Iran.

The article shares Springer’s response to the matter, some from its official statement and some from a spokesperson. For example, we learn the company cut ties with the “Tumor Biology” owners, and that the latest fake reviews were caught during a process put in place after that debacle. See the story for more details.

Cynthia Murrell, June 20, 2017

Bookeyes for Free Classic Literature

June 15, 2017

We want to let you in on a nifty new resource—Bookeyes lets users download classic literature, in eBook form, for free. As of this writing, the site has 65 works to choose from, with the option to request something specific not yet on the page. (I requested Wuthering Heights by Emily Brontë.)

Despite the lack of Brontë sisters, the selection is pretty representative of the traditional Western-centric canon, from Machiavelli to Thoreau. There’s your Homer, your Shakespeare, Jack London, Tolstoy, and Twain. Beowulf too, naturally. We also see books by Jane Austen, Harriet Beecher Stowe, and Frederick Douglass.

The search box works as expected: typing the first few letters of a title or author’s name narrows the list without reloading the page. To say the “About” page is succinct is an understatement; it simply declares:

Bookeyes is your home for book classics. Pick a title on our homepage and enjoy!

On the Contact page is a photo of site creator Kermitt Davis, who is either quite young or incredibly well-preserved. We applaud his effort to bring classic literature to the masses; perhaps he could use more suggestions for works that are out of copyright. Know of any good ones that might fall outside the syllabus for a Survey of Prominent Western Literature?

Cynthia Murrell, June 15, 2017

 

Google and Hate Speech: None of This I Know It When I See It

June 7, 2017

I read “YouTube Clarifies “Hate Speech” Definition and Which Videos Won’t Be Monetized.” I don’t know much about defining abstractions because I live in rural Kentucky. Our governor just recommended prayer patrols to curb violence in Louisville, home of the Derby and lots of murders on weekends.

Google has nailed down the abstraction “hate speech.” According to the write up, Google’s definition is:

[content which] “promotes discrimination or disparages or humiliates an individual or group of people on the basis of the individual’s or group’s race, ethnicity, or ethnic origin, nationality, religion, disability, age, veteran status, sexual orientation, gender identity, or other characteristic associated with systematic discrimination or marginalization.”

And

“inappropriate use of family entertainment characters,” which means content showing kid-friendly characters in “violent, sexual, vile, or otherwise inappropriate behavior,” no matter if the content is satirical or a parody. The final category is somewhat broad: “incendiary and demeaning content” means that anything “gratuitously” demeaning or shameful toward an individual or group is prohibited.

And

“controversial issues or sensitive events,” which YouTube defines as “video content that features or focuses on sensitive topics or events including, but not limited to, war, political conflicts, terrorism or extremism, death and tragedies, sexual abuse, even if graphic imagery is not shown… For example, videos about recent tragedies, even if presented for news or documentary purposes, may not be eligible for advertising given the subject matter.”

This is good to know for three reasons:

  1. Google can define abstractions. No disambiguation subroutines are required.
  2. Google could run ads against this type of content and make money, but Google will not do that. (Did Google run ads against these types of content in the past? Nah, “don’t be evil” shuts the door on that question.)
  3. Facebook can process Google’s definitions and craft even more functional guidelines. (Me too is the basic process for innovation or becoming a publisher with editorial guidelines.)

Next up for Google to define are “love,” “truth,” “justice,” and “salary data.”

Stephen E Arnold, June 7, 2017

The WSJ Click Crater

June 7, 2017

Big name, old school publishers share a trait. These folks perceive themselves as traffic magnets. I have been in meetings in which the shared understanding was that a publisher’s “brand” would sustain a flow of digital revenue.

Again and again the “brand” fallacy proves itself. Examples range from the original New York Times’ online service (hello, Jeff Pemberton) to the Wall Street Journal’s early attempt to make its content available in a sort of wonky online interface decades ago (hello, Richard Levine?).

I just read “WSJ Ends Google Users’ Free Ride, Then Fades in Search Results.” The main point: The brand magnet is weak. Without the Google attracting eyeballs and routing traffic to the Murdoch “blue chip,” the WSJ has found itself in a click crater.

What’s the fix?

Well, dear WSJ, the answer is to buy AdWords. Yep, the WSJ has to fork over big money per month to get the traffic up. Then the WSJ has to figure out how to monetize that traffic.

That’s not easy.

I subscribe to the dead tree edition of the newspaper. The digital version is allegedly available to me as part of my subscription. I don’t bother. The WSJ is not able to provide me with an email and a temporary password so I can enter data from the newspaper’s mailing label into the WSJ online system. Nah, I have to phone the WSJ and go through a crazy process, and I don’t want to do this. I am okay with a magic marker and a pair of scissors.

I learned from the Bloomberg write up:

Executives at the Journal, owned by Rupert Murdoch’s News Corp., argue that Google’s policy is unfairly punishing them for trying to attract more digital subscribers. They want Google to treat their articles equally in search rankings, despite being behind a paywall.

Right, click crater.

Bad Google. Baloney.

Publishers fumbled their digits. Don’t believe me? Chase down someone involved in the early versions of the Times Online or the Dow Jones News Service.

These did not work.

Why?

A newspaper is one thing. Online information is another.

Bad Google. Wrong. Publishers with horse blinders can find their way to the stable. Anything else is tough.

Stephen E Arnold, June 7, 2017

Semantic Platform Aggregates Scientific Information

May 1, 2017

A new scientific repository is now available from a prominent publisher, we learn from “GraphDB, Leading Semantic Database from Ontotext, Powers Springer Nature’s New Linked Open Data Platform” at PRWeb. (We note the word “leading” in the title; who verifies this assertion? Just curious.) The platform, dubbed SciGraph, aggregates data from Springer Nature and its academic partners. The press release specifies:

Thanks to semantic technologies, Linked Open Data and the GraphDB semantic database, all these data are connected in a way which semantically describes and visualizes how the information is interlinked. GraphDB’s capability to seamlessly integrate disparate data silos allows Springer Nature SciGraph to comprise metadata from journals and articles, books and chapters, organizations, institutions, funders, research grants, patents, clinical trials, substances, conference series, events, citations and reference networks, Altmetrics, and links to research datasets.

The dataset is released under an international Creative Commons license, and can be downloaded (by someone with the appropriate technical knowledge) here.

An early explorer of semantic technology, Ontotext was founded in 2000. Based in Bulgaria, the company keeps its North American office in New Jersey. Ontotext’s client roster includes big names in publishing, government agencies, and cultural institutions.

Cynthia Murrell, May 1, 2017

Thomson Reuters: Now the Answer Company

April 25, 2017

Earlier this year I saw a reference to “the answer company.” I ignored it. Yesterday I saw a link to a podcast with Casey Hall, who is the “head of social media for business communications” at Thomson Reuters. Thomson Reuters is a publicly traded company with revenues in the $14 billion range. Here’s a Google chart showing how the company has performed over the last few years:

[Image: Google chart of Thomson Reuters’ performance over the last few years]

To my untrained eye, it looks as if revenues are down and profits are up. Yikes. How were those cost savings achieved? Perhaps the podcast explains how “the answer company” will boost revenues and continue to generate sustainable returns for stakeholders and, of course, senior management.

The podcast addresses a number of Thomson Reuters’ themes. One, for instance, is the fact that the company has 45,000 employees and a “giant footprint.” As the podcast ground forward, I realized that “the answer company” wants its employees to embrace employee advocacy.

It seems that “the answer company” is trying to communicate with its employees. The write up “How Thomson Reuters Earned the Brand as The Answer Company” accompanying the podcast told me:

Thomson Reuters encourages their employees to engage with their network of data scientists, finance, and accounting professionals by sharing the brand’s message. Leveraging their employees’ networks allows them to increase their reach and enhance the authenticity of the message since it’s coming from a real person, the employee. The employee advocacy program also helps with internal communications. Employees engage with each other and share what’s going on in their part of the organization.

Yeah, but, what about explaining “how” Thomson Reuters became “the answer company”? As it turns out, the podcast focused exclusively on “on boarding employees,” which I don’t really understand. Another topic was measuring the impact of the employee advocacy program. I think this means closing sales.

I suppose that Thomson Reuters just decided it needed a new tag line even though its online services usually require a person to run a search, read a results list, and hunt for the needed information. That’s not answers. That’s work.

I believe that Thomson Reuters licensed the Palantir Technologies’ system in order to have tools which make sense of information. But if the podcast is any indication of how Thomson Reuters became “the answer company,” my thought is that the company is trying social media as a sales tool.

As for answers, one still has to hunt to find out what companies Thomson Reuters owns. One has to run queries on its online legal information systems and then hunt for answers.

Ah, PR. Love it. An article title which does not relate to the content of the podcast OR the article.

Stephen E Arnold, April 24, 2017

The Algorithm to Failure

April 12, 2017

Algorithms have practically changed the way the world works. However, this nifty code also has its limitations that lead to failures.

In a whitepaper published by Cornell University, authored by Shai Shalev-Shwartz, Ohad Shamir, and Shaked Shammah and titled Failures of Deep Learning, the authors say:

It is important, for both theoreticians and practitioners, to gain a deeper understanding of the difficulties and limitations associated with common approaches and algorithms.

The whitepaper touches on four pain points of Deep Learning, which is based on algorithms. The authors propose remedial measures that could possibly overcome these impediments and lead to better AI.

Eminent personalities like Stephen Hawking, Bill Gates and Elon Musk have, however, warned against advancing AIs. Google in the past had abandoned robotics as the machines were becoming too intelligent. What remains to be seen is who will win in the end: commercial interests or unfounded fear?

Vishal Ingole, April 12, 2017

Whose Message Is It Anyway?

April 11, 2017

Instant messaging service provider WhatsApp is in a quandary. While the privacy of its users is of utmost importance to the company, where does it draw the line when national security is in question?

In an editorial published in The Telegraph titled WhatsApp Accused of Giving Terrorists ‘a Secret Place to Hide’ as It Refuses to Hand over London Attacker’s Messages, the writer says: “The Government was considering legislation to force online firms to take down extremist material, but said it was time for the companies to ‘recognise that they have a responsibility’ to get their own house in order.”

Apps like WhatsApp offer end-to-end encryption for messages sent using their networks. This makes it impossible (?) for anyone to intercept and read them, even technicians at WhatsApp. On numerous occasions, WhatsApp, owned by Facebook, has come under fire for protecting its users’ privacy. In this particular incident, the London attacker Ajao used WhatsApp to send a message to someone. While Scotland Yard wants access to the messages sent by the terrorist, WhatsApp says its hands are tied.

The editorial also says that social media networks are no longer tech companies; rather, they are turning into publishing companies, and thus the onus is on them to ensure that radical materials are removed from their networks. Who ultimately will win the battle remains to be seen, but right now, WhatsApp seems to have the edge.

Vishal Ingole, April 11, 2017

Alternative (Aka Fake) News Not Going Anywhere

March 29, 2017

The article titled The Rise of Fake News Amidst the Fall of News Media on Silicon Valley Watcher makes a convincing argument that fake news is the inevitable result of the collective failure to invest in professional media. The author, Tom Foremski, used to write for the Financial Times. He argues that the near-constant layoffs among professional media organizations such as the New York Times, Salon, The Guardian, AP, Daily Dot, and IBT illustrate the lack of a sustainable business model for professional news media. The article states:

People won’t pay for the news media they should be reading but special interest groups will gladly pay for the media they want them to read. We have important decisions to make about a large number of issues such as the economy, the environment, energy, education, elder healthcare and those are just the ones that begin with the letter “E” — there’s plenty more issues. With bad information we won’t be able to make good decisions. Software engineers call this GIGO – Garbage In Garbage Out.

This issue affects us all; fake news even got a man elected to the highest office in the land. With Donald Trump demonstrating on a daily basis that he has no interest in the truth, whether regarding the size of the crowds at his inauguration or the reason he lost the popular vote to Hillary Clinton, the news industry is already in a crouch. Educating people to differentiate between true and false news is nearly impossible when it is so much easier and more comfortable for people to read only what reconfirms their worldview. Foremski leaves it up to the experts and the visionaries to solve the problem and find a way to place a monetary value on professional news media.

Chelsea Kerwin, March 29, 2017
