Machine Learning Solution Would Help Keep Wikipedia Entries Updated

February 27, 2020

In a development that could ease the burden on Wikipedia volunteers, Eurasia Review reports, “Automated System Can Rewrite Outdated Sentences in Wikipedia Articles.” Researchers at MIT have created a system that could greatly simplify the never-ending process of keeping articles on the site up to date. Instead of rewriting sentences or paragraphs, volunteers could simply supply the updated information as an unstructured sentence. The system would then generate “humanlike” text. Here’s how:

“Behind the system is a fair bit of text-generating ingenuity in identifying contradictory information between, and then fusing together, two separate sentences. It takes as input an ‘outdated’ sentence from a Wikipedia article, plus a separate ‘claim’ sentence that contains the updated and conflicting information. The system must automatically delete and keep specific words in the outdated sentence, based on information in the claim, to update facts but maintain style and grammar. …”

We noted:

“The system was trained on a popular dataset that contains pairs of sentences, in which one sentence is a claim and the other is a relevant Wikipedia sentence. Each pair is labeled in one of three ways: ‘agree,’ meaning the sentences contain matching factual information; ‘disagree,’ meaning they contain contradictory information; or ‘neutral,’ where there’s not enough information for either label. The system must make all disagreeing pairs agree, by modifying the outdated sentence to match the claim. That requires using two separate models to produce the desired output. The first model is a fact-checking classifier — pretrained to label each sentence pair as ‘agree,’ ‘disagree,’ or ‘neutral’ — that focuses on disagreeing pairs. Running in conjunction with the classifier is a custom ‘neutrality masker’ module that identifies which words in the outdated sentence contradict the claim.”
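
To make the two-part description above concrete, here is a toy sketch of the data flow only. It is not the MIT researchers' code: the real classifier and “neutrality masker” are trained models, while the stand-in below simply masks number-bearing words in the outdated sentence that do not appear in the claim. The names SentencePair, toy_mask, and mask_if_disagree, along with the sample sentences, are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# The three labels described in the quoted passage.
LABELS = ("agree", "disagree", "neutral")
PUNCT = ".,;:!?\"'()"


@dataclass
class SentencePair:
    """One dataset-style record (hypothetical layout): a claim, a sentence, a label."""
    claim: str
    wiki_sentence: str
    label: str  # expected to be one of LABELS


def toy_mask(outdated: str, claim: str, mask: str = "[MASK]") -> str:
    """Crude stand-in for a masker: hide digit-bearing words absent from the claim.

    This is an assumed heuristic for illustration, not the trained module
    the article describes.
    """
    claim_words = {w.strip(PUNCT).lower() for w in claim.split()}
    masked = []
    for word in outdated.split():
        core = word.strip(PUNCT)  # the word without surrounding punctuation
        has_digit = any(ch.isdigit() for ch in core)
        if has_digit and core.lower() not in claim_words:
            masked.append(word.replace(core, mask))  # keep the punctuation
        else:
            masked.append(word)
    return " ".join(masked)


def mask_if_disagree(pair: SentencePair) -> str:
    """Only 'disagree' pairs are masked; other pairs pass through unchanged."""
    if pair.label not in LABELS:
        raise ValueError(f"unknown label: {pair.label!r}")
    if pair.label != "disagree":
        return pair.wiki_sentence
    return toy_mask(pair.wiki_sentence, pair.claim)


pair = SentencePair(
    claim="The fund now has 28 companies.",
    wiki_sentence="The fund holds 23 companies in its portfolio.",
    label="disagree",
)
print(mask_if_disagree(pair))
```

In this sketch the label is given as input rather than predicted, so the classifier step is skipped entirely. Filling the masked slot with text from the claim, the “fusing” step mentioned earlier, is also left out.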

Note that this process still requires people to decide what needs updating, but the researchers look forward to a time when even that human input could be sidestepped. (Is that a good thing?) Another hope is that the tool could be used to eliminate bias in the training of “fake news” detection bots. The researchers point out that the system could also be used in text-generating applications beyond Wikipedia. See the write-up for more information.

Cynthia Murrell, February 27, 2020

Written by Stephen E. Arnold · Filed Under AI, News, Publishing

