Machine Learning Solution Would Help Keep Wikipedia Entries Updated
February 27, 2020
In a development that could ease the burden on Wikipedia volunteers, Eurasia Review reports, “Automated System Can Rewrite Outdated Sentences in Wikipedia Articles.” Researchers at MIT have created a system that could greatly simplify the never-ending process of keeping articles up to date on the site. Instead of having to rewrite sentences or paragraphs, volunteers could just insert the updated information into an unstructured sentence. The system would then generate “humanlike” text. Here’s how:
“Behind the system is a fair bit of text-generating ingenuity in identifying contradictory information between, and then fusing together, two separate sentences. It takes as input an ‘outdated’ sentence from a Wikipedia article, plus a separate ‘claim’ sentence that contains the updated and conflicting information. The system must automatically delete and keep specific words in the outdated sentence, based on information in the claim, to update facts but maintain style and grammar. …
The write-up continues:
“The system was trained on a popular dataset that contains pairs of sentences, in which one sentence is a claim and the other is a relevant Wikipedia sentence. Each pair is labeled in one of three ways: ‘agree,’ meaning the sentences contain matching factual information; ‘disagree,’ meaning they contain contradictory information; or ‘neutral,’ where there’s not enough information for either label. The system must make all disagreeing pairs agree, by modifying the outdated sentence to match the claim. That requires using two separate models to produce the desired output. The first model is a fact-checking classifier — pretrained to label each sentence pair as ‘agree,’ ‘disagree,’ or ‘neutral’ — that focuses on disagreeing pairs. Running in conjunction with the classifier is a custom ‘neutrality masker’ module that identifies which words in the outdated sentence contradict the claim.”
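The classifier-plus-masker pipeline described in the quote can be sketched in miniature. The functions below are purely illustrative assumptions, not the MIT system: the real models are trained neural networks, whereas this toy uses simple word overlap to label a (claim, sentence) pair and to mask words in the outdated sentence that do not appear in the claim.

```python
def classify(claim: str, sentence: str) -> str:
    """Toy fact-checking classifier: labels a pair 'agree',
    'disagree', or 'neutral' using word overlap (illustration only)."""
    claim_words = set(claim.lower().split())
    sent_words = set(sentence.lower().split())
    if not claim_words & sent_words:
        # No shared vocabulary: not enough information to compare.
        return "neutral"
    # If every word of the sentence appears in the claim, treat the
    # facts as matching; otherwise assume a contradiction.
    return "agree" if sent_words <= claim_words else "disagree"


def neutrality_mask(claim: str, sentence: str) -> list:
    """Toy 'neutrality masker': flags words in the outdated sentence
    that are absent from the claim as candidates for replacement."""
    claim_words = set(claim.lower().split())
    return ["<MASK>" if w.lower() not in claim_words else w
            for w in sentence.split()]


claim = "the film grossed 38 million dollars"
outdated = "the film grossed 28 million dollars"
print(classify(claim, outdated))          # the pair disagrees
print(neutrality_mask(claim, outdated))   # '28' is masked for updating
```

A real text-generation model would then fill the masked slots with facts drawn from the claim, preserving the sentence's original style and grammar.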
Note this process still requires people to decide what needs updating, but the researchers look forward to a time when even that human input could be sidestepped. (Is that a good thing?) Another hope is that the tool could help eliminate bias in the training of “fake news” detection bots. The researchers point out the system could also be applied to text-generating applications beyond Wikipedia. See the write-up for more information.
Cynthia Murrell, February 27, 2020