Semantic Scholar: Mostly Useful Abstracting
December 4, 2020
A search engine tailored to scientific literature is putting a highly trained algorithm to new use. MIT Technology Review reports, “An AI Helps You Summarize the Latest in AI” (and other computer science topics). Semantic Scholar now generates tl;dr sentences for each paper on an author’s page. Literally: the team calls each summary, and the machine-learning model behind it, “TLDR.” The work was performed by researchers at the Allen Institute for AI and the University of Washington’s Paul G. Allen School of Computer Science & Engineering.
AI-generated summaries are either extractive, lifting a representative sentence straight from the text, or abstractive, generating a new sentence from scratch. An abstractive summary is more likely to capture the essence of a whole paper, provided it is done well. Unfortunately, given the limitations of natural language processing, most systems have relied on extractive algorithms. This model, however, may change that. Writer Karen Hao tells us:
“How they did it: AI2’s abstractive model uses what’s known as a transformer—a type of neural network architecture first invented in 2017 that has since powered all of the major leaps in NLP, including OpenAI’s GPT-3. The researchers first trained the transformer on a generic corpus of text to establish its baseline familiarity with the English language. This process is known as ‘pre-training’ and is part of what makes transformers so powerful. They then fine-tuned the model—in other words, trained it further—on the specific task of summarization. The fine-tuning data: The researchers first created a dataset called SciTldr, which contains roughly 5,400 pairs of scientific papers and corresponding single-sentence summaries. To find these high-quality summaries, they first went hunting for them on OpenReview, a public conference paper submission platform where researchers will often post their own one-sentence synopsis of their paper. This provided a couple thousand pairs. The researchers then hired annotators to summarize more papers by reading and further condensing the synopses that had already been written by peer reviewers.”
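For the curious, here is a rough sketch of what that pre-train-then-fine-tune recipe can look like in code, using the Hugging Face transformers and datasets libraries. The checkpoint, the allenai/scitldr dataset configuration and its field names, and the hyperparameters are illustrative assumptions on our part, not AI2’s actual setup (their published model also leans on the title trick described below):

```python
# A minimal sketch of the pre-train-then-fine-tune recipe described above.
# Model choice, dataset config, and hyperparameters are assumptions, not AI2's exact setup.
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# Start from a transformer already pre-trained on a generic corpus of English text.
checkpoint = "facebook/bart-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# SciTLDR pairs scientific papers (here, just their abstracts) with
# one-sentence summaries gathered from OpenReview and hired annotators.
dataset = load_dataset("allenai/scitldr", "Abstract")

def preprocess(example):
    # "source" is a list of abstract sentences; "target" is a list of TLDR strings.
    model_inputs = tokenizer(" ".join(example["source"]),
                             max_length=1024, truncation=True)
    labels = tokenizer(text_target=example["target"][0],
                       max_length=64, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess,
                        remove_columns=dataset["train"].column_names)

# Fine-tune, i.e., continue training the pre-trained model on the summarization task.
trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="tldr-finetuned",
        per_device_train_batch_size=4,
        num_train_epochs=3,
    ),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()

# After fine-tuning, generate a one-sentence summary for a new abstract.
inputs = tokenizer("We present a new method for ...", return_tensors="pt")
summary_ids = model.generate(**inputs, max_length=32)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```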
The team went on to add a second dataset of 20,000 papers paired with their titles. They hypothesized that, since titles are themselves a kind of summary, this would refine the model further. They were not disappointed. The resulting summaries average 21 words for papers that average 5,000 words, a compression of roughly 238 times. Compare that with the next best abstractive option at 36.5 times and one can see TLDR is leaps ahead. But are the summaries as accurate and informative as those from other systems? According to human reviewers, they are even more so. We may just have here a rare machine-learning model that has received enough training on good data to be effective.
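A quick back-of-the-envelope check of those compression figures, using the averages quoted above:

```python
# Sanity-check the compression numbers cited above (assumed averages).
avg_paper_words, avg_tldr_words = 5000, 21
print(avg_paper_words / avg_tldr_words)  # ~238x compression for TLDR's 21-word summaries
print(avg_paper_words / 36.5)            # a 36.5x system implies summaries of ~137 words
```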
The Semantic Scholar team continues to refine the software, training it to summarize other types of papers and to reduce repetition. They also aim to have it summarize multiple documents at once—good for researchers in a new field, for example, or policymakers being briefed on a complex issue. Stay tuned.
Cynthia Murrell, December 4, 2020