Which Is Better? Abstract or Full Text Search?
November 26, 2010
Please bear with us while we present a short lesson in the obvious: “Users searching full text are more likely to find relevant articles than searching only abstracts.” A recent BMC Bioinformatics research article written by Jimmy Lin titled “Is Searching Full Text More Effective than Searching Abstracts?” explores exactly that.
So maybe we opened with the conclusion, but here is some background information. Since viewing a full-text article online is no longer an anomaly, the author set out to determine whether searching full text would be more effective than searching only the short but direct text of an abstract. The results:
“Experiments show that treating an entire article as an indexing unit does not consistently yield higher effectiveness compared to abstract-only search. However, retrieval based on spans, or paragraphs-sized segments of full-text articles, consistently outperforms abstract-only search. Results suggest that highest overall effectiveness may be achieved by combining evidence from spans and full articles.”
Yep, at the end of the day, searching a bigger bank of words does increase your likelihood of a hit, though the quoted results show the consistent gain comes from paragraph-sized spans rather than from indexing whole articles. The upshot is that the future must bring some solutions with it. Because full-text articles are longer and a growing digital archive is waiting to be tamed, Lin predicts that multiple machines in a cluster, as well as distributed text retrieval algorithms, will be necessary to handle the search requirements effectively. Wonder who will be first in line to provide these services…
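For the curious, here is a minimal sketch of what "combining evidence from spans and full articles" could look like. It is not Lin's ranking model: the length-normalised term-match score, the blank-line paragraph splitting, and the `weight` parameter are all our own assumptions, chosen only to keep the illustration short.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, text):
    """Toy relevance score: share of tokens in `text` that are query terms.

    Dividing by length keeps long texts from winning just by being long.
    This is an illustrative stand-in, not the paper's scoring function.
    """
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    query_terms = set(tokenize(query))
    matches = sum(counts[t] for t in query_terms)
    return matches / len(tokens)


def split_spans(full_text):
    """Treat blank-line-separated paragraphs as spans (an assumption)."""
    return [p.strip() for p in re.split(r"\n\s*\n", full_text) if p.strip()]


def combined_score(query, full_text, weight=0.5):
    """Mix the best span score with the whole-article score.

    `weight` is a hypothetical tuning knob, not a value from the paper.
    """
    spans = split_spans(full_text)
    best_span = max((score(query, s) for s in spans), default=0.0)
    return weight * best_span + (1 - weight) * score(query, full_text)


# Made-up three-paragraph "article" for demonstration only.
article = (
    "Gene expression in yeast was measured under heat stress.\n\n"
    "We thank the funding agencies and our colleagues for support.\n\n"
    "Heat stress altered expression of many genes involved in protein folding."
)
query = "heat stress"
span_scores = [score(query, s) for s in split_spans(article)]
```

In this toy article the best paragraph scores higher than the article as a whole, because the acknowledgments paragraph dilutes the whole-article score. That is one plausible reading of why spans fared better in the experiments, though the paper itself is the place to check.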
Sarah Rogers, November 26, 2010
Comments
2 Responses to “Which Is Better? Abstract or Full Text Search?”
I found the following of interest in the full text of the mentioned paper:
“Even if the primary goal of a system is to leverage full-text content to enhance article retrieval, results have to be presented in a manner that suggests the relevance of an article. This necessarily involves creating some type of surrogate for the article, which can either be indicative or informative. Common techniques for generating such surrogates include displaying titles and metadata (as with the current PubMed interface) and short keyword-in-context extracts (as with Google Scholar). The first is primarily indicative, while the second aims to be informative.”
I would like to see a similarly tailored treatment for SSH (Social Sciences and Humanities) material. Or, put another way: what can future Google Books search features do for SSH in light of what has already been done for STM? Will better late than never make any impact?
Relevance becomes a factor in the balance between searching abstracts and searching full text. Full-text articles contain discussions that may introduce non-relevant terms or hypotheses, which skew the relevance of the articles retrieved.
Similarly, full-text search on patents is likely to retrieve irrelevant documents as they are frequently packed with terms irrelevant to the key claims of the patent.