Disrupting Commercial Sci-Tech Indexes

November 10, 2021

Pooling knowledge advances research. Plenty of digital databases are available on the Internet, but they are not connected to one another. Nature reports on one American technologist's answer in “Giant, Free Index To World’s Research Papers Released Online.”

Carl Malamud designed an online index that catalogs words and short phrases from more than 100 million journal articles, including paywalled papers. Malamud released the index under his California non-profit Public Resource. The index is free, and its purpose is to help scientists discover insights from all research, even research stuck behind paywalls. Technically, Malamud does not have the legal right to index the paywalled articles. However, the index contains only short snippets of no more than five words from those articles, so it does not violate copyright. Publishers may still argue that the index is a violation.
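To make the idea of cataloging words and short phrases concrete, here is a minimal sketch of a phrase index. It is not Malamud's actual pipeline. It assumes a phrase is a run of up to five consecutive words, and the function names (`extract_ngrams`, `build_index`) and the two-article corpus are hypothetical, made up for illustration.

```python
from collections import defaultdict


def extract_ngrams(text, max_n=5):
    """Yield every run of 1..max_n consecutive words in text, lowercased."""
    words = text.lower().split()
    for n in range(1, max_n + 1):
        for start in range(len(words) - n + 1):
            yield " ".join(words[start:start + n])


def build_index(articles, max_n=5):
    """Map each phrase to the set of article IDs that contain it."""
    index = defaultdict(set)
    for article_id, text in articles.items():
        for ngram in extract_ngrams(text, max_n):
            index[ngram].add(article_id)
    return index


# Hypothetical two-article corpus, for illustration only.
articles = {
    "paper-1": "gene expression in plant cells",
    "paper-2": "plant cells respond to drought",
}
index = build_index(articles)

# Look up which articles mention a phrase.
print(sorted(index["plant cells"]))
```

An index like this stores only short fragments and pointers back to the articles, not the articles themselves, which is the distinction the copyright argument rests on.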

The index is a major innovation:

“Malamud’s General Index, as he calls it, aims to address the problems faced by researchers such as Yadav. Computer scientists already text mine papers to build databases of genes, drugs and chemicals found in the literature, and to explore papers’ content faster than a human could read. But they often note that publishers ultimately control the speed and scope of their work, and that scientists are restricted to mining only open-access papers, or those articles they (or their institutions) have subscriptions to. Some publishers have said that researchers looking to mine the text of paywalled papers need their authorization.”

Some publishers, like Springer Nature, support open source development projects like Malamud's General Index. Springer Nature cautioned, however, that such projects run into problems when they do not secure the proper rights.

Publishers do not have a case against Malamud. The index does not violate copyright, and full-text articles are not published in it. Instead, the index pools a wealth of information and exposes paywalled articles to a larger audience, who will purchase content if it helps their research.

Publishers, however, may need convincing of this perspective.

Whitney Grace, November 10, 2021


One Response to “Disrupting Commercial Sci-Tech Indexes”

  1. Martin on November 10th, 2021 8:30 am

    I think the ” at the end of the Nature link needs removing, just points to a blank page for me (but works if I remove that character). Thanks for great update, very interesting.

