Internet Archive Scholar: Will Publishers Find a Way to Stomp This Free Knowledge Beast?

January 12, 2023

Here is a new search service worth noting. The Internet Archive Scholar was built to search the extensive, non-profit Internet Archive. The tool introduces itself:

“This full text search index includes over 25 million research articles and other scholarly documents preserved in the Internet Archive. The collection spans from digitized copies of eighteenth century journals through the latest Open Access conference proceedings and pre-prints crawled from the World Wide Web.”

Yes, that is a lot of information and a dedicated search system is a welcome addition. If only it were easier to find what one is looking for; the search leaves some on the Arnold IT team wanting more functionality. But the service is young, and the page notes that “Metadata is being improved and features have not been finalized.”

The About page tells us more about how the tool works, where the metadata comes from, and where to direct certain queries. It also addresses the issue of text and data mining:

“We intend to provide researcher access to the full corpus for text and data mining purposes. Derived datasets may also be posted publicly for analysis, for example a citation graph or N-gram frequencies by year. If you are interested or would like to see specific datasets made available, please contact us.

Currently snapshots of the full fatcat metadata corpus and upstream metadata sources are uploaded periodically to the Bulk Bibliographic Metadata collection on Read more in the Fatcat Guide.”
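For readers wondering what a derived dataset like “N-gram frequencies by year” might look like in practice, here is a minimal sketch in Python. It is not the Internet Archive's code or format: the function name `ngram_counts_by_year`, the `(year, text)` input shape, and the sample sentences are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def ngram_counts_by_year(docs, n=2):
    """Count word n-grams per publication year.

    docs is an iterable of (year, text) pairs. This input shape is a
    hypothetical stand-in, not the layout of any real Scholar dataset.
    """
    counts = defaultdict(Counter)
    for year, text in docs:
        # Naive tokenization: lowercase and split on whitespace.
        tokens = text.lower().split()
        for i in range(len(tokens) - n + 1):
            counts[year][" ".join(tokens[i:i + n])] += 1
    return counts


# Made-up sample documents, for illustration only.
sample = [
    (1890, "open access to knowledge"),
    (2022, "open access preprints and open access journals"),
]
result = ngram_counts_by_year(sample, n=2)
```

A real corpus would need proper tokenization and far more careful handling of dates, but the basic shape of the output, a table of phrase counts keyed by year, would likely be similar.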

We look forward to seeing what functionality improvements the team implements as the Scholar is developed further. Readers may want to check it out for themselves and/or bookmark the site for future use. We are also curious about publishers’ reactions.

Cynthia Murrell, January 12, 2023

