Paper Envisions an Open Science Platform for Chemistry Researchers
July 14, 2022
What could be accomplished if machine learning were harnessed to help scientists connect, collaborate, and build on each other’s findings? A team of researchers ponders that question in the paper “Making the Collective Knowledge of Chemistry Open and Machine Actionable.” Authors Kevin Maik Jablonka, Luc Patiny, and Berend Smit hope their suggestions will bring the field of chemistry closer to the FAIR principles (findable, accessible, interoperable, and reusable). The paper, published in Nature Chemistry, observes:
“Chemical research is still largely centered around paper-based lab notebooks, and the publication of data is often more an afterthought than an integral part of the process. Here we argue that a modular open-science platform for chemistry would be beneficial not only for data-mining studies but also, well beyond that, for the entire chemistry community. Much progress has been made over the past few years in developing technologies such as electronic lab notebooks that aim to address data-management concerns. This will help make chemical data reusable, however it is only one step. We highlight the importance of centering open-science initiatives around open, machine-actionable data and emphasize that most of the required technologies already exist—we only need to connect, polish and embrace them.”
The authors go on to describe how to do just that using structured, open data and semantic tools. To make the transition as smooth as possible, the team suggests that data capture should resemble the way chemists already work. Data should also be generated in a standardized format that other researchers can easily use, and a formal ontology will be important here. For consistency and accessibility, the paper also recommends building a modular data-analysis platform with a common interface and standardized protocols. This open-science platform would replace the hodgepodge of different, often proprietary, tools currently in use. It would also make publication of data a seamless, centralized part of the process. See the paper for all the details. The authors conclude:
“We emphasize that the technology is here not only to facilitate the process of publishing data in a FAIR format to satisfy the sponsors, but also to ensure that the combination of chemical data, FAIR principles and openness gives scientists the possibility to harvest all data so that all chemists can have access to the collective knowledge of everybody’s successful, partly successful and even ‘failed’ experiments.”
Cynthia Murrell, July 14, 2022