Science and data are interwoven in many ways. The scientific method has lent a good part of its overall approach and practices to data-driven analytics, software development, and data science. Now data science and software lend some tools to scientific research.
It's clear how data-driven culture, and even software practices like agile, which is all about iterative development, have borrowed from science. Now an emergent ecosystem of solutions centered around scientific research and publication may be about to repay the loan.
Traditionally, scientific research has relied on peer review. The peer-review and publication process can take anywhere from a few months to a few years to complete. In addition, the business model of many scientific publishers does not make research accessible to everyone.
To make research readily available to as many people as possible, as soon as possible, many researchers choose to publish their work on pre-print repositories like arXiv or Zenodo. Pre-prints solve the open-access problem, as they are immediately accessible for free.
The reproducibility crisis and artificial intelligence
Most pre-prints will be revised in minor or major ways before formal publication, while others may never be published at all. But even for the ones that do make it through review and publication successfully, an equally important issue remains: reproducibility.
Reproducibility is a major principle of the scientific method. It means that a result obtained by an experiment or observational study should be achieved again with a high degree of agreement when the study is replicated with the same methodology by different researchers.
According to a 2016 Nature survey, more than 70% of researchers have tried and failed to reproduce another scientist's experiments, and more than half have failed to reproduce their own experiments.
This so-called reproducibility or replication crisis has not left artificial intelligence intact either. Although the writing has been on the wall for a while, 2020 may have been a watershed moment.
The trigger was a study published in Nature, in which a Google Health team reported that an AI system could outperform human radiologists at detecting breast cancer. Critics argued that the Google team provided so little information about its code and how it was tested that the study amounted to nothing more than a promotion of proprietary tech.
Unlike sometimes-obscure academic research, AI has the public's attention and is backed and capitalized by the likes of Google. In addition, AI's machine learning subdomain, with its black-box models, makes the issue especially pertinent. Hence, the incident was widely reported on and brought reproducibility to the fore.
Enter Papers with Code, a repository for research whose mission statement cites the creation of a free and open resource with machine learning papers, code, and evaluation tables as its goal. It highlights trending machine learning research and the code to implement it.
As far as reproducible research goes, we should also mention open-source technology by eLife that lets authors publish Executable Research Articles, treating live code and data as first-class citizens. And the good news doesn't end there.
Connected Papers is a free visual tool that helps researchers and applied scientists find and explore papers relevant to their field of work, in any domain. It creates a graph for each paper in its repository, by analyzing about 50,000 papers and selecting the few dozen with the strongest connections to the origin paper.
On Feb. 3, Connected Papers also announced a partnership with arXiv: every paper page on arXiv now links to a Connected Papers graph. Interestingly, Connected Papers arranges papers according to their similarity, which means that even papers that do not directly cite each other can be strongly connected and positioned very close together.
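To see how two papers can be judged similar without citing each other, consider bibliographic coupling: papers that cite largely the same references are likely related. The snippet below is a toy illustration of that general idea, not Connected Papers' actual algorithm; the paper and reference names are hypothetical.

```python
# Toy illustration of bibliographic coupling: score a pair of papers
# by the overlap of their reference lists, so papers that never cite
# each other can still come out strongly connected.
# Hypothetical sketch -- not Connected Papers' real implementation.

def coupling_similarity(refs_a, refs_b):
    """Jaccard overlap between two papers' reference lists."""
    a, b = set(refs_a), set(refs_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical papers: P1 and P2 do not cite each other,
# but they share most of their references; P3 is unrelated.
references = {
    "P1": ["R1", "R2", "R3", "R4"],
    "P2": ["R2", "R3", "R4", "R5"],
    "P3": ["R9"],
}

print(coupling_similarity(references["P1"], references["P2"]))  # 0.6
print(coupling_similarity(references["P1"], references["P3"]))  # 0.0
```

A graph builder could then keep, for each paper, only the few dozen highest-scoring neighbors, which matches the "strongest connections" selection the article describes.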
The COVID GRAPH and Open Research Knowledge Graph (ORKG) teams have focused on COVID-19, and emphasized annotation and structure, respectively. Connected Papers seems to expand coverage, and emphasize algorithmic similarity.
Open access, discoverability, reproducibility, code, datasets, and knowledge graphs: this is all good news for research in general, and machine learning research in particular. It seems like steps toward a healthier, more productive research ecosystem are being taken.
This is especially true considering how many of these initiatives are either already connected, or can easily be connected. However, there is also one major issue we see running through all those otherwise commendable efforts: sustainability. Let's do a quick recap.
Connected Papers started as a weekend side project among friends and then gained traction. Today, it is self-funded and free to use, with one sponsor that we know of and an open call for more. COVID GRAPH is a volunteer effort, and ORKG is a publicly funded research project.
These are the different paths different teams have taken toward what seems to be a common goal: a better research ecosystem. Essentially, all of them are grappling with the same dilemma: how to produce public goods that belong in the commons within a challenging, commercially oriented environment.