In the near future (5-10 years), we expect scientists will use radically new tools to develop research papers. As scientists publish their research, these tools will document and publish the workflow as well as all the associated digital objects (data, software, etc.) that form the basis for the paper. This evolution in research publication will substantially improve science communication, promote a fair basis for crediting science contributions and offer a transparent way for other scientists to evaluate and even reproduce the research.
It is our view that in the future, publishers will accept submissions that contain not just text and figures but also data (both final and intermediate results), software, and other digital objects relevant to geoscience research, together with carefully documented provenance. Today, many journals accept datasets together with the paper, and some journals accept software and software papers, but no geoscience journal includes the complete details related to the data, software, and provenance of a research paper.
Readers of papers in the future will be able to interact with a published article, modify its figures to explore the data, reproduce the results, and run the published methods and models with new data. Today, readers simply get a static paper, and in the rare cases where data are downloadable, reproducing the analysis requires significant additional work or may not even be possible.
As we work towards this future, data producers and software developers should receive credit for their work, because every publication that builds on it would acknowledge it through citations. Today, there is limited credit and reward for those who create the data and software that form the basis of much geoscience research.
The Geoscience Papers of the Future initiative contributes to this vision by encouraging scientists to publish papers that appropriately document the data, software, and provenance of their published results.
Many journals accommodate the publication of datasets, and in some cases other associated materials that can include code and other research products. Some journals publish software papers that describe the merits of a code base and its contribution to a community.
Several frameworks have been developed to document scientific articles so that they are more useful to researchers than a simple PDF. These include IPython Notebook (for Python), Weaver (for R), and Computable Document Format (for Mathematica).
Although there are research tools that can be used to incorporate data, software and provenance in scientific articles, they are not routinely used in science – at least not by geoscientists.
Publishers have been interested in improving digital scholarship practices. Elsevier has invested in initiatives in this direction, including the Executable Papers Challenge, which focused on computer science, and the Article of the Future effort, which focuses on enhanced interaction between the reader and the publication (e.g., inclusion of published maps in Google Maps, the ability to zoom in on figures and select data points).
Such publications are not common practice in the geosciences. Although data sharing is regularly practiced in some communities, it is not in many others. Software is sometimes shared in community modeling frameworks or general code repositories. Publishing the provenance of paper results is a rarity. Many geoscientists are not aware of what they should do to improve the documentation of the computational methods in their articles.
In response to a massive petition to make the results of federally funded research publicly accessible, the US Office of Science and Technology Policy issued a mandate for all government agencies that fund research to put a plan in place to release all research products so that they are publicly accessible.
Given this mandate, NSF released a Public Access Plan in March 2015. NSF will require that all products of research be published for grants awarded after January 2016. NSF already has a mandatory Data Management Plan in place, although it is not enforced. Plans are underway to determine how research products are to be released. Other agencies that fund geosciences research are pursuing similar plans.
In 2011, NSF launched the EarthCube initiative to create a community-driven, dynamic cyberinfrastructure that supports standards for interoperability, infuses advanced technologies to improve and facilitate interdisciplinary research, and helps educate scientists in the emerging practices of digital scholarship, data and software stewardship, and open science.
The goal of the EarthCube OntoSoft project is to promote stewardship of software and data developed by geoscientists. The challenges being addressed span many software sharing topics, including description, management, curation, dissemination, provenance, uncertainty, replication, reproduction, and credit.
Geoscientists now live in a world rich in digital data, and the description of their analyses cannot be fully captured in the publications they write under the current peer-review process. OntoSoft is disseminating best practices that include the addition of computer code and data used in support of scientific investigations. This requires making software available, as well as the provenance and workflows used to generate the results. This will not only better describe experiments, it will also hasten the pace of scientific discovery by improving scientists’ ability to build on their own and other researchers’ work.
OntoSoft is developing a germinal ecosystem for software stewardship in geosciences to empower scientists to manage their software as valuable scientific assets in an open and transparent way that enables broader access to that software by other scientists, software professionals, students, and decision makers.
The OntoSoft project established an Early Career Advisory Committee to gather requirements concerning stewardship of geosciences software. One of the activities proposed was to document all the software used in a paper, share and publish it, and promote its reuse and appropriate credit to the original authors. The OntoSoft interactive assistant would be used to capture software metadata that describes the software characteristics that facilitate reuse. This led to the concept of a Geoscience Paper of the Future (GPF) activity.
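To make the idea of capturing software metadata concrete, the following is a minimal sketch of what a metadata record for one piece of software used in a paper might look like. It is an illustration only: the `SoftwareRecord` class, its field names, and the example values are hypothetical and are not the OntoSoft schema or the output of the OntoSoft interactive assistant.

```python
from dataclasses import dataclass, field, asdict
from typing import List
import json


@dataclass
class SoftwareRecord:
    """Hypothetical metadata record for software used in a paper.

    The field names are assumptions chosen for illustration; they are
    not taken from OntoSoft.
    """
    name: str
    version: str
    license: str
    repository_url: str
    authors: List[str] = field(default_factory=list)
    preferred_citation: str = ""

    def missing_fields(self) -> List[str]:
        """Return the names of fields that were left empty."""
        return [key for key, value in asdict(self).items() if not value]


# Placeholder values, not a real software package.
record = SoftwareRecord(
    name="example-hydro-model",
    version="1.0.0",
    license="Apache-2.0",
    repository_url="https://example.org/example-hydro-model",
    authors=["A. Researcher"],
)

# Serialize the record so it can be shared alongside the paper.
print(json.dumps(asdict(record), indent=2))

# Flag gaps that would make reuse or credit harder.
print("Missing fields:", record.missing_fields())
```

In this sketch the record has no preferred citation, so `missing_fields()` reports that field. A check of this kind is one simple way an author could notice that the information needed to credit the original developers is incomplete.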
A pilot program was launched in Spring 2015. This pilot program included thirteen researchers from the OntoSoft Early Career Advisory Board working with OntoSoft researchers to create an initial set of GPFs. The papers covered a diverse set of areas in geosciences, including hydrological modeling, sensor networks, marine ecology, visualization, volcanology, regional climate model evaluation, marine microbiology, tropical meteorology, hydrogeology, glaciology, ocean fisheries, and river ecohydrology. The group developed an initial definition of and criteria for GPFs, created training materials, and adapted best practices as appropriate for their research areas.
The GPF Initiative builds on this pilot program and extends it to reach the broader geosciences community.