Perhaps you remember Heroes, a popular American television show, whose tagline during its first season was “Save the cheerleader—save the world.” Well, I don’t know about cheerleaders, but I do know that we need to save the data to save our world.
Those of us involved in the burgeoning arena of data-intensive science sometimes think that we invented the collection and curation of the numbers that so preoccupy us. Fortunately, that isn’t true, as I was recently reminded while serving as a member of the Task Force on Data and Visualization of the National Science Foundation’s Office of Cyberinfrastructure. As is recounted in the Task Force report, 200-year-old data from ships’ logs are helping us to understand climate change. The British navy has carefully recorded and maintained information on wind speed, temperature, and pressure for over two centuries—a truly unique collection of climate data.
Of course, much of our data about climate change is of a more recent vintage, the result of numerous field and satellite observations. As the report section entitled “Preserving Data to Preserve the Planet” discusses, our understanding of chlorofluorocarbons’ destructive impact on the ozone layer stems from laboratory experiments, measurements in the field, and copious satellite data. The collection, curation, and analysis of these data led to the Montreal Protocol, the 1987 treaty (since revised several times) that has phased out ozone-depleting compounds and led to hopeful signs of recovery in stratospheric ozone.
As the NSF-OCI report makes clear, such data-intensive science is critical to understanding climate change and to unraveling the natural cycles of our planet. It’s not hyperbole to say—as the report does—that a robust data service infrastructure is required for the breakthrough science that will help us preserve Mother Earth.
The overall report challenges the NSF to “create a sustainable data infrastructure fit to support world-class research and innovation.” This particular section of the report offers several recommendations, all worth repeating.
The key recommendation is that we recognize and appropriately fund data infrastructure and services as essential research assets. Governments (in this case, the U.S. government) must provide adequate budgets to establish and maintain data sets and create the requisite cyberinfrastructure for their access and manipulation.
To achieve this overarching goal, the Task Force makes three supporting recommendations:
- Key research domains should identify the essential data for retention and archiving. The Task Force suggests adopting a 20- to 30-year outlook, asking which data collected today will be needed two or three decades down the road.
- There needs to be an open call for large-scale data services that cut across disciplines and embrace a range of data types. This will lead to economies of scale and should include incentives for service providers to develop cost-effective methods of data access and curation.
- The entire scientific research community should be engaged in evangelizing open access to these data services.
Make no mistake about it: the collection, curation, and storage of data are essential to modern science, as are the access to and sharing of these data stores. It’s wonderful that the British navy had the foresight to gather and preserve valuable meteorological information. We need to take a page from their logbook and ensure that we are building the cyberinfrastructure to support research in the years ahead. So, at the risk of being a cheerleader myself, let me just say “Save the data—save the world.”