Scientific research frequently yields unexpected benefits. Silly Putty, for example, was a byproduct of World War II research into potential rubber substitutes, and this bouncy substance has delighted children for generations. Of perhaps more scientific gravitas, the cosmic microwave background radiation was first detected during Bell Labs experiments on building antennae to pick up radio waves bounced off satellites—a discovery that helped advance the Big Bang theory and proved the final nail in the coffin for Fred Hoyle’s “continuous creation” steady-state alternative.
So it should come as no surprise that data-intensive science is producing its share of serendipitous discoveries. The report of the NSF-OCI (National Science Foundation-Office of Cyberinfrastructure) Task Force on Data and Visualization describes a few examples drawn from data-intensive research in oceanography.
Long the domain of ship-based observations, oceanography now encompasses observatory-based research and a worldwide network of scientists from myriad disciplines. These efforts measure regular oceanic processes and aim to understand our planet’s climate, geodynamics, and marine ecosystems. For example, scientists at Rutgers University’s Coastal Ocean Observation Lab are collecting high-frequency radar data on ocean surface waves and currents, with an eye to answering specific questions, such as the impact of the Hudson River’s flows into the Atlantic Ocean on the marine food chain.
However, the accumulated data are also being used by the U.S. Coast Guard to facilitate life-saving ocean rescues. Taking advantage of Rutgers’ highly accurate, real-time data on ocean circulation patterns, the Coast Guard can more precisely define the search area for survivors of boating and aircraft accidents. Similarly, the New Jersey Board of Public Utilities is using the Rutgers data to plan offshore wind farms, and the Department of Homeland Security is exploring the data’s potential for detecting ships that have suspiciously failed to report their location.
These examples show how large data sets can yield unexpected benefits, and, as the Task Force report states, they provide “an argument for funding and building robust systems to manage and store the data.” Unfortunately, much of the current Rutgers data has to be discarded due to the lack of capacity for storage, curation, and management. The report rather depressingly observes that the existing research culture often fails to encourage best practices in data management and sharing—thereby impeding the discovery of new uses for these data.
To remedy this situation, the Task Force offered these key recommendations:
- Introduce new funding models that have specific data-sharing expectations and support researchers in meeting data-management and data-sharing requirements imposed by research sponsors.
- Create new citation models in which data and software tool providers are credited with their data contributions and establish metrics that recognize open-access policies and sharing.
These recommendations are so commonsensical that it’s hard for me to imagine anyone objecting to them. After all, improved sea rescues and heightened security from terrorists seem like rather nice byproducts. And who knows when rich data sets might even give rise to the next Silly Putty?