Ecologists Are Drowning in a Sea of Data. These Tools Could Help


When marine ecologists released the Ocean Health Index for the first time in 2012, it was a majestically ambitious achievement. The index, born of a collaboration among dozens of scientists, economists and environmental managers at the National Center for Ecological Analysis and Synthesis (NCEAS) at the University of California, Santa Barbara, and the nonprofit organization Conservation International, was designed as a comprehensive framework for scientifically evaluating the health of ocean ecosystems, both worldwide and regionally. Drawing on more than a hundred databases, the index pulled together local measurements of biodiversity and ecological productivity with information about fishing, industrial use, carbon storage, tourism and other factors to score the health of open ocean and coastal regions on a scale from 0 to 100. (The global ocean earned a score of 60 in that first year, with regional ratings between 36 and 86.) The authors of the index hoped that such a standardized basis of comparison between and within regions would help with identifying and communicating the most effective measures for protecting the oceans and with guiding policymakers toward better decisions.
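The actual index combines ten goals with trend, pressure and resilience terms, but the core idea — rolling many 0-to-100 component scores into a single regional score — can be sketched in a few lines. Everything below (the goal names, scores and weights) is invented for illustration; it is not OHI data, and this is not the OHI team's method.

```python
# Toy sketch of a composite environmental index: a weighted average of
# per-goal scores, each on a 0-100 scale. Hypothetical numbers only.

def region_score(components, weights=None):
    """Combine per-goal scores (each 0-100) into one 0-100 region score."""
    if weights is None:
        weights = {goal: 1.0 for goal in components}  # equal weighting
    total_weight = sum(weights[goal] for goal in components)
    weighted_sum = sum(components[goal] * weights[goal] for goal in components)
    return weighted_sum / total_weight

# Made-up component scores for one hypothetical coastal region.
coastal = {"biodiversity": 72, "food_provision": 55,
           "carbon_storage": 81, "tourism": 64}

print(round(region_score(coastal)))  # prints 68
```

The hard part, as the OHI team discovered, is not this arithmetic but assembling, documenting and re-running the hundred-odd data pipelines that feed it.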

But the value of such an index comes not from computing it once but from being able to compute it again and again. When the OHI team took up the task again in 2013, they quickly hit snags: Their data sets, documentation and modeling procedures were still an ugly mess. The team had wrangled the motley data into shape for the 2012 results, but they were having trouble reproducing their own work as they revisited it for the update.

Reproducibility has become a hot-button topic for the biomedical sciences and psychology in recent years, but those fields aren’t alone. Environmental scientists have warned repeatedly that problems with reproducibility and transparency could become increasingly dire as researchers embrace big data approaches to understanding the dynamics of ecosystems at scales ranging from the regional to the continental or even larger—an effort often called macrosystems ecology.

Now an essay published this week by Julia S. Stewart Lowndes of NCEAS and her colleagues about how the OHI team quietly overcame its ungainly data problem offers an interesting case study in how macrosystems ecology projects—and even more modestly focused research—can benefit from an open science makeover. Their story also offers a how-to for researchers who might like to follow their example.

“I want other people to see this as their own future and feel empowered by it,” Lowndes said.

Big data projects in the environmental sciences go back at least half a century, to the International Biological Program of the mid-1960s and ’70s. Often they have met with skepticism from ecologists and other biologists who complained that the projects sometimes seemed unfocused or that they locked investigators into awkward, counterproductive collaborations. Biologists who study rare species and delicate environments have objected to the loss of control over what they considered sensitive or proprietary information.

The disparate data types used by ecologists can also be a challenge, said Stephanie E. Hampton, a marine biologist and former NCEAS deputy director who is now the director of Washington State University’s Center for Environmental Research, Education and Outreach. Genetic sequences, phylogenetic trees, land use data, remote sensing and imagery data, logs of population numbers and species behaviors—all these and more need to be standardized and combined in macrosystems ecology projects. “We’re all jealous of people working in genomics because they’re trying to manage just four letters,” she said, laughing. “I think ecology is the real poster child for…

Number of species depends on how you count them

[Image: Hercules beetles] DRAWING LINES Scientists sometimes have difficulty determining whether organisms, such as these Hercules beetles, are members of different species. Genetic analysis alone may divide populations into species that don’t exist by other biological criteria.

Genetic methods for counting new species may be a little too good at their jobs, a new study suggests.

Computer programs that rely on genetic data alone split populations of organisms into five to 13 times as many species as actually exist, researchers report online January 30 in Proceedings of the National Academy of Sciences. These overestimates may muddy researchers’ views of how species evolve and undermine conservation efforts by claiming protections for species that don’t really exist, say computational evolutionary biologist Jeet Sukumaran and evolutionary biologist L. Lacey Knowles.

The lesson, says Knowles, “is that we shouldn’t use genetic data alone” to draw lines between species.

Scientists have historically used data about organisms’ ecological distribution, appearance and behavior to classify species. But the number of experts in taxonomy is dwindling, and researchers have turned increasingly to genetics to help them draw distinctions. Large genetic datasets and powerful computer programs can quickly sort out groups that have become or are in the process of becoming different species. That’s especially important in analyzing organisms for which scientists don’t have much ecological data, such as insects in remote locations or recently extinct organisms.

Knowles and Sukumaran, both of the University of Michigan in Ann Arbor, examined a commonly used computer analysis method, called multispecies coalescent, which picks out genetic differences among individuals that have arisen recently in evolutionary time. Such differences could indicate that a population of organisms is becoming a separate species. The researchers used a set of known species and tested the program’s ability to correctly predict…