Researchers from the University of British Columbia evaluated 500 randomly selected papers with simple and similar types of data -- length measurements of plants and animals -- and tracked the availability of the data over time.
They chose this kind of data because length measurements have been collected in exactly the same way for decades, which makes for easy comparisons. They found that the odds of finding the original data for these papers fell by 17 percent every year after publication.
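To get a feel for what a 17 percent yearly drop in odds means, the decline can be compounded year over year. The short Python sketch below does that arithmetic. It assumes the decline is a constant multiplicative factor of 0.83 per year, and the starting odds of 3 to 1 are a purely hypothetical figure chosen for illustration, not a number from the study. The function names are likewise made up for this example.

```python
import math

def odds_after(initial_odds, years, annual_decline=0.17):
    """Odds remaining after `years`, assuming a constant yearly decline."""
    return initial_odds * (1.0 - annual_decline) ** years

def odds_to_probability(odds):
    """Convert odds (e.g. 3.0 means 3 to 1) into a probability."""
    return odds / (1.0 + odds)

# Hypothetical starting point: 3-to-1 odds that the data can still be found.
# This value is an assumption for illustration only.
initial_odds = 3.0

for years in (0, 5, 10, 20):
    odds = odds_after(initial_odds, years)
    print(f"{years:2d} years: odds {odds:.3f}, "
          f"probability {odds_to_probability(odds):.1%}")

# Years needed for the odds to halve under this assumption.
halving_time = math.log(0.5) / math.log(1.0 - 0.17)
print(f"odds halve roughly every {halving_time:.1f} years")
```

Under this assumption the odds halve a little under every four years, and after twenty years they are down to about 2 percent of their starting value, whatever that starting value was.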
"I think nobody expects that you'd be able to get data from a fifty-year-old paper, but to find that almost all the data sets are gone at twenty years was a bit of a surprise," said Timothy Vines of the University of British Columbia.
The alarming loss of scientific data was attributed to mundane factors like old email addresses and obsolete storage devices. The findings have been published in Current Biology.
"Publicly funded science generates an extraordinary amount of data each year. Much of these data are unique to a time and place, and are thus irreplaceable, and many other data sets are expensive to regenerate," Vines said.
"The current system of leaving data with authors means that almost all of it is lost over time. The data are thus unavailable for future researchers to check old results or use for entirely new purposes. Losing data is a waste of research funds, and it limits how we can do science."
The researchers argue that journals are the only party able to ensure data isn't lost. Vines suggests that journals require authors to upload their data to public archives as a condition of publication.
"Concerted action is needed to ensure it is saved for future research," says Vines.
[University of British Columbia]