The study, conducted by Dr. Heather Piwowar of Duke University and Dr. Todd Vision of the University of North Carolina at Chapel Hill, examined published papers on gene expression, a field for which large open data archives exist.
The study examined citations to over ten thousand articles that generated new gene expression data, a quarter of which had data publicly archived in the GEO and ArrayExpress repositories.
Papers with publicly available data received about 9 percent more citations overall, and the difference increased over time, with citations increasing for at least five years.
The researchers concluded that much of this citation difference was due to data reuse.
"Professional advancement in science is still highly dependent on how well your paper gets cited, even in a field like genomics where the data underlying that paper may have far more scientific impact over the long term," said Vision, a biologist affiliated with the National Evolutionary Synthesis Center and the Dryad Digital Repository.
"Until the happy day when hiring and promotion committees catch up with how to value data sharing for its own sake, it is comforting to know that scientists can still receive credit for data sharing in a currency that counts," Vision said.
The researchers also analyzed the full text of articles for references to datasets in order to study data reuse trends, and their paper describes the obstacles involved.
The analysis revealed that scientists generally stopped publishing papers using their own datasets within two years, while other scientists continued to reuse their data for at least six years. It also showed that data reuse is on the rise.
"Not only were the number of reuse papers higher," Piwowar said, "but analyses from 2002 to 2004 were reusing only one or two datasets, while a quarter of the studies by 2010 were using three or more."
"We need more open and cohesive infrastructure to support collecting evidence about the process and products of science," Piwowar said. "This evidence is needed to inform important policy decisions. For example, data archiving requirements, infrastructure, and education should be informed by evidence about how data is and is not reused."
The paper was published in PeerJ, a peer-reviewed journal with open access and freely available articles.