With hundreds to thousands of scientific papers published each day, keeping up with the state of the art in any area of science is a daunting task, they said.
The researchers said they developed a text-mining algorithm to prioritize research papers to read and include in their public database of scientific literature describing how environmental chemicals interact with genes to affect human health.
"Over 33,000 scientific papers have been published on heavy metal toxicity alone, going as far back as 1926," Allan Peter Davis, a project manager for the university's Comparative Toxicogenomics Database said. "We simply can't read and code them all. And, with the help of this new algorithm, we don't have to."
The sophisticated algorithm evaluates the text from thousands of papers and assigns a relevancy score to each document, a university release said Thursday.
"The score ranks the set of articles to help separate the wheat from the chaff, so to speak," research bioinformatician Thomas Wiegers said.
The researchers text-mined 15,000 articles and compared a representative sample to database entries that had been manually read and evaluated.
"The results were impressive," Davis said, noting the manual readers concurred with the algorithm 85 percent of the time with respect to the highest-scored papers.
"It's a tremendous time-saving step," Davis said. "With this we can allocate our resources much more effectively by having the team focus on the most informative papers."