A new method of labeling and retrieving data stored in DNA could make DNA storage a practical answer to the growing volume of data generated by humanity, researchers said in a new study. Photo by Caroline Davis2010/Flickr
June 10 (UPI) -- Scientists at the Massachusetts Institute of Technology have developed a new technique for labeling and retrieving DNA data files -- a breakthrough that could help shrink the carbon footprint of the rapidly expanding digital world.
In a proof-of-concept study, published Thursday in the journal Nature Materials, scientists accurately retrieved individual image files, stored as DNA sequences, from a set of 20 photos.
Every day, billions of emails crisscross the information superhighway, and hundreds of millions of photos and documents are uploaded onto the cloud.
All that data doesn't just float around in the ether waiting to be retrieved -- digital files and the data within must be physically stored.
Not surprisingly, the demand for data storage has been rising exponentially.
To save space and energy, scientists have been looking for more efficient data storage solutions. DNA is one such solution.
DNA molecules evolved the ability to package genetic information at extremely high densities -- an ability that can be hijacked for digital data storage.
"DNA is a thousandfold denser than even flash memory, and another property that's interesting is that once you make the DNA polymer, it doesn't consume any energy," study co-author Mark Bathe, an MIT professor of biological engineering, told MIT News. "You can write the DNA and then store it forever."
Digital data is written in binary code, that is, in 0s and 1s. DNA can mimic this system using its four nucleotides (A, T, G and C): G and C might represent 0, for example, while A and T stand in for 1.
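As a rough illustration of that mapping, here is a minimal Python sketch. It follows only the article's example (G or C for 0, A or T for 1), which stores one bit per nucleotide. The rule for choosing between the two letters of a pair, alternating by position so the same base never appears twice in a row, is our own assumption, and the function names `encode` and `decode` are hypothetical. Real DNA storage schemes add error correction and other constraints not shown here.

```python
def encode(data: bytes) -> str:
    """Map each bit to a nucleotide: 0 -> G or C, 1 -> A or T."""
    bases = []
    for byte in data:
        for shift in range(7, -1, -1):  # most significant bit first
            bit = (byte >> shift) & 1
            pair = "AT" if bit else "GC"
            # Assumption: alternate between the two letters by position,
            # so neighbouring bases always differ.
            bases.append(pair[len(bases) % 2])
    return "".join(bases)


def decode(strand: str) -> bytes:
    """Invert encode(): A/T -> 1, G/C -> 0, eight bases per byte."""
    if len(strand) % 8:
        raise ValueError("strand length must be a multiple of 8")
    out = bytearray()
    for start in range(0, len(strand), 8):
        byte = 0
        for base in strand[start:start + 8]:
            if base in ("A", "T"):
                bit = 1
            elif base in ("G", "C"):
                bit = 0
            else:
                raise ValueError(f"unexpected base: {base!r}")
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


strand = encode(b"A")  # the byte 0x41, binary 01000001
print(strand)          # GTGCGCGT
assert decode(strand) == b"A"
```

Because either letter of a pair decodes to the same bit, a scheme like this leaves room to pick bases for chemical convenience without changing the stored data.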
Various groups of scientists and engineers have previously used DNA to store information, encoding songs, movies and photos in double helix strands. DNA data storage has a lot going for it, they say -- it's stable, secure and efficient.
But there are drawbacks. Currently, synthesizing all that DNA code is expensive. Worse still, scientists don't yet have a very good way to sort through DNA code when files need to be pulled out of a large database.
To address the second problem, Bathe and his research partners at MIT developed a novel technique for retrieving DNA data.
Instead of folding a bunch of DNA sequences together, researchers encapsulated DNA data files in tiny silica particles, each one measuring six micrometers wide. To organize the DNA files, scientists labeled each particle with a short DNA sequence advertising its contents.
In addition to the DNA barcodes, researchers used so-called primers, short DNA tags that corresponded to the image subjects (like "orange" for a cat photo) and were labeled with fluorescent or magnetic particles.
The combination of barcodes and primers allowed scientists to use Boolean logic -- search term combinations like "president AND 18th century" -- to improve their search results when retrieving data.
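The barcode-and-primer search can be pictured in software as set membership over labeled capsules. The sketch below is a loose analogy, not the MIT team's procedure: in the lab, matching happens physically, with labeled primers binding to barcodes and capsules being sorted, whereas here each capsule is a hypothetical `Capsule` record holding a set of text labels, and a hypothetical `search` function applies AND, OR and NOT conditions. The file names and labels are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capsule:
    file_name: str
    barcodes: frozenset  # hypothetical text labels describing the file


def matches(capsule, all_of=(), any_of=(), none_of=()):
    tags = capsule.barcodes
    return (
        all(t in tags for t in all_of)                    # AND
        and (not any_of or any(t in tags for t in any_of))  # OR
        and not any(t in tags for t in none_of)           # NOT
    )


def search(pool, all_of=(), any_of=(), none_of=()):
    """Return file names of capsules whose labels satisfy the query."""
    return [c.file_name for c in pool if matches(c, all_of, any_of, none_of)]


# Invented example pool.
pool = [
    Capsule("orange_cat.jpg", frozenset({"cat", "orange"})),
    Capsule("gray_cat.jpg", frozenset({"cat", "gray"})),
    Capsule("washington_portrait.jpg", frozenset({"president", "18th century"})),
    Capsule("lincoln_portrait.jpg", frozenset({"president", "19th century"})),
]

print(search(pool, all_of=("president", "18th century")))
print(search(pool, all_of=("cat",), none_of=("orange",)))
```

Each extra required label narrows the result set, which is the sense in which combining search terms improves retrieval.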
"At the current state of our proof-of-concept, we're at the 1 kilobyte per second search rate," said lead author James Banal, an MIT senior postdoc.
"Our file system's search rate is determined by the data size per capsule, which is currently limited by the prohibitive cost to write even 100 megabytes worth of data on DNA, and the number of sorters we can use in parallel. If DNA synthesis becomes cheap enough, we would be able to maximize the data size we can store per file with our approach," Banal said.
The new data storage and retrieval system won't be ideal for all types of storage. Plus, there are other promising data storage technologies on the horizon, including quantum data centers.
But the new technology could prove useful for "cold" data storage, that is, archival data that doesn't need to be accessed frequently.
"While it may be a while before DNA is viable as a data storage medium, there already exists a pressing need today for low-cost, massive storage solutions for preexisting DNA and RNA samples from COVID-19 testing, human genomic sequencing and other areas of genomics," Bathe said.