DNA contains four nucleotides: adenine, guanine, cytosine and thymine (aka A, G, C and T), which arrange into the famed double helix. The patterns they form store the code for making living organisms and have been known to survive for thousands of years. Synthetic DNA patterns have already been used for data storage.
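As a rough illustration of how a four-letter alphabet can carry binary data, the sketch below packs two bits into each nucleotide. The mapping table and the function names `encode` and `decode` are hypothetical, chosen only for this example; practical DNA storage schemes typically add error correction and avoid problematic sequences, which this sketch ignores.

```python
# Hypothetical mapping: each nucleotide stands for one 2-bit value.
# This is an illustrative assumption, not a scheme taken from the research.
BASE_FOR_BITS = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a nucleotide string, four bases per byte."""
    bases = []
    for byte in data:
        # Walk the byte from its most significant bit pair to its least.
        for shift in (6, 4, 2, 0):
            bases.append(BASE_FOR_BITS[(byte >> shift) & 0b11])
    return "".join(bases)


def decode(strand: str) -> bytes:
    """Turn a nucleotide string produced by encode() back into bytes."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BITS_FOR_BASE[base]
        out.append(byte)
    return bytes(out)


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

With four symbols, every byte needs four nucleotides; a larger alphabet would need fewer.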
“DNA is one of the best options, if not the best option, to store archival data especially,” said University of Illinois Urbana-Champaign student Chao Pan.
The aim was to discover new nucleotides that would work in the DNA structure and could be usefully distinguished using the single-strand DNA reading equipment created by UK-based Oxford Nanopore, in which the strand is pulled through a protein-based nanoscopic hole and read electrochemically as it passes.
“We tried 77 different combinations of the 11 nucleotides, and our method was able to differentiate each of them perfectly,” said Pan. “The deep-learning framework as part of our method to identify different nucleotides is universal, which enables the generalisability of our approach to many other applications.”
Average reading accuracy was above 60%, which, according to the team, is 39 times better than random guessing.
According to the scientific paper ‘Expanding the molecular alphabet of DNA-based data storage systems with neural network nanopore readout processing’, published by the American Chemical Society, the boost in nucleotide options has the potential to nearly double storage density and nearly halve recording latency.
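The ‘nearly double’ figure is consistent with a simple information-theory estimate, sketched below. It assumes an idealised case in which every nucleotide is equally likely and perfectly distinguishable, so each symbol carries log2 of the alphabet size in bits; the paper's own calculation may differ, and the function name `bits_per_nucleotide` is purely illustrative.

```python
import math


def bits_per_nucleotide(alphabet_size: int) -> float:
    """Ideal information per symbol, assuming equally likely,
    perfectly readable nucleotides (an assumption of this sketch)."""
    return math.log2(alphabet_size)


natural = bits_per_nucleotide(4)    # A, G, C, T
expanded = bits_per_nucleotide(11)  # the 11-nucleotide alphabet

density_gain = expanded / natural   # bits stored per nucleotide, relative
symbols_needed = natural / expanded # nucleotides written per bit, relative

print(f"natural alphabet:  {natural:.2f} bits per nucleotide")
print(f"expanded alphabet: {expanded:.2f} bits per nucleotide")
print(f"density gain:      {density_gain:.2f}x")
print(f"nucleotides needed for the same data: {symbols_needed:.0%}")
```

Under these assumptions the expanded alphabet stores about 1.73 times as much per nucleotide, and the same data needs roughly 58% as many nucleotides to be written.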
How much data can DNA store?
“Every day, several petabytes of data are generated on the internet. Only one gram of DNA would be sufficient to store that data,” said fellow scientist Kasra Tabatabaei, of the Beckman Institute for Advanced Science and Technology.
The ACS publication ‘Expanding the Molecular Alphabet of DNA-Based Data Storage Systems with Neural Network Nanopore Readout Processing’ is available to read in full without payment.