Scaling up DNA data storage and random access retrieval

  • Doug Carmean
  • Luis Ceze
  •
  • Siena Dumas Ang
  • Robert Carlson
  • Yuan-Jyue Chen
  • Parikshit Gopalan
  • Gagan Gupta
  • Govinda Kamath
  • Randolph Lopez
  • Konstantin Makarychev
  • John Mulligan
  • Sharon Newman
  •
  • Lee Organick
  • Hsing-Yeh Parker
  • Miklos Z. Racz
  • Cyrus Rashtchian
  • Georg Seelig
  • Kendall Stewart
  • Christopher Takahashi

Nature Biotechnology, Vol. 36, pp. 242-248


Current storage technologies can no longer keep pace with exponentially growing amounts of data.[1] Synthetic DNA offers an attractive alternative because of its potential information density of ~10^18 B/mm^3, 10^7 times denser than magnetic tape, and its potential durability of thousands of years.[2] Recent advances in DNA data storage have highlighted technical challenges, in particular coding and random access, but have stored only modest amounts of data in synthetic DNA.[3,4,5] This paper demonstrates an end-to-end approach toward the viability of DNA data storage with large-scale random access. We encoded and stored 35 distinct files, totaling 200 MB of data, in more than 13 million DNA oligonucleotides (about 2 billion nucleotides in total) and fully recovered the data with no bit errors, an advance of almost an order of magnitude over prior work.[6] Our data curation focused on technologically advanced data types and historical relevance, including the Universal Declaration of Human Rights in over 100 languages,[7] a high-definition music video of the band OK Go,[8] and a CropTrust database of the seeds stored in the Svalbard Global Seed Vault.
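As a rough consistency check on the figures quoted in the abstract, the short Python sketch below derives a few ratios from them. It assumes that MB means 10^6 bytes and takes the rounded counts at face value ("more than 13 million" as 13 million, "about 2 billion" as 2 billion); the constant and function names are illustrative and do not come from the paper.

```python
# Back-of-the-envelope ratios computed only from numbers stated in the abstract.
# Assumption: MB = 10**6 bytes (the text does not say MB vs MiB).

TOTAL_BYTES = 200 * 10**6        # "200 MB of data"
NUM_OLIGOS = 13 * 10**6          # "more than 13 million" (lower bound used)
TOTAL_NUCLEOTIDES = 2 * 10**9    # "about 2 billion nucleotides"

DNA_DENSITY_B_PER_MM3 = 10**18   # "~10^18 B/mm^3"
DENSITY_RATIO_VS_TAPE = 10**7    # "10^7 times denser than magnetic tape"


def bits_per_nucleotide(total_bytes: int, total_nt: int) -> float:
    """Stored bits per synthesized base, counting every nucleotide."""
    return total_bytes * 8 / total_nt


def nucleotides_per_oligo(total_nt: int, num_oligos: int) -> float:
    """Average length of one oligonucleotide in nucleotides."""
    return total_nt / num_oligos


def implied_tape_density(dna_density: float, ratio: float) -> float:
    """Magnetic-tape density implied by the stated DNA density and ratio."""
    return dna_density / ratio


if __name__ == "__main__":
    bpn = bits_per_nucleotide(TOTAL_BYTES, TOTAL_NUCLEOTIDES)
    avg_len = nucleotides_per_oligo(TOTAL_NUCLEOTIDES, NUM_OLIGOS)
    tape = implied_tape_density(DNA_DENSITY_B_PER_MM3, DENSITY_RATIO_VS_TAPE)
    print(f"net density: {bpn:.2f} bits/nt")
    print(f"average oligo length: {avg_len:.0f} nt")
    print(f"implied tape density: {tape:.0e} B/mm^3")
```

Running it gives 0.80 bits per nucleotide, an average of about 154 nt per oligo (an upper estimate, since the oligo count is a lower bound), and an implied tape density of about 10^11 B/mm^3. The net figure sits well below the 2 bits per nucleotide ceiling of a four-letter alphabet; the difference would be taken up by non-payload sequence such as addressing and redundancy, which the abstract does not quantify.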