Trellis BMA: coded trace reconstruction on IDS channels for DNA storage
- Sundara Rajan Srinivasavaradhan
- Sivakanth Gopi
- Henry Pfister
- Sergey Yekhanin
International Symposium on Information Theory (ISIT)
Sequencing a DNA strand, as part of the read process in DNA storage, produces multiple noisy copies (traces) which can be combined to produce a better estimate of the original strand; this is called trace reconstruction. One can reduce the error rate further by introducing redundancy into the written sequence; this is called coded trace reconstruction. In this paper, we model the DNA storage channel as an insertion-deletion-substitution (IDS) channel and design both encoding schemes and low-complexity decoding algorithms for coded trace reconstruction.
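As a rough illustration of the channel model described above, the following Python sketch passes one strand through a simple IDS channel several times to produce a set of traces. It is a toy model under stated assumptions, not the channel used in the paper: errors are taken to be independent at each position, the probabilities `p_ins`, `p_del`, and `p_sub` are arbitrary placeholder values, and the function names `ids_channel` and `make_traces` are hypothetical.

```python
import random

ALPHABET = "ACGT"


def ids_channel(strand, p_ins, p_del, p_sub, rng):
    """Pass one strand through a toy i.i.d. IDS channel (illustrative assumption)."""
    out = []
    for base in strand:
        # Insertion: with probability p_ins, emit a random base before this one.
        if rng.random() < p_ins:
            out.append(rng.choice(ALPHABET))
        r = rng.random()
        if r < p_del:
            # Deletion: the base is dropped from the trace.
            continue
        if r < p_del + p_sub:
            # Substitution: replace the base with a different one.
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            # No error: the base is copied unchanged.
            out.append(base)
    return "".join(out)


def make_traces(strand, num_traces, p_ins=0.02, p_del=0.02, p_sub=0.02, seed=0):
    """Generate num_traces independent noisy copies of strand."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    return [ids_channel(strand, p_ins, p_del, p_sub, rng) for _ in range(num_traces)]
```

Calling `make_traces("ACGTACGTAC", 5)` returns five strings whose lengths may differ from the input because of insertions and deletions. A trace reconstruction algorithm takes such a list as input and estimates the original strand.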
We introduce Trellis BMA, a new reconstruction algorithm whose complexity is linear in the number of traces, and compare its performance to previous algorithms. Our results show that it reduces the error rate on both simulated and experimental data. The performance comparisons in this paper are based on the Clustered Nanopore Reads Dataset publicly released with this paper. Our hope is that this dataset will enable research progress by allowing objective comparisons between candidate algorithms.
Publication Downloads
Clustered Nanopore Reads (CNR) Dataset
May 7, 2021
DNA storage aims to store information in the form of DNA sequences. This is a research project at Microsoft Research Redmond. This repo contains a dataset of real DNA sequences which can be used for benchmarking different trace reconstruction algorithms. The repo contains data only; there is no code. We release the dataset of clustered nanopore DNA reads together with our paper: Trellis BMA: coded trace reconstruction on IDS channels for DNA storage.