{"id":745519,"date":"2021-05-11T16:35:32","date_gmt":"2021-05-11T23:35:32","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=745519"},"modified":"2021-05-11T16:43:53","modified_gmt":"2021-05-11T23:43:53","slug":"batch-optimization-for-dna-synthesis","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/batch-optimization-for-dna-synthesis\/","title":{"rendered":"Batch optimization for DNA synthesis"},"content":{"rendered":"

Large pools of synthetic DNA molecules have recently been used to reliably store significant volumes of digital data. While DNA has enormous potential as a storage medium because of its high storage density, its practical use is currently severely limited by the high cost and low throughput of available DNA synthesis technologies.<\/p>\n

We study the role of batch optimization in reducing the cost of large-scale DNA synthesis, which translates to the following algorithmic task: given a large pool S of random quaternary strings of fixed length, partition S into batches in a way that minimizes the sum of the lengths of the shortest common supersequences across batches.<\/p>\n

We introduce two ideas for batch optimization that both improve (in different ways) upon a na\u00efve baseline: (1) using both (ACGT)* and its reverse (TGCA)* as reference strands, and batching appropriately, and (2) batching via the quantiles of an appropriate ordering of the strands. We also prove asymptotically matching lower bounds on the cost of DNA synthesis, showing that one cannot improve upon these two ideas. Our results uncover a surprising separation between two cases that naturally arise in the context of DNA data storage: the asymptotic cost savings of batch optimization are significantly greater in the case where strings in S do not contain repeats of the same character (homopolymers), as compared to the case where strings in S are unconstrained.<\/p>\n","protected":false},"excerpt":{"rendered":"

Large pools of synthetic DNA molecules have been recently used to reliably store significant volumes of digital data. While DNA as a storage medium has enormous potential because of its high storage density, its practical use is currently severely limited because of the high cost and low throughput of available DNA synthesis technologies. We study […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13561,13546],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[251170],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-745519","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-algorithms","msr-research-area-computational-sciences-mathematics","msr-locale-en_us","msr-field-of-study-dna-data-storage"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2021-7-1","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/05\/Ba
tch_opt_ISIT_2021.pdf","id":"745522","title":"batch_opt_isit_2021","label_id":"243109","label":0}],"msr_related_uploader":"","msr_attachments":[{"id":745522,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/05\/Batch_opt_ISIT_2021.pdf"}],"msr-author-ordering":[{"type":"text","value":"Konstantin Makarychev","user_id":0,"rest_url":false},{"type":"text","value":"Miklos Racz","user_id":0,"rest_url":false},{"type":"text","value":"Cyrus Rashtchian","user_id":0,"rest_url":false},{"type":"edited_text","value":"Sergey Yekhanin","user_id":34990,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Sergey Yekhanin"}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[437022],"msr_project":[212072],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":212072,"post_title":"DNA Storage","post_name":"dna-storage","post_type":"msr-project","post_date":"2015-01-01 00:00:45","post_modified":"2022-12-22 16:36:38","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/dna-storage\/","post_excerpt":"Transitioned | This project enables molecular-level data storage into DNA molecules by leveraging biotechnology advances in synthesizing, manipulating and sequencing DNA to develop archival storage. 
Microsoft and University of Washington researchers are collaborating to use DNA as a high density, durable and easy-to-manipulate storage medium.","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/212072"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/745519"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/745519\/revisions"}],"predecessor-version":[{"id":745525,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/745519\/revisions\/745525"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=745519"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=745519"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=745519"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=745519"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=745519"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=745519"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=745519"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\
/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=745519"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=745519"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=745519"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=745519"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=745519"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=745519"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=745519"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=745519"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=745519"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}