{"id":158464,"date":"2010-01-01T00:00:00","date_gmt":"2010-01-01T00:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/msr-research-item\/online-maintenance-of-very-large-random-samples-on-flash-storage-2\/"},"modified":"2018-10-16T21:11:34","modified_gmt":"2018-10-17T04:11:34","slug":"online-maintenance-of-very-large-random-samples-on-flash-storage-2","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/online-maintenance-of-very-large-random-samples-on-flash-storage-2\/","title":{"rendered":"Online Maintenance of Very Large Random Samples on Flash Storage"},"content":{"rendered":"
Recent advances in flash storage have made it an attractive alternative for data storage in a wide spectrum of computing devices, such as embedded sensors, mobile phones, PDAs, laptops, and even servers. However, flash storage has many unique characteristics that make existing data management\/analytics algorithms designed for magnetic disks perform poorly with flash storage. For example, while random reads can be nearly as fast as sequential reads, random writes and in-place data updates are orders of magnitude slower than sequential writes. In this paper, we consider an important, fundamental problem that would seem to be particularly challenging for flash storage: efficiently maintaining a very large random sample of a data stream (e.g., of sensor readings). First, we show that previous algorithms such as reservoir sampling and geometric file are not readily adapted to flash. Second, we propose B-File, an energy-efficient abstraction for flash storage to store self-expiring items, and show how a B-File can be used to efficiently maintain a large sample in flash. Our solution is simple, has a small (RAM) memory footprint, and is designed to cope with flash constraints in order to reduce latency and energy consumption. Third, we provide techniques to maintain biased samples with a B-File and to query the large sample stored in a B-File for a subsample of arbitrary size. Finally, we present an evaluation on flash storage showing that our techniques are several orders of magnitude faster and more energy-efficient than (flash-friendly versions of) reservoir sampling and geometric file. A key finding of our study, of potential use to many flash algorithms beyond sampling, is that \u201csemi-random\u201d writes (as defined in the paper) on flash cards are over two orders of magnitude faster and more energy-efficient than random writes.<\/p>\n","protected":false},"excerpt":{"rendered":"
Recent advances in flash storage have made it an attractive alternative for data storage in a wide spectrum of computing devices, such as embedded sensors, mobile phones, PDAs, laptops, and even servers. However, flash storage has many unique characteristics that make existing data management\/analytics algorithms designed for magnetic disks perform poorly with flash storage. For […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13547],"msr-publication-type":[193715],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-158464","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-systems-and-networking","msr-locale-en_us"],"msr_publishername":"Very Large Data Bases Endowment Inc.","msr_edition":"VLDB Journal, vol. 19, issue 1","msr_affiliation":"","msr_published_date":"2010-01-01","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"VLDB Journal, vol. 
19, issue 1","msr_volume":"19","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"1","msr_organization":"","msr_how_published":"","msr_notes":"Special Issue for VLDB 2008 Best Papers","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"207335","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","title":"Sampling.pdf","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/02\/Sampling.pdf","id":207335,"label_id":0}],"msr_related_uploader":"","msr_attachments":[{"id":207335,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/02\/Sampling.pdf"}],"msr-author-ordering":[{"type":"user_nicename","value":"sumann","user_id":33753,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=sumann"},{"type":"text","value":"Phillip B. 
Gibbons","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[],"msr_project":[],"publication":[],"video":[],"download":[],"msr_publication_type":"article","related_content":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/158464"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":2,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/158464\/revisions"}],"predecessor-version":[{"id":533691,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/158464\/revisions\/533691"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=158464"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=158464"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=158464"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=158464"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=158464"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=158464"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=158464"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/
wp-json\/wp\/v2\/msr-platform?post=158464"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=158464"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=158464"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=158464"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=158464"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=158464"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=158464"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=158464"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=158464"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}