{"id":303512,"date":"2012-09-05T15:00:20","date_gmt":"2012-09-05T22:00:20","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=303512"},"modified":"2016-10-11T10:00:20","modified_gmt":"2016-10-11T17:00:20","slug":"better-way-store-data","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/better-way-store-data\/","title":{"rendered":"A Better Way to Store Data"},"content":{"rendered":"

By Douglas Gantenbein, Senior Writer, Microsoft News Center

These days, nearly everyone stores things in the “cloud”: business-critical documents, personal photos, e-mail accounts … everything.

Microsoft introduced Windows Azure Storage in 2008. Since then, that cloud offering has gained widespread use, not only within Microsoft, but also by thousands of external customers, and it currently stores more than 4 trillion objects.

Storing massive amounts of information in the cloud comes with costs, however, and the biggest is the cost of storing all that digital data. Now a Microsoft Research team, working with members of the Windows Azure Storage group, has developed a powerful mathematical tool that significantly reduces the amount of space stored data requires. That, in turn, slashes the cost of storing the data, saving Windows Azure Storage millions of dollars.

The work focuses on a particular challenge within the cloud: managing data to help keep it safe and secure while minimizing the amount of storage space it requires. That’s a big challenge. For enterprises and people to trust their valuable data to the cloud, they need a high degree of confidence that their data will be safe. Anyone storing that data, meanwhile, wants to keep costs low.

The easiest way to provide integrity for data is to duplicate it. Three full copies typically are enough to keep the data safe and durable in the event of server failures, and servers will, in time, fail. But obviously, duplication uses a lot of storage. In the case of keeping three full copies, the storage cost, or “overhead,” is simply three: three bytes stored for every byte of data.

\"Jin

Jin Li<\/p><\/div>\n

Microsoft Research, working with the Windows Azure team, took a new approach to reducing storage demands. The team included Cheng Huang and Jin Li from Microsoft Research Redmond; Parikshit Gopalan and Sergey Yekhanin from Microsoft Research Silicon Valley; and Huseyin Simitci, Yikang Xu, Aaron Ogus, and Brad Calder from Windows Azure Storage.

The team built on a common approach to keeping data accessible and durable while requiring less space. That approach is to “code” the data: in effect, to create a shortened description of the data so that it can be reassembled and delivered to a user.

Windows Azure Storage condenses stored data with a technique called “lazy erasure coding.” The name comes from the way the coding works in the background, off the critical write path. When a data chunk, called an “extent,” is opened and filled, it is kept as three full copies. Once it is sealed, erasure coding launches in the background, when the load on the data center is low. The extent is split into equal-sized data fragments, which are coded to generate a number of parity fragments. Each data and parity fragment is stored in a different physical unit, placed so that the failure of any single module in the data center, be it a power unit, a switch, a computer, or a disk, will affect only one fragment. Once the data is erasure-coded and all data and parity fragments are distributed, the three original copies can be deleted. Because the entire erasure-coding operation is performed in the background, its impact on performance is minimal.
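To make that sequence concrete, here is a minimal Python sketch of the background pipeline, written under some simplifying assumptions: the fragment count is illustrative, and a single XOR parity stands in for the parity fragments that Windows Azure Storage actually computes with Reed-Solomon and, later, Local Reconstruction Codes.

```python
# Illustrative sketch of lazy erasure coding: an extent is kept as three full
# replicas while open, then split into fragments and coded in the background
# once it is sealed. A single XOR parity stands in for the real parity math.

NUM_DATA_FRAGMENTS = 6  # illustrative assumption, not the production value

def split_extent(extent: bytes, k: int = NUM_DATA_FRAGMENTS) -> list[bytes]:
    """Split a sealed extent into k equal-sized data fragments (zero-padded)."""
    frag_len = -(-len(extent) // k)  # ceiling division
    padded = extent.ljust(frag_len * k, b"\0")
    return [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]

def xor_parity(fragments: list[bytes]) -> bytes:
    """Compute one XOR parity fragment, a toy stand-in for Reed-Solomon/LRC."""
    parity = bytearray(len(fragments[0]))
    for frag in fragments:
        for i, byte in enumerate(frag):
            parity[i] ^= byte
    return bytes(parity)

def lazy_encode(sealed_extent: bytes) -> list[bytes]:
    """Background step: code the extent; each resulting fragment would be placed
    on a different fault domain, after which the three replicas can be deleted."""
    data_fragments = split_extent(sealed_extent)
    return data_fragments + [xor_parity(data_fragments)]

if __name__ == "__main__":
    fragments = lazy_encode(b"example extent contents" * 10)
    print(len(fragments), "fragments,", len(fragments[0]), "bytes each")
```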

A well-understood way to perform erasure coding is Reed-Solomon coding, which was devised in 1960 and used in the U.S. space program to reduce communications errors. It also helped make compact discs possible by catching errors in the discs’ digital coding. A 6+3 Reed-Solomon code, for example, converts three copies of the data into nine fragments, six data and three parity, each one-sixth the size of the original data. That cuts the data footprint in half, to an overhead of 1.5, which means not only half the servers, but also half the power usage and half the physical server space. Lazy erasure coding leads to big cost savings.

\"Parikshit

Parikshit Gopalan<\/p><\/div>\n

Coding data has a cost: it slows performance, because servers must reassemble the data from its coded fragments, much as it might take a person longer to read a sentence in which every other letter is missing. Data retrieval also can be slowed if a fragment is stored on a hard disk that has failed or on a server that is temporarily offline for an upgrade. The goal of the new approach is therefore to reduce the time and cost of data retrieval, especially during hardware failures and common data-center operations such as software upgrades, and, at the same time, to perform lazy erasure coding that enables even greater compression, cutting the storage overhead to 1.33 or lower.

The team sought to achieve this with minimal performance loss. To reach a storage overhead of 1.33, it is possible to use a 12+4 Reed-Solomon code, which splits the extent into 12 fragments and derives four parity fragments to protect the original 12, each fragment one-twelfth the size of the original data.
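The overhead arithmetic behind those figures is straightforward: a k+m erasure code stores k data fragments plus m parity fragments, each 1/k the size of the original data, so the overhead is (k + m) / k. A quick check in Python:

```python
def storage_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Overhead of a k+m erasure code: k+m fragments, each 1/k of the original."""
    return (data_fragments + parity_fragments) / data_fragments

print(storage_overhead(6, 3))    # 6+3 Reed-Solomon  -> 1.5
print(storage_overhead(12, 4))   # 12+4 Reed-Solomon -> 1.333...
```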

“But there is an undesirable effect,” Li says. “If a piece of data fails, you will need to read 12 fragments to reconstruct the data. That leads to 12 disk I/O actions and 12 network transfers, and that’s expensive, double the disk I/O actions and network transfers needed in 6+3 Reed-Solomon coding.”

Huang explains further.

\"Cheng

Cheng Huang<\/p><\/div>\n

“Encoding and decoding is just one of many operations done in cloud-storage systems,” he says. “If you spend a lot of computational resources on that, then it eats into other operations: data compression, encryption, de-duplication (cleaning up redundant data), and other things.”

Reed-Solomon coding was designed for deep-space communication, in which errors occur commonly and strike data and parity symbols alike, and the design tenet is to tolerate as many errors as possible for a given overhead. The error pattern in a data center behaves differently. First, a well-designed and well-monitored data center has a low hard-failure rate, which means most extents are healthy, with no failed fragments. Only a small number of extents have one failed data or parity fragment, and extents with two or more failed fragments are rare and short-lived, because the data center repairs them quickly and brings them back to a healthy state. Second, if a data fragment cannot be accessed, the predominant reason is a temporary error caused by a system upgrade, load balancing across servers, or other routine operations.

Built upon the rich mathematical theory of locally decodable codes and probabilistically checkable proofs, the new approach, called Local Reconstruction Codes (LRCs), enables data to be reconstructed more quickly than with Reed-Solomon codes, because fewer fragments must be read to re-create the original data in the majority of failure patterns. In fact, only half the fragments are required: six, rather than 12. In addition, LRCs are mathematically much simpler than prior techniques, resulting in a smaller Galois field, a mathematical construct that reflects the complexity of the operations used to combine the data pieces.
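As a rough sketch of where that halving comes from, the Python example below arranges 12 data fragments into two local groups of six, each protected by a single XOR local parity. The group sizes, the XOR parity in place of real finite-field parities, and the omission of the global parities an actual LRC also carries are all simplifying assumptions; the point is only that rebuilding one lost fragment touches its local group alone, six reads instead of 12.

```python
# Toy Local Reconstruction Code layout: 12 data fragments in two local groups
# of 6, each group with one XOR "local parity". Real LRCs add global parities
# over a Galois field for durability; this sketch only shows the locality win.

GROUP_SIZE = 6  # assumption consistent with "six reads rather than 12"

def xor_all(frags):
    """Bytewise XOR of a list of equal-length fragments."""
    out = bytearray(len(frags[0]))
    for frag in frags:
        for i, byte in enumerate(frag):
            out[i] ^= byte
    return bytes(out)

def encode_lrc(data_fragments):
    """Return (groups, local_parities) for two local groups of GROUP_SIZE."""
    groups = [data_fragments[:GROUP_SIZE], data_fragments[GROUP_SIZE:]]
    return groups, [xor_all(group) for group in groups]

def reconstruct_one(groups, local_parities, group_idx, lost_idx):
    """Rebuild one lost fragment from its local group only: 6 reads, not 12."""
    survivors = [f for i, f in enumerate(groups[group_idx]) if i != lost_idx]
    reads = survivors + [local_parities[group_idx]]  # 5 data + 1 local parity
    return xor_all(reads), len(reads)

if __name__ == "__main__":
    data = [bytes([i]) * 4 for i in range(12)]       # 12 tiny data fragments
    groups, parities = encode_lrc(data)
    rebuilt, reads = reconstruct_one(groups, parities, group_idx=0, lost_idx=2)
    assert rebuilt == data[2]
    print(f"rebuilt fragment with {reads} reads (12+4 Reed-Solomon needs 12)")
```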

The “local” in the coding technique’s name refers to the idea that, when a fragment is offline, as during a server failure or an upgrade, the fragments needed to reconstruct the data are not spread across the entire span of a data center’s servers.

\"Sergey

Sergey Yekhanin<\/p><\/div>\n

“The data needs to be available quickly, without reading too much data,” Yekhanin says. “That’s where the notion of locality comes from.”

The new coding approach also meets the two main criteria of data storage: data needs to be stored reliably, so it is durable, and it needs to be readily available. Data durability is excellent with LRCs: a data chunk can suffer three failures and still be rebuilt with 100 percent accuracy. In the unlikely event of four failures, the rebuild success rate drops to 86 percent. That gives LRCs better durability than both triple replication and 6+3 Reed-Solomon coding.

Best of all, the new coding approach results in a data overhead of 1.29, a 14 percent reduction from Reed-Solomon’s overhead of 1.5. That is perhaps not a huge gain in itself, but spread over the enormous amounts of data contained in Windows Azure Storage, it is significant.
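The 14 percent figure follows directly from comparing the two overheads:

```python
reed_solomon_overhead = 1.5   # 6+3 Reed-Solomon
lrc_overhead = 1.29           # the new Local Reconstruction Code
reduction = (reed_solomon_overhead - lrc_overhead) / reed_solomon_overhead
print(f"{reduction:.0%}")     # -> 14%
```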

“The new Local Reconstruction Codes allow us to achieve our target storage overhead to keep our storage prices low,” says Calder, a Microsoft distinguished engineer. “LRCs provide faster reconstruction times over prior known codes, while still providing the durability we need with low storage overhead.”

LRCs represent a significant achievement in information theory and storage design, and the work received the Best Paper award at the 2012 USENIX Annual Technical Conference for Erasure Coding in Windows Azure Storage.

LRCs could find wide application in computing. Li says one possible use for them may be in “flash appliances,” devices made by combining several flash-memory drives. These memory devices are fast but require a special process called “garbage collection” to clean up old or unused data. LRCs could help improve this process, because their design enables the flash memory to operate efficiently even during garbage collection.

This work shows the strength of Microsoft’s diversity, from theoretical computer scientists such as Gopalan and Yekhanin, to communications and multimedia experts such as Huang and Li in Microsoft Research, to distributed-storage and systems experts such as Simitci, Xu, Ogus, and Calder in Windows Azure Storage.

“It shows how broad the organization is,” Yekhanin says. “We have lots of people working in lots of different areas, and they can connect to create some amazing technology.”
