{"id":23073,"date":"2014-09-09T09:02:00","date_gmt":"2014-09-09T16:02:00","guid":{"rendered":"http:\/\/blogs.microsoft.com\/cybertrust\/2014\/09\/09\/measure-twice-cut-once-with-rma-methodology\/"},"modified":"2023-05-15T23:05:02","modified_gmt":"2023-05-16T06:05:02","slug":"measure-twice-cut-once-with-rma-methodology","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2014\/09\/09\/measure-twice-cut-once-with-rma-methodology\/","title":{"rendered":"Measure Twice, Cut Once, With RMA Methodology"},"content":{"rendered":"
I\u2019ve been beating our drum for a while now about the inevitability of failure in cloud-based systems. Simply put, the complexities and interdependencies of the cloud make it nearly impossible to avoid service failure, so instead we have to go against our instincts and actually design for this eventuality.<\/p>\n
Once you accept this basic premise, the next question is: how exactly do we need to change our design processes? The Resilience Modeling and Analysis (RMA) methodology is a key part of the answer.<\/p>\n
RMA brings the master carpenter\u2019s \u201cmeasure twice, cut once\u201d philosophy to engineering. The goal is to help teams think through as many potential reliability-related issues as possible before committing code to production. The aim is not to prevent every single failure mode, but to limit the impact a failure could have on customers if it occurs.<\/p>\n
To be clear, RMA is deeper and broader than basic fault modeling and root-cause analysis. Adapted from the industry-standard technique known as Failure Mode and Effects Analysis (FMEA), RMA is a four-phase process:<\/p>\n
By working through these four phases, teams can gain a more detailed understanding of where known failure points are, what the impact of known failure modes is likely to be, and where to target engineering investments to help mitigate the highest-priority risks.<\/p>\n
Service teams that have worked through this process tell us that one of its key outcomes is spending less post-deployment time managing and responding to live-site issues. Focusing on reducing the impact of the most likely failures frees up time for the fun stuff, like developing customer-facing innovations.<\/p>\n","protected":false},"excerpt":{"rendered":"
I\u2019ve been beating our drum for a while now about the inevitability of failure in cloud-based systems. Simply put, the complexities and interdependencies of the cloud make it nearly impossible to avoid service failure, so instead we have to go against our instincts and actually design for this eventuality. Once you accept this basic premise, […]<\/p>\n","protected":false},"author":61,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"content-type":[3662],"topic":[3683],"products":[],"threat-intelligence":[],"tags":[3822],"coauthors":[3626],"class_list":["post-23073","post","type-post","status-publish","format-standard","hentry","content-type-news","topic-security-management","tag-microsoft-security-insights"],"yoast_head":"\n