Measure Twice, Cut Once, With RMA Methodology

By David Bills

September 9, 2014

Microsoft Security Insights

I’ve been beating our drum for a while now about the inevitability of failure in cloud-based systems. Simply put, the complexities and interdependencies of the cloud make it nearly impossible to avoid service failure, so instead we have to go against our instincts and actually design for this eventuality.

Once you accept this basic premise, the next question is how exactly we need to change our design processes. The Resilience Modeling and Analysis (RMA) methodology is a key part of the answer.

RMA brings the master carpenter’s “measure twice, cut once” philosophy to engineering. The goal is to help ensure teams think through as many of the potential reliability-related issues as possible before committing code to production. The aim is not to prevent every single failure mode, but to limit the impact failures have on customers when they occur.

To be clear, RMA is deeper and broader than basic fault modeling and root-cause analysis. Adapted from the industry-standard technique known as Failure Mode and Effects Analysis (FMEA), RMA is a four-phase process:

Pre-work: Diagram your resources, dependencies, and component interactions.
Discover: Identify potential failures and resilience gaps for each interaction identified in the pre-work phase.
Rate: Perform an impact analysis of the potential failures you’ve identified.
Act: Invest in and produce work items to improve resilience.

By working through these four phases, teams can gain a more detailed understanding of where known failure points are, what the impact of known failure modes is likely to be, and where to target engineering investments to help mitigate the highest-priority risks.
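To make the four phases a little more concrete, here is a minimal sketch of how a team might record and prioritize failure modes. It is an illustration only, not part of the RMA methodology itself: the component names, the 1-to-5 rating scales, the impact-times-likelihood score, and the `rate` and `act` helpers are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One potential failure for a component interaction (Discover phase)."""
    interaction: str   # hypothetical interaction name from the diagram
    description: str
    impact: int        # assumed scale: 1 (minor) to 5 (severe)
    likelihood: int    # assumed scale: 1 (rare) to 5 (frequent)

    @property
    def risk(self) -> int:
        # Assumed scoring rule: impact times likelihood.
        return self.impact * self.likelihood


def rate(failures):
    """Rate phase: order failures from highest to lowest risk score."""
    return sorted(failures, key=lambda f: f.risk, reverse=True)


def act(failures, budget):
    """Act phase: turn the `budget` highest-risk failures into work items."""
    return [
        f"Mitigate: {f.description} ({f.interaction})"
        for f in rate(failures)[:budget]
    ]


# Pre-work: interactions taken from a hypothetical component diagram.
interactions = ["web -> auth service", "web -> order database"]

# Discover: potential failures for each interaction.
failures = [
    FailureMode("web -> auth service", "auth service times out",
                impact=4, likelihood=3),
    FailureMode("web -> order database", "database failover drops writes",
                impact=5, likelihood=1),
    FailureMode("web -> order database", "connection pool exhausted",
                impact=3, likelihood=3),
]

# Every discovered failure should map back to a diagrammed interaction.
assert all(f.interaction in interactions for f in failures)

work_items = act(failures, budget=2)
for item in work_items:
    print(item)
```

With these made-up ratings, the severe but rare failover problem ranks below the two more likely failures, which is the kind of trade-off the Rate phase is meant to surface.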

Service teams who have worked through this process tell us that one of the key outcomes is spending less post-deployment time managing and responding to live-site issues. Tightening the focus to reducing the impact of the most likely failures reclaims time to spend on the fun stuff, like developing customer-facing innovations.
