{"id":9792,"date":"2023-02-21T12:35:58","date_gmt":"2023-02-21T20:35:58","guid":{"rendered":"https:\/\/www.microsoft.com\/insidetrack\/blog\/?p=9792"},"modified":"2023-03-02T11:20:31","modified_gmt":"2023-03-02T19:20:31","slug":"rotating-devops-role-improves-engineering-service-quality","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/insidetrack\/blog\/rotating-devops-role-improves-engineering-service-quality\/","title":{"rendered":"Rotating DevOps role improves engineering service quality"},"content":{"rendered":"

As many high-performing agile software engineering teams embrace a DevOps culture, they\u2019re adding the role of Directly Responsible Individual (DRI). The role is also known by various other names, such as Google\u2019s \u201cSheriff\u201d or Facebook\u2019s slightly different \u201cDesignated Response Individual.\u201d Rotating within an agile team, the DRI is responsible for service availability, service health, and incident management. The DRI advocates for the customer and drives positive changes to improve the customer experience with services.<\/p>\n

In Microsoft Digital, we\u2019re using a DRI to help us deliver better services faster and more cost-effectively. The DRI actively monitors services in production, which helps our agile teams be proactive rather than reactive. This has helped us reduce the number of support tickets and bugs we have to resolve by up to 50 percent. Freed from this distraction, the rest of the team has more time to deliver business value.<\/p>\n

We used to get only four to five hours of productive work per day out of each software engineer. Since we added this role to our teams, productive time has increased to six hours per day. The role also reduces risk, because resolving issues no longer interferes with our ability to deliver on our sprint commitments. In addition, we\u2019re finding that the DRI reduces the number of engagements we have with support, so support costs are also going down.<\/p>\n

[Take a look at how deploying Kanban at Microsoft leads to engineering excellence.<\/a> Find out more about transforming modern engineering at Microsoft.<\/a> Learn more about powering Microsoft\u2019s operations transformation with Microsoft Azure.<\/a>]<\/em><\/p>\n

DRI process and expectations<\/h2>\n

In Microsoft Digital, we have a primary DRI with a secondary DRI as backup. The primary DRI is 100 percent allocated to this role and has no other team tasks. Each day, the primary DRI reviews incident logs and responds to critical incidents or patterns of incidents. They also log defects and assign them to individuals based on root cause analysis. For visibility, the secondary DRI is looped into any issues, and if the primary DRI is unavailable or busy, the secondary DRI steps in.<\/p>\n
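The daily routine described above amounts to a small triage loop. The following Python sketch is illustrative only: the Incident and Defect types, the severity scale, the owners map, and the rule that three repeats of the same incident signature count as a pattern are all hypothetical assumptions, not part of any Microsoft Digital tooling.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Incident:
    id: int
    severity: int          # hypothetical scale: 1 = critical, 4 = low
    signature: str         # hypothetical key for spotting repeat incidents
    root_cause_area: str   # component identified by root cause analysis


@dataclass
class Defect:
    area: str
    assignee: str
    incident_ids: list = field(default_factory=list)
    watchers: list = field(default_factory=list)


def daily_triage(incidents, owners, primary, secondary,
                 primary_available=True, pattern_threshold=3):
    # The secondary DRI steps in when the primary is unavailable or busy.
    responder = primary if primary_available else secondary
    # Whichever DRI is not responding is looped in for visibility.
    watcher = secondary if responder == primary else primary
    counts = Counter(i.signature for i in incidents)
    defects = {}
    for inc in incidents:
        critical = inc.severity == 1
        recurring = counts[inc.signature] >= pattern_threshold
        if not (critical or recurring):
            continue  # not acted on in this sketch
        if inc.signature not in defects:
            # Assign by root cause area; fall back to the responding DRI.
            assignee = owners.get(inc.root_cause_area, responder)
            defects[inc.signature] = Defect(inc.root_cause_area, assignee,
                                            watchers=[watcher])
        defects[inc.signature].incident_ids.append(inc.id)
    return responder, list(defects.values())
```

Grouping incidents by signature means a recurring problem produces one defect that lists every related incident, rather than one defect per alert.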

DRI role rotation<\/h3>\n

The primary and secondary DRI roles rotate across all team members. For a seamless transition, the secondary DRI becomes the primary DRI at the next rotation. Neither DRI serves as Scrum Master during the same sprint.<\/p>\n
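The handoff rule, where the secondary DRI becomes the next primary, and the Scrum Master constraint can be expressed as a simple scheduling pass. In this sketch, the team order, the per-sprint list of Scrum Masters, and the function name plan_dri_rotation are hypothetical; it walks the rotation order and skips whoever is Scrum Master for that sprint.

```python
def plan_dri_rotation(team, scrum_masters):
    # team: hypothetical rotation order; scrum_masters: the Scrum Master for
    # each upcoming sprint, in order. Returns one (primary, secondary) pair per sprint.
    n = len(team)

    def next_free(start, exclude):
        # Walk the rotation order from start; return the first index not excluded.
        for step in range(n):
            idx = (start + step) % n
            if team[idx] not in exclude:
                return idx
        raise ValueError('not enough team members to staff both DRI roles')

    schedule = []
    primary_idx = None
    for sprint, scrum_master in enumerate(scrum_masters):
        if primary_idx is None:
            primary_idx = next_free(0, {scrum_master})
        elif team[primary_idx] == scrum_master:
            raise ValueError(f'sprint {sprint + 1}: incoming primary DRI is also Scrum Master')
        secondary_idx = next_free(primary_idx + 1, {scrum_master, team[primary_idx]})
        schedule.append((team[primary_idx], team[secondary_idx]))
        # Handoff: this sprint's secondary becomes the next sprint's primary.
        primary_idx = secondary_idx
    return schedule
```

Raising an error when the incoming primary is also that sprint's Scrum Master, rather than silently reshuffling, leaves the conflict for the team to resolve when it plans the schedule.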

The rotation cadence is two weeks, which aligns with the ideal two-week sprint cadence. This lets the DRI take part in service reviews and other service-line meetings, which are held every other week. It also gives the DRI enough time to have a real impact during the sprint, along with the opportunity to spend time on preferred engineering activities. Rotations start on the first day of a sprint and last until the first day of the next sprint. Each sprint team tracks and manages its own DRI schedule.<\/p>\n
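Because each rotation lines up with a two-week sprint, working out which rotation covers a given date is simple date arithmetic. This sketch assumes a fixed first sprint start date and uses a hypothetical rotation_window helper; the returned index can be used to look up that sprint's primary and secondary DRI in a per-sprint schedule.

```python
from datetime import date, timedelta

SPRINT_LENGTH = timedelta(weeks=2)  # DRI rotation matches the two-week sprint


def rotation_window(first_sprint_start, day):
    # Returns (rotation_index, start, handoff) for the rotation covering day.
    # A rotation starts on a sprint's first day and runs until the first day
    # of the next sprint, when the secondary DRI takes over as primary.
    if day < first_sprint_start:
        raise ValueError('day falls before the first sprint')
    index = (day - first_sprint_start) // SPRINT_LENGTH  # timedelta // timedelta -> int
    start = first_sprint_start + index * SPRINT_LENGTH
    return index, start, start + SPRINT_LENGTH
```

Treating the handoff day as the start of the next window, rather than the end of the current one, means every calendar day maps to exactly one rotation.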

Sprint capacity<\/h3>\n

DRI work takes effort, and that effort isn\u2019t free: it consumes capacity, so some existing engineering work has to change or stop to make room for it. For this reason, we exclude the primary DRI from the current sprint\u2019s capacity by scheduling their DRI time as “days off” in Visual Studio Team Services (VSTS), now Azure DevOps Services. This keeps DRI work from affecting the sprint plan. If the secondary DRI becomes heavily engaged, we re-plan the sprint accordingly.<\/p>\n
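The capacity arithmetic can be illustrated with a short sketch. It is not the VSTS API; it is a hypothetical model that assumes a five-person team, ten working days per sprint, and the six productive hours per day mentioned earlier, and it books the primary DRI as days off for the entire sprint.

```python
WORKING_DAYS = 10  # working days in a two-week sprint (assumption)


def sprint_capacity(members, hours_per_day, days_off):
    # days_off maps a member to the sprint days they are unavailable for
    # planned work; the primary DRI is booked for the whole sprint.
    total = 0
    for member in members:
        available_days = max(WORKING_DAYS - days_off.get(member, 0), 0)
        total += available_days * hours_per_day
    return total


team = ['ana', 'ben', 'cho', 'dev', 'eli']  # hypothetical team
primary_dri, secondary_dri = 'ana', 'ben'

baseline = sprint_capacity(team, 6, {})
planned = sprint_capacity(team, 6, {primary_dri: WORKING_DAYS})
# Re-plan if the secondary DRI is pulled in for, say, three days.
replanned = sprint_capacity(team, 6, {primary_dri: WORKING_DAYS, secondary_dri: 3})
print(baseline, planned, replanned)
```

Running it prints 300 240 222: the team plans around 240 hours instead of 300, and drops to 222 if the secondary DRI loses three days to incidents.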

Incident management<\/h3>\n

The DRI responds to incidents in two ways:<\/p>\n