This is a long-term study of how software processes have evolved over the years, from multi-year development cycles that release boxed products to the continuous deployment of services.
To assist the research, a number of telemetry systems have been developed to capture the complete life cycle of the development process. These systems have been transferred to product groups over the years and continue to capture complete development life cycles. The research has analyzed multiple aspects of the development process, including verification, organization, software languages and project structures.
The research has captured Microsoft's transition to a continuous deployment model. The variety of products within Microsoft, from pure services such as Bing, to products that are both on-prem and services (such as Office), to Windows 10, which is an on-prem product with additional services, provides an excellent opportunity to understand the pros and cons of different techniques for different products.
Many deployment models are in use within Microsoft. They differ both in the frequency of deployment (from Windows deploying every 3 months to services deploying multiple times a day) and in how the release of major features is managed. Within projects, the deployment frequency and the verification processes vary considerably depending upon the criticality of the software and the level of customer acceptance for change.
The verification challenges vary greatly between products. While it is important to ensure that a small UI change on a service does not break the page, a further measure of correctness is whether customers ‘like’ the change. By contrast, an update to an index service for Bing depends on clearly deterministic measures of performance.
Most of Microsoft's release processes make use of gates, or feature flags, which provide the capability to enable or disable a specific piece of code for individual users. A new version of a component is released, and the gate determines whether a specific user runs the new or the old version of the component. A new change can be rolled out to a small number of users to ensure that the change does not fail, or to verify that users ‘like’ the new change. Once the developers are happy with the change, it can be rolled out to all users. Alternatively, if a change is found to contain bugs, the gate can turn off the change and users revert to the old version of the component. Another use of gates is to perform experiments, often referred to as A/B testing.
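The gating behaviour described above can be illustrated with a minimal sketch. It is not the mechanism used by any Microsoft product group; the class name `FeatureGate`, the function `render_component`, the hash-based bucketing and the percentage parameter are all assumptions chosen for illustration. The sketch shows a gradual rollout to a fraction of users, a stable per-user decision, and a switch that reverts every user to the old version.

```python
import hashlib


class FeatureGate:
    """Hypothetical gate: decides per user whether new code runs."""

    def __init__(self, name: str, rollout_percent: float = 0.0, enabled: bool = True):
        self.name = name
        self.rollout_percent = rollout_percent  # share of users on the new version
        self.enabled = enabled                  # False turns the change off for everyone

    def _bucket(self, user_id: str) -> float:
        # Hash gate name and user id so that each user lands in a stable
        # bucket in the range [0, 100), independent of other gates.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode("utf-8")).hexdigest()
        return int(digest[:8], 16) / 0x100000000 * 100

    def is_enabled(self, user_id: str) -> bool:
        return self.enabled and self._bucket(user_id) < self.rollout_percent


def render_component(gate: FeatureGate, user_id: str) -> str:
    # Both versions ship together; the gate picks one at run time.
    return "new" if gate.is_enabled(user_id) else "old"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    gate = FeatureGate("new-ui", rollout_percent=5)

    # Small initial rollout: only a fraction of users see the change.
    exposed = sum(render_component(gate, u) == "new" for u in users)
    print(f"users on new version at 5%: {exposed}")

    # Developers are happy with the change: roll out to all users.
    gate.rollout_percent = 100
    print(all(render_component(gate, u) == "new" for u in users))

    # A bug is found: turn the gate off and everyone reverts to the old version.
    gate.enabled = False
    print(all(render_component(gate, u) == "old" for u in users))
```

Because the bucket is derived from a hash rather than a random draw, a user who sees the new version at a low rollout percentage keeps seeing it as the percentage grows. The same bucketing could assign users to the arms of an A/B experiment, although a real experimentation system would also need to record outcomes per arm.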
Our research is primarily about identifying the optimum way for product groups to develop and release their software. The factors that determine this are the age of the product or service, the cost of failure of the product or service, and the structure of the development organization. These factors are continually changing as new processes are developed and organizations adapt. Finally, the company's move to home working during the pandemic has added further complications to the deployment process.
Personnel
Christian Bird
Principal Researcher
Jacek Czerwonka
Principal Engineering Manager
Kim Herzig
Principal Software Engineering Manager
Katja Kevic
Senior Software Engineer
Jennifer Beckmann
Principal Software Engineering Manager
Microsoft Office Fuel
Laurie Williams
Distinguished Professor
NCSU