{"id":642939,"date":"2020-03-23T10:37:52","date_gmt":"2020-03-23T17:37:52","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=642939"},"modified":"2020-04-30T15:54:48","modified_gmt":"2020-04-30T22:54:48","slug":"coyote-making-it-easier-for-developers-to-build-reliable-asynchronous-software","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/coyote-making-it-easier-for-developers-to-build-reliable-asynchronous-software\/","title":{"rendered":"Coyote: Making it easier for developers to build reliable asynchronous software"},"content":{"rendered":"
For developers, writing bug-free software that doesn\u2019t crash is getting harder in an increasingly competitive world where software needs to ship before it becomes obsolete. This challenge is especially apparent with online cloud services, which are often driven by aggressive shipping deadlines. Cloud services are distributed programs comprising multiple back-end systems that continuously exchange asynchronous signals while responding to incoming web requests. They are complex by nature, hard to get right, and require protection from failures that could jeopardize client data or halt key services.<\/p>\n
Such a programming environment is full of non-determinism<\/span><\/a><\/em>, or scenarios outside developers\u2019 control. For example, there\u2019s non-determinism in the scheduling of concurrent operations, the order in which messages are received, random system failures, and the random firing of timers, either for retry logic or timeouts from other services that have become unresponsive. Non-deterministic systems exist in all software domains, not just cloud services, and best practices for building and testing these systems fall short. Techniques such as failure injection and stress testing can be too complicated to set up or too time-consuming, with no guarantee that the bugs they find can be reproduced. Consider a cloud service that, let\u2019s say, implements the Raft consensus protocol among a group of machines in an effort to provide a highly reliable fault-tolerant cluster to clients. Such a system will have hundreds of messages flying back and forth between the machines. You do stress testing and don\u2019t find any bugs, but can you really be confident that you\u2019re ready to ship?<\/p>\n We\u2019re excited to announce the release of Coyote, an open-source .NET framework<\/span><\/a> from Microsoft Research that guides developers toward designing, implementing, and testing code in a way that embraces non-determinism and asynchrony and helps them create asynchronous systems quickly and confidently. Instead of trying to hide non-determinism, Coyote helps explicitly model non-determinism in a system and uses the information to provide a state-of-the-art testing tool. This advanced testing tool can control every source of non-determinism defined, including the exact order of every asynchronous operation, which allows it to systematically explore all the possibilities. 
The tool runs very quickly and reaches unheard-of levels of coverage of all non-deterministic choices in code, enabling it to find most of the tricky bugs in a way that\u2019s also trivial to reproduce and debug.<\/p>\n A result of years of investment from Microsoft Research in the space of program verification and testing, Coyote is being used to build various components of Microsoft Azure Compute<\/span><\/a>, such as Azure Batch<\/span><\/a>, and Microsoft Azure Blockchain<\/span><\/a>. The framework has received positive feedback<\/span><\/a> from the Azure teams using it. One engineer said, \u201cFeatures developed in Coyote test mode worked perfectly in production first time,\u201d while another noted, \u201cA feature that took six months without Coyote was developed in one month using Coyote.\u201d Engineers reported a \u201csignificant confidence boost\u201d as a result, allowing them to \u201cchurn [out] code much faster than before.\u201d<\/p>\n Coyote, which evolved from a previous Microsoft Research project called P#, is a combination of a programming model, a lightweight runtime, and a testing infrastructure all packaged as a portable library with minimal dependencies. The framework supports two main programming models: an asynchronous tasks<\/em> programming model (in preview) and an asynchronous actors<\/em> programming model.<\/p>\n If you\u2019re happy developing your code using the C# async\/await<\/em> construct for asynchronous tasks, then Coyote can add value on top of that. If you switch to the Coyote task library<\/span><\/a>, the Coyote testing tool will look for bugs by systematically exploring the concurrency between your tasks. However, while the C# async\/await<\/em> feature is wonderful, it sometimes yields code that is too parallel, resulting in a lot of complexity. 
For example, when performing two or more concurrent tasks, you may need to guard private data with locks, and then you have to worry about deadlocks. Coyote offers an alternative that solves this with the more advanced asynchronous actors programming model<\/span><\/a>.<\/p>\n Actors constrain your parallelism so that a given actor receives messages in a serialized order via an inbox. Actor models<\/span><\/a> have gained a lot of popularity, especially in the area of distributed systems, precisely because they help manage the complexity of a system. Actors essentially embrace asynchrony by making every message between actors an async<\/em> operation. Coyote fully understands the semantics of actors and can do a world-class job of testing them and finding even the most subtle bugs. The framework goes one step further, providing a type of actor called a state machine<\/span><\/a>, which it knows how to fully test, ensuring every state is covered and every state transition is tested.<\/p>\n <\/p>\nCoyote programming models<\/h3>\n