iDice: Problem Identification for Emerging Issues

One challenge in maintaining a large-scale software system, especially an online service system, is responding quickly to customer issues. Issue reports typically have many categorical attributes that reflect the characteristics of the issues. For a commercial system, the volume of reported issues is relatively constant most of the time. Occasionally, however, emerging issues cause a significant increase in volume. It is important for support engineers to identify and resolve such emerging issues efficiently and effectively, since they affect a large number of customers. Currently, problem identification for an emerging issue is a tedious and error-prone process, because support engineers must manually find the particular attribute combination that characterizes the emerging issue among a large number of possible attribute combinations. We call such an attribute combination an effective combination; identifying it is important for issue isolation and diagnosis. In this paper, we propose iDice, an approach that identifies the effective combination for an emerging issue accurately and efficiently. We evaluate the effectiveness and efficiency of iDice through experiments. We have also successfully applied iDice to several Microsoft online service systems in production. The results confirm that iDice helps identify emerging issues and reduces maintenance effort.
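To make the search space concrete, the following is a minimal sketch, assuming each issue report is a dict mapping attribute names to categorical values and that issues are split into a baseline window and a current window. It enumerates every attribute combination (up to a hypothetical `max_size` number of attributes) and ranks combinations by their raw increase in issue count. This naive brute-force ranking is only an illustration of the problem; it is not iDice's algorithm, and the names `combination_counts` and `rank_by_increase` and the example data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def combination_counts(issues, max_size):
    """Count how many issues match each attribute combination of up to max_size attributes."""
    counts = Counter()
    for issue in issues:
        items = sorted(issue.items())  # (attribute, value) pairs in a stable order
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return counts


def rank_by_increase(baseline, current, max_size=2):
    """Naive illustration: rank combinations seen in the current window by absolute volume increase."""
    before = combination_counts(baseline, max_size)
    after = combination_counts(current, max_size)
    # A Counter returns 0 for combinations absent from the baseline window.
    deltas = {combo: after[combo] - before[combo] for combo in after}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical example: three new issues share OS=Win10 and Region=US.
baseline = [{"OS": "Win10", "Region": "EU"}, {"OS": "Win11", "Region": "US"}]
current = baseline + [{"OS": "Win10", "Region": "US"}] * 3
ranked = rank_by_increase(baseline, current)
```

Even with only two attributes, the single attributes OS=Win10 and Region=US tie with their combination at the top of the ranking (each increases by 3), so raw volume alone does not say which combination best characterizes the emerging issue. With many attributes and values the number of combinations grows combinatorially, which is why manual inspection is tedious.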