{"id":6966,"date":"2024-02-14T09:50:20","date_gmt":"2024-02-14T17:50:20","guid":{"rendered":"https:\/\/www.microsoft.com\/insidetrack\/blog\/?p=6966"},"modified":"2024-02-14T12:24:25","modified_gmt":"2024-02-14T20:24:25","slug":"boosting-internal-audits-at-microsoft-with-audit-digitization-machine-learning","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/insidetrack\/blog\/boosting-internal-audits-at-microsoft-with-audit-digitization-machine-learning\/","title":{"rendered":"Boosting internal audits at Microsoft with audit digitization, machine learning"},"content":{"rendered":"

Imagine sifting through hundreds of photos of cupcakes to find two images of the same cupcake taken in the same place at the same time, within minutes. That\u2019s one of the ways that Microsoft\u2019s Audit, Risk, and Compliance (ARC) team makes sure that the invoices the company pays are accurate and legitimate, and it\u2019s now able to do that better and at larger scale thanks to a new audit digitization project powered by machine learning (ML).<\/p>\n

When an external company submits a payment invoice to Microsoft, it must provide what\u2019s called proof of execution (POE) to Microsoft invoice approvers, showing that the service was indeed performed. Hence the virtual mountain of visual evidence of not just cupcakes but entire lunch buffets, swag orders, promotion campaigns, and more, contained in Microsoft PowerPoint decks, PDFs, and Microsoft Word documents.<\/p>\n

Microsoft\u2019s Audit team in Microsoft Finance periodically reviews subsidiaries to, among other things, make sure that invoice approvers, vendors, and suppliers are following company procurement policies and processes to protect company assets and interests.<\/p>\n

Matching cupcake photos are a flag of possible recycled POE\u2014signaling that a vendor may have reused an existing image, which is not considered legitimate proof of the service provided.<\/p>\n

Fei Guo is a senior data solutions manager for Microsoft Strategy and Solutions Technology, which is dedicated to analyzing business needs for the ARC team and connecting them with solutions. Her business users tasked her with exploring machine learning solutions for detecting recycled POEs so they could easily find those needles in the haystack.<\/p>\n

\u201cFinding reused POE documents is extremely difficult for a human to do,\u201d Guo says. Because manually comparing millions of images isn\u2019t possible, auditors would typically test purchase orders based on random samples. \u201cWe needed a way to detect similar images at scale.\u201d<\/p>\n

[Find out how Microsoft applied Azure Cognitive Services to automate partner claim validation. Learn about automating revenue processing at Microsoft with Power Automate.]<\/p>\n

The Goldilocks algorithm<\/h2>\n

Guo brought the challenge to an engineering team in Microsoft Audit and Compliance within the Microsoft Cloud + AI organization.<\/p>\n

They formed a volunteer team to explore audit digitization using machine learning built entirely on Microsoft Azure as a project for a Microsoft Hackathon, a yearly event where employees from across the company are invited to come up with freeform projects and solutions in an intensive three-day session.<\/p>\n


Fei Guo (left) and Anuj Bansal helped build a new ML-powered audit digitization tool that\u2019s helping Microsoft \u201cfind a needle in the haystack.\u201d (Photos by Fei Guo and Anuj Bansal)<\/figcaption><\/figure>\n

\u201cWe know recycled POE is an industry-wide problem,\u201d says Anuj Bansal, a principal software engineer for Commerce Financial Systems (CFS) who joined the Hackathon. \u201cWe could solve it using new technology in areas of ML that I was excited to learn about.\u201d<\/p>\n

If it worked, audit digitization using Azure technology had the potential to transform the auditing process, from sampling a minuscule proportion of the actual data volume to auditing 100 percent of the company\u2019s data with greater accuracy in far less time.<\/p>\n

At the Hackathon, the team presented a proof of concept for their solution, which they called the Recycled POE tool, and won leadership support to develop it further.<\/p>\n

The first part was figuring out how to extract the data from the invoices, given the many different types of files and formats. Each POE typically generates around 10\u201312 images. ARC tests 10,000 purchase orders on average each year, and the volume grows by around 1 million images annually. The team applied Microsoft Azure Cognitive Services to extract images from MS Invoice POE files, standardizing the process across all document types.<\/p>\n

Next came the bigger challenge: finding the most accurate algorithm.<\/p>\n

They built a custom machine learning tool using an algorithm called a Hierarchical Navigable Small World (HNSW) graph to calculate the similarity score between images.<\/p>\n
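As a rough illustration of what a similarity score between two images can look like, the sketch below compares hypothetical feature vectors with cosine similarity and flags close matches that belong to different purchase orders. It uses an exhaustive pairwise comparison, not an HNSW graph. An HNSW index approximates this kind of nearest-neighbor search so that it stays fast across millions of images. The vectors, the 0.98 threshold, and every name here are assumptions for illustration, not details of the Recycled POE tool.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    # Cosine of the angle between two equal-length feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def find_recycled_candidates(images, threshold=0.98):
    # images: list of (purchase_order_id, image_id, feature_vector) tuples.
    # Returns pairs from different purchase orders scoring at or above threshold.
    matches = []
    for (po_a, id_a, vec_a), (po_b, id_b, vec_b) in combinations(images, 2):
        if po_a == po_b:
            continue  # assumption: only reuse across purchase orders is flagged
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            matches.append((id_a, id_b, round(score, 4)))
    return matches

# Hypothetical feature vectors; a real system would derive them from the images.
sample_images = [
    ('PO-1001', 'cupcake_a.png', [0.90, 0.10, 0.40]),
    ('PO-2002', 'cupcake_b.png', [0.90, 0.10, 0.41]),
    ('PO-3003', 'buffet.png', [0.10, 0.80, 0.20]),
]
candidates = find_recycled_candidates(sample_images)
```

With this sample data, only the two cupcake images are flagged. The exhaustive loop grows quadratically with the number of images, which is why an approximate index becomes attractive at audit scale.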


\u201cWe went through a lot of the standard algorithms,\u201d Bansal says. \u201cSome were too slow; some were too sensitive. Others were faster, but we saw too many false positives and didn\u2019t get the best results. It was a journey to figure out which algorithm was the right one for us.\u201d<\/p>\n

With the \u201cjust right\u201d algorithm, images are now processed using the HNSW graph in batches and saved to a Microsoft Azure SQL database. All of this runs in the background while auditors can focus on other tasks, and the results are delivered to them in a Microsoft Power BI report.<\/p>\n
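The batch-and-store pattern described here might look something like the following minimal sketch. An in-memory SQLite database stands in for the Azure SQL database purely so the example is self-contained. The table name, columns, batch size, and sample rows are all hypothetical.

```python
import sqlite3

def batches(items, size):
    # Yield consecutive slices of at most `size` items.
    for start in range(0, len(items), size):
        yield items[start:start + size]

def store_matches(conn, match_batches):
    # Write each batch of (image_a, image_b, score) rows in its own transaction.
    conn.execute(
        'CREATE TABLE IF NOT EXISTS image_matches '
        '(image_a TEXT, image_b TEXT, score REAL)'
    )
    for batch in match_batches:
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                'INSERT INTO image_matches VALUES (?, ?, ?)', batch
            )

# Hypothetical match rows; in-memory SQLite stands in for the real database.
rows = [('img_%d.png' % i, 'img_%d.png' % (i + 100), 0.99) for i in range(5)]
conn = sqlite3.connect(':memory:')
store_matches(conn, batches(rows, size=2))
stored = conn.execute('SELECT COUNT(*) FROM image_matches').fetchone()[0]
```

Committing per batch keeps each write small and lets a failed batch roll back without losing earlier results. A reporting layer could then query a table like this one.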

Microsoft\u2019s new ML-powered audit digitization tool allows the Microsoft Audit team to methodically track invoices step by step.<\/figcaption><\/figure>\n

After nearly a year of experimenting and fine-tuning, the new ML features were launched last September. The system can process 21,000 images per minute, allowing large-scale detection of similar images submitted as POEs for different purchase orders.<\/p>\n
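To put that throughput in perspective, here is a back-of-the-envelope calculation using only the two figures quoted in this article. It assumes the stated rate holds steadily, which is a simplification.

```python
IMAGES_PER_MINUTE = 21_000          # throughput quoted in the article
IMAGES_ADDED_PER_YEAR = 1_000_000   # annual image growth quoted in the article

minutes_needed = IMAGES_ADDED_PER_YEAR / IMAGES_PER_MINUTE
print(f'{minutes_needed:.1f} minutes')  # prints "47.6 minutes"
```

At that rate, a full year of newly submitted images could in principle be processed in under an hour.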

\u201cThe significance of this is that we\u2019re transforming audit from reacting and sampling to a more live approach where you audit in a way that humans can\u2019t,\u201d says Jose De Jesus Sanchez Rico, a senior software engineering manager for Microsoft\u2019s Core Functional Engineering who managed the team that developed the Hackathon project into production. \u201cIt\u2019s having that perspective of what only ML can do for you.\u201d<\/p>\n

Human learning<\/h2>\n

There\u2019s still more that audit digitization can do for Microsoft.<\/p>\n

The team is working on fine-tuning the results further by filtering out noise in the data, such as logo images contained in documents and email signatures. The ML model can be strengthened further by feeding it more precise image metadata, like time stamps and geotags.<\/p>\n

The Recycled POE tool has also been recently integrated with another innovation of the audit digitization journey: a Microsoft Teams audit digitization assistant bot that auditors can access to perform common functions.<\/p>\n

Taking the solution a step further, the team is also hoping to extend the Recycled POE tool to be used by invoice approvers as a proactive compliance check, rather than a reactive one, to catch mistakes before they\u2019re made in the first place.<\/p>\n

Jagannathan Venkatesan, a principal group engineering manager for Microsoft Finance Management, which oversees the Audit and Compliance team, says that he was especially impressed by the Hackathon team\u2019s perseverance and eagerness to learn more about machine learning.<\/p>\n

\u201cThere were setbacks and challenges from a performance perspective, but the team didn\u2019t give up,\u201d Venkatesan says. \u201cWe worked with the partner engineering teams, got guidance on how to improve, and we pulled it through.\u201d<\/p>\n

That makes them even better prepared for whatever comes next.<\/p>\n

\u201cWe work in Finance, and we work with a lot of data, so data science and ML have always been areas where we want to learn,\u201d Venkatesan says. \u201cSo, we took this as an opportunity to make an impact but at the same time, educate the team so that my engineering workforce is prepared for the technology shifts of tomorrow.\u201d<\/p>\n
