Boosting internal audits at Microsoft with audit digitization, machine learning


Flagging suspicious invoices with recycled images has gotten more precise thanks to a new ML-powered audit digitization tool at Microsoft.

Imagine sifting through hundreds of photos of cupcakes to find two images of the same cupcake taken in the same place at the same time, within minutes. That’s one of the ways that Microsoft’s Audit, Risk, and Compliance (ARC) team makes sure that the invoices the company is paying are accurate and legitimate, and it’s now able to do that better and at larger scale thanks to a new audit digitization project powered by machine learning (ML).

When an external company submits a payment invoice to Microsoft, it must provide what’s called proof of execution (POE) to Microsoft invoice approvers, showing that the service was indeed performed. Hence the virtual mountain of visual evidence of not just cupcakes, but entire lunch buffets, swag orders, promotion campaigns, and more, contained in Microsoft PowerPoint decks, PDFs, and Microsoft Word documents.

Microsoft’s Audit team in Microsoft Finance periodically reviews subsidiaries to, among other things, make sure that invoice approvers, vendors, and suppliers are following company procurement policies and processes to protect company assets and interests.

Matching cupcake photos are a flag of possible recycled POE—signaling that a vendor may have reused an existing image, which is not considered legitimate proof of the service provided.

Fei Guo is a senior data solutions manager for Microsoft Strategy and Solutions Technology, which is dedicated to analyzing business needs for the ARC team and connecting them with solutions. She was tasked by her business users to explore solutions for detecting recycled POEs using machine learning so they could easily find those needles in the haystack.

“Finding reused POE documents is extremely difficult for a human to do,” Guo says. Because manually comparing millions of images isn’t possible, auditors would typically test purchase orders (POs) based on random samples. “We needed a way to detect similar images at scale.”

The Goldilocks algorithm

Guo brought the challenge to an engineering team in Microsoft Audit and Compliance within the Microsoft Cloud + AI organization.

They formed a volunteer team to explore audit digitization using machine learning built entirely on Microsoft Azure as a project for a Microsoft Hackathon, a yearly event where employees from across the company are invited to come up with freeform projects and solutions in an intensive three-day session.

Fei Guo (left) and Anuj Bansal helped build a new ML-powered audit digitization tool that’s helping Microsoft “find a needle in the haystack.” (Photos by Fei Guo and Anuj Bansal)

“We know recycled POE is an industry-wide problem,” says Anuj Bansal, a principal software engineer for Commerce Financial Systems (CFS) who joined the Hackathon. “We could solve it using new technology in areas of ML that I was excited to learn about.”

If it worked, audit digitization using Azure technology had the potential to transform the auditing process, from just using samples at a minuscule proportion of the actual data volume to enabling the company to audit 100 percent of its data with greater accuracy in far less time.

At the Hackathon, the team presented a proof of concept for their solution, which they called the Recycled POE tool, and won leadership support to develop it into a production tool.

The first step was figuring out how to extract the images from the POE files, given the many different file types and formats. Each POE typically generates around 10–12 images. ARC tests 10,000 purchase orders on average each year, and the volume grows by around 1 million images annually. The team applied Microsoft Azure Cognitive Services to extract images from MS Invoice POE files, standardizing the process across all document types.
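To give a feel for the extraction step, here is a minimal sketch that pulls embedded images out of a Word or PowerPoint file using only the Python standard library. It is not the Azure Cognitive Services pipeline the team used; it relies on the fact that .docx and .pptx files are ZIP archives that keep embedded media under `word/media/` and `ppt/media/`. The function names and the extension list are assumptions made for illustration, and PDFs are not covered.

```python
import io
import zipfile

# Extensions treated as images; this list is an assumption for the sketch.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".bmp")


def extract_embedded_images(document_bytes):
    """Return {archive_path: image_bytes} for images embedded in an Office file."""
    images = {}
    with zipfile.ZipFile(io.BytesIO(document_bytes)) as archive:
        for name in archive.namelist():
            # Word stores media under word/media/, PowerPoint under ppt/media/.
            in_media_folder = "/media/" in name
            if in_media_folder and name.lower().endswith(IMAGE_EXTENSIONS):
                images[name] = archive.read(name)
    return images


def build_fake_docx():
    """Build a tiny in-memory archive shaped like a .docx, for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", "<document/>")
        archive.writestr("word/media/image1.png", b"fake-png-bytes")
        archive.writestr("word/media/image2.jpeg", b"fake-jpeg-bytes")
    return buffer.getvalue()


extracted = extract_embedded_images(build_fake_docx())
print(sorted(extracted))
```

Treating every document type as "a container that yields a list of images" is what lets the later comparison steps ignore the original file format.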

Next came the bigger challenge: finding the most accurate algorithm.

They built a custom machine learning tool using an algorithm called a Hierarchical Navigable Small World (HNSW) graph to calculate the similarity score between images.
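An HNSW graph is an index for approximate nearest-neighbor search: it finds the closest feature vectors without comparing every image against every other image. The sketch below shows the exact, brute-force version of that lookup, which an HNSW index would approximate at much larger scale. The feature vectors, image IDs, and the choice of cosine similarity as the score are assumptions for illustration; the article does not say which metric or feature extractor the team used.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, vectors, k=1):
    """Exact k-nearest-neighbor search by brute force over all stored vectors."""
    scored = [(cosine_similarity(query, vec), image_id)
              for image_id, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:k]


# Hypothetical feature vectors; real ones would come from an image model.
features = {
    "po1_cupcake": [0.90, 0.10, 0.30],
    "po2_cupcake": [0.88, 0.12, 0.31],
    "po3_buffet": [0.10, 0.90, 0.20],
}

query_id = "po1_cupcake"
others = {image_id: vec for image_id, vec in features.items() if image_id != query_id}
neighbors = most_similar(features[query_id], others, k=1)
print(neighbors)
```

Brute force costs one comparison per stored image for every query, which is why a graph index becomes attractive once the collection grows into the millions.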

“We went through a lot of the standard algorithms,” Bansal says. “Some were too slow; some were too sensitive. Others were faster, but we saw too many false positives and didn’t get the best results. It was a journey to figure out which algorithm was the right one for us.”

With the “just right” algorithm, images are now processed using the HNSW graph in batches and saved to a Microsoft Azure SQL database. All of this runs in the background while auditors can focus on other tasks, and the results are delivered to them in a Microsoft Power BI report.
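The batch-and-store pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the team's implementation: it uses Python's built-in `sqlite3` as a stand-in for Azure SQL, and the table name, column names, threshold, and sample scores are all hypothetical.

```python
import sqlite3

SIMILARITY_THRESHOLD = 0.95  # hypothetical cutoff for flagging a pair


def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def save_matches(connection, scored_pairs, batch_size=2):
    """Store high-scoring pairs from different POs, one transaction per batch."""
    saved = 0
    for batch in batched(scored_pairs, batch_size):
        rows = [
            pair for pair in batch
            if pair[4] >= SIMILARITY_THRESHOLD and pair[0] != pair[2]
        ]
        with connection:  # commits the batch, or rolls back on error
            connection.executemany(
                "INSERT INTO similar_images VALUES (?, ?, ?, ?, ?)", rows
            )
        saved += len(rows)
    return saved


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE similar_images "
    "(po_a TEXT, image_a TEXT, po_b TEXT, image_b TEXT, score REAL)"
)

# (po_a, image_a, po_b, image_b, score): made-up results for the sketch.
scored_pairs = [
    ("PO-1", "img-1", "PO-2", "img-7", 0.99),
    ("PO-1", "img-1", "PO-1", "img-2", 0.97),  # same PO, so not a recycled POE
    ("PO-3", "img-4", "PO-5", "img-9", 0.40),
    ("PO-4", "img-5", "PO-6", "img-3", 0.96),
    ("PO-7", "img-8", "PO-8", "img-6", 0.10),
]

saved_count = save_matches(connection, scored_pairs)
flagged = connection.execute(
    "SELECT po_a, po_b, score FROM similar_images ORDER BY score DESC"
).fetchall()
print(saved_count, flagged)
```

A reporting layer such as Power BI would then read from a table like `similar_images` rather than from the scoring job itself, which keeps the background processing and the auditor-facing view decoupled.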

Invoicing steps: Job submittal, entering job details, document storage, extracting images, extracting features, and image processing.
Microsoft’s new ML-powered audit digitization tool allows the Microsoft Audit team to methodically track invoices step by step.

After nearly a year of experimenting and fine-tuning, the new ML features were launched last September. The system can process 21,000 images per minute, allowing large-scale detection of similar images submitted as POEs for different purchase orders.
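To put that rate in perspective, the short calculation below applies it to the volumes quoted earlier in the story. It assumes the 21,000 images per minute figure is sustained for the whole run, which the article does not state; the function name is hypothetical.

```python
IMAGES_PER_MINUTE = 21_000  # processing rate quoted in the article


def minutes_to_process(image_count, images_per_minute=IMAGES_PER_MINUTE):
    """Minutes needed at a constant processing rate."""
    return image_count / images_per_minute


# One year of growth: around 1 million images.
yearly_growth_minutes = minutes_to_process(1_000_000)

# 10,000 purchase orders at roughly 10 to 12 images each.
low_estimate_minutes = minutes_to_process(10_000 * 10)
high_estimate_minutes = minutes_to_process(10_000 * 12)

print(round(yearly_growth_minutes, 1))
print(round(low_estimate_minutes, 1), round(high_estimate_minutes, 1))
```

At that rate, a full year of new images fits in well under an hour of processing, which is what makes auditing all of the data, rather than a sample, practical.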

“The significance of this is that we’re transforming audit from reacting and sampling to a more live approach where you audit in a way that humans can’t,” says Jose De Jesus Sanchez Rico, a senior software engineering manager for Microsoft’s Core Functional Engineering who managed the team that developed the Hackathon project into production. “It’s having that perspective of what only ML can do for you.”

Human learning

There’s still more that audit digitization can do for Microsoft.

The team is working on fine-tuning the results further by filtering out noise in the data, such as logo images contained in documents and email signatures. The ML model can be strengthened further by feeding it more precise image metadata, like time stamps and geotags.
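One simple way to picture the noise filtering is a frequency heuristic: an image that shows up in a large share of unrelated purchase orders is more likely a logo or signature graphic than recycled proof. The sketch below is a hypothetical illustration of that idea, not the team's method. It matches only byte-identical images via SHA-256, and the 50 percent threshold and function names are assumptions.

```python
import hashlib
from collections import defaultdict

MAX_PO_SHARE = 0.5  # hypothetical: above this share of POs, treat image as noise


def find_boilerplate_hashes(images, max_po_share=MAX_PO_SHARE):
    """Return hashes of images that appear in more than max_po_share of all POs."""
    pos_by_hash = defaultdict(set)
    all_pos = set()
    for po_id, data in images:
        digest = hashlib.sha256(data).hexdigest()
        pos_by_hash[digest].add(po_id)
        all_pos.add(po_id)
    limit = max_po_share * len(all_pos)
    return {digest for digest, pos in pos_by_hash.items() if len(pos) > limit}


def drop_boilerplate(images, max_po_share=MAX_PO_SHARE):
    """Remove images whose hash was classified as boilerplate."""
    noisy = find_boilerplate_hashes(images, max_po_share)
    return [
        (po_id, data) for po_id, data in images
        if hashlib.sha256(data).hexdigest() not in noisy
    ]


# (po_id, image_bytes): stand-in byte strings instead of real image files.
images = [
    ("PO-1", b"logo"), ("PO-1", b"cupcake"),
    ("PO-2", b"logo"), ("PO-2", b"cupcake"),
    ("PO-3", b"logo"), ("PO-3", b"buffet"),
    ("PO-4", b"logo"), ("PO-4", b"swag"),
]

kept = drop_boilerplate(images)
print(kept)
```

Here the logo appears in all four POs and is dropped, while the cupcake image shared by two POs stays in the set so it can still be flagged as possible recycled POE. An exact hash misses resized or recompressed copies, so a production filter would likely need a similarity-based check instead.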

The Recycled POE tool has also been recently integrated with another innovation of the audit digitization journey: a Microsoft Teams audit digitization assistant bot that auditors can access to perform common functions.

Taking the solution a step further, the team is also hoping to extend the Recycled POE tool to be used by invoice approvers as a proactive compliance check, rather than a reactive one, to catch mistakes before they’re made in the first place.

Jagannathan Venkatesan, a principal group engineering manager for Microsoft Finance Management, which oversees the Audit and Compliance team, says that he was especially impressed by the Hackathon team’s perseverance and eagerness to learn more about machine learning.

“There were setbacks and challenges from a performance perspective, but the team didn’t give up,” Venkatesan says. “We worked with the partner engineering teams, got guidance on how to improve, and we pulled it through.”

That makes them even better prepared for whatever comes next.

“We work in Finance, and we work with a lot of data, so data science and ML have always been areas where we want to learn,” Venkatesan says. “So, we took this as an opportunity to make an impact but at the same time, educate the team so that my engineering workforce is prepared for the technology shifts of tomorrow.”
