Getting data reliability right at Microsoft with scalable engineering solutions

Gowri Krishnan leads Microsoft’s bid to apply modern engineering principles to data management inside the company. (Photo submitted by Gowri Krishnan)

After its employees, data is arguably Microsoft’s most valuable asset. The company relies on data to digitally transform how it operates its businesses, develop products and services, and upgrade the experiences it offers to customers and employees.

To harness the full potential of its data, Microsoft needed a scalable and supportable way to manage these vast flows of information that are the lifeblood of the company.

Praveen Krishnan is one Microsoft employee who needed a tool to manage his team’s data. As an engineering manager on the Marketing Data and Insights team at Microsoft, he leads the team that owns the marketing data and insights used for revenue-based marketing. This data is collected at marketing events and through marketing campaigns, and it’s used to create personalized customer experiences that support engagement and retention.

“We use the insights and intelligence from this data to empower our customers to use Microsoft technology to accelerate their transformation,” says Praveen, a senior software engineering manager in Microsoft Digital.

Clearly, customer data is very sensitive and must be secured and well managed—that’s the top priority of Praveen and his team.

“We need strict privacy controls in place, so we only enable access to the minimum data required for each use case,” Praveen says. “I always ask, ‘Who can access the marketing data that I manage, and why do they need it? What’s the potential business impact?’”

To answer these questions, Praveen used to have to track down every team and user at Microsoft that had used his data and work to prevent unnecessary copies of it. This was a largely manual process.

“There has to be a better way,” he remembers thinking.


Praveen wasn’t alone in facing this problem. There was no easy way to track all the sources, copies, and uses of data across the company. Teams often ended up generating data and creating data products with the same goals, and this duplication caused data proliferation and increased the risk of data exposure.

Gowri Krishnan, a senior software engineer, and her MS Digital Data team were ready to take on these challenges.

“Since teams did not have full visibility into how their data was used, there were multiple copies of the same data across the company,” Gowri says. “Preventing proliferation required manual and time-intensive data governance practices that were hard to scale.”

Such risks are hard to recover from, especially when they’re discovered later in the production process.

“With the scale of our data and the complexity of data exposure risks, it can be challenging to detect and act on every data management risk across Microsoft’s data estate,” Gowri says.
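Multiple copies of the same data are one of the risks Gowri describes. The article doesn’t say how DRE detects them, so as a hedged illustration only: one simple approach is to fingerprint each registered dataset by its schema and contents and group datasets whose fingerprints match. The function names and the in-memory `datasets` mapping below are hypothetical, not part of any Microsoft API.

```python
import hashlib
from collections import defaultdict

# Hypothetical in-memory representation: dataset name -> (column names, rows).
Dataset = tuple[list[str], list[tuple]]


def fingerprint(columns: list[str], rows: list[tuple]) -> str:
    """Hash a dataset's schema and contents, ignoring column and row order."""
    h = hashlib.sha256()
    # Sort column positions by name so column order doesn't change the hash.
    order = sorted(range(len(columns)), key=lambda i: columns[i])
    h.update("|".join(columns[i] for i in order).encode() + b"\n")
    # Reorder every row to the sorted column order, then sort the rows.
    for row_repr in sorted(repr(tuple(row[i] for i in order)) for row in rows):
        h.update(row_repr.encode() + b"\n")
    return h.hexdigest()


def find_duplicate_copies(datasets: dict[str, Dataset]) -> list[list[str]]:
    """Return groups of dataset names whose schema and contents are identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, (columns, rows) in datasets.items():
        groups[fingerprint(columns, rows)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

Hashing full contents doesn’t scale to very large tables; a production system would more likely rely on lineage records or sampled fingerprints, so treat this purely as a sketch of the idea.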

It was clear that employees like Praveen needed scalable engineering solutions for holistic data reliability to secure their data assets, prevent data proliferation, and enable compliant use of their data.

“Data reliability is foundational to enabling responsible data democratization and developing impactful data applications,” Gowri says. “We saw an opportunity to deliver proactive and scalable engineering solutions for enterprise data management and operations.”

Enterprise Data Reliability Engineering (DRE) is Microsoft’s approach to scaling its data management and operations capabilities. The DRE approach centers on proactive and scalable engineering solutions for data reliability, where a reliable enterprise data estate is secure, discoverable, high-quality, compliant, and operationally efficient.

“Enterprise DRE offers scalable solutions for each facet of big data operations,” Gowri says. “It benefits everyone along the data management pipeline, from data publishers to data consumers.”


Democratizing enterprise data management

The DRE team’s goal is to deliver scalable and intelligent engineering solutions that proactively prevent data management risks.

“Our goal is to deliver scalable solutions as DRE foundations that enable every data publisher and data consumer at Microsoft to contribute to our enterprise data estate and adhere to our data management standard,” Gowri says.


Gowri and her team are onboarding teams in Microsoft Digital and across Microsoft onto the DRE foundation.

“With DRE, we want to proactively detect, mitigate, and resolve data management risks that impact data reliability,” Gowri says. “We want scalable solutions with built-in intelligence to proactively prevent, detect, and address data security, reliability, and compliance risks. This enables us to scale data applications to support digital transformation at Microsoft.”

The DRE foundation gives employees like Praveen a single, integrated view of Microsoft’s data assets, along with insights into their management and operational health, so they can proactively detect and mitigate data reliability risks through automated solutions and human action. It does this by capturing telemetry on every facet of data management, detecting anomalous conditions, and mitigating them by triggering automated actions and human engagement workflows.
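The loop described above (capture telemetry, detect an anomaly, then route it to either an automated action or a human workflow) can be sketched in a few lines. The article doesn’t specify the detection method, so this sketch assumes a simple z-score test over a metric’s history; the metric names and the `revoke_unregistered_copy_access` action are hypothetical placeholders.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Optional


@dataclass(frozen=True)
class TelemetryPoint:
    asset_id: str
    metric: str   # hypothetical metric name, e.g. "copies" or "distinct_consumers"
    value: float


def is_anomalous(history: list[float], latest: float, z_threshold: float = 3.0) -> bool:
    """Flag `latest` if it lies more than z_threshold standard deviations from the history mean."""
    if len(history) < 2:
        return False  # too little history to judge
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return latest != mu  # flat history: any change is unusual
    return abs(latest - mu) / sigma > z_threshold


# Hypothetical mapping of metrics to remediations considered safe to run without a human.
AUTOMATED_ACTIONS = {"copies": "revoke_unregistered_copy_access"}


def route(point: TelemetryPoint, history: list[float]) -> Optional[str]:
    """Return None if the point looks normal, else an automated action or a human review task."""
    if not is_anomalous(history, point.value):
        return None
    action = AUTOMATED_ACTIONS.get(point.metric)
    if action is not None:
        return f"auto:{action}:{point.asset_id}"
    return f"human:open_review:{point.asset_id}"
```

A real system would tune thresholds per metric and decide carefully which anomalies are safe to remediate automatically; the split here only illustrates the two paths the article names.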

“The DRE foundation gives me actionable insights on my data estate and on how my users are using my data,” Praveen says. “With such optics in a single place, I can easily monitor the health of my data estate and use the built-in controls to fulfill essential data compliance and governance requirements.”

Praveen, and soon others who manage data at Microsoft, can use DRE to gain full line-of-sight visibility into their data estate, the health of their data assets, and how teams across Microsoft use their data. These insights help Microsoft teams manage their data estates efficiently and in compliance with data management requirements.

“Rather than building and operating services to ensure that our data is compliant, we can focus our engineering investments on differentiated marketing data applications,” Praveen says.

Building an enterprise DRE foundation

Building reusable foundations for proactively managing each of these facets of data reliability is key to enabling an organization to responsibly scale its data use for greater value and impact.

“Data management solutions must be scalable across an enterprise,” Gowri says. “They must be built and operated with consistency and as shared capabilities.”

The enterprise DRE foundation is made of the following components:

  • The enterprise data estate common data model is a set of standardized and extensible schemas to capture data and metadata for all assets in an enterprise data estate and their producers and consumers. Assets in an enterprise data estate include data infrastructure, data, data products built using data, and apps and services that use data products for insights and intelligence.
  • The enterprise data estate common data service is the API layer used to capture and manage data and metadata pertaining to data assets.
  • The enterprise data estate graph relates data assets to their producers and consumers to build the enterprise data estate lineage. It’s also used to extract meaningful information and insights, which serve as the foundation for scaling intelligent actions based on those insights.
  • The enterprise data estate portal is the single destination for enterprise data estate insights and intelligence. It’s used to monitor and manage the health and reliability of the enterprise data estate.
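To make these components more concrete, here is a minimal, in-memory sketch of how a common data model and an estate graph might fit together to answer Praveen’s question of who uses his data. The article doesn’t publish the actual schemas or the common data service API, so every class, field, and method name below (`Asset`, `AssetKind`, `DataEstateGraph`, `add_lineage`, `downstream`) is hypothetical.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from enum import Enum


class AssetKind(Enum):
    # The four asset types the article lists for an enterprise data estate.
    INFRASTRUCTURE = "infrastructure"
    DATA = "data"
    DATA_PRODUCT = "data_product"
    APP = "app"


@dataclass(frozen=True)
class Asset:
    asset_id: str
    kind: AssetKind
    owner: str  # team that produces or operates the asset


class DataEstateGraph:
    """Directed lineage graph: an edge A -> B means B consumes data from A."""

    def __init__(self) -> None:
        self.assets: dict[str, Asset] = {}
        self._consumers: dict[str, set[str]] = defaultdict(set)

    def register(self, asset: Asset) -> None:
        self.assets[asset.asset_id] = asset

    def add_lineage(self, producer_id: str, consumer_id: str) -> None:
        # Both ends must be registered before they can be linked.
        if producer_id not in self.assets or consumer_id not in self.assets:
            raise KeyError("register both assets before adding lineage")
        self._consumers[producer_id].add(consumer_id)

    def downstream(self, asset_id: str) -> set[str]:
        """All assets that directly or transitively consume asset_id (breadth-first search)."""
        seen: set[str] = set()
        queue = deque([asset_id])
        while queue:
            current = queue.popleft()
            for nxt in self._consumers.get(current, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen
```

Calling `downstream` on a dataset returns every data product and app that depends on it, directly or transitively, and mapping those assets to their `owner` field yields the consuming teams. Here lineage is added through explicit `add_lineage` calls; the real graph would presumably be populated from metadata captured through the common data service.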

Scaling DRE across the enterprise and beyond

Although visibility across a data estate is useful, the true value proposition and differentiation of DRE come from the insights and intelligence that can be used to mitigate and prevent data management risks.

“Getting actionable insights served to my team empowered us to reduce data fragmentation, optimize the deployment infrastructure choices of data products, and consolidate redundant consumer use cases of our data,” Praveen says.

Gowri and her team are applying what they learned from onboarding Praveen’s marketing data team to incrementally onboard more teams from across Microsoft onto the DRE foundations.

Ultimately, the DRE foundations are key to enabling Microsoft’s mission to responsibly democratize data. The learnings and reusable solutions from this journey can also benefit Microsoft’s customers, and Gowri and her team plan to share them as they make progress.

“We are starting to realize the benefits of our DRE approach and foundations internally and have more to do and learn,” Gowri says. “We see these benefits as also being broadly applicable to Microsoft’s customers, and hope to share our learnings with the broader community.”

Read about how Microsoft turned to DevOps engineering practices to democratize data access at Microsoft.

Learn how Microsoft powers digital transformation with modern data foundations.

Find out how Microsoft unleashes the power of data with a modern data platform.

Check out how Microsoft designed a modern data catalog to enable business insights.
