This content has been archived, and while it was correct at time of publication, it may no longer be accurate or reflect the current situation at Microsoft.
Setting up and managing data products without an engineer didn’t seem possible.
But Aishwarya Dinde, a software engineer with Microsoft’s Enterprise 360 Data Intelligence team, discovered otherwise as she looked for a way to operationalize several machine learning models under a single platform. Her team sits in Microsoft Digital, the engineering organization at Microsoft that builds and manages the products, processes, and services that the company runs on.
“Our team builds and operates solutions for enterprise infrastructure and networking analytics and intelligence,” Dinde says. “We have several opportunities where AI and machine learning can help. So, we were looking for engineering practices to just build, deploy, and monitor a model. However, when we tried existing resources, we encountered challenges in implementing production and monitoring.”
Machine learning, like any application of data with business value, is a type of data product. When improperly deployed or managed, machine learning is susceptible to a variety of obstacles, including scalability, compliance, fragmentation, governance, visibility, and security.
Knowing the stakes, Dinde set out to find a solution that would empower the team without generating additional manual workload.
Fortunately, Guru Prasad, principal engineering lead on Microsoft Digital’s Data team, had already developed a solution: a new platform to simplify the process of turning machine learning models and broader types of data products into reusable products. The Data Product DevOps platform developed by Prasad and his team enables data practitioners such as data engineers, data analysts, and data scientists to turn data products into scalable, resilient, and compliant services without needing to be software or operations engineers.
It was an unexpected but welcome solution for the Enterprise 360 Data Intelligence team.
But what started out as an answer for Dinde and the Enterprise 360 Data Intelligence team ended up being a surprise benefit to Prasad and the Data Product DevOps platform.
[Find out how Microsoft turned to DevOps engineering practices to democratize access to data. Learn how Microsoft powers digital transformation with modern data foundations. Check out how Microsoft designed a modern data catalog to enable business insights.]
Starting from scratch with a new machine learning data product
Operationalizing data products like machine learning models takes a significant amount of time and requires modern software and operations engineering know-how to do well. When that expertise is missing, data and data products can become fragmented, which increases exposure risk and reduces operational efficiency and compliance.
According to Prasad, fragmentation, or duplicate data products, makes it harder to maintain visibility into how many data products exist within your organization. Without proper lineage tracing, it’s almost impossible to know where and how data products are being used. Additionally, duplicate data products consume valuable resources, diminish quality, and proliferate data exposure risks.
“Every data science team needs help in starting a project,” Prasad says. “The engineers create repositories with prescriptive code templates, build release pipelines, unit and integration test cases, and help in securing artifacts from access breaches and misuse, while the data scientists write the model. The project goes through the prebuild, build, deploy and run, and operations and insights phases.”
These phases include a range of activities, such as preparing the data, building out infrastructure, deploying the model, and monitoring outcomes and quality.
“The challenge is each phase requires the data scientist and data engineering team to come together and run certain steps,” Prasad says.
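To make the phase sequence concrete, here is a minimal Python sketch of a project moving through the four phases Prasad names. Only the phase names come from the article; the `run_phases` and `ready_for_production` helpers and the idea of one callable per phase are assumptions made for illustration, not the platform's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Phase names are taken from the article; the rest of this module is hypothetical.
PHASES = ["prebuild", "build", "deploy_and_run", "operations_and_insights"]


@dataclass
class PhaseResult:
    phase: str
    ok: bool
    detail: str = ""


def run_phases(steps: Dict[str, Callable[[], bool]]) -> List[PhaseResult]:
    """Run each phase in order and stop at the first missing or failing one."""
    results: List[PhaseResult] = []
    for phase in PHASES:
        step = steps.get(phase)
        if step is None:
            results.append(PhaseResult(phase, False, "no step registered"))
            break
        ok = step()
        results.append(PhaseResult(phase, ok, "" if ok else "step failed"))
        if not ok:
            break
    return results


def ready_for_production(results: List[PhaseResult]) -> bool:
    """A model is ready only if every phase ran and succeeded."""
    return len(results) == len(PHASES) and all(r.ok for r in results)
```

In this sketch, a team that has steps for only the first two phases gets a result list ending in a failed `deploy_and_run` entry, so `ready_for_production` returns `False`.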
Without these phases, you can’t push a model into production. But what if you don’t have an engineer on your team with these skills?
Dinde knew that her team still had to get their data product right.
“Machine learning is new to us, but we deal with a lot of data,” Dinde says. “To support and operationalize machine learning use cases, we needed a platform to operationalize the model and monitor its health.”
The Enterprise 360 Data Intelligence team’s use cases for building a machine learning model were promising, but without a way to implement best practices, like tracing lineage and monitoring for duplication, they were in limbo.
That’s where the Data Product DevOps platform comes into play.
“What engineers used to do is now handled by the platform,” Prasad says. “What used to take two sprints now takes a day of work, completely handled by the data scientists. Data scientists no longer have to worry about all of these steps, it’s all baked into the platform for them.”
And the business problems?
The Data Product DevOps platform captures metadata to eliminate the risks of proliferation and fragmentation.
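The article does not describe how the platform uses that metadata, so the following Python sketch is only one plausible approach. It assumes that two data products built from the same sources with the same output schema are likely duplicates, and it keeps enough lineage to answer "which products use this source?" The `DataProductMetadata`, `fingerprint`, and `Registry` names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DataProductMetadata:
    name: str
    owner: str
    sources: Tuple[str, ...]        # upstream datasets (lineage)
    output_schema: Tuple[str, ...]  # output column names


def fingerprint(meta: DataProductMetadata) -> str:
    """Hash the sources and schema, ignoring order, name, and owner."""
    payload = json.dumps(
        {"sources": sorted(meta.sources), "schema": sorted(meta.output_schema)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Registry:
    """Hypothetical in-memory catalog of registered data products."""

    def __init__(self) -> None:
        self._by_fingerprint: Dict[str, List[DataProductMetadata]] = {}

    def register(self, meta: DataProductMetadata) -> List[DataProductMetadata]:
        """Record a product and return earlier products with the same fingerprint."""
        key = fingerprint(meta)
        existing = list(self._by_fingerprint.get(key, []))
        self._by_fingerprint.setdefault(key, []).append(meta)
        return existing

    def consumers_of(self, source: str) -> List[str]:
        """Lineage lookup: names of all products that read from a source."""
        return sorted(
            m.name
            for metas in self._by_fingerprint.values()
            for m in metas
            if source in m.sources
        )
```

A non-empty return value from `register` is a prompt to reuse the existing product rather than build a second copy. A real catalog would need a fuzzier notion of similarity than an exact hash.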
Not just a partnership, a collaboration
After watching a demo of the Data Product DevOps platform, Dinde saw an answer to her team’s problems. “Why not leverage an existing platform instead of building a brand-new solution?” Dinde says.
But first, she had a few questions for Prasad.
“We told him our requirements as part of batch processing deployment in our team,” Dinde says. “We have existing resources already created—can we reuse them instead of recreating them?”
Prasad assured them that the platform could support the different types of machine learning models, and that it could support new and existing Microsoft Azure resources.
“We created an abstraction layer to help data practitioners build, deploy, and operate data products as scalable, resilient, and compliant services,” Prasad says. “By using the abstraction layer, data practitioners can productionalize their data products without needing to be software or operations engineers.”
The platform’s advantage, a simple onboarding experience built over a unified set of offerings that assured modern engineering practices, sold Dinde. “Once I saw his business argument, I was convinced the platform could help us focus on modelling,” she says.
It was a positive relationship from the start, but also a unique one.
Dinde and her team started building out the use cases from base templates made available in the Data Product DevOps platform within a sandbox environment.
The results were immediate.
“It helped our data scientists overcome common issues and focus on developing the model instead of engineering tasks,” Dinde says. “The Data Product DevOps platform addresses compliance and security requirements so that we can just come up with the model code and deploy according to our requirements.”
But the relationship goes beyond just providing a solution.
Dinde and her team are now contributing code and updating templates to improve functions within the platform. “We realized we could reuse the existing templates to meet other needs by making small modifications,” Dinde says.
Prasad’s team was supportive of the contributions.
“Aishwarya’s not just a partner getting onboarded to use the platform,” Prasad says. “Her team is now involved in developing new capabilities that will also serve future users.”
The platform was built with the most important and valuable data product capabilities already available, but by opening the solution up to feedback, the Data Product DevOps platform continues to get stronger.
“It’s an incremental agile process that will keep us ahead of the competition,” Prasad says.
What’s next for the Data Product DevOps platform?
Prasad is proud of the progress he and the Data team have made, but knows there’s more work to be done.
“We want to get to a proactive state,” Prasad says. “If I want to see the health of the model, the platform will provide me with metrics, including visibility into health and infrastructure cost. We can continuously track business outcomes of the model.”
In the future, the Data Product DevOps platform will be able to proactively identify fragmentation, anomalies, and redundancies. A dashboard will make it even easier for data scientists to evaluate their models and make necessary adjustments. Teams who have onboarded their data can set up alerts to monitor the health of a data product.
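As a rough illustration of what such health alerts could look like, the Python sketch below evaluates threshold rules against a snapshot of metrics. The metric names (`accuracy`, `daily_cost_usd`, `latency_ms`), the `AlertRule` type, and the `evaluate` function are all assumptions for this example; the article does not specify the platform's alerting interface.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AlertRule:
    metric: str
    threshold: float
    direction: str  # "above" or "below"

    def __post_init__(self) -> None:
        if self.direction not in ("above", "below"):
            raise ValueError(f"unknown direction: {self.direction}")


def evaluate(rules: List[AlertRule], metrics: Dict[str, float]) -> List[str]:
    """Return one alert message per breached rule or missing metric."""
    alerts: List[str] = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is None:
            # A metric that stops reporting is itself a health signal.
            alerts.append(f"{rule.metric}: no data")
            continue
        if rule.direction == "above":
            breached = value > rule.threshold
        else:
            breached = value < rule.threshold
        if breached:
            alerts.append(
                f"{rule.metric}: {value} is {rule.direction} {rule.threshold}"
            )
    return alerts
```

For example, a rule that flags accuracy below 0.9 fires for a reading of 0.85, while a rule on a metric that is missing from the snapshot produces a "no data" alert rather than passing silently.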
And that has Dinde excited to see what’s next for the Data Product DevOps platform.
“We’re always looking for the best solution, especially if it’s end-to-end,” Dinde says. “Data scientists can focus on model development, and Data Product DevOps will take care of model operationalization and health monitoring during build and run phases. This platform brings everything together.”
The partnership between the Microsoft Digital Data team and the Enterprise 360 Data Intelligence team started as a solution offering, but contributions from both teams are turning the platform into something even better.
“We saw an opportunity to create an end-to-end solution to data product democratization. We now have a way to achieve quality, reduce fragmentation, improve lineage tracing, strengthen security, enable governance, and boost operational excellence,” Prasad says. “The end goal of the DevOps data platform is to operationalize all data products by having all the engineering rigors in place without the requirement of a data engineer.”
Dinde echoes the statement.
“The flexibility for not only one model, but any model, means a data scientist can focus on their use cases while the platform takes care of all aspects of operationalization,” she says.