Azure Data and Storage Archives - Inside Track Blog
http://approjects.co.za/?big=insidetrack/blog/tag/azure-data-and-storage/

Instrumenting ServiceNow with Microsoft Azure Monitor
http://approjects.co.za/?big=insidetrack/blog/instrumenting-servicenow-with-microsoft-azure-monitor/
Mon, 01 Apr 2024 14:50:48 +0000

At Microsoft we have integrated our ServiceNow environment with Microsoft Azure Monitor to create a comprehensive, end-to-end monitoring solution that helps us ensure performance, reliability, and service excellence across the platform. Combining ServiceNow with Azure Monitor helps us monitor the availability, performance, and usage of the ServiceNow platform and provide actionable insights to our engineering team. Our new monitoring environment results in greater platform reliability and health and a better experience for all our ServiceNow users.

We on the Microsoft Digital Employee Experience (MDEE) team support more than 170,000 employees and partners in more than 150 countries and regions. We use ServiceNow to meet several service and support needs for our organization, including incident management, problem management, service-level agreement (SLA) measurement, request fulfillment, self-service support, and knowledge management. ServiceNow provides built-in, real-time reporting and dashboarding for these areas. Global Helpdesk, our support and service management organization, receives more than 3,000 incoming service requests each day.

We have been using ServiceNow as our primary service and support platform since 2015. In July 2019, we established a strategic partnership with ServiceNow to help accelerate digital transformation for our customers. As part of this collaborative effort, we have been using the ServiceNow platform to automate internal support processes while helping ServiceNow to develop and improve their platform to create a more reliable, efficient, and trusted framework for employee support.

[Learn how we’re using Microsoft Teams and ServiceNow to enhance end-user support.]

Examining ServiceNow monitoring at Microsoft

We’ve used ServiceNow’s native reporting features to better understand our support environment since we started using the product in 2015. ServiceNow offers a robust reporting module: users can generate ad hoc and scheduled reports to view current data or to visualize and analyze historical data from most major ServiceNow modules. ServiceNow also provides an integrated report designer interface and integrated dashboarding capabilities.

However, the massive scope of our ServiceNow environment makes end-to-end visibility of ServiceNow functionality difficult to achieve with the built-in ServiceNow reporting features. While we can report on many aspects of ServiceNow’s business and process-oriented functionality, we wanted more comprehensive instrumentation of ServiceNow functionality from the perspective of every role that uses ServiceNow. These roles include support agents, engineers, product owners, program managers, external partners, and users. We identified several areas of ServiceNow monitoring and reporting that we wanted to improve, including:

  • Overall platform health monitoring. While ServiceNow’s reporting provided great data for the processes happening within each module, we couldn’t use it to observe the platform itself, including the functional components and architecture that kept ServiceNow running.
  • End-to-end monitoring of business and technical functionality. We wanted to monitor ServiceNow from both business-oriented and technology-oriented perspectives, end to end. Our support and service metrics were important, but so was knowing that the systems that our support and service teams use were operating properly and effectively.
  • Proactive reporting. We wanted a less reactive monitoring environment in which we could actively search for issues and determine the health status of any or all parts of our ServiceNow architecture and functionality.
  • Near real-time monitoring. We wanted near real-time monitoring for all aspects of ServiceNow, not just the business-process results from each module, such as incident management or problem management. We needed to know immediately if an underlying technical component had an issue.
  • Comprehensive objective reporting. We wanted a better way to monitor ServiceNow from the outside, with an objective monitoring tool that wasn’t part of the ServiceNow platform.

Extending ServiceNow monitoring with Azure Monitor

To address the challenges and make the improvements we identified, we in MDEE have integrated Microsoft Azure Monitor with ServiceNow to create more robust reporting capabilities. Our solution is built on Azure Monitor, using Azure Monitor Application Insights to store ServiceNow data that we extract and transform through a combination of ServiceNow’s native reporting outputs and an internally developed app that translates the data and loads it into Application Insights.

Capturing complete data from ServiceNow

ServiceNow provides several native data-output streams. While the end-to-end process of capturing ServiceNow data in Application Insights involved a complex data flow, retrieving raw data from ServiceNow was simple. We used two of ServiceNow’s default data-output streams to capture data for Application Insights:

  • Node and semaphore information from xmlstats.do. The xmlstats.do page provides comprehensive node and semaphore data. ServiceNow nodes host core functionality, while semaphores define how the load for those nodes is managed and distributed.
  • Usage and business process data from transaction logs. ServiceNow transaction logs contain records for all browser activity and REST API usage for an instance, including performance metrics such as response time and network latency.

Bridging the gap from ServiceNow to Application Insights with Azure Function Apps

We’re using two approaches to extract this data from ServiceNow, one for each of the data types. Our push model assembles data in ServiceNow and pushes it to a REST API endpoint for consumption, while the pull model collects data from a ServiceNow API endpoint. Both models are built on a Microsoft Azure Function App that we call the Translator app. We use the Translator app to translate ServiceNow’s telemetry output, transform the data into usable JSON-formatted output, and place the data in Application Insights.

Push model for platform data in nodes and semaphores

Our push model is rooted in an automation routine in ServiceNow that assembles and exports node and semaphore data to the Translator app. This data is collected from the xmlstats.do page in ServiceNow in five-minute increments and pushed to the Translator app in 15-minute intervals. The automation for the push model is built entirely in ServiceNow Studio, ServiceNow’s built-in editor for custom applications. The push model collects data for system health, notification events, and other key datapoints within the ServiceNow node or semaphore architecture. The push model works in near real-time and is best suited to smaller collections of datapoints or when we must perform calculations on the data points prior to exporting the data.
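To make the push model concrete, here’s a minimal sketch of what an HTTP-triggered receiver for node and semaphore batches could look like. This is not the actual Translator app: the route, the payload field names, and the use of the legacy applicationinsights Python package are assumptions made for illustration only.

```python
# Hypothetical sketch of a push-model receiver: an HTTP-triggered Azure Function
# that accepts node/semaphore batches posted by ServiceNow and forwards them to
# Application Insights as custom events. Field names are illustrative placeholders.
import os

import azure.functions as func
from applicationinsights import TelemetryClient  # legacy Application Insights SDK for Python


def main(req: func.HttpRequest) -> func.HttpResponse:
    tc = TelemetryClient(os.environ["APPINSIGHTS_INSTRUMENTATIONKEY"])

    try:
        batch = req.get_json()  # expected: a list of node/semaphore stat records
    except ValueError:
        return func.HttpResponse("Body must be JSON.", status_code=400)

    for record in batch:
        # Track each record as a custom event; properties become searchable
        # dimensions in Application Insights and Log Analytics.
        tc.track_event(
            "ServiceNowNodeStats",
            properties={
                "node": record.get("node_name"),
                "semaphore_set": record.get("semaphore_set"),
                "collected_at": record.get("collected_at"),
            },
            measurements={
                "available_semaphores": float(record.get("available", 0)),
                "queue_depth": float(record.get("queue_depth", 0)),
            },
        )

    tc.flush()  # send buffered telemetry before the function returns
    return func.HttpResponse(f"Ingested {len(batch)} records.", status_code=200)
```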

Pull model for business data in transaction logs

The pull model is designed to manage a much larger data load. We use the pull model to extract all transaction log data from ServiceNow and place it into Application Insights. The pull model’s functionality is based in the Translator app. Using Azure Function App capability, the Translator app accesses a ServiceNow REST API endpoint and pulls data from a read replica database. Because we use a read replica database in ServiceNow, requests from the pull model don’t affect production functionality or affect the performance of production databases. We pull approximately 2 million to 6 million transactions each day from ServiceNow into Application Insights by using the Translator app.
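For the pull side, the sketch below pages through transaction-log records over ServiceNow’s REST Table API and forwards them to Application Insights. Treat it as an assumption-laden illustration rather than the production Translator app: the instance URL, the syslog_transaction table name, the selected fields, and the basic-auth credentials are all placeholders.

```python
# Hypothetical sketch of the pull model: page through ServiceNow transaction-log
# records over the REST Table API and forward them to Application Insights.
import os

import requests
from applicationinsights import TelemetryClient

INSTANCE = "https://yourinstance.service-now.com"   # placeholder instance URL
TABLE = "syslog_transaction"                        # assumed transaction-log table
PAGE_SIZE = 1000

tc = TelemetryClient(os.environ["APPINSIGHTS_INSTRUMENTATIONKEY"])
session = requests.Session()
session.auth = (os.environ["SN_USER"], os.environ["SN_PASSWORD"])
session.headers.update({"Accept": "application/json"})

offset = 0
while True:
    resp = session.get(
        f"{INSTANCE}/api/now/table/{TABLE}",
        params={
            "sysparm_limit": PAGE_SIZE,
            "sysparm_offset": offset,
            "sysparm_fields": "sys_created_on,url,response_time,type",
        },
        timeout=60,
    )
    resp.raise_for_status()
    rows = resp.json().get("result", [])
    if not rows:
        break

    for row in rows:
        # One custom event per transaction; numeric fields go in measurements.
        tc.track_event(
            "ServiceNowTransaction",
            properties={"url": row.get("url"), "type": row.get("type")},
            measurements={"response_time_ms": float(row.get("response_time") or 0)},
        )

    offset += PAGE_SIZE

tc.flush()
```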

 

The ServiceNow and Microsoft Azure Monitor solution architecture: data sources connect to the Translator app using push and pull models.

Creating actionable reporting with Application Insights

After our Translator app transforms and loads data from ServiceNow into Application Insights, we can take advantage of a robust and highly configurable reporting and monitoring environment. We can use Application Insights to monitor technical and business processes across the entire ServiceNow platform. Our Application Insights instance is the primary source for all our querying, reporting, and dashboarding activities for ServiceNow.

Reporting and dashboarding

With the data in Application Insights, we can use any reporting tools included with or compatible with Microsoft Azure Monitor to query and visualize our data. We use several important reporting tools, including:

  • Microsoft Power BI. We use Power BI for most of our dashboarding needs for end-to-end ServiceNow monitoring. Power BI offers a simple yet powerful interface that users, such as our program managers, can use to create their own insights into ServiceNow performance and functionality.
  • Log Analytics. Our engineers use Log Analytics extensively to quickly locate and identify specific transaction-log behavior or trends. The Kusto Query Language (KQL) that Log Analytics uses makes it easy for engineers to find what they’re searching for, especially if they’re familiar with the underlying dataset and schema of the ServiceNow data sources. A sample query appears in the sketch below.
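Once the telemetry lands in a workspace-based Application Insights resource, engineers can also query it programmatically. The sketch below uses the azure-monitor-query SDK; the workspace ID is a placeholder, and the AppEvents table, event name, and property keys assume telemetry shaped like the hypothetical examples earlier in this story.

```python
# Hypothetical sketch: query the ingested ServiceNow telemetry with the
# azure-monitor-query SDK. Adjust the KQL to match how your telemetry is shaped.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

WORKSPACE_ID = "<log-analytics-workspace-guid>"  # placeholder

# Average response time per transaction type over the last 24 hours.
KQL = """
AppEvents
| where Name == "ServiceNowTransaction"
| summarize avg_response_ms = avg(todouble(Measurements["response_time_ms"]))
    by tostring(Properties["type"])
| order by avg_response_ms desc
"""

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(WORKSPACE_ID, KQL, timespan=timedelta(days=1))

for table in response.tables:
    for row in table.rows:
        print(dict(zip(table.columns, row)))
```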

Key Takeaways
Microsoft Azure Monitor provides a flexible and consistent platform on which we’ve built a broad and comprehensive monitoring solution for ServiceNow. Our new solution, built in Azure, helps us ensure that we deliver a reliable service as our ServiceNow environment grows and changes with our business. Combining ServiceNow with Azure Monitor helps us monitor the availability, performance, and usage of the platform. We can provide actionable insights to our engineering team, which results in higher platform reliability and health and a better experience for all our ServiceNow users.

  • ServiceNow is extensible enough to support custom apps that pull data out of your instances, and Microsoft Azure gives you the flexibility to ingest that data.
  • Monitor system health and create alerts and notifications in near real-time when certain metrics fall below thresholds. With the new solution, we can capture this data across the entire ServiceNow scope.
  • Track service health over time, correlate, and trend data. Historical and trend-based reporting helps us to anticipate upcoming events and adjust to meet upcoming demand. We can also more accurately predict failures and outages across the environment.
  • Report on feature usage within ServiceNow. This data empowers our program managers and engineers to understand how our customers are using, or not using, the features and modules that we release, helping us determine the return on our investment.

How Microsoft kept its underwater datacenter connected while retrieving it from the ocean
http://approjects.co.za/?big=insidetrack/blog/how-microsoft-kept-its-underwater-datacenter-connected-while-retrieving-it-from-the-ocean/
Wed, 06 Dec 2023 15:05:03 +0000

[Editor’s note: This content was written to highlight a particular event or moment in time. Although that moment has passed, we’re republishing it here so you can see what our thinking and experience was like at the time.]

When Microsoft announced its plan to build an underwater datacenter, Lathish Kumar Chaparala was excited.

“During the initial rollout of Project Natick, I used to log on to their website and watch the live feed of the underwater camera that was mounted on the datacenter,” says Chaparala, a senior program manager on the networking team in Microsoft Digital, the engineering organization at Microsoft that builds and manages the products, processes, and services that Microsoft runs on.

Little did he know that he and his team would later be brought in to extend the network connectivity of this underwater datacenter so it could be safely fished out of the sea.

But the story begins much earlier than that.

The idea of an underwater datacenter came out of ThinkWeek, a Microsoft event where employees shared out-of-the-box ideas that they thought the company should pursue. One creative idea was put forth by employees Sean James and Todd Rawlings, who proposed building an underwater datacenter powered by renewable ocean energy that would provide super-fast cloud services to crowded coastal populations.

Their idea appealed to Norm Whitaker, who led special projects for Microsoft Research at the time.

Out of this, Project Natick was born.

Shepperd (right) and Samuel Ogden test the underwater datacenter from the power substation where the datacenter connects to land, just off the coast of the Orkney Islands. (Photo by Scott Eklund | Red Box Pictures)

“Norm’s team was responsible for making the impossible possible, so he started exploring the viability of an underwater datacenter that could be powered by renewable energy,” says Mike Shepperd, a senior research and development engineer on the Microsoft Research team who was brought on to support research on the feasibility of underwater datacenters.

It quickly became a Microsoft-wide effort that spanned engineering, research, and IT.

“We saw the potential benefit to the industry and Microsoft,” Shepperd says. “People responded to our work as if we were going to the moon. In our eyes, we were just fulfilling our charter—taking on challenging problems and coming up with solutions.”

Researchers on the project hypothesized that having a sealed container on the ocean floor with a low-humidity nitrogen environment and cold, stable temperatures would better protect the servers and increase reliability.

“Once you’re down 20 to 30 meters into the water, you’re out of the weather,” Shepperd says. “You could have a hurricane raging above you, and an underwater datacenter will be none the wiser.”

[Read about how Microsoft is reducing its carbon footprint by tracking its internal Microsoft Azure usage. Find out how Microsoft Digital is using a modern network infrastructure to drive transformation at Microsoft.]

Internal engineering team steps up

The Project Natick team partnered with networking and security teams in Microsoft Digital and Arista to create a secure wide-area network (WAN) connection from the underwater datacenter to the corporate network.

“We needed the connectivity that they provided to finish off our project in the right way,” Shepperd says. “We also needed that connectivity to support the actual decommissioning process, which was very challenging because we had deployed the datacenter in such a remote location.”

In the spring of 2018, they deployed a fully connected and secure datacenter 117 feet below sea level in the Orkney Islands, just off the coast of Scotland. After it was designed, set up, and gently lowered onto the seabed, the goal was to leave it untouched for two years. Chakri Thammineni, a network engineer in Microsoft Digital, supported these efforts.

Chakri Thammineni, a network engineer at Microsoft Digital, and his team came up with a network redesign to extend the network connectivity of the underwater datacenter. (Photo submitted by Chakri Thammineni | Inside Track)

“Project Natick was my first engagement after I joined Microsoft, and it was a great opportunity to collaborate with many folks to come up with a network solution,” Thammineni says.

Earlier this year, the experiment concluded without interruption. And yes, the team learned that placing a datacenter underwater is indeed a more sustainable and efficient way to bring the cloud to coastal areas, providing better datacenter responsiveness.

With the experiment ending, the team needed to recover the datacenter so it could analyze all the data collected during its time underwater.

That’s where Microsoft’s internal engineering teams came in.

“To make sure we didn’t lose any data, we needed to keep the datacenter connected to Microsoft’s corporate network during our extraction,” Shepperd says. “We accomplished this with a leased line dedicated to our use, one that we used to connect the datacenter with our Microsoft facility in London.”

The extraction also had to be timed just right for the same reasons.

“The seas in Orkney throw up waves that can be as much as 9 to 10 meters high for most of the year,” he says. “The team chose this location because of the extreme conditions, reasoning it was a good place to demonstrate the ability to deploy Natick datacenters just about anywhere.”

And then, like it has for so many other projects, COVID-19 forced the team to change its plans. In the process of coming up with a new datacenter recovery plan, the team realized that the corporate connectivity was being shut down at the end of May 2020 and couldn’t be extended.

“Ordering the gear would’ve taken two to three months, and we were on a much shorter timeline,” Chaparala says.

Shepperd called on the team in Platform Engineering, a division of Microsoft Digital, to quickly remodel the corporate connectivity from the Microsoft London facility to the Natick shore area, all while ensuring that the connection was secured.

The mission?

Ensure that servers were online until the datacenter could be retrieved from the water, all without additional hardware.

Lathish Kumar Chaparala, a senior program manager on the networking team in Microsoft Digital, helped extend network connectivity of Microsoft’s underwater datacenter so it could be safely retrieved from the sea. (Photo submitted by Lathish Kumar Chaparala | Inside Track)

“My role was to make sure I understood the criticality of the request in terms of timeline, and to pull in the teams and expertise needed to keep the datacenter online until it was safely pulled out of the water,” Chaparala says.

The stakes were high, especially with the research that was on the line.

“If we lost connectivity and shut down the datacenter, it could have compromised the viability of the research we had done up until that point,” Shepperd says.

A seamless collaboration across Microsoft Research and IT

To solve this problem, the teams in Core Platform Engineering and Microsoft Research had to align their vision and workflows.

“Teams in IT might plan their work out for months or years in advance,” Shepperd says. “Our research is on a different timeline because we don’t know where technology will take us, so we needed to work together, and fast.”

Because they couldn’t bring any hardware to the datacenter site, Chaparala, Thammineni, and the Microsoft Research team needed to come up with a network redesign. This led to the implementation of software-based encryption using a virtual network operating system on Windows virtual machines.

With this solution in tow, the team could extend the network connectivity from the Microsoft Docklands facility in London to the Natick datacenter off the coast of Scotland.

“Chakri and Lathish have consistently engaged with us to fill the gaps between what our research team knew and what these networking experts at Microsoft needed in order to take action on the needs of this project,” Shepperd says. “Without help from their teams, we would not have been able to deliver on our research goals as quickly and efficiently as we did.”

Lessons learned from the world’s second underwater datacenter

The research on Project Natick pays dividends in Microsoft’s future work, particularly around running more sustainable datacenters that could power Microsoft Azure cloud services.

“Whether a datacenter is on land or in water, the size and scale of Project Natick is a viable blueprint for datacenters of the future,” Shepperd says. “Instead of putting down acres of land for datacenters, our customers and competitors are all looking for ways to power their compute and to house storage in a more sustainable way.”

This experience taught Chaparala to assess the needs of his partner teams.

“We work with customers to understand their requirements and come up with objectives and key results that align,” Chaparala says.

Ultimately, Project Natick’s story is one of cross-disciplinary collaboration – and just in the nick of time.

“It’s exciting to play a role in bringing the right engineers and program managers together for a common goal, especially so quickly,” Chaparala says. “Once we had the right team, we knew there was nothing we couldn’t handle.”

Transforming data governance at Microsoft with Microsoft Purview and Microsoft Fabric
http://approjects.co.za/?big=insidetrack/blog/transforming-data-governance-at-microsoft-with-microsoft-purview-and-microsoft-fabric/
Tue, 19 Sep 2023 18:40:34 +0000

Data is an invaluable asset for all businesses. Over recent years, the exponential growth of data collection and ingestion has forced most organizations to rethink their strategies for managing data. Increasing compliance requirements and ever-changing technology prevent anyone from simply leaving their enterprise data in its current state.

We’re accelerating our digital transformation with an enterprise data platform built on Microsoft Purview and Microsoft Fabric. Our solution addresses three essential layers of data transformation:

  • Unifying data with an analytics foundation
  • Responsibly democratizing data with data governance
  • Scaling transformative outcomes with intelligent applications

As a result, we’re creating agile, regulated, and business-focused data experiences across the organization that accelerate our digital transformation.

[Unpack how we’re deploying a modern data governance strategy internally at Microsoft. Explore how we’re providing modern data transfer and storage service at Microsoft with Microsoft Azure. Discover how we’re modernizing enterprise integration services at Microsoft with Microsoft Azure.]

Accelerating responsible digital transformation

Digital transformation in today’s world is not optional. An ever-evolving set of customer expectations and an increasingly competitive marketplace prohibit organizations from operating with static business practices. Organizations must constantly adapt to create business resilience, improve decision-making, and increase cost savings.

Data is the fuel for digital transformation. The capability of any organization to transform is directly tied to how effectively they can generate, manage, and consume their data. These data processes—precisely like the broader digital transformation they enable—must also transform to meet the organization’s needs.

The Enterprise Data team at Microsoft Digital builds and operates the systems that power Microsoft’s data estate. We’re well into our journey toward responsibly democratizing the data that drives global business and operations for Microsoft. We want to share our journey and give other organizations a foundation—and hopefully a starting point—for enabling their enterprise data transformation.

Seizing the opportunity for data transformation

Data transformation focuses on creating business value. Like any other organization, business value drives most of what we do. As Microsoft has grown and evolved, so has our data estate.

At the genesis of our data transformation, we were in the same situation many organizations find themselves in. Digital transformation was a top priority for the business, and our data estate couldn’t provide the results or operate with the agility the business required.

We felt stuck between two opposing forces: maintaining controls and governance that helped secure our data and the pressure from the business to move fast and transform our data estate operations to meet evolving needs.

“Our data was in silos,” says Damon Buono, head of enterprise governance for Microsoft.  “Various parts of the organization were managing their data in different ways, and our data wasn’t connected.”

As a result, a complete perspective on enterprise-wide data wasn’t readily available. It was hard to implement controls and governance across these silos, and implementing governance always felt like it was slowing us down, preventing us from supporting digital transformation at Microsoft at the required pace.

“We needed a shared data catalog to democratize data responsibly across the company,” Buono says.

Transforming data: unify, democratize, and create value

Transforming our data estate fundamentally disrupted how we think about and manage data at Microsoft. With our approach, examining data at the enterprise level became the default, and we began to view governance as an accelerator of our transformation, not a blocker. As a result of these two fundamental changes, the lofty, aspirational state we envisioned for our data became achievable, and we immediately began creating business value.

Our enterprise data platform is built on three essential layers of data transformation: unifying data with an analytics foundation, responsibly democratizing data with data governance, and scaling transformative outcomes with intelligent applications.

Unifying data with an analytics foundation

Establishing and adopting strong governance standards has helped Microsoft democratize access to data, says Damon Buono, head of enterprise governance for Microsoft. “When data is adequately democratized—safely accessible by everyone who should access it—transformation is accelerated,” Buono says.

Unified data is useful and effective data. Before our data transformation, we recognized the need to unify the many data silos present in the organization. Like many businesses, our data has evolved organically. Changes over the years to business practices, data storage technology, and data consumption led to increased inefficiencies in overall data use.

Analytics are foundational to the remainder of the data transformation journey. Without a solid and well-established analytics foundation, it’s impossible to implement the rest of the data transformation layers. A more centralized source of truth for enterprise data creates a comprehensive starting point for governance and creating business value with scalable applications.

With Microsoft Fabric at the core, our analytics foundation unifies data across the organization and allows us to do more with less, which, in turn, decreases data redundancy, increases data consistency, and reduces shadow IT risks and inefficiencies.

“It connects enterprise data across multiple data sources and internal organizations to create a comprehensive perspective on enterprise data,” Buono says.

Microsoft Fabric ensures that we’re all speaking the same data language. Whether we’re pulling data from Microsoft Azure, multi-cloud, or our on-premises servers, we can be confident that our analytics tools can interpret that data consistently.

Functionally, this reduces integration and operation costs and creates a predictable and transparent operational model. The unity and visibility of the analytics foundation then provide the basis for the rest of the transformation, beginning with governance.

Responsibly democratizing data with data governance

Data can be a transformative asset to the organization through responsible democratization. The goal is to accelerate the business through accessibility and availability. Democratizing data is at the center of our governance strategy. Data governance plays an active role in data protection and complements the defensive posture of security and compliance. With effective governance controls, all employees can access the data they need to make informed decisions regardless of their job function or level within the organization. Data governance is the glue that combines data discovery with the business value that data creates.

It’s critical to understand that governance accelerates our digital transformation in the modern data estate. Governance can seem like a burden and a blocker across data access and usage scenarios, but you cannot implement effective and efficient governance without a unified data strategy. This is why many organizations approach data governance like it’s a millstone hanging around their neck. Many organizations struggle with harnessing the power of data because they don’t have a data strategy and they lack alignment across the leadership teams to improve data culture.

In the Microsoft Digital data estate, governance lightens the load for our data owners, administrators, and users. Microsoft Purview helps us to democratize data responsibly, beginning with our unified analytics foundation in Microsoft Fabric. With a unified perspective on data and a system in place for understanding the entire enterprise estate, governance can be applied and monitored with Purview across all enterprise data, with an end-to-end data governance service that automates the discovery, classification, and protection of sensitive data across our on-premises, multi-cloud, and SaaS environments.

“The governance tools that protect and share any enterprise data are transparent to data creators, managers, and consumers,” Buono says. “Stakeholders can be assured that their data is being shared, accessed, and used how they want it to be.”

Responsible democratization encourages onboarding and breaks down silos. When data owners are confident in governance, they want their data on the platform, which drives the larger unification and governance of enterprise-wide data.

Scaling transformative outcomes with intelligent applications

The final layer of our data transformation strategy builds on the previous two to provide unified, democratized data to the applications and business processes used every day at Microsoft. These intelligent applications create business value. They empower employees, reduce manual efforts, increase operational efficiencies, generate increased revenue, and contribute to a better Microsoft.

How we transformed: iteration and progression

Microsoft Purview and Microsoft Fabric are enabling the company to rethink how we use data internally at Microsoft, says Karthik Ravindran, a general manager who leads data governance for the Microsoft Security group.

While the three layers provide a solid structure for building a modern data platform, they provide value only if implemented. Actual transformation happens in the day-to-day operations of an organization. We transformed by applying these layers to our business groups, data infrastructure, and even our cultural data approach at Microsoft Digital.

“Our success begins with an iterative approach to data transformation,” says Karthik Ravindran, a general manager who leads data governance for the Microsoft Security group. “We started small, with projects that were simple to transform and didn’t have a critical impact on our business.”

These early projects provided a testing ground for our methods and technology.

“We quickly iterated approaches and techniques, gathering feedback from stakeholders as we went,” Ravindran says. “The results and learnings from these early implementations grew into a more mature and scalable platform. We were able to adapt to larger, more complex, and more critical sections of our data estate, tearing down larger data silos as we progressed.”

To understand how this worked, consider the following examples of our transformation across the organization.

Transforming marketing

The Microsoft Global Demand Center supports Microsoft commercial operations, including Microsoft Azure, Microsoft 365, and Dynamics 365. The Global Demand Center drives new customer acquisition and builds the growth and adoption of Microsoft products.

The Global Demand Center uses data from a broad spectrum of the business, including marketing, finance, sales, product telemetry, and many more. The use cases for this data span personas from any of these areas. Each internal Microsoft persona—whether a seller, researcher, product manager, or marketing executive—has a specific use case. Each of these personas engages with different customers to provide slightly different outcomes based on the customer and the product or service. It’s an immense swath of data consumed and managed by many teams for many purposes.

The Global Demand Center can holistically manage and monitor how Microsoft personas engage with customers by converging tools into the Microsoft Digital enterprise data platform. Each persona has a complete picture of who the customer is and what interactions or engagements they’ve had with Microsoft. These engagements include the products they’ve used, the trials they’ve downloaded, and the conversations they’ve had with other internal personas throughout their lifecycle as a Microsoft customer.

The enterprise data platform provides a common foundation for insights and intelligence into global demand for our products. The platform’s machine learning and AI capabilities empower next actions and prioritize how the Global Demand Center serves personas and customers. Moving the Global Demand Center toward adopting the enterprise data platform is iterative: a progressive onboarding of personas and teams onto the available toolset.

The adoption is transforming marketing and sales across Microsoft. It’s provided several benefits, including:

  • More reliable data and greater data quality. The unification of data and increased governance over the data create better data that drives better business results.
  • Decreased data costs. Moving to the enterprise data platform has reduced the overall cost compared to managing multiple data platforms.
  • Increased agility. With current and actionable data, the Global Demand Center can respond immediately to the myriad of daily changes in sales and marketing at Microsoft.

Improving the employee experience

Employee experience is paramount at Microsoft. The Microsoft Digital Employee Experience team is responsible for all aspects of the employee experience. They’re using the enterprise data platform to power a 360-degree view of the employee experience. Their insights tool connects different data across Microsoft to provide analytics and actionable insights that enable intelligent, personalized, and interconnected experiences for Microsoft employees.

The employee experience involves many data points and internal departments at Microsoft. Previously, when data was managed and governed in silos, it was difficult to build data connections to other internal organizations, such as Microsoft Human Resources (Microsoft HR). With the enterprise data platform, the Employee Experiences team can access the data they need within the controls of the platform’s governance capabilities, which gives the Microsoft HR department the stewardship and transparency they require.

The enterprise data platform creates many benefits for the Employee Experiences team, including:

  • Coordinated feature feedback and implementation. All planned software and tools features across Microsoft align with employee feedback and practical needs obtained from the enterprise data platform.
  • Better detection and mitigation of issues. Intelligent insights help Employee Experiences team members identify new and recurring issues so they can be mitigated effectively.
  • Decreased costs. The efficiencies created by using the enterprise data platform reduce engineering effort and resource usage.

Creating greater sustainability in operations

Microsoft Sustainability Operations supports efforts to increase global sustainability for Microsoft and minimize environmental impact. Sustainability Operations is responsible for environmental efforts across the organization, including waste, water, and carbon management programs.

Their internal platform, the Microsoft Cloud for Sustainability, is built on the enterprise data platform. It leverages the unified analytics and governance capabilities to create important sustainability insights that guide Sustainability Operations efforts and programs.

These insights are combined in the Microsoft Environmental Sustainability Report. This report contains 20 sections detailing how Microsoft works to minimize environmental impact. The report includes sections for emissions, capital purchases, business travel, employee commuting, product distribution, and managed assets, among others.

To provide the data for this report, Sustainability Operations has created a data processing platform with the Microsoft Cloud for Sustainability that ingests and transforms data from Microsoft Operations into a data repository. The unified data enables the team to create reports from many different perspectives using a common data model that enables quick integration.

The Microsoft Environmental Sustainability Report supports decision-making at the enterprise and business group level, which enables progress tracking against internal goals, forecasting and simulation, qualitative analysis of environmental impact, and compliance management for both perspectives. These tools allow Microsoft Sustainability Operations to discover and track environmental hotspots across the global enterprise with greater frequency and more precision. Using these insights, they can drive changes in operations that create more immediate and significant environmental impact reductions.

Implementing internal data governance

Governance has been a massive part of our journey. Realizing governance as an accelerator of transformation has radically changed our approach to governance. Understanding who is accessing data, what they’re accessing, and how they’re accessing is critical to ensuring controlled and measured access. It also creates the foundation for building transparency into the enterprise data platform, growing user confidence, and increasing adoption.

“Governance is central to the effective democratization of data, and when data is adequately democratized—safely accessible by everyone who should access it—transformation is accelerated,” Buono says. “Modern governance is achievable using automated controls and a self-service methodology, enabling immediate opportunity to create business value.”

Our governance strategy uses data standards and models with actionable insights to converge our entire data estate, which spans thousands of distinct data sources. We built our approach to data governance on some crucial learnings:

  • Evidence is critical to driving adoption and recruiting executive support.
  • Automated data access and a data catalog are critical to consolidating the data estate.
  • Data issue management can provide evidence, but it doesn’t scale well.
  • A centralized data lake, scorecards for compliance, and practical controls help create evidence for governance in large enterprises.

Key Takeaways
We continue to drive the adoption of the enterprise data platform at Microsoft. As we work toward 100 percent adoption across the enterprise, we generate efficiencies and reduce costs as we go. The iterative nature of our implementation means we’ve been able to move quickly and with agility, improving our processes as we go.

We’re also supporting organizational alignment and advocacy programs that will increase adoption. These programs include an internal data governance management team to improve governance, an enterprise data education program, and a training program for the responsible use of AI.

As our enterprise data estates expand and diversify, tools like Microsoft Purview and Microsoft Fabric have become indispensable in ensuring that our data remains an asset, not a liability. These tools offer a compelling solution to the pressing challenges of governing and protecting the modern data estate through automated discovery, classification, and a unified approach to hybrid and multi-cloud deployments.

“We’re really very excited about where we are now with Purview, Fabric, and the entire suite of tools we now have to manage our data here at Microsoft,” Ravindran says. “They are helping us rethink how we use data internally here at Microsoft, and we’re just getting started.”

Providing modern data transfer and storage service at Microsoft with Microsoft Azure
http://approjects.co.za/?big=insidetrack/blog/microsoft-uses-azure-to-provide-a-modern-data-transfer-and-storage-service/
Thu, 13 Jul 2023 14:54:07 +0000

Companies all over the world have launched their cloud adoption journey. While some are just starting, others are further along the path and are now researching the best options for moving their largest, most complex workflows to the cloud. It can take time for companies to address legacy tools and systems that have on-premises infrastructure dependencies.

Our Microsoft Digital Employee Experience (MDEE) team has been running our company as mostly cloud-only since 2018, and continues to design cloud-only solutions to help fulfill our Internet First and Microsoft Zero Trust goals.

In MDEE, we designed a Modern Data Transfer Service (MDTS), an enterprise-scale solution that allows the transfer of large files to and from partners outside the firewall and removes the need for an extranet.

MDTS makes cloud adoption easier for teams inside Microsoft and encourages the use of Microsoft Azure for all of their data transfer and storage scenarios. As a result, engineering teams can focus on building software and shipping products instead of dealing with the management overhead of Microsoft Azure subscriptions and becoming subject matter experts on infrastructure.

[Unpack simplifying Microsoft’s royalty ecosystem with connected data service. | Check out how Microsoft employees are leveraging the cloud for file storage with OneDrive Folder Backup. | Read more on simplifying compliance evidence management with Microsoft Azure confidential ledger.]

Leveraging our knowledge and experience

As part of Microsoft’s cloud adoption journey, we have been continuously looking for opportunities to help other organizations move data and remaining legacy workflows to the cloud. With more than 220,000 employees and over 150 partners with whom we share data, not every team had a clear path for converting their transfer and storage patterns into successful cloud scenarios.

We have a high level of Microsoft Azure service knowledge and expertise when it comes to storage and data transfer. We also have a long history with legacy on-premises storage designs and hybrid third-party cloud designs. Over the past decade, we engineered several data transfer and storage services to facilitate the needs of Microsoft engineering teams. Those services traditionally leveraged either on-premises designs or hybrid designs with some cloud storage. In 2019, we began to seriously look at replacing our hybrid model, which included a mix of on-premises resources, third-party software, and Microsoft Azure services, with one modern service that would completely satisfy our customer scenarios using only Azure. New capabilities in Azure had made this possible, and the timing was right.

MDTS uses out-of-the-box Microsoft Azure storage configurations and capabilities to help us address legacy on-premises storage patterns and support Microsoft core commitments to fully adopt Azure in a way that satisfies security requirements. Managed by a dedicated team of service engineers, program managers, and software developers, MDTS offers performance and security, and is available to any engineering team at Microsoft that needs to move its data storage and transfer to the cloud.

Designing a Modern Data Transfer and Storage Service

The design goal for MDTS was to create a single storage service offering, built entirely in Microsoft Azure, that would be flexible enough to meet the needs of most engineering teams at Microsoft. The service needed to be sustainable as a long-term solution, continue to support ongoing Internet First and Zero Trust network security designs, and have the capability to adapt to evolving technology and security requirements.

Identifying use cases

First, we needed to identify the top use cases we wanted to solve and evaluate which combination of Microsoft Azure services would help us meet our requirements. The primary use cases we identified for our design included:

  • Sharing and/or distribution of complex payloads: We not only had to provide storage for corporate sharing needs, but also share those same materials externally. The variety of file sizes and different payload characteristics can be challenging because they don’t always fit a standard profile for files (for example, Office documents).
  • Cloud storage adoption (shifting from on-premises to cloud): We wanted to ensure that engineering teams across Microsoft that needed a path to the cloud would have a roadmap. This need could arise because of expiring on-premises infrastructure, corporate direction, or other modernization initiatives like ours.
  • Consolidation of multiple storage solutions into one service, to reduce security risks and administrative overhead: Having to place data and content in multiple datastores to meet specific sharing or performance needs is cumbersome and can introduce additional risk. Because there wasn’t yet a single service that could meet all their sharing needs and performance requirements, employees and teams at Microsoft were using a variety of locations and services to store and share data.

Security, performance, and user experience design requirements

After identifying the use cases for MDTS, we focused on our primary design requirements. They fell into three high-level categories: security, performance, and user experience.

Security

The data transfer and storage design needed to follow our Internet First and Zero Trust network design principles. Accomplishing parity with Zero Trust meant leveraging best practices for encryption, standard ports, and authentication. At Microsoft, we already have standard design patterns that define how these pieces should be delivered.

  • Encryption: Data is encrypted both in transit and at rest.
  • Authentication: Microsoft Azure Active Directory supports corporate synced domain accounts, external business-to-business (B2B) accounts, and both corporate and external security groups. Leveraging Azure Active Directory allows teams to remove dependencies on corporate domain controllers for authentication.
  • Authorization: Microsoft Azure Data Lake Gen2 storage provides fine-grained access to containers and subfolders. This is possible because of many new capabilities, most notably the support for OAuth, hierarchical namespace, and POSIX permissions. These capabilities are necessities of a Zero Trust network security design.
  • No non-standard ports: Opening non-standard ports can present a security risk. Using only HTTPS over TCP 443 as the mechanism for transport and communication avoids opening non-standard ports, and it requires transfer software that can still maximize the ingress/egress capabilities of the storage platform. Microsoft Azure Storage Explorer, AzCopy, and Microsoft Azure Data Factory meet this requirement; a sketch of the resulting access pattern follows this list.
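As a rough illustration of that pattern, the following sketch authenticates to an Azure Data Lake Storage Gen2 account with an Azure AD token over the standard HTTPS endpoint (TCP 443) and lists a folder, with no storage account keys involved. The account, container, and path names are placeholders, not MDTS’s actual layout.

```python
# Minimal sketch of the security pattern described above: authenticate to an
# Azure Data Lake Storage Gen2 account with Azure AD (OAuth) over HTTPS/443,
# with no storage account keys. Names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

ACCOUNT_URL = "https://<storageaccount>.dfs.core.windows.net"  # standard HTTPS endpoint

# DefaultAzureCredential resolves to a managed identity, Azure CLI login, or
# another Azure AD credential source; access is then governed by RBAC and ACLs.
service = DataLakeServiceClient(account_url=ACCOUNT_URL, credential=DefaultAzureCredential())

filesystem = service.get_file_system_client("engineering")   # container (file system)
for path in filesystem.get_paths(path="builds", recursive=False):
    print(path.name, path.is_directory)
```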

Performance

Payloads can range from being comprised of one very large file, millions of small files, and every combination in between. Scenarios across the payload spectrum have their own computing and storage performance considerations and challenges. Microsoft Azure has optimized software solutions for achieving the best possible storage ingress and egress. MDTS helps ensure that customers know what optimized solutions are available to them, provides configuration best practices, and shares the learnings with Azure Engineering to enable robust enterprise scale scenarios.

  • Data transfer speeds: Having software capable of maximizing the ingress/egress capabilities of the storage platform is preferable for engineering-type workloads. It’s common for these workloads to have complex payloads: several large files (10-500 GB) or millions of small files.
  • Ingress and egress: Support for ingress upward of 10 Gbps and egress of up to 50 Gbps, along with client and server software that can consume as much bandwidth as the client and the storage platform allow.

 

Data size / bandwidth | 50 Mbps | 100 Mbps | 500 Mbps | 1 Gbps | 5 Gbps | 10 Gbps
1 GB | 2.7 minutes | 1.4 minutes | 0.3 minutes | 0.1 minutes | 0.03 minutes | 0.010 minutes
10 GB | 27.3 minutes | 13.7 minutes | 2.7 minutes | 1.3 minutes | 0.3 minutes | >0.1 minutes
100 GB | 4.6 hours | 2.3 hours | 0.5 hours | 0.2 hours | 0.05 hours | 0.02 hours
1 TB | 46.6 hours | 23.3 hours | 4.7 hours | 2.3 hours | 0.5 hours | 0.2 hours
10 TB | 19.4 days | 9.7 days | 1.9 days | 0.9 days | 0.2 days | 0.1 days

Copy duration calculations based on data size and the bandwidth limit for the environment.
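The arithmetic behind the table is simply payload size divided by available bandwidth. The small helper below makes that explicit; it assumes decimal units (1 GB = 8 × 10^9 bits) and ignores protocol overhead, so real transfers run somewhat longer.

```python
# Reproduce the copy-duration arithmetic behind the table above: duration is
# payload size in bits divided by available bandwidth in bits per second.
# Assumes decimal units (1 GB = 8e9 bits) and ignores protocol overhead.
def copy_duration_seconds(size_gb: float, bandwidth_mbps: float) -> float:
    bits = size_gb * 8 * 10**9
    return bits / (bandwidth_mbps * 10**6)

# Example: 100 GB over a 500 Mbps link is about 1,600 seconds (~0.44 hours),
# which matches the 0.5-hour cell in the table after rounding.
print(copy_duration_seconds(100, 500) / 3600)
```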

User experience

Users and systems need a way to perform manual and automated storage actions with graphical, command line, or API-initiated experiences.

  • Graphical user experience: Microsoft Azure Storage Explorer gives Storage Admins the ability to graphically manage storage. It also has storage-consumer features for those who don’t have permissions for administrative actions and simply need to perform common storage actions such as uploading and downloading.
  • Command line experience: AzCopy provides developers with an easy way to automate common storage actions through the CLI or scheduled tasks.
  • Automated experiences: Both Microsoft Azure Data Factory and AzCopy give applications the ability to use Azure Data Lake Gen2 storage as their primary storage source and destination; a sketch of an equivalent API-initiated transfer follows this list.
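To illustrate the automated, API-initiated experience, the sketch below uploads and then downloads a file with the Azure Data Lake Storage Gen2 Python SDK. MDTS itself relies on AzCopy and Azure Data Factory for these scenarios; the account, container, and file paths here are placeholders.

```python
# Hypothetical sketch of an API-initiated transfer: upload a payload to an
# ADLS Gen2 path and read it back. Paths and account names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<storageaccount>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
filesystem = service.get_file_system_client("engineering")

# Upload: stream a local file to a path in the container.
file_client = filesystem.get_file_client("builds/2020-01/payload.zip")
with open("payload.zip", "rb") as data:
    file_client.upload_data(data, overwrite=True)

# Download: read the same path back to a local file.
with open("payload-copy.zip", "wb") as target:
    target.write(file_client.download_file().readall())
```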

Identifying personas

Because a diverse set of personas uses storage for different purposes, we need to design storage experiences that satisfy the range of business needs. Through the development process, we identified these custom persona experiences relevant to both storage and data transfer:

  • Storage Admins: The Storage Admins are Microsoft Azure subscription owners. Within the Azure subscription they create, manage, and maintain all aspects of MDTS: Storage Accounts, Data Factories, Storage Actions Service, and Self-Service Portal. Storage Admins also resolve requests and incidents that are not handled via Self-Service.
  • Data Owners: The Data Owner personas are those requesting storage who have the authority to create shares and authorize storage. Data Owners also perform the initial steps of creating automated distributions of data to and from private sites. Data Owners are essentially the decision makers of the storage following handoff of a storage account from Storage Admins.
  • Storage Consumers: At Microsoft, storage consumers represent a broad set of disciplines, from engineers and developers to project managers and marketing professionals. Storage Consumers can use Microsoft Azure Storage Explorer to perform storage actions to and from authorized storage paths (aka Shares). Within the MDTS Self Service Portal, a storage consumer can be given authorization to create distributions. A distribution can automate the transfer of data from a source to one or multiple destinations.

Implementing and enhancing the solution architecture

After considering multiple Microsoft Azure storage types and complementary Azure services, the MDTS team chose the following Microsoft Azure services and software as the foundation for offering a storage and data transfer service to Microsoft engineering groups.

  • Microsoft Azure Active Directory: Meets the requirements for authentication and access.
  • Microsoft Azure Data Lake Gen2: Meets security and performance requirements by providing encryption, OAuth, a hierarchical namespace, fine-grained authorization for Azure Active Directory entities, and 10+ Gbps ingress and egress.
  • Microsoft Azure Storage Explorer: Meets security, performance, and user experience requirements by providing a graphical experience for storage administrative tasks and storage consumer tasks without needing a storage account key or role-based access control (RBAC) on an Azure resource. Azure Storage Explorer also has AzCopy embedded to satisfy performance requirements for complex payloads.
  • AzCopy: Provides a robust and highly performant command line interface.
  • Microsoft Azure Data Factory: Meets the requirements for orchestrating and automating data copies between private networks and Azure Data Lake Gen2 storage paths. Azure Data Factory copy activities are as performant as AzCopy and satisfy our security requirements.

Enabling storage and orchestration

As illustrated below, the first MDTS design consisted entirely of Microsoft Azure services, with no additional investment from us beyond the people needed to manage the Microsoft Azure subscription and handle routine requests. MDTS was offered as a commodity service to engineering teams at Microsoft in January 2020. Within a few months we saw a reduction in third-party software and on-premises file server storage, which provided significant savings. This migration also contributed progress toward the company-wide objectives of Internet First and Zero Trust design patterns.

The first design of MDTS provides storage and orchestration using out of the box Microsoft Azure services.

We initially onboarded 35 engineering teams, which included 10,000 Microsoft Azure Storage Explorer users (internal and external accounts) and 600 TB per month of Microsoft Azure Storage uploads and downloads. By offering the MDTS service, we saved engineering teams from having to run Azure subscriptions themselves and from needing to learn the technical details of implementing a modern cloud storage solution.

Creating access control models

As a team, we quickly discovered that having specific, repeatable implementation strategies was essential when configuring public-facing Microsoft Azure storage. Our initial time investment went into standardizing an access control process that would reduce complexity and ensure a correct security posture before handing storage off to customers. To do this, we built onboarding processes that identify the type of share being requested, each with standardized implementation steps.

We implemented standard access control models for two types of shares: container shares and sub-shares.

Container share access control model

The container share access control model is used for scenarios where the data owner prefers users to have access to a broad set of data. As illustrated in the graphic below, container shares supply access to the root, or parent, of a folder hierarchy. The container is the parent. Any member of the security group will gain access to the top level. When creating a container share, we also make it possible to convert to a sub-share access control model if desired.

 

Microsoft Azure Storage Explorer grants access to the root, or parent, of a folder hierarchy using the container share access control model. Both engineering and marketing are containers. Each has a specific Microsoft Azure Active Directory Security group. A top-level Microsoft Azure AD Security group is also added to minimize effort for users who should get access to all containers added to the storage account.

This model fits scenarios where group members get Read, Write, and Execute permissions to an entire container. The authorization allows users to upload, download, and create or delete folders and files. Adjusting the access control narrows what users can do; for example, to grant download-only access, select only Read and Execute.

Sub-share access control model

The sub-share access control model is used for scenarios where the data owner prefers users have explicit access to folders only. As illustrated in the graphic below, folders are hierarchically created under the container. In cases where several folders exist, a security group access control can be implemented on a specific folder. Access is granted to the folder where the access control is applied. This prevents users from seeing or navigating folders under the container other than the folders where an explicit access control is applied. When users attempt to browse the container, authorization will fail.

 

Microsoft Azure Storage Explorer grants access to sub-folder only using the sub-share access control model. Members are added to the sub-share group, not the container group. The sub-share group is nested in the container group with execute permissions to allow for Read and Write on the sub-share.

This model fits scenarios where group members get Read, Write, and Execute permissions to a sub-folder only. The authorization allows users to upload, download, and create or delete folders and files. In this example, the access control is specific to the folder "project1." With this model you can have multiple folders under the container but provide authorization only to a specific folder.

The sub-share process is only applied if a sub-share is needed.

  • Any folder needing explicit authorization is considered a sub-share.
  • We apply a sub-share security group access control with Read, Write, and Execute on the folder.
  • We nest the sub-share security group in the parent share security group that is used for Execute only. This gives members who don't have access to the container enough authorization to traverse to the specific sub-share folder, where they have Read, Write, and Execute, without gaining Read or Write permissions to any other folders in the container.

Applying access controls for each type of share (container and/or sub-share)

The parent share process is standard for each storage account; a minimal code sketch of these steps follows the list below.

  • Each storage account has a unique security group. This security group has the access control applied for all containers, which allows data owners to give access to all containers (current and future) simply by changing the membership of one group.
  • Each container has a unique security group for Read, Write, and Execute. This security group is used to isolate authorization to a single container.
  • Each container has a unique security group for Execute only. This security group is needed in the event sub-shares are created. Sub-shares are folder-specific shares in the hierarchical namespace.
  • We always use the default access control option. This feature automatically applies the parent's permissions to all new child folders (sub-folders).
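These steps are normally performed in Microsoft Azure Storage Explorer. For teams that want to script the same pattern, here is a hedged sketch using the azure-storage-file-datalake Python SDK. The account URL, folder names, and group object IDs are placeholders, the folders are assumed to already exist, and the ACLs are applied to folders here for simplicity (in MDTS the container root plays the parent-share role, but the call pattern is the same).

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Placeholder values: substitute a real account, container, folders, and
# Azure AD security group object IDs for your environment.
ACCOUNT_URL = "https://storageacct1.dfs.core.windows.net"
CONTAINER = "engineering"
AC_RWE_GROUP = "<object-id-of-mdts-ac-storageacct1-rwe>"  # Read, Write, Execute
AC_X_GROUP = "<object-id-of-execute-only-group>"          # Execute only, for sub-shares
SUBSHARE_GROUP = "<object-id-of-sub-share-group>"         # Explicit folder access

service = DataLakeServiceClient(ACCOUNT_URL, credential=DefaultAzureCredential())
share = service.get_file_system_client(CONTAINER)

# Parent share: grant the access control groups on the share folder. A full
# ACL (including the unnamed owner entries) must be supplied, and the
# "default:" entries make new child folders inherit these permissions.
parent_acl = ",".join([
    "user::rwx", "group::r-x", "other::---",
    f"group:{AC_RWE_GROUP}:rwx",
    f"group:{AC_X_GROUP}:--x",
    f"default:group:{AC_RWE_GROUP}:rwx",
    f"default:group:{AC_X_GROUP}:--x",
])
share.get_directory_client("drops").set_access_control(acl=parent_acl)

# Sub-share: explicit Read, Write, Execute on one folder only. Members of the
# sub-share group reach it through the execute-only nesting described above.
subshare_acl = ",".join([
    "user::rwx", "group::r-x", "other::---",
    f"group:{SUBSHARE_GROUP}:rwx",
    f"default:group:{SUBSHARE_GROUP}:rwx",
])
share.get_directory_client("drops/project1").set_access_control(acl=subshare_acl)
```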

The first design enabled us to offer MDTS while our engineers defined, designed, and developed an improved experience for all the personas. It quickly became evident that Storage Admins needed an aggregate, near-real-time view of all storage actions to successfully operate the service. It was important for our administrators to easily discover the most active accounts and which user, service principal, or managed service identity was making storage requests or performing storage actions. In July 2020, we added the Aggregate Storage Actions service.

Adding aggregate storage actions

For our second MDTS design, we augmented the out-of-the-box Microsoft Azure Storage capabilities used in our first design with Microsoft Azure Monitor, Event Hubs, Stream Analytics, Function Apps, and Microsoft Azure Data Explorer to provide aggregate storage actions. Once the Aggregate Storage Actions capability was deployed and configured within MDTS, storage admins could aggregate the storage actions of all their storage accounts and see them in a single-pane view.

 

The second design of MDTS introduces aggregate storage actions.

The Microsoft Azure Storage diagnostic settings in the Microsoft Azure portal make it possible for us to configure logging for specific blob actions. Combining this feature with other Azure services and some custom data manipulation gives MDTS the ability to see which users are performing storage actions, what those actions are, and when they were performed. The data visualizations are near real-time and aggregated across all the storage accounts.

Storage accounts are configured to route logs from Microsoft Azure Monitor to an event hub. We currently have 45+ storage accounts that generate around five million logs each day. Data filtering, manipulation, and grouping are performed by Stream Analytics. Function Apps are responsible for fetching UPNs using the Microsoft Graph API and then pushing logs to Microsoft Azure Data Explorer. Microsoft Power BI and our modern self-service portal query Microsoft Azure Data Explorer and provide the visualizations, including dashboards with drill-down functionality. The data available in our dashboard includes the following information aggregated across all customers (currently 35 storage accounts); one way to query this aggregate store is sketched after the list below.

  • Aggregate view of the most active accounts based on log activity.
  • Aggregate total of GB uploaded and downloaded per storage account.
  • Top users by upload volume, showing the user principal name (both external and internal).
  • Top users by download volume, showing the user principal name (both external and internal).
  • Top accounts by data uploaded.
  • Top accounts by data downloaded.
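For illustration only, the sketch below shows one way a report or script could query such an aggregate store with the azure-kusto-data Python client. The cluster URL, database, table, and column names are hypothetical, not the actual MDTS schema.

```python
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

# Hypothetical cluster and schema; the real MDTS store will differ.
CLUSTER = "https://<your-cluster>.kusto.windows.net"
DATABASE = "StorageActions"

# Hypothetical table and columns: total GB moved per account and operation
# over the last day, largest movers first.
QUERY = """
StorageLogs
| where TimeGenerated > ago(1d)
| summarize TotalGB = sum(ResponseBodySize) / 1e9 by AccountName, OperationName
| top 10 by TotalGB desc
"""

kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(CLUSTER)
client = KustoClient(kcsb)
response = client.execute(DATABASE, QUERY)

for row in response.primary_results[0]:
    print(row["AccountName"], row["OperationName"], round(row["TotalGB"], 2))
```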

The only setting required to onboard a new storage account is configuring it to route logs to the event hub, as sketched below. Because we maintain an aggregate store of all storage account activity, we can offer MDTS customers an account-level view of their own storage data.
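A hedged sketch of that onboarding step with the azure-mgmt-monitor Python SDK follows; the subscription, resource IDs, event hub, and setting name are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient
from azure.mgmt.monitor.models import DiagnosticSettingsResource, LogSettings

# Placeholder IDs: substitute your subscription, storage account, and
# Event Hubs namespace authorization rule.
SUBSCRIPTION_ID = "<subscription-id>"
STORAGE_ACCOUNT_ID = (
    "/subscriptions/<subscription-id>/resourceGroups/<rg>"
    "/providers/Microsoft.Storage/storageAccounts/<account>"
)
EVENT_HUB_RULE_ID = (
    "/subscriptions/<subscription-id>/resourceGroups/<rg>"
    "/providers/Microsoft.EventHub/namespaces/<namespace>"
    "/authorizationRules/RootManageSharedAccessKey"
)

client = MonitorManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Diagnostic settings for blob activity live on the blob service sub-resource.
client.diagnostic_settings.create_or_update(
    resource_uri=f"{STORAGE_ACCOUNT_ID}/blobServices/default",
    name="mdts-storage-actions",  # Hypothetical setting name.
    parameters=DiagnosticSettingsResource(
        event_hub_authorization_rule_id=EVENT_HUB_RULE_ID,
        event_hub_name="storage-logs",  # Hypothetical event hub name.
        logs=[
            LogSettings(category="StorageRead", enabled=True),
            LogSettings(category="StorageWrite", enabled=True),
            LogSettings(category="StorageDelete", enabled=True),
        ],
    ),
)
```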

Following the release of Aggregate Storage Actions, the MDTS team, guided by customer feedback, identified another area of investment: the need for storage customers to self-serve and view account-specific insights without having role-based access to the subscription or storage accounts.

Providing a self-service experience

To enhance the experience of the other personas, MDTS is now focused on creating a Microsoft Azure web portal where customers can self-serve storage and transfer capabilities without being granted any Microsoft Azure role-based access control (RBAC) on the underlying subscription that hosts the MDTS service.

When designing MDTS self-service capabilities we focused on meeting these primary goals:

  • Make it possible for Microsoft Azure subscription owners (Storage Admins) to provide the platform and services without needing to be in the middle of every change to storage and transfer services.
  • Create custom persona experiences so customers can achieve their storage and transfer goals through a single portal experience in a secure and intuitive way. Some of the new enterprise-scale capabilities include:
    • Onboarding.
    • Creating storage shares.
    • Authorization changes.
    • Distributions: automating the distribution of data from one source to one or multiple destinations.
    • Insights into storage actions (based on the data provided by the Aggregate Storage Actions capability in our second MDTS release).
    • Reporting basic consumption data, like the number of users, groups, and shares on a particular account.
    • Reporting the cost of the account.
  • Allow the portal to evolve as Azure services and customer scenarios change.
  • Make it easy for customers who want to "self-host" (essentially take our investments and run them themselves) to do so.

Our next design of MDTS introduces a self-service portal.

Storage consumer user experiences

After storage is created and configured, data owners can then share steps for storage consumers to start using storage. Upload and download are the most common storage actions, and Microsoft Azure provides software and services needed to perform both actions for manual and automated scenarios.

Microsoft Azure Storage Explorer is recommended for manual scenarios where users can connect and perform high speed uploads and downloads manually. Both Microsoft Azure Data Factory and AzCopy can be used in scenarios where automation is needed. AzCopy is heavily preferred in scenarios where synchronization is required. Microsoft Azure Data Factory doesn’t provide synchronization but does provide robust data copy and data move. Azure Data Factory is also a managed service and better suited in enterprise scenarios where flexible triggering options, uptime, auto scale, monitoring, and metrics are required.

Using Microsoft Azure Storage Explorer for manual storage actions

Developers and Storage Admins are accustomed to using Microsoft Azure Storage Explorer for both storage administration and routine storage actions (for example, uploading and downloading). Non-admin users, otherwise known as Storage Consumers, can also use Microsoft Azure Storage Explorer to connect and perform storage actions without needing any role-based access control or access keys on the storage account. Once the storage is authorized, members of the authorized groups follow a few routine steps: they attach the storage they're authorized for, authenticate with their work account, and use the options their authorization allows.

The processes for sign-in and adding a resource via Microsoft Azure Active Directory are found in the Manage Accounts and Open Connect Dialog options of Microsoft Azure Storage Explorer.

After signing in and selecting the option to add the resource via Microsoft Azure Active Directory, you can supply the storage URL and connect. Once connected, it only requires a few clicks to upload and download data.

 

Microsoft Azure Storage Explorer Local and Attached module. After following the add resource via Microsoft Azure AD process, the Azure AD group itshowcase-engineering is authorized for Read, Write, and Execute (rwe), and members of the group can perform storage actions.
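Storage Explorer remains the recommended manual experience, but the same upload and download actions can also be scripted. The sketch below uses the azure-identity and azure-storage-file-datalake Python packages; the account URL, container, and file paths are placeholders, and the signed-in user must already belong to an authorized group.

```python
from azure.identity import InteractiveBrowserCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Placeholders: use the storage URL and share supplied by your data owner.
ACCOUNT_URL = "https://storageacct1.dfs.core.windows.net"
CONTAINER = "engineering"

# Sign in with a work account, just as Storage Explorer prompts you to do.
service = DataLakeServiceClient(ACCOUNT_URL, credential=InteractiveBrowserCredential())
share = service.get_file_system_client(CONTAINER)

# Upload a local file into the authorized share.
with open("build.zip", "rb") as data:
    share.get_file_client("drops/build.zip").upload_data(data, overwrite=True)

# Download it back to a local copy.
with open("build-copy.zip", "wb") as target:
    target.write(share.get_file_client("drops/build.zip").download_file().readall())
```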

To learn more about using Microsoft Azure Storage Explorer, see Get started with Storage Explorer. There are additional links in the More Information section at the end of this document.

Note: Microsoft Azure Storage Explorer uses AzCopy. Having AzCopy as the transport allows storage consumers to benefit from high-speed transfers. If desired, AzCopy can be used as a stand-alone command line application.

Using AzCopy for manual or automated storage actions

AzCopy is a command-line interface used to perform storage actions on authorized paths. AzCopy is used within Microsoft Azure Storage Explorer but can also be used as a standalone executable to automate storage actions. It's a multi-stream, TCP-based transport capable of optimizing throughput based on the available bandwidth. MDTS customers use AzCopy in scenarios that require synchronization, or in cases where Microsoft Azure Storage Explorer or a Microsoft Azure Data Factory copy activity doesn't meet the requirements for data transfer. For more information about using AzCopy, please see the More Information section at the end of this document.
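As a hedged example of the standalone pattern, a scheduled job might wrap AzCopy's sync command like this. It assumes AzCopy v10 is on the PATH, the source folder and destination URL are placeholders, and authentication has already been handled (for example, with azcopy login or a SAS token appended to the destination URL).

```python
import subprocess

# Placeholder source folder and destination share URL.
SOURCE = r"D:\builds\nightly"
DESTINATION = "https://storageacct1.dfs.core.windows.net/engineering/drops"

# Mirror the local folder to the share, removing destination files that no
# longer exist locally. Assumes a prior `azcopy login` or a SAS on the URL.
subprocess.run(
    ["azcopy", "sync", SOURCE, DESTINATION,
     "--recursive", "--delete-destination=true"],
    check=True,
)
```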

AzCopy is a great match for standalone and synchronization scenarios, and it has options that are useful for automation and for building applications. However, because AzCopy is a single executable running on a single client or server system, it isn't always ideal for enterprise scenarios. Microsoft Azure Data Factory is a more robust Microsoft Azure service that meets most enterprise needs.

Using Microsoft Azure Data Factory for automated copy activity

Some of the teams that use MDTS require the ability to orchestrate and operationalize storage uploads and downloads. Before MDTS, we would have either built a custom service or licensed a third-party solution, both of which can be expensive and time-consuming.

Microsoft Azure Data Factory, a cloud-based ETL and data integration service, allows us to create data-driven workflows for orchestrating data movement. Including Azure Data Factory in our storage hosting service model provided customers with a way to automate data copy activities. MDTS’s most common data movement scenarios are distributing builds from a single source to multiple destinations (3-5 destinations are common).

Another requirement for MDTS was the ability to use private data stores as a source or destination. Microsoft Azure Data Factory provides this capability through a self-hosted integration runtime installed on a private system. Once configured, that system can participate in copy activities that communicate with on-premises file systems, so an on-premises file system can be used as a source and/or destination data store.

In situations where on-premises file system data needs to be stored in Microsoft Azure or shared with external partners, Microsoft Azure Data Factory provides the ability to orchestrate pipelines that perform one or multiple copy activities in sequence. These activities result in end-to-end data movement from an on-premises file system to Microsoft Azure Storage, and then to another private system if desired.

The graphic below provides an example of a pipeline orchestrated to copy builds from a single source to several private destinations.

 

Microsoft Azure Data Factory pipeline example. Private site 1 is the build system source. The build system builds, loads the source file system, and then triggers the Microsoft Azure Data Factory pipeline. The build is uploaded, and private sites 2, 3, and 4 then download it. Function apps are used for sending email notifications to site owners and for additional validation.
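In this scenario the build system triggers the pipeline once the build output lands on the source file system. One hedged way to do that from a build step is the azure-mgmt-datafactory Python SDK; the subscription, resource group, factory, pipeline, and parameter names below are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Placeholder names: replace with your subscription, factory, and pipeline.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "mdts-rg"
FACTORY = "mdts-datafactory"
PIPELINE = "distribute-build"

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Kick off the pipeline once the build output has landed on the source share.
run = client.pipelines.create_run(
    RESOURCE_GROUP,
    FACTORY,
    PIPELINE,
    parameters={"buildNumber": "2024.04.01.1"},  # Hypothetical pipeline parameter.
)
print(f"Started pipeline run {run.run_id}")
```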

For more information on Azure Data Factory, please see Introduction to Microsoft Azure Data Factory. There are additional links in the More Information section at the end of this document.

If you are thinking about using Microsoft Azure to develop a modern data transfer and storage solution for your organization, here are some of the best practices we gathered while developing MDTS.

Close the technical gap for storage consumers with a white glove approach to onboarding

Be prepared to spend time with customers who are initially overwhelmed with using Azure Storage Explorer or AzCopy. At Microsoft, storage consumers represent a broad set of disciplines—from engineers and developers to project managers and marketing professionals. Azure Storage Explorer provides an excellent experience for engineers and developers but can be a little challenging for less technical roles.

Have a standard access control model

Use Microsoft Azure Active Directory security groups and group nesting to manage authorization. Microsoft Azure Data Lake Gen2 storage limits the number of access control entries you can apply, so to avoid reaching this limit, and to simplify administration, we recommend applying access controls to Microsoft Azure Active Directory security groups only. In some cases, we nest Member Security Groups within Access Control Security Groups to manage access. These group types don't exist as distinct concepts in Microsoft Azure Active Directory; they exist within our MDTS service as a convention that differentiates the purpose of a group, and the group name makes that purpose easy to determine.

  • Access Control Security Groups: We use this group type for applying Access Control on ADLS Gen2 storage containers and/or folders.
  • Member Security Groups: We use these to satisfy cases where access to containers and/or folders will constantly change for members.

When there are large numbers of members, nesting prevents the need to add members individually to the Access Control Security Groups. When access is no longer needed, we can remove the Member Group(s) from the Access Control Security Group and no further action is needed on storage objects.

Along with using Microsoft Azure Active Directory security groups, make sure to have a documented process for applying access controls. Be consistent and have a way of tracking where access controls are applied.
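The nesting itself happens in Microsoft Azure Active Directory rather than on the storage account. A hedged sketch of adding a Member Security Group to an Access Control Security Group through the Microsoft Graph REST API (the group object IDs are placeholders, and the caller needs the GroupMember.ReadWrite.All permission):

```python
import requests
from azure.identity import DefaultAzureCredential

# Placeholder object IDs for the two groups.
ACCESS_CONTROL_GROUP_ID = "<object-id-of-mdts-ac-storageacct1-rwe>"
MEMBER_GROUP_ID = "<object-id-of-mdts-mg-storageacct1-project1>"

token = DefaultAzureCredential().get_token("https://graph.microsoft.com/.default").token

# Nest the member group inside the access control group.
response = requests.post(
    f"https://graph.microsoft.com/v1.0/groups/{ACCESS_CONTROL_GROUP_ID}/members/$ref",
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    json={
        "@odata.id": f"https://graph.microsoft.com/v1.0/directoryObjects/{MEMBER_GROUP_ID}"
    },
)
response.raise_for_status()
```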

Use descriptive display names for your Microsoft Azure AD security groups

Because Microsoft Azure AD doesn’t currently organize groups by owners, we recommend using naming conventions that capture the group’s purpose and type to allow for easier searches.

  • Example 1: mdts-ac-storageacct1-rwe. This group name uses our service standard naming convention for the Access Control group type on Storage Account 1, with access control Read, Write, and Execute. mdts = service, ac = Access Control type, storageacct1 = ADLS Gen2 storage account name, rwe = permissions of the access control.
  • Example 2: mdts-mg-storageacct1-project1. This group name uses our service standard naming convention for the Member Group type on Storage Account 1. The group doesn't have an explicit access control on storage, but because it's nested in mdts-ac-storageacct1-rwe, any member of this group gets Read, Write, and Execute access to storage account 1.

Remember to propagate any changes to access controls

Microsoft Azure Data Lake Gen2 storage, by default, doesn’t automatically propagate any access control changes. As such, when removing, adding, or changing an access control, you need to follow an additional step to propagate the access control list. This option is available in Microsoft Azure Storage Explorer.
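The same propagation can also be scripted. Here is a hedged sketch using update_access_control_recursive from the azure-storage-file-datalake Python SDK; the account, container, folder, and group object ID are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Placeholders: substitute your account, container, folder, and group.
service = DataLakeServiceClient(
    "https://storageacct1.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
folder = service.get_file_system_client("engineering").get_directory_client("project1")

# Push an added or changed access control entry down to every existing child
# folder and file, which the default ACL alone does not do.
folder.update_access_control_recursive(
    acl="group:<object-id-of-sub-share-group>:rwx"
)
```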

Storage Consumers can attempt Administrative options

Storage Consumers use Microsoft Azure Storage Explorer and are authenticated with their Microsoft Azure Active Directory user profile. Because Azure Storage Explorer is primarily built for the Storage Admin and Developer personas, all administrative actions remain visible, and it is common for storage consumers to attempt administrative actions, like managing access or deleting a container. Those actions fail because storage consumers are authorized only through access control lists (ACLs), and there is no way to grant administrative actions through ACLs. If administrative actions are needed, the user must become a Storage Admin, which grants access via Azure role-based access control (RBAC).

Microsoft Azure Storage Explorer and AzCopy are throughput intensive

As stated above, Microsoft Azure Storage Explorer uses AzCopy for transport. When using Azure Storage Explorer or AzCopy, it's important to understand that they are built to maximize transfer performance, so some clients and networks may benefit from throttling AzCopy. If you don't want AzCopy to consume too much network bandwidth, configurations are available. In Microsoft Azure Storage Explorer, open Settings and select the Transfers section to configure Network Concurrency and/or File Concurrency; in the Network Concurrency section, Adjust Dynamically is the default option. For standalone AzCopy, flags and environment variables are available to tune performance.

For more information, visit Configure, optimize, and troubleshoot AzCopy.
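As a hedged illustration of the standalone AzCopy options, a bandwidth-sensitive job might cap concurrency and throughput like this; the 500 Mbps cap, concurrency value, and paths are arbitrary placeholders.

```python
import os
import subprocess

# Cap concurrent requests via AzCopy's environment variable...
env = dict(os.environ, AZCOPY_CONCURRENCY_VALUE="16")

# ...and cap throughput at roughly 500 Mbps with the --cap-mbps flag.
subprocess.run(
    [
        "azcopy", "copy",
        r"D:\builds\nightly",
        "https://storageacct1.dfs.core.windows.net/engineering/drops",
        "--recursive",
        "--cap-mbps", "500",
    ],
    env=env,
    check=True,
)
```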

Microsoft Azure Storage Explorer sign-in with MSAL

The Microsoft Authentication Library (MSAL), currently in preview in Storage Explorer, provides enhanced single sign-on, multi-factor authentication, and conditional access support. In some situations, users can't authenticate unless MSAL is enabled. To enable MSAL, select the Settings option from Microsoft Azure Storage Explorer's navigation pane, and then, in the Application section, select the option to enable Microsoft Authentication Library.

B2B invites are needed for external accounts (guest user access)

When there is a Microsoft business need to work with external partners, leveraging guest user access in Microsoft Azure Active Directory is necessary. Once the B2B invite process is followed, external accounts can be authorized by managing group membership. For more information, read What is B2B collaboration in Azure Active Directory?
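A hedged sketch of sending that invitation with the Microsoft Graph invitations API follows; the partner address and redirect URL are placeholders, and the caller needs the User.Invite.All permission. Once the guest account exists, it is authorized by adding it to the appropriate Member Security Group.

```python
import requests
from azure.identity import DefaultAzureCredential

token = DefaultAzureCredential().get_token("https://graph.microsoft.com/.default").token

# Invite an external partner as a guest user; values are placeholders.
response = requests.post(
    "https://graph.microsoft.com/v1.0/invitations",
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    json={
        "invitedUserEmailAddress": "partner@example.com",
        "inviteRedirectUrl": "https://portal.azure.com",
        "sendInvitationMessage": True,
    },
)
response.raise_for_status()
print(response.json()["invitedUser"]["id"])  # Guest object ID to add to a group.
```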

Key takeaways

We used Microsoft Azure products and services to create an end-to-end modern data transfer and storage service that can be used by any group at Microsoft that desires cloud data storage. The release of Microsoft Azure Data Lake Gen2, Microsoft Azure Data Factory, and the improvements in the latest release of Azure Storage Explorer made it possible for us to offer MDTS as a fully native Microsoft Azure service.

One of the many strengths of using Microsoft Azure is the ability to use only what we needed, as we needed it. For MDTS, we started by simply creating storage accounts, requesting Microsoft Azure Active Directory Security Groups, applying an access control to storage URLs, and releasing the storage to customers for use. We then invested in adding storage actions and developed self-service capabilities that make MDTS a true enterprise-scale solution for data transfer and storage in the cloud.

We are actively encouraging the adoption of our MDTS storage design by all Microsoft engineering teams that still rely on legacy storage hosted on the Microsoft corporate network. We are also encouraging any Microsoft Azure consumers to consider this design when evaluating options for storage and file sharing scenarios. Our design has proven to be scalable, compliant with the Microsoft Zero Trust security initiative, and performant, handling extreme payloads with high throughput and no constraints on the size or number of files.

By eliminating our dependency on third-party software, we have been able to eliminate third-party licensing, consulting, and hosting costs for many on-premises storage systems.

Are you ready to learn more? Sign up for your own Microsoft Azure subscription and get started today.

To receive the latest updates on Azure storage products and features to meet your cloud investment needs, visit Microsoft Azure updates.

Related links

 

The post Providing modern data transfer and storage service at Microsoft with Microsoft Azure appeared first on Inside Track Blog.
