Microsoft Azure Archives - Inside Track Blog

Microsoft’s upgraded transportation experience arrives in Puget Sound

Josh Krenz — Fri, 18 Oct 2024 16:00:37 +0000

There’s no doubt—hybrid work is the new norm. To adapt to the new world of hybrid work and achieve its vision of a truly modern employee experience, Microsoft is prioritizing the improvement of employees’ daily commutes. For a while now, Microsoft’s Puget Sound campus has provided workers with a system of shuttles and buses to travel between home, work, and other office buildings sprawled across multiple cities. Yet providing transportation alone hasn’t been enough. The transportation system needed a boost in user-friendliness to encourage new ridership and enhance the user experience.

To tackle this challenge head-on, Microsoft engineers in Puget Sound developed the Global Commute Service. The software comes in the form of a web and mobile app and has ever-improving features that streamline the commuting experience for employees.

One of these features is an upgraded user interface (UI) that is visually consistent with other Microsoft workplace applications. The familiar design and layout make the software more readily understandable and usable for employees. Riders are also empowered by the modern mobility platform with a trip-planning function, push notifications, real-time ETAs, and live vehicle map tracking for shuttle and Connector bus services.

Trip planner takes the hassle out of multimodal travel. Employees can plan end-to-end trips by Connector, by shuttle, or on foot up to two weeks in advance.

At the same time as the UI upgrade, the entire backend of the experience was also updated. The updates gave Microsoft a scalable and extensible system that powers real-time updates and can be deployed globally. These improvements benefit the drivers and operators who manage these transportation services, giving them visibility into route usage, rider traffic, and automated vehicle dispatch.

[Find out what Microsoft is doing to create a digital workplace. Discover how Microsoft is reinventing the employee experience for a hybrid world.]

Switching up routes to deliver a new experience

For a long time, the booking platform known to the Puget Sound campus as MERGE (Manage Explore Reserve Go Anywhere) served as the main method for riders to get a seat on one of Microsoft’s buses or shuttles. It was the go-to for reserving a ride on a Connector shuttle, the fixed route shuttles that run on a loop around campus, and the on-demand shuttles that move people between offices.

The legacy booking platform served Puget Sound well but had a different interface than other services available to Microsoft employees, making for an inconsistent user experience. To further complicate matters, MERGE was closely tied to the local transportation services found exclusively in Puget Sound, meaning the app could not be easily replicated to other Microsoft campuses. It was also difficult to extract important and accurate data from the transportation system for operational insights.

All of this added up to one key takeaway: it was time to transform MERGE into Global Commute Service, a new mobility experience that offers a consistent interface, modern features, scalability, and visibility.

Two teams worked in tandem to upgrade the transportation system: Microsoft Digital Employee Experience, the organization that powers, protects, and transforms the company, and Microsoft’s real estate team, which is responsible for managing and operating the company’s global facilities and services.

“The first thing we think about is the rider experience,” says Esther Christoffersen, a senior services manager with Real Estate and Facilities. “We start with the physical world, the environment that we live and work in, then we think about the digital world. We want to deliver an experience that is centered around ease, flexibility, and choice.”

The team knew that building a strong bridge between the physical and digital would empower riders with an improved transportation experience.

“We had to think about what really matters,” says Garima Gaurav, a senior product manager with Microsoft Digital Employee Experience. “That meant building something modern, real-time, and fast for riders. But we also wanted operational agility for the Real Estate and Facilities team.”

Improving mobility at Microsoft

The two organizations started brainstorming new rider experiences in 2019, but a few months into the project, the Puget Sound campus shifted to primarily remote work, with only essential employees working onsite.

“This was an opportunity to pause and really dive into the feedback to see what we could do better,” Christoffersen says.

With campus services pausing, Microsoft could disassemble the front-end (the web and app interface riders engage with) and the back-end (the operational workhorses that manage transportation services) without creating disruption.

Work started by decoupling Global Commute Service from Puget Sound’s established back end. This allowed Microsoft’s new service to integrate with any transportation system. If, for example, a campus uses a different transportation system, Global Commute Service will connect seamlessly, offering riders a consistent experience no matter which Microsoft campus they are on.
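One way to picture this kind of decoupling is an adapter layer: the rider-facing service is written against a small interface, and each campus supplies its own implementation. The sketch below is illustrative only. `TransportBackend`, `CommuteService`, the two example backends, and the vehicle IDs are hypothetical names chosen for this example, not Microsoft's actual design.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Booking:
    rider: str
    origin: str
    destination: str
    vehicle_id: str


class TransportBackend(Protocol):
    """Hypothetical contract that every campus transportation system satisfies."""

    def reserve(self, rider: str, origin: str, destination: str) -> Booking: ...


class PugetSoundBackend:
    """Stand-in for one campus's dispatch system (assumed behavior)."""

    def reserve(self, rider: str, origin: str, destination: str) -> Booking:
        return Booking(rider, origin, destination, vehicle_id="PS-shuttle-1")


class OtherCampusBackend:
    """Stand-in for a different campus running a different system."""

    def reserve(self, rider: str, origin: str, destination: str) -> Booking:
        return Booking(rider, origin, destination, vehicle_id="OC-van-7")


class CommuteService:
    """Rider-facing layer; it only knows the TransportBackend interface."""

    def __init__(self, backends: dict[str, TransportBackend]) -> None:
        self._backends = backends

    def book(self, campus: str, rider: str, origin: str, destination: str) -> Booking:
        backend = self._backends.get(campus)
        if backend is None:
            raise KeyError(f"No transportation backend registered for {campus!r}")
        return backend.reserve(rider, origin, destination)
```

Under this arrangement, supporting a new campus means registering one more backend. The booking call that riders trigger stays the same.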

“We built a service that is robust, reliable, and scalable,” says Ram Kuppaswamy, a principal software engineering manager with Microsoft Digital Employee Experience. “Now we can launch similar experiences for the rest of Microsoft’s campuses globally.”

Having separated the booking interface, Microsoft could transform the back-end management of its transportation system. This would give much-needed visibility into and ownership of operating data, the kind that enables real-time status updates and introduces new efficiencies, like automated vehicle dispatch and data-driven service scaling.

Ram Kuppaswamy and Esther Christoffersen were part of a partnership between the Microsoft Digital Employee Experience and Real Estate and Facilities teams to transform the company’s transportation experience. (Photos by Ram Kuppaswamy and Esther Christoffersen)

To get there, Microsoft engaged with a new partner to help introduce these new data-driven optimizations across Puget Sound. Having onboarded the partner into Microsoft Azure, Microsoft now had access to transportation data that was once lacking.

“Vehicles used to be dispatched manually,” Gaurav says. “By selecting this partner, technology is driving everything from booking, managing dispatch, and assigning vehicles. It has also empowered us to provide features like real-time updates and communications with drivers. We can do it now.”

This data introduced other benefits as well.

“In the past, we didn’t have a common dashboard for operations and engineering,” Kuppaswamy says. “There was no easy way to understand why an error in the system was occurring. We can have consistent understanding now.”

Access to this technology is also giving Microsoft’s transportation service more operational agility. Data can be augmented, and machine learning can be applied for better operational insights.

“We can share this data with our partners to adjust routes, increase or decrease the number of buses we have, and prioritize service and operational adjustments,” Christoffersen says.

Microsoft’s enterprise shuttle simulator

Microsoft was able to get a lot done with the majority of employees working remotely during the height of the pandemic. Unfortunately, it also meant there were few employees on campus to test the new service.

“That was an unexpected part of the lifecycle,” says Jessie Go, an application manager with Real Estate and Facilities. “With the pause in services, we had to do a lot of testing virtually. No one was traveling.”

How a rider books, how long it takes a driver to get to a stop, and how a rider is verified by a driver all needed to be tested for bugs. To ensure it worked in a real usage scenario, the Microsoft Digital Employee Experience’s engineering team worked onsite at the Puget Sound campus to run everything through the steps.

“We followed all the COVID safety protocols,” Kuppaswamy says. “One or two engineers would book a trip with a shuttle. We tested all the major use cases. It’s a new experience for the drivers as well. They got trained for the new technology.”

One trip at a time, Microsoft was able to validate the upgraded transportation experience. When employees came back, they loved the new experience. It was consistent and intuitive.

Booking a seat to a new future

Microsoft has launched a seamless transportation experience for riders.

Whether they want to use the web or a mobile app, riders have a consistent interface akin to other workplace services. Global Commute Service was deployed across Puget Sound’s new kiosks, giving users more options for how they want to schedule transportation.

“We want to provide Microsoft employees the best commute option for reaching any destination between home, office, or any building on campus,” Kuppaswamy says. “The first step was to make the experience consistent.”

Riders will have access to real-time status updates on their transportation plans. When you’re moving Microsoft’s Puget Sound employee population around, that’s a big deal.

“We have around 55,000 employees or more in Puget Sound. We run a small city,” Christoffersen says. “Everything is organized; nothing is ambiguous. I can now see a shuttle on a map that’s moving in my direction. That creates a sense of confidence that reduces the stress of getting from point A to point B.”

Data visibility gives Microsoft the operational agility that was once lacking, allowing Real Estate and Facilities to give riders an even better transportation experience.

“We can make almost real-time updates in terms of routes and how often we hit them in our schedules,” Go says. “We weren’t able to do that before. The work we’ve done so far is impactful for scalability and insight.”

Now that modern transportation experiences exist for Microsoft’s campuses, the teams are thinking about how to further empower riders.

“The next big step is to combine every type of commute option, to provide a more holistic trip plan, be it the Microsoft-offered transport options, driving, walking, or public transport,” Gaurav says. “There are so many ways to move around campus. How can we support that? What’s the total length of time for walking, biking, or even using public transportation? Let’s give employees options so that they can decide the best way to get around.”

Meet riders where they are: mobile, desktop, or kiosk. The new transportation experience can be accessed in a variety of ways.
Testing at various stages of development is critical, though it was difficult while the offices were closed.
Employees expect modern transportation experiences to be like what they see when booking a cab or some other mode of travel.
Digitally transforming a real-world service starts with the physical experience. Finding that intersection between physical and digital creates better outcomes for users.
Ease, flexibility, and choice—those are three priorities for creating a better employee experience.

The post Microsoft’s upgraded transportation experience arrives in Puget Sound appeared first on Inside Track Blog.

Doing more with less: Optimizing shadow IT through Microsoft Azure best practices

Lukas Velush — Wed, 18 Sep 2024 13:43:59 +0000

You don’t know what you don’t know. In the world of IT, illuminating those hidden areas helps stave off nasty surprises.

When elements of IT infrastructure are shrouded in mystery, it can lead to security vulnerabilities, non-compliance, and poor budget management. That’s the trouble with shadow IT—a term for any technical infrastructure that conventional IT teams and engineers don’t govern.

At Microsoft, we’re on a journey to increase our shadow IT maturity, resulting in fewer vulnerabilities and increased efficiencies. To get there, we’re leveraging tools and techniques we’ve developed through our core discipline of Microsoft Azure optimization.

[See how we’re doing more with less internally at Microsoft with Microsoft Azure. Learn how we’re transforming our internal Microsoft Azure spend forecasting.]

The challenges of shadow IT

Shadow IT is the set of applications, services, and infrastructure that teams develop and manage outside of defined company standards.

It typically crops up when engineering teams are unable to support their non-engineering partners. That situation may arise from a lack of available engineering capacity or the need for specialized domain solutions. On top of those circumstances, modern tools enable citizen developers to stand up low-code/no-code solutions that let businesses reduce their dependency on traditional engineering organizations.

Six corporate function teams have been involved in creating shadow IT environments: business development, legal, finance, human resources, and our consumer and commercial marketing and sales organizations.

Many of the solutions they’ve developed make strong business sense—as long as they’re secure and efficient. That’s where our Microsoft Digital (MSD) team comes in.

Over the last few years, our IT experts have been working with the shadow IT divisions to increase the maturity of the solutions they’ve developed, taking them from unsanctioned toolsets lurking in the shadows to well-governed, compliant, and secure assets they can safely use to advance our business goals.

Our journey toward shadow IT maturity has progressed steadily through five stages: unsanctioned usage, building fundamentals, emerging, advanced, and optimized maturity.

Now that these shadow IT solutions are more secure and compliant, we’ve turned our attention to efficiency and optimization to ensure we’re able to do as much as possible with the least necessary budget expenditure.

“Three years ago, our biggest driver was getting visibility into the shadow IT estate and finding ways to secure it,” says Myron Wan, principal product manager within the Infrastructure and Engineering Services (IES) team. “Now we’re at a point where we’re looking for cost savings—that’s a natural progression.”

Because many of our shadow IT solutions leverage Microsoft Azure subscriptions, that was a natural place to start the optimization work.

Azure best practices, shadow IT efficiency

Fortunately, we have robust discipline around optimizing Microsoft Azure spend in conventional IT and engineering settings. Microsoft Azure Advisor, available through the Microsoft Azure Portal, has been providing optimization recommendations and identifying overspend for subscribers both within Microsoft and in our customers’ organizations for years.

Trey Morgan is part of a cross-disciplinary technical and FinOps team helping optimize shadow IT at Microsoft.

Internally, we’ve added layers that help streamline the optimization process. One, called CloudFit, draws from a library of optimization recommendations, which are tailored to the specific needs of the teams we support. Then we use Service 360, our internal notification center that flags actions in need of addressing for our engineering teams, to route those recommendations to subscription owners within MSD, product groups, and business groups.

Optimization tickets then enter their queue and progress through open, active, and resolved statuses. It’s a standard method for creating and prioritizing engineering tasks, and Microsoft customers could accomplish a similar result by building a bridge between Microsoft Azure Advisor and their own ticketing tool, whether that’s Jira, ServiceNow, or others.
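For readers who want to picture such a bridge, here is a minimal sketch of the ticket side: a recommendation record becomes a ticket that moves through the open, active, and resolved statuses described above. The record keys (`subscription_id`, `problem`), the `owners` lookup, and the `Ticket` class are assumptions made for illustration. They do not reflect the actual schema of Microsoft Azure Advisor, CloudFit, Service 360, or any ticketing tool.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OPEN = "open"
    ACTIVE = "active"
    RESOLVED = "resolved"


# Allowed forward transitions, mirroring the open -> active -> resolved flow.
_NEXT = {Status.OPEN: Status.ACTIVE, Status.ACTIVE: Status.RESOLVED}


@dataclass
class Ticket:
    subscription_id: str
    title: str
    owner: str
    status: Status = Status.OPEN

    def advance(self) -> None:
        """Move the ticket one step forward; resolved tickets cannot advance."""
        if self.status not in _NEXT:
            raise ValueError("Ticket is already resolved")
        self.status = _NEXT[self.status]


def to_ticket(recommendation: dict, owners: dict[str, str]) -> Ticket:
    """Map one recommendation record to a ticket for the subscription owner.

    The keys read here are assumed for illustration; a real export from a
    recommendation source will have its own schema.
    """
    subscription_id = recommendation["subscription_id"]
    owner = owners.get(subscription_id, "unassigned")
    return Ticket(subscription_id, recommendation["problem"], owner)
```

In a real integration, the final step would hand the ticket to whichever system the team already uses, through that system's own API.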

“We have an existing set of cost optimization recommendations that we use for a variety of different technologies like Azure Cosmos DB and SQL,” says Trey Morgan, principal product manager for MSD FinOps. “The plan was to take applicable recommendations that we use in our core engineering organizations and distribute them to the shadow IT divisions.”

Getting there was a matter of establishing visibility and building culture.

Shining a light on shadow IT spend

Many of the optimization issues within shadow IT divisions came about because of non-engineers’ and non-developers’ unfamiliarity or lack of training with subscription-based software. They might not have the background or expertise to set up subscriptions properly, or even to ensure that those subscriptions would terminate after they had served their purpose.

In some cases, vendors or contractors may have set up processes and then moved on once their engagement was complete. Each of these scenarios had the potential for suboptimal Azure spend.

Providing visibility into these issues was relatively simple. Because all Microsoft Azure subscriptions across our organization are searchable through our company-wide inventory management system and sortable by department, engineers were able to locate all the subscriptions belonging to shadow IT divisions. From there, they simply had to apply CloudFit recommendations to those subscriptions and loop them through Service 360.
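The lookup step described here amounts to filtering an inventory by department. The sketch below shows that idea on an in-memory list. The record keys (`department`, `subscription_id`) and the sample division names are hypothetical; the real inventory management system and its fields are not described in this article.

```python
from collections import defaultdict


def subscriptions_by_division(inventory: list[dict], divisions: set[str]) -> dict[str, list[str]]:
    """Group subscription IDs by department, keeping only the given divisions.

    Department names are compared case-insensitively after trimming spaces.
    """
    wanted = {name.strip().lower() for name in divisions}
    grouped: dict[str, list[str]] = defaultdict(list)
    for record in inventory:
        department = record["department"].strip().lower()
        if department in wanted:
            grouped[department].append(record["subscription_id"])
    return dict(grouped)


# Hypothetical sample data for illustration only.
sample_inventory = [
    {"subscription_id": "sub-001", "department": "Finance"},
    {"subscription_id": "sub-002", "department": "Core Engineering"},
    {"subscription_id": "sub-003", "department": "Legal"},
    {"subscription_id": "sub-004", "department": "finance "},
]

shadow_it = subscriptions_by_division(sample_inventory, {"finance", "legal"})
```

Each resulting group can then be passed to whatever process applies recommendations and raises tickets for that division's owners.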

Our people now have the information they need to act—our organizational leaders can visit their Service 360 dashboard or can review their action summary report to see what they can do to cut their costs. That’s where culture and education came into the equation.

“Culture is always the number-one challenge when items aren’t actually owned by a core engineering team,” Wan says. “When you have teams that are more about generating revenue or managing corporate processes, a lot of what we have to deal with is education.”

It wasn’t just educating teams about Microsoft Azure optimization techniques. CloudFit and Service 360 provided a lot of the guidance those teams would need to get the job done. To a great degree, non-engineering employees needed to build the discipline of receiving and resolving tickets like a developer or engineer would.

But through direct communications from FinOps tools and support from Wan’s colleagues in engineering, we’ve been meeting our goal of optimizing Azure spend in shadow IT divisions. In the first six months of this solution’s availability, we’ve saved $1 million thanks to various optimizations.

Microsoft Azure savings and organizational discipline

Shadow IT will always exist in some form or another, so this journey isn’t just about remedying past inefficiencies. It’s also about building a culture of optimization and best practices across shadow IT divisions as they use their Microsoft Azure subscriptions moving forward.

“As we get more mature and divisions build up their muscles, we’re actually getting to an ongoing state of optimization,” says Feng Liu, principal product manager with IES. “As we build up that culture and that practice, folks are becoming more aware and taking more ownership and accountability.”

Some shadow IT divisions are even going beyond FinOps recommendations. For example, our commercial sales and marketing organization uses shadow IT solutions so extensively and is so keen to optimize their budget that they’ve automated the implementation of recommendations and created their own internal FinOps team.

“The whole vision of our shadow IT program is helping business teams to be self-accountable and sustainable,” says Qingsu Wu, principal program manager for the Infrastructure and Engineering Services (IES) team. “With these solutions and practices in place, we’ve moved on from a ‘get clean’ and ‘stay clean’ culture to one where we ‘start clean.’”

It’s all part of building a more effective culture and practice to do more with less.

Understand your inventory. Spend time linking your organizational hierarchy to your Azure resources.
Get to a confident view of your estate and your data. It’s crucial.
Don’t be overly prescriptive. Be open to how you’re going to approach the situation.
Build sustainability into your efforts by getting non-engineering teams more comfortable with regular engineering practices and learning from each other.
Don’t overlook small wins. When they scale out across an entire organization, they can produce substantial savings.

Finding and fixing network outages in minutes—not hours—with real-time telemetry at Microsoft

Alex Fleck — Thu, 29 Aug 2024 15:00:00 +0000

With more than 600 physical worksites around the world, Microsoft has one of the largest network infrastructure footprints on the planet.

Managing the thousands of devices that keep those locations connected demands constant attention from a global team of network engineers. It’s their job to monitor and maintain those devices. And when outages occur, they lead the charge to repair and remediate the situation.

To support their work, our Real Time Telemetry team at Microsoft Digital, the company’s IT organization, has introduced new capabilities that help engineers identify network device outages and capture data faster and more extensively than ever before. Through real-time telemetry, network engineers can isolate and remediate issues in minutes—not hours—to keep their colleagues productive and our technology running smoothly.

Immediacy is everything

Aayush Dave, Astha Sinha, Abhijit Vijay, Daniel Menten, and Martin O’Flaherty (not pictured) are part of the Microsoft Digital Real Time Telemetry team enabling more up-to-date and extensive network device data.

Conventional network monitoring uses the Simple Network Management Protocol (SNMP) architecture, which retrieves network telemetry through periodic, pull-based polls and other legacy technologies. At Microsoft, that polling interval typically ranges between five minutes and six hours.

SNMP is a foundational telemetry architecture with decades of legacy. It’s ubiquitous, but it doesn’t allow for the most up-to-date data possible.

“The biggest pain point we’ve always heard from network engineers is latency in the data,” says Astha Sinha, senior product manager for the Infrastructure and Engineering Services team in Microsoft Digital. “When data is stale, engineers can’t react quickly to outages, and that has implications for security and productivity.”

Serious vulnerabilities and liabilities arise when a network device outage occurs. But because of lags between polling intervals, a network engineer might not receive information or alerts about the situation until long after it happens.
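A rough calculation shows the size of that lag. Assuming an outage is equally likely to start at any moment within a polling interval, and ignoring collection and alerting overhead, the average wait before the next poll can reveal it is half the interval, and the worst case is the full interval. The function below is an illustrative sketch under those assumptions, not a model of Microsoft's monitoring stack.

```python
def polling_detection_delay(interval_minutes: float) -> tuple[float, float]:
    """Return (average, worst-case) minutes before a poll can reveal an outage.

    Assumes the outage start is uniformly distributed within one polling
    interval and that each poll reports the outage immediately.
    """
    if interval_minutes <= 0:
        raise ValueError("interval must be positive")
    return interval_minutes / 2, interval_minutes


# The two ends of the polling range mentioned above: five minutes and six hours.
for label, minutes in [("5-minute poll", 5), ("6-hour poll", 6 * 60)]:
    average, worst = polling_detection_delay(minutes)
    print(f"{label}: average {average:g} min, worst case {worst:g} min")
```

Under these assumptions, a five-minute interval leaves an outage undetected for 2.5 minutes on average, while a six-hour interval leaves it undetected for three hours on average and up to six hours in the worst case.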

We assembled the Real Time Telemetry team as part of our Infrastructure and Engineering Services to close that gap.

“We build the tools and automations that network engineers use to better manage their networks,” says Martin O’Flaherty, principal product manager for the Infrastructure and Engineering Services team in Microsoft Digital. “To do that, we need to make sure they have the right signals as early and as consistently as possible.”

The technology that powers these possibilities is known as streaming telemetry. It relies on network devices compatible with the more modern gRPC Network Management Interface (gNMI) telemetry protocol and other technologies to support a push-based approach to network monitoring where network devices stream data constantly.

This architecture isn’t new, but our team is scaling and programmatizing how that data becomes available by creating a real-time telemetry apparatus that collects, stores, and delivers network information to service engineers. These capabilities offer several benefits.

The advantages of real-time network device telemetry

Prevention

Superior anomaly detection, reduced intent and configuration drift, a foundation for large-scale automation, and less network downtime.

Security and compliance

Better detection of breaches, vulnerabilities, and bugs through automated scans of OS stalls, lateral device hijacking, malware, and other common vulnerabilities.

Observability

Visibility into real-time utilization data on network device stats, as well as steady replacement of current data collection technology and more scalable network growth and evolution.

Service quality

More rapid network fixes, leading to a reduction in the baselines for time-to-detection and time-to-mitigation for incidents.

“Devices are proactively sending data without having to wait for requests, so they function more efficiently and facilitate timely troubleshooting and optimization,” says Abhijit Vijay, principal software engineering manager with the Infrastructure and Engineering Services team in Microsoft Digital. “Since this approach pushes data continuously rather than at specific intervals, it also reduces the additional network traffic and scales better in larger, more complex environments.”

At any given time, Microsoft operates 25,000 to 30,000 network devices, managed by engineers working across 10 different service lines. Accounting for all their needs while keeping data collection manageable and efficient requires extensive collaboration and prioritization.

We also had to account for compatibility. With so many network devices in operation, replacement lifecycles vary. Not all of them are currently gNMI-compatible.

Working with our service lines, we identified the use cases that would provide the best possible ROI, largely based on where we would find the greatest benefits for security and where networks offered a meaningful number of gNMI-compatible devices. We also zeroed in on the types of data that would be the most broadly useful. Being selective helped us preserve resources and avoid overwhelming engineers with too much data.

We built our internal solution entirely from Azure components, including Azure Functions, Azure Kubernetes Service (AKS), Azure Cosmos DB, Redis, and Azure Data Lake. The result is a platform that network engineers can use to access real-time telemetry data.

With key service lines, use cases, and a base of technology in place, we worked with network engineers to onboard the relevant devices. From there, their service lines were free to experiment with our solution on real-world incidents.

Better response times, greater network reliability

Service lines are already experiencing big wins.

In one case, a heating and cooling system went offline for a building in the company’s Millennium Campus in Redmond, Washington. A lack of environmental management has the potential to cause structural damage to buildings if left unchecked, so it was important to resolve this issue as quickly as possible. The service line for wired onsite connections sprang into action as soon as they received a network support ticket.

With real-time telemetry enabled, the team created a Kusto query to compare DOT1X access-session data for the day of the outage with a period before the outage started. Almost immediately, they spotted problematic VLAN switching, including the exact time and duration of the outage. By correlating the timestamps, they determined that the RADIUS registrations of the device owner had expired, which caused the devices to switch into the guest network as part of the zero-trust network implementation.
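The team's actual Kusto query isn't shown here. As a rough illustration of the comparison it describes, the Python sketch below finds VLAN changes in session records and keeps only those that never appeared in a baseline period. The record keys (`device`, `vlan`, `timestamp`) and both function names are assumptions made for this example; real DOT1X access-session data will have its own schema.

```python
from datetime import datetime


def vlan_changes(sessions: list[dict]) -> list[dict]:
    """Return one entry per VLAN change, per device, in time order.

    Each session record is assumed to carry "device", "vlan", and an
    ISO-8601 "timestamp"; these field names are illustrative.
    """
    changes = []
    last_vlan: dict[str, str] = {}
    ordered = sorted(sessions, key=lambda r: datetime.fromisoformat(r["timestamp"]))
    for session in ordered:
        previous = last_vlan.get(session["device"])
        if previous is not None and previous != session["vlan"]:
            changes.append(
                {
                    "device": session["device"],
                    "from": previous,
                    "to": session["vlan"],
                    "at": session["timestamp"],
                }
            )
        last_vlan[session["device"]] = session["vlan"]
    return changes


def new_in_outage_window(baseline: list[dict], outage_day: list[dict]) -> list[dict]:
    """Keep outage-day VLAN changes whose (device, from, to) never occurred in the baseline."""
    seen = {(c["device"], c["from"], c["to"]) for c in vlan_changes(baseline)}
    return [
        c for c in vlan_changes(outage_day)
        if (c["device"], c["from"], c["to"]) not in seen
    ]
```

Given a device that stays on a corporate VLAN throughout the baseline but moves to a guest VLAN on the day of the outage, `new_in_outage_window` returns that single change along with its timestamp. That is the kind of signal the team correlated with the expired registrations.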

As a result, the team was able to resolve the registration issues and restore the heating and cooling systems in 10 minutes, a process that might have taken hours using other collection methods because of the lag time between polling intervals.

“This has the potential to improve alerting, reduce outages, and enhance security,” says Daniel Menten, senior cloud network engineer for site infrastructure management on the Site Wired team. “One of the benefits of real-time telemetry is that it lets us capture information that wasn’t previously available—or that we received too slowly to take action.”

It’s about speeding up how we identify issues and how we then respond to them.

“With this level of observability, engineers that monitor issues and outages benefit from enhanced experiences,” says Aayush Dave, a product manager on the Infrastructure and Engineering Services team in Microsoft Digital. “And that’s going to make our network more reliable and performant in a world where security issues and outages can have a global impact.”

The future is in real time

Now that real-time telemetry has demonstrated its value, our efforts are focused on broadening and deepening the experience.

“More devices mean more impact,” Dave says. “By increasing the number of network devices that facilitate real-time telemetry, we’re giving our engineers the tools to accelerate their response to these incidents and outages, all leading to enhanced performance and a more robust network reliability posture.”

It’s also about layering on new ways of accessing and using the data.

We’ve just released a preview UI that provides a quick look at essential data, as well as an all-up view of devices in an engineer’s service line. This dashboard will enable a self-service model that makes it even easier to isolate essential telemetry without the need for engineers to create or integrate their own interfaces.

That kind of observability isn’t only about outages. It also enables optimization by helping engineers understand and influence how devices work together.

The depth and quality of real-time telemetry data also provides a wealth of information for training AI models. With enough data spread across enough devices, predictive analysis might be able to provide preemptive alerts when the kinds of network signals that tend to accompany outages appear.

“We’re paving the way for an AIOps future where the system won’t just predict potential issues, but initiate self-healing actions,” says Rob Beneson, partner director of software engineering on the Infrastructure and Engineering Services team in Microsoft Digital.

It’s work that aligns with our company mission.

“This transformation is enhancing our internal user experience and maintaining the network connectivity that’s critical for our ultimate goal,” Beneson says. “We want to empower every person and organization on the planet to achieve more.”

Here are some tips for getting started with real-time telemetry at your company:

Start with your users. Ask them about pain points, what scares them, and what they need.
Start small and go step by step to get the core architecture in place, then work up to the glossier UI and UX elements.
Be mindful of onboarding challenges like bugs in vendor hardware and software, especially around security controls.
You’ll find plenty of edge cases and code fails, so be prepared to invest in revisiting challenges and fixing problems that arise.
Make sure you have a use case and a problem to solve. Have a plan to guide your adoption and use before you turn on real-time telemetry.
Make sure you have the proper data infrastructure in place and an apparatus for storing your data.
Communicate and demonstrate the value of this solution to the teams who need to invest resources into onboarding it.
Prioritize visibility into the devices and data you’ve onboarded through pilots and hero scenarios, then scale onboarding further according to your teams’ needs.
Integrate as much as possible. Consider visualizations and pushing into existing network graphs and tools to surface data where engineers already work.

Learn more about Microsoft Azure Kubernetes Service monitoring and Microsoft Azure Functions.

The post Finding and fixing network outages in minutes—not hours—with real-time telemetry at Microsoft appeared first on Inside Track Blog.

Helping Microsoft employees understand their value with the Total Rewards Portal

Angela Gamba — Mon, 26 Aug 2024 15:00:31 +0000

Our total rewards communications are an essential aspect of empowering employees to understand the value of Microsoft compensation and, for employees in the United States, their benefits, while reminding them of the investment that the company is making in them. When done correctly, this empowerment leads to improved engagement and retention and increased quality of new hires. At Microsoft, the Total Rewards Portal (TRP) is the mechanism by which this value proposition is communicated and shared worldwide to our 220,000-plus global employees on an individual level.

The TRP has been on a journey since it first launched in 2015, undergoing several iterations including initially being hosted by a third-party vendor. In July 2021, it was brought in-house and hosted on Microsoft Azure to have more control and flexibility to further enhance the experience. As part of this continual improvement, understanding and hearing from employees and managers about their usage and satisfaction with the tool has been critical to its overall success and the latest iteration of the portal.

“We took a three-phased approach to help us inform the most recent design, as well as guide the objectives, goals and principles for the future state of this tool,” says Nur Sheikhassan, a principal group engineering manager on the Rewards and Compensation team in Microsoft Digital Employee Experience.

Phase 1—Understanding TRP usage

The goal of phase one was to establish a baseline understanding of usage and gather insights into what was working and what wasn’t. One-on-one interviews were conducted with both employees and managers to obtain feedback. Our key findings included:

Integration of tangible and intangible total rewards: Understanding that total compensation and benefits are often comprised of both tangible elements (such as money) and intangible elements (such as culture and work/life balance), we found it’s important to surface both in order to clearly communicate the total value of the compensation and benefits package and highlight it in a way that provides clarity rather than clutter.
Make TRP more discoverable: The discoverability of TRP, especially outside of rewards season, had been low. Clearly and consistently branding the site and situating it within areas employees commonly use could help them discover employee tools and content related to rewards more easily.
Optimize for the complete task flow: The tool needed to fully consider the complete flow of a potential task. We identified all the information one might be in search of to learn about their total compensation to ensure users do not feel the need to seek out additional resources on other sites and tools.
Consider surfacing contextual data: We needed to think critically about the contextual data that is brought into TRP and use only data that will enhance understanding of the value of the Microsoft compensation and benefits packages. Focusing on common scenarios such as managers’ rewards conversations with directs, hiring, and modeling future rewards would help to provide a clearer path to what data should be included.

Phase 2—Develop common TRP scenarios

The next phase was designed to build on the learnings from phase one, leveraging common TRP scenarios to help understand what is working and what is not. Exploring these scenarios uncovered opportunities for consideration and started to light up themes around the need to get to overall rewards understanding faster, drive meaning through contextual data, and seamlessly connect related tools and sites. The three key themes that came out of this phase were:

Clearly communicate value of total rewards opportunity: We wanted to display total rewards clearly, succinctly, and comprehensively, indicating relevant timeframes and breaking out cash base and bonus. We looked at utilizing data visualizations (i.e., charts) more strategically to help give a clearer view of changes and trends; otherwise, more detailed data in a table was considered to increase usability.
Empower rewards conversations: We needed to provide more conversation starters or pointers within the tool that can help managers be prepared and have more meaningful rewards conversations with their teams.
Optimize rewards workflows: We evaluated better access to related content and sites to enable task completion across the tools in the HR ecosystem. The aim was to help managers traverse the fragmented manager-tool ecosystem by providing relevant links, on-ramps, and off-ramps to related tools and information, for a more seamless experience that improves workflow.

Phase 3—Optimize TRP design

Building on the learnings captured from the first two phases, a redesigned user experience was developed including a high-fidelity prototype. The goal of phase three was to assess the usability of the new site and ultimately ensure that user needs and pain points were addressed with the new design.

“It was abundantly clear of the immediate appeal the new TRP design had on employees and managers alike,” says Jennifer Hugill, a principal program manager on the Rewards and Compensation team. “The clean, welcoming visuals and the ability to see more detail on each page in an easy-to-understand layout were all enhancements that were very well-received.”

A view of what the “overview” page of an individual contributor’s Total Rewards Portal (TRP) might look like with the intent of helping them better understand the value of their overall compensation. TRP also includes additional pages that break down sections like cash, stock, and benefits to provide users with more in-depth details. (The benefits detail is only available for US employees.)

Key benefits of the newly configured site include:

Seamless task completion and delightful moments: Core tasks were easily completed, with participants noting the ability to see multiple pieces of information in a single view. Calculations that had real impact were already done for the user, and thoughtful additions like talking points and personal notes were considered both helpful and delightful.
Continued focus on aggregation of information as a key value-add: Bringing together information from disparate systems and tools is important to users. For individual contributor (IC) views, it means showing all compensation-related information in TRP, including hourly wages for hourly workers and revenue and/or commitment-based incentives for sales employees. For managers, it’s bringing together information on direct reports into a single table.
System “insights” that help fulfill business and user goals: Providing system-derived “insights” in both manager and IC views allows employees to spend less time connecting the dots between pieces of information and more time making decisions and taking action. For managers, it provides team insights to ensure fair compensation and talent retention. In an IC view, it showcases opportunities to take advantage of additional benefits left on the table.

A view of what a team dashboard in TRP looks like for managers allowing them quick and easy access to the compensation information for their directs.

“The Total Rewards Portal provides seamless access to information about my team’s comprehensive rewards and compensation, allowing me to have meaningful discussions with my employees and leadership during Connect season and beyond,” says Michelle Huenink, a Microsoft manager and global enablement leader in Customer Service and Support.

Vision and Future State

Putting all the user research findings together leads to clear business objectives and user experience goals for the future state of the TRP solution. These foundational elements ensure the right principles are in place for the product team, providing the guideposts to stay true to the product objectives and goals.

“Ultimately, the goal of the TRP is to show the value of an employee’s individual rewards, while empowering rewards conversations for managers and providing a complete data set to inform decisions,” Sheikhassan says. “Keeping these core objectives at the heart of our future enhancements enables us to continue to have a tool that provides a consistent experience that our employees will use and enjoy as part of their overall Microsoft employee experience journey.”

In a competitive talent market, having a tool like the TRP really helps represent the value of Microsoft compensation and benefits and reminds employees of the company’s investment in them.
By using multiple micro-services, we can build a better experience to represent employee compensation at various stages of an employee’s journey with Microsoft.
Developing and using the Total Rewards Portal is providing us a strong return on our investment (ROI) over time, especially since it is now hosted on Microsoft Azure and because it was developed in-house.
Our sensitive compensation information for 220,000-plus employees stays within the control of Microsoft while informing our employees.

The post Helping Microsoft employees understand their value with the Total Rewards Portal appeared first on Inside Track Blog.

Azure resource inventory helps manage operational efficiency and compliance

Inside Track staff — Wed, 24 Jul 2024 19:16:50 +0000

One of the benefits of Microsoft Azure is the ease and speed in which cloud resources and infrastructure can be created or changed. Teams across Microsoft can scale up or scale down their cloud resources to meet their workload demands by adding or removing compute, storage, and network resources.

Microsoft Digital has developed tools and processes that help us effectively manage physical IT assets and resources. But with the increase in cloud resources comes some unique challenges. Conventional processes weren’t adequately giving us visibility into self-provisioned usage and related risks. Teams and business units at Microsoft could acquire cloud resources on behalf of the organization without passing through the traditional controls that give us some level of oversight and governance.

The adoption of self-service cloud technologies was making it difficult for us to keep up with rapid changes. We needed better visibility into Azure resource utilization for individual employees, groups, and roles. To improve our ability to manage Azure resources and to help ensure compliance, we developed processes to help us:

Create and maintain an inventory of the Azure subscriptions and resources used within the enterprise.
Define a methodology to help us correlate detailed resource-level records with operational visibility. This provides a cross-checked resource management mechanism that can be audited.
Develop a system for Azure usage management that uses the inventory to help us drive the most efficiency and value from our Azure resources.

Improving the efficiency of Azure resources

In a cloud environment, performance and availability of business workloads are often addressed by initially overestimating the compute and storage resources required. We didn’t have the visibility we needed to collect usage data or to determine whether the resources required to run an application were in alignment with the demand or needs of the business. To be more efficient with resources, we needed a way to identify underutilized capacity, dormant or orphaned resources, and other undesirable artifacts that can lead to increased costs and unnecessary risk or complexity. Our starting point in addressing the challenge was to gather and maintain an accurate inventory of the resources within Azure to help ensure that the proper controls are practiced, optimize resources, and mitigate unsanctioned cloud use.

Reducing risks through increased visibility

As an IT organization, we can’t manage risks that we can’t see. We require visibility into our environment to help us effectively measure, manage, and protect our infrastructure and systems. To perform their functions, our behavior-based Security Information and Event Management (SIEM) systems rely on an accurate view into our IT infrastructure. When assessing compliance, security, cost-effectiveness, efficiency, troubleshooting, or other important functions, we need the capability to view and delve into every resource to determine its purpose, who can access it, and its value to the business.

Understanding the risk and usage profiles of both sanctioned and unsanctioned Azure cloud resources requires the collection of accurate Azure resource and usage information, which is necessary for correlating risks and behaviors. Implementing appropriate controls and a method to monitor for unsanctioned usage helps us reduce the risks associated with unsanctioned and unknown cloud resources. Those risks include:

Inefficient use of resources. Trying to manage and support unsanctioned cloud resources consumes unnecessary time, effort, and expense. Audits and investigations can provide inaccurate or less effective results, and it can be difficult, or impossible, for us to enforce security policies on unsanctioned cloud resources.
Process maturity and execution inefficiencies. Although we’re working to advance operational levels of process maturity, unsanctioned and unknown cloud resources can lead to inefficiencies in:
- Compliance and policy audits, and overall audit effectiveness.
- Inventory and configuration management processes and practices.
- Patch and vulnerability management.
- Quality and operational processes.
Data loss or leakage. Unsanctioned and unknown cloud resources expand our threat surface. If cloud services are used to store business data, that storage happens outside of our organizational policies and controls, and the data could be exposed or exploited.

Creating an Azure resource inventory with usage and reporting capabilities

Just about everything in Azure that’s associated with an account or a subscription is considered a resource. There can be thousands of resources used for a single Azure deployment, including virtual machines, Azure Blob storage, address endpoints, virtual networks, websites, databases, and third-party services.

To be able to produce a comprehensive inventory, we needed to be able to answer the following questions about all of the Azure resources in use across the organization:

What is it?
Where is it?
What is it worth?
Who can access it?

We’re responsible for managing the on-premises and cloud resources in our environment at Microsoft. Because cloud services are self-service and constantly changing, we needed to ensure that any methodology that we created to inventory Azure resources was agile enough to keep pace.

We designed an Azure inventory solution that collects subscription information from our internal billing system and resource and usage data from Azure Resource Manager, then stores it all in an Azure SQL database. The collected data can then be audited and reported on.

High-level architecture of the Microsoft Digital Azure resource inventory solution

Step 1: Locating and identifying the subscriptions within the enterprise

Subscriptions help us organize access to cloud service resources. They also help control how resource usage is reported, billed, and paid for. Each subscription can have a different billing and payment setup, so you can have different subscriptions and different plans by department, project, regional office, and so on. Every cloud service belongs to a subscription, and the subscription ID may be required for programmatic operations.

To identify which subscriptions we had in the environment, we generated a list from our internal billing system. The list we pulled from the internal billing system represented our “universe” view of all of the Azure subscriptions we would be collecting resource information for in Azure Resource Manager.

NOTE: Customers with an Azure Enterprise Program Agreement can access usage and billing information through a representational state transfer (REST) API. An enterprise administrator must first enable access to the API by generating a key from the Microsoft Azure Enterprise Portal. Anyone with access to the enrollment number and the key has read-only access to the API and data.

Step 2: Ensuring access to the subscriptions

Azure Resource Manager is a central computing role within Azure that provides a consistent layer for administrating and managing cloud resources. It’s also the component responsible for providing access to detailed resource usage reports and data. We use Azure Resource Manager REST APIs to pull resource and usage information from Azure Resource Manager into the data collection solution we built.
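As a rough illustration of pulling resource records over REST, the sketch below pages through the Azure Resource Manager "list resources" endpoint for one subscription. It is not the collection tool described in this article. The API version string and the `get_json` callable are assumptions made for the example: `get_json` stands in for whatever authenticated HTTP GET your environment provides, which lets the paging logic be exercised without a network call.

```python
from typing import Callable, Iterator
from urllib.parse import urlencode

ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2021-04-01"  # assumed; use a version your environment supports


def resources_url(subscription_id: str) -> str:
    """Build the ARM URL that lists resources in one subscription."""
    query = urlencode({"api-version": API_VERSION})
    return f"{ARM_ENDPOINT}/subscriptions/{subscription_id}/resources?{query}"


def list_resources(
    subscription_id: str, get_json: Callable[[str], dict]
) -> Iterator[dict]:
    """Yield every resource record, following nextLink until it is absent.

    get_json is a hypothetical helper: it takes a URL, performs an
    authenticated GET, and returns the decoded JSON body.
    """
    url = resources_url(subscription_id)
    while url:
        page = get_json(url)
        yield from page.get("value", [])
        url = page.get("nextLink")  # missing on the last page


if __name__ == "__main__":
    # Canned pages stand in for real ARM responses.
    fake_pages = {
        resources_url("sub-1"): {"value": [{"name": "vm1"}], "nextLink": "page-2"},
        "page-2": {"value": [{"name": "db1"}]},
    }
    for resource in list_resources("sub-1", fake_pages.__getitem__):
        print(resource["name"])
```

Injecting the fetch function keeps credentials out of the paging code, so the same loop can be reused for other ARM list endpoints that return `value` and `nextLink`.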

To effectively monitor Azure cloud usage and access privileges, our administrators required both visibility and administrative access into subscriptions and resources to list, monitor, and manage them. We created an Azure Active Directory service principal object that gives our automated data collection tool read-only access.

Step 3: Building a data storage solution for subscription and resource metadata

We built a storage solution for subscription and resource metadata that we collect from the billing system and Azure Resource Manager using Azure SQL. We use Blob storage for backup. The datasets that we collect from the APIs aren’t standard, so we parse and structure them before we place them into the Azure SQL database. Our primary data storage solution supports only structured data, but our backup Blob storage supports unstructured data.
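The parse-then-store step can be sketched as follows. This is an illustration only: SQLite from the Python standard library stands in for Azure SQL, a second table holding raw JSON stands in for the Blob storage backup, and the table and column names are invented for the example. The only thing taken from Azure itself is the general shape of a resource ID (`/subscriptions/{id}/resourceGroups/{name}/providers/...`).

```python
import json
import sqlite3
from typing import Iterable, Optional, Tuple

Row = Tuple[str, str, Optional[str], Optional[str], Optional[str], Optional[str]]


def flatten_resource(raw: dict) -> Row:
    """Turn one nested resource record into a flat row for the inventory table."""
    parts = raw["id"].strip("/").split("/")
    subscription_id = parts[1]
    has_group = len(parts) > 3 and parts[2].lower() == "resourcegroups"
    resource_group = parts[3] if has_group else None
    return (
        raw["id"],
        subscription_id,
        resource_group,
        raw.get("name"),
        raw.get("type"),
        raw.get("location"),
    )


def store(conn: sqlite3.Connection, raw_records: Iterable[dict]) -> None:
    """Write structured rows plus an unmodified JSON copy of each record."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS resources ("
        "id TEXT PRIMARY KEY, subscription_id TEXT, resource_group TEXT, "
        "name TEXT, type TEXT, location TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_backup (id TEXT PRIMARY KEY, payload TEXT)"
    )
    for raw in raw_records:
        conn.execute(
            "INSERT OR REPLACE INTO resources VALUES (?, ?, ?, ?, ?, ?)",
            flatten_resource(raw),
        )
        conn.execute(
            "INSERT OR REPLACE INTO raw_backup VALUES (?, ?)",
            (raw["id"], json.dumps(raw)),
        )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store(conn, [{
        "id": "/subscriptions/sub-1/resourceGroups/rg-a/providers/"
              "Microsoft.Compute/virtualMachines/vm1",
        "name": "vm1",
        "type": "Microsoft.Compute/virtualMachines",
        "location": "westus2",
        "tags": {"env": "test"},  # kept only in the raw backup copy
    }])
    print(conn.execute("SELECT * FROM resources").fetchall())
```

Keeping the raw payload alongside the structured row means a field that was not parsed today can still be recovered later without calling the API again.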

Step 4: Constructing an automated data collection tool

The data for the Azure resource inventory comes from 60 APIs, so we couldn’t rely on manual processes to collect that data with any regular frequency. Manual processes don’t scale and aren’t cost effective. We constructed an automated data collection tool that calls the numerous REST APIs to capture and store the metadata on a daily basis. The automated tool is a Windows virtual machine that has a C# native application running on it that calls the 60 Azure REST APIs. The application captures and parses the returns of each dataset before storing it in the Azure SQL database. The tool then creates a backup copy in Azure Storage.

Using an automated tool for data collection provides reliable results on a predictable schedule and saves us a great deal of time and money.

Step 5: Consolidate and link together datasets to create a subscription-level view

Each dataset represents a single object or view of the information. We use the unique subscription IDs and resource names to create subscription-level views that we can compare to our Azure baselines. After the data is consolidated and linked to its subscription ID and resource name, we can begin working with it to analyze and audit for specific activities, using familiar productivity tools like Power BI, Excel Power Query, or Excel PowerPivot. We regularly send Azure configuration insight reporting data to two internal portals—one that’s related to security and compliance, and another that reports organizational efforts to keep devices safe by keeping them current. We also use the resource information in our reporting to identify areas in which we have an opportunity to improve compliance through user education. Some of the reports we use include:

Azure Security Center alerts and compliance report. With this report, we pull a list of alerts that are found in Azure Security Center and provide detailed statistics, such as the number of High, Medium, and Low alerts found in the environment and the top subscriptions that are seeing alerts. The target audience is application teams and their organizations to help focus their efforts.
Compliance reporting by group. For our compliance reporting, we apply our baselines and aggregations to the Azure inventory. The compliance rates can be viewed at either an organization or team level to provide overall or drill-down information about compliance. The target audience is management and compliance leadership, to help them drive Azure security and compliance.
Compliance reporting for user role authorization. This report helps us identify user role authorizations, assess them against the baselines defined by the security use case, or narrative, and determine the corresponding compliance rates per resource. This report includes the:
- Total number of administrators in the environment.
- Average administrator counts across groups and teams.
- Number and names of non-employees that have privileged roles in subscriptions (contributor, administrator, and so on).
- Number of potential unauthorized assignments.
- Names of the people who created the potential unauthorized assignments.
- Role type assignment details.
Resource type count report. This report includes a breakdown of resource type counts across the organization, including Azure SQL, Azure Virtual Network, virtual machines, Azure storage, and so on. It also contains a breakdown of resource type counts in the three fundamental cloud service models: infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS).
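To make the linking step concrete, here is a small sketch of joining three datasets on subscription ID and deriving two of the report figures listed above: resource type counts and non-employees holding privileged roles. The record fields (`owner_team`, `principal`, `is_employee`) and the set of roles treated as privileged are assumptions for illustration, not the schema or baseline used in the actual solution.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# Assumed baseline: which role names count as privileged.
PRIVILEGED_ROLES = {"Owner", "Contributor", "User Access Administrator"}


def subscription_view(
    subscriptions: Iterable[dict],
    resources: Iterable[dict],
    role_assignments: Iterable[dict],
) -> Tuple[Dict[str, dict], Set[str]]:
    """Link resources and role assignments to subscriptions by subscription_id.

    Returns the per-subscription view and the set of subscription IDs seen in
    the resource data but missing from the billing list.
    """
    view = {
        s["subscription_id"]: {
            "owner_team": s["owner_team"],
            "resource_types": Counter(),
            "privileged": [],
        }
        for s in subscriptions
    }
    unknown: Set[str] = set()
    for r in resources:
        entry = view.get(r["subscription_id"])
        if entry is None:
            unknown.add(r["subscription_id"])
            continue
        entry["resource_types"][r["type"]] += 1
    for a in role_assignments:
        entry = view.get(a["subscription_id"])
        if entry is not None and a["role"] in PRIVILEGED_ROLES:
            entry["privileged"].append(a)
    return view, unknown


def non_employee_privileged(view: Dict[str, dict]) -> List[str]:
    """Names of non-employees holding a privileged role in any subscription."""
    return sorted({
        a["principal"]
        for entry in view.values()
        for a in entry["privileged"]
        if not a["is_employee"]
    })


if __name__ == "__main__":
    view, unknown = subscription_view(
        [{"subscription_id": "sub-1", "owner_team": "Finance"}],
        [{"subscription_id": "sub-1", "type": "Microsoft.Sql/servers"},
         {"subscription_id": "sub-9", "type": "Microsoft.Compute/virtualMachines"}],
        [{"subscription_id": "sub-1", "principal": "vendor@example.com",
          "role": "Contributor", "is_employee": False}],
    )
    print(dict(view["sub-1"]["resource_types"]), unknown)
    print(non_employee_privileged(view))
```

Subscription IDs that show up in the resource data but not in the billing list are returned separately rather than dropped, since those are the cases an audit would want to look at first.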

We’ve improved our visibility of Azure resources, and that has numerous benefits. Azure makes it easier to provision virtual machines and scope and scale Azure resources for testing. The inventory makes us better able to identify qualified resources for testing products and services.
We can make better decisions about cloud utilization, and reduce costs. And we’re reducing risk through our ability to easily identify and mitigate unsanctioned cloud applications. We’re better able to manage and audit Azure resources, to meet compliance standards by providing oversight and governance.
We didn’t stop there. After creating the inventory, we took on the task of managing our resource and subscription configurations.

The post Azure resource inventory helps manage operational efficiency and compliance appeared first on Inside Track Blog.

Deploying Kanban at Microsoft leads to engineering excellence

Douglas Gantenbein — Fri, 19 Jul 2024 08:01:29 +0000

Microsoft has taken a page from the auto industry to use a process called Kanban (pronounced “con-bon”), a Japanese word meaning “signboard” or “billboard.” It was developed by a Toyota engineer to improve manufacturing efficiency.

Microsoft is using Kanban to drive engineering improvement and streamline workflows.

In its simplest form, Kanban involves creating a set of cards that track manufacturing or other step-by-step processes. These cards, tacked to a corkboard, can be used to highlight trouble spots and avoid overcapacity. That latter quality helps Kanban users resist loading up a job with too many side tasks.

“I learned about Kanban when I was in the Marine Corps,” says Ronald Klemz, a senior software engineer manager for Microsoft Commerce and Ecosystems. “When I joined Microsoft, I could see how it applied to software engineering.”

As it turns out, Microsoft already had an internal Kanban evangelist: Eric Brechner, who has since started his own company, leaving behind an influential legacy and a must-read book.

Although Kanban at Microsoft had a toehold, most engineers still used “scrum” or “Waterfall” development frameworks. Both attempt to help teams manage and assign workloads. Scrums, for instance, consist of regular planning meetings followed by two-week to month-long sprints that are meant to complete a particular stage of work.

While plenty of good work has come out of scrums and Waterfalls, they are not always ideal for driving engineering improvement. In scrums, for instance, the regular meetings can be time consuming, and even though scrums are designed to break big jobs into manageable pieces, teams can still become overwhelmed if customers add new requirements on the fly.

“At the start of each two-week scrum cycle, you’re expected to know everything that you’re going to do in those two weeks,” says Snigdha Bora, an engineering lead with Microsoft Digital, the organization that powers, protects, and transforms Microsoft. “But there are things that will happen in those two weeks that you can’t know in advance. All of that goes away with Kanban because it has no limitations or artificial boundaries of a week or two weeks.”

“We were having problems managing with scrums, and were constantly missing sprint conclusions,” says Jon Griffeth, a software development engineer and program manager for Microsoft Commerce and Ecosystems. “We had a need to really visualize our work, which scrums couldn’t provide. Another engineer said, ‘Hey, have you heard about Kanban?’ We did some research and decided this was a good fit.”

Whether built with simple paper tags or using more sophisticated software versions, a Kanban board shows rows of cards arranged in columns that represent stages of a project’s workflow. Each card contains a specific task and who is responsible for it.

One of Kanban’s most valuable aspects is that each column is designed to self-limit work in progress. If an extra card is added that exceeds the agreed-upon limit of tasks, the column heading might light up red, indicating a possible bottleneck that could delay work.
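The work-in-progress limit is simple enough to model in a few lines. The sketch below is a generic illustration of the idea, not how Azure DevOps or any particular board implements it; the class and field names are made up for the example. Adding a card past the limit is allowed, but the column reports itself as over the limit, which is the signal a board would use to turn the heading red.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    """One Kanban column with an optional work-in-progress limit."""

    name: str
    wip_limit: Optional[int] = None  # None means the column has no limit
    cards: List[str] = field(default_factory=list)

    def add_card(self, title: str) -> None:
        # The card is always accepted; the limit only raises a flag.
        self.cards.append(title)

    @property
    def over_limit(self) -> bool:
        """True when the column holds more cards than the agreed limit."""
        return self.wip_limit is not None and len(self.cards) > self.wip_limit


if __name__ == "__main__":
    in_progress = Column("In progress", wip_limit=2)
    for task in ["Validate code", "Update workflow", "Fix invoice bug"]:
        in_progress.add_card(task)
        marker = "RED" if in_progress.over_limit else "ok"
        print(f"{in_progress.name}: {len(in_progress.cards)} cards [{marker}]")
```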

“It helps to simplify the workflow, so people aren’t getting hit with all kinds of sudden, ad hoc projects,” Klemz says. “They’re able to focus on the agreed-upon workflow.”

Griffeth agrees.

“When we would want to add an item to the workflow, Kanban helped us have more objective conversations about what we could and couldn’t do,” Griffeth says. “It also brings accountability within the team, and people get to pick a task and run with it. Then, if they are done with it, they can go to the next item on the priority list.”

Illustration shows a basic Kanban board, with tasks ordered by whether they have been started, are in process, or have been completed.

That last point underscores another advantage of how Kanban at Microsoft drives engineering improvement: Its visual nature makes it easy for someone who is a newcomer to a team, has been on vacation, or is a part-timer, to look at the Kanban board and immediately see what needs to be done.

“With Kanban, it’s much easier to pick things up if you’ve been gone for a couple of days or if you’re just coming into the team,” says Baala Arumugam, a senior software engineer for Microsoft Commerce and Ecosystems. “And if you finish a model, you don’t have to go to the project manager and ask what needs to be done next. You can see what’s next right on the Kanban board, pick up the next step and run with it.”

That is especially handy in a time when COVID-19 has essentially all Microsoft engineers working remotely, often in different time zones. With Kanban boards, often created with Microsoft Azure DevOps, they can always immediately see the status of a project.

Microsoft team members who have worked with Kanban include Baala Arumugam (center), Snigdha Bora (upper right), Jon Griffeth (lower right), Binu Surendranath (lower left), and Ronald Klemz (upper left).

Binu Surendranath’s team owns the tools, processes, and controls to ensure that Microsoft’s preferred suppliers and partners are paid in a timely way once invoices are approved. They also ensure tax and other statutory compliances globally, provide tax and statutory compliance information, and report payments to the Internal Revenue Service.

Those multiple workflows led to siloed work, with different members of the team unaware of what co-workers were working on, or how their work had an impact on others.

“Everybody had their own priorities,” Surendranath says. “If I’ve finished one part of the puzzle, I celebrate a victory. But that didn’t really make a dent in the overall project. We support global businesses that are expanding exponentially. Having common, quantifiable business outcomes for everyone to work towards became an obvious need.”

Kanban has helped his team create a more collaborative work environment while still giving engineers plenty of freedom for innovation and simplification to positively impact customer experience and business needs, Surendranath says.

Sounds good. But what about concrete benefits to Kanban at Microsoft? There are plenty.

“Gone are the days when we’d spend nine months on a quarterly update,” Surendranath says. “Now when you close and open Outlook, you have a new Outlook because of the frequent updates Microsoft makes to it and other apps. That takes a more agile development approach that Kanban works well with.”

The agility plays well with Microsoft customers, who like to see product improvements that are rapid and seamless. The same goes for the business expansion of Microsoft Azure and data center launches and announcements.

“From the time Microsoft CEO Satya Nadella announces a roll-out, we have just a few weeks to get everything up and running,” says Surendranath. “Kanban has really enabled us to meet that need with a high level of confidence and transparency. Kanban dashboard enabled real-time transparency on progress of business priorities and allowed us to manage our OKR (Objectives and Key Results) closely and were able to drive our monthly business reviews more efficiently. We started bringing up the dashboard during our business reviews to give transparency to all global stakeholders, which eventually helped build stronger trust.”

Kanban also helps Microsoft teams more effectively manage and deploy global statutory laws and compliance, which can change rapidly with predefined timelines and in most cases are non-negotiable.

Griffeth’s engineers, meanwhile, were assigned the task of creating a new purchase order workflow for a team in India.

“We tracked a lot of what had to be done in Kanban,” he says. “It helped us see where a bottleneck might be, such as the product owner flooding the first step of the process with a lot of requests, or if code validation becomes a problem.”

The result: A smoother process, happier customers, and a team that worked well together. The team also saw improved productivity because no one was spending time in scrum meetings or working as scrum master. Internal customers and business groups embraced real-time transparency, accountability, and predictability on engineering dependencies.

Kanban continues to be a learning process for Microsoft engineers using it, and it has not yet gained truly widespread acceptance. But it has shown a path to make software development faster and more trouble-free, while helping teams work together more effectively.

Learn how Microsoft uses Azure Resource Manager for efficient cloud management.

Share your feedback with us—take our survey and let us know what kind of content is most useful to you.

The post Deploying Kanban at Microsoft leads to engineering excellence appeared first on Inside Track Blog.

Enhancing space management internally at Microsoft with Wi-Fi data

Cristine Jung-Fassnidge — Thu, 18 Jul 2024 16:00:55 +0000

Space management and employee engagement are two critical aspects of any modern workplace, including internally here at Microsoft.

Figuring out how to get both right leads to important questions:

How can organizations understand the best use of their building spaces, including offices and common spaces, while providing better experiences for their employees?
How can they reduce the cost and complexity of installing and maintaining IoT sensors to measure people density in different areas?
How can they protect the privacy of employees and their devices and comply with privacy regulations?

This is what we asked ourselves when we set out to enhance both space utilization and the experience our employees have when they go into the office in our brand-new buildings here at Microsoft headquarters in Redmond, Washington.

We in Microsoft Digital, the company’s IT organization, knew that each new building would come with a wireless access point (WAP) system that employees use to access Wi-Fi. We knew the data from the access points could be used to measure the people density in different areas. The question was, how could we use this data to provide real-time insights to employees and facility managers privately and securely?

Using WAP data to measure space utilization

Improving our space management using Microsoft Azure and AI is the focus for Nritya Reddy, Daniel Lee, Veeren Kumar Chimbili, Lakshmi Kothamasu, Sudhakar Sadasivuni, and Bharath Kumar.

Our solution, Space Busyness Insights, uses our standard Wi-Fi WAP devices located throughout each building to calculate data on space utilization. This data includes identifying unused areas, occupied spaces and the crowd density, and the availability and use of common areas. By analyzing this data, we can make informed decisions about how to best allocate additional space or repurpose existing areas for more effective use. Additionally, we can plan for future real estate requirements.

“We identified an opportunity to reuse the existing devices and the data that we already had from these devices,” says Nritya Reddy, a senior product manager on the Microsoft Digital team. “It was a cost-optimized way of handling our requirements.”

Our employees benefit from this solution by being able to view, in real time, the availability and activity in shared spaces such as kitchenettes and conference rooms. To implement this solution, we collaborated with our infrastructure and security team, InnerSpace (a third-party vendor), and Microsoft facilities managers. We integrated AI to enhance our data measurement and analysis capabilities, enabling us to create actionable plans for space management.

“The era of modern smart experiences with IoT hardware demands innovative solutions that can be stitched across multiple devices and protocols with a cost-efficient design and architecture. I consider this as an opportunity to use the signals from two ecosystems to build secure, privacy-protected, smart building experiences. This gives us further opportunities to explore various use cases with WAP technology without additional hardware integrations,” says Sudhakar Sadasivuni, principal group engineering manager, Microsoft Digital.

Our innovative approach of repurposing existing devices for new requirements emphasized cost optimization and helped us be frugal with our resources.

“We have an existing Wi-Fi infrastructure in all our buildings, provisioned via WAP devices by different vendors. They can provide a list of devices that are Wi-Fi-connectable and in the discoverable range of the given WAP device,” says Reddy. “By employing artificial intelligence and machine learning on this raw data, we can triangulate people density. Meaning, you would know how many people are in that specific area based on some of the devices that these people are carrying, either a laptop or a mobile phone, which are discovered by these WAP data points.”

We partnered with InnerSpace, a vendor whose system has the logic and the AI and machine learning capabilities to make sense of the raw data from the WAPs and turn it into meaningful people-count data.

“We sell primarily to the largest enterprises, so we needed to build on a robust, highly secure, highly scalable, and universally trusted cloud infrastructure,” says Matt MacGillivray, co-founder and VP of Research and Development at InnerSpace.

He shared how they used Microsoft Azure services to run their logic and provide the output.

“We used Azure Kubernetes to provide elastic capacity for our ingest pipeline and datastore, Azure App Services to run our client-facing web-based tooling, and Azure Container Instances to deploy containerized subsystems without needing to manage the machines running them,” MacGillivray says.

InnerSpace also uses proprietary AI logic to ensure that people aren’t counted double because they might be carrying more than one device. Based on the proximity of those devices and other logic and rules in place, they can help us determine space usage.
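InnerSpace's logic is proprietary and isn't described here, so the following is only a rough illustration of the general idea, not their method. The sketch assumes each anonymized device already has an estimated (x, y) position in meters, merges devices that sit within a small radius of each other, and counts the resulting groups as people. The function name, the radius, and the sample coordinates are all hypothetical.

```python
from math import dist


def estimate_people(positions, same_person_radius=1.0):
    """Count groups of devices that lie within same_person_radius of each other.

    positions: list of (x, y) tuples in meters, one per detected device.
    Devices closer than the radius are assumed to belong to one person.
    """
    parent = list(range(len(positions)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if dist(positions[i], positions[j]) <= same_person_radius:
                parent[find(i)] = find(j)  # merge the two groups

    return len({find(i) for i in range(len(positions))})


# A laptop and phone at one desk, two devices at another, and a lone phone.
sample = [(0.0, 0.0), (0.4, 0.3), (5.0, 5.0), (5.2, 5.1), (9.0, 1.0)]
people = estimate_people(sample)
```

A simple distance rule like this can merge neighbors who sit close together, which is one reason a production system would combine proximity with other rules and signals.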

We implemented this solution in our Redmond East Campus buildings, and through this process, we get the information we need for space utilization with these two goals in mind for our employees:

Protect our employees’ personal information and privacy
Comply with privacy regulations

To make our solution work with these two goals in mind, we hash the media access control (MAC) addresses of the devices to anonymize the data we send to the third parties, and we perform Microsoft privacy reviews. We only provide InnerSpace with information that they need to analyze the data and make sure to protect everything else. Any data that can be identifiable and linked to a specific device or person(s) is hashed.
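A minimal sketch of this kind of anonymization, assuming a keyed SHA-256 hash (HMAC). The source doesn't say which hash function or key handling is actually used, so the function names and the key below are hypothetical.

```python
import hashlib
import hmac


def normalize_mac(mac: str) -> str:
    """Lowercase a MAC address and drop separators so equal addresses hash equally."""
    hex_digits = mac.replace(":", "").replace("-", "").replace(".", "").lower()
    if len(hex_digits) != 12 or any(c not in "0123456789abcdef" for c in hex_digits):
        raise ValueError(f"not a valid MAC address: {mac!r}")
    return hex_digits


def anonymize_mac(mac: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest (hex string) of the normalized MAC address."""
    message = normalize_mac(mac).encode("ascii")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


# Hypothetical key; in practice it would stay with the data owner, not the vendor.
key = b"example-key-kept-by-the-data-owner"
token_a = anonymize_mac("AA:BB:CC:DD:EE:FF", key)
token_b = anonymize_mac("aa-bb-cc-dd-ee-ff", key)
```

A keyed hash is used in this sketch because the MAC address space is small enough that an unkeyed hash could be reversed by hashing every possible address. The same device still maps to the same token, which is what lets a downstream system count devices without seeing the raw address.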

Once the data is encoded or hashed, our internal Microsoft team pushes it to our Device Management Services (DMS) Azure Event Hub.

From there, a federated authentication mechanism lets our vendor, InnerSpace, access the anonymized data from our Azure Event Hub. InnerSpace then runs their logic over that data and provides the people count in a space context back to Microsoft.

We also ensure that InnerSpace has our building maps with the access point (AP) IDs and locations on them so they can run their triangulation algorithms to pinpoint the number of people in any space at a given point in time.

When we get that information back, we can then use that data to review and analyze the information and make space utilization decisions.

“The device identifiers are shared between systems in a hashed way. This ensures that specific devices discovered cannot be identified and personal identification information (PII) is protected. We performed stringent Microsoft architectural and data privacy reviews to ensure that no private data is being leaked at any stage. In addition to privacy, scaling and security are other key aspects considered when exchanging data with external systems,” says Lakshmi Kothamasu, principal software engineering manager, Microsoft Digital.

A diagram of how our network traffic architecture flows from the WAP system to InnerSpace.

“We underwent a review where the subject matter experts on the privacy team reviewed the entire architecture and made sure that no device identifier or personally identifiable information of the employee was directly or indirectly being passed on,” Kothamasu says.

A diagram of how we follow the process flow to get information from the WAP system to InnerSpace.

Using our solution to plan for future needs

The benefit of this solution is that it enables real estate and facilities managers to optimize the space utilization and plan for future needs, and it empowers employees to make informed decisions about where and when to use common areas, such as the kitchenettes and the meeting rooms, in real time. We also use a smart building kiosk that allows employees to access the data in a simple and intuitive way.

“The smart building kiosk can be used to open an app, look at a map on the web, or go to a kiosk to see a map. When the employee zooms in, they can see if a space is busy or not,” Reddy says.

Maximizing cost savings

By using the existing WAP system instead of installing new sensors, we saved around $3 million in hardware costs for the East Campus buildings. Because the WAP system exists in all buildings, we can easily enable this solution in other buildings without additional hardware costs.

The cost savings go beyond just the hardware. In every building, the fundamental IT infrastructure includes WAP, which is essential for providing Wi-Fi connectivity. Our Microsoft internal team has developed a highly configurable solution that doesn’t require any code changes. To integrate a new building, we simply need to update the configuration with the new AP layout, and the system operates seamlessly. While the initial implementation at the East Campus took about three months, the process has been significantly streamlined for other locations and can now be completed in just a week or two.

“The cost avoidance isn’t just about not having to buy those IoT sensors and installing them, but also the continued maintenance and security of those devices. You have firmware updates and security updates in the future, so the life cycle costs come down to quite a bit of savings from not having to implement duplicative infrastructure,” says Daniel Lee, a regional lead on the Center of Innovation team in Microsoft Global Workplaces Services.

When considering a building, whether it’s a leased space for customers or a company’s own property, optimizing the use of space is crucial. Real estate comes with significant costs, not only in acquisition but also in ongoing maintenance. We need to ensure that employees are making effective use of these spaces. If they’re not, it’s important to understand why, so that we can address any issues and improve space utilization.

Gaining additional benefits

We’ve talked a lot about the benefits for space planning and cost reduction, but the solution offers other benefits as well:

Data-driven decisions: Removing emotional guesswork from space planning with clear-cut data on actual space usage.
Holistic analysis: Combining WAP data with other sensor signals like lighting and air quality for comprehensive space planning.
Rapid deployment: Streamlined process for implementing the solution in new locations.

By gathering and using the WAP device data, we can not only optimize space utilization but also gain insights into what our employees need from us to optimize their experience.

How other companies can benefit from Microsoft’s solution

We have an aspiration of rolling our solution into the upcoming product Microsoft Places and making it self-sustained and scalable. Places is a product that aims to provide a holistic view of the physical and digital spaces in an organization and how they’re used by the employees.

We’re currently using this solution in seven buildings and our goal is to continue implementing this solution in our other buildings.

“I believe the key advantages of our solution are, first, the enhanced security that comes with not having to add extra hardware or devices. Second, we’ve managed to reduce the number of devices installed across the buildings. And third, because of these improvements, we’ve achieved additional cost savings for Microsoft. That’s the significant impact this solution has delivered to Microsoft,” says Bharath Kumar, principal PM manager, Microsoft Digital.

Other companies that have similar space management and employee engagement needs could benefit from Microsoft’s solution, because it uses existing Wi-Fi infrastructure, reduces the dependency on external sensors, protects the privacy of the employees, and provides a simple and intuitive way to access the data.

“Our aspiration, as we productize this solution, is to eliminate the dependency on anything but the actual product itself. One product we used is Azure Digital Twins, which gets the whole experience lighted by making sense of people count against the space and processing that information,” Reddy says.

Here are some tips on getting started at your company:

Consider implementing a similar solution to optimize space utilization and improve employee experience in your own buildings.
Use existing Wi-Fi infrastructure to reduce cost and dependency on external sensors and vendors.
Ensure that the solution protects employee privacy and complies with privacy regulations.
Stay informed about the latest developments and best practices in the field of space utilization and employee experience.

Create your own Azure free account today on the Microsoft Azure product page.

Want more information? Email us and include a link to this story and we’ll get back to you.

The post Enhancing space management internally at Microsoft with Wi-Fi data appeared first on Inside Track Blog.

Running our customer service and support contact centers on Microsoft Azure

Eric Scheffler — Thu, 11 Jul 2024 16:00:30 +0000

Providing exemplary support is critical to how we empower our customers to achieve more with Microsoft technologies and services.

We in Microsoft Digital, the company’s IT organization, recently migrated our global Microsoft customer service network to Microsoft Azure, creating a cloud network-based solution to connect our customers to the support services they need at Microsoft. With the new solution, our customers and customer service team members are connected faster, more reliably, and with improved network performance while maintaining secure and compliant connections.

Building a better customer support network

Eric Scheffler and Elaine McNeill are part of the team at Microsoft that has moved our contact center network infrastructure to Microsoft Azure.

Our Support Experience Group (SxG) within our Microsoft Cloud + AI division is driving transformation for Microsoft Support solutions, building on Microsoft solutions and infusing cutting-edge innovation to improve customer and agent experiences across all Microsoft businesses. Our SxG team provides platforms and services to almost 80,000 Microsoft support advocates, including technical teams and customer support advocates from our network of global contact centers. Our customer support advocates and partners are integral in maintaining high-quality customer service and support for Microsoft products and solutions. Microsoft handles almost 200,000 support calls daily in 37 different languages worldwide. It’s a diverse and fast-paced environment where connecting support staff to the customer and the Microsoft services they support can be complex.

Our previous global network backbone served us for years through the deployment of key regional central hub sites. Hub sites were connected by physical point-to-point Multiprotocol Label Switching (MPLS) circuits deployed strategically to various sites globally. The MPLS network design is complex, costly, and inflexible.

By redesigning our network with Microsoft Azure Cloud Network solutions at the center, we’re addressing several challenges associated with traditional MPLS networks, such as:

Cost and complexity: MPLS networks are often expensive and complex to deploy.
Inflexibility: MPLS is designed for stable, point-to-point connections and can be too rigid for the dynamic and distributed nature of modern cloud computing. It struggles to efficiently handle the traffic patterns created by enterprises running workloads across multiple clouds.
Deployment speed: Setting up or modifying MPLS connections can take weeks or even months, which is not conducive to the agility required by businesses today. Cloud networks can be deployed and scaled much more rapidly.
Security and encryption: Traditional MPLS doesn’t offer encryption, which is increasingly important as operations move toward the cloud. A cloud network can provide consistent protection regardless of how users connect.

At the core of our transformation is the SxG Cloud Network, a newly designed global, cloud-based network built on Azure Virtual WAN services and designed specifically for Microsoft customer services. The SxG Cloud Network directly connects advocates at Microsoft contact centers, remote advocates, and internal support teams to the required services.

The SxG Cloud Network provides a highly reliable and high-performing network path into Azure, where support team members can access the tools and environments required to support our customers fully. Within the network, our customer service teams are connected to Azure Virtual Desktops that supply the tools and connectivity they need for troubleshooting, enabling them to connect with Microsoft customers worldwide through virtual private network (VPN) and Azure Virtual Network (VNet) peering.

The SxG Cloud Network resides on the Microsoft Azure tenant and consists of several virtual WAN hubs in key Azure regions across the globe. These hubs use Microsoft Azure Firewall to secure traffic flows within the cloud network using URL filtering, TLS inspection, and intrusion detection and prevention.

The Azure-based hubs provide a single access point that simplifies connectivity and creates a unified and consistent environment for all support advocates. We provide several connectivity methods for our Microsoft customer support advocates irrespective of location, including:

Point-to-site (P2S) VPN: This provides connectivity for the remote user working from home.
Site-to-site (S2S) VPN: We use S2S VPN to connect Microsoft contact centers using an S2S encrypted tunnel between the partner VPN concentrator and the SxG Cloud Network gateway.
VNet peering: We also support peering between a partner Azure tenant and the SxG Cloud Network Azure tenant. VNets on both tenants are directly peered and secured by Azure Firewall.

Point-to-site VPN

Remote Microsoft customer support advocates use Azure P2S VPN to connect directly to Microsoft services in Azure. We maintain several VPN hubs across global Azure regions to ensure that advocates experience the most direct network path to Azure. We use Azure networking components to connect to the required internal Azure resources.

To ensure that only necessary traffic goes through the VPN, VPN profiles are configured with split-tunnel routing that routes Microsoft-specific traffic to Azure and the rest to the partner network or the public internet. This ensures that users can access local websites in the correct locale and languages they need, while also enabling low-latency access to the Microsoft corporate edge network.
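The split-tunnel decision can be sketched as a prefix match on the destination address. This is an illustration of the concept only: in practice the VPN client installs routes in the operating system rather than running code like this, and the prefixes and function name below are hypothetical, not the SxG Cloud Network's actual address ranges.

```python
import ipaddress

# Hypothetical prefixes that a VPN profile might send through the tunnel.
TUNNEL_PREFIXES = [
    ipaddress.ip_network("10.20.0.0/16"),
    ipaddress.ip_network("172.16.8.0/22"),
]


def route_for(destination: str) -> str:
    """Return "vpn" if the destination falls in a tunneled prefix, else "local"."""
    address = ipaddress.ip_address(destination)
    if any(address in network for network in TUNNEL_PREFIXES):
        return "vpn"
    return "local"


internal_route = route_for("10.20.5.9")      # inside 10.20.0.0/16
public_route = route_for("93.184.216.34")    # not in any tunneled prefix
```

Everything that doesn't match a tunneled prefix stays on the local network path, which is what keeps local websites reachable in the user's own locale.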

The Azure VPN client facilitates connectivity between the local device and the Azure Virtual WAN gateway hosted in the SxG network. We use a single VPN profile configured with split tunneling for all VPN users. This is made possible by a key feature of Azure Virtual WAN that automatically connects P2S users directly to the closest region. Authentication is required to access the VPN, and users authenticate using their Microsoft credentials through Microsoft Entra ID and multi-factor authentication.

Site-to-site VPN

S2S VPN connections provide a secure encrypted VPN connection over the public internet to connect our contact centers to Microsoft customer support services in Azure. The contact center partner manages their network and the configuration of the device on their network, which establishes a VPN tunnel to the Azure Virtual WAN gateway hosted in the SxG Cloud Network.

VNet peering

When partners already have an Azure presence, Microsoft can connect the partner Azure network to the virtual WAN using Azure VNet peering. Traffic between the peered VNets doesn’t leave the global Azure backbone network. We use SxG VNet peering to connect VNets in the Microsoft tenant with VNets in the partner’s Azure tenant. VNet peering establishes a high-performance, trusted connection using Azure Firewall in the SxG Cloud Network to provide flow control and traffic protection.

An architecture diagram of the SxG Cloud Network.

Managing connectivity for voice services

Our advocates often support our customers with voice calls, and supporting an effective and efficient voice service is integral to the SxG Cloud Network.

We use Azure ExpressRoute connections to create a direct private network path from all our Azure Virtual WAN gateways to our voice services platform environment using an MPLS backbone. These global connections to our voice services hosted in Azure enable advocates connected to the SxG Cloud Network via P2S, S2S, or VNet peering to use our voice services. The Interhub feature in Azure Virtual WAN also provides seamless connectivity between hubs, ensuring that user network traffic takes the best path with minimal latency while traversing the Microsoft backbone network.

Voice services for Microsoft customer service advocates have now migrated to Azure Communication Services, which is connected to the SxG Cloud Network with ExpressRoute and keeps traffic on the reliable Azure backbone network.

The SxG Cloud Network has modernized how we connect to voice and data services hosted in Azure and can provide advocates access without needing to deploy physical circuits to contact center locations, saving time and money. It also creates a unified network environment, simplifying access points and functionality for our advocates.

With the flexibility and scalability of the SxG Cloud Network, we can manage our bandwidth needs better and have fewer physical circuits that are oversized for the traffic volume. This alone is reducing network costs by more than 60% in specific cases. While exact figures for cost savings and performance improvements can vary depending on the specific circumstances of a deployment, businesses often report significant reductions in total cost of ownership (TCO) and enhancements in network performance when migrating from MPLS to Azure cloud-based solutions.

Looking forward

As we look to the immediate future of the SxG Cloud Network, we’re excited about increasing Azure Communication Services traffic on our network for voice support, further unifying our services and leading to more significant cost savings and efficiency. We’ll continue searching for ways to improve the SxG Cloud Network, including moving the network edge closer to our users with new global virtual WAN hubs. This helps us deliver more effective and easy-to-use support services for Microsoft customers and the advocates who support them.

We’re benefiting from the SxG Cloud Network in several areas, including:

Experience enhanced support: Connect faster and more reliably to support services thanks to our migration to the Azure-based SxG Cloud Network, ensuring high-quality assistance whenever Microsoft customers need it.
Global reach, local service: The SxG Cloud Network spans countries and languages, providing a seamless support experience through a diverse team of professionals ready to assist customers.
Secure and simplified connectivity: Azure Virtual WAN offers various connection options, including VPN and VNet, to ensure a secure, direct connection to support resources.
Future-ready voice services: Azure Communication Services is creating a more integrated and cost-effective voice support system, enhancing the support experience while maintaining the highest network reliability standards.

Create a P2S User VPN connection using Azure Virtual WAN

The post Running our customer service and support contact centers on Microsoft Azure appeared first on Inside Track Blog.

Creating a manageable Microsoft Azure subscription model

Inside Track bot — Thu, 06 Jun 2024 21:15:39 +0000

Editor’s note: This story was written by a bot powered by Microsoft Azure OpenAI. The bot interviews subject matter experts in Microsoft Digital to generate new stories quickly. We have humans in the loop to ensure the accuracy and completeness of our AI-powered stories.

In the rapidly evolving world of cloud services, managing technical subscriptions can become a daunting task. At Microsoft, we faced a similar challenge—Microsoft Azure subscription sprawl.

“If customers don’t have a formal system in place to manage their Azure subscriptions, it can lead to subscription sprawl,” says Trey Morgan, a principal product manager on our Microsoft Digital Azure Optimization team. “This can cause potential legal and security risks.”

Our solution?

The Azure Information Request System (AIRS).

AIRS streamlines the process of setting up new Azure subscriptions.

Trey Morgan is a principal product manager on our Microsoft Digital Azure Optimization team.

“AIRS is an internal system we’ve developed that offers a solution to govern and track subscriptions, a strategy that Microsoft has effectively used,” Morgan says.

Users requesting a new subscription fill out a form detailing cost assignment and ownership. The system also helps assign the subscription to our business hierarchy, providing visibility on where the cloud resources fit within the company.
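The kind of check such a request form could enforce can be sketched as follows. AIRS is an internal system and its actual fields aren't described here, so every field name and rule below is a hypothetical stand-in for "cost assignment," "ownership," and "business hierarchy."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubscriptionRequest:
    """Hypothetical request form; the field names are illustrative only."""
    display_name: str
    owner_alias: str           # who to contact about security or cost issues
    cost_center: str           # where the subscription's spend is assigned
    org_path: tuple[str, ...]  # position in the business hierarchy, top down


def validate(request: SubscriptionRequest) -> list[str]:
    """Return a list of problems; an empty list means the request is complete."""
    problems = []
    if not request.display_name.strip():
        problems.append("display name is required")
    if not request.owner_alias.strip():
        problems.append("an owner is required")
    if not request.cost_center.strip():
        problems.append("a cost center is required")
    if not request.org_path:
        problems.append("the subscription must be placed in the business hierarchy")
    return problems


complete = SubscriptionRequest("team-analytics-prod", "jdoe", "CC-1234", ("Engineering", "Data"))
incomplete = SubscriptionRequest("scratch", "", "", ())
complete_problems = validate(complete)
incomplete_problems = validate(incomplete)
```

Rejecting a request until ownership, cost assignment, and hierarchy are all filled in is what makes the information available from day one rather than reconstructed later.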

“The impact of AIRS has been significant, particularly in governance and cost management,” Morgan says. “By assigning subscriptions to the business hierarchy from day one, they don’t get lost in a company of our size. We can quickly identify who to contact for security issues, cost issues, and understand how these cloud resources fit into Microsoft’s business.”

We’ve also integrated AIRS with tooling that benefits several different parts of our business.

“Azure governance, security, finance, and leadership all benefit from AIRS,” Morgan says. “Without it, we would lack crucial information about these Azure subscriptions or why they exist.”

Azure subscription sprawl strategies

To prevent subscription sprawl in your Azure environment, consider implementing the following strategies:

Consistent landing zones: Establish consistent landing zones based on application archetype subscription strategies. This approach minimizes the growth of subscriptions by providing predefined structures for different types of workloads.
Requisite components definition: Expand the definition of requisite components to better align with the governance and compliance needs of a mature cloud enterprise. Clearly define what components are necessary for each subscription, ensuring that they meet organizational standards.
Subscription policies: Control the movement of Azure subscriptions out of the current directory and into it. Global administrators can allow or disallow users from changing the directory of an Azure subscription. For specific scenarios, configure a list of exempted users who can bypass the policy settings that apply to everyone else.
Restrict self-service subscriptions: Disable self-service purchases to prevent standard users from creating subscriptions without proper authorization.

Remember that effective governance and clear policies play a crucial role in managing subscription sprawl and maintaining a well-organized Azure environment. As we continue to evolve and improve AIRS, we hope our journey can provide valuable insights for other companies navigating their own cloud subscription challenges.

The future of AIRS

Having a company operating model and policies is effective, but IT leaders need to adhere to them and regularly review cloud subscriptions and usage to get the greatest security, flexibility, and output value from them. As we look to the future, we’re confident that AIRS will continue to evolve and provide even greater benefits to the company.

Consider using a system like AIRS to streamline the process of setting up new Azure subscriptions and assign them to the business hierarchy. Here are some tips on how you can get started at your company:

Establish consistent landing zones based on application archetype subscription strategies to minimize the growth of subscriptions.
Expand the definition of requisite components to align with the governance and compliance needs of a mature cloud enterprise.
Control the movement of Azure subscriptions in and out of the current directory by setting subscription policies.
Disable self-service purchases to prevent standard users from creating subscriptions without proper authorization.
Remember that effective governance and clear policies play a crucial role in managing subscription sprawl and maintaining a well-organized Azure environment.

Create your Microsoft Azure free account today.

Want more information? Email us and include a link to this story and we’ll get back to you.

Please share your feedback with us—take our survey and let us know what kind of content is most useful to you.

The post Creating a manageable Microsoft Azure subscription model appeared first on Inside Track Blog.

Moving our network to the cloud with Microsoft Azure

Lukas Velush — Wed, 05 Jun 2024 15:00:56 +0000

Our ongoing move to cloud networking here at Microsoft is at the core of our larger connectivity strategy.

Very practically, this shift plays a pivotal role in how we support, and will continue to support, our more than 221,000 employees across 180 countries and regions, many of whom work remotely. Enabling our people to work and connect successfully from wherever they are remains paramount.

And how are we doing all of this?

We’re not going far—we’re using our own suite of Microsoft Azure network products.

“Adopting cloud networking isn’t simply moving network resources from the data center to the cloud—we’re transforming the way we think about networking altogether,” says Raghavendran Venkatraman, a principal cloud network engineering manager in Microsoft Digital, the company’s IT organization. “It’s about creating a new way to approach our connectivity and the business it supports.”

Venkatraman’s team has been using Microsoft Azure to push cloud networking to the forefront of the company’s business strategy, where it’s being used as a tool to drive business agility and innovation, not just connect points on a network.

And cloud networking is evolving rapidly.

“Everything is dynamic,” says Tom McCleery, a principal group cloud networking engineering manager in Microsoft Digital. “Implementing cloud networking doesn’t involve waiting for new hardware to get deployed. Almost all aspects of the network are software controlled, and we manage our cloud network environment more like a software development project than a hardware management project.”

Software-defined networking (SDN) and infrastructure as code (IaC) have been instrumental in redefining how we approach networking. Infrastructure as code is the fundamental principle underlying our entire cloud networking infrastructure. Using IaC, we can develop and implement a descriptive model that defines and deploys network components and determines how the components work together. IaC allows us to create and manage a massive network infrastructure with reusable, flexible, and rapid code deployments.
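
As a rough illustration of the descriptive-model idea, the sketch below compares a desired network description with what is currently deployed and lists the changes needed to reconcile them. It is a toy model, not how Azure or any particular IaC tool works internally; the resource names, address spaces, and the `plan` function are hypothetical.

```python
def plan(desired, actual):
    """Compare desired and actual state and return the changes needed.

    Both arguments map a resource name to its settings dictionary.
    """
    to_create = sorted(name for name in desired if name not in actual)
    to_delete = sorted(name for name in actual if name not in desired)
    to_update = sorted(
        name for name in desired
        if name in actual and desired[name] != actual[name]
    )
    return {"create": to_create, "update": to_update, "delete": to_delete}


# The descriptive model: what the network should look like.
desired_network = {
    "hub-vnet": {"address_space": "10.0.0.0/16"},
    "spoke-vnet": {"address_space": "10.1.0.0/16"},
}
# What is currently deployed (hypothetical).
deployed_network = {
    "hub-vnet": {"address_space": "10.0.0.0/16"},
    "old-test-vnet": {"address_space": "10.9.0.0/16"},
}

changes = plan(desired_network, deployed_network)
```

Here `changes` asks for `spoke-vnet` to be created and `old-test-vnet` to be deleted, and leaves `hub-vnet` alone because its description already matches.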

“This isn’t about improving networking,” McCleery says. “It’s about fundamentally redefining it and then blowing the top off what was possible with traditional networking. It’s a completely different game. We can create a more complex and capable network environment than you could ever realistically put together with hardware alone, and we can do it in minutes for a network environment that would have taken months or even years to deploy in the past.”

We’re approaching network development and deployment with a new perspective. Agility is the key.

Our network engineers have embraced the ability to almost instantly create network environments using IaC methods. Test environments that accurately mirror their production counterparts can be created in moments and decommissioned just as quickly, saving time, money, and effort for everyone involved.

Enabling innovation with modern cloud networking practices

It’s not just about quick deployment; it’s about agility across all aspects of network management. The software-defined networking model allows for rapid provisioning of network resources, automated management, accurate, real-time monitoring, and advanced security features that adapt to the ever-changing threat landscape.

We use Microsoft Azure DevOps, which provides Git-based source control, to track and manage our IaC templates, modules, and associated parameter files. With Azure DevOps, we can maintain a history of changes, collaborate within teams, and easily roll back to previous versions if necessary.

We’ve implemented automated testing to create safeguards and tests to validate the correctness and functionality of our cloud network code before deployment.
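
One kind of pre-deployment check can be sketched as follows: verifying that the address spaces declared for different virtual networks do not overlap. This is only an example of the idea, not a description of the checks our pipeline actually runs; the network names and the `find_overlaps` helper are hypothetical.

```python
import ipaddress
from itertools import combinations


def find_overlaps(address_spaces):
    """Return pairs of network names whose CIDR ranges overlap.

    `address_spaces` maps a network name to a CIDR string.
    """
    networks = {
        name: ipaddress.ip_network(cidr)
        for name, cidr in address_spaces.items()
    }
    return [
        (a, b)
        for a, b in combinations(sorted(networks), 2)
        if networks[a].overlaps(networks[b])
    ]


# A hypothetical parameter file with one mistake: spoke-b sits inside spoke-a.
overlaps = find_overlaps({
    "hub": "10.0.0.0/16",
    "spoke-a": "10.1.0.0/16",
    "spoke-b": "10.1.128.0/17",
})
```

A test suite could fail the build whenever `overlaps` is not empty, which stops a conflicting address plan before it reaches deployment.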

We’re using configuration management to automate the configuration and provisioning of cloud network objects and services within our cloud network infrastructure. These tools make it easy to define and enforce desired configurations and deployment patterns, ensuring consistency across different network environments.

“Using SDN in Azure, we are achieving unprecedented microservice-like agility at a cloud scale,” says Ragini Singh, a principal group engineering manager, Microsoft Digital. “This approach allows us to experiment and refine our network infrastructure configurations as code, enhancing our ability to innovate swiftly and efficiently. By integrating CI/CD practices, we have transformed our network into a truly elastic and dynamic system, capable of adapting seamlessly to our evolving needs.”

Ragini Singh, Raghavendran Venkatraman, and Tom McCleery are part of the team at Microsoft Digital transforming our network with Microsoft Azure.

Continuous integration (CI) pipelines automate the deployment process for our IaC-based cloud network infrastructure. When the infrastructure code passes all validation and tests, the CI pipeline triggers the deployment process automatically.
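
The gating behavior described here (deploy only when every check passes) can be sketched in a few lines. This is a simplified illustration rather than an Azure DevOps pipeline definition; `run_pipeline` and the check names are hypothetical.

```python
def run_pipeline(checks, deploy):
    """Run every check and call `deploy` only if all of them pass.

    `checks` maps a check name to a zero-argument callable returning bool.
    Returns the list of failed check names (empty means it deployed).
    """
    failed = [name for name, check in checks.items() if not check()]
    if not failed:
        deploy()
    return failed


deployments = []  # records each deployment that was triggered

# One check fails, so nothing is deployed.
failed = run_pipeline(
    {"templates_valid": lambda: True, "no_overlaps": lambda: False},
    deploy=lambda: deployments.append("network"),
)
```

Running every check before deciding, instead of stopping at the first failure, means a single pipeline run reports all the problems an engineer needs to fix.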

We’ve implemented robust monitoring and observability practices for our deployment process. Monitoring and observability are helping us to ensure that our CI builds are successful, detect issues promptly, and maintain the health of our development process.

By following these steps and using continuous integration and continuous delivery (CI/CD) practices, we can build, test, and deploy our cloud network infrastructure in a controlled and automated manner, creating a better employee experience by ensuring faster delivery, increased stability, and more effortless scalability.

Fast-tracking cloud networking development with Microsoft Azure

Our network engineering teams use Microsoft Azure to enable an agile deployment and management environment with instant global reach. The Azure network backbone provides instant reach to more than 60 regions worldwide with more than 165,000 miles of fiber optic and undersea cable systems.

We’re using this vast global network to create instant benefits for our employees and business through innovative uses of Microsoft Azure cloud networking components.

“Azure Virtual WAN has been instrumental in our recent global cloud networking transformation,” says Venkatraman, highlighting one of the Azure products currently pushing the boundaries of our cloud networking capabilities. “We’re using Azure Virtual WAN to provide high-performance networking across our global presence, enabling reliable and security-focused connectivity for all Microsoft employees, wherever they are.”

Microsoft Azure Virtual WAN simplifies large-scale branch connectivity and provides optimized and automated connectivity between on-premises workloads across multiple regions and Azure resources. It integrates various connectivity options, including Azure VPN and Azure ExpressRoute. Azure Virtual WAN also gives us centralized management and monitoring of global and branch connectivity, enhancing the overall network management experience.

Azure Virtual WAN is one of several Azure cloud networking components that are enabling our transformation.

Microsoft Azure Firewall is a fully stateful firewall, providing network-level protection for our applications. We use Azure Firewall to inspect and filter traffic between different Azure Virtual Networks and on-premises networks. It provides application-level filtering capabilities to allow or deny traffic based on rules.
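
To illustrate what allowing or denying traffic based on rules means in practice, here is a simplified rule evaluator. It is a conceptual sketch only and does not reflect the Azure Firewall rule schema or API. The `Rule` fields, the lowest-priority-number-first ordering, and the default deny when nothing matches are modeling assumptions made for this example.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    priority: int  # lower number is evaluated first (assumption)
    source: str    # source CIDR range
    port: int      # destination port
    action: str    # "allow" or "deny"


def evaluate(rules, source_ip, port):
    """Return the action of the first matching rule, or "deny" if none match."""
    address = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        if address in ipaddress.ip_network(rule.source) and port == rule.port:
            return rule.action
    return "deny"


# Hypothetical rules: block one subnet, allow the rest of 10.0.0.0/8 on 443.
rules = [
    Rule(priority=200, source="10.0.0.0/8", port=443, action="allow"),
    Rule(priority=100, source="10.5.0.0/16", port=443, action="deny"),
]
```

Because the narrower deny rule has the lower priority number, traffic from `10.5.1.4` on port 443 is denied even though the broader allow rule would also match it.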

Microsoft Azure VPN enables secure communication between remote users, on-premises networks, and Azure resources over the public internet. Our remote users or branch offices can use Azure VPN to connect to Azure and on-premises resources securely using VPN tunnels. Azure VPN integrates with Azure Firewall to inspect and filter VPN traffic for security purposes.

Microsoft Azure ExpressRoute provides a dedicated, private connection to Azure from our on-premises data centers, bypassing the public internet. ExpressRoute offers higher reliability, lower latency, and increased security compared to traditional internet-based connections. Integration with Azure Firewall ensures that traffic coming over ExpressRoute is inspected and filtered for security and compliance.

Microsoft Azure NAT Gateway enables outbound connectivity for resources traversing a virtual WAN environment or a virtual network, allowing access to the internet or other external services. Azure NAT Gateway is very useful for scenarios where internal resources need to initiate outbound connections. We use integration with Azure Firewall to control and monitor outbound traffic from Azure NAT Gateways to on-premises and Azure-based networks.

Enabling agility across the cloud networking environment

Together, these Azure products help create an agile, robust, scalable, and secure network architecture that allows us to fulfill several common scenarios that occur across our cloud network:

Secure internet access. We deploy Azure Firewall to inspect and filter outbound internet traffic from on-premises networks and Azure resources while NAT Gateway facilitates the actual outbound connectivity.
Hybrid connectivity. We use Azure VPN and Azure ExpressRoute to create a hybrid network architecture, allowing secure communication between on-premises and Azure resources.
Centralized management. We use Azure Virtual WAN for centralized management and connectivity optimization. Azure Virtual WAN enables us to connect multiple regions, on-premises resources, Azure resources, and branch offices seamlessly.
Localized network edges to improve regional performance. We’re increasing our use of the Azure global network as our primary global backbone. Using the Azure global network, we’ve enhanced regional network performance for many Microsoft employees and office locations by moving the network edge closer to our globally distributed employees.

Recently improved connectivity to our Microsoft Johannesburg location provides a compelling case study of how we’re using Azure to radically improve our networking posture and performance.

The solution relocates the internet edge for Johannesburg to the South Africa North Azure region, using Azure Firewall, Azure ExpressRoute, Azure Connection Monitor, and Azure VWAN. We’ve also evolved our DNS resolution strategy to a hybrid solution that hosts DNS services in Azure, which increases the scalability and resiliency of DNS resolution services for Johannesburg users. We’ve deployed the entire solution adhering to our infrastructure as code strategy, creating a flexible network infrastructure that can adapt and scale to evolving demands on the VWAN.

By relocating the network edge to the South Africa region in Azure instead of our data center edge in London and Dublin, connection latency from Johannesburg to other public endpoints in South Africa has dropped from 170 milliseconds to 1.3 milliseconds.
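
To put those two figures in proportion, the short calculation below derives the relative improvement. It uses only the 170-millisecond and 1.3-millisecond values quoted above; the variable names are ours.

```python
before_ms = 170.0  # latency via the London and Dublin edge
after_ms = 1.3     # latency via the South Africa region edge

# Share of the original latency that was eliminated, as a percentage.
reduction_pct = (before_ms - after_ms) / before_ms * 100

# How many times lower the new latency is.
improvement_factor = before_ms / after_ms
```

That works out to a reduction of about 99.2 percent, or latency roughly 131 times lower than before.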

McCleery notes that the changes have been cultural as much as they’ve been technical.

“We’ve been operating our network the same way for more than 20 years,” he says. “This transformation has been built by the hard work and ingenuity of our network engineers, who have adopted a new way of thinking about how our network functions. It’s been a huge shift for them, and much of our innovation has come from a unique perspective or simply questioning how things have always been done. It’s the perspective that we will learn something every time we sit down and talk about this stuff together. With each deployment and iteration, we learn so much and come out of it even better equipped for the next project or problem.”

Succeeding and innovating as Customer Zero

The transformation of our cloud networking environment is a collaborative effort. Being Customer Zero at Microsoft means we’re using our own products and services to optimize our network performance, security, and scalability. By doing so, we pave the way for other customers to benefit from the same solutions and best practices.

One of the key advantages of being Customer Zero for the company is having a close partnership with the Azure engineering teams, who have provided feedback, support, and guidance throughout our transformation. We’ve been able to test new features and capabilities in real-world scenarios, identify and resolve issues quickly, and provide valuable insights for future enhancements.

For example, we were the first adopters of Azure Virtual WAN. Our deployment experiences helped shape the growth of the product and helped the Azure Virtual WAN product team understand how they could improve the user experience, the monitoring tools, and the automation capabilities.

Another benefit of being Customer Zero is accessing the latest innovations and technologies that Azure cloud networking offers. The Azure global network and products that support it give Microsoft Digital—just like any other Microsoft customer—access to an enterprise-scale platform on which we can optimize our network traffic, routing, security, and resilience.

Being Customer Zero also means being a leader and an industry advocate for Azure cloud networking. We can share the learnings and best practices gained from our network transformation journey with other Microsoft customers considering or undergoing similar changes. We’re advocates and innovators, demonstrating how Azure cloud networking can help customers achieve their business goals, such as enabling hybrid and multi-cloud scenarios, supporting remote work and collaboration, and accelerating digital transformation.

Looking forward

One of our key focus areas is the continued adoption of artificial intelligence and machine learning within our cloud network infrastructure. These technologies will enable Microsoft Digital to predict and prevent potential issues and optimize network performance proactively.

The evolution of the Internet of Things (IoT) and edge computing will also influence our cloud networking strategy. Azure’s IoT and edge services will allow for the deployment of network resources closer to the data source, reducing latency and enhancing the user experience.

As we continue to transform and adapt our global cloud networking environment, we remain committed to being Customer Zero for Azure cloud networking, expanding our network footprint, adopting new network services, and enhancing network automation and intelligence. By doing so, we aim to deliver a world-class network experience for our customers, partners, and employees.

Venkatraman sees a bright future for Microsoft’s cloud networking.

“The possibilities with Azure are endless. We’re just scratching the surface of what we can achieve. We aim to continue pushing the boundaries of cloud networking, making it more intelligent, automated, and even more aligned with our business objectives.”

Consider the following takeaways to help your organization begin or continue its cloud networking journey:

Embrace cloud networking for business agility. Adopting cloud networking transforms the approach to connectivity, driving business agility and innovation.
Use software-defined networking. Use infrastructure as code to deploy and manage a flexible and scalable network infrastructure rapidly.
Innovate with Azure Virtual WAN. Use Azure Virtual WAN for high-performance, secure, and reliable global connectivity.
Automate for efficiency. Implement automated testing and configuration management to streamline network management and deployment.
Monitor for success. Apply robust monitoring and observability practices to maintain the health of the network infrastructure.

Get started by learning how to deploy Azure VWAN with routing intent and routing policies.

The post Moving our network to the cloud with Microsoft Azure appeared first on Inside Track Blog.

How Microsoft modernized its purchase order system with Azure microservices

Inside Track staff — Tue, 28 May 2024 17:05:04 +0000

[Editor’s note: This content was written to highlight a particular event or moment in time. Although that moment has passed, we’re republishing it here so you can see what our thinking and experience was like at the time.]

MyOrder, an internal Microsoft legacy application, processes roughly 220,000 purchase orders (POs) every year, which represent $45 billion in internal spending at Microsoft. Until recently, MyOrder was a massive, monolithic, on-premises application. It was costly to maintain, difficult to update, and couldn’t be accessed without a Microsoft-authorized VPN.

MyOrder struggled every May, when traffic could double, or even triple, from 1,000 purchase orders per day to 3,000. When users submitted purchase orders through the ASP.NET-based website during these high-load periods, they frequently saw response times as high as 30 seconds, if the application didn’t outright crash or freeze.

Even when it worked as intended, MyOrder’s user experience could be frustrating.

“MyOrder was wizard-based, so users advanced through the app in a particular sequence,” says Vijay Bandi, a software engineer on the MyOrder team in Microsoft Digital. “If you advanced to a point where you didn’t have the information for a required field, you were stuck. It was an awful experience.”

Elsewhere at Microsoft, engineering teams are moving old, monolithic applications to the cloud for increased efficiency, scalability, and security—not to mention vastly improved user experiences. With MyOrder showing its age, the MyOrder team decided it was time to follow suit.

[Learn more on Microsoft’s modern engineering transformation, how it’s embracing a cloud-centric architecture, and how it’s designing a modern service architecture for the cloud.]

The MyOrder team (shown in this collage of Microsoft Teams screenshots) is practicing social distancing, working from home, and communicating exclusively online.

From server-based monolith to agile PaaS

MyOrder, which combined a front end, back end, and all related logic in one solution, was only one of the two ancient, monolithic applications that made up the legacy purchase order system. The other was the Procurement Services Platform (PSP), a huge middleware services layer. PSP comprised about 60 smaller projects and 500 validation paths.

Built on top of PSP, MyOrder collected data from PSP and housed it in one of the 35 servers required to run the application. It was hosted in four separate virtual machines to support the load. The engineering team used a load balancer to distribute the load to each of the VMs. Caches were built into the servers, but because the caches were distributed among four different VMs, they were never in sync.

“Suppose a user creates a purchase order pointing to one server, and the request goes to the next server,” says Atanu Sarkar, also a software engineer on the Microsoft Digital MyOrder team. “In that case, the user could search for a PO but not find it if the cache isn’t updated.”
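
The problem Sarkar describes can be reproduced in miniature. In this sketch, two servers share one database but each keeps its own cache, so a purchase order created through one server is invisible to a search served by the other. The `Server` class and PO numbers are hypothetical and greatly simplified compared with the legacy system.

```python
class Server:
    """A web server with its own local cache in front of a shared database."""

    def __init__(self, database):
        self.database = database
        self.cache = None  # cached list of PO numbers; None means not loaded

    def create_po(self, po_number):
        self.database.append(po_number)
        if self.cache is not None:
            self.cache.append(po_number)  # only this server's cache is updated

    def search(self, po_number):
        if self.cache is None:
            self.cache = list(self.database)  # load a snapshot on first use
        return po_number in self.cache


database = []
server_a = Server(database)
server_b = Server(database)

server_b.search("PO-0")       # server B loads its cache while the database is empty
server_a.create_po("PO-1")    # the load balancer sends the create to server A
found_on_a = server_a.search("PO-1")  # True: A loads a fresh snapshot
found_on_b = server_b.search("PO-1")  # False: B still holds its stale snapshot
```

A single shared cache removes this class of inconsistency because every server reads and writes the same copy.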

Fewer resources, greater flexibility with Azure

According to MyOrder Engineering Manager Rajesh Vasan, the team considered several platforms for the new solution before landing on Microsoft Azure.

“We looked at a standalone, private cloud instance of Service Fabric and at Azure App Service,” Vasan says. “Azure was expanding, though. They were investing a lot of time in PaaS (platform as a service) offerings, which meant that we could offload all the networking, configurations, and deployments to Azure, and just concentrate on the application code.”

That would be a welcome change compared to the old monolith.

“A change to a single line of code used to take so much time, because you needed to build the whole solution from scratch with thousands of lines of code,” Vasan says. “Debugging presented similar challenges.”

The legacy app also supported external services like SAP (Microsoft’s financial system of record) and Microsoft Approvals, plus some third-party integrations.

“All that functionality, all those integrations in one monolith, that was a problem,” Vasan says.

By moving to Azure, they could convert each individual function and integration into a single microservice.

“Let’s say I want to change the tax code for a specific country,” Vasan says. “In Azure, I know there’s one microservice that does tax code validation. I go there, I change the code, I deploy. That’s it. It’ll hardly take a week.”

The same scenario in the old software, he says, would take a couple of months.

Migrating databases without downtime

Creating that experience required careful consideration as to how the team would maintain the legacy app while building the new one and migrating from one to the other.

“The first step was building a single source of truth,” Vasan says. “We wanted to put all that data in the cloud so we had a single source for all transactional purchase order data.”

After the team moved the data onto Azure, they built connectors for existing and new components.

“Both the legacy service, which was an Internet Information Services (IIS) web service, and the new service, which would be Azure API components and serverless components acting as individual microservices, would connect to a single source of truth,” Vasan says. “That was the first step.”

The team then needed to decide which microservices to build and which to start building first.

“It gets tricky here,” Vasan says. “Some users were accessing data from the old app, so we had to sync back onto the old one as well, up to the point that all users were no longer using the legacy service.”

The team built APIs to access data and key microservices such as search and the user interface (which they completely remodeled using Angular). Next, they focused on building microservices that were directly related to purchase order processing.

After the team built the core microservices, they started moving tenants to the new infrastructure. By this point, they had eliminated PSP and its database entirely.

“That was a big milestone for us because while we were migrating tenants, we were also working to move everything to the new database,” Vasan says.

At that point, there was no duplicate data.

“We had our single source of truth,” Vasan says. “The entire PO processing pipeline was in the cloud.”

The team then began one of the more challenging aspects of the project: they released one of the microservices with A/B testing in place.

“One of our microservices would call the other microservices and the old PSP in parallel,” Vasan says. “After the call went through both, we compared the results to make sure they were consistent. We flighted this in the production environment until we found and fixed all the issues. Then we went live.”
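
The comparison approach Vasan describes can be sketched as follows. This is an illustration of the general technique, not the team’s code: the function names and the two validators are hypothetical, the calls run one after the other here for simplicity instead of truly in parallel, and returning the legacy result to the caller is an assumption.

```python
def compare_side_by_side(request, legacy_call, new_call, mismatches):
    """Send one request to both systems and record any difference.

    The legacy result is returned to the caller (an assumption), so users
    are unaffected while discrepancies are collected for investigation.
    """
    legacy_result = legacy_call(request)
    new_result = new_call(request)
    if legacy_result != new_result:
        mismatches.append((request, legacy_result, new_result))
    return legacy_result


def legacy_validate(po):
    """Stand-in for the old validation path: the amount must be positive."""
    return po["amount"] > 0


def new_validate(po):
    """Stand-in for the new microservice: it also requires a currency."""
    return po["amount"] > 0 and bool(po.get("currency"))


mismatches = []
compare_side_by_side({"amount": 100, "currency": "USD"}, legacy_validate, new_validate, mismatches)
compare_side_by_side({"amount": 50}, legacy_validate, new_validate, mismatches)
```

After these two requests, `mismatches` holds one entry, for the purchase order without a currency, which is the kind of discrepancy a team would investigate before going live.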

The next step was designing administration and configuration.

“We completely rewrote all that into the new areas, plus another eight or nine microservices,” Vasan says.

By then, MyOrder was 100 percent Azure, with no legacy components at all.

The purchase order solution has redesigned the monolithic platform into microservices, Azure functions and native cloud services.

The benefits of microservices

The MyOrder team leaned on several Azure offerings to create the new infrastructure, including Azure Data Factory, Azure Cache for Redis, Azure Cognitive Search, and Azure Key Vault. The new, modernized version of MyOrder consists of 29 Azure microservices that are “loosely coupled and follow the separation of concern principle,” Vasan says.

As with the migration of the POE (Proof of Execution Procedure) service for PDS (Procurement Data Source), the microservices architecture made modifying existing capabilities and adding new ones relatively easy. Because it’s built on Azure, it’s highly scalable, so adding new tenants is much simpler.

The team is most thankful, though, for the ease with which they can maintain compliance. Because all code was housed within a single, monolithic application prior to the migration, and because some services within that monolith were financial in nature, the entire application was, in effect, subject to the requirements of the Sarbanes-Oxley (SOX) act.

“With a monolith,” Vasan says, “the moment you deploy code to a server, the entire server has to be SOX compliant.”

Because the team migrated the system to Azure microservices, microservices that are financial in nature are now separated from those that aren’t.

“With monoliths, every change is a SOX change, so it has to go through multiple approvals before it can be deployed,” Vasan says.

Using microservices “means leaner, shorter audits because the audits only apply to the SOX components, not the entire platform,” he says.

Of the 29 new microservices, only eight require SOX compliance; most don’t.

“We used to have SOX issues. Now we don’t. We’re more compliant and audit-friendly because of moving to Azure,” Vasan says.

SOX requirements also led to performance issues.

“Maintaining SOX compliance requires adhering to a strict approval and release process, including for any backend updates to data,” MyOrder software engineer Umesh says.

Building for the future

One of the tenants the team migrated is Microsoft Real Estate and Security (RE&S), which is responsible for the construction of new datacenters and office buildings at Microsoft. RE&S purchase orders can represent hundreds of millions of dollars in costs. Now that those POs go through the modern MyOrder infrastructure, RE&S has reduced costs by $1.75 million per year, thanks to retiring many now unnecessary servers and reduced operational costs.

Next, the team is focusing on moving MyOrder data into a data lake.

“There’s an overall investment in the Microsoft organization around data lakes right now,” Vasan says. “Azure has a data lake offering, of course, and we’re creating this single source of truth that people are using to build insights around POs. If you want to create a purchase order automatically through an API, for example, you can do that now.”

“There is a fantastic opportunity to optimize the system and incorporate intelligence into it by leveraging machine learning, and it has been kicked off with the integration of a category classification model (i.e., a software model),” MyOrder software engineer Dewraj says.

“Besides, there are active conversations, and efforts are being made, to also leverage ML (machine learning) to optimize the compliance checks for enhancing GDPR (General Data Protection Regulation) compliance.”

“We’re moving away from batch processing to real-time APIs to reduce onboarding time and purchase order turnaround time. For example, PET (Planning Execution Tracker) and PDS retirement (POE, PCC, and Account Code) data is exposed through real-time APIs.”

“v-Payments will help business users make small purchases without going through the supplier onboarding process, and will require minimal approval and validation. Users will have the flexibility to purchase from any AMEX supplier using v-Payment credit cards.”

Those capabilities are a far cry from those of the massive, monolithic legacy system that the reborn MyOrder has replaced.

The post How Microsoft modernized its purchase order system with Azure microservices appeared first on Inside Track Blog.

Enabling advanced HR analytics and AI with Microsoft Azure Data Lake

Alex Fleck — Fri, 24 May 2024 15:17:22 +0000

We’re on a mission to transform our human resources systems here at Microsoft. To make it happen, we’re upgrading the way we use analytics and AI.

Our digital transformation has been a twofold journey.

First, we upgraded our core processes, providing efficient and effective self-service portals for our employees and powerful tools for our HR team using SAP SuccessFactors. Those processes include the nuts-and-bolts applications associated with human capital management (HCM): the employee portal, rewards, payroll, and other essential HR functions.

With the core processes in place, our Microsoft Digital Employee Experience (MDEE) team had everything they needed to revolutionize the data at the center of HR.

The architecture we chose? Microsoft Azure Data Lake.

Building a modernized HR data estate

When data is scattered across disparate systems, it’s difficult to provide agility, insights, and advanced analytics through AI. In today’s world of big data and predictive intelligence, these capabilities aren’t just a luxury. They drive talent conversations, workforce planning, and an improved employee experience that affects business outcomes.

Members of the Microsoft Digital Employee Experience HR Data and Insights team, including Johnson Samuel, Harsh Raj Singh Thakur, and Mithun Manganahalli Goud, were instrumental in implementing a new architecture for HR analytics and business insights.

But when an enterprise’s data is siloed or fragmented, those outcomes are out of reach.

“What happens when you don’t have a modern data architecture?” asks Harsh Raj Singh Thakur, principal software engineering manager on the MDEE HR Data and Insights team. “You have a tedious and drawn-out process before you can retrieve your metrics. It’s a cumbersome task, it’s expensive, it’s not easy to maintain, and there’s a lot of cost to get it all done.”

To make HR data more accessible and insightful, MDEE first had to assemble a unified data estate. Our SAP SuccessFactors implementation for core HR processes helped lay the groundwork by streamlining external and operational data to make them more organized and available for processing.

With modern core processes in place, MDEE engineers could turn their attention to data.

The journey to data transformation

Like all large-scale transformations, this one involved a great deal of complexity and multiple touchpoints. Microsoft Azure Data Lake provided the modern analytics platform that would not only enable the team to ingest, store, transform, and analyze the data, but also deliver simpler data discoverability, maintain data security, and ensure compliance.

The HR Data Lake’s business coverage delivers value across Microsoft’s entire people analytics ecosystem, from employee-facing self-service utilities to large-scale, future-oriented planning.

Unifying the data

Considering the wide array of HR systems at Microsoft, it was important to bring all the data together to give HR an end-to-end view of the employee lifecycle and the moments that matter in an employee’s journey. At the same time, the team worked to reduce redundant data copies across the enterprise.

“Enabling connected insights which are trusted and secure through a modern data platform in Azure Cloud was a key goal as we set out to drive the digital data transformation in the HR ecosystem,” says Johnson Samuel, principal group engineering manager for MDEE’s HR Data and Insights team.

Multiple systems make up the HR ecosystem: Employee Central for core HR, iCIMS for applicant tracking, listening systems, rewards, CRM, employee learning, and more. While each of these systems serves an important purpose, the potential to unlock insights by unifying all of their data is immense.

“The ease of use from actually having everything collocated in an Azure Data Lake makes it easy to build out connected insights,” Raj Singh Thakur says. “It’s the foundation of our modernization journey.”

Azure Data Lake Storage Gen2 serves as the common storage layer, which ingests data through Azure Data Factory, messaging systems, and other sources. By properly defining storage structures and models, the team took the first step toward a more modern data platform.
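
As an illustration of what "defining storage structures" can mean in practice, here is a minimal Python sketch of a folder-naming convention for a data lake. The zone names, source names, and the account and container in the usage example are hypothetical, since the article does not describe the team's actual layout. Only the abfss URI format is standard for Azure Data Lake Storage Gen2.

```python
from datetime import date

# Hypothetical zone names; the article does not describe the real layout.
ZONES = ("raw", "curated", "published")


def lake_path(zone: str, source: str, dataset: str, day: date) -> str:
    """Build a date-partitioned folder path for one dataset load."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    return (
        f"{zone}/{source}/{dataset}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}"
    )


def abfss_uri(account: str, container: str, path: str) -> str:
    """Format a path as an ADLS Gen2 URI using the abfss scheme."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path}"


# Example with made-up names: a raw headcount load from a core HR source.
example_path = lake_path("raw", "employee_central", "headcount", date(2024, 5, 24))
example_uri = abfss_uri("hrlake", "hr", example_path)
```

A predictable convention like this lets ingestion pipelines and downstream readers agree on where a dataset lives without hard-coding locations in each job.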

Expanding the data footprint with new metrics and scorecards

Ever-increasing volumes of data illustrated the need for advanced analytics. They were no longer a choice—they were a necessity.

“There are many lines of businesses within HR, like Global Talent Acquisition, Talent and Learning, and HR Services who manage HR operations,” Samuel says. “We’ve enabled new capabilities for each of these different HR functions.”

Key metrics across the ecosystem include the recruiting funnel, workforce, headcount, employee engagement, learning and development, and other functions across HR. The analytics apparatus uses a combination of Azure Synapse Analytics, Azure Analysis Services, and Power BI Shared Datasets, while Microsoft Power BI is responsible for visualization.

This powerful combination of technologies helped build complex analytics and drove consistency across teams. It also unlocked the ability to bring disparate metrics together to help determine correlation and causation between different factors.

Data governance

Next, the team needed to ensure that engineers and end users could access data in the lake safely and securely. Good governance keeps data access compliant because users can only request information that’s relevant to their roles. Driven by the HR Privacy team and enabled by a home-grown security and governance platform, MDEE established column-level security (CLS) on the Data Lake.

“When an HR team requests data, they get access to only the specific data set,” Raj Singh Thakur says. “So if you’re looking for an employee’s name and alias but your role doesn’t require you to know their salary, gender, or other aspects of their identity, you won’t get access.”
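
The idea behind column-level security can be sketched in a few lines of Python. This is not the team's governance platform, which the article describes only as home-grown. The role names and column grants below are hypothetical and exist purely to show the filtering principle.

```python
# Hypothetical role-to-column grants; a real policy would come from
# the governance platform, not a hard-coded dictionary.
ROLE_COLUMNS = {
    "hr_directory": {"name", "alias"},
    "hr_compensation": {"name", "alias", "salary"},
}


def project_row(row: dict, role: str) -> dict:
    """Return only the columns that the given role is allowed to read."""
    allowed = ROLE_COLUMNS.get(role, set())  # unknown roles see nothing
    return {col: val for col, val in row.items() if col in allowed}


employee = {"name": "A. Example", "alias": "aexample", "salary": 100, "gender": "n/a"}
directory_view = project_row(employee, "hr_directory")
```

In this sketch, the directory role receives the name and alias but never the salary or gender fields, and a role with no grants receives an empty record.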

This approach makes sure we respect our employees’ privacy and that we comply with local laws that regulate how we use our data. Data governance also includes data discoverability, quality, and lineage functionality, which the team established through Microsoft Purview and in-house solutions to support more complex scenarios.

Modern engineering

Modernizing our data architecture is expanding what the company’s HR teams can do, says Dawn Klinghoffer, vice president of People Analytics at Microsoft.

MDEE also developed key platform capabilities that ensure high-quality and trustworthy data across the estate and drive engineering efficiency.

Whether a metric covers headcount, performance management, employee learning, or any other area, it follows the architectural pattern of a Data Lakehouse, a system where all information resides in the Data Lake without the need to build separate data marts. This pattern allows our engineers to scale storage and compute independently for greater efficiency.

Between telemetry dashboards that help engineers understand system health and continuous optimization across code and infrastructure, this new architecture has helped save significant Azure costs—a reduction of around 50% over 2 years. Meanwhile, enabling agile development and DevOps is helping the team deliver iteratively and realize business value faster.

But the real value lies in the insights that unified, normalized data enables.

“We’ve normalized the data by leveraging a company-wide taxonomy that we can use across other projects very easily,” says Mithun Manganahalli Goud, principal software engineer on MDEE’s HR Data and Insights team. “So from a data-delivery service standpoint, we can provide information to a wide range of downstream systems and data consumers.”

Building a platform for the future

While the new architecture is actively meeting current reporting needs, MDEE also looked toward the future.

The platform is capable of enabling deep insights that leverage machine learning. While today’s focus is on descriptive and diagnostic functions, the team is working toward predictive and prescriptive analytics through AI and machine learning.

“We’ve created a rich content system where we can manage emerging requirements with the current data and metadata, so it’s future-ready,” Manganahalli Goud says. “We already have the process in place, so we won’t have to go back and reinvent the wheel.”

When our HR team takes the next step into AI-driven insights, the foundations will already be in place.

Driving human-centered innovation with Microsoft Azure Data Lake

Our modernized data architecture has enhanced the HR teams’ capabilities. Better data immediacy means data pulls that used to take 24 hours now get done in a fraction of the time—around four to six hours. Similarly, the time it takes to enable self-service access for bring-your-own-compute data processing is rapidly falling.

But the most powerful outcomes are the cross-category, cross-disciplinary insights that unified and accessible data provides for HR leaders.

“One of the most unique and forward-thinking outcomes is that we’ve been able to combine qualitative with quantitative data,” says Dawn Klinghoffer, vice president of People Analytics at Microsoft. “We’re able to create data models with our survey information as well as more quantitative data like attrition and diversity, then combine them in an aggregated, de-identified way to understand broad insights.”

For example, by combining sentiment data with de-identified calendar and email metadata, we’ve been able to quantify the impact of blocking focus time on employees’ perception of work-life balance.
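
One common way to keep this kind of combined analysis aggregated and de-identified is to report only group-level statistics and to suppress groups that are too small. The Python sketch below assumes that approach. The group labels, scores, and minimum group size are made up for illustration and are not taken from the article.

```python
from collections import defaultdict
from statistics import mean

MIN_GROUP_SIZE = 5  # hypothetical threshold, not a value from the article


def aggregate_sentiment(records, min_group_size=MIN_GROUP_SIZE):
    """Summarize (group_label, score) pairs that carry no identifiers.

    Groups smaller than min_group_size are dropped so that small
    cohorts cannot be singled out in the output.
    """
    groups = defaultdict(list)
    for label, score in records:
        groups[label].append(score)
    return {
        label: {"n": len(scores), "mean_score": round(mean(scores), 2)}
        for label, scores in groups.items()
        if len(scores) >= min_group_size
    }


# Made-up survey scores grouped by whether focus time was blocked.
sample = [("focus_blocked", 4)] * 5 + [("no_focus_block", 3)] * 2
summary = aggregate_sentiment(sample)
```

Here the smaller group is left out of the summary entirely, so only the larger cohort's count and mean score are reported.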

Focusing on self-service gives HR practitioners important flexibility, says Patrice Pelland, partner group engineering director for MDEE.

“Making data available to all people in a self-service, consumable way gives them the opportunity to ask the questions they don’t even know they have,” Pelland says. “The more people interact with the data, the more it will lead to deeper questions and better insights to drive their business or Microsoft as a whole.”

Those questions and insights have already led to human-centered improvements and innovations. One example is the wide adoption of team agreements that empower employees to collectively self-determine the work modes that serve them best. HR’s work has even informed some of the “nudge” product features for employee experience tools like Microsoft Viva, for instance, recommending focus blocks to improve productivity and overall work-life balance—a metric that’s currently on the rise across Microsoft.

Ultimately, the more people who have access to high-quality, trustworthy data, the more we can provide a world-class experience for all employees.

“There’s a lot of envisioning based on the services that we’ve been building that people didn’t even think could exist,” Pelland says. “We’re building the foundational layers to offer things that will be truly transformational for the HR business. Whatever size your organization is, and whichever HCM you use, with Azure, you can do what we’re doing right now.”

Key takeaways

The gold standard should be unity between transactional tools and data tools.
Start from an understanding that it’s about people and ground your work in that.
Think big but think holistically; start with a goal and work toward it iteratively.
Consider the experiences that will delight your end users.
Start from how you’re going to use the data, then work backward.
Collaborate early and often. Otherwise, preconceived notions can creep in.

The post Enabling advanced HR analytics and AI with Microsoft Azure Data Lake appeared first on Inside Track Blog.