Microsoft Azure Archives - Inside Track Blog: How Microsoft does IT

Olutunde Makinde: From Lagos to Redmond, a Microsoft IT engineer’s journey
http://approjects.co.za/?big=insidetrack/blog/olutunde-makinde-from-lagos-to-redmond-a-microsoft-it-engineers-journey/
Thu, 02 Apr 2026 16:05:00 +0000

A career in Microsoft Digital, the company’s internal IT organization, puts employees at the center of one of the world’s most complex and forward‑leaning enterprise environments. This is the team that runs Microsoft on Microsoft technology and services—maintaining more than a million computing devices, enabling global collaboration, and shaping the employee experience for more than 200,000 people.

To accomplish these huge tasks, it’s essential to cultivate a range of perspectives, expertise, and lived experiences.

Olutunde Makinde is an example of this.

A photo of Makinde.

“A friend once laughed at me back in college when I said I wanted to work at Microsoft, like it was impossible. But I knew I could achieve the impossible if I could just be focused. I never gave up.”

Olutunde Makinde, senior service engineer, Microsoft Digital

Makinde, a senior service engineer in Microsoft Digital, came to the company the long way around—roughly 7,000 miles away from the Redmond, Washington, headquarters, in fact. He’s originally from Lagos, Nigeria.

As a global organization, Microsoft builds teams where people with different experiences and life journeys actively influence how products, services, and internal platforms are designed. Makinde, commonly known around the office as “Tunde” (“rhymes with Sunday,” he notes), embodies that diverse approach, bringing his unique insights and experiences to critical work at the company.

“A friend once laughed at me back in college when I said I wanted to work at Microsoft, like it was impossible,” Makinde says. “But I knew I could achieve the impossible if I could just be focused. I never gave up.”

Launching an IT career in Nigeria

Makinde’s journey to Microsoft began with earning a degree in computer engineering in Lagos, after which he found work as a network engineer. He spent the next several years developing his skills through certifications and other learning opportunities.

“I did a lot of self-paced training, learning how to configure Cisco routers. Eventually I became a Cisco-certified network professional (CCNP),” Makinde says. “Around that time, I had a friend who was preparing for Windows Server 2008 certifications, and through his study materials I started learning more about Microsoft and its products.”

Makinde’s first direct encounter with Microsoft came in 2014, when the company he worked for received a contract to deploy the first Microsoft Azure cloud installation in Nigeria.  

“I spent the last day of 2014 and the first day of 2015 at the customer site, figuring out how to connect their on-premises network to Azure,” Makinde says. “It had never been done before in Nigeria, and taking up that challenge really propelled me into the world of Microsoft-specific technology.”
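For readers curious about what that kind of hybrid connectivity involves, here is a minimal sketch of one common pattern, a site-to-site VPN between an on-premises network and an Azure virtual network, built with Az PowerShell cmdlets. The resource names, IP ranges, and shared key are placeholders, and this is not necessarily the approach Makinde’s team used.

# Minimal sketch: link an on-premises network to an Azure VNet with a site-to-site VPN.
# All names, addresses, and the shared key are hypothetical placeholders.
Connect-AzAccount

# Describe the on-premises side: its public VPN device address and internal address space.
$local = New-AzLocalNetworkGateway -Name "OnPremGateway" -ResourceGroupName "HybridRG" `
    -Location "westeurope" -GatewayIpAddress "203.0.113.10" -AddressPrefix "10.10.0.0/16"

# Reference an existing Azure VPN gateway attached to the target virtual network.
$vnetGateway = Get-AzVirtualNetworkGateway -Name "AzureVpnGateway" -ResourceGroupName "HybridRG"

# Create the IPsec connection that ties the two networks together.
New-AzVirtualNetworkGatewayConnection -Name "OnPremToAzure" -ResourceGroupName "HybridRG" `
    -Location "westeurope" -VirtualNetworkGateway1 $vnetGateway -LocalNetworkGateway2 $local `
    -ConnectionType IPsec -SharedKey "replace-with-a-secret"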

From there, Makinde set his sights on a career at Microsoft. He parlayed his initial exposure to cloud architecture into a focus on Azure, as well as Amazon Web Services. After spending some time in the United Kingdom, he achieved his goal when he was hired by the Microsoft Digital team in 2022. He moved to the United States in 2025.

He credits support from his family, especially his wife, with helping him achieve his dreams.

“My wife was a pillar of support through every career transition, from Nigeria to the UK to the United States,” Makinde says. “She believed in me when I faced rejections, celebrated with me when I finally got the offer, and now keeps me grounded whenever work gets intense. I couldn’t have made this journey without her.”

Making an impact from day one

Kathren Korsky, a principal technical program manager in Microsoft Digital and Makinde’s hiring manager, remembers the impression he made right away. It was clear that Makinde’s experience and technical background were major assets.

“What caught my attention was how well-prepared he was for the conversation and how well he communicated,” Korsky says. “The stories he shared about his work with Azure deployment in Nigeria really drew my interest. But I was also intrigued by how he was able to bridge technology with the business world, working with different banks across the continent to gather requirements, understand them, and build solutions.”

Upon being hired at Microsoft, he initially worked remotely from the UK on a Redmond-based device and application management team. The team was looking to deploy Cloud PC internally and needed a system in which employees could request access and get approvals to use Cloud PCs.

“He was able to stand up a full Power Automate workflow within a short period, and with a very high degree of quality,” Korsky says. “Rarely did anyone find any defects or bugs in his system.”

Makinde’s designs drove value moving forward as well, as the team made updates to his initial workflows.

A photo of Korsky

“His design was so strong that we were basically able to follow exactly what he had created in Power Platform and build that exact same design in ServiceNow. It really expedited that whole process.”

Kathren Korsky, principal technical program manager, Microsoft Digital

ServiceNow was more commonly used for systems that involved access requests and approvals, so when the team later moved the workflow off Power Automate, they found that Makinde’s original design was durable enough to weather the shift.

“His design was so strong that we were basically able to follow exactly what he had created in Power Platform and build that exact same design in ServiceNow,” Korsky says. “It really expedited that whole process.”

Driving efficiency and managing change

Since moving to the United States to work at company headquarters, Makinde has continued to push important projects forward—working with different stakeholders to deploy policy changes across Microsoft, managing the Change Advisory Board (CAB) intake process, and driving configuration updates for security and first-party product deployments.

“There’s a lot of diligence required to see the edge cases happening, to pay attention to them, and to watch out for potential problems. Tunde stops rollouts regularly to flag potential defects or risks, which prevents issues from interrupting our work and reducing productivity.”

Jeff Duncan, principal service engineering manager, Microsoft Digital

Makinde learned how to assess change requests and understand risk profiles, as well as enforce best practices for managing change within the security environment. Within about a year, he was able to take the lead in the space and own the deployment process.

A single misconfigured policy can cause major disruption. Makinde’s role puts him in position to be the checkpoint that prevents incidents before they happen.

“There’s a lot of diligence required to see the edge cases happening, to pay attention to them, and to watch out for potential problems,” says Jeff Duncan, principal service engineering manager in Microsoft Digital and Makinde’s manager. “Tunde stops rollouts regularly to flag potential defects or risks, which prevents issues from interrupting our work and reducing productivity.”

Softer skills like transparency, collaboration, and clear communication across levels and teams are key aspects of Makinde’s work as well.

“Tunde is thoughtful and detail-oriented, and he’s very good at explaining the decision-making process when he provides overviews for leadership,” Duncan says. “There’s rational, logical reasoning behind the decisions he makes.”

Makinde has implemented new efficiencies in how he manages the CAB and deployment service using AI. This includes CABBIE—an AI-powered agent that automates CAB communications. For Intune deployments, he uses AI to streamline deployment coordination and package reviews. These innovations reflect our Customer Zero approach to AI adoption here in Microsoft Digital.

“We run weekly CAB meetings to review change requests. That comes with a lot of communication work — status updates, follow-ups, coordination with stakeholders. It was all manual,” Makinde says. “CABBIE pulls the data from Azure DevOps, generates the emails, updates requests, and logs approvals automatically. It saves time and reduces errors.”
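The article doesn’t share how CABBIE is built, but the core pattern it describes, querying Azure DevOps for change requests and turning them into status communications, can be sketched with the Azure DevOps REST API. The organization, project, work item type, and PAT-based authentication below are all hypothetical stand-ins, not the actual agent.

# Hypothetical sketch: pull active change requests from Azure DevOps and draft a status summary.
$org     = "contoso"            # placeholder organization
$project = "ChangeManagement"   # placeholder project
$pat     = $env:ADO_PAT         # personal access token supplied via an environment variable
$headers = @{ Authorization = "Basic " + [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes(":$pat")) }

# WIQL query for open items of a (hypothetical) "Change Request" work item type.
$wiql = @{ query = "SELECT [System.Id] FROM WorkItems WHERE [System.WorkItemType] = 'Change Request' AND [System.State] = 'Active'" } | ConvertTo-Json
$found = Invoke-RestMethod -Method Post -ContentType "application/json" -Headers $headers -Body $wiql `
    -Uri "https://dev.azure.com/$org/$project/_apis/wit/wiql?api-version=7.0"

# Fetch details for the matched work items and build a simple status summary.
$ids   = ($found.workItems.id | Select-Object -First 50) -join ","
$items = Invoke-RestMethod -Headers $headers `
    -Uri "https://dev.azure.com/$org/$project/_apis/wit/workitems?ids=$ids&fields=System.Title,System.State&api-version=7.0"
$summary = $items.value | ForEach-Object { "- $($_.fields.'System.Title') [$($_.fields.'System.State')]" }
$summary -join "`n"   # hand this text off to the mail-generation and approval-logging steps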

Success at Microsoft Digital: Aptitude and curiosity

As the organization at the center of the company’s own digital transformation, we in Microsoft Digital function as a living showcase of what’s possible with Microsoft technology. Our team tests new capabilities at enterprise scale as Customer Zero for Microsoft, identifying gaps and providing insights to ensure our customers benefit from what we’ve learned.

Because the impact of Microsoft Digital extends far beyond internal systems, team members have to set the standard for digital excellence. They must demonstrate what enterprise transformation looks like in practice and empower customers with the confidence to pursue their own modernization journeys.

 Hiring talented people like Makinde is essential to this mission.

“There are three core traits I look for when hiring—aptitude, attitude, and curiosity,” Korsky says. “Aptitude is not only what you currently know, but your propensity and desire to learn and grow those skills. Attitude goes hand in hand with that—are you willing to demonstrate grit and perseverance? And then curiosity, because so much of what we do from an innovation perspective requires a willingness to challenge assumptions and think of completely new ways of doing things.”

Makinde’s journey here at Microsoft Digital embodies and illustrates the company’s larger story: how technical expertise, innovative thinking, and a commitment to continuous learning combine to deliver world-class results.

“I’m now up to 25 certifications, and I continue to learn how to do more at Microsoft to positively impact the organization and protect our employees’ experience across applications and devices.”

Olutunde Makinde, senior service engineer, Microsoft Digital

That attitude of persistent curiosity and the willingness to keep learning continue to fuel Makinde’s experience at Microsoft. 

“Self-improvement is a way of life for me that has driven my career forward,” Makinde says. “At an early stage in my career, I did a lot of self-training—from learning how to configure Cisco routers and switches, to migrating on-premises workloads to Azure and managing cloud resources. I’m now up to 25 certifications, and I continue to learn how to do more at Microsoft to positively impact the organization and protect our employees’ experience across applications and devices.”

Key takeaways

Olutunde Makinde’s career experience here in Microsoft Digital offers some important insights that you can apply to your own organizational development:

  • AI adoption starts with practical problems. Makinde’s use of AI to streamline CAB communications and deployment coordination shows how Customer Zero teams find real-world applications for emerging technology.
  • Different experiences and perspectives contribute to business success. Achieving ambitious goals as an organization is dependent upon attracting talented people like Makinde from a range of backgrounds, disciplines, and lived experiences.
  • Strong technical skills paired with innovative thinking drive value. Makinde’s contributions to flexible cloud deployment workflows are an example of how this combination pays dividends.
  • Proactive risk management and attention to detail can prevent large-scale disruptions. By being willing to stop rollouts and flag risks before they become problems, Makinde’s approach to his work exemplifies how thoughtful decision-making safeguards productivity and security.
  • Persistence, curiosity, and continuous learning are critical career accelerators. Having a long and successful career at a company like Microsoft goes beyond just technical aptitude; it also requires perseverance and a passion for learning. Makinde’s self-driven training efforts and his refusal to give up have enabled him to achieve what once seemed impossible.

Protecting anonymity at scale: How we built cloud-first hidden membership groups at Microsoft
http://approjects.co.za/?big=insidetrack/blog/protecting-anonymity-at-scale-how-we-built-cloud-first-hidden-membership-groups-at-microsoft/
Thu, 26 Feb 2026 17:00:00 +0000

Some Microsoft employee groups can’t afford to be visible.

For years, we supported email‑based communities internally here at Microsoft whose very existence depends on anonymity. These include employee resource groups, confidential project teams, and other sensitive audiences where simply revealing who belongs can create real‑world risk.

Traditional distribution groups make membership discoverable by default. Owners can see members. Admins can see members. In some cases, other users can infer membership through directory queries or tooling.

That model doesn’t work when anonymity is a requirement.

A photo of Reifers.

“When the SFI wave hit, it was made clear to us that we needed to keep our people safe, and to do that, we needed to build a new hidden memberships group MVP. We needed to raise the bar with modern groups, and we needed to do it in six months or miss meeting our goals.”

Brett Reifers, senior product manager, Microsoft Digital

For over 15 years, we relied on a custom, on‑premises solution that enabled employees to send and receive messages through groups with fully hidden memberships.

The system worked, but we were deprecating the Microsoft Exchange servers that it ran on. At the same time, we were also deploying our Secure Future Initiative (SFI), which required us to reassess legacy systems that could expose sensitive data or slow incident response, including hidden membership groups.

The system wasn’t broken, but it represented concentrated risk simply by existing outside our modern cloud controls and monitoring.

“When the SFI wave hit, it was made clear to us that we needed to keep our people safe, and to do that, we needed to build a new hidden memberships group MVP,” says Brett Reifers, a senior product manager in Microsoft Digital, the company’s IT organization. “We needed to raise the bar with modern groups, and we needed to do it in six months or miss meeting our goals.”

The mandate was clear. Preserve anonymity, eliminate on‑premises dependencies, and do it quickly.

A photo of Carson.

“Our solution would enable us to deprecate our legacy on-premises Exchange hardware while maintaining the privacy of our employee groups, and it would do so in a cloud-first manner.”

Nate Carson, principal service engineer, Microsoft Digital

Instead of retrofitting hidden membership into standard Microsoft 365 groups, we asked a different question: What if the group lived somewhere else entirely? What if users interacted with a simple, secure front end, while all membership expansion and mail flow occurred in a locked‑down tenant built specifically for this purpose?

That idea became the foundation for Hidden Membership Groups: a new cloud‑first architecture that would separate the user experience from membership data, leverage first‑party Microsoft services, and keep our group memberships hidden from everyone—including owners and administrators—by design.

“Our solution would enable us to deprecate our legacy on-premises Exchange hardware while maintaining the privacy of our employee groups, and it would do so in a cloud-first manner,” says Nate Carson, a principal service engineer in Microsoft Digital.

Once we settled on a solution, our next step was to get support for solving a problem not many people thought much about.

“Not everyone was aware of how serious of a situation we were in,” Carson says. “We had to show everyone what was at stake, and to share our solution with them.”

After taking their plan on the road, the team got the buy-in it needed, and that’s when the real work started.

Planning to solve business problems with security built-in

Before we designed anything, we had to be clear about what success meant.

Hidden Membership Groups aren’t just another collaboration feature. They support scenarios where anonymity isn’t optional—it’s foundational. That reality shaped every requirement that we built into our solution, including:

1. Absolute privacy

Group membership couldn’t be visible to users, group owners, or administrators under any circumstances. That requirement immediately ruled out standard group models.

2. Cloud only

Any new solution had to live entirely in our cloud, use first‑party services, and align with modern identity, security, and compliance practices. On‑premises infrastructure wasn’t an option.

3. Scale

Some groups had a handful of members. Others had tens of thousands. Membership changed frequently, and those changes had to propagate safely and predictably without exposing data or degrading performance.

4. Separation of concerns

User interaction and membership truth couldn’t live in the same place. Employees needed a simple way to discover groups, request access, and manage participation, without ever interacting with the system that stored or expanded membership.

5. Self‑service with guardrails

The solution needed to reduce operational overhead, not introduce a new bottleneck. Group lifecycle management had to be automated, auditable, and secure, while still giving teams flexibility.

6. Simple to use

Employees shouldn’t need special training. They shouldn’t need to understand tenants, identity synchronization, or mail routing. The experience needed to be intuitive, consistent, and accessible—without compromising security.

Once those requirements were clear, our solution started to emerge. Incremental changes wouldn’t be enough. A traditional group model wouldn’t work. The solution required a new architecture—one designed around isolation, automation, and intentional limitation.

That’s when we started the engineering work.

Creating a cloud-first architecture

Designing for hidden membership meant eliminating ambiguity. If any surface could reveal membership, even indirectly, it didn’t belong in the design.

That constraint led us toward a model built on strict isolation, explicit APIs, and intentionally narrow interfaces. The result is straightforward to use, but deliberately difficult to interrogate.

Two tenants, with sharply separated responsibilities

At the foundation of the solution is a two‑tenant model.

Our primary Microsoft 365 tenant is where employees authenticate, discover groups, and initiate actions. A secondary, isolated tenant hosts the distribution lists and performs mail expansion for Hidden Membership Groups.

A photo of Mace.

“Tenant isolation is what makes the privacy guarantee real. By moving membership expansion to a tenant that users and owners can’t access, we removed the possibility of accidental exposure. The system simply doesn’t give you a place where membership can be seen.”

Chad Mace, principal architect, Microsoft Digital

That separation matters because the secondary tenant isn’t designed for interactive use. Only Exchange and the minimum directory constructs required for mail routing and expansion are enabled.

Operationally, when an employee sends email to a Hidden Membership Group, they send to a mail contact visible in the corporate tenant. That contact routes to the corresponding distribution group in the isolated tenant, where membership expansion occurs. Expanded messages are then delivered back to recipients’ inboxes in the corporate tenant, so sent and received mail lives where users already work.

“Tenant isolation is what makes the privacy guarantee real,” says Chad Mace, a principal architect in Microsoft Digital. “By moving membership expansion to a tenant that users and owners can’t access, we removed the possibility of accidental exposure. The system simply doesn’t give you a place where membership can be seen.”
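The team hasn’t published its provisioning scripts, but the cross-tenant mail pattern described above can be approximated with standard Exchange Online cmdlets: a distribution group that exists only in the isolated tenant, and a mail contact in the corporate tenant whose external address points at it. The tenant names, group names, and addresses below are hypothetical, and the production system drives these steps through automation rather than interactive cmdlets.

# Sketch only: names and addresses are hypothetical placeholders.

# In the isolated tenant: create the group that actually holds and expands membership.
Connect-ExchangeOnline -UserPrincipalName admin@isolated.example.com
New-DistributionGroup -Name "HMG-ExampleCommunity" -PrimarySmtpAddress "examplecommunity@isolated.example.com"
Add-DistributionGroupMember -Identity "HMG-ExampleCommunity" -Member "member1@corp.example.com"
# Keep the group out of address lists so it can't be browsed.
Set-DistributionGroup -Identity "HMG-ExampleCommunity" -HiddenFromAddressListsEnabled $true
Disconnect-ExchangeOnline -Confirm:$false

# In the corporate tenant: expose only a mail contact that forwards to the isolated-tenant group.
Connect-ExchangeOnline -UserPrincipalName admin@corp.example.com
New-MailContact -Name "Example Community" -ExternalEmailAddress "examplecommunity@isolated.example.com"
# Employees mail the contact; expansion happens in the isolated tenant, and messages land back in corporate inboxes.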

Identity without interactive access

This isolated tenant only works if it can resolve recipients. To enable that, our development team used Microsoft Entra ID multi‑tenant organization identity sync to represent corporate users in the secondary tenant.

These identities are treated as business guest identities, and we disable sign‑in to prevent interactive access. The tenant can perform expansion, but nothing more.
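The sync itself is configured through Microsoft Entra multi-tenant organization settings, but the “no interactive access” posture can be illustrated with Microsoft Graph PowerShell: find the synced guest accounts in the isolated tenant and block sign-in on them. This is a simplified sketch, and the filter is only an assumption about how those accounts might be identified in practice.

# Sketch: block interactive sign-in for guest identities in the isolated tenant.
# The filter is illustrative; a real deployment would target exactly the synced accounts.
Connect-MgGraph -Scopes "User.ReadWrite.All"
$guests = Get-MgUser -Filter "userType eq 'Guest'" -All -ConsistencyLevel eventual -CountVariable guestCount

foreach ($guest in $guests) {
    # Disable the account so it can still be resolved for mail expansion but never used to sign in.
    Update-MgUser -UserId $guest.Id -AccountEnabled:$false
}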

However, complete isolation wasn’t technically possible. Privileged access always exists at some level. The design response was to minimize that exposure. Access to the isolated tenant is tightly restricted, and membership changes flow through automation rather than broad UI-based administration.

The goal: reduce exposure to the smallest viable operational group.

API-first automation as the control plane

With the tenancy and identity model established, the team needed a single, consistent way to create groups, connect objects across tenants, and manage changes without introducing new administrative workflows. That’s where the APIs come in.

A photo of Pena II.

“We split the backend into multiple APIs so the system could scale without becoming fragile. That let us separate everyday operations from high-volume membership work and keep performance predictable.”

John Pena II, principal software engineer, Microsoft Digital

The backend is intentionally modular, split into three distinct APIs:

  • The control API handles group creation, configuration, and cross‑tenant coordination.
  • The membership API handles standard add and remove operations.
  • The bulk membership APIs handle large‑scale operations involving tens of thousands of users, with services designed to run long‑lived jobs, manage throttling, and recover from partial failures.

“We split the backend into multiple APIs so the system could scale without becoming fragile,” says John Pena II, a principal software engineer in Microsoft Digital. “That let us separate everyday operations from high-volume membership work and keep performance predictable.”

The APIs run as PowerShell-based Azure Functions and use managed identity patterns, including federated identity credentials, to securely connect across tenants.
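The APIs themselves aren’t published, but the shape they describe, PowerShell in Azure Functions authenticating with a managed identity and absorbing throttling during bulk membership work, can be sketched as follows. The token request uses the environment variables Azure Functions exposes for managed identities; the group ID, member list, and retry policy are placeholders, and the cross-tenant federation with federated identity credentials is omitted for brevity.

# Sketch of a bulk-membership worker: managed identity authentication plus basic throttling handling.
# Group ID, member list, and retry behavior are illustrative placeholders.

# Acquire a Microsoft Graph token from the Function App's managed identity endpoint.
$tokenResponse = Invoke-RestMethod -Headers @{ "X-IDENTITY-HEADER" = $env:IDENTITY_HEADER } `
    -Uri "$($env:IDENTITY_ENDPOINT)?resource=https://graph.microsoft.com&api-version=2019-08-01"
$headers = @{ Authorization = "Bearer $($tokenResponse.access_token)" }

$groupId   = "00000000-0000-0000-0000-000000000000"                                    # placeholder group
$memberIds = @("11111111-1111-1111-1111-111111111111", "22222222-2222-2222-2222-222222222222")  # placeholder member IDs

foreach ($memberId in $memberIds) {
    $body = @{ "@odata.id" = "https://graph.microsoft.com/v1.0/directoryObjects/$memberId" } | ConvertTo-Json
    do {
        try {
            Invoke-RestMethod -Method Post -Headers $headers -ContentType "application/json" -Body $body `
                -Uri "https://graph.microsoft.com/v1.0/groups/$groupId/members/`$ref"
            $retry = $false
        }
        catch {
            # Back off and retry when Graph throttles the call (HTTP 429); surface anything else.
            if ($_.Exception.Response.StatusCode.value__ -eq 429) { Start-Sleep -Seconds 10; $retry = $true }
            else { throw }
        }
    } while ($retry)
}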

Creating the user experience with Power Apps

For the front end, we built a Canvas app in Power Apps, backed by Dataverse. The goal was speed and flexibility, without compromising strict privacy boundaries.

By using Power Apps as the primary interaction layer, we deliver a secure, modern experience without unnecessary custom infrastructure. The Canvas app provides a single, focused surface for discovering, joining, and managing hidden membership groups, while all sensitive operations remain behind controlled APIs and tenant boundaries. This separation allows the team to iterate quickly on experience design without weakening the privacy guarantees that the solution depends on.

Power Platform also simplifies how security is enforced across the solution. Dataverse enables fine‑grained, role‑based access, ensuring users only see data they’re entitled to see—while keeping sensitive membership information entirely out of the client layer. That reduces long‑term maintenance overhead and makes it easier to evolve the solution as requirements change.

“From the beginning, we designed everything with security roles and workflows in mind,” says Shiva Krishna Gollapelly, senior software engineer in Microsoft Digital. “Dataverse let us control who could see or change data without building additional APIs or storage layers, and keeping everything inside the Power Apps ecosystem saved us a lot of maintenance over time.”

Dataverse plays a precise role here: it maintains the datastore the app needs to function without becoming a secondary membership repository.

A photo of Amanishahrak.

“Using the Power Platform let us move fast, integrate deeply with Microsoft identity, and enforce security without building a full web stack from scratch.”

Bita Amanishahrak, software engineer II, Microsoft Digital

From a security posture perspective, Dataverse security is used intentionally to restrict what different users can see and do, and the Power App was developed with security roles and workflows in mind.

Short version: the app brokers intent, the APIs execute it, and all the pieces that need to stay separate do exactly that.

“Using the Power Platform let us move fast, integrate deeply with Microsoft identity, and enforce security without building a full web stack from scratch,” says Bita Amanishahrak, a software engineer in Microsoft Digital.

The architectural intent is consistent throughout—isolate the sensitive plane and ensure the user plane operates only through controlled interfaces.

Benefits and impact

The most important outcome of the new architecture is also the simplest: Hidden membership stays hidden.

Anonymity isn’t enforced by policy. It’s enforced by architecture. Membership data never appears in the user experience or administrative tooling, and it doesn’t surface as a side effect of scale.

“We’re no longer asking people to trust that we’ll handle sensitive membership carefully through process,” Reifers says. “The system makes exposure structurally impossible.”

The impact was immediate.

At launch, we migrated more than 2,200 hidden membership groups, representing over 200,000 users, from the legacy on‑premises system into the new cloud‑first architecture. Groups ranged from small, tightly controlled communities to audiences with tens of thousands of members, all supported without special handling.

“Some of these groups are massive,” Pena says. “We knew from the beginning we were dealing with memberships in the tens of thousands, which is why we designed bulk operations as a first‑class capability instead of an afterthought.”

The separation between routine APIs and bulk‑membership APIs proved critical, enabling large migrations and ongoing changes without degrading day-to-day performance.

Operationally, moving to a cloud‑only model reduced both risk and complexity. Decommissioning the on‑premises Exchange infrastructure eliminated specialized maintenance requirements and brought monitoring, auditing, and access controls into better alignment with our modern cloud standards.

Delivery speed also mattered. Driven by Secure Future Initiative urgency and strong executive sponsorship, the team designed and delivered a minimum viable product in less than six months.

“That timeline forced discipline,” Reifers says. “We focused on what mattered: security, privacy guarantees, scale, and a UX that wouldn’t disrupt group owners and members who had relied on a 15-year-old tool.”

Everything else was secondary.

A photo of Gollapelly.

“Most users never think about tenants or APIs. They just see a clean experience that does what they need, without exposing anything it shouldn’t.”

Shiva Krishna Gollapelly, senior software engineer, Microsoft Digital

From an employee perspective, the experience became simpler and safer. Users now interact through a Power Platform app consistent with the rest of Microsoft 365.

Discovering a group, requesting access, or leaving a group no longer requires understanding the architecture behind it.

“Most users never think about tenants or APIs,” Gollapelly says. “They just see a clean experience that does what they need, without exposing anything it shouldn’t.”

The result is sustainable. The platform protects anonymity at scale, simplifies operations, boosts resiliency, and can evolve without reopening core privacy questions.

Moving forward

Delivering the initial solution was only the beginning.

The team sees Hidden Membership Groups as more than a single solution. It’s a reusable pattern for sensitive collaboration in a cloud‑first world: isolate what matters most, automate everything else, and design experiences that don’t require trust to be safe.

As adoption grows, the team plans to support additional anonymity-sensitive scenarios while maintaining the same underlying model.

“We don’t want every sensitive scenario inventing its own workaround,” Mace says. “This gives us a pattern we can reuse confidently.”

Future priorities include improving lifecycle and ownership experiences, strengthening auditing and reporting for approved administrators, and enhancing self‑service workflows—without compromising membership privacy. If it risks exposing membership, it doesn’t ship.

With the legacy system fully retired, Reifers reflects on what the team accomplished to get here.

“We shipped a new enterprise pattern in six months using our first-party tools,” Reifers says. “We achieved this because a stellar team cared about the mission. That’s the takeaway.”

Key takeaways

Use these tips to strengthen your privacy, simplify your operations, and future-proof your organization’s collaboration systems:

  • Prioritize privacy by design. Embed privacy considerations from the start to protect sensitive information in all collaboration scenarios.
  • Architect for scale. Treat bulk operations as a first-class capability so you can support large groups efficiently.
  • Automate and modernize workflows. Replace legacy systems with cloud-native solutions to reduce risk, improve transparency, and enable continuous improvement.
  • Streamline user experience. Provide intuitive, consistent interfaces that make it easy for users to access, join, or leave groups without requiring technical knowledge.
  • Enforce strict access and auditing controls. Align monitoring and administration with modern cloud standards to maintain security and accountability.
  • Create reusable patterns. Establish and share successful privacy patterns to avoid reinventing solutions for each new case.
  • Focus on operational simplicity and resilience. Design systems that are easy to maintain and improve, freeing up teams to concentrate on innovation rather than upkeep.

Powering data governance at Microsoft with Purview Unified Catalog
http://approjects.co.za/?big=insidetrack/blog/powering-data-governance-at-microsoft-with-purview-unified-catalog/
Thu, 05 Feb 2026 17:00:00 +0000

Data fuels everything that we do here at Microsoft, from the daily operations that keep the business running to the innovations that shape the future.

But as data sprawls across teams, systems, and borders, the task of ensuring that it remains secure, accurate, and well-governed is a daunting one. A sound approach to data governance is the backbone of responsible data use across the enterprise, creating clarity around data ownership and access.

In an organization the size of Microsoft, no single team can carry this responsibility on its own. Effective data governance must be a distributed effort across all departments and functions.

This story explains how our marketing organization uses the Microsoft Purview Unified Catalog to organize and standardize the data we rely on daily. By putting clear ownership, consistent definitions, and reliable governance in place, we’re turning fragmented, unreliable data into an advantage that supports faster decisions and more effective campaigns.

Data governance at scale

As companies grow, their data governance becomes increasingly complex, with different teams creating their own versions of key data concepts, often without realizing it. The complexity is most visible in the way users across an organization define foundational terms.

A photo of Doughty.

“We found adoption to be much easier when helping teams focus on building more value in their data instead of driving governance like a compliance effort.”

Nick Doughty, senior product manager, Microsoft Purview Unified Catalog

Examples in marketing include what counts as a customer (active vs. inactive, marketing- or sales-qualified), what constitutes sensitive data (personally identifiable information, behavioral data, partner data), and what a metric means (conversion, engagement, attribution windows).

When inconsistent practices take hold, ownership becomes murky. With the increasing demands that managing data quality and integrity put on our leaders and their teams, effective data governance becomes one more hurdle to productivity.

“We started off implementing data governance like an issue register,” says Nick Doughty, a senior product manager within Microsoft Purview Unified Catalog. “Then we progressed to more of an enforcement method, similar to how we were doing security at the time. We found that when we started to push really hard on teams, similar to how we drove other compliance efforts, it was difficult for them to justify or understand why they would want the added governance.”

The introduction of Microsoft Azure Purview in 2020 marked a turning point.

A unified platform for data governance, security, and compliance, Purview helps organizations understand, protect, and manage data across environments. It also addresses fragmented data, lack of visibility into where sensitive data lives and how it moves, compliance complexity with regulations (including GDPR and HIPAA), and security risks.

A photo of Mathur

“Our marketing teams used to spend hours hunting for the right customer list because multiple versions lived in different locations, each with unclear owners and inconsistent labels. Now our marketers can trust they are working from current information, while avoiding compliance risks associated with incorrect or unauthorized data.”

Sourabh Mathur, principal engineering lead, Global Marketing Engines and Experiences

The Purview Unified Catalog serves as the AI-powered backbone, automatically discovering, classifying, and organizing information so users can easily find and trust the data they need.

By launching the unified catalog, we gave our users a consistent way to understand and use their data, while reinforcing strong governance and compliance practices. The result is data that’s more discoverable, reliable, and actionable. (The product was renamed Microsoft Purview in 2022 and became part of Microsoft 365 compliance tools.)

“Our marketing teams used to spend hours hunting for the right customer list because multiple versions lived in different locations, each with unclear owners and inconsistent labels,” says Sourabh Mathur, a principal engineering lead in Global Marketing Engines and Experiences, who helped set up Purview for our marketing organization.

With the unified catalog in place, Purview surfaces the dataset, shows its lineage, and applies the correct sensitivity classifications.

“Now our marketers can trust they are working from current information, while avoiding compliance risks associated with incorrect or unauthorized customer data,” Mathur says.
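Programmatic discovery works along the same lines as the portal experience: the catalog can be searched through the Purview REST API. Below is a hypothetical sketch of that kind of query from PowerShell; the account name is a placeholder, and the search path and api-version shown are assumptions that should be checked against current Purview documentation.

# Hypothetical sketch: search the Purview catalog for assets that match a keyword.
# The account name is a placeholder; verify the endpoint path and api-version against current docs.
$account = "contoso-purview"
$token   = (Get-AzAccessToken -ResourceUrl "https://purview.azure.net").Token   # newer Az versions may return a SecureString

$body = @{ keywords = "customer list"; limit = 10 } | ConvertTo-Json
$results = Invoke-RestMethod -Method Post -ContentType "application/json" `
    -Headers @{ Authorization = "Bearer $token" } -Body $body `
    -Uri "https://$account.purview.azure.com/catalog/api/search/query?api-version=2022-08-01-preview"

# Show each hit's name, type, and any classifications applied to it.
$results.value | Select-Object name, entityType, classification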

Powering marketing at Microsoft with Purview

With more than 200 Microsoft Azure subscriptions, our marketing organization manages one of the largest data estates at the company. The team faces the constant challenge of scattered data, unclear data ownership, and inconsistent governance practices that slow down campaigns and increase compliance risk.

A photo of Biswal.

“Marketing can now scale governance across hundreds of data products, support self-service data collection with guardrails, automate access decisions, and enable AI workloads on trusted data.”

Deepak Kumar Biswal, principal software engineering lead, Global Marketing Engines and Experiences

By adopting Purview, our marketing team gained unified visibility, clearer classification standards, and smoother collaboration with other departments, like IT and legal. This reduces friction while strengthening data protection.

The result is an organization that moves faster with greater confidence in how it handles customer and campaign data.

Instead of relying on legacy knowledge, digging through different servers and SharePoint sites, or constantly sending queries to the engineering teams, our marketing professionals can now explore the curated Purview Unified Catalog, making streamlined, efficient data discovery possible.

“Marketing can now scale governance across hundreds of data products, support self-service data collection with guardrails, automate access decisions, and enable AI workloads on trusted data,” says Deepak Kumar Biswal, a principal software engineering lead in Global Marketing Engines and Experiences. “Purview turns responsible data use into everyday practice, not extra work.”

Data governance and security: Two sides of the same coin

For our marketing organization, data governance and security are inseparable concepts. As soon as you have customer information, you need to make sure it’s secure—sensitive data must be carefully defined, consistently managed, and protected from misuse or breach.

Purview supports this goal by combining governance capabilities with security and compliance controls that provide added layers of protection.

Within marketing, the governance and security teams work closely together. Good governance measures ensure our data is properly defined and standardized, while strong security policies ensure it’s handled with proper safeguards. By pairing governance with strong security practices, our marketing team can remain compliant with data privacy laws, prevent misuse of sensitive information, and foster trust across their organization.

When our marketing team began its Purview journey five years ago, it adopted a centralized governance model. Much like the structure of a government—where federal, state, and local entities each play a role—our approach allows both centralized standards and local autonomy. This creates consistency across the organization without stifling agility.

Our Data Governance team took on the role of steward, defining standards, onboarding systems, and collaborating with its IT partners to connect data environments. Existing assets like data dictionaries and process flows were used to seed the catalog, ensuring the team started from known ground rather than reinventing definitions from scratch.

This deliberate, incremental approach allowed our marketing team to thoughtfully build out healthy governance practices. By moving slowly, the team learned from each step on its journey, refining processes and establishing consistent practices as it moved along.

For example, working closely with our team in Microsoft Digital allowed them to experiment with different ways of discovering and cataloging their data. This gave them time to learn and refine how Purview handled their data before they rolled anything out broadly.

Our goal is to transition to a completely federated model in which responsibility shifts outward. Rather than the marketing governance team doing all the stewardship, individual groups will take ownership of their data within Purview. This shift distributes accountability, embeds governance deeper into daily operations, and makes it easier for teams to monitor data quality and enforce standards on their own.

Impact across the enterprise

Since adopting Purview Unified Catalog, we’ve seen tangible results across our data estate and our data governance practices, both in marketing and across all verticals within the company. Here are some companywide highlights:

  • Better consolidation: We’ve unified five catalogs into one.
  • Increased scale: We onboarded 250 data sources in six months, representing roughly 10 million assets.
  • Higher internal adoption: We set up more than 50 governance domains, an effort we supported with reusable training assets, guides, and onboarding materials.

The benefits include marketing but extend well beyond it:

  • Teams across the company are gaining increased confidence in their data definitions.
  • Compliance and privacy obligations are being met more effectively.
  • Business value is being generated through better, more trusted use of data.
  • Organizations are benefiting from faster time-to-insight.

Launching the marketing governance domain

We’re using Purview to combine essential capabilities like data governance, classification, and quality checks across our Microsoft services, which creates a unified foundation for our enterprise-wide metadata management. These unified capabilities make Purview an indispensable tool for us, and for large-scale enterprises.

A photo of Singh

“With various role types like data curator and data reader, we can add more visibility into our data—where it lives, how it’s being used, and who are its primary owners. Clearly defining these parameters helps us use the data governance framework as a starting point and improve our data governance capabilities.”

Vinny Singh, principal program manager, Global Marketing Engines and Experiences

As early adopters of Purview Unified Catalog, the group launched the Marketing Governance domain, registering more than 200 data products using the Unified Catalog’s data map.

The products, spanning various datasets, are aligned with strict internal governance standards. This gives marketing the ability to govern, classify, and track data across its ecosystem—ensuring adherence to GDPR and other regulatory compliance measures.

“With various role types like data curator and data reader, we can add more visibility into our data—where it lives, how it’s being used, and who are its primary owners,” says Vinny Singh, a principal program manager in Global Marketing Engines and Experiences. “Clearly defining these parameters helps us use the data governance framework as a starting point and improve our data governance capabilities.”

Key takeaways

Our journey with Microsoft Purview Unified Catalog has generated key insights that you can apply to your own data governance efforts. These include:

  • Start small: Don’t try to “boil the ocean.” Begin with three to five governance domains and scale from there.
  • Leverage what you have: Data dictionaries, glossaries, and existing documentation provide a strong starting point for a governance platform founded on the Purview Unified Catalog.
  • Focus on value, not enforcement: Governance resonates when teams see how it helps them, not when it’s mandated.
  • Adapt to your organization: Each team at your company will use Purview differently. Flexibility helps encourage adoption.
  • Build community: Data governance is not a solo effort. Collaboration among stakeholders produces stronger standards and better results.

Moving from a ‘Scream Test’ to holistic lifecycle management: How we manage our Azure services at Microsoft
http://approjects.co.za/?big=insidetrack/blog/moving-from-a-scream-test-to-holistic-lifecycle-management-how-we-manage-our-azure-services-at-microsoft/
Thu, 20 Nov 2025 17:05:00 +0000

Nearly a decade ago, as we began our journey from relying on on-premises physical computing infrastructure to being a cloud-first organization, our engineers came up with a simple but effective technique to see if a relatively inactive server was really needed.

They dubbed it the “Scream Test.”

“We didn’t have a great server inventory and tracking system, and we didn’t always know who owned a server,” says Brent Burtness, a principal software engineer in Commerce Financial Platforms, who was one of the leaders for the effort in his group. “So, we essentially just turned them off. If someone screamed—‘Hey, why’d you turn off my server?’—then we’d know it was still being used.”

Today, the basic idea behind the Scream Test is being used across the company, but in a more holistic way. Importantly, it’s been incorporated into the overall lifecycle management of our computing infrastructure. And, through the automation tools provided by Microsoft Azure, we have a much more efficient process for making sure that we’re saving time and money by reducing the number of underused machines we operate, monitor, and maintain.

A photo of Apple

“We thought we were going to get rid of a small number of machines that weren’t being used. But we found the actual share was about 15% of all machines, which saved us a lot of effort of moving those unused machines to the cloud. In other words, we downsized on the way to the cloud, rather than after the fact.”

Pete Apple, cloud network engineering architect, Microsoft Digital

Uncovering more than expected

The Scream Test was part of the huge effort to evaluate our on-premises compute resources before we began moving to the Azure cloud. After all, why spend resources moving something that isn’t needed?

Pete Apple, who helped develop the concept of the Scream Test, is a cloud network engineering architect in Microsoft Digital, the company’s IT organization. Looking back, he remembers the surprising results that emerged when they began shutting down specific servers to see who noticed.

“We thought we were going to get rid of a small number of machines that weren’t being used,” Apple says. “But we found the actual share was about 15% of all machines, which saved us a lot of effort of moving those unused machines to the cloud. In other words, we downsized on the way to the cloud, rather than after the fact.”

As part of this process, Apple explains, our engineers looked at two related factors to reduce inefficiencies in our usage of computing resources.

The first was to identify systems that were used infrequently, at a very low level of CPU (sometimes called “cold” servers). From that, we could determine which systems in our on-premises environments were oversized—meaning someone had purchased physical machines according to what they thought the load would be, but either that estimate was incorrect or the load diminished over time. We took this data and created a set of recommended Microsoft Azure Virtual Machine (VM) sizes for every on-premises system to be migrated.

“We learned that there’s a lot of orphaned, or underutilized, resources out there,” Burtness says. “These were cases where the workload was so small on a server—like under 5% CPU—that it didn’t make sense to host it on its own machine. We could then move the task or application and get it down to just one or two CPUs on a virtual machine.”

At the time, we did much of this work manually, because we were early adopters. The company now has a number of products available to assist with this review of your on-premises environment, led by Azure Migrate.

Another part of the process was determining which systems were being used for only a few days a month or at certain busy times of the year. These development machines, test/QA machines, and user acceptance testing machines (reserved for final verification before moving code to production) were running continuously in the datacenter but were really only needed during limited windows. For these situations, we applied the tools available in Azure Resource Manager Templates and Azure Automation to ensure the machines would only run when needed.
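As a rough illustration of that scheduling pattern, here is the kind of logic an Azure Automation runbook might run on a timer: find virtual machines tagged as dev or test and deallocate any that are still running outside the hours they’re needed. The tag name and its values are assumptions for the example.

# Sketch of a scheduled runbook: stop running dev/test VMs outside the windows they're needed.
# The "environment" tag and its values are illustrative assumptions.
Connect-AzAccount -Identity   # the runbook signs in with the Automation account's managed identity

$vms = Get-AzVM -Status | Where-Object {
    $_.Tags["environment"] -in @("dev", "test") -and $_.PowerState -eq "VM running"
}

foreach ($vm in $vms) {
    Write-Output "Deallocating $($vm.Name) in resource group $($vm.ResourceGroupName)"
    Stop-AzVM -Name $vm.Name -ResourceGroupName $vm.ResourceGroupName -Force -NoWait
}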

Automating with Azure

Today, we don’t have to rely on anything as crude as the Scream Test to find unused and underused computing resources. With 98% of our IT resources operating in the Azure cloud, we have much greater insight into how efficient our network is, so much of the process can be automated.

“We’ve found this effort much easier to manage in the cloud, because all our computing resources are integrated with the Azure portal,” Apple says. “They have an API system and offer various tools within Azure Update Manager and Azure Advisor to help with cost efficiency. It’s kind of like a modern version of Clippy—’Hey, it looks like your VM isn’t being used much. Do you want to downsize that or turn it off?'”

(For the uninitiated, Clippy was the Microsoft Office animated paperclip assistant introduced in the late 1990s. It offered tips and help with tasks, like writing and formatting documents. Clippy became iconic for its quirky suggestions, including recommending that you remove things from your desktop that you weren’t using.)

Burtness smiles in a portrait photo.

“With everything being in the Azure portal or in Azure Resource Graph, it’s much more streamlined, and makes it easier to get that data out to the teams. They can then go into the portal and clean up the resource.”

Brent Burtness, principal software engineer, Commerce Financial Platforms

And simply taking the step of turning off stuff that we weren’t using turned out to be very effective. Thanks, Clippy!

Today, we approach this challenge in a more efficient and sophisticated way, taking advantage of Azure tools like Update Manager and Advisor.

“With everything being in the Azure portal or in Azure Resource Graph, it’s much more streamlined, and makes it easier to get that data out to the teams,” Burtness says. “We can run automated queries with Azure Resource Graph. Then we bring that information into our internal Service 360 tool, which we use to give action items to our developers. Each item gives them a link to Azure portal, and they can then go into the portal and clean up the resource.”
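A minimal version of that kind of query, run with the Az.ResourceGraph module, might look like the example below. It pulls Azure Advisor cost recommendations, which include underutilized VM findings, so they can be routed to the owning teams; the properties you project will depend on what your downstream tooling expects.

# Sketch: pull Advisor cost recommendations with Azure Resource Graph for routing to resource owners.
$query = '
advisorresources
| where type =~ "microsoft.advisor/recommendations"
| where properties.category =~ "Cost"
| project recommendationId = name,
          impactedResource = tostring(properties.resourceMetadata.resourceId),
          problem = tostring(properties.shortDescription.problem)
'

# Returns the first page of results; use -First/-Skip or paging for larger estates.
Search-AzGraph -Query $query | Format-Table -AutoSize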

Managing for the lifecycle

One of the most important things we learned by using the Scream Test to identify inefficiencies and moving our systems from on-premises servers to the cloud was that it’s an ongoing process, not a fixed-end project.

“We had this idea that it was going to be a one-time event, that we’ll move to the cloud and then we’ll be done,” Apple says. “A better understanding is that it’s a lifecycle. We have integrated this concept of continual evaluation into our processes around everything that’s still on-premises, because we still have labs, we still have physical infrastructure.”

We continue to do this evaluation on a regular basis with both physical and virtual computing resources, because needs and usage are constantly changing.

Cutting our cloud costs

A text graphic shows the savings that one group at Microsoft achieved by becoming more efficient in their compute usage.
In a pilot set of Azure subscriptions, the Commerce Financial Platforms team reduced usage by 233 resources across 36 subscriptions and 17 services in 6 team groups, saving more than $15,000 in monthly operating costs.

“Now we have a basic process around a six-month cycle,” Apple says. “So, every six months we ask, does this still need to be on-premises or should we start moving it to the cloud? And we do the same thing with our cloud resources. Who’s still using these VMs? And we still go through the same review process to see if it’s needed, or if we can shut it down or move it.”

This has resulted in significant cost savings for the company. “We’re up to about 15% to 20% less compute cost, depending on the organization, because of this much better understanding of our business needs,” Apple says.

Better governance, increased security

Another major benefit of this process was establishing much stronger governance of compute resources across the entire organization.

“When we first did the Scream Test, we weren’t always really sure who owned what, in some cases,” Apple says. “We’ve fixed that as part of this process. This governance aspect is a key part of being more efficient with our resources.”

Burtness explains why this is so important.

“It’s critical to know exactly who to contact when there’s something wrong with the server,” Burtness says. “Now, with clearer ownership, clearer accountability, and better inventory, it’s a much better experience.”

Better governance also means tighter security, according to both Apple and Burtness.

“This is really important when it comes to threat-actor response,” Apple says. “Unused servers can often be an entry point for hackers. Or, say we discover that a machine or server is getting hacked; you need to talk to who owns it. If you don’t know, it takes you longer to track them down and combat the hack. That’s not great. Improving our governance has definitely made securing our environment easier.”

Key takeaways

Here are some things to keep in mind when managing your own enterprise compute resources for greater efficiency:

  • It’s not a one-time exercise. For the best results, you should be evaluating your computing resources on a regular schedule to identify “cold” servers and unused infrastructure.
  • Adjust for variable usage patterns. It’s not just about unused servers. Some machines may only be needed for a business function during certain busy times of the year. Consider turning the machines on just to handle the load during those periods and turning them off the rest of the year.
  • Use Azure tools for greater insight. If you’re operating your infrastructure in the Azure cloud, you can much more easily monitor and address orphaned resources using automated tools such as Azure Advisor, Azure Resource Graph, and the Azure portal.
  • Apply your savings to other priorities. “The more efficient you are, the more savings can be applied to other projects or given back to your manager—who is going to be very happy with you,” Apple says.
  • Saving money is not the only benefit. You’ll not only save operating costs, you’ll have a reduced maintenance and monitoring load, better governance, and fewer security vulnerabilities.

Vuln.AI: Our AI-powered leap into vulnerability management at Microsoft
http://approjects.co.za/?big=insidetrack/blog/vuln-ai-our-ai-powered-leap-into-vulnerability-management-at-microsoft/
Thu, 16 Oct 2025 16:05:00 +0000

In today’s hyperconnected enterprise landscape, vulnerability management is no longer a back-office function—it’s a frontline defense. With thousands of devices from a multitude of vendors, and a relentless stream of Common Vulnerabilities and Exposures (CVEs), here at Microsoft we faced a challenge familiar to every IT decision maker: how to scale vulnerability response without scaling cost, complexity, or risk.

A photo of Fielder.

“While AI enables amazing capabilities for knowledge workers, it also increases the threat landscape, since bad actors using AI are constantly probing for vulnerabilities. Vuln.AI helps keep Microsoft safe by identifying and accelerating the mitigation of vulnerabilities in our environment.”

Brian Fielder, vice president, Microsoft Digital 

Enter Vuln.AI, an intelligent agentic system developed by our team in Microsoft Digital—the company’s IT organization—to transform how we identify, prioritize, and resolve vulnerabilities across our enterprise network.

Manual methods can’t keep up

As a company, we detect over 600 million cybersecurity threats every day, according to our latest Digital Defense Report. Some of those signals are bad actors probing our internal network and infrastructure looking for unpatched vulnerabilities. Our infrastructure supports over 300,000 employees and vendors, 25,000 network devices, and over 560 buildings across 102 countries. This scale means we face a constant stream of vulnerabilities—each requiring triage, impact analysis, and remediation.

“While AI enables amazing capabilities for knowledge workers, it also increases the threat landscape, since bad actors using AI are constantly probing for vulnerabilities. Vuln.AI helps keep Microsoft safe by identifying and accelerating the mitigation of vulnerabilities in our environment,” says Brian Fielder, a vice president within Microsoft Digital. 

Historically, our Infrastructure, Networking, and Tenant team here in Microsoft Digital relied on manual assessments to determine which network devices were impacted by new vulnerabilities. Traditional vulnerability scanning tools generate a lot of false positives and false negatives, and a significant amount of analysis still falls to security engineers, requiring manual validation before any vulnerability impact can be communicated to device owners. These manual methods were time-consuming, error-prone, and reactive—our security engineers were spending hours on each vulnerability, at times missing critical threats or sinking too much time into false alarms.

A photo of Bansal.

“AI’s true power lies in the problem it’s applied to. Start by identifying the most time-consuming or painful task in your organization, then explore how AI can augment or improve it. Begin with a small, targeted enhancement and iterate continuously.”

Ankit Bansal, senior product manager, Microsoft Digital

With the vast number of vulnerabilities coming in every day, security engineers needed a scalable way to quickly analyze, prioritize, and respond.

The solution: Vuln.AI

We’d already achieved dramatic impact with our AI Ops and Network Infrastructure Copilot, which is on track to save us over 11,000 hours of network service management time per year. We built Vuln.AI on top of that investment:

  1. The Research Agent analyzes vulnerability feeds and network metadata from our Infrastructure Data Lakehouse (IDL) built on top of Azure Data Explorer, which regularly ingests data from our device vendors and other sources. Once new vulnerabilities are detected, it automates the identification of impacted devices and integrates with other internal tooling for validation and reporting.
  2. The Interactive Agent acts as a gateway for engineers and device owners to ask follow-up questions and initiate remediation. Through agent-to-agent interaction, it leverages our Network Infrastructure Copilot to query the research agent’s findings. This agentic interface enables real-time decision-making and contextual insights.
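
To make the division of labor between the two agents concrete, here’s a highly simplified, self-contained sketch. This is not Vuln.AI’s actual code; the data shapes, matching rules, and response wording are illustrative stand-ins for the real correlation against our Infrastructure Data Lakehouse.

```python
# A highly simplified sketch of the two-agent split: a research step that maps new
# CVEs to potentially impacted devices, and an interactive step that answers
# follow-up questions over those findings. All names and data shapes are illustrative.
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    model: str
    os_version: str

def research_agent(cves: list[dict], inventory: list[Device]) -> dict[str, list[str]]:
    """Correlate each advisory with devices whose model and OS fall in the affected set."""
    findings: dict[str, list[str]] = {}
    for cve in cves:
        impacted = [
            d.name
            for d in inventory
            if d.model in cve["affected_models"] and d.os_version in cve["affected_os"]
        ]
        if impacted:
            findings[cve["id"]] = impacted
    return findings

def interactive_agent(findings: dict[str, list[str]], cve_id: str) -> str:
    """Answer a follow-up question such as 'which devices are impacted by CVE X?'."""
    devices = findings.get(cve_id, [])
    if not devices:
        return f"No devices currently flagged for {cve_id}."
    return f"{len(devices)} devices impacted by {cve_id}: {', '.join(devices)}"

inventory = [Device("swi-pug-01", "ModelX-9500", "17.3"), Device("swi-pug-02", "ModelY-3100", "16.9")]
cves = [{"id": "CVE-2025-0001", "affected_models": ["ModelX-9500"], "affected_os": ["17.3"]}]
print(interactive_agent(research_agent(cves, inventory), "CVE-2025-0001"))
```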

Together, these agents are significantly improving our network security operations. The results we’re seeing so far are compelling:

  • A 70% reduction in time to vulnerability insights, enabling faster prioritization and mitigation, minimizing exposure windows.
  • Lower risk of compromise through increased accuracy, quicker detection, and containment of threats.
  • A stronger compliance posture that supports adherence to financial, legal, and regulatory requirements.
  • Higher accuracy in identifying vulnerable devices, reducing false positives and missed threats.
  • Engineering hours saved and reduced fatigue, significantly improving productivity.

Our gains translate to lower operational risk, faster response times, and more resilient infrastructure—critical outcomes for any enterprise navigating today’s threat landscape.

“AI’s true power lies in the problem it’s applied to,” says Ankit Bansal, a senior product manager within Microsoft Digital. “Start by identifying the most time-consuming or painful task in your organization, then explore how AI can augment or improve it. Begin with a small, targeted enhancement and iterate continuously.”

How Vuln.AI works

The system continuously ingests CVE data from our device suppliers’ API feeds and a publicly available database of known cybersecurity vulnerabilities. It correlates that data with device attributes such as hardware model and OS to identify the potential impact on the network and surface actionable insights.
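
As an illustration of that correlation step, here’s a minimal sketch using the azure-kusto-data Python library against Azure Data Explorer. The cluster URL, database, and table and column names are hypothetical placeholders rather than our actual schema.

```python
# Minimal sketch: join newly ingested advisories against device metadata in Azure
# Data Explorer. Cluster, database, tables, and columns are hypothetical placeholders.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(
    "https://<your-cluster>.kusto.windows.net"  # placeholder
)
client = KustoClient(kcsb)

query = """
Cves
| where IngestedAt > ago(1d)
| join kind=inner Devices on $left.AffectedModel == $right.HardwareModel
| project CveId, DeviceName, HardwareModel, OsVersion
"""

response = client.execute("NetworkInventory", query)  # placeholder database name
for row in response.primary_results[0]:
    print(row["CveId"], row["DeviceName"], row["OsVersion"])
```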

Engineers interact with the system via Copilot, Teams, or custom tooling, which allows seamless integration with our network security teams’ daily workflows.

“We built a hybrid approach in Vuln.AI to guide LLMs through complex security advisories,” says Blaze Kotsenburg, a software engineer in Microsoft Digital. “By combining structured function calls, templated prompts, and data validation, we keep the model focused on producing reliable, actionable insights for vulnerability mitigation.”

A photo of Lollis.

“We chose Durable Functions for Vuln.AI because it allowed us to confidently orchestrate complex, stateful research. The reliability and simplicity of the framework meant we could shift our focus to engineering the intelligence behind the agent, especially the prompting strategies used in Vuln.AI’s backend processing.”

Mike Lollis, senior software engineer, Microsoft Digital

When it came to building Vuln.AI, we relied heavily on our own technology platforms, including: 

  • Azure AI Foundry for model development and deployment
  • Azure Data Explorer to store device metadata and CVEs
  • Agent-to-agent interaction with our Network Infrastructure Copilot to query our database for device and inventory knowledge
  • Azure OpenAI models for natural language processing and classification
  • Azure Durable Functions for fine-grained orchestration and custom LLM workflows

“We chose Durable Functions for Vuln.AI because it allowed us to confidently orchestrate complex, stateful research,” says Mike Lollis, a senior software engineer in Microsoft Digital. “The reliability and simplicity of the framework meant we could shift our focus to engineering the intelligence behind the agent, especially the prompting strategies used in Vuln.AI’s backend processing.”
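
For readers unfamiliar with the framework, here’s a minimal sketch of what a Durable Functions orchestration can look like in the Python v2 programming model. The activity names and logic are illustrative stand-ins, not Vuln.AI’s implementation; the point is how the orchestrator chains stateful research steps.

```python
# Minimal sketch of a Durable Functions orchestration (Python v2 programming model).
# The trigger that starts the orchestration (HTTP, queue, or timer) is omitted for
# brevity, and the activity logic is a placeholder, not Vuln.AI's actual code.
import azure.functions as func
import azure.durable_functions as df

app = df.DFApp(http_auth_level=func.AuthLevel.FUNCTION)

@app.orchestration_trigger(context_name="context")
def research_orchestrator(context: df.DurableOrchestrationContext):
    cve_id = context.get_input()
    advisory = yield context.call_activity("fetch_advisory", cve_id)
    impacted = yield context.call_activity("identify_impacted_devices", advisory)
    return {"cve": cve_id, "impacted_devices": impacted}

@app.activity_trigger(input_name="cveid")
def fetch_advisory(cveid: str) -> dict:
    # Placeholder: pull the advisory from a vendor feed or an internal store.
    return {"cve": cveid, "affected_models": ["ModelX-9500"]}

@app.activity_trigger(input_name="advisory")
def identify_impacted_devices(advisory: dict) -> list:
    # Placeholder: query the device inventory (for example, Azure Data Explorer).
    return ["swi-pug-01", "swi-pug-02"]
```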

Vuln.AI in action

Consider a common scenario: a new CVE that affects a network switch has just been published. Vuln.AI’s research agent immediately flags the vulnerability, maps it to potentially affected devices in our network inventory, and pushes the findings to an internal database.

A photo of Lee.

“AI is only as good as the data you provide. Much of the success with Vuln.AI came from our dedicated efforts to source comprehensive vulnerability data and device attributes. For effective AI-powered solutions, you really need to invest in a strong data foundation and a strategy for how to integrate into the rest of your infrastructure.”

Linda Lee, product manager II, Microsoft Digital

This data then becomes immediately accessible in our internal tools, where it is validated and approved by security engineers. Following this, network engineers are provided with precise information about their vulnerable devices.

Engineers can prompt Vuln.AI’s interactive agent and instantly receive a response like this:

“12 devices impacted by CVE-2025-XXXX. Would you like me to suggest some next steps for mitigation or remediation?”

With Vuln.AI, network engineers can now begin vulnerability response operations much more quickly—no spreadsheet wrangling and no delays.

“AI is only as good as the data you provide,” says Linda Lee, a product manager II within Microsoft Digital. “Much of the success with Vuln.AI came from our dedicated efforts to source comprehensive vulnerability data and device attributes. For effective AI-powered solutions, you really need to invest in a strong data foundation and a strategy for how to integrate into the rest of your infrastructure.”

It’s about automating manual workflows and research.

“Vuln.AI has reduced our triage time by over 50%,” says Vincent Bersagol, a principal security engineer in Microsoft Digital.

This is allowing our engineers to focus on deeper analysis.

“The synergy between security and AI engineering has unlocked a new level of precision in vulnerability insights,” Bersagol says. “This is just the beginning.”

The journey ahead

Our journey with AI-powered vulnerability management has only just begun. Looking ahead, our roadmap for Vuln.AI includes:

  • Extending data coverage to include more hardware suppliers
  • Integrating more detailed device profiles for more targeted vulnerability response
  • Supporting autonomous workflows to streamline network engineers’ remediation efforts
  • Incorporating other AI agents to support more security use cases

These enhancements will further reduce risk, accelerate response times, and empower engineers to focus on more strategic initiatives.

“Trust is the foundation of everything we do in Microsoft Digital,” Bansal says. “Securing our network is essential to upholding that trust. Intelligent solutions like Vuln.AI not only help us stay ahead of emerging threats—they also establish the blueprint for integrating AI more deeply into our security operations.”

For IT leaders, Vuln.AI offers a blueprint for modern vulnerability management:

  • Scalable: Handles thousands of devices and vulnerabilities with ease
  • Accurate: Reduces false positives and missed threats
  • Efficient: Saves time, money, and resources
  • Secure: Built on Microsoft’s trusted AI and security frameworks

In a world where every second counts and any threat can be costly, Vuln.AI transforms vulnerability management from a bottleneck into a competitive advantage for Microsoft.

Key takeaways

As your organization looks for ways to improve security and threat response in a fast-changing landscape, consider the following insights on how AI is reshaping vulnerability management at Microsoft:

  • Fight fire with fire: The threat landscape has broadened dramatically due to bad actors using AI. Supplementing your own efforts with AI can help you manage your risk more effectively than traditional vulnerability management.
  • Agility is key: Effective vulnerability response hinges on acting fast. An AI-powered solution like Vuln.AI can cut the time needed to analyze and mitigate vulnerabilities by over 50%, enabling organizations to enhance security operations at scale.
  • The future is now: Looking ahead, Microsoft Digital will integrate agentic workflows into more security operations, boosting efficiency in risk prevention, threat detection and response, thereby enabling security practitioners and developers to focus on more strategic projects.

The post Vuln.AI: Our AI-powered leap into vulnerability management at Microsoft appeared first on Inside Track Blog.

]]>
20623
Keeping our in-house optical network safe with a Zero Trust mentality http://approjects.co.za/?big=insidetrack/blog/keeping-our-in-house-optical-network-safe-with-a-zero-trust-mentality/ Thu, 16 Oct 2025 16:00:00 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=20611 When it comes to corporate connectivity at Microsoft, a minute of lost connection can lead to catastrophic disruptions for our product teams, sleepless nights for our network engineers, and millions of dollars of lost value for the company. That’s why we built our own optical network at our headquarters in Washington state, and that’s why […]

The post Keeping our in-house optical network safe with a Zero Trust mentality appeared first on Inside Track Blog.

]]>
When it comes to corporate connectivity at Microsoft, a minute of lost connection can lead to catastrophic disruptions for our product teams, sleepless nights for our network engineers, and millions of dollars of lost value for the company.

That’s why we built our own optical network at our headquarters in Washington state, and that’s why we’re building similar networks at other regional campuses around the United States and the rest of the world.

With so much on the line, we need to make sure these in-house networks never go down.

But how are we doing that?

We’re applying the same robust Zero Trust approach we take to security and identity. While our optical networks are extremely reliable, any complex system can be knocked offline. In alignment with the Zero Trust mentality we have as a company, we couldn’t simply trust the integrity of what we’ve built; we needed a resilient backup system that went beyond redundancy to provide true resilience.

Driven by this goal, we created a Zero Trust Optical Business Continuity Disaster Recovery (BCDR) network that combines two fully independent optical systems designed to sustain uninterrupted services, even during systemic failures. The result is more confidence for our employees and vendors, less pressure on our network engineers, and comprehensive network resilience that will protect us against a major outage.

The urgency of resilience

In 2021, our team in Microsoft Digital, the company’s IT organization, deployed our first next-generation optical network to serve the exclusive network needs of our Puget Sound metro campuses. It offers more bandwidth on less fiber for a lower operational cost than leasing from traditional carriers.

“Puget Sound is a highly concentrated developer network where we need to provide very high throughput,” says Patrick Alverio, principal group software engineering manager for Infrastructure and Engineering Services within Microsoft Digital. “Our optical system is the backbone of all that traffic.”

Our state-of-the-art optical network fulfills our need for fast and reliable connectivity at up to 400 Gbps between core sites, labs, data centers, and the internet edge. We built this network on Reconfigurable Optical Add/Drop Multiplexer (ROADM) technology, delivering dynamic reconfiguration; colorless, directionless, contentionless (CDC) capabilities; flexible grid support; remote provisioning; and automation. It also features a full-mesh topology that provides a layer of redundancy.

But what if the entire ROADM-based system fails?

There are plenty of operational risks that can derail even the most robust network. Anything from misconfigured automation scripts to policy changes to misaligned software versioning to simple human error can cause outages.

A photo of Elangovan.

“We don’t want even a second of downtime. We needed a life raft for when failures occur that could also function as a standby network for core site migrations or platform upgrades.”

Vinoth Elangovan, senior network engineer, Hybrid Core Network Services, Microsoft Digital

To some degree, those kinds of minor disruptions are inevitable. But catastrophic events like fiber cuts, failures in the ROADM operating system, or even natural disasters have the potential for even more wide-ranging disruption.

During a catastrophic outage, thousands of engineers, developers, researchers, and other technical employees who need access to crucial lab environments and data centers could lose connectivity. That can sabotage feature delivery, disrupt product patches, interrupt updates, and halt all kinds of core product functions.

On top of normal software development operations, new AI tools demand massive bandwidth and consistent uptime. Finally, our hybrid networks feature paths integrated with Microsoft Azure that consume on-premises resources, so they also stand to benefit from increased resilience.

A catastrophic network outage can cause incredible damage to all of these business functions. In fact, we experienced exactly that in 2022.

A fiber cut combined with a ROADM system hardware reboot caused a five-minute outage at our Puget Sound metro region. In this environment, every minute of lost connectivity can result in significant financial impact, making network resilience absolutely essential.

“We don’t want even a second of downtime,” says Vinoth Elangovan, senior network engineer, who designed and implemented the Zero Trust Optical BCDR network for Microsoft. “We needed a life raft for when failures occur that could also function as a standby network for core site migrations or platform upgrades.”

Delivering greater network resilience

To ensure we could deliver uninterrupted network connectivity even in the midst of a catastrophic outage, we needed to consider the technical demands of a truly resilient system. Five design pillars helped us assemble our architectural criteria:

  1. Independent optical systems: To provide true resilience, our primary and BCDR platforms needed to operate autonomously.
  2. Physically independent paths: Circuits should avoid shared conduits, fibers, and splices to operate completely independently.
  3. Separate control software: The primary and backup networks should operate through dedicated network management systems (NMSs), automation, and provisioning domains.
  4. Unified client interface: Both systems needed to terminate into the same interface to unify service for clients and applications.
  5. Survivability by design: We couldn’t assume that any system would be immune to failure. Instead, we built for the best possible outcome when failures do occur.

The result was the Zero Trust Optical BCDR architecture, a layered approach to optical networking. It consists of our primary, ROADM-based transport layer and a secondary, MUX-based transport layer, both terminating into a single logical port channel.

“Our core responsibility is the employee experience, so our main design thrust was making sure service is seamless and uninterrupted—even during an outage.”

Vinoth Elangovan, senior network engineer, Hybrid Core Network Services, Microsoft Digital

Both systems are live and active, which means they deliver production services through their own independent fibers, power supplies, and software stacks. By layering fully independent optical domains and logically unifying them at the Ethernet edge, the network can sustain a complete failure of one system and maintain continuity.

That physical and operational independence is the difference between simple redundancy and robust resilience.

“Our core responsibility is the employee experience, so our main design thrust was making sure it’s seamless and uninterrupted—even during an outage,” Elangovan says.

Optical network backed by a BCDR network

A schematic of an optical network running between different nodes and backed up by a BCDR network.
The optical network in our Puget Sound region connects core sites to labs, datacenters, and the internet edge, while the BCDR network provides backup connections to deliver resilience in case of a catastrophic network failure.

A typical ROADM optical network connects campus and data center sites to the internet edge. Our design features three interconnected optical rings, with two internet edges as multi-directional nodes, while other sites operate as dual-degree nodes with bidirectional redundancy. Meanwhile, our campuses and datacenters are designated as critical sites and equipped with Optical BCDR links to ensure enhanced resiliency. In the event of a complete Optical ROADM line failure, these critical sites retain connectivity.

In the event of an outage on the primary network, the port channel handles forward continuity automatically, shifting WAN traffic between optical paths in real time.

The transition occurs seamlessly and transparently, with no noticeable impact to clients.

A photo of Martin.

“Our initial goal was to provide high-throughput connectivity for major labs, with less than six minutes of downtime per year. That represents a service level of 99.999% network continuity, and we’re aiming for even better moving forward.”

Blaine Martin, principal engineering manager, Hybrid Core Network Services, Microsoft Digital

Coupling at the Ethernet layer provides clients and applications with one logical interface, automatic load balancing and traffic distribution, and seamless failover, regardless of which optical domain is providing service.

“Our initial goal was to provide high-throughput connectivity for major labs, with less than six minutes of downtime per year,” says Blaine Martin, principal engineering manager for Hybrid Core Network Services in Microsoft Digital. “That represents a service level of 99.999% network continuity, and we’re aiming for even better moving forward.”

A new era of confidence for network engineers

For the network engineers who keep Microsoft employees and resources connected, the Zero Trust Optical BCDR network relieves much of the pressure that comes from resolving outages.

“Before, we were dependent on a single system, even with redundancies, so the human experience was like firefighting. Now, if the primary optical network is having a problem, I don’t even see it.”

Kevin Bullard, principal cloud network engineering manager, Microsoft Digital

When a network goes down, engineers have an enormous set of responsibilities to manage: processing the incident report, assigning severity, performing checks, notifying internal teams, providing updates, and engaging with physical support teams—all with a profound urgency to restore productivity.

Dialing those pressures back has been a huge benefit.

“Before, we were dependent on a single system, even with redundancies, so the human experience was like firefighting,” says Kevin Bullard, Microsoft Digital principal cloud network engineering manager responsible for maintaining WAN interconnectivity between labs. “Now, if the primary optical network is having a problem, I don’t even see it.”

There will always be pressure on network engineers to restore connectivity during an outage, but they can breathe easier knowing it won’t cost the company millions of dollars as the time to resolve ticks away. And in non-emergency situations like core site migrations, the BCDR network provides a much easier way to shunt services while the main network is offline.

“Our internal users have become more confident that they can stay connected, no matter what,” says Chakri Thammineni, principal cloud network engineer for Infrastructure and Engineering Services in Microsoft Digital. “That gives the people responsible for maintaining our enterprise networks incredible peace of mind.”

Fortunately, there hasn’t been a substantial network outage in the Puget Sound metro area since 2022. But our network engineering teams know that if and when it happens, the BCDR network will be ready to maintain service continuity.

A photo of Alverio.

“We’re always looking ahead into industry trends to stay at the bleeding edge, whether that’s in the technology we provide for our customers or the networks we use to do our own work.”

Patrick Alverio, principal group software engineering manager, Infrastructure and Engineering Services, Microsoft Digital

With our Puget Sound network protected, we have plans in place to extend this model to other metro areas. Naturally, we have to balance population, criticality, and the knowledge that elevated reliability and availability come with a cost.

Our selection criteria for new BCDR networks have largely centered around two factors: expansions of AI-critical infrastructure and concentrations of secure access workspaces (SAWs) for technical employees. With these criteria in mind, we’re planning new BCDR networks first in the Bay Area and Dublin, then in Virginia, Atlanta, and London.

Zero Trust optical BCDR architecture represents a paradigm shift in enterprise network resilience, and we’re committed to expanding the model to benefit both conventional workloads and the expanding infrastructure demands of AI.

“We’re always looking ahead into industry trends to stay at the bleeding edge, whether that’s in the technology we provide for our customers or the networks we use to do our own work,” Alverio says. “We refuse to accept the status quo, and we’re elevating the experience for employees across Puget Sound and Microsoft as a whole.”

Driving AI innovation in optical network resilience

Our journey towards an AI-driven optical network is gaining momentum.

As part of our Secure Future Initiative, we’ve automated our Optical Management Platform credential rotation and are actively developing intelligent incident management ticket enrichment, auto-remediation, link provisioning, deployment validation, and capacity planning.

AI plays a central role in this transformation.

With Microsoft 365 Copilot and GitHub Copilot integrated into our engineering workflows, we’re accelerating development cycles, improving code accuracy, and uncovering optimization opportunities that would otherwise take hours of manual effort.

These Copilots are also helping our engineers analyze network patterns, simulate outcomes, and validate deployment logic before execution, reducing human error and strengthening our Zero Trust posture. Over time, we’re evolving toward a system where AI not only assists but proactively predicts potential disruptions, recommends remediations, and continuously learns from operational telemetry.

These advancements are paving the way for a future where our optical infrastructure can anticipate issues, recover faster, and operate with the agility and assurance expected in a Zero Trust environment.

Key takeaways

If you’re considering implementing your own optical and BCDR networks, consider these tips:

  • Understand the technical components of resilience: Independent optical systems, physically independent paths, separate control software, a unified client interface, and survivability by design are the key technical components of true resilience.
  • Plan from a preparedness and value perspective: Evaluate the critical points in your infrastructure and determine where you can get the most value out of resilient connectivity.
  • Ensure your teams have the right skillset: Carefully consider the right workforce to run those systems and be accountable for their operation.

The post Keeping our in-house optical network safe with a Zero Trust mentality appeared first on Inside Track Blog.

]]>
20611
Unleashing API-powered agents at Microsoft: Our internal learnings and a step-by-step guide http://approjects.co.za/?big=insidetrack/blog/unleashing-api-powered-agents-at-microsoft-our-internal-learnings-and-a-step-by-step-guide/ Thu, 02 Oct 2025 16:05:00 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=19793 Agentic AI is the frontier of the AI landscape. These tools show enormous promise, but harnessing their power isn’t always as straightforward as prompting a model or accessing data from Microsoft 365 apps. To reach their full potential in the enterprise, agents sometimes need access to data beyond Microsoft Graph. But giving them access to […]

The post Unleashing API-powered agents at Microsoft: Our internal learnings and a step-by-step guide appeared first on Inside Track Blog.

]]>
Agentic AI is the frontier of the AI landscape. These tools show enormous promise, but harnessing their power isn’t always as straightforward as prompting a model or accessing data from Microsoft 365 apps. To reach their full potential in the enterprise, agents sometimes need access to data beyond Microsoft Graph. But giving them access to that data relies on an extra layer of extensibility.

To meet these demands, many of our teams within Microsoft Digital, the company’s IT organization, have been experimenting with API-based agents. This approach combines the best of two worlds: accessing diverse apps and data repositories and eliminating the need to build an agent from the ground up.

We want to empower every organization to unlock the full power of agents through APIs. The lessons we’ve learned on our journey can help you get there.

The need for API-based agents

The vision for Microsoft 365 Copilot is to serve as the enterprise UX. Within that framework, agents serve as the background applications that streamline workflows and save our employees time.

For many users, the out-of-the-box access Copilot provides to Microsoft Graph is enough to support their work. It surfaces the data and content they need while providing a foundational orchestration layer with built-in capabilities around compliance, responsible AI, and more.

But there are plenty of scenarios that require access to other data sources.

“Copilot provides you with data that’s fairly static as it stands in Microsoft Graph,” says Shadab Beg, principal software engineering manager on our International Sovereign Cloud Expansion team. “If you need to query from a data store or want to make changes to the data, you’ll need an API layer.”

By using APIs to extend agents built on the Copilot orchestration layer, organizations can apply its reasoning capabilities to new data without the need to fine-tune their models or create new ones from scratch. The possibilities these capabilities unlock are driving a boom in API-based agents for key functions and processes.

“Cost is one of the most critical dimensions in how we design, deploy, and scale our solutions. Declarative API-driven agents in Microsoft 365 Copilot offer a path to unify agentic experiences while leveraging shared AI compute and infrastructure.”

A photo of Nasir.
Faisal Nasir, AI Center of Excellence and Data Council lead, Microsoft Employee Experience

In many ways, IT organizations like ours are the ideal places to implement API-based agents. Our teams are adept at creating and deploying internal solutions to solve technical challenges, and IT work is often about enablement and efficiency—exactly what agents do best.

“Cost is one of the most critical dimensions in how we design, deploy, and scale our solutions,” says Faisal Nasir, AI Center of Excellence and Data Council lead in Microsoft Employee Experience. “Declarative API-driven agents in Microsoft 365 Copilot offer a path to unify agentic experiences while leveraging shared AI compute and infrastructure. By aligning with core architectural principles such as efficiency, scalability, and sustainability, we can ensure these agents not only drive intelligent outcomes but also maximize value across service areas with minimal overhead.”

Learn more about our vision and strategy around deploying agents internally at Microsoft.

The Azure FinOps Budget Agent

Our Azure FinOps Budget Agent is a perfect example of a scenario for API-based agents.

The team responsible for managing our Microsoft Azure budget for IT services was looking for ways to reduce costs by 10–20 percent. To do that effectively, service and finance managers needed the ability to track their spending quickly, accurately, and easily.

The conventional approach to solving this problem would be creating a dashboard with access to the relevant data. The problem with a UI-based approach is that it tends to cater to more specific personas by providing data only they need while oversaturating others with information that’s irrelevant to their work.

“Azure spend is basically the lifeline for our services,” says Faris Mango, principal software engineering manager for infrastructure and engineering services within Microsoft Digital. “Getting the information you need in a concise format that provides a nice, holistic view can be challenging.”

With the advent of generative AI and Microsoft 365 Copilot, the team knew that a natural language interface would be much more intuitive. The result was the Azure FinOps Budget Agent.

The team created the agent and the necessary APIs using Microsoft Visual Studio Code. Its tables and functions run on Azure Data Explorer, allowing the APIs and their consumers to access data almost instantaneously, thanks to its low latency and rapid read speeds.

The tool retrieves data by running Azure Data Factory pipelines that pull and transform data from three sources:

  • Our SQL Server for service budget and forecast data
  • Azure Spend for the actual spending amounts
  • Projected spending, a separate service stored in other Azure Data Explorer tables

Processing the information relies on join operations in our business logic, followed by aggregations by fiscal year and service tree level. These summarize the data per service, team group, service group, and organization.

After the back end processes the day’s data, it ingests the information into our Azure Data Explorer tables, which the agent accesses by calling Kusto functions (stored queries written in Kusto Query Language, the query language for Azure Data Explorer). The outcome is very low latency: the agent typically returns results in under 500 milliseconds.
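
As a simplified illustration, here’s how an API layer might call a stored Kusto function from Python with the azure-kusto-data library; the cluster, database, function name, and columns are hypothetical.

```python
# Minimal sketch: call a hypothetical stored Kusto function that encapsulates the
# join and fiscal-year aggregation logic described above.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(
    "https://<finops-cluster>.kusto.windows.net"  # placeholder
)
client = KustoClient(kcsb)

response = client.execute("FinOps", "GetServiceBudgetSummary('FY26', 'Network Portal')")
for row in response.primary_results[0]:
    print(row["Month"], row["Budget"], row["Actual"], row["Variance"])
```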

For users, the tool is stunningly simple. They simply access Copilot and navigate to the Azure FinOps Budget Agent.

The agent provides three core prompts at the very top of the interface: “My budgets,” “Service budget information,” and “Service group budget information.” Clicking on one of these pre-loaded prompts returns role-specific information around budget, forecasts, actuals, projections, and variance, all at a single glance. The interface even includes graphs to help people track spending visually.

If users are looking for more specific information, they can input their own queries. For example:

  • “Get me the monthly breakdown of service Azure Optimization Assessment analytics.”
  • “Find me the service in this tree with the highest budget.”
  • “Show me the Azure budget for our facilities reporting portal.”
  • “Which service deviates most from its budget forecasts?”

The Azure FinOps Budget Agent primarily serves two groups: service managers who directly oversee spend for Azure-based services and FinOps managers responsible for larger budget silos.

Mango is responsible for the internal UI that helps employees access parts of the Microsoft network. With 18–20K users per month, budgeting and forecasting are highly dynamic due to traffic fluctuations and the resourcing that supports them. He also oversees the internal portal that helps service engineers manage our networks. That tool is growing rapidly as we onboard more teams, so forecasting is anything but linear.

For both of these services, keeping close track of spending is essential. Mango finds himself checking the Azure FinOps Budget Agent about twice a month to gauge how his services are trending.

“It’s taking me less time to do analysis and come up with accurate numbers. And the enhanced user experience just feels more natural, like you’re asking questions conversationally rather than engaging with a dashboard.”

A photo of Mango.
Faris Mango, principal software engineering manager for infrastructure and engineering services, Microsoft Digital

For FinOps managers, the value is more high-level. They are responsible for overseeing tens of services featuring vast volumes of Azure usage across storage and compute while managing strict budgets. That requires constant vigilance.

Switching context from one dashboard to another to track different Azure management groups was a constant hassle for them. Now, they use the Azure FinOps Budget Agent to get an up-to-date view of the overall spend picture. It gives them a place to start. From there, they can drill down if they see any abnormalities.

“It’s taking me less time to do analysis and come up with accurate numbers,” Mango says. “And the enhanced user experience just feels more natural, like you’re asking questions conversationally rather than engaging with a dashboard.”

The arrival of the Azure FinOps Budget Agent is just one example of how agents take your context and get your people the answers they care about faster at less cost.

Benefits like these are spreading across teams throughout Microsoft. Overall, we’ve been able to save 10–12 percent of our overall Azure cost footprint for Microsoft Digital, and individual users are thrilled at the amount of time and effort they’re saving.

“Now the info is at people’s fingertips. The advantage of an agent is that users don’t have to understand a complex UI, so they can get quick answers and get back to work.”

A photo of Beg.
Shadab Beg, principal software engineering manager, International Sovereign Cloud Expansion

Five key strategies for building an API-based agent

After seeing what we’ve accomplished with API-based agents, you might be wondering how to put them into action at your organization. This step-by-step guide can help you get there.

An API-based agent needs to fulfill multiple requirements: it has to expose the right APIs, align with real user needs, integrate seamlessly with Microsoft 365 Copilot, and work reliably, efficiently, and at scale. Achieving those outcomes depends on five key strategies.

Start with user intent, not the API

Start by asking a simple but powerful question: What will users actually ask your agent? Instead of designing the API first, flip the process:

  • Gather real user queries to understand actual use cases.
  • Refine the queries using prompt engineering techniques to align them with expected AI behavior.
  • Design the API to provide structured responses to those refined queries.

By starting with user intent, you ensure your agent answers real user questions directly, avoids over-engineering unnecessary endpoints, and delivers meaningful results without excessive back-end processing.

“Now the info is at people’s fingertips,” Beg says. “The advantage of an agent is that users don’t have to understand a complex UI, so they can get quick answers and get back to work.”

Key learning: An API that doesn’t align with user intent won’t be effective—even if you design it well.

Design APIs for Microsoft 365 Copilot integration

Build an API schema that returns precise, structured data that’s easy for Copilot to consume and that directly answers user queries. Copilot expects responses in under three seconds, so focus on optimizing API responses for low latency.

Once you have your list of key questions, design your API schema to return the exact data you need to answer those questions. Your goal should be to ensure every API response has a structure that makes it easy for Copilot to understand.
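
Here’s a minimal sketch of what such an endpoint could look like, using FastAPI purely as an example framework; the route, field names, and numbers are illustrative. The key idea is a flat, self-describing response, including a one-sentence summary the agent can surface without extra post-processing.

```python
# Minimal sketch of an endpoint shaped around one user question ("what is my service's
# budget status?"). FastAPI and the field names are illustrative choices, not a standard.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class BudgetStatus(BaseModel):
    service: str
    fiscal_year: str
    budget_usd: float
    actual_usd: float
    variance_pct: float
    summary: str  # a one-sentence answer the agent can surface directly

@app.get("/budget-status", response_model=BudgetStatus, operation_id="getBudgetStatus")
def get_budget_status(service: str, fiscal_year: str = "FY26") -> BudgetStatus:
    # Placeholder numbers; a real implementation would read pre-aggregated data.
    budget, actual = 120_000.0, 103_200.0
    variance = round((actual - budget) / budget * 100, 1)
    return BudgetStatus(
        service=service,
        fiscal_year=fiscal_year,
        budget_usd=budget,
        actual_usd=actual,
        variance_pct=variance,
        summary=f"{service} is tracking {abs(variance)}% under budget for {fiscal_year}.",
    )
```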

Teach Microsoft 365 Copilot to call your API

Copilot needs to know how to call your API. Manifests and OpenAPI descriptions accomplish that training.

Create detailed OpenAPI documentation and plugin manifests so Copilot knows what your API does, how to invoke it, and what responses to expect. You’ll likely need to adjust these files through a process of trial and error.
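
As a simplified illustration, frameworks that generate OpenAPI descriptions automatically can give you a starting point for those files. This sketch continues the hypothetical FastAPI pattern shown earlier; clear operation IDs, summaries, and descriptions matter because they are what the model reads when deciding whether and how to call your API.

```python
# Minimal sketch: export an OpenAPI description that a plugin or declarative agent
# manifest can reference. The API surface here is a hypothetical placeholder.
import json
from fastapi import FastAPI

app = FastAPI(
    title="Budget Status API",
    description="Returns budget, actuals, and variance for a named service.",
)

@app.get(
    "/budget-status",
    operation_id="getBudgetStatus",
    summary="Get budget status for one service and fiscal year",
)
def get_budget_status(service: str, fiscal_year: str = "FY26") -> dict:
    return {"service": service, "fiscal_year": fiscal_year}

# FastAPI derives the OpenAPI document from the routes, models, and docstrings above.
with open("openapi.json", "w") as f:
    json.dump(app.openapi(), f, indent=2)
```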

Scale APIs for performance and reliability

Once you have your schema and integration in place, it’s time to move on to the primary engineering challenge: making your API scalable, efficient, and reliable.

Prioritize the following goals:

  • Fast response times: Copilot expects quick answers.
  • High scalability: This ensures seamless performance at scale.
  • Reliable uptime: The system needs to remain robust.

We recommend setting a very strict latency limit while implementing your API to retrieve data, since Copilot needs time to generate its response. Existing API endpoints often involve complex data joins rather than simply returning rows from data tables. This complexity can lead to longer processing times, particularly with intricate queries that involve multiple data stores.

To address these potential delays, pre-cache results to significantly enhance performance. This can help overcome the latency requirements imposed by Copilot.
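
Here’s a minimal, illustrative sketch of that pre-caching idea: the expensive aggregation runs on a schedule, and the request path only ever reads the cached result. A production implementation would typically use a shared cache such as Azure Cache for Redis rather than process memory, but the shape of the approach is the same.

```python
# Minimal sketch of the pre-caching idea: compute the expensive aggregation on a
# schedule and let the API request path read only the cached result. The refresh
# logic and data shape are illustrative placeholders.
import threading
import time

_cache: dict[str, dict] = {}

def refresh_cache() -> None:
    # Stand-in for the heavy joins and aggregations, done ahead of any user request.
    _cache["budget_summary"] = {"generated_at": time.time(), "rows": []}

def get_budget_summary() -> dict:
    # The request path only reads the cache, keeping responses well under the latency budget.
    return _cache.get("budget_summary", {})

def start_refresher(interval_seconds: int = 300) -> None:
    def loop() -> None:
        while True:
            refresh_cache()
            time.sleep(interval_seconds)
    threading.Thread(target=loop, daemon=True).start()

refresh_cache()      # warm the cache once at startup
start_refresher()    # then keep it fresh in the background
```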

At this point, you’ll see why starting with user intent and iteratively refining API design is important. By grounding your work in user behaviors, you’ll align with the following best practices:

  • Structure your response to directly address user queries.
    Instead of just returning raw data, the API should provide meaningful insights Copilot can interpret. Prompt engineering marries user intent with the most understandable API schema.
  • Keep your API flexible enough to adapt to evolving business needs.
    Real-world workflows change over time, and an API should be able to support those changes without massive refactoring.
  • Avoid performance bottlenecks caused by unnecessary complexity.
    Understanding the exact data requirements up front prevents heavy joins, excessive filtering, and inefficient data retrieval logic.
  • Optimize for Copilot’s real-time response constraints.
    With a strict limit on latency, consider pre-optimization techniques like pre-caching results and simplifying query logic from the very beginning of your API implementation.

If you attempt to build a scalable, reliable API without first understanding how users will interact with your agent, you’ll spend months reworking the schema, debugging inefficiencies, and struggling with integration challenges.

Key learning: A fast, scalable, and reliable API isn’t just about technical optimization. It starts with a deep understanding of the questions it needs to answer and how to structure responses so Copilot can interpret them correctly.

Consider compliance and responsible AI

Unlike custom agents or OpenAI API integrations, knowledge-only agents require far less effort to meet Microsoft’s Responsible AI Standard. Microsoft tools’ built-in compliance capabilities handle much of the complexity. As a result, you can focus on efficiency and optimization rather than regulatory hurdles.

“Agent-based automation must balance speed with responsibility,” Nasir says. “We embed compliance, cost control, and telemetry from the start, so our systems don’t just scale, they mature.”

Key learning: It’s helpful to revisit your existing compliance, governance, and responsible AI processes and policies before implementing AI solutions. Copilot adheres to protective structures within your Microsoft technology ecosystem, so this process will ensure you’re starting from the most secure position.

APIs and the agentic future

Building API-based agents is more than just an integration exercise. It’s about creating scalable, intelligent, and compliant AI-driven workflows. By aligning your API design with user intent, you set Microsoft 365 Copilot free to retrieve and interpret information accurately. That leads to a seamless AI experience for your employees.

Thanks to Copilot’s built-in security and compliance features, API-based Copilot agents are some of the most efficient, compliant, and enterprise-ready ways to deploy AI solutions. They represent another step into an AI-first future tailored to your employees’ and organization’s needs.

Tools like API-based agents democratize the information we all need to do our jobs better, because we’re all getting the same data from the same place. This is why an AI-first mindset is actually human-first.

Key takeaways

Here are some things to keep in mind when designing agent-powered experiences that are fast, reliable, and aligned with user expectations.

  • Response time is key. Choose single-purpose APIs with low latency to satisfy both Copilot’s technical requirements and users’ needs.
  • Consider the source. Data has to be high-quality on the backend. It’s worth reviewing your data and ensuring the hygiene is good.
  • Agents and APIs need to align. Design with task-centric, well-structured agents. Determine your high-level goals, then use the OpenAI standard, OpenAPI, or graph schemas to describe task endpoints. Define each API’s capability, input schema, and expected outcome very clearly.
  • Plan ahead to avoid surprises. Design your APIs to minimize potential side effects, especially through enabling natural-language-to-API mapping, because that’s the biggest change in methodology.
  • Design for visibility. Agents need to be observable and explainable, so implement metrics-driven monitoring. Having API-level telemetry in addition to Copilot-level telemetry enables continuous improvement, as shown in the sketch below.
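
Here’s a minimal, standard-library-only sketch of what API-level telemetry can look like: each handler is wrapped so latency and outcome are recorded per call. The endpoint and logging destination are illustrative; in practice you would emit these measurements to your monitoring pipeline.

```python
# Minimal sketch: wrap each API handler to record latency and outcome per call so
# agent behavior can be monitored alongside Copilot-level telemetry. The endpoint
# name and log destination are illustrative placeholders.
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent-api")

def with_telemetry(endpoint_name: str):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            outcome = "error"
            try:
                result = fn(*args, **kwargs)
                outcome = "success"
                return result
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                log.info("endpoint=%s outcome=%s latency_ms=%.1f", endpoint_name, outcome, elapsed_ms)
        return wrapper
    return decorator

@with_telemetry("getBudgetStatus")
def get_budget_status(service: str) -> dict:
    return {"service": service, "status": "under budget"}

print(get_budget_status("Network Portal"))
```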

The post Unleashing API-powered agents at Microsoft: Our internal learnings and a step-by-step guide appeared first on Inside Track Blog.

]]>
19793
Making transportation seamless and efficient with the power of data and AI at Microsoft http://approjects.co.za/?big=insidetrack/blog/making-transportation-seamless-and-efficient-with-the-power-of-data-and-ai-at-microsoft/ Thu, 02 Oct 2025 16:00:00 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=20462 It’s full speed ahead for the future of transportation at Microsoft. Five years ago, as a global pandemic shut down offices and commuting ground to a halt, Microsoft took the opportunity to overhaul the technology underpinning its transportation services. The result was a more modernized and integrated system that employees enjoyed as they resumed work […]

The post Making transportation seamless and efficient with the power of data and AI at Microsoft appeared first on Inside Track Blog.

]]>
It’s full speed ahead for the future of transportation at Microsoft.

Five years ago, as a global pandemic shut down offices and commuting ground to a halt, Microsoft took the opportunity to overhaul the technology underpinning its transportation services. The result was a more modernized and integrated system that employees enjoyed as they resumed work at our Puget Sound-based global headquarters.

Gaurav smiles in a portrait photo.

“Figuring out their commute should not be a pain point for employees. We’re harnessing our advanced technology and the power of AI to do the heavy lifting, so they don’t have to struggle to figure out how they’ll get to work.”

Garima Gaurav, senior product manager, Microsoft Digital

Today, with flexible work schedules the norm, the investment in these technologies—including improved UIs for employee-facing tools, better data handling and collection on the backend, and a more seamless experience—has paid dividends in terms of flexibility and efficiency.

As rates of in-office attendance creep up, our Commute Services group can quickly adjust and stay on top of demand, leaving us better positioned to meet our company’s ambitious sustainability goals.

And now, we’re embracing the Microsoft vision of an AI-powered future by adding agentic, predictive capabilities to our commuting tools, which makes booking a shuttle, Connector bus, or other transportation option fast and easy for our workers.

“Figuring out their commute should not be a pain point for employees,” says Garima Gaurav, a senior product manager in Microsoft Digital, the company’s IT organization. “We’re harnessing our advanced technology and the power of AI to do the heavy lifting, so they don’t have to struggle to figure out how they’ll get to work or to a meeting in a different building.”

Upgrading the transportation experience

We’ve always had clear goals for the type of transportation program we wanted to bring to our employees.

“The first thing we think about is the rider experience,” says Esther Christoffersen, a senior manager with Puget Sound Commute Operations. “We want to deliver an experience that is centered around ease, flexibility, and choice. We start with the physical world, the environment that we live and work in, and then we think about the digital world that employees interface with.”

But our technology systems didn’t always make it easy to accomplish those goals. So we undertook the overhaul of our commute tools, implementing a modern UI that was more consistent with other Microsoft workplace applications. At the same time, this work allowed our engineers to transform the back-end management of our transportation system, using Microsoft Azure to give them better visibility and clearer ownership of operating data.

Better data and tools meant empowering riders with mobility features like a trip-planning function, push notifications, real-time ETAs, and live vehicle map tracking for our shuttle and Connector bus services.

“We had to think about what really matters,” Gaurav says. “That meant building something modern, real-time, and fast for riders. But we also wanted operational agility for the Commute Services team.”

Getting there with the help of an AI agent

With the right technology in place, these tools are ready for agentic AI—and it’s here. While they can still use our internal desktop or mobile platforms to book a ride to work or a different campus location, employees can now also opt for the Employee Self-Service (ESS) agent we’ve developed.

Jessie Go, a technical program manager in the Real Estate and Facilities group, emphasizes the fluid, end-to-end experience that this AI agent can provide to commuters.

“If I’m a new employee, I want to know my commute options,” Go says. “I go into ESS and ask, ‘What are my options to get to campus?’ The agent gives me a list of commuter choices, and one is the Connector bus. I then ask it to help me book a Connector; the agent pulls up a booking tool and I schedule my Connector ride. It’s so much simpler.”

West smiles in a portrait photo.

“The ESS tool is kind of a one-stop-shop Copilot agent, aimed at helping our people with all of their work tasks.”

Becky West, principal group product manager, Microsoft Digital

ESS not only offers a user-friendly Copilot Chat interface, but also the potential to understand the rider’s transportation history and preferences.

“It allows users to have a more contextual, conversational experience,” says Ram Kuppaswamy, a principal software engineering manager in Microsoft Digital. “They can just say, ‘Book me a connector,’ and the agent can suggest options based on their previous ride history. It also offers one-click booking, which is used in 40% of all bookings today. It saves users a ton of time, and they really love it.”

It’s all part of making routine tasks frictionless and more efficient for Microsoft employees.

“We’re bringing the experience right to where the employees live, in the AI chat interface,” Gaurav says. “This way they can get all the information they need in one place, rather than 10 different places.”

Of course, ESS can do more than just help with transportation needs—it’s been rolled out company-wide, with the ability to answer employee questions and solve problems relating to anything from their benefits to IT issues to dining options.

“The ESS tool is kind of a one-stop-shop Copilot agent, aimed at helping our people with all of their work tasks,” says Becky West, a principal group product manager in Microsoft Digital. “In the Real Estate space, that might be help with booking a shuttle or seeing what’s for lunch in the cafeteria. In other areas, it might be getting assistance with questions about vacation policy, or what’s wrong with their computer.”

Keeping sustainable transportation top-of-mind

At Microsoft, we take sustainability seriously. Our transportation program is a key component of that effort.

“We offer shared transportation to employees to reduce single-occupancy vehicles on the road, and we’re transitioning our fleet to electric vehicles,” Christoffersen says. “It’s part of our corporate commitment to be carbon negative by 2030.”

Christoffersen smiles in a portrait photo.

“Our global headquarters in Redmond is the size of a small city, with transportation services that help employees get to, from, and around our campus. We continuously look at the data so that we balance the rider experience with running an efficient operation.”

Esther Christoffersen, senior manager, Puget Sound Commute Operations

Microsoft provides electric vehicle (EV) charging stations at many Puget Sound campus locations for employee use. We also offer transit passes, guaranteed rides home, and other rideshare options, giving commuters maximum flexibility.

The easier it is to access these services, the more single-occupancy vehicles we can remove from the region’s roads, which means less air pollution and traffic congestion for everyone.

Because Microsoft is one of the largest employers in the state of Washington, these efforts can make a real difference.

“Our global headquarters in Redmond is the size of a small city, with transportation services that help employees get to, from, and around our campus,” Christoffersen says. “We continuously look at the data so that we balance the rider experience with running an efficient operation.”

Looking toward the future

As AI-powered tools like the Employee Self-Service agent get even better and more broadly used across the company, our transportation services will continue to improve. We hope these services will eventually be available in other regions as well.

“The overall goal is to expand the discoverability of commute information to our workers around the globe,” Gaurav says. “So, whether an employee is in Silicon Valley, India, or somewhere else, they will be able to ask the AI tool for transportation options where they are located and get assistance. It’s a work in progress for us.”

Key takeaways

If you are looking to improve the transportation experience for employees at your organization, here are some important things to remember:

  • Keep your overarching goals front and center. Ease, flexibility, and choice are the three main principles we focus on when aiming to give our employees a first-class transportation experience, and those principles apply to any employee experience we build in Microsoft Digital.
  • Think both physically and digitally. Digitally transforming a real-world service starts with the physical experience; finding the intersection between the physical and the digital creates better outcomes for users.
  • Meet riders where they are. At Microsoft, this includes offering mobile, desktop, and agentic interfaces, letting our employees choose what works best for them.
  • The better the data, the better your service. Gathering relevant data about demand, usage, and satisfaction allows you to produce insights that lead to improved services.
  • Use AI to increase personalization. We’re developing an AI agent that knows more about our employees, which allows for easy customization and seamless, pain-free experiences with commute services.

The post Making transportation seamless and efficient with the power of data and AI at Microsoft appeared first on Inside Track Blog.

]]>
20462
Modernizing IT infrastructure at Microsoft: A cloud-native journey with Azure http://approjects.co.za/?big=insidetrack/blog/modernizing-it-infrastructure-at-microsoft-a-cloud-native-journey-with-azure/ Thu, 04 Sep 2025 16:00:00 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=20125 Engage with our experts! Customers or Microsoft account team representatives from Fortune 500 companies are welcome to request a virtual engagement on this topic with experts from our Microsoft Digital team. At Microsoft, we are proudly a cloud-first organization: Today, 98% of our IT infrastructure—which serves more than 200,000 employees and incorporates over 750,000 managed […]

The post Modernizing IT infrastructure at Microsoft: A cloud-native journey with Azure appeared first on Inside Track Blog.

]]>

Engage with our experts!

Customers or Microsoft account team representatives from Fortune 500 companies are welcome to request a virtual engagement on this topic with experts from our Microsoft Digital team.

At Microsoft, we are proudly a cloud-first organization: Today, 98% of our IT infrastructure—which serves more than 200,000 employees and incorporates over 750,000 managed devices—runs on the Microsoft Azure cloud.

The company’s massive transition from traditional datacenters to a cloud-native infrastructure on Azure has fundamentally reshaped our IT operations. By adopting a cloud-first, DevOps-driven model, we’ve realized significant gains in agility, scalability, reliability, operational efficiency, and cost savings.

“We’ve created a customer-focused, self-serve management environment centered around Azure DevOps and modern engineering principles,” says Pete Apple, a technical program manager and cloud architect in Microsoft Digital, the company’s IT organization. “It has really transformed how we do IT at Microsoft.”

“Our service teams don’t have to worry about the operating system. They just go to a website, fill in their info, add their data, and away they go. That’s a big advantage in terms of flexibility.”

Apple is shown in a portrait photo.
Pete Apple, technical program manager and cloud architect, Microsoft Digital

What it means to move from the datacenter to the cloud

Historically, our IT environment was anchored in centralized, on-premises datacenters. The initial phase of our cloud transition involved a lift-and-shift approach, migrating workloads to Azure’s infrastructure as a service (IaaS) offerings. Over time, the company evolved toward more of a decentralized, platform as a service (PaaS) DevOps model.

“In the last six or seven years we’ve seen a lot more focus on PaaS and serverless offerings,” says Faisal Nasir, a principal architect in Microsoft Digital. “The evolution is also marked by extensibility—the ability to create enterprise-grade applications in the cloud—and how we can design well-architected end-to-end services.”

Because we’ve moved nearly all our systems to the cloud, we have a very high level of visibility into our network operations, according to Nasir. We can now leverage Azure’s native observability platforms, extending them to enable end-to-end monitoring, debugging, and data collection on service usage and performance. This capability supports high-quality operations and continuous improvement of cloud services.

“Observability means having complete oversight in terms of monitoring, assessments, compliance, and actionability,” Nasir says. “It’s about being able to see across all aspects of our systems and our environments, and even from a customer lens.”

Decentralizing our IT services with Azure

As Microsoft was becoming a cloud-first organization, the nature of the cloud and how we use it changed. As Microsoft Azure matured and more of our infrastructure and services moved to the cloud, we began to move away from IT-owned applications and services.

The strength of Azure’s self-service and management features means that individual business groups can handle many of the duties that Microsoft Digital formerly offered as an IT service provider—which enables each group to build agile solutions to match their specific needs.

“Our goal with our modern cloud infrastructure continues to be a solution that transforms IT tasks into self-service, native cloud solutions for monitoring, management, backup, and security across our entire environment,” Apple says. “This way, our business groups and service lines have reliable, standardized management tools, and we can still maintain control over and visibility into security and compliance for our entire organization.”

The benefits to our businesses of this decentralized model of IT services include:

  • Empowered, flexible DevOps teams
  • A native cloud experience: subscription owners can use features as soon as they’re available
  • Freedom to choose from marketplace solutions
  • Minimal subscription limit issues
  • Greater control over groups and permissions
  • Better insights into Microsoft Azure provisioning and subscriptions
  • Business group ownership of billing and capacity management

“With the PaaS model, and SaaS (software as a service), it’s more DIY,” Apple says. “Our service teams don’t have to worry about the operating system. They just go to a website, fill in their info, add their data, and away they go. That’s a big advantage in terms of flexibility.”

“The idea of centralized monitoring is gone. The new approach is that service teams monitor their own applications, and they know best how to do that.”

Delamarter is shown in a portrait photo.
Cory Delamarter, principal software engineering manager, Microsoft Digital

Leveraging the power of Azure Monitor

Microsoft Azure Monitor is a comprehensive monitoring solution for collecting, analyzing, and responding to monitoring data from cloud and on-premises environments. Across Microsoft, we use Azure Monitor to ensure the highest level of reliability for our services and applications.

Specifically, we rely on Azure Monitor to:

Create visibility. There’s instant access to fundamental metrics, alerts, and notifications across core Azure services for all business units. Azure Monitor also covers production and non-production environments as well as native monitoring support across Microsoft Azure DevOps.

Provide insight. Business groups and service lines can view rich analytics and diagnostics across applications and their compute, storage, and network resources, including anomaly detection and proactive alerting.

Enable optimization. Monitoring results help our business groups and service lines understand how users are engaging with their applications, identify sticking points, develop cohorts, and optimize the business impact of their solutions.

Deliver extensibility. Azure Monitor is designed for extensibility to enable support for custom event ingestion and broader analytics scenarios.
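
To make the decentralized model concrete, here is a minimal sketch of how a service team might query its own telemetry from a Log Analytics workspace with the azure-monitor-query Python SDK. The workspace ID, table, and KQL query are placeholders, and this is an illustration of the pattern rather than code from our environment.

```python
# Illustrative only: a service team pulling its own telemetry from a
# Log Analytics workspace. The workspace ID and query are placeholders.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient, LogsQueryStatus

client = LogsQueryClient(DefaultAzureCredential())

# Hypothetical query: failed requests per hour for one application.
query = """
AppRequests
| where Success == false
| summarize failures = count() by bin(TimeGenerated, 1h)
| order by TimeGenerated asc
"""

response = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",
    query=query,
    timespan=timedelta(days=1),
)

if response.status == LogsQueryStatus.SUCCESS:
    for table in response.tables:
        for row in table.rows:
            print(row)
```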

Because we’ve moved to a decentralized IT model, much of the monitoring work has moved to the service team level as well.

“The idea of centralized monitoring is gone,” says Cory Delamarter, a principal software engineering manager in Microsoft Digital. “The new approach is that service teams monitor their own applications, and they know best how to do that.”

Patching and updating, simplified

Moving our operations to the cloud also means a simpler and more automated approach to patching and updating. The shift to PaaS and serverless networking has allowed us to manage infrastructure patching centrally, which is much more scalable and efficient. The extensibility of our cloud platforms reduces integration complexity and accelerates deployment.

“It depends on the model you’re using,” Nasir says. “With the PaaS and serverless networks, the service teams don’t need to worry about patching. With hybrid infrastructure systems, being in the cloud helps with automation of patching and updating. There’s a lot of reusable automation layers that help us build end-to-end patching processes in a faster and more reliable manner.”

Apple stresses the flexibility that this offers across a large organization when it comes to allowing teams to choose how they do their patching and updating.

“In the datacenter days, we ran our own centralized patching service, and we picked the patching windows for the entire company,” Apple says. “By moving to more automated self-service, we provide the tools and the teams can pick their own patching windows. That also allowed us to have better conversations, asking the teams if they want to keep doing the patching or if they want to move up the stack and hand it off to us. So, we continue to empower the service teams to do more and give them that flexibility.”

Securing our infrastructure in a cloud-first environment

As security has become an absolute priority for Microsoft, it’s also been a foundational element of our cloud strategy.

Being a cloud-first company has made it easier to be a security-first organization as well.

“The cloud enables us to embed security by design into everything we build,” Nasir says. “At enterprise scale, adopting Zero Trust and strong governance becomes seamless, with controls engineered in from the start, not retrofitted later. That same foundation also prepares us for an AI-first future, where resilience, compliance, and automation are built into every system.”

Cloud-native security features combined with integrated observability allow for better compliance and risk management. Delamarter agrees that the cloud has had huge benefits when it comes to enhancing network security.

“Our code lives in repositories now, and so there’s a tremendous amount of security governance that we’ve shifted upstream, which is huge,” Delamarter says. “There are studies that show that the earlier you can find defects and address them, the less expensive they are to deal with. We’re able to catch security issues much earlier than before.”

“There are less and less manual actions required, and we’re automating a lot of business processes. It basically gives us a huge scale of automation on top of the cloud.”

Nasir is shown in a portrait photo.
Faisal Nasir, principal architect, Microsoft Digital

We use Azure Policy, which helps enforce organizational standards and assess compliance at scale using dashboards and other monitoring tools.

“Azure Policy was a key part of our security approach, because it essentially offers guardrails—a set of rules that says, ‘Here’s the defaults you must use,’” Apple says. “You have to use a strong password, for example, and it has to be tied to an Azure Active Directory ID. We can dictate really strong standards for everything and mandate that all our service teams follow these rules.”
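
Azure Policy definitions are authored as JSON documents. Purely as an illustration of what such a guardrail looks like, the sketch below expresses a common rule (deny storage accounts that allow unencrypted traffic) as a Python dictionary; it is not one of our actual policies, and in practice a definition like this would be assigned through the portal, the CLI, or an infrastructure-as-code pipeline.

```python
# Illustrative only: the shape of an Azure Policy rule, written here as a
# Python dict. Real definitions are JSON and are assigned at management
# group, subscription, or resource group scope.
deny_http_storage = {
    "properties": {
        "displayName": "Storage accounts must require secure transfer",
        "policyType": "Custom",
        "mode": "All",
        "policyRule": {
            "if": {
                "allOf": [
                    {"field": "type", "equals": "Microsoft.Storage/storageAccounts"},
                    {
                        "field": "Microsoft.Storage/storageAccounts/supportsHttpsTrafficOnly",
                        "equals": "false",
                    },
                ]
            },
            # Deny any create or update that violates the guardrail.
            "then": {"effect": "deny"},
        },
    }
}

print(deny_http_storage["properties"]["policyRule"]["then"]["effect"])
```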

AI-driven operations in the cloud

Just like its impact on the rest of the technology world, AI is in the process of transforming infrastructure management at Microsoft. Tasks that used to be manual and laborious are being automated in many areas of the company, including network operations.

“AI is creating a new interface of agents that allow users to interact with large ecosystems of applications, and there’s much easier and more scalable integration,” says Nasir. “There are less and less manual actions required, and we’re automating a lot of business processes. Microsoft 365 Copilot, Security Copilot, and other AI tools are giving us shared compute and extensibility to produce different agents. It basically gives us a huge scale of automation on top of the cloud.”

Apple notes that powerful AI tools can be combined with the incredible amount of data that the Microsoft IT infrastructure generates to gain insights that simply weren’t possible before.

“We can integrate AI with our infrastructure data lakes and use tools like Network Copilot to query the data using natural language,” Apple says. “I can ask questions like, ‘How many of our virtual machines need to be patched?’ and get an answer. It’s early, and we’re still experimenting, but the potential to interact with this data in a more automated fashion is exciting.”

Ultimately, Microsoft has become a cloud-first company, and that has allowed us to work toward an AI-first mentality in everything we do.

“Having a complete observability strategy across our infrastructure modernization helps us to make sure that whatever changes we’re making, we have a design-first approach and a cloud-first mindset,” Nasir says. “And now that focus is shifting towards an AI-first mindset as well.”

Key takeaways

Here are some of the benefits we’ve accrued by becoming a cloud-first IT organization at Microsoft:

  • Transformed operations: By moving from our legacy on-premises datacenters, through Azure’s infrastructure as a service (IaaS) offerings, and eventually to a platform as a service (PaaS) DevOps model, we’ve reaped great gains in reliability, efficiency, scalability, and cost savings.
  • A clear view: With 98% of our organization’s IT infrastructure running in the Azure cloud, we have a huge level of observability into our systems—complete oversight into network assessment, monitoring, compliance, patching/updating, and many other aspects of operations.
  • Empowered teams: Operating a cloud-first environment allows us to have a more decentralized approach to IT infrastructure. This means we can offer our business groups and service lines more self-service, cloud-native solutions for monitoring, management, patching, and backup while still maintaining control over and visibility into security and compliance for our entire organization.
  • Seamless updates: The shift to PaaS and serverless networking has enabled a more planned and automated approach to patching and updating our infrastructure, which produces greater efficiency, integration, and speed of deployment.
  • Dependable security: Our cloud environment has allowed us to implement security by design, including tighter control over code repositories and the use of standard security policies across the organization with Azure Policy.
  • Future-proof infrastructure: As we shift to an AI-first mindset across Microsoft, we’re using AI-driven tools to enhance and maintain our native cloud infrastructure and adopt new workflows that will continue to reap dividends for our employees and our organization.  

The post Modernizing IT infrastructure at Microsoft: A cloud-native journey with Azure appeared first on Inside Track Blog.

Smarter labs, faster fixes: How we’re using AI to provision our virtual labs more effectively http://approjects.co.za/?big=insidetrack/blog/smarter-labs-faster-fixes-how-were-using-ai-to-provision-our-virtual-labs-more-effectively/ Thu, 24 Jul 2025 16:00:00 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=19628 Microsoft Digital stories Providing technical support at an enterprise of our size here at Microsoft is a constant balancing act between speed, quality, and scalability. Systems grow more complex, user expectations continue to rise, and traditional support models often struggle to keep up. Beyond the usual apps and systems everyone uses, many of our employees […]


Microsoft Digital stories

Providing technical support at an enterprise of our size here at Microsoft is a constant balancing act between speed, quality, and scalability. Systems grow more complex, user expectations continue to rise, and traditional support models often struggle to keep up. Beyond the usual apps and systems everyone uses, many of our employees require virtual provisioning for diverse tasks in many of our businesses. Supporting these virtualized environments is a special challenge.

To meet the growing demand for virtual lab usage across the organization, we turned to AI, not just to automate support responses but to fundamentally rethink how user issues are identified and resolved. This vision came to life through the MyWorkspace platform, where we in Microsoft Digital, the company’s IT organization, introduced a domain-specific AI assistant to streamline how we empower our employees to deploy new virtual labs.

The results have been dramatic: what was once a slow, manual process is now fast, efficient, and frictionless.

But the benefits extend beyond faster resolution times. This transformation represents a new approach to enterprise support—one that uses AI not just as a tool for efficiency, but as a strategic enabler. By embedding intelligence into the support experience, we’re turning complexity into a competitive advantage.

Scaling support in a high-demand environment

MyWorkspace is our internal platform for provisioning virtual labs. Originally developed to support internal testing, diagnostics, and development environments, it has since grown into a critical resource used by thousands of engineers and support personnel across the company.

Scaling the platform infrastructure was straightforward—adding capacity for tens of thousands of virtual labs was a technical challenge we could solve with ease, thanks to our Microsoft Azure backbone. As usage grew, the real strain didn’t show up in CPU load or storage limits, but rather in the support queue—every few months, a new wave of users was onboarded to MyWorkspace: partner teams, internal engineers, and external vendors. These new users, often with minimal experience with the platform, needed fast access and guidance from support.

The questions, though simple, piled up quickly.

Several Tier 1 support engineers repeatedly encountered the same questions from users, such as how to start a lab, what an error meant, and which lab to use for a particular test. These weren’t complex technical issues—they were basic, repetitive onboarding questions that represented a huge opportunity to introduce automation.

“We also found that there were a lot of users who found more niche issues, and those issues had been solved either by our community or by ourselves. In fact, we had a dedicated Teams channel specific to customer issues, and we found that there was a lot of repetition and that other customers were facing similar issues, and we did have a bit of a knowledge base in terms of how to solve these issues.”

A photo of Deans.
Joshua Deans, software engineer, Microsoft Digital

Unblocking a bottleneck with AI

Our support team set out to tackle a familiar but costly challenge: high volumes of low-complexity tickets that consumed valuable time without delivering meaningful impact. Instead of treating this as an unavoidable burden, we saw an opportunity to turn it into a self-scaling solution. If the same questions were being asked repeatedly—and the answers already existed in documentation, internal threads, or institutional knowledge—then an AI system should be able to surface those answers instantly, without human intervention.

“We also found that there were a lot of users who found more niche issues, and those issues had been solved either by our community or by ourselves,” says Joshua Deans, a software engineer within Microsoft Digital. “In fact, we had a dedicated Teams channel specific to customer issues, and we found that there was a lot of repetition and that other customers were facing similar issues, and we did have a bit of a knowledge base in terms of how to solve these issues.”

That insight led the MyWorkspace team to begin building what would become a transformative AI assistant: an automated support layer purpose-built for the MyWorkspace platform. Unlike traditional chatbots that rely on scripted responses or rigid decision trees, this assistant would leverage generative AI trained on a rich dataset of real-world support conversations, internal FAQs, and official documentation.

“So that’s where we found this opportunity to turn this scaling challenge into a scaling advantage, with help from AI. We took all those historical conversations of tier one staff helping new users—trained our AI to provide user education based on that training—and saved our Tier 1 staff from answering potential tickets.”

Vikram Dadwal, principal software engineering manager, Microsoft Digital

The result was a context-aware, responsive system capable of resolving common issues in seconds—not hours or days—dramatically easing the load on support teams while improving the user experience.

Built on Azure and Semantic Kernel

MyWorkspace’s core infrastructure is fully built on Azure services. At any given moment, it manages tens of thousands of virtual machines, scaling up and down with demand. That elasticity, combined with our internal developer tooling and AI orchestration capabilities, provided the perfect environment for an AI-powered support layer.

“So that’s where we found this opportunity to turn this scaling challenge into a scaling advantage, with help from AI,” says Vikram Dadwal, a principal software engineering manager within Microsoft Digital. “We took all those historical conversations of tier one staff helping new users—trained our AI to provide user education based on that training—and saved our Tier 1 staff from answering potential tickets.”

To build the assistant, the team used our open-source Microsoft framework, Semantic Kernel. Designed for generative AI integration, Semantic Kernel allows engineers to create prompt-driven, modular systems that can interact with large language models (LLMs) without vendor lock-in.

This approach gave the team several advantages:

  • Flexibility in choosing and switching between LLM providers.
  • Fine-grained control over how prompts were structured and updated.
  • Extensibility through plugins and actions that tie the assistant into the broader ecosystem.

Crucially, the assistant was designed to be part of the platform’s architecture, capable of operating at the same level of scale and responsiveness as the labs it supported. Also, the assistant was initialized with a well-scoped system prompt, limiting its responses strictly to the MyWorkspace domain.
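
As a rough illustration of that scoping technique (and not the MyWorkspace team’s actual code), here is how a domain-limited assistant might be wired up with the Semantic Kernel Python SDK. Class names follow the 1.x releases and may differ in other versions; the deployment name, endpoint, key, and prompt are all hypothetical.

```python
# Illustrative sketch of a domain-scoped assistant using Semantic Kernel's
# Python SDK (1.x class names). Deployment details and prompt are hypothetical.
import asyncio

from semantic_kernel.connectors.ai.open_ai import (
    AzureChatCompletion,
    AzureChatPromptExecutionSettings,
)
from semantic_kernel.contents import ChatHistory

chat_service = AzureChatCompletion(
    deployment_name="gpt-4o",                             # hypothetical
    endpoint="https://contoso-openai.openai.azure.com/",  # hypothetical
    api_key="<api-key>",
)

# A well-scoped system prompt keeps answers inside the supported domain.
SYSTEM_PROMPT = (
    "You are the MyWorkspace lab assistant. Answer only questions about "
    "provisioning, configuring, and troubleshooting MyWorkspace virtual labs. "
    "If a question is out of scope, say so and point the user to support."
)


async def answer(question: str) -> str:
    history = ChatHistory(system_message=SYSTEM_PROMPT)
    history.add_user_message(question)
    result = await chat_service.get_chat_message_content(
        chat_history=history,
        settings=AzureChatPromptExecutionSettings(temperature=0.2),
    )
    return str(result)


if __name__ == "__main__":
    print(asyncio.run(answer("Which lab should I use to test hybrid sync?")))
```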

“On average, we measured these interactions at around 20 minutes from ticket submission to problem resolution. Now compare that with a 30-second AI interaction for resolving the same class of issues—that’s a 98% reduction in resolution time, a number we’ve validated with our support team and continue to track.”

Nathan Prentice, senior product manager, Microsoft Digital

Shifting from tickets to conversations

Whether users had questions about lab types, needed clarification on configuration details, or sought guidance during onboarding, the AI provided accurate, interactive responses without requiring human escalation. The experience was both faster and significantly better. Support engineers saw a noticeable reduction in repeat tickets, as common issues were resolved on the spot. Onboarding friction decreased, and users were confident that they could get the answers they needed instantly—no ticket, no delay, no need to track a support contact.

“On average, we measured these interactions at around 20 minutes from ticket submission to problem resolution,” says Nathan Prentice, a senior product manager within Microsoft Digital. “Now compare that with a 30-second AI interaction for resolving the same class of issues—that’s a 98% reduction in resolution time, a number we’ve validated with our support team and continue to track.”

Smart, interactive, and intuitive

Our Microsoft Digital team has recently implemented a new version of the MyWorkspace AI assistant that includes several major enhancements. The assistant now features adaptive cards, polished layouts, and a Microsoft 365 Copilot-aligned user experience, making it feel familiar and trustworthy for internal teams. The assistant can now distinguish between a question and an action. If a user says, “Start a SharePoint lab,” it responds with an interactive card and begins provisioning, bridging the gap between passive support and active enablement.
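
Here is a deliberately simplified sketch of that question-versus-action distinction. The keyword check stands in for the LLM-based intent classification the real assistant performs, and the returned dictionaries stand in for adaptive cards; everything here is hypothetical.

```python
# Conceptual sketch: route a message either to Q&A or to a provisioning action.
# A keyword check stands in for the LLM intent classification used in practice.
ACTION_VERBS = ("start", "create", "provision", "spin up")


def handle_message(message: str) -> dict:
    text = message.lower()
    if any(text.startswith(verb) for verb in ACTION_VERBS):
        # Action: kick off provisioning and return a confirmation "card".
        lab_type = "SharePoint" if "sharepoint" in text else "generic"
        return {"type": "action", "card": f"Provisioning a {lab_type} lab..."}
    # Question: answer from documentation / the language model instead.
    return {"type": "answer", "text": "Here is what I found about that topic."}


if __name__ == "__main__":
    print(handle_message("Start a SharePoint lab"))
    print(handle_message("Which lab should I use to test conditional access?"))
```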

“One of the primary bottlenecks we previously faced in creating an AI solution to address frequently asked user questions was the lack of technology capable of generating accurate answers for complex technical queries and understanding nuanced user input. With the availability of Azure OpenAI models, we were able to effectively overcome this challenge, enabling our AI solution to deliver precise and context-aware responses at scale.”

A photo of Nair.
Anjali Nair, senior software engineer, Microsoft Digital

To guide our employees and improve discoverability, the assistant offers recommended prompts—just like Copilot does—helping new users understand what they can ask and how to get started.

Users can now rate responses, giving a thumbs up or down. These signals are aggregated and reviewed by the engineering team, ensuring continuous improvement and fine tuning over time.

Intelligent provisioning with multi-agent orchestration 

At Microsoft Digital, we’re reimagining how labs are provisioned by integrating AI-driven intelligence into the process. Traditionally, users are expected to know exactly what kind of lab environment they need. But in complex virtualization and troubleshooting scenarios, these assumptions often fall short. Should a user troubleshooting hybrid issues with Microsoft Exchange spin up a basic Exchange lab, or one that includes Azure AD integration, conditional access policies, and hybrid connectors? To eliminate this guesswork, our team is building a multi-agent system powered by the Semantic Kernel SDK multi-agent framework. This system interprets the user’s support context—often expressed in natural language—and automatically provisions the most relevant lab environment.

For example, a user might say, “I’m seeing sync issues between SharePoint Online and on-prem,” and the assistant will orchestrate the creation of a tailored lab that replicates that exact scenario, enabling faster diagnosis and resolution. With agent orchestration, each agent in the system is specialized: one might handle context interpretation, another lab configuration, and another cost optimization. These agents collaborate to ensure that the lab not only meets technical requirements but is also cost-effective. By leveraging telemetry and historical usage data, the system can recommend leaner configurations—such as using ephemeral VMs, auto-pausing idle resources, or selecting lower-cost SKUs—without compromising diagnostic fidelity. This intelligent provisioning framework is designed to scale, adapt, and continuously learn from usage patterns.

“One of the primary bottlenecks we previously faced in creating an AI solution to address frequently asked user questions was the lack of technology capable of generating accurate answers for complex technical queries and understanding nuanced user input,” says Anjali Nair, a senior software engineer within Microsoft Digital. “With the availability of Azure OpenAI models, we were able to effectively overcome this challenge, enabling our AI solution to deliver precise and context-aware responses at scale.”

With multi-agent orchestration, we’re taking a step towards a future where lab environments are not just automated, but intelligently orchestrated, context-aware, and cost-optimized—empowering engineers to focus on solving problems, not setting up infrastructure.
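
The snippet below is a deliberately simplified, framework-free sketch of that orchestration idea: one agent interprets the support context, another picks a lab configuration, and a third applies cost guardrails. Our production system uses the Semantic Kernel multi-agent framework; the scenario names, templates, and VM sizes here are hypothetical and exist only to show how specialized agents hand work to one another.

```python
# Conceptual sketch of specialized agents collaborating on lab provisioning.
# Scenario names, templates, and VM sizes are hypothetical.
from dataclasses import dataclass


@dataclass
class LabPlan:
    template: str
    vm_size: str
    auto_pause: bool


def context_agent(request: str) -> str:
    """Interpret the user's natural-language support context into a scenario."""
    text = request.lower()
    if "sharepoint" in text and ("sync" in text or "on-prem" in text):
        return "sharepoint-hybrid-sync"
    if "exchange" in text and "hybrid" in text:
        return "exchange-hybrid"
    return "generic-troubleshooting"


def configuration_agent(scenario: str) -> LabPlan:
    """Map a scenario to a lab template that reproduces it."""
    templates = {
        "sharepoint-hybrid-sync": LabPlan("spo-plus-onprem-sync", "Standard_D4s_v5", False),
        "exchange-hybrid": LabPlan("exchange-hybrid-connectors", "Standard_D8s_v5", False),
        "generic-troubleshooting": LabPlan("base-windows-client", "Standard_D2s_v5", False),
    }
    return templates[scenario]


def cost_agent(plan: LabPlan) -> LabPlan:
    """Apply cost guardrails, such as auto-pausing idle resources."""
    plan.auto_pause = True
    return plan


def provision(request: str) -> LabPlan:
    return cost_agent(configuration_agent(context_agent(request)))


if __name__ == "__main__":
    print(provision("I'm seeing sync issues between SharePoint Online and on-prem"))
```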

Scaling support without scaling headcount

The MyWorkspace assistant is a powerful example of how enterprise support can evolve through intelligence. By embedding AI into the support experience, we’ve turned complexity into a competitive edge—reshaping knowledge work and operations through AI’s problem-solving capabilities. As Microsoft advances as a Frontier Firm, MyWorkspace shows how we can scale support on demand, with intelligence built in. Routine queries are offloaded to AI, freeing Tier 1 teams to focus on critical issues and giving Tier 2 engineers space to innovate. Most importantly, support now scales with user demand—not headcount.

But this system does more than just respond—it learns. Every interaction becomes a data point. Each resolved issue feeds back to the assistant, sharpening its accuracy and expanding its knowledge. What started as a reactive Q&A tool is now growing into a proactive orchestrator that surfaces insights and points users to solutions, resolving issues before they ever become tickets.

“We have a lot more telemetry now, so users can provide feedback to our responses—for example, thumbs up or thumbs down feedback,” Deans says. “And we can actually view where the model is giving incorrect or inappropriate information, and we can use that to make adjustments to the prompt provided to the model.”

In this model, support becomes a seamless extension of the user experience. With the right AI architecture in place, it transforms a cost center into a strategic asset. The MyWorkspace assistant fulfills its role as an embedded, intelligent teammate—delivering answers, driving actions, and continuously improving over time.

Ultimately, our journey with MyWorkspace shows that meaningful AI adoption doesn’t have to begin with sweeping transformation. Sometimes, it starts with a helpdesk queue, a recurring issue, and the choice to build something smarter—something that learns, adapts, and empowers at every step.

Key takeaways

Here are some of our top insights from boosting our internal deployment of MyWorkspace with AI and continuous improvement.

  • Start small and specific. Focus on a defined domain—like MyWorkspace—and use existing support logs to train your assistant.
  • Invest in AI infrastructure. Tools like Semantic Kernel provide flexibility, especially in enterprise settings where vendor neutrality and customization matter.
  • Design for trust. Align your assistant’s UI with well-known systems like Microsoft Copilot to build user confidence.
  • Don’t wait for perfection. Launch a V1, gather feedback, and make improvements. AI assistants get better over time if you let them learn.
  • Think outside the ticket queue. The future isn’t just faster support—it’s intelligent, anticipatory systems that eliminate friction before it begins.

The post Smarter labs, faster fixes: How we’re using AI to provision our virtual labs more effectively appeared first on Inside Track Blog.

Eight steps for managing your support team content with AI tools http://approjects.co.za/?big=insidetrack/blog/eight-steps-for-managing-your-support-team-content-with-ai-tools/ Thu, 17 Jul 2025 16:00:00 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=19518 Microsoft Digital stories Like many large organizations, our support team develops thousands of troubleshooting guides, self-help resources, knowledge base articles, and process documents to guide our support engineers in helping customers with technical issues. As we have grown over the years here at Microsoft, and with the continuous release of new products and features, these […]


Microsoft Digital stories

Like many large organizations, our support team develops thousands of troubleshooting guides, self-help resources, knowledge base articles, and process documents to guide our support engineers in helping customers with technical issues.

As we have grown over the years here at Microsoft, and with the continuous release of new products and features, these document repositories have become incredibly large, sometimes unmanageable; they also occasionally contain outdated information.

Back in July 2022, our Modern Solutions & Support (MSS) Supportability team had an idea for semantic search. This tool would potentially allow our support engineers to go to a single location and enter a search query that would scan the vast support document repositories, returning the most relevant results.  

Over the course of the year, team members explored different solutions and had conversations that eventually led them to the world of generative AI. The team knew they were on the cutting edge of something exciting with lots of possibilities. Then, a few months later, ChatGPT was announced and flipped the world on its head.

“It was fascinating how fast things were changing,” says DJ Ball, a senior escalation engineer on the MSS Supportability team. “OpenAI was so new, and its features were growing so fast—from GPT 2.0 to 3.0 to ChatGPT to 4.0 almost overnight. Keeping up with the technology is a challenge and big opportunity.”

Joining the generative AI revolution

The team quickly shifted gears, secured subscriptions to Microsoft Azure OpenAI, and stumbled upon an internal GPT playground focused on handling enterprise content. This playground happened to be very similar to what the Supportability Team was designing, which made joining forces much easier.

“People thought we planned it, which we didn’t, but for once we felt we were ahead of the game,” says Sam Larson, a senior supportability PM on the MSS Supportability team.

Then the development of an AI-based solution really picked up speed. No one had done what they were attempting to do with this ChatGPT technology, so they had to learn by digging in, playing around, and seeing what would happen. The MSS Supportability team provided continual feedback to the engineering team about what worked and what didn’t. This helped shape the product that became Microsoft Azure AI Studio, a solution that makes integrating external data sources into the Microsoft Azure OpenAI Service simple.

With this development, the team was allowed to create their own private chat workspace, which they called Modern Work GPT (MWGPT). The Modern Solutions & Support Supportability team started by curating content from different sources for the Teams product and injecting that into the large language model (LLM).

By taking advantage of Azure Cognitive Search to help inject and chunk the documentation into smaller components, they were able to test the results with the help of subject matter experts (SMEs) across the Teams support business. The team has since expanded the tool to include all Modern Work Technology solutions—estimated to be more than 300,000 pieces of content for 34 products. Along the way, they’ve learned a lot about content curation, prompts, use case scenarios, and how LLMs work.

The team quickly realized that they needed more people to help them test the tool. Today, they continue to refine the content, updating its capabilities and testing it for accuracy.

One of the early volunteers was Mayte Cubino, now an AI strategy and programs director for the MSS organization. An engineer at heart, Cubino was excited by the rumblings she was hearing across the business about the possibilities to use ChatGPT in supporting customers.

From a support delivery standpoint, two questions stood out for Cubino:

  • How could we ensure a successful deployment of this new technology across all our support engineers?
  • How could this technology be the most helpful, without creating extra work for anyone?

Cubino started to document the content process. She helped the team see that some things were non-negotiable and outlined the steps they needed to focus on to ensure accuracy and responsible engagement with the model.

Cubino in a photo.

“There’s no question that there are a lot of variables that come into play that we are exposing our engineers to, and quality is a non-negotiable factor,” Cubino says. “We owe it to our customers, who turn to our support engineers to help them solve their most challenging technical problems.”

Unpacking our ‘8 Ds’ framework

This documentation process led to what became our “8 Ds” framework. These steps are designed to help develop agentic AI systems that don’t simply respond to inquiries, but also observe, decide, adapt, and learn.

1. Diversity: Ensuring diverse datasets and perspectives in AI development

Agentic AI that supports everyone must be built by—and for—everyone. That means training on linguistically and culturally rich data, designing with accessibility in mind, and continuously checking for exclusionary patterns.

When developing AI agentic models, begin by securing a breadth of viewpoints from different disciplines, demographics, and experiences. Diversity at the outset shapes every later choice, from dataset curation to interface tone, and is the surest guardrail against hidden bias and narrow problem framing.

Diversity is not a feature—it’s a foundation. Neurodiverse, multilingual, and underrepresented voices must be heard and embedded into the system. From pronoun recognition to language variety, a truly inclusive AI model builds trust not through perfection, but through recognition.

2. Data: The foundation of AI, ensuring quality and relevance

With a diversity lens in place, gather the data the agent will learn from. In the landscape of AI-powered support, everything begins with data. But not just any data. It must be rich, diverse, and contextual—support tickets, chat logs, voice transcripts, and knowledge base entries all serve as the lifeblood for AI model training.

Data isn’t merely an input—it’s a reflection of the lived customer experience. By curating high-quality, current, and representative datasets, organizations can shape AI models that understand nuanced intent, avoid bias, and respect privacy. Quality, representativeness, and clear provenance are critical here; rush this step and every downstream phase inherits the flaws.

3. Design: Crafting intuitive user experiences and efficient algorithms

In the design phase, translate the problem and the data into an architecture, user journey, and success criteria. Good design turns raw inputs into a coherent blueprint, aligning technical possibilities with real-world user needs. Ideation, testing, and prototyping are activities that govern this phase.

When preparing for AI implementation, you should identify how you can best integrate AI with your existing technology and services. It’s particularly useful to consider the following topics:

Content creation

Determining how you collect and store your data is vital. Among the key considerations are:

  • Data collection pipelines that support reliable model performance and a clean input for modelling, such as batch upload or continuous upload.
  • Storing your data in databases; the type of database you choose may change, depending on the complexity of the project and the different data sources required.
  • Data mining and data analysis of the results.
Weum smiles as he stands in the Microsoft Fargo office.

“Inputting all of your content into an AI model is like taking a flashlight and shining it into every dark corner of your content,” says Jason Weum, director of supportability on the MSS Supportability team. “You quickly realize what’s outdated.”

Responsible AI review

Plan for security before you start: Make sure you design your system to keep the data secure, designing for responsible AI and compliance with the General Data Protection Regulation (GDPR) and other policies and standards. For example, Microsoft’s Responsible AI principles include fairness, reliability and safety, privacy and security, inclusiveness, and transparency and accountability.

Other design elements

Be sure to think about how to use technology efficiently and plan for how to train your AI model:

  • Evaluate the platforms your team uses to collate the technology used across the AI project to help speed up AI deployment.
  • Consider the network and memory resources your team needs to train your model, in addition to the ongoing costs. Writing and training algorithms can take a lot of time and computational power.

4. Development: Building and refining the AI model

It’s time to build—to code the pipelines, train the models, and craft the UX. Since design decisions are settled, developers can iterate quickly, without constantly revisiting fundamentals. During the development phase, you need to create and test grounding data sets.

This is a highly iterative process requiring substantial amounts of data. The following steps are critical during this phase:

  • Content preparation: This includes content preparation and data quality assessment using a combination of accuracy, bias, completeness, uniqueness, timeliness, validity, and consistency.
  • Content ingestion: This involves ingestion of curated data content, using the required formatting for the model. For example, larger documents should be chunked into smaller sections before ingestion (a simple chunking sketch follows this list).
  • Fine-tuning: We fine-tune the model to distinguish between (for example) a refund request and a product bug, while embedding empathy, precision, and flexibility into its responses. At this stage, we also harness tools like reinforcement learning and simulation environments to elicit a desired response from the model or to prevent it from generating certain types of output. The prompt can also be appended with grounding data from the curated content. This is an iterative process and may require several rounds of testing to achieve the desired results.
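
As one example of the ingestion step above, the sketch below shows a simple way to split a long document into overlapping chunks before pushing it into a search index or vector store. The chunk size and overlap values are illustrative; production pipelines (ours used Azure Cognitive Search) typically split on more meaningful boundaries such as headings.

```python
# Illustrative only: splitting a long support document into overlapping chunks
# so each piece fits comfortably within a model's context window.
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step forward, keeping some overlap so context isn't lost at boundaries.
        start = end - overlap
    return chunks


if __name__ == "__main__":
    doc = "Troubleshooting guide text. " * 200  # placeholder document
    print(len(chunk_text(doc)))
```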

5. Debugging: Identifying and resolving issues and errors

Rigorous and uncompromised testing and training are crucial before you proceed towards deployment, but this can be a time-intensive process. Even the best-built models can drift. Here are some best practices for dealing with this challenge:

  • Systematically test the models, trace any errors, and harden performance. Catching edge-case failures before the agent starts making real decisions is far cheaper—and safer—than patching a live system.
  • Debugging an agentic AI model is a continual discipline, not a one-time task. It requires end-to-end monitoring to check for hallucinations, regressions, and unintended behaviors. We rely on telemetry, customer satisfaction scores, and human-in-the-loop feedback to expose blind spots in the model.
  • Fixing the bug isn’t enough: trace its roots, learn from the breakdown, and harden the system for what’s next.

Before you deploy and use your model, you need to understand whether it’s actually delivering the kind of results you want. Check if the results are accurate and see if the data you’re loading into the model will keep these models consistent and relevant over time. Weak, old data can create model drift, leading to inaccurate outcomes.

In this phase, consider these elements:

  • Responsible development and diagnosis: This is an important stage in building responsible AI systems. Key factors include developing proper data collection and handling practices, ensuring fairness in performance and representation, boosting transparency through validating citations, monitoring security and privacy, increasing accountability by including author contact info, and emphasizing inclusiveness by incorporating a feedback process into pre-deployment validation.
  • Validate the model deployment: After the AI model has been trained, it should be tested to ensure that it provides accurate responses and not hallucinations. The testing should be conducted in a controlled environment, and the model’s responses should be compared to the approved documents and data.

Testing the AI model throughout the process is critical to mitigate problems—such as overfitting and underfitting—that could undermine your model’s effectiveness once deployed (a short illustration follows the list below).

  • Overfitting refers to an AI model that models the training data too well. This happens when a model learns the detail and noise in the training data to the extent that it negatively impacts its performance on new data—the noise or random fluctuations in the training data are picked up and erroneously adopted as concepts.
  • Underfitting refers to a model that can neither model the training data nor generalize to new data. An underfit machine learning model is not a suitable model; this will be obvious, as it will have poor performance on the training data. The remedy is to move on and try alternate machine learning algorithms.
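
The short sketch below, which uses scikit-learn on synthetic data purely for illustration, shows the classic symptom of overfitting: an unconstrained model scores almost perfectly on its training data but noticeably worse on held-out data, while a regularized model narrows that gap.

```python
# Illustrative only: overfitting shows up as a large gap between training and
# held-out accuracy; constraining the model narrows the gap.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained tree memorizes the training set, noise included.
overfit = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("unconstrained:", overfit.score(X_train, y_train), overfit.score(X_test, y_test))

# Limiting depth trades a little training accuracy for better generalization.
regularized = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print("depth-limited:", regularized.score(X_train, y_train), regularized.score(X_test, y_test))
```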

6. Decision making: Empowering the AI model to make informed choices

Agentic AI isn’t just defined by its ability to answer questions, but also by the ability to decide.

Once you have a stable core, focus on the agent’s reasoning layer: policies, guardrails, evaluation metrics, and AI explainability (explaining how the model works and its expected output). This is where you confirm that the agent not only works, but that it behaves responsibly under the different types of scenarios it might face.

From interpreting tone to recognizing when to escalate, AI models must make real-time choices that echo human judgment. This requires context awareness, confidence scoring, and ethical constraint modeling. The model learns to prioritize factors such as fairness, customer sentiment, and organizational policies. For example, if a customer calls with a question regarding their bill, the model might choose to respond first with compassion rather than procedure. In doing so, it doesn’t just act on behalf of your brand; it embodies it.

7. Deployment: Implementing the AI model into real-world applications

Deployment is defined as the process through which you integrate your AI model into an existing production environment to obtain effective business decisions based on data. It’s one of the last steps in the framework and should be preceded by a sign-off process involving SMEs and the validation team.

Rolling out agentic AI across support channels requires orchestration: integrating with CRMs, safeguarding compliance, conducting responsible AI reviews, configuring escalation protocols, and preparing for edge cases. We begin with pilots and cohorts, gradually expanding while listening closely to real-world signals.

This phase involves integrating the AI model into the service’s decision-making process and using the live data for the AI model to make predictions. When launched, it’s very important to continuously evaluate the AI model to ensure it still meets the business objectives and that performance remains at the required level.

This ensures the AI model’s performance is in line with the modelling phase and helps you identify when to retrain the model. It can also help you feel confident in using, interpreting, and challenging any outputs or insights generated by the AI model.

8. Documentation: Keeping comprehensive records for transparency and future reference

After completing the typical workflow with steps like data ingestion, pre-processing, model building and evaluation, and deployment, it’s time to document.

Documentation is the connective tissue that links creators, regulators, and users. Internally, it guides engineering, legal, and operations teams through the model’s capabilities, boundaries, and known issues. Externally, it reassures customers—clarifying when AI is in use, offering transparency in how decisions are made, and explaining how they can opt out.

Think of documentation not as paperwork, but as a living record that ensures every decision made by the AI can be understood, audited, and improved upon. Close the loop with thorough public-facing and internal documentation. While documents should evolve throughout the project, finalizing them ensures they capture the solution as actually shipped, which is vital for maintainability, audits, and future work.

Other considerations

Because agentic AI models are so new, the team is aware that they are doing important, pioneering work that will hopefully spread across the industry.

Smith smiles

“As we enter the agentic era of AI, we recognize the opportunity and importance of doing this responsibly,” says Ross Smith, a support leader at Microsoft. “We hope others can build on our lessons learned and improve their own support processes using AI tools.”

Starting to think about content support in this way led Smith to have conversations across Microsoft, exploring what other teams were doing, learning how to integrate best practices into the support solutions, and drilling down on a strategy for building responsible AI models from the start.

The team wanted to create a model that was responsible-AI ready. This required understanding its potential impacts—both beneficial and harmful—on people and society, and then taking the appropriate measures to mitigate anticipated harms and prepare responses for unanticipated ones.

Another consideration was how we measured success, not just of the model itself, but also how it impacted the business based on different use-case scenarios. Knowing that support engineers would be using the model to assist with administrative-type tasks (such as email and auto-summarization) and technical tasks (such as troubleshooting, learning, and debugging assistance) helped the team develop a set of metrics they could track to see how the model assisted the engineers, impacted their productivity, and contributed to our customers’ experience.

The application of the “8 Ds” framework can easily reach well beyond the technical support use case scenario to include a host of other company disciplines, including human resources, finance, sales, and legal.

“Humans and machines are more powerful working together than either one alone,” Smith says. “Those that really embrace and explore this new technology will be ready for the new roles that will be needed to make it successful.”

Qualified and experienced prompt engineers, strong content curators, and responsible AI experts will soon be in high demand as more companies employ AI technologies in their own operations.

What’s next

The team is looking forward to many new opportunities in the AI space, including:

  • Using Copilot in Dynamics 365 Customer Service and applying what they learn directly to the product. Increased use of Copilot by all Microsoft support engineers will help us continue to improve the Copilot experience and product performance.
  • Carrying on the work with Modern Work GPT to help improve the experiences our customers can have using the new Microsoft Azure AI Studio.
  • Continuing to learn, refine, and build new AI models, embracing improvements in the underlying AI technology to deliver better results to support engineers; this will also facilitate excellent customer support experiences.

The world is entering a new era of human and machine collaboration. It’s an exciting time in technology, as AI helps power monumental changes in how companies serve their customers.

“In this new era of AI-powered support, agents will play a critical role not just in resolving issues, but in shaping how support evolves,” says Diego Silva, a senior technical adviser and AI support specialist at Microsoft. “Empowering them with AI fosters an agentic mindset, where they become proactive problem-solvers and process improvers. With the right insights, we’re transforming support from reactive to strategic.”

Key takeaways

The MSS Supportability team continues to learn and adapt their AI models as new innovations surface. Here are some concepts they believe are critical to a successful AI tool deployment:

  • Know where your documents and data are. Often in large organizations there are multiple locations for storing different types of documents and data, including learning repositories, Microsoft SharePoint, internal troubleshooting repos, wikis, and much more.
  • Everything you put into the model matters. The curation and formatting of the documents and data ingested into the model is key. For documents, text or markdown formats continue to work best. If the data that you put into the model is wrong, conflicts across different sources, or is outdated, the model does not return quality answers.
  • Review and retrain. Stay up to speed on the latest information and training to keep your model from “drifting” and ensure the accuracy of the source documents and data ingested.
  • Gather feedback from subject matter experts. One of the best ways to improve the accuracy of the model is to ask users and subject matter experts for feedback on the results being returned. That way you can work to update your source content for higher-quality results.
  • Provide prompt training. People may not intuitively know how to prompt (interact with the model), so preparing your model in the development phase is key. Providing tips, tricks, and guidance for end users on how to ask questions (prompting) can also be helpful in the documentation phase.
  • Change management matters. Don’t take it for granted that everyone will see AI solutions as a huge opportunity. Change management activities that aid in adoption and build knowledge help drive excitement and use.
  • Embrace the future. This technology is moving fast, and everyone has an opportunity to learn, grow, and apply it to their business.

The post Eight steps for managing your support team content with AI tools appeared first on Inside Track Blog.

Securing the borderless enterprise: How we’re using AI to reinvent our network security http://approjects.co.za/?big=insidetrack/blog/securing-the-borderless-enterprise-how-were-using-ai-to-reinvent-our-network-security/ Thu, 10 Jul 2025 16:00:00 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=19504 The modern enterprise network is complex, to say the least. Enterprises like ours are increasingly adopting hybrid infrastructures that span on-premises data centers, multiple cloud environments, and a diverse array of remote users. In this context, traditional security tools are still playing checkers while the malicious actors are playing chess. To make matters worse, attacks […]

The modern enterprise network is complex, to say the least.

Enterprises like ours are increasingly adopting hybrid infrastructures that span on-premises data centers, multiple cloud environments, and a diverse array of remote users. In this context, traditional security tools are still playing checkers while the malicious actors are playing chess. To make matters worse, attacks are increasingly enabled by AI tools.

That’s why here in Microsoft Digital, the company’s IT organization, we’re using a modern approach and toolset—including AI—to secure our network environment, turning complexity into clarity, one approach, tool, and insight at a time.

Leaving traditional network security behind

For years, traditional network security relied on a simple but increasingly outdated assumption: everything inside the corporate perimeter can be trusted. This model made sense when networks were static, users were on-premises, and applications lived in a centralized data center.

But that world is gone.

A photo of Venkatraman.

“Implicit trust must be replaced with explicit verification. That means rethinking how we monitor, how we respond, and how we design for resilience from the start.”

Raghavendran Venkatraman, principal cloud network engineering manager, Microsoft Digital

Today’s enterprise is dynamic, decentralized, and borderless. Hybrid work has become the norm. Cloud adoption is accelerating. Teams are globally distributed. Devices and data move constantly across environments. In this new reality, the network perimeter hasn’t just shifted—it has effectively vanished.

That’s where the cracks in legacy security models become impossible to ignore.

Visibility becomes fragmented. Security teams struggle to track what’s happening across a sprawling digital estate. Traditional monitoring tools focus on infrastructure uptime or device health—not on the actual experience of the people using the network. That disconnect creates blind spots, and blind spots create risk.

We know that this model no longer meets the needs of a modern, AI-powered enterprise. Every enterprise needs a new approach—one that assumes breach, enforces least-privilege access, and continuously verifies trust.

“Implicit trust must be replaced with explicit verification,” says Raghavendran Venkatraman, a principal cloud network engineering manager in Microsoft Digital. “That means rethinking how we monitor, how we respond, and how we design for resilience from the start.”

This shift is foundational to our security strategy. It’s not just about securing infrastructure—it’s about securing the experience. Because in a world where users, data, and threats are everywhere, trust has to be proved, not assumed.

Building a resilient and adaptive security strategy

To secure hybrid corporate networks effectively, organizations must go beyond traditional perimeter defenses. They need a comprehensive and adaptive security strategy—one that evolves with the threat landscape and aligns with the complexity of modern enterprise environments. The diversity of hybrid networks introduces new vulnerabilities and expands the attack surface. A static, one-size-fits-all approach simply doesn’t work anymore.

At Microsoft Digital, we’ve embraced a layered, cloud-first security model that integrates identity, access, encryption, and monitoring across every layer of the network. It’s embedded in everything we do. This model includes these key strategies, which we’ll expand upon in the following sections:

  • Adopting Zero Trust principles
  • Establishing identity as the new perimeter 
  • Integrating AI and machine learning
  • Enforcing network segmentation
  • Embracing continuous monitoring

Adopting Zero Trust principles

Zero Trust Architecture (ZTA) operates on a strict principle: “never trust, always verify.” That means no user, device, or application—whether it’s inside or outside the corporate network—is inherently trusted as they are in the traditional network security model.

A photo of McCleery.

“Zero Trust isn’t a product—it’s a mindset. It’s about assuming breach and designing defenses that minimize impact and maximize resilience.”

Tom McCleery, principal group cloud network engineer, Microsoft Digital

Every access request is evaluated against dynamic policies. These policies consider several factors—like user identity, device health, location, and how sensitive the data being accessed is. For example, if an employee tries to access a financial report from a corporate laptop at the office, they might get in, no problem. But that same request from a personal device in another country could get blocked or trigger extra authentication steps.

At the heart of ZTA are policy enforcement points that authorize every data flow. These checkpoints only grant access when all conditions are met, and they log every interaction for auditing and threat detection. This kind of granular control reduces the attack surface and limits lateral movement if there is a breach.
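
To make that flow concrete, here’s a minimal sketch of how a policy enforcement point might evaluate an access request and log the decision. The signals, rules, and example users below are illustrative assumptions for this article, not our actual policy engine.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    STEP_UP_AUTH = "require_mfa"  # extra verification before granting access
    DENY = "deny"


@dataclass
class AccessRequest:
    # Signals a policy enforcement point might consider (illustrative only).
    user_id: str
    device_managed: bool      # is the device enrolled and compliant?
    device_healthy: bool      # does it pass health/attestation checks?
    location_trusted: bool    # e.g., corporate office or known location
    data_sensitivity: str     # "low" | "medium" | "high"


def evaluate(request: AccessRequest) -> Decision:
    """Evaluate one request against simple Zero Trust-style rules."""
    # Unmanaged or unhealthy devices never touch highly sensitive data.
    if request.data_sensitivity == "high" and not (
        request.device_managed and request.device_healthy
    ):
        return Decision.DENY

    # Unfamiliar location: allow only after stronger verification.
    if not request.location_trusted:
        return Decision.STEP_UP_AUTH

    return Decision.ALLOW


def enforce(request: AccessRequest) -> Decision:
    """The PEP authorizes the flow and logs every interaction for auditing."""
    decision = evaluate(request)
    print(f"audit: user={request.user_id} "
          f"sensitivity={request.data_sensitivity} decision={decision.value}")
    return decision


# The financial-report scenario from above, with hypothetical users.
office_laptop = AccessRequest("alex", True, True, True, "high")
personal_device_abroad = AccessRequest("alex", False, True, False, "high")
enforce(office_laptop)            # allow
enforce(personal_device_abroad)   # deny
```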

Adopting Zero Trust isn’t just a technical upgrade—it’s a strategic must. It boosts an organization’s ability to defend against modern threats like ransomware, insider attacks, and supply chain compromises.

“Zero Trust isn’t a product—it’s a mindset,” says Tom McCleery, a principal group cloud network engineer in Microsoft Digital. “It’s about assuming breach and designing defenses that minimize impact and maximize resilience.”

By embracing Zero Trust, we strengthen our security posture, lower the risk of data breaches, and respond more effectively to emerging threats.

Establishing identity as the new perimeter

Identity is no longer just a component of security—it has become the new perimeter. Traditional security models focused on defending the network edge, assuming that everything inside the perimeter could be trusted. But in today’s hybrid and cloud-first environments, the perimeter has dissolved, and that assumption is outdated and dangerous. Users, devices, and applications now operate across diverse locations and platforms, making perimeter-based defenses insufficient.

Identity-first security shifts the focus from securing the physical network to securing the identities—both human and machine—that interact with the network. This means every access request is treated as though it originates from an untrusted source, regardless of where it comes from. Whether it’s a remote employee logging in from a personal device or an automated workload accessing cloud resources, the system must verify who or what is making the request, assess the risk, and enforce least-privilege access across the user experience.

This approach enables organizations to implement more granular access controls. For example, a developer might be allowed to access a code repository but not production systems, and only during business hours from a managed device. Similarly, a service account used by a continuous integration and continuous deployment (CI/CD) pipeline might be restricted to specific APIs and monitored for anomalous behavior. A CI/CD pipeline is an automated workflow that takes code from development through testing and into production.
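
As a rough illustration of this kind of identity-centric, least-privilege control, the sketch below models policies as data and checks each request against them. The identities, resources, and time windows are hypothetical, chosen only to mirror the developer and CI/CD examples above.

```python
from datetime import datetime, time

# Hypothetical identity-centric policies: each identity (human or machine)
# gets least-privilege access to specific resources under specific conditions.
POLICIES = {
    "dev-alice": {
        "allowed_resources": {"code-repo"},              # not production systems
        "business_hours_only": True,
        "requires_managed_device": True,
    },
    "svc-cicd-pipeline": {
        "allowed_resources": {"build-api", "test-api"},  # specific APIs only
        "business_hours_only": False,
        "requires_managed_device": False,
    },
}


def is_business_hours(now: datetime) -> bool:
    return time(8, 0) <= now.time() <= time(18, 0) and now.weekday() < 5


def check_access(identity: str, resource: str,
                 managed_device: bool, now: datetime) -> bool:
    """Return True only if the identity's policy permits this exact request."""
    policy = POLICIES.get(identity)
    if policy is None:
        return False  # unknown identities get nothing (deny by default)
    if resource not in policy["allowed_resources"]:
        return False
    if policy["business_hours_only"] and not is_business_hours(now):
        return False
    if policy["requires_managed_device"] and not managed_device:
        return False
    return True


now = datetime(2026, 4, 2, 10, 30)
print(check_access("dev-alice", "code-repo", True, now))           # True
print(check_access("dev-alice", "prod-cluster", True, now))        # False
print(check_access("svc-cicd-pipeline", "build-api", False, now))  # True
```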

By anchoring network security around verified identities, organizations reduce their attack surface and improve their ability to detect and respond to threats. This identity-centric model is not just a security enhancement—it’s a strategic shift that aligns with how modern enterprises operate.

Integrating AI and machine learning 

AI and machine learning (ML) are foundational pillars in our network security strategy. Intelligent automation and advanced analytics help us not only detect and respond to threats, but also continuously improve our security posture in an ever-changing landscape. Here’s how we’re using AI and ML in some critical aspects of our approach to modern network security:

  • Threat detection and intelligence. We deploy AI-powered monitoring tools that sift through billions of network signals and logs across our hybrid infrastructure. By applying sophisticated ML algorithms, we can identify abnormal behaviors such as unusual login attempts or unexpected data transfers that could indicate a potential breach. These insights allow our security teams to focus on the most critical alerts, reducing noise and accelerating incident investigation. (A simplified sketch of this kind of anomaly scoring appears after this list.)
  • Automated response and containment. Through automation, our security systems can respond to threats in real time. For example, if our AI models detect suspicious activity on a device, automated workflows can immediately isolate the affected endpoint, block malicious traffic, or revoke access privileges, all without waiting for manual intervention. This rapid response capability is essential for minimizing the potential impact of attacks and protecting our critical assets.
  • Predictive analysis and proactive defense. We use predictive analytics to forecast emerging vulnerabilities before they can be exploited. By continuously training our models on the latest threat intelligence and attack patterns, we can anticipate risks and strengthen our defenses proactively—whether that means patching vulnerable systems, adjusting access controls, or updating our security policies.
  • User experience monitoring. We use AI to assess the real experience of our users, a critical measurement in a network environment where identity is the perimeter. By correlating performance metrics with security signals, we ensure that our security mechanisms don’t degrade productivity and that any anomalies impacting user experience are promptly addressed.
  • Continuous learning and improvement. Our AI and ML systems are designed to learn from every incident, adapt to new attack techniques, and evolve with the threat landscape. This continuous improvement loop enables our teams to stay ahead of sophisticated adversaries and maintain robust, resilient network security.
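
As a simplified illustration of the anomaly scoring described in the first bullet, the sketch below trains an unsupervised model on a small set of sign-in features and flags outliers. The features, sample values, and thresholds are invented for the example; a production pipeline draws on far richer telemetry.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Illustrative sign-in features: [hour_of_day, failed_attempts, mb_transferred]
# In practice these would come from real telemetry; here they are made up.
baseline = np.array([
    [9, 0, 12], [10, 1, 8], [14, 0, 20], [11, 0, 15],
    [16, 0, 10], [13, 1, 18], [9, 0, 9], [15, 0, 14],
])

new_events = np.array([
    [10, 0, 11],   # looks like normal working-hours activity
    [3, 6, 900],   # 3 a.m., repeated failures, large transfer: suspicious
])

# Train an unsupervised anomaly detector on the baseline behavior.
model = IsolationForest(contamination=0.1, random_state=0).fit(baseline)

for event, label in zip(new_events, model.predict(new_events)):
    status = "anomalous - raise alert" if label == -1 else "normal"
    print(f"event {event.tolist()}: {status}")
```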

Advanced threats require advanced responses. By integrating AI and ML into our network security strategies, we’re enhancing our ability to detect and respond to threats swiftly, minimize potential damage, and foster a secure environment for innovation and collaboration across our global hybrid infrastructure.

Isolating networks to minimize risk

In a hybrid infrastructure, isolating network segments is a foundational security principle. By segmenting networks, we limit the scope of potential breaches and reduce the risk of lateral movement by attackers. For example, separating employee productivity networks from customer-facing systems ensures that if a vulnerability is exploited in one area, it doesn’t cascade across the entire environment.

This is especially critical in environments where sensitive customer data and internal development systems coexist. Our testing and development environments must remain completely isolated—not only from customer-facing services but also from internal productivity tools like email, collaboration platforms, and identity systems. This prevents test code or experimental configurations from inadvertently exposing production systems to risk.

We also establish policy enforcement points (PEPs) within each network segment. These act as control gates, inspecting and filtering traffic between zones. By placing PEPs at strategic boundaries, we can tightly control what moves between segments and detect anomalies early. This architecture ensures that, if a breach occurs, the “blast radius”—the scope of impact—is minimal and contained.
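
Here’s a minimal sketch of the kind of segment-to-segment allow list a policy enforcement point might consult before letting traffic cross a boundary. The segment names and rules are hypothetical; the point is that anything not explicitly allowed is denied, which keeps the blast radius small.

```python
# Hypothetical allow list of which network segments may talk to which.
# Anything not explicitly listed is denied by default.
ALLOWED_FLOWS = {
    ("employee-productivity", "collaboration-services"),
    ("customer-facing", "production-data"),
    # Note: test/dev segments appear in no rule, so they stay fully isolated.
}


def pep_allows(source_segment: str, dest_segment: str) -> bool:
    """Policy enforcement point check at a segment boundary (deny by default)."""
    allowed = (source_segment, dest_segment) in ALLOWED_FLOWS
    print(f"audit: {source_segment} -> {dest_segment}: "
          f"{'allowed' if allowed else 'blocked'}")
    return allowed


pep_allows("employee-productivity", "collaboration-services")  # allowed
pep_allows("test-dev", "production-data")                      # blocked
pep_allows("test-dev", "employee-productivity")                # blocked
```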

This layered approach to segmentation and isolation is essential for maintaining the integrity of our production systems, minimizing risk, and ensuring that our hybrid infrastructure remains resilient in the face of evolving threats.

Embracing continuous monitoring 

We’ve stopped thinking of monitoring as a one-time check. Now, it’s a continuous conversation with our network.

A photo of Singh.

“Conventional network performance monitoring—monitoring the systems and infrastructure that support our network—can only tell part of the story. To truly understand and meet our requirements, we must monitor user experiences directly.”

Ragini Singh, partner group engineering manager, Microsoft Digital

Continuous monitoring is how we stay ahead of issues before they impact our people. It’s how we keep our hybrid infrastructure resilient, performant, and secure—every second of every day.

We’ve built a monitoring ecosystem that spans our entire global network, from on-premises offices to cloud-based services in Azure and software-as-a-service (SaaS) platforms. With the mindset that identity is the new perimeter, we’re using signals from all aspects of our environment and focusing on the user experience.

“Conventional network performance monitoring—monitoring the systems and infrastructure that support our network—can only tell part of the story,” says Ragini Singh, a partner group engineering manager in Microsoft Digital. “To truly understand and meet our requirements, we must monitor user experiences directly.”

This isn’t just about tools and dashboards. It’s about insight. We’re using synthetic and native metrics to build a hop-by-hop view of the user experience. That lets us pinpoint where things go wrong—and fix them fast. We’re even layering in automation to enable self-healing responses when thresholds are breached.
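
As a simple illustration of a synthetic measurement feeding a self-healing response, the sketch below probes an endpoint, records the latency, and triggers a placeholder remediation step when a threshold is breached. The endpoint, threshold, and remediation hook are assumptions for the example.

```python
import time
import urllib.request

# Hypothetical values for the example; real probes would target internal
# endpoints and feed a central monitoring pipeline rather than print().
ENDPOINT = "https://example.com/health"
LATENCY_THRESHOLD_MS = 500


def synthetic_probe(url: str) -> float:
    """Measure one request's round-trip latency in milliseconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=5) as response:
        response.read()
    return (time.perf_counter() - start) * 1000


def remediate(url: str) -> None:
    # Placeholder for a self-healing action, such as rerouting traffic or
    # recycling an unhealthy endpoint. Intentionally left as a stub.
    print(f"remediation triggered for {url}")


latency_ms = synthetic_probe(ENDPOINT)
print(f"probe: {ENDPOINT} latency={latency_ms:.0f} ms")
if latency_ms > LATENCY_THRESHOLD_MS:
    remediate(ENDPOINT)
```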

Continuous monitoring is a strategic shift that helps us protect our people, power our services, and deliver the seamless experience our employees expect.

Looking to the future

As enterprises continue to navigate the complexities of hybrid infrastructures, securing enterprise networks requires an agile, multifaceted approach that integrates Zero Trust principles, identity-first security, and advanced technologies like AI and ML. By shifting the focus from traditional perimeter defenses to a more holistic and adaptive security model, organizations can better protect their assets, maintain operational continuity, and foster innovation in an increasingly interconnected world.

Implementing these strategies not only enhances security but also positions organizations to leverage the full potential of their hybrid infrastructures, driving growth and success in the digital age.

Key takeaways

Here are five key actions you can take to strengthen your organization’s network security and embrace a modern, identity-first approach:

  • Adopt an identity-first security model. Shift your focus from traditional perimeter-based defenses to verifying and securing every user and device identity—regardless of location or network.
  • Integrate AI and machine learning into your security strategy. Continuously improve your security posture by using intelligent automation and analytics to detect, respond to, and predict threats more effectively.
  • Isolate network segments to minimize risk. Separate critical business functions, customer-facing services, and development environments to contain threats and ensure that any potential breach remains limited in scope.
  • Implement continuous monitoring across your hybrid infrastructure. Move beyond periodic checks by establishing real-time, user-centric monitoring to maintain resilience, performance, and rapid incident response.
  • Embrace a proactive, adaptive mindset. Regularly update your security policies, train your teams, and stay agile to address emerging threats and support innovation as your organization evolves.

The post Securing the borderless enterprise: How we’re using AI to reinvent our network security appeared first on Inside Track Blog.
