Getting to ‘search completeness’ internally at Microsoft

Mar 22, 2024

Microsoft Digital Perspectives

Microsoft is a big company with thousands of teams working in different ways based on the work they do. Despite that complexity, when our employees go looking for something, they expect an internal search portal to find exactly what they’re looking for instantly, just like a search on the internet. Yet when we talk to these employees, each of them defines the scope of what they’re looking for quite differently.

  • A developer may want HR info, Stack Overflow, other technical info specific to their organization, or technical info from places like Microsoft Azure and Microsoft.com.
  • A salesperson may want HR info, customer information from our account management software and support services, or the latest public information about their customers.
Dodd Willingham works on the Digitally Assisted Workday team in Microsoft Digital Employee Experience. His team’s job is to enhance the internal search experience for employees across Microsoft.

This blog post explores the challenge of delivering the full scope of content each employee expects to find in search, as seen from their own perspective. This is what we call search completeness.

To start on the journey of getting to search completeness, you first must understand your user community:

  • How do they search? Do they use smartphones? Do they use Bing’s Work search tool? Do they use a corporate SharePoint portal? For Microsoft employees, it’s a mix of all of these.
  • Why are they searching? Are they trying to find another person? Are they researching content? Are they trying to find reference material?
  • What are they searching for? What content is most important for your employees to find?

[Read the first blog post in our series: Making content more accessible and searches more efficient at Microsoft.]

Understanding your user community

Reviewing search term frequency was one of our early steps in understanding our users. Counting how often each search term was used, and then reviewing a sample of those terms, made it clear that the most common searches are for common employee actions, while less common searches are typically persona-specific. The chart below shows this well: high-volume search terms are common across most employees, and low-volume ones tend to be org- or persona-specific.

[Graphic: The vast majority of the roughly 500,000 searches per month at Microsoft are on a few popular terms, such as “holidays.”]
We found that just a few common terms made up the vast majority of searches, and we used that information to improve the results for those top searches. Employees at Microsoft make about 500,000 searches per month.
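To make the head-versus-tail pattern concrete, here is a minimal Python sketch of a term-frequency analysis like the one described above. It assumes search queries are available as a plain list of strings exported from search logs; the function name `summarize_search_terms`, the `head_size` parameter, and the sample queries are hypothetical and not part of any Microsoft tool.

```python
from collections import Counter


def summarize_search_terms(queries, head_size=3):
    """Count normalized search terms and report how much of the total
    volume the most frequent terms (the "head") account for."""
    # Normalize case and surrounding whitespace so "Holidays" and "holidays " match.
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    total = sum(counts.values())
    head = counts.most_common(head_size)
    head_volume = sum(n for _, n in head)
    return {
        "total": total,
        "distinct_terms": len(counts),
        "head": head,
        "head_share": head_volume / total if total else 0.0,
        # Terms seen only once form the long tail, often org- or persona-specific.
        "singletons": sorted(term for term, n in counts.items() if n == 1),
    }


if __name__ == "__main__":
    # Hypothetical sample; real logs would hold hundreds of thousands of queries.
    sample = ["holidays", "Holidays", "payslip", "holidays", "benefits",
              "payslip", "azure devops pipeline yaml", "benefits",
              "holidays ", "sales playbook fy24"]
    print(summarize_search_terms(sample))
```

The head terms are where relevance tuning pays off for everyone, while the singleton list is a starting point for spotting persona-specific content.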

For some popular search terms, such as those related to documents, we could easily identify the desired content. Microsoft.com and Stack Overflow were also fairly popular.

Next, we realized there was a lot of content that was impossible to identify from search terms alone. We needed another way of identifying desired content, and we found one in Microsoft Azure Active Directory (AAD).

Using AAD authentication volume, we can see the most popular registered apps within the company. Many of these, such as SharePoint and OneDrive, are included in Microsoft Search by default. Others, such as Outlook, have their own search capability that meets user expectations, so their content doesn’t need to be included in enterprise search. That left a significant number of heavily used apps whose content would be valuable to add to enterprise search. The chart below gives you a taste of these results.

[Chart: The most popular apps at Microsoft based on Azure Active Directory usage data, including SharePoint, Outlook, Teams, Dynamics 365, Azure DevOps, and Power BI.]
Tapping into the apps that Microsoft employees use the most has helped us prioritize what to add to search first. We used Microsoft Azure Active Directory data to identify the company’s top apps list, and we’re currently adding the top 100 apps to our internal search capability.
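The app-prioritization step can be sketched the same way. This sketch assumes sign-in records have been exported as dictionaries carrying an `appDisplayName` field (as Microsoft Graph `signIn` records do), and that the sets of apps already indexed or already served by their own search are maintained by hand. `rank_candidate_apps` and the sample volumes are illustrative only, not the process we actually run.

```python
from collections import Counter


def rank_candidate_apps(sign_ins, already_indexed, own_search, top_n=100):
    """Rank apps by authentication volume, dropping apps whose content is
    already searchable or that have adequate built-in search of their own."""
    # Each sign-in event contributes one unit of volume to its app.
    volume = Counter(event["appDisplayName"] for event in sign_ins)
    excluded = set(already_indexed) | set(own_search)
    # most_common() returns apps from highest to lowest volume.
    candidates = [(app, n) for app, n in volume.most_common() if app not in excluded]
    return candidates[:top_n]


if __name__ == "__main__":
    # Hypothetical, made-up volumes for illustration.
    apps = (["SharePoint"] * 5 + ["Outlook"] * 4 + ["Dynamics 365"] * 3
            + ["Azure DevOps"] * 2 + ["Power BI"])
    events = [{"appDisplayName": a} for a in apps]
    print(rank_candidate_apps(events,
                              already_indexed={"SharePoint", "OneDrive"},
                              own_search={"Outlook"}))
```

Ranking by raw sign-in count favors apps many people open, not necessarily apps whose content people search for, which is one reason the resulting list still needs human review.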

Gathering the list of popular apps left us with the challenge of identifying popular content that isn’t defined as an app in AAD. We explored various ways of capturing this information but, so far, haven’t found a better method than user feedback and surveys.

This work yielded a “Top 100” list of content we want to add to enterprise search. So how do we get this content into our search results?

Methods of achieving search completeness

[Graphic: Searching for all Microsoft content, whether on-premises, in the cloud, or with third parties, using bookmarks, crawling and adding to the index, and federated search.]
Our bid to transform internal search at Microsoft aims to include all Microsoft content in our search results.

Microsoft Search provides several methods for bringing in content. Each method has its own strengths and weaknesses, which we’ve summarized below.

Bookmarks and Q&A
  • Strengths:
    • Can point at any URL
    • Can be targeted to security groups
    • Easy to maintain
  • Weaknesses:
    • Manual effort required by the admin
    • URLs can get out of date without the admin’s knowledge
    • A single URL response is delivered to a discrete list of search terms, which is limiting
Out-of-the-box Microsoft 365 search crawling
  • Strengths:
    • Covers everything within OneDrive and SharePoint by default
    • Includes everything in the compliance module
    • Offers many methods for addressing old sites, old content, legal retention, and more
  • Weaknesses:
    • There’s a lot of content outside of Microsoft 365 that users expect to be included
SharePoint Hybrid Crawler
  • Strengths:
    • Crawls more than 160 different file types
    • Resulting content appears natively within out-of-the-box Microsoft 365 search
  • Weaknesses:
    • Does not support OAuth (Open Authorization), so we could only use it for internet-published content
Search connectors
  • Strengths:
    • Can extend search crawling to a variety of additional content
    • Enable result display within the “All” vertical as well as custom verticals
    • Support custom filters and result display layouts
    • Fully met our security requirements from admin and user ACL (Access Control List) perspectives
  • Weaknesses:
    • Do not cover all content
    • Limited in the number of connections allowed and the item count supported
Microsoft Graph Custom Connectors
  • Strengths:
    • Can be built for any kind of content source
  • Weaknesses:
    • Can hit the same volume limits as search connectors
    • Must be created and maintained by our search team
Federated search
  • Strengths:
    • Leverages existing search engines in other products, so the Microsoft 365 search engine doesn’t have to do it all
  • Weaknesses:
    • Limited options available
    • Users must be specific in their query or click a custom vertical to see the results
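Several of these trade-offs hinge on ACL support. As a simplified illustration of what security trimming means, the sketch below filters results so a user sees only items granted to their user ID, one of their groups, or everyone, with an explicit deny overriding any grant. The data model (`AclEntry`, `Item`, the `"everyone"` principal) is hypothetical; Microsoft Search performs trimming itself at query time, and this is not its implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AclEntry:
    principal: str  # hypothetical: a user ID, a group ID, or "everyone"
    access: str     # "grant" or "deny"


@dataclass
class Item:
    item_id: str
    acl: list = field(default_factory=list)  # list of AclEntry


def is_visible(item, user_id, group_ids):
    """Return True if the item's ACL grants access to this user."""
    principals = {user_id, "everyone", *group_ids}
    matching = [entry for entry in item.acl if entry.principal in principals]
    if any(entry.access == "deny" for entry in matching):
        return False  # an explicit deny wins over any grant
    # No matching grant means no access (fail closed).
    return any(entry.access == "grant" for entry in matching)


def security_trim(results, user_id, group_ids):
    """Keep only the results this user is allowed to see."""
    return [item for item in results if is_visible(item, user_id, group_ids)]
```

Treating "no matching entry" as "not visible" is the safe default: an item with a missing or malformed ACL disappears from results instead of leaking.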

What we are doing

With the stage set, we know the content we want to include and the methods available for including it. Now we just need to apply the right method in each case.

Bookmarks and Q&A
  • 1,150 bookmarks are in active use, about half of which point to sites and tools outside of OneDrive and SharePoint (ODSP).
    • About 30 bookmarks are targeted at specific audiences.
    • Our custom telemetry shows that bookmarks are clicked in nearly half of all searches, primarily by the “General Employee” persona.
  • Fifteen Q&As are in active use, each consisting of a short description of a popular subject and 5 to 10 common links associated with that subject.
Out-of-the-box Microsoft 365 search crawling
  • Corporate policy requires all ODSP content to be crawled. No site should turn off crawling.
  • When that causes a problem, we use custom KQL (Keyword Query Language) in the “All” vertical to hide the affected content from search results while retaining it in the compliance module.
SharePoint Hybrid Crawler
  • Used to crawl internet content that employees find within the enterprise, such as learn.microsoft.com.
Search Connectors
  • Eight connections are now in production, some of which include more than one source.
    • Connector types include MediaWiki, ServiceNow, Website, and Microsoft Azure DevOps work items.
  • About 2 million items are indexed.
    • We plan to grow this to 30 million as soon as product capacity allows.
Microsoft Graph Custom Connectors
  • Two custom connectors are in production. One is specific to a single kind of content; the other is a generic connector that ingests JSON-formatted content provided by any interested party.
  • The generic connector currently has 10 content providers from across the company.
    • The generic connector includes ACL (Access Control List) fields, so security trimming can be enforced.
Federated Search
  • Federation to our primary Microsoft Dynamics 365 instance has been very popular.
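For the generic connector, each provider’s JSON has to be mapped onto the item format that Microsoft Graph connectors accept. The sketch below assumes a hypothetical provider record with `id`, `title`, `url`, `body`, `allowedGroups`, and `deniedUsers` fields and turns it into a Microsoft Graph `externalItem` request body with `acl`, `properties`, and `content`. The `properties` keys must match whatever schema was registered for the connection, so `title` and `url` here are assumptions; sending the request (authentication, retries, throttling) is left out.

```python
def to_external_item(record):
    """Map one provider record (hypothetical schema) to an externalItem body."""
    # Groups listed by the provider are granted access; listed users are denied.
    acl = [{"type": "group", "value": g, "accessType": "grant"}
           for g in record.get("allowedGroups", [])]
    acl += [{"type": "user", "value": u, "accessType": "deny"}
            for u in record.get("deniedUsers", [])]
    if not any(entry["accessType"] == "grant" for entry in acl):
        # Fail closed: refuse to index an item that nobody is explicitly granted.
        raise ValueError(f"record {record['id']!r} has no grant entries")
    return {
        "acl": acl,
        # Assumed schema properties; these must match the connection's registered schema.
        "properties": {"title": record["title"], "url": record["url"]},
        "content": {"value": record.get("body", ""), "type": "text"},
    }


def item_path(connection_id, record):
    """Relative path (under https://graph.microsoft.com/v1.0) for upserting the item with PUT."""
    return f"/external/connections/{connection_id}/items/{record['id']}"
```

Failing closed when a record carries no grant entries is a deliberate choice in this sketch: with many independent content providers, a missing ACL is more likely a data error than an intent to publish to everyone.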

We also use Microsoft Viva Topics and other product capabilities, which will be discussed in a future blog post.

Key takeaways

At this point, search indexing covers 70 percent of the AAD Top Apps list, weighted by usage volume. We expect to reach 80 percent within the next year.

  • Content added through connectors and federated search receives 75,000 clicks per month, about 8 percent of our total click volume.
  • These connections have increased our administrative effort by 10 percent. For more detail, see the previous blog post in this series: Generating great results: Administering search at Microsoft.
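The coverage figure is a usage-weighted share rather than a simple count of apps. A minimal sketch, assuming per-app usage volumes (for example, AAD sign-in counts) are available in a dictionary; the function names and any sample numbers are hypothetical. The second function greedily adds the highest-volume unindexed apps until a target such as 80 percent is reached.

```python
def weighted_coverage(app_volume, indexed_apps):
    """Share of total usage volume that belongs to apps whose content is searchable."""
    total = sum(app_volume.values())
    covered = sum(v for app, v in app_volume.items() if app in indexed_apps)
    return covered / total if total else 0.0


def apps_needed_for_target(app_volume, indexed_apps, target):
    """Greedily pick unindexed apps, highest volume first, until the
    usage-weighted coverage reaches the target (for example 0.8)."""
    total = sum(app_volume.values())
    covered = sum(v for app, v in app_volume.items() if app in indexed_apps)
    remaining = sorted(
        (item for item in app_volume.items() if item[0] not in indexed_apps),
        key=lambda item: item[1],
        reverse=True,
    )
    chosen = []
    for app, volume in remaining:
        if covered >= target * total:
            break
        chosen.append(app)
        covered += volume
    return chosen
```

Picking by volume alone minimizes the number of apps added, but it ignores how much admin effort each connection costs, which is exactly the trade-off raised in the first open challenge below.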

We’ve also realized that some content should not be included in enterprise search but should be included in targeted custom search portals. The same methods described above can typically be used to support such custom portals. We’ll describe what we’ve learned so far in a future post.

We see some continuing challenges for which we do not yet have answers:

  1. At some point, the administrative and resource overhead of adding more content sources will outweigh the benefit, because we will be getting down to very seldom-used content. We don’t know where that boundary is yet.
  2. We need to figure out how to stay in touch with continuing changes across the company, deprecating content when appropriate while adding new content sources when they come up.
  3. We haven’t figured out how to tune search relevance in a manner that works well for each persona.

Please return to this space for future stories in our ongoing series on transforming search completeness here at Microsoft.
