Focusing on the user<\/strong><\/p>\nTo identify the key requirements for the data catalog, Lucas interviewed users across the company. Their jobs ranged from traditional data roles such as developers, data scientists, data analysts, and data stewards to common business roles like program managers and business managers. His interviews enabled Lucas\u2019s team to understand pain points, create prototypes, and conduct usability tests with data publishers and catalog users to ensure development of a truly usable data catalog that would accelerate data discovery for business insights.<\/p>\n
The Modern Catalog Team quickly found that data consumers spent most of their time tracking down the appropriate group owner of the data they needed, requesting and receiving authorization from the data owner, and then cleaning the data before being able to use it.<\/p>\n
\u201cIt was time-consuming for users of our data and challenging for us because we had to answer so many different emails,\u201d Rodriguez says.<\/p>\n
It was also difficult for users like Kathy Brustad, a senior data and applied scientist in Worldwide Learning, to assess data quality.<\/p>\n
\u201cIt was hard to identify the source of truth for certain data assets,\u201d Brustad says. \u201cIt was also hard to know how data had been transformed when it flowed from one source to another and finally ended up being used.\u201d<\/p>\n
After gathering pain points, Lucas\u2019s team worked with the Data Analytics Working Group, which comprises key principal-level representatives of Microsoft Digital who help shape data policies, to create a prioritized list of high-level requirements for the catalog. This led to the design of a modern data catalog that enables employees to intuitively browse for available assets and share their team\u2019s data using a single site.<\/p>\n
Brustad wanted to be able to identify the source of truth for data assets. Using the redesigned data catalog, she can assess a dataset\u2019s quality and how it has been transformed over time by referencing the quality score, sample list of data, and lineage showing how the data connects to other datasets.<\/p>\n
\u201cThere are many different places you can find this data, because it\u2019s still getting replicated throughout the company,\u201d she says. \u201cKnowing that there are quality standards in Microsoft\u2019s data catalog gives me a higher level of confidence.\u201d<\/p>\n
Prioritizing governance and user feedback<\/strong><\/p>\nThe catalog also integrates governance into the data registration process. If data publishers already have their assets in Azure Data Lake and follow best practices for governance, their assets can automatically be scanned into the catalog.<\/p>\n
\u201cWe\u2019re able to turn around our analytics a lot faster because we can establish an automatic connection to the source of the data,\u201d Brustad says of the catalog, which she uses to find data to measure the impact of seller training programs and understand changes in seller behavior. \u201cIt gives us a 20 to 25 percent gain on the turnaround time.\u201d<\/p>\n
The catalog\u2019s connection to Azure Data Lake facilitates the asset upload process for Rodriguez, because her team\u2019s assets are already in Azure SQL Server and Azure Data Lake. The data catalog also improves the experience for consumers of her team\u2019s data.<\/p>\n
\u201cIt was appealing to have a centralized data catalog that helps customers know what data we have,\u201d Rodriguez says. \u201cNow, we can invite them to go to the data catalog and check out our team\u2019s assets.\u201d The catalog offers visibility not just into the technical information of the assets, but also into key governance metadata, such as compliance adherence and data quality measurements.<\/p>\n
The Modern Catalog Team is committed to continuously learning from employees by collecting feedback through telemetry, email, and a feedback button in the modern catalog. These channels proved useful when Rodriguez couldn\u2019t add her team\u2019s distribution list as an asset owner. She reached out to the team via email, and they provided her with an immediate workaround and added the feature to their backlog. Rodriguez has proposed additional features for future iterations of the data catalog, such as a guide for naming and tagging assets and supporting data quality.<\/p>\n
The team also collects telemetry data to identify errors in data access, and pairs those findings with user interviews to understand what users intended to do. These ongoing conversations inform future iterations of the catalog.<\/p>\n
\u201cIt\u2019s an intuitive platform, and the team is always available for feedback,\u201d Rodriguez says. \u201cThis makes the process easier.\u201d<\/p>\n
Whether it\u2019s used to share data or understand behavior, the modern data catalog is an invaluable tool for employees.<\/p>\n
\u201cAs someone who works in the data science field, I\u2019m comfortable with going to the data catalog to procure data because I know that the data has been vetted,\u201d Brustad says.<\/p>\n
Learn how Microsoft developed its modern data catalog<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"This content has been archived, and while it was correct at time of publication, it may no longer be accurate or reflect the current situation at Microsoft. At Microsoft, employees were already aware of the power of using data to create experiences that people love. But that awareness wasn\u2019t enough to bring the data to […]<\/p>\n","protected":false},"author":146,"featured_media":5089,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"_hide_featured_on_single":false,"_show_featured_caption_on_single":true,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false},"version":2}},"categories":[1],"tags":[],"coauthors":[674],"class_list":["post-5085","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized","m-blog-post"],"jetpack_publicize_connections":[],"yoast_head":"\n
How Microsoft connects high-quality, discoverable data - Inside Track Blog<\/title>\n