Our approach
We believe everyone can benefit from opening, sharing, and collaborating around data to make better decisions, improve efficiency, and help tackle some of the world’s most pressing societal challenges.
Set data collaboration principles
We adopted five principles to guide our participation in data collaborations: open, usable, empowering, secure, and private. These principles underpin our participation, and we hope other organizations can build on them to share their data responsibly.
Engage partnerships and explore projects
We believe success will depend on building deep collaborations with others from industry, government, and civil society around the world. This includes work with leading organizations in the open data movement, such as the Open Data Institute and The GovLab at New York University.
Make data sharing easier
We're committed to investing in the essential assets that will make data sharing easier, including the necessary tools, frameworks, and templates. This is especially important when it comes to opening and collaborating around data to solve important societal issues.
Closing the data divide
Access to data is a big challenge. The benefits for organizations of all sizes and the broader community are significant if we can work together to make progress on open data.
Industry Data for Society Partnership
Working across industry to make private sector data more open and accessible for societal good.
Solar farms mapping
The solar farms mapping data can help researchers identify factors driving land suitability for solar projects and help public agencies better plan siting of solar energy development in India.
HKH glacier mapping
Glacier mapping is key to ecological monitoring in the Hindu Kush Himalaya (HKH) region, where climate change poses a risk to those who depend on the health of glacier ecosystems. The HKH glacier mapping dataset includes imagery with the locations of glaciers.
Chesapeake land cover
The Chesapeake Conservancy created a land cover dataset for conservation efforts. The same data, which contains high-resolution aerial imagery and land cover labels, can be used to train ML models to map land cover across an even wider area.
Concentrated Animal Feeding Operations (CAFO)
The poultry CAFO GitHub repository contains US-wide datasets of predicted poultry barn locations. These datasets help researchers identify CAFOs so that conservation groups can address water and air quality issues.
TorchGeo
TorchGeo is a PyTorch domain library that includes several geospatial benchmark datasets, such as CDL, Landsat7, and Landsat8, to help support research tasks like image classification, semantic segmentation, object detection, instance segmentation, change detection, and more.
Microsoft Nonprofit Innovation Hub
The Nonprofit Innovation Hub is an open-source GitHub repository with lightweight solutions that enable nonprofits to innovate.
Legal frameworks
Data sharing agreements can take months to draw up, oftentimes deterring organizations from sharing data at all. As a first step toward building better processes and tools, we're sharing a set of data agreements to govern the sharing of data, particularly in the context of training AI models.
CDLA Permissive 2.0
The Community Data License Agreement (CDLA) Permissive 2.0 is an open data agreement designed to make it easier to share and collaborate with open data.
C-UDA 1.0
The Computational Use of Data Agreement (C-UDA) 1.0 is intended for use with datasets that may include material that the data provider does not own but that may have been assembled lawfully from publicly accessible sources.
DUA-OAI
The Data Use Agreement for Open AI Model Development (DUA-OAI) provides terms for one organization to share data with another so that the second organization can use the data to train an AI model, where the trained model is open sourced.
DUA-DC
The Data Use Agreement for Data Commons (DUA-DC) can be used by multiple parties who want to share data through a common database that is enabled by an application programming interface (API).
Capabilities
Learn more about the tools and practices we employ to enable more secure and streamlined access to data.
Differential privacy
Differential privacy introduces statistical noise (slight alterations) to mask datasets and protect the privacy of individuals.
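To make the idea of statistical noise concrete, here is a minimal sketch of one common differential privacy technique, the Laplace mechanism, applied to a simple count. It is an illustration under stated assumptions, not a description of how any Microsoft tool is implemented: the function name `private_count`, the `epsilon` value, and the sample ages are all hypothetical.

```python
import random


def private_count(values, predicate, epsilon, rng):
    """Return a noisy count of the values that satisfy predicate.

    Hypothetical helper for illustration; a smaller epsilon means more
    noise and therefore stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    # Adding or removing one person changes a count by at most 1,
    # so the Laplace noise scale is 1 / epsilon.
    scale = 1.0 / epsilon
    # The difference of two independent exponential samples with mean
    # `scale` follows a Laplace distribution centered at 0.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


# Illustrative data only; the ages below are made up.
ages = [23, 35, 41, 29, 62, 57, 38, 44]
rng = random.Random(0)  # seeded only so the sketch is repeatable
noisy = private_count(ages, lambda age: age >= 40, epsilon=1.0, rng=rng)
print(f"Noisy count of people aged 40 or over: {noisy:.2f}")
```

Each call returns a slightly different answer, so no single result reveals whether a particular individual is in the dataset, while averages over many records remain close to the true value.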
Azure confidential computing
Confidential computing helps to protect sensitive data in the cloud by offering security through data-in-use encryption, which provides additional protection for your data while it's being processed.
Azure Open Datasets
A curated collection of publicly available datasets that are ready to use in machine learning workflows and easy to access from Azure services.
Researcher tools
Explore a collection of datasets, code, and models from Microsoft Research for the broader academic community to advance state-of-the-art research across all disciplines.