{"id":776320,"date":"2021-09-23T06:00:00","date_gmt":"2021-09-23T13:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=776320"},"modified":"2023-03-14T21:09:43","modified_gmt":"2023-03-15T04:09:43","slug":"real-world-evidence-and-the-path-from-data-to-impact","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/real-world-evidence-and-the-path-from-data-to-impact\/","title":{"rendered":"Real-world evidence and the path from data to impact"},"content":{"rendered":"\n

From the intense shock of the COVID-19 pandemic to the effects of climate change, our global society has never faced greater risk. The Societal Resilience<\/a> team at Microsoft Research was established in recognition of this risk and tasked with developing open technologies that enable a scalable response<\/a> in times of crisis. And just as we think about scalability in a holistic way\u2014scaling across different forms of common problems, for different partners, in different domains\u2014we also take a multi-horizon view of what it means to respond to crisis.<\/p>\n\n\n\n

When an acute crisis strikes, it creates an urgent need to help real people, right now. However, not all crises are acute, and not all forms of response deliver direct assistance. While we need to attend to foreground crises like floods, fire, and famine, we also need to pay attention to the background crises that precipitate them; for many people, the background crisis is already the foreground of their lives. Climate change is one example: in the long term, it threatens humanity as a whole, but climate migration<\/span><\/a> is already happening all over the world, and it disproportionately affects some of the poorest and most vulnerable countries. <\/p>\n\n\n\n

Crises can also feed into and amplify one another. For example, the United Nations\u2019 International Organization for Migration (IOM) reports that migration in general<\/span><\/a>, and crisis events<\/span><\/a> in particular, are key drivers of human trafficking and exploitation. Migration push factors can intensify during times of crisis, and people may face extreme vulnerability when forced to migrate in the absence of safe and regular migration pathways<\/span><\/a>. Human exploitation and trafficking violate the most fundamental human rights<\/span><\/a> and show what can happen when societies fail to prevent systemic vulnerability from emerging within their populations. By tackling existing sources of vulnerability and exploitation now, we can learn how to deliver more effective responses to the interconnected crises of the future.
 
To build resilience in these areas, researchers at Microsoft and their collaborators have been developing tools that help domain experts translate real-world data into evidence. The three tools and case studies presented in this post all share a common idea: that a hidden structure exists within the many combinations of attributes that constitute real-world data, and that both domain knowledge and data tools are needed to make sense of this structure and inform real-world response. To learn more about these efforts, read the accompanying AI for Business and Technology blog post<\/span><\/a>. Several of the technologies in this post will be presented in greater detail at the Microsoft Research Summit<\/span><\/a> on October 19\u201321, 2021. <\/p>\n\n\n\n


Supporting evidence-based policy<\/h2>\n\n\n\n

For crisis response at the level above individual assistance, we need to think in terms of policy\u2014how should we allocate people, money, and other resources towards tackling both the causes and consequences of the crisis? <\/p>\n\n\n\n

In such situations, we need evidence <\/em>that can inform new policies and evaluate existing ones, whether the public policy of governments or the private policy of organizations. Returning to the link between crises and trafficking, if policy makers do not have access to supporting evidence because it doesn\u2019t exist or cannot be shared, or if they are not persuaded by the weight of evidence in support of the causal relationship, they will not enact policies that ensure appropriate intervention and direct assistance when the time comes. <\/p>\n\n\n\n

Policy is the greatest lever we have to save lives and livelihoods at scale. Building technology for evidence-based policy is how we maximize our leverage as we work to make societies more resilient. <\/p>\n\n\n\n

Developing real-world evidence<\/h2>\n\n\n\n

Real-world problems affecting societal resilience leave a trail of \u201creal-world data\u201d (RWD) in their wake. This concept originated in the medical field to distinguish observational data collected for some other purpose (for example, electronic health records and healthcare claims) from experimental data collected through, and for the specific purpose of, a randomized controlled experiment such as a clinical trial. <\/p>\n\n\n\n

The corresponding notion of \u201creal-world evidence\u201d (RWE) also emerged in the medical field and is defined in 21 U.S. Code \u00a7 355g<\/span><\/a> of the Federal Food, Drug, and Cosmetic Act as \u201cdata regarding the usage, or the potential benefits or risks, of a drug derived from sources other than traditional clinical trials.\u201d While our RWE research is partly inspired by the methods used to derive RWE from RWD in a medical context, we also take a broader view of what counts as evidence for decision making and policy making in fields well beyond medicine. <\/p>\n\n\n\n

For problems like human trafficking, it would be unethical to run a randomized controlled trial in which trafficking is allowed to happen. In such cases, observational data describing victims of trafficking, collected at the point of assistance, is the next best source of data. Using this data creates the opportunity for a positive feedback loop: direct assistance activities generate data that informs evidence-based policy, and evidence-based policy in turn informs the allocation of assistance resources. Supporting this loop is one of the main ways in which targeted technology development could make a significant difference to real-world outcomes. <\/p>\n\n\n\n

Empowering domain experts<\/h2>\n\n\n\n

In practice, however, facilitating positive feedback between assistance and policy activities means overcoming multiple challenges that hinder the progression from data to evidence, to policy, to impact. The people and organizations collecting data on the front line are rarely those responsible for making policy or evaluating its impact, just as those with the technical expertise to develop evidence are rarely those with the domain expertise needed to interpret and act on that evidence. <\/p>\n\n\n\n

To bridge these gaps, we work with domain experts to design tools that democratize the practice of evidence development\u2014reducing reliance on data scientists and other data specialists whose skills are in short supply, especially during a crisis. <\/p>\n\n\n\n

Real-world evidence in action<\/h2>\n\n\n\n

Over the following sections, we describe tools for developing different kinds of real-world evidence in response to the distinctive characteristics\u2014and challenges\u2014of accessing, analyzing, and acting on real-world data. In each case, we use examples drawn from our efforts to counter human trafficking and modern slavery.<\/p>\n\n\n\n

Developing evidence of correlation from private data<\/h3>\n\n\n\n

Research challenge<\/h4>\n\n\n\n

When people can\u2019t see the data describing a phenomenon, they can\u2019t make effective policy decisions at any level. However, many real-world datasets relate to individuals and cannot be shared with other organizations because of privacy concerns and data protection regulations. <\/p>\n\n\n\n\n\n

This challenge arose when Microsoft participated in Tech Against Trafficking (TAT)<\/span><\/a>, a coalition of technology companies (currently Amazon, BT, Microsoft, and Salesforce) working to combat trafficking with technology. In the 2019 TAT Accelerator Program<\/span><\/a>, TAT member companies worked together to support the Counter Trafficking Data Collaborative (CTDC)<\/span><\/a>, an initiative run by the International Organization for Migration (IOM) that pools data from organizations including IOM, Polaris, Liberty Shared, OTSH, and A21 to create the world\u2019s largest database of individual survivors of trafficking. <\/p>\n\n\n\n

The CTDC data hub makes derivatives of this data openly available through data maps, dashboards, and stories that are accessible to policy makers, as a way of informing evidence-based policy against human trafficking. Publishing these derivatives, however, raises privacy risks. For example, if traffickers believe they have identified a victim within published data artifacts, they may assume that the victim has cooperated with the authorities, which may prompt retaliation. To mitigate such risks, CTDC data is de-identified and anonymized using standard approaches. However, this process is cumbersome, sacrifices the data\u2019s analytic utility, and may not remove all residual risks to privacy and safety. <\/p>\n\n\n\n\n\n

Research question<\/h4>\n\n\n\n

How can we enable policy makers in one organization to view and explore the private data collected and controlled by another in a way that preserves the privacy of groups of data subjects, preserves the utility of datasets, and is accessible to all data stakeholders? <\/p>\n\n\n\n

Enabling technology<\/h4>\n\n\n\n

We developed the concept of a Synthetic Data Showcase as a new mechanism for privacy-preserving data release, now available on GitHub<\/span><\/a> and as an interactive AI Lab<\/a>. The synthetic data is generated so that it reproduces the structure and statistics of a sensitive dataset, with the guarantee that every combination of attributes in the synthetic records appears at least k <\/em>times in the records of the sensitive dataset. The synthetic records therefore cannot be used to isolate any actual group of individuals smaller than k<\/em>. In other words, we use synthetic data to generalize k<\/em>-anonymity<\/span><\/a> to all attributes of a dataset, not just a subset of attributes determined in advance to be identifying in combination. <\/p>\n\n\n\n
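To make this guarantee concrete, the following Python sketch checks it on small in-memory datasets. It is a minimal illustration under stated assumptions, not the Synthetic Data Showcase implementation: records are assumed to be dictionaries mapping attribute names to values, an empty string is assumed to mean a missing value, and a combination is assumed to appear in a sensitive record when that record contains all of its attribute-value pairs. The function names and toy records are hypothetical. Because every sub-combination of a synthetic record is contained in at least as many sensitive records as the full record, checking each full synthetic record is enough to cover all of its combinations.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical record type: attribute name -> value ('' assumed to mean missing).
Record = Dict[str, str]


def to_items(record: Record) -> FrozenSet[Tuple[str, str]]:
    # A record as its set of non-empty (attribute, value) pairs.
    return frozenset((a, v) for a, v in record.items() if v != '')


def support(combo: FrozenSet[Tuple[str, str]], sensitive: List[Record]) -> int:
    # Number of sensitive records containing every pair in the combination.
    return sum(1 for r in sensitive if combo.issubset(to_items(r)))


def satisfies_k_synthetic(synthetic: List[Record], sensitive: List[Record], k: int) -> bool:
    # Each sub-combination of a synthetic record is contained in at least as many
    # sensitive records as the full record, so checking full records suffices.
    return all(support(to_items(s), sensitive) >= k for s in synthetic)


sensitive = [
    {'age': '9-17', 'gender': 'F', 'exploitation': 'labour'},
    {'age': '9-17', 'gender': 'F', 'exploitation': 'labour'},
    {'age': '9-17', 'gender': 'M', 'exploitation': 'sexual'},
    {'age': '18-20', 'gender': 'F', 'exploitation': 'labour'},
]
safe = [{'age': '9-17', 'gender': 'F', 'exploitation': 'labour'}]
unsafe = [{'age': '9-17', 'gender': 'M', 'exploitation': 'sexual'}]
print(satisfies_k_synthetic(safe, sensitive, k=2))    # True
print(satisfies_k_synthetic(unsafe, sensitive, k=2))  # False
```

In this toy example, the unsafe synthetic record matches exactly one sensitive record, so releasing it could single out an individual and it fails the check for k = 2.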

Alongside the synthetic data, we also release aggregate data on all short combinations of attributes, both to validate the utility of the synthetic data and to retrieve actual counts (as a multiple of k<\/em>) for official reporting. Finally, we combine both anonymous datasets in an automatically generated Power BI report for interactive, visual, and accessible data exploration. The resulting evidence is at the level of correlation: across data attributes, as reflected by their joint counts, and across datasets, as reflected by the similarity of counts calculated over the sensitive and synthetic datasets.<\/p>\n\n\n\n
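The aggregate release and the comparison between sensitive and synthetic counts can be sketched in the same hypothetical style. This sketch assumes that short combinations are those of at most max_len attribute values (max_len is an illustrative parameter), that released counts are rounded down to a multiple of k so that combinations seen fewer than k times are withheld, and that utility is summarized as the mean absolute difference between synthetic and released counts. None of these choices should be read as the tool's documented behavior.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Record = Dict[str, str]
Combo = FrozenSet[Tuple[str, str]]


def short_combo_counts(records: List[Record], max_len: int) -> Counter:
    # Count every combination of 1..max_len non-empty (attribute, value) pairs.
    counts: Counter = Counter()
    for record in records:
        items = [(a, v) for a, v in record.items() if v != '']
        for length in range(1, max_len + 1):
            for combo in combinations(items, length):
                counts[frozenset(combo)] += 1
    return counts


def released_counts(sensitive: List[Record], max_len: int, k: int) -> Dict[Combo, int]:
    # Assumption: round each count down to a multiple of k; anything seen
    # fewer than k times rounds to 0 and is withheld from the release.
    rounded = {c: (n // k) * k for c, n in short_combo_counts(sensitive, max_len).items()}
    return {c: n for c, n in rounded.items() if n > 0}


def mean_abs_error(released: Dict[Combo, int], synthetic: List[Record], max_len: int) -> float:
    # Average gap between synthetic and released counts over released combinations.
    if not released:
        return 0.0
    synth = short_combo_counts(synthetic, max_len)  # missing combos count as 0
    return sum(abs(synth[c] - n) for c, n in released.items()) / len(released)


sensitive = [
    {'age': '9-17', 'gender': 'F'},
    {'age': '9-17', 'gender': 'F'},
    {'age': '9-17', 'gender': 'M'},
    {'age': '18-20', 'gender': 'F'},
]
synthetic = [
    {'age': '9-17', 'gender': 'F'},
    {'age': '9-17', 'gender': 'F'},
    {'age': '9-17', 'gender': ''},
]
released = released_counts(sensitive, max_len=2, k=2)
print(len(released))  # 3
print(round(mean_abs_error(released, synthetic, max_len=2), 3))  # 0.333
```

Rounding down in this sketch means a released count never overstates how many sensitive records share a combination; the actual tool may use different rounding, reporting lengths, and utility measures.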

\"Interactive<\/a>
In this example, we use Power BI to support privacy-preserving exploration of the anonymous datasets generated by our Synthetic Data Showcase tool. Having selected the records of victims aged 9\u201317, we can see the distributions of several additional attributes in these records: the year the victim was registered, gender, country of citizenship and exploitation, and type of labor or sexual exploitation. Power BI generates all of the counts in these distributions dynamically by filtering and aggregating records of the synthetic dataset. These “estimated” counts are compared on the right with “actual” counts precomputed over the sensitive data, showing that the synthetic dataset accurately captures the structure of the sensitive data for the selected age range. For these victims aged 9\u201317, the association with “typeOfLabourOther” indicates a potential need to expand the data schema to support more targeted policy design tackling forced child labor.<\/figcaption><\/figure>\n\n\n\n\n\n

Because we publish and collaborate on open software, we were able to work with IOM on creating a Synthetic Data Showcase for its full, de-identified victim database without ever accessing the highly sensitive data ourselves. Today, IOM has announced the resulting update<\/span><\/a> to the CTDC website, which shares data on more than three times as many victims as before. The update also includes several new data columns, with group-level privacy guarantees and utility that anyone can interactively verify.<\/p>\n\n\n\n

With the new Global Human Trafficking Synthetic Dataset, Synthetic Data Showcase has enabled IOM and CTDC to share data that couldn\u2019t otherwise be shared, helping address problems that couldn\u2019t otherwise be solved. In the following sections, we show how this dataset can be used to develop additional types of evidence to fight trafficking. <\/p>\n\n\n\n


IOM aims to share the new technique with counter-trafficking organizations worldwide as part of a wider program to improve the production of data and evidence on human trafficking. The program includes establishing new international standards and guidance, in partnership with the UN Office on Drugs and Crime, to support governments in producing high-quality administrative data, as well as developing a package of data standards and information management tools<\/span><\/a> for frontline counter-trafficking agencies.<\/p>\n\n\n\n\n\n

Further details of the TAT-CTDC-IOM-Microsoft collaboration: <\/p>\n\n\n\n