Human trafficking is a crime against humanity and a violation of the most fundamental human rights. No attempt to build a more resilient society can ignore the systemic issues that allow trafficking to take hold, since the ongoing existence of modern slavery, servitude, and exploitation challenges the very foundations of what we seek to protect. Nor can our efforts to protect the most vulnerable people in times of crisis fail to acknowledge the role of crisis events in fueling the systematic exploitation of such groups:
“Human traffickers are almost always the first responders to crisis zones around the world. Whether an environmental catastrophe or a military conflict or a sudden economic collapse, whenever crisis strikes, women, children, the poor, and outcastes are disproportionately affected, exploited, and recruited by traffickers”
—Siddharth Kara, Modern Slavery: A Global Perspective, 2017
To tackle human trafficking in both its chronic and acute forms, we must recognize that the fight against trafficking is itself a complex societal activity that requires resilience to succeed. It needs investment in the core “3 Ps” of anti-trafficking work – prevention, prosecution, and protection – supported by the critical and cross-cutting role of partnerships. Only through the proper coordination of partners spanning the public and private sectors, coupled with the mutual evolution of technology and policy, can the human effort to combat trafficking be most effective.
It is within this context that Microsoft jointly founded the Tech Against Trafficking (TAT) coalition in 2018. Through our work on the TAT steering committee and across initiatives including an accelerator program, we have formed long-term partnerships and projects tackling this most urgent societal problem. With TAT, we recognized that there is only so much that any one organization can achieve alone. The more that different organizations can share with one another, the better placed the community will be to deliver an effective and coordinated response.
However, the nature of trafficking complicates sharing in multiple ways. Different tools tackling the same problems are developed in different contexts, missing out on the benefits that come from shared tools and practices. This is compounded by the lack of shared information on the availability and effectiveness of existing tools, encouraging reinvention and fragmentation. Where data on the same phenomena are collected by different parties, the lack of data standards means that such data cannot be easily compared or combined. And perhaps most importantly, where collected data relate to individual victims, the need to protect the privacy and safety of these victims means that the data cannot be shared or published without expert analysis of the statistical disclosure risk. All of these factors undermine the timely sharing of information within the community, and shaped our selection of TAT initiatives that could transform the development of evidence-based policy for anti-trafficking efforts worldwide.
Interactive map of anti-trafficking tools
The first such initiative led to the development of the TAT Interactive Map in Power BI, encompassing 300 counter-trafficking technology tools and designed to enable tool discovery, gap identification, and technology advocacy. The interactive map and corresponding tool survey have been jointly published with the Organization for Security and Co-operation in Europe (OSCE) and shared via testimony to a US Congressional Hearing on the role of technology in countering trafficking in persons.
The second such initiative was developed through the inaugural TAT Accelerator Program with the Counter-Trafficking Data Collaborative (CTDC). This collaborative, managed by the UN Migration agency, IOM, pools data from IOM, Polaris, and Liberty Shared to create the world’s largest database on identified victims of trafficking. The CTDC data hub makes derivatives of this data openly available via anonymized datasets, interactive dashboards, interactive maps, and topical data narratives.
However, while such data are essential evidence as to the existence and nature of trafficking, publishing them directly risks revealing the presence of individual victims to their traffickers, even if the data are de-identified and anonymized using standard approaches. This is not just a privacy risk, but also a safety risk: if traffickers believe a victim has received assistance, they may assume that this implies collaboration with law enforcement and hence the need for retaliation against the victim, or their friends, family, or community.
Privacy-preserving data platform
Our solution to this problem, described in this paper and available open-source on GitHub at microsoft/synthetic-data-showcase, is to generate synthetic data and aggregate data for sharing in place of actual sensitive data. These datasets are brought together in an automatically generated Power BI interface that enables data access and exploration under an absolute guarantee of group-level privacy. In other words, no user of this interface or the underlying datasets should be able to identify combinations of attributes that describe groups of individuals below a specified minimum size.
Such group-level protection is especially important in contexts where the safety of data subjects is at risk. For example, imagine that a trafficker has exclusive control of a trafficking route between two countries, and has full knowledge of all the victims they have trafficked on this route. It would be easy for them to use this background knowledge to identify any of their victims in a sensitive dataset if it were to be published directly. Even if only the count of victims for that route were published, the closer the published count to the trafficker’s known number of victims, the stronger the inference the trafficker could make that each of their victims was present in the dataset.
However, if we only report combinations of attributes that are common in the sensitive dataset, and only disclose their approximate frequency, we restrict the extent to which a trafficker could make inferences about the presence of any given individual: smaller groups cannot be detected at all, larger groups offer “safety in numbers”, and differences in approximate counts over time cannot be used to detect small or precise changes.
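The effect of reporting only approximate frequencies can be illustrated with a small sketch. It assumes that counts below a minimum group size k are suppressed and that larger counts are rounded down to a multiple of k. The function name `published_count`, the value of `K`, and the monthly figures are all hypothetical and chosen purely for illustration.

```python
def published_count(actual: int, k: int) -> int:
    """Return the count as it would be released.

    Assumption for this sketch: groups smaller than k are suppressed
    (reported as 0) and larger counts are rounded down to a multiple of k.
    """
    if actual < k:
        return 0
    return (actual // k) * k


K = 10  # hypothetical minimum detectable group size

# Hypothetical true counts for one attribute combination over five periods.
monthly_actual = [7, 12, 14, 19, 23]
monthly_published = [published_count(n, K) for n in monthly_actual]

print(monthly_published)  # [0, 10, 10, 10, 20]
```

In this toy series the group of 7 is not reported at all, and the changes from 12 to 14 to 19 are invisible in the published values. The only visible change is the step from 10 to 20, which shows that the count crossed a multiple of k but not by how much it changed.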
In general, as the sizes of detected groups increase, the likelihood that adversaries have complete and exclusive knowledge of all group members diminishes. The policy challenge is therefore to set the minimum detectable group size above the level at which an adversary could have complete knowledge about, or take retaliatory action against, all individual members of such a group. The technical challenge is creating an anonymization method that enables such group-level disclosure control.
Our algorithm for generating synthetic data achieves such control through what we call k-synthetic anonymity: the records of the synthetic dataset are constructed such that all combinations of attribute values are common in the sensitive dataset – appearing k or more times – and therefore cannot be linked with groups of individuals below the specified privacy resolution k. Aggregate counts of records for all combinations of attributes up to a specified length are also precomputed and rounded down to the nearest multiple of k to ensure that groups of individuals smaller than k cannot be identified in either of the two datasets. To ensure that the synthetic data preserves the high-level statistics of the sensitive dataset, the counts of individual attributes in the records of the synthetic dataset are also controlled such that they are the closest multiple of k not exceeding the actual aggregate value. The structure of the synthetic data, in terms of the distribution of attribute combinations, also preserves the structure of the sensitive data to the extent that is possible while respecting the absolute privacy guarantee of a minimum detectable group size k.
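The two properties described above can be sketched in a few lines of Python. This is a simplified illustration and not the code from microsoft/synthetic-data-showcase: the function names, the attribute names (`region`, `age_group`), and the toy records are all hypothetical. It does not generate synthetic data. It only shows how aggregate counts could be thresholded and rounded, and how a candidate synthetic dataset could be checked against the k-synthetic anonymity condition.

```python
from collections import Counter
from itertools import combinations


def aggregate_counts(records, max_len):
    """Count how many records contain each combination of
    (attribute, value) pairs, for combinations of up to max_len attributes."""
    counts = Counter()
    for record in records:
        items = sorted(record.items())
        for n in range(1, max_len + 1):
            for combo in combinations(items, n):
                counts[combo] += 1
    return counts


def reportable_aggregates(records, k, max_len):
    """Drop combinations seen fewer than k times and round the remaining
    counts down to the nearest multiple of k."""
    counts = aggregate_counts(records, max_len)
    return {combo: (n // k) * k for combo, n in counts.items() if n >= k}


def is_k_synthetic(synthetic, sensitive, k):
    """Check that every synthetic record matches at least k sensitive records.

    If the full set of attribute values in a synthetic record appears in k or
    more sensitive records, every smaller combination drawn from that record
    does too, so checking the full record is sufficient.
    """
    for record in synthetic:
        items = set(record.items())
        support = sum(1 for s in sensitive if items <= set(s.items()))
        if support < k:
            return False
    return True


# Hypothetical toy data: 4 + 3 + 2 records.
sensitive = (
    [{"region": "A", "age_group": "18-29"}] * 4
    + [{"region": "A", "age_group": "30-39"}] * 3
    + [{"region": "B", "age_group": "18-29"}] * 2
)

aggregates = reportable_aggregates(sensitive, k=3, max_len=2)
safe = is_k_synthetic([{"region": "A", "age_group": "18-29"}], sensitive, k=3)
unsafe = is_k_synthetic([{"region": "B", "age_group": "18-29"}], sensitive, k=3)
```

With k = 3, region A (7 records) is reported as 6, while region B (2 records) does not appear in `aggregates` at all. A synthetic record combining region A with age group 18-29 passes the check because four sensitive records share that combination, whereas one combining region B with 18-29 fails because only two do. Enumerating every combination grows quickly with the number of attributes, which is one reason to cap the combination length as the text describes.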
Outcomes
This solution is currently being adopted by IOM to transform how victim data is made available to the counter-trafficking community via the CTDC data hub and other channels. The open-source microsoft/synthetic-data-showcase solution for privacy-preserving data publishing is complemented by a second outcome from the first TAT accelerator – the Human Trafficking Case Data Standard – also available on GitHub at UNMigration/HTCDS.
Both projects have been incorporated by IOM, in collaboration with the United Nations Office on Drugs and Crime (UNODC), into a manual providing guidance to government agencies, law enforcement, and other stakeholders on the collection, storage, management, and sharing of human trafficking administrative data. An expert and stakeholder workshop in May 2021 provided final validation of the materials, supporting the international launch of training and capacity-building activities. The goal is to facilitate the production of consistent, high-quality data that will strengthen the shared evidence base for anti-trafficking policy around the world.
Our work with Tech Against Trafficking, CTDC, and IOM is ongoing, as are collaborations with other organizations and stakeholders in the anti-trafficking community. These efforts build on and complement a wide range of additional activities and partnerships that Microsoft participates in to combat forced labor in its supply chain, and tackle the issues of human trafficking, exploitation, and child abuse more broadly. For more information, see Microsoft’s Modern Slavery and Human Trafficking Statement.
The need for safe representation of at-risk populations is not limited to victims of human trafficking, and can support assistance and advocacy activities for any marginalized, exploited, or otherwise vulnerable populations. Similarly, in times of crisis we need tools not just for publishing anonymous datasets, but for providing privacy-preserving views onto private databases – potentially operated by distinct parties who cannot pool their data for privacy, trust, or regulatory reasons. Supporting federated and accessible forms of data exploration, analytics, machine learning, and causal inference is essential for real-world evidence development at scale, and forms the basis of ongoing work to be presented in future case studies.
Resilience principles in action
- Tackle the causes and consequences of systemic vulnerability
  - Human trafficking is both a cause and consequence of vulnerability, with substantial harm.
- Convene and participate in broad coalitions with a bias for action
  - Microsoft is a founding and active member of the Tech Against Trafficking (TAT) coalition.
- Contribute a “resilience toolbox” of open tools and technologies
  - Synthetic data showcase is an open-source pipeline for privacy-preserving data sharing and analysis, while the TAT Interactive Map of anti-trafficking tools is openly accessible to all.
- Understand the links between people, practices, and outcomes
  - Both tools are in support of evidence-based practice towards better real-world outcomes.
- Democratize expert workflows and real-world evidence development
  - Interactive dashboards enable rich, code-free analysis of real-world data by non–data-scientists.
- Build trust through transparency and value-sensitive design
  - We fill collaboration gaps for both data and tools through interfaces that promote transparency.
- Design for transformation at the mesoscale of activity
  - We focus on making data and tools relevant and targeted to policymakers at the national level.