Revealing the Hidden Structure of Corruption

How do you solve a problem like corruption? In this Societal Resilience case study, learn about our development of data tools that could bring new levels of transparency to public procurement data, building on collaborations with the World Bank and Inter-American Development Bank and supporting the new Microsoft ACTS (Anti-Corruption Technologies and Solutions) initiative.

The annual cost of corruption is estimated at more than $2.6 trillion (opens in new tab), or 5% of global gross domestic product (GDP). It is a problem that disproportionately affects those living in extreme poverty, especially when they are forced to spend a sizeable proportion of their income on bribe payments just to access basic public services like healthcare and clean water. However, the problem isn’t just about the loss of income – it’s also about the loss of investment in public services when government funds are diverted by corrupt officials. Those in poverty end up paying more, and suffering more, for less. The resulting loss of faith in government and the rule of law can be devastating.

Corruption is commonly defined as the abuse of public office for private gain. But where do the potential gains come from? One major source is the staggering $10 trillion (opens in new tab) per year that governments award in contracts for public goods and services. The associated procurement process, in which bids are submitted by prospective suppliers and selected by government officials, represents an ideal opportunity for corrupt officials to award contracts based not on merit but on bribery, nepotism, cronyism, and other forms of self-interest.

While opportunistic acts of corruption are damaging, it is the systematic practice of corruption that erodes public trust and entrenches power in the hands of the few. The resulting power networks permeate both government and business, yet are all but invisible to the citizens who suffer under their influence. Building a more resilient society means addressing corruption as both a violation of fundamental human rights (opens in new tab) and the source of collective harm to societal development (opens in new tab).

A problem of relationships and relatedness

Corrupt actors can go to great lengths to conceal evidence of the relationships that represent the means, motive, and opportunity to engage in corruption.

In the case of grand corruption, it involves relationships between those in power (e.g., government officials) and those benefiting from the exercise of that power (e.g., favored suppliers awarded government contracts). In the case of collusion and cartel formation, it involves relationships between coordinating suppliers operating under the guise of competitive independence. In both cases, corrupt actors use a variety of methods to suppress evidence of the relationships that facilitate their corruption, including use of front persons, bid rigging, and opaque ownership structures (e.g., anonymous shell companies, trusts, or foundations).

However, in order to benefit from corruption, corrupt actors must act in observable ways – and it is from the combination of company registration data and observable patterns of activity that we can develop an overall measure of relatedness that allows us to reason about possibly suppressed relationships. For example, the coordinated activity required for bid rigging, whether through bid rotation or suppression, creates distinctive patterns of activity that take on added significance in the presence of other information connecting the bidding companies. This could be as little as a shared phone number, email address, or postal address, or even common use of a legal representative or document template.

Statistical indicators of relatedness do not guarantee a real-world relationship of course, just as a real-world relationship does not imply corruption. We must always be cautious about drawing false conclusions. However, no matter how we interpret close relatedness in any given context, tracking the existence of potential relationships is important for understanding how risk and influence could propagate. It is transparency into potential pathways, supported by evidence from source data, that could allow us to make the connections that would otherwise evade detection.

The need for a transparency engine

There is a broad consensus in the anti-corruption field that greater transparency can help reveal corruption in action, reducing the ability of corrupt actors to go undetected and unpunished. Open data is a key enabler of transparency, with initiatives like Open Contracting (opens in new tab), Open Corporates (opens in new tab), and Open Ownership (opens in new tab) helping to seed an open data ecosystem that can facilitate collective action, both by those in government as well as those holding governments accountable for their actions.

However, open data only creates transparency if it is actually used, and that requires tools that enable domain experts to view, explore, and make sense of the relevant data for themselves. While open data standards like the Open Contracting Data Standard (OCDS) (opens in new tab) are being adopted by governments around the world, OCDS tools (opens in new tab) remain at an early stage in their development with significant opportunity for innovation. In particular, the potential for new technology to transform anti-corruption activity can draw inspiration from the development of breakthrough technology in another related area – web search.

In the early days of the web, the ranking of web pages took no account of the global structure connecting those pages – it relied on local content matches. This is like matching a beneficial ownership query against a published dataset that is sparsely populated with a high proportion of “adversarial” content (deliberate omission, misrepresentation, misspelling, etc.). Google’s PageRank (opens in new tab) algorithm transformed the quality of search results by taking into account the global structure created by hyperlinks. With beneficial ownership and control, we similarly need to unlock the implicit value of the explicit structures found in open datasets.
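For intuition, the heart of PageRank is compact enough to sketch. A minimal power-iteration version, for illustration only (this is not Google’s production algorithm, and the damping value is just the conventional default):

```python
import numpy as np

def pagerank(adjacency: np.ndarray, damping: float = 0.85, iters: int = 100) -> np.ndarray:
    """Minimal PageRank sketch: adjacency[i, j] = 1 if page j links to page i."""
    n = adjacency.shape[0]
    out_degree = adjacency.sum(axis=0)
    out_degree[out_degree == 0] = 1  # avoid division by zero for dangling pages
    transition = adjacency / out_degree  # each column sums to (at most) 1
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):
        rank = (1 - damping) / n + damping * transition @ rank
    return rank
```

A page’s rank depends on the rank of the pages linking to it – the same global, recursive quality we want from a relatedness measure over company networks.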

Just like a search engine, we need an anti-corruption “transparency engine” that collects, integrates, and presents the right information for users to make the right decision – but it cannot make the decision for them. How a transparency engine derives and communicates evidence of relatedness is critical for human decision making around whether and how to respond to potentially material relationships.

Today, on the United Nations’ International Anti-Corruption Day 2021, Microsoft is announcing (opens in new tab) a new effort to develop the kind of transparency engine that could help to reveal the influence of common beneficial ownership and control, even in the absence of accurate ownership data. While the real work is only just beginning, it is informed by research explorations spanning multiple years and partners.

We now review this history as an introduction to our current proof-of-concept, before concluding with the announcement of two new tools, released on GitHub today, that further strengthen the ability of the community to engage in collective, evidence-based action in the fight against corruption.

Learning from domain and research partners

There is one group of organizations in a privileged position to observe corrupt networks in action, and more importantly, to cut off the finance that sustains them: the banks that lend governments the money to pay for their public projects.

In 2018, we partnered with the World Bank (opens in new tab) to explore new ways in which data visualization and AI could help to identify and mitigate corruption risks in the public procurement process. Our resulting proof-of-concept, presented at the biennial Anti-Corruption Collective Action Conference (opens in new tab) hosted by the Basel Institute on Governance, used Microsoft Power BI (opens in new tab) to create rich visual interfaces to both private datasets and “red flag” definitions provided by the World Bank.

Our World Bank collaboration also saw our first explorations towards quantifying relatedness, via various forms of spectral graph embedding closely related to the PageRank algorithm. Using both Adjacency and Laplacian Spectral Embedding (ASE and LSE), we were able to map the discrete graph structure of observed company relationships into a continuous vector space. These mappings preserve neighborhood structure while also accounting for the global structure of the graph, such that nearby companies in the graph tend to be placed at nearby points in the embedded space. This allows the distances between company vectors to define their relatedness, even in the absence of observed relationships.
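These embeddings are available today in the open-source graspologic package mentioned later in this post. A minimal sketch of the idea – the input relationship graph and the distance-based relatedness score here are our illustrative assumptions, not the exact production pipeline:

```python
import numpy as np
from scipy.spatial.distance import cdist
from graspologic.embed import AdjacencySpectralEmbed

# Symmetric 0/1 matrix of observed company relationships, e.g., shared owners,
# contact details, or co-bidding (hypothetical input file)
adjacency = np.load("company_relationships.npy")

ase = AdjacencySpectralEmbed(n_components=8)
positions = ase.fit_transform(adjacency)  # one vector per company

# Small pairwise distances suggest relatedness, even without an observed edge
relatedness_distances = cdist(positions, positions)
```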

Using this technique, we were able to answer one of the most challenging questions posed by the World Bank – how to reveal the implicit connections between colluding suppliers who never bid or win together precisely because they don’t want any documented evidence of their relationships.

In 2020, we used the same approach in a collaboration with the Inter-American Development Bank (IDB) looking specifically at open procurement data from Colombia. This was part of the new Microsoft ACTS (opens in new tab) (Anti-Corruption Technologies and Solutions) initiative launched on UN Anti-Corruption Day 2020 – continuing a longstanding collaboration with IDB (opens in new tab). This work, covered in three ACTS features (part 1 (opens in new tab), part 2 (opens in new tab), part 3 (opens in new tab)), saw us extend our use of spectral methods to the joint embedding of dynamic graphs that evolve over time. The approach, Omnibus embedding (opens in new tab), had previously been developed by our research collaborators at Johns Hopkins University and successfully applied by our team to a wide range of problems. In the case of our IDB collaboration, we used it to measure the change in the behavior of a company over time and detect anomalous patterns of activity (opens in new tab) in terms of contract awards.
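Omnibus embedding is also implemented in graspologic. A sketch of the change-detection idea – the per-quarter bid graphs and the use of simple displacement as an anomaly signal are illustrative assumptions on our part:

```python
import numpy as np
from graspologic.embed import OmnibusEmbed

# One co-bidding adjacency matrix per quarter, over the same set of companies
# (hypothetical input files)
quarters = [np.load(f"bids_q{i}.npy") for i in range(1, 5)]

omni = OmnibusEmbed(n_components=4)
positions = omni.fit_transform(quarters)  # shape: (n_quarters, n_companies, 4)

# A company's displacement between consecutive quarters measures behavioral
# change; unusually large jumps flag anomalous shifts in contracting activity
movement = np.linalg.norm(np.diff(positions, axis=0), axis=2)
```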

In our current work, we are collaborating with the University of Bristol on the application of new methods for the joint embedding of dynamic multipartite graphs that span all relevant parties of the procurement ecosystem and their linked attributes: buyer and supplier organizations; tenders and line items; company owners and contact details; and so on. The underlying method of Unfolded Spectral Embedding (USE) offers a principled statistical foundation for comparing behavior at different points in time, with provable stability guarantees (opens in new tab) that constant node behavior at any time results in a constant node position. The same method also allows for combining behavioral signals from all time periods into a single vector representation for each node, enabling state-of-the-art statistical inference (opens in new tab). These are precisely the qualities we need to establish a principled measure of relatedness, informed by manifold geometry (opens in new tab) in the embedded space, that accounts for all the different kinds of relationship that can be observed in real-world data.
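While USE is not yet available in graspologic (we are working to incorporate it, as noted below), its core computation is compact. A bare-bones sketch under our own simplifying assumptions (undirected, equally weighted snapshot graphs); the published method adds further statistical machinery:

```python
import numpy as np

def unfolded_spectral_embedding(adjacencies: list[np.ndarray], d: int):
    """Sketch of unfolded spectral embedding (USE) for a dynamic graph.

    Column-concatenates the per-period adjacency matrices and takes a truncated
    SVD; the right singular vectors yield one d-dimensional position per node
    per period, directly comparable across time.
    """
    n, T = adjacencies[0].shape[0], len(adjacencies)
    unfolded = np.hstack(adjacencies)              # shape: (n, n * T)
    u, s, vt = np.linalg.svd(unfolded, full_matrices=False)
    scale = np.sqrt(s[:d])
    anchor = u[:, :d] * scale                      # one static anchor per node
    dynamic = (vt[:d].T * scale).reshape(T, n, d)  # time-varying positions
    return anchor, dynamic
```

The stability guarantee described above then corresponds to a node that behaves identically in two periods receiving (up to noise) the same dynamic position in both.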

Transparency engine proof-of-concept

Our proof-of-concept solution uses dynamic multipartite graph embedding with USE to create vector-based representations of each company that incorporate information from its historic procurement activity, ownership structure, and other sources. The resulting “transparency engine” aims to detect otherwise undiscoverable sources of relatedness, then break them down into visual explanations that can be explored and evaluated in the context of Microsoft Power BI. The following sequence of images demonstrates our approach applied to open government data from Brazil.

Our experimental transparency engine has shown that much can be achieved with existing open datasets, even with imperfect information on company ownership. For many kinds of anti-corruption analysis, it isn’t necessary to know precisely who controls a given company – what matters is whether a given group of companies has sufficiently strong relatedness to suggest potential non-independence, prompting deeper investigation into possible coordination, collusion, or common beneficial control. So while establishing the ultimate beneficial owners of a company remains a significant challenge – and one that requires new global policies and reporting requirements for meaningful change – any future increase in the quality of ownership data will only improve the quality of inferred relationships.

Our embedding-based relatedness model also offers a principled foundation for the propagation of risk. We are continuing to work with our partners to understand how red flags (opens in new tab), typically defined at the level of individual entities, could diffuse through the embedded space to create a measure of relational risk exposure – the aggregate risk that an entity is exposed to via the overall structure of all entity relationships. The promise of this ongoing research is that it could automate much of the manual due diligence work that happens today – exploring all possible pathways by which an entity could be exposed to, or expose others to, corruption risk. This represents an even greater level of transparency into the procurement ecosystem as a whole, and a new kind of tool with which to detect, disrupt, and ultimately prevent corrupt activity.
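How best to propagate risk is precisely what this ongoing research aims to work out, but a toy prototype conveys the intuition. Here the relatedness weights, retention factor, and red-flag scores are all hypothetical:

```python
import numpy as np

def relational_risk_exposure(relatedness: np.ndarray, base_risk: np.ndarray,
                             retain: float = 0.5, iters: int = 50) -> np.ndarray:
    """Toy diffusion of entity-level red-flag risk over a relatedness graph.

    relatedness: (n, n) nonnegative weights, e.g., derived from embedding distances
    base_risk:   (n,) red-flag scores assigned to individual entities
    """
    row_sums = np.maximum(relatedness.sum(axis=1, keepdims=True), 1e-12)
    weights = relatedness / row_sums  # each entity's exposure weights sum to 1
    risk = base_risk.astype(float).copy()
    for _ in range(iters):
        risk = retain * base_risk + (1 - retain) * weights @ risk
    return risk
```

An entity with no red flags of its own can still accumulate exposure through strongly related, high-risk neighbors – the relational risk exposure described above.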

Open data tools for the fight against corruption

So how do you solve a problem like corruption? There is no easy answer, but openness and transparency provide a clear path forward.

In early 2022, we will release an open-source version of our transparency engine that can be adapted and deployed for real-world use. The core algorithms for ASE, LSE, and Omnibus embedding, among others, are already available in the Microsoft graspologic (opens in new tab) package for graph statistics in Python, and we are currently working to incorporate USE and related methods.

Today, on UN Anti-Corruption Day 2021, we are also releasing two new tools that further support real-world evidence development in the fight against corruption, building on our parallel efforts in the fight against human trafficking (MSR blog (opens in new tab), AI for Business blog (opens in new tab), TechRepublic (opens in new tab), TechCrunch (opens in new tab), GeekWire (opens in new tab)).

The first release is an update to our Synthetic Data Showcase (opens in new tab) tool for privacy-preserving data sharing that reimplements the core data synthesis and aggregation components in Rust. This enables compilation to WebAssembly for optimized execution in the browser, which in turn allows us to convert our previous command line tool into an interactive client-side web application with no data ever leaving the device. Users are thereby able to curate their data release by making column selections and transformations that control the dimensionality of the sensitive dataset. This process is itself informed by metrics describing the privacy and utility of the synthetic dataset, which is regenerated on demand in a real-time feedback loop until it meets the requirements for release.

In the context of corruption, Synthetic Data Showcase could help to generate new kinds of open data that describe actual instances of corruption risk (e.g., detected using OCDS red flag definitions (opens in new tab)), not just the systems of activity in which such risks may be identified (e.g., procurement data published using OCDS). By sharing the characteristics of detected risks – but not in a way that is linkable to any individual or company – the anti-corruption community can more easily share data that can be used for higher-level risk mapping and evidence development.

The second release is a public preview of our new ShowWhy (opens in new tab) tool designed to support the kind of causal evidence development that can inform public policy. Causation represents a higher standard of evidence than the kinds of associations and correlations discovered during exploratory data analysis, but making causal claims from real-world data – in contrast to data obtained through randomized controlled trials, experiments, or A/B tests – is challenging. Since observational datasets are inherently biased, models of the causal relationships affecting both the domain and the data collection process are necessary to correct for this bias.

ShowWhy aims to make the end-to-end process of causal inference accessible to domain experts, using the Microsoft Research DoWhy (opens in new tab) and EconML (opens in new tab) Python packages behind the scenes in an easy-to-use, no-code application. ShowWhy guides the user through the process of defining all the data variables, causal graphs, and effect estimators necessary to answer a causal question. The end result of this process is a collection of interactive summaries, describing both the process and results, that can be openly presented and defended to a range of audiences.
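ShowWhy itself is no-code, but for orientation, the underlying DoWhy workflow it automates looks roughly like the following. The dataset and column names are hypothetical, loosely inspired by the tender transparency question below:

```python
import pandas as pd
from dowhy import CausalModel

df = pd.read_csv("procurement.csv")  # hypothetical dataset

model = CausalModel(
    data=df,
    treatment="tender_published_openly",     # hypothetical binary treatment
    outcome="single_bidder",                 # hypothetical corruption-risk proxy
    common_causes=["sector", "buyer_size"],  # assumed confounders
)
estimand = model.identify_effect()           # derive the statistical estimand
estimate = model.estimate_effect(
    estimand, method_name="backdoor.propensity_score_matching"
)
print(estimate.value)                        # estimated causal effect
```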

ShowWhy could enable a significantly broader cross-section of the anti-corruption community to develop evidence about the causes and consequences of corruption, adding to existing work that shows, for example, that tender transparency reduces corruption risks (opens in new tab).

Open data policy and technology go hand-in-hand, but the current state of the art in anti-corruption tools is only scratching the surface of what is possible with modern approaches to data science and machine learning. And only by incorporating advances in visual analytics and HCI can we hope to build accessible data tools that realize the transparency promise of open data, for diverse users and audiences, spanning all spheres of government, business, and society.

Graph showing vector representations for comparing company behavior

Vector representations for comparing company behavior over a 3.5-year time period (projected down from the higher-dimensional embedded space for visualization purposes). The same company is shown in the same color, once per quarter, with changes in position representing changes in bidding activity (i.e., the buyers, tenders, and items associated with bids made by the company). Different companies with similar activity in any pair of time periods are assigned similar positions.

Scatterplots representing time-varying behavior

The same representation of time-varying behavior (right scatterplot) compared with all-time behavior (left scatterplot). Points in the left scatterplot represent a single vector-based representation of a company that integrates both time-varying bid activity and other company information (partners, address, email, phone numbers). Clusters of similar companies are shown in the same color.

Scatter chart of weakly related companies

Selecting three weakly related companies in the visual to the right and observing distant positions in both embeddings. The maximum and average relatedness measured between all pairs of companies is shown in the top right, with values of 0.17 and 0.11 (out of 1.00) confirming weak relatedness.

Scatter chart of strongly related companies

Selecting three strongly related companies in the visual to the right and observing similar positions in both embeddings. The maximum and average relatedness measured between all pairs of companies is shown in the top right, with values of 1.00 and 1.00 (out of 1.00) confirming strong relatedness.

Graph showing synchronous similarity

Showing an inferred edge (relationship) between companies that can be explained by “synchronous similarity” – similar behavior in the same time period, repeated across multiple time periods. Strongly synchronized behavior could come from the natural structure of competition in a given area, but it could also be an indicator of coordination, collusion, or common beneficial control.

Graph showing an inferred edge explained by asynchronous similarity

Showing an inferred edge (relationship) between companies that can be explained by “asynchronous similarity” – similar behavior in non-overlapping periods over time. Strongly asynchronized behavior could come from the natural seasonality of competition in a given area, but it could also be an indicator of the “same” company dropping out of the ecosystem and reentering under a new identity.

Graph showing synchronous similarity with a common contact

An example of synchronous similarity where the two companies of the selected edge share a substantial number of contact details, but no registered partners (owners). Shared activity (in orange; bottom and rightmost bars respectively) dominates the independent activity of each company. Common ultimate beneficial ownership and control is a distinct possibility, but this may still be legitimate depending on the context.

Graph showing asynchronous similarity with a common contact

An example of asynchronous similarity where the two companies of the selected edge share a single contact detail (e.g., address, email, or phone number), but no registered partners (owners). The two companies demonstrate alternating periods of bid activity with minimal overlap. Coordination over time is a distinct possibility, but this may still be legitimate depending on the context.

Graph showing asynchronous similarity

An example of asynchronous similarity where the two companies of the selected edge share a single contact detail (e.g., address, email, or phone number), but no registered partners (owners). Very few of the many shared items, buyers, and tenders occur in the same time period, and independent bid activity is almost completely separate in time. Common company identity under a different name is a distinct possibility, but this may still be legitimate depending on the context.

Graph showing clusters of related entities

Drilling down into the observed evidence for a given inferred relationship (right) in a given cluster (center), selected either directly or by searching for a given company. The user can inspect all elements of common and independent activity, including linked buyers, tenders, items, partners, contact details, and time periods. Together with information about the temporal patterns of activity exhibited by the two companies, the user can make their own judgement about the likelihood and significance of any potential real-world relationship, including whether it needs further investigation.

Case Study: Covid-19 Vaccine Eligibility Bot

As Covid-19 vaccine rollouts began in January 2021, it became clear that the phased eligibility approach in use across the United States was causing widespread confusion. Eligibility was a complex topic, with rules and qualifications differing state by state – and sometimes differing at the county and city level. In states like Arizona and Massachusetts, we observed eligibility differences from county to county. In Los Angeles and Chicago, eligibility criteria differed between the cities themselves and their surrounding counties. In North Dakota, the vaccine eligibility criteria even varied between pharmacies and vaccinating locations.

Because of the technical complexity of navigating this information space, there is a healthcare equity concern about the ability of different groups to access vaccination resources. Qualifying individuals in hard-hit communities are underserved compared to the well-connected and technically savvy. This is often due to a gap between the information that exists and the ability of people to access it – on their own terms, in a form they understand, and in their own language.

Our goal with this project was to create a conversational chatbot to streamline the determination of vaccine eligibility in the United States, aggregating policies across regions and policy updates over time. Users would interact with the bot via a simple workflow of Yes or No questions to determine their current eligibility and be directed towards authoritative sources for confirmation. We designed this bot to be accessible across a range of communication channels (e.g., web, SMS, and WhatsApp), with localizations available for communities that may otherwise be disadvantaged due to low levels of English proficiency.

Data management platform

A risk we identified right away was that inaccurate or out-of-date data could become a source of misinformation and confusion. We engaged with partners in the healthcare space – MITRE (opens in new tab) and The Fight Is In Us (opens in new tab) – to mitigate this risk, curating and auditing these criteria as they changed over the course of the pandemic.

One key decision we made in the early stages of this project was to develop it in the open. We wanted this to be an opportunity to build trust – not just with our partners and their stakeholders, but with the open-source developer community and potential hosts of adapted bots around the world. To achieve this goal, we used GitHub as the central repository of all information relating to vaccine eligibility in our target deployment context (i.e., the United States). To support rapid updates and adaptation to other contexts, we developed a simple data schema that could capture a hierarchy of regional policies using the familiar nested filesystem model.
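The resolution logic such a schema supports is simple: walk from the broadest region to the narrowest, letting each level override its parent. A sketch under an assumed layout (the actual repository schema and file names may differ):

```python
import json
from pathlib import Path

def resolve_policy(root: Path, regions: list[str]) -> dict:
    """Merge eligibility policies from country down to county, with deeper
    levels overriding broader ones.

    Assumed layout (illustrative only):
        policies/us/policy.json
        policies/us/az/policy.json
        policies/us/az/maricopa/policy.json
    """
    levels, current = [root], root
    for region in regions:
        current = current / region
        levels.append(current)
    policy: dict = {}
    for level in levels:
        policy_file = level / "policy.json"
        if policy_file.exists():
            policy.update(json.loads(policy_file.read_text()))
    return policy

# County-level rules override state rules, which override national rules
rules = resolve_policy(Path("policies/us"), ["az", "maricopa"])
```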

One of our goals with this system was to make it accessible to non-developers. GitHub is traditionally a platform for software engineering, and the partners using it would not necessarily be familiar with version control systems such as git, which were developed for source code. We therefore built a Data Management Portal that uses GitHub’s API to interact with the knowledge repository automatically, on behalf of the party deploying the bot. This approach gave us the ability to build on top of GitHub’s trusted infrastructure and workflows to provide data editing tools for a non-developer audience.

Bot deployment

We deployed the Covid-19 Vaccine Eligibility Bot to thefightisinus.org (opens in new tab) on March 25th, 2021. The Fight Is In Us represents a coalition of coalitions, themselves spanning corporations, NGOs, and community organizations that have been instrumental in the promotion of Covid-19 blood therapeutics – specifically convalescent plasma infusions. With the introduction of the Vaccine Eligibility Bot, alongside a complementary bot the coalition had developed for monoclonal antibody infusions, its scope grew from information about one particular blood therapeutic to a suite of bots covering a range of blood therapeutics as well as vaccinations.

Screenshot: The Fight Is In Us website

One of our other partners in this process was Bing, with the goal of integrating the bot into the Bing experience. As of launch, a link to the Vaccine Eligibility Bot is shown on both the Bing Covid dashboard and the Bing search engine results page.

Vaccine Eligibility on Bing

Bot future

By this time, the Federal government had taken steps to clarify the process of determining eligibility, directing states to make all adults in the United States eligible for vaccination by May 1st, 2021. This welcome simplification made it clear that the long-term utility of this bot would shift from the eligibility question of “Can I receive the vaccine?” to the access questions of “Where/how/when can I receive the vaccine?”

While the future is uncertain, we see near-term possibilities for the current bot that include international rules and deployments, rules for pediatric eligibility, and helping to clarify eligibility rules and recommendations for Covid-19 booster shots.

Beyond this particular bot, we anticipate that the patterns used here can be adapted for use in future crisis response and community outreach activities. Given our time constraints, we were not able to provide extensive localizations or deploy an SMS endpoint for the initial bot release. Given more lead-time, we could have reached a much broader population of underserved groups and individuals from the outset.

For now, we will continue building these capabilities as a reference implementation for a future class of “dynamic policy” bots – bots that provide a unified entry point to, and streamlined path through, a complex and ever-changing policy landscape. The open-source availability of such configurable and data-driven bots represents a comprehensive approach to policy coordination and communication, especially in times of crisis, addressing critical issues such as eligibility and access with speed, accuracy, and clarity. Most importantly, it shows a small but significant way in which we can promote more equitable distribution of essential resources, services, and support – for vaccines, healthcare, and beyond.

Key lessons and takeaways

We found great value in defining a public system for performing data management over geographically hierarchical policy rules. By leveraging known and proven techniques for managing information systems (e.g., git, pull requests, review cycles), we were able to avoid a large volume of work that would have been involved in developing a bespoke system. Organizations implementing data systems such as these should strongly consider similar approaches that leverage the transparency and workflow of platforms like GitHub.

Had we established the bot architecture and workflows prior to the outbreak of the Covid-19 pandemic, the entire suite of bots described in this case study would have been available much sooner, with greater potential to help those in need. To that end, we recommend that governments, NGOs, and the technology community develop their policy coordination and communication tools before the next crisis occurs – and consider bots as part of the solution. We believe that open-source data management systems, with open data, interfaces, and review processes, will play an important role in managing both the tail of the Covid-19 pandemic as well as future crises, whatever and whenever they may be.

Finally, we want to express our gratitude to our partner teams at MITRE, Bing, and GeneralUI for helping to make this bot a reality.

Resilience principles in action

  1. Tackle the causes and consequences of systemic vulnerability
    • The Covid-19 pandemic has exacerbated vulnerabilities related to the social determinants of health. Some of the groups at greatest risk from the virus are also the hardest to reach, with complex and evolving eligibility requirements representing a barrier that perpetuates healthcare inequity.
  2. Convene and participate in broad coalitions with a bias for action
  3. Contribute a “resilience toolbox” of open tools and technologies
  4. Understand the links between people, practices, and outcomes
    • We have observed the practice of decentralized policy communication creating adverse outcomes for groups who are unable to access, interpret, and act on the policy information that is relevant to them, i.e., to determine eligibility and schedule and receive a complete course of vaccination.
  5. Democratize expert workflows and real-world evidence development
    • Covid-19 is everyone’s concern, but there are not enough experts to answer everyone’s questions directly. Bots help to scale experts by absorbing the overhead of policy implementation. In many cases, the real-world actions taken by the users of bots contribute to real-world sources of evidence.
  6. Build trust through transparency and value-sensitive design
    • We are conservative about the information captured by the bot and communicate the rationale in a clear and concise privacy statement at the initiation of each conversation. The sequence of questions aims to reach an eligibility determination with the minimum number of simple Yes/No answers.
  7. Design for transformation at the mesoscale of activity
    • Our goal is not just to create a Vaccine Eligibility Bot for Covid-19 in the US, but to create a framework for use at other vaccine stages, in other countries, and for other pandemics and large-scale emergencies in general. We want to help bots become a standard part of the toolbox with which organizations coordinate and communicate information to the communities that they serve.

Case study: Tech Against Trafficking

Human trafficking is a crime against humanity and a violation of the most fundamental human rights. No attempt to build a more resilient society can ignore the systemic issues that allow trafficking to take hold, since the ongoing existence of modern slavery, servitude, and exploitation challenges the very foundations of what we seek to protect. Nor can our efforts to protect the most vulnerable people in times of crisis fail to acknowledge the role of crisis events in fueling the systematic exploitation of such groups:

“Human traffickers are almost always the first responders to crisis zones around the world. Whether an environmental catastrophe or a military conflict or a sudden economic collapse, whenever crisis strikes, women, children, the poor, and outcastes are disproportionately affected, exploited, and recruited by traffickers”
Siddharth Kara, Modern Slavery: A Global Perspective, 2017

To tackle human trafficking in both its chronic and acute forms, we must recognize that the fight against trafficking is itself a complex societal activity that requires resilience to succeed. It needs investment in the core “3 Ps (opens in new tab)” of anti-trafficking work – prevention, prosecution, and protection – supported by the critical and cross-cutting role of partnerships. Only through the proper coordination of partners spanning the public and private sector, coupled with the mutual evolution of technology and policy, can the human effort to combat trafficking be most effective.

It is within this context that Microsoft jointly founded (opens in new tab) the Tech Against Trafficking (opens in new tab) (TAT) coalition in 2018. Through our work on the TAT steering committee and across initiatives including an accelerator program, we have formed long-term partnerships and projects tackling this most urgent societal problem. With TAT, we recognized the fact that there is only so much that any one organization can achieve alone. The more that different organizations can share with one another, the better placed the community will be to deliver an effective and coordinated response.

However, the nature of trafficking complicates sharing in multiple ways. Different tools tackling the same problems are developed in different contexts, missing out on the benefits that come from shared tools and practices. This is compounded by the lack of shared information on the availability and effectiveness of existing tools, encouraging reinvention and fragmentation. Where data on the same phenomena are collected by different parties, the lack of data standards means that such data cannot be easily compared or combined. And perhaps most importantly, where collected data relate to individual victims, the need to protect the privacy and safety of these victims means that the data cannot be shared or published without expert analysis of the statistical disclosure risk. All of these factors undermine the timely sharing of information within the community, and shaped our selection of TAT initiatives that could transform the development of evidence-based policy for anti-trafficking efforts worldwide.

Interactive map of anti-trafficking tools

The first such initiative led to the development of the TAT Interactive Map (opens in new tab) in Power BI (opens in new tab), encompassing 300 counter-trafficking technology tools and designed to enable tool discovery, gap identification, and technology advocacy. The interactive map and corresponding tool survey have been jointly published (opens in new tab) with the Organization for Security and Co-operation in Europe (opens in new tab) (OSCE) and shared via testimony (opens in new tab) to a US Congressional Hearing (opens in new tab) on the role of technology in countering trafficking in persons.

Tech Against Trafficking Interactive Map of Anti-Trafficking Tools

Tech Against Trafficking Interactive Map of Anti-Trafficking Tools. Shows how 111 tools in Tool Category “Victim/Trafficker Identification” (shown as orange nodes in the graph and cards below) form clusters based on their Trafficking Type and Technology. Cards may be expanded for further details, including links to external content. Different pages of the map show tool clusterings for different pairs of attributes (e.g., Target Sector – Target Users).

The second such initiative was developed through the inaugural TAT Accelerator Program (opens in new tab) with the Counter-Trafficking Data Collaborative (opens in new tab) (CTDC). This collaborative, managed by the UN Migration agency, IOM, pools data from IOM (opens in new tab), Polaris (opens in new tab), and Liberty Shared (opens in new tab) to create the world’s largest database on identified victims of trafficking. The CTDC data hub makes derivatives of this data openly available via anonymized datasets, interactive dashboards, interactive maps, and topical data narratives.

However, while such data is essential evidence as to the existence and nature of trafficking, publishing it directly risks revealing the presence of individual victims to their traffickers, even if the data are de-identified and anonymized using standard approaches. This is not just a privacy risk, but also a safety risk: if traffickers believe a victim has received assistance, they may assume that this implies collaboration with law enforcement and hence the need for retaliation against the victim, or their friends, family, or community.

Privacy-preserving data platform

Our solution to this problem, described in this paper (opens in new tab) and available open-source on GitHub at microsoft/synthetic-data-showcase (opens in new tab), is to generate synthetic data and aggregate data for sharing in place of actual sensitive data. These datasets are brought together in an automatically generated Power BI (opens in new tab) interface that enables data access and exploration under an absolute guarantee of group-level privacy. In other words, no user of this interface or the underlying datasets should be able to identify combinations of attributes that describe groups of individuals below a specified minimum size.

chart: CTDC global dataset on victims of trafficking

An example “synthetic data showcase” for the CTDC global dataset on identified victims of trafficking. Shows the distributions of attribute values in the synthetic data for the selections Gender=Male and Age=30–38. The chart to the right compares these estimated values (in light green) to the actual precomputed values (in dark green), revealing only minor differences between the two datasets.

Such group-level protection is especially important in contexts where the safety of data subjects is at risk. For example, imagine that a trafficker has exclusive control of a trafficking route between two countries, and has full knowledge of all the victims they have trafficked on this route. It would be easy for them to use this background knowledge to identify any of their victims in a sensitive dataset if it were to be published directly. Even if only the count of victims for that route were published, the closer the published count is to the trafficker’s known number of victims, the stronger the inference the trafficker could make that each of their victims was present in the dataset.

However, if we only report combinations of attributes that are common in the sensitive dataset, and only disclose their approximate frequency, we restrict the extent to which a trafficker could make inferences about the presence of any given individual: smaller groups cannot be detected at all, larger groups offer “safety in numbers”, and differences in approximate counts over time cannot be used to detect small or precise changes.

In general, as the sizes of detected groups increase, the likelihood that adversaries have complete and exclusive knowledge of all group members diminishes. The policy challenge is therefore to set the minimum detectable group size above the level at which an adversary could have complete knowledge about, or take retaliatory action against, all individual members of such a group. The technical challenge is creating an anonymization method that enables such group-level disclosure control.

Our algorithm for generating synthetic data achieves such control through what we call k-synthetic anonymity: the records of the synthetic dataset are constructed such that all combinations of attribute values are common in the sensitive dataset – appearing k or more times – and therefore cannot be linked with groups of individuals below the specified privacy resolution k. Aggregate counts of records for all combinations of attributes up to a specified length are also precomputed and rounded down to the closest k to ensure that groups of individuals smaller than k cannot be identified in either of the two datasets. To ensure that the synthetic data preserves the high-level statistics of the sensitive dataset, the counts of individual attributes in the records of the synthetic dataset are also controlled such that they are the closest multiple of k not exceeding the actual aggregate value. The structure of the synthetic data, in terms of the distribution of attribute combinations, also preserves the structure of the sensitive data to the extent that is possible while respecting the absolute privacy guarantee of a minimum detectable group size k.
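A toy sketch of the aggregate side of this scheme can make the mechanics concrete. The released tool implements the full synthesis pipeline with many more safeguards, so treat this purely as an illustration of minimum-count filtering and rounding:

```python
from collections import Counter
from itertools import combinations

import pandas as pd

def protected_aggregates(df: pd.DataFrame, k: int, max_len: int = 3) -> dict:
    """Toy illustration of k-synthetic anonymity's aggregate counts: keep only
    attribute combinations appearing k or more times, and round surviving
    counts down to the nearest multiple of k."""
    counts: Counter = Counter()
    for row in df.to_dict("records"):
        attributes = sorted((col, val) for col, val in row.items() if pd.notna(val))
        for length in range(1, max_len + 1):
            for combo in combinations(attributes, length):
                counts[combo] += 1
    return {combo: (count // k) * k for combo, count in counts.items() if count >= k}
```

With k = 10, for example, a route reported for 14 victims is published with a count of 10, and a route with 7 victims is not published at all – the “safety in numbers” behavior described above.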

Outcomes

This solution is currently being adopted by IOM to transform how victim data is made available to the counter-trafficking community via the CTDC data hub and other channels. The open-source microsoft/synthetic-data-showcase (opens in new tab) solution for privacy-preserving data publishing is complemented by a second outcome from the first TAT accelerator – the Human Trafficking Case Data Standard – also available on GitHub at UNMigration/HTCDS (opens in new tab).

Both projects have been incorporated by IOM, in collaboration with the United Nations Office on Drugs and Crime (UNODC (opens in new tab)), into a manual providing guidance to government agencies, law enforcement, and other stakeholders on the collection, storage, management, and sharing of human trafficking administrative data. An expert and stakeholder workshop in May 2021 provided final validation of the materials, supporting the international launch of training and capacity-building activities. The goal is to facilitate the production of consistent, high-quality data that will strengthen the shared evidence base for anti-trafficking policy around the world.

Our work with Tech Against Trafficking, CTDC, and IOM is ongoing, as are collaborations with other organizations and stakeholders in the anti-trafficking community. These efforts build on and complement a wide range of additional activities and partnerships that Microsoft participates in to combat forced labor in its supply chain, and tackle the issues of human trafficking, exploitation, and child abuse more broadly. For more information, see Microsoft’s Modern Slavery and Human Trafficking Statement (opens in new tab).

The need for safe representation of at-risk populations is not limited to victims of human trafficking, and can support assistance and advocacy activities for any marginalized, exploited, or otherwise vulnerable populations. Similarly, in times of crisis we need tools not just for publishing anonymous datasets, but for providing privacy-preserving views onto private databases – potentially operated by distinct parties who cannot pool their data for privacy, trust, or regulatory reasons. Supporting federated and accessible forms of data exploration, analytics, machine learning, and causal inference is essential for real-world evidence development at scale, and forms the basis of ongoing work to be presented in future case studies.

Resilience principles in action

  1. Tackle the causes and consequences of systemic vulnerability
    • Human trafficking is both a cause and consequence of vulnerability, with substantial harm.
  2. Convene and participate in broad coalitions with a bias for action
    • Microsoft is a founding and active member of the Tech Against Trafficking (TAT) coalition.
  3. Contribute a “resilience toolbox” of open tools and technologies
  4. Understand the links between people, practices, and outcomes
    • Both tools are in support of evidence-based practice towards better real-world outcomes.
  5. Democratize expert workflows and real-world evidence development
    • Interactive dashboards enable rich, code-free analysis of real-world data by non–data-scientists.
  6. Build trust through transparency and value-sensitive design
    • We fill collaboration gaps for both data and tools through interfaces that promote transparency.
  7. Design for transformation at the mesoscale of activity
    • We focus on making data and tools relevant and targeted to policymakers at the national level.

Case study: Mapping Organizational Resilience

“When I think about digital transformation now, I break it into two things. I think about resilience and what Microsoft can do to help any business be more resilient”
Satya Nadella (opens in new tab), Microsoft Q4 2020 Earnings Call

The concept of organizational resilience can be used to explain why some organizations succeed while others fail in times of change. Every organization has limits beyond which it ceases to operate effectively.  When emerging crises push organizations beyond these limits, there is immense pressure to adapt to the new environment – and quickly. Organizations that fail to adapt may no longer be able to achieve their purpose, return a profit, or justify their continued existence.

Business closures at the onset of the Covid-19 pandemic are a stark illustration of how many organizations operate at the limits of their capacity, even in normal times. Under sustained pressure, even the best organizations may be pushed to their limits. How can we help organizations to understand their capacity for resilience before it is needed, and how can we help them to increase that capacity before it is too late?

One important step is digital transformation. All the forces driving this cross-industry process have been multiplied by the pandemic, and Microsoft has published a playbook (opens in new tab) for accelerating the kind of digital transformation that facilitates business resilience (opens in new tab). But what does a “digitally transformed” organization look like, how does it operate over time, and how can we identify behaviors that represent both resilience and rigidity in the face of threats?

Organizational network analysis

For several years, we have been working with the Microsoft Workplace Analytics (WPA) team to explore how advances in graph statistics could be applied to Organizational Network Analysis (ONA), as well as how such network analysis could reveal insights about how work actually gets done in practice. This can be quite different to how work is organized “in theory” as represented by formal org charts, and the degree of alignment can vary dramatically both across different parts of the organization and over time.

Our initial explorations focused on the use of network visualization to map out the structure of organizations based on pairwise collaboration between individuals, inferred from shared activity across Outlook and Teams (e.g., email or message exchange; shared meetings or channels). We were interested in the identification of workgroups that reveal the organic, emergent, and informal organization of work. Using network community detection algorithms to infer the hierarchical structure of these collaboration networks, we were able to provide a contrasting model of organization to the hierarchical reporting structures of org charts.
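This kind of analysis can be reproduced with the open-source graspologic library discussed below; whether this exact algorithm powered our pipeline is our assumption, and the input edge list and cluster-size cap here are illustrative:

```python
import networkx as nx
from graspologic.partition import hierarchical_leiden

# Weighted collaboration graph: nodes are (pseudonymized) people, edge weights
# reflect shared email, meeting, and channel activity (hypothetical input)
graph = nx.read_weighted_edgelist("collaboration_edges.txt")

# Hierarchical community detection: large communities are recursively split,
# yielding a nested structure of workgroups
clusters = hierarchical_leiden(graph, max_cluster_size=50)
workgroups = {entry.node: entry.cluster for entry in clusters if entry.is_final_cluster}
```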

Societal resilience workgroups map

These workgroup maps gave us something tangible to share with company executives, offering new perspectives on their own organizations. They allowed vague concepts like “siloed” and “stretched” to become self-evident in the visual structure of workgroups, but also raised several important questions: How well do workgroups align with reporting hierarchies? How much do workgroups vary their patterns of collaboration over time? And how can we measure and visualize these qualities as a way to understand the culture of collaboration across an organization?

Workgroup mapping

Our workgroup mapping (opens in new tab) paper describes how we answered these questions using two new metrics, representing (1) the “freedom” of workgroups to collaborate across organizational boundaries, and (2) the “fluidity” of workgroup relationships over time. To date, we have used these metrics (and an accompanying presentation-generation pipeline) to share collaboration insights with the leaders of hundreds of major companies who have provided their tenant data for this purpose.

When the Covid-19 pandemic struck, we turned our attention to how such collaboration metrics might be used as a barometer of organizational health over time – revealing how organizations responded to the unprecedented shock and its enduring consequences for the practice of work. For those organizations who were able to transition to home working, the signals from communication and collaboration software would, for the first time, capture all interactions in their new, virtual, and “digitally transformed” workplaces. What could this tell us about the response of our own organization, our customer organizations, and the industries that we serve? And what do different levels of collaboration metrics across workgroups tell us about the resilience of the broader organization?

Organizational resilience

Our toward resilience (opens in new tab) and uncovering resilience (opens in new tab) articles describe our process of deriving new collaboration metrics that would allow us to understand the impact of the pandemic on our own organization, Microsoft. For example, we define “churn” as the proportion of collaborative relationships from one period that are lost in the following period. If we align these periods on either side of a disruptive event (such as a pandemic passing its tipping point), we might expect more resilient organizations to experience less churn. Indeed, this is what we observed in general for Microsoft – but that’s not the whole story.
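The definition fits in a few lines. A sketch over unweighted relationship sets – the production metric presumably operates on richer, weighted collaboration graphs, which is an assumption on our part:

```python
def churn(previous: set[frozenset], current: set[frozenset]) -> float:
    """Proportion of collaborative relationships in one period that are
    lost in the following period (edges as unordered pairs of people)."""
    if not previous:
        return 0.0
    return len(previous - current) / len(previous)

# Example: one of three pre-shock relationships disappears -> churn = 1/3
before = {frozenset({"a", "b"}), frozenset({"a", "c"}), frozenset({"b", "c"})}
after = {frozenset({"a", "b"}), frozenset({"b", "c"}), frozenset({"b", "d"})}
assert abs(churn(before, after) - 1 / 3) < 1e-9
```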

organizational resilience 'churn' map

Most employees retained their existing collaborations and even expanded their networks beyond their customary workgroups. For the strategic and operational “control center” of the organization, however, the degree of churn was dramatic. This makes sense – the charter of such groups is to absorb external shocks and chart a course towards recovery while allowing the core engines of productivity to proceed uninterrupted. Such focused churn is a sign of dedicated crisis response, and when combined with the net growth of individual networks, is a clear marker of organizational resilience. In contrast, a “panic response” would manifest as large-scale churn across the organization as connections and work are dropped, while “threat rigidity” would show as individual networks growing smaller and more stable as people double down on what and who they already know.

Building on this single-company view, we expanded our analysis to examine the effects of the Covid-19 pandemic across organizations and industries. The latest Microsoft Work Trends Index (opens in new tab) describes how one such analysis incorporated 122 billion email interactions and 2.3 billion meeting interactions across industries and countries, showing an overall increase in the “siloed” nature of workgroups with strong internal relationships and clear group boundaries (the “modularity” metric). In this preprint (opens in new tab) describing a related collaboration with the University of Washington and Johns Hopkins University, we present the analysis of 360 billion email interactions across more than 4,000 organizations. The main finding here is that the workgroup silos emerging in response to the pandemic have a different membership structure than in pre-pandemic times – and that this new structure has persisted over time.

In other words, we have observed a mass adaptation in the organization of work towards smaller, more defined, and fundamentally different workgroups than existed before the pandemic. While such adaptation suggests a resilient response, whether the resulting structures help to drive resilient recovery is an open and active research question.

Resilience principles in action

  1. Tackle the causes and consequences of systemic vulnerability
    • The pandemic-induced shift to remote working has created an always-on culture that puts the health of both employees and businesses at risk, since people and organizations operating closest to their limits are the least resilient to future shocks. Workgroup maps provide a unified, unit-level view of how the structure and demands of work activities are evolving over time.
  2. Convene and participate in broad coalitions with a bias for action
    • Within Microsoft, we are developing our organizational resilience capabilities in collaboration with Workplace Analytics, Office, and others. Externally, we are collaborating with Johns Hopkins University on “Organizational Dynamics to Enable Post-Pandemic Return to Work” through a funded study on pandemic preparedness (opens in new tab) and the University of Washington Foster School of Business on the implications of such organizational dynamics for business management.
  3. Contribute a “resilience toolbox” of open tools and technologies
    • We have released the graph statistics and visualization capabilities behind our workgroup maps via the open-source Graspologic (opens in new tab) library. This library is the result of merging our previous topologic package with the GraSPy package developed by our collaborators at JHU, creating a unified source of advanced graph algorithms for the Python community.
  4. Understand the links between people, practices, and outcomes
    • This work focuses on capturing and modelling the people networks implicit in organizational collaboration logs. How these networks partition into communities – and how these communities can be characterized by various graph statistics and network metrics – yields high-level insights into the practice of work that can be compared against known work outcomes and other covariates.
  5. Democratize expert workflows and real-world evidence development
    • Our workgroup mapping pipeline automates collaboration log analysis and presentation-building workflows in ways that democratize expert workflows from two distinct areas of expertise. The resulting presentation artefacts, together with the outputs of our large-scale analyses, represent new sources of real-world evidence that can be used to inform and evaluate organizational policy.
  6. Build trust through transparency and value-sensitive design
    • Any use of organizational network analysis must respect the privacy of individual employees represented in unit-level visualizations such as network layouts (where each node is one member of the organization). Our approach is to select metrics that are neither inherently good nor bad, but which represent desirable variation in practice, and to calculate and communicate these at a level that aggregates the contributions of many individuals (e.g., the sub-organization or workgroup level).
  7. Design for transformation at the mesoscale of activity
    • Organizations are a key entity at the mesoscale of society. Revealing the structure and dynamics of organizational activities in a way that helps shape positive organizational policies is an efficient way of reaching and benefiting many individuals. Even within organizations, we focus on the mesoscale of communities or workgroups as the implicit and underrecognized units of organization against which policy interventions should be planned, implemented, evaluated, and revised.
