Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries
- Alexandra Olteanu
- Carlos Castillo
- Fernando Diaz
- Emre Kiciman
Social data in digital form, which includes user-generated content, expressed or implicit relationships between people, and behavioral traces, are at the core of many popular applications and platforms, and drive the agendas of many researchers. The promises of social data are many, including understanding “what the world thinks” about a social issue, brand, product, celebrity, or other entity, as well as enabling better decision making in a variety of fields including public policy, healthcare, and economics. Many academics and practitioners have warned against the naïve usage of social data. Biases and inaccuracies occur at the source of the data, and are also introduced during processing. There are methodological limitations and pitfalls, as well as ethical boundaries and unexpected consequences that are often overlooked. This survey recognizes that the rigor with which these issues are addressed by different researchers varies across a wide range. We present a framework for identifying a broad range of menaces in the research and practices around social data.
Failures of imagination: Discovering and measuring harms in language technologies
Auditing natural language processing (NLP) systems for computational harms remains an elusive goal. Doing so, however, is critical as there is a proliferation of language technologies (and applications) that are enabled by increasingly powerful natural language generation and representation models. Computational harms occur not only due to what content is being produced by people, but also due to how content is being embedded, represented, and generated by large-scale and sophisticated language models. This webinar will cover challenges with locating and measuring potential harms that language technologies—and the data they ingest or generate—might surface, exacerbate, or cause. Such harms can range from more overt issues, like surfacing offensive speech or reinforcing stereotypes, to more subtle issues, like nudging users toward undesirable patterns of behavior or triggering memories of traumatic events.
Join Microsoft researchers Su Lin Blodgett and Alexandra Olteanu, from the FATE Group at Microsoft Research Montréal, to examine pitfalls in some state-of-the-art approaches to measuring computational harms in language technologies. For such measurements to be effective, it is important to clearly articulate both 1) the construct to be measured and 2) how the measurements operationalize that construct. The webinar will also give an overview of approaches practitioners could take to proactively identify issues that might not be on their radar, and thus track and measure a wider range of issues more effectively.
Together, you’ll explore:
- Possible pitfalls when measuring computational harms in language technologies
- Challenges to identifying what harms we should be measuring
- Steps toward anticipating computational harms
Resource list:
- A Critical Survey of “Bias” in NLP (Publication)
- When Are Search Completion Suggestions Problematic? (Publication)
- Social Data (Publication)
- Characterizing Problematic Email Reply Suggestions (Publication)
- Overcoming Failures of Imagination in AI Infused System Development and Deployment (Publication)
- Defining Bias with Su Lin Blodgett (Podcast)
- Language, Power and NLP (Podcast)
- Su Lin Blodgett (Researcher profile)
- Alexandra Olteanu (Researcher profile)
*This on-demand webinar features a previously recorded Q&A session and open captioning.
Explore more Microsoft Research webinars: https://aka.ms/msrwebinars