2024 Recipients
Fernando Diaz
Carnegie Mellon University
Generative Tip of the Tongue Retrieval Support
Tip of the tongue retrieval refers to searching for an item a user has seen before but whose name they do not recall. Although generative AI has advanced how we search for information, tip of the tongue retrieval remains difficult because of the imprecise, incorrect, and contextual information searchers provide. We propose an LLM-based simulation framework for developing interactive tip of the tongue retrieval systems. By adopting a retrieval-based evaluation and optimization architecture, this framework can be applied in enterprise settings without encoding confidential information in model parameters.
Co-author(s): To Eun Kim, Carnegie Mellon University
Sharon Ferguson
University of Waterloo
A Novel AI-Powered System for Building Shared Understanding in Teams
We envision human-AI collaboration that supports human-human collaboration by characterizing, analyzing, and providing suggestions for collaboration grounded in the team science literature. Shared understanding, or the extent to which team members are "on the same page" about their goals, processes, and interactions, is important for efficient and enjoyable teamwork. Yet lapses in shared understanding are easy for teams to miss. We aim to use traditional ML algorithms combined with LLMs to measure shared understanding based on a team’s chat messages or video calls. The system will identify lapses in shared understanding in real time and ask clarifying questions to help teams get back on the same page. By encouraging team members to reflect on their understanding, proactively identifying misunderstandings, and prompting perspective-taking, this system will bring the depth of the team science literature to everyday teams.
Co-author(s): Sirisha Rambhatla, University of Waterloo; Alison Olechowski, University of Toronto
Gary Hsieh
University of Washington
Accelerating Research Translation into Design Practice Using Generative AI
Valuable insights embedded in scientific publications are rarely translated into formats that design practitioners can easily consume and apply in their work. These research-to-practice gaps impede the diffusion of innovation and undermine evidence-based decision making. In this work, we explore the potential of generative AI to convert academic findings into a designer-friendly format. We will examine how to design these translational artifacts so that they are credible and are appropriately tailored to individual designers’ needs.
Co-author(s): Lucy L. Wang, University of Washington
Harmanpreet Kaur
University of Minnesota
Using Adversarial Design to Support Appropriate Reliance on Generative AI Outputs
The value of integrating generative AI into applications is limited by whether people are able to appropriately rely on its outputs—while over-reliance can result in potentially harmful content being adopted, under-reliance prevents people from taking advantage of AI assistance. This project proposes adversarial design as an approach for supporting appropriate reliance on generative AI outputs, and tests its efficacy using LLM-based assistants for tasks like coding, search, and writing. For instance, we will introduce adversarial outputs in syntactic and semantic elements of the generated code, as well as via a chat interface that acts as a “provocateur.” With this approach, we aim to reduce immediate acceptance or rejection of AI outputs, and nudge people to deliberate about them instead.
Rui Zhang
The Pennsylvania State University
Improving Productivity by Gradient-based Prompt Optimization for LLMs
This proposal aims to develop a novel gradient-based prompt optimization technique that unlocks the capabilities of LLMs on complex and challenging tasks to elevate productivity. The novelty of our method lies in a white-box prompt optimization algorithm over open-source language models that leverages both the gradient of task accuracy and the language model's decoding probability. This enables us to automatically and efficiently create effective, interpretable, and transferable prompts for a variety of small and large language models on complex and diverse tasks. Our research advances prompt optimization by integrating gradients and LLM probabilities, creating long-term impact on white-box optimization research and closing the gap between small and large LMs.
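To make the idea of gradient-based prompt optimization concrete, here is a minimal toy sketch, not the proposal's actual algorithm: a continuous "soft prompt" is updated by gradient descent on a combined objective, a task loss plus a language-model plausibility term. The "task direction" and "LM direction" below are random stand-ins for real model signals, and a real system would backpropagate through an open-source LM rather than use finite differences.

```python
import numpy as np

# Illustrative stand-ins: in practice these signals come from a real model.
rng = np.random.default_rng(0)
d, n_tokens = 8, 4
w_task = rng.normal(size=d)   # stand-in task-accuracy direction
w_lm = rng.normal(size=d)     # stand-in LM log-probability direction

def objective(p):
    task_loss = float((p @ w_task).sum() ** 2)   # drive the task score toward 0
    lm_term = -float(np.tanh(p @ w_lm).sum())    # reward LM-plausible prompts
    return task_loss + 0.1 * lm_term

def grad(p, eps=1e-5):
    # Finite-difference gradient; a real white-box method would use backprop.
    g = np.zeros_like(p)
    for idx in np.ndindex(*p.shape):
        q = p.copy()
        q[idx] += eps
        g[idx] = (objective(q) - objective(p)) / eps
    return g

prompt = rng.normal(size=(n_tokens, d)) * 0.1    # soft prompt: n_tokens x d
start = objective(prompt)
for _ in range(200):
    prompt -= 0.01 * grad(prompt)                # gradient-descent update
end = objective(prompt)
```

The two-term objective mirrors the proposal's combination of task-accuracy gradients with LM decoding probability; after optimization the objective value drops from its starting point.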
Eytan Adar
University of Michigan, Ann Arbor
Codes of Generative Conduct: Collaboratively Authoring and Testing Policies to Govern LLMs
When we write, we are guided not only by our intended audience but also by the constraints set by our communities—the standards of our workplace, legal frameworks, and industry norms. Even sub-communities naturally develop their variants of these rules. For large language models to better support generative writing, they should adhere to the same set of community-set standards and rules. In our proposed work, we tackle how groups can collaboratively construct, test, and use writing standards to guide generative applications.
Co-author(s): Eric Gilbert, University of Michigan, Ann Arbor
Varol Akman
İhsan Doğramacı Bilkent University
ChatGPT as a Stage Manager in Large-Scale Institutional Practices
Large-scale institutional structures are characterized by the collective action of numerous employees at different levels working towards common goals. Decision-making in these structures is complex due to the absence of egalitarianism and the presence of an authority hierarchy. Group agency emerges within these structures, where multiple agents work together based on joint commitments to advocate for shared goals. The project at hand aims to investigate decision-making and its consequences in such institutional structures using ChatGPT.
Danqi Chen
Princeton University
Grounding Large Language Models with Internal Retrieval Corpora
This proposal tackles the research problem of augmenting large language models (LLMs) with internal retrieval corpora, enabling LLMs to generate text that conforms to up-to-date and domain-specific information without additional re-training. The proposal addresses two technical challenges: a) how to incorporate a large number of potentially noisy retrieved tokens in a limited context window; and b) how to train high-quality retrievers for specific domains that integrate well with black-box LLMs.
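The context-window challenge can be illustrated with a toy sketch, assuming a term-overlap scorer and a word-count token budget as stand-ins for a trained retriever and a real tokenizer: rank passages against the query, then greedily pack the best ones until the budget is exhausted. The corpus and budget below are hypothetical.

```python
# Toy retrieval-and-packing sketch (not the proposal's actual method).

def score(query: str, passage: str) -> float:
    # Stand-in relevance scorer: fraction of query terms found in the passage.
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / (len(q) or 1)

def pack_context(query: str, corpus: list[str], budget_tokens: int = 20) -> list[str]:
    # Rank passages by relevance, then greedily fill the limited context window.
    ranked = sorted(corpus, key=lambda p: score(query, p), reverse=True)
    chosen, used = [], 0
    for passage in ranked:
        n = len(passage.split())      # crude token count for illustration
        if used + n <= budget_tokens:
            chosen.append(passage)
            used += n
    return chosen

corpus = [
    "the quarterly report covers internal sales figures",
    "company policy on remote work was updated in march",
    "the cafeteria menu changes weekly",
]
context = pack_context("what does the quarterly report cover", corpus)
```

Here the most relevant passage is packed first and a low-scoring passage that would overflow the budget is dropped; the proposal's research goes further, handling noisy retrievals and training domain-specific retrievers for black-box LLMs.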
Noshir Contractor
Northwestern University
Deciphering the Structural Signatures of High-Performing Human-AI Teams
This research project explores the dynamics of human-AI collaboration in problem-solving and creative-thinking tasks. We will extend prior Human-Autonomy Teaming research that studies human-AI teams using the Wizard-of-Oz methodology by replacing the human confederate with an LLM-powered AI teammate. The experiment manipulates task types and AI teammate functions to examine how people orient themselves toward intelligent machine teammates and how the technology can be designed to be a collaborator, not a chatbot. Participants will communicate in a group chat environment while completing tasks, and their experiences, performance, and social interactions will be surveyed, analyzed, and compared to identify the mechanisms of high-performing human-AI teams.
Philip Guo
UC San Diego
Beyond Just Programming: Using an LLM’s Knowledge About the Human World to Improve Human-AI Collaboration for Data Science
We plan to build an LLM-based data science assistant that reasons about both the content of the code that it generates and the real-world domain that the requested analysis is about. Combining knowledge about both code and the human world can hopefully enable our LLM to perform more rigorous data analyses than current coding assistants like GitHub Copilot. The goal of this project is to advance the new future of work by empowering everyone across the workplace — from field technicians to customer support representatives to executives — to directly analyze data that matters most for their jobs. More broadly, we want to use AI to enable everyone to analyze the data that is most meaningful and relevant to them, even if they are not data science or programming experts.
John Horton
Massachusetts Institute of Technology
Understanding the Effects of GitHub Copilot on Worker Productivity and Skill Evolution in the Online Labor Market
Our project aims to understand the impact of GitHub Copilot on the productivity, skill adoption, and labor market outcomes of software engineers and other technical professionals in a real labor market. LLMs have been shown to affect worker productivity on researcher-assigned tasks, but less is known about how workers use such tools in their real-life work, or what happens in equilibrium as more workers adopt them. We plan to merge data from a large online labor market with survey data in which workers report when they began using Copilot to understand how this tool has affected workers’ productivity, wages, and skill upgrading.
Co-author(s): Apostolos Filippas, Fordham University; Emma van Inwegen, MIT
Anastasios (Tasos) Kyrillidis
Rice University
Efficient new-task adaptation in the era of Transformers and Federated Learning
Starting from traditional Transformer models, this project aims to introduce efficient ways to i) adapt to new tasks without re-training models from scratch; ii) combine existing trained models in a meaningful way, extending MoE modules beyond basic feed-forward layer splitting; and iii) consider novel federated learning scenarios for Transformer-based models, where computation and communication bottlenecks require novel Transformer decompositions beyond sparse MoEs. Prior advances from this collaboration (results on asynchronous federated learning scenarios, new uses of mixtures of experts for zero-shot personalization in federated learning, and novel Transformer-model decompositions for efficient distributed computing) will be extended along these directions.
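The MoE building block the abstract extends can be shown in a minimal sketch, assuming toy linear experts and top-1 routing; real MoE layers route per token inside a Transformer block, and every shape and weight below is an illustrative stand-in rather than the project's design.

```python
import numpy as np

# Toy top-1 mixture-of-experts layer: a gating projection scores the
# experts and the highest-scoring expert processes the input.
rng = np.random.default_rng(1)
d, n_experts = 4, 3
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # stand-in expert weights
gate = rng.normal(size=(d, n_experts))                         # gating projection

def moe_forward(x):
    logits = x @ gate              # one gating score per expert
    k = int(np.argmax(logits))     # top-1 routing: activate a single expert
    return experts[k] @ x, k       # only the chosen expert runs

x = rng.normal(size=d)
y, chosen = moe_forward(x)
```

Because only one expert runs per input, compute stays roughly constant as experts are added, which is why the sparse-MoE structure is a natural starting point for the communication-constrained federated decompositions the project proposes to go beyond.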
Matthew Lease
University of Texas at Austin
Providing Natural Language Decision-Support via Large Language Models
Large language models (LLMs) are transforming the way people seek, explore, and assess information for decision making. By attending to how people naturally interact with each other, LLM-based decision-support tooling can strengthen both the quality of support offered and the accompanying user experience. Because LLMs are fallible and can hallucinate incorrect information, LLM interactions should also be designed to handle such fallibility in stride, akin to how we interact with one another around normal human fallibility. This project will explore such opportunities and challenges in designing ways to provide LLM-based decision support that is innovative, intuitive, and effective.
Mor Naaman
Cornell Tech
Impact of Language Models on Writing and its Outcomes in the Workplace
Use of autocomplete features and other language and text suggestions powered by large language models can shift people’s writing, influence their thinking, and may lead to different outcomes for the person exposed to them. Using large-scale online experiments, this project aims to understand the potential of such AI-based products used in workplace settings to result in these different outcomes. In particular, we aim to understand the disparate effect these products may have on people from different backgrounds who may have different language styles. We will further investigate strategies to mitigate any negative outcomes.
Hamed Zamani
University of Massachusetts Amherst
Improving Productivity by Personalizing Large Language Models
Large language models (LLMs) have recently revolutionized natural language processing, including a number of user-facing applications that greatly impact users’ productivity, such as question answering, writing assistants, and task management. In user-facing applications, it is widely accepted that users have different needs and behave differently, which underscores the importance of personalization. The goal of this project is to study different approaches for personalizing LLMs and their potential impact on users’ satisfaction and productivity.
Amy Zhang
University of Washington
Interactive Personalized Information Artifacts to Summarize Team Communication
We examine how to design team-AI systems embedded into teams’ existing communication spaces that can use LLMs to leverage rich prior communication data to support team goals and activities. Information artifacts that convey information from prior communication logs should not only summarize content but also support additional team goals such as sharing out different information to different audiences and serving as a springboard for follow-on discussion. We aim to use LLMs in a novel system for generating interactive summary artifacts that live within group chat and that summarize recorded video meetings. These artifacts will use LLMs to 1) personalize the content of the summary to different team members based on their conversation history, such as by highlighting specific points for follow-on discussion, 2) enable users to interactively expand the summary and dive into the context of the original conversation, and 3) allow users to customize an artifact in order to share it with different audiences.