-
Carnegie Mellon University: Maxine Eskenazi (PI), Shikib Mehri (PhD student)
Microsoft: Payal Bajaj, Ashutosh Adhikari, Vishrav Chaudhary, Sarv Ghotra, Baolin Peng
We propose to leverage TNLG in combination with the schema-guided paradigm to facilitate zero-shot transfer to unseen dialog tasks. While the schema-guided paradigm would handle generalization to new policies, TNLG would allow zero-shot NLU/NLG. Concretely, we propose four different approaches for using TNLG to facilitate zero-shot transfer: (1) using TNLG naively without the schema-guided paradigm, (2) representing the dialog policy as a sequence that is input to TNLG, (3) fine-tuning TNLG to attend to a graph-based representation of the dialog policy, and (4) exploring the use of fusion mechanisms (e.g., ColdFusion) with TNLG. Through these approaches, we hope to assess the impact of large-scale pre-training on zero-shot transfer in end-to-end dialog and better understand the capabilities of TNLG.
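To make approach (2) concrete, a dialog policy schema can be flattened into a plain text sequence and prepended to the model's input. The sketch below is illustrative only: the schema fields and serialization format are assumptions, not the project's actual representation.

```python
# Hypothetical sketch: serializing a schema-guided dialog policy as a flat
# text sequence for a language model (approach 2). Field names are illustrative.

def serialize_policy(schema):
    """Flatten a dialog-task schema into a single prompt string."""
    parts = [f"task: {schema['task']}"]
    for slot in schema.get("slots", []):
        parts.append(f"slot: {slot['name']} ({slot['description']})")
    for intent in schema.get("intents", []):
        parts.append(f"intent: {intent}")
    return " | ".join(parts)

restaurant_schema = {
    "task": "restaurant booking",
    "slots": [
        {"name": "cuisine", "description": "type of food"},
        {"name": "time", "description": "reservation time"},
    ],
    "intents": ["book_table", "find_restaurant"],
}

prompt = serialize_policy(restaurant_schema)
# The serialized policy would then be concatenated with the dialog history and
# fed to the model for zero-shot slot filling or response generation.
```

Because the policy arrives as ordinary text, an unseen task only requires writing a new schema, with no retraining of the underlying model.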
-
Carnegie Mellon University: Katerina Fragkiadaki (PI), Tom Mitchell (PI), Chris Atkeson (PI), Nikos Gkanasios (PhD student), Ayush Jain (Masters student), Yunchu Zhang (Masters student)
Microsoft: Kriti Aggarwal, Vishrav Chaudhary, Sarv Ghotra, Julia Kiseleva
We will explore and develop algorithms to simplify human collaboration with robots and make programming collaborative robots (co-robots or cobots) less expensive, with the assistance of human teachers that employ natural language descriptions paired with visual or kinesthetic demonstrations, in order to teach robotic agents new skills or help them adjust/improve existing ones.
-
Harvard: Junwei Lu (PI), Tianxi Cai (PI), Katherine P. Liao, Doudou Zhou (PhD student), Keming Lu (Masters student), Zebin Wang (PhD student), Shuting Sheng (PhD student), Yue Liu (PhD student)
Microsoft: Peter Potash, Vishrav Chaudhary, Eric Lin, Kiran Prasad, Tristan Naumann
Recent large pretrained models, such as Microsoft’s Turing Natural Language Representation (TNLR) models, have been widely applied to numerous practical tasks, including summarization and question answering. We propose to transfer the TNLR models to electronic health record (EHR) datasets while addressing the disparity, bias, and privacy concerns that arise in this process. We aim to learn representations of EHR codes through this procedure and to reveal novel clinical insights.
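One way such learned representations can surface clinical insight is through similarity in the embedding space: codes for related conditions should land closer together than unrelated ones. The toy vectors below stand in for embeddings a transferred TNLR model might produce; this is an illustrative probe, not the authors' pipeline.

```python
# Illustrative sketch: probing clinical relatedness of EHR codes via cosine
# similarity between their learned embeddings. The 3-d vectors below are toy
# stand-ins for representations produced by a transferred language model.
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical learned embeddings for three diagnosis codes.
embeddings = {
    "type2_diabetes": [0.9, 0.1, 0.2],
    "hyperglycemia":  [0.8, 0.2, 0.3],
    "ankle_fracture": [0.1, 0.9, 0.1],
}

sim_related = cosine(embeddings["type2_diabetes"], embeddings["hyperglycemia"])
sim_unrelated = cosine(embeddings["type2_diabetes"], embeddings["ankle_fracture"])
# Clinically related codes should score higher than unrelated ones.
```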
-
Harvard: Stuart M. Shieber (PI), Yuntian Deng (PhD student), Simas Šakenis (student)
Microsoft: Kiran Prasad, Karan Saxena, Ashutosh Adhikari, Vishrav Chaudhary, Paul Smolensky, Roland Fernandez
In recent years, large pretrained language models have demonstrated their ability to generate fluent text. However, they do not enjoy as much success in tasks requiring reasoning. Motivated by the fact that reasoning is a crucial part of natural-language understanding and generation, our goal in this proposal is to improve the reasoning ability of large pretrained language models. To this end, we argue that the normal end-to-end training scheme that only uses the inputs and the desired reasoning outcomes as supervision is unlikely to provide enough learning signal to the model and propose to augment the training process with instructional scaffolding, which provides intermediate reasoning steps for some of the training examples. The proposed approach will be evaluated on tasks that require both language fluency and logical reasoning.
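The contrast between end-to-end supervision and instructional scaffolding can be sketched as follows; the training-example format here is an assumption for illustration, not the proposal's actual data format.

```python
# Minimal sketch of "instructional scaffolding": some training examples spell
# out intermediate reasoning steps between the input and the final answer,
# rather than supervising on (input, answer) alone. Format is illustrative.

def make_scaffolded_example(question, steps, answer):
    """Pack intermediate steps between the input and the outcome, so the model
    is trained to emit its reasoning before the final answer."""
    scaffold = " ; ".join(steps)
    return f"Q: {question} | Steps: {scaffold} | A: {answer}"

def make_plain_example(question, answer):
    """End-to-end supervision: input and desired outcome only."""
    return f"Q: {question} | A: {answer}"

question = "If Ann has 3 apples and buys 2 more, how many does she have?"
scaffolded = make_scaffolded_example(
    question,
    ["start with 3", "add 2", "3 + 2 = 5"],
    "5",
)
plain = make_plain_example(question, "5")
# A training set can mix both kinds; the scaffolded subset supplies the extra
# learning signal that pure end-to-end training lacks.
```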
-
MIT: Song Han (PI), Ligeng Zhu (PhD student), Ji Lin (PhD student), Han Cai, Hanrui Wang
Microsoft: Eric Lin, Sarv Ghotra, Saksham Singhal, Karthik Mohan, Payal Bajaj, Karan Saxena, Vishrav Chaudhary, Yu Cheng, Subho Mukherjee
Natural language processing (NLP) has made tremendous progress, thanks to advanced models such as the Generative Pre-trained Transformer (GPT) and Turing-NLG (TNLG). Because of their massive size, these models are difficult to fine-tune, let alone deploy to real-world applications. Given the growing demand for small model size, fast response time, and low computational cost, we propose to study (1) efficient training of large NLP models from scratch at different levels of sparsity, (2) efficient fine-tuning of large NLP models under limited memory budgets, and (3) real-world applications of TNLG based on the proposed efficiency techniques.
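One standard ingredient behind sparsity-based efficiency, shown here as a hedged toy sketch rather than the project's method, is magnitude pruning: weights whose absolute value falls in the smallest fraction are zeroed, shrinking the effective model.

```python
# Toy sketch of magnitude-based pruning: zero out the smallest-|w| fraction of
# a weight vector to reach a target sparsity level. Real systems prune tensors
# layer-wise and often retrain afterward; this pure-Python version just shows
# the selection rule.

def prune_by_magnitude(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction zeroed.

    `sparsity` in [0, 1] is the fraction of weights to drop.
    """
    k = int(len(weights) * sparsity)          # number of weights to zero
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(ranked[:k])                    # indices of the k smallest |w|
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

w = [0.01, -0.5, 0.03, 0.9, -0.02, 0.4]
pruned = prune_by_magnitude(w, sparsity=0.5)  # keep the 3 largest-magnitude weights
# pruned == [0.0, -0.5, 0.0, 0.9, 0.0, 0.4]
```

The same mask idea extends to fine-tuning under a memory budget: gradients need only be stored for the surviving (nonzero) weights.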
-
University of Michigan: Joyce Chai (PI), Rada Mihalcea (PI), Lu Wang (PI)
Students: Laura Biester, Shuyang Cao, Muhammad Khalifa, Andrew Lee, Ziqiao Ma, Do June Min, Joseph Peper, Siqi Shen, Shane Storks, Peter Yu, Frederick Xinliang Zhang, Yichi Zhang
Postdocs: Jonathan Kummerfeld, Ian Stewart
Research scientists: Veronica Perez-Rosas
Microsoft: Karan Saxena, Kiran Prasad, Kriti Aggarwal, Vishrav Chaudhary, Dean Carignan
Large pre-trained Transformer models have achieved state-of-the-art performance in a wide range of natural language processing tasks. However, due to the Transformer’s entangled multi-head, multi-layer architecture, it is hard to interpret why a prediction is made. As large models are applied to tasks with great societal impact, it is critical to understand the rationales behind model decisions, along with their reasoning processes, limitations, and flaws. In this project, we aim to improve the understanding and transparency of Turing Models through two major research tasks: T1) investigating whether they can reason logically and coherently, similar to how humans do, and T2) examining and quantifying the prevalence of bias (especially the less-studied racial bias) based on a new benchmark dataset and a novel entity-focused approach.
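An entity-focused bias probe, in the spirit of T2, can be sketched as holding a template fixed, swapping in different entity names, and measuring how much a model's score varies across entities. Everything below is a toy harness, not the project's benchmark or method: `toy_score` stands in for a real model.

```python
# Illustrative entity-substitution bias probe: the same template is scored with
# different entity names; the score gap across entities quantifies disparity.
# `toy_score` is a hypothetical stand-in for a language model's score.

def toy_score(sentence):
    # Placeholder for a model's sentiment/plausibility score.
    return 0.8 if "Alice" in sentence else 0.6

def bias_gap(template, entities, score_fn):
    """Max score difference across entity substitutions; 0 means no disparity."""
    scores = [score_fn(template.format(name=name)) for name in entities]
    return max(scores) - min(scores)

gap = bias_gap("{name} is a brilliant engineer.", ["Alice", "Bob"], toy_score)
# A consistently nonzero gap over many templates would indicate
# entity-conditioned bias in the underlying model.
```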
-
Stanford University: Percy Liang (PI)
Microsoft: Christian Cosgrove, Payal Bajaj, Barun Patra, Vishrav Chaudhary, Ahmed H. Awadallah
As large language models become more ubiquitous, it is important to characterize their properties so that downstream users of these models can have a better sense of what their capabilities and risks are. We will develop metrics that capture properties such as coverage across domains and dialects, robustness to perturbations, propensity to memorize (with copyright and privacy implications), faithfulness of generated text, risks for disinformation, ability to augment humans in interactive tasks, and others.
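One of the listed properties, robustness to perturbations, can be operationalized as the fraction of inputs whose prediction survives a small surface change. The sketch below is a minimal illustration under assumed names (`perturb`, `robustness`, and a toy keyword model), not the metric the project will actually define.

```python
# Minimal sketch of a robustness-to-perturbation metric: run each input through
# a model before and after a small surface perturbation and measure agreement.

def perturb(text):
    """A simple surface perturbation: lowercase and strip punctuation."""
    return "".join(c for c in text.lower() if c.isalnum() or c.isspace())

def robustness(model, inputs):
    """Fraction of inputs whose prediction is unchanged by the perturbation."""
    agree = sum(model(x) == model(perturb(x)) for x in inputs)
    return agree / len(inputs)

# Toy "model": classifies by case-sensitive keyword presence, so it is
# deliberately brittle to the lowercasing perturbation.
model = lambda text: "positive" if "Great" in text else "neutral"

score = robustness(model, ["Great movie!", "An average film."])
# "Great movie!" flips to "neutral" after lowercasing, so score == 0.5.
```

Averaging such agreement rates over domains and perturbation types would yield one entry in the proposed suite of metrics.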