Tightly Connecting Vision and Language

Remarkable progress has been made at the intersection of vision and language. While showing great promise, current vision-and-language models may only weakly “connect” the two modalities and often fail in the wild. In this talk, I will present our recent efforts to bridge this gap along two dimensions: informativeness and controllability. In particular, I will describe how we can leverage large-scale datasets, including our recently released CC12M and Localized Narratives, to benefit existing vision-and-language tasks as well as to enable new applications.

Speaker Details

Soravit (Beer) Changpinyo is a Software Engineer at Google Research. His research interests are in machine learning with applications to computer vision and natural language processing. Prior to joining Google, he was a PhD candidate and an Annenberg Fellow at the University of Southern California, advised by Fei Sha.

Series: Microsoft Vision+Language Summer Talk Series
Date: August 25, 2021
Speakers: Soravit (Beer) Changpinyo
Affiliation: Google

- Chunyuan Li, Principal Researcher
- Jianwei Yang, Principal Researcher
- Pengchuan Zhang, Senior Researcher
- Zhe Gan, Principal Researcher
Research Area
- Artificial intelligence
Research Lab
- Microsoft Research Lab - Redmond
Group
- Deep Learning Group

Series: Microsoft Vision+Language Summer Talk Series

Three Explorations on Pre-Training: an Analysis, an Approach, and an Architecture
September 10, 2021
Speakers: Chunyuan Li, Jianwei Yang, Pengchuan Zhang, et al.
Learning Commonsense Understanding through Language and Vision
September 1, 2021
Speakers: Chunyuan Li, Jianwei Yang, Pengchuan Zhang, et al.
Tightly Connecting Vision and Language
August 25, 2021
Speakers: Chunyuan Li, Jianwei Yang, Pengchuan Zhang, et al.
Learning from Unlabeled Videos for Recognition, Prediction, and Control
August 18, 2021
Speakers: Chunyuan Li, Jianwei Yang, Pengchuan Zhang, et al.
Visual question answering and reasoning over vision and language: Beyond the limits of statistical learning?
August 11, 2021
Speakers: Chunyuan Li, Jianwei Yang, Pengchuan Zhang, et al.
ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
August 4, 2021
Speakers: Chunyuan Li, Jianwei Yang, Pengchuan Zhang, et al.
Zero-Shot Detection via Vision and Language Knowledge Distillation
July 28, 2021
Speakers: Chunyuan Li, Jianwei Yang, Pengchuan Zhang, et al.
Grounded Visual Generation
July 21, 2021
Speakers: Chunyuan Li, Jianwei Yang, Pengchuan Zhang, et al.
MDETR: Modulated Detection for End-to-End Multi-Modal Understanding
July 16, 2021
Speakers: Chunyuan Li, Jianwei Yang, Pengchuan Zhang, et al.
A Truly Unbiased Model
July 7, 2021
Speakers: Chunyuan Li, Jianwei Yang, Pengchuan Zhang, et al.
Visual Recognition beyond Appearances, and its Robotic Applications
June 30, 2021
Speakers: Chunyuan Li, Jianwei Yang, Pengchuan Zhang, et al.