LLaVA: Large Language and Vision Assistant - Microsoft Research: Videos

Building Next-Gen Multimodal Foundation Models for General-Purpose Assistants

LLaVA is an open-source project that collaborates with the research community to advance the state of the art in AI. LLaVA represents the first end-to-end trained large multimodal model (LMM) that achieves impressive chat capabilities in the spirit of the multimodal GPT-4. The LLaVA family continues to grow to support more modalities, capabilities, and applications.

Videos

23:42

Keynote: Research in the Era of AI

January 30, 2024

Peter Lee