LLaVA represents a cost-efficient approach to building a general-purpose multimodal assistant. It is a novel end-to-end trained large multimodal model that combines a vision encoder with Vicuna for general-purpose visual and language understanding, achieving impressive chat capabilities in the spirit of the multimodal GPT-4 and setting a new state-of-the-art accuracy on Science QA.
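The core design is deliberately simple: a pre-trained vision encoder produces image features, a trainable projection maps them into the language model's embedding space, and the projected visual tokens are fed to Vicuna alongside the text tokens. The sketch below illustrates this wiring only; the class and parameter names (LlavaLikeModel, vision_dim, llm_dim) are illustrative rather than taken from the official codebase, and the inputs_embeds call assumes a Hugging Face-style language-model interface.

```python
# Minimal conceptual sketch of a LLaVA-style model (not the official implementation):
# frozen vision encoder -> learned projection -> language model over mixed tokens.
import torch
import torch.nn as nn

class LlavaLikeModel(nn.Module):
    def __init__(self, vision_encoder, language_model, vision_dim, llm_dim):
        super().__init__()
        self.vision_encoder = vision_encoder              # e.g. a CLIP image tower
        self.projection = nn.Linear(vision_dim, llm_dim)  # maps visual features into the LLM token space
        self.language_model = language_model              # e.g. Vicuna

    def forward(self, images, text_embeddings):
        # Encode images into patch features: (batch, num_patches, vision_dim)
        visual_features = self.vision_encoder(images)
        # Project visual features into the LLM embedding space
        visual_tokens = self.projection(visual_features)
        # Prepend visual tokens to the text token embeddings and run the LLM
        inputs_embeds = torch.cat([visual_tokens, text_embeddings], dim=1)
        return self.language_model(inputs_embeds=inputs_embeds)
```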
Recent developments
- LLaVA: The first open-source alternative to GPT-4V. [Project] [Paper] [Github] [Demo] [Data] [Model] [Scaling Note]
- LLaVA-Med: The first multimodal assistant in the healthcare domain. [Github] [Paper]
- LLaVA-Interactive: An all-in-one demo showcasing visual interaction and generation capabilities beyond language interaction alone, supported by LLaVA, SEEM, and GLIGEN.
- Multimodal Foundation Models: A 118-page survey on the evolution and trends of multimodal foundation models, along with our position: “Multimodal Foundation Models: From Specialists to General-Purpose Assistants”. It builds on our CVPR 2023 Tutorial. [Note on Large Multimodal Models] [Slides] [YouTube] [Bilibili]
- Instruction Tuning with GPT-4: The “first attempt” to use GPT-4-generated data for LLM self-instruct tuning. [Project] [Paper] [Github] [My Learnings]