Multimodal Models: A Roundup of Key Topics

I. Survey Articles

  • A Survey on Multimodal Large Language Models
  • Awesome-Multimodal-Large-Language-Models

II. Multimodal Model Examples

  • MiniGPT
  1. Building Vision-Language Understanding for MiniGPT-4 with Large Language Models (video talk)
  2. MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
  3. GitHub - Vision-CAIR/MiniGPT-4
  • ChatBridge
  1. ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst
  2. https://github.com/TISUnion/ChatBridge
  • LLaVA
  1. LLaVA: Large Language and Vision Assistant
  2. GitHub - haotian-liu/LLaVA: Visual Instruction Tuning
  3. Haotian Liu
  • Google PaLM-E
  1. PaLM-E: An Embodied Multimodal Language Model
  2. PaLM-E: The INSANE Multimodal Language Model for Robotics (Google)
  • VIMA
  1. VIMA | General Robot Manipulation with Multimodal Prompts
  2. VIMA: Multi-Modal LLM Prompts for Robotics (Stanford, NVIDIA)
  • OpenAI GPT-4V
  1. The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
  2. GPT-4V(ision) system card
  • Tencent Macaw-LLM
  1. Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration
  2. GitHub - lyuchenyang/Macaw-LLM
  3. YouTube: Macaw-LLM Demo
  • Microsoft Kosmos-2
  1. Kosmos-2: Grounding Multimodal Large Language Models to the World
  2. GitHub - Kosmos-2
  3. YouTube: What is Kosmos-2?
  • Meta LLaMA
  1. LLaMA: Open and Efficient Foundation Language Models
  2. GitHub - facebookresearch/llama: Inference code for LLaMA models
  • Google Gemini

III. Transformer

  • A Brief Look at Understanding the Attention Mechanism
  • The Past and Present of Pretrained Language Models: From Word Embedding to BERT
  • Word Vectors (one-hot/SVD/NNLM/Word2Vec/GloVe)

IV. Video Multimodality

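  • Multimodal Large Models for Video Understanding (large-model fundamentals, fine-tuning, video-understanding fundamentals)
  • Human-Machine Video Dialogue | Video-LLaMA: a multimodal framework that gives large language models the ability to understand video content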
