title: Multimodal & Vision-Language
aliases: Multimodal & Vision-Language
cssclasses: moc

Multimodal & Vision-Language

Models bridging modalities: contrastive vision-language (CLIP), text-to-image, audio/speech transformers, and general multimodal work.

69 documents.

Start here

  1. AI Dev 26 x SF | Paige Bailey: Research to Reality · 🎓 lecture · intro
  2. AI Dev 26 x SF | Paige Bailey: What's New and What's Next in AI · 🎓 lecture · intro
  3. Learning Transferable Visual Models From Natural Language Supervision · 📄 paper · advanced
  4. Hierarchical Text-Conditional Image Generation with CLIP Latents · 📄 paper · advanced
  5. NVLM: Open Frontier-Class Multimodal LLMs · 📄 paper · frontier
  6. PaliGemma 2: A Family of Versatile VLMs for Transfer · 📄 paper · frontier

All documents

TABLE WITHOUT ID
  link(file.link, default(title, file.name)) AS Document,
  default(source, "") AS Type,
  default(published, "") AS Date
FROM #topic/multimodal and -"atlas"
SORT level ASC, published ASC

(The query above renders as a table in Obsidian with the Dataview plugin. On GitHub, where it is not evaluated, browse Start here or the full index.)
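For readers without Obsidian, the Dataview query can be approximated in plain Python. The following is a minimal sketch, not the plugin's implementation. It assumes each note has already been parsed into a hypothetical `Note` record (path, tags, frontmatter fields), that `#topic/multimodal` also matches nested subtags, and that missing fields sort as empty strings. The `Note` class and the helper names are illustrative, not part of Dataview or this repository.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Note:
    """Hypothetical in-memory record for one vault note."""
    path: str                                   # vault-relative path
    tags: list = field(default_factory=list)    # tags without the leading '#'
    meta: dict = field(default_factory=dict)    # frontmatter fields


def has_tag(note, tag):
    # Assumption: a tag source also matches nested subtags (tag/child).
    return any(t == tag or t.startswith(tag + "/") for t in note.tags)


def in_folder(note, folder):
    # True when the note lives under the given top-level folder.
    return note.path.startswith(folder.rstrip("/") + "/")


def multimodal_table(notes):
    """Approximate: FROM #topic/multimodal and -"atlas", SORT level, published."""
    selected = [
        n for n in notes
        if has_tag(n, "topic/multimodal") and not in_folder(n, "atlas")
    ]
    # Sort by level, then published; missing values are treated as "".
    selected.sort(
        key=lambda n: (str(n.meta.get("level", "")),
                       str(n.meta.get("published", "")))
    )
    # Columns mirror the query: Document, Type, Date.
    return [
        (
            n.meta.get("title") or PurePosixPath(n.path).stem,
            n.meta.get("source", ""),
            n.meta.get("published", ""),
        )
        for n in selected
    ]


if __name__ == "__main__":
    # Made-up example notes, for illustration only.
    demo = [
        Note("papers/example-paper.md", ["topic/multimodal"],
             {"title": "Example paper", "source": "paper",
              "level": "advanced", "published": "2021-01-01"}),
        Note("atlas/multimodal.md", ["topic/multimodal"], {"title": "MOC"}),
        Note("lectures/example-talk.md", ["topic/multimodal/audio"],
             {"level": "intro"}),
    ]
    for row in multimodal_table(demo):
        print(row)
```

Note that the sort here is a plain string comparison. If `level` holds labels like the ones shown under Start here, ascending order is alphabetical (advanced, frontier, intro) rather than by difficulty; whether the vault stores `level` that way is not stated in this note.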

Related topics

Computer Vision · Generative Models · Language Models & Pretraining


← Atlas home