Multimodal & Vision-Language

Models that bridge modalities: contrastive vision-language models (CLIP), text-to-image generation, audio/speech transformers, and general multimodal work.

69 documents.

Start here

AI Dev 26 x SF | Paige Bailey: Research to Reality · 🎓 lecture · intro
AI Dev 26 x SF | Paige Bailey: What's New and What's Next in AI · 🎓 lecture · intro
Learning Transferable Visual Models From Natural Language Supervision · 📄 paper · advanced
Hierarchical Text-Conditional Image Generation with CLIP Latents · 📄 paper · advanced
NVLM: Open Frontier-Class Multimodal LLMs · 📄 paper · frontier
PaliGemma 2: A Family of Versatile VLMs for Transfer · 📄 paper · frontier

All documents

TABLE WITHOUT ID
  link(file.link, default(title, file.name)) AS Document,
  default(source, "") AS Type,
  default(published, "") AS Date
FROM #topic/multimodal and -"atlas"
SORT level ASC, published ASC

(The list above renders in Obsidian with the Dataview plugin. On GitHub, browse Start here or the full index.)
