| title | Multimodal & Vision-Language |
|---|---|
Models bridging modalities: contrastive vision-language (CLIP), text-to-image, audio/speech transformers, and general multimodal work.
69 documents.
- AI Dev 26 x SF | Paige Bailey: Research to Reality · 🎓 lecture · intro
- AI Dev 26 x SF | Paige Bailey: What's New and What's Next in AI · 🎓 lecture · intro
- Learning Transferable Visual Models From Natural Language Supervision · 📄 paper · advanced
- Hierarchical Text-Conditional Image Generation with CLIP Latents · 📄 paper · advanced
- NVLM: Open Frontier-Class Multimodal LLMs · 📄 paper · frontier
- PaliGemma 2: A Family of Versatile VLMs for Transfer · 📄 paper · frontier
TABLE WITHOUT ID
link(file.link, default(title, file.name)) AS Document,
default(source, "") AS Type,
default(published, "") AS Date
FROM #topic/multimodal and -"atlas"
SORT level ASC, published ASC
(The list above renders in Obsidian with the Dataview plugin. On GitHub, browse Start here or the full index.)
Computer Vision · Generative Models · Language Models & Pretraining