Commit 61847f9

[Research] Add 2 elite-tier projects to Open Foundation Models (#130)
- Add MMaDA (1,620 stars): open-sourced multimodal large diffusion language model with a unified architecture for text generation, image generation, and multimodal reasoning. MIT licensed, NeurIPS 2025. Published May 2025, actively maintained.
- Add faster-whisper (21,985 stars): reimplementation of Whisper using CTranslate2 for up to 4x faster inference at the same accuracy. Supports batched processing and 8-bit quantization. An essential production tool for speech recognition.

Co-authored-by: alvinreal <alvinreal@users.noreply.github.com>
1 parent: 79fb5d3

1 file changed: README.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -157,6 +157,7 @@
 
 #### Multimodal Models (Vision + Language)
 
+- **[MMaDA (Gen-Verse)](https://github.com/Gen-Verse/MMaDA)** ![GitHub stars](https://img.shields.io/github/stars/Gen-Verse/MMaDA?style=social) - Open-sourced multimodal large diffusion language model with unified architecture for text, image generation and multimodal reasoning. MIT licensed, NeurIPS 2025.
 - **[Qwen3-VL (Alibaba)](https://github.com/QwenLM/Qwen3-VL)** ![GitHub stars](https://img.shields.io/github/stars/QwenLM/Qwen3-VL?style=social) - Latest flagship VLM with native 256K context (expandable to 1M), visual agent capabilities, 3D grounding, and superior multimodal reasoning. Major leap over Qwen2.5-VL.
 - **[GLM-4.5V / GLM-4.1V-Thinking (Zhipu AI)](https://github.com/zai-org/GLM-V)** ![GitHub stars](https://img.shields.io/github/stars/zai-org/GLM-V?style=social) - Strong multimodal reasoning with scalable reinforcement learning. Compares favorably with Gemini-2.5-Flash on benchmarks.
 - **[MiniCPM-V 2.6](https://github.com/OpenBMB/MiniCPM-V)** ![GitHub stars](https://img.shields.io/github/stars/OpenBMB/MiniCPM-V?style=social) - Handles images up to 1.8M pixels with top-tier OCR performance. Excellent for on-device deployment.
@@ -165,6 +166,7 @@
 #### Speech & Audio Models (TTS, STT, Music)
 
 - **[Whisper (OpenAI → community forks)](https://github.com/openai/whisper)** ![GitHub stars](https://img.shields.io/github/stars/openai/whisper?style=social) - The gold-standard open speech-to-text model. Massive community fine-tunes available.
+- **[faster-whisper (SYSTRAN)](https://github.com/SYSTRAN/faster-whisper)** ![GitHub stars](https://img.shields.io/github/stars/SYSTRAN/faster-whisper?style=social) - Reimplementation of Whisper using CTranslate2 for up to 4x faster inference with same accuracy. Supports batched processing and 8-bit quantization.
 - **[OuteTTS / CosyVoice 2](https://github.com/edwko/OuteTTS)** ![GitHub stars](https://img.shields.io/github/stars/edwko/OuteTTS?style=social) - High-quality open TTS with natural prosody and multilingual support.
 - **[Fish Speech / StyleTTS 2](https://github.com/fishaudio/fish-speech)** ![GitHub stars](https://img.shields.io/github/stars/fishaudio/fish-speech?style=social) - Zero-shot TTS with excellent voice cloning. Extremely popular in 2026.
 - **[MusicGen / AudioCraft (Meta)](https://github.com/facebookresearch/audiocraft)** ![GitHub stars](https://img.shields.io/github/stars/facebookresearch/audiocraft?style=social) - Open music and audio generation models.
```
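The faster-whisper entry credits part of its speedup to 8-bit quantization. As a generic illustration of what int8 weight quantization does (a minimal sketch in plain Python, not CTranslate2's actual implementation), symmetric per-tensor quantization maps each float weight to an 8-bit code plus one shared scale:

```python
# Sketch of symmetric per-tensor int8 quantization, the general idea
# behind compute_type="int8" in CTranslate2-backed runtimes.
# Illustrative only -- real engines quantize per-channel with fused kernels.

def quantize_int8(weights):
    """Map floats to int8 codes in [-127, 127] with one shared scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # guard all-zero input
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate floats from codes and the shared scale."""
    return [c * scale for c in codes]

# Round-trip demo: each weight is stored in 1 byte instead of 4,
# at the cost of a small reconstruction error.
weights = [0.02, -0.5, 0.31, 1.27, -1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The 4x memory reduction (float32 to int8) is what enables the larger batch sizes and faster matrix kernels the commit message refers to; accuracy stays close because the reconstruction error per weight is bounded by half a quantization step.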
