LMMs-Lab

39 repositories

lmms-eval
Public
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
benchmark evaluation agi video-understanding vlm multimodal large-language-models vision-language-model llm-evaluation audio-evaluation
Python • Other • 619 • 4.3k • 25 • 13 • Updated Jul 15, 2026
.github
Public
Python • 1 • 1 • 0 • 2 • Updated Jul 15, 2026
LLaVA-OneVision-2
Public
Fully Open Framework for Democratized Multimodal Training
llm mllm vision-language-model llava qwen3 llava-onevision
Python • Apache License 2.0 • 76 • 1.1k • 51 • 10 • Updated Jul 13, 2026
SkillOpt-Lite
Public
SkillOpt-Lite and HarnessOpt: Optimize your skill or harness with one line of vibe
Python • MIT License • 5 • 88 • 1 • 0 • Updated Jul 10, 2026
lmms-engine
Public
A simple, unified multimodal models training engine. Lean, flexible, and built for hacking at scale.
agi multimodal video-generation large-language-models unified-multimodal-models
Python • 37 • 803 • 10 • 1 • Updated Jul 9, 2026
EASI
Public
Holistic Evaluation of Multimodal LLMs on Spatial Intelligence
multimodal-models mllm spatial-intelligence mllm-evaluation
Python • Apache License 2.0 • 8 • 118 • 2 • 1 • Updated Jul 1, 2026
VLMEvalKit
Public
An open-source evaluation toolkit to evaluate MLLMs on Spatial Intelligence using the EASI protocol
Python • Apache License 2.0 • 1 • 19 • 0 • 0 • Updated Jul 1, 2026
NEO
Public
NEO Series: Native Vision-Language Models from First Principles
agi vlm multimodal large-language-models mllm multimodal-large-language-models encoder-free-vlm native-multimodal-model
Python • Apache License 2.0 • 31 • 867 • 2 • 0 • Updated Jul 1, 2026
LongVT
Public
[CVPR 2026] LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
agi vlm multimodal mllm long-video-understanding multimodal-large-language-models large-multimodal-models tool-using-agent
Python • Apache License 2.0 • 14 • 253 • 3 • 0 • Updated Jun 24, 2026
OneVision-Encoder
Public
Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
lmms vision-transformer llava
Python • Apache License 2.0 • 20 • 384 • 12 • 3 • Updated Jun 20, 2026
engram
Public
Privacy-first AI memory layer - Signal for AI Memory. E2EE, local-first, works with Claude, Cursor, and any MCP-compatible AI.
privacy encryption ai memory mcp cursor e2ee claude local-first llm
TypeScript • Other • 2 • 23 • 0 • 0 • Updated Jun 12, 2026
Evolving-Visual-Generation
Public
[Roadmap] Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
awesome image-generation world-modeling agentic visual-generation
TeX • 5 • 124 • 0 • 0 • Updated Jun 9, 2026
ParaVT
Public
ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning
reinforcement-learning tool-use long-video-understanding video-llm grpo agentic-rl multimodal-rl
Python • Apache License 2.0 • 2 • 54 • 0 • 0 • Updated Jun 2, 2026
lmms-lab-writer
Public
Agentic LaTeX Writer - Local-first editor for AI-assisted academic writing
editor latex ai writing academic-writing
TypeScript • MIT License • 21 • 253 • 6 • 1 • Updated Jun 1, 2026
SimpleStream
Public
A simple video streaming baseline that outperforms SOTAs.
Python • 8 • 147 • 1 • 0 • Updated May 1, 2026
multimodal-search-r1
Public
[ACL-2026] MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search tools.
Python • Apache License 2.0 • 26 • 469 • 3 • 0 • Updated Apr 7, 2026
OpenMMReasoner
Public
[CVPR 2026] OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
agi vlm multimodal mllm multimodal-large-language-models large-multimodal-models multimodal-reasoning
Python • Apache License 2.0 • 5 • 164 • 6 • 0 • Updated Mar 30, 2026
co-scientist
Public
For AI Agents to post ideas on their own.
TypeScript • MIT License • 0 • 7 • 0 • 0 • Updated Mar 1, 2026
homebrew-tap
Public
Homebrew tap for LMMs-Lab applications
Ruby • 0 • 1 • 0 • 0 • Updated Jan 29, 2026
opencode
Public
The open source coding agent.
TypeScript • MIT License • 23k • 1 • 0 • 0 • Updated Jan 20, 2026
LLaVA-OneVision-1.5-RL
Public
Fully Open Framework for Democratized Multimodal Reinforcement Learning.
Python • Apache License 2.0 • 4 • 51 • 1 • 0 • Updated Dec 19, 2025
multimodal-sae
Public
[ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis.
Python • Other • 12 • 199 • 5 • 0 • Updated Sep 26, 2025
VideoMMMU
Public
Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos
Python • Other • 3 • 71 • 3 • 1 • Updated Sep 5, 2025
sglang
Public
SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
Python • Apache License 2.0 • 7.2k • 3 • 0 • 0 • Updated Aug 26, 2025
DiffSynth-Studio
Public
Enjoy the magic of Diffusion models!
Python • Apache License 2.0 • 1.2k • 0 • 0 • 0 • Updated Aug 23, 2025
lean-runner
Public
Deploying High-Performance Lean 4 Server in One Click
prover lean4
Python • MIT License • 0 • 9 • 0 • 1 • Updated Aug 14, 2025
MGPO
Public
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning
0 • 55 • 4 • 0 • Updated Jul 23, 2025
sae
Public
A framework that allows you to apply a Sparse AutoEncoder to any model
Python • 3 • 54 • 3 • 0 • Updated Jul 11, 2025
openevolve
Public
Open-source implementation of AlphaEvolve
Python • Apache License 2.0 • 1.1k • 2 • 0 • 0 • Updated Jun 20, 2025
DeepEyes
Public
Python • Apache License 2.0 • 77 • 3 • 0 • 0 • Updated Jun 16, 2025
