    Repositories list

    • lmms-eval

      Public
      One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
      Python
      4443.3k2849 Updated Dec 5, 2025
    • lmms-engine

      Public
      A simple, unified multimodal models training engine. Lean, flexible, and built for hacking at scale.
      Python
      2666981 Updated Dec 5, 2025
    • OpenMMReasoner

      Public
      OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
      Python
      311610 Updated Dec 5, 2025
    • LongVT

      Public
      LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
      Python
      410700 Updated Dec 4, 2025
    • VLMEvalKit

      Public
      An open-source evaluation toolkit to evaluate MLLMs on Spatial Intelligence using the EASI protocol
      Python
      01302 Updated Dec 4, 2025
    • LLaVA-OneVision-1.5

      Public
      Fully Open Framework for Democratized Multimodal Training
      Python
      45643271 Updated Dec 4, 2025
    • EASI

      Public
      Holistic Evaluation of Multimodal LLMs on Spatial Intelligence
      34520 Updated Dec 4, 2025
    • NEO

      Public
      NEO Series: Native Vision-Language Models from First Principles
      Python
      1528000 Updated Oct 21, 2025
    • .github

      Public
      1101 Updated Sep 29, 2025
    • multimodal-sae

      Public
      [ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis.
      Python
      916640 Updated Sep 26, 2025
    • VideoMMMU

      Public
      Python
      26230 Updated Sep 5, 2025
    • sglang

      Public
      SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
      Python
      3.6k300 Updated Aug 26, 2025
    • MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search tools.
      Python
      1735940 Updated Aug 26, 2025
    • Enjoy the magic of Diffusion models!
      Python
      1k000 Updated Aug 23, 2025
    • Deploying High-Performance Lean 4 Server in One Click
      Python
      0901 Updated Aug 14, 2025
    • MGPO

      Public
      High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning
      05240 Updated Jul 23, 2025
    • sae

      Public
      A framework that lets you apply sparse autoencoders to any model
      Python
      14520 Updated Jul 11, 2025
    • openevolve

      Public
      Open-source implementation of AlphaEvolve
      Python
      712200 Updated Jun 20, 2025
    • DeepEyes

      Public
      Python
      62300 Updated Jun 16, 2025
    • agent-rl

      Public
      A fork of verl that supports multi-turn tool use and other agentic tasks.
      Python
      59100 Updated Jun 14, 2025
    • Aero-1

      Public
      Python
      67830 Updated May 4, 2025
    • EgoLife

      Public
      [CVPR 2025] EgoLife: Towards Egocentric Life Assistant
      Python
      1935290 Updated Mar 19, 2025
    • LongVA

      Public
      Long Context Transfer from Language to Vision
      Python
      19398270 Updated Mar 18, 2025
    • open-r1-multimodal

      Public
      A fork to add multimodal model training to open-r1
      Python
      681.4k221 Updated Feb 8, 2025
    • demos

      Public
      Python
      0000 Updated Sep 18, 2024
    • DeepseekLeanPlayground

      Public
      The math library of Lean 4
      Lean
      912000 Updated Aug 7, 2024
    • Otter

      Public
      🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
      Python
      2083.3k622 Updated Mar 5, 2024
    • Relate Anything Model is capable of taking an image as input and utilizing SAM to identify the corresponding mask within the image.
      Python
      2245560 Updated Jul 4, 2023