    Repositories list

    • EPLB

      Public
      Expert Parallelism Load Balancer
      Python
      196 · 0 · 0 · 0 · Updated Apr 1, 2025
    • smallpond

      Public
      A lightweight data processing framework built on DuckDB and 3FS.
      Python
      439 · 0 · 0 · 0 · Updated Apr 1, 2025
    • DualPipe

      Public
      A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
      Python
      312 · 0 · 0 · 0 · Updated Apr 1, 2025
    • Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
      288 · 0 · 0 · 0 · Updated Apr 1, 2025
    • Python
      16k · 0 · 0 · 0 · Updated Apr 1, 2025
    • 3.9k · 0 · 0 · 0 · Updated Apr 1, 2025
    • Analyze computation-communication overlap in V3/R1.
      144 · 0 · 0 · 0 · Updated Apr 1, 2025
    • 3FS

      Public
      A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
      C++
      986 · 0 · 0 · 0 · Updated Apr 1, 2025
    • DeepEP

      Public
      DeepEP: an efficient expert-parallel communication library
      Cuda
      1.1k · 0 · 0 · 0 · Updated Apr 1, 2025
    • DeepGEMM

      Public
      DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
      Cuda
      792 · 0 · 0 · 0 · Updated Apr 1, 2025
    • Agent framework and applications built upon Qwen>=2.0, featuring Function Calling, Code Interpreter, RAG, and Chrome extension.
      Python
      1.2k · 0 · 0 · 0 · Updated Apr 1, 2025
    • Mooncake

      Public
      Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
      C++
      500 · 0 · 0 · 0 · Updated Apr 1, 2025
    • llama.cpp

      Public
      LLM inference in C/C++
      C++
      14k · 0 · 0 · 0 · Updated Mar 31, 2025
    • Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
      Jupyter Notebook
      310 · 0 · 0 · 0 · Updated Mar 30, 2025
    • Qwen

      Public
      The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
      Python
      1.7k · 0 · 0 · 0 · Updated Mar 28, 2025
    • QwQ

      Public
      QwQ is the reasoning model series developed by Qwen team, Alibaba Cloud.
      Python
      26 · 0 · 0 · 0 · Updated Mar 27, 2025
    • Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
      Jupyter Notebook
      1.5k · 0 · 0 · 0 · Updated Mar 27, 2025
    • Qwen2.5-Coder is the code version of Qwen2.5, the large language model series developed by Qwen team, Alibaba Cloud.
      Python
      1k · 0 · 0 · 0 · Updated Mar 19, 2025
    • The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention
      Python
      313 · 0 · 0 · 0 · Updated Mar 18, 2025
    • MoBA

      Public
      MoBA: Mixture of Block Attention for Long-Context LLMs
      Python
      129 · 0 · 0 · 0 · Updated Mar 7, 2025
    • DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
      Python
      1.8k · 0 · 0 · 0 · Updated Mar 4, 2025
    • 12k · 0 · 0 · 0 · Updated Mar 4, 2025
    • FlashMLA

      Public
      FlashMLA: Efficient MLA Decoding Kernel for Hopper GPUs
      C++
      926 · 0 · 0 · 0 · Updated Mar 1, 2025
    • CodeElo

      Public
      CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
      Python
      8 · 0 · 0 · 0 · Updated Feb 3, 2025
    • Janus

      Public
      Janus-Series: Unified Multimodal Understanding and Generation Models
      Python
      2.2k · 0 · 0 · 0 · Updated Feb 1, 2025
    • vllm

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      13k · 0 · 0 · 0 · Updated Jan 26, 2025
    • DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
      1k · 0 · 0 · 0 · Updated Sep 24, 2024
    • ESFT

      Public
      Expert Specialized Fine-Tuning
      Python
      261 · 0 · 0 · 0 · Updated Sep 22, 2024
    • [ICLR 2024] Official implementation of DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
      Python
      358 · 0 · 0 · 0 · Updated Aug 21, 2024
    • Python
      232 · 0 · 0 · 0 · Updated Aug 16, 2024