    Repositories list

    • ScaleLLM

      Public
      A high-performance inference system for large language models, designed for production environments.
      C++
      Updated Dec 19, 2025
    • cutlass

      Public
      CUDA Templates for Linear Algebra Subroutines
      C++
      Updated Nov 6, 2025
    • nixl

      Public
      NVIDIA Inference Xfer Library (NIXL)
      C++
      Updated Nov 4, 2025
    • dynamo

      Public
      A Datacenter Scale Distributed Inference Serving Framework
      Rust
      Updated Nov 4, 2025
    • whl

      Public
      Repository for hosting Python wheel (whl) packages.
      HTML
      Updated Sep 13, 2025
    • flux

      Public
      A fast communication-overlapping library for tensor/expert parallelism on GPUs.
      C++
      Updated Apr 15, 2025
    • 3FS

      Public
      A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
      C++
      Updated Feb 28, 2025
    • FlashInfer

      Kernel Library for LLM Serving
      Cuda
      Updated Feb 27, 2025
    • FlashMLA

      Public
      C++
      Updated Feb 26, 2025
    • vcpkg

      Public
      C++ Library Manager for Windows, Linux, and MacOS
      CMake
      Updated Feb 24, 2025
    • Fast and memory-efficient exact attention
      Python
      Updated Oct 15, 2023
    • 💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
      Rust
      Updated Aug 4, 2023
    • xformers

      Public
      Hackable and optimized Transformers building blocks, supporting a composable construction.
      Python
      Updated Aug 1, 2023
    • Transformer-related optimization, including BERT and GPT
      C++
      Updated Jul 28, 2023
    • Optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052
      C++
      Updated Jul 24, 2023