    Repositories list

    • TANGRAM

      Public
      An Unstructured and Memory-Efficient Framework for LLM Serving and KV Cache Management.
      JavaScript
      0000 Updated Mar 13, 2026
    • [NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
      Python
      Apache License 2.0
      416000 Updated Feb 5, 2026
    • [NeurIPS 25] InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
      Python
      01600 Updated Jan 25, 2026
    • Python
      MIT License
      0010 Updated Jan 23, 2026
    • A real-time streaming assistant powered by Rebellions NPU, designed to operate without visual feedback and optimized for low-latency.
      Python
      0000 Updated Jan 22, 2026
    • Reduced-precision inference (PTQ) / training (QAT, FQT) framework for LLMs
      Python
      0100 Updated Dec 15, 2025
    • RILQ

      Public
      Python
      Apache License 2.0
      0510 Updated Oct 24, 2025
    • sqil

      Public
      Python
      0600 Updated Oct 15, 2025
    • NVFP4 Emulation Library
      Python
      0000 Updated Sep 2, 2025
    • 0000 Updated Jul 26, 2025
    • Quantization Framework for LLM Inferences
      Python
      3600 Updated Mar 11, 2025
    • MX-QLLM

      Public
      LLM Inference with Microscaling Format
      Python
      53430 Updated Nov 12, 2024
    • pim-iree

      Public
      Compiler and runtime implementation for PIM device.
      C++
      Apache License 2.0
      864200 Updated Dec 15, 2023
    • serpim

      Public
      👻
      C++
      Apache License 2.0
      864000 Updated Dec 14, 2023
    • iree

      Public
      👻
      C++
      Apache License 2.0
      864000 Updated Dec 14, 2023
    • TSLD

      Public
      [NeurIPS 2023] Token-Scaled Logit Distillation for Ternary Weight Generative Language Models
      Python
      11800 Updated Dec 6, 2023
    • TVM-VTA

      Public
      setting
      CMake
      0000 Updated Apr 28, 2023
    • tpu-mlir

      Public
      Machine learning compiler based on MLIR for Sophgo TPU.
      C++
      Other
      201000 Updated Jan 16, 2023
    • AI System Design - Final Project
      0000 Updated Dec 20, 2022
    • Python
      Apache License 2.0
      21001 Updated Nov 4, 2022
    • Inference code for AI Challenge (Dec 2020)
      Jupyter Notebook
      GNU General Public License v3.0
      0600 Updated Feb 22, 2022
    • TernGEMM

      Public
      TernGEMM: General Matrix Multiply Library with Ternary Weights for Fast DNN Inference
      C++
      GNU General Public License v3.0
      11410 Updated Feb 22, 2022
    • Layer-wise Pruning of Transformer Heads for Efficient Language Modeling
      Python
      GNU General Public License v3.0
      12200 Updated Feb 22, 2022
    • Python
      GNU General Public License v3.0
      0800 Updated Feb 22, 2022
    • Python
      Apache License 2.0
      0000 Updated Aug 31, 2021
    • Cuda
      0000 Updated Aug 12, 2021
    • lsq-lab

      Public
      Python
      MIT License
      0000 Updated Aug 9, 2021
    • Samsung 2021 QPyTorch Lab
      Jupyter Notebook
      1000 Updated Aug 9, 2021
    • optimus + timeloop implementation
      Python
      MIT License
      6000 Updated May 10, 2021
    • [ICLR 2019] ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
      Python
      Apache License 2.0
      284000 Updated Mar 12, 2020