Stars
DeepEP: an efficient expert-parallel communication library
A high-throughput and memory-efficient inference and serving engine for LLMs
A highly capable, lightweight 2.4B LLM pre-trained on only 1T tokens of data, with all training details disclosed.
Fast and memory-efficient exact attention
Development repository for the Triton language and compiler
[NeurIPS'24 Spotlight, ICLR'25] To speed up long-context LLM inference, attention is computed with approximate, dynamic sparsity, which reduces pre-filling inference latency by up to 10x on an …
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
📖 A curated list of Awesome LLM/VLM Inference Papers with code: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, etc. 🎉🎉
ModelScope: bringing the notion of Model-as-a-Service to life.
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
Awesome LLM compression research papers and tools.
QAQ: Quality Adaptive Quantization for LLM KV Cache
📰 Must-read papers and blogs on Speculative Decoding ⚡️
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.
A scalable and robust tree-based speculative decoding algorithm.
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
AISystem mainly covers AI systems, including AI chips, AI compilers, AI inference and training frameworks, and other full-stack, low-level AI technologies.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
The first web service built with Fabric-sdk-go, including a chaincode service.
Robot vision and mobile robotics: VS-SLAM, ORB-SLAM2, deep-learning object detection with yolov3, action detection, OpenCV, PCL, machine learning, and autonomous driving.
Papers for deep neural network compression and acceleration
Kuboard is a microservice management UI for Kubernetes. It also provides free Kubernetes tutorials in Chinese, including a getting-started guide, an installation manual for the latest Kubernetes v1.23.4 (k8s install), and online Q&A, all continuously updated.
Strategies for Pre-training Graph Neural Networks
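Several entries above (EAGLE, Medusa, Lookahead Decoding, and the tree-based speculative decoder) are variants of one idea: a cheap draft model proposes several tokens, and the target model verifies them in a single pass. Below is a minimal greedy draft-then-verify sketch of that idea, not the implementation of any listed project; `draft_next` and `target_logits` are hypothetical stand-ins for real models.

```python
# Minimal sketch of greedy draft-then-verify speculative decoding.
# `draft_next` and `target_logits` are hypothetical stand-ins for real models.
from typing import Callable, List

def speculative_step(
    tokens: List[int],
    draft_next: Callable[[List[int]], int],                   # cheap model: next-token guess
    target_logits: Callable[[List[int]], List[List[float]]],  # big model: logits per position
    k: int = 4,                                               # draft tokens per step
) -> List[int]:
    # 1) Draft: the cheap model proposes k tokens autoregressively.
    draft, ctx = [], list(tokens)
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)

    # 2) Verify: one target-model pass scores every drafted position at once.
    logits = target_logits(tokens + draft)  # logits[i] predicts token i+1

    # 3) Accept the longest prefix where the target's greedy choice agrees
    #    with the draft; on the first disagreement, take the target's token.
    accepted = list(tokens)
    for i, t in enumerate(draft):
        pos = len(tokens) + i - 1  # logits index that predicts draft[i]
        best = max(range(len(logits[pos])), key=logits[pos].__getitem__)
        if best != t:
            accepted.append(best)  # target's correction ends the step
            return accepted
        accepted.append(t)

    # All k drafts accepted: take one bonus token from the target for free.
    pos = len(accepted) - 1
    accepted.append(max(range(len(logits[pos])), key=logits[pos].__getitem__))
    return accepted
```

Each call advances the sequence by up to k+1 tokens (all drafts accepted plus one bonus token) or stops at the first disagreement, so the acceptance rate of the draft model is what drives the end-to-end speedup.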
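The quantization entries above (QServe's W4A8KV4, QAQ) both compress the KV cache to low-bit integers so long contexts fit in GPU memory. The sketch below shows per-token asymmetric 4-bit quantization of a plain NumPy KV tensor, only to illustrate the rough idea; the real systems add quality-adaptive bit allocation, outlier handling, packed storage, and fused kernels.

```python
# Minimal per-token asymmetric 4-bit KV-cache quantization sketch (NumPy only).
# Illustrative, not taken from QServe or QAQ; codes occupy a uint8 container
# here, whereas real kernels pack two 4-bit values per byte.
import numpy as np

def quantize_kv_u4(kv: np.ndarray):
    """kv: (tokens, head_dim) float32 -> 4-bit codes plus per-token scale/offset."""
    lo = kv.min(axis=-1, keepdims=True)
    hi = kv.max(axis=-1, keepdims=True)
    scale = (hi - lo) / 15.0 + 1e-8  # 4 bits -> 16 levels; epsilon avoids div-by-zero
    codes = np.clip(np.round((kv - lo) / scale), 0, 15).astype(np.uint8)
    return codes, scale, lo

def dequantize_kv_u4(codes, scale, lo):
    return codes.astype(np.float32) * scale + lo

kv = np.random.randn(8, 64).astype(np.float32)
codes, scale, lo = quantize_kv_u4(kv)
err = np.abs(dequantize_kv_u4(codes, scale, lo) - kv).max()
print(f"max abs reconstruction error: {err:.4f}")
```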