
Commit bc000bb

[Research] Add 2 elite-tier projects to Inference Engines & Serving (#131)
## Projects Added

### One-API (songquanpeng/one-api)
- ⭐ Stars: 31,512 (threshold: 1000+)
- 🔄 Active: 2026-01-09 (within 6 months)
- 🏭 Production: LLM API gateway with rate limiting and quota management
- 📚 Quality: MIT license, full documentation

### OpenLLM (bentoml/OpenLLM)
- ⭐ Stars: 12,273 (threshold: 1000+)
- 🔄 Active: 2026-04-06 (within 6 months)
- 🏭 Production: Enterprise-grade LLM serving platform
- 📚 Quality: Apache 2.0 license, OpenAI-compatible API

Category: Inference Engines & Serving (§3)

Research Date: 2026-04-07

Co-authored-by: alvinreal <alvinreal@users.noreply.github.com>
1 parent 077038e commit bc000bb

1 file changed

Lines changed: 2 additions & 1 deletion

File tree

README.md

@@ -207,7 +207,8 @@
 - **[LightLLM](https://github.com/ModelTC/LightLLM)** ![GitHub stars](https://img.shields.io/github/stars/ModelTC/LightLLM?style=social) - Pure Python-based LLM inference and serving framework with lightweight design, easy extensibility, and high-speed performance. Integrates optimizations from FasterTransformer, TGI, vLLM, and SGLang.
 - **[TabbyAPI](https://github.com/theroyallab/tabbyAPI)** ![GitHub stars](https://img.shields.io/github/stars/theroyallab/tabbyAPI?style=social) - FastAPI-based API server for ExLlamaV2/V3 backends. OpenAI-compatible API with support for model loading/unloading, embeddings, speculative decoding, multi-LoRA, and streaming.
 - **[GPUStack](https://github.com/gpustack/gpustack)** ![GitHub stars](https://img.shields.io/github/stars/gpustack/gpustack?style=social) - GPU cluster manager that orchestrates inference engines like vLLM and SGLang. Automated engine selection, parameter optimization, and distributed multi-GPU deployment for high-performance AI workloads.
-- **[LLMRouter](https://github.com/ulab-uiuc/LLMRouter)** ![GitHub stars](https://img.shields.io/github/stars/ulab-uiuc/LLMRouter?style=social) - Intelligent routing system that optimizes LLM inference by dynamically selecting the most suitable model for each query. Smart multi-model orchestration with load balancing and cost optimization.
+- **[One-API](https://github.com/songquanpeng/one-api)** ![GitHub stars](https://img.shields.io/github/stars/songquanpeng/one-api?style=social) - LLM API management and key redistribution system. Unifies multiple providers (OpenAI, Anthropic, Azure, etc.) under a single OpenAI-compatible API with built-in rate limiting, quota management, and cost tracking. MIT licensed.
+- **[OpenLLM (BentoML)](https://github.com/bentoml/OpenLLM)** ![GitHub stars](https://img.shields.io/github/stars/bentoml/OpenLLM?style=social) - Production-grade platform for running any open-source LLMs as OpenAI-compatible API endpoints. Supports 50+ models with built-in streaming, batching, and auto-acceleration. Apache 2.0 licensed.
 
 #### Quantization, Distillation & Optimization
 
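Both projects added in this commit expose OpenAI-compatible endpoints, so a single request shape covers either gateway. A minimal sketch of the `/v1/chat/completions` request body follows; the base URL, API key, and model name are illustrative placeholders, not taken from this commit or either project's docs.

```python
import json

def build_chat_request(model: str, prompt: str) -> dict:
    """JSON body for POST {base_url}/v1/chat/completions (OpenAI-compatible)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

body = build_chat_request("gpt-4o-mini", "ping")
print(json.dumps(body, indent=2))

# Sending it requires a running gateway; uncomment and adjust to use:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:3000/v1/chat/completions",  # hypothetical local gateway
#     data=json.dumps(body).encode(),
#     headers={"Authorization": "Bearer sk-placeholder",
#              "Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

Because the wire format matches OpenAI's, existing OpenAI client libraries can be pointed at either gateway by overriding the base URL and API key.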
