
Commit bc000bb

[Research] Add 2 elite-tier projects to Inference Engines & Serving (#131)
## Projects Added

### One-API (songquanpeng/one-api)
- ⭐ Stars: 31,512 (threshold: 1000+)
- 🔄 Active: 2026-01-09 (within 6 months)
- 🏭 Production: LLM API gateway with rate limiting and quota management
- 📚 Quality: MIT license, full documentation

### OpenLLM (bentoml/OpenLLM)
- ⭐ Stars: 12,273 (threshold: 1000+)
- 🔄 Active: 2026-04-06 (within 6 months)
- 🏭 Production: Enterprise-grade LLM serving platform
- 📚 Quality: Apache 2.0 license, OpenAI-compatible API

Category: Inference Engines & Serving (§3)

Research Date: 2026-04-07

Co-authored-by: alvinreal <alvinreal@users.noreply.github.com>
1 parent 077038e commit bc000bb

1 file changed

Lines changed: 2 additions & 1 deletion

File tree

README.md

@@ -207,7 +207,8 @@
 - **[LightLLM](https://github.com/ModelTC/LightLLM)** ![GitHub stars](https://img.shields.io/github/stars/ModelTC/LightLLM?style=social) - Pure Python-based LLM inference and serving framework with lightweight design, easy extensibility, and high-speed performance. Integrates optimizations from FasterTransformer, TGI, vLLM, and SGLang.
 - **[TabbyAPI](https://github.com/theroyallab/tabbyAPI)** ![GitHub stars](https://img.shields.io/github/stars/theroyallab/tabbyAPI?style=social) - FastAPI-based API server for ExLlamaV2/V3 backends. OpenAI-compatible API with support for model loading/unloading, embeddings, speculative decoding, multi-LoRA, and streaming.
 - **[GPUStack](https://github.com/gpustack/gpustack)** ![GitHub stars](https://img.shields.io/github/stars/gpustack/gpustack?style=social) - GPU cluster manager that orchestrates inference engines like vLLM and SGLang. Automated engine selection, parameter optimization, and distributed multi-GPU deployment for high-performance AI workloads.
-- **[LLMRouter](https://github.com/ulab-uiuc/LLMRouter)** ![GitHub stars](https://img.shields.io/github/stars/ulab-uiuc/LLMRouter?style=social) - Intelligent routing system that optimizes LLM inference by dynamically selecting the most suitable model for each query. Smart multi-model orchestration with load balancing and cost optimization.
+- **[One-API](https://github.com/songquanpeng/one-api)** ![GitHub stars](https://img.shields.io/github/stars/songquanpeng/one-api?style=social) - LLM API management and key redistribution system. Unifies multiple providers (OpenAI, Anthropic, Azure, etc.) under a single OpenAI-compatible API with built-in rate limiting, quota management, and cost tracking. MIT licensed.
+- **[OpenLLM (BentoML)](https://github.com/bentoml/OpenLLM)** ![GitHub stars](https://img.shields.io/github/stars/bentoml/OpenLLM?style=social) - Production-grade platform for running any open-source LLMs as OpenAI-compatible API endpoints. Supports 50+ models with built-in streaming, batching, and auto-acceleration. Apache 2.0 licensed.
 
 #### Quantization, Distillation & Optimization
 
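Both projects added in this commit expose OpenAI-compatible endpoints, so a single request shape covers either gateway. A minimal sketch of the `/v1/chat/completions` request body follows; the base URL, API key, and model name are illustrative placeholders, not taken from this commit or either project's docs.

```python
import json

def build_chat_request(model: str, prompt: str) -> dict:
    """JSON body for POST {base_url}/v1/chat/completions (OpenAI-compatible)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

body = build_chat_request("gpt-4o-mini", "ping")
print(json.dumps(body, indent=2))

# Sending it requires a running gateway; uncomment and adjust to use:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:3000/v1/chat/completions",  # hypothetical local gateway
#     data=json.dumps(body).encode(),
#     headers={"Authorization": "Bearer sk-placeholder",
#              "Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

Because the wire format matches OpenAI's, existing OpenAI client libraries can be pointed at either gateway by overriding the base URL and API key.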
