An LLM API proxy router that receives requests from clients like Claude Code and Cursor, forwards them to configured backend Providers through model mapping and routing strategies, supporting both streaming (SSE) and non-streaming proxying.
Core problem it solves: Chinese domestic models have frequent rate limits, switching between multiple providers is cumbersome, and concurrency control is missing.
- Developers using Claude Code / Cursor / Codex / Pi with Chinese domestic models (Zhipu, Moonshot, Minimax, etc.)
- Those who want automatic retries for rate-limit errors, scenario-based model switching, and concurrency queue management
- Anyone looking for a turnkey solution without the hassle
| Feature | Description |
|---|---|
| Automatic error retries | Exponential backoff retries for recoverable errors (429/400/network timeouts), completely transparent to the client |
| Concurrency queue | Per-Provider concurrency limits with queueing; supports adaptive concurrency that auto-adjusts based on load, no manual tuning needed |
| Multi-API format support | Supports OpenAI (Chat Completions, Responses) and Anthropic (Messages) — client and upstream formats can be freely combined. Built-in DeepSeek reasoning_thinking patches |
| Stream response timeout | Per-model configurable stream timeout to prevent stuck connections when the model stops producing output |
| Real-time monitoring | SSE-based live view of active requests, queue status, and streaming output with structured display adapted for Claude Code |
| Request logs | Full four-stage tracing (client request → upstream request → upstream response → client response), with log file archiving |
| Feature | Description |
|---|---|
| Rich model auto-switching | Failover, context overflow auto-switch to larger context models, multimodal request auto-switch, time-based scheduled switching |
| Quick setup | Select client → select provider → enter API key, done in 3 steps. Pre-configured parameters for Zhipu, Moonshot, Minimax and other domestic providers |
| Provider network proxy | Per-provider HTTP/SOCKS5 proxy for overseas APIs (OpenAI, Anthropic) |
| Proxy enhancement (experimental) | Tool call loop detection (N-gram) + Token usage estimation + Cache hit rate estimation |
| Usage dashboard | Usage statistics by time, model, and key dimensions; 5-hour sliding window optimized for Coding Plans |
| Multi-key management | Independent Router keys + model whitelists (allowed_models) for multi-user/multi-project isolation |
| Upgrade notifications | Automatic new version notifications + one-click upgrade |
API Compatibility: Supports both Anthropic and OpenAI API formats. Client and upstream formats can be freely combined. Google Gemini API format is not yet supported.
| Provider Management + Concurrency Control | Real-time Monitoring |
|---|---|
![]() |
![]() |
| Model Mapping | Retry Rules |
|---|---|
![]() |
![]() |
| Dashboard | Request Logs |
|---|---|
![]() |
![]() |
| Proxy Enhancement (Experimental) |
|---|
![]() |
npx llm-simple-routerVisit http://localhost:9981/admin. On first visit, the Setup page will ask you to set an admin password. Data is stored in ~/.llm-simple-router/.
Admin Dashboard > Providers page > Add Provider. Select a Coding Plan to auto-fill the Base URL, then just enter the API Key.
You can also use the Quick Setup page: select client → select provider → enter API key, done in 3 steps.
Admin Dashboard > Model Mappings page.
Core concept: The client sends a request with model name A. The Router replaces it with backend model name B according to the mapping rule, then forwards the request:
Claude Code (model A) → Router (A → B) → Provider API (model B)
Simply configure "client model = A, backend model = B, select provider" in the mapping table.
When no environment variables are set, Claude Code uses these default model names: opus, sonnet, haiku. If the backend is a Zhipu Coding Plan, the mapping configuration would be:
| Client Model | Backend Model | Provider | Time Window |
|---|---|---|---|
| opus | glm-5.1 | Zhipu Coding Plan | All day |
| sonnet | glm-5.1 | Zhipu Coding Plan | All day |
| haiku | glm-5-turbo | Zhipu Coding Plan | All day |
You can also use time-based switching for peak hours:
| Client Model | Backend Model | Provider | Time Window |
|---|---|---|---|
| sonnet | glm-5.1 | Zhipu Coding Plan | 00:00-14:00 |
| sonnet | kimi-for-coding | Moonshot | 14:00-18:00 |
| sonnet | glm-5.1 | Zhipu Coding Plan | 18:00-24:00 |
Create a Router API key in the admin dashboard, then choose one of the following methods. Only one is needed.
Option 1: shell alias (recommended)
Minimal configuration. Claude Code uses default model names (opus / sonnet / haiku), and the Router converts them via the mapping table:
alias clode='\
export ANTHROPIC_AUTH_TOKEN="<your-router-key>" && \
export ANTHROPIC_BASE_URL="http://127.0.0.1:9981" && \
claude'You can also specify model names directly via environment variables, bypassing Router mapping:
alias clode='\
export ANTHROPIC_AUTH_TOKEN="sk-router-xxxxxxxx" && \
export ANTHROPIC_BASE_URL="http://192.168.1.111:9981" && \
export ANTHROPIC_MODEL="glm-5" && \
export ANTHROPIC_DEFAULT_OPUS_MODEL="glm-5.1" && \
export ANTHROPIC_DEFAULT_SONNET_MODEL="glm-5" && \
export ANTHROPIC_DEFAULT_HAIKU_MODEL="glm-5-turbo" && \
export ANTHROPIC_SMALL_FAST_MODEL="glm-5-turbo" && \
claude'For debugging, add:
claude --dangerously-skip-permissions --verbose --debug, or setexport DEBUG=claude:*for detailed logs.
Option 2: ~/.claude/settings.json
Add the configuration to the env field in ~/.claude/settings.json (same effect as exporting environment variables):
Minimal configuration:
{
"env": {
"ANTHROPIC_AUTH_TOKEN": "<your-router-key>",
"ANTHROPIC_BASE_URL": "http://127.0.0.1:9981"
}
}Override model names:
{
"env": {
"ANTHROPIC_AUTH_TOKEN": "sk-router-xxxxxxxx",
"ANTHROPIC_BASE_URL": "http://192.168.1.111:9981",
"ANTHROPIC_MODEL": "glm-5",
"ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-5.1",
"ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-5",
"ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-5-turbo",
"ANTHROPIC_SMALL_FAST_MODEL": "glm-5-turbo"
}
}Environment variables in settings.json apply to all projects. To apply only to the current project, place them in
.claude/settings.json(in the project root).
Edit ~/.codex/config.toml to add the Router as a custom provider:
model_provider = "llm-simple-router"
model = "deepseek-v4-flash"
preferred_auth_method = "apikey"
[model_providers.llm-simple-router]
name = "LLMSimpleRouter"
base_url = "http://127.0.0.1:9981/v1"
env_key = "ROUTER_KEY"
wire_api = "responses"Set the environment variable (your Router API key):
export ROUTER_KEY="<your-router-key>"Codex connects to Router via OpenAI Responses API (
wire_api = "responses"). Themodelfield should be the client model name configured in Router.
Edit ~/.pi/agent/models.json to add the Router as a provider:
{
"providers": {
"llm-simple-router": {
"baseUrl": "http://127.0.0.1:9981",
"api": "anthropic-messages",
"apiKey": "<your-router-key>",
"models": [
{
"id": "glm-5.1",
"name": "glm-5.1",
"reasoning": true,
"input": ["text"],
"contextWindow": 200000,
"maxTokens": 64000
},
{
"id": "deepseek-v4-flash",
"name": "deepseek-v4-flash",
"reasoning": true,
"input": ["text"],
"contextWindow": 1000000,
"maxTokens": 64000,
"compat": {
"requiresReasoningContentOnAssistantMessages": true,
"thinkingFormat": "deepseek"
},
"thinkingLevelMap": {
"off": null,
"minimal": null,
"low": null,
"medium": null,
"high": "high",
"xhigh": "max"
}
}
]
}
}
}Pi connects to Router via Anthropic Messages API (
api: "anthropic-messages"). DeepSeek models requirecompat.thinkingFormat: "deepseek"andthinkingLevelMapto correctly handle reasoning output.
# Claude Code (shell alias)
clode
# Claude Code (settings.json)
claude
# Codex
codex
# Pi Coding Agent
piOption 1: Pull pre-built image (recommended)
# One-click start with data persistence to ~/.llm-simple-router/
docker compose up -ddocker-compose.yml pulls the pre-built image from ghcr.io by default, with data mapped to ~/.llm-simple-router/ on the host.
You can also use docker run directly:
docker run -d \
--name llm-router \
-p 9981:9981 \
-v ~/.llm-simple-router:/app/data \
-e DB_PATH=/app/data/router.db \
-e TZ=Asia/Shanghai \
--restart unless-stopped \
ghcr.io/zhushanwen321/llm-simple-router:latestEnvironment variables are set through the Setup page; no .env file needed.
Option 2: Build locally
Edit docker-compose.yml, comment out the image line, uncomment the build section, then:
docker compose up -d --buildAfter upgrading via the Web UI, the service needs to restart. Use one of the following deployment methods to ensure automatic recovery after crashes or upgrades.
# Install PM2
npm install -g pm2
# Install Router globally
npm install -g llm-simple-router
# Start (PM2 auto-restarts crashed processes)
pm2 start llm-simple-router --name llm-router
# View logs
pm2 logs llm-router
# Enable startup on boot
pm2 startup
pm2 saveUpgrade flow: Web UI one-click upgrade → click restart → PM2 auto-spawns new process (< 1s interruption).
Create service file /etc/systemd/system/llm-simple-router.service:
[Unit]
Description=LLM Simple Router
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/llm-simple-router
Restart=always
RestartSec=3
Environment=PORT=9981
Environment=LOG_LEVEL=info
# Configure other environment variables as needed
# Environment=DB_PATH=/var/lib/llm-simple-router/router.db
[Install]
WantedBy=multi-user.targetNote: The
ExecStartpath depends on how Node.js is installed. Usewhich llm-simple-routerto find the actual path.
# Enable and start
sudo systemctl enable llm-simple-router
sudo systemctl start llm-simple-router
# Check status and logs
sudo systemctl status llm-simple-router
journalctl -u llm-simple-router -fUpgrade flow: Web UI one-click upgrade → click restart → systemd auto-restarts (< 1s interruption).
No extra configuration needed. After Web UI upgrade and clicking restart, the Router will automatically spawn a new process and exit the old one. Brief interruption of about 1-2 seconds.
Note: If you directly
Ctrl+Cor close the terminal, the service won't auto-recover. Use PM2 or systemd for production.
Claude Code → Router (model mapping + auto-retry + concurrency control) → Zhipu GLM / Kimi / Other Providers
System Context (details):
graph LR
Clients["Claude Code / Cursor / Other Clients"]
Admin["Admin"]
Router>"LLM Simple Router"]
Providers>"Zhipu / Moonshot / OpenAI / Anthropic / ..."]
Clients -->|"API Request<br/>Bearer Token"| Router
Admin -->|"Admin Dashboard<br/>/admin/"| Router
Router -->|"Forward Request<br/>SSE Stream"| Providers
Request Processing Pipeline (details):
flowchart LR
A[Client Request] --> B[Authentication]
B --> C[Model Mapping<br/>+ Routing Strategy]
C --> H[Multimodal Detection<br/>+ Overflow Detection]
H --> D[Concurrency Queue]
D --> E[Call Upstream<br/>Auto-Retry on Failure]
E --> F[Log + Metrics]
F --> G[Return Response]
E -.->|Failure| C
When the Router receives a request: Authentication → find backend Provider via mapping rules → multimodal detection (auto-switch to fallback model for images/audio) → context overflow detection → queue for concurrency control → forward to upstream (auto-retry on failure; under Failover strategy, switches Provider) → log and record metrics → return response.
All secrets are set through the Setup page. Optional configuration:
| Variable | Default | Description |
|---|---|---|
PORT |
9981 |
Server port |
DB_PATH |
~/.llm-simple-router/router.db |
SQLite database path |
LOG_LEVEL |
info |
Log level |
TZ |
Asia/Shanghai |
Timezone |
STREAM_TIMEOUT_MS |
3000000 |
Stream proxy idle timeout (ms) |
RETRY_MAX_ATTEMPTS |
3 |
Max retry attempts |
RETRY_BASE_DELAY_MS |
1000 |
Retry base delay (ms) |
# Backend (hot reload)
npm run dev
# Frontend (hot reload, proxies API to backend :9980)
cd frontend && npm run dev
# Build
npm run build:full
# Test
npm test
# Lint
npm run lint|
541815155 |
Feishu Xu Ditao (Lao Ba) |
Feishu Group Scan to join |
MIT






