# MultiVibe

OpenAI-compatible multi-provider router.

Quota-aware routing · OAuth onboarding · Persistent storage · Request tracing · Automatic model discovery
MultiVibe acts as an OpenAI-compatible gateway that lets you route requests across multiple provider accounts while keeping a single `/v1` API surface.

## Features
- OpenAI-compatible API: `GET /v1/models`, `GET /v1/models/:id`, `POST /v1/chat/completions`, `POST /v1/responses`, `POST /v1/responses/compact`
- Streaming over SSE or WebSocket
  - HTTP streaming uses a plain `POST` with `stream: true`; the HTTP response stream is `text/event-stream`
  - `/v1/responses` also accepts `ws://`/`wss://` and Codex-style JSON `response.create` frames
  - `/v1/chat/completions` and `/v1/responses/compact` remain HTTP-only
- Multi-account routing with quota-aware failover
- Model aliases (for example `small`) with ordered fallback across providers/models
- OAuth onboarding from the dashboard (manual redirect paste flow)
- Manual OpenAI-compatible connections with custom `baseUrl` + API key
- Persistent account storage across container restarts
- Request tracing v2 (retention capped at 1000, server pagination, tokens/model/error/latency stats, optional full payload)
- Usage stats endpoint with global + per-account + per-route aggregates over full history
- Time-range stats (`sinceMs`/`untilMs`) while keeping only the latest 1000 full traces
Screenshots below are taken in sanitized mode (`?sanitized=1`).
## Routing strategy

When a request arrives, MultiVibe chooses an account with this strategy:
- Prefer accounts untouched on both windows (5h + weekly)
- Otherwise prefer account with nearest weekly reset
- Fallback by priority
- On `429`/quota-like errors, block the account and retry on the next one
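The selection order above can be sketched roughly like this (a simplified illustration, not MultiVibe's actual implementation; field names such as `touched5h`, `touchedWeekly`, `weeklyResetMs`, `priority`, and `blocked` are assumptions for this example):

```javascript
// Simplified sketch of the account-selection order described above.
// Field names are illustrative assumptions, not MultiVibe's real schema.
function pickAccount(accounts, nowMs) {
  const candidates = accounts.filter((a) => !a.blocked);
  if (candidates.length === 0) return null;

  // 1. Prefer accounts untouched on both windows (5h + weekly)
  const fresh = candidates.filter((a) => !a.touched5h && !a.touchedWeekly);
  if (fresh.length > 0) return fresh[0];

  // 2. Otherwise prefer the account with the nearest weekly reset,
  // 3. breaking ties by priority (lower value = preferred)
  return [...candidates].sort(
    (a, b) =>
      (a.weeklyResetMs - nowMs) - (b.weeklyResetMs - nowMs) ||
      a.priority - b.priority,
  )[0];
}
```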
When the requested model is an alias, MultiVibe resolves it to ordered target models and automatically falls back across target models/providers as quotas are hit.
Aliases may also intentionally reuse an already exposed provider model name. In that case, the alias overrides the provider model and routes requests using the alias target order instead.
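Alias resolution can be pictured as a small lookup step before routing; a minimal sketch (the alias map shape is an assumption for this example):

```javascript
// Illustrative sketch of alias resolution with ordered fallback.
// An enabled alias wins even if a provider exposes a model with the same name.
const aliases = {
  small: {
    enabled: true,
    targets: ["gpt-5.1-codex-mini", "devstral-small-latest"],
  },
};

// Returns the ordered list of concrete models to try for a request.
function resolveModel(requested) {
  const alias = aliases[requested];
  if (alias && alias.enabled) return [...alias.targets];
  return [requested]; // not an alias: route to the provider model directly
}
```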
## Persistence

Everything important is file-based and survives restarts (if `/data` is mounted):

- `/data/accounts.json`
- `/data/oauth-state.json`
- `/data/requests-trace.jsonl`
- `/data/requests-stats-history.jsonl`

Trace retention is capped to the latest 1000 entries. Stats history is append-only and keeps lightweight request metadata for long-term cost/volume tracking.
Docker compose already mounts `./data:/data`.

## Quick start

```
docker compose up -d --build
```

- Dashboard: http://localhost:1455
- Health: http://localhost:1455/health
## OAuth onboarding

Because this is often deployed remotely (Unraid/VPS), onboarding uses a manual redirect paste flow:

- Open the dashboard
- For OpenAI accounts, enter the account email
- Click Start OAuth
- Complete login in browser
- Copy the full redirect URL shown after the callback completes
- Paste that URL in the dashboard and click Complete OAuth
Mistral accounts still use manual token entry in the dashboard.
OpenAI-compatible accounts use manual `baseUrl` + API key entry in the dashboard.
Default expected redirect URI: `http://localhost:1455/auth/callback`
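For reference, the authorize URL the dashboard opens can be assembled from the OAuth defaults listed in the environment variable table (a simplified sketch; the real flow also attaches state/PKCE parameters that MultiVibe handles internally):

```javascript
// Sketch: build the OAuth authorize URL from the documented defaults.
// Real onboarding adds state/PKCE parameters; this is illustrative only.
function buildAuthorizeUrl() {
  const params = new URLSearchParams({
    response_type: "code",
    client_id: "app_EMoamEEZ73f0CkXaXp7hrann",
    redirect_uri: "http://localhost:1455/auth/callback",
    scope: "openid profile email offline_access",
  });
  return `https://auth.openai.com/oauth/authorize?${params.toString()}`;
}
```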
## API examples

### List models

```
curl http://localhost:1455/v1/models
```

Example model object returned:

```json
{
  "id": "gpt-5.3-codex",
  "object": "model",
  "created": 1730000000,
  "owned_by": "multivibe",
  "metadata": {
    "context_window": null,
    "max_output_tokens": null,
    "supports_reasoning": true,
    "supports_tools": true,
    "supported_tool_types": ["function"]
  }
}
```

### Chat completions

```
curl -X POST http://localhost:1455/v1/chat/completions \
  -H "content-type: application/json" \
  -d '{
    "model": "gpt-5.3-codex",
    "messages": [{"role":"user","content":"hello"}]
  }'
```

### Streaming responses (SSE)

```
curl -N -X POST http://localhost:1455/v1/responses \
  -H "content-type: application/json" \
  -d '{
    "model": "gpt-5.3-codex",
    "input": "hello",
    "stream": true
  }'
```

### Streaming responses (WebSocket)

```js
const ws = new WebSocket("ws://localhost:1455/v1/responses", {
  headers: {
    Authorization: "Bearer YOUR_TOKEN",
  },
});

ws.onmessage = (event) => {
  console.log(JSON.parse(event.data));
};

ws.onopen = () => {
  ws.send(
    JSON.stringify({
      type: "response.create",
      model: "gpt-5.3-codex",
      input: [
        { role: "user", content: [{ type: "input_text", text: "hello" }] },
      ],
      stream: true,
    }),
  );
};
```

## Admin endpoints

### Create a model alias

```
curl -X POST http://localhost:1455/admin/model-aliases \
  -H "x-admin-token: change-me" \
  -H "content-type: application/json" \
  -d '{
    "id": "small",
    "targets": ["gpt-5.1-codex-mini", "devstral-small-latest"],
    "enabled": true,
    "description": "Small coding model pool"
  }'
```

### Traces and stats

```
# Paginated API (recommended)
curl -H "x-admin-token: change-me" \
  "http://localhost:1455/admin/traces?page=1&pageSize=100"

# Legacy compatibility mode
curl -H "x-admin-token: change-me" \
  "http://localhost:1455/admin/traces?limit=50"

# Usage stats over a time range
curl -H "x-admin-token: change-me" \
  "http://localhost:1455/admin/stats/usage?sinceMs=1735689600000&untilMs=1738291200000"

# Trace stats over a time range
curl -H "x-admin-token: change-me" \
  "http://localhost:1455/admin/stats/traces?sinceMs=1735689600000&untilMs=1738291200000"
```

Optional filters:

- `accountId=<id>`
- `route=/v1/chat/completions`
- `sinceMs=<epoch_ms>`
- `untilMs=<epoch_ms>`
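When calling these endpoints from code, the optional filters can be assembled with `URLSearchParams`; a minimal sketch (base URL and pagination values taken from the examples above):

```javascript
// Sketch: build an /admin/traces query URL from optional filters.
// Only filters that are actually set are appended to the query string.
function buildTracesUrl(base, filters = {}) {
  const params = new URLSearchParams({ page: "1", pageSize: "100" });
  for (const key of ["accountId", "route", "sinceMs", "untilMs"]) {
    if (filters[key] !== undefined) params.set(key, String(filters[key]));
  }
  return `${base}/admin/traces?${params.toString()}`;
}
```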
Model alias admin endpoints:

- `GET /admin/model-aliases`
- `POST /admin/model-aliases`
- `PATCH /admin/model-aliases/:id`
- `DELETE /admin/model-aliases/:id`
## Environment variables

| Variable | Default | Description |
|---|---|---|
| `PORT` | `1455` | HTTP server port |
| `STORE_PATH` | `/data/accounts.json` | Accounts store |
| `OAUTH_STATE_PATH` | `/data/oauth-state.json` | OAuth flow state |
| `TRACE_FILE_PATH` | `/data/requests-trace.jsonl` | Request trace file (retained to latest 1000 entries) |
| `TRACE_STATS_HISTORY_PATH` | `/data/requests-stats-history.jsonl` | Lightweight request history for long-term stats |
| `TRACE_INCLUDE_BODY` | `true` | Persist full request payloads; trace stats still work when disabled |
| `PROXY_MODELS` | `gpt-5.3-codex,gpt-5.2-codex,gpt-5-codex` | Fallback comma-separated model list for `/v1/models` |
| `MODELS_CLIENT_VERSION` | `1.0.0` | Version sent to `/backend-api/codex/models` for model discovery |
| `MODELS_CACHE_MS` | `600000` | Model discovery cache duration (ms) |
| `ADMIN_TOKEN` | `change-me` | Admin endpoints auth token |
| `CHATGPT_BASE_URL` | `https://chatgpt.com` | Upstream base URL |
| `UPSTREAM_PATH` | `/backend-api/codex/responses` | Upstream request path |
| `UPSTREAM_COMPACT_PATH` | `/backend-api/codex/responses/compact` | Upstream path for `/v1/responses/compact` |
| `OAUTH_CLIENT_ID` | `app_EMoamEEZ73f0CkXaXp7hrann` | OpenAI OAuth client id |
| `OAUTH_AUTHORIZATION_URL` | `https://auth.openai.com/oauth/authorize` | OAuth authorize endpoint |
| `OAUTH_TOKEN_URL` | `https://auth.openai.com/oauth/token` | OAuth token endpoint |
| `OAUTH_SCOPE` | `openid profile email offline_access` | OAuth scope |
| `OAUTH_REDIRECT_URI` | `http://localhost:1455/auth/callback` | Redirect URI |
| `MISTRAL_COMPACT_UPSTREAM_PATH` | `/v1/responses/compact` | Mistral upstream path for compact responses |
| `MAX_ACCOUNT_RETRY_ATTEMPTS` | `10` | Max accounts to try on quota/rate-limit errors |
| `MAX_UPSTREAM_RETRIES` | `5` | Retries per upstream request (429/5xx) |
| `UPSTREAM_BASE_DELAY_MS` | `2000` | Base backoff delay for upstream retries (ms) |
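These variables can be overridden in `docker-compose.yml`; a minimal illustrative fragment (the service name `multivibe` and the override values are assumptions for this example):

```yaml
services:
  multivibe:
    environment:
      ADMIN_TOKEN: "a-strong-secret"   # replace the change-me default
      TRACE_INCLUDE_BODY: "false"      # keep trace stats without storing payloads
      MODELS_CACHE_MS: "300000"        # refresh model discovery every 5 minutes
    volumes:
      - ./data:/data                   # persist accounts, traces, and stats
```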
## Development

```
npm install
npm --prefix web install
npm run build
npm run start
```

## Contributing

PRs and issues are welcome. If you open a PR:

- keep it focused
- include before/after behavior
- include screenshots for UI changes




