Yet Another LLM Proxy - A lightweight, modular LLM proxy with OpenAI-compatible API, fallback routing, and response parsing.
- Modular Architecture: Clean separation of concerns with dedicated modules for routing, logging, API endpoints, and configuration
- Backend Failover: Automatically routes to fallback backends when primary backends fail
- Request/Response Logging: Detailed logs of all requests and responses for debugging
- Database Support: SQLite (default) or PostgreSQL (TODO: test) for persistent logging with JSONB columns
- OpenAI Compatibility: Works with OpenAI-compatible clients and tools (see the example after this list)
- Chat, Embeddings, Rerank & Messages: Supports the /v1/chat/completions, /v1/embeddings, /v1/rerank, and /v1/messages (Anthropic API) endpoints
- Anthropic API Translation: Route Anthropic Messages API requests to OpenAI-compatible backends with automatic format conversion
- Runtime Registration: Register new backends without restarting the proxy
- Environment Variable Support: Configure via environment variables in YAML files
- Streaming Support: Transparent handling of streaming responses with SSE error detection
- Model Inheritance: Create derived models with configuration overrides (e.g.,
GLM-4.7:CursorextendsGLM-4.7) - Model Copying: Duplicate existing models via API for easy configuration reuse
- Template-Based Parsing: Use Jinja2 templates for custom response parsing and extraction
- Logs Viewer: Built-in UI for browsing and analyzing request logs with filtering
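Because the API is OpenAI-compatible, most clients and tools only need their base URL pointed at the proxy. A minimal sketch using the environment variables the official OpenAI SDKs read (your client may use different settings; the client-side key is a placeholder, since upstream keys are configured per model):
export OPENAI_BASE_URL="http://localhost:7979/v1"
export OPENAI_API_KEY="sk-placeholder"  # upstream keys are set per model in configs/config.yaml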
Install task on your platform:
Linux/macOS:
sh -c "$(curl -sSL https://taskfile.dev/install.sh)"
export PATH="$PWD/bin:$PATH"
Windows (winget):
winget install go-task.go-task
OR
choco install go-task
macOS (Homebrew):
brew install go-task
# Install dependencies
uv sync
# Configure your API keys in configs/.env
# Edit configs/config.yaml to add your models
# Run the proxy
uv run python -m src.main
# Or use: task run
# Linux/macOS
./scripts/install.sh
./scripts/run.sh
# Windows
scripts\install.bat
scripts\run.bat
# Run with autoreload (development)
task run:reload
# List available models
curl http://localhost:7979/v1/models
# Chat completion
curl http://localhost:7979/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "your-model", "messages": [{"role": "user", "content": "Hello!"}]}'
# Embeddings
curl http://localhost:7979/v1/embeddings \
-H "Content-Type: application/json" \
-d '{"model": "your-embedding-model", "input": "Hello, world!"}'
# Rerank
curl http://localhost:7979/v1/rerank \
-H "Content-Type: application/json" \
-d '{"model": "your-reranker-model", "query": "What is machine learning?", "documents": ["ML is a subset of AI", "Python is a programming language", "Neural networks learn patterns"], "top_n": 2}'
# Anthropic Messages API (routed to OpenAI backend)
curl http://localhost:7979/v1/messages \
-H "Content-Type: application/json" \
-d '{"model": "your-model", "max_tokens": 1024, "messages": [{"role": "user", "content": "Hello!"}]}'model_list:
- model_name: my-model # Display name for the model
protected: true # Requires admin password to edit/delete
model_params:
api_base: https://api.example.com/v1 # Backend URL
api_key: sk-xxx # API key (use env vars for security)
model: gpt-4o # Upstream model name (optional)
request_timeout: 30 # Timeout in seconds (optional)
target_model: gpt-4 # Rewrite model name in requests (optional)
supports_reasoning: false # Enable reasoning block injection (optional)
api_type: openai # API type: openai (default), anthropic (optional)
parameters: # Per-model parameter defaults (optional)
temperature:
default: 1.0 # Default value if not specified in request
allow_override: false # If true, use request value if provided (default: true)
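As a sketch of how the parameters block behaves (based on the comments above; exact enforcement is up to the proxy): with allow_override: false, a temperature sent by the client is replaced by the configured default of 1.0.
# Request temperature is 0.2, but my-model forwards the default (1.0) because allow_override is false
curl http://localhost:7979/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "temperature": 0.2, "messages": [{"role": "user", "content": "Hello!"}]}'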
proxy_settings:
server:
host: 127.0.0.1 # Bind address
port: 7979 # Port number
enable_responses_endpoint: false # Enable /v1/responses endpoint
logging: # Request logging options (optional)
log_parsed_response: false # Write parsed non-stream responses to *.parsed.log
log_parsed_stream: false # Write parsed stream chunks to *.parsed.log
parsers: # Response parser pipeline (optional)
enabled: false
response: # Ordered response parsers
- parse_tags
- swap_reasoning_content
paths: # Apply to paths containing any of these strings
- /chat/completions
parse_tags: # Parse <think> / <tool_call> tags into structured fields
template_path: "" # Optional: auto-detect config from Jinja template
parse_thinking: true
parse_tool_calls: true
think_tag: think
tool_arg_format: xml # xml | json (for K2-style JSON arguments)
tool_tag: tool_call # For xml format
tool_open: "<tool_call>" # Custom open delimiter (auto-set for json format)
tool_close: "</tool_call>" # Custom close delimiter (auto-set for json format)
tool_buffer_limit: 200 # Optional: max buffered chars before treating as literal
swap_reasoning_content: # Swap reasoning_content <-> content
mode: reasoning_to_content # reasoning_to_content | content_to_reasoning | auto
think_tag: think
think_open:
prefix: "" # Prefix before <think>
suffix: "" # Suffix after <think>
think_close:
prefix: "" # Prefix before </think>
suffix: "" # Suffix after </think>
include_newline: true # Default true: add newline between </think> and content
forwarder_settings:
listen:
host: 0.0.0.0 # External listen address
port: 6969 # External listen port
target:
host: 127.0.0.1 # Proxy host
port: 7979 # Proxy port
http_forwarder_settings:
preserve_host: true # Preserve Host header (recommended)
listen:
host: 0.0.0.0 # External listen address
port: 6969 # External listen port
target:
scheme: http
host: 127.0.0.1
    port: 7979
Protected models require YALLMP_ADMIN_PASSWORD in configs/.env for edits/deletes via admin APIs.
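For example, you can set the admin password (placeholder value shown) in configs/.env:
# configs/.env
YALLMP_ADMIN_PASSWORD=change-me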
Per-model parser overrides can be added to individual entries in model_list.
When present, they replace the global proxy_settings.parsers config for that model.
If enabled is omitted, per-model parsers default to enabled (set enabled: false
to explicitly disable parsing on that model).
Parsing notes: parse_tags can auto-detect think/tool tags and argument format from
a Jinja template when template_path is provided. Use tool_arg_format: json for
K2-style models that output JSON arguments (e.g., <|tool_call_begin|>func<|arg|>{"key":"val"}).
Tool parsing buffers until a full block is parsed; if tool_buffer_limit is set and
exceeded, the tag is treated as literal text.
To help align formatting with a Jinja chat template, you can inspect a template
and print suggested think_open/think_close prefixes and suffixes:
uv run python scripts/inspect_template.py configs/jinja_templates/template_example.jinja
Copy the suggested values into your swap_reasoning_content config. If you set
think_close.suffix to include a newline, consider setting include_newline: false
to avoid double newlines.
model_list:
- model_name: GLM-4.7
model_params:
api_base: https://api.z.ai/api/coding/paas/v4
api_key: ${GLM_API_KEY}
parsers:
enabled: true
response:
- swap_reasoning_content
swap_reasoning_content:
        mode: reasoning_to_content
The TCP forwarder is optional but useful when you need a separate inbound port or a separate process for inbound traffic. For example, if a VPN you must use for API access (or WSL networking) breaks inbound traffic, you can whitelist the forwarder executable/process, or run the forwarder on Windows, while keeping the proxy running under the VPN or inside WSL.
It reads forwarder_settings from configs/config.yaml. You can override with
FORWARD_LISTEN_HOST, FORWARD_LISTEN_PORT, FORWARD_TARGET_HOST,
FORWARD_TARGET_PORT at runtime.
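For example, to move the forwarder's listen port at runtime (the port value here is arbitrary):
FORWARD_LISTEN_PORT=7070 task forwarder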
Use the HTTP forwarder when you want a protocol-aware reverse proxy (e.g., to avoid connection resets on large JSON responses). It preserves the Host header by default and streams SSE responses.
It reads http_forwarder_settings from configs/config.yaml. You can override with
HTTP_FORWARD_LISTEN_HOST, HTTP_FORWARD_LISTEN_PORT, HTTP_FORWARD_TARGET_SCHEME,
HTTP_FORWARD_TARGET_HOST, HTTP_FORWARD_TARGET_PORT, and HTTP_FORWARD_PRESERVE_HOST.
SSL/TLS Support:
The HTTP forwarder supports HTTPS with SSL certificates:
http_forwarder_settings:
listen:
host: 0.0.0.0
port: 6969
target:
scheme: http
host: 127.0.0.1
port: 7979
ssl:
enabled: true
cert_file: certs/localhost+2.pem
key_file: certs/localhost+2-key.pem
debug: false # Enable debug logging
  timeout_seconds: 300 # Request timeout (optional)
Generate certificates using mkcert:
task ssl:generate-certs
Environment variable overrides: HTTP_FORWARD_SSL_ENABLED, HTTP_FORWARD_SSL_CERT, HTTP_FORWARD_SSL_KEY, HTTP_FORWARD_DEBUG, HTTP_FORWARD_TIMEOUT.
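With SSL enabled and the mkcert root CA trusted on the client machine, the forwarder serves HTTPS on its listen port (a sketch, assuming the default 6969):
curl https://localhost:6969/v1/models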
Note: both forwarders default to port 6969. If you want to run both at once, change one of their ports.
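For example (the alternate port below is arbitrary):
# TCP forwarder on the default port, HTTP forwarder moved off 6969
task forwarder &
HTTP_FORWARD_LISTEN_PORT=6970 task forwarder:http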
- API Reference - Complete endpoint documentation
- Configuration Guide - Config options, environment variables
- Project Structure - Directory layout and module descriptions
- Known Issues - Known bugs, limitations, and workarounds
yaLLMproxy/
├── src/ # Source code
│ ├── modules/ # Request/response pipeline modules
│ ├── parsers/ # Response module pipeline (legacy parsers alias)
│ ├── core/ # Core proxy functionality
│ ├── api/ # HTTP API layer
│ ├── database/ # Database support
│ ├── logging/ # Request/response logging
│ ├── middleware/ # Request/response middleware
│ ├── routing/ # Model routing utilities
│ └── types/ # Type definitions
├── static/admin/ # Admin UI (includes logs viewer)
├── configs/ # Configuration files
│ └── jinja_templates/ # Jinja2 templates for parsing
├── docs/ # Detailed documentation
├── scripts/ # Utility scripts
└── tests/ # Test suite
| Command | Description |
|---|---|
| task run | Start the proxy |
| task run:reload | Start with autoreload (development) |
| task test | Run tests |
| task forwarder | Run TCP forwarder |
| task forwarder:http | Run HTTP forwarder |
| task ssl:generate-certs | Generate SSL certificates using mkcert |
| uv run pytest tests/ | Run tests directly |
You can reload the proxy configuration without restarting by calling the admin API:
curl -X POST http://localhost:7979/admin/config/reload
Or use the Reload button in the admin UI (/admin/).
This will:
- Reload config.yaml from disk
- Re-parse all models and backends
- Update the router's runtime state atomically
- Apply model inheritance changes to derived models
Useful for applying config changes without interrupting active requests. Note: Changes to base models update derived models in runtime config, but require router reload to apply to active backends.
yaLLMproxy supports SQLite (default) and PostgreSQL databases for persistent logging. Database request logging is enabled by default when the database module is configured and reachable.
SQLite is enabled by default with no additional configuration:
database:
backend: sqlite
connection:
sqlite:
path: logs/yaLLM.db
pool_size: 5 # Connection pool size (PostgreSQL; ignored for file-based SQLite)
      max_overflow: 10 # Extra overflow connections above pool_size (PostgreSQL)
To use PostgreSQL, update your configuration:
database:
backend: postgres
connection:
postgres:
host: localhost
port: 5432
database: yallm_proxy
user: ${DB_USER}
password: ${DB_PASSWORD}
pool_size: 5 # Base pool size for PostgreSQL connections
      max_overflow: 10 # Extra overflow connections above pool_size
Add credentials to your .env file:
DB_USER=your_postgres_user
DB_PASSWORD=your_postgres_password
task db:migrate # Run database migrations
task db:rollback # Rollback last migration
task db:current # Show current revision
task db:history # Show migration history
task clean # Clear logs (preserves database)
- Database Documentation - Detailed database guide
- Project Structure - Module descriptions
MIT
