Yet Another LLM Proxy - A lightweight, modular LLM proxy with OpenAI-compatible API, fallback routing, and response parsing.
- Modular Architecture: Clean separation of concerns with dedicated modules for routing, logging, API endpoints, and configuration
- Backend Failover: Automatically routes to fallback backends when primary backends fail
- Request/Response Logging: Detailed logs of all requests and responses for debugging
- Database Support: SQLite (default) or PostgreSQL (TODO: test) for persistent logging with JSONB columns
- OpenAI Compatibility: Works with OpenAI-compatible clients and tools (see the example after this list)
- Chat, Embeddings, Rerank & Messages: Supports the /v1/chat/completions, /v1/embeddings, /v1/rerank, and /v1/messages (Anthropic API) endpoints
- Anthropic API Translation: Route Anthropic Messages API requests to OpenAI-compatible backends with automatic format conversion
- Runtime Registration: Register new backends without restarting the proxy
- Environment Variable Support: Configure via environment variables in YAML files
- Streaming Support: Transparent handling of streaming responses with SSE error detection
- Model Inheritance: Create derived models with configuration overrides (e.g.,
GLM-4.7:CursorextendsGLM-4.7) - Model Copying: Duplicate existing models via API for easy configuration reuse
- Template-Based Parsing: Use Jinja2 templates for custom response parsing and extraction
- Logs Viewer: Built-in UI for browsing and analyzing request logs with filtering
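Because the API is OpenAI-compatible, most clients and tools only need their base URL pointed at the proxy. A minimal sketch using the environment variables the official OpenAI SDKs read (your client may use different settings; the client-side key is a placeholder, since upstream keys are configured per model):
export OPENAI_BASE_URL="http://localhost:7979/v1"
export OPENAI_API_KEY="sk-placeholder"  # upstream keys are set per model in configs/config.yaml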
Install task on your platform:
Linux/macOS:
sh -c "$(curl -sSL https://taskfile.dev/install.sh)"
export PATH="$PWD/bin:$PATH"
Windows (winget):
winget install go-task.go-task
OR
choco install go-task
macOS (Homebrew):
brew install go-task
# Install dependencies
uv sync
# Configure your API keys in configs/.env
# Edit configs/config.yaml to add your models
# Run the proxy
uv run python -m src.main
# Or use: task run
# Linux/macOS
./scripts/install.sh
./scripts/run.sh
# Windows
scripts\install.bat
scripts\run.bat
# Run with autoreload (development)
task run:reload
# List available models
curl http://localhost:7979/v1/models
# Chat completion
curl http://localhost:7979/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "your-model", "messages": [{"role": "user", "content": "Hello!"}]}'
# Embeddings
curl http://localhost:7979/v1/embeddings \
-H "Content-Type: application/json" \
-d '{"model": "your-embedding-model", "input": "Hello, world!"}'
# Rerank
curl http://localhost:7979/v1/rerank \
-H "Content-Type: application/json" \
-d '{"model": "your-reranker-model", "query": "What is machine learning?", "documents": ["ML is a subset of AI", "Python is a programming language", "Neural networks learn patterns"], "top_n": 2}'
# Anthropic Messages API (routed to OpenAI backend)
curl http://localhost:7979/v1/messages \
-H "Content-Type: application/json" \
-d '{"model": "your-model", "max_tokens": 1024, "messages": [{"role": "user", "content": "Hello!"}]}'model_list:
- model_name: my-model # Display name for the model
protected: true # Requires admin password to edit/delete
model_params:
api_base: https://api.example.com/v1 # Backend URL
api_key: sk-xxx # API key (use env vars for security)
model: gpt-4o # Upstream model name (optional)
request_timeout: 30 # Timeout in seconds (optional)
target_model: gpt-4 # Rewrite model name in requests (optional)
supports_reasoning: false # Enable reasoning block injection (optional)
api_type: openai # API type: openai (default), anthropic (optional)
parameters: # Per-model parameter defaults (optional)
temperature:
default: 1.0 # Default value if not specified in request
allow_override: false # If true, use request value if provided (default: true)
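As a sketch of how the parameters block behaves (based on the comments above; exact enforcement is up to the proxy): with allow_override: false, a temperature sent by the client is replaced by the configured default of 1.0.
# Request temperature is 0.2, but my-model forwards the default (1.0) because allow_override is false
curl http://localhost:7979/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "temperature": 0.2, "messages": [{"role": "user", "content": "Hello!"}]}'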
proxy_settings:
server:
host: 127.0.0.1 # Bind address
port: 7979 # Port number
enable_responses_endpoint: false # Enable /v1/responses endpoint
logging: # Request logging options (optional)
log_parsed_response: false # Write parsed non-stream responses to *.parsed.log
log_parsed_stream: false # Write parsed stream chunks to *.parsed.log
parsers: # Response parser pipeline (optional)
enabled: false
response: # Ordered response parsers
- parse_tags
- swap_reasoning_content
paths: # Apply to paths containing any of these strings
- /chat/completions
parse_tags: # Parse <think> / <tool_call> tags into structured fields
template_path: "" # Optional: auto-detect config from Jinja template
parse_thinking: true
parse_tool_calls: true
think_tag: think
tool_arg_format: xml # xml | json (for K2-style JSON arguments)
tool_tag: tool_call # For xml format
tool_open: "<tool_call>" # Custom open delimiter (auto-set for json format)
tool_close: "</tool_call>" # Custom close delimiter (auto-set for json format)
tool_buffer_limit: 200 # Optional: max buffered chars before treating as literal
swap_reasoning_content: # Swap reasoning_content <-> content
mode: reasoning_to_content # reasoning_to_content | content_to_reasoning | auto
think_tag: think
think_open:
prefix: "" # Prefix before <think>
suffix: "" # Suffix after <think>
think_close:
prefix: "" # Prefix before </think>
suffix: "" # Suffix after </think>
include_newline: true # Default true: add newline between </think> and content
forwarder_settings:
listen:
host: 0.0.0.0 # External listen address
port: 6969 # External listen port
target:
host: 127.0.0.1 # Proxy host
port: 7979 # Proxy port
http_forwarder_settings:
preserve_host: true # Preserve Host header (recommended)
listen:
host: 0.0.0.0 # External listen address
port: 6969 # External listen port
target:
scheme: http
host: 127.0.0.1
    port: 7979
Protected models require YALLMP_ADMIN_PASSWORD in configs/.env for edits/deletes via admin APIs.
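For example, you can set the admin password (placeholder value shown) in configs/.env:
# configs/.env
YALLMP_ADMIN_PASSWORD=change-me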
Per-model parser overrides can be added to individual entries in model_list.
When present, they replace the global proxy_settings.parsers config for that model.
If enabled is omitted, per-model parsers default to enabled (set enabled: false
to explicitly disable parsing on that model).
Parsing notes: parse_tags can auto-detect think/tool tags and argument format from
a Jinja template when template_path is provided. Use tool_arg_format: json for
K2-style models that output JSON arguments (e.g., <|tool_call_begin|>func<|arg|>{"key":"val"}).
Tool parsing buffers until a full block is parsed; if tool_buffer_limit is set and
exceeded, the tag is treated as literal text.
To help align formatting with a Jinja chat template, you can inspect a template
and print suggested think_open/think_close prefixes and suffixes:
uv run python scripts/inspect_template.py configs/jinja_templates/template_example.jinja
Copy the suggested values into your swap_reasoning_content config. If you set
think_close.suffix to include a newline, consider setting include_newline: false
to avoid double newlines.
model_list:
- model_name: GLM-4.7
model_params:
api_base: https://api.z.ai/api/coding/paas/v4
api_key: ${GLM_API_KEY}
parsers:
enabled: true
response:
- swap_reasoning_content
swap_reasoning_content:
        mode: reasoning_to_content
The TCP forwarder is optional but useful when you need a separate inbound port or a separate process for inbound traffic. For example, if a VPN you must use for API access (or WSL networking) breaks inbound traffic, you can whitelist the forwarder executable/process, or run the forwarder on Windows, while keeping the proxy running under the VPN or inside WSL.
It reads forwarder_settings from configs/config.yaml. You can override with
FORWARD_LISTEN_HOST, FORWARD_LISTEN_PORT, FORWARD_TARGET_HOST,
FORWARD_TARGET_PORT at runtime.
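For example, to move the forwarder's listen port at runtime (the port value here is arbitrary):
FORWARD_LISTEN_PORT=7070 task forwarder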
Use the HTTP forwarder when you want a protocol-aware reverse proxy (e.g., to avoid connection resets on large JSON responses). It preserves the Host header by default and streams SSE responses.
It reads http_forwarder_settings from configs/config.yaml. You can override with
HTTP_FORWARD_LISTEN_HOST, HTTP_FORWARD_LISTEN_PORT, HTTP_FORWARD_TARGET_SCHEME,
HTTP_FORWARD_TARGET_HOST, HTTP_FORWARD_TARGET_PORT, and HTTP_FORWARD_PRESERVE_HOST.
SSL/TLS Support:
The HTTP forwarder supports HTTPS with SSL certificates:
http_forwarder_settings:
listen:
host: 0.0.0.0
port: 6969
target:
scheme: http
host: 127.0.0.1
port: 7979
ssl:
enabled: true
cert_file: certs/localhost+2.pem
key_file: certs/localhost+2-key.pem
debug: false # Enable debug logging
  timeout_seconds: 300 # Request timeout (optional)
Generate certificates using mkcert:
task ssl:generate-certs
Environment variable overrides: HTTP_FORWARD_SSL_ENABLED, HTTP_FORWARD_SSL_CERT, HTTP_FORWARD_SSL_KEY, HTTP_FORWARD_DEBUG, HTTP_FORWARD_TIMEOUT.
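With SSL enabled and the mkcert root CA trusted on the client machine, the forwarder serves HTTPS on its listen port (a sketch, assuming the default 6969):
curl https://localhost:6969/v1/models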
Note: both forwarders default to port 6969. If you want to run both at once, change one of their ports.
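For example (the alternate port below is arbitrary):
# TCP forwarder on the default port, HTTP forwarder moved off 6969
task forwarder &
HTTP_FORWARD_LISTEN_PORT=6970 task forwarder:http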
- API Reference - Complete endpoint documentation
- Configuration Guide - Config options, environment variables
- Project Structure - Directory layout and module descriptions
- Known Issues - Known bugs, limitations, and workarounds
yaLLMproxy/
├── src/ # Source code
│ ├── modules/ # Request/response pipeline modules
│ ├── parsers/ # Response module pipeline (legacy parsers alias)
│ ├── core/ # Core proxy functionality
│ ├── api/ # HTTP API layer
│ ├── database/ # Database support
│ ├── logging/ # Request/response logging
│ ├── middleware/ # Request/response middleware
│ ├── routing/ # Model routing utilities
│ └── types/ # Type definitions
├── static/admin/ # Admin UI (includes logs viewer)
├── configs/ # Configuration files
│ └── jinja_templates/ # Jinja2 templates for parsing
├── docs/ # Detailed documentation
├── scripts/ # Utility scripts
└── tests/ # Test suite
| Command | Description |
|---|---|
| task run | Start the proxy |
| task run:reload | Start with autoreload (development) |
| task test | Run tests |
| task forwarder | Run TCP forwarder |
| task forwarder:http | Run HTTP forwarder |
| task ssl:generate-certs | Generate SSL certificates using mkcert |
| uv run pytest tests/ | Run tests directly |
You can reload the proxy configuration without restarting by calling the admin API:
curl -X POST http://localhost:7979/admin/config/reload
Or use the Reload button in the admin UI (/admin/).
This will:
- Reload config.yaml from disk
- Re-parse all models and backends
- Update the router's runtime state atomically
- Apply model inheritance changes to derived models
Useful for applying config changes without interrupting active requests. Note: Changes to base models update derived models in runtime config, but require router reload to apply to active backends.
yaLLMproxy supports SQLite (default) and PostgreSQL databases for persistent logging. Database request logging is enabled by default when the database module is configured and reachable.
SQLite is enabled by default with no additional configuration:
database:
backend: sqlite
connection:
sqlite:
path: logs/yaLLM.db
pool_size: 5 # Connection pool size (PostgreSQL; ignored for file-based SQLite)
      max_overflow: 10 # Extra overflow connections above pool_size (PostgreSQL)
To use PostgreSQL, update your configuration:
database:
backend: postgres
connection:
postgres:
host: localhost
port: 5432
database: yallm_proxy
user: ${DB_USER}
password: ${DB_PASSWORD}
pool_size: 5 # Base pool size for PostgreSQL connections
      max_overflow: 10 # Extra overflow connections above pool_size
Add credentials to your .env file:
DB_USER=your_postgres_user
DB_PASSWORD=your_postgres_password
task db:migrate # Run database migrations
task db:rollback # Rollback last migration
task db:current # Show current revision
task db:history # Show migration history
task clean # Clear logs (preserves database)
- Database Documentation - Detailed database guide
- Project Structure - Module descriptions
MIT
