feat(proxy): request body cap + per-tenant rate limit plumbing by epappas · Pull Request #240 · techlab-innov/llmtrace

epappas · 2026-05-19T17:57:11Z

Summary

Two SaaS gaps for the Basilica-based per-tenant provisioning shipped in #229 / #233:

- Request body size cap (Rust): the proxy had no hard cap, so one large payload could drive the ML detectors and trace pipeline arbitrarily hard.
- Per-tenant rate limit plumbing (Python + Rust): RateLimitConfig / TenantRateLimitOverride existed in core and were enforced by RateLimiter::check, but the per-tenant value could not be surfaced through the Basilica tenant YAML.

This PR closes both.

What changed

Rust — `crates/llmtrace-proxy`

Cargo.toml — enable tower-http's limit feature.
src/main.rs
- DEFAULT_MAX_REQUEST_BYTES = 1 MiB.
- resolve_max_request_bytes() reads LLMTRACE_MAX_REQUEST_BYTES, falling back to the default on missing / unparseable / non-positive values.
- build_router applies tower_http::limit::RequestBodyLimitLayer::new(body_cap) so oversized requests (with honest Content-Length) get HTTP 413 Payload Too Large before any handler runs.
- Startup info! log now includes max_request_bytes, rate_limit_rps, rate_limit_burst.
- New tests: test_resolve_max_request_bytes_* (4 cases) + test_request_body_cap_rejects_oversized_payload (413 path) + test_request_body_cap_allows_payload_under_default_limit (small body still routes through).
src/config.rs
- apply_env_overrides honours LLMTRACE_RATE_LIMIT_RPS and LLMTRACE_RATE_LIMIT_BURST via a parse_positive_u32 helper that silently ignores zero / unparseable values (a typo must never disable rate limiting wholesale).
- New tests: test_apply_env_overrides_rate_limit_rps_and_burst, test_apply_env_overrides_rate_limit_ignores_invalid.
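
The fallback rules described above for resolve_max_request_bytes and parse_positive_u32 can be illustrated with a short sketch. It is written in Python purely for illustration: the real helpers are Rust and are not reproduced in this description. The function bodies, the whitespace trimming, the `apply_rate_limit_overrides` name, and the example defaults passed by the caller are assumptions about behaviour, not the PR's code.

```python
import os
from typing import Mapping, Optional, Tuple

DEFAULT_MAX_REQUEST_BYTES = 1024 * 1024  # 1 MiB, the default stated in this PR
U32_MAX = 2**32 - 1


def _parse_positive(raw: Optional[str]) -> Optional[int]:
    """Return a strictly positive integer, or None for missing/invalid input."""
    if raw is None:
        return None
    text = raw.strip()  # assumption: surrounding whitespace is tolerated
    if not (text.isascii() and text.isdigit()):
        return None  # rejects "", "abc", "-1", "1.5"
    value = int(text)
    return value if value > 0 else None


def parse_positive_u32(raw: Optional[str]) -> Optional[int]:
    """Like _parse_positive, but also rejects values that do not fit in a u32."""
    value = _parse_positive(raw)
    return value if value is not None and value <= U32_MAX else None


def resolve_max_request_bytes(env: Mapping[str, str] = os.environ) -> int:
    """Read LLMTRACE_MAX_REQUEST_BYTES, falling back to the default."""
    value = _parse_positive(env.get("LLMTRACE_MAX_REQUEST_BYTES"))
    return value if value is not None else DEFAULT_MAX_REQUEST_BYTES


def apply_rate_limit_overrides(
    rps: int, burst: int, env: Mapping[str, str] = os.environ
) -> Tuple[int, int]:
    """Override (rps, burst) only with valid values; invalid ones are ignored."""
    new_rps = parse_positive_u32(env.get("LLMTRACE_RATE_LIMIT_RPS"))
    new_burst = parse_positive_u32(env.get("LLMTRACE_RATE_LIMIT_BURST"))
    return (
        new_rps if new_rps is not None else rps,
        new_burst if new_burst is not None else burst,
    )
```

The point of the design is that a bad value leaves the existing setting in place rather than zeroing it, so a typo in an env var cannot turn the limit off.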

Python — `deployments/basilica`

lifecycle.py
- New frozen RateLimitSpec(requests_per_second: int, burst_size: int) dataclass with __post_init__ validation (both must be > 0).
- TenantSpec.rate_limit: Optional[RateLimitSpec] (default None).
- _apply_rate_limit(proxy_spec, rate_limit) injects LLMTRACE_RATE_LIMIT_RPS / LLMTRACE_RATE_LIMIT_BURST into the proxy ComponentSpec.env. Precedence matches _apply_proxy_auth: the spec-derived value overrides any same-named value the caller put in proxy.env.
- provision() calls _apply_rate_limit when spec.rate_limit is not None. Because update(strategy="recreate") reuses provision(), recreates pick this up automatically; strategy="restart" does not (matches the existing model — restart keeps the live config).
cli.py
- Optional top-level rate_limit: block. Picked the top level (not under proxy:) because it is a tenant-shape concern, not a proxy-image config concern — same place as enable_proxy_auth and api_key. Fails fast on missing required keys or non-mapping shapes.
configs/examples/{starter,pro}.yaml — commented-out rate_limit: block so the field is discoverable but optional.
README.md — new Per-tenant rate_limit subsection (table of env-var bindings, validation rules, precedence, on-proxy parsing) and a new Request body cap subsection (default 1 MiB, env override, 413 vs 400 nuance for streamed bodies); the top-level optional-fields table now lists rate_limit.
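
A minimal sketch of the lifecycle.py pieces listed above, to make the validation and precedence rules concrete. The `ComponentSpecStandIn` class is a hypothetical stand-in for the SDK's ComponentSpec (only an `env` mapping is modelled), `apply_rate_limit` is named without the leading underscore to mark it as illustrative, and the use of ValueError is an assumption. The error message wording follows the validation output quoted later in this description.

```python
from dataclasses import dataclass, field, replace
from typing import Dict


@dataclass(frozen=True)
class RateLimitSpec:
    requests_per_second: int
    burst_size: int

    def __post_init__(self) -> None:
        # Both values must be strictly positive.
        if self.requests_per_second <= 0:
            raise ValueError(
                "rate_limit.requests_per_second must be > 0, "
                f"got {self.requests_per_second}"
            )
        if self.burst_size <= 0:
            raise ValueError(
                f"rate_limit.burst_size must be > 0, got {self.burst_size}"
            )


@dataclass(frozen=True)
class ComponentSpecStandIn:
    """Hypothetical stand-in for the proxy ComponentSpec; only env is modelled."""
    env: Dict[str, str] = field(default_factory=dict)


def apply_rate_limit(
    proxy_spec: ComponentSpecStandIn, rate_limit: RateLimitSpec
) -> ComponentSpecStandIn:
    # Caller-supplied env is spread first, so the spec-derived values win
    # on a name clash. Swapping the order would invert the precedence.
    merged = {
        **proxy_spec.env,
        "LLMTRACE_RATE_LIMIT_RPS": str(rate_limit.requests_per_second),
        "LLMTRACE_RATE_LIMIT_BURST": str(rate_limit.burst_size),
    }
    return replace(proxy_spec, env=merged)
```

For example, a proxy spec whose env already carries `LLMTRACE_RATE_LIMIT_RPS=999` ends up with the value from `RateLimitSpec` after `apply_rate_limit`, while unrelated env entries pass through untouched.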

Validation

Rust

$ cargo fmt --all --check
(clean)

$ cargo build -p llmtrace
Finished `dev` profile [unoptimized + debuginfo] target(s) in 31.36s

$ cargo clippy -p llmtrace --lib --bins -- -D warnings
Finished `dev` profile [unoptimized + debuginfo] target(s) in 3m 04s

$ cargo test -p llmtrace --lib
test result: ok. 597 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

$ cargo test -p llmtrace --lib --bins
21 binary tests pass, 597 lib tests pass.

A first cargo test -p llmtrace --lib --bins run flaked once on action_router::tests::test_webhook_action_delivers_payload (unrelated to this PR — a network-bound webhook test); a targeted re-run passed cleanly. A second full run also passed without flakes.

Python

Ran in a venv with basilica-sdk + PyYAML:

$ python -c "from deployments.basilica import lifecycle, cli; print('ok')"
ok

$ python <inline tests>
rate_limit injection OK
rate_limit override semantics OK
rejected rps=0 as expected: rate_limit.requests_per_second must be > 0, got 0
rejected burst=-1 as expected: rate_limit.burst_size must be > 0, got -1
rate_limit optional default OK
ALL OK

$ python <cli parse tests>
cli rate_limit parse OK
cli rate_limit: null OK
cli rate_limit absent OK
missing-key rejected: rate_limit is missing required keys: ['burst_size']
non-mapping rejected: rate_limit must be a mapping, got list
ALL OK

Both example YAMLs (starter.yaml, pro.yaml) still parse via cli._tenant_spec_from_config with rate_limit=None (block commented out as shipped); uncommenting the block parses into a populated RateLimitSpec.
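
The CLI parsing behaviour exercised above can be sketched roughly as follows. This is not the code in cli.py: `parse_rate_limit` and `ParsedRateLimit` are hypothetical names, the input is assumed to be the already-loaded YAML mapping, and ValueError is an assumed exception type. Only the two error messages are taken from the test output above.

```python
from collections.abc import Mapping
from typing import Any, NamedTuple, Optional

REQUIRED_KEYS = ("requests_per_second", "burst_size")


class ParsedRateLimit(NamedTuple):
    """Hypothetical stand-in for the RateLimitSpec built by the real CLI."""
    requests_per_second: int
    burst_size: int


def parse_rate_limit(config: Mapping) -> Optional[ParsedRateLimit]:
    """Parse the optional top-level rate_limit block of a tenant config."""
    raw: Any = config.get("rate_limit")
    if raw is None:
        # Covers both an absent key and an explicit `rate_limit: null`.
        return None
    if not isinstance(raw, Mapping):
        raise ValueError(
            f"rate_limit must be a mapping, got {type(raw).__name__}"
        )
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise ValueError(f"rate_limit is missing required keys: {missing}")
    return ParsedRateLimit(
        requests_per_second=int(raw["requests_per_second"]),
        burst_size=int(raw["burst_size"]),
    )
```

Treating a missing key and an explicit null the same way keeps the block optional, which is what allows the shipped example YAMLs to keep it commented out.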

Body cap end-to-end

Validated inside the cargo test harness only. A request with Content-Length over the configured cap returns 413; a small body still reaches proxy_handler (502 because upstream is unreachable in the test, not 413). I did not run the proxy binary outside of tower::ServiceExt::oneshot and have not exercised curl against a live axum::serve instance — flagging this for live verification before going to production tenants.

What was not validated

I did not run the proxy binary against a live LLMTRACE_MAX_REQUEST_BYTES env var with curl. The router-level test exercises the same code path (build_router reads the env at construction), but a live binary test is the strongest signal.
I did not provision a real Basilica deployment with rate_limit: set and verify the proxy logs the overridden rps / burst at startup. The proxy startup info! log line now includes both values, so the next provision should make this trivial to confirm in CI logs.
I did not run a multi-tenant load test. The proxy reads the env vars at startup and the env-var override path is unit-tested. End-to-end, the chain is: Basilica deploy env → proxy container → apply_env_overrides → config.rate_limiting → RateLimiter::new. Each link is covered by an existing or new test, but the chain as a whole has not been exercised under load.

Trade-offs / choices

- RequestBodyLimitLayer over DefaultBodyLimit: DefaultBodyLimit relies on extractors to enforce the cap, and the proxy's proxy_handler reads the body manually via axum::body::to_bytes. The Tower layer enforces the cap before the handler is invoked, which gives a clean 413 for honest clients.
- Why streamed bodies without Content-Length return 400 rather than 413: RequestBodyLimitLayer short-circuits with 413 only when Content-Length is set and exceeds the limit. For chunked bodies the layer wraps the stream and errors mid-read once the cap is crossed; proxy_handler catches that as a generic body-read error and returns 400. The server still does not OOM (the layer caps the bytes streamed), so the protective behaviour is intact; only the status code is less specific. A uniform 413 is worth a follow-up, but it would mean distinguishing the body-limit error from other to_bytes errors in proxy_handler.
- Top-level rate_limit: vs under proxy: went with top-level. It is a tenant-shape concern (same category as enable_proxy_auth, api_key), not a proxy-image config concern. Co-locating it with the auth controls keeps the YAML readable.
- Override precedence: spec wins over proxy.env. This mirrors _apply_proxy_auth. An operator who sets LLMTRACE_RATE_LIMIT_RPS directly in proxy.env is doing it for a reason, but the tenant spec block is the source of truth for the deployment. If we want the opposite, swap the spread order in _apply_rate_limit.
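
The 413 versus 400 distinction described above can be captured in a toy model. This is a simplified illustration in Python, not the tower-http or proxy_handler code: `status_for_request` is a hypothetical function, and returning None to mean "body accepted and passed on to the upstream call" is a modelling choice.

```python
from typing import Iterable, Optional


def status_for_request(
    content_length: Optional[int], chunks: Iterable[bytes], limit: int
) -> Optional[int]:
    """Toy model of which status an over-cap request ends up with.

    Returns 413 or 400 for rejected bodies, or None when the body is
    within the cap and the request would proceed to the upstream call.
    """
    # Declared Content-Length above the cap: rejected up front,
    # before any handler code runs.
    if content_length is not None and content_length > limit:
        return 413

    # Otherwise the handler reads the limited stream chunk by chunk.
    received = 0
    for chunk in chunks:
        received += len(chunk)
        if received > limit:
            # The read fails mid-stream; the handler reports it as a
            # generic body-read error, hence 400 rather than 413.
            return 400
    return None
```

In both rejection paths the number of bytes consumed stays bounded by the cap plus one chunk in this model, which is the sense in which the protection holds even when the status code is the less specific one.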

Test plan

CI green on feat/proxy-body-cap-and-rate-limits (Rust + Python checks).
Live: set LLMTRACE_MAX_REQUEST_BYTES=2048 on a sandbox proxy, curl -X POST with a > 2 KiB body, expect 413.
Live: provision a tenant via cli.py with rate_limit: { requests_per_second: 5, burst_size: 10 }, hit the proxy faster than 5 rps, expect rate-limit hits sooner than the default 100 rps would allow; inspect the startup log line for rate_limit_rps=5 rate_limit_burst=10.

Adds a 1 MiB request body cap on the Axum router (configurable via LLMTRACE_MAX_REQUEST_BYTES) so a single oversized payload cannot drive ML detectors or the trace pipeline arbitrarily hard, and surfaces per-tenant rate-limit knobs through the Basilica tenant config so SaaS tenants can be shaped without rebuilding the proxy YAML.

Rust side
- crates/llmtrace-proxy/Cargo.toml: enable tower-http "limit" feature.
- crates/llmtrace-proxy/src/main.rs: resolve_max_request_bytes() reads LLMTRACE_MAX_REQUEST_BYTES with a 1 MiB fallback on missing / invalid / non-positive values; build_router applies RequestBodyLimitLayer so oversized requests (with honest Content-Length) are rejected with HTTP 413 before any handler executes. New unit + router tests cover defaults, parse fallbacks, the 413 rejection path, and that small bodies still reach the proxy handler.
- crates/llmtrace-proxy/src/config.rs: apply_env_overrides now honours LLMTRACE_RATE_LIMIT_RPS and LLMTRACE_RATE_LIMIT_BURST. A parse_positive_u32 helper silently ignores zero / unparseable values so a typo cannot disable rate limiting wholesale.

Python / Basilica side
- deployments/basilica/lifecycle.py: new frozen RateLimitSpec dataclass with __post_init__ validation; optional TenantSpec.rate_limit field; _apply_rate_limit injects LLMTRACE_RATE_LIMIT_RPS and LLMTRACE_RATE_LIMIT_BURST into the proxy ComponentSpec env at provision time, mirroring _apply_proxy_auth's precedence (spec wins over caller env).
- deployments/basilica/cli.py: parse the optional top-level rate_limit block; fail fast on missing required keys or non-mapping shapes.
- configs/examples/{starter,pro}.yaml: commented-out rate_limit block so the field is discoverable but optional.
- deployments/basilica/README.md: new "Per-tenant rate_limit" and "Request body cap" subsections under "Tenant config format"; table row added for the new top-level field.

epappas force-pushed the feat/proxy-body-cap-and-rate-limits branch from 201931d to 790251b Compare May 19, 2026 18:49

epappas merged commit 252650f into main May 19, 2026
15 checks passed

epappas mentioned this pull request May 20, 2026

ops(basilica): default startup_timeout too tight when ML preload is on #243

Closed

epappas merged 1 commit into main from feat/proxy-body-cap-and-rate-limits
