feat(proxy): request body cap + per-tenant rate limit plumbing #240

Merged
epappas merged 1 commit into main from feat/proxy-body-cap-and-rate-limits on May 19, 2026

epappas commented on May 19, 2026

Summary

Two SaaS gaps for the Basilica-based per-tenant provisioning shipped in #229 / #233:

  1. Request body size cap (Rust): the proxy had no hard cap, so one large payload could drive the ML detectors and trace pipeline arbitrarily hard.
  2. Per-tenant rate limit plumbing (Python + Rust): RateLimitConfig / TenantRateLimitOverride existed in core and were enforced by RateLimiter::check, but the per-tenant value could not be surfaced through the Basilica tenant YAML.

This PR closes both.

What changed

Rust — crates/llmtrace-proxy

  • Cargo.toml — enable tower-http's limit feature.
  • src/main.rs
    • DEFAULT_MAX_REQUEST_BYTES = 1 MiB.
    • resolve_max_request_bytes() reads LLMTRACE_MAX_REQUEST_BYTES, falling back to the default on missing / unparseable / non-positive values.
    • build_router applies tower_http::limit::RequestBodyLimitLayer::new(body_cap) so oversized requests (with honest Content-Length) get HTTP 413 Payload Too Large before any handler runs.
    • Startup info! log now includes max_request_bytes, rate_limit_rps, rate_limit_burst.
    • New tests: test_resolve_max_request_bytes_* (4 cases) + test_request_body_cap_rejects_oversized_payload (413 path) + test_request_body_cap_allows_payload_under_default_limit (small body still routes through).
  • src/config.rs
    • apply_env_overrides honours LLMTRACE_RATE_LIMIT_RPS and LLMTRACE_RATE_LIMIT_BURST via a parse_positive_u32 helper that silently ignores zero / unparseable values (a typo must never disable rate limiting wholesale).
    • New tests: test_apply_env_overrides_rate_limit_rps_and_burst, test_apply_env_overrides_rate_limit_ignores_invalid.

Python — deployments/basilica

  • lifecycle.py
    • New frozen RateLimitSpec(requests_per_second: int, burst_size: int) dataclass with __post_init__ validation (both must be > 0).
    • TenantSpec.rate_limit: Optional[RateLimitSpec] (default None).
    • _apply_rate_limit(proxy_spec, rate_limit) injects LLMTRACE_RATE_LIMIT_RPS / LLMTRACE_RATE_LIMIT_BURST into the proxy ComponentSpec.env. Precedence matches _apply_proxy_auth: the spec-derived value overrides any same-named value the caller put in proxy.env.
    • provision() calls _apply_rate_limit when spec.rate_limit is not None. Because update(strategy="recreate") reuses provision(), recreates pick this up automatically; strategy="restart" does not (matches the existing model — restart keeps the live config).
  • cli.py
    • Optional top-level rate_limit: block. Picked the top level (not under proxy:) because it is a tenant-shape concern, not a proxy-image config concern — same place as enable_proxy_auth and api_key. Fails fast on missing required keys or non-mapping shapes.
  • configs/examples/{starter,pro}.yaml — commented-out rate_limit: block so the field is discoverable but optional.
  • README.md — new Per-tenant rate_limit subsection (table of env-var bindings, validation rules, precedence, on-proxy parsing) and a new Request body cap subsection (default 1 MiB, env override, 413 vs 400 nuance for streamed bodies); the top-level optional-fields table now lists rate_limit.

Validation

Rust

$ cargo fmt --all --check
(clean)

$ cargo build -p llmtrace
Finished `dev` profile [unoptimized + debuginfo] target(s) in 31.36s

$ cargo clippy -p llmtrace --lib --bins -- -D warnings
Finished `dev` profile [unoptimized + debuginfo] target(s) in 3m 04s

$ cargo test -p llmtrace --lib
test result: ok. 597 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

$ cargo test -p llmtrace --lib --bins
21 binary tests pass, 597 lib tests pass.

A first cargo test -p llmtrace --lib --bins run flaked once on action_router::tests::test_webhook_action_delivers_payload (unrelated to this PR — a network-bound webhook test); a targeted re-run passed cleanly. A second full run also passed without flakes.

Python

Ran in a venv with basilica-sdk + PyYAML:

$ python -c "from deployments.basilica import lifecycle, cli; print('ok')"
ok

$ python <inline tests>
rate_limit injection OK
rate_limit override semantics OK
rejected rps=0 as expected: rate_limit.requests_per_second must be > 0, got 0
rejected burst=-1 as expected: rate_limit.burst_size must be > 0, got -1
rate_limit optional default OK
ALL OK

$ python <cli parse tests>
cli rate_limit parse OK
cli rate_limit: null OK
cli rate_limit absent OK
missing-key rejected: rate_limit is missing required keys: ['burst_size']
non-mapping rejected: rate_limit must be a mapping, got list
ALL OK

Both example YAMLs (starter.yaml, pro.yaml) still parse via cli._tenant_spec_from_config with rate_limit=None (block commented out as shipped); uncommenting the block parses into a populated RateLimitSpec.
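
The fail-fast behaviour exercised by the CLI parse tests can be sketched roughly as below. This is an assumed reconstruction from the error messages printed above: the function name parse_rate_limit and its tuple return value are hypothetical (the real code builds a RateLimitSpec), and the input is the already-loaded YAML document as a plain dict.

```python
from typing import Any, Dict, Optional, Tuple

REQUIRED_KEYS = ("requests_per_second", "burst_size")


def parse_rate_limit(config: Dict[str, Any]) -> Optional[Tuple[int, int]]:
    """Return (rps, burst) from a parsed tenant config, or None if unset."""
    raw = config.get("rate_limit")
    if raw is None:
        # Covers both an absent key and an explicit `rate_limit: null`.
        return None
    if not isinstance(raw, dict):
        raise ValueError(f"rate_limit must be a mapping, got {type(raw).__name__}")
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise ValueError(f"rate_limit is missing required keys: {missing}")
    return int(raw["requests_per_second"]), int(raw["burst_size"])
```

For example, a config of {"rate_limit": {"requests_per_second": 5, "burst_size": 10}} yields (5, 10), while {"rate_limit": [5, 10]} raises before any deployment call is made.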

Body cap end-to-end

Validated inside the cargo test harness only. A request with Content-Length over the configured cap returns 413; a small body still reaches proxy_handler (502 because upstream is unreachable in the test, not 413). I did not run the proxy binary outside of tower::ServiceExt::oneshot and have not exercised curl against a live axum::serve instance — flagging this for live verification before going to production tenants.

What was not validated

  • I did not run the proxy binary against a live LLMTRACE_MAX_REQUEST_BYTES env var with curl. The router-level test exercises the same code path (build_router reads the env at construction), but a live binary test is the strongest signal.
  • I did not provision a real Basilica deployment with rate_limit: set and verify the proxy logs the overridden rps / burst at startup. The proxy startup info! log line now includes both values, so the next provision should make this trivial to confirm in CI logs.
  • The proxy reads the env vars at startup; the env-var override path is unit-tested. End-to-end, the chain is: Basilica deploy env → proxy container → apply_env_overrides → config.rate_limiting → RateLimiter::new. Each link is covered by an existing or new test, but I did not run a multi-tenant load test.

Trade-offs / choices

  • RequestBodyLimitLayer over DefaultBodyLimit: DefaultBodyLimit requires extractors to enforce the cap, and the proxy's proxy_handler reads the body manually via axum::body::to_bytes. The Tower layer enforces the cap before the handler is invoked, which gives a clean 413 for honest clients.
  • Why streamed-without-Content-Length bodies return 400 rather than 413: RequestBodyLimitLayer short-circuits with 413 only when Content-Length is set and exceeds the limit. For chunked bodies the layer wraps the stream and errors mid-read; proxy_handler catches that as a generic body-read error and returns 400. The server still does not OOM (the layer caps the bytes streamed), so the protective behaviour is intact; only the status code is downgraded. Worth a follow-up if we want a uniform 413, but it would mean distinguishing the body-limit error from other to_bytes errors in proxy_handler.
  • Top-level rate_limit: vs under proxy: — went with top-level. It is a tenant-shape concern (same category as enable_proxy_auth, api_key), not a proxy-image config concern. Co-locating with auth controls keeps the YAML readable.
  • Override precedence: spec wins over proxy.env — mirrors _apply_proxy_auth. An operator who sets LLMTRACE_RATE_LIMIT_RPS directly in proxy.env is doing it for a reason, but the tenant spec block is the source of truth for the deployment. If we want the opposite, swap the spread order in _apply_rate_limit.
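
The "spread order" mentioned in the last bullet comes down to which dict is unpacked last in a merge. A small standalone illustration, with made-up env values (LOG_LEVEL is a hypothetical caller-set variable, not one from this PR):

```python
# Values an operator put directly in proxy.env (illustrative only).
caller_env = {"LLMTRACE_RATE_LIMIT_RPS": "100", "LOG_LEVEL": "info"}
# Values derived from the tenant's rate_limit block.
spec_env = {"LLMTRACE_RATE_LIMIT_RPS": "5", "LLMTRACE_RATE_LIMIT_BURST": "10"}

# Later unpacking wins on key collisions.
spec_wins = {**caller_env, **spec_env}    # precedence chosen in this PR
caller_wins = {**spec_env, **caller_env}  # the swapped alternative

print(spec_wins["LLMTRACE_RATE_LIMIT_RPS"])    # 5
print(caller_wins["LLMTRACE_RATE_LIMIT_RPS"])  # 100
```

In both orders the non-colliding keys (LOG_LEVEL, LLMTRACE_RATE_LIMIT_BURST) survive; only the winner for the shared key changes.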

Test plan

  • CI green on feat/proxy-body-cap-and-rate-limits (Rust + Python checks).
  • Live: set LLMTRACE_MAX_REQUEST_BYTES=2048 on a sandbox proxy, curl -X POST with a > 2 KiB body, expect 413.
  • Live: provision a tenant via cli.py with rate_limit: { requests_per_second: 5, burst_size: 10 }, hit the proxy faster than 5 rps, expect rate-limit hits sooner than the default 100 rps would allow; inspect the startup log line for rate_limit_rps=5 rate_limit_burst=10.

Adds a 1 MiB request body cap on the Axum router (configurable via
LLMTRACE_MAX_REQUEST_BYTES) so a single oversized payload cannot drive
ML detectors or the trace pipeline arbitrarily hard, and surfaces
per-tenant rate-limit knobs through the Basilica tenant config so SaaS
tenants can be shaped without rebuilding the proxy YAML.

Rust side
- crates/llmtrace-proxy/Cargo.toml: enable tower-http "limit" feature.
- crates/llmtrace-proxy/src/main.rs: resolve_max_request_bytes() reads
  LLMTRACE_MAX_REQUEST_BYTES with a 1 MiB fallback on missing / invalid
  / non-positive values; build_router applies RequestBodyLimitLayer so
  oversized requests (with honest Content-Length) are rejected with HTTP
  413 before any handler executes. New unit + router tests cover defaults,
  parse fallbacks, the 413 rejection path, and that small bodies still
  reach the proxy handler.
- crates/llmtrace-proxy/src/config.rs: apply_env_overrides now honours
  LLMTRACE_RATE_LIMIT_RPS and LLMTRACE_RATE_LIMIT_BURST. A parse_positive_u32
  helper silently ignores zero / unparseable values so a typo cannot
  disable rate limiting wholesale.

Python / Basilica side
- deployments/basilica/lifecycle.py: new frozen RateLimitSpec dataclass
  with __post_init__ validation; optional TenantSpec.rate_limit field;
  _apply_rate_limit injects LLMTRACE_RATE_LIMIT_RPS and
  LLMTRACE_RATE_LIMIT_BURST into the proxy ComponentSpec env at
  provision time, mirroring _apply_proxy_auth's precedence (spec wins
  over caller env).
- deployments/basilica/cli.py: parse the optional top-level rate_limit
  block; fail fast on missing required keys or non-mapping shapes.
- configs/examples/{starter,pro}.yaml: commented-out rate_limit block
  so the field is discoverable but optional.
- deployments/basilica/README.md: new "Per-tenant rate_limit" and
  "Request body cap" subsections under "Tenant config format"; table
  row added for the new top-level field.
@epappas epappas force-pushed the feat/proxy-body-cap-and-rate-limits branch from 201931d to 790251b Compare May 19, 2026 18:49
@epappas epappas merged commit 252650f into main May 19, 2026
15 checks passed