feat(proxy): request body cap + per-tenant rate limit plumbing#240
Merged
Conversation
Adds a 1 MiB request body cap on the Axum router (configurable via
LLMTRACE_MAX_REQUEST_BYTES) so a single oversized payload cannot drive
ML detectors or the trace pipeline arbitrarily hard, and surfaces
per-tenant rate-limit knobs through the Basilica tenant config so SaaS
tenants can be shaped without rebuilding the proxy YAML.
Rust side
- crates/llmtrace-proxy/Cargo.toml: enable tower-http "limit" feature.
- crates/llmtrace-proxy/src/main.rs: resolve_max_request_bytes() reads
LLMTRACE_MAX_REQUEST_BYTES with a 1 MiB fallback on missing / invalid
/ non-positive values; build_router applies RequestBodyLimitLayer so
oversized requests (with honest Content-Length) are rejected with HTTP
413 before any handler executes. New unit + router tests cover defaults,
parse fallbacks, the 413 rejection path, and that small bodies still
reach the proxy handler.
- crates/llmtrace-proxy/src/config.rs: apply_env_overrides now honours
LLMTRACE_RATE_LIMIT_RPS and LLMTRACE_RATE_LIMIT_BURST. A parse_positive_u32
helper silently ignores zero / unparseable values so a typo cannot
disable rate limiting wholesale.
Python / Basilica side
- deployments/basilica/lifecycle.py: new frozen RateLimitSpec dataclass
with __post_init__ validation; optional TenantSpec.rate_limit field;
_apply_rate_limit injects LLMTRACE_RATE_LIMIT_RPS and
LLMTRACE_RATE_LIMIT_BURST into the proxy ComponentSpec env at
provision time, mirroring _apply_proxy_auth's precedence (spec wins
over caller env).
- deployments/basilica/cli.py: parse the optional top-level rate_limit
block; fail fast on missing required keys or non-mapping shapes.
- configs/examples/{starter,pro}.yaml: commented-out rate_limit block
so the field is discoverable but optional.
- deployments/basilica/README.md: new "Per-tenant rate_limit" and
"Request body cap" subsections under "Tenant config format"; table
row added for the new top-level field.
201931d to
790251b
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two SaaS gaps for the Basilica-based per-tenant provisioning shipped in #229 / #233:
RateLimitConfig/TenantRateLimitOverrideexisted in core and were enforced byRateLimiter::check, but the per-tenant value could not be surfaced through the Basilica tenant YAML.This PR closes both.
What changed
Rust —
crates/llmtrace-proxyCargo.toml— enabletower-http'slimitfeature.src/main.rsDEFAULT_MAX_REQUEST_BYTES = 1 MiB.resolve_max_request_bytes()readsLLMTRACE_MAX_REQUEST_BYTES, falling back to the default on missing / unparseable / non-positive values.build_routerappliestower_http::limit::RequestBodyLimitLayer::new(body_cap)so oversized requests (with honestContent-Length) get HTTP413 Payload Too Largebefore any handler runs.info!log now includesmax_request_bytes,rate_limit_rps,rate_limit_burst.test_resolve_max_request_bytes_*(4 cases) +test_request_body_cap_rejects_oversized_payload(413 path) +test_request_body_cap_allows_payload_under_default_limit(small body still routes through).src/config.rsapply_env_overrideshonoursLLMTRACE_RATE_LIMIT_RPSandLLMTRACE_RATE_LIMIT_BURSTvia aparse_positive_u32helper that silently ignores zero / unparseable values (a typo must never disable rate limiting wholesale).test_apply_env_overrides_rate_limit_rps_and_burst,test_apply_env_overrides_rate_limit_ignores_invalid.Python —
deployments/basilicalifecycle.pyRateLimitSpec(requests_per_second: int, burst_size: int)dataclass with__post_init__validation (both must be > 0).TenantSpec.rate_limit: Optional[RateLimitSpec](defaultNone)._apply_rate_limit(proxy_spec, rate_limit)injectsLLMTRACE_RATE_LIMIT_RPS/LLMTRACE_RATE_LIMIT_BURSTinto the proxyComponentSpec.env. Precedence matches_apply_proxy_auth: the spec-derived value overrides any same-named value the caller put inproxy.env.provision()calls_apply_rate_limitwhenspec.rate_limit is not None. Becauseupdate(strategy="recreate")reusesprovision(), recreates pick this up automatically;strategy="restart"does not (matches the existing model — restart keeps the live config).cli.pyrate_limit:block. Picked the top level (not underproxy:) because it is a tenant-shape concern, not a proxy-image config concern — same place asenable_proxy_authandapi_key. Fails fast on missing required keys or non-mapping shapes.configs/examples/{starter,pro}.yaml— commented-outrate_limit:block so the field is discoverable but optional.README.md— new Per-tenantrate_limitsubsection (table of env-var bindings, validation rules, precedence, on-proxy parsing) and a new Request body cap subsection (default 1 MiB, env override, 413 vs 400 nuance for streamed bodies); the top-level optional-fields table now listsrate_limit.Validation
Rust
A first
cargo test -p llmtrace --lib --binsrun flaked once onaction_router::tests::test_webhook_action_delivers_payload(unrelated to this PR — a network-bound webhook test); a targeted re-run passed cleanly. A second full run also passed without flakes.Python
Ran in a venv with
basilica-sdk+PyYAML:Both example YAMLs (
starter.yaml,pro.yaml) still parse viacli._tenant_spec_from_configwithrate_limit=None(block commented out as shipped); uncommenting the block parses into a populatedRateLimitSpec.Body cap end-to-end
Validated inside the cargo test harness only. A request with
Content-Lengthover the configured cap returns 413; a small body still reachesproxy_handler(502 because upstream is unreachable in the test, not 413). I did not run the proxy binary outside oftower::ServiceExt::oneshotand have not exercisedcurlagainst a liveaxum::serveinstance — flagging this for live verification before going to production tenants.What was not validated
LLMTRACE_MAX_REQUEST_BYTESenv var withcurl. The router-level test exercises the same code path (build_routerreads the env at construction), but a live binary test is the strongest signal.rate_limit:set and verify the proxy logs the overridden rps / burst at startup. The proxy startupinfo!log line now includes both values, so the next provision should make this trivial to confirm in CI logs.apply_env_overrides→config.rate_limiting→RateLimiter::new. Each link is covered by an existing or new test, but I did not run a multi-tenant load test.Trade-offs / choices
RequestBodyLimitLayeroverDefaultBodyLimit—DefaultBodyLimitrequires extractors to enforce the cap, and the proxy'sproxy_handlerreads the body manually viaaxum::body::to_bytes. The Tower layer enforces the cap before the handler is invoked, which gives a clean 413 for honest clients.RequestBodyLimitLayershort-circuits with 413 only whenContent-Lengthis set and exceeds the limit. For chunked bodies the layer wraps the stream and errors mid-read;proxy_handlercatches that as a generic body-read error and returns 400. The server still does not OOM (the layer caps the bytes streamed), so the protective behaviour is intact — only the status code is downgraded. Worth a follow-up if we want a uniform 413, but it would mean distinguishing the body-limit error from otherto_byteserrors inproxy_handler.rate_limit:vs underproxy:— went with top-level. It is a tenant-shape concern (same category asenable_proxy_auth,api_key), not a proxy-image config concern. Co-locating with auth controls keeps the YAML readable.proxy.env— mirrors_apply_proxy_auth. An operator who setsLLMTRACE_RATE_LIMIT_RPSdirectly inproxy.envis doing it for a reason, but the tenant spec block is the source of truth for the deployment. If we want the opposite, swap the spread order in_apply_rate_limit.Test plan
feat/proxy-body-cap-and-rate-limits(Rust + Python checks).LLMTRACE_MAX_REQUEST_BYTES=2048on a sandbox proxy,curl -X POSTwith a > 2 KiB body, expect 413.cli.pywithrate_limit: { requests_per_second: 5, burst_size: 10 }, hit the proxy faster than 5 rps, expect rate-limit hits sooner than the default 100 rps would allow; inspect the startup log line forrate_limit_rps=5 rate_limit_burst=10.