perf: enable TLS session tickets + 0-RTT in cvm-ingress#7
Merged
Add ssl_session_tickets and ssl_early_data to the TLS server block, and forward the Early-Data header to the backend so cloud-api can choose to reject 0-RTT on non-idempotent requests. Also bump ssl_session_cache from 10m (~40k sessions) to 50m (~200k) to match the size recommended for nearai/dstack-ingress-vpc#17 and provide headroom for the public-facing cloud-api ingress. Saves ~1 RTT (~100ms from typical client locations) on every TLS reconnect to cloud-api.near.ai and cloud-stg-api.near.ai.
Why
Saves ~1 RTT (~100ms from typical client locations to cpu01/cpu02) on every TLS reconnect to cloud-api.near.ai and cloud-stg-api.near.ai. Every non-keep-alive client currently pays a full cold TLS 1.3 handshake. Session tickets let a returning client resume without repeating the certificate exchange, and 0-RTT lets it send the first request in the same flight as the ClientHello, which removes the round trip that otherwise precedes application data.
Same rationale as the parallel PR against the chat-api / inference vllm-ingress: nearai/dstack-ingress-vpc#17. cloud-api uses a different ingress (sidecar pattern in this repo), so this parallel PR was needed for fleet consistency.
What changed
In nginx/tls.conf.template:
- ssl_session_tickets on; (explicit; was relying on the nginx default)
- ssl_early_data on; (was off, the nginx default)
- ssl_session_cache shared:SSL:10m → 50m (~40k → ~200k sessions) to match dstack-ingress-vpc#17 for fleet consistency; cost is negligible
- proxy_set_header Early-Data $ssl_early_data; (forwards the variable so the backend can reject 0-RTT on non-idempotent requests, per RFC 8470)
No Dockerfile / base image change. Scope intentionally narrow to avoid conflicts with a parallel HTTP/3 / QUIC PR that may need to bump the base image away from debian:bookworm-slim apt nginx (1.22) to get --with-http_v3_module.
0-RTT replay risk
With ssl_early_data on, 0-RTT data is replayable by an attacker who captures it (TLS 1.3 stateless tickets). The standard mitigation per RFC 8470 is for the backend to check the Early-Data: 1 header and respond with 425 Too Early on non-idempotent methods (POST, etc.).
Cloud-api does not reject on this header today. This PR forwards the header but does not change backend behaviour; that is flagged as a follow-up in nearai/cloud-api. In the meantime, the practical exposure is bounded: TLS 1.3 0-RTT replay windows are short (ticket lifetime + clock skew), and our typical client flow is idempotent (GET /v1/models, etc.); the actually-sensitive non-idempotent endpoints (completions, etc.) are still expensive enough that an attacker replaying them gains little they couldn't get by replaying the full request post-handshake.
Conservative reviewer alternative if the above is unacceptable: leave ssl_early_data off for now (keep session tickets only) and ship 0-RTT once cloud-api rejects on the header. Recommendation is to ship both and queue the cloud-api follow-up.
Deployment
Requires rebuilding the cvm-ingress Docker image and rolling all 20 cloud-api CVMs (10 prod on cpu01/cpu02 ports 9440-9449 + 10 staging on ports 9450-9459) via the standard cvm-ansible-playbooks flow. Not initiated here; flagged as a deployment dependency for whoever merges.
Verification
- bash -n entrypoint.sh clean
- docker build . succeeds against the pinned debian:bookworm-slim base
- nginx -t clean against the rendered tls.conf.template (with a self-signed cert), i.e. the TLS_ENABLED=true path
- nginx -t clean against the rendered default.conf.template, i.e. the TLS_ENABLED=false path (untouched, sanity check only)
Related
ssl_* directives + one proxy_set_header) to minimize merge conflict.
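For the cloud-api follow-up flagged under 0-RTT replay risk, the backend-side check could look roughly like the sketch below. It is not cloud-api code: the backend's language and framework are not stated in this PR, so plain Python WSGI is used as a stand-in, and the names should_reject_early_data and RejectEarlyData are hypothetical. It assumes the ingress sets the Early-Data header to 1 only while the request arrived as early data (nginx's $ssl_early_data is empty otherwise), and it treats the RFC 9110 idempotent methods as safe to replay.

```python
from typing import Optional

# Methods RFC 9110 defines as idempotent. Everything else (POST, PATCH, ...)
# is treated as unsafe to accept in 0-RTT. This split is an assumption of the
# sketch; a stricter deployment could allow only GET/HEAD/OPTIONS.
IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE"})


def should_reject_early_data(method: str, early_data_header: Optional[str]) -> bool:
    """Return True when a request should be answered with 425 Too Early.

    early_data_header is the raw value of the Early-Data request header as
    forwarded by the ingress, or None when the header is absent.
    """
    in_early_data = (early_data_header or "").strip() == "1"
    return in_early_data and method.upper() not in IDEMPOTENT_METHODS


class RejectEarlyData:
    """Hypothetical WSGI middleware applying the RFC 8470 check."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        method = environ.get("REQUEST_METHOD", "")
        # WSGI exposes the Early-Data request header as HTTP_EARLY_DATA.
        early = environ.get("HTTP_EARLY_DATA")
        if should_reject_early_data(method, early):
            body = b"Too Early\n"
            start_response(
                "425 Too Early",
                [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
            )
            return [body]
        # Not early data, or an idempotent method: pass through unchanged.
        return self.app(environ, start_response)
```

A client that receives 425 is expected to retry the request once the handshake has completed, so well-behaved clients should see a retry rather than a failure; whether every cloud-api client handles 425 that way would need checking before enabling the rejection.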