perf: enable TLS session resumption and 0-RTT in ingress nginx by Evrard-Nil · Pull Request #17 · nearai/dstack-ingress-vpc

Evrard-Nil · 2026-05-13T14:53:04Z

Summary

Flip ssl_session_tickets and ssl_early_data from off to on in
the two nginx config generators (scripts/entrypoint.sh and
scripts/generate-nginx-upstream.sh). The session cache
(shared:SSL:50m) and timeout (1d) were already in place — only
tickets and 0-RTT needed enabling.

Also forward the Early-Data header to all proxied locations
(proxy_set_header Early-Data $ssl_early_data;) so downstream backends
can reject 0-RTT requests on non-idempotent paths if they choose.
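
As a rough illustration of the change described above, the sketch below prints the relevant directives from a shell heredoc, which is how a generator script might emit them. The function names and layout are hypothetical and are not taken from scripts/entrypoint.sh or scripts/generate-nginx-upstream.sh; only the directive values come from this description.

```shell
#!/bin/sh
# Hypothetical helpers (names are assumptions, not the real generator code).

# Server-level TLS session settings described in the summary.
emit_tls_session_directives() {
    cat <<'EOF'
    ssl_session_cache shared:SSL:50m;   # already in place before this PR
    ssl_session_timeout 1d;             # already in place before this PR
    ssl_session_tickets on;             # flipped from off
    ssl_early_data on;                  # flipped from off (TLS 1.3 0-RTT)
EOF
}

# Header added to each proxied location. The heredoc delimiter is quoted so
# the shell does not expand $ssl_early_data; nginx must see the variable.
emit_early_data_header() {
    cat <<'EOF'
        proxy_set_header Early-Data $ssl_early_data;
EOF
}

emit_tls_session_directives
emit_early_data_header
```

The quoted heredoc delimiter matters: with an unquoted `EOF`, the shell would replace `$ssl_early_data` with an empty string and nginx would reject the directive for having too few arguments.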

Why

For repeat clients that do not keep their TLS connection alive
— curl, mobile apps, some SDKs that close sockets between requests
— each new request currently pays a fresh TLS handshake on top of TCP.

- TLS 1.3 full handshake: 1-RTT.
- TLS 1.3 resumption with tickets: 1-RTT.
- TLS 1.3 resumption with tickets + 0-RTT: 0-RTT, because the client sends
  the request along with the ClientHello.

From a developer machine at ~99ms RTT to cpu01, that's ~100ms saved
per cold reconnect. The ingress sits on every CVM (cloud-api,
chat-api, inference CVMs), so the win applies to every public endpoint
behind it: cloud-api.near.ai, cloud.near.ai, agent.near.ai,
*.completions.near.ai.
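
The ~100ms figure can be checked with a back-of-the-envelope calculation. The sketch below assumes a cold connection costs one round trip for the TCP handshake, zero or one for TLS, and one for the request and response; it ignores TCP Fast Open, server processing time, and packet loss. The function name is made up for illustration.

```shell
#!/bin/sh
# Time to the first response byte on a cold connection, in milliseconds.
# $1 = round-trip time in ms, $2 = round trips spent on the TLS handshake
time_to_first_response_ms() {
    rtt_ms=$1
    tls_rtts=$2
    tcp_rtts=1       # SYN / SYN-ACK before any TLS bytes can be sent
    request_rtts=1   # request out, response back
    echo $(( (tcp_rtts + tls_rtts + request_rtts) * rtt_ms ))
}

full=$(time_to_first_response_ms 99 1)    # full handshake or ticket resumption
early=$(time_to_first_response_ms 99 0)   # resumption with 0-RTT early data
echo "1-RTT: ${full} ms, 0-RTT: ${early} ms, saved: $(( full - early )) ms"
# prints: 1-RTT: 297 ms, 0-RTT: 198 ms, saved: 99 ms
```

Ticket resumption alone stays at one TLS round trip, so the latency saving comes from early data; tickets without 0-RTT mainly avoid the certificate exchange and signature work of a full handshake.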

0-RTT replay risk

RFC 8446 §8 documents that 0-RTT data can be replayed by an attacker
who captures the early-data payload. The mitigation surface is:

nginx forwards Early-Data: 1 to the upstream when the request
was carried in 0-RTT. Each location block in the generated configs
now sets:
```
proxy_set_header Early-Data $ssl_early_data;
```
Backends can reject Early-Data on side-effectful methods. None
of the backends currently behind this ingress check that header.
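
The backend-side check could follow decision logic along these lines. This is only a sketch of the rule, written in shell for consistency with the scripts in this repo; a real backend would implement it in its own framework. The function name and the choice of GET, HEAD, and OPTIONS as replay-tolerant methods are assumptions, not something the backends do today.

```shell
#!/bin/sh
# Hypothetical decision helper.
# $1 = HTTP method, $2 = value of the Early-Data request header ("" if absent)
early_data_decision() {
    method=$1
    early_data=$2
    # nginx only forwards the header (with value 1) for requests that
    # arrived as 0-RTT early data, so anything else is handled normally.
    if [ "$early_data" != "1" ]; then
        echo accept
        return 0
    fi
    case "$method" in
        GET|HEAD|OPTIONS) echo accept ;;      # assumed safe to replay
        *)                echo reject-425 ;;  # answer 425 Too Early
    esac
}

early_data_decision POST 1    # prints: reject-425
early_data_decision GET 1     # prints: accept
early_data_decision POST ""   # prints: accept
```

A client that receives 425 Too Early is expected to retry the request once the handshake has completed, so rejection costs one extra round trip rather than a failed request.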
Follow-ups to track (separate PRs):
- cloud-api (cloud-api.near.ai) — POST /v1/chat/completions,
  POST /v1/responses, POST /v1/embeddings, etc. Strictly
  speaking these are idempotent in spec, but billing and audit
  logging mean we'd rather not replay them. Audit needed.
- inference-proxy — similar; POST inference + GPU attestation.
- chat-api (agent.near.ai) — POST chat / agent state mutations.
  Replay risk is highest here.
Until those land, the worst case for a single replay is: an
attacker who captured a ciphertext within the
ssl_session_timeout window (1d) replays it; the backend processes
it again. For LLM completions and stateless inference reads, the
user-observable damage is minimal — duplicate billing event at
worst. For chat-api the audit is more nuanced and is the most
pressing follow-up.

The conservative alternative is ssl_early_data off; and accept the
~100ms loss per cold reconnect. I'm choosing on here because (a)
TLS 1.3 0-RTT is widely deployed (Cloudflare, Google, AWS ALB default
it on), (b) Early-Data is propagated downstream so the mitigation
hook is in place, and (c) we can flip back to off in a one-line
follow-up if any of the follow-up audits surface a real risk.

Scope

Intentionally narrow:

- Only ssl_session_tickets, ssl_early_data, and the Early-Data proxy header.
- No base-image bump, no listen 443 quic, no Alt-Svc, no layout changes.

There is a parallel HTTP/3 PR in flight that bumps the base nginx
image and adds QUIC. To minimize merge conflict, this PR steers clear
of the listen blocks and the base image. Either PR can land first; the
other rebases trivially.

Deployment

Requires rebuilding the dstack-ingress-vpc image and rolling it out
to every CVM that runs this ingress (cpu01/cpu02 cloud-api prod+stg,
agent0/agent1 CVMs, all inference CVMs). This PR does not initiate
rollout; the image build job runs on merge to main, and the actual
CVM updates go through compose-manager / cvm-compose-files as usual.

Verification

- bash -n clean on both modified scripts.
- docker build . against the pinned base image (nginx@sha256:b6653fca…,
  which is nginx 1.27.4) succeeds.
- nginx -t clean for all four config-generation paths, using the
  project's own built image:
  - single-target mode without rate limiting
  - single-target mode with RATE_LIMIT_PATHS
  - upstream LB mode without rate limiting
  - upstream LB mode with RATE_LIMIT_PATHS
- After merge, plan to verify on one CVM (cpu01:9450 staging) before
  rolling fleet-wide: confirm Session-ID reused and 0-RTT indicated
  via openssl s_client -reconnect and curl --tls-max 1.3 -v over two
  separate connections.

Test plan

- CI build job (Build & Deploy) succeeds
- Image pulls into one staging CVM (cpu01 cloud-api-stg slot)
- openssl s_client -connect cloud-stg-api.near.ai:443 -reconnect
  shows Reused, TLSv1.3 on second-and-later handshakes
- curl -v https://cloud-stg-api.near.ai/health over a fresh
  connection shows Early data was accepted by the server (or
  equivalent client-side indicator) on a resumed handshake
- No regression in normal request flow (smoke test
  POST /v1/chat/completions via the existing infra-tests run)
- Roll out to remaining CVMs via compose-manager once staging is
  validated
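
One way to make the resumption and early-data checks scriptable is to parse the `openssl s_client` output. The helper below is a sketch: the function name is hypothetical, and it assumes the status lines that s_client prints (`Reused, TLSv1.3, ...` and `Early data was accepted`). The commented commands use the `-sess_out`, `-sess_in`, and `-early_data` options of s_client; `HOST` is a placeholder.

```shell
#!/bin/sh
# Reads `openssl s_client` output on stdin and summarizes the handshake.
summarize_handshake() {
    out=$(cat)
    if printf '%s\n' "$out" | grep -q '^Reused, TLSv1\.3'; then
        resumed=yes
    else
        resumed=no
    fi
    if printf '%s\n' "$out" | grep -q 'Early data was accepted'; then
        early=accepted
    else
        early=not-accepted
    fi
    echo "resumed=$resumed early_data=$early"
}

# Usage sketch (needs network access, so shown as comments):
#   HOST=...   # staging hostname
#   printf 'GET /health HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' "$HOST" > req.txt
#   # First connection: save the session ticket.
#   openssl s_client -connect "$HOST:443" -tls1_3 -sess_out sess.pem </dev/null
#   # Second connection: resume with the ticket and send the request as early data.
#   openssl s_client -connect "$HOST:443" -tls1_3 -sess_in sess.pem \
#       -early_data req.txt </dev/null 2>&1 | summarize_handshake
```

In TLS 1.3 the ticket arrives after the handshake completes, so the first command may need the connection held open briefly before a ticket is written to the session file.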

Flip the existing ssl_session_tickets and ssl_early_data directives from off to on in both nginx config generators (setup_nginx_conf in entrypoint.sh and generate-nginx-upstream.sh). The session cache size (shared:SSL:50m) and timeout (1d) were already configured.

For repeat clients that don't keep TLS connections alive (curl, mobile, some SDKs), this eliminates ~1 RTT per reconnect: tickets give 1-RTT TLS 1.3 resumption, ssl_early_data gives 0-RTT. From a typical client at ~100ms RTT to a CPU CVM, that is ~100ms saved per cold handshake.

0-RTT replay risk is mitigated by forwarding the Early-Data header to backends in every proxied location, so cloud-api / chat-api / inference-proxy can reject Early-Data on non-idempotent methods. None of them act on this header today; follow-up audits are tracked in the PR description.

Scope is intentionally narrow: no base-image bump, no UDP/QUIC listen directives, no layout changes. There is a parallel PR adding HTTP/3 + Alt-Svc; this one only touches SSL session / ticket / early-data directives to minimize merge conflict.

Verified: docker build succeeds; nginx -t passes for all four config-generation paths (single-target +/- rate-limit, upstream LB +/- rate-limit) using the project's own pinned base image.

claude · 2026-05-13T14:56:01Z

Review: TLS session resumption + 0-RTT

Thanks for the thorough write-up — the risk framing in the PR description is what made this easy to review. Most of the comments below are nits or follow-up suggestions; nothing here blocks merge in my opinion.

Higher-priority concerns

1. Ticket-key rotation / forward secrecy is overstated by the inline comment. The comment says:

ssl_session_tickets: nginx rotates ticket keys on reload — sufficient for a single-replica CVM.

Two things to tighten here:

- nginx generates the ticket key when it loads its configuration, not on a timer. SIGHUP reloads do regenerate it, but reloads aren't automatic: without an external trigger, the same key encrypts every ticket for as long as that configuration stays loaded.
- Combined with ssl_session_timeout 1d, that means: if the ticket key is ever compromised (memory disclosure, side-channel, container snapshot), an attacker can decrypt up to one day of past resumed sessions. That defeats one of the main reasons we use ECDHE.

Given these are CVMs, the threat model is interesting — the worker memory is inside the TEE, so the surface is narrower than a generic VM. But it's worth either (a) softening the comment to be accurate, or (b) following up with a cron-driven `kill -HUP $(pidof nginx)` or an external `ssl_session_ticket_key` file rotated daily. Probably a separate PR.
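
A rough sketch of option (b), assuming key files named `current.key` and `previous.key` in a directory of your choosing (both names and the function are hypothetical). nginx's `ssl_session_ticket_key` expects 80 or 48 bytes of random data per file and encrypts new tickets with the first key listed, so keeping the previous key lets tickets issued before the rotation still resume.

```shell
#!/bin/sh
# Hypothetical rotation helper. Runs in a subshell (note the parentheses)
# so the umask change does not leak into the caller.
rotate_ticket_keys() (
    dir=$1
    umask 077
    # Keep the outgoing key so tickets issued under it still decrypt.
    if [ -f "$dir/current.key" ]; then
        mv -f "$dir/current.key" "$dir/previous.key"
    fi
    # 80 random bytes per key file (48 is also accepted by nginx).
    head -c 80 /dev/urandom > "$dir/current.key"
    # First run: make sure the second file referenced by the config exists.
    [ -f "$dir/previous.key" ] || cp "$dir/current.key" "$dir/previous.key"
)

# The generated config would then list the encrypting key first:
#   ssl_session_ticket_key /path/to/current.key;
#   ssl_session_ticket_key /path/to/previous.key;
# and a daily job would call rotate_ticket_keys followed by a reload.
```

With daily rotation and a two-key window, a leaked key exposes at most roughly two days of tickets instead of everything issued since the last restart.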

2. Half-deployed mitigation window. The PR description acknowledges that none of the current backends inspect `Early-Data` yet, and lists the audits as separate follow-ups. That's fine as a sequencing choice, but the practical implication is: between merge and the backend-audit PRs landing, the deployed system has `ssl_early_data on` with no replay rejection at all. The mitigation hook (`Early-Data` header) is wired but no one is reading it.

For `chat-api` in particular — which the PR itself flags as the highest replay risk — it might be worth landing the backend `425 Too Early` check first, then this PR, rather than the other way around. Or at minimum, link the follow-up tickets here so the sequencing is visible.

3. WebSocket upgrades + 0-RTT. The `location ~ ^/(ws|socket.io)/` block also gets `Early-Data` forwarded, but a WebSocket upgrade handshake replayed in 0-RTT data would open a duplicate connection (and potentially bypass any one-shot auth-token-in-upgrade-request flow). I do not see WS auth here, but if any backend behind this ingress uses upgrade-time tokens, it is a real consideration. RFC 8470 explicitly calls out that protocol switches should not be allowed in 0-RTT. Easiest local fix: `if ($ssl_early_data) { return 425; }` inside the WS location, since the `ssl_early_data` directive itself is only valid in `http` and `server` contexts and so cannot be turned off per location.
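
In generator form, the suggested guard might look like the sketch below. The function name is hypothetical and the existing proxy directives inside the block are elided; only the `location` pattern and the `if` guard come from the comment above.

```shell
#!/bin/sh
# Hypothetical generator fragment for the WebSocket location.
emit_ws_location() {
    # Quoted delimiter: $ssl_early_data must reach nginx unexpanded.
    cat <<'EOF'
    location ~ ^/(ws|socket.io)/ {
        # Refuse upgrade requests that arrived as 0-RTT early data;
        # the client retries after the handshake completes.
        if ($ssl_early_data) {
            return 425;
        }
        # existing proxy_pass / proxy_set_header lines stay as they are
    }
EOF
}

emit_ws_location
```

`$ssl_early_data` is empty outside of early data, and nginx treats an empty string as false in `if`, so ordinary upgrade requests pass through unchanged.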

Smaller things

4. gRPC mode. `PROXY_CMD=grpc` is set when `TARGET_ENDPOINT` starts with `grpc://`. All gRPC calls are POSTs over HTTP/2, and per-call idempotency is service-defined — so the same Early-Data forwarding rationale applies but backend-side mitigation is harder (most gRPC frameworks do not expose request headers to service code in a uniform way). Worth adding to the follow-up audit list if any gRPC backends sit behind this ingress.

5. The "strictly speaking idempotent" claim in the PR description. `POST /v1/chat/completions` is not idempotent per HTTP semantics: RFC 9110 §9.2.2 lists only PUT, DELETE, and the safe methods as idempotent, and POST is not among them. I think you mean "logically idempotent in the application sense": the underlying inference is a pure function, but billing, audit logging, and rate-limit counters are side effects. Worth phrasing precisely in the PR description so a reviewer does not conclude it is safe to replay.

6. Verification step naming. In the test plan, "Session-ID reused" is a TLS 1.2 concept (Session ID from RFC 5246). TLS 1.3 does not have a session ID in the same sense — it uses session tickets / PSK identity. `openssl s_client -reconnect` against a TLS 1.3 server will show `Reused, TLSv1.3` (which the next bullet already calls out correctly). Minor wording fix only.

7. Empty-value header behavior — confirm this is intentional. `proxy_set_header Early-Data $ssl_early_data;` will set `Early-Data: 1` when 0-RTT is used and omit the header entirely when it is not (nginx drops empty proxy_set_header values). That matches RFC 8470 §5.1, which is the right behavior — but worth a one-line code comment because the alternative ("always set, value empty when not 0-RTT") would look identical in the diff and be wrong. The existing comment says what the header is for but not that it intentionally relies on nginx's empty-value-elision behavior.

Test coverage

The repo does not have an automated test path for the generated nginx configs (the only checks are `bash -n` and a manual `nginx -t` matrix per the PR description). Pre-existing limitation, not introduced here. For a TLS-config PR specifically, a regression test that:

- runs `entrypoint.sh` / `generate-nginx-upstream.sh` with each combination of env vars,
- greps the output for the expected directives,

would be cheap and prevent quiet regressions. Not in scope for this PR but worth a follow-up if these scripts keep accumulating SSL config.
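
A minimal sketch of such a check, assuming the generated config has already been written to a file. The function name is hypothetical, and how the generators are invoked for each env-var combination is left out because it depends on the scripts' interfaces.

```shell
#!/bin/sh
# Hypothetical regression check: fail if any expected directive is missing.
# $1 = path to a generated nginx config file
check_generated_conf() {
    conf=$1
    status=0
    for directive in \
        'ssl_session_tickets on;' \
        'ssl_early_data on;' \
        'proxy_set_header Early-Data $ssl_early_data;'
    do
        # -F matches the directive literally, so "$" and "." are not regex.
        if ! grep -qF -- "$directive" "$conf"; then
            echo "missing: $directive" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage sketch: check_generated_conf /path/to/generated.conf || exit 1
```

A grep-based check only proves the directives are present; it complements the `nginx -t` matrix rather than replacing it.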

Things this PR does well

- Narrow scope, clear rollback path (one-line flip back to `off`).
- All four config-generation modes manually exercised against `nginx -t`.
- Risk analysis in the PR description names the specific backends and the specific replay surface, instead of generic hand-waving.
- Doesn't conflict with the parallel HTTP/3 PR; the scoping choice is good.
- The deployment note ("this PR does not initiate rollout") is exactly the kind of guard that prevents accidental fleet-wide rollouts.

Overall: I'd be comfortable merging this, with a preference for landing the `chat-api` Early-Data check first if that's possible without much extra effort. Otherwise the follow-up audit tickets should be filed before this lands so the sequencing is tracked.

🤖 Generated with Claude Code

Evrard-Nil · 2026-05-13T17:10:14Z

Scope note (post-investigation): this PR affects chat-api (private.near.ai, agent.near.ai) and inference CVMs (vllm-ingress), which terminate TLS via nearaidev/dstack-ingress-vpc. It does not affect cloud-api.near.ai — that uses a different ingress (nearaidev/cvm-ingress, sidecar pattern). For TLS-handshake wins on cloud-api, a parallel PR against nearai/cvm-ingress is needed.

Evrard-Nil mentioned this pull request May 13, 2026

perf: enable TLS session tickets + 0-RTT in cvm-ingress nearai/cvm-ingress#7

Merged

Evrard-Nil requested a review from think-in-universe May 18, 2026 13:42

Evrard-Nil closed this Jun 5, 2026

Evrard-Nil wants to merge 1 commit into main from perf/tls-session-resumption
