Commit ca6cc73
feat: rotation-SNI discovery + rapid-eviction pin set (#603)
* feat: rotation-SNI discovery + rapid-eviction pin set
Cumulative discovery used to stall on multi-backend models because the
proxy's least-connections LB collapses fresh-TCP probes onto a stable
subset of backends. We worked around this with parallelism (5 calls per
new provider, 2 per refresh cycle) and inter-model staggering, but the
approach was fundamentally O(luck): some replicas never observed some
backends, and their TLS handshakes were rejected whenever the LB later
routed them there. The customer-visible symptom is ~42% of
/v1/attestation/report calls for GLM-5.1-FP8 failing with
"error sending request".
model-proxy PR #27 exposed a deterministic routing knob: the rotation SNI
'<canonical>-i<N>.<base>' routes to 'healthy_backends_sorted[N % healthy]',
and GET /backends/count?domain=<host> reports the current healthy count.
This PR rewrites discover_model on top of those two pieces.
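A minimal local model of the routing rule stated above (not model-proxy's code): `route_rotation` and `rotation_targets` are hypothetical names, and the slice is assumed to already be the proxy's sorted healthy set. It shows why one call per index in 0..count reaches every healthy backend exactly once.

```rust
/// Hypothetical model of the rotation rule: index `n` selects
/// `healthy_backends_sorted[n % healthy]`.
pub fn route_rotation<'a>(healthy_sorted: &[&'a str], n: usize) -> Option<&'a str> {
    if healthy_sorted.is_empty() {
        return None; // no healthy backends: nothing to route to
    }
    Some(healthy_sorted[n % healthy_sorted.len()])
}

/// Indices 0..count map one-to-one onto the healthy backends, so a single
/// call per index covers the whole set without depending on LB behaviour.
pub fn rotation_targets<'a>(healthy_sorted: &[&'a str]) -> Vec<&'a str> {
    (0..healthy_sorted.len())
        .filter_map(|n| route_rotation(healthy_sorted, n))
        .collect()
}
```

Indices past the count wrap around (with three backends, index 4 lands on the second one), which is why the fan-out stops at the reported count instead of probing extra indices.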
Per-cycle flow:
- Fetch the healthy backend count from /backends/count.
- Fan out one fresh-TCP attestation call per backend index, in parallel,
no stagger. Each call lands on a distinct backend by construction, so
per-backend GPU evidence pressure per cycle is exactly one attestation
regardless of how many models refresh together.
- Apply the cycle's verified fingerprints to the shared pin set
according to apply_pin_update():
* Complete coverage (no failures, verify_failures == 0, distinct
observed fingerprints == backend_count): REPLACE the pin set with
the observed set. A backend that just went unhealthy or had its
cert rotated drops out of the pin set within one refresh
interval — rapid eviction.
* Anything less: additive merge. A transient hiccup never evicts
verified fingerprints we just couldn't reconfirm.
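The replace-versus-merge rule in apply_pin_update() can be sketched as follows. This is a hedged illustration, not the crate's code: `CycleResult`, `call_failures`, and the `BTreeSet<String>` pin set are assumptions, and the `backend_count > 0` guard is inferred from the zero-count safety test listed under Tests rather than stated explicitly.

```rust
use std::collections::BTreeSet;

/// Hypothetical summary of one discovery cycle (field names are assumed).
pub struct CycleResult {
    pub backend_count: usize,
    pub call_failures: usize,
    pub verify_failures: usize,
    pub verified: Vec<String>, // SPKI fingerprints from verified attestations
}

/// Returns true if the pin set was replaced, false if it was merged.
pub fn apply_pin_update(pins: &mut BTreeSet<String>, cycle: &CycleResult) -> bool {
    // Duplicates collapse here, so repeat observations cannot inflate coverage.
    let observed: BTreeSet<String> = cycle.verified.iter().cloned().collect();
    let complete = cycle.backend_count > 0
        && cycle.call_failures == 0
        && cycle.verify_failures == 0
        && observed.len() == cycle.backend_count;
    if complete {
        *pins = observed; // rapid eviction: drop anything not seen this cycle
        true
    } else {
        pins.extend(observed); // additive: a partial cycle never evicts
        false
    }
}
```

Two replies from the same backend therefore cannot pass for complete coverage; such a cycle falls through to the additive merge.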
Eliminates:
- ATTESTATION_DISCOVERY_PARALLELISM (was 5)
- CUMULATIVE_DISCOVERY_CALLS (was 2)
- STAGGER_MS (intra-model, was 200)
- MODEL_DISCOVERY_STAGGER_MS (inter-model, was 2_000)
discover_model loses the num_calls parameter. Both call sites (the
new-provider phase in load_inference_url_models and the cumulative
refresh path) become identical.
DiscoveryOutcome gains:
- backend_count: healthy count from /backends/count this cycle, 0 if
the fetch failed (failure_reasons[0] then carries the reason).
- replaced_state: true iff this cycle achieved complete coverage and
the pin set was wholesale replaced rather than additively merged.
Both fields are surfaced on the existing INFO log lines (initial
discovery, cumulative expansion, cumulative no-new-fingerprints) for
DD-side observability.
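A trimmed-down sketch of the two new fields and how they might be rendered for a log line. Only `backend_count`, `replaced_state`, and `failure_reasons` come from this commit; the constructor, `log_fields`, and the example reason string are hypothetical, and the real code logs through its existing INFO lines rather than a preformatted string.

```rust
/// Hypothetical, reduced DiscoveryOutcome: only fields named in this commit.
#[derive(Debug, Default)]
pub struct DiscoveryOutcome {
    pub backend_count: usize,
    pub replaced_state: bool,
    pub failure_reasons: Vec<String>,
}

impl DiscoveryOutcome {
    /// Outcome when the /backends/count fetch failed: the count stays 0 and
    /// failure_reasons[0] carries the reason.
    pub fn count_fetch_failed(reason: impl Into<String>) -> Self {
        DiscoveryOutcome {
            backend_count: 0,
            replaced_state: false,
            failure_reasons: vec![reason.into()],
        }
    }

    /// key=value pairs suitable for appending to an INFO line.
    pub fn log_fields(&self) -> String {
        format!(
            "backend_count={} replaced_state={}",
            self.backend_count, self.replaced_state
        )
    }
}
```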
URL handling derives the base domain by stripping the leftmost DNS
label of the inference URL host. Works for every URL we have today
('*.completions{,-stg}.near.ai'); URLs that don't fit (one-label hosts,
IP literals) return an empty outcome with a 'url_parse:' failure
reason and the existing fail-closed path handles eviction.
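A std-only sketch of the URL handling described above, assuming the helper receives the `host[:port]` authority already extracted from the inference URL. `split_host`, `SplitHost`, and `rotation_authority` are illustrative names, not the helpers added in this PR; the error strings reuse the 'url_parse:' prefix.

```rust
use std::net::IpAddr;

/// Hypothetical split of an inference host: leftmost label (the canonical),
/// the remaining base domain, and an optional port.
#[derive(Debug, PartialEq)]
pub struct SplitHost {
    pub canonical: String,
    pub base: String,
    pub port: Option<u16>,
}

pub fn split_host(authority: &str) -> Result<SplitHost, String> {
    let lower = authority.trim().to_ascii_lowercase(); // hosts are case-insensitive
    // Bracketed IPv6 or bare IP literal: there is no DNS label to strip.
    if lower.starts_with('[') || lower.parse::<IpAddr>().is_ok() {
        return Err(format!("url_parse: IP literal host {}", lower));
    }
    let (host, port) = match lower.rsplit_once(':') {
        Some((h, p)) => {
            let port = p
                .parse::<u16>()
                .map_err(|_| format!("url_parse: bad port {}", p))?;
            (h.to_string(), Some(port))
        }
        None => (lower.clone(), None),
    };
    if host.parse::<IpAddr>().is_ok() {
        return Err(format!("url_parse: IP literal host {}", host));
    }
    // Strip the leftmost label; dashes inside the canonical are kept as-is.
    match host.split_once('.') {
        Some((canonical, base)) if !canonical.is_empty() && !base.is_empty() => Ok(SplitHost {
            canonical: canonical.to_string(),
            base: base.to_string(),
            port,
        }),
        _ => Err(format!("url_parse: one-label host {}", host)),
    }
}

/// Rotation SNI '<canonical>-i<N>.<base>', preserving the original port.
pub fn rotation_authority(h: &SplitHost, n: usize) -> String {
    let host = format!("{}-i{}.{}", h.canonical, n, h.base);
    match h.port {
        Some(p) => format!("{}:{}", host, p),
        None => host,
    }
}
```

With a made-up host such as 'glm-5-1-fp8.completions.near.ai', index 3 yields 'glm-5-1-fp8-i3.completions.near.ai', while 'localhost' and '10.0.0.1:443' are rejected.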
Tests:
- spki_verifier: replace_with state transitions (Bootstrap->Pinned,
Pinned shrink, Blocked->Pinned recovery, empty set).
- rotation: 10 URL-helper tests covering canonicals with internal
dashes, case insensitivity, port preservation, IP/one-label
rejection, count-URL shape.
- inference_provider_pool: 8 apply_pin_update policy tests covering
steady state, eviction on shrinking count, partial-cycle additive
preservation, duplicate-observation safety, verify_failure
blocking replacement, zero-count safety, bootstrap first cycle.
Followup #600: rotation SNI for chat-completion bucket pre-warm.
* review: distinguish count_zero, cap fan-out, redact reqwest URLs
Address bot review feedback on #603:
- count_zero vs count-fetch-failure are now distinguishable in
failure_reasons. Previously both rendered as an empty or generic
'count_*:' reason; now Ok(0) records 'count_zero: proxy reports 0
healthy backends' explicitly.
- Sanity-cap rotation fan-out at 256 backends per model per cycle.
A bogus registry reading (race during deploy, partial split) would
otherwise spawn an unbounded number of fresh-TCP TLS handshakes.
Hitting the cap is logged and recorded in failure_reasons.
- Strip the request URL from every reqwest error in failure_reasons
via Error::without_url(). The URLs embed our random per-call nonce,
which would otherwise create unbounded label cardinality in DD when
any reqwest error path fires. Full error stays available at DEBUG
via the existing debug! lines.
- pin_update_verify_failure_blocks_replacement test now uses an input
shape that the production caller can actually produce
(backend_count=4, verified=3, verify_failures=1). The policy
assertion is unchanged.

1 parent 5a81e72 commit ca6cc73
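The count handling and fan-out from the per-cycle flow, including the count_zero and cap changes in the review notes, might look like the sketch below. `plan_fanout`, `fan_out`, `MAX_ROTATION_FANOUT`, and the 'count_fetch:'/'count_capped:' prefixes are assumptions (only 'count_zero:' and the 256 cap come from this commit); probing the first 256 indices when the cap is hit is also assumed, and scoped threads (std::thread::scope) stand in for whatever async join the service actually uses.

```rust
use std::thread;

/// Sanity cap on rotation fan-out per model per cycle.
pub const MAX_ROTATION_FANOUT: usize = 256;

/// Turn the /backends/count result into rotation indices to probe, recording
/// distinguishable failure reasons along the way.
pub fn plan_fanout(count: Result<usize, String>, failure_reasons: &mut Vec<String>) -> Vec<usize> {
    match count {
        Err(e) => {
            failure_reasons.push(format!("count_fetch: {}", e));
            Vec::new()
        }
        Ok(0) => {
            failure_reasons.push("count_zero: proxy reports 0 healthy backends".to_string());
            Vec::new()
        }
        Ok(n) if n > MAX_ROTATION_FANOUT => {
            failure_reasons.push(format!("count_capped: {} > {}", n, MAX_ROTATION_FANOUT));
            (0..MAX_ROTATION_FANOUT).collect()
        }
        Ok(n) => (0..n).collect(),
    }
}

/// Run `probe(i)` for every index concurrently, with no stagger, and return
/// the results in index order.
pub fn fan_out<T, F>(indices: &[usize], probe: F) -> Vec<T>
where
    T: Send,
    F: Fn(usize) -> T + Sync,
{
    let probe = &probe;
    thread::scope(|s| {
        let handles: Vec<_> = indices
            .iter()
            .map(|&i| s.spawn(move || probe(i)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("probe thread panicked"))
            .collect()
    })
}
```

A capped cycle probes fewer indices than the reported count, so it can never reach complete coverage and falls back to the additive merge.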
4 files changed
Lines changed: 768 additions & 162 deletions