Commit cc680aa
Dx local safety and pending status (#29)
* dev safety + launched_by-driven tier + pending status
Frontend
- Tier (24/7 vs Slurm badge) is now derived from the peer's launched_by
label instead of a hardcoded model list. Persistent launchers
(k8s, cscs_L1) → 24/7; anything else (a username set by model-launch,
or an empty label) → Slurm. The new helper getTierFromLaunchedBy
replaces getModelTier in ModelCard and ModelList.
- Pending status surfaces on the collapsed card via a traffic-light
dot (green/amber+pulsing/grey) AND a muted-grey tile treatment
(grayscale logo+badges, gray-500/400 text, faint background wash).
Amber dot stays vivid against the grey card.
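For illustration, the launched_by → tier rule described above could look roughly like the sketch below. Only the helper name getTierFromLaunchedBy and the two persistent launcher labels come from this commit message; the Tier type, the parameter type, and the switch-based body are assumptions, not the code that was merged.

```typescript
// Hypothetical sketch of the tier rule; not the merged implementation.
type Tier = "24/7" | "Slurm";

function getTierFromLaunchedBy(launchedBy: string | null | undefined): Tier {
  switch (launchedBy) {
    // Persistent launchers named in the commit message.
    case "k8s":
    case "cscs_L1":
      return "24/7";
    // Anything else (a username, an empty string, a missing label) is
    // treated as a Slurm job.
    default:
      return "Slurm";
  }
}
```

Under these assumptions an unknown or missing label always falls through to Slurm, so a new launcher has to be added explicitly before it gets the 24/7 badge.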
Local-dev safety
- Makefile guards _guard-local-db and _guard-local-api refuse to run
if .env DATABASE_URL or frontend/.env VITE_API_URL points at a
non-localhost host. Closes a foot-gun where prod creds in .env let
`make dummy-run` attempt `alembic upgrade head` against prod Neon.
- Committed .env.example and frontend/.env.example templates (with
!.env.example in .gitignore so the un-ignore actually works) so a
fresh clone bootstraps cleanly via `make run`.
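The check the two guards perform (refuse unless the configured URL targets localhost) can be sketched as follows. The real guards live in the Makefile and are not shown here; the function name pointsAtLocalhost, the accepted host list, and the regex are all hypothetical and only illustrate the idea.

```typescript
// Hypothetical sketch of the "is this URL local?" test behind the guards.
// Hosts treated as local in this sketch (an assumption).
const LOCAL_HOSTS = ["localhost", "127.0.0.1", "[::1]"];

function pointsAtLocalhost(url: string): boolean {
  // scheme://[userinfo@]host, where host is a bracketed IPv6 literal or
  // everything up to the next ':', '/', '?' or '#'.
  const match = /^[a-z][a-z0-9+.-]*:\/\/(?:[^@\/]*@)?(\[[^\]]*\]|[^:\/?#]*)/i.exec(url.trim());
  if (match === null) {
    // Unparseable or empty values fail closed: the guard refuses to run.
    return false;
  }
  return LOCAL_HOSTS.indexOf(match[1].toLowerCase()) !== -1;
}
```

Failing closed on an unparseable value is a deliberate choice in this sketch: a malformed DATABASE_URL is safer to reject than to pass through to a migration.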
Fixture
- dnt_table_dev_live.json gains 6 real k8s peer entries pulled from
prod DNT, so `make dummy-run` shows a representative mix of k8s
24/7 models + Slurm jobs (incl. a pending one) for UI iteration.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: pick multi-node TP head by has_service, not first peer
In a multi-node TP replica only rank-0 registers the `llm` service; the
other ranks run as background workers and their OCFs stay status=pending
forever. The frontend was picking the head as the first peer whose id
matched the model id — but every peer in the group shares that id (from
the served_model_name label), so rank-N could win and the whole replica
would render as pending despite serving traffic fine.
Surface `has_service` on each peer entry from the backend and prefer it
in the frontend's head selection. Same change also makes the expanded-
view "head" label match the node sglang actually runs the API server on.
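A minimal sketch of the head selection described above, assuming a peer shape with id, has_service, and status fields. Only has_service and the shared model id come from this commit message; the Peer interface, the pickHead name, and the fallback to the first matching peer are assumptions about how the frontend might express it.

```typescript
// Hypothetical peer shape; only has_service is named in the commit message.
interface Peer {
  id: string;
  has_service: boolean;
  status: string;
}

function pickHead(peers: Peer[], modelId: string): Peer | undefined {
  // Every rank in a multi-node TP replica shares the model id.
  const group = peers.filter((p) => p.id === modelId);
  // Prefer the rank that registered the service (rank-0 per the message).
  const serving = group.filter((p) => p.has_service);
  // Assumed fallback: the old behaviour, first peer with a matching id.
  return serving.length > 0 ? serving[0] : group[0];
}
```

With this ordering, a pending background worker listed before rank-0 no longer determines how the replica renders.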
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 1d0ac10 commit cc680aa
3 files changed
Lines changed: 18 additions & 2 deletions
File tree
- backend
  - services
  - tests
- frontend/src/components/ui