Commit f4b8fa1

ui: right-align card buttons + add PENDING peer to dummy-run fixture (#26)
* fix: surface PENDING peers under their served_model_name during boot

  Before: when a launching peer is still in PENDING (no service advertised yet), get_all_models surfaced it with id="" and worker_group_id set. The frontend (ModelList.svelte) builds wgToModel from peers that already carry an id, then drops any remaining id="" peer whose worker_group_id doesn't appear in that map. During the brief PENDING window every peer in the worker group is service-less, so wgToModel is empty for that group and the replica is silently filtered out. By the time we COULD render it, registrar.go flips status from PENDING to READY and advertises the service in the same step, so PENDING is never actually visible on the dashboard.

  After: fall back to labels.served_model_name (already emitted by model-launch's _ocf_labels on every peer) when synthesising the no-service entry. The peer now has a real model id during boot, the frontend's grouping succeeds, and the status pill renders "pending" until the health check passes.

  Tests updated: the multi-node-replica grouping test previously asserted the follower kept id="". With served_model_name on every peer, both peers in the group now resolve to the same id; we still verify the shared worker_group_id keeps them in one replica. Added a defensive test for the older-binary case (no served_model_name label) where the id stays empty as before.

* ui: right-align action buttons + pending peer example fixture

  Right-align the OpenWebUI + Metrics Dashboard buttons inside the expanded model card (`justify-end` on the flex row). Matches the in-card details which are right-aligned by design.

  Add a synthetic PENDING peer to the dummy-run fixture so /make dummy-run shows what a booting model looks like: status: "pending", service: [], served_model_name carried in labels. Hostname + peer id are synthetic but realistic; framework_args resembles a 70B vLLM launch.

* fixture: swap synthetic pending peer for a real one

  Caught the dev mesh while a fresh sml launch (job 2297439, --dev3) was still in OCF-PENDING: service: [], status: "pending", labels carry served_model_name. Real shape, real hostname, real peer id; nothing hand-rolled.

* ui: tighten extra-labels padding to fit the actual key length

  The Extra labels pre-block used padEnd(18), a fixed width copied from the main-labels block where it makes columns line up with header / follower entries. In the extras block there's typically just one entry (framework_args, 14 chars), so the fixed pad inserts 4 extra spaces between key and value with nothing to align them to. Reads like a formatting bug.

  Compute the pad from the actual keys present + 1. With framework_args alone, that's padEnd(15): one space between key and value. If more labels show up later, they self-align.

* ui: rename "Open in OpenWebUI" to "Swiss AI Chat", reorder buttons

  - Button text "Open in OpenWebUI" → "Swiss AI Chat". The underlying URL still points at the OpenWebUI deployment, but users see "Swiss AI Chat" which matches the surface-level brand they actually interact with.
  - Reorder so Metrics Dashboard (secondary, emerald) sits left of Swiss AI Chat (primary, black). Both still right-aligned as a group; the primary action lands at the right edge where the eye finishes scanning the card.

* ui: shorten button labels to "Chat" and "Metrics", restore Chat-first order

  - "Swiss AI Chat" → "Chat" and "Metrics Dashboard" → "Metrics". The expanded card sits below the model title that already says "swiss-ai/...", so repeating "Swiss AI" in the button label is noise, and "Dashboard" doesn't carry meaning past the icon.
  - Restore Chat-first order so Metrics ends up at the right edge, where it lived before the brief mid-iteration swap.

* ui: left-align action buttons (drop justify-end)

* ui: prefix topology summary with replica count when > 1

  The header line said "on 4 nodes × 4x GH200" for a model with 2 replicas of 4 nodes each, undercounting the actual resources by half. Prepend the replica multiplier so the line describes total commitment:

    1 replica, 4 nodes → "4 nodes × 4x GH200"
    2 replicas, 4 nodes each → "2 replicas × 4 nodes × 4x GH200"
    10 replicas, 1 node each → "10 replicas × 4x GH200"

  The red ×N chip next to the title still shows the replica count on its own; the topology line now expresses it in resource terms.

* format: ruff format on test_model_service.py
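The replica-count prefix described in the topology bullet can be sketched in Python. This is an illustrative port, not the Svelte code from the commit: the `Replica` dataclass, its field names, and the per-replica format produced by `topology_string` are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    # Hypothetical shape; the real frontend type is not shown in this commit.
    nodes: int
    gpus_per_node: int
    gpu_name: str


def topology_string(r: Replica) -> str:
    """Per-replica topology, e.g. '4 nodes × 4x GH200' (assumed format)."""
    per_node = f"{r.gpus_per_node}x {r.gpu_name}"
    return per_node if r.nodes == 1 else f"{r.nodes} nodes × {per_node}"


def topology_summary(replicas: list[Replica]) -> str:
    """Prefix the shared per-replica topology with the replica count."""
    if not replicas:
        return "unknown"
    distinct = {topology_string(r) for r in replicas}
    if len(distinct) != 1:
        return "Various"  # replicas disagree on shape
    per_replica = next(iter(distinct))
    if len(replicas) == 1:
        return per_replica  # a single replica needs no multiplier
    return f"{len(replicas)} replicas × {per_replica}"


if __name__ == "__main__":
    print(topology_summary([Replica(4, 4, "GH200")] * 2))
```

Under these assumptions the three cases listed in the commit message map to one replica of four nodes, two replicas of four nodes, and ten single-node replicas.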
1 parent ac83a11 commit f4b8fa1

4 files changed

Lines changed: 117 additions & 20 deletions

backend/services/model_service.py

Lines changed: 6 additions & 1 deletion
@@ -63,8 +63,13 @@ def get_all_models(endpoint: str, with_details: bool = False):
             # worker_group_id and show it as part of a launching/follower set.
             if not meta["worker_group_id"]:
                 continue
+            # Fall back to the served_model_name label so the frontend can
+            # group PENDING peers under their eventual model card during boot.
+            # Without this, the brief PENDING window is invisible because the
+            # peer has no advertised service yet and nothing else maps its
+            # worker_group_id back to a model id.
             entry = {
-                "id": "",  # no model yet
+                "id": meta["labels"].get("served_model_name", ""),
                 "object": "model",
                 "created": "0x",
                 "owner": "0x",

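The effect of this fallback on the dashboard can be illustrated with a small Python sketch. It is based only on the commit message's description of the frontend filter; the actual ModelList.svelte code is not part of this diff, and the names `visible_entries` and `entry_id` are hypothetical.

```python
def visible_entries(entries: list[dict]) -> list[dict]:
    """Keep only entries that can be attributed to a model card."""
    # Map worker_group_id -> model id, built only from peers that have an id.
    wg_to_model = {
        e["worker_group_id"]: e["id"]
        for e in entries
        if e["id"] and e["worker_group_id"]
    }
    kept = []
    for e in entries:
        if e["id"]:
            kept.append(e)
        elif e["worker_group_id"] in wg_to_model:
            kept.append(e)  # attributable through a sibling in its group
        # Otherwise the peer is dropped: nothing maps it to a model.
    return kept


def entry_id(labels: dict) -> str:
    """Backend fallback from this commit: use the label when present."""
    return labels.get("served_model_name", "")


if __name__ == "__main__":
    labels = {
        "worker_group_id": "2297439",
        "served_model_name": "swiss-ai/Apertus-8B-Instruct-2509-rob-dev3",
    }
    before = [{"id": "", "worker_group_id": "2297439", "status": "pending"}]
    after = [{**before[0], "id": entry_id(labels)}]
    print(len(visible_entries(before)), len(visible_entries(after)))
```

With an empty id and no sibling carrying an id, the pending peer is filtered out; once the id comes from the label, it survives the filter on its own.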
backend/tests/fixtures/dnt_table_dev_live.json

Lines changed: 55 additions & 0 deletions
@@ -415,6 +415,61 @@
     "status": "ready",
     "version": "v0.1.11"
   },
+  "/QmY7FvKB3i6N1yvpkgAZXQCnFmpKR5WJ4MqqGNcLb3tWC5": {
+    "available_offering": null,
+    "connected": true,
+    "current_offering": null,
+    "hardware": {
+      "gpus": [
+        {
+          "name": "NVIDIA GH200 120GB",
+          "total_memory": 97871,
+          "used_memory": 6
+        },
+        {
+          "name": "NVIDIA GH200 120GB",
+          "total_memory": 97871,
+          "used_memory": 5
+        },
+        {
+          "name": "NVIDIA GH200 120GB",
+          "total_memory": 97871,
+          "used_memory": 5
+        },
+        {
+          "name": "NVIDIA GH200 120GB",
+          "total_memory": 97871,
+          "used_memory": 13
+        }
+      ],
+      "host_memory": 0,
+      "host_memory_bandwidth": 0,
+      "host_memory_used": 0
+    },
+    "hostname": "nid007456",
+    "id": "QmY7FvKB3i6N1yvpkgAZXQCnFmpKR5WJ4MqqGNcLb3tWC5",
+    "labels": {
+      "expires_at": "2026-05-19T00:09:35Z",
+      "framework": "sglang",
+      "framework_args": "--port 8080 --model-path /capstor/store/cscs/swissai/infra01/hf_models/models/swiss-ai/Apertus-8B-Instruct-2509 --served-model-name swiss-ai/Apertus-8B-Instruct-2509-rob-dev3 --host 0.0.0.0 --enable-metrics",
+      "launched_by": "rosmith",
+      "served_model_name": "swiss-ai/Apertus-8B-Instruct-2509-rob-dev3",
+      "slurm_job_id": "2297439",
+      "slurm_partition": "normal",
+      "started_at": "2026-05-18T18:09:35Z",
+      "worker_group_id": "2297439"
+    },
+    "last_seen": 1779127775,
+    "latency": 0,
+    "load": null,
+    "owner": "",
+    "privileged": false,
+    "public_address": "",
+    "role": null,
+    "service": null,
+    "status": "pending",
+    "version": "dev-9ff5ec9"
+  },
   "/QmbUKJkCfotDzbFE5uoTsXD4GRyPHjzZC1f2yAGLoeBMn9": {
     "available_offering": null,
     "connected": true,

backend/tests/test_model_service.py

Lines changed: 39 additions & 11 deletions
@@ -53,6 +53,7 @@ def json(self):
         "slurm_job_id": "12345",
         "worker_group_id": "12345",
         "framework": "sglang",
+        "served_model_name": "swiss-ai/Apertus-8B",
         "started_at": "2026-05-15T18:00:00Z",
     },
     "hardware": {"gpus": [{"name": "GH200"}] * 4},
@@ -100,9 +101,11 @@ def test_new_binary_head_carries_labels():
 
 
 def test_metrics_only_follower_groups_with_head_via_worker_group_id():
-    """A multi-node replica's follower has no `service` but does carry
-    worker_group_id. It should appear in the output with id='' so the
-    frontend can attribute it to the same replica as the head."""
+    """A peer with no advertised `service` (multi-node follower, or a head
+    still in PENDING during boot) should fall back to its served_model_name
+    label so the frontend can render the model card during the brief window
+    before the service is published. Without the fallback, the peer's id
+    stays empty and the frontend silently drops it."""
     with patch("backend.services.model_service.requests.get") as mock_get:
         mock_get.return_value = _dnt_response(
             {
@@ -114,15 +117,38 @@ def test_metrics_only_follower_groups_with_head_via_worker_group_id():
     assert len(out) == 2
     by_id = {e["peer_id"]: e for e in out}
     assert by_id["QmHead"]["id"] == "swiss-ai/Apertus-8B"
-    assert by_id["QmFollower"]["id"] == ""
-    # Shared worker_group_id lets the frontend group them.
+    # Follower inherits id from the served_model_name label — same model card.
+    assert by_id["QmFollower"]["id"] == "swiss-ai/Apertus-8B"
+    assert by_id["QmFollower"]["status"] == "pending"
+    # Shared worker_group_id lets the frontend group them within the model.
     assert (
         by_id["QmHead"]["worker_group_id"]
         == by_id["QmFollower"]["worker_group_id"]
         == "12345"
     )
 
 
+def test_pending_peer_without_served_model_name_label_falls_back_to_empty_id():
+    """Defensive: if a peer is mid-boot from an older binary that doesn't
+    emit served_model_name, we still surface it via worker_group_id with
+    id=''. The frontend then needs another peer in the same group with an
+    id to attribute it; otherwise it's dropped."""
+    peer = {
+        **PEER_NEW_BINARY_FOLLOWER,
+        "labels": {
+            k: v
+            for k, v in PEER_NEW_BINARY_FOLLOWER["labels"].items()
+            if k != "served_model_name"
+        },
+    }
+    with patch("backend.services.model_service.requests.get") as mock_get:
+        mock_get.return_value = _dnt_response({"/QmPending": peer})
+        out = get_all_models("http://x/v1/dnt/table", with_details=True)
+    assert len(out) == 1
+    assert out[0]["id"] == ""
+    assert out[0]["worker_group_id"] == "12345"
+
+
 def test_follower_without_worker_group_id_skipped():
     """Older binary follower with no labels and no service is uninformative —
     drop it so the model list stays clean."""
@@ -196,9 +222,10 @@ def test_real_prod_payload_returns_models():
 
 def test_upgraded_payload_groups_multinode_replica():
     """Simulated v0.0.6 deployment: the gemma 'multi-node demo' pair share a
-    worker_group_id. One has a service, the other is metrics-only with id=''.
-    Backend returns both entries with the shared worker_group_id so the
-    frontend can aggregate them into one logical replica."""
+    worker_group_id. Both peers carry the served_model_name label, so both
+    resolve to the same model id even though only one advertises a service.
+    Backend returns both entries with the shared worker_group_id + model id
+    so the frontend can aggregate them into one logical replica."""
     with patch("backend.services.model_service.requests.get") as mock_get:
         mock_get.return_value = type(
             "R",
@@ -212,7 +239,8 @@ def test_upgraded_payload_groups_multinode_replica():
         by_wg.setdefault(e["worker_group_id"], []).append(e)
     multi = [v for v in by_wg.values() if len(v) > 1]
     assert multi, "fixture should contain at least one multi-peer worker group"
-    # At least one peer in the multi-peer group should be metrics-only (id='').
     pair = multi[0]
-    assert any(e["id"] == "" for e in pair), pair
-    assert any(e["id"] != "" for e in pair), pair
+    # Both peers in the group share the same non-empty model id.
+    ids = {e["id"] for e in pair}
+    assert ids != {""}, pair
+    assert len(ids) == 1, f"peers in one worker group should share one model id: {ids}"
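The invariant checked by the final assertions (every peer in a worker group resolves to one non-empty model id) could also be expressed as a reusable helper. The sketch below is hypothetical: `inconsistent_groups` is not a function in the repository, and it assumes each entry is a dict with `id` and `worker_group_id` keys, as in the tests above.

```python
from collections import defaultdict


def inconsistent_groups(entries: list[dict]) -> dict[str, set[str]]:
    """Return worker groups whose peers do not share one non-empty id."""
    ids_by_group: dict[str, set[str]] = defaultdict(set)
    for e in entries:
        ids_by_group[e["worker_group_id"]].add(e["id"])
    # A healthy group has exactly one distinct id, and it is not "".
    return {
        wg: ids
        for wg, ids in ids_by_group.items()
        if len(ids) != 1 or ids == {""}
    }
```

An empty result means every group is consistent; a non-empty result names the offending groups and the ids seen in each.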

frontend/src/components/ui/ModelCard.svelte

Lines changed: 17 additions & 8 deletions
@@ -83,13 +83,21 @@
 
   // Header summary across all replicas of this model. If every replica has
   // the same per-replica topology (almost always true: a model is launched
-  // with one shape), show it once. Otherwise admit ambiguity rather than
-  // pick one to display.
+  // with one shape), show it with the replica multiplier prefixed when
+  // there's more than one. Otherwise admit ambiguity rather than pick one
+  // to display.
+  //
+  // 1 replica, 1 node → "4x NVIDIA GH200 120GB"
+  // 1 replica, 4 nodes → "4 nodes × 4x NVIDIA GH200 120GB"
+  // 2 replicas, 4 nodes each → "2 replicas × 4 nodes × 4x NVIDIA GH200 120GB"
+  // replicas with differing shapes → "Various"
   function topologySummary(replicas: Replica[]): string {
     if (replicas.length === 0) return "unknown";
     const distinct = new Set(replicas.map(topologyString));
-    if (distinct.size === 1) return [...distinct][0];
-    return "Various";
+    if (distinct.size !== 1) return "Various";
+    const perReplica = [...distinct][0];
+    if (replicas.length === 1) return perReplica;
+    return `${replicas.length} replicas × ${perReplica}`;
   }
 
   async function copyModelName(e: Event) {
@@ -187,7 +195,7 @@
       on:keydown|stopPropagation
       role="region"
     >
-      <!-- Action buttons (what clicking the card used to do, plus metrics) -->
+      <!-- Action buttons: Chat (primary) + Metrics, left-aligned. -->
       <div class="flex flex-wrap gap-2">
         <a
           href={chatUrl}
@@ -200,7 +208,7 @@
             <polyline points="15 3 21 3 21 9"></polyline>
             <line x1="10" y1="14" x2="21" y2="3"></line>
           </svg>
-          Open in OpenWebUI
+          Chat
         </a>
         {#if metricsUrl}
           <a
@@ -213,7 +221,7 @@
               <path d="M3 3v18h18"></path>
               <path d="M7 15l4-4 4 4 5-5"></path>
             </svg>
-            Metrics Dashboard
+            Metrics
           </a>
         {/if}
       </div>
@@ -272,8 +280,9 @@
           !["launched_by","slurm_job_id","worker_group_id","framework","started_at","expires_at","slurm_partition","served_model_name"].includes(k)
         )}
         {#if extra.length > 0}
+          {@const pad = Math.max(...extra.map(([k]) => k.length)) + 1}
           <div class="text-xs text-slate-500 dark:text-slate-400 mt-2 mb-1">Extra labels</div>
-          <pre class="code-block">{extra.map(([k, v]) => `${k.padEnd(18)} ${v}`).join("\n")}</pre>
+          <pre class="code-block">{extra.map(([k, v]) => `${k.padEnd(pad)} ${v}`).join("\n")}</pre>
         {/if}
       {/if}
     </div>
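The key-width computation in the last hunk can be mirrored in Python to see how the padding behaves. This is a sketch rather than the component's code: `render_extra_labels` is a hypothetical name, and `str.ljust` stands in for JavaScript's `padEnd`.

```python
def render_extra_labels(extra: dict[str, str]) -> str:
    """Pad each key to the longest key plus one, then add a separator space."""
    if not extra:
        return ""
    # Width derived from the keys actually present, instead of a fixed 18.
    pad = max(len(k) for k in extra) + 1
    # Mirrors the template `${k.padEnd(pad)} ${v}`: padded key, space, value.
    return "\n".join(f"{k.ljust(pad)} {v}" for k, v in extra.items())


if __name__ == "__main__":
    print(render_extra_labels({"framework_args": "--port 8080"}))
```

Because the template places a literal space after the padded key, a lone 14-character key is followed by two spaces in this sketch; additional keys align to the longest one.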
