Commit bad4dca
fix: pick multi-node TP head by has_service, not first peer
In a multi-node TP replica, only rank 0 registers the `llm` service; the
other ranks run as background workers, and their OCFs stay at
status=pending forever. The frontend picked the head as the first peer
whose id matched the model id, but every peer in the group shares that id
(it comes from the served_model_name label), so a rank-N peer could win
and the whole replica would render as pending even though it was serving
traffic normally.
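A minimal sketch of the failure mode described above. It assumes a hypothetical `Peer` shape (`id`, `rank`, `status`), hypothetical helpers `pickHeadByFirstMatch` and `replicaStatus`, a made-up model id `"my-model"`, and that peers can arrive in any order; the real frontend types are not shown in this commit.

```typescript
// Hypothetical peer entry; field names are illustrative, not the real API.
interface Peer {
  id: string; // shared by every rank (derived from the served_model_name label)
  rank: number;
  status: "ready" | "pending";
}

// Old rule (sketch): the first peer whose id matches the model id is the head.
function pickHeadByFirstMatch(peers: Peer[], modelId: string): Peer | undefined {
  return peers.find((p) => p.id === modelId);
}

// Assumption: the replica is rendered with the status of its chosen head.
function replicaStatus(head: Peer | undefined): string {
  return head ? head.status : "unknown";
}

// If a worker rank happens to be listed first, it wins the match.
const peers: Peer[] = [
  { id: "my-model", rank: 1, status: "pending" }, // background worker
  { id: "my-model", rank: 0, status: "ready" },   // registered the llm service
];

const head = pickHeadByFirstMatch(peers, "my-model");
console.log(head?.rank, replicaStatus(head)); // 1 pending
```

Because every rank carries the same id, the id match alone cannot tell rank 0 apart from the workers; the result depends only on list order.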
Surface `has_service` on each peer entry from the backend and prefer it
in the frontend's head selection. The same change also makes the
expanded-view "head" label match the node on which sglang actually runs
the API server.

1 parent 70fe01d
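A sketch of the new head selection, again using a hypothetical `Peer` type and a hypothetical `pickHead` helper, now with the backend-supplied `has_service` flag. The fallback to the first id match when no peer reports the service is an assumption; the commit only says `has_service` is preferred.

```typescript
// Hypothetical peer entry; only has_service is named in the commit message.
interface Peer {
  id: string; // shared by every rank in the TP group
  rank: number;
  status: "ready" | "pending";
  has_service: boolean; // true only on the rank that registered the llm service
}

// Prefer the matching peer that actually registered the llm service;
// otherwise (assumed) fall back to the first peer whose id matches.
function pickHead(peers: Peer[], modelId: string): Peer | undefined {
  const matching = peers.filter((p) => p.id === modelId);
  return matching.find((p) => p.has_service) ?? matching[0];
}

const peers: Peer[] = [
  { id: "my-model", rank: 1, status: "pending", has_service: false },
  { id: "my-model", rank: 0, status: "ready", has_service: true },
];

const head = pickHead(peers, "my-model");
console.log(head?.rank, head?.status); // 0 ready
```

Filtering on the model id first keeps the old matching rule intact; `has_service` only breaks the tie among peers that share the id, so list order no longer decides which rank is shown as head.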
3 files changed
Lines changed: 18 additions & 2 deletions
File tree
- backend
  - services
  - tests
- frontend/src/components/ui
File 1 of 3 (diff text not captured): 2 additions, 0 deletions; hunks @@ -73,6 +73,7 @@ and @@ -93,6 +94,7 @@
File 2 of 3 (diff text not captured): 7 additions, 0 deletions; hunks @@ -98,6 +98,7 @@ and @@ -126,6 +127,12 @@
File 3 of 3 (diff text not captured): 9 additions, 2 deletions; hunk @@ -57,8 +57,15 @@