Commit e3970a9
training(eliza-1): wire qwen3.5-27b SFT launch path + extend stage_base_v1_candidate to 27b
Lands the missing pieces so the eliza-1-27b cloud SFT + HF publish chain is
addressable from train_vast.sh and stage_base_v1_candidate.py. No live run
launched here — the swarm's 0_8b path on Nebius is still working through
retries, and the only currently-listed Vast B200-1x offer is $7.5/hr ×
~50h projected wall = ~$375 (over the operator's $200 cap). Nebius H200-1x
(per the registry entry's extras={"nebius_machine": "H200-1x"}) remains the
cheaper launch target once the swarm proves the path end-to-end.
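The budget reasoning above can be sketched as a small check. This is an illustration only: `Offer` and `within_cap` are hypothetical names, not part of `scripts/lib/vast.py`; the rate, projected hours, and cap are the figures quoted in this message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """Hypothetical container for one listed GPU offer."""
    name: str
    usd_per_hour: float
    projected_hours: float

    @property
    def projected_cost(self) -> float:
        # Projected spend = hourly rate x projected wall-clock hours.
        return self.usd_per_hour * self.projected_hours


def within_cap(offer: Offer, cap_usd: float) -> bool:
    """True when the projected spend stays at or under the operator's cap."""
    return offer.projected_cost <= cap_usd


# Figures from the commit message: $7.5/hr, ~50h projected wall, $200 cap.
b200 = Offer(name="vast b200-1x", usd_per_hour=7.5, projected_hours=50.0)
print(b200.projected_cost)      # 375.0
print(within_cap(b200, 200.0))  # False
```

With these inputs the projected spend is $375, which is why no live run was launched on the Vast offer.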
scripts/lib/vast.py:
- Add b200-1x target (1× NVIDIA B200, ≈183 GB, min_per_gpu_ram_gb=170).
Cheapest single-GPU fit on Vast for qwen3.5-27b's 130 GB working-set
budget at seq=32k with apollo_mini + grad-ckpt + Liger CE.
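As a rough sketch of what a target entry like this encodes, assuming a hypothetical `GpuTarget` dataclass (the real structure in `scripts/lib/vast.py` is not shown in this commit and may differ): the `min_per_gpu_ram_gb` floor filters offers, and the working-set budget must fit under it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuTarget:
    """Hypothetical shape of a Vast target entry; field names are assumptions
    except min_per_gpu_ram_gb, which the commit message names."""
    gpu_name: str
    num_gpus: int
    min_per_gpu_ram_gb: int

    def fits(self, working_set_gb: float) -> bool:
        # Simplification: compare the working set against the guaranteed
        # memory floor summed over all GPUs in the target.
        return working_set_gb <= self.num_gpus * self.min_per_gpu_ram_gb


# Values from the commit message: 1x NVIDIA B200, floor of 170 GB per GPU.
B200_1X = GpuTarget(gpu_name="NVIDIA B200", num_gpus=1, min_per_gpu_ram_gb=170)

# qwen3.5-27b's stated working-set budget at seq=32k is 130 GB.
print(B200_1X.fits(130))
```

The 170 GB floor leaves headroom above the 130 GB budget while still matching cards that report slightly less than the nominal ≈183 GB.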
scripts/train_vast.sh:
- Wire qwen3.5-27b → b200-1x, FSDP_WORLD_SIZE=1 (was falling through to
the catch-all blackwell6000-2x default; that worked but projected ~$458
/ 208h at the current $/hr, badly off the registry's intent).
- qwen3.6-27b legacy stays on b200-2x (still the right target for the
larger 64k-seq context budget on that backbone).
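The routing described above, written as a Python sketch rather than the actual shell logic in `train_vast.sh`. The function name `resolve_target` and the world sizes for the two `-2x` targets are assumptions inferred from the target names; only the `qwen3.5-27b` row is confirmed by the dry-run listed under Verified.

```python
from typing import Dict, Tuple

# (gpu_target, fsdp_world_size) per registry key.
# The world size of 2 for the *-2x targets is assumed from the target name.
TARGET_BY_REGISTRY_KEY: Dict[str, Tuple[str, int]] = {
    "qwen3.5-27b": ("b200-1x", 1),  # new in this commit
    "qwen3.6-27b": ("b200-2x", 2),  # legacy, unchanged
}

# Catch-all the commit message says qwen3.5-27b used to fall through to.
DEFAULT_TARGET: Tuple[str, int] = ("blackwell6000-2x", 2)


def resolve_target(registry_key: str) -> Tuple[str, int]:
    """Return (gpu_target, world_size), falling back to the catch-all."""
    return TARGET_BY_REGISTRY_KEY.get(registry_key, DEFAULT_TARGET)


print(resolve_target("qwen3.5-27b"))  # ('b200-1x', 1)
```

An explicit table makes a silent fall-through visible: a key missing from the mapping is the only way to reach the default.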
scripts/publish/stage_base_v1_candidate.py:
- Add "27b" to --tier choices, REQUIRED_KERNELS_BY_TIER, RAM_BUDGET_MB
(mirrors eliza1_manifest.REQUIRED_KERNELS_BY_TIER["27b"] +
publish/orchestrator.RAM_BUDGET_BY_TIER["27b"]).
- Introduce QWEN3_PARAMS_BY_TIER lookup; replace the two hardcoded
`'1.7B' if tier=='1_7b' else '0.6B'` ternaries with the lookup so the
lineage/provenance/README blocks render correctly for 27b.
- Make the cuda/rocm kernel-verify caveats tier-aware: for 27b both are
tier-supported (per SUPPORTED_BACKENDS_BY_TIER["27b"]); rocm stays
"skipped/needs-hardware" because the build host has no AMD GPU.
- Make the drafter target-meta note tier-aware (no longer hardcoded to
"Upstream Qwen3-0.6B GGUF used as the DFlash drafter for the 1.7B
target").
- Document the Q8_0 voice-asset gap: VOICE_QUANT_BY_TIER["27b"]="Q8_0"
but elizaos/eliza-1-assets only carries Q4_K_M voice GGUFs under 1_7b/.
The candidate bundle still stages with Q4_K_M (installable but the
orchestrator's release-gate stays red until Q8_0 OmniVoice GGUFs are
derived and pushed to elizaos/eliza-1-assets/27b/).
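A minimal sketch of the ternary-to-lookup change, to show why the old expression mislabeled 27b. The `"27B"` label and the `"0.6B"` fallback are assumptions for illustration; the real `QWEN3_PARAMS_BY_TIER` contents are not shown in this commit.

```python
# Old behaviour: any tier other than 1_7b was rendered as 0.6B.
def old_params_label(tier: str) -> str:
    return "1.7B" if tier == "1_7b" else "0.6B"


# Sketch of the lookup. Only the 1.7B mapping comes from the quoted ternary;
# the 27b label is an assumed value.
QWEN3_PARAMS_BY_TIER = {
    "1_7b": "1.7B",
    "27b": "27B",
}

# Assumed fallback that mirrors the old ternary's else branch.
DEFAULT_PARAMS_LABEL = "0.6B"


def params_label(tier: str) -> str:
    """Render the parameter-count label used in lineage/README blocks."""
    return QWEN3_PARAMS_BY_TIER.get(tier, DEFAULT_PARAMS_LABEL)


print(old_params_label("27b"))  # 0.6B (wrong label for the 27b tier)
print(params_label("27b"))      # 27B
```

A stricter variant could raise on unknown tiers instead of falling back, so that a new tier cannot silently inherit the smallest model's label.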
Verified:
- scripts/publish/test_orchestrator.py: 36/36 pass.
- scripts/test_backends_vast.py + scripts/test_vast_budget.py: 26/26 pass.
- bash scripts/train_vast.sh provision-and-train --registry-key qwen3.5-27b
--dry-run: resolves gpu_target=b200-1x, world_size=1 (was b200-2x default).
- python -m scripts.publish.stage_base_v1_candidate --help: accepts --tier 27b.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent d50d44e commit e3970a9
3 files changed
Lines changed: 68 additions & 9 deletions
[Diff bodies not captured. Per-file changes, in page order: 7 additions; 53 additions & 9 deletions; 8 additions.]