ci(bfcl): add GLM-5.2-FP8 nightly leg (TP=8 sequential, whole node)#1834

Merged: key4ng merged 3 commits into main from ci/bfcl-glm5.2-leg on Jun 24, 2026

@key4ng key4ng commented Jun 23, 2026

Collaborator

Description

Problem

The nightly BFCL A/B (SMG parsing frontend vs. pure vLLM) had no GLM-5 family leg. GLM-5.2 ships an <arg_key>/<arg_value> tool-call format and <think> reasoning, so it's a useful parser-correctness check — but its FP8 checkpoint is far larger than the existing Blackwell legs and cannot be slotted in as-is.

Solution

Add a glm-5.2 leg to the nightly BFCL matrix.

Why it can't reuse the concurrent half-node pattern: GLM-5.2-FP8 is ~744 GB of weights (753B-param MoE). A B200 has ~180–192 GB, so 4× B200 (720–768 GB) cannot even hold the weights with working room — it does not fit a TP=4 half-node. vLLM's official recipe requires 8× B200 / TP=8. The existing Blackwell legs (DeepSeek-V4-Flash, MiniMax-M2.7, Kimi-K2.6-int4) all fit TP=4 and run two arms concurrently on GPUs 0-3 + 4-7; GLM-5.2 needs the whole node per arm.
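The fit argument above can be checked with back-of-envelope arithmetic. This is a minimal sketch that uses only the figures quoted in this description; the helper name `headroom_gb` is hypothetical, and the calculation ignores KV cache, activations, and framework overhead, all of which only make the TP=4 case worse.

```python
# Figures quoted in the description; the helper itself is illustrative.
WEIGHTS_GB = 744            # GLM-5.2-FP8 checkpoint size
B200_GB_RANGE = (180, 192)  # per-GPU memory range quoted above


def headroom_gb(num_gpus: int, per_gpu_gb: int, weights_gb: int = WEIGHTS_GB) -> int:
    """Memory left after loading the weights, summed across all GPUs."""
    return num_gpus * per_gpu_gb - weights_gb


for tp in (4, 8):
    low, high = (headroom_gb(tp, gb) for gb in B200_GB_RANGE)
    print(f"TP={tp}: headroom {low} to {high} GB")
# TP=4: headroom -24 to 24 GB
# TP=8: headroom 696 to 792 GB
```

At TP=4 the best case leaves about 24 GB across four GPUs for everything that is not weights, which is why the leg takes the whole node.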

So this is the first leg to use the workflow's existing sequential arm_mode: arm A (pure vLLM) on all 8 GPUs → score → tear down → arm B (SMG→vLLM gRPC) on all 8 → score → diff. The run_ab.py sequential path (--score-arm / --diff-baseline / --diff-candidate) was already implemented; no script changes needed.
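The sequential flow described above can be sketched as follows. This is not the workflow's or run_ab.py's actual code: the `Arm` shape, `run_sequential`, and `fake_arm` are hypothetical names, and the scores are placeholder values. It only illustrates the ordering, with each arm torn down before the next one starts.

```python
from typing import Callable, Dict, List

Arm = Dict[str, Callable]  # hypothetical shape: {"start": ..., "score": ..., "stop": ...}


def run_sequential(baseline: Arm, candidate: Arm) -> float:
    """Run arm A, then arm B, each alone on the node; return the score delta."""
    scores: List[float] = []
    for arm in (baseline, candidate):
        arm["start"]()                      # bring the server up on all 8 GPUs
        try:
            scores.append(arm["score"]())   # score while this arm owns the node
        finally:
            arm["stop"]()                   # tear down even if scoring fails
    return scores[1] - scores[0]            # candidate minus baseline


events: List[str] = []


def fake_arm(name: str, score: float) -> Arm:
    """Stand-in arm that records lifecycle events instead of launching servers."""
    def do_score() -> float:
        events.append(f"score {name}")
        return score
    return {
        "start": lambda: events.append(f"start {name}"),
        "score": do_score,
        "stop": lambda: events.append(f"stop {name}"),
    }


delta = run_sequential(fake_arm("vllm", 0.5), fake_arm("smg", 0.75))
print(events)  # start/score/stop for vllm, then start/score/stop for smg
print(delta)   # 0.25
```

The `try`/`finally` around scoring reflects the one property the sequential mode depends on: the second arm cannot start unless the first has released the GPUs.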

Parsers follow vLLM's GLM-5.2 recipe — glm47 / glm45 for the vLLM arm, glm47_moe / glm45 for SMG (passed explicitly because the org-prefixed served name zai-org/GLM-5.2-FP8 won't match SMG's glm-5* auto-detect). --kv-cache-dtype fp8_e4m3 per the B200 recipe; MTP speculative decoding omitted (perf-only; the A/B isolates parsing).
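To illustrate why the parsers are passed explicitly, here is a small sketch of the auto-detect mismatch. SMG's real matching logic is not shown in this PR; the sketch assumes a case-insensitive glob anchored at the start of the served name, and `auto_detects` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

AUTO_DETECT_PATTERN = "glm-5*"  # pattern quoted in the description


def auto_detects(served_name: str, pattern: str = AUTO_DETECT_PATTERN) -> bool:
    """Assumed semantics: case-insensitive glob anchored at the start of the name."""
    return fnmatchcase(served_name.lower(), pattern)


print(auto_detects("GLM-5.2-FP8"))          # True
print(auto_detects("zai-org/GLM-5.2-FP8"))  # False: the org prefix comes first
```

Under that assumption the bare model name would match, but the org-prefixed served name would not, so the leg names its parsers directly instead of relying on detection.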

Changes

  • Add glm-5.2 leg (TP=8, arm_mode: sequential, whole 8-GPU node, glm47/glm47_moe + glm45, --kv-cache-dtype fp8_e4m3, startup_timeout: 3000) and list it in the description of the workflow_dispatch input named only.
  • Bump job timeout-minutes 240 → 360 (sequential runs two whole-node arms serially, ~2× the concurrent legs). Ceiling only; fast legs finish early.
  • Move build-wheel to k8s-runner-cpu (CPU-only maturin compile — no GPU needed).

Test Plan

  • python -c "import yaml; yaml.safe_load(...)" — workflow parses.
  • Executed the embedded matrix-builder Python: produces all 6 legs incl. glm-5.2 with TP=8 / sequential / correct parsers / whole-node gpu_a; only=glm-5.2 selects exactly that leg.
  • actionlint — clean apart from the pre-existing custom self-hosted runner-label warnings.
  • Verified run_ab.py already implements the sequential --score-arm / --diff-baseline / --diff-candidate flags the leg invokes.
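For readers unfamiliar with the matrix-builder check, the following sketch shows the kind of filtering and sanity check involved. It is not the workflow's embedded script: `build_matrix`, the `example-tp4` leg, the `name` and `tp` keys, the `concurrent` mode value, the GPU-list format, and the comma-separated `only` syntax are all assumptions made for illustration. Only `arm_mode`, `gpu_a`, `gpu_b`, and the glm-5.2 leg name come from this PR.

```python
from typing import Dict, List

# Illustrative legs; see the caveats above about assumed keys and values.
LEGS: List[Dict[str, object]] = [
    {"name": "example-tp4", "tp": 4, "arm_mode": "concurrent",
     "gpu_a": "0,1,2,3", "gpu_b": "4,5,6,7"},
    {"name": "glm-5.2", "tp": 8, "arm_mode": "sequential",
     "gpu_a": "0,1,2,3,4,5,6,7", "gpu_b": ""},
]


def build_matrix(legs: List[Dict[str, object]], only: str = "") -> List[Dict[str, object]]:
    """Return the legs to run; an empty `only` selects every leg."""
    wanted = {name.strip() for name in only.split(",") if name.strip()}
    selected = [leg for leg in legs if not wanted or leg["name"] in wanted]
    for leg in selected:
        # A sequential leg gives the whole node to one arm at a time.
        if leg["arm_mode"] == "sequential" and leg["gpu_b"]:
            raise ValueError(f"{leg['name']}: sequential legs must leave gpu_b empty")
    return selected


print([leg["name"] for leg in build_matrix(LEGS)])
print([leg["name"] for leg in build_matrix(LEGS, only="glm-5.2")])
```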

Full validation runs on the nightly schedule / workflow_dispatch (requires the Blackwell runner + staged weights).

Checklist
  • cargo +nightly fmt passes (no Rust changes)
  • cargo clippy --all-targets --all-features -- -D warnings passes (no Rust changes)
  • (Optional) Documentation updated
  • (Optional) Please join us on Slack #sig-smg to discuss, review, and merge PRs

Summary by CodeRabbit

  • Chores
    • Updated the nightly workflow dispatch input description to recognize an additional supported matrix leg name.
    • Switched the wheel build job to a CPU-based runner to improve build resource alignment.
    • Extended the BFCL test matrix with a new glm-5.2 sequential leg and related runtime configuration.
    • Increased the bfcl-ab job timeout from 240 to 360 minutes to support longer-running scenarios.

@gemini-code-assist

Contributor

Note

Gemini is unable to generate a review for this pull request due to the file types involved not being currently supported.

@github-actions github-actions Bot added the ci CI/CD configuration changes label Jun 23, 2026
@coderabbitai

coderabbitai Bot commented Jun 23, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c02c419f-f78a-41c1-96da-306eb684c663

📥 Commits

Reviewing files that changed from the base of the PR and between 78c7a12 and 18c23d0.

📒 Files selected for processing (1)
  • .github/workflows/nightly-bfcl.yml

📝 Walkthrough

Walkthrough

The nightly BFCL workflow gains a glm-5.2 matrix leg running in sequential arm mode with TP=8 across all GPUs (gpu_a only), FP8 KV-cache settings, and an extended startup timeout. The build-wheel job moves to a CPU runner, the bfcl-ab job timeout increases from 240 to 360 minutes, and the workflow_dispatch description is updated to include glm-5.2.

Changes

Nightly BFCL Workflow: glm-5.2 leg and runner adjustments

| Layer / File(s) | Summary |
|---|---|
| build-wheel runner and dispatch description update (.github/workflows/nightly-bfcl.yml) | build-wheel job runs-on changed from GPU runner to k8s-runner-cpu; workflow_dispatch.only description updated to include glm-5.2 in the documented leg names. |
| glm-5.2 matrix leg definition and bfcl-ab timeout (.github/workflows/nightly-bfcl.yml) | New glm-5.2 matrix entry added with arm_mode: sequential, TP=8 assigned to gpu_a (full node), empty gpu_b, vLLM/SMG tool parser settings, FP8 KV-cache vllm_extra, and increased startup_timeout. bfcl-ab job timeout-minutes raised from 240 to 360. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • lightseekorg/smg#1724: Introduced the nightly BFCL A/B workflow framework that this PR directly extends with the glm-5.2 leg.
  • lightseekorg/smg#1764: Previously expanded BFCL matrix legs and increased bfcl-ab timeout-minutes in the same workflow file.
  • lightseekorg/smg#1791: Introduced per-leg matrix-wiring for tool/parser settings that this PR leverages for the glm-5.2 leg configuration.

Suggested reviewers

  • gongwei-130
  • CatherineSue
  • XinyueZhang369
  • slin1237

Poem

🐇 A new leg hops in — glm-5.2 joins the race,
Eight GPUs marching in sequential grace.
The builder swaps to CPU, light and lean,
Six hours now granted where four had been.
Another matrix entry, another night's test —
The nightly rabbit checks every model's best! 🌙

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title directly and specifically describes the main change: adding a GLM-5.2-FP8 nightly leg with TP=8 sequential execution on whole node. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@claude claude Bot left a comment


Clean CI change — new GLM-5.2-FP8 sequential leg follows the existing matrix pattern correctly. Verified the sequential code path handles empty gpu_b safely, parser assignments match the vLLM recipe, and the build-wheel → CPU runner change is appropriate.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/nightly-bfcl.yml:
- Around line 177-188: In the glm-5.2 configuration block, update the vllm_extra
field to add the --trust-remote-code flag alongside the existing
--kv-cache-dtype fp8_e4m3 argument, as this flag is required by vLLM v0.23.0+ to
properly load GLM-5.2's specialized model architecture from the Hugging Face
hub. Additionally, replace the smg_tool value glm47_moe with the generic glm
parser identifier, which is the current standard in SMG that uniformly handles
GLM-4.7 and GLM-5.x versions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f4fe68a2-164e-4907-9769-8e40e0e40962

📥 Commits

Reviewing files that changed from the base of the PR and between 10a085f and e33aa57.

📒 Files selected for processing (1)
  • .github/workflows/nightly-bfcl.yml

Comment thread .github/workflows/nightly-bfcl.yml
@key4ng

key4ng commented Jun 24, 2026

Collaborator Author

key4ng added 3 commits June 23, 2026 19:29
GLM-5.2-FP8 (~744GB FP8 weights) does not fit a TP=4 half-node, so it
runs as the first sequential whole-node leg: arm A on all 8 GPUs, torn
down, then arm B. Uses vLLM's GLM-5.2 recipe (glm47/glm45 parsers,
--kv-cache-dtype fp8_e4m3); SMG arm uses glm47_moe + glm45 passed
explicitly (the org-prefixed served name won't auto-detect).

Also bump the job timeout to 360m (sequential runs two whole-node arms
serially) and move build-wheel to k8s-runner-cpu (CPU-only compile).

Signed-off-by: key4ng <rukeyang@gmail.com>
Signed-off-by: key4ng <rukeyang@gmail.com>
GLM-5.2 needs --trust-remote-code to load its custom architecture, matching
the other large MoE legs (deepseek-v4, minimax-m2.7, kimi-k2.6).

Signed-off-by: key4ng <rukeyang@gmail.com>
@key4ng key4ng force-pushed the ci/bfcl-glm5.2-leg branch from 78c7a12 to 18c23d0 Compare June 24, 2026 02:29
@key4ng key4ng merged commit 25fb974 into main Jun 24, 2026
30 checks passed
@key4ng key4ng deleted the ci/bfcl-glm5.2-leg branch June 24, 2026 03:13