ci(bfcl): add GLM-5.2-FP8 nightly leg (TP=8 sequential, whole node) by key4ng · Pull Request #1834 · lightseekorg/smg

key4ng · 2026-06-23T22:27:42Z

Description

Problem

The nightly BFCL A/B (SMG parsing frontend vs. pure vLLM) had no GLM-5 family leg. GLM-5.2 ships an <arg_key>/<arg_value> tool-call format and <think> reasoning, so it's a useful parser-correctness check — but its FP8 checkpoint is far larger than the existing Blackwell legs and cannot be slotted in as-is.

Solution

Add a glm-5.2 leg to the nightly BFCL matrix.

Why it can't reuse the concurrent half-node pattern: GLM-5.2-FP8 is ~744 GB of weights (753B-param MoE). A B200 has ~180–192 GB, so a TP=4 half-node (4× B200, 720–768 GB total) either cannot hold the weights at all or holds them with essentially no room left for KV cache and activations. vLLM's official recipe requires 8× B200 / TP=8. The existing Blackwell legs (DeepSeek-V4-Flash, MiniMax-M2.7, Kimi-K2.6-int4) all fit TP=4 and run two arms concurrently on GPUs 0-3 + 4-7; GLM-5.2 needs the whole node per arm.
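The sizing argument above can be checked with a few lines of arithmetic. This is a rough sketch that uses only the approximate figures quoted in the description; the function name `headroom_gb` is made up for illustration, and the result ignores KV cache, activations, and runtime overhead, all of which need memory on top of the weights.

```python
# Back-of-the-envelope check using only the figures quoted in the description.
WEIGHTS_GB = 744            # approximate FP8 checkpoint size
GPU_GB_RANGE = (180, 192)   # approximate memory per B200 (low, high)


def headroom_gb(num_gpus: int, per_gpu_gb: float, weights_gb: float = WEIGHTS_GB) -> float:
    """Aggregate memory left after loading weights; negative means they do not fit."""
    return num_gpus * per_gpu_gb - weights_gb


for tp in (4, 8):
    low, high = (headroom_gb(tp, gb) for gb in GPU_GB_RANGE)
    print(f"TP={tp}: headroom {low:+.0f} to {high:+.0f} GB")
# TP=4: headroom -24 to +24 GB
# TP=8: headroom +696 to +792 GB
```

Even in the best case a half-node leaves about 24 GB across four GPUs, which is why the leg takes the whole node.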

So this is the first leg to use the workflow's existing sequential arm_mode: arm A (pure vLLM) on all 8 GPUs → score → tear down → arm B (SMG→vLLM gRPC) on all 8 → score → diff. The run_ab.py sequential path (--score-arm / --diff-baseline / --diff-candidate) was already implemented; no script changes needed.
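The ordering of the sequential mode can be sketched as follows. This is not the code in run_ab.py or the workflow; `run_sequential` and its callbacks are hypothetical stand-ins that only illustrate the launch, score, tear down, then diff sequence described above. Tearing down arm B before the diff is an assumption of this sketch.

```python
from typing import Callable, List, Sequence


def run_sequential(
    launch: Callable[[str], None],
    score: Callable[[str], None],
    teardown: Callable[[str], None],
    diff: Callable[[str, str], None],
    arms: Sequence[str] = ("A", "B"),
) -> None:
    """Run each arm alone on the whole node, then compare the two scores."""
    for arm in arms:
        launch(arm)          # the arm owns all 8 GPUs while it is up
        try:
            score(arm)
        finally:
            teardown(arm)    # free the GPUs before the next arm starts
    diff(arms[0], arms[1])   # baseline vs. candidate


log: List[str] = []
run_sequential(
    launch=lambda arm: log.append(f"launch {arm}"),
    score=lambda arm: log.append(f"score {arm}"),
    teardown=lambda arm: log.append(f"teardown {arm}"),
    diff=lambda base, cand: log.append(f"diff {base} vs {cand}"),
)
print(" -> ".join(log))
```

The `try`/`finally` is a design choice in the sketch: it releases the node even if scoring fails, so a broken arm A would not leave arm B without GPUs.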

Parsers follow vLLM's GLM-5.2 recipe — glm47 / glm45 for the vLLM arm, glm47_moe / glm45 for SMG (passed explicitly because the org-prefixed served name zai-org/GLM-5.2-FP8 won't match SMG's glm-5* auto-detect). --kv-cache-dtype fp8_e4m3 per the B200 recipe; MTP speculative decoding omitted (perf-only; the A/B isolates parsing).
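Why the org prefix defeats auto-detection can be illustrated with a glob match. SMG's real matching logic is not shown in this PR; the sketch below assumes a glob-style pattern applied case-insensitively to the full served name, and `auto_detects` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

PATTERN = "glm-5*"  # auto-detect pattern quoted in the description


def auto_detects(served_name: str, pattern: str = PATTERN) -> bool:
    """Assumed behavior: glob match against the whole lowercased served name."""
    return fnmatchcase(served_name.lower(), pattern)


print(auto_detects("zai-org/GLM-5.2-FP8"))  # False: the name starts with "zai-org/"
print(auto_detects("GLM-5.2-FP8"))          # True: the bare model name matches
```

Under that assumption, passing the SMG parsers explicitly is the simplest way to keep the served name unchanged across both arms.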

Changes

Add glm-5.2 leg (TP=8, arm_mode: sequential, whole 8-GPU node, glm47/glm47_moe + glm45, --kv-cache-dtype fp8_e4m3, startup_timeout: 3000) and list it in the description of the `only` dispatch input.
Bump job timeout-minutes 240 → 360 (sequential runs two whole-node arms serially, ~2× the concurrent legs). Ceiling only; fast legs finish early.
Move build-wheel to k8s-runner-cpu (CPU-only maturin compile — no GPU needed).

Test Plan

python -c "import yaml; yaml.safe_load(...)" — workflow parses.
Executed the embedded matrix-builder Python: produces all 6 legs incl. glm-5.2 with TP=8 / sequential / correct parsers / whole-node gpu_a; only=glm-5.2 selects exactly that leg.
actionlint — clean apart from the pre-existing custom self-hosted runner-label warnings.
Verified run_ab.py already implements the sequential --score-arm / --diff-baseline / --diff-candidate flags the leg invokes.
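The `only` filter check in the test plan can be pictured with a small stand-in for the matrix builder. This is not the Python embedded in the workflow: `build_matrix`, the two-leg list, the leg name `kimi-k2.6`, the `concurrent` mode value, and the comma-separated GPU strings are all assumptions made for illustration. Only the glm-5.2 settings (TP=8, sequential, whole node on gpu_a, empty gpu_b) come from this PR.

```python
from typing import Dict, List

# Illustrative subset of the matrix; the real workflow defines more legs.
LEGS: List[Dict[str, object]] = [
    {"name": "kimi-k2.6", "tp": 4, "arm_mode": "concurrent",
     "gpu_a": "0,1,2,3", "gpu_b": "4,5,6,7"},
    {"name": "glm-5.2", "tp": 8, "arm_mode": "sequential",
     "gpu_a": "0,1,2,3,4,5,6,7", "gpu_b": ""},
]


def build_matrix(legs: List[Dict[str, object]], only: str = "") -> List[Dict[str, object]]:
    """Return all legs, or just those named in the comma-separated `only` input."""
    wanted = {name.strip() for name in only.split(",") if name.strip()}
    return [leg for leg in legs if not wanted or leg["name"] in wanted]


selected = build_matrix(LEGS, only="glm-5.2")
print([leg["name"] for leg in selected])  # ['glm-5.2']
print(len(build_matrix(LEGS)))            # 2 (empty `only` keeps every leg)
```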

Full validation runs on the nightly schedule / workflow_dispatch (requires the Blackwell runner + staged weights).

Checklist

cargo +nightly fmt passes (no Rust changes)
cargo clippy --all-targets --all-features -- -D warnings passes (no Rust changes)
(Optional) Documentation updated
(Optional) Please join us on Slack #sig-smg to discuss, review, and merge PRs

Summary by CodeRabbit

Chores
- Updated the nightly workflow dispatch input description to recognize an additional supported matrix leg name.
- Switched the wheel build job to a CPU-based runner to improve build resource alignment.
- Extended the BFCL test matrix with a new glm-5.2 sequential leg and related runtime configuration.
- Increased the bfcl-ab job timeout from 240 to 360 minutes to support longer-running scenarios.

gemini-code-assist · 2026-06-23T22:27:47Z

Note

Gemini is unable to generate a review for this pull request due to the file types involved not being currently supported.

coderabbitai · 2026-06-23T22:28:00Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c02c419f-f78a-41c1-96da-306eb684c663

📥 Commits

Reviewing files that changed from the base of the PR and between 78c7a12 and 18c23d0.

📒 Files selected for processing (1)

.github/workflows/nightly-bfcl.yml

📝 Walkthrough

Walkthrough

The nightly BFCL workflow gains a glm-5.2 matrix leg running in sequential arm mode with TP=8 across all GPUs (gpu_a only), FP8 KV-cache settings, and an extended startup timeout. The build-wheel job moves to a CPU runner, the bfcl-ab job timeout increases from 240 to 360 minutes, and the workflow_dispatch description is updated to include glm-5.2.

Changes

Nightly BFCL Workflow: glm-5.2 leg and runner adjustments

| Layer / File(s) | Summary |
|---|---|
| build-wheel runner and dispatch description update (`.github/workflows/nightly-bfcl.yml`) | `build-wheel` job `runs-on` changed from GPU runner to `k8s-runner-cpu`; `workflow_dispatch.only` description updated to include `glm-5.2` in the documented leg names. |
| glm-5.2 matrix leg definition and bfcl-ab timeout (`.github/workflows/nightly-bfcl.yml`) | New `glm-5.2` matrix entry added with `arm_mode: sequential`, TP=8 assigned to `gpu_a` (full node), empty `gpu_b`, vLLM/SMG tool parser settings, FP8 KV-cache `vllm_extra`, and increased `startup_timeout`. `bfcl-ab` job `timeout-minutes` raised from 240 to 360. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

lightseekorg/smg#1724: Introduced the nightly BFCL A/B workflow framework that this PR directly extends with the glm-5.2 leg.
lightseekorg/smg#1764: Previously expanded BFCL matrix legs and increased bfcl-ab timeout-minutes in the same workflow file.
lightseekorg/smg#1791: Introduced per-leg matrix-wiring for tool/parser settings that this PR leverages for the glm-5.2 leg configuration.

Suggested reviewers

gongwei-130
CatherineSue
XinyueZhang369
slin1237

Poem

🐇 A new leg hops in — glm-5.2 joins the race,
Eight GPUs marching in sequential grace.
The builder swaps to CPU, light and lean,
Six hours now granted where four had been.
Another matrix entry, another night's test —
The nightly rabbit checks every model's best! 🌙

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title directly and specifically describes the main change: adding a GLM-5.2-FP8 nightly leg with TP=8 sequential execution on whole node. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


claude

Clean CI change — new GLM-5.2-FP8 sequential leg follows the existing matrix pattern correctly. Verified the sequential code path handles empty gpu_b safely, parser assignments match the vLLM recipe, and the build-wheel → CPU runner change is appropriate.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/nightly-bfcl.yml:
- Around line 177-188: In the glm-5.2 configuration block, update the vllm_extra
field to add the --trust-remote-code flag alongside the existing
--kv-cache-dtype fp8_e4m3 argument, as this flag is required by vLLM v0.23.0+ to
properly load GLM-5.2's specialized model architecture from the Hugging Face
hub. Additionally, replace the smg_tool value glm47_moe with the generic glm
parser identifier, which is the current standard in SMG that uniformly handles
GLM-4.7 and GLM-5.x versions.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f4fe68a2-164e-4907-9769-8e40e0e40962

📥 Commits

Reviewing files that changed from the base of the PR and between 10a085f and e33aa57.

📒 Files selected for processing (1)

.github/workflows/nightly-bfcl.yml

key4ng · 2026-06-24T00:22:42Z

https://github.com/lightseekorg/smg/actions/runs/28064457054/job/83086786670 tested

GLM-5.2-FP8 (~744GB FP8 weights) does not fit a TP=4 half-node, so it runs as the first sequential whole-node leg: arm A on all 8 GPUs, torn down, then arm B. Uses vLLM's GLM-5.2 recipe (glm47/glm45 parsers, --kv-cache-dtype fp8_e4m3); SMG arm uses glm47_moe + glm45 passed explicitly (the org-prefixed served name won't auto-detect). Also bump the job timeout to 360m (sequential runs two whole-node arms serially) and move build-wheel to k8s-runner-cpu (CPU-only compile).

Signed-off-by: key4ng <rukeyang@gmail.com>


key4ng requested review from CatherineSue, XinyueZhang369 and slin1237 as code owners June 23, 2026 22:27

github-actions Bot added the ci CI/CD configuration changes label Jun 23, 2026

claude Bot approved these changes Jun 23, 2026


coderabbitai Bot requested changes Jun 23, 2026


coderabbitai Bot approved these changes Jun 24, 2026


key4ng mentioned this pull request Jun 24, 2026

ci(benchmarks): bump genai-bench timeout 480s -> 600s #1838

Merged

4 tasks

key4ng added 3 commits June 23, 2026 19:29

ci(bfcl): make GLM-5.2 leg comment concise

02c170a

Signed-off-by: key4ng <rukeyang@gmail.com>

ci(bfcl): add --trust-remote-code to GLM-5.2 vllm_extra

18c23d0

GLM-5.2 needs --trust-remote-code to load its custom architecture, matching the other large MoE legs (deepseek-v4, minimax-m2.7, kimi-k2.6).

Signed-off-by: key4ng <rukeyang@gmail.com>

key4ng force-pushed the ci/bfcl-glm5.2-leg branch from 78c7a12 to 18c23d0 Compare June 24, 2026 02:29

key4ng merged commit 25fb974 into main Jun 24, 2026
30 checks passed

key4ng deleted the ci/bfcl-glm5.2-leg branch June 24, 2026 03:13

coderabbitai Bot mentioned this pull request Jun 24, 2026

ci(bfcl): set max_model_len=auto and capture both arms' transcripts #1840

Merged

4 tasks
