fix: save_hf_weights drops boundary shards for MTP-less GLM-4.x glm4_moe_lite export#4189
Open
dinhxuanvu wants to merge 4 commits into
Open
fix: save_hf_weights drops boundary shards for MTP-less GLM-4.x glm4_moe_lite export#4189dinhxuanvu wants to merge 4 commits into
dinhxuanvu wants to merge 4 commits into
Conversation
yaoyu-33
approved these changes
Jun 8, 2026
yaoyu-33
previously approved these changes
Jun 8, 2026
Contributor
|
/ok to test 96b7b68 |
Author
|
Pushed a commit to fix ruff lint issue. |
Contributor
|
/ok to test dafd239 |
dafd239 to
c5acab5
Compare
Author
|
Had to rebase PR due to conflicts from recently merged PRs. Only impacted the test code. |
gautham-kollu
previously approved these changes
Jun 11, 2026
Contributor
|
/ok to test c5acab5 |
Contributor
|
Enabled auto-merge. Once tests pass, should get merged automatically. |
Author
|
The |
Contributor
|
/ok to test fff2d82 |
auto-merge was automatically disabled
June 16, 2026 01:05
Head branch was pushed to by a user without write access
When a Megatron model is built without an MTP head (provider
mtp_num_layers is None or 0, e.g. SkyRL's GLM-4.x glm4_moe_lite
override), the export generator never yields the MTP/nextn tensors.
For GLM the nextn layer is stored in the source safetensors index as a
regular decoder layer at index num_hidden_layers (model.layers.{N}.*),
not under a dedicated mtp.* prefix.
Because those keys remain in the expected source sharding map, the
strict non-distributed save_generator completeness gate can never
complete any shard that co-locates a nextn key with real (non-MTP)
tensors. Those shards are dropped wholesale, taking pipeline-boundary
params (embed_tokens, layer 0, last layer, final norm, lm_head) with
them. For GLM-4.7-Flash this deterministically produced a 45/48-shard
checkpoint missing shards 00001/00047/00048.
Generalize the existing mtp.* ignore mechanism:
- _model_omits_mtp(model_config): detect an MTP-less built model from the
provider's mtp_num_layers (falsy => omitted), distinct from
_config_disables_mtp which only inspects HF config fields.
- _mtp_source_key_prefixes(source, *configs): resolve the source-key
prefixes to strip, covering both DeepSeek-style mtp.* and GLM's
model.layers.{num_hidden_layers}.*, each gated on source.has_glob so
only prefixes actually present are stripped.
Stripping model.layers.{N}. lets the boundary shards complete with their
real keys (47-shard self-consistent index) while the pure-nextn shard is
correctly omitted.
Signed-off-by: Vu Dinh <vudinh@outlook.com>
- fold ignored_source_key_prefixes selection inline - document hf_config-before-model_config ordering and 0-indexed layer assumption; note _mtp_source_key_prefixes precondition - add save_hf_weights integration tests asserting the nextn prefix is stripped only when the built model omits the MTP head - use SimpleNamespace instead of Mock(spec=[]) for unset-attr check Signed-off-by: Vu Dinh <vudinh@outlook.com>
Collapse a multi-line assert that ruff-format (v0.9.9) wants on a single line; no logic change. Signed-off-by: Vu Dinh <vudinh@outlook.com>
…tests The _run_save_hf_weights helper assigned hf_pretrained.state directly, but state is a read-only property on PreTrainedBase (no setter), raising AttributeError. Patch it via PropertyMock on the mock subclass instead. Signed-off-by: Vu Dinh <vudinh@outlook.com>
0a34d2e to
2d19db5
Compare
Author
|
@gautham-kollu The CI revealed a failure on the test code. I pushed a commit to correct it. PTAL. Thanks. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Exporting a GLM-4.x
glm4_moe_litemodel to HuggingFace safetensors viaAutoBridge.save_hf_weightscan silently write an incomplete checkpoint when the model was built without an MTP/nextn head.GLM keeps its nextn layer as a regular decoder layer (
model.layers.{num_hidden_layers}.*), not under the usualmtp.*prefix. An MTP-less model never produces those tensors, but they're still expected in the shard map — so the per-shard completeness check drops any shard that shares space with them, including ones holdingembed_tokens,lm_head, andmodel.norm. The index stays self-consistent and no error is raised, so the loss is silent. For GLM-4.7-Flash this drops 3 of 48 shards.See #4188 for the full breakdown.
Changes
model.layers.{num_hidden_layers}.*) in addition tomtp.*when a model is exported without an MTP head.mtp_num_layers(the old config-based check misread GLM'snum_nextn_predict_layers=1as MTP-enabled).test_auto_bridge.py.Test plan
test_auto_bridge.pycases are mock-based and need no GPU.Additional Information