[Hotfix] final fixes for P2P Transfer #22663
Open
JD-ETH wants to merge 5 commits into sgl-project:sglang-miles from
Conversation
The stacked_params_mapping routes q_a_proj and kv_a_proj_with_mqa to ReplicatedLinear.weight_loader with a shard_id, but ReplicatedLinear does not support shard_id. Skip the stacked path for this param_name so weights fall through to the existing cached_a_proj path in do_load_weights(), which correctly caches both halves and torch.cats them before loading. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
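The fallback path the commit relies on can be sketched as follows. This is a minimal illustration of the cached-concat idea described above, not the actual sglang code: the helper name `maybe_load_fused` and the `params_dict` handling are assumptions; only `q_a_proj`, `kv_a_proj_with_mqa`, and the `torch.cat` step come from the commit message.

```python
import torch

# Cache for weight halves that arrive separately during loading.
cached_a_proj = {}

def maybe_load_fused(name, loaded_weight, params_dict):
    """Cache one half of the fused projection; once both halves are
    present, concatenate them and store the fused weight."""
    cached_a_proj[name] = loaded_weight
    # Derive the name of the matching other half.
    if "q_a_proj" in name:
        other = name.replace("q_a_proj", "kv_a_proj_with_mqa")
    else:
        other = name.replace("kv_a_proj_with_mqa", "q_a_proj")
    if other not in cached_a_proj:
        return False  # wait until the second half arrives
    q_name = name if "q_a_proj" in name else other
    kv_name = other if q_name == name else name
    # Concatenate q and kv halves along the output dimension before loading.
    fused = torch.cat([cached_a_proj[q_name], cached_a_proj[kv_name]], dim=0)
    fused_name = q_name.replace("q_a_proj", "fused_qkv_a_proj_with_mqa")
    params_dict[fused_name] = fused
    return True
```

Routing these two weight names through the stacked path instead would call `ReplicatedLinear.weight_loader` with a `shard_id` argument it does not accept, which is why the fix skips the stacked mapping for them.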
Change lazy import from `sglang.srt.utils` to `sglang.srt.utils.network` to match the module where `get_local_ip_auto` is actually defined. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Glm4MoeGate._weight_fp32 is a FP32 cache of the bf16 gate weight. Runtime invalidation of this cache after weight update is not yet supported. Skip it in both _reset_tensors and _postprocess_tensors, same pattern as cos_sin_cache and inv_freq. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Already imported at module level (line 183). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor
Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
This was referenced Apr 13, 2026
JensenFire reviewed Apr 13, 2026
```diff
 def _reset_tensors(self):
     for name, param in self._model_state():
-        if "cos_sin_cache" in name or "freqs_cis" in name:
+        if "cos_sin_cache" in name or "freqs_cis" in name or "_weight_fp32" in name:
```
Contributor
nit: maybe we could maintain a list where these keys could be skipped.
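The reviewer's suggestion could look like the sketch below: a single module-level tuple of skip substrings replaces the growing `or` chain, so `_reset_tensors` and `_postprocess_tensors` share one source of truth. The names `_NON_RELOADABLE_KEYS` and `should_skip` are illustrative, not the actual code.

```python
# Derived/cached buffers that must not be randomized in _reset_tensors
# or compared in _postprocess_tensors. Adding a new key is a one-line change.
_NON_RELOADABLE_KEYS = ("cos_sin_cache", "freqs_cis", "inv_freq", "_weight_fp32")

def should_skip(name: str) -> bool:
    """True if the named tensor is a cached buffer rather than a real weight."""
    return any(key in name for key in _NON_RELOADABLE_KEYS)
```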
- Fix Qwen3 rope_parameters → use get_rope_config() helper instead of accessing config.rope_parameters dict directly
- (model_runner.py conflict resolved — redundant import removed)
- _reset_tensors(): skip _weight_fp32 buffers (don't randomize them)
- _postprocess_tensors(): add _weight_fp32 to non_persistent_buffer_patterns (don't fail on mismatch)
- Reason: Glm4MoeGate._weight_fp32 is a FP32 cache of the bf16 gate weight. Runtime invalidation after P2P weight update is not supported yet. Same skip pattern as cos_sin_cache / inv_freq.
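The staleness problem behind this skip can be demonstrated in a few lines. This is a toy stand-in for the gate described above, under the assumption (from the PR description) that `_weight_fp32` is computed once from the bf16 weight and is not refreshed when the weight is updated in place.

```python
import torch

class Gate:
    """Toy model of a gate holding a bf16 weight plus an FP32 cache of it."""
    def __init__(self, weight_bf16):
        self.weight = weight_bf16
        # FP32 cache computed once at init; no runtime invalidation.
        self._weight_fp32 = weight_bf16.float()

gate = Gate(torch.full((2, 2), 1.0, dtype=torch.bfloat16))
# A P2P weight update rewrites the bf16 weight in place...
gate.weight.copy_(torch.full((2, 2), 2.0, dtype=torch.bfloat16))
# ...but the FP32 cache still holds the old values, so any
# weight-equality check must skip `_weight_fp32` for now.
stale = not torch.equal(gate._weight_fp32, gate.weight.float())
```

Here `stale` ends up `True`, which is exactly why `--check-weight-update-equal` would report a spurious mismatch if `_weight_fp32` were not excluded.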
Validated models (all ✅ with --check-weight-update-equal + p2p)