[Hotfix] final fixes for P2P Transfer #22663

Open
JD-ETH wants to merge 5 commits into sgl-project:sglang-miles from JD-ETH:patch-fix-dpsk-5-10-v4

Conversation

@JD-ETH (Contributor) commented Apr 13, 2026

  1. Cherry-pick #22486 — fix: deprecated interfaces after dump
    - Fix Qwen3 rope_parameters → use get_rope_config() helper instead of accessing config.rope_parameters dict directly
    - (model_runner.py conflict resolved — redundant import removed)
  2. fix(weight_checker): skip _weight_fp32 in weight equality check
    - _reset_tensors(): skip _weight_fp32 buffers (don't randomize them)
    - _postprocess_tensors(): add _weight_fp32 to non_persistent_buffer_patterns (don't fail on mismatch)
    - Reason: Glm4MoeGate._weight_fp32 is a FP32 cache of the bf16 gate weight. Runtime invalidation after P2P weight update is not supported yet. Same skip pattern as cos_sin_cache / inv_freq.
  3. chore: remove redundant local import of get_local_ip_auto
  4. Cherry-pick #22552 — [sglang-miles] fix fused qkv load weight from hf, which fixes a special shard-loading implementation in sglang
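The rope fix in item 1 follows a common pattern: read rope settings through a helper with fallbacks rather than indexing the config dict directly. A minimal standalone sketch — the helper name matches the description above, but the config fields and fallback order here are illustrative, not sglang's actual API:

```python
# Illustrative sketch only: resolve rope settings via a helper with
# fallbacks, instead of reading config.rope_parameters directly.
def get_rope_config(config):
    # Prefer the new field, fall back to the deprecated one, else empty.
    for attr in ("rope_parameters", "rope_scaling"):
        value = getattr(config, attr, None)
        if value is not None:
            return value
    return {}

class OldStyleConfig:
    # Hypothetical config that only carries the deprecated field.
    rope_scaling = {"rope_type": "yarn", "factor": 4.0}

print(get_rope_config(OldStyleConfig())["rope_type"])  # yarn
```

Centralizing the lookup means models stop breaking when a config field is renamed or deprecated — only the helper has to know both names.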

Validated models (all ✅ with --check-weight-update-equal + p2p)

  • Qwen3-4B (Qwen3ForCausalLM, 1 node)
  • GLM-Z1-9B-0414 (Glm4ForCausalLM, 1 node)
  • Moonlight-16B-A3B (DeepseekV2ForCausalLM, 2 nodes)
  • GLM-4.7-9B-Flash (Glm4MoeLiteForCausalLM, 2 nodes)
  • GLM-5_4layer (DeepseekV3ForCausalLM, 2 nodes)
  • Qwen3-30B-A3B (Qwen3MoeForCausalLM, 4 nodes)
  • GLM-4.5-Air (Glm4MoeForCausalLM, 8 nodes)

JD-ETH and others added 5 commits April 11, 2026 06:32
The stacked_params_mapping routes q_a_proj and kv_a_proj_with_mqa to
ReplicatedLinear.weight_loader with a shard_id, but ReplicatedLinear
does not support shard_id. Skip the stacked path for this param_name
so weights fall through to the existing cached_a_proj path in
do_load_weights(), which correctly caches both halves and
concatenates them with torch.cat before loading.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
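The commit above can be sketched as a guard in the weight-routing loop: weight names whose target loader is a ReplicatedLinear (which takes no shard_id) skip the stacked mapping and fall through to the cached path. A self-contained mock, with mapping entries and function names chosen for illustration rather than copied from sglang:

```python
# Sketch: skip the stacked_params_mapping path for params whose loader
# (ReplicatedLinear) cannot accept a shard_id, so they fall through to
# the cached_a_proj path that concatenates both halves before loading.
STACKED_PARAMS_MAPPING = [
    # (param_name, weight_name, shard_id) -- illustrative entries
    ("qkv_proj", "q_proj", "q"),
    ("fused_qkv_a_proj_with_mqa", "q_a_proj", 0),
    ("fused_qkv_a_proj_with_mqa", "kv_a_proj_with_mqa", 1),
]
# Weights handled by ReplicatedLinear, which has no shard_id support.
REPLICATED_PARAMS = {"q_a_proj", "kv_a_proj_with_mqa"}

def route_weight(name: str) -> str:
    for param_name, weight_name, shard_id in STACKED_PARAMS_MAPPING:
        if weight_name in name and weight_name not in REPLICATED_PARAMS:
            return f"stacked:{param_name}:{shard_id}"
    # Fall-through: cache both halves, torch.cat them, load once.
    return "cached_a_proj"
```

Without the guard, q_a_proj and kv_a_proj_with_mqa would be handed to a weight_loader with a shard_id argument it does not accept.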
Change lazy import from `sglang.srt.utils` to `sglang.srt.utils.network`
to match the module where `get_local_ip_auto` is actually defined.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Glm4MoeGate._weight_fp32 is a FP32 cache of the bf16 gate weight.
Runtime invalidation of this cache after weight update is not yet
supported. Skip it in both _reset_tensors and _postprocess_tensors,
same pattern as cos_sin_cache and inv_freq.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
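Why the cache mismatches after a P2P update can be shown with a toy model (pure-Python stand-in; the real Glm4MoeGate holds torch tensors):

```python
# Toy illustration: a module caches an fp32 copy of its weight at init.
# A P2P weight update rewrites the weight but never the cache, so the
# cache goes stale and a naive equality check flags it.
class Gate:
    def __init__(self, weight):
        self.weight = weight              # stands in for the bf16 weight
        self._weight_fp32 = list(weight)  # fp32 cache, taken once at init

    def update_weight(self, new_weight):
        self.weight = new_weight          # cache is NOT invalidated

gate = Gate([0.5, 0.25])
gate.update_weight([0.75, 0.125])
print(gate.weight != gate._weight_fp32)  # True -> stale cache
```

Until runtime invalidation is supported, skipping the cache in both the reset and comparison phases is the same treatment already given to cos_sin_cache and inv_freq.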
Already imported at module level (line 183).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

     def _reset_tensors(self):
         for name, param in self._model_state():
-            if "cos_sin_cache" in name or "freqs_cis" in name:
+            if "cos_sin_cache" in name or "freqs_cis" in name or "_weight_fp32" in name:
Review comment (Contributor):
nit: maybe we could maintain a list where these keys could be skipped.
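The nit above can be sketched as a single maintained tuple of patterns shared by both checks, instead of a growing `or` chain (illustrative names only; the real logic lives in sglang's weight checker):

```python
# One maintained list of buffer-name patterns that the weight checker
# should neither randomize (_reset_tensors) nor compare
# (_postprocess_tensors). Extend the tuple instead of the `or` chain.
NON_PERSISTENT_BUFFER_PATTERNS = (
    "cos_sin_cache",
    "freqs_cis",
    "_weight_fp32",  # fp32 cache of the bf16 gate weight (Glm4MoeGate)
)

def is_skipped(name: str) -> bool:
    return any(pattern in name for pattern in NON_PERSISTENT_BUFFER_PATTERNS)

print(is_skipped("model.layers.0.mlp.gate._weight_fp32"))  # True
print(is_skipped("model.layers.0.mlp.gate.weight"))        # False
```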
