Conversation

WeiweiZhang1 (Contributor) commented Jan 22, 2026

Description

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please describe):

Related Issues

Fixes #
Relates to #

Changes Made

Testing

  • Tested locally
  • Added/updated unit tests
  • All existing tests pass
  • Tested on specific hardware/environment (please specify):

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Additional Context

Signed-off-by: WeiweiZhang1 <weiwei1.zhang@intel.com>
Copilot AI review requested due to automatic review settings January 22, 2026 14:12

Copilot AI left a comment


Pull request overview

This PR enables quantization and generation support for the GLM-4.7-Flash (glm4_moe_lite) model by implementing a custom MoE module replacement and adding comprehensive test coverage.

Changes:

  • Implemented LinearGlm4MoeLiteMoE replacement module for calibration and quantization of the GLM-4 MoE architecture
  • Added test fixtures and test cases for glm4_moe_lite in both CPU and CUDA test suites
  • Registered the new module type in the replacement modules registry (a rough sketch of the registry shape follows below)
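For orientation, here is a rough sketch of what such a registry entry could look like; the actual structure and names inside replace_modules.py may differ:

```python
# Hypothetical shape of the registry entry in replace_modules.py; the real
# mapping name and lookup mechanism in auto-round may differ.
REPLACEMENT_MODULES = {
    "Glm4MoeLiteMoE": LinearGlm4MoeLiteMoE,  # original module class name -> replacement
}
```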

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

  • auto_round/modelling/glm4_moe_lite.py: New module implementing MoE layer replacement for GLM-4 model quantization
  • auto_round/modelling/replace_modules.py: Registered Glm4MoeLiteMoE in the module replacement registry
  • test/test_cuda/models/test_moe_model.py: Added glm4_moe_lite test fixture and comprehensive test including vLLM integration
  • test/test_cpu/models/test_moe_model.py: Added glm4_moe_lite test fixture and CPU-specific quantization test


```python
        self.shared_experts = original.shared_experts

    def forward(self, hidden_states):
        residuals = hidden_states
```

Copilot AI Jan 22, 2026


Trailing whitespace on this line should be removed to maintain consistent code formatting.

Suggested change:

```python
        residuals = hidden_states
```

```python
        for expert_idx, expert in enumerate(self.experts):
            mask = expert_mask[expert_idx]
            token_indices, weight_indices = torch.where(mask)
            has_tokens = token_indices.numel() > 0
```

Copilot AI Jan 22, 2026


Trailing whitespace on this line should be removed to maintain consistent code formatting.

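For readers outside the diff, here is a minimal sketch of how an expert-dispatch loop of this shape typically continues. This is not the PR's exact code; routing_weights and final_hidden_states are assumed names:

```python
# Sketch only: gather the tokens routed to each expert, run the expert,
# scale by the router score, and scatter-add into the output buffer.
# routing_weights and final_hidden_states are assumed names, not the PR's.
for expert_idx, expert in enumerate(self.experts):
    mask = expert_mask[expert_idx]
    token_indices, weight_indices = torch.where(mask)
    has_tokens = token_indices.numel() > 0
    if has_tokens:
        expert_out = expert(hidden_states[token_indices])
        scores = routing_weights[token_indices, weight_indices].unsqueeze(-1)
        final_hidden_states.index_add_(0, token_indices, expert_out * scores)
```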
Comment on lines +147 to +148
```python
            _update_parameter(self[i].up_proj, "weight", up_proj)
            _update_parameter(self[i].down_proj, "weight", down)
```

Copilot AI Jan 22, 2026


Missing space after comma in slice notation. Should be [:intermediate_size, :] and [intermediate_size:, :] for consistency with PEP 8.

Suggested change:

```python
            gate_proj = gate_up[:intermediate_size, :]
            up_proj = gate_up[intermediate_size:, :]
```

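As background on those slices, a small self-contained illustration of splitting a fused gate/up weight; the shapes here are invented for the example:

```python
import torch

# Illustration only: a fused gate_up weight stacks the gate and up
# projections along dim 0, so each half is [intermediate_size, hidden_size].
hidden_size, intermediate_size = 8, 16
gate_up = torch.randn(2 * intermediate_size, hidden_size)

gate_proj = gate_up[:intermediate_size, :]  # first half -> gate projection
up_proj = gate_up[intermediate_size:, :]    # second half -> up projection

assert gate_proj.shape == up_proj.shape == (intermediate_size, hidden_size)
```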
```python
@pytest.fixture
def setup_glm4_moe_lite():
    """Fixture to set up the glm4_moe_lite model and tokenizer."""
    model_name = "/dataset/GLM-4.7-Flash/"
```

Copilot AI Jan 22, 2026


This hardcoded absolute path differs from the CPU test fixture, which uses get_model_path(). Consider using the same pattern for consistency: model_name = get_model_path("zai-org/GLM-4.7-Flash"), as in the CPU test file.

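A minimal sketch of the fixture with that suggestion applied, assuming get_model_path is the same helper the CPU test file already imports:

```python
import pytest

# Sketch of the suggested fix; get_model_path is assumed to resolve a
# repo id to a local checkpoint path, as in the CPU test file.
@pytest.fixture
def setup_glm4_moe_lite():
    """Fixture to set up the glm4_moe_lite model and tokenizer."""
    model_name = get_model_path("zai-org/GLM-4.7-Flash")
    ...  # tokenizer/model setup continues as in the original fixture
```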
```python
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    # if "France" in prompt:
```

Copilot AI Jan 22, 2026


This commented-out code should be removed as it appears to be debug code that is no longer needed.

Suggested change (delete the line):

```python
    # if "France" in prompt:
```

```python
        config: "Glm4MoeLiteConfig",
        calibrate_all_experts: bool = False,
    ):
        super().__init__()
```
Contributor


Hi @WeiweiZhang1, just a heads-up that https://github.com/intel/auto-round/pull/1307/changes has been merged.
Please refer to the new https://github.com/intel/auto-round/blob/main/auto_round/modelling/qwen3_vl_moe.py as an example.

To adapt to this change, we need to:

  1. Pass original to ReplacementModuleBase.

```python
        with torch.device(target_device):
            super().__init__([Glm4MoeLiteMLP(config, intermediate_size) for _ in range(self.num_experts)])

        if not unsupported_meta_device(original):
```
Contributor


  2. Move this part explicitly into the _materialize_weights function.

```python
            gate_proj = gate_up[:intermediate_size, :]
            up_proj = gate_up[intermediate_size:, :]

            _update_parameter(self[i].gate_proj, "weight", gate_proj)
```
Contributor


  3. Replace the local _update_parameter with from auto_round.modelling.utils import _update_parameter.
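Taken together, the three steps suggest a skeleton along the following lines. This is only a sketch modeled on the qwen3_vl_moe.py example; the exact signatures of ReplacementModuleBase and _materialize_weights should be checked against that file:

```python
from auto_round.modelling.utils import _update_parameter  # step 3: use the shared helper

# Sketch only; the ReplacementModuleBase import path and the hook signatures
# are assumptions to be verified against auto_round/modelling/qwen3_vl_moe.py.
class LinearGlm4MoeLiteMoE(ReplacementModuleBase):
    def __init__(self, original, config, calibrate_all_experts=False):
        # Step 1: hand the original module to the base class.
        super().__init__(original)
        ...

    def _materialize_weights(self, original):
        # Step 2: all weight copying moves here, e.g. splitting each expert's
        # fused gate_up weight and writing the halves via _update_parameter.
        ...
```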
