
[Feature]【Hackathon 10th Spring No.47】Add MiniMax-M1 integration tests and multi-GPU support #7511

Open
bobby-cloudforge wants to merge 2 commits into PaddlePaddle:develop from
CloudForge-Solutions:task/h10-047-minimax-m1-integration1


@bobby-cloudforge

Motivation

Companion to the MiniMax-M1 model PR — adds integration tests and multi-GPU validation infrastructure for Hackathon 10th Spring No.47.

Modifications

Integration Tests (tests/model_executor/test_minimax_m1_integration.py)

  • End-to-end construction + forward pass tests with full model config
  • Multi-layer interaction tests (linear + full attention)
  • Weight loading validation (v0 and v1 paths)

Multi-GPU Validation Script (scripts/validate_minimax_m1_multigpu.sh)

  • Automated tensor-parallel validation script for 2/4/8 GPU configurations
  • Includes correctness checks and basic throughput measurement

Test Infrastructure (tests/model_executor/conftest.py)

  • Shared fixtures for model executor tests
  • Config builder helpers for MiniMax-M1 test variants

Model Base Extension (fastdeploy/model_executor/models/model_base.py)

  • Minor extension to support MiniMax-M1 linear attention state management

Usage or Command

# Run integration tests
pytest tests/model_executor/test_minimax_m1_integration.py -v

# Multi-GPU validation (requires 8 GPUs)
bash scripts/validate_minimax_m1_multigpu.sh

Accuracy Tests

Integration tests verify:

  • Model construction with correct layer type dispatch (linear vs full attention)
  • Forward pass shape correctness through mixed attention pipeline
  • Weight loading key mapping for both v0 and v1 loaders
  • DeepNorm scaling coefficients applied correctly

All tests use monkeypatch.setattr + real objects (no MagicMock).
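
As a rough illustration of the first check above (layer type dispatch), here is a minimal sketch. The class names, the `build_attention_layers` helper, and the convention that `0` selects linear attention and `1` selects full attention are assumptions made for this example, not code taken from the PR.

```python
from dataclasses import dataclass


@dataclass
class LinearAttention:
    """Hypothetical stand-in for a linear (lightning) attention layer."""
    layer_idx: int


@dataclass
class FullAttention:
    """Hypothetical stand-in for a full softmax attention layer."""
    layer_idx: int


def build_attention_layers(attn_type_list):
    """Pick one attention class per layer from a list of type codes.

    Assumption: 0 selects linear attention and 1 selects full attention.
    """
    layers = []
    for idx, attn_type in enumerate(attn_type_list):
        if attn_type == 0:
            layers.append(LinearAttention(idx))
        elif attn_type == 1:
            layers.append(FullAttention(idx))
        else:
            raise ValueError(f"unknown attention type {attn_type} at layer {idx}")
    return layers


def test_layer_type_dispatch():
    # Three linear layers followed by one full attention layer.
    layers = build_attention_layers([0, 0, 0, 1])
    kinds = [type(layer).__name__ for layer in layers]
    assert kinds == ["LinearAttention"] * 3 + ["FullAttention"]
```

A test in this style builds real (if tiny) objects and asserts on their types, which matches the "real objects, no MagicMock" approach described above.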

Checklist

  • Integration tests for mixed attention pipeline
  • Multi-GPU validation script
  • Shared test fixtures
  • Pre-commit hooks passing

@paddle-bot

paddle-bot bot commented Apr 20, 2026

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor External developers label Apr 20, 2026
- scripts/validate_minimax_m1_multigpu.sh: fix Tier 2 RESPONSE not reaching
  Python (use env var instead of stdin); pipe $MODELS via stdin in Tier 1
  to avoid triple-quote injection; use jq in send_chat for safe JSON
- model_base.py: warn on architecture registration overwrite
- lightning_attn.py: use None + conditional add instead of int 0 accumulator
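
The stdin-versus-env-var point in the first bullet can be sketched in Python. When an interpreter reads its program from stdin (as with a shell heredoc), stdin is no longer free to carry the payload, so the payload has to travel some other way, for example an environment variable. `CHECK_SCRIPT` and `run_check` below are hypothetical names for this illustration; only the `RESPONSE` variable name comes from the PR.

```python
import json
import os
import subprocess
import sys

# Hypothetical checker; it is delivered on stdin, like a shell heredoc.
CHECK_SCRIPT = """
import json, os, sys
resp = json.loads(os.environ["RESPONSE"])
if not resp.get("choices"):
    print("FAIL: no choices in response")
    sys.exit(1)
print("OK")
"""


def run_check(response):
    """Run the checker with the script on stdin and the payload in RESPONSE."""
    env = dict(os.environ, RESPONSE=json.dumps(response))
    return subprocess.run(
        [sys.executable, "-"],  # "-" tells Python to read the program from stdin
        input=CHECK_SCRIPT,
        env=env,
        capture_output=True,
        text=True,
    )


ok = run_check({"choices": [{"message": {"content": "hi"}}]})
bad = run_check({"choices": []})
print(ok.returncode, ok.stdout.strip())
print(bad.returncode, bad.stdout.strip())
```

Serializing with `json.dumps` also avoids hand-built JSON strings, which is the same concern the commit addresses with jq on the shell side.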

@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 AI Code Review | 2026-04-20 17:05:25

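📋 Review Summary

PR overview: Adds a complete model implementation for MiniMax-M1 (456B MoE), a Lightning Attention Triton kernel, integration tests, a multi-GPU validation script, and documentation
Scope of changes: model_executor/models/, model_executor/ops/triton_ops/, tests/, scripts/, docs/
Impact tags: Models, OP, Docs/CI

📝 PR Convention Check

The [Feature] tag in the PR title is valid, and the description includes Motivation and Modifications sections with sufficient detail, so the PR meets the conventions.

Issues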

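| Severity | File | Summary |
| --- | --- | --- |
| 🔴 Bug | scripts/validate_minimax_m1_multigpu.sh:213 | The Tier 2 Python heredoc calls sys.exit(1) without importing the sys module |
| 🟡 Suggestion | fastdeploy/model_executor/models/model_base.py:312 | logging.warning() should be replaced with the project's logging framework |
| 🟡 Suggestion | fastdeploy/model_executor/models/minimax_m1.py:376 | The _kv_history instance variable stores KV state, which risks cache pollution under concurrent requests (already marked with a TODO) |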

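Overall Assessment

The implementation quality is high overall. The model architecture (hybrid linear/full attention plus MoE) reuses FastDeploy's existing layer abstractions, weight loading supports both the HF v0 and v1 paths, and test coverage is thorough (including a pure-Python reference implementation that checks Lightning Attention correctness). The main problem is the missing sys import in the validation script, which prevents Tier 2 tests from exiting with the correct error message.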

resp = json.loads(os.environ["RESPONSE"])
if "choices" not in resp or len(resp["choices"]) == 0:
    print(f"❌ Tier 2 FAIL: No choices in response: {resp}")
    sys.exit(1)

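🔴 Bug: sys.exit(1) is called, but the sys module is never imported.

The Python heredoc at line 208 imports only json and os; import sys is missing. The sys.exit(1) calls at lines 213 and 221 will raise NameError: name 'sys' is not defined, so when inference fails, Tier 2 validation cannot print the correct error message. The heredoc still exits with a non-zero status because of the uncaught exception, but the error actually reported is the NameError rather than the inference-failure message, which is misleading.

Suggested fix for line 208: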

import json, os, sys
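
The failure mode described here can be reproduced with a small sketch. The two script strings below are simplified stand-ins for the heredoc, not the actual contents of the validation script; each is run in a child interpreter so the exit status and stderr can be compared.

```python
import subprocess
import sys

# Simplified stand-in for the heredoc body: sys is used but never imported.
BUGGY = """
import json, os
print("Tier 2 FAIL: no choices")
sys.exit(1)
"""

# Same body with the suggested one-word fix applied.
FIXED = BUGGY.replace("import json, os", "import json, os, sys")


def run(script):
    """Run a script in a child interpreter and capture its output."""
    return subprocess.run(
        [sys.executable, "-c", script], capture_output=True, text=True
    )


buggy = run(BUGGY)
fixed = run(FIXED)

# Both exit non-zero, but only the buggy one reports a NameError on stderr.
print("buggy:", buggy.returncode, "NameError" in buggy.stderr)
print("fixed:", fixed.returncode, "NameError" in fixed.stderr)
```

Because both variants exit non-zero, a caller that checks only the exit status will not notice the bug; the difference shows up only in the traceback printed to stderr.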

def _register(model_cls):
    # Traditional registration for ModelForCasualLM subclasses
    cls._arch_to_model_cls[model_cls.name()] = model_cls
    if architecture:

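🟡 Suggestion: This code uses the standard library's logging.warning(), whereas the rest of the project (including minimax_m1.py, newly added in this PR) consistently uses paddleformers.utils.log.logger.

Consider using the project's logging framework for consistency: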

from paddleformers.utils.log import logger
# ...
logger.warning("Overwriting model registration for architecture '%s'", architecture)
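
For context, a warn-on-overwrite registry might look roughly like the sketch below. `ModelRegistry` and its `register` method are hypothetical and much simpler than the code in model_base.py, and a named standard-library logger stands in for the project logger so the example runs without paddleformers installed.

```python
import logging

# Stand-in for the project logger; the review suggests
# paddleformers.utils.log.logger in the real code.
logger = logging.getLogger("model_registry_sketch")


class ModelRegistry:
    """Hypothetical minimal registry mapping architecture names to classes."""

    def __init__(self):
        self._arch_to_model_cls = {}

    def register(self, architecture, model_cls):
        previous = self._arch_to_model_cls.get(architecture)
        # Warn only when a different class replaces an existing entry.
        if previous is not None and previous is not model_cls:
            logger.warning(
                "Overwriting model registration for architecture '%s'",
                architecture,
            )
        self._arch_to_model_cls[architecture] = model_cls
        return model_cls
```

Routing the message through a single named logger, rather than the root-level `logging.warning()`, keeps formatting and log levels under the control of whoever configures that logger.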

dtype=q.dtype,
)

# Apply lightning attention (returns 4D kv_history, not 5D concat)

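🟡 Suggestion: _kv_history stores the recurrent KV state for linear attention in an instance variable, which leads to cross-request cache pollution when serving concurrent requests.

The current check, self._kv_history.shape[0] != batch_size, compares only the batch dimension. If two different requests happen to have the same batch_size, the leftover state from the previous request is still reused. The code already has a TODO for migrating to a slot-based cache and the documentation lists this as a known limitation, which is good. Consider stating the priority and timeline for that migration in the PR description or in an issue.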
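
The pollution scenario can be shown with a toy model. Both classes below are hypothetical: the "state" is a running sum over plain Python lists standing in for the recurrent KV tensor, and `request_id` is an assumed key for the slot-based variant, not an identifier from the PR.

```python
class InstanceStateAttention:
    """Keeps one recurrent state on the instance, reset only on batch-size change."""

    def __init__(self):
        self._kv_history = None

    def forward(self, request_id, tokens):
        batch_size = len(tokens)
        # Mirrors the batch-dimension-only check: request identity is ignored.
        if self._kv_history is None or len(self._kv_history) != batch_size:
            self._kv_history = [0.0] * batch_size
        self._kv_history = [s + t for s, t in zip(self._kv_history, tokens)]
        return list(self._kv_history)


class SlotStateAttention:
    """Keeps one recurrent state per request slot."""

    def __init__(self):
        self._slots = {}

    def forward(self, request_id, tokens):
        state = self._slots.get(request_id, [0.0] * len(tokens))
        state = [s + t for s, t in zip(state, tokens)]
        self._slots[request_id] = state
        return list(state)

    def release(self, request_id):
        # Drop the state when the request finishes.
        self._slots.pop(request_id, None)


shared = InstanceStateAttention()
shared.forward("req-a", [1.0, 2.0])
leaked = shared.forward("req-b", [10.0, 20.0])  # starts from req-a's state

slotted = SlotStateAttention()
slotted.forward("req-a", [1.0, 2.0])
clean = slotted.forward("req-b", [10.0, 20.0])  # starts from a fresh state

print(leaked, clean)
```

In the shared variant the second request inherits the first request's state because both have batch size 2; keying the state by request avoids that, at the cost of needing an explicit release step.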

