[Feature]【Hackathon 10th Spring No.47】Add MiniMax-M1 integration tests and multi-GPU support#7511
…s and multi-GPU support
Thanks for your contribution!
- scripts/validate_minimax_m1_multigpu.sh: fix Tier 2 RESPONSE not reaching Python (use env var instead of stdin); pipe $MODELS via stdin in Tier 1 to avoid triple-quote injection; use jq in send_chat for safe JSON
- model_base.py: warn on architecture registration overwrite
- lightning_attn.py: use None + conditional add instead of int 0 accumulator
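The "None + conditional add" pattern named in the commit message above can be sketched in isolation. This is a minimal illustration, not the code from lightning_attn.py: `Vec` and `accumulate` are hypothetical names, and `Vec` stands in for any value type whose addition is only defined against its own type. One possible motivation for the pattern is that an `int` 0 seed forces a mixed-type addition on the first step, while a `None` seed never does.

```python
class Vec:
    """Hypothetical value type that only supports Vec + Vec."""

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):
        if not isinstance(other, Vec):
            return NotImplemented
        return Vec(a + b for a, b in zip(self.values, other.values))


def accumulate(chunks):
    """Sum chunks without seeding the accumulator with int 0."""
    total = None
    for chunk in chunks:
        # The first chunk becomes the accumulator; later chunks are added to it.
        total = chunk if total is None else total + chunk
    return total


result = accumulate([Vec([1, 2]), Vec([3, 4])])
empty = accumulate([])

# Seeding with int 0 would require 0 + Vec, which this type does not define.
int_zero_failed = False
try:
    0 + Vec([1])
except TypeError:
    int_zero_failed = True
```

With this shape, an empty input yields `None`, so a caller has to handle that case explicitly.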
PaddlePaddle-bot
left a comment
🤖 AI Code Review |
2026-04-20 17:05:25
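📋 Review Summary
PR overview: Adds a complete model implementation for MiniMax-M1 (456B MoE), a Lightning Attention Triton kernel, integration tests, a multi-GPU validation script, and documentation
Scope of changes: model_executor/models/, model_executor/ops/triton_ops/, tests/, scripts/, docs/
Impact tags: Models OP Docs/CI
📝 PR Convention Check
The [Feature] tag in the PR title is valid, and the description includes Motivation/Modifications sections with sufficient detail, so the PR meets the conventions.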
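Issues
| Severity | File | Summary |
|---|---|---|
| 🔴 Bug | scripts/validate_minimax_m1_multigpu.sh:213 | The Tier 2 Python heredoc calls sys.exit(1) without importing the sys module |
| 🟡 Suggestion | fastdeploy/model_executor/models/model_base.py:312 | logging.warning() should be replaced with the project logging framework for consistency |
| 🟡 Suggestion | fastdeploy/model_executor/models/minimax_m1.py:376 | The _kv_history instance variable stores KV state, which risks cache pollution under concurrent requests (already marked with a TODO) |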
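Overall Assessment
The implementation is of high overall quality. The model architecture (hybrid linear/full attention + MoE) reuses FastDeploy's existing layer abstractions, weight loading supports both the HF v0 and v1 paths, and test coverage is thorough (including a pure-Python reference implementation that verifies Lightning Attention correctness). The main problem is the missing sys import in the validation script, which prevents the Tier 2 test from reporting the failure and exiting correctly.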
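The missing-import failure mode can be reproduced in isolation. The sketch below is an illustration under assumptions, not the validation script itself: `BROKEN`, `FIXED`, and `run_snippet` are hypothetical names, and the two snippets are reduced stand-ins for the heredoc body. It runs each snippet in a child interpreter and captures the exit status and stderr.

```python
import subprocess
import sys

# Reduced stand-ins for the heredoc body; only the import line differs.
BROKEN = 'import json, os\nprint("Tier 2 FAIL")\nsys.exit(1)\n'
FIXED = 'import json, os, sys\nprint("Tier 2 FAIL")\nsys.exit(1)\n'


def run_snippet(source):
    """Run source in a child Python process; return (exit code, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


broken_code, broken_err = run_snippet(BROKEN)
fixed_code, fixed_err = run_snippet(FIXED)
```

Both variants exit with a non-zero status, so a caller that only checks the exit code sees no difference. Only the broken variant writes a NameError traceback to stderr, which is what makes the reported failure misleading.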
    resp = json.loads(os.environ["RESPONSE"])
    if "choices" not in resp or len(resp["choices"]) == 0:
        print(f"❌ Tier 2 FAIL: No choices in response: {resp}")
        sys.exit(1)
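🔴 Bug: sys.exit(1) is called, but the sys module is never imported.
The Python heredoc at line 208 only runs import json, os and is missing import sys. The sys.exit(1) calls at lines 213 and 221 will raise NameError: name 'sys' is not defined. As a result, when inference fails, Tier 2 validation cannot print the correct error message: the Python heredoc still exits with a non-zero status because of the uncaught exception, but the error actually shown is the NameError rather than the inference-failure message, which is misleading.
Suggested fix for line 208: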
import json, os, sys

    def _register(model_cls):
        # Traditional registration for ModelForCasualLM subclasses
        cls._arch_to_model_cls[model_cls.name()] = model_cls
        if architecture:
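🟡 Suggestion: This code uses the standard library's logging.warning(), whereas the rest of the project (including minimax_m1.py, which this PR adds) consistently uses paddleformers.utils.log.logger.
Consider using the project logging framework here as well, for consistency: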
from paddleformers.utils.log import logger
# ...
logger.warning("Overwriting model registration for architecture '%s'", architecture)

            dtype=q.dtype,
        )

        # Apply lightning attention (returns 4D kv_history, not 5D concat)
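🟡 Suggestion: _kv_history uses an instance variable to store the recurrent KV state of linear attention, which causes cross-request cache pollution when serving multiple concurrent requests.
The current check, self._kv_history.shape[0] != batch_size, compares only the batch dimension: if different requests happen to have the same batch_size, the stale state left by the previous request is still reused. The code already has a TODO about migrating to a slot-based cache, and the docs list this as a known limitation, both of which are good. Consider stating the priority and timeline of the follow-up migration in the PR description or an issue.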
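The pollution scenario described in this comment can be illustrated with a toy model. This is a sketch under assumptions and does not reproduce minimax_m1.py: `SharedStateLayer`, `PerRequestStateLayer`, and the `request_id` parameter are hypothetical, plain lists stand in for tensors, and a dict keyed by request id is only a simplified stand-in for a slot-based cache.

```python
class SharedStateLayer:
    """Keeps recurrent state on the instance, guarded only by batch size."""

    def __init__(self):
        self._kv_history = None

    def forward(self, request_id, tokens):
        batch_size = len(tokens)
        # Same kind of check as in the review: only the batch dimension is compared.
        if self._kv_history is None or len(self._kv_history) != batch_size:
            self._kv_history = [0] * batch_size
        self._kv_history = [h + t for h, t in zip(self._kv_history, tokens)]
        return list(self._kv_history)


class PerRequestStateLayer:
    """Keeps one recurrent state per request id (simplified slot stand-in)."""

    def __init__(self):
        self._kv_history = {}

    def forward(self, request_id, tokens):
        state = self._kv_history.get(request_id, [0] * len(tokens))
        state = [h + t for h, t in zip(state, tokens)]
        self._kv_history[request_id] = state
        return list(state)


shared = SharedStateLayer()
shared_a = shared.forward("req-A", [1, 1])
shared_b = shared.forward("req-B", [10, 10])  # same batch size as req-A

keyed = PerRequestStateLayer()
keyed_a = keyed.forward("req-A", [1, 1])
keyed_b = keyed.forward("req-B", [10, 10])
```

In the shared variant, the second request starts from the first request's leftover state because the batch sizes match. In the keyed variant, each request starts from its own zero state.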
Motivation
Companion to the MiniMax-M1 model PR — adds integration tests and multi-GPU validation infrastructure for Hackathon 10th Spring No.47.
Modifications
Integration Tests (tests/model_executor/test_minimax_m1_integration.py)
Multi-GPU Validation Script (scripts/validate_minimax_m1_multigpu.sh)
Test Infrastructure (tests/model_executor/conftest.py)
Model Base Extension (fastdeploy/model_executor/models/model_base.py)
Usage or Command
Accuracy Tests
Integration tests verify:
All tests use monkeypatch.setattr + real objects (no MagicMock).
Checklist
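The monkeypatch.setattr-with-real-objects approach mentioned under Accuracy Tests could look roughly like the sketch below. It is an assumption-laden illustration, not a test from this PR: `backend`, `device_count`, and `plan_tensor_parallel` are hypothetical names. The replacement is an ordinary callable rather than a MagicMock, so the code under test gets a real return value of the expected type.

```python
import types

import pytest

# Hypothetical stand-in for a module under test.
backend = types.SimpleNamespace(device_count=lambda: 8)


def plan_tensor_parallel(requested):
    """Cap the requested parallel degree at the number of visible devices."""
    return min(requested, backend.device_count())


def test_plan_is_capped(monkeypatch):
    # Swap in a real function, not a MagicMock, so return types stay honest.
    monkeypatch.setattr(backend, "device_count", lambda: 2)
    assert plan_tensor_parallel(8) == 2


# The same patching, run outside a pytest session for demonstration.
with pytest.MonkeyPatch.context() as mp:
    mp.setattr(backend, "device_count", lambda: 2)
    capped = plan_tensor_parallel(8)

# Leaving the context restores the original attribute.
restored = plan_tensor_parallel(8)
```

Under pytest, the `monkeypatch` fixture undoes the patch automatically at the end of each test, so no manual cleanup is needed.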