
delete me#7508

Closed
bobby-cloudforge wants to merge 1 commit into PaddlePaddle:develop from CloudForge-Solutions:task/047-minimax-m1-integration1

Conversation


@bobby-cloudforge bobby-cloudforge commented Apr 20, 2026

n/a


paddle-bot bot commented Apr 20, 2026

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor External developers label Apr 20, 2026

@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 AI Code Review | 2026-04-20 15:10 CST

📋 Review Summary

PR overview: adds a complete set of integration tests, a multi-GPU validation script, the model implementation, and a Lightning Attention Triton kernel for the MiniMax-M1 (456B MoE) model.
Scope of changes: model implementation (minimax_m1.py), Triton op (lightning_attn.py), model registration (model_base.py), tests & scripts, bilingual documentation
Impact tags: Models, OP, Docs

📝 PR Convention Check

The PR title carries the [Feature] tag and the description includes the Motivation/Modifications/Usage/Checklist sections, as required.

Issues

Severity | File | Summary
🔴 Bug | scripts/validate_minimax_m1_multigpu.sh:211 | The Tier 2 RESPONSE variable is not passed to the Python script; sys.stdin.read() returns an empty string and JSON parsing fails
🔴 Bug | scripts/validate_minimax_m1_multigpu.sh:181 | $MODELS is interpolated into a Python string via '''..'''; a response containing ''' causes a syntax error or injection
🟡 Suggestion | scripts/validate_minimax_m1_multigpu.sh:162 | send_chat builds JSON by string interpolation; special characters can break the JSON format
🟡 Suggestion | fastdeploy/model_executor/models/minimax_m1.py:370 | _kv_history is stored in an instance attribute, risking state contamination across concurrent requests
🟡 Suggestion | fastdeploy/model_executor/models/model_base.py:311 | The newly added architecture registration may overwrite an existing model's registration
🟡 Suggestion | fastdeploy/model_executor/ops/triton_ops/lightning_attn.py:636 | output = 0 as the initial value for tensor accumulation has an unclear type

Overall Assessment

The model implementation is cleanly structured overall and reuses FastDeploy's existing layer components (Attention, FusedMoE, RMSNorm, etc.); the DeepNorm and hybrid-attention dispatch logic is correct. The main blocker is a deterministic bug in Tier 2 of the multi-GPU validation script: the RESPONSE variable is never passed to the Python subprocess, so Tier 2 validation can never succeed. Recommend merging once that is fixed.

python3 - <<'PYEOF'
import json, sys

resp = json.loads(sys.stdin.read())

🔴 Bug The Tier 2 RESPONSE variable is not passed to the Python script, so sys.stdin.read() returns an empty string.

Line 206 captures the curl response via RESPONSE=$(send_chat ...), but line 208 feeds the script to python3 - <<'PYEOF' as a heredoc. The heredoc occupies stdin, so sys.stdin.read() sees an empty string (stdin hits EOF once the heredoc has been consumed), json.loads("") raises JSONDecodeError, and Tier 2 validation always fails.

Suggested fix: pass RESPONSE via an environment variable or a pipe:

# Option 1: pass RESPONSE via an environment variable
RESPONSE=$(send_chat "What is 2+3? Answer with just the number." 32)
export RESPONSE
python3 <<'PYEOF'
import json, sys, os
resp = json.loads(os.environ["RESPONSE"])
...
PYEOF

# Option 2: replace the heredoc with -c and pipe the response in
echo "$RESPONSE" | python3 -c "
import json, sys
resp = json.loads(sys.stdin.read())
...
"
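The stdin conflict is easy to reproduce in isolation. A minimal sketch, independent of the validation script: with python3 - <<'PYEOF', the interpreter consumes the entire heredoc as its script, so any sys.stdin.read() inside that script only ever sees EOF.

```shell
#!/bin/sh
# Minimal reproduction: the heredoc *is* stdin, and python3 reads the
# whole script from it, so sys.stdin.read() inside the script gets ''.
RESPONSE='{"ok": true}'   # hypothetical captured response, never seen by Python
python3 - <<'PYEOF'
import sys
leftover = sys.stdin.read()
print(repr(leftover))  # -> '' : stdin is already at EOF
PYEOF
```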

MODELS=$(curl -s "http://localhost:${PORT}/v1/models")
python3 -c "
import json, sys
models = json.loads('''$MODELS''')

🟡 Suggestion $MODELS is interpolated into a Python string literal via triple quotes ('''..'''). If the JSON returned by the server contains ''', this produces a Python syntax error or even code injection.

Suggested: use the same safe pattern as Tier 3 and pass the data via stdin:

MODELS=$(curl -s "http://localhost:${PORT}/v1/models")
echo "$MODELS" | python3 -c "
import json, sys
models = json.load(sys.stdin)
assert 'data' in models, f'No data in /v1/models: {models}'
names = [m['id'] for m in models['data']]
print(f'Models served: {names}')
assert len(names) > 0, 'No models loaded'
print('✅ Tier 1 PASS')
"

-H "Content-Type: application/json" \
-d "{
\"model\": \"$MODEL_PATH\",
\"messages\": [{\"role\": \"user\", \"content\": \"$prompt\"}],

🟡 Suggestion The send_chat function builds its JSON body by string interpolation; if $MODEL_PATH contains a double quote, backslash, or newline, the result is invalid JSON.

The current call sites only pass hard-coded short English prompts, so the problem is not triggered in practice. As a general-purpose function, though, it should build the body with jq:

send_chat() {
    local prompt="$1"
    local max_tokens="${2:-64}"
    local body
    body=$(jq -n --arg model "$MODEL_PATH" --arg prompt "$prompt" --argjson mt "$max_tokens" \
        '{model: $model, messages: [{role: "user", content: $prompt}], max_tokens: $mt, temperature: 0.0}')
    curl -s "http://localhost:${PORT}/v1/chat/completions" \
        -H "Content-Type: application/json" \
        -d "$body"
}
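The same escaping problem can be shown in plain Python (the model_path value below is hypothetical): naive string interpolation yields invalid JSON as soon as the value contains a double quote, while json.dumps, like jq --arg, escapes it correctly.

```python
import json

# Hypothetical value containing a double quote, analogous to $MODEL_PATH.
model_path = 'models/my "quoted" path'

# Naive interpolation, as in the shell snippet above: produces invalid JSON.
naive = '{"model": "%s"}' % model_path
try:
    json.loads(naive)
    naive_ok = True
except json.JSONDecodeError:
    naive_ok = False
print(naive_ok)  # -> False

# Proper serialization (what jq --arg does): the quote is escaped.
safe = json.dumps({"model": model_path})
print(json.loads(safe)["model"] == model_path)  # -> True
```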

# Retrieve or initialize KV history for recurrent state persistence.
# TODO: Migrate to ForwardMeta.caches / slot-based cache management for
# proper multi-request isolation in production serving scenarios.
if not hasattr(self, "_kv_history") or self._kv_history is None or self._kv_history.shape[0] != batch_size:

🟡 Suggestion _kv_history stores the linear-attention recurrent state in an instance attribute. In a serving scenario, batches from different requests share the same Layer instance, which risks state contamination.

The existing TODO shows this is a deliberate trade-off for a first implementation. Two caveats are still worth noting:

  1. When batch_size changes (the condition on line 370), _kv_history is reset to zeros and the recurrent state of earlier requests is lost, degrading output quality in multi-turn conversations.
  2. Migrating to ForwardMeta.caches or a slot-based cache should be prioritized in a follow-up PR; until then this model is not suitable for concurrent serving.

Please confirm whether this limitation should be called out under Known Limitations in docs/best_practices/MiniMax-M1.md.
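A minimal sketch of the slot-based direction (all names hypothetical; the real fix should go through ForwardMeta.caches as the TODO notes). Keying the recurrent state by a per-request slot id instead of a single instance attribute keeps concurrent requests isolated:

```python
import numpy as np

class LinearAttnState:
    """Hypothetical slot-keyed store for linear-attention recurrent state.

    Each serving slot (one in-flight request) owns its KV history, so
    concurrent batches no longer share one instance attribute.
    """

    def __init__(self, num_heads: int, head_dim: int):
        self.num_heads = num_heads
        self.head_dim = head_dim
        self._states: dict[int, np.ndarray] = {}

    def get(self, slot_id: int) -> np.ndarray:
        # Lazily initialize a fresh zero state the first time a slot is seen.
        if slot_id not in self._states:
            self._states[slot_id] = np.zeros(
                (self.num_heads, self.head_dim, self.head_dim), dtype=np.float32
            )
        return self._states[slot_id]

    def release(self, slot_id: int) -> None:
        # Free the state when the request finishes.
        self._states.pop(slot_id, None)

store = LinearAttnState(num_heads=2, head_dim=4)
a = store.get(0)
b = store.get(1)
a += 1.0              # request 0 updates its state in place
print(b.sum())        # -> 0.0 : request 1's state is untouched
```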

def _register(model_cls):
# Traditional registration for ModelForCasualLM subclasses
cls._arch_to_model_cls[model_cls.name()] = model_cls
if architecture:

🟡 Suggestion The new cls._arch_to_model_cls[architecture] = model_cls registers the class under the decorator's architecture argument, in addition to the model_cls.name() registration on line 310.

There is no conflict detection: if two different model classes register the same architecture string, the later registration silently overwrites the earlier one. MiniMax-M1 currently registers two architectures, MiniMaxM1ForCausalLM and MiniMaxText01ForCausalLM, which do not conflict. Still, a warning log or assertion would guard against silent overwrites in the future:

if architecture:
    if architecture in cls._arch_to_model_cls and cls._arch_to_model_cls[architecture] is not model_cls:
        logger.warning(f"Overwriting registration for architecture '{architecture}'")
    cls._arch_to_model_cls[architecture] = model_cls

if arr[-1] != d:
arr.append(d)
n = len(arr)
output = 0

🟡 Suggestion output = 0 (a Python int) is used as the initial value for tensor accumulation, relying on implicit type promotion in 0 + paddle.Tensor. This is functionally correct but hard to read, and type checkers will flag the mismatch.

Suggested explicit initialization:

output = paddle.zeros_like(q[..., : v.shape[-1]].sum(axis=-1, keepdim=True).expand_as(v))
# 或更简单地:
output = None
...
output = o if output is None else output + o
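The readability point can be illustrated outside the kernel (NumPy here for brevity; the same promotion rule applies to 0 + paddle.Tensor): the int 0 only works because the first 0 + o promotes to a tensor, whereas the None-sentinel form makes the intent explicit.

```python
import numpy as np

chunks = [np.ones((2, 3), dtype=np.float32) for _ in range(3)]

# Pattern flagged above: a Python int 0 as the accumulator seed.
output = 0
for o in chunks:
    output = output + o  # first iteration relies on 0 + ndarray promotion

# Explicit alternative: None sentinel, assign the first chunk directly.
acc = None
for o in chunks:
    acc = o if acc is None else acc + o

print(np.array_equal(output, acc))  # -> True
```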

@bobby-cloudforge bobby-cloudforge changed the title from [Feature]【Hackathon 10th Spring No.47】MiniMax-M1 integration tests and multi-GPU validation to delete me on Apr 20, 2026