fix(deepseek): gate H100 fused kernel defaults #4338
Signed-off-by: Chen Cui <chcui@nvidia.com>
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here.
Code Review

- [High] Existing recipe tests will fail on H100 CI runners.
- Location: `tests/unit_tests/recipes/test_deepseek_recipes.py:315-318` (`_build_deepseek_v4_recipe`)
- Problem: `_build_deepseek_v4_recipe` monkeypatches `AutoBridge` but not `deepseek_v4_supports_blackwell_fused_kernels`. After this PR, `use_fused_mhc` is set dynamically by that helper. On H100 CI runners (sm_90, CUDA available), the helper returns `False`, so `use_fused_mhc` will be `False`, but five existing tests assert it is `True`.
- Affected tests:
  - `test_deepseek_v4_adam_mxfp8_recipe_uses_validated_optimizer_defaults` (line 342)
  - `test_deepseek_v4_muon_recipe_uses_validated_optimizer_defaults` (line 372)
  - `test_deepseek_v4_base_recipe_uses_blackwell_defaults` (line 387)
  - `test_deepseek_v4_flash_sft_recipe_uses_fused_mhc` (line 423)
  - `test_deepseek_v4_flash_no_mtp_sft_recipe_disables_mtp` (line 435)
- Suggested fix: add `monkeypatch.setattr` for `deepseek_v4_supports_blackwell_fused_kernels` (`lambda: True`) inside `_build_deepseek_v4_recipe`.
- Suggested test cases: no perf tests impacted.
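A minimal sketch of the suggested fix, assuming the recipe code resolves `deepseek_v4_supports_blackwell_fused_kernels` through its module at call time. The stand-in module `recipe_mod`, the builders `build_recipe` and `build_recipe_for_tests`, and the `RecipeConfig` dataclass are hypothetical. The real fixture would use pytest's `monkeypatch.setattr`; this sketch uses the stdlib equivalent `unittest.mock.patch.object` so it runs without pytest.

```python
import types
from dataclasses import dataclass
from unittest import mock


@dataclass
class RecipeConfig:
    # Hypothetical stand-in for the DeepSeek-V4 recipe config.
    use_fused_mhc: bool


# Hypothetical stand-in for the recipe module. The helper simulates an
# H100 (sm_90) runner, where the Blackwell-only fused kernels are unsupported.
recipe_mod = types.ModuleType("recipe_mod")
recipe_mod.deepseek_v4_supports_blackwell_fused_kernels = lambda: False


def build_recipe() -> RecipeConfig:
    # The helper is looked up on the module at call time, so patching the
    # module attribute changes the default the builder picks.
    supported = recipe_mod.deepseek_v4_supports_blackwell_fused_kernels()
    return RecipeConfig(use_fused_mhc=supported)


def build_recipe_for_tests() -> RecipeConfig:
    # Mirrors the suggested fix: force the capability check to True so
    # assertions on Blackwell defaults pass on any CI runner. The original
    # helper is restored when the context manager exits.
    with mock.patch.object(
        recipe_mod, "deepseek_v4_supports_blackwell_fused_kernels", lambda: True
    ):
        return build_recipe()


print(build_recipe().use_fused_mhc)            # simulated H100 runner
print(build_recipe_for_tests().use_fused_mhc)  # helper patched to True
```

Patching only takes effect where the name is looked up: if a module imports the helper by name (`from ... import deepseek_v4_supports_blackwell_fused_kernels`), the patch has to target that importing module rather than the module that defines the helper.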
Summary
Validation
- `bfb60bb8`, MCore dev `2f1004963dcb1718804f3b858f5fb2fc73819694`, NeMo 26.06 rc3 container:
  - `12760509`, `use_fused_mhc=False`, `apply_rope_fusion=False`, `apply_dsa_kernel_fusion=True`: failed before model construction with `AssertionError: apply_dsa_kernel_fusion requires SM100+ (Blackwell or later), but current device has compute capability 9.0`.
  - `12760510`, `use_fused_mhc=False`, `apply_rope_fusion=False`, `apply_dsa_kernel_fusion=False`: completed one-token inference, `after_model: cuda_allocated_gib=47.57 cuda_reserved_gib=48.36`, generated `Hello2`, exit `0:0`.
- `tests/unit_tests/models/deepseek/test_deepseek_v4_bridge.py` and `tests/unit_tests/recipes/test_deepseek_recipes.py` in `nvcr.io/nvidia/pytorch:26.04-py3`.
- `uv run --no-sync pre-commit run --all-files` passed.
- `uv run --no-sync ruff check src/megatron/bridge/models/deepseek/deepseek_v4_bridge.py src/megatron/bridge/recipes/deepseek/deepseek_v4.py tests/unit_tests/models/deepseek/test_deepseek_v4_bridge.py tests/unit_tests/recipes/test_deepseek_recipes.py` passed.
- `uv run --no-sync ruff format --check src/megatron/bridge/models/deepseek/deepseek_v4_bridge.py src/megatron/bridge/recipes/deepseek/deepseek_v4.py tests/unit_tests/models/deepseek/test_deepseek_v4_bridge.py tests/unit_tests/recipes/test_deepseek_recipes.py` passed.
- `git diff --check` passed.
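The assertion from `12760509` points at the gate these defaults need: kernels that require SM100+ must stay off on compute capability 9.0 (H100). A minimal sketch of that decision, assuming the capability arrives as a `(major, minor)` tuple like the one `torch.cuda.get_device_capability()` returns. `supports_blackwell_fused_kernels`, `resolve_fused_defaults`, and the choice of which flags to gate are hypothetical illustrations, not this PR's code.

```python
from typing import Dict, Tuple

# Minimum compute capability named in the AssertionError
# ("requires SM100+ (Blackwell or later)").
SM100 = (10, 0)


def supports_blackwell_fused_kernels(capability: Tuple[int, int]) -> bool:
    # Tuples compare lexicographically, so (9, 0) < (10, 0) <= (12, 0).
    return capability >= SM100


def resolve_fused_defaults(capability: Tuple[int, int]) -> Dict[str, bool]:
    # Hypothetical: default the SM100-only fused paths to on only when the
    # device can run them, rather than failing later at model construction.
    supported = supports_blackwell_fused_kernels(capability)
    return {
        "use_fused_mhc": supported,
        "apply_dsa_kernel_fusion": supported,
    }


print(resolve_fused_defaults((9, 0)))   # H100 (sm_90)
print(resolve_fused_defaults((10, 0)))  # Blackwell (sm_100)
```

Resolving defaults from the device in this way keeps an H100 run off the SM100-only DSA fusion path that failed in `12760509`.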