Open
Labels
bug (Something isn't working)
Description
Describe the bug
The unit test `TestVisionTECudaGraphHelper.test_create_cudagraphs_multi_microbatch` fails in CI with `AttributeError: 'NoneType' object has no attribute 'get'`. `helper.create_cudagraphs()` reaches `get_make_graphed_callables_kwargs`, which calls `get_num_microbatches()` while `_GLOBAL_NUM_MICROBATCHES_CALCULATOR` is still `None`. Log excerpt:
2026-03-11T12:40:37.7211015Z 9-task-1-0/0 [default0]:
2026-03-11T12:40:37.7211626Z 9-task-1-0/0 [default0]:=================================== FAILURES ===================================
2026-03-11T12:40:37.7212418Z 9-task-1-0/0 [default0]:_____ TestVisionTECudaGraphHelper.test_create_cudagraphs_multi_microbatch ______
2026-03-11T12:40:37.7212967Z 9-task-1-0/0 [default0]:
2026-03-11T12:40:37.7213598Z 9-task-1-0/0 [default0]:self = <tests.unit_tests.transformer.test_vision_cuda_graphs.TestVisionTECudaGraphHelper object at 0x7af9f25ab0b0>
2026-03-11T12:40:37.7215121Z 9-task-1-0/0 [default0]:
2026-03-11T12:40:37.7215373Z 9-task-1-0/0 [default0]: @pytest.mark.skipif(
2026-03-11T12:40:37.7215724Z 9-task-1-0/0 [default0]: not (HAVE_TE_GRAPHS and is_te_min_version("2.7.0")),
2026-03-11T12:40:37.7216239Z 9-task-1-0/0 [default0]: reason="TE CUDA graph capture requires TransformerEngine >= 2.7.0",
2026-03-11T12:40:37.7216625Z 9-task-1-0/0 [default0]: )
2026-03-11T12:40:37.7216936Z 9-task-1-0/0 [default0]: def test_create_cudagraphs_multi_microbatch(self):
2026-03-11T12:40:37.7217398Z 9-task-1-0/0 [default0]: """Verify that graphs are created per-microbatch per-layer."""
2026-03-11T12:40:37.7217811Z 9-task-1-0/0 [default0]: self.llava_model.cuda()
2026-03-11T12:40:37.7218097Z 9-task-1-0/0 [default0]: num_mb = 2
2026-03-11T12:40:37.7218443Z 9-task-1-0/0 [default0]: helper = self._make_helper(num_microbatches=num_mb)
2026-03-11T12:40:37.7218790Z 9-task-1-0/0 [default0]:
2026-03-11T12:40:37.7219056Z 9-task-1-0/0 [default0]:> helper.create_cudagraphs()
2026-03-11T12:40:37.7219338Z 9-task-1-0/0 [default0]:
2026-03-11T12:40:37.7219879Z 9-task-1-0/0 [default0]:tests/unit_tests/transformer/test_vision_cuda_graphs.py:390:
2026-03-11T12:40:37.7220550Z 9-task-1-0/0 [default0]:_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2026-03-11T12:40:37.7221039Z 9-task-1-0/0 [default0]:megatron/core/transformer/cuda_graphs.py:2751: in create_cudagraphs
2026-03-11T12:40:37.7221457Z 9-task-1-0/0 [default0]: super().create_cudagraphs()
2026-03-11T12:40:37.7221871Z 9-task-1-0/0 [default0]:megatron/core/transformer/cuda_graphs.py:2253: in create_cudagraphs
2026-03-11T12:40:37.7222368Z 9-task-1-0/0 [default0]: sample_args, kwargs = self._get_cuda_graph_input_data()
2026-03-11T12:40:37.7222893Z 9-task-1-0/0 [default0]:megatron/core/transformer/cuda_graphs.py:2197: in _get_cuda_graph_input_data
2026-03-11T12:40:37.7223382Z 9-task-1-0/0 [default0]: kwargs = get_make_graphed_callables_kwargs()
2026-03-11T12:40:37.7223877Z 9-task-1-0/0 [default0]:megatron/core/transformer/cuda_graphs.py:2143: in get_make_graphed_callables_kwargs
2026-03-11T12:40:37.7224476Z 9-task-1-0/0 [default0]: (10 - self.config.cuda_graph_warmup_steps * get_num_microbatches())
2026-03-11T12:40:37.7224953Z 9-task-1-0/0 [default0]:_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2026-03-11T12:40:37.7225303Z 9-task-1-0/0 [default0]:
2026-03-11T12:40:37.7225565Z 9-task-1-0/0 [default0]: def get_num_microbatches() -> int:
2026-03-11T12:40:37.7225916Z 9-task-1-0/0 [default0]: """Get number of microbatches."""
2026-03-11T12:40:37.7226445Z 9-task-1-0/0 [default0]:> return _GLOBAL_NUM_MICROBATCHES_CALCULATOR.get()
2026-03-11T12:40:37.7226941Z 9-task-1-0/0 [default0]:E AttributeError: 'NoneType' object has no attribute 'get'
2026-03-11T12:40:37.7227291Z 9-task-1-0/0 [default0]:
2026-03-11T12:40:37.7227613Z 9-task-1-0/0 [default0]:megatron/core/num_microbatches_calculator.py:19: AttributeError
Steps/Code to reproduce bug
https://github.com/NVIDIA/Megatron-LM/actions/runs/22952348427/job/66620424711
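The failure pattern in the traceback can be sketched without Megatron-LM or a GPU. The snippet below is a minimal stand-in, not the library's code: `FakeCalculator`, `init_calculator`, `destroy_calculator`, and `try_get` are hypothetical names chosen only to mirror the shape of `megatron/core/num_microbatches_calculator.py` as it appears in the log. It assumes the test reaches `get_num_microbatches()` before anything has initialized the module-level calculator.

```python
from typing import Optional


class FakeCalculator:
    """Hypothetical stand-in for the object stored in the module-level global."""

    def __init__(self, num_microbatches: int) -> None:
        self._num_microbatches = num_microbatches

    def get(self) -> int:
        return self._num_microbatches


# Starts out as None, like _GLOBAL_NUM_MICROBATCHES_CALCULATOR in the traceback.
_GLOBAL_CALCULATOR: Optional[FakeCalculator] = None


def get_num_microbatches() -> int:
    """Mirrors the failing call: raises AttributeError while the global is None."""
    return _GLOBAL_CALCULATOR.get()


def init_calculator(num_microbatches: int) -> None:
    """Hypothetical initializer a test fixture would call during setup."""
    global _GLOBAL_CALCULATOR
    _GLOBAL_CALCULATOR = FakeCalculator(num_microbatches)


def destroy_calculator() -> None:
    """Hypothetical teardown so state does not leak into other tests."""
    global _GLOBAL_CALCULATOR
    _GLOBAL_CALCULATOR = None


def try_get() -> str:
    """Return the microbatch count, or the error text if the global is unset."""
    try:
        return str(get_num_microbatches())
    except AttributeError as exc:
        return f"AttributeError: {exc}"
```

If this assumption holds, one possible fix is for the test's setup to initialize the calculator (the stand-in's `init_calculator(2)`) and for its teardown to reset it, so that `create_cudagraphs()` never sees an unset global. The exact initializer name and signature in Megatron-LM should be checked against the module itself.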
Expected behavior
`test_create_cudagraphs_multi_microbatch` passes: `helper.create_cudagraphs()` completes without `get_num_microbatches()` raising `AttributeError`.