Open
Description
Test environment: CTK 13.1, torch = 2.9.1+cu130, cuda-tile = 1.0.0, single B200 GPU
With the parameter set (num_tokens, hidden_size, moe_intermediate_size, n_experts, top_k) = (128, 4096, 2048, 16, 4) in tests/ops/test_moe.py, and with torch.manual_seed(0) added at line 101, the test fails with mismatched results:
(Earlier output omitted)
> assert passed, f"\n{failed_msgs}"
^^^^^^
E AssertionError:
E *** OUTPUT 0 DID NOT MATCH THE REFERENCE (rtol=0.1, atol=0.1) ***
E allclose: False
E matched: 523466 / 524288 [99.84%]
E ref range: -1.1700e+02 : 1.2300e+02
E test range: -1.1650e+02 : 1.2300e+02
E |ref| range: 0.0000e+00 : 1.2300e+02
E |test| range: 0.0000e+00 : 1.2300e+02
E max absolute difference: 1.0000e+00
E max relative change: 4.3000e+01
E max max mean change: 2.0000e+00
E max arith mean change: 5.1400e+02
E shape: torch.Size([128, 4096]) stride: (4096, 1) dtype: torch.bfloat16
E mismatched indices:tensor([[ 0, 1842],
E [ 0, 2071],
E [ 1, 675],
E ...,
E [ 127, 1728],
E [ 127, 2207],
E [ 127, 2628]])
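To help read the numbers above, here is a minimal sketch. It assumes the test helper applies the same elementwise criterion as torch.allclose, |test - ref| <= atol + rtol * |ref|; the helper's actual check is not shown in this report, and the name within_tol is hypothetical. Under that assumption, an absolute difference of 1.0 can only fail where |ref| is below 9, so the mismatches would sit at small-magnitude outputs rather than near the extremes of the output range.

```python
def within_tol(test: float, ref: float, rtol: float = 0.1, atol: float = 0.1) -> bool:
    # Assumed criterion (same form as torch.allclose); the real helper may differ.
    return abs(test - ref) <= atol + rtol * abs(ref)

# Counts taken from the failure output above.
total = 128 * 4096          # elements in the [128, 4096] output
matched = 523466
mismatched = total - matched
print(mismatched, f"{100 * mismatched / total:.2f}%")

# A difference of 1.0 is tolerated at large |ref| but not at small |ref|.
print(within_tol(117.0, 116.0))  # tolerance here is 0.1 + 11.6
print(within_tol(2.0, 1.0))      # tolerance here is 0.1 + 0.1
```

With rtol = atol = 0.1, the tolerance grows with |ref|, which is why a maximum absolute difference of 1.0 is compatible with most large-magnitude elements still matching.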
With the same parameter set, the test passes if the dtype is changed to float16.
It also passes with a smaller hidden_size or a smaller moe_intermediate_size.
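As background for the dtype observation, the sketch below compares the spacing between adjacent representable values in bfloat16 (7 explicit mantissa bits) and float16 (10 explicit mantissa bits). It does not identify the root cause; it only shows how much coarser bfloat16 is at the magnitudes seen in the output. The helper name spacing is hypothetical, and only normal (non-subnormal, non-overflowing) values are handled.

```python
import math

BF16_MANTISSA_BITS = 7   # bfloat16: 8 exponent bits, 7 explicit mantissa bits
FP16_MANTISSA_BITS = 10  # float16: 5 exponent bits, 10 explicit mantissa bits

def spacing(value: float, explicit_mantissa_bits: int) -> float:
    """Gap between adjacent representable values near `value` (normal range only)."""
    if value == 0.0:
        raise ValueError("spacing at zero depends on subnormals, which are not modelled")
    # frexp gives value = m * 2**e with 0.5 <= |m| < 1, so floor(log2|value|) = e - 1.
    _, e = math.frexp(abs(value))
    return 2.0 ** (e - 1 - explicit_mantissa_bits)

for v in (1.0, 8.0, 117.0):
    print(v, spacing(v, BF16_MANTISSA_BITS), spacing(v, FP16_MANTISSA_BITS))
```

Near 117 the bfloat16 spacing is 0.5, which is consistent with the reported test range minimum of -1.1650e+02, while float16 resolves steps of 0.0625 at the same magnitude.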