Potential numerical accuracy issue in the fused_moe implementation

Test environment: CUDA Toolkit (CTK) 13.1, torch 2.9.1+cu130, cuda-tile 1.0.0, single B200 GPU

After setting the parameters `(num_tokens, hidden_size, moe_intermediate_size, n_experts, top_k)` to `(128, 4096, 2048, 16, 4)` in [tests/ops/test_moe.py](https://github.com/NVIDIA/TileGym/blob/58385288a38515ef8c7906751eaf32aabb25ace0/tests/ops/test_moe.py#L68) and adding `torch.manual_seed(0)` at line 101, the test fails with mismatched results:

```text
(Earlier output omitted)
>       assert passed, f"\n{failed_msgs}"
               ^^^^^^
E       AssertionError:
E               *** OUTPUT 0 DID NOT MATCH THE REFERENCE (rtol=0.1, atol=0.1) ***
E                       allclose: False
E                       matched: 523466 / 524288 [99.84%]
E                       ref range:    -1.1700e+02 :  1.2300e+02
E                       test range:   -1.1650e+02 :  1.2300e+02
E                       |ref| range:   0.0000e+00 :  1.2300e+02
E                       |test| range:  0.0000e+00 :  1.2300e+02
E                       max absolute difference:  1.0000e+00
E                       max relative change:      4.3000e+01
E                       max max mean change:      2.0000e+00
E                       max arith mean change:    5.1400e+02
E                       shape: torch.Size([128, 4096]) stride: (4096, 1) dtype: torch.bfloat16
E                       mismatched indices:tensor([[   0, 1842],
E               [   0, 2071],
E               [   1,  675],
E               ...,
E               [ 127, 1728],
E               [ 127, 2207],
E               [ 127, 2628]])
```
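To read the mismatch count, it helps to spell out how the tolerance is applied. The following is a minimal sketch, assuming the test harness uses the same elementwise criterion as `torch.allclose`, i.e. `|test - ref| <= atol + rtol * |ref|`. The helper name `within_tolerance` is hypothetical, and the harness in the repository may apply a different rule.

```python
def within_tolerance(test: float, ref: float, rtol: float = 0.1, atol: float = 0.1) -> bool:
    """Elementwise criterion documented for torch.allclose (assumed here, not
    confirmed for the test harness): |test - ref| <= atol + rtol * |ref|."""
    return abs(test - ref) <= atol + rtol * abs(ref)


# An absolute difference of 1.0 passes near the top of the reported range,
# because the allowed error there is 0.1 + 0.1 * 123 = 12.4.
print(within_tolerance(122.0, 123.0))

# The same absolute difference fails when the reference value is small,
# because the allowed error is only 0.1 + 0.1 * 0.5 = 0.15.
print(within_tolerance(1.5, 0.5))
```

Under this assumption, and given the reported maximum absolute difference of 1.0, every mismatched element would have `|ref| < 9`, which is consistent with the large reported relative change.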

With the same parameter set, the test passes if the dtype is changed to float16.
The test also passes with a smaller `hidden_size` or a smaller `moe_intermediate_size`.
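One possible factor behind the dtype sensitivity (not confirmed as the cause here) is mantissa width: bfloat16 stores 7 explicit mantissa bits, while float16 stores 10. The sketch below compares the spacing between adjacent representable values in the two formats. The helper `ulp` is hypothetical and only handles values in the normal range.

```python
import math

BF16_MANTISSA_BITS = 7   # explicit mantissa bits in bfloat16
FP16_MANTISSA_BITS = 10  # explicit mantissa bits in float16


def ulp(x: float, mantissa_bits: int) -> float:
    """Spacing between adjacent representable values near x for a binary
    floating-point format with the given number of explicit mantissa bits.
    Only valid for nonzero x in the format's normal range."""
    exponent = math.floor(math.log2(abs(x)))
    return 2.0 ** (exponent - mantissa_bits)


# Compare the spacing of the two formats at a few magnitudes
# that fall inside the output range reported above.
for value in (1.0, 8.0, 100.0):
    bf16 = ulp(value, BF16_MANTISSA_BITS)
    fp16 = ulp(value, FP16_MANTISSA_BITS)
    print(f"value={value}: bf16 spacing={bf16}, fp16 spacing={fp16}")
```

At every magnitude the bfloat16 spacing is 8 times coarser than the float16 spacing, so any rounding of intermediate results to bfloat16 loses more precision per step. Whether that accounts for the mismatches in this test would need to be checked against the kernel's actual accumulation dtype.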
