Describe the bug
The test compares a Helion bf16×int16 GEMM kernel against a PyTorch reference implementation, but on ROCm the two produce slightly different numerical results, and the difference exceeds the comparison tolerance.
To Reproduce
Pull PR #794 and remove the @skipIfRocm decorator from test_bf16xint16 in test/test_examples.py. Run the test in a ROCm environment.
Failed CI job: https://github.com/pytorch/helion/actions/runs/18213179448/job/51857480348
Expected behavior
The test should pass on ROCm, since both implementations perform the same mathematical operation:
- Helion kernel: converts int16 -> bf16 inside the Triton kernel, then performs the GEMM
- PyTorch reference: converts int16 -> bf16 with PyTorch, then performs torch.matmul
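
As a rough illustration of how two mathematically equivalent paths can still trip a tolerance check, the sketch below emulates bf16 rounding in pure Python and applies the usual `|actual - expected| <= atol + rtol * |expected|` test. It is not a diagnosis of the ROCm failure: the helper names (`to_bf16`, `is_close`), the sample accumulator values, and the tolerances are all hypothetical, and the test's actual tolerance is not assumed here.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round-to-nearest-even).

    Hypothetical helper for illustration; NaN/inf handling is omitted.
    """
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bf16 keeps the top 16 bits; add a bias so that truncation rounds
    # to nearest, with ties going to the even bf16 value.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (((bits + bias) >> 16) << 16) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def is_close(actual: float, expected: float, rtol: float, atol: float) -> bool:
    """Elementwise closeness test: |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Assumed scenario: two backends accumulate the same dot product and end up
# with accumulators that differ by only 0.2 before the final cast to bf16.
acc_a = 1001.9  # hypothetical accumulator value on backend A
acc_b = 1002.1  # hypothetical accumulator value on backend B

out_a = to_bf16(acc_a)  # 1000.0
out_b = to_bf16(acc_b)  # 1004.0 (bf16 values are 4 apart in [512, 1024))

print(out_a, out_b)
print(is_close(out_b, out_a, rtol=1e-3, atol=1e-3))  # False: 4 > 1.001
print(is_close(out_b, out_a, rtol=1e-2, atol=1e-2))  # True: 4 <= 10.01
```

The point of the sketch is that bf16 carries only 8 significant bits, so a tiny difference before the final cast can become a full bf16 step afterwards. Whether that is what happens on ROCm here would need to be confirmed against the failing CI output.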