[BOO] Limit threads used for numeric verification #1346
Limit BLAS to a single thread inside compute_cpu_reference(), which exists for correctness not performance. Without this, OpenBLAS tries to spawn as many threads as there are CPU cores and exceeds its compiled-in limit (typically 128), causing a segfault.

Fixes iree-org#1336

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
This should fix the RDNA failure @yash-amd
1 thread would be pretty slow, can you just use
Also I noticed in the issue it was reported that a nightly TheRock version of pytorch was used, not a stable release. Could you check if stable versions are hitting this as well? If not, I think it'd be better to work around this locally for now, and report a bug to the appropriate repo.
@yash-amd Please check. |
yeah i tested on the runner earlier with your branch |
i have asked @deedongala from the ossci team to check the rocm version installed on the mi355 nod-ai runner, since rocminfo on that runner shows "Marketing Name" as "AMD Radeon Graphics" instead of "AMD Instinct MI355X", which is what we see on the other conductor machines like 10-09.
In the meantime, can you try a test run on the CI with pytorch 2.10 installed with |
yeah i tested this again and it gives the same output as "AMD Radeon Graphics" instead of "AMD Instinct MI355" in the Setup Environment job below using https://github.com/nod-ai/amd-shark-ai/actions/runs/24546996337/job/71764661591?pr=2894#step:6:82 |
This PR is about addressing a different issue ( |
yeah, i was not talking about OpenBLAS warning issue, that is solved. |
so this PR isn't necessary anymore? To clarify, I was asking if stable pytorch without this fix still hits the OpenBLAS issue. |
oh ok, i haven't checked with stable pytorch whether this issue (OpenBLAS warning) occurs or not.
they are working when using stable version on tom iree-turbine : |
okay I think it'll be best to work around this in CI for now by setting the |
also one thing to notice, using the stable pytorch version, we are getting N.A values for six configs with iree as backend. As can be seen in this file https://github.com/nod-ai/amd-shark-ai-reports/blob/main/boo/boo-custom-runs-gfx120X/2026-04-17_17-51/rdna4_attention_shapes_miopen_iree.csv |
this might be because of for rdna4 in the logs https://github.com/nod-ai/amd-shark-ai/actions/runs/24578284442/job/71869254564#step:15:120 i tried setting the flag PYTORCH_ALLOC_CONF=expandable_segments:True as well but still getting the same "torch.OutOfMemoryError: HIP out of memory". |
Limit BLAS threads to 1 inside compute_cpu_reference() to prevent OpenBLAS from exceeding its compiled-in thread limit (128) on high-core-count machines, which causes a segfault.
Fixes #1336