[BOO] Limit threads used for numeric verification by keshavvinayak01 · Pull Request #1346 · iree-org/iree-turbine

keshavvinayak01 · 2026-04-16T05:58:32Z

Limit BLAS threads to 1 inside compute_cpu_reference() to prevent OpenBLAS from exceeding its compiled-in thread limit (128) on high-core-count machines, which causes a segfault.

Fixes #1336

Limit BLAS to a single thread inside compute_cpu_reference(), which exists for correctness not performance. Without this, OpenBLAS tries to spawn as many threads as there are CPU cores and exceeds its compiled-in limit (typically 128), causing a segfault.

Fixes iree-org#1336

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
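For readers who want to see the shape of such a change, here is a minimal sketch of a "limit, compute, restore" wrapper. It is not the code from this PR: `limit_threads` and `FakePool` are hypothetical names, and the getter/setter pair is injected so the example runs without PyTorch or OpenBLAS installed.

```python
from contextlib import contextmanager


@contextmanager
def limit_threads(get_threads, set_threads, limit=1):
    """Temporarily cap a thread pool and restore the old size on exit."""
    previous = get_threads()
    set_threads(limit)
    try:
        yield
    finally:
        # Restore the previous size even if the wrapped computation raises.
        set_threads(previous)


class FakePool:
    """Hypothetical stand-in for a real BLAS/PyTorch thread pool."""

    def __init__(self, size):
        self.size = size

    def get(self):
        return self.size

    def set(self, n):
        self.size = n


pool = FakePool(192)  # pretend the machine exposes 192 cores
with limit_threads(pool.get, pool.set, limit=1):
    inside = pool.size  # the reference computation would run here
after = pool.size
```

With PyTorch, one candidate pairing for the two callables would be `torch.get_num_threads` and `torch.set_num_threads`; whether that setting actually reaches OpenBLAS's own pool depends on how the wheel was built, so treat it as an assumption to verify rather than a guarantee.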

keshavvinayak01 · 2026-04-16T09:11:00Z

This should fix the RDNA failure @yash-amd

rkayaith · 2026-04-16T12:09:17Z

1 thread would be pretty slow. Can you just use min(num_threads, 128)?
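A minimal sketch of the capping suggested here, assuming the limit really is 128 (the PR description only says "typically 128"). `capped_thread_count` and `OPENBLAS_COMPILED_LIMIT` are hypothetical names, not anything from iree-turbine or OpenBLAS.

```python
import os

# Assumed value: the PR description calls the compiled-in limit "typically 128".
OPENBLAS_COMPILED_LIMIT = 128


def capped_thread_count(available=None, limit=OPENBLAS_COMPILED_LIMIT):
    """Return min(available, limit), but never less than one thread."""
    if available is None:
        # os.cpu_count() may return None on unusual platforms.
        available = os.cpu_count() or 1
    return max(1, min(available, limit))
```

On a 192-core machine this yields 128, and on a 16-core machine it leaves the count at 16, so smaller hosts are unaffected.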

rkayaith · 2026-04-16T12:16:39Z

Also I noticed in the issue it was reported that a nightly TheRock version of pytorch was used, not a stable release. Could you check if stable versions are hitting this as well? If not, I think it'd be better to work around this locally for now, and report a bug to the appropriate repo.

keshavvinayak01 · 2026-04-16T12:46:53Z

> Also I noticed in the issue it was reported that a nightly TheRock version of pytorch was used, not a stable release. Could you check if stable versions are hitting this as well? If not, I think it'd be better to work around this locally for now, and report a bug to the appropriate repo.

@yash-amd Please check.

yash-amd · 2026-04-16T13:36:34Z

> This should fix the RDNA failure @yash-amd

yeah i tested on the runner earlier with your branch
RDNA4 logs: https://github.com/nod-ai/amd-shark-ai-reports/blob/main/boo/boo-custom-runs-gfx120X/2026-04-16_12-02/rdna4_attention_shapes_miopen_iree.csv

mi355 logs: https://github.com/nod-ai/amd-shark-ai-reports/blob/main/boo/boo-custom-runs/2026-04-16_11-56/attention_shapes_miopen_iree.csv

yash-amd · 2026-04-16T13:41:23Z

> Also I noticed in the issue it was reported that a nightly TheRock version of pytorch was used, not a stable release. Could you check if stable versions are hitting this as well? If not, I think it'd be better to work around this locally for now, and report a bug to the appropriate repo.

i have asked @deedongala from ossci team to check for the rocm version installed on the mi355 nod-ai runner, because rocminfo on that runner shows "Marketing Name" as "AMD Radeon Graphics" instead of "AMD Instinct MI355X", which is what we see on the other conductor machines like 10-09.
@deedongala any update on this?

rkayaith · 2026-04-16T17:36:38Z

> i have asked @deedongala from ossci team to check for the rocm version installed on the mi355 nod-ai runner

In the meantime, can you try a test run on the CI with pytorch 2.10 installed with --index-url https://download.pytorch.org/whl/rocm7.1, and without this fix, to see if the error still occurs?

yash-amd · 2026-04-17T05:46:18Z

> > i have asked @deedongala from ossci team to check for the rocm version installed on the mi355 nod-ai runner
>
> In the meantime, can you try a test run on the CI with pytorch 2.10 installed with --index-url https://download.pytorch.org/whl/rocm7.1, and without this fix, to see if the error still occurs?

yeah i tested this again and it still prints "AMD Radeon Graphics" instead of "AMD Instinct MI355" in the Setup Environment job linked below, using:

  pip install "torch>=2.5,<=2.10.0" --index-url https://download.pytorch.org/whl/rocm7.1
  python3 -c "import torch; props = torch.cuda.get_device_properties(0); print(props.name)"

https://github.com/nod-ai/amd-shark-ai/actions/runs/24546996337/job/71764661591?pr=2894#step:6:82

rkayaith · 2026-04-17T16:42:37Z

This PR is about addressing a different issue (OpenBLAS warning: precompiled NUM_THREADS exceeded segfault when running tests), do you still see that error with the stable pytorch?

yash-amd · 2026-04-17T16:47:13Z

> This PR is about addressing a different issue (OpenBLAS warning: precompiled NUM_THREADS exceeded segfault when running tests), do you still see that error with the stable pytorch?

yeah, i was not talking about OpenBLAS warning issue, that is solved.
I was talking about this https://xilinx.slack.com/archives/C08JKR35LRY/p1774394516655799

rkayaith · 2026-04-17T16:48:42Z

> yeah, i was not talking about OpenBLAS warning issue, that is solved.

so this PR isn't necessary anymore? To clarify, I was asking if stable pytorch without this fix still hits the OpenBLAS issue.

yash-amd · 2026-04-17T16:55:29Z

> > yeah, i was not talking about OpenBLAS warning issue, that is solved.
>
> so this PR isn't necessary anymore? To clarify, I was asking if stable pytorch without this fix still hits the OpenBLAS issue.

oh ok, i haven't checked with stable pytorch whether this issue (OpenBLAS warning) still shows up.

yash-amd · 2026-04-17T17:06:04Z

> > yeah, i was not talking about OpenBLAS warning issue, that is solved.
>
> so this PR isn't necessary anymore? To clarify, I was asking if stable pytorch without this fix still hits the OpenBLAS issue.

they are working when using the stable version on ToM iree-turbine:
for rdna4: https://github.com/nod-ai/amd-shark-ai-reports/blob/main/boo/boo-custom-runs-gfx120X/2026-04-17_17-51/rdna4_attention_shapes_miopen_iree.csv

for mi355: https://github.com/nod-ai/amd-shark-ai-reports/blob/main/boo/boo-custom-runs/2026-04-17_17-12/attention_shapes_miopen_iree.csv

rkayaith · 2026-04-17T19:26:44Z

okay I think it'll be best to work around this in CI for now by setting the OPENBLAS_NUM_THREADS env var, and report this as an issue to TheRock so they can fix this.
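As a rough illustration of that workaround, the sketch below launches a child process with `OPENBLAS_NUM_THREADS` set. OpenBLAS reads this variable when the library is loaded, so it has to be in the environment before the benchmark process imports anything BLAS-backed. `run_with_openblas_cap` is a hypothetical helper, and the value 64 is an arbitrary example; a CI workflow would more likely set the variable in the job's environment block.

```python
import os
import subprocess
import sys


def run_with_openblas_cap(cmd, num_threads=1):
    """Run cmd in a child process with OPENBLAS_NUM_THREADS set."""
    env = dict(os.environ)  # copy so the parent environment is untouched
    env["OPENBLAS_NUM_THREADS"] = str(num_threads)
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


# Demonstration: the child only echoes the variable it received.
result = run_with_openblas_cap(
    [sys.executable, "-c", "import os; print(os.environ['OPENBLAS_NUM_THREADS'])"],
    num_threads=64,
)
seen = result.stdout.strip()
```

Setting the variable per process keeps the cap out of the library code, which matches the intent of working around the problem in CI while the underlying build issue is reported upstream.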

yash-amd · 2026-04-20T09:21:34Z

> okay I think it'll be best to work around this in CI for now by setting the OPENBLAS_NUM_THREADS env var, and report this as an issue to TheRock so they can fix this.

also one thing to note: with the stable pytorch version we are getting N.A values for six configs with iree as the backend, as can be seen in this file: https://github.com/nod-ai/amd-shark-ai-reports/blob/main/boo/boo-custom-runs-gfx120X/2026-04-17_17-51/rdna4_attention_shapes_miopen_iree.csv

yash-amd · 2026-04-20T09:46:23Z

> > okay I think it'll be best to work around this in CI for now by setting the OPENBLAS_NUM_THREADS env var, and report this as an issue to TheRock so they can fix this.
>
> also one thing to note: with the stable pytorch version we are getting N.A values for six configs with iree as the backend, as can be seen in this file: https://github.com/nod-ai/amd-shark-ai-reports/blob/main/boo/boo-custom-runs-gfx120X/2026-04-17_17-51/rdna4_attention_shapes_miopen_iree.csv

this might be because of the following error on rdna4:

  torch.OutOfMemoryError: HIP out of memory. Tried to allocate 6.00 GiB. GPU 0 has a total capacity of 15.92 GiB of which 3.70 GiB is free. Of the allocated memory 3.19 GiB is allocated by PyTorch, and 1.81 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_ALLOC_CONF=expandable_segments:True to avoid fragmentation.

logs: https://github.com/nod-ai/amd-shark-ai/actions/runs/24578284442/job/71869254564#step:15:120

i tried setting the flag PYTORCH_ALLOC_CONF=expandable_segments:True as well, but i am still getting the same "torch.OutOfMemoryError: HIP out of memory".
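A back-of-the-envelope check of the figures in that message may explain why the allocator flag made no difference. The sketch assumes the reported numbers are additive and that, in the best case, all reserved-but-unallocated memory could be reused for the new request; both are simplifying assumptions, not statements about PyTorch internals.

```python
# Figures copied from the error message above, in GiB.
total_capacity = 15.92
free_on_device = 3.70
allocated_by_pytorch = 3.19
reserved_unallocated = 1.81
requested = 6.00

# Best case: every reserved-but-unallocated byte is reusable (assumption).
best_case_available = free_on_device + reserved_unallocated
fits = requested <= best_case_available

# Memory in use that the message does not attribute to PyTorch.
used_outside_pytorch = (
    total_capacity - free_on_device - allocated_by_pytorch - reserved_unallocated
)

print(f"best case available: {best_case_available:.2f} GiB, fits: {fits}")
print(f"in use outside PyTorch's accounting: {used_outside_pytorch:.2f} GiB")
```

Under these assumptions the 6.00 GiB request exceeds what is available even with no fragmentation at all, which would be consistent with `expandable_segments:True` not helping. The remaining in-use memory is not explained by the message itself.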

keshavvinayak01 requested review from rkayaith and zjgarvey as code owners April 16, 2026 05:58

keshavvinayak01 changed the title ~~Fix segfault in --verify-numerics on machines with >128 CPU cores~~ [BOO] Limit threads used for numeric verification Apr 16, 2026


3 participants
