Skip to content

ROCM-21372: clamp rocSOLVER reset_info launch block counts#7914

Draft
Copilot wants to merge 3 commits into
developfrom
copilot/rocm-21372-clamp-thread-blocks
Draft

ROCM-21372: clamp rocSOLVER reset_info launch block counts#7914
Copilot wants to merge 3 commits into
developfrom
copilot/rocm-21372-clamp-thread-blocks

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented May 30, 2026

Some rocSOLVER paths compute reset_info grid dimensions directly from problem size or batch count. A few of those launches can produce invalid HIP grid/workgroup configurations on AMD GPUs when the derived block count falls outside the valid range.

  • What changed

    • Added a shared helper in lib_device_helpers.hpp to compute reset_info block counts with bounded thread/block dimensions:
      • thread count clamped to [32, 1024]
      • block count clamped to [1, 64*1024 - 1]
    • Switched rocSOLVER reset_info launch sites to use the helper instead of open-coded ceil-division formulas.
  • Where this is applied

    • lapack_device_functions.hpp (rocsolver_lacn2_template)
    • rocSOLVER LAPACK, auxiliary, and refactor paths that launch reset_info
    • existing launch thread sizes remain unchanged; only the derived block count is normalized
  • Effect

    • avoids zero-block launches
    • avoids oversized grid dimensions for reset_info
    • keeps launch sizing consistent across rocSOLVER call sites
  • Example

    auto const nblocks = reset_info_nblocks(batch_count, BS1);
    ROCSOLVER_LAUNCH_KERNEL(reset_info, dim3(nblocks, 1, 1), dim3(BS1, 1, 1), 0, stream,
                            info, batch_count, 0);
Original prompt

Create a pull request in ROCm/rocm-libraries against the develop branch to address ROCM-21372: ensure kernel launches that use reset_info do not use an invalid workgroup/grid configuration.

Context:

  • The concern is that some launch configurations for reset_info may produce an invalid workgroup size in HIP on AMD GPUs.
  • Relevant files include:
    • projects/rocsolver/library/src/include/lib_device_helpers.hpp (contains the reset_info kernel)
    • projects/rocsolver/library/src/include/lapack_device_functions.hpp (contains at least one caller in rocsolver_lacn2_template that computes blocks for reset_info)
  • The repository branch to target is develop.

Requested change:

  • Add an adjustment/helper so the number of thread blocks used for launches is always clamped to a valid range.
  • The user suggested a helper similar to:
template<typename I>
static inline I ceildiv_block(I const batch_count, I const nthreads_per_block) {
    I const MinThreads = 32;
    I const MaxThreads = 1024;
    I const nthreads = std::max(MinThreads, std::min(MaxThreads, nthreads_per_block));
    I const blocks = (batch_count <= 0) ? 1 : (batch_count - 1) / nthreads + 1;
    I const MinBlocks = 1;
    I const MaxBlocks = 64 * 1024 - 1;
    return std::max(MinBlocks, std::min(MaxBlocks, blocks));
}

and usage like:

auto const nthreads_per_block = BS1;
auto const nblocks = ceildiv_block(batch_count, nthreads_per_block);
ROCSOLVER_LAUNCH_KERNEL(reset_info, dim3(1,1,nblocks), dim3(nthreads_per_block,1,1), 0, stream,
                        info, batch_count, 0);

Implementation guidance:

  • Inspect all reset_info launch sites in the repo, especially rocSOLVER paths, and update them to use a valid, clamped block count computation.
  • Prefer introducing a reusable helper in an appropriate shared header if that fits project style, rather than duplicating logic at each call site.
  • Keep the change minimal and consistent with existing coding patterns.
  • Ensure the launch dimensions passed to HIP are valid and avoid zero blocks.
  • If there are existing utilities for ceil division or launch-dimension clamping, reuse them instead of introducing redundant helpers.
  • Update only what is necessary to fix the invalid launch-configuration concern.

Deliverables:

  • Code changes implementing the launch-dimension adjustment.
  • A pull request with a concise summary referencing ROCM-21372.

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: Suppose you are an expert GPU programmer for ROCM HIP on AMD GPU, there is a concern from jira ticket ROCM-21372 that there may be invalid workgroup size in launch configuration for kernel reset_info.

Please create a new PR to add an adjustment to the number of thread blocks to be a valid value.

You may consider something like the following

template
static inline I ceildiv_block( I const batch_count, I const nthreads_per_block) {
I const MinThreads = 32; I const MaxThreads = 1024;
I const nthreads = std::max( MinThreads, std::min( MaxThreads, nthreads_per_block));
I const blocks = (batch_count <= 0) ? 1 : (batch_count-1)/nthreads + 1;
I const MinBlocks = 1; I const MaxBlocks = 64 * 1024 - 1;
return std::max( MinBlocks, std::min( MaxBlocks, blocks ));
}

then use as something like

auto const nthreads_per_block = BS1;
auto const nblocks = ceildiv( batch_count, nthreads_per_block);

ROCSOLVER_LAUNCH_KERNEL( reset_info, dim3(1,1,nblocks), dim3(nthreads_per_block,1,1),0,stream, info, batch_count, 0);

Assistant: I’m waiting for your confirmation on the pull request creation request, and once you approve it I’ll continue with the change.

This pull request was created from Copilot chat.

@assistant-librarian assistant-librarian Bot added the external contribution Code contribution from users community.. label May 30, 2026
Copilot AI changed the title [WIP] Add helper to clamp thread blocks for reset_info kernel ROCM-21372: clamp rocSOLVER reset_info launch block counts May 30, 2026
Copilot AI requested a review from EdDAzevedo May 30, 2026 17:55
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Addresses ROCM-21372 by introducing a shared reset_info_nblocks helper that clamps derived grid x-dimension to [1, 65535] (and threads to [32, 1024]) for all reset_info kernel launches in rocSOLVER, eliminating the possibility of zero-block or oversized grid configurations. Because reset_info is implemented as a strided 1D loop, the clamp is correctness-preserving.

Changes:

  • Added reset_info_nblocks helper (and <algorithm> include) in lib_device_helpers.hpp.
  • Replaced open-coded ceil-division block counts at ~37 reset_info launch sites across LAPACK, auxiliary, and refactor paths.
  • Removed the per-launch hipGetDeviceProperties clamp in rocsolver_lacn2_template in favor of the shared helper.

Reviewed changes

Copilot reviewed 49 out of 49 changed files in this pull request and generated no comments.

Show a summary per file
File Description
library/src/include/lib_device_helpers.hpp Adds reset_info_nblocks helper and <algorithm> include.
library/src/include/lapack_device_functions.hpp lacn2 switches from device-property clamp to helper.
library/src/auxiliary/rocauxiliary_bdsqr.hpp Replaces batch_count/BS1 + 1 with reset_info_nblocks(batch_count + 1, BS1).
library/src/auxiliary/rocauxiliary_bdsvdx.hpp Uses helper for both info and Dtgk resets.
library/src/auxiliary/rocauxiliary_gecon.hpp Uses helper for rcond reset.
library/src/auxiliary/rocauxiliary_lasyf.hpp Uses helper for kb/info reset.
library/src/auxiliary/rocauxiliary_stebz.hpp Uses helper for reset grid.
library/src/auxiliary/rocauxiliary_stedc.hpp Uses helper for reset grid.
library/src/auxiliary/rocauxiliary_stedcj.hpp Uses helper for reset grid.
library/src/auxiliary/rocauxiliary_stedcx.hpp Uses helper for reset grid.
library/src/auxiliary/rocauxiliary_stein.hpp Uses helper for reset grid.
library/src/auxiliary/rocauxiliary_steqr.hpp Uses helper for reset grid.
library/src/auxiliary/rocauxiliary_sterf.hpp Uses helper for reset grid.
library/src/lapack/roclapack_gels.hpp / _outofplace.hpp Uses helper for reset grid.
library/src/lapack/roclapack_gesdd.hpp Uses helper in quick-return path.
library/src/lapack/roclapack_gesv.hpp / _outofplace.hpp Uses helper for reset grid.
library/src/lapack/roclapack_gesvdj.hpp / _notransv.hpp Uses helper in quick-return path.
library/src/lapack/roclapack_gesvdx.hpp / _notransv.hpp Uses helper for info/nsv reset.
library/src/lapack/roclapack_getf2.hpp Uses helper with static_cast<I>(256).
library/src/lapack/roclapack_getrf.hpp Uses helper in quick-return path.
library/src/lapack/roclapack_getri.hpp / _outofplace.hpp Uses helper in quick-return path.
library/src/lapack/roclapack_posv.hpp Uses helper for reset grid.
library/src/lapack/roclapack_potf2.hpp / roclapack_potrf.hpp Uses helper for reset grid.
library/src/lapack/roclapack_potri.hpp Uses helper in quick-return path.
library/src/lapack/roclapack_syev_heev.hpp and syevd/_dx/_dx_inplace/_dj/_jx_jx variants Uses helper for reset grid.
library/src/lapack/roclapack_syevj_heevj.hpp Uses helper at two sites; + 1 preserves prior batch_count/BS1 + 1 semantics.
library/src/lapack/roclapack_sygv_hegv.hpp and sygvd/_dj/_dx/_dx_inplace/_j/_x variants Uses helper for reset grid.
library/src/lapack/roclapack_sytf2.hpp / roclapack_sytrf.hpp Uses helper in quick-return path.
library/src/lapack/roclapack_trtri.hpp Uses helper for info reset.
library/src/refact/rocrefact_csrrf_splitlu.hpp Uses helper for ptrU/ptrL/... resets.
library/src/refact/rocrefact_csrrf_sumlu.hpp Uses helper with n + 1 to match prior n/BS1 + 1 semantics.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 98.36066% with 1 line in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...csolver/library/src/include/lib_device_helpers.hpp 90.00% 0 Missing and 1 partial ⚠️

❌ Your project status has failed because the head coverage (77.83%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #7914   +/-   ##
========================================
  Coverage    61.43%   61.43%           
========================================
  Files         2091     2091           
  Lines       358852   358857    +5     
  Branches     54266    54265    -1     
========================================
+ Hits        220451   220457    +6     
  Misses      119672   119672           
+ Partials     18729    18728    -1     
Flag Coverage Δ *Carryforward flag
TensileLite 27.41% <ø> (ø) Carriedforward from 9c82e2e
hipBLAS 90.65% <ø> (ø) Carriedforward from 9c82e2e
hipBLASLt 41.23% <ø> (ø) Carriedforward from 9c82e2e
hipCUB 82.21% <ø> (ø) Carriedforward from 9c82e2e
hipDNN 86.59% <ø> (ø) Carriedforward from 9c82e2e
hipFFT 51.59% <ø> (ø) Carriedforward from 9c82e2e
hipRAND 76.12% <ø> (ø) Carriedforward from 9c82e2e
hipSOLVER 69.24% <ø> (ø) Carriedforward from 9c82e2e
hipSPARSE 86.54% <ø> (ø) Carriedforward from 9c82e2e
rocBLAS 48.09% <ø> (ø) Carriedforward from 9c82e2e
rocFFT 45.24% <ø> (ø) Carriedforward from 9c82e2e
rocRAND 57.03% <ø> (ø) Carriedforward from 9c82e2e
rocSOLVER 77.83% <98.36%> (+0.01%) ⬆️
rocSPARSE 72.67% <ø> (ø) Carriedforward from 9c82e2e

*This pull request uses carry forward flags. Click here to find out more.

Files with missing lines Coverage Δ
...olver/library/src/auxiliary/rocauxiliary_bdsqr.hpp 76.12% <100.00%> (ø)
...lver/library/src/auxiliary/rocauxiliary_bdsvdx.hpp 86.36% <100.00%> (ø)
...olver/library/src/auxiliary/rocauxiliary_gecon.hpp 85.60% <100.00%> (ø)
...olver/library/src/auxiliary/rocauxiliary_lasyf.hpp 73.91% <100.00%> (ø)
...olver/library/src/auxiliary/rocauxiliary_stebz.hpp 81.58% <100.00%> (ø)
...olver/library/src/auxiliary/rocauxiliary_stedc.hpp 55.56% <100.00%> (ø)
...lver/library/src/auxiliary/rocauxiliary_stedcj.hpp 64.10% <100.00%> (ø)
...lver/library/src/auxiliary/rocauxiliary_stedcx.hpp 66.47% <100.00%> (ø)
...olver/library/src/auxiliary/rocauxiliary_stein.hpp 82.22% <100.00%> (ø)
...olver/library/src/auxiliary/rocauxiliary_steqr.hpp 85.03% <100.00%> (ø)
... and 39 more
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

external contribution Code contribution from users community.. project: rocsolver

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants