
Add cuDSS support #1999

Merged
pratikvn merged 19 commits into develop from feat/add-cudss
May 5, 2026

Member

@pratikvn pratikvn commented Apr 5, 2026

Our own direct solvers are still under development, and we need direct solvers for MICROCARD, so I think it would be good to offer cuDSS as an option when using solver::Direct.

I am open to interface suggestions.

My main constraints were:

  1. Add it to solver::Direct as an additional option rather than as a separate class. It has since been extracted into extensions, but it is still available to users because it is built as a separate library that is linked to Ginkgo, and it is also available through the JSON config.
  2. cuDSS does not provide access to its factorizations, so factorization and solve are bundled. Refactorization on the same sparsity pattern is supported through refactorize().
  3. AFAIK, cuDSS needs column-major, contiguous dense vectors, which was a bit annoying, so for multiple right-hand sides we need to unwrap the vectors and solve column by column.
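The column-by-column handling in the third constraint can be sketched without any library, assuming a row-major dense layout with a stride between rows. Every name below is hypothetical: `solve_contiguous` is a placeholder diagonal solve standing in for a solver that accepts only one contiguous right-hand side, and neither function is Ginkgo or cuDSS API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a solver that only accepts one contiguous
// right-hand side at a time. Here it solves D x = b for a diagonal D.
void solve_contiguous(const std::vector<double>& diag,
                      const std::vector<double>& b, std::vector<double>& x)
{
    for (std::size_t i = 0; i < diag.size(); ++i) {
        x[i] = b[i] / diag[i];
    }
}

// Solve for every column of a row-major, strided multi-vector by packing
// each column into a contiguous buffer, solving, and scattering the result.
void solve_multi_rhs(const std::vector<double>& diag, const double* b,
                     std::size_t b_stride, double* x, std::size_t x_stride,
                     std::size_t nrows, std::size_t nrhs)
{
    std::vector<double> b_col(nrows);
    std::vector<double> x_col(nrows);
    for (std::size_t j = 0; j < nrhs; ++j) {
        for (std::size_t i = 0; i < nrows; ++i) {
            b_col[i] = b[i * b_stride + j];  // gather column j
        }
        solve_contiguous(diag, b_col, x_col);
        for (std::size_t i = 0; i < nrows; ++i) {
            x[i * x_stride + j] = x_col[i];  // scatter column j
        }
    }
}
```

The two scratch buffers are allocated once and reused for every column, so the extra memory is two vectors of length nrows regardless of the number of right-hand sides.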

@pratikvn pratikvn self-assigned this Apr 5, 2026
@pratikvn pratikvn added mod:cuda This is related to the CUDA module. type:solver This is related to the solvers 1:ST:need-feedback The PR is somewhat ready but feedback on a blocking topic is required before a proper review. labels Apr 5, 2026
@ginkgo-bot ginkgo-bot added reg:build This is related to the build system. reg:testing This is related to testing. mod:all This touches all Ginkgo modules. labels Apr 5, 2026
Comment thread core/solver/direct.cpp Outdated
Comment thread cuda/CMakeLists.txt Outdated
Comment thread include/ginkgo/core/solver/direct.hpp Outdated
@pratikvn pratikvn changed the base branch from feat/row-scatterer to develop April 15, 2026 11:56
@pratikvn pratikvn requested a review from yhmtsai April 15, 2026 11:56
@pratikvn pratikvn added 1:ST:ready-for-review This PR is ready for review and removed 1:ST:need-feedback The PR is somewhat ready but feedback on a blocking topic is required before a proper review. labels Apr 15, 2026
@pratikvn
Member Author

cuDSS is now only part of extensions, but it is linked to Ginkgo and is also available through the JSON config.

Member

@yhmtsai yhmtsai left a comment

Not fully finished yet.

Comment thread extensions/cuda/solver/cudss.cpp Outdated


template <typename ValueType, typename IndexType>
void CuDss<ValueType, IndexType>::refactorize(
Member

Cudss?
If we follow the naming scheme used elsewhere in Ginkgo, CuDss would become cu_dss.

Comment thread extensions/cuda/CMakeLists.txt Outdated
Comment thread examples/CMakeLists.txt Outdated
Comment thread extensions/test/cuda/CMakeLists.txt Outdated
Comment thread extensions/test/cuda/CMakeLists.txt Outdated
Comment thread extensions/cuda/solver/cudss.cpp
Comment thread extensions/cuda/solver/cudss.cpp Outdated
Comment thread cmake/Modules/FindcuDSS.cmake Outdated
Comment thread cmake/Modules/FindcuDSS.cmake Outdated
Comment thread extensions/cuda/solver/cudss.cpp
Comment thread include/ginkgo/extensions/cuda/solver/cudss.hpp Outdated
Comment thread extensions/test/cuda/solver/cudss.cpp Outdated
Comment thread extensions/test/cuda/solver/cudss.cpp Outdated
Comment thread extensions/test/cuda/solver/cudss.cpp Outdated
Comment thread extensions/test/cuda/solver/cudss.cpp Outdated
Comment thread extensions/test/cuda/solver/cudss.cpp
Comment thread extensions/test/cuda/solver/cudss.cpp
Comment thread cmake/Modules/FindcuDSS.cmake Outdated
Comment thread extensions/cuda/solver/cudss.cpp
Comment thread extensions/cuda/solver/cudss.cpp Outdated
Comment thread extensions/cuda/solver/cudss.cpp Outdated
Comment thread extensions/cuda/solver/cudss.cpp Outdated
Comment thread extensions/cuda/solver/cudss.cpp Outdated
Comment thread extensions/cuda/solver/cudss.cpp Outdated
auto mut_b = const_cast<std::remove_const_t<
    std::remove_pointer_t<decltype(dense_b)>>*>(dense_b);
for (size_type j = 0; j < nrhs; ++j) {
    mut_b->create_submatrix(span{0, nrows}, span{j, j + 1})
Member

I am surprised that we do not have a const version.

Comment thread include/ginkgo/extensions/cuda/solver/cudss.hpp Outdated
Comment thread extensions/test/cuda/solver/cudss.cpp Outdated
Comment thread extensions/test/cuda/solver/cudss.cpp Outdated
Comment thread extensions/test/cuda/solver/cudss.cpp Outdated
@pratikvn pratikvn requested a review from yhmtsai April 21, 2026 11:15
Comment thread extensions/cuda/CMakeLists.txt Outdated
Comment on lines +25 to +27
# Absorb into the umbrella ginkgo target (same pattern as ginkgo_cuda etc.)
# extensions/ is add_subdirectory'd after core/, so the ginkgo target exists.
target_link_libraries(ginkgo PUBLIC ginkgo_cudss)
Member

Should we add this to the ginkgo target directly?
It sounds like an optional feature, yet this way we always include it with Ginkgo.
Another approach is to provide only the ginkgo_cudss target as an extension, which users add when necessary.

Comment thread extensions/test/cuda/solver/cudss.cpp
Comment on lines +213 to +220
d_wide_output->create_submatrix(gko::span{0, nrows}, gko::span{1, 4});

ref_solver->apply(this->input, this->output);

const auto input_stride_before = strided_input->get_stride();
const auto output_stride_before = strided_output->get_stride();

cudss_solver->apply(strided_input, strided_output);
Member

Suggested change
- d_wide_output->create_submatrix(gko::span{0, nrows}, gko::span{1, 4});
- ref_solver->apply(this->input, this->output);
- const auto input_stride_before = strided_input->get_stride();
- const auto output_stride_before = strided_output->get_stride();
- cudss_solver->apply(strided_input, strided_output);
+ d_wide_output->create_submatrix(gko::span{0, nrows}, gko::span{1, 4});
+ const auto input_stride_before = strided_input->get_stride();
+ const auto output_stride_before = strided_output->get_stride();
+ ref_solver->apply(this->input, this->output);
+ cudss_solver->apply(strided_input, strided_output);

Comment thread extensions/test/cuda/solver/cudss.cpp
Comment thread extensions/test/cuda/solver/cudss.cpp
Comment thread extensions/test/cuda/solver/cudss.cpp Outdated
Comment thread extensions/cuda/solver/cudss.cpp Outdated
ValueType* x_buf = nullptr;

if (b_strided) {
    cudaMalloc(&b_buf, nrows * sizeof(ValueType));
Member

Instead of cudaMalloc, use Ginkgo's dense allocation.

Comment thread extensions/cuda/solver/cudss.cpp Outdated
Comment on lines +294 to +297
cudaMemcpy2D(
    b_buf, sizeof(ValueType), dense_b->get_const_values(),
    dense_b->get_stride() * sizeof(ValueType),
    sizeof(ValueType), nrows, cudaMemcpyDeviceToDevice);
Member

I would also prefer to use a Ginkgo view copy. At the very least, this needs to be the async version with a stream.
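For readers unfamiliar with the pitched-copy arguments in the snippet under review, the following is a host-only model of what such a copy does when the width is a single element: it gathers one column of a strided, row-major matrix into a contiguous buffer. `pitched_copy` and `pack_column` are hypothetical helpers written for illustration; they use neither the CUDA runtime nor Ginkgo. In CUDA, the stream-ordered counterpart of cudaMemcpy2D is cudaMemcpy2DAsync, which takes an additional stream argument.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Host-side model of a pitched copy: `height` rows of `width` bytes each,
// where consecutive rows are `spitch` bytes apart in src and `dpitch` bytes
// apart in dst.
void pitched_copy(void* dst, std::size_t dpitch, const void* src,
                  std::size_t spitch, std::size_t width, std::size_t height)
{
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t row = 0; row < height; ++row) {
        std::memcpy(d + row * dpitch, s + row * spitch, width);
    }
}

// Pack column `col` of a row-major matrix with row stride `stride` into a
// contiguous buffer, using the same argument pattern as the reviewed call:
// destination pitch and width are one element, source pitch is one row.
std::vector<double> pack_column(const std::vector<double>& values,
                                std::size_t stride, std::size_t nrows,
                                std::size_t col)
{
    std::vector<double> buf(nrows);
    pitched_copy(buf.data(), sizeof(double), values.data() + col,
                 stride * sizeof(double), sizeof(double), nrows);
    return buf;
}
```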

Comment thread extensions/cuda/solver/cudss.cpp Outdated
}
if (x_strided) {
    cudaMalloc(&x_buf, nrows * sizeof(ValueType));
    cudaMemset(x_buf, 0, nrows * sizeof(ValueType));
Member

Use a dense fill instead.

@pratikvn pratikvn requested a review from yhmtsai May 4, 2026 10:21
Member

@yhmtsai yhmtsai left a comment

The test needs to follow the AAA (Arrange-Act-Assert) pattern and compare against the solution, not ref_solver.
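A minimal sketch of the requested test shape, assuming a hypothetical 2x2 solver (`solve_2x2`, Cramer's rule) in place of the cuDSS wrapper; it is not the test from this PR. The solution is chosen first and the right-hand side is derived from it, so the assertion checks against known values rather than the output of a second solver.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec2 = std::array<double, 2>;
using Mat2 = std::array<Vec2, 2>;

// Hypothetical solver under test: Cramer's rule for a 2x2 system.
Vec2 solve_2x2(const Mat2& a, const Vec2& b)
{
    const double det = a[0][0] * a[1][1] - a[0][1] * a[1][0];
    return {(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det};
}

// Returns true if the solver reproduces the known solution.
bool solver_reproduces_known_solution()
{
    // Arrange: choose the solution first, then derive b = A * x.
    const Mat2 a{{{4.0, 1.0}, {2.0, 3.0}}};
    const Vec2 solution{1.0, -2.0};
    const Vec2 b{a[0][0] * solution[0] + a[0][1] * solution[1],
                 a[1][0] * solution[0] + a[1][1] * solution[1]};

    // Act: a single call to the code under test.
    const Vec2 x = solve_2x2(a, b);

    // Assert: compare against the known solution, not another solver.
    const double tol = 1e-12;
    return std::abs(x[0] - solution[0]) < tol &&
           std::abs(x[1] - solution[1]) < tol;
}
```

Keeping all setup in the Arrange step also means queries such as reading strides happen before the call under test, which matches the reordering suggested earlier in this thread.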

Comment thread extensions/test/cuda/solver/cudss.cpp
@pratikvn pratikvn requested a review from yhmtsai May 4, 2026 12:53
Member

@yhmtsai yhmtsai left a comment

LGTM

@pratikvn pratikvn added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review labels May 4, 2026
@pratikvn pratikvn merged commit 8e9859e into develop May 5, 2026
14 of 21 checks passed
@pratikvn pratikvn deleted the feat/add-cudss branch May 5, 2026 08:06
Labels

1:ST:ready-to-merge This PR is ready to merge. mod:all This touches all Ginkgo modules. mod:cuda This is related to the CUDA module. reg:build This is related to the build system. reg:testing This is related to testing. type:solver This is related to the solvers

4 participants