Add cuDSS support #1999
Conversation
Force-pushed from cedd1b8 to f53796c.
Force-pushed from f53796c to 7be0b38.
cuDSS is now only a part of extensions, but linked to Ginkgo, and also available through the JSON config.
Force-pushed from 25d9de6 to 73b4c43.
```cpp
template <typename ValueType, typename IndexType>
void CuDss<ValueType, IndexType>::refactorize(
```
Cudss?
`CuDss` will become `cu_dss` if we follow the existing naming scheme in Ginkgo.
Force-pushed from d9d7e01 to b7589f6.
```cpp
auto mut_b = const_cast<std::remove_const_t<
    std::remove_pointer_t<decltype(dense_b)>>*>(dense_b);
for (size_type j = 0; j < nrhs; ++j) {
    mut_b->create_submatrix(span{0, nrows}, span{j, j + 1})
```
I am surprised that we do not have a const version.
```cmake
# Absorb into the umbrella ginkgo target (same pattern as ginkgo_cuda etc.)
# extensions/ is add_subdirectory'd after core/, so the ginkgo target exists.
target_link_libraries(ginkgo PUBLIC ginkgo_cudss)
```
Should we add this to the ginkgo target directly? It sounds like an optional feature, yet this way we always ship it with Ginkgo.
Another approach is to only provide the ginkgo_cudss target as an extension; users add it when necessary.
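From the user's side, the opt-in alternative described above could look roughly like this (a sketch only; `my_app` is a placeholder, and the exported extension target name may differ):

```cmake
# Hypothetical opt-in usage: ginkgo stays cuDSS-free, and a user who
# needs the cuDSS-backed solver links the extension target explicitly.
find_package(Ginkgo REQUIRED)
target_link_libraries(my_app PRIVATE Ginkgo::ginkgo ginkgo_cudss)
```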
```cpp
d_wide_output->create_submatrix(gko::span{0, nrows}, gko::span{1, 4});

ref_solver->apply(this->input, this->output);

const auto input_stride_before = strided_input->get_stride();
const auto output_stride_before = strided_output->get_stride();

cudss_solver->apply(strided_input, strided_output);
```

Suggested change:

```cpp
d_wide_output->create_submatrix(gko::span{0, nrows}, gko::span{1, 4});
const auto input_stride_before = strided_input->get_stride();
const auto output_stride_before = strided_output->get_stride();
ref_solver->apply(this->input, this->output);
cudss_solver->apply(strided_input, strided_output);
```
```cpp
ValueType* x_buf = nullptr;

if (b_strided) {
    cudaMalloc(&b_buf, nrows * sizeof(ValueType));
```
Instead of raw cudaMalloc, use Ginkgo's dense allocation.
```cpp
cudaMemcpy2D(
    b_buf, sizeof(ValueType), dense_b->get_const_values(),
    dense_b->get_stride() * sizeof(ValueType),
    sizeof(ValueType), nrows, cudaMemcpyDeviceToDevice);
```
I would also prefer to use a Ginkgo view copy. At the very least, this needs to be the async version on the executor's stream (cudaMemcpy2DAsync).
```cpp
}
if (x_strided) {
    cudaMalloc(&x_buf, nrows * sizeof(ValueType));
    cudaMemset(x_buf, 0, nrows * sizeof(ValueType));
```
Co-authored-by: Yu-Hsiang Tsai <yhmtsai@gmail.com>
yhmtsai left a comment:
The test needs to follow the AAA pattern and compare against the known solution, not against ref_solver.
As our direct solvers are still under development, and we need direct solvers for MICROCARD, I think it would be good to have cuDSS as an option when using `solver::Direct`. I am open to interface suggestions.

My main constraints were:

- ~~Add it to `solver::Direct` as an additional option, and not as a separate class.~~ Now extracted into extensions, but still available for the user as it is linked as a separate library to ginkgo, and also available through the JSON config.
- `refactorize()`
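The JSON-config route mentioned above could look roughly like this (a sketch only; the `"type"` string is a hypothetical key, and the actual name registered by the extension may differ):

```json
{
    "type": "solver::CuDss",
    "value_type": "float64",
    "index_type": "int32"
}
```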