Multicolor (independent set) reordering capability on reference executor#2006
Multicolor (independent set) reordering capability on reference executor#2006Slaedr wants to merge 7 commits into
Conversation
|
|
||
| multicolor_reorder( | ||
| matrix.get(), color_ptrs_, permutation_->get_permutation(), | ||
| inv_permutation_ ? inv_permutation_->get_permutation() : nullptr); |
There was a problem hiding this comment.
In the reference kernel atleast, you always seem to compute the inverse permutation, so this would not work.
There was a problem hiding this comment.
I'll keep the public option, but throw NOT_IMPLEMENTED if the user requests construct_inverse_permutation to be false.
yhmtsai
left a comment
There was a problem hiding this comment.
you also need to modify test_install.cpp
|
@yhmtsai None of the other reorder classes are there in test_install.cpp. Should I still add a check for building a Multicolor object there? |
pratikvn
left a comment
There was a problem hiding this comment.
some more comments. But mostly looks good. One general question from my side: Does this reorder setup suit your workflow ? Meaning can you make use of this in your Gauss-Seidel ?
| } | ||
|
|
||
|
|
||
| TEST(MatrixGenerator, GeneratesLaplace3d27pointMatrixData) |
There was a problem hiding this comment.
This test is a little elaborate and hard to read. Maybe you can have a .mtx file generated from MATLAB/Python with the required input, and read it and just compare element by element ?
There was a problem hiding this comment.
Are you sure would not rather have a function to generate a 3D stencil matrix of whatever size you want? If so, I guess I would keep mtx files for 4x4 and 4x4x4 grids.
There was a problem hiding this comment.
There's another consideration: for the omp and cuda/hip tests, I'd rather use a larger matrix. For that reason, I would prefer to have programmatic generation. I could try to go over the code again to make it clearer.
| * The first entry is always 0, since the first color always starts at 0. | ||
| * The last entry stores the total number of rows. | ||
| */ | ||
| std::vector<index_type> get_color_pointers() const { return color_ptrs_; } |
There was a problem hiding this comment.
It might make sense to have this as a gko::array rather than a std::vector.
There was a problem hiding this comment.
This makes it explicit that the color_ptrs is always on the host. Do you expect a situation in which someone needs this on the device instead?
There was a problem hiding this comment.
@pratikvn I can change the std::vector to gko::array if you say that is preferable, even though I can't (currently) think of a use case where an application will want the color pointers on the device.
There was a problem hiding this comment.
I think keeping all internal arrays as gko::array objects is better than having some as std::vector. But if you need some methods of std::vector, then it should be fine.
| // assert(permutation.end() == permutation.begin() + | ||
| // color_ptrs[num_colors]); |
There was a problem hiding this comment.
Looks like this was left over from where I initially implemented this a while back. Since permutation is a raw pointer here, it does not make sense. I'll just remove it.
| // assert(permutation.end() == permutation.begin() + | ||
| // color_ptrs[num_colors]); |
There was a problem hiding this comment.
I will uncomment this part for checking?
There was a problem hiding this comment.
Looks like this was left over from where I initially implemented this a while back. Since permutation is a raw pointer here, it does not make sense. I'll just remove it.
| private: | ||
| std::shared_ptr<PermutationMatrix> permutation_; | ||
| std::shared_ptr<PermutationMatrix> inv_permutation_; | ||
| std::vector<index_type> color_ptrs_; |
There was a problem hiding this comment.
should color_ptrs always on host? you use color_ptrs to set up permutation. how do you plan on device?
There was a problem hiding this comment.
For using this in preconditioners etc., we will need it on the host (to control kernel launches and parameters). So even if the multicolor cuda/hip kernel computes this on the device, it will need to be copied to the host.
There was a problem hiding this comment.
okay. then it makes sense to put it in std::vector
| ASSERT_EQ(this->mc_factory->get_executor(), this->exec); | ||
| } | ||
|
|
||
| TYPED_TEST(Multicolor, GeneratesCorrectOrderingWithCsrInput) |
There was a problem hiding this comment.
I think the followings should be in the reference test?
There was a problem hiding this comment.
Since this testing a particular generate, I think it should be in core.
|
@Slaedr , does the approach of having Multicolor as a reordering work for the goal of having this within GaussSeidel ? |
|
@pratikvn Yes. I'm not sure if it's best offered as an option within the Gauss-Seidel class(es), since the matrix needs to be modified to obey the ordering. The best workflow is probably to reorder the matrix and then generate GaussSeidel on the reordered matrix, passing in the color pointers. If the user does not supply the color pointers, I guess we could compute and store a reordered copy of the matrix in the generate step. |
|
@Slaedr , can you please check that you can use reordering as a wrapper for Gauss-Seidel ? If so, then I dont have any other issues with this PR. |
| if (parameters_.construct_inverse_permutation) { | ||
| inv_permutation_ = PermutationMatrix::create(exec, size); | ||
| } else { | ||
| GKO_NOT_IMPLEMENTED; | ||
| } |
There was a problem hiding this comment.
If not constructing an inverse permutation is not allowed, then maybe just remove that factory parameter ?
| const auto local_nrows = num_vertices; | ||
|
|
||
| std::vector<int> color(local_nrows, -1); |
There was a problem hiding this comment.
If we already know the size beforehand, then we should allocate the vector outside the kernel in core.
| * The first entry is always 0, since the first color always starts at 0. | ||
| * The last entry stores the total number of rows. | ||
| */ | ||
| std::vector<index_type> get_color_pointers() const { return color_ptrs_; } |
There was a problem hiding this comment.
I think keeping all internal arrays as gko::array objects is better than having some as std::vector. But if you need some methods of std::vector, then it should be fine.
The aim is to have a parallel GPU-capable multicolor ordering using the JPL algorithm. This PR only has the sequential reference implementation and corresponding tests.
Currently, CSR and SparsityCSR inputs are supported.
Also adds a "cores per SM" entry for NVIDIA GB10 GPU.