-
Notifications
You must be signed in to change notification settings - Fork 11
Add Matrix-Vector Product example - 1D distribution #158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
- Add AXPY example to demonstrate the use of KokkosComm - Add KokkosComm_ENABLE_EXAMPLES option (default OFF) - Modify KokkosComm::wait_all to bypass std::vector Signed-off-by: Adrien Taberner <[email protected]>
Signed-off-by: Adrien Taberner <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the added wait_all
is not required, and even so, it should be in a separate PR.
examples/01_mvp/01_mvp.cpp
Outdated
{ | ||
using ExecSpace = Kokkos::DefaultExecutionSpace; | ||
using CommSpace = KokkosComm::DefaultCommunicationSpace; | ||
using matrix_type = Kokkos::View<double**, Kokkos::LayoutRight, ExecSpace>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is Kokkos::LayoutRight
mandatory?
I thought that we can have either storage for the local data even if the global matrix is row distributed.
examples/01_mvp/01_mvp.cpp
Outdated
// Initialize A, x, y | ||
Kokkos::parallel_for("Initialize", dim.nb_rows, KOKKOS_LAMBDA(const int i) { | ||
for (int j = 0; j < N; j++) { | ||
A(i, j) = 1.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps put other value than the same everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The vector x now corresponds to x = (1,2, ..., N)
and the matrix A is filled as A(i,j) = j + 1.0
which means that each element of y
is equal to the sum of squares from 1 to N. This is now less trivial, but it can be improved.
RankDims current_dim = dim; | ||
|
||
// Communication and computation steps | ||
for (int step = 1; step < size; step++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is a step in this algorithm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it's a step of calculations and communications. There's a communication for the next chunk of the distributed x vector. Step 1 performs the calculation on the vector x local to the rank.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this example, we progress diagonally, starting with the part of x that is assigned to the rank.
examples/01_mvp/01_mvp.cpp
Outdated
// This example demonstrates how to perform a distributed matrix-vector product (A * x = y) | ||
// using KokkosComm. The matrix A is distributed among the ranks by blocks of contiguous rows. | ||
// Each rank owns a part of the vector x and will communicate it with other ranks step by step. | ||
// At each step a node communicates with two other nodes to send and receive data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not doing collective? And can you precise the data you are talking about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a quick pass, I agree with all of Cedric's remarks.
I have made additional notes regarding the "parsing" of CLI options that I find too convoluted, and the choice of integer types that looks a bit random to me. I would like to have consistent typing, e.g. always use size_t
for unsigned stuff and int
everywhere else. If you require specific bit widths, make it clear by using the types provided by the cstdint
header.
examples/01_mvp/01_mvp.cpp
Outdated
long N = -1; | ||
|
||
for (int i = 0; i < argc; i++) { | ||
if (strcmp(argv[i], "-N") == 0 && i + 1 < argc) { | ||
N = std::atoi(argv[++i]); | ||
} | ||
if (strcmp(argv[i], "-h") == 0) { | ||
std::cout << "KokkosComm dense square matrix-vector product example \n" | ||
<< " Usage: " << argv[0] << " [-N <size>] default size is 2^12" << std::endl; | ||
return 0; | ||
} | ||
} | ||
checkArgs(N); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section of code looks messy and unnecessarily complex to me:
- Why is
N
along
instead of an unsigned integer since incheckArgs
you check for strict positivity (N > 1
)? I would usesize_t
. - Why use C standard library functions instead of C++
std::string
/string_view
comparison operator? String views should be the preferred option since they do not need an allocation. - Why is the
for
loop starting at 0 instead of 1?argv[0]
is always the executable name. - Why use a
for
loop at all since you only check for one of two possible arguments:-N
or-h
?
examples/01_mvp/01_mvp.cpp
Outdated
using CommSpace = KokkosComm::DefaultCommunicationSpace; | ||
using matrix_type = Kokkos::View<double**, Kokkos::LayoutRight, ExecSpace>; | ||
using vector_type = Kokkos::View<double*, Kokkos::LayoutRight, ExecSpace>; | ||
using kk_pair = Kokkos::pair<long, long>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use long
instead of a plain int
here? If you absolutely need a 64-bit wide type, I would prefer to have int64_t
explicitly.
examples/01_mvp/01_mvp.cpp
Outdated
|
||
// Compute with current data while communication may happen in the background | ||
Kokkos::parallel_for("MatrixVectorProduct", dim.nb_rows, KOKKOS_LAMBDA(const int i) { | ||
for (unsigned int j = 0; j < current_dim.nb_rows; j++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use an unsigned int
in this loop but not in the other ones?
I would suggest size_t
instead.
examples/01_mvp/01_mvp.cpp
Outdated
|
||
// Last step | ||
Kokkos::parallel_for("MatrixVectorProduct tail", dim.nb_rows, KOKKOS_LAMBDA(const int i) { | ||
for (unsigned int j = 0; j < current_dim.nb_rows; j++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing here for j
.
examples/01_mvp/01_mvp.cpp
Outdated
unsigned int nb_rows; | ||
unsigned int row_start; | ||
unsigned int row_end; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest size_t
instead of unsigned int
here. If you absolutely need 32-bit wide types, use uint32_t
.
Signed-off-by: Adrien Taberner <[email protected]>
examples/01_mvp/01_mvp.cpp
Outdated
std::cout << "KokkosComm dense square matrix-vector product example \n" | ||
<< " Usage: " << argv[0] << " [-N <size>] default size is 2^12" << std::endl; | ||
return 0; | ||
} else if (arg == "-N" && argc > 2) { | ||
N = static_cast<int>(std::stoi(argv[2])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the static_cast
is necessary here.
Signed-off-by: Adrien Taberner <[email protected]>
Modify KokkosComm::wait_all to bypass std::vector