Add Matrix-Vector Product example - 1D distribution #158

Draft
wants to merge 4 commits into
base: develop

Conversation

@AdRi1t AdRi1t commented Mar 26, 2025

  • Add AXPY example to demonstrate the use of KokkosComm
  • Add KokkosComm_ENABLE_EXAMPLES option (default OFF)
  • Modify KokkosComm::wait_all to bypass std::vector

Signed-off-by: Adrien Taberner <[email protected]>
@AdRi1t AdRi1t changed the title Dense Distributed Matrix-Vector product - 1D distribution example Add Matrix-Vector Product example - 1D distribution Mar 26, 2025
Signed-off-by: Adrien Taberner <[email protected]>
Member

@cedricchevalier19 cedricchevalier19 left a comment

I don't think the added wait_all is required, and even if it is, it should go in a separate PR.

{
using ExecSpace = Kokkos::DefaultExecutionSpace;
using CommSpace = KokkosComm::DefaultCommunicationSpace;
using matrix_type = Kokkos::View<double**, Kokkos::LayoutRight, ExecSpace>;
Member

Is Kokkos::LayoutRight mandatory?

I thought we could use either storage layout for the local data, even if the global matrix is row-distributed.

// Initialize A, x, y
Kokkos::parallel_for("Initialize", dim.nb_rows, KOKKOS_LAMBDA(const int i) {
  for (int j = 0; j < N; j++) {
    A(i, j) = 1.0;
Member

Perhaps use values that are not the same everywhere.

Author

The vector x is now x = (1, 2, ..., N) and the matrix A is filled as A(i,j) = j + 1.0, which means that each element of y equals the sum of squares from 1 to N. This is less trivial than before, but it can still be improved.
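
The claim above can be checked with a small serial reference. This is only a sketch in plain C++ (no Kokkos or KokkosComm); the helper names `sum_of_squares` and `reference_matvec` are hypothetical and do not come from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Closed form for 1^2 + 2^2 + ... + n^2.
double sum_of_squares(std::size_t n) {
  const double d = static_cast<double>(n);
  return d * (d + 1.0) * (2.0 * d + 1.0) / 6.0;
}

// Serial reference: uses A(i,j) = j + 1 and x(j) = j + 1 (0-based j),
// and returns y = A * x without storing A explicitly.
std::vector<double> reference_matvec(std::size_t n) {
  std::vector<double> y(n, 0.0);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      const double a_ij = static_cast<double>(j) + 1.0;
      const double x_j  = static_cast<double>(j) + 1.0;
      y[i] += a_ij * x_j;
    }
  }
  return y;
}
```

A distributed run could compare every local entry of y against `sum_of_squares(N)` as a correctness check; for large N an exact floating-point comparison may need to be replaced by a tolerance.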

RankDims current_dim = dim;

// Communication and computation steps
for (int step = 1; step < size; step++) {
Member

What is a step in this algorithm?

Author

Here a step is one round of computation and communication. Each round communicates the next chunk of the distributed x vector. Step 1 performs the computation on the chunk of x that is local to the rank.

Author

In this example, we progress diagonally, starting with the part of x that is assigned to the rank.
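
To make the notion of a step concrete, here is a serial simulation of the scheme described above, written in plain C++ rather than with KokkosComm. It rests on several assumptions that the thread does not confirm: the number of ranks divides N evenly, each rank receives the chunk held by its right neighbour, and communication happens strictly after computation (the real example overlaps the two). The function name `ring_matvec` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Simulates `size` ranks, each owning a contiguous block of rows of A and
// the matching chunk of x. Returns the assembled global y = A * x.
std::vector<double> ring_matvec(const Matrix& A, const std::vector<double>& x,
                                std::size_t size) {
  const std::size_t n = x.size();
  assert(size > 0 && n % size == 0);  // assumption: even distribution
  const std::size_t rows = n / size;
  std::vector<double> y(n, 0.0);

  // held[r] is the index of the x chunk that rank r currently holds.
  // Every rank starts with its own chunk: the diagonal block of A.
  std::vector<std::size_t> held(size);
  for (std::size_t r = 0; r < size; ++r) held[r] = r;

  for (std::size_t step = 1; step <= size; ++step) {
    // Computation: each rank multiplies its rows by the chunk it holds.
    for (std::size_t r = 0; r < size; ++r) {
      const std::size_t col0 = held[r] * rows;
      for (std::size_t i = r * rows; i < (r + 1) * rows; ++i) {
        for (std::size_t j = col0; j < col0 + rows; ++j) {
          y[i] += A[i][j] * x[j];
        }
      }
    }
    if (step == size) break;  // last step: nothing left to exchange

    // Communication: each rank receives the chunk of its right neighbour
    // (assumed direction) and hands its own chunk to the left.
    std::vector<std::size_t> next(size);
    for (std::size_t r = 0; r < size; ++r) next[r] = held[(r + 1) % size];
    held = next;
  }
  return y;
}
```

After `size` compute phases every rank has seen every chunk of x exactly once, which is why the loop in the example runs `size - 1` communication steps followed by a final compute-only step.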

// This example demonstrates how to perform a distributed matrix-vector product (A * x = y)
// using KokkosComm. The matrix A is distributed among the ranks by blocks of contiguous rows.
// Each rank owns a part of the vector x and will communicate it with other ranks step by step.
// At each step a node communicates with two other nodes to send and receive data.
Member

Why not use a collective? And can you specify which data you are talking about?

Collaborator

@dssgabriel dssgabriel left a comment

Just a quick pass, I agree with all of Cedric's remarks.

I have made additional notes regarding the "parsing" of CLI options that I find too convoluted, and the choice of integer types that looks a bit random to me. I would like to have consistent typing, e.g. always use size_t for unsigned stuff and int everywhere else. If you require specific bit widths, make it clear by using the types provided by the cstdint header.

Comment on lines 81 to 93
long N = -1;

for (int i = 0; i < argc; i++) {
  if (strcmp(argv[i], "-N") == 0 && i + 1 < argc) {
    N = std::atoi(argv[++i]);
  }
  if (strcmp(argv[i], "-h") == 0) {
    std::cout << "KokkosComm dense square matrix-vector product example \n"
              << " Usage: " << argv[0] << " [-N <size>] default size is 2^12" << std::endl;
    return 0;
  }
}
checkArgs(N);
Collaborator

This section of code looks messy and unnecessarily complex to me:

  1. Why is N a long instead of an unsigned integer since in checkArgs you check for strict positivity (N > 1)? I would use size_t.
  2. Why use C standard library functions instead of C++ std::string/string_view comparison operator? String views should be the preferred option since they do not need an allocation.
  3. Why is the for loop starting at 0 instead of 1? argv[0] is always the executable name.
  4. Why use a for loop at all since you only check for one of two possible arguments: -N or -h?
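
One way the four points above could be addressed is sketched below. It is an illustration only, not the code that ended up in the PR: `CliAction`, `CliResult`, and `parse_cli` are hypothetical names, the default of 4096 is taken from the "2^12" in the usage message, and the `n > 1` check mirrors the condition the review attributes to checkArgs. It assumes C++17 for `std::string_view` and `std::from_chars`.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <string_view>
#include <system_error>

enum class CliAction { Run, Help, Error };

struct CliResult {
  CliAction action;
  std::size_t n;  // matrix dimension; unsigned, as suggested in point 1
};

// Inspects argv[1] only (no loop, argv[0] is skipped): either "-h" or "-N <size>".
CliResult parse_cli(int argc, const char* const* argv, std::size_t default_n = 4096) {
  if (argc < 2) return {CliAction::Run, default_n};

  const std::string_view arg = argv[1];  // no allocation
  if (arg == "-h") return {CliAction::Help, default_n};

  if (arg == "-N" && argc > 2) {
    const std::string_view value = argv[2];
    const char* const last = value.data() + value.size();
    std::size_t n = 0;
    const auto [ptr, ec] = std::from_chars(value.data(), last, n);
    // Accept only if the whole string was consumed and the size is usable.
    if (ec == std::errc{} && ptr == last && n > 1) return {CliAction::Run, n};
  }
  return {CliAction::Error, default_n};
}
```

In `main`, the caller would print the usage text on `Help` or `Error` and otherwise continue with `result.n`. Unlike `std::atoi`, `std::from_chars` reports malformed or negative input instead of silently returning 0.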

using CommSpace = KokkosComm::DefaultCommunicationSpace;
using matrix_type = Kokkos::View<double**, Kokkos::LayoutRight, ExecSpace>;
using vector_type = Kokkos::View<double*, Kokkos::LayoutRight, ExecSpace>;
using kk_pair = Kokkos::pair<long, long>;
Collaborator

Why use long instead of a plain int here? If you absolutely need a 64-bit wide type, I would prefer to have int64_t explicitly.


// Compute with current data while communication may happen in the background
Kokkos::parallel_for("MatrixVectorProduct", dim.nb_rows, KOKKOS_LAMBDA(const int i) {
  for (unsigned int j = 0; j < current_dim.nb_rows; j++) {
Collaborator

Why use an unsigned int in this loop but not in the other ones?
I would suggest size_t instead.


// Last step
Kokkos::parallel_for("MatrixVectorProduct tail", dim.nb_rows, KOKKOS_LAMBDA(const int i) {
  for (unsigned int j = 0; j < current_dim.nb_rows; j++) {
Collaborator

Same thing here for j.

Comment on lines 32 to 34
unsigned int nb_rows;
unsigned int row_start;
unsigned int row_end;
Collaborator

I suggest size_t instead of unsigned int here. If you absolutely need 32-bit wide types, use uint32_t.
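
For illustration, the struct with consistent `size_t` fields might look like the sketch below. The field names come from the snippet under review and the struct name `RankDims` from the loop code earlier in the thread. Everything else is an assumption: the helper `compute_dims` is hypothetical, `row_end` is treated as exclusive, and leftover rows are given to the lowest ranks, none of which the PR confirms.

```cpp
#include <cassert>
#include <cstddef>

struct RankDims {
  std::size_t nb_rows;
  std::size_t row_start;
  std::size_t row_end;  // assumption: one past the last owned row
};

// Block-row distribution of n rows over `size` ranks.
// The first (n % size) ranks each get one extra row.
RankDims compute_dims(std::size_t n, std::size_t rank, std::size_t size) {
  const std::size_t base  = n / size;
  const std::size_t extra = n % size;
  const std::size_t nb_rows   = base + (rank < extra ? 1 : 0);
  const std::size_t row_start = rank * base + (rank < extra ? rank : extra);
  return {nb_rows, row_start, row_start + nb_rows};
}
```

MPI ranks and sizes are plain `int`, so a caller would convert them once at the boundary (for example with `static_cast<std::size_t>(rank)`) and use `size_t` for all index arithmetic afterwards.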

Signed-off-by: Adrien Taberner <[email protected]>
    std::cout << "KokkosComm dense square matrix-vector product example \n"
              << " Usage: " << argv[0] << " [-N <size>] default size is 2^12" << std::endl;
    return 0;
  } else if (arg == "-N" && argc > 2) {
    N = static_cast<int>(std::stoi(argv[2]));
Collaborator

I don't think the static_cast is necessary here.
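
The cast is redundant because `std::stoi` already returns `int`. The short sketch below checks that at compile time and also shows the error handling `std::stoi` calls for, since it throws on malformed or out-of-range input. The helper name `parse_int_or` is hypothetical and not part of the PR.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <type_traits>

// std::stoi returns int, so static_cast<int>(std::stoi(...)) changes nothing.
static_assert(std::is_same_v<decltype(std::stoi(std::string{})), int>,
              "std::stoi returns int");

// Returns the parsed value, or `fallback` if the text is not a valid int.
// std::stoi throws std::invalid_argument or std::out_of_range, both of
// which derive from std::logic_error.
int parse_int_or(const std::string& text, int fallback) {
  try {
    return std::stoi(text);
  } catch (const std::logic_error&) {
    return fallback;
  }
}
```

Note that `std::stoi` accepts trailing garbage ("12abc" parses as 12); passing its optional `pos` argument lets a caller detect unconsumed characters.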

@AdRi1t AdRi1t marked this pull request as draft April 7, 2025 13:48
@AdRi1t AdRi1t requested a review from cedricchevalier19 April 7, 2025 13:48

3 participants