Skip to content

[FEAT] MatMul API #871

Merged
lnm-at-work merged 10 commits into
mainfrom
matmul-api
Jun 17, 2025
Merged

[FEAT] MatMul API #871
lnm-at-work merged 10 commits into
mainfrom
matmul-api

Conversation

@lnm-at-work

@lnm-at-work lnm-at-work commented Apr 22, 2025

Copy link
Copy Markdown
Contributor

cuda-backend-branch: lisa/matmul


Comment thread icicle/backend/cpu/src/field/cpu_vec_ops.cpp Outdated

@yshekel yshekel left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Structure looks good for generalized mat-mul
I think not what we need in labrador (its Rq elements there in NTT domain) but we can start from that, @LeonHibnik ?

@lnm-at-work lnm-at-work reopened this May 15, 2025
Comment thread icicle/tests/test_matrix_api.h Outdated
Comment thread icicle/tests/test_matrix_api.h Outdated
Comment thread icicle/tests/test_matrix_api.h
Comment thread icicle/backend/cpu/src/field/cpu_matrix_ops.cpp Outdated
Comment thread icicle/backend/cpu/src/field/cpu_matrix_ops.cpp Outdated
Comment thread icicle/include/icicle/vec_ops.h Outdated
Comment thread icicle/backend/cpu/src/field/cpu_matrix_ops.cpp Outdated

@yshekel yshekel left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work, I left a few comments.

Comment thread icicle/backend/cpu/src/field/cpu_matrix_ops.cpp Outdated
Comment thread icicle/backend/cpu/src/field/cpu_matrix_ops.cpp Outdated
Comment thread icicle/tests/test_matrix_api.h Outdated
Comment thread icicle/tests/test_matrix_api.h Outdated

// Process each matrix A in the batch with the same B matrix
for (size_t i = 0; i < batch_size; i++) {
ICICLE_CHECK(matrix_mult(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need a batch in this test? The API doesn't support it (and it is not required)

Comment thread icicle/backend/cpu/src/field/cpu_matrix_ops.cpp Outdated

@yshekel yshekel left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, great tests but two problems:
(1) missing cpu implementation
(2) missing tests and code for PolyRing type

@lnm-at-work lnm-at-work marked this pull request as ready for review June 10, 2025 23:50

@LeonHibnik LeonHibnik left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, left some comments.
I think we should move the matmul and transpose implementations from vecops to matops, open to a discussion

Comment thread icicle/backend/cpu/src/field/cpu_matrix_ops.cpp Outdated
Comment thread icicle/backend/cpu/src/field/cpu_matrix_ops.cpp Outdated
// Naive algorithm using taskflow

// Divide the problem among workers
const int nof_workers = 10; // get_nof_workers(config);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the nof_workers hardcoded?

for (uint32_t j = 0; j < nof_cols_b; j++) {
// Initialize result element
uint64_t a_idx = config.columns_batch ? (i * nof_cols_a + 0) * stride : i * nof_cols_a + 0;
uint64_t b_idx = config.columns_batch ? (0 * nof_cols_b + j) * stride : 0 * nof_cols_b + j;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is nof_cols_b here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

legibility, to keep it analogous to the loop body below

for (uint32_t i = row_start; i < row_end; i++) {
for (uint32_t j = 0; j < nof_cols_b; j++) {
// Initialize result element
uint64_t a_idx = config.columns_batch ? (i * nof_cols_a + 0) * stride : i * nof_cols_a + 0;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the + 0 and add a comment explaining it

Comment thread icicle/include/icicle/backend/vec_ops_backend.h
Comment thread icicle/include/icicle/backend/vec_ops_backend.h
Comment thread icicle/include/icicle/vec_ops.h Outdated
Comment thread icicle/src/matrix_ops.cpp Outdated
Comment thread icicle/src/rings/polyring_vec_ops.cpp Outdated
Comment thread icicle/tests/test_matrix_api.h
Comment thread icicle/tests/test_matrix_api.h Outdated
Comment thread icicle/tests/test_matrix_api.h Outdated
Comment thread icicle/tests/test_matrix_api.h Outdated
@lnm-at-work lnm-at-work requested a review from LeonHibnik June 16, 2025 16:53
@lnm-at-work lnm-at-work merged commit 7b79c44 into main Jun 17, 2025
115 of 302 checks passed
@lnm-at-work lnm-at-work deleted the matmul-api branch June 17, 2025 17:21
LeonHibnik pushed a commit that referenced this pull request Jun 30, 2025
Co-authored-by: LNM <lisa@ingonyama.com>
Co-authored-by: yshekel <yshekel@gmail.com>
@STARRGAXE

Copy link
Copy Markdown

Nice

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants