Skip to content

Fix GEMMT transforming the input array B in some complex cases #5143

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 21, 2025
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 14 additions & 0 deletions interface/gemmt.c
Original file line number Diff line number Diff line change
Expand Up @@ -688,5 +688,19 @@ void CNAME(enum CBLAS_ORDER order, enum CBLAS_UPLO Uplo,

IDEBUG_END;

/* transform B back if necessary */

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martin-frbg Not sure if I'm interpreting the overall code correctly, but shouldn't B not be written to at all?

While GEMMT is not part of standard BLAS and only exists as an "extension", it does have a definition within MKL, where B is marked as "const". In other words, writing to B should never happen within GEMMT (ie. no in-place operations on B).

https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-c/2025-0/cblas-gemmt.html

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically the promise is only that the inputs will be unchanged on exit, IMHO what happens behind closed doors is implementation-defined. And the current implementation is inefficient and needs to be redone eventually. It was only a quick, GEMV-based hack to provide any support for GEMMT at all when the Reference implementation did not yet have it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that it's better to have something rather than nothing. Looking purely at the implementation side, writing to B would still be ill-advised as it's promised as "const" by MKL. One possible case is that B is shared with another thread (say in a non-OpenBLAS function), which expects it unchanged. There may also be a possible performance degradation due to cache pollution stemming from unnecessary writes to B.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I say the entire implementation is an ugly hack and needs to be replaced - but that is another can of worms, namely #4921, and I can only do so much in a day (and sometimes not even that)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#if defined(COMPLEX)
if (transb > 1){
#ifndef CBLAS
IMATCOPY_K_CNC(nrowb, ncolb, (FLOAT)(1.0), (FLOAT)(0.0), b, ldb);
#else
if (order == CblasColMajor)
IMATCOPY_K_CNC(nrowb, ncolb, (FLOAT)(1.0), (FLOAT)(0.0), b, ldb);
if (order == CblasRowMajor)
IMATCOPY_K_RNC(nrowb, ncolb, (FLOAT)(1.0), (FLOAT)(0.0), b, ldb);
#endif
}
#endif

return;
}
Loading