unprivileged/integrated-matrix: Add arithmetic considerations section #4

Merged
ptomsich merged 1 commit into riscv:integrated-matrix-extension from ptomsich:ptomsich/arithmetic-considerations
Feb 28, 2026

@ptomsich
Collaborator

Specify when intermediate rounding is permitted during the K_eff-deep accumulation of a matrix multiply-accumulate instruction.

For widening instructions (W=2, W=4), define the sub-dot-product as the W products of (SEW/W)-bit elements within one SEW-wide slot and note that each individual product is exact at SEW precision.
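
The exactness claim can be spot-checked with a minimal sketch. It assumes SEW=32, W=2, and IEEE 754 binary16 inputs; the helper names are hypothetical and not taken from the specification text. It verifies on a few sample values that a binary16 × binary16 product is unchanged by rounding to binary32.

```python
import struct


def round_to_f16(x: float) -> float:
    """Round a Python float to IEEE 754 binary16 and widen it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_f32(x: float) -> float:
    """Round a Python float to IEEE 754 binary32 and widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def product_is_exact_at_sew32(a: float, b: float) -> bool:
    """Check that the product of two binary16 values survives rounding to binary32."""
    a16, b16 = round_to_f16(a), round_to_f16(b)
    # A binary16 x binary16 product has at most 22 significand bits, so
    # Python's binary64 arithmetic holds it without rounding.
    exact = a16 * b16
    return round_to_f32(exact) == exact


# Sample values (assumed for illustration): largest finite binary16,
# smallest subnormal, a value using the full significand, a negative value.
samples = [65504.0, 2.0 ** -24, 1.0 + 2.0 ** -10, -3.140625]
all_exact = all(product_is_exact_at_sew32(a, b) for a in samples for b in samples)
```

The check passes because a 22-bit product significand fits in the 24-bit binary32 significand, and the product exponents stay inside the binary32 normal range.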

For floating-point: the implementation partitions the λ×LMUL sub-dot-products into groups of G (power-of-two, 1 ≤ G ≤ λ), accumulates each group at ≥ 2×SEW internal precision, then rounds once and adds to C. G is implementation-defined, allowing both systolic (G=1) and outer-product (G ≈ λ) datapaths. Bit-exact reproducibility across implementations is explicitly not guaranteed.
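
A minimal sketch of the grouping scheme shows why results can differ between implementations. Several things here are assumptions for illustration rather than specification text: SEW=32 (binary32 accumulator), exact accumulation inside a group, a second rounding when the rounded group result is added to C, and the function name `grouped_accumulate`. The bound G ≤ λ is not modelled.

```python
import struct
from fractions import Fraction


def round_to_f32(x: Fraction) -> float:
    """Round an exact rational to binary32 (via binary64; adequate for this sketch)."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


def grouped_accumulate(c: float, sub_dot_products: list[float], g: int) -> float:
    """Accumulate sub-dot-products into c, rounding once per group of g."""
    if g < 1 or g & (g - 1):
        raise ValueError("G must be a power of two")
    for start in range(0, len(sub_dot_products), g):
        group = sub_dot_products[start:start + g]
        # Assumption: accumulation inside a group is exact (no intermediate rounding).
        group_sum = sum((Fraction(p) for p in group), Fraction(0))
        # Round the group result once, then round again after adding it to C.
        c = round_to_f32(Fraction(c) + Fraction(round_to_f32(group_sum)))
    return c


products = [2.0 ** 24, 1.0, -(2.0 ** 24), 1.0]
result_systolic = grouped_accumulate(0.0, products, g=1)  # round after every term
result_outer = grouped_accumulate(0.0, products, g=4)     # round once for all four
```

With these inputs, G=1 yields 1.0 (the first `+1.0` is lost when 2^24 + 1 rounds back to 2^24), while G=4 yields 2.0, so two conforming datapaths disagree in the last step.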

For integer: modular (wrapping) arithmetic makes the result uniquely defined regardless of accumulation order.
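
The order independence of the integer case can be sketched as follows, assuming a 32-bit signed accumulator; `wrap_signed` and `accumulate_wrapping` are hypothetical helper names. Wrapping after every addition gives the same result as wrapping once at the end, in any order, because addition modulo 2^SEW is associative and commutative.

```python
import random


def wrap_signed(x: int, sew: int = 32) -> int:
    """Reduce x modulo 2**sew and reinterpret it as a signed two's-complement value."""
    x &= (1 << sew) - 1
    return x - (1 << sew) if x >> (sew - 1) else x


def accumulate_wrapping(c: int, products: list[int], sew: int = 32) -> int:
    """Add products to c one by one, wrapping after every addition."""
    for p in products:
        c = wrap_signed(c + p, sew)
    return c


rng = random.Random(0)  # fixed seed so the sketch is reproducible
products = [rng.randrange(-2**31, 2**31) for _ in range(64)]
reference = wrap_signed(7 + sum(products))  # wrap only once, at the end
shuffled = products[:]
rng.shuffle(shuffled)
orders_agree = (
    accumulate_wrapping(7, products) == accumulate_wrapping(7, shuffled) == reference
)
```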

@efocht-oct

efocht-oct commented Feb 23, 2026

Looks good to me, mostly. I would drop the "accumulates each group at ≥ 2×SEW internal precision" wording mentioned above; accumulating the groups does not need 2×SEW internal precision, IMO. It is probably 2×SEW/W or something like that.

@ptomsich
Collaborator Author

> Looks good to me, mostly. I would drop the "accumulates each group at ≥ 2×SEW internal precision" mentioned above, it doesn't need 2×SEW internal precision for accumulating the groups, IMO. It is probably 2×SEW/W or something like that.

No worries: that wording is left only in the commit message; the specification text already has it removed.

Collaborator

@joseemoreira joseemoreira left a comment

I am OK with committing this for now. I need to think more about what is written and see if there is something that is not covered. I am particularly concerned with the following sentence:

"Within each group, the G partial results are accumulated using internal precision that requires no rounding to SEW precision inside a group."

I am afraid that this does not cover the common practice for bf16/fp16 inputs and fp32 output.

I also want to include language that specifies what an implementor must state about the implementation to facilitate compliance testing.

@joseemoreira
Collaborator

joseemoreira commented Feb 28, 2026 via email

Specify when intermediate rounding is permitted during the K_eff-deep
accumulation of a matrix multiply-accumulate instruction.

For widening instructions (W=2, W=4), define the sub-dot-product as
the W products of (SEW/W)-bit elements within one SEW-wide slot and
note that each individual product is exact at SEW precision.

For floating-point: the implementation partitions the λ×LMUL
sub-dot-products into groups of G (power-of-two, 1 ≤ G ≤ λ),
accumulates each group at ≥ 2×SEW internal precision, then rounds
once and adds to C.  G is implementation-defined, allowing both
systolic (G=1) and outer-product (G ≈ λ) datapaths.  Bit-exact
reproducibility across implementations is explicitly not guaranteed.

For integer: modular (wrapping) arithmetic makes the result uniquely
defined regardless of accumulation order.
@ptomsich ptomsich force-pushed the ptomsich/arithmetic-considerations branch from 8e56759 to ebdf18e on February 28, 2026 21:10
@ptomsich ptomsich merged commit 0a58e11 into riscv:integrated-matrix-extension Feb 28, 2026
2 of 3 checks passed
3 participants