unprivileged/integrated-matrix: Add arithmetic considerations section #4
Looks good to me, mostly. I would drop the "accumulates each group at ≥ 2×SEW internal precision" mentioned above; it doesn't need 2×SEW internal precision for accumulating the groups, IMO. It is probably 2×SEW/W or something like that.
No worries, this is only left in the commit message; the specification text already has it removed.
joseemoreira
left a comment
I am OK with committing this for now. I need to think more about what is written and see if there is something that is not covered. I am particularly concerned with the following sentence:
"Within each group, the G partial results are accumulated using internal precision that requires no rounding to SEW precision inside a group."
I am afraid that this does not cover the common practice for bf16/fp16 inputs and fp32 output.
I also want to include language that specifies what an implementor must document about the implementation to facilitate compliance testing.
Hi Philipp,
Please pardon my ignorance, but I don't see any of your changes in the repository. GitHub claims that my own "joseemoreira/integrated-matrix-extension" branch is not behind the upstream ("riscv/integrated-matrix-extension"), yet your changes are not there. Do you still have to commit them?
This comic summarizes my knowledge of git: https://www.reddit.com/r/xkcd/comments/3qt26k/xkcd_1597_git/.
Jose
________________________________
From: Philipp Tomsich ***@***.***>
Sent: Monday, February 23, 2026 12:18 PM
To: riscv/integrated-matrix-extension ***@***.***>
Cc: Jose Moreira ***@***.***>; Review requested ***@***.***>
Subject: [EXTERNAL] Re: [riscv/integrated-matrix-extension] unprivileged/integrated-matrix: Add arithmetic considerations section (PR #4)
Force-pushed from 8e56759 to ebdf18e.
Merged commit 0a58e11 into riscv:integrated-matrix-extension.
Specify when intermediate rounding is permitted during the K_eff-deep accumulation of a matrix multiply-accumulate instruction.
For widening instructions (W=2, W=4), define the sub-dot-product as the W products of (SEW/W)-bit elements within one SEW-wide slot and note that each individual product is exact at SEW precision.
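As a quick sanity check of the "exact at SEW precision" claim, the Python sketch below samples pairs of finite fp16 values (assuming W=2 and SEW=32) and confirms each product survives rounding to binary32 unchanged. The reasoning behind it: an fp16 significand carries at most 11 bits, so a product needs at most 22, which fits binary32's 24-bit significand, and fp16 products stay well inside binary32's normal exponent range. The helper names are hypothetical, and the check covers fp16 only: bf16 passes the significand argument (8 + 8 bits) but shares binary32's exponent range, so its extreme products can overflow or underflow and are not covered here.

```python
import math
import random
import struct

def f16_from_bits(bits: int) -> float:
    """Decode a 16-bit pattern as an IEEE binary16 value (struct 'e' format)."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def round_to_f32(x: float) -> float:
    """Round a Python float (binary64) to binary32 and widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def fp16_product_exact_in_fp32(x: float, y: float) -> bool:
    # x*y is exact in binary64 (at most 22 significand bits, small exponents),
    # so comparing it with its binary32 rounding tests exactness at SEW=32.
    p = x * y
    return round_to_f32(p) == p

def check_random_pairs(n: int = 100_000, seed: int = 0) -> bool:
    """Sample n pairs of finite fp16 values and check every product is exact."""
    rng = random.Random(seed)
    finite = [v for v in (f16_from_bits(b) for b in range(1 << 16))
              if math.isfinite(v)]
    return all(fp16_product_exact_in_fp32(rng.choice(finite), rng.choice(finite))
               for _ in range(n))

print(check_random_pairs())
```

Sampling is used because enumerating all ~2^32 input pairs would be slow in pure Python; the bit-count argument above is what actually guarantees the result.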
For floating-point: the implementation partitions the λ×LMUL sub-dot-products into groups of G (power-of-two, 1 ≤ G ≤ λ), accumulates each group at ≥ 2×SEW internal precision, then rounds once and adds to C. G is implementation-defined, allowing both systolic (G=1) and outer-product (G ≈ λ) datapaths. Bit-exact reproducibility across implementations is explicitly not guaranteed.
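The grouping rule can be modelled exactly with rational arithmetic. The sketch below is a hypothetical model, not a reference implementation: it assumes λ×LMUL = 4 sub-dot-products with W = 2, a binary32 accumulator (SEW = 32), exact accumulation inside each group, and a single round-to-nearest-even when a group's sum is added to the running value of C (an implementation that rounds the group sum before adding it would round twice). Overflow to infinity is not modelled. The inputs are small bf16-representable integers chosen so that different values of G round differently.

```python
from fractions import Fraction

def round_to_precision(x: Fraction, p: int = 24, emin: int = -126) -> Fraction:
    """Round x to the nearest value with a p-bit significand, ties to even.

    The defaults model IEEE binary32 (SEW=32); overflow to infinity is not
    modelled, and subnormals are handled by clamping the exponent at emin.
    """
    if x == 0:
        return Fraction(0)
    sign = -1 if x < 0 else 1
    a = abs(x)
    # Exact floor(log2(a)) from the bit lengths of numerator and denominator.
    e = a.numerator.bit_length() - a.denominator.bit_length()
    if Fraction(2) ** e > a:
        e -= 1
    e = max(e, emin)
    ulp = Fraction(2) ** (e - (p - 1))
    # round() on a Fraction returns an int, rounding half to even.
    return sign * round(a / ulp) * ulp

def grouped_mac(c, a, b, W: int, G: int, p: int = 24) -> Fraction:
    """Hypothetical model of c + sum(a[k] * b[k]) under the grouping rule."""
    assert len(a) == len(b) and len(a) % W == 0
    # One sub-dot-product per SEW-wide slot: W exact products of narrow elements.
    subs = [sum(Fraction(a[i + j]) * Fraction(b[i + j]) for j in range(W))
            for i in range(0, len(a), W)]
    assert len(subs) % G == 0
    acc = Fraction(c)
    for g in range(0, len(subs), G):
        # Exact sum inside the group, then a single rounding to p bits.
        acc = round_to_precision(acc + sum(subs[g:g + G]), p)
    return acc

# lambda x LMUL = 4 sub-dot-products of W = 2 products each.
a = [4096, 0, 1, 0, 1, 0, -4096, 0]
b = [4096, 0, 1, 0, 1, 0, 4096, 0]
for G in (1, 2, 4):
    print(G, grouped_mac(0, a, b, W=2, G=G))
```

With these inputs the model returns 0 for G = 1, 1 for G = 2 and 2 for G = 4 (the exact result is 2), which illustrates why bit-exact reproducibility cannot be promised across implementations that pick different values of G.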
For integer: modular (wrapping) arithmetic makes the result uniquely defined regardless of accumulation order.
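The integer claim follows because wrapping is reduction modulo 2^SEW, so wrap(wrap(x) + y) equals wrap(x + y) and neither the accumulation order nor the grouping can change the result. The Python sketch below checks this by brute force; it assumes signed two's-complement elements, uses SEW = 16 with int8 inputs (W = 2) and a starting C near the int16 limit to make wrap-around likely, and the helper names are hypothetical.

```python
import random

def wrap(x: int, sew: int) -> int:
    """Reduce x modulo 2**sew into the signed SEW-bit two's-complement range."""
    half = 1 << (sew - 1)
    return (x + half) % (1 << sew) - half

def int_mac_grouped(c: int, a, b, sew: int, G: int, order) -> int:
    """Accumulate c + a[k]*b[k] over k in `order`, adding each group of G
    products exactly and wrapping to SEW bits once per group."""
    acc = c
    for g in range(0, len(order), G):
        acc = wrap(acc + sum(a[k] * b[k] for k in order[g:g + G]), sew)
    return acc

rng = random.Random(1)
SEW = 16
a = [rng.randint(-128, 127) for _ in range(16)]  # int8 elements, W = 2
b = [rng.randint(-128, 127) for _ in range(16)]
c = 30000                                        # start near the int16 limit
reference = wrap(c + sum(x * y for x, y in zip(a, b)), SEW)

# Any permutation of the accumulation order and any group size G agrees.
for _ in range(1000):
    order = list(range(len(a)))
    rng.shuffle(order)
    for G in (1, 2, 4, 8, 16):
        assert int_mac_grouped(c, a, b, SEW, G, order) == reference
print("all orders and groupings agree:", reference)
```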