[FPSAN] Broaden FPSan MMA dtype and minimum-shape coverage#10532

Merged
jeffniu-openai merged 1 commit into main from jeffniu/fpsan-mma-coverage-reland
Jun 12, 2026

jeffniu-openai added a commit that referenced this pull request Jun 9, 2026
PR description written by Codex

## Summary
- Add multi-CTA support to `ttg.local_gather` and `ttg.local_scatter`.

## Stack
Merge bottom-up:
- [ ] 👉 #10472
- [ ] #10473
- [ ] #10527
- [ ] #10532
- [ ] #10533
- [ ] #10542
- [ ] #10548
jeffniu-openai added a commit that referenced this pull request Jun 11, 2026
PR description written by Codex

## Summary
- Load shared-memory TCGen MMA operands directly instead of
round-tripping through global scratch.

## Stack
Merge bottom-up:
- [x] #10472
- [ ] #10473 (this PR)
- [ ] #10527
- [ ] #10532
- [ ] #10533
- [ ] #10542
- [ ] #10548
- [ ] #10561
- [ ] #10559
@jeffniu-openai jeffniu-openai force-pushed the jeffniu/fpsan-direct-result-layout branch from 041da14 to 3e07ffe Compare June 11, 2026 21:45
@jeffniu-openai jeffniu-openai requested a review from ptillet as a code owner June 11, 2026 21:45
@jeffniu-openai jeffniu-openai force-pushed the jeffniu/fpsan-mma-coverage-reland branch from d3b1c72 to 1012a6a Compare June 11, 2026 21:46
@jeffniu-openai jeffniu-openai force-pushed the jeffniu/fpsan-direct-result-layout branch from 3e07ffe to ceff22c Compare June 11, 2026 21:57
@jeffniu-openai jeffniu-openai force-pushed the jeffniu/fpsan-mma-coverage-reland branch 2 times, most recently from 905271d to 21818ac Compare June 11, 2026 22:05
@jeffniu-openai jeffniu-openai force-pushed the jeffniu/fpsan-direct-result-layout branch from 2f74a21 to ebb2c60 Compare June 11, 2026 23:24
@jeffniu-openai jeffniu-openai force-pushed the jeffniu/fpsan-mma-coverage-reland branch from 21818ac to 8ceb83f Compare June 11, 2026 23:24
jeffniu-openai added a commit that referenced this pull request Jun 12, 2026
PR description written by Codex

Load shared MMA operands directly into their result layouts and reuse
existing scale shadows, avoiding redundant layout conversions and scale
snapshots. Also load the accumulator directly into its MMA layout.

## Stack
Merge bottom-up:
- [x] #10472
- [x] #10473
- [ ] #10527 (this PR)
- [ ] #10532
- [ ] #10533
- [ ] #10542
- [ ] #10548
- [ ] #10561
- [ ] #10559
Base automatically changed from jeffniu/fpsan-direct-result-layout to main June 12, 2026 01:01
@jeffniu-openai jeffniu-openai force-pushed the jeffniu/fpsan-mma-coverage-reland branch from 8ceb83f to 83c74b9 Compare June 12, 2026 01:02
@jeffniu-openai jeffniu-openai merged commit cc9bfee into main Jun 12, 2026
10 checks passed
@jeffniu-openai jeffniu-openai deleted the jeffniu/fpsan-mma-coverage-reland branch June 12, 2026 01:55
jeffniu-openai added a commit that referenced this pull request Jun 12, 2026
PR description written by Codex

Optimize i8 decomposition by reordering the dots and eagerly combining
partial results into the accumulator on the fly to minimize register
pressure. Include a basic subtiling heuristic determined experimentally.
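
As a rough illustration of the eager-accumulation idea, here is a minimal Python sketch. It assumes that "i8 decomposition" means splitting wider integer operands into 8-bit limbs and summing the shifted limb-by-limb dots; that reading, the unsigned-limb simplification, and the helper names `split_limbs`, `dot_u8`, and `dot_via_limbs` are all assumptions made for illustration, not code from this PR.

```python
def split_limbs(x, n_limbs=2):
    """Split a non-negative integer into n_limbs unsigned 8-bit limbs, low limb first."""
    return [(x >> (8 * i)) & 0xFF for i in range(n_limbs)]


def dot_u8(xs, ys):
    """Dot product of two equal-length vectors of 8-bit values."""
    return sum(x * y for x, y in zip(xs, ys))


def dot_via_limbs(a, b, n_limbs=2):
    """Dot product of wider integers computed from 8-bit limb dots.

    Each partial dot is shifted and folded into `acc` as soon as it is
    produced, so only one partial result is live at a time (hypothetical
    stand-in for combining into the accumulator on the fly).
    """
    a_limbs = [[split_limbs(x, n_limbs)[i] for x in a] for i in range(n_limbs)]
    b_limbs = [[split_limbs(y, n_limbs)[j] for y in b] for j in range(n_limbs)]
    acc = 0
    for i in range(n_limbs):
        for j in range(n_limbs):
            acc += dot_u8(a_limbs[i], b_limbs[j]) << (8 * (i + j))
    return acc


a = [300, 65535, 7]
b = [2, 65535, 1024]
result = dot_via_limbs(a, b)
reference = sum(x * y for x, y in zip(a, b))
```

Under these assumptions `result` equals `reference`. The sketch does not model the dot reordering or the subtiling heuristic, whose details are not given in the description.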

## Stack
Merge bottom-up:
- [x] #10472
- [x] #10473
- [x] #10527
- [x] #10532
- [ ] #10533 (this PR)
- [ ] #10542
- [ ] #10548
- [ ] #10561
- [ ] #10559
jeffniu-openai added a commit that referenced this pull request Jun 12, 2026
PR description written by Codex

Fix FPSAN TMEM emulation for initialized scratch synchronization,
predicated stores, reduction loads, and scale-copy reinterpret views.

## Stack
Merge bottom-up:
- [x] #10472
- [x] #10473
- [x] #10527
- [x] #10532
- [x] #10533
- [ ] #10542 (this PR)
- [ ] #10548
- [ ] #10561
- [ ] #10559
jeffniu-openai added a commit that referenced this pull request Jun 12, 2026
PR description written by Codex

Remove redundant payload sign clears before masked multiplies so NVPTX
cannot fold them to `abs.f32`, which quiets signaling NaNs. Add a
reduced one-warp regression that fails bitwise on PR #10542’s base and
passes with the fix.
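
To make the signaling-NaN concern concrete, the following Python sketch works on raw float32 bit patterns. It contrasts an integer sign-bit clear, which leaves the payload untouched, with a simplified model of a floating-point `abs` that quiets NaNs as described above. The function names and the `quieting_abs` model are illustrative assumptions, not the NVPTX lowering or code from this PR.

```python
SIGN_MASK = 0x8000_0000      # float32 sign bit
EXP_MASK = 0x7F80_0000       # float32 exponent field
MANTISSA_MASK = 0x007F_FFFF  # float32 mantissa field
QUIET_BIT = 0x0040_0000      # top mantissa bit: set for quiet NaNs


def is_nan(bits):
    """True if the 32-bit pattern encodes a NaN (all-ones exponent, nonzero mantissa)."""
    return (bits & EXP_MASK) == EXP_MASK and (bits & MANTISSA_MASK) != 0


def is_signaling_nan(bits):
    """True if the pattern is a NaN whose quiet bit is clear."""
    return is_nan(bits) and (bits & QUIET_BIT) == 0


def clear_sign(bits):
    """Integer sign clear: every other bit, including a NaN payload, is preserved."""
    return bits & ~SIGN_MASK & 0xFFFF_FFFF


def quieting_abs(bits):
    """Hypothetical model of a floating-point abs that quiets NaN inputs."""
    out = clear_sign(bits)
    if is_nan(out):
        out |= QUIET_BIT
    return out


snan = 0xFFA0_0001  # negative signaling NaN with a nonzero payload
masked = clear_sign(snan)
folded = quieting_abs(snan)
```

Here `masked` is `0x7FA00001` and is still a signaling NaN, while `folded` is `0x7FE00001`, a quiet NaN, so the two results differ bitwise.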

## Stack
Merge bottom-up:
- [x] #10472
- [x] #10473
- [x] #10527
- [x] #10532
- [x] #10533
- [x] #10542
- [ ] #10548 (this PR)
- [ ] #10561
- [ ] #10559
2 participants