[FPSAN] Fix TMEM emulation correctness #10542

Merged
jeffniu-openai merged 1 commit into main from jeffniu/fpsan-tmem-correctness
Jun 12, 2026
Conversation

@jeffniu-openai jeffniu-openai commented Jun 8, 2026

Collaborator

jeffniu-openai added a commit that referenced this pull request Jun 9, 2026
PR description written by Codex

## Summary
- Add multi-CTA support to `ttg.local_gather` and `ttg.local_scatter`.

## Stack
Merge bottom-up:
- [ ] 👉 #10472
- [ ] #10473
- [ ] #10527
- [ ] #10532
- [ ] #10533
- [ ] #10542
- [ ] #10548
jeffniu-openai added a commit that referenced this pull request Jun 11, 2026
PR description written by Codex

## Summary
- Load shared-memory TCGen MMA operands directly instead of
round-tripping through global scratch.

## Stack
Merge bottom-up:
- [x] #10472
- [ ] #10473 (this PR)
- [ ] #10527
- [ ] #10532
- [ ] #10533
- [ ] #10542
- [ ] #10548
- [ ] #10561
- [ ] #10559
@jeffniu-openai jeffniu-openai force-pushed the jeffniu/fpsan-unified-imma branch 2 times, most recently from ec6c757 to cd7ba80 Compare June 11, 2026 21:47
jeffniu-openai added a commit that referenced this pull request Jun 11, 2026
Rebase PR #10542 onto the updated stack.
@jeffniu-openai jeffniu-openai force-pushed the jeffniu/fpsan-tmem-correctness branch from 09d6ca5 to 5a6c843 Compare June 11, 2026 21:47
@jeffniu-openai jeffniu-openai force-pushed the jeffniu/fpsan-unified-imma branch from cd7ba80 to 58edc77 Compare June 11, 2026 21:57
@jeffniu-openai jeffniu-openai force-pushed the jeffniu/fpsan-tmem-correctness branch from 5a6c843 to 4b569dc Compare June 11, 2026 21:57
@jeffniu-openai jeffniu-openai force-pushed the jeffniu/fpsan-unified-imma branch from 58edc77 to d5b4894 Compare June 11, 2026 22:05
@jeffniu-openai jeffniu-openai force-pushed the jeffniu/fpsan-tmem-correctness branch from 4b569dc to 1cbc878 Compare June 11, 2026 22:05
@jeffniu-openai jeffniu-openai force-pushed the jeffniu/fpsan-unified-imma branch from d5b4894 to 3231015 Compare June 11, 2026 23:24
@jeffniu-openai jeffniu-openai force-pushed the jeffniu/fpsan-tmem-correctness branch from 1cbc878 to 72efe81 Compare June 11, 2026 23:24
jeffniu-openai added a commit that referenced this pull request Jun 12, 2026
PR description written by Codex

Load shared MMA operands directly into their result layouts and reuse
existing scale shadows, avoiding redundant layout conversions and scale
snapshots. Also load the accumulator directly into its MMA layout.

## Stack
Merge bottom-up:
- [x] #10472
- [x] #10473
- [ ] #10527 (this PR)
- [ ] #10532
- [ ] #10533
- [ ] #10542
- [ ] #10548
- [ ] #10561
- [ ] #10559
@jeffniu-openai jeffniu-openai force-pushed the jeffniu/fpsan-tmem-correctness branch from 72efe81 to 992c380 Compare June 12, 2026 01:02
@jeffniu-openai jeffniu-openai force-pushed the jeffniu/fpsan-unified-imma branch from 3231015 to 6e6e181 Compare June 12, 2026 01:02
jeffniu-openai added a commit that referenced this pull request Jun 12, 2026
PR description written by Codex

Reland of increased fpsan test coverage, now that fpsan is faster.

## Stack
Merge bottom-up:
- [x] #10472
- [x] #10473
- [x] #10527
- [ ] #10532 (this PR)
- [ ] #10533
- [ ] #10542
- [ ] #10548
- [ ] #10561
- [ ] #10559
@jeffniu-openai jeffniu-openai force-pushed the jeffniu/fpsan-unified-imma branch from 6e6e181 to 471d978 Compare June 12, 2026 01:56
@jeffniu-openai jeffniu-openai force-pushed the jeffniu/fpsan-tmem-correctness branch from 992c380 to 6104d1b Compare June 12, 2026 01:56
jeffniu-openai added a commit that referenced this pull request Jun 12, 2026
PR description written by Codex

Optimize i8 decomposition by reordering the dots and eagerly combining
them into the accumulator on the fly to minimize register pressure, and
include a basic subtiling heuristic determined experimentally.

## Stack
Merge bottom-up:
- [x] #10472
- [x] #10473
- [x] #10527
- [x] #10532
- [ ] #10533 (this PR)
- [ ] #10542
- [ ] #10548
- [ ] #10561
- [ ] #10559
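
The decomposition itself is not shown in this conversation. As a rough sketch only, assuming 32-bit unsigned operands split into four 8-bit limbs (the names `limb`, `dot_limb`, and `dot_u32_via_i8` are hypothetical and are not Triton APIs), eager combining could look like the following. Each limb-pair dot is folded into the accumulator as soon as it is computed, so only one partial product is live at a time.

```python
MASK32 = 0xFFFFFFFF  # results are kept modulo 2**32


def limb(x, i):
    """Return the i-th unsigned 8-bit limb of a 32-bit value (assumed layout)."""
    return (x >> (8 * i)) & 0xFF


def dot_limb(a_row, b_col, i, j):
    """Dot product of limb i of every a element with limb j of every b element."""
    return sum(limb(a, i) * limb(b, j) for a, b in zip(a_row, b_col))


def dot_u32_via_i8(a_row, b_col, acc=0):
    """Compute (acc + sum(a * b)) mod 2**32 from 8-bit limb dots.

    Limb pairs with i + j >= 4 are skipped: their contribution is a
    multiple of 2**32 and vanishes under the mask.
    """
    for i in range(4):
        for j in range(4 - i):
            partial = dot_limb(a_row, b_col, i, j)
            # Eager combine: shift and add immediately instead of
            # keeping all ten partial products alive until the end.
            acc = (acc + (partial << (8 * (i + j)))) & MASK32
    return acc


if __name__ == "__main__":
    a = [0xFFFFFFFF, 0x12345678, 7]
    b = [0xDEADBEEF, 0x9ABCDEF0, 3]
    reference = sum(x * y for x, y in zip(a, b)) & MASK32
    print(dot_u32_via_i8(a, b) == reference)
```

The loop order here is arbitrary; the commit's actual dot ordering and its subtiling heuristic were chosen experimentally and are not reproduced in this sketch.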
Base automatically changed from jeffniu/fpsan-unified-imma to main June 12, 2026 02:43
@jeffniu-openai jeffniu-openai force-pushed the jeffniu/fpsan-tmem-correctness branch from 6104d1b to 68bf15f Compare June 12, 2026 02:43
@jeffniu-openai jeffniu-openai merged commit 05f538e into main Jun 12, 2026
19 of 20 checks passed
@jeffniu-openai jeffniu-openai deleted the jeffniu/fpsan-tmem-correctness branch June 12, 2026 03:55
jeffniu-openai added a commit that referenced this pull request Jun 12, 2026
PR description written by Codex

Remove redundant payload sign clears before masked multiplies so NVPTX
cannot fold them to `abs.f32`, which quiets signaling NaNs. Add a
reduced one-warp regression that fails bitwise on PR #10542’s base and
passes with the fix.

## Stack
Merge bottom-up:
- [x] #10472
- [x] #10473
- [x] #10527
- [x] #10532
- [x] #10533
- [x] #10542
- [ ] #10548 (this PR)
- [ ] #10561
- [ ] #10559
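
To illustrate why a quieting `abs` breaks bitwise comparisons, here is a small sketch that works purely on IEEE 754 binary32 bit patterns. It assumes, as the description above states, that the folded operation quiets signaling NaNs. The helper names (`clear_sign`, `quieting_abs`) are hypothetical: they model the behavior and do not reproduce NVPTX or fpsan code.

```python
EXP_MASK = 0x7F800000   # all-ones exponent marks Inf/NaN
FRAC_MASK = 0x007FFFFF  # fraction (payload) bits
QUIET_BIT = 0x00400000  # top fraction bit: set for a quiet NaN


def is_nan(bits):
    """True if the binary32 bit pattern encodes a NaN."""
    return (bits & EXP_MASK) == EXP_MASK and (bits & FRAC_MASK) != 0


def is_signaling_nan(bits):
    """True if the pattern is a NaN with the quiet bit clear."""
    return is_nan(bits) and (bits & QUIET_BIT) == 0


def clear_sign(bits):
    """Integer sign clear: touches only bit 31, so the payload survives."""
    return bits & 0x7FFFFFFF


def quieting_abs(bits):
    """Model of an abs that also quiets signaling NaNs (assumed behavior)."""
    result = clear_sign(bits)
    if is_nan(result):
        result |= QUIET_BIT
    return result


if __name__ == "__main__":
    snan = 0xFF800001  # negative signaling NaN with payload 1
    print(hex(clear_sign(snan)))    # 0x7f800001, still signaling
    print(hex(quieting_abs(snan)))  # 0x7fc00001, payload bit pattern changed
```

For ordinary values the two helpers agree, so the difference only surfaces in a bitwise check on NaN inputs, which is consistent with the regression test being described as failing bitwise.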
2 participants