[FPSAN] Fix TMEM emulation correctness by jeffniu-openai · Pull Request #10542 · triton-lang/triton

jeffniu-openai · 2026-06-08T17:38:36Z

PR description written by Codex

Fix FPSAN TMEM emulation for initialized scratch synchronization, predicated stores, reduction loads, and scale-copy reinterpret views.

Stack

Merge bottom-up:

PR description written by Codex

## Summary

- Add multi-CTA support to `ttg.local_gather` and `ttg.local_scatter`.

## Stack

Merge bottom-up:

- [ ] 👉 #10472
- [ ] #10473
- [ ] #10527
- [ ] #10532
- [ ] #10533
- [ ] #10542
- [ ] #10548

PR description written by Codex

## Summary

- Load shared-memory TCGen MMA operands directly instead of round-tripping through global scratch.

## Stack

Merge bottom-up:

- [x] #10472
- [ ] #10473 (this PR)
- [ ] #10527
- [ ] #10532
- [ ] #10533
- [ ] #10542
- [ ] #10548
- [ ] #10561
- [ ] #10559

Rebase PR #10542 onto the updated stack.

PR description written by Codex

Load shared MMA operands directly into their result layouts and reuse existing scale shadows, avoiding redundant layout conversions and scale snapshots. Also load the accumulator directly into its MMA layout.

## Stack

Merge bottom-up:

- [x] #10472
- [x] #10473
- [ ] #10527 (this PR)
- [ ] #10532
- [ ] #10533
- [ ] #10542
- [ ] #10548
- [ ] #10561
- [ ] #10559

PR description written by Codex

Reland of increased fpsan test coverage now that fpsan is faster.

## Stack

Merge bottom-up:

- [x] #10472
- [x] #10473
- [x] #10527
- [ ] #10532 (this PR)
- [ ] #10533
- [ ] #10542
- [ ] #10548
- [ ] #10561
- [ ] #10559

PR description written by Codex

Optimize i8 decomposition by reordering the dots and eagerly combining partial results into the accumulator on the fly to minimize register pressure. Include a basic subtiling heuristic determined experimentally.

## Stack

Merge bottom-up:

- [x] #10472
- [x] #10473
- [x] #10527
- [x] #10532
- [ ] #10533 (this PR)
- [ ] #10542
- [ ] #10548
- [ ] #10561
- [ ] #10559

PR description written by Codex

Remove redundant payload sign clears before masked multiplies so NVPTX cannot fold them to `abs.f32`, which quiets signaling NaNs. Add a reduced one-warp regression that fails bitwise on PR #10542’s base and passes with the fix.

## Stack

Merge bottom-up:

- [x] #10472
- [x] #10473
- [x] #10527
- [x] #10532
- [x] #10533
- [x] #10542
- [ ] #10548 (this PR)
- [ ] #10561
- [ ] #10559
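The #10548 description above says that a sign clear folded into `abs.f32` can quiet a signaling NaN. The following is a minimal sketch of why that matters for a bitwise comparison, not code from the PR. It works on raw 32-bit patterns only. The helper names are hypothetical, and `abs_quieting_model` is an assumed model of a quieting float `abs` (it sets the most significant fraction bit), not the documented behavior of any PTX instruction.

```python
# Illustrative model only: plain integers stand in for f32 register contents.
SIGN_MASK = 0x8000_0000  # f32 sign bit
EXP_MASK = 0x7F80_0000   # f32 exponent field
FRAC_MASK = 0x007F_FFFF  # f32 fraction field
QUIET_BIT = 0x0040_0000  # most significant fraction bit; set means quiet NaN


def is_nan(bits: int) -> bool:
    """True if the pattern encodes any NaN (all-ones exponent, nonzero fraction)."""
    return (bits & EXP_MASK) == EXP_MASK and (bits & FRAC_MASK) != 0


def is_signaling_nan(bits: int) -> bool:
    """True if the pattern is a NaN whose quiet bit is clear."""
    return is_nan(bits) and (bits & QUIET_BIT) == 0


def clear_sign_bitwise(bits: int) -> int:
    """Integer AND: only the sign bit changes, so a NaN payload is preserved."""
    return bits & ~SIGN_MASK & 0xFFFF_FFFF


def abs_quieting_model(bits: int) -> int:
    """Assumed model of a float abs that also quiets signaling NaNs."""
    cleared = clear_sign_bitwise(bits)
    return cleared | QUIET_BIT if is_nan(cleared) else cleared


snan = 0xFFA0_1234       # negative signaling NaN with a nonzero payload
minus_one = 0xBF80_0000  # -1.0f

bitwise = clear_sign_bitwise(snan)
folded = abs_quieting_model(snan)

print(f"bitwise sign clear: {bitwise:#010x}")
print(f"quieting abs model: {folded:#010x}")
print("results differ bitwise:", bitwise != folded)
```

For ordinary values such as `minus_one` the two paths agree, so a mismatch only shows up when a signaling NaN pattern reaches the operation. That is consistent with the description's note that the regression fails bitwise.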

jeffniu-openai requested review from peterbell10 and ptillet as code owners June 8, 2026 17:38

pawelszczerbuk approved these changes Jun 9, 2026

This was referenced Jun 10, 2026

[GSan][FPSan] Fix sanitizer correctness and performance #10561 (Draft)

[NVIDIA] Support local gather from multi-CTA subslices #10559 (Open)

jeffniu-openai force-pushed the jeffniu/fpsan-unified-imma branch 2 times, most recently from ec6c757 to cd7ba80 Compare June 11, 2026 21:47

jeffniu-openai added a commit that referenced this pull request Jun 11, 2026

[FPSAN] Fix TMEM emulation correctness

5a6c843

Rebase PR #10542 onto the updated stack.

jeffniu-openai force-pushed the jeffniu/fpsan-tmem-correctness branch from 09d6ca5 to 5a6c843 Compare June 11, 2026 21:47

jeffniu-openai force-pushed the jeffniu/fpsan-unified-imma branch from cd7ba80 to 58edc77 Compare June 11, 2026 21:57

jeffniu-openai requested review from antiagainst, lezcano and zhanglx13 as code owners June 11, 2026 21:57

jeffniu-openai force-pushed the jeffniu/fpsan-tmem-correctness branch from 5a6c843 to 4b569dc Compare June 11, 2026 21:57

jeffniu-openai force-pushed the jeffniu/fpsan-unified-imma branch from 58edc77 to d5b4894 Compare June 11, 2026 22:05

jeffniu-openai force-pushed the jeffniu/fpsan-tmem-correctness branch from 4b569dc to 1cbc878 Compare June 11, 2026 22:05

jeffniu-openai force-pushed the jeffniu/fpsan-unified-imma branch from d5b4894 to 3231015 Compare June 11, 2026 23:24

jeffniu-openai force-pushed the jeffniu/fpsan-tmem-correctness branch from 1cbc878 to 72efe81 Compare June 11, 2026 23:24

jeffniu-openai force-pushed the jeffniu/fpsan-tmem-correctness branch from 72efe81 to 992c380 Compare June 12, 2026 01:02

jeffniu-openai force-pushed the jeffniu/fpsan-unified-imma branch from 3231015 to 6e6e181 Compare June 12, 2026 01:02

jeffniu-openai force-pushed the jeffniu/fpsan-unified-imma branch from 6e6e181 to 471d978 Compare June 12, 2026 01:56

jeffniu-openai force-pushed the jeffniu/fpsan-tmem-correctness branch from 992c380 to 6104d1b Compare June 12, 2026 01:56

Base automatically changed from jeffniu/fpsan-unified-imma to main June 12, 2026 02:43

[FPSAN] Fix TMEM emulation correctness

68bf15f

jeffniu-openai force-pushed the jeffniu/fpsan-tmem-correctness branch from 6104d1b to 68bf15f Compare June 12, 2026 02:43

jeffniu-openai merged commit 05f538e into main Jun 12, 2026
19 of 20 checks passed

jeffniu-openai deleted the jeffniu/fpsan-tmem-correctness branch June 12, 2026 03:55

jeffniu-openai merged 1 commit into main from jeffniu/fpsan-tmem-correctness
