[cleanup][4/x] unify weight casting #1481
Stack from ghstack (oldest at bottom):
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1481
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 9f15418 with merge base 12396c6.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Summary:
Not ready for review yet. There is a performance regression because the tensorwise abs+max and the weight cast both happen twice between fwd and bwd. Is this a limitation of something in the PT2 stack?
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
ghstack-source-id: 5d789dd
ghstack-comment-id: 2568319095
Pull Request resolved: #1481
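For readers unfamiliar with the "tensorwise abs+max" step mentioned in the commit message, here is a minimal pure-Python sketch of tensorwise scaling. It is not the torchao implementation: the names `tensorwise_scale`, `scale_and_clamp`, and `CachedWeightCast` are hypothetical, plain lists stand in for tensors, and rounding to actual float8 values is omitted. The only external fact assumed is that the largest finite float8 e4m3fn value is 448. The small cache class only illustrates why running the reduction once rather than twice matters. It does not claim to be how this PR or the PT2 stack handles it.

```python
E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3fn


def tensorwise_scale(values, fp8_max=E4M3_MAX, eps=1e-12):
    """Return one scale for the whole tensor: fp8_max / max(|x|)."""
    amax = max(abs(v) for v in values)  # the "abs+max" reduction
    return fp8_max / max(amax, eps)     # eps guards against an all-zero tensor


def scale_and_clamp(values, scale, fp8_max=E4M3_MAX):
    """Scale into the float8 range and saturate (rounding to float8 omitted)."""
    return [max(-fp8_max, min(fp8_max, v * scale)) for v in values]


class CachedWeightCast:
    """Hypothetical helper: run the reduction and cast once, then reuse them."""

    def __init__(self, weight):
        self.weight = weight
        self.amax_calls = 0  # counts how often the abs+max reduction runs
        self._cached = None

    def get(self):
        if self._cached is None:
            self.amax_calls += 1
            scale = tensorwise_scale(self.weight)
            self._cached = (scale_and_clamp(self.weight, scale), scale)
        return self._cached


weight = [0.5, -2.0, 1.0]
cast = CachedWeightCast(weight)
fwd_weight, fwd_scale = cast.get()  # stands in for the forward pass
bwd_weight, bwd_scale = cast.get()  # stands in for the backward pass
print(fwd_scale, fwd_weight, cast.amax_calls)
```

In this toy setup the second `get()` call reuses the first result, so the reduction runs only once. The regression described above is the opposite case, where both the reduction and the cast are repeated.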
Summary:
Removes redundant logic for weight casting
Performance/peak_mem on torchtitan Llama 3 8B on 8 NVIDIA H100 GPUs:
Before this PR stack (every experiment has float8 + compile):
ac: selective(op)
ac: none
After this PR stack:
ac: selective(op)
ac: none
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags: