[cleanup][4/x] unify weight casting #1481
Stack from ghstack (oldest at bottom):
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1481
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 9f15418 with merge base 12396c6.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Summary:
Not ready for review yet. There is a performance regression because the tensorwise abs+max and the weight casting each happen twice, once in fwd and once in bwd. Possibly a limitation of something in the PT2 stack?
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
ghstack-source-id: 5d789dd3ea6c508c767907951a38b905a745f3d7
ghstack-comment-id: 2568319095
Pull Request resolved: #1481
Summary:
Removes redundant logic for weight casting
Performance/peak_mem on torchtitan llama 3 8B on 8 NVIDIA H100 GPUs:
before this PR stack, every experiment has float8 + compile
  ac: selective(op)
  ac: none
after this PR stack
  ac: selective(op)
  ac: none
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
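For readers unfamiliar with the terms used in the summaries above, here is a minimal plain-Python sketch of tensorwise scaling (one abs+max over the whole weight, then a scaled cast) and of why repeating it in both fwd and bwd is wasted work. It is an illustration under stated assumptions, not the torchao implementation: `tensorwise_scale`, `cast_weight`, and `CachedWeightCast` are hypothetical names, rounding to actual float8 precision is omitted, and the PR does not say that caching is how the duplication is avoided.

```python
# Illustrative only: a plain-Python stand-in for tensorwise float8 scaling.
# None of these names come from torchao.

E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3fn


def tensorwise_scale(values):
    """Return a single scale for the whole tensor, derived from abs + max."""
    amax = max(abs(v) for v in values)
    # Guard against an all-zero tensor to avoid dividing by zero.
    return E4M3_MAX / max(amax, 1e-12)


def cast_weight(values):
    """Scale and clamp into the float8 range (precision rounding omitted)."""
    scale = tensorwise_scale(values)
    scaled = [max(-E4M3_MAX, min(E4M3_MAX, v * scale)) for v in values]
    return scaled, scale


class CachedWeightCast:
    """Hypothetical helper: cast once, then reuse the result for fwd and bwd."""

    def __init__(self, weight):
        self.weight = weight
        self.num_casts = 0  # counts how often abs+max and the cast actually ran
        self._cached = None

    def get(self):
        if self._cached is None:
            self._cached = cast_weight(self.weight)
            self.num_casts += 1
        return self._cached


w = CachedWeightCast([0.5, -2.0, 1.0])
fwd_weight, fwd_scale = w.get()  # forward pass: computes abs+max and casts
bwd_weight, bwd_scale = w.get()  # backward pass: reuses the cached result
print(fwd_scale, w.num_casts)
```

Calling `cast_weight` directly in both passes would run the abs+max reduction and the cast twice per step, which is the kind of duplication the earlier "not ready for review" summaries describe.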