Optimize cudaGetDeviceProperties runtime overhead #4209

jiawenliu64 · 2025-05-29T03:23:16Z

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/1284

Further reduce the runtime overhead of the FP8 kernels by calling `cudaGetDeviceProperties` only once instead of on every kernel invocation.

Before this Diff: Trace

After this Diff: Trace

Differential Revision: D75574880
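
The "only triggering it once" idea in the summary can be illustrated with a minimal sketch. This is not the code from this diff: `DeviceProps`, `query_device_props`, `cached_device_props`, and `get_arch_major` are hypothetical names, and `query_device_props` is a stand-in for the real `cudaGetDeviceProperties(&prop, device)` call so that the example compiles without the CUDA toolkit. It assumes a single device whose properties do not change during the process lifetime.

```cpp
#include <cassert>

// Hypothetical stand-in for cudaDeviceProp; only the fields this sketch uses.
struct DeviceProps {
  int major;
  int minor;
};

// Counts how many times the expensive query actually runs.
static int g_query_count = 0;

// Hypothetical stand-in for cudaGetDeviceProperties(&prop, device).
// Real code would call the CUDA runtime here and check the returned error.
DeviceProps query_device_props(int device) {
  assert(device >= 0);
  ++g_query_count;
  return DeviceProps{9, 0};  // made-up values for illustration only
}

// Queries the properties on the first call and returns the cached copy
// afterwards. Initialization of a function-local static is thread-safe
// since C++11, so concurrent callers still trigger a single query.
const DeviceProps& cached_device_props() {
  static const DeviceProps props = query_device_props(0);
  return props;
}

// Example of a hot-path helper that used to pay for the query on every call.
int get_arch_major() {
  return cached_device_props().major;
}
```

Calling `get_arch_major()` repeatedly leaves `g_query_count` at 1. A process that dispatches to several devices would need one cached entry per device index rather than a single static.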

facebook-github-bot · 2025-05-29T03:23:24Z

This pull request was exported from Phabricator. Differential Revision: D75574880

Summary:
X-link: facebookresearch/FBGEMM#1284

Further optimize FP8 kernels runtime overhead with `cudaGetDeviceProperties` by only triggering it once

Before this Diff: [Trace](https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree%2Ftraces%2Fdynocli%2F0%2F1748487716%2Flocalhost%2Flibkineto_activities_3431969.json.gz&bucket=gpu_traces)

After this Diff: [Trace](https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree%2Ftraces%2Fdynocli%2F0%2F1748488054%2Flocalhost%2Flibkineto_activities_3821152.json.gz&bucket=gpu_traces)

Differential Revision: D75574880

netlify · 2025-05-29T03:43:36Z

✅ Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | `27df191` |
| 🔍 Latest deploy log | https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/6837dd3b4948c6000872e85d |
| 😎 Deploy Preview | https://deploy-preview-4209--pytorch-fbgemm-docs.netlify.app |

To edit notification comments on pull requests, go to your Netlify project configuration.

facebook-github-bot · 2025-05-29T03:46:40Z

This pull request was exported from Phabricator. Differential Revision: D75574880

Summary:
Pull Request resolved: pytorch#4209

X-link: facebookresearch/FBGEMM#1284

Further optimize FP8 kernels runtime overhead with `cudaGetDeviceProperties` by only triggering it once

Before this Diff: [Trace](https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree%2Ftraces%2Fdynocli%2F0%2F1748487716%2Flocalhost%2Flibkineto_activities_3431969.json.gz&bucket=gpu_traces)

After this Diff: [Trace](https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree%2Ftraces%2Fdynocli%2F0%2F1748488054%2Flocalhost%2Flibkineto_activities_3821152.json.gz&bucket=gpu_traces)

Differential Revision: D75574880

facebook-github-bot · 2025-05-29T03:50:22Z

This pull request was exported from Phabricator. Differential Revision: D75574880

facebook-github-bot · 2025-05-29T04:06:09Z

This pull request was exported from Phabricator. Differential Revision: D75574880

facebook-github-bot · 2025-05-29T16:35:50Z

This pull request has been merged in bcde9c1.

facebook-github-bot added cla signed fb-exported labels May 29, 2025

jiawenliu64 force-pushed the export-D75574880 branch from eeb6898 to 2f973e6 Compare May 29, 2025 03:43

jiawenliu64 force-pushed the export-D75574880 branch from 2f973e6 to 184ddc9 Compare May 29, 2025 03:44

jiawenliu64 force-pushed the export-D75574880 branch from 184ddc9 to d9b71a1 Compare May 29, 2025 03:46

jiawenliu64 force-pushed the export-D75574880 branch from d9b71a1 to a2b0e5c Compare May 29, 2025 03:47

jiawenliu64 force-pushed the export-D75574880 branch from a2b0e5c to 7caa61f Compare May 29, 2025 03:50

jiawenliu64 force-pushed the export-D75574880 branch from 7caa61f to 27df191 Compare May 29, 2025 04:06

facebook-github-bot closed this in bcde9c1 May 29, 2025

facebook-github-bot added the Merged label May 29, 2025

gchalump added feature:fp8 feature:genai category:improvement labels May 29, 2025
