[prototype] CUDA graph capture/replay #5427
base: main
Conversation
Review updated until commit adc4e08
```cpp
heuristics_ = std::move(maybe_heuristics.value());

if (isOptionDisabled(DisableOption::KernelReuse)) {
  // It's safer to use CUDA graph when KernelReuse is disabled. When
```
Is there a way to disable kernel reuse from Python, without relying on environment variables?
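For comparison, the environment-variable route might look like the sketch below. The `NVFUSER_DISABLE` variable name and the `kernel_reuse` option value are assumptions based on nvFuser's option-toggle conventions, not confirmed by this PR; the question above is whether an equivalent programmatic API is exposed to Python.

```python
import os

# Hypothetical sketch: nvFuser options are commonly toggled via the
# NVFUSER_DISABLE environment variable (the "kernel_reuse" option name is
# an assumption -- check the nvFuser docs). Since options are typically
# read at library startup, the variable must be set before importing it.
os.environ["NVFUSER_DISABLE"] = "kernel_reuse"
```

A programmatic setter would be preferable for tests that need to flip the option per-fusion rather than per-process.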
This PR is not ready to merge: too many safety checks are still missing (see the code comments for details). However, it gives an estimate of how much properly enabling CUDA graphs in nvFuser can help this particular benchmark, which happens to use bounded dynamic shapes.
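As a rough illustration of why bounded dynamic shapes matter here (a toy Python model, not nvFuser's actual implementation): a captured CUDA graph records a fixed sequence of kernel launches, so each distinct input shape needs its own capture, and later calls with a previously seen shape can replay the recording without re-running heuristics or launch setup. When the shape set is bounded, the cache of captured graphs stays finite:

```python
# Toy model of capture/replay caching: "capture" is the expensive path
# (heuristics + recording), "replay" just reuses the cached recording.
class GraphCache:
    def __init__(self, compile_fn):
        self.compile_fn = compile_fn  # expensive: heuristics + capture
        self.captured = {}            # shape -> recorded "graph"
        self.captures = 0             # how many times we paid capture cost

    def run(self, shape):
        if shape not in self.captured:
            # Capture path: pay the full cost once per distinct shape.
            self.captured[shape] = self.compile_fn(shape)
            self.captures += 1
        # Replay path: no recompilation, no per-launch overhead modeled.
        return self.captured[shape]

cache = GraphCache(lambda shape: f"graph[{shape}]")
for s in [(8, 16), (8, 32), (8, 16), (8, 32), (8, 16)]:
    cache.run(s)
# Five calls, but only two distinct shapes -> only two captures.
```

With unbounded dynamic shapes this cache could grow without limit, which is one reason the missing safety checks mentioned above matter.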
Without CUDA graph:
With CUDA graph: