Hi! Very exciting project. Is it possible to use these kernels inside CUDA graphs? I've gotten Llama working end-to-end with cuTile-rs kernels (plus ordinary kernels for RoPE and scatter, because of an iota bug), but I can't figure out how to put them all in a CUDA graph. Currently each forward pass takes ~75 ms, and the bottleneck is the overhead of launching each kernel one by one. An example showing how to use cuTile-rs kernels inside a CUDA graph (preferably a mixed graph that also contains normal kernels) would be very useful.