CUDA Graph Example #3

@jafioti

Hi! Very exciting project. Is it possible to use these kernels inside CUDA graphs? I've gotten Llama working end-to-end with cuTile-rs kernels (plus ordinary kernels for RoPE and scatter, because of an iota bug), but I can't figure out how to put them all in a CUDA graph. Currently I'm bottlenecked at ~75 ms per forward pass because of the overhead of launching each kernel one by one. An example showing how to use cuTile-rs kernels inside a CUDA graph (preferably a mixed graph that also includes normal kernels) would be very useful.