@jeremylt
Member

@jeremylt jeremylt commented Feb 19, 2025

Ok, the idea here is to launch all of the gen suboperators on separate streams and then sync once at the end.

Overall approach:

Loop over all operators

  • create stream for suboperator
  • try to run kernel for suboperator
  • destroy stream

Sync all streams

Return active vec access

Loop over all operators

  • fallback on any operators that couldn't run
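A minimal CUDA sketch of the flow above, not the actual backend code. It assumes each suboperator kernel accumulates into a shared output vector with atomics. `SubOperator`, `suboperator_kernel`, `fallback_apply`, and `apply_composite` are hypothetical names used only for illustration, and `n > 0` is assumed.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical stand-in for one generated suboperator kernel: y += scale * x.
// atomicAdd is used because kernels on different streams may overlap in time
// while writing to the same output vector.
__global__ void suboperator_kernel(const float *x, float *y, int n, float scale) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) atomicAdd(&y[i], scale * x[i]);
}

// Hypothetical suboperator description.
struct SubOperator {
  float scale;       // illustrative per-suboperator data
  bool  can_launch;  // false models a kernel that could not be built or run
};

// Hypothetical stand-in for the fallback operator path (default stream).
static void fallback_apply(const SubOperator &op, const float *d_x, float *d_y, int n) {
  const int block = 256, grid = (n + block - 1) / block;
  suboperator_kernel<<<grid, block>>>(d_x, d_y, n, op.scale);
}

// Returns, per suboperator, whether the fallback path had to be used.
std::vector<bool> apply_composite(const std::vector<SubOperator> &ops,
                                  const float *d_x, float *d_y, int n) {
  std::vector<bool> needs_fallback(ops.size(), false);
  const int block = 256, grid = (n + block - 1) / block;

  // Loop over all operators: create stream, try to run kernel, destroy stream.
  for (size_t k = 0; k < ops.size(); k++) {
    cudaStream_t stream;
    if (!ops[k].can_launch || cudaStreamCreate(&stream) != cudaSuccess) {
      needs_fallback[k] = true;
      continue;
    }
    suboperator_kernel<<<grid, block, 0, stream>>>(d_x, d_y, n, ops[k].scale);
    if (cudaGetLastError() != cudaSuccess) needs_fallback[k] = true;
    // Returns immediately; work already queued on the stream still completes.
    cudaStreamDestroy(stream);
  }

  // Sync once for all streams.
  cudaDeviceSynchronize();

  // (The active vector access would be returned here.)

  // Loop over all operators again: fall back for any that could not run.
  for (size_t k = 0; k < ops.size(); k++) {
    if (needs_fallback[k]) fallback_apply(ops[k], d_x, d_y, n);
  }
  cudaDeviceSynchronize();
  return needs_fallback;
}
```

Destroying each stream right after the launch keeps the loop simple, since `cudaStreamDestroy` does not wait for queued work, and the single `cudaDeviceSynchronize` covers every stream at once.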

@jrwrigh
Collaborator

jrwrigh commented Feb 19, 2025

Would we want this for shared as well, e.g. in cases where Gen-simplex can't be used and falls back to shared?

Actually, can't this exist entirely outside of Gen vs Shared since it's a composite operator? (And thus doesn't necessarily have to do with how the operators are compiled, but how they are called?)

Probably showing my ignorance on how the GPU backends are set up... 😅

@jeremylt
Member Author

GPU-shared doesn't have its own operators. GPU-ref holds the operator implementation. The GPU-ref operators and GPU-gen operators don't really play nice together so I don't think we'll be able to easily launch both at the same time with how the code is currently designed.

@jeremylt jeremylt force-pushed the jeremy/gpu-composite branch 3 times, most recently from 214a594 to 6399c72 Compare February 24, 2025 17:58
@jeremylt
Member Author

@zatkins-dev we need to check perf, but this should help for composite operators where the suboperators are all of the same basis type (all tensor or all non-tensor).

@jeremylt jeremylt force-pushed the jeremy/gpu-composite branch from 6399c72 to 09536b6 Compare February 24, 2025 19:53
@jeremylt jeremylt force-pushed the jeremy/gpu-composite branch from 09536b6 to 087855a Compare February 25, 2025 22:42
@jeremylt
Member Author

This also has a correctness fix, so I'll merge for now and we can perf tune going forward.

@jeremylt jeremylt merged commit 6a744a6 into main Feb 26, 2025
28 checks passed
@jeremylt jeremylt deleted the jeremy/gpu-composite branch February 26, 2025 17:39

3 participants