Gen Composite Operators #1758
Conversation
Force-pushed from df1fedd to ff9bb46
Would we want this for shared as well? In cases where, e.g., Gen-simplex can't be used and falls back to shared. Actually, can't this exist entirely outside of Gen vs Shared, since it's a composite operator? (And thus doesn't necessarily have to do with how the operators are compiled, but how they are called?) Probably showing my ignorance of how the GPU backends are set up... 😅
GPU-shared doesn't have its own operators; GPU-ref holds the operator implementation. The GPU-ref operators and GPU-gen operators don't really play nicely together, so I don't think we'll be able to easily launch both at the same time with how the code is currently designed.
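For context, here is a rough sketch of that delegation idea (hypothetical types and names, not libCEED's actual backend interface): the shared backend provides no operator implementation of its own, so operator creation falls through to ref, leaving only one operator code path to coordinate with gen.

```c
// Hypothetical delegation sketch: a backend with no Operator implementation
// of its own falls back to its delegate (e.g., "shared" falls back to "ref").
typedef struct Backend {
  const char *name;
  int (*operator_create)(void *op);  // NULL means "no implementation here"
  struct Backend *delegate;          // backend to fall back to
} Backend;

int backend_operator_create(Backend *b, void *op) {
  if (b->operator_create) return b->operator_create(op);                // own implementation
  if (b->delegate) return backend_operator_create(b->delegate, op);     // fall back to delegate
  return -1;                                                            // no implementation in the chain
}
```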
Force-pushed from 214a594 to 6399c72
@zatkins-dev We need to check perf, but this should help for composite operators where the suboperators are all of the same basis type (all tensor or all non-tensor).
Force-pushed from 6399c72 to 09536b6
Force-pushed from 09536b6 to 087855a
This also has a correctness fix, so I'll merge for now and we can perf-tune going forward.
Ok, the idea here is to launch all of the gen suboperators on separate streams and then sync once at the end.

Overall approach:
- Loop over all operators, launching each on its own stream
- Sync all streams
- Return active vec access (loop over all operators)
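A minimal CUDA sketch of that launch pattern, with assumed names (`sub_op_apply`, `composite_apply`), not libCEED's actual gen-backend code:

```c
#include <cuda_runtime.h>
#include <stdlib.h>

// Placeholder for a generated suboperator kernel; each suboperator writes to
// its own disjoint slice of the output, so concurrent launches don't race.
__global__ void sub_op_apply(const double *in, double *out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = 2.0 * in[i];  // trivial stand-in work
}

void composite_apply(const double *d_in, double *d_out, int n, int num_sub) {
  cudaStream_t *streams = (cudaStream_t *)malloc(num_sub * sizeof(cudaStream_t));

  // Loop over all suboperators: launch each kernel on its own stream so the
  // GPU can overlap their execution.
  for (int i = 0; i < num_sub; i++) {
    cudaStreamCreate(&streams[i]);
    sub_op_apply<<<(n + 255) / 256, 256, 0, streams[i]>>>(d_in, d_out + (size_t)i * n, n);
  }

  // Sync all streams once at the end, then clean up.
  for (int i = 0; i < num_sub; i++) {
    cudaStreamSynchronize(streams[i]);
    cudaStreamDestroy(streams[i]);
  }
  free(streams);
  // Only after the sync is it safe to return active vector access.
}
```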