Gen Composite Operators #1758

jeremylt · 2025-02-19T18:45:02Z

Ok, the idea here is to launch all of the gen suboperators on separate streams and then sync once at the end.

Overall approach:

- Loop over all operators
  - create stream for suboperator
  - try to run kernel for suboperator
  - destroy stream
- Sync all streams
- Return active vec access
- Loop over all operators
  - fallback on any operators that couldn't run
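
The control flow above can be sketched on the host alone. This is only an analogy under stated assumptions, not the libCEED implementation: `std::async` tasks stand in for GPU streams, and `SubOperator`, `can_run_async`, `contribution`, and `ApplyAddComposite` are hypothetical names made up for illustration. It shows the two passes: launch everything that can run and wait once, then fall back for whatever could not launch.

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for one suboperator; not a libCEED type.
struct SubOperator {
  bool   can_run_async;  // assumed flag: false means the fast path is unavailable
  double contribution;   // value this suboperator adds to the output
};

double ApplyAddComposite(const std::vector<SubOperator> &sub_ops, double output) {
  std::vector<std::future<double>> in_flight;
  std::vector<std::size_t>         needs_fallback;

  // Pass 1: try to launch every suboperator without waiting on any of them
  for (std::size_t i = 0; i < sub_ops.size(); i++) {
    if (!sub_ops[i].can_run_async) {
      needs_fallback.push_back(i);
      continue;
    }
    const double contribution = sub_ops[i].contribution;
    in_flight.push_back(std::async(std::launch::async, [contribution] { return contribution; }));
  }

  // Single sync point: wait for everything launched in pass 1
  for (auto &task : in_flight) output += task.get();

  // Pass 2: run the fallback path for anything that could not launch
  for (std::size_t i : needs_fallback) output += sub_ops[i].contribution;

  return output;
}
```

Calling `ApplyAddComposite` with a mix of runnable and non-runnable suboperators adds every contribution to `output` exactly once. The fallback pass only starts after the sync, which mirrors the ordering in the list above.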

jrwrigh · 2025-02-19T19:32:16Z

Would we want this for shared as well, e.g. in cases where Gen-simplex can't be used and falls back to shared?

Actually, can't this exist entirely outside of Gen vs Shared since it's a composite operator? (And thus doesn't necessarily have to do with how the operators are compiled, but how they are called?)

Probably showing my ignorance on how the GPU backends are set up... 😅

jeremylt · 2025-02-19T19:35:58Z

GPU-shared doesn't have its own operators; GPU-ref holds the operator implementation. The GPU-ref operators and GPU-gen operators don't really play nice together, so I don't think we'll be able to easily launch both at the same time with how the code is currently designed.

jeremylt · 2025-02-24T18:00:33Z

@zatkins-dev we need to check perf, but this should help for composite operators where the suboperators are all of the same basis type (all tensor or all non-tensor).

jeremylt · 2025-02-26T17:38:12Z

This also has a correctness fix, so I'll merge for now and we can perf tune going forward.

jeremylt added GPU CPU 0-WIP HIP labels Feb 19, 2025

jeremylt self-assigned this Feb 19, 2025

jeremylt force-pushed the jeremy/gpu-composite branch from df1fedd to ff9bb46 Compare February 19, 2025 18:47

jeremylt force-pushed the jeremy/gpu-composite branch 3 times, most recently from 214a594 to 6399c72 Compare February 24, 2025 17:58

jeremylt added CUDA 1-In Review and removed CPU 0-WIP labels Feb 24, 2025

jeremylt added 4 commits February 24, 2025 12:53

gpu - isolate gen ApplyAdd inner logic

ea04d07

gpu - allow running shared kernels on stream

e9c76bd

op - minor, make Apply call ApplyAdd on composite over subs

58e06b7

gpu - gen ApplyAdd functions

c99afcd

jeremylt force-pushed the jeremy/gpu-composite branch from 6399c72 to 09536b6 Compare February 24, 2025 19:53

gpu - gen put suboperators on separate streams

087855a

jeremylt force-pushed the jeremy/gpu-composite branch from 09536b6 to 087855a Compare February 25, 2025 22:42

gpu - gen should use GetArray over GetArrayWrite

0c8fbee

jeremylt merged commit 6a744a6 into main Feb 26, 2025
28 checks passed

jeremylt deleted the jeremy/gpu-composite branch February 26, 2025 17:39
