Move CUDA stuff to an extension #4499
base: main
Possibly, we should simply implement a CUDA extension in this PR, with appropriate organization of the code, and get on with the breaking change! tl;dr: after this is merged, anybody doing computations on an NVIDIA GPU has to write:
using Oceananigans
using CUDA
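Why the extension forces the extra using CUDA: a Julia package extension only adds its methods once its trigger package is loaded, so until then GPU code paths have nothing to dispatch to. A minimal single-file sketch of that effect, simulating the extension with a second module. MiniOcean, MiniOceanCUDAExt, CPU, GPU and on_architecture are hypothetical stand-ins, not Oceananigans' API; in a real package the extension module would live under ext/ and be declared in Project.toml rather than defined by hand.

```julia
# Hypothetical core package: knows about architectures, but ships no GPU code.
module MiniOcean

abstract type AbstractArchitecture end
struct CPU <: AbstractArchitecture end
struct GPU <: AbstractArchitecture end

# Only the CPU method exists in the core package.
on_architecture(::CPU, a::AbstractArray) = Array(a)

end # module MiniOcean

# Before the "extension" is loaded, GPU calls have no method to dispatch to.
gpu_supported_before = try
    MiniOcean.on_architecture(MiniOcean.GPU(), [1.0, 2.0])
    true
catch err
    err isa MethodError || rethrow()
    false
end

# Hypothetical extension module: in a real package this would sit in ext/
# and be loaded by Julia only once the trigger package (e.g. CUDA) is loaded.
module MiniOceanCUDAExt

import ..MiniOcean

# Stand-in for converting to a device array; a real extension would
# produce a GPU array here instead of a plain copied Vector.
MiniOcean.on_architecture(::MiniOcean.GPU, a::AbstractArray) = collect(a)

end # module MiniOceanCUDAExt

gpu_supported_after = MiniOcean.on_architecture(MiniOcean.GPU(), [1.0, 2.0]) == [1.0, 2.0]

println((gpu_supported_before, gpu_supported_after))  # prints (false, true)
```

Keeping the GPU methods in the extension is what lets the core package drop its hard CUDA dependency; the cost is the explicit using CUDA on the user side.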
@simone-silvestri, curious to hear your thoughts.
I think it's a good idea. It provides a template for adding new architectures and makes the code completely architecture-agnostic. The extra …
Force-pushed from 69eb545 to 59a441d.
@michel2323, let us know when this is ready for prime time.
Force-pushed from 4bbb99e to 3e84145.
- isnothing(devices) ? device!(node_rank % ndevices()) : device!(devices[node_rank+1])
+ isnothing(devices) ? device!(child_architecture, node_rank % ndevices(child_architecture)) : device!(child_architecture, devices[node_rank+1])
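The line above picks one device per MPI rank on a node: round-robin over the available devices when no explicit list is given, otherwise the rank's entry in the user-supplied list. A sketch of that selection rule as a pure function, so it can be checked without a GPU. The name select_device_index is hypothetical; the sketch assumes 0-based node ranks (as implied by the node_rank+1 indexing above) and that devices holds whatever device identifiers the user passed.

```julia
# Hypothetical helper mirroring the selection rule in the diff above.
# node_rank is 0-based, ndev is the number of devices visible on the node,
# and devices is either nothing or a user-supplied list of device ids.
function select_device_index(node_rank::Integer, ndev::Integer, devices)
    if isnothing(devices)
        # Round-robin: ranks 0, 1, ..., ndev-1 get devices 0, 1, ..., ndev-1,
        # then the assignment wraps around.
        return node_rank % ndev
    else
        # Julia arrays are 1-based, hence the +1 on the 0-based rank.
        return devices[node_rank + 1]
    end
end

# Four ranks sharing two GPUs, with no explicit device list.
println([select_device_index(r, 2, nothing) for r in 0:3])  # prints [0, 1, 0, 1]

# Explicit list: rank 0 gets device 3, rank 1 gets device 1.
println([select_device_index(r, 4, [3, 1]) for r in 0:1])   # prints [3, 1]
```

Passing the architecture into device! and ndevices, as the new line does, keeps this rule unchanged while letting each backend decide what "a device" means.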
@glwagner, for the failing tests: we have 4 in total.
I think the main problem is that I haven't figured out what …
To my knowledge, that's an ill-formed query.
@simone-silvestri @glwagner, how do you want to proceed with these? Can this be rewritten to use only functionality from …
* versioninfo
* dispatch fixes
* Enable tests
* Field broadcast fix
Finally ready. The documentation build fails, I think due to something unrelated. buildkite/oceananigans-distributed isn't run because the code comes from a fork.
device(a::GPU) = a.device
device!(::CPU, i) = KA.device!(CPU(), i+1)
device!(::CPU) = nothing
Single-argument device!?
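The snippet above defines both a two-argument device!(arch, i) and a single-argument device!(arch). A sketch of how the two arities can divide the work when dispatching on architecture: the two-argument form binds an architecture to device i, and the single-argument form re-activates whatever device the architecture already holds (useful when looping over regions). The types, the ACTIVE_DEVICE stand-in for a GPU runtime's "current device" state, and the CPU no-ops are all hypothetical; the original CPU method forwards to KA.device! instead.

```julia
# Hypothetical architecture types for illustration only.
abstract type AbstractArchitecture end
struct CPU <: AbstractArchitecture end
mutable struct GPU <: AbstractArchitecture
    device::Int  # 0-based id of the device this architecture is bound to
end

# Stand-in for a GPU runtime's global "current device" state;
# -1 means nothing has been selected yet.
const ACTIVE_DEVICE = Ref(-1)

device(arch::GPU) = arch.device

# Two-argument form: bind the architecture to device i and make it current.
device!(::CPU, i::Integer) = nothing  # a CPU has no device to select
function device!(arch::GPU, i::Integer)
    arch.device = i
    ACTIVE_DEVICE[] = i
    return nothing
end

# Single-argument form: re-activate the device the architecture is bound to.
device!(::CPU) = nothing
device!(arch::GPU) = device!(arch, device(arch))

gpu = GPU(0)
device!(gpu, 1)           # bind to device 1
ACTIVE_DEVICE[] = 0       # simulate other code switching the current device
device!(gpu)              # restore the device this architecture owns
println(ACTIVE_DEVICE[])  # prints 1
```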
@vchuravy we may want to chat, but as far as I can tell there is basically an inconsistency in the concept of "device" in Oceananigans (we use the backend and "device id" interchangeably). It's a mess and we need to clean this up; I was thinking we would do this in a subsequent PR, but another option is to do it prior to this PR...
@michel2323 I don't see a call to device! in the code you linked. There is a call to switch_device!; is that a synonym for device!?
break
end
end
if backend isa Nothing
Suggested change:
- if backend isa Nothing
+ if backend === nothing
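For reference, the three common ways of testing for nothing in Julia agree; Base defines isnothing(x) as x === nothing, and === (unlike ==) cannot be overloaded, which is why x === nothing or isnothing(x) is the form most Julia code uses. A small self-contained check; the describe function is a hypothetical example, not code from this PR.

```julia
# Hypothetical function using the suggested idiom.
function describe(backend)
    if backend === nothing
        return "no backend"
    end
    return "backend: $(backend)"
end

# All three tests agree for both a nothing and a non-nothing value.
for x in (nothing, :CUDA)
    @assert (x === nothing) == (x isa Nothing) == isnothing(x)
end

println(describe(nothing))  # prints no backend
println(describe(:CUDA))    # prints backend: CUDA
```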
  regional_return_values = Vector(undef, length(devs))
  for (r, dev) in enumerate(devs)
-     switch_device!(dev)
+     # switch_device!(dev)
You still need to switch devices, don't you?
We plan to discontinue support for multi-device MultiRegion (e.g., move toward requiring that all multi-device code use MPI). I don't think multi-device functionality is tested either.
if isdefined(Main, :CUDA)
    try
        return versioninfo_with_gpu(GPU())
    catch
        return "No GPU device found."
    end
else
    return ""
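One caveat worth checking in the snippet above: isdefined(Main, :CUDA) asks whether a binding named CUDA exists in Main, not whether the GPU package or extension is loaded. A package imported under an alias (import CUDA as C binds only C) or loaded inside another module is invisible to it. A sketch with a stand-in module (no real CUDA needed) showing the second case; UserCode and gpu_package_visible are hypothetical names. With real package extensions (Julia 1.9 and later), Base.get_extension(parent_module, :ExtensionName) returns the extension module or nothing, which tests the thing actually of interest; the extension's name is not given in this thread.

```julia
# Stand-in for user code that loads the GPU package inside its own module.
module UserCode
module CUDA end
end

# The test used in the versioninfo snippet: is there a binding named CUDA in Main?
gpu_package_visible() = isdefined(Main, :CUDA)

println(gpu_package_visible())        # prints false in a fresh session
println(isdefined(UserCode, :CUDA))   # prints true
```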
Seems like this is getting close, which is exciting!
@siddharthabishnu, can you take a look at the cubed-sphere / multi-region changes here? They will affect you.
This PR isolates CUDA into src/arch_cuda.jl. This removes any direct CUDA calls from the rest of the Oceananigans code base. That file can serve either as a template for a new GPU architecture or as the basis for a future CUDA extension. @vchuravy
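The "template for a new GPU architecture" idea amounts to a small set of functions that every architecture implements, with architecture-agnostic code calling only those. A sketch of that pattern under stated assumptions: the function names (array_type, ndevices, zeros_on) and the FakeAccelerator backend are hypothetical and are not claimed to match what src/arch_cuda.jl defines. Generic fallbacks give a readable error when a backend has not been loaded or is incomplete.

```julia
abstract type AbstractArchitecture end

# Generic fallbacks: an architecture that lacks a method gets a readable
# error rather than a bare MethodError.
array_type(arch::AbstractArchitecture) =
    error("array_type not implemented for $(typeof(arch)); is its backend loaded?")
ndevices(arch::AbstractArchitecture) =
    error("ndevices not implemented for $(typeof(arch)); is its backend loaded?")

struct CPU <: AbstractArchitecture end
array_type(::CPU) = Array
ndevices(::CPU) = 1

# A new backend only adds methods for its own type. This fake accelerator
# reuses plain Arrays so the sketch runs without a GPU.
struct FakeAccelerator <: AbstractArchitecture
    count::Int
end
array_type(::FakeAccelerator) = Array
ndevices(arch::FakeAccelerator) = arch.count

# Architecture-agnostic code only talks to the interface.
zeros_on(arch::AbstractArchitecture, T::Type, n::Integer) =
    fill!(array_type(arch){T}(undef, n), zero(T))

println(zeros_on(FakeAccelerator(2), Float64, 3))  # prints [0.0, 0.0, 0.0]

# An architecture with no methods falls through to the informative fallback.
struct Unsupported <: AbstractArchitecture end
message = try
    array_type(Unsupported())
catch err
    sprint(showerror, err)
end
println(message)
```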