Conversation
Run tests from your local machine with:

Output (all tests pass):
TODO: Add method for shared_memory(::Device, ::Type{T}, dims...) where {T} = Metal.Shared[Static/Dynamic]Array
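For context, a hedged sketch of what the dispatch pattern in that TODO could look like. The Metal-side constructor name (Metal.Shared[Static/Dynamic]Array in the TODO) is the open question, so a plain Array stands in below purely to illustrate the method shape; shared_memory and MetalDevice mirror names from this PR, everything else is a placeholder:

```julia
abstract type AbstractDevice end
struct MetalDevice <: AbstractDevice end

# Hypothetical method shape for the TODO: dispatch on the device type and
# element type, returning a device-appropriate scratch array. The real
# Metal threadgroup-memory constructor would replace the Array stand-in.
shared_memory(::MetalDevice, ::Type{T}, dims...) where {T} =
    Array{T}(undef, dims...)

buf = shared_memory(MetalDevice(), Float32, 4, 4)
```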
If it's OK, I'll add this in a follow-up PR, once I understand what's going on in ClimaCore a little better.
haakon-e
left a comment
Comments to assist review
The updates in this function are mostly formatting changes, with a few exceptions I'll outline below.
max_blocks = grid_size_limit(kernel)
max_threads_in_block = block_size_limit(block_size, kernel)
params = ClimaComms._compute_launch_params_simple(

Defined _compute_launch_params_simple in src/devices.jl for re-use in the Metal extension. It should be equivalent to the existing code.
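The diff context above suggests the simple helper assigns one item per thread and caps the block count at the device's grid-size limit. A self-contained sketch of that arithmetic (the actual signature in src/devices.jl may differ; n_items, max_threads_in_block, and max_blocks are assumed argument names):

```julia
# Hedged sketch of a one-item-per-thread launch-parameter computation.
function _compute_launch_params_simple(n_items, max_threads_in_block, max_blocks)
    # Use as many threads per block as the device allows, but no more
    # than there are items to process.
    threads_in_block = min(max_threads_in_block, n_items)
    # Enough blocks to cover all items, capped at the grid-size limit.
    blocks = min(max_blocks, cld(n_items, threads_in_block))
    return (; threads_in_block, blocks)
end
```

For example, launching over 10_000 items with a 256-thread block limit gives 256 threads per block and cld(10_000, 256) = 40 blocks.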
threads_in_block = min(max_threads_in_block, max_required_threads)
blocks = cld(n_items, items_in_thread * threads_in_block)
kernel(; blocks, threads = threads_in_block)
params = ClimaComms._compute_launch_params_coarsened(

Defined _compute_launch_params_coarsened in src/devices.jl for re-use in the Metal extension.
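The quoted lines show the coarsened variant letting each thread handle items_in_thread items, which shrinks the required thread count. A self-contained sketch under the same assumptions (signature and argument names may differ from src/devices.jl):

```julia
# Hedged sketch of a thread-coarsened launch-parameter computation,
# where each thread processes items_in_thread items.
function _compute_launch_params_coarsened(
    n_items, items_in_thread, max_threads_in_block, max_blocks,
)
    # Coarsening reduces the number of threads actually required.
    max_required_threads = cld(n_items, items_in_thread)
    threads_in_block = min(max_threads_in_block, max_required_threads)
    # Each block now covers items_in_thread * threads_in_block items.
    blocks = min(max_blocks, cld(n_items, items_in_thread * threads_in_block))
    return (; threads_in_block, blocks)
end
```

With 10_000 items, a coarsening factor of 4, and a 256-thread block limit, this yields 256 threads per block and cld(10_000, 1024) = 10 blocks.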
If you open this file next to ext/ClimaCommsCUDAExt.jl, the two should look nearly identical apart from renaming CUDA to Metal.
Changed all float types to Float32 so this file is compatible with the Metal backend, for testing purposes.
Changed all float types to Float32 so this file is compatible with the Metal backend, for testing purposes, and/or skipped Float64 tests.
Purpose
This pull request adds support for Apple's Metal GPU backend to ClimaComms.
Content
The main changes include introducing the MetalDevice type, implementing Metal-specific device and kernel launch logic, updating device selection and backend loading, and expanding tests to cover Metal. Additionally, the kernel launch parameter computation logic is refactored and centralized for use by both the CUDA and Metal backends.

Metal backend support:
- Added the MetalDevice type and documentation to src/devices.jl to support Apple GPU acceleration.
- Implemented the Metal extension in ext/ClimaCommsMetalExt.jl.

Device selection and backend loading:
- Added Metal as a valid value for CLIMACOMMS_DEVICE, with appropriate error handling and backend loading in src/devices.jl and src/loading.jl.

Kernel launch parameter computation:
- Centralized the launch parameter helpers (_compute_launch_params_simple and _compute_launch_params_coarsened) in src/devices.jl, now used by both the CUDA and Metal backends.

CUDA backend improvements:
- Refactored ext/ClimaCommsCUDAExt.jl for consistency with the Metal backend, using the new launch parameter helpers and simplifying function definitions.

Testing and compatibility:
- Updated tests to use Float32 for compatibility.
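The device-selection change can be illustrated with a hedged, self-contained sketch. The real logic in src/devices.jl also handles CPU threading variants and loads the backend package, so the types and function below are simplified assumptions, not the actual implementation:

```julia
abstract type AbstractDevice end
struct CPUSingleThreaded <: AbstractDevice end
struct CUDADevice <: AbstractDevice end
struct MetalDevice <: AbstractDevice end  # the type this PR introduces

# Simplified stand-in for mapping a CLIMACOMMS_DEVICE value to a device,
# with Metal as a newly accepted value and an error for anything else.
function device_from_name(name::AbstractString)
    name == "CPU" && return CPUSingleThreaded()
    name == "CUDA" && return CUDADevice()
    name == "Metal" && return MetalDevice()
    error("Invalid CLIMACOMMS_DEVICE: $name. Expected CPU, CUDA, or Metal.")
end
```

Running with CLIMACOMMS_DEVICE=Metal would then select the MetalDevice path, which in turn loads ext/ClimaCommsMetalExt.jl.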