Commit db5703d
GPU autoscheduling with Mullapudi2016: the reference implementation
Reverse engineer the GPU scheduling feature as described in Section 5.4 of
Mullapudi's article:
Mullapudi, Adams, Sharlet, Ragan-Kelley, Fatahalian. Automatically
scheduling Halide image processing pipelines.
ACM Transactions on Graphics, 35(4), Article 83, pp. 1–11.
https://doi.org/10.1145/2897824.2925952
When `target=cuda` is detected in the code generator command-line
arguments, intercept all `vectorize` and `parallel` scheduling calls
requested by the auto-vectorization and auto-parallelization algorithms
with the class `GPUTilingDedup` for deferred execution.
Implement the class `GPUTilingDedup` to ensure that all Halide GPU schedule
calls are idempotent: no matter how many times the Stage is vectorized,
reordered, and then vectorized again, `gpu_threads()` is called exactly once.
Also, intercept all `split` and `reorder` scheduling calls by
Mullapudi's auto-splitting algorithm.
Implement the class `GPUTileHelper` to enforce atomic transactions of the
GPU schedules. If the current Stage is `compute_root`, mark all auto-split
inner dimensions as `gpu_threads` and outer dimensions as `gpu_blocks`.
If the Stage is `compute_at` another Stage, mark all `vectorize`
dimensions as `gpu_threads`.
If auto-splitting of the current Stage does not yield any tile, implement
a rudimentary tiling with tile size = vector_length x parallel_factor.
If Mullapudi's algorithm does not issue any split, vectorize, or parallel
schedules, assume a scalar reduction routine and implement it on the GPU
via `single_thread`.
1 parent f11e80d
File tree
src/autoschedulers/mullapudi2016
1 file changed: +505 −53 lines