Commit db5703d
GPU autoscheduling with Mullapudi2016: the reference implementation
Reverse engineer the GPU scheduling feature as described in Section 5.4 of
Mullapudi's article:
Mullapudi, Adams, Sharlet, Ragan-Kelley, Fatahalian. Automatically
scheduling Halide image processing pipelines.
ACM Transactions on Graphics, 35(4), Article 83, pp. 1–11.
https://doi.org/10.1145/2897824.2925952
When `target=cuda` is detected in the code generator command-line
arguments, intercept all `vectorize` and `parallel` scheduling calls
requested by the auto-vectorization and auto-parallelization algorithms
with the class `GPUTilingDedup` for deferred execution.
Implement the class `GPUTilingDedup` to ensure that all Halide GPU schedule
calls are idempotent: no matter how many times the Stage is vectorized,
reordered, and then vectorized again, `gpu_threads()` is called exactly once.
Also, intercept all `split` and `reorder` scheduling calls by
Mullapudi's auto-splitting algorithm.
Implement the class `GPUTileHelper` to enforce atomic transactions of the
GPU schedules. If the current Stage is `compute_root`, mark all auto-split
inner dimensions as `gpu_threads` and outer dimensions as `gpu_blocks`.
If the Stage is `compute_at` another Stage, mark all `vectorize`
dimensions as `gpu_threads`.
If auto-splitting of the current Stage does not yield any tile, implement
a rudimentary tiling with tile size = vector_length x parallel_factor.
If Mullapudi's algorithm does not issue any split, vectorize, or parallel
schedules, assume a scalar reduction routine and implement it on the GPU
via `single_thread`.
1 parent f11e80d
File tree
src/autoschedulers/mullapudi2016
1 file changed: +505 −53 lines