Commit 18dcaee

Add support for KA for GPUs (#34)
1 parent cec89e6 commit 18dcaee

69 files changed

+2246
-126
lines changed

Project.toml

+7-3
Original file line number | Diff line number | Diff line change
@@ -1,13 +1,15 @@
11
name = "ExaAdmm"
22
uuid = "4d6a948c-1075-4240-a564-361a5d4e22a2"
33
authors = ["Youngdae Kim <[email protected]>", "Kibaek Kim <[email protected]>", "Weiqi Zhang <[email protected]>", "François Pacaud <[email protected]>", "Michel Schanen <[email protected]>"]
4-
version = "0.1.3"
4+
version = "0.2.0"
55

66
[deps]
7+
AMDGPU = "21141c5a-9bdb-4563-92ae-f87d6854732e"
78
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
89
DelimitedFiles = "8bb1440f-4735-579b-a4ab-409b98df4dab"
910
ExaTron = "28b18bf8-76f9-41ea-81fa-0f922810b349"
1011
FileIO = "5789e2e9-d7fb-5bc7-8068-2c6fae9b9549"
12+
KernelAbstractions = "63c18a36-062a-441e-b654-da1e3ab1ce7c"
1113
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
1214
MPI = "da04e1cc-30fd-572f-bb4f-1f8673147195"
1315
Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
@@ -16,8 +18,10 @@ SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
1618
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
1719

1820
[compat]
21+
AMDGPU = "0.3"
1922
CUDA = "3.4"
20-
ExaTron = "1"
23+
ExaTron = "2"
2124
FileIO = "1.14"
22-
julia = "1.7"
25+
KernelAbstractions = "0.8"
2326
MPI = "0.19"
27+
julia = "1.7"

README.md

+37-14
@@ -8,29 +8,52 @@ ExaAdmm.jl implements the two-level alternating direction method of multipliers
88
The package can be installed in the Julia REPL with the command below:
99

1010
```julia
11-
] ExaAdmm
11+
] add ExaAdmm
1212
```
1313

14-
Running the algorithms on GPU requires Nvidia GPUs with `CUDA.jl`.
14+
Running the algorithms on the GPU requires either NVIDIA GPUs with [`CUDA.jl`](https://github.com/JuliaGPU/CUDA.jl) or [`KernelAbstractions.jl`](https://github.com/JuliaGPU/KernelAbstractions.jl) (KA) with the respective device support (e.g., [`AMDGPU.jl`](https://github.com/JuliaGPU/AMDGPU.jl) and `ROCKernels.jl`). Currently, only the ACOPF problem is supported using KA.
1515

1616
## How to run
1717

1818
Currently, `ExaAdmm.jl` supports electrical grid files in the MATLAB format. You can download them from [here](https://github.com/MATPOWER/matpower).
19-
Below shows an example of solving `case1354pegase.m` using `ExaAdmm.jl` on GPUs.
19+
The example below solves `case1354pegase.m` using `ExaAdmm.jl` on an NVIDIA GPU:
2020

2121
```julia
22-
env, mod = ExaAdmm.solve_acopf(
23-
"case1354pegase.m";
24-
rho_pq=1e1,
25-
rho_va=1e3,
26-
outer_iterlim=20,
27-
inner_iterlim=20,
28-
scale=1e-4,
29-
tight_factor=0.99,
30-
use_gpu=true
22+
using ExaAdmm
23+
24+
env, mod = solve_acopf(
25+
"case1354pegase.m";
26+
rho_pq=1e1,
27+
rho_va=1e3,
28+
outer_iterlim=20,
29+
inner_iterlim=20,
30+
scale=1e-4,
31+
tight_factor=0.99,
32+
use_gpu=true,
33+
verbose=1
3134
);
3235
```
33-
36+
and the same example on an AMD GPU:
37+
```julia
38+
using ExaAdmm
39+
using AMDGPU
40+
using ROCKernels
41+
42+
ExaAdmm.KAArray{T}(n::Int, ::ROCDevice) where {T} = ROCArray{T}(undef, n)
43+
44+
env, mod = solve_acopf(
45+
"case1354pegase.m";
46+
rho_pq=1e1,
47+
rho_va=1e3,
48+
outer_iterlim=20,
49+
inner_iterlim=20,
50+
scale=1e-4,
51+
tight_factor=0.99,
52+
use_gpu=true,
53+
ka_device = ROCDevice(),
54+
verbose=1
55+
)
56+
```
3457
The following table shows parameter values we used for solving pegase and ACTIVSg data.
3558

3659
Data | rho_pq | rho_va | scale | obj_scale
@@ -49,7 +72,7 @@ We have used the same `tight_factor=0.99`, `outer_iterlim=20`, and `inner_iterli
4972
- Youngdae Kim and Kibaek Kim. "Accelerated Computation and Tracking of AC Optimal Power Flow Solutions using GPUs" arXiv preprint arXiv:2110.06879, 2021
5073
- Youngdae Kim, François Pacaud, Kibaek Kim, and Mihai Anitescu. "Leveraging GPU batching for scalable nonlinear programming through massive lagrangian decomposition" arXiv preprint arXiv:2106.14995, 2021
5174

52-
## Acknowledgements
75+
## Acknowledgments
5376

5477
This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration.
5578
This material is based upon work supported by the U.S. Department of Energy, Office of Science, under contract number DE-AC02-06CH11357.
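
Note on the README's AMD example: the line `ExaAdmm.KAArray{T}(n::Int, ::ROCDevice) where {T} = ROCArray{T}(undef, n)` adds an outer constructor to the empty `KAArray{T}` marker type, so that "constructing" a `KAArray` returns an array type chosen by the device argument. The following is a minimal sketch of that mechanism only, runnable without a GPU. `HostDevice` is a hypothetical stand-in for `ROCDevice`, and `Vector` stands in for `ROCArray`; neither substitution comes from the commit.

```julia
# Stand-in for the empty marker type declared in src/ExaAdmm.jl.
struct KAArray{T} end

# Hypothetical device type used only for this illustration;
# it plays the role of ROCDevice in the README example.
struct HostDevice end

# Outer constructor: calling KAArray{T}(n, device) returns an ordinary
# array whose concrete type is decided by the device argument.
KAArray{T}(n::Int, ::HostDevice) where {T} = Vector{T}(undef, n)

x = KAArray{Float64}(4, HostDevice())
println(typeof(x), " with length ", length(x))
```

Because the hook is an ordinary method, a user can supply it for a new device type without changing the package, which appears to be why the README asks AMD users to define it themselves.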

src/ExaAdmm.jl

+26-2
@@ -7,15 +7,24 @@ using LinearAlgebra
77
using SparseArrays
88
using MPI
99
using CUDA
10+
import AMDGPU: ROCArray, has_rocm_gpu
11+
using KernelAbstractions
1012
using ExaTron
1113
using Random
1214

15+
const KA = KernelAbstractions
16+
17+
export solve_acopf
18+
19+
struct KAArray{T} end
20+
1321
include("utils/parse_matpower.jl")
1422
include("utils/opfdata.jl")
1523
include("utils/environment.jl")
1624
include("utils/grid_data.jl")
1725
include("utils/print_statistics.jl")
1826
include("utils/utilities_gpu.jl")
27+
include("utils/utilities_ka.jl")
1928

2029
include("algorithms/admm_two_level.jl")
2130

@@ -44,7 +53,7 @@ include("models/acopf/acopf_admm_update_residual_cpu.jl")
4453
include("models/acopf/acopf_admm_update_lz_cpu.jl")
4554
include("models/acopf/acopf_admm_prepoststep_cpu.jl")
4655

47-
# GPU specific implementation
56+
# CUDA specific implementation
4857
include("models/acopf/acopf_init_solution_gpu.jl")
4958
include("models/acopf/acopf_generator_kernel_gpu.jl")
5059
include("models/acopf/acopf_eval_linelimit_kernel_gpu.jl")
@@ -59,9 +68,25 @@ include("models/acopf/acopf_admm_update_residual_gpu.jl")
5968
include("models/acopf/acopf_admm_update_lz_gpu.jl")
6069
include("models/acopf/acopf_admm_prepoststep_gpu.jl")
6170

71+
# KA specific implementation
72+
include("models/acopf/acopf_init_solution_ka.jl")
73+
include("models/acopf/acopf_generator_kernel_ka.jl")
74+
include("models/acopf/acopf_eval_linelimit_kernel_ka.jl")
75+
include("models/acopf/acopf_tron_linelimit_kernel_ka.jl")
76+
include("models/acopf/acopf_auglag_linelimit_kernel_ka.jl")
77+
include("models/acopf/acopf_bus_kernel_ka.jl")
78+
include("models/acopf/acopf_admm_update_x_ka.jl")
79+
include("models/acopf/acopf_admm_update_xbar_ka.jl")
80+
include("models/acopf/acopf_admm_update_z_ka.jl")
81+
include("models/acopf/acopf_admm_update_l_ka.jl")
82+
include("models/acopf/acopf_admm_update_residual_ka.jl")
83+
include("models/acopf/acopf_admm_update_lz_ka.jl")
84+
include("models/acopf/acopf_admm_prepoststep_ka.jl")
85+
6286
# Rolling horizon
6387
include("models/acopf/acopf_admm_rolling_cpu.jl")
6488
include("models/acopf/acopf_admm_rolling_gpu.jl")
89+
include("models/acopf/acopf_admm_rolling_ka.jl")
6590

6691
# ----------------------------------------
6792
# Multi-period ACOPF implementation
@@ -133,5 +158,4 @@ include("models/mpec/mpec_admm_update_residual_gpu.jl")
133158
include("models/mpec/mpec_admm_update_lz_gpu.jl")
134159
include("models/mpec/mpec_admm_prepoststep_gpu.jl")
135160
=#
136-
137161
end # module

src/algorithms/admm_two_level.jl

+13-13
@@ -1,5 +1,5 @@
11
function admm_two_level(
2-
env::AdmmEnv, mod::AbstractOPFModel
2+
env::AdmmEnv, mod::AbstractOPFModel, device::Union{Nothing,KA.GPU}=nothing
33
)
44
par = env.params
55
info = mod.info
@@ -13,7 +13,7 @@ function admm_two_level(
1313
par.beta = par.initial_beta
1414

1515
if par.verbose > 0
16-
admm_update_residual(env, mod)
16+
admm_update_residual(env, mod, device)
1717
@printf("%8s %8s %10s %10s %10s %10s %10s %10s %10s %10s %10s\n",
1818
"Outer", "Inner", "Objval", "AugLag", "PrimRes", "EpsPrimRes",
1919
"DualRes", "||z||", "||Ax+By||", "OuterTol", "Beta")
@@ -27,19 +27,19 @@ function admm_two_level(
2727

2828
overall_time = @timed begin
2929
while info.outer < par.outer_iterlim
30-
admm_increment_outer(env, mod)
31-
admm_outer_prestep(env, mod)
30+
admm_increment_outer(env, mod, device)
31+
admm_outer_prestep(env, mod, device)
3232

33-
admm_increment_reset_inner(env, mod)
33+
admm_increment_reset_inner(env, mod, device)
3434
while info.inner < par.inner_iterlim
3535
admm_increment_inner(env, mod)
36-
admm_inner_prestep(env, mod)
36+
admm_inner_prestep(env, mod, device)
3737

38-
admm_update_x(env, mod)
39-
admm_update_xbar(env, mod)
40-
admm_update_z(env, mod)
41-
admm_update_l(env, mod)
42-
admm_update_residual(env, mod)
38+
admm_update_x(env, mod, device)
39+
admm_update_xbar(env, mod, device)
40+
admm_update_z(env, mod, device)
41+
admm_update_l(env, mod, device)
42+
admm_update_residual(env, mod, device)
4343

4444
info.eps_pri = sqrt_d / (2500*info.outer)
4545

@@ -65,7 +65,7 @@ function admm_two_level(
6565
break
6666
end
6767

68-
admm_update_lz(env, mod)
68+
admm_update_lz(env, mod, device)
6969

7070
if info.norm_z_curr > par.theta*info.norm_z_prev
7171
par.beta = min(par.inc_c*par.beta, 1e24)
@@ -74,7 +74,7 @@ function admm_two_level(
7474
end # @timed
7575

7676
info.time_overall = overall_time.time
77-
admm_poststep(env, mod)
77+
admm_poststep(env, mod, device)
7878

7979
if par.verbose > 0
8080
print_statistics(env, mod)
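
Note: this change threads a trailing `device` argument through each ADMM step so that Julia's multiple dispatch can select a backend-specific method. Below is a minimal sketch of that pattern under stated assumptions: `AbstractBackend` and `FakeGPU` are hypothetical stand-ins for `KA.Device` and a `KA.GPU` device, and `update_x` and `driver` are illustrative names, not the package's functions.

```julia
# Hypothetical stand-ins; the real code uses KernelAbstractions device types.
abstract type AbstractBackend end
struct FakeGPU <: AbstractBackend end

# Native path: chosen when the device argument is `nothing`.
function update_x(state::Vector{Float64}, device::Nothing=nothing)
    state .+= 1.0
    return :native
end

# KA-style path: chosen when a backend object is passed.
function update_x(state::Vector{Float64}, device::AbstractBackend)
    state .+= 2.0
    return :ka
end

# The driver forwards `device` unchanged, as admm_two_level does in the diff.
function driver(state::Vector{Float64}, device::Union{Nothing,AbstractBackend}=nothing)
    return [update_x(state, device) for _ in 1:2]
end

s = zeros(1)
println(driver(s))             # two native steps
println(driver(s, FakeGPU()))  # two KA-style steps
```

In the actual commit the methods also dispatch on the array types carried by `env` and `mod`; the sketch isolates only the role of the extra argument.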

src/interface/solve_acopf.jl

+21-5
@@ -2,17 +2,33 @@ function solve_acopf(case::String;
22
case_format="matpower",
33
outer_iterlim=20, inner_iterlim=1000, rho_pq=400.0, rho_va=40000.0,
44
obj_scale=1.0, scale=1e-4, storage_ratio=0.0, storage_charge_max=1.0,
5-
use_gpu=false, use_linelimit=true, use_projection=false, tight_factor=1.0,
5+
use_gpu=false, ka_device=nothing, use_linelimit=true, use_projection=false, tight_factor=1.0,
66
outer_eps=2*1e-4, gpu_no=0, verbose=1
77
)
8-
T = Float64; TD = Array{Float64,1}; TI = Array{Int,1}; TM = Array{Float64,2}
9-
if use_gpu
8+
T = Float64
9+
# 1. ka_device = nothing and use_gpu = false, CPU version of the code is used
10+
# 2. ka_device = KA.CPU() and use_gpu = false, CPU version of the code is used, NOT the KA.CPU kernels
11+
# due to nested kernels limitations and no added benefit
12+
# 3. ka_device = nothing and use_gpu = true, use original CUDA.jl kernels
13+
# 4. ka_device is a KA.GPU and use_gpu = true, use KA kernels
14+
if !use_gpu && (isa(ka_device, Nothing) || isa(ka_device, KA.CPU))
15+
TD = Array{Float64,1}; TI = Array{Int,1}; TM = Array{Float64,2}
16+
ka_device = nothing
17+
elseif use_gpu && isa(ka_device, Nothing)
1018
CUDA.device!(gpu_no)
1119
TD = CuArray{Float64,1}; TI = CuArray{Int,1}; TM = CuArray{Float64,2}
20+
elseif use_gpu && isa(ka_device, KA.Device)
21+
if has_cuda_gpu()
22+
TD = CuArray{Float64,1}; TI = CuArray{Int,1}; TM = CuArray{Float64,2}
23+
elseif has_rocm_gpu()
24+
TD = ROCArray{Float64,1}; TI = ROCArray{Int,1}; TM = ROCArray{Float64,2}
25+
end
26+
else
27+
error("Inconsistent device selection use_gpu=$use_gpu and ka_device=$(typeof(ka_device))")
1228
end
1329

1430
env = AdmmEnv{T,TD,TI,TM}(case, rho_pq, rho_va; case_format=case_format,
15-
use_gpu=use_gpu, use_linelimit=use_linelimit,
31+
use_gpu=use_gpu, ka_device=ka_device, use_linelimit=use_linelimit,
1632
use_projection=use_projection, tight_factor=tight_factor, gpu_no=gpu_no,
1733
storage_ratio=storage_ratio, storage_charge_max=storage_charge_max,
1834
verbose=verbose)
@@ -24,7 +40,7 @@ function solve_acopf(case::String;
2440
env.params.outer_iterlim = outer_iterlim
2541
env.params.inner_iterlim = inner_iterlim
2642

27-
admm_two_level(env, mod)
43+
admm_two_level(env, mod, isa(ka_device, KA.CPU) ? nothing : ka_device)
2844

2945
return env, mod
3046
end
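
Note: the four cases listed in the comments of `solve_acopf` can be sketched as a small standalone function. This is an illustration of the branching as read from the diff, not the package's code: `Backend`, `HostBackend`, `AccelBackend`, and `select_path` are hypothetical names, with `HostBackend` standing in for `KA.CPU` and `AccelBackend` for a `KA.GPU` device.

```julia
# Hypothetical stand-ins for the KernelAbstractions device hierarchy.
abstract type Backend end
struct HostBackend <: Backend end    # plays the role of KA.CPU
struct AccelBackend <: Backend end   # plays the role of a KA.GPU device

# Return a symbol naming the code path for a (use_gpu, ka_device) pair.
function select_path(use_gpu::Bool, ka_device)
    if !use_gpu && (ka_device === nothing || ka_device isa HostBackend)
        return :cpu    # cases 1 and 2: plain CPU implementation
    elseif use_gpu && ka_device === nothing
        return :cuda   # case 3: original CUDA.jl kernels
    elseif use_gpu && ka_device isa Backend
        return :ka     # case 4: KA kernels
    else
        error("Inconsistent device selection use_gpu=$use_gpu and ka_device=$(typeof(ka_device))")
    end
end

println(select_path(false, nothing))
println(select_path(false, HostBackend()))
println(select_path(true, nothing))
println(select_path(true, AccelBackend()))
```

The sketch keeps the diff's third test against the general device type, so `use_gpu=true` combined with a CPU-like device also lands in that branch; a GPU device with `use_gpu=false` is the combination that reaches the error.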

src/models/acopf/acopf_admm_increment.jl

+6-3
@@ -3,7 +3,8 @@ Increment outer iteration counter by one.
33
"""
44
function admm_increment_outer(
55
env::AdmmEnv,
6-
mod::AbstractOPFModel
6+
mod::AbstractOPFModel,
7+
device=nothing
78
)
89
mod.info.outer += 1
910
return
@@ -14,7 +15,8 @@ Reset inner iteration counter to zero.
1415
"""
1516
function admm_increment_reset_inner(
1617
env::AdmmEnv,
17-
mod::AbstractOPFModel
18+
mod::AbstractOPFModel,
19+
device=nothing
1820
)
1921
mod.info.inner = 0
2022
return
@@ -25,7 +27,8 @@ Increment inner iteration counter by one.
2527
"""
2628
function admm_increment_inner(
2729
env::AdmmEnv,
28-
mod::AbstractOPFModel
30+
mod::AbstractOPFModel,
31+
device=nothing
2932
)
3033
mod.info.inner += 1
3134
mod.info.cumul += 1
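
Note: the counter functions take an untyped `device=nothing` that they never use. In Julia an optional positional argument generates one method per arity, so two-argument calls such as `admm_increment_inner(env, mod)` in `admm_two_level` keep working alongside the three-argument form. A minimal sketch of that behavior, with hypothetical names (`Info`, `increment_inner`) chosen for illustration:

```julia
mutable struct Info
    inner::Int
end

# `device` is accepted but unused, so this single definition creates
# both a one-argument and a two-argument method.
function increment_inner(info::Info, device=nothing)
    info.inner += 1
    return
end

info = Info(0)
increment_inner(info)               # old call style, no device
increment_inner(info, :any_device)  # new call style, value is ignored
println(info.inner)
println(length(methods(increment_inner)))
```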

src/models/acopf/acopf_admm_prepoststep_cpu.jl

+6-3
@@ -3,7 +3,8 @@ Implement any algorithmic steps required before each outer iteration.
33
"""
44
function admm_outer_prestep(
55
env::AdmmEnv{Float64,Array{Float64,1},Array{Int,1},Array{Float64,2}},
6-
mod::AbstractOPFModel{Float64,Array{Float64,1},Array{Int,1},Array{Float64,2}}
6+
mod::AbstractOPFModel{Float64,Array{Float64,1},Array{Int,1},Array{Float64,2}},
7+
device::Nothing=nothing
78
)
89
sol, info = mod.solution, mod.info
910
info.norm_z_prev = norm(sol.z_curr)
@@ -15,7 +16,8 @@ Implement any algorithmic steps required before each inner iteration.
1516
"""
1617
function admm_inner_prestep(
1718
env::AdmmEnv{Float64,Array{Float64,1},Array{Int,1},Array{Float64,2}},
18-
mod::AbstractOPFModel{Float64,Array{Float64,1},Array{Int,1},Array{Float64,2}}
19+
mod::AbstractOPFModel{Float64,Array{Float64,1},Array{Int,1},Array{Float64,2}},
20+
device::Nothing=nothing
1921
)
2022
sol = mod.solution
2123
sol.z_prev .= sol.z_curr
@@ -27,7 +29,8 @@ Implement any steps required after the algorithm terminates.
2729
"""
2830
function admm_poststep(
2931
env::AdmmEnv{Float64,Array{Float64,1},Array{Int,1},Array{Float64,2}},
30-
mod::AbstractOPFModel{Float64,Array{Float64,1},Array{Int,1},Array{Float64,2}}
32+
mod::AbstractOPFModel{Float64,Array{Float64,1},Array{Int,1},Array{Float64,2}},
33+
device::Nothing=nothing
3134
)
3235
data, sol, info, grid_data = env.data, mod.solution, mod.info, mod.grid_data
3336

src/models/acopf/acopf_admm_prepoststep_gpu.jl

+6-3
@@ -1,6 +1,7 @@
11
function admm_outer_prestep(
22
env::AdmmEnv{Float64,CuArray{Float64,1},CuArray{Int,1},CuArray{Float64,2}},
3-
mod::AbstractOPFModel{Float64,CuArray{Float64,1},CuArray{Int,1},CuArray{Float64,2}}
3+
mod::AbstractOPFModel{Float64,CuArray{Float64,1},CuArray{Int,1},CuArray{Float64,2}},
4+
device::Nothing=nothing
45
)
56
sol, info = mod.solution, mod.info
67
info.norm_z_prev = CUDA.norm(sol.z_curr)
@@ -9,7 +10,8 @@ end
910

1011
function admm_inner_prestep(
1112
env::AdmmEnv{Float64,CuArray{Float64,1},CuArray{Int,1},CuArray{Float64,2}},
12-
mod::AbstractOPFModel{Float64,CuArray{Float64,1},CuArray{Int,1},CuArray{Float64,2}}
13+
mod::AbstractOPFModel{Float64,CuArray{Float64,1},CuArray{Int,1},CuArray{Float64,2}},
14+
device::Nothing=nothing
1315
)
1416
sol = mod.solution
1517
@cuda threads=64 blocks=(div(mod.nvar-1, 64)+1) copy_data_kernel(mod.nvar, sol.z_prev, sol.z_curr)
@@ -20,7 +22,8 @@ end
2022

2123
function admm_poststep(
2224
env::AdmmEnv{Float64,CuArray{Float64,1},CuArray{Int,1},CuArray{Float64,2}},
23-
mod::AbstractOPFModel{Float64,CuArray{Float64,1},CuArray{Int,1},CuArray{Float64,2}}
25+
mod::AbstractOPFModel{Float64,CuArray{Float64,1},CuArray{Int,1},CuArray{Float64,2}},
26+
device::Nothing=nothing
2427
)
2528
data, sol, info, grid_data = env.data, mod.solution, mod.info, mod.grid_data
2629
