Description
Describe the bug
Defining a large local array such as `int[] arr = new int[1_000_000]` produces a PTX kernel roughly 1,000,000 lines long. Compilation takes more than 10 minutes and consumes over 10 GB of RAM. The generated PTX is several megabytes in size because it zero-initializes the array with one unrolled store per element instead of a loop.
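Minimal kernel that triggers the problem:

```csharp
public static void kernel0(Index1D index, ArrayView1D<int, Stride1D.Dense> arr)
{
    int n = _n_;            // _n_: placeholder for the array length (e.g. 1_000_000)
    int[] t = new int[n];   // local array whose zero-initialization gets unrolled
    t[0] = arr[index];
    for (int i = 0; i < n; i++)
    {
        arr[index] += t[i];
    }
}

// Printing the generated PTX:
//var ptxKernel = (PTXCompiledKernel)k.GetCompiledKernel();
//Console.WriteLine(ptxKernel.Name);
//Console.WriteLine(ptxKernel.Info);
//Console.WriteLine(ptxKernel.PTXAssembly);
```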
Excerpt of the generated PTX:

```ptx
// ... (n lines omitted) ...
add.u64 %rd9971, %rd4, %rd9972;
st.local.b32 [%rd9971], 0;
mul.wide.u32 %rd9973, 9968, 4;
add.u64 %rd9972, %rd4, %rd9973;
st.local.b32 [%rd9972], 0;
mul.wide.u32 %rd9974, 9969, 4;
add.u64 %rd9973, %rd4, %rd9974;
st.local.b32 [%rd9973], 0;
mul.wide.u32 %rd9975, 9970, 4;
add.u64 %rd9974, %rd4, %rd9975;
st.local.b32 [%rd9974], 0;
mul.wide.u32 %rd9976, 9971, 4;
add.u64 %rd9975, %rd4, %rd9976;
st.local.b32 [%rd9975], 0;
mul.wide.u32 %rd9977, 9972, 4;
add.u64 %rd9976, %rd4, %rd9977;
st.local.b32 [%rd9976], 0;
mul.wide.u32 %rd9978, 9973, 4;
add.u64 %rd9977, %rd4, %rd9978;
st.local.b32 [%rd9977], 0;
mul.wide.u32 %rd9979, 9974, 4;
add.u64 %rd9978, %rd4, %rd9979;
st.local.b32 [%rd9978], 0;
// ... (remaining lines omitted) ...
```
Environment
- ILGPU version: [e.g., 1.5.1]
- .NET version: [e.g., .NET 8]
- Operating system: [e.g., Windows 10]
- Hardware (if GPU-related): [e.g., NVIDIA GeForce GTX 1080]
Steps to reproduce
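Compile `kernel0` above with a large array length (e.g. `n = 1_000_000`) and print the generated PTX as in the commented-out lines.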
Expected behavior
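Zero-initialization of the local array should be emitted as a loop rather than one store per element, keeping the PTX small and compilation fast.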
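For comparison, here is a rough, hypothetical sketch of what loop-based zero-initialization of the same local array could look like in PTX. It is not ILGPU output: the registers `%r1`, `%rd5`, `%rd6`, `%p1` and the label `$L__init_loop` are illustrative assumptions (taken to be declared elsewhere in the kernel), `%rd4` is assumed to hold the array's base address as in the excerpt above, and the trip count is hard-coded to 1,000,000.

```ptx
// Hypothetical sketch, not generated by ILGPU.
// Assumes %rd4 = base address of the local array and that
// %r1, %rd5, %rd6, %p1 are already declared.
    mov.u32         %r1, 0;              // i = 0
$L__init_loop:
    mul.wide.u32    %rd5, %r1, 4;        // byte offset = i * 4 (sizeof(int))
    add.u64         %rd6, %rd4, %rd5;    // address of t[i]
    st.local.b32    [%rd6], 0;           // t[i] = 0 (same store form as the excerpt)
    add.u32         %r1, %r1, 1;         // i++
    setp.lt.u32     %p1, %r1, 1000000;   // p1 = (i < n)
    @%p1 bra        $L__init_loop;       // repeat while i < n
```

The loop body is six instructions regardless of the array length, whereas the unrolled form in the excerpt needs three instructions per element.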
Additional context
No response