Commit 396ebff

Fix invalid GSoC23 links (#56)
* Fix invalid links
* Fix size of subtitle
1 parent 96a5397 commit 396ebff

1 file changed: +10 −12 lines changed

outreach/gsoc/2023/gpu-acceleration-in-trixi-jl-using-cuda-jl.md

+10 −12
@@ -6,7 +6,7 @@
 
 - Mentee: [Huiyu Xie](https://github.com/huiyuxie)
 - Mentors: [Hendrik Ranocha](https://github.com/ranocha) and [Michael Schlottke-Lakemper](https://github.com/sloede)
-- Project Link: [https://github.com/huiyuxie/trixi\_cuda](https://github.com/huiyuxie/trixi_cuda)
+- Project Link: [https://github.com/huiyuxie/trixi\_cuda](https://github.com/czha/TrixiGPU.jl/tree/legacy)
 
 The goal of this GSoC project was to accelerate Trixi.jl using GPUs.
 
@@ -24,14 +24,14 @@ The project was focused on enhancing the [Trixi.jl](https://github.com/trixi-fra
 
 Please note that the third step was planned but remains incomplete due to time constraints; it will be completed in the future if possible.
 
-#### How to Set Up
-This project was entirely set up and tested on Amazon Web Services (AWS), and the instance type chosen was `p3.2xlarge` (see [the link](https://aws.amazon.com/ec2/instance-types/#Accelerated_Computing) for more details). Here is the link to the specific information about the [CPU and GPU](https://github.com/huiyuxie/trixi_cuda/blob/main/docs/env_info.md) used for this project. Note that this project is reproducible by following the linked setup instructions on [how to set up the environment](https://github.com/huiyuxie/trixi_cuda/blob/main/docs/project_setup.md). Also, for individuals without an Nvidia GPU who are interested in experimenting with CUDA, here is a link detailing how to [set up a cloud GPU on AWS](https://github.com/huiyuxie/trixi_cuda/blob/main/docs/aws_gpu_setup.md).
+### How to Set Up
+This project was entirely set up and tested on Amazon Web Services (AWS), and the instance type chosen was [`p3.2xlarge`](https://aws.amazon.com/ec2/instance-types/#Accelerated_Computing). Here is the link to the specific information about the [CPU and GPU](https://github.com/czha/TrixiGPU.jl/blob/legacy/docs/env_info.md) used for this project. Note that this project is reproducible by following the linked setup instructions on [how to set up the environment](https://github.com/czha/TrixiGPU.jl/blob/legacy/docs/project_setup.md). Also, for individuals without an Nvidia GPU who are interested in experimenting with CUDA, here is a link detailing how to [set up a cloud GPU on AWS](https://github.com/czha/TrixiGPU.jl/blob/legacy/docs/aws_gpu_setup.md).
 
 
 ## Key Highlights
-The overview of the project repository can be accessed through this [README](https://github.com/huiyuxie/trixi_cuda) file. Here is a detailed description of the highlights of this project.
+The overview of the project repository can be accessed through this [README.md](https://github.com/czha/TrixiGPU.jl/blob/legacy/README.md) file. Here is a detailed description of the highlights of this project.
 
-#### 1. Kernel Prototyping
+### 1. Kernel Prototyping
 Several function (kernel) naming rules were applied in the kernel prototyping process (a minimal sketch follows this hunk):
 - The functions for GPU kernel parallel computing must end with `_kernel`
 - The functions for calling the GPU kernels must begin with `cuda_`
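For illustration, here is a minimal sketch of this convention with a hypothetical elementwise kernel and its host-side caller; the names, kernel body, and launch sizes are invented for the example and are not taken from the project repository:

```Julia
using CUDA

# GPU kernel (name ends with `_kernel`): scales `u` into `du` elementwise.
function scale_kernel(du, u, a)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(du)
        @inbounds du[i] = a * u[i]
    end
    return nothing
end

# Host-side caller (name begins with `cuda_`): picks a launch size and
# launches the kernel on the GPU.
function cuda_scale!(du, u, a)
    threads = 256
    blocks = cld(length(du), threads)
    @cuda threads=threads blocks=blocks scale_kernel(du, u, a)
    return nothing
end
```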
@@ -47,7 +47,7 @@ Based on these points, the work began with `dg_1d.jl`, and then extended to `dg_
 - GPU parallel computing can run into race conditions ([Issue #5](https://github.com/huiyuxie/trixi_cuda/issues/5))
 - The `Float32` type can be promoted to the `Float64` type in the GPU computing process ([Issue #3](https://github.com/huiyuxie/trixi_cuda/issues/3) and [PR #1604](https://github.com/trixi-framework/Trixi.jl/pull/1604)); this pitfall is illustrated below
 
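To make the `Float32` promotion issue concrete, here is a generic illustration of the pitfall and one common remedy, deriving constants from the element type; this is a sketch for explanation only, not the exact fix from PR #1604:

```Julia
using CUDA

# Inside a kernel, a bare literal such as `0.5` is a `Float64`, so
# `0.5 * u[i]` silently computes in double precision on `Float32` data.
# Deriving the constant from the element type keeps the arithmetic in
# `Float32`.
function half_kernel(du, u)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(du)
        @inbounds du[i] = oftype(u[i], 0.5) * u[i]  # no promotion to Float64
    end
    return nothing
end
```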
-#### 2. Kernel Configuration
+### 2. Kernel Configuration
 The GPU kernels were designed to be launched with appropriate numbers of threads and blocks. The occupancy API `CUDA.launch_configuration` was used to create kernel configurator functions for 1D, 2D, and 3D kernels (i.e., `configurator_1d`, `configurator_2d`, and `configurator_3d`).
 
 Specifically, in the kernel configurator functions, `CUDA.launch_configuration` would first return a suggested number of threads for the compiled but not yet launched kernel, and then the number of blocks would be computed by dividing the corresponding array size by the number of threads.
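Based on this description, a 1D configurator might look roughly like the sketch below; the actual signature of `configurator_1d` in the project may differ, and `kernel` is assumed to be a compiled-but-not-launched kernel obtained via `@cuda launch=false`:

```Julia
using CUDA

# Sketch of a 1D kernel configurator built on the occupancy API.
function configurator_1d(kernel, array)
    config = launch_configuration(kernel.fun)     # occupancy-based suggestion
    threads = min(length(array), config.threads)  # threads per block
    blocks = cld(length(array), threads)          # blocks covering the array
    return (threads = threads, blocks = blocks)
end

# Usage: compile first, then launch with the computed configuration.
# kernel = @cuda launch=false scale_kernel(du, u, a)
# kernel(du, u, a; configurator_1d(kernel, du)...)
```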
@@ -66,7 +66,7 @@ julia> attribute(device(),CUDA.DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK) 1024
 ```
 the kernel could be addressed in the current GPU version but may not be in some other GPU versions (different GPUs report different attribute data, such as `CUDA.DEVICE_ATTRIBUTE_MAX_GRID_DIM_X` and `CUDA.DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK`). So it was suggested to introduce a stride loop for the current GPU kernels, as sketched below.
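For reference, the grid-stride pattern looks like the following sketch, reusing the hypothetical kernel from the naming example above; each thread processes multiple elements, so a fixed launch size covers arrays of any length regardless of the device's grid limits:

```Julia
using CUDA

# Grid-stride variant of the hypothetical `scale_kernel` from above.
function scale_stride_kernel(du, u, a)
    index = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = gridDim().x * blockDim().x  # total number of threads in the grid
    i = index
    while i <= length(du)
        @inbounds du[i] = a * u[i]
        i += stride
    end
    return nothing
end
```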
 
-#### 3. Kernel Optimization
+### 3. Kernel Optimization
 Some work on kernel optimization had already been done during the kernel prototyping process, such as avoiding the use of conditional branches and minimizing kernel calls. But the general work on kernel optimization has not yet been introduced (so this part is somewhat related to the future work).
 
 In summary, the kernel optimization should be based on kernel benchmarks and kernel profiling, and here are some factors that can be considered to improve performance:
@@ -75,9 +75,9 @@ In summary, the kernel optimization should be based on kernel benchmarks and ker
 - Multi-GPU/Multi-Thread: The performance can be further improved if multiple GPUs or multiple threads are used.
 
 ## Performance Benchmarks
-The performance benchmarks were conducted for both CPU and GPU on the `Float64` and `Float32` types, respectively. The example files `elixir_advection_basic.jl`, `elixir_euler_ec.jl`, and `elixir_euler_source_terms.jl` were chosen from `tree_1d_dgsem`, `tree_2d_dgsem`, and `tree_3d_dgsem` under the `src/examples` directory. These examples were chosen because they are consistent across the 1D, 2D, and 3D cases. Please note that all the examples have passed the accuracy tests, and you can check them using this [link to examples](https://github.com/huiyuxie/trixi_cuda/tree/main/cuda_julia/examples).
+The performance benchmarks were conducted for both CPU and GPU on the `Float64` and `Float32` types, respectively. The example files `elixir_advection_basic.jl`, `elixir_euler_ec.jl`, and `elixir_euler_source_terms.jl` were chosen from `tree_1d_dgsem`, `tree_2d_dgsem`, and `tree_3d_dgsem` under the `src/examples` directory. These examples were chosen because they are consistent across the 1D, 2D, and 3D cases. Please note that all the examples have passed the accuracy tests, and you can check them using this [link to examples](https://github.com/czha/TrixiGPU.jl/tree/legacy/src/examples).
 
-The benchmark results were archived in another file; please use this [link to benchmarks](https://github.com/huiyuxie/trixi_cuda/blob/main/docs/cuda_benchmarks.md) to check them. Also note that the benchmarks were focused on the time integration part (i.e., on `OrdinaryDiffEq.solve`); see a benchmark example below:
+The benchmark results were archived in another file; please use this [link to benchmarks](https://github.com/czha/TrixiGPU.jl/blob/legacy/docs/cuda_benchmark.md) to check them. Also note that the benchmarks were focused on the time integration part (i.e., on `OrdinaryDiffEq.solve`); see a benchmark example below:
 ```Julia
 # Run on CPU
 @benchmark begin
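The hunk truncates the benchmark snippet; for context, the pattern it follows is roughly the sketch below, where the solver choice and options are illustrative assumptions and `ode` stands for the problem created by an elixir file:

```Julia
using BenchmarkTools, OrdinaryDiffEq

# `ode` is assumed to be the ODEProblem set up by an elixir file.
# Only the time integration (`solve`) is timed, matching the text above.
@benchmark begin
    solve(ode, CarpenterKennedy2N54(); dt = 0.01, save_everystep = false)
end
```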
@@ -99,7 +99,7 @@ In addition, the results indicate that the GPU performs better with 2D and 3D ex
 ## Future Work
 The future work is listed here, ranging from the specific to the more general, from top to bottom:
 1. Resolve [Issue #9](https://github.com/huiyuxie/trixi_cuda/issues/9) and [Issue #11](https://github.com/huiyuxie/trixi_cuda/issues/11) (and any upcoming issues)
-2. Complete the prototype for the remaining kernels (please refer to the Kernel to be Implemented section of the [README](https://github.com/huiyuxie/trixi_cuda/blob/main/README.md) file).
+2. Complete the prototype for the remaining kernels (please refer to the Kernel to be Implemented section of the [README.md](https://github.com/czha/TrixiGPU.jl/blob/legacy/README.md) file).
 3. Update [PR #1604](https://github.com/trixi-framework/Trixi.jl/pull/1604) and get it merged into the repository
 4. Optimize the CUDA kernels to improve performance (especially data transfer; please refer to the kernel optimization part)
 5. Prototype the GPU kernels for other DG solvers (for example, `DGMulti`)
@@ -113,5 +113,3 @@ Special thanks go to my GSoC mentor [Hendrik Ranocha](https://github.com/ranocha
 Tim Besard](https://github.com/maleadt) (@maleadt, though he is not my mentor), whose guidance and support throughout our regular discussions have been instrumental in answering my questions and overcoming hurdles. The Julia community is incredibly welcoming and supportive, and I am proud to have been a part of this endeavor.
 
 I am filled with appreciation for this fantastic summer of learning and development, and I look forward to seeing the continued growth of Julia and the contributions of its vibrant community.
-
-