Releases: trixi-gpu/TrixiCUDA.jl

TrixiCUDA v0.1.0-beta.6

31 Jan 01:19
ddc9c07
Pre-release

What's New

  • Now compatible with CUDA.jl 5.3.3 on Julia 1.10; Julia 1.11 is still not supported

What's Changed

  • Adapt DGSEM to enable caching of its data on GPU by @huiyuxie in #117
  • Fuse setting diagonal elements to zeros in derivative_split into kernels by @huiyuxie in #119
  • Optimize volume integral kernels for shock capturing (less common use) by @huiyuxie in #120
  • Optimize volume integral kernels for shock capturing (frequent use) by @huiyuxie in #121
  • Small optimization patch for flux differencing kernels by @huiyuxie in #122
  • Enable double caching: CPU cache and GPU cache by @huiyuxie in #123
  • Bump compat for Trixi to 0.9.15 by @huiyuxie in #124
  • Update README.md by @huiyuxie in #125
  • Optimize volume integral kernels for shock capturing (frequent use) by @huiyuxie in #126
  • Add check for shared memory size per thread block by @huiyuxie in #127
  • Separate GPU kernels and host functions in different files by @huiyuxie in #128

Full Changelog: v0.1.0-beta.5...v0.1.0-beta.6

TrixiCUDA v0.1.0-beta.5

15 Jan 21:42
3336c7c
Pre-release

What's Changed

  • Small changes regarding variable name on tests by @huiyuxie in #88
  • Add scripts for benchmarking and profiling workflows by @huiyuxie in #92
  • Refactor solver and tests to support Float32 computations by @huiyuxie in #94
  • Add one more example to benchmark by @huiyuxie in #95
  • Update README.md by @huiyuxie in #96
  • Combine similar kernels using cooperative groups by @huiyuxie in #97
  • Relax inbounds checking within GPU kernels by @huiyuxie in #99
  • Fuse reset_du! function into volume integral kernels by @huiyuxie in #100
  • Relax inbounds checking in minor GPU kernels by @huiyuxie in #101
  • Optimize volume integral kernels by @huiyuxie in #102
  • Update README.md by @huiyuxie in #103
  • Load package with device property querying by @huiyuxie in #104
  • Optimize volume integral kernel for flux differencing by @huiyuxie in #105
  • Bump crate-ci/typos from 1.28.1 to 1.29.0 by @dependabot in #106
  • Switch to less parallelism to avoid redundant computation by @huiyuxie in #107
  • Remove comments for reset_du! function in tests by @huiyuxie in #108
  • Update some critical comments by @huiyuxie in #111
  • Adapt wrap_array for GPU arrays by @huiyuxie in #112
  • Optimize volume integral kernels for flux differencing by @huiyuxie in #114
  • Optimization patch for volume integral kernels by @huiyuxie in #115
  • Optimize volume integral kernels for larger arrays (less common use) by @huiyuxie in #116

Full Changelog: v0.1.0-beta.4...v0.1.0-beta.5

TrixiCUDA v0.1.0-beta.4

10 Dec 00:41
6956b3c
Pre-release

What's Changed

  • Enhance old macros by @huiyuxie in #61
  • Optimize kernel configurators by @huiyuxie in #62
  • Bump crate-ci/typos from 1.24.6 to 1.25.0 by @dependabot in #65
  • Add dependencies to documentation by @huiyuxie in #67
  • Enable documentation build again by @huiyuxie in #69
  • Bump crate-ci/typos from 1.25.0 to 1.27.3 by @dependabot in #75
  • Bump codecov/codecov-action from 4 to 5 by @dependabot in #76
  • Refactor CI workflow for sanity check and readability by @huiyuxie in #78
  • CompatHelper: bump compat for Trixi to 0.9, (keep existing compat) by @github-actions in #79
  • Clean docs build and add docs to CompatHelper by @huiyuxie in #80
  • CompatHelper: add new compat entry for Documenter at version 1 for package docs, (keep existing compat) by @github-actions in #81
  • Remove macOS tests in CI by @huiyuxie in #82
  • Add recent updates to README.md by @huiyuxie in #83
  • Fix small typos from README.md by @huiyuxie in #84
  • Bump crate-ci/typos from 1.27.3 to 1.28.1 by @dependabot in #85
  • Parallelization of compute coefficients functions on GPU by @huiyuxie in #63
  • Set version bounds for Trixi.jl to ensure compatibility by @huiyuxie in #86
  • Extend margin in JuliaFormatter by @huiyuxie in #87

Full Changelog: v0.1.0-beta.3...v0.1.0-beta.4

TrixiCUDA v0.1.0-beta.3

30 Sep 03:36
699f513
Pre-release

The next step is to implement the initialization of DG elements, interfaces, boundaries, and mortars. The indicator failure that occurs when coupling the GPU cache in the function call also needs to be checked.

What's Changed

Full Changelog: v0.1.0-beta.2...v0.1.0-beta.3

TrixiCUDA v0.1.0-beta.2

19 Sep 15:28
d56bcbb
Pre-release

All kernels for the tree mesh with DGSEM are complete. The tasks for the next release are:

  • Implement cache to store the arrays that are frequently used on GPU
  • Move data transfer outside of the iterative solver
  • Start the process of kernel optimization on GPU
  • *Start the implementation for structured mesh with DGSEM on GPU
  • *Start the implementation for tree mesh initialization on GPU

The last two items (marked with *) still need to be discussed.

What's Changed

  • Bump crate-ci/typos from 1.24.3 to 1.24.5 by @dependabot in #37
  • Add shock capturing with nonconservative_terms::True by @huiyuxie in #36
  • Macro for testing approximate equality for GPU and CPU arrays by @huiyuxie in #38
  • Fix boundary flux kernel with multiple dispatches by @huiyuxie in #39
  • Flux differencing for 3D with nonconservative_terms::True by @huiyuxie in #40
  • Interface flux for 3D with nonconservative_terms::True by @huiyuxie in #41
  • Mortar flux with nonconservative_terms::True by @huiyuxie in #42
  • Shock capturing with nonconservative_terms::True by @huiyuxie in #46

Full Changelog: v0.1.0-beta...v0.1.0-beta.2

TrixiCUDA v0.1.0-beta

09 Sep 09:08
c9185f8
Pre-release

Kernels left to be implemented:

  • nonconservative_terms::True for flux differencing, interface flux, and mortar flux (3D only, pending a fix for the MHD mutable structs)
  • nonconservative_terms::True for shock capturing (1D, 2D, 3D)

As @ranocha suggested, an implementation for VolumeIntegralPureLGLFiniteVolume is not necessary.

What's Changed

  • Add mortar flux kernel with nonconservative_terms::False to 2D and 3D by @huiyuxie in #24
  • Update README.md by @huiyuxie in #25
  • Bump crate-ci/typos from 1.24.1 to 1.24.3 by @dependabot in #28
  • Remove unused arguments from kernel function by @huiyuxie in #27
  • Use math expression to enhance performance by @huiyuxie in #30
  • Refactor and add more tests for DGSEM solver with tree mesh by @huiyuxie in #31
  • Add boundary flux kernel with nonconservative_terms::True to 1D and 2D by @huiyuxie in #32
  • Add volume integral kernel with volume_integral::VolumeIntegralShockCapturingHG by @huiyuxie in #34
  • Add more compatible examples and update docs by @huiyuxie in #35

Full Changelog: v0.1.0-alpha...v0.1.0-beta

TrixiCUDA v0.1.0-alpha

29 Aug 02:34
0fb397d
Pre-release

Here are some kernels left to be implemented for TreeMesh with DGSEM:

  • calc_mortar_flux!
  • calc_boundary_flux! - nonconservative_terms::True
  • calc_volume_integral! - volume_integral::VolumeIntegralShockCapturingHG
    and volume_integral::VolumeIntegralPureLGLFiniteVolume

What's Changed

  • CompatHelper: add new compat entry for StaticArrays at version 1, (keep existing compat) by @github-actions in #17
  • CompatHelper: add new compat entry for StrideArrays at version 0.1, (keep existing compat) by @github-actions in #18
  • CompatHelper: add new compat entry for SciMLBase at version 2, (keep existing compat) by @github-actions in #19
  • CompatHelper: add new compat entry for SimpleUnPack at version 1, (keep existing compat) by @github-actions in #20
  • Drop unpack and use standard destructuring syntax by @ErikQQY in #21
  • Bump crate-ci/typos from 1.23.6 to 1.24.1 by @dependabot in #23

New Contributors

  • @github-actions made their first contribution in #17
  • @ErikQQY made their first contribution in #21
  • @dependabot made their first contribution in #23

Full Changelog: https://github.com/czha/TrixiGPU.jl/commits/v0.1.0-alpha