Commit b5f6f65

Improve comments
1 parent 1719dc1 commit b5f6f65

1 file changed, +1 -3 lines changed
src/schemes/fluid/weakly_compressible_sph/rhs.jl

Lines changed: 1 addition & 3 deletions
@@ -162,13 +162,11 @@ end
 
 # Optimized version for WCSPH with `ContinuityDensity` in 3D,
 # which combines the velocity and density load into one wide load.
-# This is significantly faster on GPUs.
+# This is significantly faster on GPUs than the 4 individual loads of `extract_svector`.
 @inline function velocity_and_density(v, ::ContinuityDensity,
                                       ::WeaklyCompressibleSPHSystem{3}, particle)
     # Since `v` is stored as a 4 x N matrix, this aligned load extracts one column
     # of `v` corresponding to `particle`.
-    # As opposed to `extract_svector`, this will translate to a single wide load instruction
-    # on the GPU, which is faster than 4 separate loads.
     # Note that this doesn't work for 2D because it requires a stride of 2^n.
     vrho_particle = SIMD.vloada(SIMD.Vec{4, eltype(v)}, pointer(v, 4 * (particle - 1) + 1))
 
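
The index arithmetic in the `pointer(v, 4 * (particle - 1) + 1)` call can be illustrated with a minimal sketch in plain Julia. It uses ordinary indexing as a stand-in for the SIMD load, so it demonstrates only the memory layout, not the performance. The helper names (`column_start`, `load_individually`, `load_contiguous`, `column_byte_offset`) and the sample data are hypothetical and not part of the commit; the assumption that rows 1-3 hold the velocity and row 4 the density is also not stated in the diff.

```julia
# Hypothetical sample data: a 4 x N matrix in column-major order.
# Assumption: rows 1-3 are the velocity components, row 4 is the density.
v = Float32[1 5 9;
            2 6 10;
            3 7 11;
            4 8 12]

# Linear index of the first entry of column `particle` in a 4 x N matrix.
# This is the same expression that the diff passes to `pointer`.
column_start(particle) = 4 * (particle - 1) + 1

# Reference version: four separate scalar loads, one per row.
function load_individually(v, particle)
    return (v[1, particle], v[2, particle], v[3, particle], v[4, particle])
end

# Stand-in for the wide load: read 4 consecutive elements
# starting at the column's linear index.
function load_contiguous(v, particle)
    i = column_start(particle)
    return ntuple(k -> v[i + k - 1], 4)
end

# Byte offset of a column relative to the start of the array.
# With 4 rows of Float32 this is always a multiple of 16 bytes.
column_byte_offset(v, particle) = (column_start(particle) - 1) * sizeof(eltype(v))

println(load_contiguous(v, 2) == load_individually(v, 2))
println(column_byte_offset(v, 2))
```

With 4 rows, every column starts at a multiple of `4 * sizeof(eltype(v))` bytes, which is consistent with the aligned load in the diff. A 3-row layout would start each column at a multiple of 3 elements instead, so the stride would not be a power of two; this matches the note in the diff that the approach does not work for 2D.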
