src/schemes/fluid/weakly_compressible_sph
1 file changed, +1 -3 lines changed
@@ -161,13 +161,11 @@ end
 # Optimized version for WCSPH with `ContinuityDensity` in 3D,
 # which combines the velocity and density load into one wide load.
-# This is significantly faster on GPUs.
+# This is significantly faster on GPUs than the 4 individual loads of `extract_svector`.
 @inline function velocity_and_density(v, ::ContinuityDensity,
                                       ::WeaklyCompressibleSPHSystem{3}, particle)
     # Since `v` is stored as a 4 x N matrix, this aligned load extracts one column
     # of `v` corresponding to `particle`.
-    # As opposed to `extract_svector`, this will translate to a single wide load instruction
-    # on the GPU, which is faster than 4 separate loads.
     # Note that this doesn't work for 2D because it requires a stride of 2^n.
     vrho_particle = SIMD.vloada(SIMD.Vec{4, eltype(v)}, pointer(v, 4 * (particle - 1) + 1))