src/schemes/fluid/weakly_compressible_sph
1 file changed, +1 -3 lines changed
@@ -162,13 +162,11 @@ end

 # Optimized version for WCSPH with `ContinuityDensity` in 3D,
 # which combines the velocity and density load into one wide load.
-# This is significantly faster on GPUs.
+# This is significantly faster on GPUs than the 4 individual loads of `extract_svector`.
 @inline function velocity_and_density(v, ::ContinuityDensity,
                                       ::WeaklyCompressibleSPHSystem{3}, particle)
     # Since `v` is stored as a 4 x N matrix, this aligned load extracts one column
     # of `v` corresponding to `particle`.
-    # As opposed to `extract_svector`, this will translate to a single wide load instruction
-    # on the GPU, which is faster than 4 separate loads.
     # Note that this doesn't work for 2D because it requires a stride of 2^n.
     vrho_particle = SIMD.vloada(SIMD.Vec{4, eltype(v)}, pointer(v, 4 * (particle - 1) + 1))

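
The index arithmetic behind the aligned load and the 2D caveat can be checked with a small sketch in plain Julia. This is only an illustration, not code from this change: the helper names `column_start`, `byte_offset` and `is_aligned` are hypothetical, and it assumes that a 4-wide aligned load needs a byte offset that is a multiple of `4 * sizeof(T)` and that the 2D case stores `v` as a 3 x N matrix.

```julia
# Hypothetical helpers for illustration; only Base Julia is used.

# 1-based linear index of the first entry of column `particle`
# in a column-major matrix with `nrows` rows.
column_start(nrows, particle) = nrows * (particle - 1) + 1

# Byte offset of that entry relative to the start of the array.
byte_offset(T, nrows, particle) = (column_start(nrows, particle) - 1) * sizeof(T)

# Assumption: a 4-wide aligned load needs an offset that is a multiple of 4 * sizeof(T).
is_aligned(T, nrows, particle) = byte_offset(T, nrows, particle) % (4 * sizeof(T)) == 0

# 3D with `ContinuityDensity`: 3 velocity components + density = 4 rows.
v = reshape(Float32.(1:20), 4, 5)
particle = 3
start = column_start(4, particle)
column = v[start:(start + 3)]   # the same entries as v[:, particle]

# With 4 rows, every column starts at an aligned offset.
all_aligned_4 = all(is_aligned(Float32, 4, p) for p in 1:5)

# Assumed 2D layout: 2 velocity components + density = 3 rows.
# Column 2 starts at byte offset 12, which is not a multiple of 16.
all_aligned_3 = all(is_aligned(Float32, 3, p) for p in 1:5)
```

Under these assumptions, a stride of 4 keeps every column start aligned, while a stride of 3 does not, which is consistent with the note that the wide load is restricted to the 3D method.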