Skip to content

Inbound checks for interp#247

Draft
marinlauber wants to merge 7 commits intomasterfrom
out-of-bound-interp
Draft

Inbound checks for interp#247
marinlauber wants to merge 7 commits intomasterfrom
out-of-bound-interp

Conversation

@marinlauber
Copy link
Member

I've noticed that interp can be called with locations that are outside the computational mesh. This is an issue since it sometimes returns rubbish.

I am not too sure what the best approach here, a manual check like I implemented, or removing the @inbounds from the interpolation loop

# Linearly weighted sum over arr[R] (in serial)
s = zero(T)
@fastmath @inbounds @simd for J in R
    weight = prod(@. ifelse(J.I==I.I,1-y,y))
    s += arr[J]*weight
end

My version has the benefit of working regardless of the value of x and simply returns zero(T).

Additionally, this is a challenge to test since @inbonds is removed by @test, and the code never reaches the point of testing the condition, simply throwing an OutOfBoundError.

@b-fg
Copy link
Member

b-fg commented Aug 18, 2025

I would throw a warning, on top of returning 0.

Since we are looking at interp here, it would also be nice to have a GPU version for it!

@marinlauber
Copy link
Member Author

marinlauber commented Aug 27, 2025

I would throw a warning, on top of returning 0.

So I've been looking a bit into this, @warn is not possible on the GPU...

Since we are looking at interp here, it would also be nice to have a GPU version for it!

my idea was to make a function interp(Array{SVector}, arr) and have this run on the GPU since the original interp can already be passed to a kernel.

@marinlauber

This comment was marked as resolved.

@marinlauber
Copy link
Member Author

Actually, the interp works on the GPU already. You just need to call it inside a @loop

interp_u = CuArray(zeros(N,3))
xs = CuArray([SA{Float32}[(i-1)*sim.L,0,0] for i in 1:N])
@WaterLily.loop interp_u[I,:] .= WaterLily.interp(xs[I],sim.flow.u) over I in CartesianIndices(1:N)

and N=1 can be used if only one value is required (which I don't think happens often). I guess this removes the need for a special interp kernel? @b-fg @weymouth

@b-fg
Copy link
Member

b-fg commented Feb 5, 2026

Yes, I'd it is not necessary then. We could include this info in the docstrings though.

@weymouth
Copy link
Collaborator

weymouth commented Feb 5, 2026

There is no need for @loop when you don't shift I

p = CUDA.rand(10,18)
u = CUDA.rand(10,18,2)
x = CuArray([SA_F32[i-1.5,2i+0.5] for i in 1:8])
WaterLily.interp.(x,Ref(p)) # Broadcast
WaterLily.interp.(x,Ref(u)) # Broadcast 

This works fine on the GPU.

Testing the out-of range stuff is fine for scalars.

julia> WaterLily.interp.(x,Ref(p)) # Broadcast
8-element CuArray{Float32, 1, CUDA.DeviceMemory}:
 0.0
 0.00037213776
 0.12910923
 0.72426826
 0.3301847
 0.31054902
 0.31111467
 0.0

Since x[1] and x[end] are outside the range, the pressure returns zero. But for vectors I get

julia> WaterLily.interp.(x,Ref(u)) # Broadcast
8-element CuArray{SVector{2, Float32}, 1, CUDA.DeviceMemory}:
 [0.3668502, 0.0]
 [0.1525101, 0.50583553]
 [0.5248325, 0.4398519]
 [0.07852239, 0.6513722]
 [0.26857775, 0.34893203]
 [0.18739545, 0.56745744]
 [0.6781024, 0.6339218]
 [0.0, 0.0]

This one is a little funny since x[1] is shifted to [0,2.5] in the i direction which is within the domain. the j direction is shifted to [-0.5,3] which is out, so the final interpolant is [value,0]. Is that what we want to happen?

@marinlauber
Copy link
Member Author

marinlauber commented Feb 5, 2026

@weymouth, neat approach with Ref.

Not sure it's what we want indeed. The correct way is to check whether the initial point passed is inside the domain, and not if a shifted point is, something like this I suppose

function _interp(x::SVector{D,T}, arr::AbstractArray{T,D}) where {D,T}
    # Index below the interpolation coordinate and the difference
    x = x .+ 1.5f0; i = floor.(Int,x); y = x.-i

    # CartesianIndices around x
    I = CartesianIndex(i...); R = I:I+oneunit(I)

    # Linearly weighted sum over arr[R] (in serial)
    s = zero(T)
    @fastmath @inbounds @simd for J in R
        weight = prod(@. ifelse(J.I==I.I,1-y,y))
        s += arr[J]*weight
    end
    return s
end

using EllipsisNotation
function interp(x::SVector{D,T}, varr::AbstractArray{T}) where {D,T}
    !(all(0 .≤ x) && all(x .≤ size(varr)[1:D].-2)) && return zero(x)
    # Shift to align with each staggered grid component and interpolate
    @inline shift(i) = SVector{D,T}(ifelse(i==j,0.5,0.) for j in 1:D)
    return SVector{D,T}(_interp(x+shift(i),@view(varr[..,i])) for i in 1:D)
end
function interp(x::SVector{D,T}, arr::AbstractArray{T,D}) where {D,T}
    !(all(0 .≤ x) && all(x .≤ size(arr).-2)) ? zero(T) : _interp(x, arr)
end

which returns this for your test case

8-element CuArray{Float32, 1, CUDA.DeviceMemory}:
 0.0
 0.5563227
 0.2339635
 0.617647
 0.47089913
 0.6314482
 0.01321663
 0.0
8-element CuArray{SVector{2, Float32}, 1, CUDA.DeviceMemory}:
 [0.0, 0.0]
 [0.37724477, 0.3874247]
 [0.639042, 0.6405743]
 [0.49747413, 0.90232533]
 [0.37469953, 0.6171435]
 [0.430172, 0.49962258]
 [0.6113318, 0.65551376]
 [0.0, 0.0]

which is probably better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants