Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add lazy compute operations #25

Open
wants to merge 21 commits into
base: dev
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 11 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions Project.toml
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ DelimitedFiles = "8bb1440f-4735-579b-a4ab-409b98df4dab"
Graphs = "86223c79-3864-5bf0-83f7-82e725a168b6"
LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
LinearSolve = "7ed4a6bd-45f5-4d41-b270-4a48e9bafcae"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
ProgressLogging = "33c8b6b6-d38a-422a-b730-caa89a2f386c"
Rasters = "a3a2b9e3-a471-40c9-b274-f788e487c689"
Expand All @@ -20,6 +21,7 @@ ArnoldiMethod = "0.0.4, 0.4"
DelimitedFiles = "1"
Graphs = "1"
LaTeXStrings = "1.1"
LinearSolve = "2.38.0"
Plots = "1.4"
ProgressLogging = "0.1"
Rasters = "0.12"
Expand Down
35 changes: 35 additions & 0 deletions examples/2_landmarks.jmd
Original file line number Diff line number Diff line change
Expand Up @@ -99,6 +99,41 @@ kbetw_coarse = @time ConScape.betweenness_kweighted(h_coarse, distance_transform
cor(filter(!isnan, kbetw), filter(!isnan, kbetw_coarse))
```

We can write this using a lazy problem definition:

```julia
ops = (;
func=ConScape.ConnectedHabitat(; distance_transformation=x -> exp(-x/75)),
qbetw=ConScape.BetweennessQweighted(),
kbetw=ConScape.BetweennessKweighted(),
)
problem = ConScape.Problem(ops; θ=theta, mode=ConScape.MatrixSolve())
problem = ConScape.Problem(ops; θ=theta, mode=ConScape.VectorSolve())
```

Then run it for all operations on both normal and coarse grids

```julia
res = ConScape.compute(problem, g)
res_coarse = ConScape.compute(problem, g_coarse)
```

We can plot these outputs:
```julia
heatmap(res.qbetw)
heatmap(res_coarse.qbetw)
heatmap(res.kbetw)
heatmap(res_coarse.kbetw)
heatmap(res.func)
heatmap(res_coarse.func)
```

And we can check the corelation similarly to above, by getting
layers from `res` and `res_coarse`
```julia
cor(filter(!isnan, res.kbetw), filter(!isnan, res_coarse.kbetw))
```

# Test landmark performance for amount of connected habitat

```julia
Expand Down
4 changes: 3 additions & 1 deletion src/ConScape.jl
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ module ConScape
using SparseArrays, LinearAlgebra
using Graphs, Plots, SimpleWeightedGraphs, ProgressLogging, ArnoldiMethod
using Rasters
using LinearSolve
using Rasters.DimensionalData

abstract type ConnectivityFunction <: Function end
Expand All @@ -26,5 +27,6 @@ module ConScape
include("io.jl")
# Utilities
include("utils.jl")

include("problem.jl")
include("tiles.jl")
end
6 changes: 6 additions & 0 deletions src/grid.jl
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue that memory use and flops is also related to the prob::Problem. It could be easy to assess the expected number of flops in VectorSolve() where we can do the calculation for one node and multiply by the number of targets, but it may be more difficult for a MatrixSolve() without performing the operation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes makes sense, memory can just be checked by running one column.

Flops may need us to get some values I'm not sure we have access to (but e.g. UMFPACK dose track flops internally).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are refactoring this, it would make sense to already think of #29. Maybe add a field connected_components that is a vector of vectors, where each vector corresponds to a list of node of the associated connected component.

Original file line number Diff line number Diff line change
Expand Up @@ -193,6 +193,12 @@ function _heatmap(canvas, g; kwargs...)
end
end

function assess(g::Grid)
targetidx, targetnodes = _targetidx_and_nodes(g)
# Calculate memory use and expected flops for
# targetnodes, or something...
end

"""
is_strongly_connected(g::Grid)::Bool

Expand Down
104 changes: 101 additions & 3 deletions src/gridrsp.jl
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@

struct GridRSP
g::Grid
θ::Float64
Expand All @@ -12,7 +13,6 @@ end
Construct a GridRSP from a `g::Grid` based on the inverse temperature parameter `θ::Real`.
"""
function GridRSP(g::Grid; θ=nothing)

Pref = _Pref(g.affinities)
W = _W(Pref, θ, g.costmatrix)

Expand Down Expand Up @@ -48,9 +48,19 @@ function Base.show(io::IO, ::MIME"text/html", grsp::GridRSP)
write(io, "<h4>$t</h4>")
show(io, MIME"text/html"(), plot_outdegrees(grsp.g))
end

DimensionalData.dims(grsp::GridRSP) = dims(grsp.g)

abstract type AbstractOperation end
abstract type RSPOperation <: AbstractOperation end

function keywords(o::T) where T<:AbstractOperation
vals = map(f -> getfield(o, f), fieldnames(T))
return NamedTuple{fieldnames(T)}(vals)
end

struct BetweennessQweighted <: RSPOperation end

compute(r::BetweennessQweighted, grsp::GridRSP) = betweenness_qweighted(grsp)

"""
betweenness_qweighted(grsp::GridRSP)::Matrix{Float64}
Expand All @@ -76,7 +86,8 @@ function betweenness_qweighted(grsp::GridRSP)
return _maybe_raster(bet, grsp)
end


struct EdgeBetweennessQweighted <: RSPOperation end
compute(r::EdgeBetweennessQweighted, grsp::GridRSP) = edge_betweenness_qweighted(grsp)

"""
edge_betweenness_qweighted(grsp::GridRSP)::Matrix{Float64}
Expand All @@ -98,6 +109,14 @@ function edge_betweenness_qweighted(grsp::GridRSP)
return betmatrix
end

@kwdef struct BetweennessKweighted{CV,DT,DV} <: RSPOperation
connectivity_function::CV=expected_cost
distance_transformation::DT=nothing
diagvalue::DV=nothing
end

compute(o::BetweennessKweighted, grsp::GridRSP) =
betweenness_kweighted(grsp; keywords(o)...)

"""
betweenness_kweighted(grsp::GridRSP;
Expand Down Expand Up @@ -153,7 +172,13 @@ function betweenness_kweighted(grsp::GridRSP;
return _maybe_raster(bet, grsp)
end

@kwdef struct EdgeBetweennessK{CV,DT,DV} <: RSPOperation
distance_transformation::DT=inv(grsp.g.costfunction)
diagvalue::DV=nothing
end


compute(r::EdgeBetweennessK, grsp::GridRSP) = edge_betweenness_kweighted(grsp; keywords(r)...)

"""
edge_betweenness_kweighted(grsp::GridRSP; [distance_transformation=inv(grsp.g.costfunction), diagvalue=nothing])::SparseMatrixCSC{Float64,Int}
Expand Down Expand Up @@ -189,6 +214,9 @@ function edge_betweenness_kweighted(grsp::GridRSP; distance_transformation=inv(g
end


struct ExpectedCost <: RSPOperation end

compute(r::ExpectedCost, grsp::GridRSP) = expected_cost(grsp)

"""
expected_cost(grsp::GridRSP)::Matrix{Float64}
Expand All @@ -200,23 +228,48 @@ function expected_cost(grsp::GridRSP)
return RSP_expected_cost(grsp.W, grsp.g.costmatrix, grsp.Z, targetnodes)
end


struct FreeDnergyDistance <: RSPOperation end

compute(r::FreeDnergyDistance, grsp::GridRSP) = free_energy_distance(grsp)

function free_energy_distance(grsp::GridRSP)
targetidx, targetnodes = _targetidx_and_nodes(grsp.g)
return RSP_free_energy_distance(grsp.Z, grsp.θ, targetnodes)

end

struct SurvivalProbability <: RSPOperation end

compute(r::SurvivalProbability, grsp::GridRSP) = survival_probability(grsp)

function survival_probability(grsp::GridRSP)
targetidx, targetnodes = _targetidx_and_nodes(grsp.g)

return RSP_survival_probability(grsp.Z, grsp.θ, targetnodes)
end

struct PowerMeanProximity <: RSPOperation end

compute(r::PowerMeanProximity, grsp::GridRSP) = power_mean_proximity(grsp)

function power_mean_proximity(grsp::GridRSP)

targetidx, targetnodes = _targetidx_and_nodes(grsp.g)
return RSP_power_mean_proximity(grsp.Z, grsp.θ, targetnodes)
end

struct LeastCostDistance <: RSPOperation end


compute(r::LeastCostDistance, grsp::GridRSP) = least_cost_distance(grsp)

least_cost_distance(grsp::GridRSP) = least_cost_distance(grsp.g)

struct MeanKullbackLeiblerDivergence <: RSPOperation end

compute(r::MeanKullbackLeiblerDivergence, grsp::GridRSP) = mean_kl_divergence(grsp)

"""
mean_kl_divergence(grsp::GridRSP)::Float64

Expand All @@ -229,6 +282,10 @@ function mean_kl_divergence(grsp::GridRSP)
return qs'*(RSP_free_energy_distance(grsp.Z, grsp.θ, targetnodes) - expected_cost(grsp))*qt*grsp.θ
end

struct MeanLeastCostKullbackLeiblerDivergence <: RSPOperation end

compute(r::MeanLeastCostKullbackLeiblerDivergence, grsp::GridRSP) = mean_kl_divergence(grsp)


"""
mean_lc_kl_divergence(grsp::GridRSP)::Float64
Expand Down Expand Up @@ -281,12 +338,20 @@ function least_cost_kl_divergence(C::SparseMatrixCSC, Pref::SparseMatrixCSC, tar
# Pointer swap
tmp = from
from = to

to = tmp
end

return kl_div
end

@kwdef struct LeastCostKullbackLeiblerDivergence <: RSPOperation
target::Tuple{Int,Int}
end

compute(r::LeastCostKullbackLeiblerDivergence, grsp::GridRSP) =
least_cost_kl_divergence(grsp, r.target)

"""
least_cost_kl_divergence(grsp::GridRSP, target::Tuple{Int,Int})

Expand All @@ -305,6 +370,18 @@ function least_cost_kl_divergence(grsp::GridRSP, target::Tuple{Int,Int})
return reshape(div, grsp.g.nrows, grsp.g.ncols)
end


@kwdef struct ConnectedHabitat{CV,DT,DV} <: RSPOperation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this could be better abstracted - the "equivalent connected habitat" metrics extends beyond the RSP framework. The calculation of connected habitat is simply qˢ .* (S*qᵗ), where S is a proximity matrix. A proximity matrix is derived from a distance matrix, by applying a function distance_transformation that usually involves the dispersal ability of the species considered.

It would be nice to have a structure like AbstractDistance, and from that we could define a RSPDistance (which boils down to the expected_cost) but also a LCPDistance, a ResistanceDistance and what not. Those could be an input to ConnectedHabitat, and would be blindly used to calculate the distance matrix.

More generally, a distance object should be provided to the prob::Problem, and be used to inform the specific operation requested. If no operation is requested, then compute could output an object of the sort ExplicitGrid (I feel like this is a bad name, we should reflect on this), which is essentially a graph with edge weights corresponding to the distance evaluation. This object would somewhat be a sort of RSPGrid, but more interpretable and generic.

This reflection brings me to a second point: the GridRSP object is too specific. The primary goal of this is to checkpoint some calculations, but this should be the role of the Problem structure. RSPGrid is only a buffer with no interest from the user. This buffer could be hidden in prob::Problem. Notice that this is very similar to the solution used in SciML and LinearSolve.jl, where init(prob) allows to do some caching for e.g. the LU factors when doing LU solve (https://docs.sciml.ai/SciMLBase/dev/interfaces/Init_Solve/ and https://docs.sciml.ai/LinearSolve/stable/tutorials/caching_interface/)

Copy link
Collaborator Author

@rafaqz rafaqz Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes GridRSP is pretty much hidden now in this PR. You call compute on a Problem and Grid and GridRSP is internal.

I'm working towards preallocating the whole problem up front like LinearSolve does but most of the code is unfortunately not written in that style. The first step will be to separate out non-allocating ! methods for everything so the workspace arrays can be passed in.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also good point about using more appropriate abstractions.

Copy link
Collaborator Author

@rafaqz rafaqz Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a good discussion with @ninbrm about refactoring distances and fixing the abstract type structure to better represent the range of problems. Theta really doesn't need to be a parameter but the fact that distance functions are curreently used like functions rather than constructed objects currently makes it awkward to put parameters in them. So we should change them to be types like you describe.

I'll implement this new abstract type hierarchy in the next iteration of this PR and we can discuss from there

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, looking forward to that!

Copy link
Collaborator Author

@rafaqz rafaqz Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok the new abstractions are kind of implemented now :)

It works in the landmarks.jmd example at least, but needs some more polish and thought. Eventually we can refactor some of the complexity away but for now Im trying to build it on top of the current methods without changing them too much, so the old tests still run and we can use them to check the new methods.

# TODO not sure which kw to use here
connectivity_function::CV=expected_cost
distance_transformation::DT=nothing
diagvalue::DV=nothing
θ::Union{Nothing,Real}=nothing
approx::Bool=false
end

compute(r::ConnectedHabitat, grsp::GridRSP) = connected_habitat(grsp; keywords(r)...)

"""
connected_habitat(grsp::Union{Grid,GridRSP};
connectivity_function=expected_cost,
Expand Down Expand Up @@ -436,9 +513,19 @@ function connected_habitat(grsp::GridRSP,

newh = GridRSP(newg, θ=grsp.θ)


return connected_habitat(newh; diagvalue=diagvalue, distance_transformation=distance_transformation)
end

@kwdef struct EigMax{F,DT,DV,T} <: RSPOperation
connectivity_function::F=expected_cost
Tdistance_transformation::DT=nothing
diagvalue::DV=nothing
tol::T=1e-14
end

compute(r::EigMax, grsp::GridRSP) = eigmax(grsp; keywords(r)...)

"""
eigmax(grsp::GridRSP;
connectivity_function=expected_cost,
Expand Down Expand Up @@ -564,6 +651,16 @@ function LinearAlgebra.eigmax(grsp::GridRSP;
return vˡ, λ₀[1], vʳ
end

@kwdef struct Criticality{DT,DV,AV,QT,QS} <: RSPOperation
distance_transformation::DT=inv(grsp.g.costfunction)
diagvalue::DV=nothing
avalue::AV=floatmin()
qˢvalue::QS=0.0
qᵗvalue::QT=0.0
end

compute(r::Criticality, grsp::GridRSP) = criticality(grsp; keywords(r)...)

"""
criticality(grsp::GridRSP[;
distance_transformation=inv(grsp.g.costfunction),
Expand Down Expand Up @@ -605,3 +702,4 @@ function criticality(grsp::GridRSP;

return _maybe_raster(landscape, grsp)
end

Empty file added src/operations.jl
Empty file.
Loading
Loading