Running on #2803, I struggled to hit the third branch in the Gibbs tilde_assume!! call (these lines)
|
# If the varname has not been conditioned on, nor is it a target variable, its |
|
# presumably a new variable that should be sampled from its prior. We need to add |
|
# this new variable to the global `varinfo` of the context, but not to the local one |
|
# being used by the current sampler. |
|
# |
|
# TODO(penelopeysm): How is the RNG controlled here? |
|
value = rand(right) |
|
vnt = get_global_vnt(context) |
|
vnt = DynamicPPL.templated_setindex!!(vnt, value, vn, template) |
|
set_global_vnt!(context, vnt) |
but managed to do so with this:
using Turing, Random
@model function f()
x ~ Normal()
y ~ Normal()
if x > 0
z ~ Normal()
end
end
spl = Gibbs(:x => MH(), :y => MH(), :z => MH())
sample(Xoshiro(470), f(), spl, 100; check_model=false) |> mean
Now, the bug I was hoping to find was that the rng is not controlled and therefore the value of z chosen will not be deterministic. However, the real bug here is that set_global_vnt! stores a VarNamedTuple of a different type from the original one (because the VNT has a new variable inside it). This causes Gibbs to crash because the type parameter of the ref is fixed at construction time.
I think the solution to the crash is fairly simple — we just need to make the Ref an abstract type in the struct. This was done for pMCMC in Turing v0.43 and there were no real performance differences at all.
|
# TODO(penelopeysm): I don't like that this is an abstract type. However, the problem is |
|
# that the type of VarInfo can change during execution, especially with PG-inside-Gibbs |
|
# when you have to muck with merging VarInfos from different sub-conditioned models. |
|
# |
|
# However, I don't think that this is actually a problem in practice. Whenever we do |
|
# Libtask.get_taped_globals, that is already type unstable anyway, so accessing this |
|
# field here is not going to cause extra type instability. This change is associated |
|
# with Turing v0.43, and I benchmarked on v0.42 vs v0.43, and v0.43 is actually faster |
|
# (probably due to underlying changes in DynamicPPL), so I'm not really bothered by |
|
# this. |
|
varinfo::AbstractVarInfo |
|
resample::Bool |
However, I'm not sure what the performance impacts are here, so that needs to be studied. Note that this change will affect type stability of get_global_vnt (but it is indeed possible that there are enough function barriers in the Gibbs codebase to make this not a meaningful problem). Once the crash is fixed, we could then fix the original reproducibility issue.
I don't think any of this is breaking, and it's a code path that is only hit in a vanishing number of use cases, so I'm ok with leaving this until after v0.44 is released.
Running on #2803, I struggled to hit the third branch in the Gibbs
tilde_assume!!call (these lines)Turing.jl/src/mcmc/gibbs.jl
Lines 265 to 274 in e8cdc08
but managed to do so with this:
Now, the bug I was hoping to find was that the rng is not controlled and therefore the value of
zchosen will not be deterministic. However, the real bug here is thatset_global_vnt!stores a VarNamedTuple of a different type from the original one (because the VNT has a new variable inside it). This causes Gibbs to crash because the type parameter of the ref is fixed at construction time.I think the solution to the crash is fairly simple — we just need to make the Ref an abstract type in the struct. This was done for pMCMC in Turing v0.43 and there were no real performance differences at all.
Turing.jl/src/mcmc/particle_mcmc.jl
Lines 27 to 38 in a979e75
However, I'm not sure what the performance impacts are here, so that needs to be studied. Note that this change will affect type stability of
get_global_vnt(but it is indeed possible that there are enough function barriers in the Gibbs codebase to make this not a meaningful problem). Once the crash is fixed, we could then fix the original reproducibility issue.I don't think any of this is breaking, and it's a code path that is only hit in a vanishing number of use cases, so I'm ok with leaving this until after v0.44 is released.