Description
(I've divided this into three subsections, and can split into separate issues if preferable)
Length of ground truth runs
As I understand it, the current ground truth estimates are obtained from Stan with 150000 samples and 10 chains.
For certain models, such as `gym.targets.VectorModel(gym.targets.BrownianMotionUnknownScalesMissingMiddleObservations(), flatten_sample_transformations=True,), I have produced my own ground truths via longer runs of Blackjax's NUTS (10 million steps, 4 chains), and found results that differ enough to matter for my use cases (namely, estimating efficiency of different samplers)
from blackjax run: [ 0.11525708 0.09256472 0.05635736 -0.03410918 -0.05100336 -0.18196875
-0.18945307 -0.25923407 -0.25987643 -0.32402724 -0.22958763 -0.28165078
-0.3362609 -0.38868254 -0.44175696 -0.4945148 -0.5447676 -0.6013282
-0.6559048 -0.7087315 -0.75866866 -0.8134075 -0.8074223 -0.7784713
-0.82167107 -0.7737639 -0.743899 -0.7613981 -0.6401507 -0.6669518
-0.64461184 0.11305185]
from gym: [ 0.11984811 0.10274264 0.06093274 -0.03870019 -0.04362268 -0.19021639
-0.1856622 -0.26851514 -0.26010785 -0.3334386 -0.21788554 -0.2735482
-0.33083084 -0.38252977 -0.43280044 -0.49400684 -0.54860604 -0.60449123
-0.65569454 -0.7083658 -0.76391494 -0.8189823 -0.8105346 -0.7771473
-0.8268097 -0.7768991 -0.7374106 -0.7740582 -0.6294383 -0.670295
-0.6432216 0.10105278]
(See e.g. the hierarchical params, in particular, the second and final elements of the array).
If my results are actually more accurate (of course it's possible there's a mistake on my end), then would it be possible to switch to the results of a longer run (either of Stan or Blackjax, but see the final section below) in inference-gym?
Adding ground truth expectations of second moment
I would also like to add ground truth estimates of the second moment, i.e.
Blackjax vs Stan
Currently, Stan is used by inference-gym to produce samples for ground truth estimates, run via CmdStanPy. How open would inference-gym be to switching to Blackjax's NUTS implementation instead, to obtain a fully Python setup? (Or even the TFP NUTS implementation)