Parallel uniform sampling + New sampler interface #471
Conversation
nestedsamplers classes can probably be removed at some point
Pull Request Test Coverage Report for Build 7952108689

💛 - Coveralls
This is an excellent proposed change! Very interested to see how the other internal refactoring might also work.
and unify it all in one class. The only thing I had to disable is the 'custom sampler' test in test_misc.

Thanks @joshspeagle
A demonstration of this patch: uniform sampling of the Eggbox with a single bound:

```
9490it [03:16, 48.25it/s, +1000 | bound: 999 | nc: 1 | ncall: 12943702 | eff(%): 0.081 | loglstar: -inf < 243.000 < inf | logz: 235.863 +/- 0.078 | dlogz: 0.000 > 0.100]
```

And if I bump the number of threads to 36, it takes 50 seconds. I.e. it basically scales properly, while previously the uniform sampler didn't really scale with multiple threads.

```python
import numpy as np
import dynesty
import multiprocessing as mp

nlive = 1000


def loglike_egg(x):
    logl = ((2 + np.cos(x[0] / 2) * np.cos(x[1] / 2))**5)
    return logl


def prior_transform_egg(x):
    return x * 10 * np.pi


LOGZ_TRUTH = 235.856


def test_bounds():
    # stress test various boundaries
    bound, sample = 'single', 'unif'
    ndim = 2
    rstate = np.random.default_rng(444)
    with mp.Pool(4) as pool:
        sampler = dynesty.NestedSampler(loglike_egg,
                                        prior_transform_egg,
                                        ndim,
                                        nlive=nlive,
                                        bound=bound,
                                        sample=sample,
                                        rstate=rstate,
                                        pool=pool,
                                        queue_size=4)
        sampler.run_nested(dlogz=0.1, print_progress=True)
    assert (abs(LOGZ_TRUTH - sampler.results.logz[-1])
            < 5. * sampler.results.logzerr[-1])


if __name__ == '__main__':
    test_bounds()
```
rearrange sampler arguments to make them more logical
It would be great to have some review of this patch, @joshspeagle, if you have time, before it gets bigger. My current thinking is that the patch is mostly positive, with a few negatives listed below:
Among things I'd like to do, but don't want to do in this patch: I think with this patch it's becoming easier to do #391. We can already now pack update_slice or update_rwalk into SliceSampler/RWalkSampler objects and make sure that things like history … (see py/dynesty/nestedsamplers.py, line 145 in 3f10025).
(some tests may fail)
Also, there was a bug fix in the boundary isinstance check, and a change in how blob is propagated to samplers.
Pull Request Overview
This PR introduces a new sampler interface for dynesty with a focus on parallel uniform sampling capabilities. The changes significantly refactor the sampling architecture by:
- Replacing function-based samplers with class-based InternalSampler interface
- Moving from nestedsamplers.py to sampling.py for sampler implementations
- Updating the bound interface and attribute naming (bound_list vs bound)
Reviewed Changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| py/dynesty/sampling.py | Complete rewrite introducing InternalSampler class hierarchy and new sampler interface |
| py/dynesty/sampler.py | Major refactor integrating new sampler interface and bound handling |
| py/dynesty/dynesty.py | Updated to use new sampler classes and removed deprecated functionality |
| py/dynesty/nestedsamplers.py | File completely removed as functionality moved to sampling.py |
| py/dynesty/utils.py | Removed deprecated old_stopping_function and improved error handling |
| tests/* | Multiple test files updated to use new sampler interface |
Comments suppressed due to low confidence (2)
tests/test_sampling.py:84

- The 'hslice' sampler was removed from the new interface but the test removal is incomplete. Ensure all references to deprecated samplers are properly cleaned up.

```python
'rslice': ds.RSliceSampler().sample,
```
py/dynesty/utils.py:2168

- The attribute name was changed from 'M' to 'mapper', but this should be documented as it's a breaking change in the API.

```python
cursamp.mapper = mapper
```
```python
class SliceSampler(InternalSampler):

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Initialize slice parameters.
        slices = kwargs.get('slices', 5)
        self.slice_history = {'ncontract': 0, 'nexpand': 0}
```
The slice_history dictionary is duplicated between SliceSampler and RSliceSampler classes. Consider extracting this to a common base class or mixin to reduce code duplication.
Suggested change:

```diff
-class SliceSampler(InternalSampler):
+class SliceSampler(SliceHistoryMixin, InternalSampler):
     def __init__(self, **kwargs):
         super().__init__(**kwargs)
         # Initialize slice parameters.
         slices = kwargs.get('slices', 5)
-        self.slice_history = {'ncontract': 0, 'nexpand': 0}
```
```python
class RSliceSampler(InternalSampler):

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Initialize slice parameters.
        slices = kwargs.get('slices', 5)
        self.slice_history = {'ncontract': 0, 'nexpand': 0}
```
The slice_history dictionary is duplicated between SliceSampler and RSliceSampler classes. Consider extracting this to a common base class or mixin to reduce code duplication.
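As a sketch of this suggestion: `SliceHistoryMixin` is the reviewer's proposed name, and the `InternalSampler` stand-in below is illustrative (dynesty's real base class differs), but the shared dictionary could live in a small mixin that both slice samplers inherit:

```python
class InternalSampler:
    """Stand-in for dynesty's base class; illustrative only."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


class SliceHistoryMixin:
    """Holds the expand/contract bookkeeping shared by slice samplers."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.slice_history = {'ncontract': 0, 'nexpand': 0}


class SliceSampler(SliceHistoryMixin, InternalSampler):

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Initialize slice parameters.
        self.slices = kwargs.get('slices', 5)


class RSliceSampler(SliceHistoryMixin, InternalSampler):
    # Inherits slice_history from the mixin with no duplication.
    pass
```

Cooperative `super().__init__` calls let the mixin slot into the MRO ahead of the base class, so each instance still gets its own independent `slice_history` dictionary.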
```python
# the minimum number points we want with finite logl
# we want want at least ndim+1, because we want
# to be able to constraint the ellipsoid
# Note that if nlive < ndim + 1 this doesn't really make sense
# but we should have warned the user earlier, so they are on their own
# And the reason we have max(ndim+1, X) is that we'd like to get at
# least X points as otherwise the poisson estimate of the volume will
# be too large.
# The reason why X is min(nlive-20, 100) is that we want at least 100
# to have reasonable volume accuracy of ~ 10%
# and the reason for nlive-20 is because if nlive is 100, we don't want
# all points with finite logl, because this leads to issues with
# integrals and batch sampling in plateau edge tests
# The formula probably should be simplified
```
The complex logic for determining min_npoints needs clearer documentation. The current comment is confusing and could benefit from breaking down the formula into separate documented steps.
Suggested change:

```python
# Determine the minimum number of points (`min_npoints`) with finite logl:
# 1. We need at least `ndim + 1` points to constrain the ellipsoid.
#    - If `nlive < ndim + 1`, this is problematic, but the user should
#      have been warned earlier.
# 2. To ensure reasonable Poisson volume estimates, we aim for at least
#    `X` points:
#    - `X` is defined as `min(nlive - 20, 100)`:
#      a. `100` ensures ~10% volume accuracy.
#      b. `nlive - 20` avoids using all `nlive` points with finite logl,
#         which can cause issues with integrals and batch sampling in
#         plateau edge tests.
# 3. The final formula combines these constraints:
#    - `max(ndim + 1, X)` ensures we meet both the ellipsoid constraint
#      and the Poisson volume accuracy requirement.
#    - `min(nlive, ...)` ensures we do not exceed the total number of
#      live points.
```
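Read as an expression, the formula the comment documents would be roughly the following (the function and variable names here are assumptions for illustration; the actual source may arrange it differently):

```python
def min_npoints(nlive, ndim):
    # At least ndim + 1 points are needed to constrain the ellipsoid,
    # and at least X = min(nlive - 20, 100) points for ~10% Poisson
    # volume accuracy; the result is capped at nlive.
    X = min(nlive - 20, 100)
    return min(nlive, max(ndim + 1, X))
```

For example, with nlive=1000 and ndim=2 this gives 100, while with nlive=100 it gives 80, deliberately leaving some live points with non-finite logl.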
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
I think it is time to merge this and start releasing the new versions. I am confident this is ready for use and the tests are adequate.
This is a preliminary version of patch that enables parallel uniform sampling.
The idea is to get away from the propose/evolve split for uniform distributions, where evolve() doesn't really do anything other than evaluate logl on a single point; instead, make propose a no-op for uniform sampling and have evolve() actually do the sampling inside the boundary until a satisfactory point is found.
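The evolve loop described above can be sketched as follows (all names here are illustrative, not dynesty's actual API): evolve keeps drawing points uniformly from the bound until one exceeds the current likelihood threshold, so each parallel worker does useful work per call instead of a single logl evaluation:

```python
import numpy as np


def evolve_uniform(propose_point, loglike, loglstar, rng):
    """Illustrative evolve() for uniform sampling: keep drawing points
    from the bound until one exceeds the threshold loglstar."""
    ncall = 0
    while True:
        u = propose_point(rng)   # draw uniformly within the bound
        logl = loglike(u)
        ncall += 1
        if logl > loglstar:
            return u, logl, ncall


# Toy example: unit-cube "bound" and a Gaussian-shaped log-likelihood.
rng = np.random.default_rng(1)
propose = lambda r: r.uniform(size=2)
loglike = lambda u: -np.sum((u - 0.5) ** 2)
u, logl, nc = evolve_uniform(propose, loglike, -0.05, rng)
```

Because the whole rejection loop now lives inside evolve(), each pool worker can run it independently, which is what lets the uniform sampler scale with the number of threads.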
While implementing this I realized that after some changes, all the different samplers in nestedsamplers.py may not need to be there, as they do something very similar (but that's TBD).
ATM some tests fail, but I think that's fixable.
Also, some class members had to be renamed. E.g. the .bound attribute was really misleading, as it was actually storing the list of all the historic bounds, so I renamed it to bound_list.
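For context, the move from function-based samplers to a class-based interface described in this PR might look roughly like this (names and signatures below are illustrative assumptions, not dynesty's actual API): tuning state such as the proposal scale lives on the object instead of being threaded through free functions like update_rwalk:

```python
class InternalSampler:
    """Illustrative base class: one object per sampling method,
    carrying its own tuning state between calls."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs

    def sample(self, u, loglstar):
        raise NotImplementedError


class RWalkSampler(InternalSampler):
    """Random-walk sampler keeping its step scale as object state,
    rather than a separate update function mutating shared dicts."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.scale = kwargs.get('scale', 1.0)

    def sample(self, u, loglstar):
        # A real implementation would walk within the bound here and
        # adapt self.scale from the accept fraction; this stub just
        # echoes its inputs to show the interface shape.
        return u, loglstar
```

With this shape, unifying the samplers in one hierarchy (and attaching history, as discussed above) becomes a matter of adding attributes to the subclasses.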