refactor(generator): migrate univariate specs to pysatl-core distributions #5

Open
Desiment wants to merge 3 commits into main from des/pysatl-core-integration

Desiment commented May 20, 2026

Contributor

Migrate pysatl_cpd/data/generator univariate sampling/specs from SciPy-style classes (NormalSpec, UniformSpec, etc.) to pysatl-core with UnivariateDistributionSpec, YAML support for optional parameterization, and UNURAN sampling; then update docs/notebooks and satisfy hooks/coverage.

Description of incoming changes:

  • Replace only: remove old public univariate spec classes; no compatibility aliases.
  • Use best-effort reproducibility; exact seeded reproducibility is not currently possible with core/UNURAN sampling, so the corresponding golden test is marked as expected to fail.
  • Drop Student-t distribution entirely.
  • Keep YAML support, including old kind: normal|uniform|exponential shapes translated into the new unified spec.
  • Use pysatl-core family names/parameter names:
    • UnivariateDistributionSpec("Normal", "meanStd", mu=..., sigma=...)
    • UnivariateDistributionSpec("ContinuousUniform", "standard", lower_bound=..., upper_bound=...)
    • UnivariateDistributionSpec("Exponential", "scale", beta=...)
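
The call shape listed above (family name, parametrization name, then keyword parameters) can be illustrated with a small stand-in class. This is only a sketch: the real `UnivariateDistributionSpec` lives in pysatl_cpd/data/generator/specs.py and is not shown in this PR description, so the attribute names `family`, `parametrization_name`, and `parameters`, and the concrete parameter values, are assumptions made for illustration.

```python
from typing import Dict, Optional


class UnivariateDistributionSpec:
    """Illustrative stand-in only; the real class is defined in
    pysatl_cpd/data/generator/specs.py and may differ."""

    def __init__(
        self,
        family: str,
        parametrization_name: Optional[str] = None,
        **parameters: float,
    ) -> None:
        if not family:
            raise ValueError("family must be a non-empty string")
        self.family = family
        # Assumption: None means "use the family's default parametrization".
        self.parametrization_name = parametrization_name
        self.parameters: Dict[str, float] = dict(parameters)

    def __repr__(self) -> str:
        parts = [repr(self.family), repr(self.parametrization_name)]
        parts += [f"{key}={value!r}" for key, value in self.parameters.items()]
        return f"UnivariateDistributionSpec({', '.join(parts)})"


# The three families named in this PR, with example parameter values.
normal = UnivariateDistributionSpec("Normal", "meanStd", mu=0.0, sigma=1.0)
uniform = UnivariateDistributionSpec(
    "ContinuousUniform", "standard", lower_bound=0.0, upper_bound=1.0
)
exponential = UnivariateDistributionSpec("Exponential", "scale", beta=2.0)
```

Keeping the family and parametrization as plain strings is what lets a single spec type replace the per-distribution classes, at the cost of validating names at construction or sampling time rather than through the type system.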

Detailed description

  • Replaced univariate spec model in pysatl_cpd/data/generator/specs.py with UnivariateDistributionSpec.
  • Updated YAML/config parsing in pysatl_cpd/data/generator/config.py:
    • old shapes kind: normal|uniform|exponential now map to UnivariateDistributionSpec
    • new generic kind: univariate with optional parametrization_name supported
    • student_t removed/rejected
  • Switched univariate sampling in pysatl_cpd/data/generator/segments/sampling.py to pysatl-core + DefaultUnuranSamplingStrategy.
  • Updated generator presets in pysatl_cpd/data/generator/presets.py to use unified specs.
  • Updated exports in pysatl_cpd/data/generator/__init__.py to remove old univariate classes.
  • Rewrote tests to use UnivariateDistributionSpec:
    • tests/unit/data/generator/test_specs.py
    • tests/unit/data/generator/test_config.py
    • tests/unit/data/generator/segments/test_sampling.py
    • tests/unit/data/generator/test_series.py
    • tests/unit/data/generator/test_dataset_generator.py
    • tests/unit/data/dataset/test_dataset_split.py
    • tests/regression/test_seeded_generators.py
  • Marked seeded golden regression as xfail because pysatl-core UNURAN sampling lacks exposed seed control.
  • Rewrote remaining Python doc examples from NormalSpec(...) style to UnivariateDistributionSpec(...) in multiple __init__.py files across the codebase.
  • Updated notebooks:
    • notebooks/user_guide/02-generator-api.ipynb
    • notebooks/user_guide/04-core-api-visualization.ipynb
  • Validation completed successfully:
    • poetry run pytest tests/unit/data/generator tests/unit/data/dataset/test_dataset_split.py tests/regression/test_seeded_generators.py → 97 passed, 1 xfailed
    • poetry run ruff check → passed
    • poetry run mypy → passed
    • poetry run pre-commit run --all-files → passed
    • Notebook JSON validation → passed
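
The legacy-shape translation described for config.py above might look roughly like the following. This is a minimal sketch under stated assumptions, not the code in this PR: the function name `normalize_univariate_config`, the legacy parameter keys (`mean`, `std`, `low`, `high`, `scale`), and the layout of the returned dictionary (`family`, `parameters`) are all hypothetical. Only the `kind` values, the `parametrization_name` key, and the target family, parametrization, and parameter names come from the description.

```python
from typing import Any, Dict, Mapping

# Legacy kind -> (family, parametrization, legacy-key -> new-key renames).
# The legacy keys on the left of each rename map are assumptions.
_LEGACY_KINDS = {
    "normal": ("Normal", "meanStd", {"mean": "mu", "std": "sigma"}),
    "uniform": (
        "ContinuousUniform",
        "standard",
        {"low": "lower_bound", "high": "upper_bound"},
    ),
    "exponential": ("Exponential", "scale", {"scale": "beta"}),
}


def normalize_univariate_config(raw: Mapping[str, Any]) -> Dict[str, Any]:
    """Translate a parsed YAML mapping into a unified 'univariate' mapping."""
    kind = raw.get("kind")
    if kind == "student_t":
        raise ValueError("kind 'student_t' is no longer supported")
    if kind == "univariate":
        # New generic shape: pass through, parametrization_name stays optional.
        unified = dict(raw)
        unified.setdefault("parametrization_name", None)
        return unified
    if kind not in _LEGACY_KINDS:
        raise ValueError(f"unknown univariate kind: {kind!r}")

    family, parametrization, renames = _LEGACY_KINDS[kind]
    parameters: Dict[str, Any] = {}
    for key, value in raw.items():
        if key == "kind":
            continue
        if key not in renames:
            raise ValueError(f"unexpected key {key!r} for kind {kind!r}")
        parameters[renames[key]] = value
    return {
        "kind": "univariate",
        "family": family,
        "parametrization_name": parametrization,
        "parameters": parameters,
    }
```

For example, under these assumptions `normalize_univariate_config({"kind": "normal", "mean": 0.0, "std": 1.0})` yields a `univariate` mapping with family `Normal`, parametrization `meanStd`, and parameters `mu` and `sigma`, while a `student_t` entry is rejected with a `ValueError`.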

Key Decisions

  • Use only UnivariateDistributionSpec as the public univariate spec API; remove NormalSpec, UniformSpec, ExponentialSpec, StudentTSpec.
  • Keep multivariate normal sampling on NumPy; only univariate/independent-column univariate sampling moved to pysatl-core UNURAN.
  • Preserve old YAML univariate shapes by translating them to core-backed unified specs.
  • Drop Student-t entirely per user request.
  • Accept best-effort seeded behavior for core/UNURAN; exact seeded golden behavior is not preserved.
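
Since exact seeded output is not preserved, one way to keep some regression signal is to check statistical properties of the samples instead of golden values. The sketch below is not something this PR adds; it only illustrates the idea with the standard library. `sample_unseeded_normal` is a hypothetical stand-in for a sampler without seed control (it does not call pysatl-core), and the tolerances are arbitrary choices.

```python
import math
import random
import statistics
from typing import List, Sequence


def sample_unseeded_normal(n: int, mu: float, sigma: float) -> List[float]:
    """Hypothetical stand-in for a sampler that exposes no seed control."""
    # SystemRandom draws from the OS entropy source and cannot be seeded.
    rng = random.SystemRandom()
    return [rng.gauss(mu, sigma) for _ in range(n)]


def moments_match(
    samples: Sequence[float],
    mu: float,
    sigma: float,
    mean_z: float = 6.0,
    std_rel_tol: float = 0.2,
) -> bool:
    """Return True if the sample mean and standard deviation are plausible."""
    n = len(samples)
    # The sample mean has standard error sigma / sqrt(n); allow mean_z of them.
    mean_tol = mean_z * sigma / math.sqrt(n)
    mean_ok = abs(statistics.fmean(samples) - mu) <= mean_tol
    # Loose relative tolerance on the sample standard deviation.
    std_ok = abs(statistics.stdev(samples) - sigma) <= std_rel_tol * sigma
    return mean_ok and std_ok


samples = sample_unseeded_normal(5000, mu=10.0, sigma=2.0)
print(moments_match(samples, mu=10.0, sigma=2.0))
```

A check like this tolerates run-to-run variation but still catches gross errors such as swapped or mis-scaled parameters, which an xfail-marked golden test cannot report.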

Replace legacy univariate generator specs with a unified core-backed
distribution spec, route univariate sampling through pysatl-core UNURAN,
and update docs, notebooks, and tests to match the new API.
Desiment requested review from LeonidElkin and iraedeus May 20, 2026 13:50
Comment thread pysatl_cpd/data/generator/segments/sampling.py Outdated
Comment thread pysatl_cpd/data/generator/segments/sampling.py
Comment thread tests/unit/data/generator/segments/test_sampling.py
Desiment marked this pull request as draft May 20, 2026 14:21
Desiment added 2 commits May 20, 2026 17:25
* The current UNURAN sampling strategy is unstable when sampling data with
large means and deviations. Temporarily set the default strategy.
* Ensure correct mypy behavior by passing strategies explicitly
Desiment force-pushed the des/pysatl-core-integration branch from 4fbc962 to 46cf41d May 20, 2026 15:48
Desiment marked this pull request as ready for review May 20, 2026 15:56
Desiment requested a review from LeonidElkin May 20, 2026 15:56

3 participants