
feat: implement HDF5 support for saving inference data and configurations #836

Merged: Qazalbash merged 8 commits into dev from shift-to-hdf5 on May 31, 2026


Conversation

@Qazalbash

Member

No description provided.

Qazalbash self-assigned this on May 29, 2026
Qazalbash added the enhancement (New feature or request) label on May 29, 2026

@gemini-code-assist (bot) left a comment

Contributor


Code Review

This pull request transitions inference data storage from plain text and JSON files to a single HDF5 file (`inference_data.hdf5`). It introduces HDF5 read/write utilities in `utils.py` and updates `flowMC_based.py`, `numpyro_based.py`, and `guru.py` accordingly. The review feedback highlights critical performance issues: HDF5 files are repeatedly opened and closed inside loops while saving chains and inference data, which could cause I/O bottlenecks. The reviewer also notes a potential crash when writing JAX arrays as HDF5 attributes and suggests converting them to standard NumPy arrays first, and it recommends a more robust check for empty data buffers.
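The main point of this review is to avoid reopening the HDF5 file on every loop iteration. Below is a minimal sketch of the open-once pattern with h5py. The function name `save_inference_data` and the group layout (`chain_0`, ...) are hypothetical and are not taken from `flowMC_based.py` or `numpyro_based.py`. The sketch also shows one way to test a buffer for emptiness that works for both lists and NumPy arrays.

```python
import h5py
import numpy as np


def save_inference_data(path, chains, acceptance_rates):
    """Write all chains and acceptance rates through a single open file handle.

    Hypothetical layout: ``chains`` maps a group name (e.g. "chain_0") to a
    dict of parameter name -> 1-D array of samples.
    """
    with h5py.File(path, "w") as f:  # the file is opened exactly once
        for chain_name, samples in chains.items():
            group = f.create_group(chain_name)
            for param, values in samples.items():
                group.create_dataset(param, data=np.asarray(values))
        # np.size works for lists and arrays alike; `if not buffer` raises
        # ValueError for NumPy arrays with more than one element.
        if np.size(acceptance_rates) > 0:
            f.create_dataset("acceptance_rates", data=np.asarray(acceptance_rates))
```

Every dataset is created through the same `h5py.File` handle, so each save opens and closes the file once, however many chains or parameters there are.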

Comment thread src/gwkokab/analysis/core/flowMC_based.py Outdated
Comment thread src/gwkokab/analysis/core/numpyro_based.py
Comment thread src/gwkokab/analysis/core/utils.py
Comment thread src/gwkokab/analysis/core/flowMC_based.py Outdated
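Both reviews flag writing JAX arrays as HDF5 attributes, and a later commit's title mentions converting HDF5 dataset attributes to a dictionary. The following is one defensive approach, assuming only that JAX arrays implement NumPy's `__array__` protocol. `write_attrs` and `read_attrs` are hypothetical helpers and are not the PR's functions.

```python
import h5py
import numpy as np


def write_attrs(obj, attrs):
    """Store metadata on an HDF5 group or dataset, converting array-likes first.

    Anything exposing ``__array__`` (JAX arrays do) becomes a plain NumPy array
    before h5py sees it. Python ints, floats and strings pass through unchanged.
    """
    for key, value in attrs.items():
        if hasattr(value, "__array__"):
            value = np.asarray(value)
        obj.attrs[key] = value


def read_attrs(obj):
    """Copy attributes out of h5py's live AttributeManager into a plain dict."""
    return dict(obj.attrs)
```

Because `read_attrs` returns an ordinary `dict`, its values remain usable after the file is closed.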
Qazalbash added 2 commits May 30, 2026 20:57
…ok template

- Introduced `generate_report.py` to handle report generation from input data files.
- Integrated Papermill for executing Jupyter Notebook templates and generating HTML reports.
- Added a new Jupyter Notebook template `template_report.ipynb` for report formatting.
- Updated `pyproject.toml` to include new dependencies: `nbconvert`, `papermill`, and `plotly`.
- Registered new command line entry point for report generation in `pyproject.toml`.
- Included the notebook template in package data for distribution.
@Qazalbash Qazalbash changed the base branch from main to dev May 30, 2026 16:21
@Qazalbash

Member Author

/gemini review

@gemini-code-assist (bot) left a comment

Contributor


Code Review

This pull request migrates inference output storage from plain text files to a single HDF5 file and adds HTML report generation using papermill and nbconvert. The review feedback highlights critical performance issues caused by excessive I/O overhead: the HDF5 file is repeatedly opened and closed during MCMC sampling, when saving chains, and when writing acceptance rates. The review also identifies three further problems: a potential `PermissionError` when temporary notebooks are written to the package installation directory, a runtime `TypeError` when JAX arrays are saved as HDF5 attributes, and a missing `corner` dependency in `pyproject.toml` that the new reporting template requires.
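The review warns that writing the intermediate notebook into the package installation directory can raise `PermissionError`. Below is a minimal sketch of a common workaround. It assumes the report is rendered with `papermill.execute_notebook` and nbconvert's `HTMLExporter`, keeps the executed notebook in a temporary directory, and writes only the final HTML to a path the user chooses. The helper names and the `data_path` notebook parameter are hypothetical.

```python
import tempfile
from pathlib import Path


def executed_notebook_path(html_out: Path, scratch_dir: Path) -> Path:
    """Place the intermediate executed notebook in a writable scratch directory.

    Deriving this path from the template's location would put it inside the
    installed package, which may be read-only.
    """
    return Path(scratch_dir) / f"{Path(html_out).stem}.ipynb"


def render_report(template: Path, data_path: Path, html_out: Path) -> None:
    # Lazy imports: papermill and nbconvert are only needed when rendering.
    import papermill as pm
    from nbconvert import HTMLExporter

    with tempfile.TemporaryDirectory() as tmp:
        executed = executed_notebook_path(html_out, Path(tmp))
        # "data_path" is a hypothetical parameter. The template would need a
        # cell tagged "parameters" that defines it.
        pm.execute_notebook(
            str(template), str(executed), parameters={"data_path": str(data_path)}
        )
        body, _resources = HTMLExporter().from_filename(str(executed))
    Path(html_out).write_text(body, encoding="utf-8")
```

The template is only read from the package, and every file written lands in either the temporary directory or `html_out`.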

Comment thread src/gwkokab/analysis/core/flowMC_based.py
Comment thread src/gwkokab/analysis/report/generate_report.py Outdated
Comment thread src/gwkokab/analysis/core/flowMC_based.py
Comment thread src/gwkokab/analysis/core/numpyro_based.py
Comment thread src/gwkokab/analysis/core/utils.py
Comment thread pyproject.toml
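The same review flags reopening the file during MCMC sampling. A common alternative is to keep a single handle open and append to a chunked, resizable dataset. The sketch below assumes samples arrive in fixed-width batches. `run_sampler` and `append_samples` are hypothetical names, and the `batches` iterable stands in for a real sampler.

```python
import h5py
import numpy as np


def append_samples(dset, batch):
    """Append an (n_new, n_dim) batch along axis 0 of a resizable dataset."""
    batch = np.asarray(batch, dtype=dset.dtype)
    if batch.shape[0] == 0:  # nothing to write for this step
        return
    start = dset.shape[0]
    dset.resize(start + batch.shape[0], axis=0)
    dset[start:] = batch


def run_sampler(path, batches, n_dim):
    """Keep one file handle open for the whole run instead of reopening per step."""
    with h5py.File(path, "w") as f:
        dset = f.create_dataset(
            "samples",
            shape=(0, n_dim),
            maxshape=(None, n_dim),  # first axis can grow without limit
            dtype="f8",
            chunks=True,  # chunked storage is required for resizing
        )
        for batch in batches:  # stand-in for the sampler's per-step output
            append_samples(dset, batch)
```

Setting `maxshape=(None, n_dim)` leaves the sample axis unlimited, so each step costs one `resize` and one slice write rather than a full open/close cycle.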
Qazalbash and others added 5 commits May 30, 2026 22:24
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Qazalbash merged commit 3739822 into dev on May 31, 2026
5 checks passed
Qazalbash deleted the shift-to-hdf5 branch on May 31, 2026 00:43
Qazalbash added a commit that referenced this pull request Jun 2, 2026
… file tools, and API consistency improvements (#840)

* feat: implement HDF5 support for saving inference data and configurations (#836)

* feat: implement HDF5 support for saving inference data and configurations

* fix: convert HDF5 dataset attributes to a dictionary

* Add report generation functionality with Papermill and Jupyter Notebook template

- Introduced `generate_report.py` to handle report generation from input data files.
- Integrated Papermill for executing Jupyter Notebook templates and generating HTML reports.
- Added a new Jupyter Notebook template `template_report.ipynb` for report formatting.
- Updated `pyproject.toml` to include new dependencies: `nbconvert`, `papermill`, and `plotly`.
- Registered new command line entry point for report generation in `pyproject.toml`.
- Included the notebook template in package data for distribution.

* Update src/gwkokab/analysis/report/generate_report.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* style: format output notebook path assignment

* feat: add corner library dependency for enhanced plotting capabilities

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* fix: ensure compatibility with JAX Array in HDF5 write function

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* feat: using file descriptor to reduce IO overhead

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* refactor: rename parameters for consistency in `MultiSourceModelCore` and `SubPopulationModelCore` (#837)

* refactor: rename parameters for consistency in `MultiSourceModelCore` and `SubPopulationModelCore`

* Update src/gwkokab/analysis/multisource/common.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* fix: reorder parameters in `MultiSourceModelCore`

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* fix: correct quotation marks in HTML script tag for virtual-webgl.js

* feat: utils to repack and replace `.h5` and `.hdf5` files (#838)

* feat: utils to repack and replace `.h5` and `.hdf5` files

* fix: remove bugs from option parsing and update epilog

* feat: utilities to calculate marginal probabilities of mixture models (#833)

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update src/gwkokab/analysis/core/utils.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update src/gwkokab/analysis/utils/marginals.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update src/gwkokab/analysis/report/generate_report.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* fix: reorder imports

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
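#838 adds utilities to repack and replace `.h5` and `.hdf5` files, but its options are not shown here. Below is a minimal sketch of the idea with h5py, not the PR's implementation. It copies every top-level object and the root attributes into a fresh file in the same directory, then swaps that file in with `os.replace`. `repack_hdf5` is a hypothetical name, and top-level soft links end up as copies of their targets. From the shell, the HDF Group's `h5repack` tool does the same job.

```python
import os
import tempfile

import h5py


def repack_hdf5(path):
    """Rewrite an HDF5 file into a fresh file and replace the original with it.

    Deleting objects does not shrink an HDF5 file; copying only the live
    objects into a new file reclaims that space.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp.h5", dir=directory)
    os.close(fd)  # h5py reopens the file by name
    try:
        with h5py.File(path, "r") as src, h5py.File(tmp_path, "w") as dst:
            # Root attributes are not copied by Group.copy, so copy them here.
            for name, value in src.attrs.items():
                dst.attrs[name] = value
            for key in src:
                src.copy(src[key], dst, name=key)  # recursive, keeps attributes
        os.replace(tmp_path, path)  # same directory, so a single rename
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Writing to a temporary file first means a failure partway through the copy leaves the original file untouched.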

Labels

enhancement (New feature or request)


1 participant