feat: implement HDF5 support for saving inference data and configurations #836
Code Review
This pull request transitions the inference data storage from plain text and JSON files to a unified HDF5 format (inference_data.hdf5), introducing HDF5 read/write utilities in utils.py and updating flowMC_based.py, numpyro_based.py, and guru.py accordingly. The review feedback highlights critical performance issues where HDF5 files are repeatedly opened and closed inside loops during chain and inference data saving, which could cause I/O bottlenecks. Additionally, the reviewer notes a potential crash when writing JAX arrays as attributes to HDF5 and suggests converting them to standard NumPy arrays, as well as a more robust way to check for empty data buffers.
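The two main points of this review (open the HDF5 file once rather than inside the loop, and convert array-likes to NumPy before storing them as attributes) can be sketched as below. This is only an illustration under stated assumptions: `save_chains`, the `chains` group name, and the `acceptance_rate` attribute are hypothetical and are not taken from the PR's `utils.py`.

```python
import os
import tempfile

import h5py
import numpy as np


def save_chains(path, chains, attrs):
    """Hypothetical helper: write all chains through one open file handle."""
    # Open the file once, outside the loop, instead of once per chain.
    with h5py.File(path, "w") as f:
        group = f.create_group("chains")
        for i, chain in enumerate(chains):
            group.create_dataset(f"chain_{i}", data=np.asarray(chain))
        for key, value in attrs.items():
            # np.asarray converts array-likes (a JAX array would also pass
            # through here) into a plain NumPy array before it is stored.
            group.attrs[key] = np.asarray(value)


def demo():
    rng = np.random.default_rng(0)
    chains = [rng.normal(size=(5, 2)) for _ in range(3)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "inference_data.hdf5")
        save_chains(path, chains, {"acceptance_rate": [0.3, 0.4, 0.5]})
        # Read everything back while the temporary directory still exists.
        with h5py.File(path, "r") as f:
            names = sorted(f["chains"].keys())
            shape = f["chains/chain_0"].shape
            rate = f["chains"].attrs["acceptance_rate"].tolist()
    return names, shape, rate
```

Keeping the handle open for the whole loop means one open/close pair regardless of the number of chains, which is the I/O saving the review asks for.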
…ok template
- Introduced `generate_report.py` to handle report generation from input data files.
- Integrated Papermill for executing Jupyter Notebook templates and generating HTML reports.
- Added a new Jupyter Notebook template `template_report.ipynb` for report formatting.
- Updated `pyproject.toml` to include new dependencies: `nbconvert`, `papermill`, and `plotly`.
- Registered new command line entry point for report generation in `pyproject.toml`.
- Included the notebook template in package data for distribution.
/gemini review
Code Review
This pull request migrates the inference output storage from plain text files to a unified HDF5 format and introduces a new HTML report generation feature using `papermill` and `nbconvert`. The review feedback highlights critical performance issues: the HDF5 file is repeatedly opened and closed during MCMC sampling, while saving chains, and while writing acceptance rates, which adds excessive I/O overhead. The feedback also identifies a potential `PermissionError` when writing temporary notebooks to the package installation directory, a runtime `TypeError` when saving JAX arrays as HDF5 attributes, and a missing `corner` dependency in `pyproject.toml` that the new reporting template requires.
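The `PermissionError` concern can be illustrated with a small sketch that places the executed notebook in a temporary working directory instead of next to the installed template. The function name `executed_notebook_path` and the install path are hypothetical, and the `write_text` call merely stands in for the notebook execution step; none of this is taken from `generate_report.py`.

```python
import tempfile
from pathlib import Path


def executed_notebook_path(template: Path, workdir: Path) -> Path:
    """Hypothetical helper: place the executed notebook in `workdir`,
    not beside the (possibly read-only) installed template."""
    return workdir / f"{template.stem}_executed{template.suffix}"


def demo():
    # Hypothetical install location; it is never written to.
    template = Path("/opt/site-packages/pkg/template_report.ipynb")
    with tempfile.TemporaryDirectory() as tmp:
        out = executed_notebook_path(template, Path(tmp))
        out.write_text("{}")  # stand-in for the executed notebook
        return out.name, out.parent == Path(tmp), out.parent != template.parent
```

Because the output lives in a directory the current user owns, the report can be generated even when the package is installed system-wide.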
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
… file tools, and API consistency improvements (#840)
* feat: implement HDF5 support for saving inference data and configurations (#836)
* feat: implement HDF5 support for saving inference data and configurations
* fix: convert HDF5 dataset attributes to a dictionary
* Add report generation functionality with Papermill and Jupyter Notebook template
- Introduced `generate_report.py` to handle report generation from input data files.
- Integrated Papermill for executing Jupyter Notebook templates and generating HTML reports.
- Added a new Jupyter Notebook template `template_report.ipynb` for report formatting.
- Updated `pyproject.toml` to include new dependencies: `nbconvert`, `papermill`, and `plotly`.
- Registered new command line entry point for report generation in `pyproject.toml`.
- Included the notebook template in package data for distribution.
* Update src/gwkokab/analysis/report/generate_report.py
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* style: format output notebook path assignment
* feat: add corner library dependency for enhanced plotting capabilities
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* fix: ensure compatibility with JAX Array in HDF5 write function
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* feat: using file descriptor to reduce IO overhead
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* refactor: rename parameters for consistency in `MultiSourceModelCore` and `SubPopulationModelCore` (#837)
* refactor: rename parameters for consistency in `MultiSourceModelCore` and `SubPopulationModelCore`
* Update src/gwkokab/analysis/multisource/common.py
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* fix: reorder parameters in `MultiSourceModelCore`
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* fix: correct quotation marks in HTML script tag for virtual-webgl.js
* feat: utils to repack and replace `.h5` and `.hdf5` files (#838)
* feat: utils to repack and replace `.h5` and `.hdf5` files
* fix: remove bugs from option parsing and update epilog
* feat: utilities to calculate marginal probabilities of mixture models (#833)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Update src/gwkokab/analysis/core/utils.py
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Update src/gwkokab/analysis/utils/marginals.py
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Update src/gwkokab/analysis/report/generate_report.py
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* fix: reorder imports
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
No description provided.