Dev#839
Conversation
…ions (#836) * feat: implement HDF5 support for saving inference data and configurations * fix: convert HDF5 dataset attributes to a dictionary * Add report generation functionality with Papermill and Jupyter Notebook template - Introduced `generate_report.py` to handle report generation from input data files. - Integrated Papermill for executing Jupyter Notebook templates and generating HTML reports. - Added a new Jupyter Notebook template `template_report.ipynb` for report formatting. - Updated `pyproject.toml` to include new dependencies: `nbconvert`, `papermill`, and `plotly`. - Registered new command line entry point for report generation in `pyproject.toml`. - Included the notebook template in package data for distribution. * Update src/gwkokab/analysis/report/generate_report.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * style: format output notebook path assignment * feat: add corner library dependency for enhanced plotting capabilities Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * fix: ensure compatibility with JAX Array in HDF5 write function Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * feat: using file descriptor to reduce IO overhead --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
… and `SubPopulationModelCore` (#837) * refactor: rename parameters for consistency in `MultiSourceModelCore` and `SubPopulationModelCore` * Update src/gwkokab/analysis/multisource/common.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * fix: reorder parameters in `MultiSourceModelCore` --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…#833) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Code Review
This pull request transitions the storage of inference results, configurations, and metadata from plain text and JSON files to a unified HDF5 format (inference_data.hdf5), updating both FlowMC and NumPyro samplers accordingly. It also introduces an automated HTML report generation tool using a Jupyter notebook template, standardizes parameter naming across hybrid models, and adds an HDF5 repacking script. The review feedback highlights several critical issues: a potential crash in generate_marginal_probs when inspecting classes without an explicit __init__, a bug in write_to_hdf5 where JAX Arrays are not converted to NumPy arrays during dataset overwrites, an argument parsing error in h5repack.py that prevents the file argument from being matched, and a lack of robustness in the report notebook when certain datasets are missing from the HDF5 file.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| if data is not None: | ||
| if dataset_path in f: | ||
| del f[dataset_path] | ||
| elif isinstance(data, Array): | ||
| data = np.asarray(data) | ||
| obj = f.create_dataset(dataset_path, data=data) |
There was a problem hiding this comment.
In write_to_hdf5, the elif isinstance(data, Array): block is skipped if dataset_path in f is True. This means JAX Arrays are not converted to NumPy arrays when overwriting an existing dataset, which can cause h5py to fail or behave unexpectedly. Changing elif to if ensures the conversion always runs.
| if data is not None: | |
| if dataset_path in f: | |
| del f[dataset_path] | |
| elif isinstance(data, Array): | |
| data = np.asarray(data) | |
| obj = f.create_dataset(dataset_path, data=data) | |
| if data is not None: | |
| if dataset_path in f: | |
| del f[dataset_path] | |
| if isinstance(data, Array): | |
| data = np.asarray(data) | |
| obj = f.create_dataset(dataset_path, data=data) |
| param_names = list(inspect.signature(model_meta_cls.__init__).parameters.keys()) # type: ignore | ||
| param_names.remove("self") |
There was a problem hiding this comment.
Using inspect.signature(model_meta_cls.__init__) and then param_names.remove("self") will crash if model_meta_cls does not explicitly define __init__ (like EccentricityMattersCore), because object.__init__ has no self parameter in its signature (or raises ValueError). Using inspect.signature(model_meta_cls) is the standard, safe way to get the constructor signature without self and works correctly for all classes.
| param_names = list(inspect.signature(model_meta_cls.__init__).parameters.keys()) # type: ignore | |
| param_names.remove("self") | |
| param_names = list(inspect.signature(model_meta_cls).parameters.keys()) |
| parser.add_argument( | ||
| "options", | ||
| nargs=argparse.REMAINDER, | ||
| help="Additional options to pass to h5repack (e.g., '-f GZIP=9 -f SHUF')", | ||
| ) | ||
| parser.add_argument("file", help="Path to the HDF5 file to repack") | ||
| args, options = parser.parse_known_args() |
There was a problem hiding this comment.
Defining a positional argument "options" with nargs=argparse.REMAINDER before "file" causes "options" to consume all remaining arguments (including the file path), leaving "file" unmatched and causing the script to always fail with a required argument error. Removing the "options" argument from the parser allows parse_known_args() to correctly capture all unrecognized options in the options list while successfully parsing "file".
parser.add_argument("file", help="Path to the HDF5 file to repack")
args, options = parser.parse_known_args()| "outputs": [], | ||
| "source": [ | ||
| "if SAMPLER_NAME == \"flowMC\":\n", | ||
| " global_acc_train = read_from_hdf5(inference_data_file, \"/acceptances/global/train\")\n", |
There was a problem hiding this comment.
The notebook assumes that all training/production datasets (like /acceptances/global/train, loss, /chains/train/...) are always present in the HDF5 file. If any phase was skipped or had no data, these datasets won't exist, causing read_from_hdf5 to raise a ValueError and crash the entire automated report generation. Consider checking if the datasets exist in the file before reading them.
Summary
Related To
Description
Additional Notes (Optional)