WIP: Add kwargs to InferenceData.to_netcdf()#2410
cowirihy wants to merge 3 commits into arviz-devs:main from
Thanks @cowirihy, I'll try my best to review this week.
OriolAbril left a comment
Thanks for the PR! Sorry about the delayed review. I hope it is helpful.
Other keyword arguments will be passed to `xarray.Dataset.to_netcdf()`. If
provided, these will serve to override dict items that relate to the `compress` and
`engine` parameters described above.
Following numpydoc, this text should be the description of a `**kwargs` parameter, i.e. indented under a `**kwargs` entry with no type: https://numpydoc.readthedocs.io/en/latest/format.html#parameters (see the last paragraph of that section).
Also, if you use :meth:`xarray.Dataset.to_netcdf` or even just `xarray.Dataset.to_netcdf` (given our Sphinx configuration), it will be rendered as a link to the respective page of the xarray docs. You can check the rendered docstring preview for your PR at https://arviz--2410.org.readthedocs.build/en/2410/api/generated/arviz.InferenceData.to_netcdf.html
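For illustration, a minimal sketch of how a `**kwargs` entry sits in a numpydoc `Parameters` section. `to_netcdf_stub` is a hypothetical stand-in, not the real `InferenceData.to_netcdf`: its parameter list is trimmed to the ones discussed here and its defaults are illustrative only.

```python
def to_netcdf_stub(filename, compress=True, engine="h5netcdf", **kwargs):
    """Write the groups to a netCDF file (hypothetical stub).

    Parameters
    ----------
    filename : str
        Location to write to.
    compress : bool, optional
        Whether to compress the result.
    engine : str, optional
        Library used to write the netCDF file.
    **kwargs
        Other keyword arguments will be passed to
        :meth:`xarray.Dataset.to_netcdf`. If provided, these override the
        values derived from the ``compress`` and ``engine`` parameters.

    Returns
    -------
    str
        Location of the netCDF file.
    """
    # Body omitted: the stub only illustrates the docstring layout
    return filename
```

Note that `**kwargs` gets its own entry with no `: type` part, and its description is indented beneath it like any other parameter.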
try:
    encoding_kw2 = kwargs["encoding"]
except KeyError:
    encoding_kw2 = {}
We generally use `.get` for these kinds of operations: `encoding_kw2 = kwargs.get("encoding", {})`
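A small sketch checking that the two lookups behave the same for a hypothetical `kwargs` dict, both with and without an `"encoding"` key (the helper names are illustrative only):

```python
def encoding_via_try(kwargs):
    # Pattern used in the PR: look the key up, fall back to {} on KeyError
    try:
        encoding_kw2 = kwargs["encoding"]
    except KeyError:
        encoding_kw2 = {}
    return encoding_kw2


def encoding_via_get(kwargs):
    # Equivalent one-liner suggested in the review
    return kwargs.get("encoding", {})


for kwargs in ({}, {"encoding": {"var_A": {"dtype": "int16"}}}):
    assert encoding_via_try(kwargs) == encoding_via_get(kwargs)
```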
for var_name, kw1 in encoding_kw1.items():
    try:
        kw2 = encoding_kw2[var_name]
    except KeyError:
        kw2 = {}
    encoding_kw_merged[var_name] = {**kw1, **kw2}
I think the logic here would not work as expected when there are non-compressible types. My line of thought/duck debugging:
- `encoding_kw2` is full and has elements for all variables
- `encoding_kw_merged` is empty
- We loop only over the variable names in `encoding_kw1`, which will only contain compressible variables. Then, for each of these variables only:
  - We merge the respective variable specifics of `encoding_kw1` and `encoding_kw2`
- `encoding_kw_merged` has the same keys as `encoding_kw1` and the merged dicts as values.
  - If there were no compressible variables, `encoding_kw_merged` would be empty even with `encoding_kw2` being full
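The walkthrough above can be reproduced with plain dicts. This sketch assumes hypothetical toy inputs (`theta` as a compressible variable, `label` as a non-compressible one) and mirrors the PR's loop in a helper named `merge_as_in_pr`, introduced here for illustration only.

```python
def merge_as_in_pr(encoding_kw1, encoding_kw2):
    """Mirror the PR's loop: only variables present in encoding_kw1 are visited."""
    encoding_kw_merged = {}
    for var_name, kw1 in encoding_kw1.items():
        try:
            kw2 = encoding_kw2[var_name]
        except KeyError:
            kw2 = {}
        encoding_kw_merged[var_name] = {**kw1, **kw2}
    return encoding_kw_merged


# Hypothetical toy inputs: "theta" is numeric (compressible), "label" is not,
# so the compress step only creates an entry for "theta".
encoding_kw1 = {"theta": {"zlib": True}}
encoding_kw2 = {
    "theta": {"dtype": "int16", "scale_factor": 0.1},
    "label": {"dtype": "S1"},
}

merged = merge_as_in_pr(encoding_kw1, encoding_kw2)
print(merged)
# {'theta': {'zlib': True, 'dtype': 'int16', 'scale_factor': 0.1}}
# The user-supplied "label" entry is silently dropped.

# With no compressible variables, nothing from encoding_kw2 survives.
print(merge_as_in_pr({}, encoding_kw2))  # {}
```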
Potential proposal:
-for var_name, kw1 in encoding_kw1.items():
-    try:
-        kw2 = encoding_kw2[var_name]
-    except KeyError:
-        kw2 = {}
-    encoding_kw_merged[var_name] = {**kw1, **kw2}
+for var_name in data.data_vars:
+    kw1 = encoding_kw1.get(var_name, {})
+    kw2 = encoding_kw2.get(var_name, {})
+    encoding_kw_merged[var_name] = kw1 | kw2
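A minimal, dict-only sketch of the proposed loop, assuming a plain list of names stands in for `data.data_vars` (iterating a `Dataset.data_vars` mapping yields variable names). `merge_encodings` is a hypothetical helper; the `|` dict union needs Python 3.9+, and its right-hand operand wins on key clashes, so user-supplied settings override the compression defaults.

```python
def merge_encodings(data_vars, encoding_kw1, encoding_kw2):
    """Merge per-variable encodings for every variable, as in the review proposal.

    data_vars stands in for ``data.data_vars`` (iterating it yields names).
    Entries from encoding_kw2 (user kwargs) win over encoding_kw1 (compress).
    """
    encoding_kw_merged = {}
    for var_name in data_vars:
        kw1 = encoding_kw1.get(var_name, {})
        kw2 = encoding_kw2.get(var_name, {})
        encoding_kw_merged[var_name] = kw1 | kw2  # dict union needs Python 3.9+
    return encoding_kw_merged


# Same hypothetical toy inputs as in the walkthrough
data_vars = ["theta", "label"]
encoding_kw1 = {"theta": {"zlib": True}}
encoding_kw2 = {
    "theta": {"dtype": "int16", "scale_factor": 0.1},
    "label": {"dtype": "S1"},
}
print(merge_encodings(data_vars, encoding_kw1, encoding_kw2))
# {'theta': {'zlib': True, 'dtype': 'int16', 'scale_factor': 0.1}, 'label': {'dtype': 'S1'}}
```

Unlike the original loop, the non-compressible `label` now keeps its user-supplied encoding.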
# 1) define an InferenceData object (e.g. from file)
# 2) define different sets of `**kwargs` to pass
# 3) use inference_data.to_netcdf(filepath, **kwargs)
# 4) test these make it through to `data.to_netcdf()` as intended - TODO how?
I think what you propose is about right. Pseudocode idea:
idata = load...
# store with encoding kwargs that cause a small but non-negligible loss of precision
# and, as in the previous test, check that the requested filename exists
idata_encoded = load...
for group in idata.groups():
    # use https://docs.xarray.dev/en/stable/generated/xarray.testing.assert_allclose.html#xarray.testing.assert_allclose
    # once as
    with pytest.raises(AssertionError):
        assert_allclose(..., tol=low/default)
    # then again as
    assert_allclose(..., tol=high)
# clean up files
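The reasoning behind the two `assert_allclose` calls can be checked without any files. This stdlib-only sketch emulates packing values as `int16` with `scale_factor=0.1` (the encoding from the PR description) by dividing, rounding to an integer and scaling back; `pack_unpack` and `allclose` are hypothetical helpers, not xarray's implementation. A real test would instead write and re-read the file and compare each group with `xarray.testing.assert_allclose`, whose `atol`/`rtol` arguments play the role of `tol` in the pseudocode.

```python
import math


def pack_unpack(values, scale_factor):
    """Emulate packing to int16 with a scale factor and decoding back.

    Stand-in for writing with encoding={"dtype": "int16", "scale_factor": ...}
    and reading the file again; not xarray's actual code path.
    """
    decoded = []
    for x in values:
        packed = round(x / scale_factor)
        if not -32768 <= packed <= 32767:
            raise OverflowError(f"{x} does not fit in int16 at this scale")
        decoded.append(packed * scale_factor)
    return decoded


def allclose(a, b, atol):
    # Minimal element-wise absolute-tolerance check, in the spirit of assert_allclose
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=0.0, abs_tol=atol) for x, y in zip(a, b)
    )


original = [0.1234, -1.9876, 3.14159, 10.05049]
roundtrip = pack_unpack(original, scale_factor=0.1)

# Tight tolerance: the loss of precision must be detected ...
assert not allclose(original, roundtrip, atol=1e-6)
# ... but it stays within half a quantisation step (0.05), plus float slack
assert allclose(original, roundtrip, atol=0.05 + 1e-9)
```

Rounding to the nearest step bounds the error by half a step, 0.05 here, which suggests a sensible value for the loose tolerance in the second assertion.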
Closing for now; please reopen if you are able to continue working on this, @cowirihy.
Relates to #2298, with a solution broadly along the lines of the one sketched out in the issue.
Added `**kwargs` to the `InferenceData.to_netcdf()` method, to allow any of the parameters that can be passed to `xarray.Dataset.to_netcdf()` to be passed through.

E.g. for my use case I define an `encoding={'var_A': {"dtype": "int16", "scale_factor": 0.1}}` dict, so that `var_A` samples get stored as 16-bit integers with 1 decimal place precision, to economise on file size with an inconsequential loss of precision. Note this would be done for `var_A` in any group in which it appears, e.g. both the `posterior` and `prior` groups if present.

I've put in a placeholder for where a new unit test could be added, but am not so confident in defining this. What I envisage, which I've tested via a separate script on my end, is the following:

- `InferenceData` instance, reading from a `netcdf` file as I can see other unit tests do already
- `encoding` settings for a couple of the RVs in the model to which the data relates
- `netcdf` file but passing `encoding` (and/or other params that would alter the behaviour of `Dataset.to_netcdf`)

Help welcome in setting up the latter! It would also be worthwhile verifying via tests that the handling code I've included (populating the `kwargs` dict based on the `compress` and `engine` parameters, as before) is working as intended and in a backwards-compatible manner; it should! Perhaps existing tests are adequate to prove this, though?
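One way to check backwards compatibility is to assert that, with no extra keyword arguments, the assembled arguments equal the pre-PR ones. This sketch assumes the pre-PR behaviour was `{"engine": engine}` plus, when `compress=True`, a `{"zlib": True}` encoding for each compressible variable; `build_to_netcdf_kwargs`, `data_vars` and `compressible_vars` are hypothetical names used for illustration, not ArviZ code.

```python
def build_to_netcdf_kwargs(data_vars, compressible_vars, compress=True,
                           engine="h5netcdf", **kwargs):
    """Assemble keyword arguments for xarray.Dataset.to_netcdf (illustrative only).

    data_vars: names of all variables in the group being written.
    compressible_vars: names compression applies to (stand-in for a dtype check).
    """
    # Arguments derived from the existing parameters (assumed pre-PR behaviour)
    to_netcdf_kwargs = {"engine": engine}
    default_encoding = (
        {name: {"zlib": True} for name in compressible_vars} if compress else {}
    )

    # User encoding is merged per variable; everything else is passed through
    user_encoding = kwargs.pop("encoding", {})
    to_netcdf_kwargs.update(kwargs)

    merged = {
        name: default_encoding.get(name, {}) | user_encoding.get(name, {})
        for name in data_vars
        if name in default_encoding or name in user_encoding
    }
    if merged:
        to_netcdf_kwargs["encoding"] = merged
    return to_netcdf_kwargs


# Backwards compatibility: no extra kwargs gives the assumed pre-PR arguments
assert build_to_netcdf_kwargs(["theta", "label"], ["theta"]) == {
    "engine": "h5netcdf",
    "encoding": {"theta": {"zlib": True}},
}

# New behaviour: per-variable encoding is merged and other kwargs pass through
print(build_to_netcdf_kwargs(
    ["theta", "label"], ["theta"],
    encoding={"theta": {"dtype": "int16", "scale_factor": 0.1}},
    format="NETCDF4",
))
# {'engine': 'h5netcdf', 'format': 'NETCDF4', 'encoding': {'theta': {'zlib': True, 'dtype': 'int16', 'scale_factor': 0.1}}}
```

Because `engine` is a named parameter in this signature, Python never routes an `engine` keyword into `**kwargs`; in this sketch only `encoding` (and arguments unrelated to `compress`/`engine`, such as `format`) actually arrive through `**kwargs`.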
📚 Documentation preview 📚: https://arviz--2410.org.readthedocs.build/en/2410/