HDF5 support for resampled opacity DB #399
Merged
Summary
This PR adds HDF5 support for resampled opacity databases as an alternative to SQLite. It overlaps with #360.

The HDF5 files can store opacities in two different formats: `log10_uint16` and `log10_float32`.

The `log10_uint16` approach saves log10 opacities as unsigned 16-bit integers, as described in #397. With lzf + shuffle compression, this reduces resampled opacity file sizes by a factor of 10. Spectra computed with `log10_uint16` agree with those computed from a SQLite opacity DB to within 0.04% in the core tests (reflected + transmission + thermal). The `log10_uint16` approach with compression is also as fast as or faster than SQLite (see benchmark below).

The `log10_float32` storage format is simply log10 of the opacity stored as a float32, which preserves the opacity more accurately.

What changed

- Added `RetrieveOpacitiesHDF5` and wired `opannection(...)` to dispatch to it automatically for `.h5` and `.hdf5` files.
- Supports the `nearest` and `linear` query modes.
- Supports two storage formats: `log10_uint16` and `log10_float32`.
- Added `convert_sqlite_to_hdf5(...)` to `picaso/opacity_factory.py` so users can convert a resampled SQLite opacity database directly into the HDF5 layout expected by PICASO.
- Updated `optics.py` so NumPy scalar values are converted to plain Python types before parameter binding.

Validation
I have attached a test that I ran on my local machine. It runs thermal, transmission, and reflected-light calculations of an Earth-like atmosphere with both an SQLite opacity DB and an HDF5 opacity DB that uses `log10_uint16` storage with lzf + shuffle compression. The results of the test are pasted below.

For `query_method = nearest`, the relative difference between spectra is typically around 1e-6 and at most about 4e-4. Runtimes for HDF5 are faster in all cases.

For `query_method = linear`, the relative difference is large because of a bug in the SQLite indexing, described in #398. A separate PR should fix this issue with SQLite, because "fixing" the bug might break previous SQLite DBs, so choices must be made. When I fix the bug for my test opacity DB, I get good agreement between the HDF5 and SQLite paths for `query_method = linear`. The HDF5 runtimes are faster because I implemented numba-based linear interpolation.

test.py
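For readers who want a feel for why 16-bit log10 storage can stay within a small relative error, here is a minimal sketch of one possible encoding. It is not the code in this PR: the linear mapping of log10(opacity) onto the uint16 range, the function names, and the `log_min`/`log_max` bounds are all assumptions made for illustration (the actual scheme is the one described in #397). Plain Python is used to keep the sketch dependency-free; a real implementation would presumably vectorize this with NumPy.

```python
import math

UINT16_MAX = 65535  # largest value an unsigned 16-bit integer can hold


def encode_log10_uint16(opacities, log_min, log_max):
    """Map positive opacities to integer codes in [0, 65535] on a log10 scale.

    ASSUMPTION: a simple linear mapping between log_min and log_max.
    """
    scale = UINT16_MAX / (log_max - log_min)
    codes = []
    for kappa in opacities:
        log_k = math.log10(kappa)
        # Clamp so values outside the assumed range still fit in 16 bits.
        log_k = min(max(log_k, log_min), log_max)
        codes.append(int(round((log_k - log_min) * scale)))
    return codes


def decode_log10_uint16(codes, log_min, log_max):
    """Invert the mapping: integer code -> log10 value -> opacity."""
    step = (log_max - log_min) / UINT16_MAX
    return [10.0 ** (log_min + code * step) for code in codes]


def max_relative_difference(reference, candidate):
    """Largest |candidate - reference| / |reference| over all samples."""
    return max(abs(c - r) / abs(r) for r, c in zip(reference, candidate))


# Hypothetical opacities spanning many orders of magnitude.
opacities = [10.0 ** (-40.0 + 0.37 * i) for i in range(100)]
log_min, log_max = -45.0, 5.0  # hypothetical storage range in log10 units

codes = encode_log10_uint16(opacities, log_min, log_max)
decoded = decode_log10_uint16(codes, log_min, log_max)
max_rel = max_relative_difference(opacities, decoded)

# Rounding moves each value by at most half a quantization step in log10
# space, which bounds the relative error of the decoded opacity.
half_step = 0.5 * (log_max - log_min) / UINT16_MAX
bound = 10.0 ** half_step - 1.0
```

With these hypothetical bounds, the per-opacity relative error is limited to just under 1e-3. A narrower log10 range would tighten that limit proportionally. The error in a computed spectrum depends on how the individual opacity errors combine, so this bound is not directly comparable to the spectral differences reported above.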