Add support for distributed cholla datasets. #4702
Conversation
I think unit tests should be preferred whenever they suffice, for a couple of reasons. That said, if what you need is some answer tests, go for it!
    from .fields import ChollaFieldInfo
    ...
    def _split_fname_proc_suffix(filename: str):
Could you put a short note about how this is different from os.path.splitext? Just to avoid future confusion.
Good point -- I overhauled the docstring to make it clearer (and I explicitly addressed how it differs from os.path.splitext).
matthewturk left a comment:
only minor stuff -- looks good otherwise
    def io_iter(self, chunks, fields):
        # this is loosely inspired by the implementation used for Enzo/Enzo-E
        # - those other options use the lower-level hdf5 interface. Unclear
        # whether that affords any advantages...
Good question. I think in the past it did because we avoided having to re-allocate temporary scratch space, but I am not sure that would hold up to current inquiries. I think the big advantage those have is tracking the groups within the iteration.
        fh, filename = None, None
        for chunk in chunks:
            for obj in chunk.objs:
                if obj.filename is None:  # unclear when this case arises...
likely it will not here, unless you manually construct virtual grids
Out of curiosity, what is a virtual grid?
I realize this may be an involved answer - so if you could just point me to a frontend (or other area of the code) using virtual grids, I can probably investigate that on my own.
My apologies for taking a while to follow up on this. I plan to circle back in the next week or so.
Co-authored-by: Matthew Turk <matthewturk@gmail.com>
The test failure from the cancelled test here is unrelated, see #5153
@matthewturk, I know it's been a year and a half, but I finally made the requested changes. After you take another look (and once you figure out how to handle that cancelled test), I think this will be good to go.
hey @mabruzzo, if you merge with main again, the skipped test should run.
pre-commit.ci autofix
for more information, see https://pre-commit.ci
@matthewturk, this is just a gentle nudge about this PR. As I've mentioned elsewhere, we may want to close this PR in favor of #5170 (#5170 includes all the changes from here, plus a few more to make it a little more general-purpose).
Hi @mabruzzo -- which of those options would you prefer?
@matthewturk, personally, I'd prefer that we close this in favor of #5170. #5170 is not that much bigger, and I think I had to overwrite a little of the code I contributed here (it's been a while, so I don't remember the details precisely). But I'm flexible.
If they're indeed in conflict, and that one obviates this one, I say go for it.
Superseded by #5170
PR Summary
This PR adds support for loading Cholla datasets that are distributed over multiple files. Previously, the frontend could only load Cholla datasets that had been concatenated into a single large file.
This functionality is currently a little inefficient: we need to read in every hdf5 file to figure out the mapping between spatial locations and locations on disk. This seems like something we can easily improve in the future (possibly by having Cholla write out an extra attribute describing how 3D locations are mapped into 1D).
PR Checklist
For this PR, I suspect that we will need to upload a new test dataset. I just had a few questions: