Skip to content

Add support for distributed cholla datasets.#4702

Closed
mabruzzo wants to merge 6 commits into
yt-project:mainfrom
mabruzzo:cholla-frontend-improvements
Closed

Add support for distributed cholla datasets.#4702
mabruzzo wants to merge 6 commits into
yt-project:mainfrom
mabruzzo:cholla-frontend-improvements

Conversation

@mabruzzo
Copy link
Copy Markdown
Contributor

PR Summary

This PR adds support for loading Cholla datasets that are distributed over multiple files. Previously, the frontend could only load Cholla datasets after they were concatenated into a single large dataset.

This functionality is currently a little inefficient right now - we need to read in every hdf5 file to figure out the mapping between spatial locations and locations on disk. This seems like something we can easily improve in the future (possibly by having Cholla write out an extra attribute how 3D locations are mapped into 1D).

PR Checklist

  • Adds a test for any bugs fixed. Adds tests for new features.

For this PR, I suspect that we will need to upload a new test dataset. I just had a few questions:

  • It's been a while since I've done this. Could someone remind me of the procedure for doing this?
  • Weirdly enough, I get the following message when I run the unit-tests on the main branch. Do you have any idea why this is happening? (For context, the other 3 tests all run)

    yt/frontends/cholla/tests/test_outputs.py::test_cholla_data SKIPPED (cannot load dataset ChollaSimple/0.h5)

  • Is there any preference for unit tests vs answer-tests when it comes to frontends?

@neutrinoceros
Copy link
Copy Markdown
Member

It's been a while since I've done this. Could someone remind me of the procedure for doing this?

you'll need to

Weirdly enough, I get the following message when I run the unit-tests on the main branch. Do you have any idea why this is happening? (For context, the other 3 tests all run)

Maybe that's a bug with small_patch_amr. I suggest trying to work on a simplified version of the test and refine it until it doesn't skip, to discover what's happening.

Is there any preference for unit tests vs answer-tests when it comes to frontends?

I think unit tests should be preferred whenever they suffice for a couple reasons:

  • answer tests are currently deeply rooted in the nose test framework (migration to pytest is still ongoing), so adding more of them makes this long lasting migration ever so slightly harder
  • fast tests are easier to scale

That said, if what you need is some answer tests, go for it !

@neutrinoceros neutrinoceros added code frontends Things related to specific frontends enhancement Making something better labels Oct 11, 2023
@mabruzzo mabruzzo force-pushed the cholla-frontend-improvements branch from 2806b17 to 4428058 Compare October 18, 2023 14:28
Comment thread yt/frontends/cholla/data_structures.py Outdated
from .fields import ChollaFieldInfo


def _split_fname_proc_suffix(filename: str):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you put a short note about how this is different from os.path.splitext? Just to avoid future confusion.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point -- I overhauled the docstring to try to make it more clear (and I explicitly addressed how it differs from os.path.splitext)

matthewturk
matthewturk previously approved these changes Oct 18, 2023
Copy link
Copy Markdown
Member

@matthewturk matthewturk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only minor stuff -- looks good otherwise

Comment thread yt/frontends/cholla/data_structures.py Outdated
Comment thread yt/frontends/cholla/io.py
def io_iter(self, chunks, fields):
# this is loosely inspired by the implementation used for Enzo/Enzo-E
# - those other options use the lower-level hdf5 interface. Unclear
# whether that affords any advantages...
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I think in the past it did because we avoided having to re-allocate temporary scratch space, but I am not sure that would hold up to current inquiries. I think the big advantage those have is tracking the groups within the iteration.

Comment thread yt/frontends/cholla/io.py
fh, filename = None, None
for chunk in chunks:
for obj in chunk.objs:
if obj.filename is None: # unclear when this case arises...
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

likely it will not here, unless you manually construct virtual grids

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, what is a virtual grid?

I realize this may be an involved answer - so if you could just point me to a frontend (or other area of the code) using virtual grids, I can probably investigate that on my own.

Comment thread yt/frontends/cholla/io.py
@mabruzzo
Copy link
Copy Markdown
Contributor Author

My apologies for taking a while to follow up on this. I plan to circle back in the next week or so.

Co-authored-by: Matthew Turk <matthewturk@gmail.com>
@chrishavlin
Copy link
Copy Markdown
Contributor

The test failure from the cancelled test here is unrelated, see #5153

@mabruzzo
Copy link
Copy Markdown
Contributor Author

@matthewturk, I know it's been a year and a half, but I finally made the requested changes. After you take another look, (and once you guys figure out how to handle that cancelled test), I think this will be good to go

@chrishavlin
Copy link
Copy Markdown
Contributor

hey @mabruzzo if you merge with main again the skipped test should run.

@mabruzzo
Copy link
Copy Markdown
Contributor Author

pre-commit.ci autofix

@mabruzzo
Copy link
Copy Markdown
Contributor Author

@matthewturk, this is just a gentle nudge about this PR.

As I've mentioned elsewhere, we may want to close this PR in favor of #5170 (#5170 includes all the changes from here plus a few more changes in order to be a little more general purpose).

@matthewturk
Copy link
Copy Markdown
Member

Hi @mabruzzo -- which would you prefer of those options?

@mabruzzo
Copy link
Copy Markdown
Contributor Author

@matthewturk, personally, I'd prefer that we close this in favor of #5170. #5170 is not that much bigger and I think I had to overwrite a little code I contributed here (it's been a while, so I don't precisely remember the details).

But, I'm flexible

@matthewturk
Copy link
Copy Markdown
Member

If they're indeed in conflict, and that one obviates this one, I say go for it.

@mabruzzo
Copy link
Copy Markdown
Contributor Author

Superseded by #5170

@mabruzzo mabruzzo closed this Aug 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

code frontends Things related to specific frontends enhancement Making something better frontend: cholla

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants