Add support for distributed cholla datasets. #4702
Conversation
I think unit tests should be preferred whenever they suffice, for a couple of reasons. That said, if what you need is some answer tests, go for it!
    from .fields import ChollaFieldInfo
    ...
    def _split_fname_proc_suffix(filename: str):
Could you put a short note about how this is different from os.path.splitext? Just to avoid future confusion.
Good point -- I overhauled the docstring to make it clearer (and I explicitly addressed how it differs from os.path.splitext).
matthewturk left a comment:
only minor stuff -- looks good otherwise
    def io_iter(self, chunks, fields):
        # this is loosely inspired by the implementation used for Enzo/Enzo-E
        # - those other options use the lower-level hdf5 interface. Unclear
        # whether that affords any advantages...
Good question. I think in the past it did because we avoided having to re-allocate temporary scratch space, but I am not sure that would hold up to current inquiries. I think the big advantage those have is tracking the groups within the iteration.
        fh, filename = None, None
        for chunk in chunks:
            for obj in chunk.objs:
                if obj.filename is None:  # unclear when this case arises...
likely it will not here, unless you manually construct virtual grids
Out of curiosity, what is a virtual grid?
I realize this may be an involved answer - so if you could just point me to a frontend (or other area of the code) using virtual grids, I can probably investigate that on my own.
My apologies for taking a while to follow up on this. I plan to circle back in the next week or so.
Co-authored-by: Matthew Turk <matthewturk@gmail.com>
The test failure from the cancelled test here is unrelated, see #5153
@matthewturk, I know it's been a year and a half, but I finally made the requested changes. After you take another look (and once you figure out how to handle that cancelled test), I think this will be good to go.
hey @mabruzzo, if you merge with main again, the skipped test should run.
pre-commit.ci autofix
for more information, see https://pre-commit.ci
@matthewturk, this is just a gentle nudge about this PR. As I've mentioned elsewhere, we may want to close this PR in favor of #5170 (#5170 includes all the changes from here, plus a few more to make it a little more general-purpose).
Hi @mabruzzo -- which of those options would you prefer?
@matthewturk, personally, I'd prefer that we close this in favor of #5170. #5170 is not that much bigger, and I think I had to overwrite a little of the code I contributed here (it's been a while, so I don't remember the details precisely). But I'm flexible.
If they're indeed in conflict, and that one obviates this one, I say go for it.
Superseded by #5170
PR Summary
This PR adds support for loading Cholla datasets that are distributed over multiple files. Previously, the frontend could only load Cholla datasets that had been concatenated into a single large file.
This functionality is currently a little inefficient: we need to read in every hdf5 file to figure out the mapping between spatial locations and locations on disk. This seems like something we can easily improve in the future (possibly by having Cholla write out an extra attribute describing how 3D locations are mapped into 1D).
PR Checklist
For this PR, I suspect that we will need to upload a new test dataset. I just had a few questions: