[RAMSES] Optimisation: fast bbox rejection when finding intersecting domains#5379

Open
AnatoleStorck wants to merge 1 commit into yt-project:main from AnatoleStorck:fix_disk_intersecting_cpus_calculation

Conversation

@AnatoleStorck
Contributor

PR Summary

When deciding which files to load from a RAMSES dataset to get data for a YTRegion, we do a recursive check on the overlap between the YTRegion's bbox and the space covered by each cpu. The current implementation takes a very long time (~hours) to find the intersecting cpus for a thin disk (whose height can be only a few cells) inside a large simulation volume, as it gets stuck in extremely deep, expensive recursion. I add here an initial check that the cube is within the bbox, reducing the number of subcubes visited for small or thin disks. This brings the intersection computation time for thin disks down to a few seconds, while leaving the behavior for spheres and bulkier regions seemingly unchanged (although this could be checked more extensively).
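For illustration, the idea can be sketched as follows. This is a minimal stand-alone sketch, not yt's actual implementation: `intersecting_domains`, `domain_of`, and the parameter names are all invented here, and the real code maps subcubes to domains via Hilbert keys.

```python
# Minimal sketch of recursive bbox/domain intersection with two fast paths.
# Not yt's actual code: domain_of and all names here are invented.
import numpy as np

def intersecting_domains(bbox, cube_lo, cube_hi, domain_of, dx_cond, hits):
    """Collect domains whose subcubes overlap the selector's bbox."""
    lo, hi = bbox
    # Fast rejection: the cube lies entirely outside the bbox.
    if np.any(cube_hi <= lo) or np.any(cube_lo >= hi):
        return
    # Fast acceptance (the check added by this PR): a cube fully inside
    # the bbox intersects by construction, so there is no need to recurse.
    if np.all(cube_lo >= lo) and np.all(cube_hi <= hi):
        hits |= domain_of(cube_lo, cube_hi)
        return
    # Partial overlap: stop subdividing once the cube is small enough ...
    if np.min(cube_hi - cube_lo) <= dx_cond:
        hits |= domain_of(cube_lo, cube_hi)
        return
    # ... otherwise split into 8 subcubes and recurse.
    mid = 0.5 * (cube_lo + cube_hi)
    for corner in range(8):
        upper = np.array([(corner >> d) & 1 for d in range(3)], dtype=bool)
        sub_lo = np.where(upper, mid, cube_lo)
        sub_hi = np.where(upper, cube_hi, mid)
        intersecting_domains(bbox, sub_lo, sub_hi, domain_of, dx_cond, hits)
```

Without the fast-acceptance branch, every cube overlapping a thin bbox is subdivided all the way down to `dx_cond`; with it, cubes swallowed by the bbox are accepted in one step, which is where the speedup for thin disks comes from.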

@AnatoleStorck
Contributor Author

@cphyc Could you look this over?

Member

@cphyc cphyc left a comment


Oh, that's a great optimisation!

This looks great to me and should already be covered by the existing tests, afaik.

While you're at it, do you think it'd be worth changing line 88 to

dx_cond = float(max(
    (bbox[1]-bbox[0]).min(),
    (bbox[1]-bbox[0]).max() / 16,
).to("code_length"))

The value "16" is quite arbitrary, but it prevents us from splitting the region into more than 16³ subregions to be checked (16³ sounds like a reasonable number 🙃).

This should prevent recursing too deeply if one of the dimensions is really tiny compared to the others.
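In plain numpy (dropping unyt's `.to("code_length")` for simplicity), the suggested floor behaves as sketched below; `dx_floor` and `max_splits` are names made up for this illustration.

```python
import numpy as np

# Sketch of the suggested floor on dx_cond, with unyt units dropped.
# dx_floor and max_splits are invented names for this illustration.
def dx_floor(bbox, max_splits=16):
    extent = bbox[1] - bbox[0]
    # Never subdivide below 1/max_splits of the largest extent, even
    # when one axis of the region is far thinner than the others.
    return float(max(extent.min(), extent.max() / max_splits))

# Thin-disk-like region: z extent is 1000x smaller than x and y.
thin_disk = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 0.001]])
# dx_floor(thin_disk) is 1/16 = 0.0625, not the tiny 0.001 z extent,
# so no axis gets split into more than 16 pieces.
```

For a bulky region the `extent.min()` branch wins and the behaviour is unchanged; the floor only kicks in for strongly anisotropic bboxes.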

@cphyc cphyc changed the title bbox rejection for finding cpu intersections of a RAMSES dataset with a YTRegion [RAMSES] Optimisatoin: fast bbox rejection when finding intersecting domains Feb 14, 2026
@cphyc cphyc added enhancement Making something better frontend: ramses labels Feb 14, 2026
dx_cond: float | None = None,
factor: float = 4.0,
bound_keys: Optional["np.ndarray[Any, np.dtype[np.float64]]"] = None,
bbox: Optional[Any] = None,
Member


I just realised that Optional[Any] should probably be replaced by Optional[npt.NDArray] or even Optional[npt.NDArray[np.float64]] (you'd need import numpy.typing as npt, see e.g. #5349).
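Assuming the signature shown in the diff above, the suggested annotation would look like the following sketch (the function name is a placeholder; only the annotations matter here):

```python
from typing import Optional
import numpy as np
import numpy.typing as npt  # numpy.typing, not numpy.testing

# Placeholder function name and body; only the annotations are the point.
def get_cpu_list(
    dx_cond: "float | None" = None,
    factor: float = 4.0,
    bound_keys: Optional[npt.NDArray[np.float64]] = None,
    bbox: Optional[npt.NDArray[np.float64]] = None,  # was Optional[Any]
) -> None:
    ...
```

`npt.NDArray` must be parametrized with a numpy scalar type such as `np.float64` rather than the builtin `float` for type checkers to accept it.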

@cphyc cphyc changed the title [RAMSES] Optimisatoin: fast bbox rejection when finding intersecting domains [RAMSES] Optimisation: fast bbox rejection when finding intersecting domains Feb 14, 2026
@AnatoleStorck
Contributor Author

> Oh, that's a great optimisation!
>
> This looks great to me and should already be covered by the existing tests, afaik.
>
> While you're at it, do you think it'd be worth changing line 88 to
>
>     dx_cond = float(max(
>         (bbox[1]-bbox[0]).min(),
>         (bbox[1]-bbox[0]).max() / 16,
>     ).to("code_length"))
>
> The value "16" is quite arbitrary, but it prevents us from splitting the region into more than 16³ subregions to be checked (16³ sounds like a reasonable number 🙃).
>
> This should prevent recursing too deeply if one of the dimensions is really tiny compared to the others.

So I did a bit of testing, trying to define a disk that takes as long as possible to compute intersections for with the current implementation. I ended up landing on a disk with 70 kpc radius and 0.33 kpc height (in a 50 Mpc box).

With the current changes, it took 116 seconds to calculate the intersecting domains. It found 608/4096 intersecting domains.

With your change and a factor of 16, it took 4 seconds, and found 898/4096 intersecting domains.
With your change and a factor of 32, it took 20 seconds, and found 808/4096 intersecting domains.
With your change and a factor of 64, the computation matches the current change (116 seconds and 608 domains).

So I'm not sure whether the faster intersection computation makes up for having to search ~200-300 more domains in this case. Of course, for an easier morphology all cases take at most a few seconds, so this only matters for the most extreme cases.
