[RAMSES] Optimisation: fast bbox rejection when finding intersecting domains #5379
AnatoleStorck wants to merge 1 commit into yt-project:main
Conversation
@cphyc Could you look this over?
cphyc left a comment
Oh, that's a great optimisation!
This looks great to me and should already be covered by the existing tests, afaik.
Now that you're at it, do you think it'd be worth changing line 88 to

```python
dx_cond = float(max(
    (bbox[1] - bbox[0]).min(),
    (bbox[1] - bbox[0]).max() / 16,
).to("code_length"))
```

The value "16" is quite arbitrary, but it prevents us from splitting the region into more than 16³ subregions to be checked (16³ sounds like a reasonable number 🙃).
This should prevent recursing too much if one of the dimensions is really tiny compared to the others.
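To illustrate the effect of the suggested clamp, here is a minimal sketch (hypothetical helper name; units stripped, whereas the actual suggestion keeps `unyt` quantities and converts with `.to("code_length")`):

```python
import numpy as np

def resolution_floor(bbox: np.ndarray, max_splits: int = 16) -> float:
    """Smallest cell size dx_cond worth recursing down to.

    Clamps dx_cond to at least (longest extent) / max_splits, so a bbox
    that is razor-thin along one axis cannot force the recursion into
    ~(extent/dx)^3 subregions.
    """
    extents = bbox[1] - bbox[0]  # per-axis sizes of the region
    return float(max(extents.min(), extents.max() / max_splits))

# A thin disk's bbox: 140 units across but only 0.33 units tall.
bbox = np.array([[0.0, 0.0, 0.0], [140.0, 140.0, 0.33]])
print(resolution_floor(bbox))  # 8.75 (= 140/16), not 0.33
```

Without the clamp, `dx_cond` would follow the thin axis (0.33) and the recursion would subdivide the long axes into hundreds of slices each.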
```python
dx_cond: float | None = None,
factor: float = 4.0,
bound_keys: Optional["np.ndarray[Any, np.dtype[np.float64]]"] = None,
bbox: Optional[Any] = None,
```
I just realised, but Optional[Any] should probably be replaced by Optional[npt.NDArray], or even Optional[npt.NDArray[np.float64]] (you'd need import numpy.typing as npt; see e.g. #5349).
So I did a bit of testing, trying to define a disk that takes the longest to generate intersections for with the current implementation. I ended up landing on a disk with a 70 kpc radius and a 0.33 kpc height (in a 50 Mpc box). With the current changes, it took 116 seconds to calculate the intersecting domains, finding 608/4096 intersecting domains. With your change and a factor of 16, it took 4 seconds and found 898/4096 intersecting domains. So I'm not sure whether the faster intersection computation makes up for having to search ~200-300 more domains in this case. Of course, for an easier morphology all cases take at most a few seconds, so this only matters for the most extreme cases.
PR Summary
When deciding which files to load from a RAMSES dataset to get data for a YTRegion, we do a recursive check on the overlap between the YTRegion's bbox and the space covered by a CPU. The current implementation takes a very long time (~hours) to find the intersecting CPUs for a thin disk (whose height can be only a few cells) inside a large simulation volume, as it gets stuck in extremely deep/expensive recursion. I add here an initial check that the cube is within the bbox, reducing the number of subcubes visited for small/thin disks. This brings the intersection computation time for thin disks down to a few seconds, while leaving the behavior for spheres and bulkier regions seemingly unchanged (although this could be checked more extensively).
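The recursion described above can be sketched as follows. This is a simplified, hypothetical illustration of the idea, not yt's actual implementation (which walks Hilbert-key bounds per CPU domain): the new early "cube fully inside the bbox" check lets whole branches terminate without subdividing down to `dx_cond`.

```python
import numpy as np

def overlaps(cube_lo, cube_hi, bbox):
    """True if the cube [cube_lo, cube_hi] and the bbox intersect on every axis."""
    return bool(np.all(cube_hi > bbox[0]) and np.all(cube_lo < bbox[1]))

def inside(cube_lo, cube_hi, bbox):
    """True if the cube lies entirely within the bbox."""
    return bool(np.all(cube_lo >= bbox[0]) and np.all(cube_hi <= bbox[1]))

def visit(cube_lo, cube_hi, dx_cond, bbox, hits):
    if not overlaps(cube_lo, cube_hi, bbox):
        return  # fast rejection: this branch cannot contribute
    if inside(cube_lo, cube_hi, bbox) or np.all(cube_hi - cube_lo <= dx_cond):
        # Fully contained (the new check) or already small enough: stop here
        # instead of recursing all the way down to dx_cond.
        hits.append((cube_lo.copy(), cube_hi.copy()))
        return
    # Otherwise split into 8 octants and recurse.
    mid = 0.5 * (cube_lo + cube_hi)
    for i in range(8):
        mask = np.array([i & 1, i & 2, i & 4], dtype=bool)
        lo = np.where(mask, mid, cube_lo)
        hi = np.where(mask, cube_hi, mid)
        visit(lo, hi, dx_cond, bbox, hits)
```

For a thin-disk bbox, almost every octant either misses the bbox (pruned immediately) or is swallowed whole by the containment check, which is why the recursion depth collapses from hours of work to seconds.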