Support large scale analysis on ultra-high res data via Dask #952
base: main
Conversation
- Add `cluster` parameter to enable custom dask cluster configs
- Add `chunks="auto"` arg to `open_mfdataset()`
- Update `convert_units()` to make udunit conversion dask compatible via `xr.apply_ufunc`
- Add old multiprocessing code for backwards compatibility
- Add `dask_scheduler_type` attribute to `CoreParameter`
- Update multiprocessing logic to take into account `dask_scheduler_type`
- Update comment in `run_diags()` for clarity
- Update run scripts to run all variables
- Update `_cleanup_dask_resources()` to wait for workers to close before closing client and cluster
- Add data loading for lat_lon_driver.py metrics
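As context for how two of these pieces fit together, here is a minimal, hypothetical sketch (not the PR's actual code; the helper name and the simple scale/offset conversion stand in for the real udunits-based `convert_units()`) of opening data with `chunks="auto"` and wrapping a unit conversion in `xr.apply_ufunc` so it maps lazily over Dask chunks:

```python
import xarray as xr


def convert_units_dask(da: xr.DataArray, scale: float, offset: float = 0.0) -> xr.DataArray:
    """Apply an elementwise unit conversion lazily over Dask chunks.

    Hypothetical stand-in: the PR wraps the existing udunits-based
    conversion logic, not a bare scale/offset.
    """
    return xr.apply_ufunc(
        lambda arr: arr * scale + offset,
        da,
        dask="parallelized",        # map the function over each Dask chunk
        output_dtypes=[da.dtype],   # required when dask="parallelized"
    )


# `chunks="auto"` lets Dask pick chunk sizes for any input resolution,
# which matters because input data can vary widely in size and shape.
ds = xr.open_mfdataset("ts_*.nc", chunks="auto")
ts_k = convert_units_dask(ds["ts"], scale=1.0, offset=273.15)  # e.g., degC -> K
ts_k = ts_k.compute()  # trigger the lazy computation
```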
ESMF + xESMF failure when using Dask Distributed

I am running into an issue where xESMF/ESMF breaks when using a Dask Distributed scheduler and `num_workers` is less than the number of regridding tasks (based on the number of variables). This might also be related to #988.

```
ESMCI::VM::getCurrent() Internal error: Bad condition - Could not determine current VM
ESMF_VMGetCurrent() Internal error: Bad condition
ESMF_GridCreateNoPeriDimR Internal error: Bad condition
...
```
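To make the failure mode concrete, a hypothetical reproduction (grid sizes, task count, and function names are illustrative, not taken from the PR) submits more concurrent regridding tasks than there are workers:

```python
import xesmf as xe
from dask.distributed import Client


def regrid_variable(i: int) -> str:
    # Each task builds its own regridder. When several of these run
    # concurrently inside a shared worker process, ESMF's global VM
    # state can be corrupted, producing the errors shown above.
    ds_in = xe.util.grid_global(5, 4)    # coarse global source grid
    ds_out = xe.util.grid_global(2, 2)   # finer global destination grid
    xe.Regridder(ds_in, ds_out, method="bilinear")
    return f"task {i}: regridder built"


if __name__ == "__main__":
    # Fewer workers (2) than tasks (8) forces tasks to share processes.
    client = Client(n_workers=2)
    futures = [client.submit(regrid_variable, i, pure=False) for i in range(8)]
    print(client.gather(futures))
    client.close()
```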
xESMF does not play well with the Dask Distributed scheduler

If multiple xESMF operations run concurrently in the same process (e.g., through shared Dask workers), this can cause segfaults, corrupted results, or silent hangs.

Recommended setup: `Client(n_workers=N, threads_per_worker=1, processes=True)`. This pattern ensures stable and parallel-safe usage of xESMF.
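A minimal sketch of that recommended configuration (worker count and the commented task names are illustrative):

```python
from dask.distributed import Client

# Process-based workers with one thread each: every xESMF operation runs
# in its own process, so ESMF's global state is never shared.
client = Client(n_workers=4, threads_per_worker=1, processes=True)

# Then submit at most one regridding task per worker process, e.g.:
# futures = [client.submit(regrid_variable, v) for v in variables]
```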
Description
Changes so far
- New `CoreParameter` attributes: `dask_scheduler_type` and `dask_memory_limit` (see the sketch after this list)
  - `dask_scheduler_type` -- `"processes"` (default, for backwards compatibility) or `"distributed"`
  - `dask_memory_limit` -- `"auto"` or a specific memory limit such as `"2GB"`, `"512MB"`, etc. `"auto"` is the default
- `chunks="auto"` setting with `open_mfdataset()` to allow for a generalized chunking scheme regardless of the data size. This is necessary because E3SM Diagnostics receives input data that can vary in size and shape.

TODO:
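As a rough illustration (the paths, variable values, and diagnostic set below are hypothetical; the two Dask attributes are the ones this PR adds), a run script might configure these parameters like so:

```python
from e3sm_diags.parameter.core_parameter import CoreParameter
from e3sm_diags.run import runner

param = CoreParameter()
param.test_data_path = "/path/to/test_data"        # hypothetical paths
param.reference_data_path = "/path/to/ref_data"
param.results_dir = "/path/to/results"

# New Dask-related attributes introduced by this PR.
param.dask_scheduler_type = "distributed"  # or "processes" (default)
param.dask_memory_limit = "2GB"            # or "auto" (default)

runner.sets_to_run = ["lat_lon"]
runner.run_diags([param])
```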
Checklist
If applicable: