Description
Is your feature request related to a problem?
We've run into issues in the past where the MPI variant of `esmf` on conda-forge is not compatible with the MPI libraries installed on HPC systems (e.g., LCRC), resulting in MPI initialization errors.
The workaround in E3SM Unified is to constrain `esmf` to the `nompi` build. The limitation of this workaround is that it prevents users from using ESMF with MPI for faster regridding, and it also forces every other package in the environment to its no-MPI variant.
Related issues/PRs
- [Bug]: `e3sm_diags` fails on E3SM Unified 1.11.0rc6 with `concurrent.futures.process.BrokenProcessPool` #933 (comment)
- Investigate performance degradation with latest dependencies #986 (comment)
Describe the solution you'd like
Here's a possible solution:
- Use `subprocess` to generate regridding weights with ESMF (e.g., `ESMF_RegridWeightGen`) -- this might avoid the MPI init issue:

  ```python
  import subprocess

  subprocess.run([
      "ESMF_RegridWeightGen",
      "--source", "source_grid.nc",
      "--destination", "target_grid.nc",
      "--weight", "weights_file.nc",
      "--method", "bilinear",
      "--ignore_unmapped"
  ])
  ```

- Use xESMF to apply the pre-computed weights:

  ```python
  regridder = xesmf.Regridder(ds, ds_out, weights="weights_file.nc", reuse_weights=True)
  ```

Describe alternatives you've considered
The short-term workaround is to constrain esmf to nompi.
Additional context
Should I use subprocess instead of xESMF (ChatGPT)?
Whether you should use a subprocess call to ESMF directly instead of xESMF depends on your use case. Here's a breakdown of the trade-offs and guidance:
✅ When to Use xESMF
xESMF is a high-level, Pythonic interface built on top of ESMF_RegridWeightGen, and it is the preferred choice in most workflows, especially when:
- You're working with NetCDF, `xarray`, or climate/scientific data in Python.
- You want to integrate regridding into a Python pipeline (e.g., with Dask, Pandas, SciPy).
- You don't require very fine-grained control over parallelism (or you're okay with serial/nompi execution).
- Your deployment environment is compatible with Python+MPI, or you're fine using `esmf-nompi`.
Advantages:
- Simple API (`xesmf.Regridder`)
- Automatically generates weights and applies them
- Good for small- to medium-scale workflows
🛠 When to Use ESMF Directly via subprocess
You might want to invoke ESMF_RegridWeightGen or ESMF_Regrid directly via subprocess when:
- You want full control over the ESMF command-line interface or output formats.
- You're working in an environment where Python MPI support is broken, but native `mpirun` + ESMF binaries work.
- You need to process very large grids in parallel, and xESMF with `esmf-nompi` is too slow or runs out of memory.
- You want to precompute and store regridding weights for reuse across pipelines or applications.
- You want to avoid Conda or Python environment issues altogether.
Example subprocess call:
```python
import subprocess

subprocess.run([
    "ESMF_RegridWeightGen",
    "--source", "source_grid.nc",
    "--destination", "target_grid.nc",
    "--weight", "weights_file.nc",
    "--method", "bilinear",
    "--ignore_unmapped"
])
```
Then apply the weights manually in Python, NCO, or another tool.
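As a sketch of the "apply manually in Python" path: `ESMF_RegridWeightGen` weight files store the sparse map as 1-based `row`/`col` index arrays and weights `S`, which can be applied with SciPy. The `apply_esmf_weights` helper and the tiny two-cell example below are illustrative, not part of ESMF or xESMF:

```python
import numpy as np
import scipy.sparse as sps

def apply_esmf_weights(src_field, row, col, s, n_dst):
    """Apply ESMF-style sparse regridding weights to a flattened source field.

    ESMF weight files store 1-based destination (`row`) and source (`col`)
    indices alongside the weights (`S`); illustrative helper, not an ESMF API.
    """
    matrix = sps.coo_matrix((s, (row - 1, col - 1)),
                            shape=(n_dst, src_field.size))
    return matrix @ src_field.ravel()

# Tiny synthetic example: average two source cells into one destination cell.
row = np.array([1, 1])    # destination indices (1-based)
col = np.array([1, 2])    # source indices (1-based)
s = np.array([0.5, 0.5])  # weights
src = np.array([2.0, 4.0])
dst = apply_esmf_weights(src, row, col, s, n_dst=1)
# dst → [3.0]
```

In a real workflow the `row`/`col`/`S` arrays would be read from `weights_file.nc` instead of being built by hand.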
🔄 Hybrid Strategy (Best of Both Worlds)
A great approach can be:
- Use `subprocess` to generate regridding weights with MPI-enabled `ESMF_RegridWeightGen`, using `mpirun` or `srun` as appropriate.
- Use xESMF in Python to apply those precomputed weights:

  ```python
  regridder = xesmf.Regridder(ds, ds_out, weights="weights_file.nc", reuse_weights=True)
  ```
This avoids the MPI_Init issues entirely but still lets you stay in Python for the rest of the workflow.
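A minimal sketch of step 1, assuming a Slurm-style launcher; the `build_weightgen_cmd` helper and the `-n 8` task count are illustrative, and the correct launcher prefix is site-specific:

```python
import subprocess

def build_weightgen_cmd(src, dst, weights, method="bilinear",
                        launcher=("srun", "-n", "8")):
    """Build an MPI-launched ESMF_RegridWeightGen command line.

    `launcher` is the site-specific MPI launcher prefix (e.g. mpirun or srun);
    this helper is an illustrative sketch, not part of ESMF or xESMF.
    """
    return [
        *launcher,
        "ESMF_RegridWeightGen",
        "--source", src,
        "--destination", dst,
        "--weight", weights,
        "--method", method,
        "--ignore_unmapped",
    ]

cmd = build_weightgen_cmd("source_grid.nc", "target_grid.nc", "weights_file.nc")
# On the HPC system this would then be launched with:
# subprocess.run(cmd, check=True)
```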
✅ Summary Recommendation
| Use Case | Recommended Tool |
|---|---|
| Small/medium workflows, simple Python use | xESMF with esmf-nompi |
| Broken MPI in Python or large jobs on HPC | subprocess with ESMF_RegridWeightGen |
| Precomputed weights for speed or reproducibility | Hybrid (generate via subprocess, apply with xESMF) |
| End-to-end Python workflow without Slurm | xESMF is easier and faster |
Let me know your workflow context (e.g., grid sizes, job launcher, need for parallelism), and I can suggest a tailored setup.