Fix node count and tasks-per-node in mache.parallel #376

Merged
xylar merged 2 commits into E3SM-Project:main from xylar:fix-mache-parallel-too-many-nodes
Apr 2, 2026
Conversation

@xylar
Collaborator

@xylar xylar commented Mar 31, 2026

First, we don't want to request more nodes than the number of tasks, which is currently happening under Slurm for small jobs.

Second, we don't want to spread runs across all allocated nodes; we want to request the minimum number of nodes that will hold the run. This is because we want the option in the not-too-distant future (in Polaris at least) to run multiple tasks in parallel as resources allow.
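
To illustrate the intended arithmetic, here is a minimal sketch. It is not the actual code in `mache.parallel`; the function name `min_nodes_and_tasks_per_node` and its parameters are hypothetical, and it assumes every node has the same number of cores.

```python
def min_nodes_and_tasks_per_node(ntasks, cores_per_node, cpus_per_task=1):
    """Hypothetical helper (not mache's API): pack tasks onto as few
    nodes as possible instead of spreading them across an allocation."""
    if ntasks < 1 or cores_per_node < 1 or cpus_per_task < 1:
        raise ValueError("all arguments must be positive integers")

    # How many tasks fit on one node, given the cores each task needs.
    max_tasks_per_node = cores_per_node // cpus_per_task
    if max_tasks_per_node < 1:
        raise ValueError("a single task does not fit on one node")

    # Ceiling division: the fewest nodes that can hold all tasks.
    # Because max_tasks_per_node >= 1, this never exceeds ntasks.
    nodes = -(-ntasks // max_tasks_per_node)

    # Fill each node as fully as possible, but never ask for more
    # tasks per node than there are tasks in total.
    tasks_per_node = min(ntasks, max_tasks_per_node)
    return nodes, tasks_per_node
```

With these assumptions, a 4-task job on 128-core nodes needs 1 node with 4 tasks per node, and a 300-task job needs 3 nodes with up to 128 tasks per node, leaving any other allocated nodes idle for other work.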

Checklist

  • Tests pass and new features are covered by tests
  • Testing comment, if appropriate, in the PR documents testing used to verify the changes

xylar added 2 commits March 31, 2026 13:32
We should always be allocating the minimum number of nodes on
which the number of tasks will fit.
We want as many tasks per node as possible so idle nodes are
potentially available for other work.
@xylar xylar self-assigned this Mar 31, 2026
@xylar xylar added the bug and parallel labels Mar 31, 2026
@xylar
Collaborator Author

xylar commented Mar 31, 2026

I will test this in combination with other mache 3.3.0 changes.

@xylar
Collaborator Author

xylar commented Apr 2, 2026

Testing

With this fix, I confirmed that jobs get the expected, reduced number of nodes.

@xylar xylar merged commit 8090bbd into E3SM-Project:main Apr 2, 2026
15 of 18 checks passed