@grondo commented Nov 18, 2025

The Spindle Flux plugin imposes a 300s timeout while waiting for the `shell.init` event to be posted. This timeout could unnecessarily terminate a large job that is experiencing slow startup. The `shell.init` event is guaranteed to be posted for all jobs, and any overall timeout for hung jobs should be enforced elsewhere. Therefore, just drop the timeout in the Spindle plugin.

For more background see a similar issue in flux-coral2 fixed here: flux-framework/flux-coral2#433

Problem: There's some extraneous trailing whitespace in the
`flux-spindle.c` Flux plugin sources.

Fix the trailing whitespace.

Problem: The 300s timeout in the spindle flux plugin may cause large
jobs experiencing slow startup to fail unnecessarily.

Drop this timeout as unnecessary since the `shell.init` event is
guaranteed to come at some point or the job will be timed out in some
other manner.
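
For illustration only, the difference between a bounded and an unbounded wait can be sketched with plain POSIX `poll()`. This is not the Spindle plugin's code and does not use the Flux API: `wait_for_event()` and `post_event()` are hypothetical helpers, and a readable pipe stands in for the `shell.init` event being posted. A negative timeout blocks with no deadline, which corresponds to the behavior after this change; a positive timeout corresponds to the old 300s limit.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper (not part of Spindle or Flux): block until `fd`
 * is readable, which stands in here for "the event has been posted".
 * timeout_ms < 0 waits with no deadline; timeout_ms >= 0 fails with
 * ETIMEDOUT if nothing arrives within that many milliseconds.
 * Note: on EINTR the wait is simply restarted with the full timeout. */
static int wait_for_event (int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;

    do {
        rc = poll (&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc == 0) {
        /* Deadline expired before the event was posted */
        errno = ETIMEDOUT;
        return -1;
    }
    if (rc < 0)
        return -1;
    if (!(pfd.revents & POLLIN)) {
        /* Woken by an error or hangup rather than by data */
        errno = EIO;
        return -1;
    }
    return 0;
}

/* Hypothetical helper: "post" the event by writing a single byte */
static int post_event (int fd)
{
    char c = 1;
    return write (fd, &c, 1) == 1 ? 0 : -1;
}
```

With a pipe created by `pipe()`, calling `wait_for_event (fds[0], 10)` before anything is written fails with `ETIMEDOUT`, while `wait_for_event (fds[0], -1)` returns only once `post_event (fds[1])` has run, however long startup takes. In the unbounded case, detecting a truly hung job is left to some other mechanism.
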
@mplegendre mplegendre merged commit 4e46998 into llnl:devel Nov 18, 2025
3 checks passed