@arpit974 arpit974 commented Sep 25, 2025

Task: DWS - allow setting use_job_duration with non-exclusive partitions

Approach: place a Slurm reservation on each provisioned node, starting at the time the node's DWS end of life (EOL) is expected.

Changes:
Removed the restriction, in both documentation and code, that use_job_duration can only be used in exclusive partitions.
Added functions to create and delete Slurm reservations when use_job_duration is enabled.
Updated the provisioning logic to automatically handle Slurm reservations at node boot and termination.
Minor doc and code cleanups related to DWS Flex options.
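
A minimal sketch of what the reservation helpers described above might build, assuming the reservation starts at node boot time plus max_run_duration. The function names, the `dws-eol-` reservation naming scheme, and the choice of `Users=root` with `Duration=infinite` are illustrative assumptions, not the code merged in this PR. The sketch only constructs `scontrol` argument lists and does not execute them.

```python
from datetime import datetime, timedelta
from typing import List


def reservation_name(node: str) -> str:
    # Hypothetical naming scheme: one reservation per node.
    return f"dws-eol-{node}"


def create_reservation_cmd(
    node: str, boot_time: datetime, max_run_duration_s: int
) -> List[str]:
    """Build a `scontrol create reservation` command for one node.

    Assumption: the node's expected DWS EOL is boot time plus
    max_run_duration, and the reservation starts exactly at that moment.
    """
    eol = boot_time + timedelta(seconds=max_run_duration_s)
    return [
        "scontrol",
        "create",
        "reservation",
        f"ReservationName={reservation_name(node)}",
        f"StartTime={eol.strftime('%Y-%m-%dT%H:%M:%S')}",
        "Duration=infinite",  # assumption: held until explicitly deleted
        f"Nodes={node}",
        "Users=root",  # assumption: regular users cannot use the reservation
    ]


def delete_reservation_cmd(node: str) -> List[str]:
    """Build the matching `scontrol delete` command, e.g. at node termination."""
    return ["scontrol", "delete", f"ReservationName={reservation_name(node)}"]
```

Because the reservation covers the node from its expected EOL onward, the scheduler should only place jobs there whose time limit ends before the reservation begins.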

Expected Behavior:
Nodes are provisioned with max_run_duration passed to DWS.
Any job whose time limit ends before the DWS node expires can be scheduled on that node, provided resources are available; if they are not, a new node is provisioned.
If a job's time limit exceeds the DWS node's expiration, new node(s) are created for that job.
When no jobs are running on a node, SuspendTime takes effect and the node is deprovisioned after the configured time.
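
The placement rules above can be modeled as a small decision function. This is a simplified illustration of the expected outcome, not the scheduler's actual logic: in practice Slurm makes this decision itself by honoring the reservation. The `Node` class, the CPU-only resource check, and treating a job that ends exactly at expiry as fitting are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Node:
    name: str
    expires_at: datetime  # expected DWS EOL (start of the reservation)
    free_cpus: int        # simplified stand-in for "resources are available"


def fits(node: Node, now: datetime, time_limit: timedelta, cpus: int) -> bool:
    """A job fits if resources are free and it would finish by node expiry."""
    return node.free_cpus >= cpus and now + time_limit <= node.expires_at


def place(
    nodes: List[Node], now: datetime, time_limit: timedelta, cpus: int
) -> Optional[str]:
    """Return the first node that can host the job, or None.

    None models the case where a new node has to be provisioned.
    """
    for node in nodes:
        if fits(node, now, time_limit, cpus):
            return node.name
    return None
```

For example, with one node that expires in two hours, a one-hour job lands on it, while a three-hour job returns `None` and would trigger provisioning of a new node.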

@arpit974 arpit974 force-pushed the dws_flex_start_non_exclusive_partition branch 3 times, most recently from 99ee5fa to da76576 Compare September 25, 2025 17:33
@arpit974 arpit974 changed the title from "allowing dws partition with use_job_duration to work with non-exlusive partition as well." to "allow setting use_job_duration with non-exclusive partitions." Sep 26, 2025
@arpit974 arpit974 changed the title from "allow setting use_job_duration with non-exclusive partitions." to "Allowing setting use_job_duration with non-exclusive partitions." Sep 26, 2025
@arpit974 arpit974 force-pushed the dws_flex_start_non_exclusive_partition branch from da76576 to 3ed469e Compare September 26, 2025 14:28
@arpit974 arpit974 added the release-breaking-changes label (Prevents "smooth" re-deploy across versions) Sep 28, 2025
@arpit974 arpit974 marked this pull request as ready for review September 29, 2025 04:22
@arpit974 arpit974 requested review from a team and samskillman as code owners September 29, 2025 04:22
@arpit974 arpit974 merged commit 52c9602 into GoogleCloudPlatform:develop Sep 29, 2025
39 of 69 checks passed