
Conversation

@adamjtaylor
Collaborator

This pull request introduces a robust retry and resource scaling strategy for Nextflow processes, with special emphasis on handling Docker out-of-memory (OOM) errors. The changes improve reliability by dynamically increasing resources and retry attempts for failed tasks, and add new configuration for very memory-intensive processes.

Error handling and retry strategy improvements:

  • Added a global error strategy that retries tasks up to three times for Docker OOM errors (exit codes 125, 137, 139), and up to two times for general failures, with exponential scaling of CPU and memory resources on each retry. (nextflow.config)
  • Updated profile-specific error strategies (tower profile) to consistently handle Docker OOM errors with increased retry limits and resource scaling for different process labels (process_low, process_medium, process_high). (nextflow.config)

Resource scaling enhancements:

  • Changed CPU and memory allocation logic to use exponential scaling based on retry attempts, ensuring more resources are allocated to tasks that fail repeatedly. (nextflow.config)
  • For process_high and process_high_memory labels, increased maximum retry attempts and introduced higher memory scaling for very demanding tasks. (nextflow.config)
  • Added a new process_high_memory label for processes requiring exceptionally large memory allocations; a configuration sketch of the overall strategy follows below. (nextflow.config)
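
A minimal nextflow.config sketch of the strategy described above; the exit codes and the retry split (three attempts for OOM, two otherwise) come from the PR body, but the specific CPU and memory figures are illustrative rather than copied from the diff:

    process {
        // Retry Docker OOM-style exit codes up to three times, other
        // failures up to two, then ignore the task instead of failing
        errorStrategy = {
            if (task.exitStatus in [125, 137, 139]) {
                return task.attempt <= 3 ? 'retry' : 'ignore'
            }
            return task.attempt <= 2 ? 'retry' : 'ignore'
        }
        maxRetries = 3

        // Exponential resource scaling on each retry
        cpus   = { 2 * task.attempt }
        memory = { 4.GB * Math.pow(2, task.attempt - 1) } // 4GB, 8GB, 16GB
    }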


Copilot AI left a comment


Pull request overview

This PR refactors the Nextflow configuration to implement a robust retry and resource scaling strategy for handling Docker out-of-memory (OOM) errors across different process types. The changes introduce global default settings and enhance profile-specific configurations with exponential memory scaling.

Changes:

  • Added global error handling strategy for Docker OOM errors with retry logic and exponential resource scaling
  • Updated the tower profile with exponential memory scaling and differentiated retry logic for OOM vs. general failures
  • Introduced a new process_high_memory label for exceptionally memory-intensive processes


maxRetries = 3

// Default resource scaling
cpus = { 2 * task.attempt }

Copilot AI Jan 15, 2026


The global default CPU allocation grows unbounded with retry attempts (2, 4, 6, 8, etc.). Consider adding a Math.min() cap similar to the tower profile's process_medium and process_high labels to prevent excessive CPU allocation on later retries.

Suggested change
cpus = { 2 * task.attempt }
cpus = { Math.min(2 * task.attempt, 8) }
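
A hedged variant of the suggested cap that also honors a pipeline-wide ceiling, assuming a params.max_cpus option exists (hypothetical; not confirmed by this PR):

    // Sketch only: params.max_cpus is an assumed pipeline parameter
    cpus = { Math.min(2 * task.attempt, params.max_cpus ?: 8) }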

if (task.exitStatus in [125, 137, 139]) {
    return task.attempt <= 3 ? 'retry' : 'ignore'
}
return task.attempt <= 3 ? 'retry' : 'ignore'

Copilot AI Jan 15, 2026


The error strategy for process_medium has redundant logic: both the OOM error case and the default case return the same condition (task.attempt <= 3). Either simplify it to a single return statement without the if, or give the non-OOM case a lower retry limit to match the differentiation in the other process labels.

Suggested change
return task.attempt <= 3 ? 'retry' : 'ignore'
return task.attempt <= 2 ? 'retry' : 'ignore'
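
For reference, a sketch of a differentiated process_medium block; the retry limits follow the suggestion above, while the cpus cap is illustrative:

    withLabel: 'process_medium' {
        errorStrategy = {
            // Extra retry only for Docker OOM-style exit codes
            if (task.exitStatus in [125, 137, 139]) {
                return task.attempt <= 3 ? 'retry' : 'ignore'
            }
            return task.attempt <= 2 ? 'retry' : 'ignore'
        }
        maxRetries = 3
        cpus       = { Math.min(4 * task.attempt, 8) } // illustrative cap
    }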

errorStrategy = {task.attempt <= 3 ? 'retry' : 'ignore' }
cpus = { Math.min(8 * task.attempt, 16) }
memory = { 16.GB * Math.pow(2, task.attempt - 1) } // 16GB, 32GB, 64GB
maxRetries = 4 // One extra retry for high-memory processes

Copilot AI Jan 15, 2026


The comment states 'high-memory processes', but this block configures the process_high label, not process_high_memory. The comment should say 'high processes'; otherwise it is confusing, since process_high_memory is introduced later.

Suggested change
maxRetries = 4 // One extra retry for high-memory processes
maxRetries = 4 // One extra retry for high processes
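
Since process_high_memory is only mentioned in passing here, a speculative sketch of what such a label block might contain, extrapolating from the process_high values quoted in this hunk (the actual figures in the PR may differ):

    withLabel: 'process_high_memory' {
        // Speculative values; the real definition lives in nextflow.config
        errorStrategy = { task.attempt <= 4 ? 'retry' : 'ignore' }
        maxRetries    = 4
        cpus          = { Math.min(8 * task.attempt, 16) }
        memory        = { 64.GB * Math.pow(2, task.attempt - 1) } // 64GB, 128GB, 256GB
    }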

@aditigopalan
Collaborator

Looks like sage.config uses check_max() and slow() functions while the tower profile uses direct calculations. Is this intentional?
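
For context, the two patterns being contrasted look roughly like this; the check_max() call follows the common nf-core convention, and its use in sage.config is assumed from the comment above rather than verified:

    // nf-core-style pattern: scale with attempt, then clamp via check_max()
    memory = { check_max( 16.GB * task.attempt, 'memory' ) }

    // Direct-calculation pattern used in the tower profile
    memory = { 16.GB * Math.pow(2, task.attempt - 1) }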
