-
-
Notifications
You must be signed in to change notification settings - Fork 167
added support for custom exit codes on retry #792 #902
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it looks great! I've left a few comments.
var exitCode int | ||
var exitCodeFound bool | ||
|
||
// Try to extract exit code from different error types | ||
if exitErr, ok := execErr.(*exec.ExitError); ok { | ||
exitCode = exitErr.ExitCode() | ||
exitCodeFound = true | ||
logger.Debug(ctx, "Found ExitError", "error", execErr, "exitCode", exitCode) | ||
} else { | ||
// Try to parse exit code from error string | ||
errStr := execErr.Error() | ||
if strings.Contains(errStr, "exit status") { | ||
// Parse "exit status N" format | ||
parts := strings.Split(errStr, " ") | ||
if len(parts) > 2 { | ||
if code, err := strconv.Atoi(parts[2]); err == nil { | ||
exitCode = code | ||
exitCodeFound = true | ||
logger.Debug(ctx, "Parsed exit code from error string", "error", errStr, "exitCode", exitCode) | ||
} | ||
} | ||
} else if strings.Contains(errStr, "signal:") { | ||
// Handle signal termination | ||
exitCode = -1 | ||
exitCodeFound = true | ||
logger.Debug(ctx, "Process terminated by signal", "error", errStr) | ||
} | ||
} | ||
|
||
if !exitCodeFound { | ||
logger.Debug(ctx, "Could not determine exit code", "error", execErr, "errorType", fmt.Sprintf("%T", execErr)) | ||
// Default to exit code 1 if we can't determine the actual code | ||
exitCode = 1 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we define a separate method for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yess sure.
shouldRetry := false | ||
if len(node.retryPolicy.ExitCodes) > 0 { | ||
// If exit codes are specified, only retry for those codes | ||
for _, code := range node.retryPolicy.ExitCodes { | ||
if exitCode == code { | ||
shouldRetry = true | ||
break | ||
} | ||
} | ||
logger.Debug(ctx, "Checking retry policy", "exitCode", exitCode, "allowedCodes", node.retryPolicy.ExitCodes, "shouldRetry", shouldRetry) | ||
} else { | ||
// If no exit codes specified, retry for any non-zero exit code | ||
shouldRetry = exitCode != 0 | ||
logger.Debug(ctx, "Using default retry policy", "exitCode", exitCode, "shouldRetry", shouldRetry) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we define a ShouldRetry(exitCode int)
method on RetryPolicy
struct to handle this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will try it
@yottahmd resolved all the comments. |
if !shouldRetry { | ||
// finish the node with error | ||
node.data.SetStatus(NodeStatusError) | ||
node.data.MarkError(execErr) | ||
sc.setLastError(execErr) | ||
return false | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kriyanshii
Would it be possible to reuse the same logic in the switch statement below? In some cases, a step should be marked as successful even if it fails.
dagu/internal/digraph/scheduler/scheduler.go
Lines 203 to 214 in 2ab2fab
default: | |
// finish the node | |
node.data.SetStatus(NodeStatusError) | |
if node.shouldMarkSuccess(ctx) { | |
// mark as success if the node should be marked as success | |
// i.e. continueOn.markSuccess is set to true | |
node.data.SetStatus(NodeStatusSuccess) | |
} else { | |
node.data.MarkError(execErr) | |
sc.setLastError(execErr) | |
} | |
} |
Perhaps we could check the return value from the handleNodeRetry
fucntion and use fallthrough when the return value is false. What do you think?
Something like this:
case node.retryPolicy.Limit > node.data.GetRetryCount():
if sc.handleNodeRetry(ctx, node, execErr) {
continue ExecRepeat
}
+ fallthrough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's a good observation! We can reuse the success marking logic by using fallthrough when handleNodeRetry returns false. This would make the code more DRY (Don't Repeat Yourself) and ensure consistent behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks great! I've added some nit comments.
if !exitCodeFound { | ||
logger.Debug(ctx, "Could not determine exit code", "error", execErr, "errorType", fmt.Sprintf("%T", execErr)) | ||
// Default to exit code 1 if we can't determine the actual code | ||
exitCode = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we can initialize the exitCode with 1
at declaration. WDYT?
@kriyanshii Would you mind updating the documents to reflect this change? Thanks a lot! |
hey can you check it? i have updated docs. do i need to update anything else? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating the document and JSON schema! LGTM 🚀🚀🚀
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #902 +/- ##
==========================================
- Coverage 55.48% 55.41% -0.08%
==========================================
Files 75 75
Lines 8240 8301 +61
==========================================
+ Hits 4572 4600 +28
- Misses 3281 3308 +27
- Partials 387 393 +6
... and 1 file with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
Add Custom Exit Code Support for Retry Policy
Description
This PR adds support for specifying custom exit codes that should trigger task retries. Previously, the retry policy would retry on any non-zero exit code. Now users can specify exactly which exit codes should trigger retries, while other non-zero exit codes will cause the task to fail immediately.
Changes
exitCode
field to the retry policy schema inschemas/dag.schema.json
RetryPolicy
struct ininternal/digraph/step.go
to includeExitCodes
fieldExitCode
field toretryPolicyDef
struct ininternal/digraph/spec.go
internal/digraph/builder.go
to handle the newexitCode
fieldinternal/digraph/scheduler/scheduler.go
:Example Usage
Backward Compatibility
exitCode
is specified, the behavior remains unchanged (retries on any non-zero exit code)Testing
Benefits