Skip to content

Fix retry policy attempts to match server behavior #1414

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

Groxx
Copy link
Member

@Groxx Groxx commented Dec 26, 2024

What this changes

Apparently we've had an off-by-one error in the test suite for years now, but few noticed.

Due to this mistake, retry policy config like this:

&workflow.RetryPolicy{
  // ...
  MaximumAttempts: 3,
}

would result in up to four calls in tests, but only three in production.

The server has always considered "MaximumAttempts" to be "total number of calls" (going back to at least 2018 iirc, when I checked), so the test suite is simply wrong.
https://github.com/cadence-workflow/cadence/blob/830974dd9e49365bd974d288b21e949c48089216/service/history/execution/retry.go#L47-L51

What this means for users whose tests are now failing

Unfortunately, this means your tests have been misleading you.
Production is unchanged, your tests are just failing because they've been wrong all along, and you'll need to update them.

Sorry about that 🤦

This is especially true when MaximumAttempts: 1 is configured: this actually means zero retries, not one retry.
Ideally we would disallow MaximumAttempts: 1 because it's so misleading... but this would affect production, so we've been holding off on it. It may change in a later version with more communication though.


making the CI job happy:

Detailed Description
bugfix, covered above

Impact Analysis

  • only affects tests
  • no cross-version compatibility at all, it's intentionally "breaking" (at runtime)

Testing Plan

  • Unit Tests: all existing tests
  • Persistence Tests: n/a
  • Integration Tests: n/a
  • Compatibility Tests: n/a

Rollout Plan

  • autobots, which are self-rolling-out.

Copy link

codecov bot commented Dec 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.61%. Comparing base (1fd8ba0) to head (3e71788).

Files with missing lines Coverage Δ
internal/internal_task_handlers.go 81.79% <100.00%> (+0.17%) ⬆️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1fd8ba0...3e71788. Read the comment docs.

@Groxx Groxx changed the title [wip] Fix retry policy attempts to match server behavior Fix retry policy attempts to match server behavior May 5, 2025
@Groxx Groxx marked this pull request as ready for review May 6, 2025 01:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants