Fix retry policy attempts to match server behavior #1414
What this changes
Apparently we've had an off-by-one error in the test suite for years now, but few noticed.
Due to this mistake, a retry policy configured with `MaximumAttempts: 3` would result in up to four calls in tests, but only three in production.
The server has always considered "MaximumAttempts" to be "total number of calls" (going back to at least 2018 iirc, when I checked), so the test suite is simply wrong.
https://github.com/cadence-workflow/cadence/blob/830974dd9e49365bd974d288b21e949c48089216/service/history/execution/retry.go#L47-L51
What this means for users whose tests are now failing
Unfortunately, this means your tests have been misleading you.
Production is unchanged; your tests are only failing because they've been wrong all along, and you'll need to update them.
Sorry about that 🤦
This is especially true when `MaximumAttempts: 1` is configured: this actually means zero retries, not one retry.
Ideally we would disallow `MaximumAttempts: 1` because it's so misleading... but this would affect production, so we've been holding off on it. It may change in a later version with more communication though.
making the CI job happy:
Detailed Description
bugfix, covered above
Impact Analysis
Testing Plan
Rollout Plan