-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Phil/controller backoffs #2038
Merged
Merged
Phil/controller backoffs #2038
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Introduces a backoff for failed periodic publications, to limit the rate of attempts when the controller gets run very frequently.
Introduces a backoff error after failed auto-discover attempts. This avoids attempting auto-disocvers too frequently in cases where the controller is being run frequently for other reasons. Having an error means that controllers with failed auto-discovers will still indicate this obviously. Also fixes a bug in auto-discovers, which didn't account for specs that have auto-discovers disabled completely, and would still attempt to discover them.
Updates the dependency update controller to consistently apply the backoff after failed publications. Previously, this would re-try the publication every time the controller runs, which resulted in frequent attempts when the controller was being invoked frequently for various other reasons.
Introduces a backoff in the materialization controller when publications to add sourceCapture bindings fail. This prevents rapid retries when the controller is invoked frequently.
Fixes a typo in the runtime, which resulted in only setting `lastSourcePublishedAt` in the stats documents if there were documents loaded from the destination. This field should be present if any documents were read from source collections.
6f9abc9
to
45b71e5
Compare
jgraettinger
approved these changes
Apr 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
update_next_run(status, state, model, control_plane).await?; | ||
has_changes | ||
} | ||
let outcome = match result { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optional nit: could use map_err
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description:
Resolves #2032
Adds backoffs to controller-initiated publications and discovers, so that they don't retry them too frequently when the controller is invoked frequently for various different reasons. This ought to dramatically reduce the rate of failed publications and discovers, which are being run due to frequent shard failures, which trigger controller runs.
In all of these, the goal is to still show the controller as being in error status when it's waiting on a backoff, so that failures are still obvious when looking at the overall status.
Also fixes a bug in the runtime, which caused us to fail to set
lastSourcePublishedAt
in materialization stats documents if no documents were read from the source.This change is