docs: fatal codes, re-init, and retry policy #1818
base: main
Conversation
Signed-off-by: Todd Baert <[email protected]>
✅ Deploy Preview for polite-licorice-3db33c ready!
Summary of Changes

Hello @toddbaert, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request enhances the robustness and predictability of flagd provider behavior by formalizing how providers handle stream health, gRPC retry mechanisms, and fatal error conditions. It introduces a standardized retry policy for transient network issues and a new mechanism to recognize and react to non-transient (fatal) gRPC errors, preventing endless reconnection loops. Additionally, the documentation now explicitly outlines provider re-initialization capabilities, ensuring clearer and more reliable provider operations.
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
STALE --> NOT_READY: shutdown
ERROR --> READY: reconnected
ERROR --> [*]: shutdown
ERROR --> NOT_READY: shutdown
ERROR --> [*]: Error code == PROVIDER_FATAL
note right of STALE
note left of STALE
I had the impression that PROVIDER_FATAL can only happen during initialization, where the error can be surfaced and handled by the caller.
With the current proposal, PROVIDER_FATAL can be a result of a failing sync. As a user, it seems that I'll get the default value and an error. Am I supposed to handle this error and exit the program?
@tangenti I need to make some updates to reflect the discussion here.
We decided the best path forward is to provide an option to enumerate the status codes that a user considers FATAL. If any of those codes is received, whether on the initial connection or later, the program can exit (or rebuild a new provider). We believed this was the best trade-off between usability and complexity, and it's easy to understand: select the codes you want to consider FATAL, and take whatever action you want when they are received. By marking a code as FATAL, you are telling the provider that it represents a non-transient error state.
I will make the related updates.
I've included this.
Code Review
This pull request updates the provider specification to clarify behavior around stream health, gRPC retry policies, and fatal error codes. The changes include updating the state diagram, defining a gRPC retry policy, and introducing the concept of fatal status codes that stop reconnection attempts. The documentation is clearer as a result. I've found a few issues: an invalid JSON example for the retry policy, an inconsistency in the number of retries described, and a minor stylistic point.
While the provider is in state `STALE`, it resolves values from its cache or stored flag set rules, depending on its resolver mode.
When the time since the last disconnect first exceeds `retryGracePeriod`, the provider emits `ERROR`.
The provider attempts to reconnect indefinitely, with a maximum interval of `retryBackoffMaxMs`.
(the diff continues with a JSON `retryPolicy` example)
This is a standard `retryPolicy`, accepted in this JSON format by most gRPC implementations.
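For reference, here is a minimal sketch of what a standard gRPC `retryPolicy` service config typically looks like. The service name (`flagd.sync.v1.FlagSyncService`) and the values below are illustrative assumptions, not necessarily what this PR specifies:

```json
{
  "methodConfig": [
    {
      "name": [{ "service": "flagd.sync.v1.FlagSyncService" }],
      "retryPolicy": {
        "maxAttempts": 5,
        "initialBackoff": "1s",
        "maxBackoff": "120s",
        "backoffMultiplier": 2,
        "retryableStatusCodes": ["UNAVAILABLE"]
      }
    }
  ]
}
```

Implementations generally accept this shape as the channel's default service config (for example, via `grpc.WithDefaultServiceConfig` in grpc-go), so retries for the listed retryable codes are handled by the gRPC library itself.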
| offlineFlagSourcePath | FLAGD_OFFLINE_FLAG_SOURCE_PATH | offline, file-based flag definitions, overrides host/port/targetUri | string | null | file |
| offlinePollIntervalMs | FLAGD_OFFLINE_POLL_MS | poll interval for reading offlineFlagSourcePath | int | 5000 | file |
| contextEnricher | - | sync-metadata to evaluation context mapping function | function | identity function | in-process |
| fatalStatusCodes | - | a list of gRPC status codes, which will cause streams to give up and put the provider in a PROVIDER_FATAL state | array | [] | rpc & in-process |
This is the only new option - the other changes are just whitespace.
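As a purely hypothetical illustration of the new option (the default is an empty list, and which codes count as fatal is deployment-specific), a user might mark authentication and bad-request errors as non-transient:

```json
{
  "fatalStatusCodes": ["UNAUTHENTICATED", "PERMISSION_DENIED", "INVALID_ARGUMENT"]
}
```

The fragment above is a JSON-style rendering for readability; the option has no associated environment variable and is set programmatically. If a stream terminates with one of the listed codes, the provider stops reconnecting and enters PROVIDER_FATAL, and the application can exit or construct a new provider.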
Signed-off-by: Todd Baert <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Todd Baert <[email protected]>
Signed-off-by: Todd Baert <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Todd Baert <[email protected]>
Co-authored-by: alexandraoberaigner <[email protected]>
Signed-off-by: Todd Baert <[email protected]>
Signed-off-by: Todd Baert <[email protected]>
Force-pushed from aee9e31 to 0a8fd9c.
@aepfli @alexandraoberaigner I've made changes based on your feedback, please re-review.
This PR specifies provider behavior around stream health, gRPC retry policy, and FATAL codes.
Specifically, it: