
@toddbaert
Member

This PR specifies provider behavior around stream health, the gRPC retry policy, and FATAL status codes.

Specifically, it:

@toddbaert toddbaert requested review from a team as code owners October 30, 2025 16:23
@netlify

netlify bot commented Oct 30, 2025

Deploy Preview for polite-licorice-3db33c ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | 7cb3e07 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/polite-licorice-3db33c/deploys/692d5425c39aba0008342d0c |
| 😎 Deploy Preview | https://deploy-preview-1818--polite-licorice-3db33c.netlify.app |

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Oct 30, 2025
@gemini-code-assist
Contributor

Summary of Changes

Hello @toddbaert, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the robustness and predictability of flagd provider behavior by formalizing how providers handle stream health, gRPC retry mechanisms, and fatal error conditions. It introduces a standardized retry policy for transient network issues and a critical new feature to recognize and react to non-transient (fatal) gRPC errors, preventing endless reconnection loops. Additionally, the documentation now explicitly outlines provider re-initialization capabilities, ensuring clearer and more reliable provider operations.

Highlights

  • Standardized Retry Policy: A gRPC retry policy for the UNAVAILABLE and UNKNOWN status codes has been published, to be adopted by all flagd providers.
  • Fatal Error Handling: A new mechanism has been introduced to mark certain gRPC status codes as FATAL, which will cause providers to cease reconnection attempts for non-transient errors.
  • Provider Re-initialization: The state diagram has been updated to explicitly clarify that flagd providers should support re-initialization, provided they are not in a FATAL state.

Comment on lines +69 to +74

```
STALE --> NOT_READY: shutdown
ERROR --> READY: reconnected
ERROR --> [*]: shutdown
ERROR --> NOT_READY: shutdown
ERROR --> [*]: Error code == PROVIDER_FATAL
note right of STALE
note left of STALE
```
Member Author

@toddbaert Oct 30, 2025


old:

[Image: previous state diagram]

new:

[Image: updated state diagram]

The main difference is that we make it clear that transitions are possible from a non-fatal ERROR state back to NOT_READY. Many implementations already support this, but not all.
I think it makes sense to specify this so we can be consistent.
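To make the re-initialization rule concrete, here is a minimal Python sketch of the provider states as a transition table. The event names (`initialized`, `disconnected`, `grace_period_exceeded`, `fatal_status_code`, etc.) are hypothetical; only the edges marked "from the excerpt" come from the diagram diff, and the rest are assumptions added so the example is complete. It is an illustration, not any provider's actual implementation.

```python
from enum import Enum


class ProviderState(Enum):
    NOT_READY = "NOT_READY"
    READY = "READY"
    STALE = "STALE"
    ERROR = "ERROR"
    FATAL = "FATAL"


S = ProviderState

# (current state, hypothetical event name) -> next state.
TRANSITIONS = {
    (S.NOT_READY, "initialized"): S.READY,        # assumed
    (S.READY, "disconnected"): S.STALE,           # assumed
    (S.STALE, "grace_period_exceeded"): S.ERROR,  # assumed
    (S.STALE, "shutdown"): S.NOT_READY,           # from the excerpt
    (S.ERROR, "reconnected"): S.READY,            # from the excerpt
    (S.ERROR, "shutdown"): S.NOT_READY,           # from the excerpt
    (S.ERROR, "fatal_status_code"): S.FATAL,      # from the excerpt (PROVIDER_FATAL)
}


def next_state(state: ProviderState, event: str) -> ProviderState:
    """Apply one event; FATAL is terminal and accepts no further events."""
    if state is S.FATAL:
        raise RuntimeError("provider is FATAL; create a new provider instead")
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.name} on {event!r}") from None


# Re-initialization after a non-fatal ERROR: shut down, then initialize again.
state = next_state(S.ERROR, "shutdown")
state = next_state(state, "initialized")
```

The key design point is that only FATAL is terminal: a non-fatal ERROR can always route back through NOT_READY, so calling initialize again is legal.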

Contributor


I had the impression that PROVIDER_FATAL can only happen during initialization, where the error can be surfaced and handled by the caller.

With the current proposal, PROVIDER_FATAL can be a result of a failing sync. As a user, it seems that I'll get the default value and an error. Am I supposed to handle this error and exit the program?

Member Author

@toddbaert Dec 8, 2025


@tangenti I need to make some updates to reflect the discussion here.

We decided the best path forward is to provide an option to enumerate the status codes that a user considers FATAL. If one of those codes is received, whether on the initial connection or later, the program can exit (or build a new provider). We believe this is the best trade-off between usability and complexity, and it's easy to understand: select the codes you want to treat as FATAL, and take whatever action you want when they are received. By marking a code as FATAL, you are telling the provider that this code represents a non-transient error state.
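The decision above can be sketched as a stream loop that reconnects after every failure unless the status code is in the user's FATAL set. This is a minimal Python simulation under stated assumptions: `watch_stream` and its parameters are hypothetical, the list of status-code names stands in for a real gRPC stream, and the emitted event is modelled on OpenFeature's `PROVIDER_ERROR` event carrying a `PROVIDER_FATAL` error code.

```python
from typing import Callable, Iterable, Optional, Tuple


def watch_stream(
    stream_results: Iterable[str],
    fatal_status_codes: frozenset,
    emit: Callable[[tuple], None],
) -> Tuple[int, Optional[str]]:
    """Simulate a provider's stream loop against user-selected FATAL codes.

    Each item in `stream_results` is the gRPC status code name that one
    stream attempt ended with (a stand-in for a real gRPC stream).
    Returns (attempts made, the FATAL code received or None).
    """
    attempts = 0
    for code in stream_results:
        attempts += 1
        if code in fatal_status_codes:
            # The user marked this code as non-transient: stop reconnecting,
            # whether or not this was the initial connection.
            emit(("PROVIDER_ERROR", "PROVIDER_FATAL"))
            return attempts, code
        # Any other code is treated as transient and the loop reconnects
        # (a real provider would back off before the next attempt).
    return attempts, None


# Application side (sketch): collect events; a real handler might exit
# the process or build a new provider when it sees PROVIDER_FATAL.
received = []
attempts, fatal_code = watch_stream(
    ["UNAVAILABLE", "UNKNOWN", "PERMISSION_DENIED", "UNAVAILABLE"],
    frozenset({"PERMISSION_DENIED", "UNAUTHENTICATED"}),
    received.append,
)
```

With an empty FATAL set (the default), the same loop would simply keep reconnecting, which matches the existing "reconnect indefinitely" behavior.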

I will make the related updates.

Member Author


I've included this.

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request updates the provider specification to clarify behavior around stream health, gRPC retry policies, and fatal error codes. The changes include updating the state diagram, defining a gRPC retry policy, and introducing the concept of fatal status codes that stop reconnection attempts. The documentation is clearer as a result. I've found a few issues: an invalid JSON example for the retry policy, an inconsistency in the number of retries described, and a minor stylistic point.

While the provider is in the `STALE` state, it resolves values from its cache or stored flag set rules, depending on its resolver mode.
When the time since the last disconnect first exceeds `retryGracePeriod`, the provider emits `ERROR`.
The provider attempts to reconnect indefinitely, with a maximum interval of `retryBackoffMaxMs`.
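The timing rules above (emit `ERROR` once when the disconnect first outlasts `retryGracePeriod`; keep reconnecting with delays capped at `retryBackoffMaxMs`) can be sketched in Python as follows. The doubling backoff, the `initial_ms` starting delay, and the `GracePeriodTracker` class are assumptions for illustration; the spec excerpt only fixes the cap and the grace-period trigger. Grace-period values just need to use the same time unit as the elapsed time passed in.

```python
from typing import Optional


def backoff_ms(attempt: int, initial_ms: int, retry_backoff_max_ms: int) -> int:
    """Delay before reconnect attempt `attempt` (0-based): doubling, capped at the max."""
    return min(initial_ms * 2 ** attempt, retry_backoff_max_ms)


class GracePeriodTracker:
    """Emits ERROR once, the first time a disconnect outlasts the grace period."""

    def __init__(self, retry_grace_period: float):
        self.retry_grace_period = retry_grace_period
        self.error_emitted = False

    def check(self, time_since_disconnect: float) -> Optional[str]:
        if not self.error_emitted and time_since_disconnect > self.retry_grace_period:
            self.error_emitted = True
            return "ERROR"
        return None  # still STALE, or ERROR was already emitted

    def on_reconnected(self) -> None:
        # A later disconnect starts a fresh grace period.
        self.error_emitted = False


delays = [backoff_ms(n, 1000, 5000) for n in range(5)]
```

Capping the delay rather than the number of attempts is what lets the provider retry indefinitely without the interval growing unbounded.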
[JSON retry policy example]
Member Author


This is the standard gRPC `retryPolicy`, and most gRPC implementations accept it in this JSON format.
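Since the review flagged the JSON example as invalid, one way to guarantee well-formed output is to build the service config programmatically. This Python sketch emits a gRPC service config whose `retryPolicy` retries `UNAVAILABLE` and `UNKNOWN`. The attempt count and backoff values are illustrative assumptions, not the numbers from the spec, and the two flagd service names are assumed for the example.

```python
import json

# Illustrative values only: the spec's actual attempt count and backoff
# settings are not reproduced here.
RETRY_POLICY = {
    "maxAttempts": 3,
    "initialBackoff": "1s",
    "maxBackoff": "5s",
    "backoffMultiplier": 2.0,
    "retryableStatusCodes": ["UNAVAILABLE", "UNKNOWN"],
}


def build_service_config(services):
    """Wrap the retry policy in a gRPC service config covering `services`."""
    return {
        "methodConfig": [
            {
                # A name entry with only "service" applies to all its methods.
                "name": [{"service": s} for s in services],
                "retryPolicy": RETRY_POLICY,
            }
        ]
    }


# Service names are assumptions for this sketch.
config_json = json.dumps(
    build_service_config(["flagd.evaluation.v1.Service", "flagd.sync.v1.FlagSyncService"]),
    indent=2,
)
print(config_json)
```

With grpcio, a string like this can be supplied through the `grpc.service_config` channel option; whether retries must also be switched on explicitly depends on the gRPC version in use.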

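| Option name | Environment variable | Explanation | Type | Default | Compatible resolver |
|---|---|---|---|---|---|
| offlineFlagSourcePath | FLAGD_OFFLINE_FLAG_SOURCE_PATH | offline, file-based flag definitions, overrides host/port/targetUri | string | null | file |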
| offlinePollIntervalMs | FLAGD_OFFLINE_POLL_MS | poll interval for reading offlineFlagSourcePath | int | 5000 | file |
| contextEnricher | - | sync-metadata to evaluation context mapping function | function | identity function | in-process |
| fatalStatusCodes | - | a list of gRPC status codes, which will cause streams to give up and put the provider in a PROVIDER_FATAL state | array | [] | rpc & in-process |
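A provider exposing `fatalStatusCodes` will probably want to reject typos early, since a misspelled code would silently never match. The Python sketch below validates the list against the 17 canonical gRPC status code names. `ProviderOptions` and its case-insensitive normalization are hypothetical design choices; only the option's meaning and its `[]` default come from the table.

```python
from dataclasses import dataclass, field
from typing import List

# The 17 canonical gRPC status code names.
GRPC_STATUS_NAMES = frozenset({
    "OK", "CANCELLED", "UNKNOWN", "INVALID_ARGUMENT", "DEADLINE_EXCEEDED",
    "NOT_FOUND", "ALREADY_EXISTS", "PERMISSION_DENIED", "RESOURCE_EXHAUSTED",
    "FAILED_PRECONDITION", "ABORTED", "OUT_OF_RANGE", "UNIMPLEMENTED",
    "INTERNAL", "UNAVAILABLE", "DATA_LOSS", "UNAUTHENTICATED",
})


@dataclass
class ProviderOptions:
    """Hypothetical options object; only fatal_status_codes mirrors the table."""

    # Default [] per the table: no code is FATAL, so streams keep reconnecting.
    fatal_status_codes: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        normalized = [code.upper() for code in self.fatal_status_codes]
        unknown = sorted(set(normalized) - GRPC_STATUS_NAMES)
        if unknown:
            raise ValueError(f"unknown gRPC status code(s): {unknown}")
        self.fatal_status_codes = normalized
```

For example, `ProviderOptions(["unauthenticated", "PERMISSION_DENIED"])` is accepted and normalized, while `ProviderOptions(["UNAUTHORIZED"])` raises because that is not a gRPC status name.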
Member Author


This is the only new option; the other changes are whitespace only.

toddbaert and others added 4 commits October 30, 2025 12:32
Signed-off-by: Todd Baert <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Todd Baert <[email protected]>
Signed-off-by: Todd Baert <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Todd Baert <[email protected]>
Co-authored-by: alexandraoberaigner <[email protected]>
Signed-off-by: Todd Baert <[email protected]>
Signed-off-by: Todd Baert <[email protected]>
@toddbaert force-pushed the docs/provider-spec-updates branch from aee9e31 to 0a8fd9c on November 27, 2025 19:04
@toddbaert
Member Author

@aepfli @alexandraoberaigner I made changes based on your feedback; please re-review.

@sonarqubecloud

sonarqubecloud bot commented Dec 1, 2025


5 participants