docs: fatal codes, re-init, and retry policy #1818

toddbaert · 2025-10-30T16:23:06Z

This PR specifies some provider behavior, specifically around stream health, gRPC retry policy, and FATAL codes.

In particular, it:

- publishes a retry policy that shall be used by all flagd providers
- specifies a new option for marking some gRPC status codes as FATAL, which causes the provider to stop attempting to reconnect (generally useful, and requested in "Infinite retry to establish connection to FlagSyncService in Flagd golang provider", go-sdk-contrib#756)
- makes clear via the state diagram that flagd providers should support re-initialization (if not in a FATAL state)

Signed-off-by: Todd Baert <[email protected]>

netlify · 2025-10-30T16:23:12Z

✅ Deploy Preview for polite-licorice-3db33c ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | `7cb3e07` |
| 🔍 Latest deploy log | https://app.netlify.com/projects/polite-licorice-3db33c/deploys/692d5425c39aba0008342d0c |
| 😎 Deploy Preview | https://deploy-preview-1818--polite-licorice-3db33c.netlify.app |

gemini-code-assist · 2025-10-30T16:23:23Z

Summary of Changes

Hello @toddbaert, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the robustness and predictability of flagd provider behavior by formalizing how providers handle stream health, gRPC retry mechanisms, and fatal error conditions. It introduces a standardized retry policy for transient network issues and a critical new feature to recognize and react to non-transient (fatal) gRPC errors, preventing endless reconnection loops. Additionally, the documentation now explicitly outlines provider re-initialization capabilities, ensuring clearer and more reliable provider operations.

Highlights

- Standardized Retry Policy: A clear gRPC retry policy has been published, to be adopted by all flagd providers, specifically for UNAVAILABLE and UNKNOWN status codes.
- Fatal Error Handling: A new mechanism has been introduced to mark certain gRPC status codes as FATAL, which will cause providers to cease reconnection attempts for non-transient errors.
- Provider Re-initialization: The state diagram has been updated to explicitly clarify that flagd providers should support re-initialization, provided they are not in a FATAL state.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

toddbaert · 2025-10-30T16:25:21Z

docs/reference/specifications/providers.md

+    STALE --> NOT_READY: shutdown
    ERROR --> READY: reconnected
-    ERROR --> [*]: shutdown
+    ERROR --> NOT_READY: shutdown
+    ERROR --> [*]: Error code == PROVIDER_FATAL

-    note right of STALE
+    note left of STALE


old: (state diagram image)

new: (state diagram image)

The main difference is that we make it clear that transitions are possible from a non-fatal ERROR back to NOT_READY. Many implementations already support this, but not all.
I think it makes sense to specify this so we can be consistent.
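The transitions in the diff above can be sketched as a small lookup table. This is only an illustration in Python, not code from any flagd provider: it covers just the four transitions visible in the excerpt (the full diagram has more), and the `FATAL` member, the event names, and both helper functions are hypothetical names chosen for the example.

```python
from enum import Enum


class State(Enum):
    NOT_READY = "NOT_READY"
    READY = "READY"
    STALE = "STALE"
    ERROR = "ERROR"
    FATAL = "FATAL"  # hypothetical label for the terminal [*] node


# Only the transitions visible in the diff excerpt; event names are assumptions.
TRANSITIONS = {
    (State.STALE, "shutdown"): State.NOT_READY,
    (State.ERROR, "reconnected"): State.READY,
    (State.ERROR, "shutdown"): State.NOT_READY,
    (State.ERROR, "provider_fatal"): State.FATAL,
}


def next_state(state, event):
    """Return the state after `event`, or raise if the excerpt defines no such transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.name} on {event!r}") from None


def can_reinitialize(state):
    """Re-initialization is allowed from any state except the terminal one."""
    return state is not State.FATAL
```

With this table, a shutdown from a non-fatal ERROR lands in NOT_READY, from which the provider can be initialized again, while a fatal error ends in a state that cannot be left.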

tangenti · 2025-12-03

I had the impression that PROVIDER_FATAL can only happen during initialization, where the error can be surfaced and handled by the caller.

With the current proposal, PROVIDER_FATAL can be a result of a failing sync. As a user, it seems that I'll get the default value and an error. Am I supposed to handle this error and exit the program?

toddbaert · 2025-12-08

@tangenti I need to make some updates to reflect the discussion here.

We decided the best path forward is to provide an option to enumerate the status codes that a user considers FATAL. If one of those codes is received, whether on the initial connection or not, the program can exit (or build a new provider). We believe this is the best trade-off between usability and complexity, and it's easy to understand: select what you want to consider FATAL, and take the action you want when those codes are received. By marking a code as FATAL, you are telling the provider that this code represents a non-transient error state.

I will make the related updates.

toddbaert · 2025-12-11

I've included this.
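The option discussed in this thread could behave roughly as follows. This is a minimal Python sketch under stated assumptions, not the behavior of any existing provider: representing codes as upper-case name strings, the `parse_fatal_status_codes` and `on_stream_error` helpers, and the `"RETRY"` label are all hypothetical. The set of code names is the standard gRPC status code list.

```python
# Status code names defined by gRPC.
GRPC_STATUS_CODES = frozenset({
    "OK", "CANCELLED", "UNKNOWN", "INVALID_ARGUMENT", "DEADLINE_EXCEEDED",
    "NOT_FOUND", "ALREADY_EXISTS", "PERMISSION_DENIED", "RESOURCE_EXHAUSTED",
    "FAILED_PRECONDITION", "ABORTED", "OUT_OF_RANGE", "UNIMPLEMENTED",
    "INTERNAL", "UNAVAILABLE", "DATA_LOSS", "UNAUTHENTICATED",
})


def parse_fatal_status_codes(raw):
    """Normalize user-supplied code names and reject names gRPC does not define."""
    codes = {name.strip().upper() for name in raw}
    unknown = codes - GRPC_STATUS_CODES
    if unknown:
        raise ValueError(f"unknown gRPC status codes: {sorted(unknown)}")
    return frozenset(codes)


def on_stream_error(status_code, fatal_codes):
    """Decide what happens when a stream ends with `status_code`."""
    if status_code in fatal_codes:
        # The user marked this code as non-transient: stop reconnecting.
        return "PROVIDER_FATAL"
    # Hypothetical label: treat the error as transient and keep reconnecting.
    return "RETRY"
```

An empty list (the default in the options table) means no code is ever treated as fatal, so the provider keeps reconnecting as before. The application decides what to do on `PROVIDER_FATAL`, for example exiting or building a new provider.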

gemini-code-assist

Code Review

This pull request updates the provider specification to clarify behavior around stream health, gRPC retry policies, and fatal error codes. The changes include updating the state diagram, defining a gRPC retry policy, and introducing the concept of fatal status codes that stop reconnection attempts. The documentation is clearer as a result. I've found a few issues: an invalid JSON example for the retry policy, an inconsistency in the number of retries described, and a minor stylistic point.

toddbaert · 2025-10-30T16:27:20Z

docs/reference/specifications/providers.md

-While the provider is in state `STALE` the provider resolves values from its cache or stored flag set rules, depending on its resolver mode.
-When the time since the last disconnect first exceeds `retryGracePeriod`, the provider emits `ERROR`.
-The provider attempts to reconnect indefinitely, with a maximum interval of `retryBackoffMaxMs`.
+```json


This is a standard gRPC `retryPolicy`, accepted in this JSON format by most gRPC implementations.
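For readers unfamiliar with the format, the following Python sketch builds a gRPC service config containing a `retryPolicy`. The field names are the standard gRPC service config keys, and the retryable codes (UNAVAILABLE, UNKNOWN) are the ones named in this PR's summary. Everything else is an assumption made for illustration: the attempt count, the backoff values, the `build_service_config` helper, and the service name passed to it are not taken from the PR.

```python
import json


def build_service_config(services, max_attempts=3, initial_backoff="1s",
                         max_backoff="5s", backoff_multiplier=2.0):
    """Build a gRPC service config dict with one retryPolicy for `services`.

    All default values here are illustrative, not the values in the spec.
    """
    return {
        "methodConfig": [
            {
                # Apply the policy to every method of the listed services.
                "name": [{"service": service} for service in services],
                "retryPolicy": {
                    "maxAttempts": max_attempts,
                    "initialBackoff": initial_backoff,
                    "maxBackoff": max_backoff,
                    "backoffMultiplier": backoff_multiplier,
                    "retryableStatusCodes": ["UNAVAILABLE", "UNKNOWN"],
                },
            }
        ]
    }


# The service name is an assumption for the example.
config_json = json.dumps(build_service_config(["flagd.sync.v1.FlagSyncService"]))
```

In gRPC Python, a string like `config_json` is typically passed to the channel through the `grpc.service_config` channel option; other gRPC implementations accept the same JSON through their own configuration mechanisms.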

toddbaert · 2025-10-30T16:27:43Z

docs/reference/specifications/providers.md

+| offlineFlagSourcePath | FLAGD_OFFLINE_FLAG_SOURCE_PATH | offline, file-based flag definitions, overrides host/port/targetUri                                             | string                       | null                          | file                    |
+| offlinePollIntervalMs | FLAGD_OFFLINE_POLL_MS          | poll interval for reading offlineFlagSourcePath                                                                 | int                          | 5000                          | file                    |
+| contextEnricher       | -                              | sync-metadata to evaluation context mapping function                                                            | function                     | identity function             | in-process              |
+| fatalStatusCodes      | -                              | a list of gRPC status codes, which will cause streams to give up and put the provider in a PROVIDER_FATAL state | array                        | []                            | rpc & in-process        |


This is the only new option - the other changes are just whitespace.

toddbaert · 2025-11-27T19:06:26Z

@aepfli @alexandraoberaigner made changes from your feedback, plz re-review.

sonarqubecloud · 2025-12-01T08:39:42Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

docs: fatal codes, re-init, and retry policy

b4cc836

Signed-off-by: Todd Baert <[email protected]>

toddbaert requested review from a team as code owners October 30, 2025 16:23

dosubot bot added the size:L label (This PR changes 100-499 lines, ignoring generated files.) Oct 30, 2025

toddbaert commented Oct 30, 2025

gemini-code-assist bot reviewed Oct 30, 2025

toddbaert commented Oct 30, 2025

toddbaert and others added 4 commits October 30, 2025 12:32

fixup: json

8a0b6f1

Signed-off-by: Todd Baert <[email protected]>

Update docs/reference/specifications/providers.md

f749674

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Todd Baert <[email protected]>

fixup: typo

18363a9

Signed-off-by: Todd Baert <[email protected]>

Update docs/reference/specifications/providers.md

48a46ea

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Todd Baert <[email protected]>

aepfli reviewed Oct 31, 2025

aepfli reviewed Oct 31, 2025

alexandraoberaigner reviewed Nov 10, 2025

aepfli mentioned this pull request Nov 11, 2025

feat: add missing steps for config and improve wording open-feature/flagd-testbed#311

Merged

alexandraoberaigner mentioned this pull request Nov 17, 2025

Infinite retry to establish connection to FlagSyncService in Flagd golang provider open-feature/go-sdk-contrib#756

Closed

Apply suggestion from @alexandraoberaigner

393eaf0

Co-authored-by: alexandraoberaigner <[email protected]>
Signed-off-by: Todd Baert <[email protected]>

toddbaert requested review from aepfli and alexandraoberaigner November 27, 2025 19:01

fixup: pr review changes

0a8fd9c

Signed-off-by: Todd Baert <[email protected]>

toddbaert force-pushed the docs/provider-spec-updates branch from aee9e31 to 0a8fd9c Compare November 27, 2025 19:04

aepfli approved these changes Dec 1, 2025

Merge branch 'main' into docs/provider-spec-updates

7cb3e07

toddbaert requested a review from tangenti December 8, 2025 13:45

toddbaert mentioned this pull request Dec 11, 2025

[flagd-provider] fix non-conformant config options open-feature/js-sdk-contrib#533

Closed

toddbaert mentioned this pull request Dec 11, 2025

[flagd] add FATAL status codes option open-feature/js-sdk-contrib#1423

Open
