Commit b4cc836

docs: fatal codes, re-init, and retry policy
Signed-off-by: Todd Baert <[email protected]>
1 parent 8396f0d commit b4cc836

1 file changed

+68
-42
lines changed

docs/reference/specifications/providers.md

Lines changed: 68 additions & 42 deletions
@@ -64,18 +64,21 @@ stateDiagram-v2
6464
NOT_READY --> ERROR: initialize
6565
READY --> ERROR: disconnected, disconnected period == 0
6666
READY --> STALE: disconnected, disconnect period < retry grace period
67+
READY --> NOT_READY: shutdown
6768
STALE --> ERROR: disconnect period >= retry grace period
69+
STALE --> NOT_READY: shutdown
6870
ERROR --> READY: reconnected
69-
ERROR --> [*]: shutdown
71+
ERROR --> NOT_READY: shutdown
72+
ERROR --> [*]: Error code == PROVIDER_FATAL
7073
71-
note right of STALE
74+
note left of STALE
7275
stream disconnected, attempting to reconnect,
7376
resolve from cache*
7477
resolve from flag set rules**
7578
STALE emitted
7679
end note
7780
78-
note right of READY
81+
note left of READY
7982
stream connected,
8083
evaluation cache active*,
8184
flag set rules stored**,
@@ -84,7 +87,7 @@ stateDiagram-v2
8487
CHANGE emitted with stream messages
8588
end note
8689
87-
note right of ERROR
90+
note left of ERROR
8891
stream disconnected, attempting to reconnect,
8992
evaluation cache purged*,
9093
ERROR emitted
@@ -101,25 +104,47 @@ stateDiagram-v2
101104

102105
### Stream Reconnection
103106

104-
When either stream (sync or event) disconnects, whether due to the associated deadline being exceeded, network error or any other cause, the provider attempts to re-establish the stream immediately, and then retries with an exponential back-off.
105-
We always rely on the [integrated functionality of GRPC for reconnection](https://github.com/grpc/grpc/blob/master/doc/connection-backoff.md) and utilize [Wait-for-Ready](https://grpc.io/docs/guides/wait-for-ready/) to re-establish the stream.
106-
We are configuring the underlying reconnection mechanism whenever we can, based on our configuration. (not all GRPC implementations support this)
107+
When either stream (sync or event) disconnects, whether due to the associated deadline being exceeded, a network error, or any other cause, the provider attempts to re-establish the stream immediately.
108+
Both the RPC and sync streams attempt to reconnect indefinitely unless the stream response indicates a [fatal status code](#fatal-status-codes).
109+
This is distinct from the [gRPC retry policy](#grpc-retry-policy), which automatically retries *all RPCs* (streams or otherwise) a limited number of times to make the provider resilient to transient errors.
107110

108-
| language/property | min connect timeout | max backoff | initial backoff | jitter | multiplier |
109-
|-------------------|-----------------------------------|--------------------------|--------------------------|--------|------------|
110-
| GRPC property | grpc.initial_reconnect_backoff_ms | max_reconnect_backoff_ms | min_reconnect_backoff_ms | 0.2 | 1.6 |
111-
| Flagd property | deadlineMs | retryBackoffMaxMs | retryBackoffMs | 0.2 | 1.6 |
112-
| --- | --- | --- | --- | --- | --- |
113-
| default [^1] |||| 0.2 | 1.6 |
114-
| js |||| 0.2 | 1.6 |
115-
| java |||| 0.2 | 1.6 |
111+
## gRPC Retry Policy
116112

117-
[^1] : C++, Python, Ruby, Objective-C, PHP, C#, js(deprecated)
113+
flagd leverages gRPC's built-in retry mechanism for all RPCs.
114+
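In short, the retry policy retries all RPCs which return `UNAVAILABLE` or `UNKNOWN` status codes, making up to 3 attempts per RPC (gRPC's `maxAttempts` counts the original call), with a nominal backoff that starts at 1s and doubles before each further attempt.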
115+
No other status codes are retried.
116+
The flagd gRPC retry policy is specified below:
118117

119-
When disconnected, if the time since disconnection is less than `retryGracePeriod`, the provider emits `STALE` when it disconnects.
120-
While the provider is in state `STALE` the provider resolves values from its cache or stored flag set rules, depending on its resolver mode.
121-
When the time since the last disconnect first exceeds `retryGracePeriod`, the provider emits `ERROR`.
122-
The provider attempts to reconnect indefinitely, with a maximum interval of `retryBackoffMaxMs`.
118+
```json
119+
{
120+
"methodConfig": [
121+
{
122+
"name": [
123+
{ "service": "flagd.sync.v1.FlagSyncService" },
124+
{
125+
"service": "flagd.evaluation.v1.Service"
126+
}
127+
],
128+
"retryPolicy": {
129+
"maxAttempts": 3,
130+
"initialBackoff": "1s",
131+
"maxBackoff": $FLAGD_RETRY_BACKOFF_MAX_MS, // from provider options
132+
"backoffMultiplier": 2.0,
133+
"retryableStatusCodes": [
134+
"UNAVAILABLE",
135+
"UNKNOWN"
136+
]
137+
}
138+
}
139+
]
140+
}
141+
```
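
As an illustration, the policy above can be written as a well-formed gRPC service config. This is a minimal sketch under stated assumptions, not the providers' actual implementation: the helper names `build_retry_service_config` and `nominal_backoffs` are hypothetical, the keys use the lowerCamelCase spelling and one `name` entry per service that the gRPC service config format expects, and `maxBackoff` is derived from the `retryBackoffMaxMs` option. gRPC randomizes the real delays, so the computed schedule is only nominal.

```python
import json


def build_retry_service_config(retry_backoff_max_ms: int = 120_000) -> dict:
    """Build a gRPC service config mirroring the retry policy above (hypothetical helper)."""
    seconds, millis = divmod(retry_backoff_max_ms, 1000)
    return {
        "methodConfig": [
            {
                # One entry per service; omitting "method" matches every method of it.
                "name": [
                    {"service": "flagd.sync.v1.FlagSyncService"},
                    {"service": "flagd.evaluation.v1.Service"},
                ],
                "retryPolicy": {
                    "maxAttempts": 3,  # counts the original call
                    "initialBackoff": "1s",
                    "maxBackoff": f"{seconds}.{millis:03d}s",
                    "backoffMultiplier": 2.0,
                    "retryableStatusCodes": ["UNAVAILABLE", "UNKNOWN"],
                },
            }
        ]
    }


def nominal_backoffs(policy: dict) -> list[float]:
    """Nominal (un-jittered) delay in seconds before each retry attempt."""
    initial = float(policy["initialBackoff"].rstrip("s"))
    maximum = float(policy["maxBackoff"].rstrip("s"))
    retries = policy["maxAttempts"] - 1  # the first attempt is not a retry
    return [min(initial * policy["backoffMultiplier"] ** n, maximum) for n in range(retries)]


if __name__ == "__main__":
    config = build_retry_service_config()
    print(json.dumps(config, indent=2))
    print(nominal_backoffs(config["methodConfig"][0]["retryPolicy"]))
```

In gRPC Python, a JSON string like this is typically supplied through the `grpc.service_config` channel option; other language implementations expose their own mechanism.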
142+
143+
## Fatal Status Codes
144+
145+
Providers accept an option for defining fatal gRPC status codes which, when received in the RPC or sync streams, transition the provider to the `PROVIDER_FATAL` state.
146+
This configuration is useful for situations wherein these codes indicate to a client that their configuration is invalid and must be changed (i.e., the error is non-transient).
147+
Examples include status codes such as `UNAUTHENTICATED` or `PERMISSION_DENIED`.
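
The interaction between indefinite stream reconnection and fatal status codes can be sketched as a loop. This is a minimal illustration under stated assumptions: `StreamError`, `run_stream`, and `open_stream` are hypothetical names rather than provider APIs, the parameters mirror the `fatalStatusCodes`, `retryBackoffMs`, and `retryBackoffMaxMs` options, and doubling the delay between attempts is an assumption made for the example.

```python
import time
from typing import Callable, Iterable


class StreamError(Exception):
    """Hypothetical error carrying the gRPC status code name of a failed stream."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


def run_stream(
    open_stream: Callable[[], None],
    fatal_status_codes: Iterable[str] = (),
    retry_backoff_ms: int = 1000,
    retry_backoff_max_ms: int = 120_000,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Reconnect until a fatal status code is received, then return that code."""
    fatal = set(fatal_status_codes)
    backoff_ms = retry_backoff_ms
    while True:
        try:
            open_stream()  # blocks for as long as the stream is healthy
            backoff_ms = retry_backoff_ms  # clean end: reconnect immediately, reset backoff
        except StreamError as err:
            if err.code in fatal:
                return err.code  # non-transient error: stop reconnecting
            sleep(backoff_ms / 1000)
            backoff_ms = min(backoff_ms * 2, retry_backoff_max_ms)  # assumed doubling


if __name__ == "__main__":
    # Simulated stream: two transient failures, then a fatal one.
    codes = iter(["UNAVAILABLE", "UNAVAILABLE", "PERMISSION_DENIED"])

    def failing_stream() -> None:
        raise StreamError(next(codes))

    waits: list[float] = []
    result = run_stream(
        failing_stream,
        fatal_status_codes=["UNAUTHENTICATED", "PERMISSION_DENIED"],
        sleep=waits.append,  # record delays instead of sleeping
    )
    print(result, waits)
```

With an empty `fatal_status_codes` list (the default), the loop never gives up, which matches the indefinite reconnection described for streams.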
123148

124149
## RPC Resolver
125150

@@ -262,28 +287,29 @@ precedence.
262287

263288
Below are the supported configuration parameters (note that not all apply to both resolver modes):
264289

265-
| Option name | Environment variable name | Explanation | Type & Values | Default | Compatible resolver |
266-
| --------------------- | ------------------------------ | ---------------------------------------------------------------------- | ---------------------------- | ----------------------------- | ----------------------- |
267-
| resolver | FLAGD_RESOLVER | mode of operation | String - `rpc`, `in-process` | rpc | rpc & in-process |
268-
| host | FLAGD_HOST | remote host | String | localhost | rpc & in-process |
269-
| port | FLAGD_PORT | remote port | int | 8013 (rpc), 8015 (in-process) | rpc & in-process |
270-
| targetUri | FLAGD_TARGET_URI | alternative to host/port, supporting custom name resolution | string | null | rpc & in-process |
271-
| tls | FLAGD_TLS | connection encryption | boolean | false | rpc & in-process |
272-
| socketPath | FLAGD_SOCKET_PATH | alternative to host port, unix socket | String | null | rpc & in-process |
273-
| certPath | FLAGD_SERVER_CERT_PATH | tls cert path | String | null | rpc & in-process |
274-
| deadlineMs | FLAGD_DEADLINE_MS | deadline for unary calls, and timeout for initialization | int | 500 | rpc & in-process & file |
275-
| streamDeadlineMs | FLAGD_STREAM_DEADLINE_MS | deadline for streaming calls, useful as an application-layer keepalive | int | 600000 | rpc & in-process |
276-
| retryBackoffMs | FLAGD_RETRY_BACKOFF_MS | initial backoff for stream retry | int | 1000 | rpc & in-process |
277-
| retryBackoffMaxMs | FLAGD_RETRY_BACKOFF_MAX_MS | maximum backoff for stream retry | int | 120000 | rpc & in-process |
278-
| retryGracePeriod | FLAGD_RETRY_GRACE_PERIOD | period in seconds before provider moves from STALE to ERROR state | int | 5 | rpc & in-process & file |
279-
| keepAliveTime | FLAGD_KEEP_ALIVE_TIME_MS | http 2 keepalive | long | 0 | rpc & in-process |
280-
| cache | FLAGD_CACHE | enable cache of static flags | String - `lru`, `disabled` | lru | rpc |
281-
| maxCacheSize | FLAGD_MAX_CACHE_SIZE | max size of static flag cache | int | 1000 | rpc |
282-
| selector | FLAGD_SOURCE_SELECTOR | selects a single sync source to retrieve flags from only that source | string | null | in-process |
283-
| providerId | FLAGD_PROVIDER_ID | A unique identifier for flagd(grpc client) initiating the request. | string | null | in-process |
284-
| offlineFlagSourcePath | FLAGD_OFFLINE_FLAG_SOURCE_PATH | offline, file-based flag definitions, overrides host/port/targetUri | string | null | file |
285-
| offlinePollIntervalMs | FLAGD_OFFLINE_POLL_MS | poll interval for reading offlineFlagSourcePath | int | 5000 | file |
286-
| contextEnricher | - | sync-metadata to evaluation context mapping function | function | identity function | in-process |
290+
| Option name | Environment variable name | Explanation | Type & Values | Default | Compatible resolver |
291+
| --------------------- | ------------------------------ | --------------------------------------------------------------------------------------------------------------- | ---------------------------- | ----------------------------- | ----------------------- |
292+
| resolver | FLAGD_RESOLVER | mode of operation | string - `rpc`, `in-process` | rpc | rpc & in-process |
293+
| host | FLAGD_HOST | remote host | string | localhost | rpc & in-process |
294+
| port | FLAGD_PORT | remote port | int | 8013 (rpc), 8015 (in-process) | rpc & in-process |
295+
| targetUri | FLAGD_TARGET_URI | alternative to host/port, supporting custom name resolution | string | null | rpc & in-process |
296+
| tls | FLAGD_TLS | connection encryption | boolean | false | rpc & in-process |
297+
| socketPath | FLAGD_SOCKET_PATH | alternative to host/port, unix socket | string | null | rpc & in-process |
298+
| certPath | FLAGD_SERVER_CERT_PATH | tls cert path | string | null | rpc & in-process |
299+
| deadlineMs | FLAGD_DEADLINE_MS | deadline for unary calls, and timeout for initialization | int | 500 | rpc & in-process & file |
300+
| streamDeadlineMs | FLAGD_STREAM_DEADLINE_MS | deadline for streaming calls, useful as an application-layer keepalive | int | 600000 | rpc & in-process |
301+
| retryBackoffMs | FLAGD_RETRY_BACKOFF_MS | initial backoff for stream retry | int | 1000 | rpc & in-process |
302+
| retryBackoffMaxMs | FLAGD_RETRY_BACKOFF_MAX_MS | maximum backoff for stream retry | int | 120000 | rpc & in-process |
303+
| retryGracePeriod | FLAGD_RETRY_GRACE_PERIOD | period in seconds before provider moves from STALE to ERROR state | int | 5 | rpc & in-process & file |
304+
| keepAliveTime | FLAGD_KEEP_ALIVE_TIME_MS | HTTP/2 keepalive | long | 0 | rpc & in-process |
305+
| cache | FLAGD_CACHE | enable cache of static flags | string - `lru`, `disabled` | lru | rpc |
306+
| maxCacheSize | FLAGD_MAX_CACHE_SIZE | max size of static flag cache | int | 1000 | rpc |
307+
| selector | FLAGD_SOURCE_SELECTOR | selects a single sync source to retrieve flags from only that source | string | null | in-process |
308+
| providerId | FLAGD_PROVIDER_ID | A unique identifier for flagd(grpc client) initiating the request. | string | null | in-process |
309+
| offlineFlagSourcePath | FLAGD_OFFLINE_FLAG_SOURCE_PATH | offline, file-based flag definitions, overrides host/port/targetUri | string | null | file |
310+
| offlinePollIntervalMs | FLAGD_OFFLINE_POLL_MS | poll interval for reading offlineFlagSourcePath | int | 5000 | file |
311+
| contextEnricher | - | sync-metadata to evaluation context mapping function | function | identity function | in-process |
312+
| fatalStatusCodes | - | a list of gRPC status codes which cause streams to stop reconnecting and put the provider in a `PROVIDER_FATAL` state | array | [] | rpc & in-process |
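
To show how the timing options in the table might be resolved, the following sketch reads them from environment variables and falls back to the listed defaults. It is an illustration only: `StreamOptions` and `load_stream_options` are hypothetical names, only a subset of the options is covered, and treating an empty variable as unset is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Environment variable names and defaults as listed in the table above.
_DEFAULTS = {
    "FLAGD_DEADLINE_MS": 500,
    "FLAGD_STREAM_DEADLINE_MS": 600_000,
    "FLAGD_RETRY_BACKOFF_MS": 1000,
    "FLAGD_RETRY_BACKOFF_MAX_MS": 120_000,
    "FLAGD_RETRY_GRACE_PERIOD": 5,  # seconds
}


@dataclass(frozen=True)
class StreamOptions:
    deadline_ms: int
    stream_deadline_ms: int
    retry_backoff_ms: int
    retry_backoff_max_ms: int
    retry_grace_period: int


def load_stream_options(env: Optional[Mapping[str, str]] = None) -> StreamOptions:
    """Resolve options from the environment, using table defaults when unset."""
    source = os.environ if env is None else env

    def read(name: str) -> int:
        raw = source.get(name)
        # Assumption: an empty value is treated the same as an unset variable.
        return int(raw) if raw else _DEFAULTS[name]

    return StreamOptions(
        deadline_ms=read("FLAGD_DEADLINE_MS"),
        stream_deadline_ms=read("FLAGD_STREAM_DEADLINE_MS"),
        retry_backoff_ms=read("FLAGD_RETRY_BACKOFF_MS"),
        retry_backoff_max_ms=read("FLAGD_RETRY_BACKOFF_MAX_MS"),
        retry_grace_period=read("FLAGD_RETRY_GRACE_PERIOD"),
    )


if __name__ == "__main__":
    print(load_stream_options({"FLAGD_RETRY_GRACE_PERIOD": "10"}))
```

Options without an environment variable, such as `fatalStatusCodes` and `contextEnricher`, would have to be supplied programmatically.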
287313

288314
### Custom Name Resolution
289315
