-
Notifications
You must be signed in to change notification settings - Fork 51
fix: provider state race #380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,104 @@ | ||
| --- | ||
| id: appendix-e | ||
| title: "Appendix E: Migrations" | ||
| description: Migration guidance for breaking spec changes | ||
| sidebar_position: 6 | ||
| --- | ||
|
|
||
| # Appendix E: Migrations | ||
|
|
||
| This appendix provides non-normative guidance for provider authors and SDK authors on migrating to new or changed specification requirements. | ||
|
|
||
| ## Provider Status Ownership | ||
|
|
||
| ### Background | ||
|
|
||
| Prior to `v0.9.0`, provider status (e.g. `NOT_READY`, `READY`, `ERROR`) was managed by the SDK on behalf of the provider. | ||
| The SDK would set status and emit events after lifecycle methods (`initialize`, `shutdown`, `on context change`) returned. | ||
| This created a race condition in multi-threaded SDKs: the provider could change its own state (e.g. emit an error event from a background thread) in the window between the lifecycle method returning and the SDK writing its post-lifecycle status and emitting the corresponding event. | ||
| The result was incorrect event ordering and inconsistent status. | ||
|
|
||
| The spec now requires providers to own their status and emit events atomically with status transitions (see [provider status](./sections/02-providers.md#28-provider-status)). | ||
|
|
||
| ### For provider authors | ||
|
|
||
| Providers are now responsible for maintaining their own `status` and emitting events atomically with status transitions. | ||
|
|
||
| #### What to implement | ||
|
|
||
| - A `status` accessor returning the provider's current readiness: `NOT_READY`, `READY`, `STALE`, `ERROR`, or `FATAL` (plus `RECONCILING` in the static-context paradigm) | ||
| - `status` must be `NOT_READY` before `initialize` is called and after `shutdown` terminates | ||
| - `status` must be safe for concurrent access | ||
| - Status transitions and associated event emissions must be atomic from the perspective of external observers; set the status before emitting the corresponding event | ||
|
|
||
| #### The `StateManagingProvider` interface | ||
|
|
||
| To signal to the SDK that your provider manages its own status, implement an opt-in interface (or equivalent mechanism) defined by the SDK. | ||
| This interface should expose: | ||
|
|
||
| - A `status` accessor that returns the provider's current status | ||
| - A discriminant or marker (e.g. an additional interface, a boolean property, or a type-level tag) that allows the SDK to detect at registration time that the provider manages its own state | ||
|
|
||
| Providers that do not implement this interface will continue to have their status managed by the SDK. | ||
| This legacy behavior is deprecated and will be removed in the next major version. | ||
|
|
||
| ### For SDK authors | ||
|
|
||
| SDKs must detect whether a registered provider manages its own state and branch behavior accordingly. | ||
|
|
||
| #### Detecting state-managing providers | ||
|
|
||
| At registration time, check whether the provider implements the `StateManagingProvider` interface (or equivalent). | ||
| Store this as a flag on the internal provider wrapper for use during lifecycle calls and event handling. | ||
|
|
||
| #### SDK wrapper behavior | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. minor/aside: I'd argue that the better design is implementing an adapter that takes a non- |
||
|
|
||
| SDKs typically wrap registered providers in an internal adapter (e.g. a "provider wrapper" or "state manager") that mediates lifecycle calls and event forwarding. | ||
| The wrapper should branch based on whether the registered provider implements the state-managing interface. | ||
|
|
||
| ```mermaid | ||
| flowchart TD | ||
| A[Provider registered with SDK] --> B{Implements state-managing interface?} | ||
|
|
||
| B -- Yes --> C[SDK wrapper delegates status to provider] | ||
| C --> C1[initialize / shutdown / onContextChange: SDK skips state writes AND event emissions] | ||
| C --> C2[Provider events: SDK skips state writes] | ||
| C --> C3[status: reads from provider directly] | ||
|
|
||
| B -- No --> D[SDK wrapper manages state internally - legacy, deprecated] | ||
| D --> D1[initialize / shutdown / onContextChange: SDK sets state AND emits events after return] | ||
| D --> D2[Provider events: SDK updates state on emit] | ||
| D --> D3[status: reads from SDK wrapper] | ||
|
|
||
| C1 --> E[Provider-emitted events still propagate to registered handlers] | ||
| C2 --> E | ||
| C3 --> E | ||
| D1 --> E | ||
| D2 --> E | ||
| D3 --> E | ||
| ``` | ||
|
|
||
| #### What the SDK skips for state-managing providers | ||
|
|
||
| For providers that implement the state-managing interface, the SDK must not perform any of the following actions that it would normally perform for legacy providers: | ||
|
|
||
| - Setting status to `READY` after `initialize()` succeeds | ||
| - Setting status to `ERROR` or `FATAL` after `initialize()` fails | ||
| - Setting status to `NOT_READY` after `shutdown()` completes | ||
| - Emitting `PROVIDER_READY` or `PROVIDER_ERROR` events after `initialize()` | ||
| - Updating status when the provider emits events at runtime (the provider already set its own status atomically with the event) | ||
| - (Static-context paradigm only) Setting `RECONCILING` status, emitting `PROVIDER_RECONCILING`, setting `READY`/`ERROR` status, or emitting `PROVIDER_CONTEXT_CHANGED`/`PROVIDER_ERROR` during `on context change` handling | ||
|
|
||
| #### What the SDK still does for all providers | ||
|
|
||
| Regardless of whether the provider manages its own state, the SDK continues to: | ||
|
|
||
| - Call `initialize()`, `shutdown()`, and `on context change` lifecycle methods on the provider | ||
| - Forward provider-emitted events to registered domain and API-level event handlers | ||
| - Run late-attached handlers immediately if the provider is already in the associated state | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. major: I believe SDKs can't do this safely as this requires that subscription and emission of the current status is atomic (there should be no new events emitted between reading current status and running the handler) — and you can't guarantee that when another thread is emitting events (unless SDK implements event buffering which is cumbersome). |
||
| - Enforce short-circuit behavior for `NOT_READY` and `FATAL` statuses during flag evaluation | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. major: multi-threaded SDKs can't enforce this because of TOCTOU |
||
|
|
||
| #### Deprecation | ||
|
|
||
| The legacy path (SDK-managed status) should be deprecated in the release that introduces the state-managing interface, with removal targeted for the next major version. | ||
| SDK authors should update any first-party providers and provider base classes to implement the new interface. | ||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -421,8 +421,8 @@ This is especially useful for testing purposes to restore the API to a known sta | |||||
|
|
||||||
| ### 1.7. Provider Lifecycle Management | ||||||
|
|
||||||
| The implementation maintains an internal representation of the state of configured providers, tracking the lifecycle of each provider. | ||||||
| This state of the provider is exposed on associated `clients`. | ||||||
| Providers own their current status (see [provider status](./02-providers.md#28-provider-status)). | ||||||
| The `client`'s `provider status` accessor delegates to the associated provider's `status` accessor, and SDKs surface provider-emitted events to registered handlers. | ||||||
|
|
||||||
| The diagram below illustrates the possible states and transitions of the `state` field for a provider during the provider lifecycle. | ||||||
|
|
||||||
|
|
@@ -463,9 +463,9 @@ stateDiagram-v2 | |||||
|
|
||||||
| > The `client` **MUST** define a `provider status` accessor which indicates the readiness of the associated provider, with possible values `NOT_READY`, `READY`, `STALE`, `ERROR`, or `FATAL`. | ||||||
|
|
||||||
| The SDK at all times maintains an up-to-date state corresponding to the success/failure of the last lifecycle method (`initialize`, `shutdown`, `on context change`) or emitted event. | ||||||
| The client's `provider status` accessor delegates to the associated provider's `status` accessor, which the provider keeps in sync with the success/failure of the last lifecycle method (`initialize`, `shutdown`, `on context change`) or emitted event. | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. minor: given that providers must emit events on lifecycle methods, this paragraph can be simplified to:
Suggested change
|
||||||
|
|
||||||
| see [provider status](../types.md#provider-status) | ||||||
| see [provider status](../types.md#provider-status), [provider status requirements](./02-providers.md#28-provider-status) | ||||||
|
|
||||||
| #### Condition 1.7.2 | ||||||
|
|
||||||
|
|
@@ -477,27 +477,30 @@ see: [static-context paradigm](../glossary.md#static-context-paradigm) | |||||
|
|
||||||
| ##### Conditional Requirement 1.7.2.1 | ||||||
|
|
||||||
| > In addition to `NOT_READY`, `READY`, `STALE`, or `ERROR`, the `provider status` accessor must support possible value `RECONCILING`. | ||||||
| > In addition to `NOT_READY`, `READY`, `STALE`, or `ERROR`, the `provider status` accessor must support possible value `RECONCILING`. | ||||||
|
toddbaert marked this conversation as resolved.
Outdated
|
||||||
|
|
||||||
| In the static context paradigm, the implementation must define a `provider status` indicating that a provider is reconciling its internal state due to a context change. | ||||||
|
|
||||||
| #### Requirement 1.7.3 | ||||||
|
|
||||||
| > The client's `provider status` accessor **MUST** indicate `READY` if the `initialize` function of the associated provider terminates normally. | ||||||
| > The `provider status` **MUST** indicate `READY` if the `initialize` function of the associated provider terminates normally. | ||||||
|
|
||||||
| Once the provider has initialized, the `provider status` should indicate the provider is ready to be used to evaluate flags. | ||||||
| The provider is responsible for its own status transition; the client's `provider status` accessor reflects it. | ||||||
|
|
||||||
| #### Requirement 1.7.4 | ||||||
|
|
||||||
| > The client's `provider status` accessor **MUST** indicate `ERROR` if the `initialize` function of the associated provider terminates abnormally. | ||||||
| > The `provider status` **MUST** indicate `ERROR` if the `initialize` function of the associated provider terminates abnormally. | ||||||
|
|
||||||
| If the provider has failed to initialize, the `provider status` should indicate the provider is in an error state. | ||||||
| The provider is responsible for its own status transition; the client's `provider status` accessor reflects it. | ||||||
|
|
||||||
| #### Requirement 1.7.5 | ||||||
|
|
||||||
| > The client's `provider status` accessor **MUST** indicate `FATAL` if the `initialize` function of the associated provider terminates abnormally and indicates `error code` `PROVIDER_FATAL`. | ||||||
| > The `provider status` **MUST** indicate `FATAL` if the `initialize` function of the associated provider terminates abnormally and indicates `error code` `PROVIDER_FATAL`. | ||||||
|
|
||||||
| If the provider has failed to initialize, the `provider status` should indicate the provider is in an error state. | ||||||
| The provider is responsible for its own status transition; the client's `provider status` accessor reflects it. | ||||||
|
|
||||||
| #### Requirement 1.7.6 | ||||||
|
|
||||||
|
|
@@ -527,9 +530,10 @@ see: [error codes](../types.md#error-code) | |||||
|
|
||||||
| #### Requirement 1.7.9 | ||||||
|
|
||||||
| > The client's `provider status` accessor **MUST** indicate `NOT_READY` once the `shutdown` function of the associated provider terminates. | ||||||
| > The `provider status` **MUST** indicate `NOT_READY` once the `shutdown` function of the associated provider terminates. | ||||||
|
|
||||||
| Regardless of the success of the provider's `shutdown` function, the `provider status` should convey the provider is no longer ready to use once the shutdown function terminates. | ||||||
| The provider is responsible for its own status transition; the client's `provider status` accessor reflects it. | ||||||
|
|
||||||
| ### 1.8. Isolated API Instances | ||||||
|
|
||||||
|
|
||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -305,3 +305,60 @@ The track function performs side effects required to record the `tracking event` | |
| Providers should be careful to complete any communication or flush any relevant uncommitted tracking data before they shut down. | ||
|
|
||
| See [shutdown](#25-shutdown). | ||
|
|
||
| ### 2.8. Provider status | ||
|
|
||
| [](https://github.com/open-feature/spec/tree/main/specification#hardening) | ||
|
|
||
| Providers own their current status and any events associated with status transitions. | ||
| This allows providers to atomically update their status and emit the corresponding event, avoiding races between lifecycle methods terminating and events or status updates produced by concurrent work (such as background threads or pollers) maintained by the provider. | ||
|
|
||
| SDKs may provide a base class, wrapper, or other mechanism that maintains status on behalf of provider implementations which do not define it natively, for migration purposes. | ||
|
|
||
| see: [provider lifecycle management](./01-flag-evaluation.md#17-provider-lifecycle-management), [provider events](./05-events.md#51-provider-events) | ||
|
|
||
| #### Requirement 2.8.1 | ||
|
|
||
| > The provider **MUST** define a `status` accessor which indicates the provider's current readiness, with possible values `NOT_READY`, `READY`, `STALE`, `ERROR`, or `FATAL`. | ||
|
|
||
| The `status` accessor reflects the provider's readiness to evaluate flags. | ||
| The client's `provider status` accessor (see [requirement 1.7.1](./01-flag-evaluation.md#requirement-171)) delegates to this accessor. | ||
|
|
||
| see: [provider status](../types.md#provider-status) | ||
|
|
||
| #### Condition 2.8.2 | ||
|
|
||
| > The implementation uses the static-context paradigm. | ||
|
|
||
| see: [static-context paradigm](../glossary.md#static-context-paradigm) | ||
|
|
||
| ##### Conditional Requirement 2.8.2.1 | ||
|
|
||
| > In addition to `NOT_READY`, `READY`, `STALE`, `ERROR`, or `FATAL`, the provider's `status` accessor **MUST** support possible value `RECONCILING`. | ||
|
|
||
| In the static-context paradigm, the provider must define a `status` value indicating that it is reconciling its internal state due to a context change. | ||
|
|
||
| see: [provider context reconciliation](#26-provider-context-reconciliation) | ||
|
|
||
| #### Requirement 2.8.3 | ||
|
|
||
| > The provider's `status` **MUST** be `NOT_READY` before `initialize` is called and after `shutdown` terminates. | ||
|
|
||
| Providers which do not define an `initialize` function are assumed to be ready at all times, and their `status` may be `READY` from construction. | ||
|
|
||
| see: [initialization](#24-initialization), [shutdown](#25-shutdown) | ||
|
|
||
| #### Requirement 2.8.4 | ||
|
|
||
| > The provider's `status` accessor **MUST** be safe for concurrent access. | ||
|
|
||
| In languages supporting multi-threaded execution, the provider must ensure that concurrent reads of the `status` accessor do not observe torn or inconsistent values. | ||
|
|
||
| #### Requirement 2.8.5 | ||
|
|
||
| > Status changes and any associated event emissions **MUST** be atomic from the perspective of external observers. | ||
|
|
||
| When a provider transitions between statuses and emits an event associated with that transition, external observers (such as SDK event handlers) must observe a consistent view: the updated `status` value and the emitted event are visible together. | ||
| This prevents ordering anomalies where, for example, a `PROVIDER_READY` handler runs while `status` still indicates `NOT_READY` or `ERROR`, or where the provider transitions out of a status before the associated event is dispatched. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "Dispatch" may be a big ambiguous here:
Is the intent here that event handlers see the exact status that triggered the event? If so, this may be problematic as it requires holding the status lock while all handlers run, preventing the provider from changing its own status. I think what we're after here is establishing an observable "happens-before" relationship:
So an event handler may observe next status changes, the important bit is that it must observe the status change that triggered the event. |
||
|
|
||
| see: [provider events](./05-events.md#51-provider-events) | ||
Uh oh!
There was an error while loading. Please reload this page.