
Commit 6d9ec31

Document live reconfiguration consistency model
1 parent acfe52d commit 6d9ec31

4 files changed

Lines changed: 43 additions & 6 deletions


rust/otap-dataflow/crates/admin-types/src/operations.rs

Lines changed: 1 addition & 1 deletion
@@ -92,7 +92,7 @@ pub enum OperationErrorKind {
     RolloutNotFound,
     /// The requested shutdown does not exist.
     ShutdownNotFound,
-    /// Another rollout or shutdown is already active for this logical pipeline.
+    /// Another incompatible live operation is active in the server's consistency scope.
     Conflict,
     /// The request was rejected as invalid.
     InvalidRequest,
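The reworded `Conflict` doc comment describes a check the controller performs before it starts a rollout or shutdown. The following is a minimal sketch of that check under the current per-logical-pipeline scope. `ActiveOperations`, `LiveOp`, `try_begin`, and `finish` are hypothetical names, and the enum is a trimmed local stand-in for `OperationErrorKind`, not the crate's actual type.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical local stand-in for `OperationErrorKind`; only `Conflict` is mirrored.
#[derive(Debug, PartialEq, Eq)]
pub enum OperationErrorKind {
    Conflict,
}

/// Kinds of live operation that share one consistency scope.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LiveOp {
    Rollout,
    Shutdown,
}

/// Hypothetical registry: at most one active operation per logical pipeline,
/// keyed by `(pipeline_group_id, pipeline_id)`.
#[derive(Default)]
pub struct ActiveOperations {
    active: HashMap<(String, String), LiveOp>,
}

impl ActiveOperations {
    /// Claims the logical pipeline for `op`, or returns `Conflict` if any
    /// rollout or shutdown already holds it.
    pub fn try_begin(
        &mut self,
        group: &str,
        pipeline: &str,
        op: LiveOp,
    ) -> Result<(), OperationErrorKind> {
        match self.active.entry((group.to_owned(), pipeline.to_owned())) {
            Entry::Occupied(_) => Err(OperationErrorKind::Conflict),
            Entry::Vacant(slot) => {
                slot.insert(op);
                Ok(())
            }
        }
    }

    /// Releases the logical pipeline once its operation reaches a terminal state.
    pub fn finish(&mut self, group: &str, pipeline: &str) -> Option<LiveOp> {
        self.active.remove(&(group.to_owned(), pipeline.to_owned()))
    }
}
```

Because the map is keyed by the full `(pipeline_group_id, pipeline_id)` pair, different logical pipelines proceed concurrently. A rollout and a shutdown of the same pipeline, however, collide.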

rust/otap-dataflow/crates/admin/src/lib.rs

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ pub enum ControlPlaneError {
     GroupNotFound,
     /// The requested pipeline does not exist.
     PipelineNotFound,
-    /// Another rollout is already active for this logical pipeline.
+    /// Another incompatible live operation is active in the current consistency scope.
     RolloutConflict,
     /// Submitted pipeline configuration failed validation or violated a runtime boundary.
     InvalidRequest {
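`ControlPlaneError::RolloutConflict` reaches admin clients as `409 Conflict`, according to the status codes in `live-reconfiguration.md`. Below is a sketch of that mapping. It uses a hypothetical `http_status` helper and a local mirror of the enum. The real `InvalidRequest` fields are cut off in this diff, so `message` is a placeholder. Mapping `PipelineNotFound` to `404` is also an assumption, since the docs only state `404` for a missing pipeline group.

```rust
/// Hypothetical local mirror of `ControlPlaneError`; the real enum may have
/// more variants, and `message` is a placeholder field.
#[derive(Debug)]
pub enum ControlPlaneError {
    GroupNotFound,
    PipelineNotFound,
    RolloutConflict,
    InvalidRequest { message: String },
}

/// Maps a control-plane error to the HTTP status described in the admin docs.
pub fn http_status(err: &ControlPlaneError) -> u16 {
    match err {
        // Assumption: a missing pipeline is reported like a missing group.
        ControlPlaneError::GroupNotFound | ControlPlaneError::PipelineNotFound => 404,
        // Conflict in the current consistency scope (one logical pipeline today).
        ControlPlaneError::RolloutConflict => 409,
        // Validation failure or unsupported runtime mutation.
        ControlPlaneError::InvalidRequest { .. } => 422,
    }
}
```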

rust/otap-dataflow/crates/controller/src/live_control/README.md

Lines changed: 11 additions & 0 deletions
@@ -47,6 +47,9 @@ Live control separates three related concepts:
 
 - A logical pipeline is identified by `(pipeline_group_id, pipeline_id)` and
   points at the committed resolved pipeline plus its active generation.
+- A pipeline group is the config hierarchy that contains related pipelines,
+  group-local topics, and group-level policies. Current live-control operations
+  target one logical pipeline inside that group.
 - A deployed runtime instance is identified by `(pipeline_group_id,
   pipeline_id, core_id, deployment_generation)` and tracks whether that thread
   is still active or has exited.
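The logical pipeline and deployed instance identities above differ only by their runtime coordinates. The sketch below uses hypothetical struct names, `LogicalPipelineId` and `DeployedInstanceId`, to show how an instance key reduces to its logical pipeline. It also shows how old and new generations stay distinguishable while both still have active threads. `InstanceState` and `active_generations` are illustrative as well, not the crate's API.

```rust
use std::collections::HashMap;

/// Hypothetical key for a logical pipeline: `(pipeline_group_id, pipeline_id)`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct LogicalPipelineId {
    pub pipeline_group_id: String,
    pub pipeline_id: String,
}

/// Hypothetical key for one deployed runtime instance: one thread on one core
/// for one deployment generation.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct DeployedInstanceId {
    pub pipeline_group_id: String,
    pub pipeline_id: String,
    pub core_id: usize,
    pub deployment_generation: u64,
}

impl DeployedInstanceId {
    /// Drops the core and generation to recover the owning logical pipeline.
    pub fn logical(&self) -> LogicalPipelineId {
        LogicalPipelineId {
            pipeline_group_id: self.pipeline_group_id.clone(),
            pipeline_id: self.pipeline_id.clone(),
        }
    }
}

/// Whether an instance's thread is still running.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InstanceState {
    Active,
    Exited,
}

/// Distinct generations that still have active threads for `pipeline`,
/// e.g. both the old and the new generation during a rolling cutover.
pub fn active_generations(
    instances: &HashMap<DeployedInstanceId, InstanceState>,
    pipeline: &LogicalPipelineId,
) -> Vec<u64> {
    let mut generations: Vec<u64> = instances
        .iter()
        .filter(|(id, state)| **state == InstanceState::Active && id.logical() == *pipeline)
        .map(|(id, _)| id.deployment_generation)
        .collect();
    generations.sort_unstable();
    generations.dedup();
    generations
}
```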
@@ -70,6 +73,11 @@ all active instances.
 - The controller is the authority for when old generations can be retired.
   Observed-state compaction is invoked only after active rollout/shutdown work
   no longer needs generation-specific entries.
+- The current consistency scope is one logical pipeline. Planning validates a
+  candidate against a cloned full config snapshot, but commit patches only that
+  pipeline into the latest live config. This intentionally does not provide
+  whole-config serializability across concurrent operations on different
+  logical pipelines.
 - Terminal rollout and shutdown records are retained in memory with both a
   per-logical-pipeline cap and a TTL. This keeps recent admin lookups useful
   without unbounded history growth.
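The added consistency-scope bullet separates two things: where validation runs (a cloned snapshot) and where the result lands (the latest live config). Below is a sketch of that split. It assumes a hypothetical `Spec` map standing in for `OtelDataflowSpec` and hypothetical `plan` and `commit` functions.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the full config: logical pipeline key -> pipeline config.
type Spec = BTreeMap<(String, String), String>;

/// An accepted plan: which logical pipeline to patch and with what config.
pub struct Plan {
    key: (String, String),
    config: String,
}

/// Planning: patch the candidate into a clone of the live snapshot and run
/// whole-config validation on that clone.
pub fn plan(
    live: &Spec,
    key: (String, String),
    config: String,
    validate: impl Fn(&Spec) -> bool,
) -> Option<Plan> {
    let mut candidate = live.clone();
    candidate.insert(key.clone(), config.clone());
    validate(&candidate).then(|| Plan { key, config })
}

/// Commit: write only the planned pipeline into the latest live config,
/// leaving every other pipeline exactly as it is now.
pub fn commit(live: &mut Spec, plan: Plan) {
    live.insert(plan.key, plan.config);
}
```

`commit` writes one key rather than swapping in the whole planned snapshot. As a result, a concurrent commit to a different logical pipeline is not overwritten. The bullet itself states the trade-off: validation on the clone is not a whole-config transaction.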
@@ -94,5 +102,8 @@ all active instances.
 - Full group shutdown is orchestrated above this module by issuing
   per-pipeline/global control-plane calls; this module tracks per-pipeline
   live-control state.
+- Future group-level reconfiguration can widen the active-operation conflict
+  scope from logical pipeline to pipeline group without changing the existing
+  per-pipeline endpoint shape.
 - Rollbacks are best effort. If rollback itself fails, the operation records
   `rollback_failed` and preserves diagnostics for operators.
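The widening described in the new bullet amounts to changing the lock key, not the request shape. Below is a sketch using a hypothetical `ConsistencyScope` enum and `conflict_key` helper. The per-pipeline endpoint still supplies both ids, and only the key derived from them changes.

```rust
/// Granularity at which live operations are serialized (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConsistencyScope {
    /// Current behavior: one `(pipeline_group_id, pipeline_id)`.
    LogicalPipeline,
    /// Possible future behavior: the whole pipeline group.
    PipelineGroup,
}

/// Lock key that a request on `(group, pipeline)` must hold under `scope`.
pub fn conflict_key(scope: ConsistencyScope, group: &str, pipeline: &str) -> (String, Option<String>) {
    match scope {
        ConsistencyScope::LogicalPipeline => (group.to_owned(), Some(pipeline.to_owned())),
        // Group scope ignores the pipeline id, so every pipeline in the group shares one key.
        ConsistencyScope::PipelineGroup => (group.to_owned(), None),
    }
}

/// Two requests conflict when they map to the same lock key.
pub fn conflicts(scope: ConsistencyScope, a: (&str, &str), b: (&str, &str)) -> bool {
    conflict_key(scope, a.0, a.1) == conflict_key(scope, b.0, b.1)
}
```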

rust/otap-dataflow/docs/admin/live-reconfiguration.md

Lines changed: 30 additions & 4 deletions
@@ -54,6 +54,27 @@ traffic flip across the whole pipeline.
 - There is no dedicated scale endpoint. Scale-only changes use the same `PUT`
   endpoint as topology changes.
 
+## Consistency Model
+
+The current API serializes live operations per logical pipeline, identified by
+`(pipeline_group_id, pipeline_id)`. A rollout or shutdown conflicts with another
+active operation for the same logical pipeline, while operations for different
+logical pipelines may run concurrently.
+
+Rollout planning validates a candidate by patching one pipeline into the
+controller's current in-memory `OtelDataflowSpec` snapshot and running full
+engine validation on that candidate snapshot. That validation does not make the
+operation a whole-config transaction: another logical pipeline can commit before
+this rollout commits, and commit applies only the accepted pipeline back into
+the latest live config.
+
+The API intentionally leaves room to widen the consistency scope later. If
+group-level invariants become mutable, the controller can serialize
+config-mutating operations per pipeline group and return `409 Conflict` for
+concurrent operations in that group without changing the existing pipeline
+endpoint or response schema. Engine-level reconfiguration can be added as a
+separate operation surface if full-engine transactions become necessary.
+
 ## How It Works
 
 1. The client submits a candidate pipeline config to
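The Consistency Model section notes that a candidate validated on one snapshot may commit after another pipeline has changed. The sketch below shows why that matters once group-level invariants exist. It uses a hypothetical per-group core budget, invented purely for illustration and not a documented invariant. Two rollouts on different pipelines each pass validation against the same snapshot, yet the merged live config violates the budget.

```rust
use std::collections::BTreeMap;

/// Hypothetical group config: pipeline id -> cores requested.
type GroupSpec = BTreeMap<&'static str, u32>;

/// Hypothetical group-level invariant, invented for illustration only.
const GROUP_CORE_BUDGET: u32 = 4;

fn within_budget(spec: &GroupSpec) -> bool {
    spec.values().sum::<u32>() <= GROUP_CORE_BUDGET
}

/// Planning step: patch `pipeline = cores` into a clone and validate the clone.
fn plan_is_valid(snapshot: &GroupSpec, pipeline: &'static str, cores: u32) -> bool {
    let mut candidate = snapshot.clone();
    candidate.insert(pipeline, cores);
    within_budget(&candidate)
}

/// Returns (both plans validated, final live config still valid).
pub fn interleaved_rollouts() -> (bool, bool) {
    let mut live: GroupSpec = BTreeMap::from([("a", 1), ("b", 1)]);
    // Both rollouts plan before either commits, so both see a=1, b=1.
    let a_ok = plan_is_valid(&live, "a", 3); // candidate a=3, b=1: 4 cores
    let b_ok = plan_is_valid(&live, "b", 3); // candidate a=1, b=3: 4 cores
    // Each commit patches only its own pipeline into the latest live config.
    if a_ok {
        live.insert("a", 3);
    }
    if b_ok {
        live.insert("b", 3);
    }
    // live is now a=3, b=3: 6 cores, over the hypothetical budget of 4.
    (a_ok && b_ok, within_budget(&live))
}
```

Suppose config-mutating operations were serialized per pipeline group, as the section proposes if group-level invariants become mutable. The second rollout would then either receive `409 Conflict` while the first is active, or plan against the already-committed `a = 3` and fail validation.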
@@ -169,8 +190,10 @@ Status codes:
 - `202 Accepted`: request accepted and `wait=false`
 - `200 OK`: `wait=true` and the rollout finished successfully
 - `404 Not Found`: pipeline group does not exist
-- `409 Conflict`: another rollout or shutdown is already active for the same
-  logical pipeline, or a waited rollout finished in failure
+- `409 Conflict`: another incompatible live operation is active in the
+  controller's current consistency scope, or a waited rollout finished in
+  failure. In the current version of the API, that scope is one logical
+  pipeline.
 - `422 Unprocessable Entity`: validation failure or unsupported runtime
   mutation
 - `504 Gateway Timeout`: `wait=true` exceeded the overall wait timeout
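The status list mixes up-front rejections with the outcome of a waited rollout. `409 Conflict` covers both cases: a busy consistency scope, and a waited rollout that finished in failure. Below is a sketch of the table as a `match`, using a hypothetical `RolloutOutcome` enum that is not part of the documented API.

```rust
/// Hypothetical summary of how a rollout `PUT` request ends.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RolloutOutcome {
    /// The pipeline group does not exist.
    GroupMissing,
    /// Another live operation holds the current consistency scope.
    ScopeBusy,
    /// Validation failure or unsupported runtime mutation.
    Invalid,
    /// Request accepted and `wait=false`.
    Accepted,
    /// `wait=true` and the rollout finished successfully.
    Succeeded,
    /// `wait=true` and the rollout finished in failure.
    Failed,
    /// `wait=true` exceeded the overall wait timeout.
    WaitTimedOut,
}

/// HTTP status for each outcome, following the status-code list above.
pub fn status_for(outcome: RolloutOutcome) -> u16 {
    match outcome {
        RolloutOutcome::Accepted => 202,
        RolloutOutcome::Succeeded => 200,
        RolloutOutcome::GroupMissing => 404,
        // A busy scope and a waited failure both surface as 409 Conflict.
        RolloutOutcome::ScopeBusy | RolloutOutcome::Failed => 409,
        RolloutOutcome::Invalid => 422,
        RolloutOutcome::WaitTimedOut => 504,
    }
}
```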
@@ -207,7 +230,7 @@ overlapping old/new generations stay distinguishable during a rolling cutover.
 - `POST /groups/shutdown`
 
 These are separate from reconfiguration, but they use the same resident
-controller and the same logical-pipeline locking rules.
+controller and the same live-operation consistency scope.
 Terminal shutdown ids are retained only within a bounded in-memory window, so
 older ids may return `404 Not Found` after eviction.

@@ -356,9 +379,12 @@ pattern.
 
 ## Operational Notes
 
-- Different logical pipelines may roll concurrently.
+- Different logical pipelines may roll concurrently in the current
+  implementation.
 - A single logical pipeline allows only one active rollout or shutdown at a
   time.
+- Future group-level consistency can widen the conflict scope so concurrent
+  operations in the same group return `409 Conflict`.
 - `GET /groups/{group}/pipelines/{id}` always returns the committed
   live config, not an uncommitted candidate.
 - `GET /groups/{group}/pipelines/{id}/status` is the best endpoint
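The `GET` note distinguishes the committed live config from an in-flight candidate. Below is a sketch using a hypothetical `PipelineEntry` type. It assumes the candidate replaces the committed config only when its rollout succeeds; the diff does not spell out the exact commit point.

```rust
/// Hypothetical per-pipeline entry: committed live config plus an optional
/// in-flight candidate.
pub struct PipelineEntry {
    committed: String,
    candidate: Option<String>,
}

impl PipelineEntry {
    pub fn new(committed: String) -> Self {
        Self { committed, candidate: None }
    }

    /// What `GET /groups/{group}/pipelines/{id}` would serve: never the candidate.
    pub fn get(&self) -> &str {
        &self.committed
    }

    /// Stores the candidate while the rollout runs.
    pub fn start_rollout(&mut self, candidate: String) {
        self.candidate = Some(candidate);
    }

    /// Assumption: only a successful rollout promotes the candidate; a failed
    /// one discards it and the committed config stays in place.
    pub fn finish_rollout(&mut self, succeeded: bool) {
        if let Some(candidate) = self.candidate.take() {
            if succeeded {
                self.committed = candidate;
            }
        }
    }
}
```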
