diff --git a/rust/otap-dataflow/crates/admin-api/README.md b/rust/otap-dataflow/crates/admin-api/README.md
index 763b7596a4..048c465018 100644
--- a/rust/otap-dataflow/crates/admin-api/README.md
+++ b/rust/otap-dataflow/crates/admin-api/README.md
@@ -244,11 +244,16 @@ method and its operational purpose.
 | `GET /api/v1/status` | `engine().status()` | Full engine status snapshot across pipelines and cores. |
 | `GET /api/v1/livez` | `engine().livez()` | Engine liveness probe with structured failure details. |
 | `GET /api/v1/readyz` | `engine().readyz()` | Readiness probe for orchestration or traffic gating. |
-| `GET /api/v1/pipeline-groups/status` | `pipeline_groups().status()` | Fleet-style pipeline status view. |
-| `POST /api/v1/pipeline-groups/shutdown` | `pipeline_groups().shutdown(...)` | Coordinated shutdown request across running pipelines. |
-| `GET /api/v1/pipeline-groups/{pipeline_group_id}/pipelines/{pipeline_id}/status` | `pipelines().status(...)` | Detailed status for a single pipeline. |
-| `GET /api/v1/pipeline-groups/{pipeline_group_id}/pipelines/{pipeline_id}/livez` | `pipelines().livez(...)` | Semantic liveness probe result for a single pipeline. |
-| `GET /api/v1/pipeline-groups/{pipeline_group_id}/pipelines/{pipeline_id}/readyz` | `pipelines().readyz(...)` | Semantic readiness probe result for a single pipeline. |
+| `GET /api/v1/groups/status` | `groups().status()` | Fleet-style pipeline status view. |
+| `POST /api/v1/groups/shutdown` | `groups().shutdown(...)` | Coordinated shutdown request across running pipelines. |
+| `GET /api/v1/groups/{pipeline_group_id}/pipelines/{pipeline_id}` | `pipelines().details(...)` | Live committed configuration and any active rollout summary for one logical pipeline. |
+| `PUT /api/v1/groups/{pipeline_group_id}/pipelines/{pipeline_id}` | `pipelines().reconfigure(...)` | Submit a live pipeline reconfiguration request and get an accepted, completed, failed, or timed-out operation outcome. |
+| `GET /api/v1/groups/{pipeline_group_id}/pipelines/{pipeline_id}/rollouts/{rollout_id}` | `pipelines().rollout_status(...)` | Detailed status for one rollout operation. |
+| `GET /api/v1/groups/{pipeline_group_id}/pipelines/{pipeline_id}/status` | `pipelines().status(...)` | Detailed status for a single pipeline. |
+| `POST /api/v1/groups/{pipeline_group_id}/pipelines/{pipeline_id}/shutdown` | `pipelines().shutdown(...)` | Shut down one logical pipeline and get an accepted, completed, failed, or timed-out operation outcome. |
+| `GET /api/v1/groups/{pipeline_group_id}/pipelines/{pipeline_id}/shutdowns/{shutdown_id}` | `pipelines().shutdown_status(...)` | Detailed status for one pipeline shutdown operation. |
+| `GET /api/v1/groups/{pipeline_group_id}/pipelines/{pipeline_id}/livez` | `pipelines().livez(...)` | Semantic liveness probe result for a single pipeline. |
+| `GET /api/v1/groups/{pipeline_group_id}/pipelines/{pipeline_id}/readyz` | `pipelines().readyz(...)` | Semantic readiness probe result for a single pipeline. |
 | `GET /api/v1/telemetry/logs` | `telemetry().logs(...)` | Retained admin logs when log retention is enabled. |
 | `GET /api/v1/telemetry/metrics` | `telemetry().metrics(...)`, `telemetry().metrics_compact(...)` | Current engine metrics as structured JSON, using either the full or compact response shape. |
@@ -257,50 +262,26 @@ method and its operational purpose.
 canonical `telemetry().metrics(...)` and `telemetry().metrics_compact(...)`
 methods.
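+
+For example, a minimal spot-check of the structured metrics surface (a sketch
+that assumes an engine listening on `127.0.0.1:8080`; all names come from the
+SDK surface described above):
+
+```rust
+# use otap_df_admin_api::{telemetry, AdminClient, AdminEndpoint, HttpAdminClientSettings};
+#
+# async fn example() -> Result<(), Box<dyn std::error::Error>> {
+let client = AdminClient::builder()
+    .http(HttpAdminClientSettings::new(AdminEndpoint::http("127.0.0.1", 8080)))
+    .build()?;
+
+let metrics = client
+    .telemetry()
+    .metrics(&telemetry::MetricsOptions::default())
+    .await?;
+
+// Each metric set carries descriptor metadata alongside the current values.
+for metric_set in &metrics.metric_sets {
+    println!("metric set with {} metrics", metric_set.metrics.len());
+}
+# Ok(())
+# }
+```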
-## Future evolution: live reconfiguration
-
-Future live reconfiguration work is expected to extend the admin SDK from a
-status-and-observability client into a richer control-plane client for
-long-lived engine instances. The details are not stabilized yet, but the work
-in progress already helps frame the direction for advanced integrators building
-external controllers.
-
-Main capabilities expected from this area of the admin API:
-
-- read the live committed configuration for a single logical pipeline;
-- create, replace, resize, or accept a `noop` update for one logical pipeline;
-- track rollout progress through a dedicated rollout resource;
-- track per-pipeline shutdown progress through a dedicated shutdown resource;
-- expose generation-aware pipeline status during overlapping cutover.
-
-The current SDK is intentionally narrower, and the main future extensions for
-live reconfiguration are expected to center on:
-
-- resource model: adding live pipeline details, rollout status, and shutdown
-  status as first-class SDK resources instead of exposing only snapshots and
-  probes;
-- status shape: extending pipeline status with generation-aware fields such as
-  `activeGeneration`, `servingGenerations`, rollout summaries, and
-  per-generation instance views;
-- operation semantics: treating create, replace, resize, and shutdown as
-  long-running admin operations with both immediate-return and wait-or-poll
-  interaction patterns;
-- error and outcome modeling: representing rollout conflicts, validation
-  failures, and timeout outcomes as typed SDK results rather than leaving them
-  as transport-level concerns.
-
-The intended integration direction is to keep `AdminClient` as the stable
-entrypoint and absorb those changes behind typed client methods rather than
-exposing raw route strings as the public contract. In practice, that likely
-means:
-
-- keeping transport and route-version differences behind backend adapters;
-- adding job-oriented client methods for live pipeline read, update, rollout
-  status, and per-pipeline shutdown tracking;
-- supporting both immediate-return and wait-or-poll interaction patterns for
-  long-running admin operations;
-- continuing to treat experimental endpoints as opt-in additions only after
-  their semantics and wire format settle.
+## Live pipeline control
+
+The SDK exposes the live pipeline control surface behind typed methods:
+
+- `pipelines().details(...)` reads the committed pipeline config and active
+  rollout summary.
+- `pipelines().reconfigure(...)` submits create, `noop`, resize, and replace
+  operations and returns a typed outcome.
+- `pipelines().rollout_status(...)` polls a rollout by id.
+- `pipelines().shutdown(...)` requests shutdown for one logical pipeline and
+  returns a typed outcome.
+- `pipelines().shutdown_status(...)` polls a shutdown operation by id.
+
+Terminal rollout and shutdown ids are retained only within a bounded in-memory
+window. Older ids may return `Ok(None)` after the controller evicts historical
+operation snapshots.
+
+Waited operations return typed terminal outcomes instead of surfacing rollout
+or shutdown failures as transport-level errors. Request rejection remains a
+typed SDK error via `Error::AdminOperation`.
 
 ## Client API cookbook
 
@@ -327,7 +308,7 @@
 println!("readyz={:?}", readyz.status);
 # }
 ```
 
-### Pipeline group status and coordinated shutdown
+### Group status and coordinated shutdown
 
 Use this when an operator or control plane needs a fleet view and a single
 engine-wide shutdown entrypoint.
@@ -343,11 +324,11 @@
 let client = AdminClient::builder()
     .http(HttpAdminClientSettings::new(AdminEndpoint::http("127.0.0.1", 8080)))
     .build()?;
 
-let groups = client.pipeline_groups().status().await?;
+let groups = client.groups().status().await?;
 println!("pipelines={}", groups.pipelines.len());
 
 let shutdown = client
-    .pipeline_groups()
+    .groups()
     .shutdown(&OperationOptions {
         wait: true,
         timeout_secs: 30,
diff --git a/rust/otap-dataflow/crates/admin-api/src/client.rs b/rust/otap-dataflow/crates/admin-api/src/client.rs
index 6888145b4f..db0ebfd056 100644
--- a/rust/otap-dataflow/crates/admin-api/src/client.rs
+++ b/rust/otap-dataflow/crates/admin-api/src/client.rs
@@ -5,7 +5,7 @@
 use crate::endpoint::{AdminAuth, AdminEndpoint};
 use crate::http_backend::HttpBackend;
-use crate::{Error, engine, operations, pipeline_groups, pipelines, telemetry};
+use crate::{Error, engine, groups, operations, pipelines, telemetry};
 use async_trait::async_trait;
 use std::sync::Arc;
 use std::time::Duration;
@@ -38,7 +38,10 @@ pub struct HttpAdminClientSettings {
 }
 
 impl HttpAdminClientSettings {
-    /// Creates new HTTP client settings.
+    /// Creates HTTP client settings with the SDK defaults for connection behavior.
+    ///
+    /// Use the builder-style `with_*` methods to override auth, timeout,
+    /// keepalive, or TLS behavior.
     #[must_use]
     pub fn new(endpoint: AdminEndpoint) -> Self {
         Self {
@@ -53,49 +56,53 @@ impl HttpAdminClientSettings {
         }
     }
 
-    /// Sets the auth mode.
+    /// Sets the authentication mode for requests sent by this client.
     #[must_use]
     pub fn with_auth(mut self, auth: AdminAuth) -> Self {
         self.auth = auth;
         self
     }
 
-    /// Sets the TCP connect timeout.
+    /// Sets the TCP connect timeout for establishing new connections.
     #[must_use]
     pub fn with_connect_timeout(mut self, connect_timeout: Duration) -> Self {
         self.connect_timeout = connect_timeout;
         self
     }
 
-    /// Sets the request timeout.
+    /// Sets a per-request timeout for admin calls.
+    ///
+    /// This is separate from [`operations::OperationOptions::timeout_secs`],
+    /// which controls how long the server should wait on long-running
+    /// operations such as reconfigure or shutdown.
     #[must_use]
     pub fn with_timeout(mut self, timeout: Duration) -> Self {
         self.timeout = Some(timeout);
         self
     }
 
-    /// Clears any request timeout.
+    /// Disables the client-side per-request timeout.
     #[must_use]
     pub fn without_timeout(mut self) -> Self {
         self.timeout = None;
         self
     }
 
-    /// Sets whether to enable `TCP_NODELAY`.
+    /// Sets whether outbound TCP sockets should use `TCP_NODELAY`.
     #[must_use]
     pub fn with_tcp_nodelay(mut self, tcp_nodelay: bool) -> Self {
         self.tcp_nodelay = tcp_nodelay;
         self
     }
 
-    /// Sets the TCP keepalive timeout.
+    /// Sets the TCP keepalive timeout for outbound connections.
     #[must_use]
     pub fn with_tcp_keepalive(mut self, tcp_keepalive: Option<Duration>) -> Self {
         self.tcp_keepalive = tcp_keepalive;
         self
     }
 
-    /// Sets the TCP keepalive probe interval.
+    /// Sets the interval between TCP keepalive probes when keepalive is enabled.
     #[must_use]
     pub fn with_tcp_keepalive_interval(mut self, tcp_keepalive_interval: Option<Duration>) -> Self {
         self.tcp_keepalive_interval = tcp_keepalive_interval;
@@ -103,6 +110,10 @@ impl HttpAdminClientSettings {
     }
 
     /// Sets the TLS or mTLS configuration for HTTPS endpoints.
+    ///
+    /// This is ignored for plaintext HTTP endpoints and required only when the
+    /// target endpoint needs custom CA trust, client certificates, or other TLS
+    /// overrides.
     #[must_use]
     pub fn with_tls(mut self, tls: TlsClientConfig) -> Self {
         self.tls = Some(tls);
@@ -121,13 +132,13 @@ pub struct AdminClientBuilder {
 }
 
 impl AdminClientBuilder {
-    /// Creates a new builder.
+    /// Creates a new admin client builder with no backend configured yet.
     #[must_use]
     pub fn new() -> Self {
         Self::default()
     }
 
-    /// Configures the client to use the HTTP admin backend.
+    /// Configures the client to use the HTTP admin transport.
     #[must_use]
     pub fn http(mut self, settings: HttpAdminClientSettings) -> Self {
         self.backend = Some(BackendConfig::Http(settings));
@@ -135,6 +146,9 @@ impl AdminClientBuilder {
     }
 
     /// Builds the configured admin client.
+    ///
+    /// Returns an error when no backend has been configured or when the HTTP
+    /// transport settings are invalid.
     pub fn build(self) -> Result<AdminClient, Error> {
         let backend = match self.backend {
             Some(BackendConfig::Http(settings)) => {
@@ -158,13 +172,30 @@ pub struct AdminClient {
 }
 
 impl AdminClient {
-    /// Creates a new client builder.
+    /// Creates a builder for constructing an [`AdminClient`].
+    ///
+    /// # Examples
+    ///
+    /// ```rust
+    /// # use otap_df_admin_api::{AdminClient, AdminEndpoint, HttpAdminClientSettings};
+    /// # fn example() -> Result<(), otap_df_admin_api::Error> {
+    /// let client = AdminClient::builder()
+    ///     .http(HttpAdminClientSettings::new(AdminEndpoint::http(
+    ///         "engine-a.internal.example",
+    ///         8080,
+    ///     )))
+    ///     .build()?;
+    ///
+    /// # let _ = client;
+    /// # Ok(())
+    /// # }
+    /// ```
     #[must_use]
     pub fn builder() -> AdminClientBuilder {
         AdminClientBuilder::new()
     }
 
-    /// Returns the engine-scoped resource client.
+    /// Returns the engine-scoped resource client for engine-wide status and probes.
     #[must_use]
     pub fn engine(&self) -> EngineClient<'_> {
         EngineClient {
@@ -172,15 +203,15 @@ impl AdminClient {
         }
     }
 
-    /// Returns the pipeline-group-scoped resource client.
+    /// Returns the group-scoped resource client for fleet-style status and shutdown operations.
     #[must_use]
-    pub fn pipeline_groups(&self) -> PipelineGroupsClient<'_> {
-        PipelineGroupsClient {
+    pub fn groups(&self) -> GroupsClient<'_> {
+        GroupsClient {
             backend: self.backend.as_ref(),
         }
     }
 
-    /// Returns the pipeline-scoped resource client.
+    /// Returns the pipeline-scoped resource client for per-pipeline status and live control.
     #[must_use]
     pub fn pipelines(&self) -> PipelinesClient<'_> {
         PipelinesClient {
@@ -188,7 +219,7 @@ impl AdminClient {
         }
     }
 
-    /// Returns the telemetry-scoped resource client.
+    /// Returns the telemetry-scoped resource client for logs and structured metrics.
     #[must_use]
     pub fn telemetry(&self) -> TelemetryClient<'_> {
         TelemetryClient {
@@ -204,40 +235,154 @@ pub struct EngineClient<'a> {
 }
 
 impl EngineClient<'_> {
-    /// Returns global pipeline status.
+    /// Returns the current engine-wide status snapshot.
+    ///
+    /// Use this when you need a cross-pipeline view of the running engine.
+    ///
+    /// # Examples
+    ///
+    /// ```rust
+    /// # use otap_df_admin_api::{AdminClient, AdminEndpoint, HttpAdminClientSettings};
+    /// # async fn example() -> Result<(), Box<dyn std::error::Error>> {
+    /// # let client = AdminClient::builder()
+    /// #     .http(HttpAdminClientSettings::new(AdminEndpoint::http(
+    /// #         "engine-a.internal.example",
+    /// #         8080,
+    /// #     )))
+    /// #     .build()?;
+    /// let status = client.engine().status().await?;
+    /// println!("pipelines={}", status.pipelines.len());
+    /// # Ok(())
+    /// # }
+    /// ```
    pub async fn status(&self) -> Result<engine::Status, Error> {
         self.backend.engine_status().await
     }
 
-    /// Returns the global liveness probe response.
+    /// Returns the engine liveness probe result.
+    ///
+    /// This is the SDK equivalent of checking whether the engine process is
+    /// live enough to keep serving admin traffic.
+    ///
+    /// # Examples
+    ///
+    /// ```rust
+    /// # use otap_df_admin_api::{AdminClient, AdminEndpoint, HttpAdminClientSettings};
+    /// # async fn example() -> Result<(), Box<dyn std::error::Error>> {
+    /// # let client = AdminClient::builder()
+    /// #     .http(HttpAdminClientSettings::new(AdminEndpoint::http(
+    /// #         "engine-a.internal.example",
+    /// #         8080,
+    /// #     )))
+    /// #     .build()?;
+    /// let probe = client.engine().livez().await?;
+    /// println!("livez={:?}", probe.status);
+    /// # Ok(())
+    /// # }
+    /// ```
     pub async fn livez(&self) -> Result<engine::ProbeResponse, Error> {
         self.backend.engine_livez().await
     }
 
-    /// Returns the global readiness probe response.
+    /// Returns the engine readiness probe result.
+    ///
+    /// Use this when orchestration or callers need to know whether the engine
+    /// currently considers itself ready.
+    ///
+    /// # Examples
+    ///
+    /// ```rust
+    /// # use otap_df_admin_api::{AdminClient, AdminEndpoint, HttpAdminClientSettings};
+    /// # async fn example() -> Result<(), Box<dyn std::error::Error>> {
+    /// # let client = AdminClient::builder()
+    /// #     .http(HttpAdminClientSettings::new(AdminEndpoint::http(
+    /// #         "engine-a.internal.example",
+    /// #         8080,
+    /// #     )))
+    /// #     .build()?;
+    /// let probe = client.engine().readyz().await?;
+    /// println!("readyz={:?}", probe.status);
+    /// # Ok(())
+    /// # }
+    /// ```
     pub async fn readyz(&self) -> Result<engine::ProbeResponse, Error> {
         self.backend.engine_readyz().await
     }
 }
 
-/// Pipeline-group-scoped admin client.
+/// Group-scoped admin client.
 #[derive(Clone, Copy)]
-pub struct PipelineGroupsClient<'a> {
+pub struct GroupsClient<'a> {
     backend: &'a dyn AdminBackend,
 }
 
-impl PipelineGroupsClient<'_> {
-    /// Returns pipeline-group status.
-    pub async fn status(&self) -> Result<pipeline_groups::Status, Error> {
-        self.backend.pipeline_groups_status().await
+impl GroupsClient<'_> {
+    /// Returns a group-wide status snapshot across logical pipelines.
+    ///
+    /// Use this as a fleet-style overview when you do not need full
+    /// engine-wide detail from [`EngineClient::status`].
+    ///
+    /// # Examples
+    ///
+    /// ```rust
+    /// # use otap_df_admin_api::{AdminClient, AdminEndpoint, HttpAdminClientSettings};
+    /// # async fn example() -> Result<(), Box<dyn std::error::Error>> {
+    /// # let client = AdminClient::builder()
+    /// #     .http(HttpAdminClientSettings::new(AdminEndpoint::http(
+    /// #         "engine-a.internal.example",
+    /// #         8080,
+    /// #     )))
+    /// #     .build()?;
+    /// let status = client.groups().status().await?;
+    /// println!("pipelines={}", status.pipelines.len());
+    /// # Ok(())
+    /// # }
+    /// ```
+    pub async fn status(&self) -> Result<groups::Status, Error> {
+        self.backend.groups_status().await
     }
 
-    /// Requests shutdown for all pipelines.
+    /// Requests coordinated shutdown for all running logical pipelines.
+    ///
+    /// Use `options.wait` to choose whether the call should return immediately
+    /// with the server's current shutdown response or wait up to
+    /// `options.timeout_secs` for a terminal shutdown result.
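+    ///
+    /// To stop a single logical pipeline instead of the whole fleet, see
+    /// [`PipelinesClient::shutdown`].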
+    ///
+    /// # Examples
+    ///
+    /// ```rust
+    /// # use otap_df_admin_api::{
+    /// #     groups, operations, AdminClient, AdminEndpoint, HttpAdminClientSettings,
+    /// # };
+    /// # async fn example() -> Result<(), Box<dyn std::error::Error>> {
+    /// # let client = AdminClient::builder()
+    /// #     .http(HttpAdminClientSettings::new(AdminEndpoint::http(
+    /// #         "engine-a.internal.example",
+    /// #         8080,
+    /// #     )))
+    /// #     .build()?;
+    /// let response = client
+    ///     .groups()
+    ///     .shutdown(&operations::OperationOptions {
+    ///         wait: true,
+    ///         timeout_secs: 30,
+    ///     })
+    ///     .await?;
+    ///
+    /// if matches!(
+    ///     response.status,
+    ///     groups::ShutdownStatus::Failed | groups::ShutdownStatus::Timeout
+    /// ) {
+    ///     eprintln!("shutdown issues: {:?}", response.errors);
+    /// }
+    /// # Ok(())
+    /// # }
+    /// ```
     pub async fn shutdown(
         &self,
         options: &operations::OperationOptions,
-    ) -> Result<pipeline_groups::ShutdownResponse, Error> {
-        self.backend.pipeline_groups_shutdown(options).await
+    ) -> Result<groups::ShutdownResponse, Error> {
+        self.backend.groups_shutdown(options).await
     }
 }
 
@@ -248,7 +393,202 @@ pub struct PipelinesClient<'a> {
 }
 
 impl PipelinesClient<'_> {
-    /// Returns status for one pipeline.
+    /// Returns the committed live configuration for one logical pipeline.
+    ///
+    /// Use this when you need the configuration that the controller currently
+    /// treats as active. This does not include per-core runtime progress or
+    /// overlapping-instance state; use [`Self::status`] for runtime status.
+    ///
+    /// Returns `Ok(None)` when the logical pipeline is not found.
+    ///
+    /// # Examples
+    ///
+    /// ```rust
+    /// # use otap_df_admin_api::{AdminClient, AdminEndpoint, HttpAdminClientSettings};
+    /// # async fn example() -> Result<(), Box<dyn std::error::Error>> {
+    /// # let client = AdminClient::builder()
+    /// #     .http(HttpAdminClientSettings::new(AdminEndpoint::http(
+    /// #         "engine-a.internal.example",
+    /// #         8080,
+    /// #     )))
+    /// #     .build()?;
+    /// if let Some(details) = client
+    ///     .pipelines()
+    ///     .details("tenant-a", "ingest")
+    ///     .await?
+    /// {
+    ///     println!("active_generation={:?}", details.active_generation);
+    /// }
+    /// # Ok(())
+    /// # }
+    /// ```
+    pub async fn details(
+        &self,
+        pipeline_group_id: &str,
+        pipeline_id: &str,
+    ) -> Result<Option<pipelines::PipelineDetails>, Error> {
+        self.backend
+            .pipeline_details(pipeline_group_id, pipeline_id)
+            .await
+    }
+
+    /// Submits a live reconfiguration request for one logical pipeline.
+    ///
+    /// The controller may treat the request as a create, resize, replace, or
+    /// no-op depending on how the submitted configuration differs from the
+    /// current committed pipeline.
+    ///
+    /// With `options.wait = false`, this returns as soon as the request has
+    /// either been accepted for background execution or already completed,
+    /// yielding [`pipelines::ReconfigureOutcome::Accepted`] or
+    /// [`pipelines::ReconfigureOutcome::Completed`].
+    ///
+    /// With `options.wait = true`, this waits up to `options.timeout_secs` for
+    /// a terminal result and returns the latest rollout snapshot as
+    /// [`pipelines::ReconfigureOutcome::Completed`],
+    /// [`pipelines::ReconfigureOutcome::Failed`], or
+    /// [`pipelines::ReconfigureOutcome::TimedOut`].
+    ///
+    /// If the server rejects the request before a rollout starts, this returns
+    /// [`Error::AdminOperation`].
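+    ///
+    /// After an [`pipelines::ReconfigureOutcome::Accepted`] outcome, use
+    /// [`Self::rollout_status`] with the returned `rollout_id` to poll the
+    /// rollout to a terminal state.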
+    ///
+    /// # Examples
+    ///
+    /// ```rust
+    /// # use otap_df_admin_api::{
+    /// #     config::pipeline::{PipelineConfigBuilder, PipelineType},
+    /// #     operations, pipelines, AdminClient, AdminEndpoint, HttpAdminClientSettings,
+    /// # };
+    /// # async fn example() -> Result<(), Box<dyn std::error::Error>> {
+    /// # let client = AdminClient::builder()
+    /// #     .http(HttpAdminClientSettings::new(AdminEndpoint::http(
+    /// #         "engine-a.internal.example",
+    /// #         8080,
+    /// #     )))
+    /// #     .build()?;
+    /// # let request = pipelines::ReconfigureRequest {
+    /// #     pipeline: PipelineConfigBuilder::new()
+    /// #         .add_receiver("ingress", "receiver:otlp", None)
+    /// #         .add_exporter("egress", "exporter:debug", None)
+    /// #         .to("ingress", "egress")
+    /// #         .build(PipelineType::Otap, "tenant-a", "ingest")?,
+    /// #     step_timeout_secs: 60,
+    /// #     drain_timeout_secs: 60,
+    /// # };
+    /// let outcome = client
+    ///     .pipelines()
+    ///     .reconfigure(
+    ///         "tenant-a",
+    ///         "ingest",
+    ///         &request,
+    ///         &operations::OperationOptions {
+    ///             wait: true,
+    ///             timeout_secs: 120,
+    ///         },
+    ///     )
+    ///     .await?;
+    ///
+    /// match outcome {
+    ///     pipelines::ReconfigureOutcome::Completed(status) => {
+    ///         println!("rolled out generation {}", status.target_generation);
+    ///     }
+    ///     pipelines::ReconfigureOutcome::Accepted(status) => {
+    ///         println!("poll rollout {}", status.rollout_id);
+    ///     }
+    ///     pipelines::ReconfigureOutcome::Failed(status)
+    ///     | pipelines::ReconfigureOutcome::TimedOut(status) => {
+    ///         eprintln!("rollout state: {:?}", status.state);
+    ///     }
+    /// }
+    /// # Ok(())
+    /// # }
+    /// ```
+    pub async fn reconfigure(
+        &self,
+        pipeline_group_id: &str,
+        pipeline_id: &str,
+        request: &pipelines::ReconfigureRequest,
+        options: &operations::OperationOptions,
+    ) -> Result<pipelines::ReconfigureOutcome, Error> {
+        self.backend
+            .pipeline_reconfigure(pipeline_group_id, pipeline_id, request, options)
+            .await
+    }
+
+    /// Returns the latest known status for one previously created rollout.
+    ///
+    /// Use the `rollout_id` returned from [`Self::reconfigure`] to poll an
+    /// asynchronous reconfiguration operation after an
+    /// [`pipelines::ReconfigureOutcome::Accepted`] result.
+    ///
+    /// Returns `Ok(None)` when the requested rollout status resource is not
+    /// found. Terminal rollout history is retained only within a bounded
+    /// in-memory window, so older rollout ids may also return `Ok(None)` after
+    /// eviction.
+    ///
+    /// # Examples
+    ///
+    /// ```rust
+    /// # use otap_df_admin_api::{AdminClient, AdminEndpoint, HttpAdminClientSettings};
+    /// # async fn example() -> Result<(), Box<dyn std::error::Error>> {
+    /// # let client = AdminClient::builder()
+    /// #     .http(HttpAdminClientSettings::new(AdminEndpoint::http(
+    /// #         "engine-a.internal.example",
+    /// #         8080,
+    /// #     )))
+    /// #     .build()?;
+    /// let rollout_id = "rollout-42";
+    ///
+    /// if let Some(status) = client
+    ///     .pipelines()
+    ///     .rollout_status("tenant-a", "ingest", rollout_id)
+    ///     .await?
+    /// {
+    ///     println!("rollout_state={:?}", status.state);
+    /// }
+    /// # Ok(())
+    /// # }
+    /// ```
+    pub async fn rollout_status(
+        &self,
+        pipeline_group_id: &str,
+        pipeline_id: &str,
+        rollout_id: &str,
+    ) -> Result<Option<pipelines::PipelineRolloutStatus>, Error> {
+        self.backend
+            .pipeline_rollout_status(pipeline_group_id, pipeline_id, rollout_id)
+            .await
+    }
+
+    /// Returns the current runtime status for one logical pipeline.
+    ///
+    /// Use this when you need per-core phase, overlapping-instance state,
+    /// rollout summaries, or other runtime progress. Use [`Self::details`] when
+    /// you need the committed live configuration instead.
+    ///
+    /// Returns `Ok(None)` when the logical pipeline is not found.
+    ///
+    /// # Examples
+    ///
+    /// ```rust
+    /// # use otap_df_admin_api::{AdminClient, AdminEndpoint, HttpAdminClientSettings};
+    /// # async fn example() -> Result<(), Box<dyn std::error::Error>> {
+    /// # let client = AdminClient::builder()
+    /// #     .http(HttpAdminClientSettings::new(AdminEndpoint::http(
+    /// #         "engine-a.internal.example",
+    /// #         8080,
+    /// #     )))
+    /// #     .build()?;
+    /// if let Some(status) = client
+    ///     .pipelines()
+    ///     .status("tenant-a", "ingest")
+    ///     .await?
+    /// {
+    ///     println!("running_cores={}", status.running_cores);
+    /// }
+    /// # Ok(())
+    /// # }
+    /// ```
     pub async fn status(
         &self,
         pipeline_group_id: &str,
@@ -259,7 +599,137 @@ impl PipelinesClient<'_> {
             .await
     }
 
-    /// Returns the liveness probe for one pipeline.
+    /// Requests shutdown of the currently running instances for one logical pipeline.
+    ///
+    /// With `options.wait = false`, this returns as soon as the shutdown has
+    /// either been accepted for background execution or already completed,
+    /// yielding [`pipelines::ShutdownOutcome::Accepted`] or
+    /// [`pipelines::ShutdownOutcome::Completed`].
+    ///
+    /// With `options.wait = true`, this waits up to `options.timeout_secs` for
+    /// a terminal result and returns the latest shutdown snapshot as
+    /// [`pipelines::ShutdownOutcome::Completed`],
+    /// [`pipelines::ShutdownOutcome::Failed`], or
+    /// [`pipelines::ShutdownOutcome::TimedOut`].
+    ///
+    /// If the server rejects the request before shutdown work starts, this
+    /// returns [`Error::AdminOperation`].
+    ///
+    /// # Examples
+    ///
+    /// ```rust
+    /// # use otap_df_admin_api::{operations, pipelines, AdminClient, AdminEndpoint, HttpAdminClientSettings};
+    /// # async fn example() -> Result<(), Box<dyn std::error::Error>> {
+    /// # let client = AdminClient::builder()
+    /// #     .http(HttpAdminClientSettings::new(AdminEndpoint::http(
+    /// #         "engine-a.internal.example",
+    /// #         8080,
+    /// #     )))
+    /// #     .build()?;
+    /// let outcome = client
+    ///     .pipelines()
+    ///     .shutdown(
+    ///         "tenant-a",
+    ///         "ingest",
+    ///         &operations::OperationOptions {
+    ///             wait: true,
+    ///             timeout_secs: 60,
+    ///         },
+    ///     )
+    ///     .await?;
+    ///
+    /// match outcome {
+    ///     pipelines::ShutdownOutcome::Completed(status) => {
+    ///         println!("shutdown completed: {}", status.shutdown_id);
+    ///     }
+    ///     pipelines::ShutdownOutcome::Accepted(status) => {
+    ///         println!("poll shutdown {}", status.shutdown_id);
+    ///     }
+    ///     pipelines::ShutdownOutcome::Failed(status)
+    ///     | pipelines::ShutdownOutcome::TimedOut(status) => {
+    ///         eprintln!("shutdown state: {}", status.state);
+    ///     }
+    /// }
+    /// # Ok(())
+    /// # }
+    /// ```
+    pub async fn shutdown(
+        &self,
+        pipeline_group_id: &str,
+        pipeline_id: &str,
+        options: &operations::OperationOptions,
+    ) -> Result<pipelines::ShutdownOutcome, Error> {
+        self.backend
+            .pipeline_shutdown(pipeline_group_id, pipeline_id, options)
+            .await
+    }
+
+    /// Returns the latest known status for one previously created shutdown operation.
+    ///
+    /// Use the `shutdown_id` returned from [`Self::shutdown`] to poll an
+    /// asynchronous shutdown after an
+    /// [`pipelines::ShutdownOutcome::Accepted`] result.
+    ///
+    /// Returns `Ok(None)` when the requested shutdown status resource is not
+    /// found. Terminal shutdown history is retained only within a bounded
+    /// in-memory window, so older shutdown ids may also return `Ok(None)` after
+    /// eviction.
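+    ///
+    /// Because of that eviction window, long-delayed pollers should treat
+    /// `Ok(None)` as "no longer known" rather than "never existed".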
+    ///
+    /// # Examples
+    ///
+    /// ```rust
+    /// # use otap_df_admin_api::{AdminClient, AdminEndpoint, HttpAdminClientSettings};
+    /// # async fn example() -> Result<(), Box<dyn std::error::Error>> {
+    /// # let client = AdminClient::builder()
+    /// #     .http(HttpAdminClientSettings::new(AdminEndpoint::http(
+    /// #         "engine-a.internal.example",
+    /// #         8080,
+    /// #     )))
+    /// #     .build()?;
+    /// let shutdown_id = "shutdown-42";
+    ///
+    /// if let Some(status) = client
+    ///     .pipelines()
+    ///     .shutdown_status("tenant-a", "ingest", shutdown_id)
+    ///     .await?
+    /// {
+    ///     println!("shutdown_state={}", status.state);
+    /// }
+    /// # Ok(())
+    /// # }
+    /// ```
+    pub async fn shutdown_status(
+        &self,
+        pipeline_group_id: &str,
+        pipeline_id: &str,
+        shutdown_id: &str,
+    ) -> Result<Option<pipelines::PipelineShutdownStatus>, Error> {
+        self.backend
+            .pipeline_shutdown_status(pipeline_group_id, pipeline_id, shutdown_id)
+            .await
+    }
+
+    /// Returns the liveness probe result for one logical pipeline.
+    ///
+    /// # Examples
+    ///
+    /// ```rust
+    /// # use otap_df_admin_api::{pipelines, AdminClient, AdminEndpoint, HttpAdminClientSettings};
+    /// # async fn example() -> Result<(), Box<dyn std::error::Error>> {
+    /// # let client = AdminClient::builder()
+    /// #     .http(HttpAdminClientSettings::new(AdminEndpoint::http(
+    /// #         "engine-a.internal.example",
+    /// #         8080,
+    /// #     )))
+    /// #     .build()?;
+    /// let probe = client.pipelines().livez("tenant-a", "ingest").await?;
+    ///
+    /// if probe.status == pipelines::ProbeStatus::Failed {
+    ///     eprintln!("pipeline is not live: {:?}", probe.message);
+    /// }
+    /// # Ok(())
+    /// # }
+    /// ```
     pub async fn livez(
         &self,
         pipeline_group_id: &str,
@@ -270,7 +740,27 @@ impl PipelinesClient<'_> {
             .await
     }
 
-    /// Returns the readiness probe for one pipeline.
+    /// Returns the readiness probe result for one logical pipeline.
+    ///
+    /// # Examples
+    ///
+    /// ```rust
+    /// # use otap_df_admin_api::{pipelines, AdminClient, AdminEndpoint, HttpAdminClientSettings};
+    /// # async fn example() -> Result<(), Box<dyn std::error::Error>> {
+    /// # let client = AdminClient::builder()
+    /// #     .http(HttpAdminClientSettings::new(AdminEndpoint::http(
+    /// #         "engine-a.internal.example",
+    /// #         8080,
+    /// #     )))
+    /// #     .build()?;
+    /// let probe = client.pipelines().readyz("tenant-a", "ingest").await?;
+    ///
+    /// if probe.status == pipelines::ProbeStatus::Failed {
+    ///     eprintln!("pipeline is not ready: {:?}", probe.message);
+    /// }
+    /// # Ok(())
+    /// # }
+    /// ```
     pub async fn readyz(
         &self,
         pipeline_group_id: &str,
@@ -289,7 +779,39 @@ pub struct TelemetryClient<'a> {
 }
 
 impl TelemetryClient<'_> {
-    /// Returns retained logs or `None` when the logs endpoint is unavailable.
+    /// Returns retained admin logs.
+    ///
+    /// Use [`telemetry::LogsQuery`] to request only entries newer than a known
+    /// sequence number or to cap the number of returned entries.
+    ///
+    /// Returns `Ok(None)` when retained logs are not available on the target
+    /// engine.
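+    ///
+    /// To tail logs incrementally, callers can typically feed the `next_seq`
+    /// value from the previous response back in as the next query's `after`
+    /// value.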
+    ///
+    /// # Examples
+    ///
+    /// ```rust
+    /// # use otap_df_admin_api::{telemetry, AdminClient, AdminEndpoint, HttpAdminClientSettings};
+    /// # async fn example() -> Result<(), Box<dyn std::error::Error>> {
+    /// # let client = AdminClient::builder()
+    /// #     .http(HttpAdminClientSettings::new(AdminEndpoint::http(
+    /// #         "engine-a.internal.example",
+    /// #         8080,
+    /// #     )))
+    /// #     .build()?;
+    /// let logs = client
+    ///     .telemetry()
+    ///     .logs(&telemetry::LogsQuery {
+    ///         after: Some(1_000),
+    ///         limit: Some(200),
+    ///     })
+    ///     .await?;
+    ///
+    /// if let Some(logs) = logs {
+    ///     println!("next_seq={}", logs.next_seq);
+    /// }
+    /// # Ok(())
+    /// # }
+    /// ```
     pub async fn logs(
         &self,
         query: &telemetry::LogsQuery,
@@ -297,7 +819,35 @@ impl TelemetryClient<'_> {
         self.backend.telemetry_logs(query).await
     }
 
-    /// Returns full structured metrics.
+    /// Returns structured metrics with descriptor metadata for each metric field.
+    ///
+    /// Use this form when callers need metric names, units, instrument kinds,
+    /// or temporality alongside metric values.
+    ///
+    /// # Examples
+    ///
+    /// ```rust
+    /// # use otap_df_admin_api::{telemetry, AdminClient, AdminEndpoint, HttpAdminClientSettings};
+    /// # async fn example() -> Result<(), Box<dyn std::error::Error>> {
+    /// # let client = AdminClient::builder()
+    /// #     .http(HttpAdminClientSettings::new(AdminEndpoint::http(
+    /// #         "engine-a.internal.example",
+    /// #         8080,
+    /// #     )))
+    /// #     .build()?;
+    /// let metrics = client
+    ///     .telemetry()
+    ///     .metrics(&telemetry::MetricsOptions::default())
+    ///     .await?;
+    ///
+    /// if let Some(metric_set) = metrics.metric_sets.first() {
+    ///     for point in &metric_set.metrics {
+    ///         println!("{} {}", point.metadata.name, point.metadata.unit);
+    ///     }
+    /// }
+    /// # Ok(())
+    /// # }
+    /// ```
     pub async fn metrics(
         &self,
         options: &telemetry::MetricsOptions,
@@ -305,7 +855,33 @@ impl TelemetryClient<'_> {
         self.backend.telemetry_metrics(options).await
     }
 
-    /// Returns compact structured metrics.
+    /// Returns structured metrics without per-field descriptor metadata.
+    ///
+    /// Use this form when callers only need current metric values and want a
+    /// smaller response payload than [`Self::metrics`].
+    ///
+    /// # Examples
+    ///
+    /// ```rust
+    /// # use otap_df_admin_api::{telemetry, AdminClient, AdminEndpoint, HttpAdminClientSettings};
+    /// # async fn example() -> Result<(), Box<dyn std::error::Error>> {
+    /// # let client = AdminClient::builder()
+    /// #     .http(HttpAdminClientSettings::new(AdminEndpoint::http(
+    /// #         "engine-a.internal.example",
+    /// #         8080,
+    /// #     )))
+    /// #     .build()?;
+    /// let metrics = client
+    ///     .telemetry()
+    ///     .metrics_compact(&telemetry::MetricsOptions::default())
+    ///     .await?;
+    ///
+    /// if let Some(metric_set) = metrics.metric_sets.first() {
+    ///     println!("value_count={}", metric_set.metrics.len());
+    /// }
+    /// # Ok(())
+    /// # }
+    /// ```
     pub async fn metrics_compact(
         &self,
         options: &telemetry::MetricsOptions,
@@ -320,17 +896,35 @@ pub(crate) trait AdminBackend: Send + Sync {
     async fn engine_livez(&self) -> Result<engine::ProbeResponse, Error>;
     async fn engine_readyz(&self) -> Result<engine::ProbeResponse, Error>;
 
-    async fn pipeline_groups_status(&self) -> Result<pipeline_groups::Status, Error>;
-    async fn pipeline_groups_shutdown(
+    async fn groups_status(&self) -> Result<groups::Status, Error>;
+    async fn groups_shutdown(
         &self,
         options: &operations::OperationOptions,
-    ) -> Result<pipeline_groups::ShutdownResponse, Error>;
+    ) -> Result<groups::ShutdownResponse, Error>;
 
     async fn pipeline_status(
         &self,
         pipeline_group_id: &str,
         pipeline_id: &str,
     ) -> Result<Option<pipelines::Status>, Error>;
+    async fn pipeline_details(
+        &self,
+        pipeline_group_id: &str,
+        pipeline_id: &str,
+    ) -> Result<Option<pipelines::PipelineDetails>, Error>;
+    async fn pipeline_reconfigure(
+        &self,
+        pipeline_group_id: &str,
+        pipeline_id: &str,
+        request: &pipelines::ReconfigureRequest,
+        options: &operations::OperationOptions,
+    ) -> Result<pipelines::ReconfigureOutcome, Error>;
+    async fn pipeline_rollout_status(
+        &self,
+        pipeline_group_id: &str,
+        pipeline_id: &str,
+        rollout_id: &str,
+    ) -> Result<Option<pipelines::PipelineRolloutStatus>, Error>;
     async fn pipeline_livez(
         &self,
         pipeline_group_id: &str,
@@ -341,6 +935,18 @@ pub(crate) trait AdminBackend: Send + Sync {
         pipeline_group_id: &str,
         pipeline_id: &str,
     ) -> Result<pipelines::ProbeResponse, Error>;
+    async fn pipeline_shutdown(
+        &self,
+        pipeline_group_id: &str,
+        pipeline_id: &str,
+        options: &operations::OperationOptions,
+    ) -> Result<pipelines::ShutdownOutcome, Error>;
+    async fn pipeline_shutdown_status(
+        &self,
+        pipeline_group_id: &str,
+        pipeline_id: &str,
+        shutdown_id: &str,
+    ) -> Result<Option<pipelines::PipelineShutdownStatus>, Error>;
 
     async fn telemetry_logs(
         &self,
diff --git a/rust/otap-dataflow/crates/admin-api/src/endpoint.rs b/rust/otap-dataflow/crates/admin-api/src/endpoint.rs
index 455dd1d728..87e31bf5ce 100644
--- a/rust/otap-dataflow/crates/admin-api/src/endpoint.rs
+++ b/rust/otap-dataflow/crates/admin-api/src/endpoint.rs
@@ -17,7 +17,7 @@ pub enum AdminScheme {
 }
 
 impl AdminScheme {
-    /// Returns the URL scheme string.
+    /// Returns the URL scheme string used when building admin endpoint URLs.
     #[must_use]
     pub const fn as_str(self) -> &'static str {
         match self {
@@ -49,7 +49,9 @@ pub struct AdminEndpoint {
 }
 
 impl AdminEndpoint {
-    /// Creates a new endpoint.
+    /// Creates an endpoint from explicit scheme, host, and port components.
+    ///
+    /// This validates the endpoint fields before returning.
     pub fn new(
         scheme: AdminScheme,
         host: impl Into<String>,
@@ -65,7 +67,10 @@ impl AdminEndpoint {
         Ok(endpoint)
     }
 
-    /// Creates an HTTP endpoint.
+    /// Creates an HTTP endpoint for direct plaintext admin access.
+    ///
+    /// This constructor does not fail; validation happens later when the client
+    /// is built or when the endpoint is used to construct URLs.
     #[must_use]
     pub fn http(host: impl Into<String>, port: u16) -> Self {
         Self {
@@ -76,7 +81,10 @@ impl AdminEndpoint {
         }
     }
 
-    /// Creates an HTTPS endpoint.
+    /// Creates an HTTPS endpoint for admin access over TLS.
+    ///
+    /// Pair this with [`HttpAdminClientSettings::with_tls`](crate::HttpAdminClientSettings::with_tls)
+    /// when the server or upstream gateway requires custom CA trust or mTLS.
     #[must_use]
     pub fn https(host: impl Into<String>, port: u16) -> Self {
         Self {
@@ -87,13 +95,22 @@ impl AdminEndpoint {
         }
     }
 
-    /// Creates an endpoint from a socket address using HTTP.
+    /// Creates an HTTP endpoint from a socket address.
+    ///
+    /// This is mainly useful for local engines discovered as a concrete bind
+    /// address.
     #[must_use]
    pub fn from_socket_addr(addr: SocketAddr) -> Self {
         Self::http(addr.ip().to_string(), addr.port())
     }
 
     /// Creates an endpoint from a full base URL.
+    ///
+    /// Use this when the admin API is exposed behind a gateway or reverse proxy
+    /// and you want to preserve the URL prefix in `base_path`.
+    ///
+    /// Query strings and fragments are rejected because SDK routes are built by
+    /// appending `/api/v1/...` path segments to this base URL.
     pub fn from_url(url: &str) -> Result<Self, EndpointError> {
         let parsed = Url::parse(url).map_err(|err| EndpointError::UrlParse {
             url: url.to_string(),
@@ -136,14 +153,20 @@ impl AdminEndpoint {
         Ok(endpoint)
     }
 
-    /// Sets the base path used for URL construction.
+    /// Sets the URL path prefix used when building admin request URLs.
+    ///
+    /// This is useful when the engine is published behind a path-prefixed
+    /// gateway such as `/engine-a`.
     pub fn with_base_path(mut self, base_path: impl Into<String>) -> Result<Self, EndpointError> {
         self.base_path = Some(base_path.into());
         self.validate()?;
         Ok(self)
     }
 
-    /// Validates the endpoint fields.
+    /// Validates the endpoint fields without building a client.
+    ///
+    /// Most callers do not need to call this directly because client creation
+    /// and URL construction validate automatically.
     pub fn validate(&self) -> Result<(), EndpointError> {
         if self.host.trim().is_empty() {
             return Err(EndpointError::EmptyHost);
@@ -159,7 +182,11 @@ impl AdminEndpoint {
         Ok(())
     }
 
-    /// Builds a URL for the provided path segments.
+    /// Builds a concrete URL by appending path segments to this endpoint.
+    ///
+    /// Most SDK callers do not need this directly because the built-in HTTP
+    /// transport uses it internally. It is mainly useful for custom transports,
+    /// tests, or diagnostics.
    pub fn url_for_segments<'a, I>(&self, segments: I) -> Result<Url, EndpointError>
     where
         I: IntoIterator<Item = &'a str>,
diff --git a/rust/otap-dataflow/crates/admin-api/src/error.rs b/rust/otap-dataflow/crates/admin-api/src/error.rs
index 41947b57e3..02a4f3b0f0 100644
--- a/rust/otap-dataflow/crates/admin-api/src/error.rs
+++ b/rust/otap-dataflow/crates/admin-api/src/error.rs
@@ -3,6 +3,7 @@
 
 //! Error types for the public admin SDK.
 
+use crate::operations::OperationError;
 use thiserror::Error;
 
 /// Endpoint validation and URL construction errors.
@@ -89,6 +90,19 @@ pub enum Error {
         details: String,
     },
 
+    /// The server rejected a live admin operation request before work started.
+    ///
+    /// This wraps a typed [`OperationError`] for request-level rejections such
+    /// as not found, conflict, or invalid request. Use the operation outcome
+    /// enums for requests that were accepted and later failed or timed out.
+    #[error("admin operation rejected with status {status}: {error:?}")]
+    AdminOperation {
+        /// HTTP status code.
+        status: u16,
+        /// Typed control-plane rejection details.
+        error: OperationError,
+    },
+
     /// Remote endpoint returned an unexpected HTTP status.
     #[error("admin endpoint returned unexpected status {status} for {method} {url}")]
     RemoteStatus {
diff --git a/rust/otap-dataflow/crates/admin-api/src/http_backend.rs b/rust/otap-dataflow/crates/admin-api/src/http_backend.rs
index fa7d370d70..c7487a9c8a 100644
--- a/rust/otap-dataflow/crates/admin-api/src/http_backend.rs
+++ b/rust/otap-dataflow/crates/admin-api/src/http_backend.rs
@@ -5,7 +5,7 @@
 
 use crate::client::{AdminBackend, HttpAdminClientSettings};
 use crate::endpoint::{AdminAuth, AdminEndpoint, AdminScheme};
-use crate::{Error, engine, operations, pipeline_groups, pipelines, telemetry};
+use crate::{Error, engine, groups, operations, pipelines, telemetry};
 use async_trait::async_trait;
 use reqwest::{Certificate, ClientBuilder, Identity, Method, Url};
 use serde::de::DeserializeOwned;
@@ -16,6 +16,7 @@ use std::sync::OnceLock;
 struct RawRequest {
     method: Method,
     url: Url,
+    body: Option<Vec<u8>>,
 }
 
 struct RawResponse {
@@ -50,7 +51,7 @@ impl HttpBackend {
         expected_statuses: &[u16],
     ) -> Result<(u16, T), Error> {
         let (status, body) = self
-            .request_raw(method, segments, query, expected_statuses)
+            .request_raw(method, segments, query, None, expected_statuses)
             .await?;
         Ok((status, self.decode_json(&body)?))
     }
@@ -63,7 +64,7 @@ impl HttpBackend {
         expected_statuses: &[u16],
     ) -> Result<pipelines::ProbeResponse, Error> {
         let (status_code, body) = self
-            .request_raw(method, segments, query, expected_statuses)
+            .request_raw(method, segments, query, None, expected_statuses)
             .await?;
         let status = match status_code {
             200 => pipelines::ProbeStatus::Ok,
@@ -81,6 +82,7 @@ impl HttpBackend {
         method: Method,
         segments: &[&str],
         query: &[(&str, String)],
+        body: Option<Vec<u8>>,
         expected_statuses: &[u16],
     ) -> Result<(u16, Vec<u8>), Error> {
         let mut url = self.endpoint.url_for_segments(segments.iter().copied())?;
@@ -95,6 +97,7 @@ impl HttpBackend {
             .send(RawRequest {
                 method: method.clone(),
                 url,
+                body,
             })
             .await?;
 
@@ -111,13 +114,19 @@ impl HttpBackend {
     }
 
     async fn send(&self, request: RawRequest) -> Result<RawResponse, Error> {
-        let RawRequest { method, url } = request;
-        let builder = self.client.request(method, url.clone());
+        let RawRequest { method, url, body } = request;
+        let mut builder = self.client.request(method, url.clone());
 
         match self.auth {
             AdminAuth::None => {}
         }
 
+        if let Some(body) = body {
+            builder = builder
+                .header(reqwest::header::CONTENT_TYPE, "application/json")
+                .body(body);
+        }
+
         let response = builder.send().await.map_err(|err| Error::Transport {
             details: err.to_string(),
         })?;
@@ -145,6 +154,11 @@ impl HttpBackend {
             details: err.to_string(),
         })
     }
+
+    fn decode_operation_error(&self, status: u16, body: &[u8]) -> Result<Error, Error> {
+        let error = self.decode_json::<operations::OperationError>(body)?;
+        Ok(Error::AdminOperation { status, error })
+    }
 }
 
 #[async_trait]
impl AdminBackend for HttpBackend {
@@ -167,25 +181,20 @@ impl AdminBackend for HttpBackend {
             .map(|(_, body)| body)
     }
 
-    async fn pipeline_groups_status(&self) -> Result<pipeline_groups::Status, Error> {
-        self.request_json(
-            Method::GET,
-            &["api", "v1", "pipeline-groups", "status"],
-            &[],
-            &[200],
-        )
-        .await
-        .map(|(_, body)| body)
+    async fn groups_status(&self) -> Result<groups::Status, Error> {
+        self.request_json(Method::GET, &["api", "v1", "groups", "status"], &[], &[200])
+            .await
+            .map(|(_, body)| body)
     }
 
-    async fn pipeline_groups_shutdown(
+    async fn groups_shutdown(
         &self,
         options: &operations::OperationOptions,
-    ) -> Result<pipeline_groups::ShutdownResponse, Error> {
+    ) -> Result<groups::ShutdownResponse, Error> {
         let query = options.to_query_pairs();
         self.request_json(
             Method::POST,
-            &["api", "v1", "pipeline-groups", "shutdown"],
+            &["api", "v1", "groups", "shutdown"],
             &query,
             &[200, 202, 500, 504],
         )
@@ -193,6 +202,111 @@ impl AdminBackend for HttpBackend {
         .await
         .map(|(_, body)| body)
     }
 
+    async fn pipeline_details(
+        &self,
+        pipeline_group_id: &str,
+        pipeline_id: &str,
+    ) -> Result<Option<pipelines::PipelineDetails>, Error> {
+        let (status, body) = self
+            .request_raw(
+                Method::GET,
+                &[
+                    "api",
+                    "v1",
+                    "groups",
+                    pipeline_group_id,
+                    "pipelines",
+                    pipeline_id,
+                ],
+                &[],
+                None,
+                &[200, 404],
+            )
+            .await?;
+        if status == 404 {
+            return Ok(None);
+        }
+        self.decode_json(&body).map(Some)
+    }
+
+    async fn pipeline_reconfigure(
+        &self,
+        pipeline_group_id: &str,
+        pipeline_id: &str,
+        request: &pipelines::ReconfigureRequest,
+        options: &operations::OperationOptions,
+    ) -> Result<pipelines::ReconfigureOutcome, Error> {
+        let query = options.to_query_pairs();
+        let (status, body) = self
+            .request_raw(
+                Method::PUT,
+                &[
+                    "api",
+                    "v1",
+                    "groups",
+                    pipeline_group_id,
+                    "pipelines",
+                    pipeline_id,
+                ],
+                &query,
+                Some(
+                    serde_json::to_vec(request).map_err(|err| Error::ClientConfig {
+                        details: format!("failed to encode reconfigure request: {err}"),
+                    })?,
+                ),
+                &[200, 202, 404, 409, 422, 500, 504],
+            )
+            .await?;
+
+        match status {
+            200 => self
+                .decode_json(&body)
+                .map(pipelines::ReconfigureOutcome::Completed),
+            202 => self
+                .decode_json(&body)
+                .map(pipelines::ReconfigureOutcome::Accepted),
+            409 => match self.decode_json::<pipelines::PipelineRolloutStatus>(&body) {
+                Ok(status) => Ok(pipelines::ReconfigureOutcome::Failed(status)),
+                Err(_) => Err(self.decode_operation_error(status, &body)?),
+            },
+            504 => self
+                .decode_json(&body)
+                .map(pipelines::ReconfigureOutcome::TimedOut),
+            404 | 422 | 500 => Err(self.decode_operation_error(status, &body)?),
+            _ => unreachable!("request_raw should have filtered unexpected statuses"),
+        }
+    }
+
+    async fn pipeline_rollout_status(
+        &self,
+        pipeline_group_id: &str,
+        pipeline_id: &str,
+        rollout_id: &str,
+    ) -> Result<Option<pipelines::PipelineRolloutStatus>, Error> {
+        let (status, body) = self
+            .request_raw(
+                Method::GET,
+                &[
+                    "api",
+                    "v1",
+                    "groups",
+                    pipeline_group_id,
+                    "pipelines",
+                    pipeline_id,
+                    "rollouts",
+                    rollout_id,
+                ],
+                &[],
+                None,
+                &[200, 404],
+            )
+            .await?;
+        if status == 404 {
+            return Ok(None);
+        }
+        self.decode_json(&body).map(Some)
+    }
+
     async fn pipeline_status(
         &self,
         pipeline_group_id: &str,
@@ -203,7 +317,7 @@ impl AdminBackend for HttpBackend {
             &[
                 "api",
                 "v1",
-                "pipeline-groups",
+                "groups",
                 pipeline_group_id,
                 "pipelines",
                 pipeline_id,
@@ -226,7 +340,7 @@ impl AdminBackend for HttpBackend {
             &[
                 "api",
                 "v1",
-                "pipeline-groups",
+                "groups",
                 pipeline_group_id,
                 "pipelines",
                 pipeline_id,
@@ -248,7 +362,7 @@ impl AdminBackend for HttpBackend {
             &[
                 "api",
                 "v1",
-                "pipeline-groups",
+                "groups",
                 pipeline_group_id,
                 "pipelines",
                 pipeline_id,
@@ -260,6 +374,80 @@ impl AdminBackend for HttpBackend {
             .await
     }
 
+    async fn pipeline_shutdown(
+        &self,
+        pipeline_group_id: &str,
+        pipeline_id: &str,
+        options: &operations::OperationOptions,
+    ) -> Result<pipelines::ShutdownOutcome, Error> {
+        let query = options.to_query_pairs();
+        let (status, body) = self
+            .request_raw(
+                Method::POST,
+                &[
+                    "api",
+                    "v1",
+                    "groups",
+                    pipeline_group_id,
+                    "pipelines",
+                    pipeline_id,
+                    "shutdown",
+                ],
+                &query,
+                None,
+                &[200, 202, 404, 409, 422, 500, 504],
+            )
+            .await?;
+
+        match status {
+            200 => self
+                .decode_json(&body)
+                .map(pipelines::ShutdownOutcome::Completed),
+            202 => self
+                .decode_json(&body)
+                .map(pipelines::ShutdownOutcome::Accepted),
+            409 => match self.decode_json::<pipelines::PipelineShutdownStatus>(&body) {
+                Ok(status) => Ok(pipelines::ShutdownOutcome::Failed(status)),
+                Err(_) => Err(self.decode_operation_error(status, &body)?),
+            },
+            504 => self
+                .decode_json(&body)
+                .map(pipelines::ShutdownOutcome::TimedOut),
+            404 | 422 | 500 => Err(self.decode_operation_error(status, &body)?),
+            _ => unreachable!("request_raw should have filtered unexpected statuses"),
+        }
+    }
+
+    async fn pipeline_shutdown_status(
+        &self,
+        pipeline_group_id: &str,
+        pipeline_id: &str,
+        shutdown_id: &str,
+    ) -> Result<Option<pipelines::PipelineShutdownStatus>, Error> {
+        let (status, body) = self
+            .request_raw(
+                Method::GET,
+                &[
+                    "api",
+                    "v1",
+                    "groups",
+                    pipeline_group_id,
+                    "pipelines",
+                    pipeline_id,
+                    "shutdowns",
+                    shutdown_id,
+                ],
+                &[],
+                None,
+                &[200, 404],
+            )
+            .await?;
+        if status == 404 {
+            return Ok(None);
+        }
+        self.decode_json(&body).map(Some)
+    }
+
     async fn telemetry_logs(
         &self,
         query: &telemetry::LogsQuery,
@@ -270,6 +458,7 @@ impl AdminBackend for HttpBackend {
                 Method::GET,
                 &["api", "v1", "telemetry", "logs"],
                 &query_pairs,
+                None,
                 &[200, 404],
             )
             .await?;
@@ -533,15 +722,16 @@ fn ensure_crypto_provider() -> Result<(), Error> {
 mod tests {
     use super::*;
     use crate::config::tls::{TlsClientConfig, TlsConfig};
-    use crate::{AdminClient, engine, operations, pipeline_groups, pipelines, telemetry};
+    use crate::{AdminClient, engine, groups, operations, pipelines, telemetry};
     use otap_test_tls_certs::{ExtendedKeyUsage, generate_ca};
     use rustls_pki_types::{CertificateDer, PrivateKeyDer, pem::PemObject};
+    use serde_json::json;
     use std::sync::Arc;
     use tempfile::tempdir;
     use tokio::io::{AsyncReadExt, AsyncWriteExt};
     use tokio::net::TcpListener;
     use tokio_rustls::TlsAcceptor;
-    use wiremock::matchers::{method, path, query_param};
+    use wiremock::matchers::{body_json, method, path, query_param};
     use wiremock::{Mock, MockServer, ResponseTemplate};
 
    fn client(server: &MockServer) -> AdminClient {
@@ -552,6 +742,18 @@ mod tests {
             .expect("client should build")
     }
 
+    fn minimal_pipeline_json() -> serde_json::Value {
+        json!({
+            "type": "otap",
+            "nodes": {
+                "recv": {
+                    "type": "receiver:fake",
+                    "config": {}
+                }
+            }
+        })
+    }
+
     async fn start_https_json_server(
         server_cert_pem: &str,
         server_key_pem: &str,
@@ -682,25 +884,29 @@ mod tests {
         assert_eq!(response.status, engine::ProbeStatus::Failed);
     }
 
+    /// Scenario: the SDK calls the group shutdown endpoint with wait/query
+    /// options and the server returns a non-200 success body.
+    /// Guarantees: the HTTP backend targets `/api/v1/groups/shutdown`,
+    /// forwards the query parameters, and still decodes the accepted response.
     #[tokio::test]
-    async fn pipeline_groups_shutdown_accepts_query_and_non_200_success_shapes() {
+    async fn groups_shutdown_accepts_query_and_non_200_success_shapes() {
         let server = MockServer::start().await;
         Mock::given(method("POST"))
-            .and(path("/api/v1/pipeline-groups/shutdown"))
+            .and(path("/api/v1/groups/shutdown"))
             .and(query_param("wait", "true"))
             .and(query_param("timeout_secs", "30"))
-            .respond_with(ResponseTemplate::new(202).set_body_json(
-                pipeline_groups::ShutdownResponse {
-                    status: pipeline_groups::ShutdownStatus::Accepted,
-                    errors: None,
-                    duration_ms: None,
-                },
-            ))
+            .respond_with(
+                ResponseTemplate::new(202).set_body_json(groups::ShutdownResponse {
+                    status: groups::ShutdownStatus::Accepted,
+                    errors: None,
+                    duration_ms: None,
+                }),
+            )
             .mount(&server)
             .await;
 
         let response = client(&server)
-            .pipeline_groups()
+            .groups()
             .shutdown(&operations::OperationOptions {
                 wait: true,
                 timeout_secs: 30,
@@ -708,22 +914,47 @@ mod tests {
             })
             .await
             .expect("shutdown should decode");
 
-        assert_eq!(response.status, pipeline_groups::ShutdownStatus::Accepted);
+        assert_eq!(response.status, groups::ShutdownStatus::Accepted);
+    }
+
+    /// Scenario: a caller requests group status through the public SDK.
+    /// Guarantees: the HTTP backend uses the `/api/v1/groups/status` route
+    /// instead of the older pipeline-groups path and decodes the payload.
+    #[tokio::test]
+    async fn groups_status_uses_groups_route() {
+        let server = MockServer::start().await;
+        Mock::given(method("GET"))
+            .and(path("/api/v1/groups/status"))
+            .respond_with(ResponseTemplate::new(200).set_body_json(groups::Status {
+                generated_at: "2026-01-01T00:00:00Z".to_string(),
+                pipelines: Default::default(),
+            }))
+            .mount(&server)
+            .await;
+
+        let response = client(&server)
+            .groups()
+            .status()
+            .await
+            .expect("group status should decode");
+        assert_eq!(response.generated_at, "2026-01-01T00:00:00Z");
+    }
 
     #[tokio::test]
     async fn pipeline_status_decodes_optional_payload() {
         let server = MockServer::start().await;
         Mock::given(method("GET"))
-            .and(path(
-                "/api/v1/pipeline-groups/default/pipelines/main/status",
-            ))
+            .and(path("/api/v1/groups/default/pipelines/main/status"))
             .respond_with(
                 ResponseTemplate::new(200).set_body_json(Some(pipelines::Status {
                     conditions: vec![],
                     total_cores: 1,
                     running_cores: 1,
                     cores: Default::default(),
+                    instances: None,
+                    active_generation: None,
+                    serving_generations: None,
+                    rollout: None,
                 })),
             )
             .mount(&server)
@@ -738,11 +969,307 @@ mod tests {
         assert!(response.is_some());
     }
 
+    /// Scenario: the server returns a committed pipeline details payload for an
+    /// existing logical pipeline.
+    /// Guarantees: the SDK surfaces that payload as `Some(...)` rather than
+    /// treating it as an optional or missing resource.
+    #[tokio::test]
+    async fn pipeline_details_returns_some_on_200() {
+        let server = MockServer::start().await;
+        Mock::given(method("GET"))
+            .and(path("/api/v1/groups/default/pipelines/main"))
+            .respond_with(ResponseTemplate::new(200).set_body_json(json!({
+                "pipelineGroupId": "default",
+                "pipelineId": "main",
+                "activeGeneration": 3,
+                "pipeline": minimal_pipeline_json(),
+                "rollout": {
+                    "rolloutId": "rollout-3",
+                    "state": "running",
+                    "targetGeneration": 3,
+                    "startedAt": "2026-01-01T00:00:00Z",
+                    "updatedAt": "2026-01-01T00:00:01Z"
+                }
+            })))
+            .mount(&server)
+            .await;
+
+        let response = client(&server)
+            .pipelines()
+            .details("default", "main")
+            .await
+            .expect("pipeline details should decode");
+
+        assert!(response.is_some());
+    }
+
+    /// Scenario: a caller submits an asynchronous reconfigure request through
+    /// the public SDK.
+    /// Guarantees: the backend serializes the request body and query options
+    /// correctly and maps an accepted rollout response to `Accepted`.
+    #[tokio::test]
+    async fn pipeline_reconfigure_encodes_request_and_decodes_accepted() {
+        let server = MockServer::start().await;
+        let request = pipelines::ReconfigureRequest {
+            pipeline: serde_json::from_value(minimal_pipeline_json())
+                .expect("fixture pipeline should deserialize"),
+            step_timeout_secs: 45,
+            drain_timeout_secs: 30,
+        };
+        Mock::given(method("PUT"))
+            .and(path("/api/v1/groups/default/pipelines/main"))
+            .and(query_param("wait", "false"))
+            .and(query_param("timeout_secs", "120"))
+            .and(body_json(
+                serde_json::to_value(&request).expect("request should serialize"),
+            ))
+            .respond_with(ResponseTemplate::new(202).set_body_json(json!({
+                "rolloutId": "rollout-3",
+                "pipelineGroupId": "default",
+                "pipelineId": "main",
+                "action": "replace",
+                "state": "running",
+                "targetGeneration": 3,
+                "previousGeneration": 2,
+                "startedAt": "2026-01-01T00:00:00Z",
+                "updatedAt": "2026-01-01T00:00:01Z",
+                "cores": []
+            })))
+            .mount(&server)
+            .await;
+
+        let response = client(&server)
+            .pipelines()
+            .reconfigure(
+                "default",
+                "main",
+                &request,
+                &operations::OperationOptions {
+                    wait: false,
+                    timeout_secs: 120,
+                },
+            )
+            .await
+            .expect("reconfigure should decode");
+
+        match response {
+            pipelines::ReconfigureOutcome::Accepted(status) => {
+                assert_eq!(status.rollout_id, "rollout-3");
+                assert_eq!(status.state, pipelines::PipelineRolloutState::Running);
+            }
+            other => panic!("unexpected outcome: {other:?}"),
+        }
+    }
+
+    /// Scenario: a waited reconfigure request reaches a terminal failed rollout
+    /// and the server reports that state with a 409 status body.
+    /// Guarantees: the backend treats this as an operation outcome, not a typed
+    /// request rejection, and returns `ReconfigureOutcome::Failed`.
+    #[tokio::test]
+    async fn pipeline_reconfigure_decodes_failed_outcome_from_409_status_body() {
+        let server = MockServer::start().await;
+        Mock::given(method("PUT"))
+            .and(path("/api/v1/groups/default/pipelines/main"))
+            .and(query_param("wait", "true"))
+            .and(query_param("timeout_secs", "60"))
+            .respond_with(ResponseTemplate::new(409).set_body_json(json!({
+                "rolloutId": "rollout-4",
+                "pipelineGroupId": "default",
+                "pipelineId": "main",
+                "action": "replace",
+                "state": "failed",
+                "targetGeneration": 4,
+                "previousGeneration": 3,
+                "startedAt": "2026-01-01T00:00:00Z",
+                "updatedAt": "2026-01-01T00:00:10Z",
+                "failureReason": "candidate failed admission",
+                "cores": []
+            })))
+            .mount(&server)
+            .await;
+
+        let request = pipelines::ReconfigureRequest {
+            pipeline: serde_json::from_value(minimal_pipeline_json())
+                .expect("fixture pipeline should deserialize"),
+            step_timeout_secs: 60,
+            drain_timeout_secs: 60,
+        };
+
+        let response = client(&server)
+            .pipelines()
+            .reconfigure(
+                "default",
+                "main",
+                &request,
+                &operations::OperationOptions {
+                    wait: true,
+                    timeout_secs: 60,
+                },
+            )
+            .await
+            .expect("failed outcome should decode");
+
+        match response {
+            pipelines::ReconfigureOutcome::Failed(status) => {
+                assert_eq!(status.rollout_id, "rollout-4");
+                assert_eq!(status.state, pipelines::PipelineRolloutState::Failed);
+            }
+            other => panic!("unexpected outcome: {other:?}"),
+        }
+    }
+
+    /// Scenario: the server rejects a reconfigure request before any rollout
+    /// work starts and returns a structured operation error body.
+    /// Guarantees: the backend preserves that rejection as
+    /// `Error::AdminOperation` so callers can distinguish it from transport
+    /// failures and terminal rollout outcomes.
+    #[tokio::test]
+    async fn pipeline_reconfigure_decodes_admin_operation_error() {
+        let server = MockServer::start().await;
+        Mock::given(method("PUT"))
+            .and(path("/api/v1/groups/default/pipelines/main"))
+            .respond_with(ResponseTemplate::new(422).set_body_json(json!({
+                "kind": "invalid_request",
+                "message": "topic runtime mutation is not supported"
+            })))
+            .mount(&server)
+            .await;
+
+        let request = pipelines::ReconfigureRequest {
+            pipeline: serde_json::from_value(minimal_pipeline_json())
+                .expect("fixture pipeline should deserialize"),
+            step_timeout_secs: 60,
+            drain_timeout_secs: 60,
+        };
+
+        let err = client(&server)
+            .pipelines()
+            .reconfigure(
+                "default",
+                "main",
+                &request,
+                &operations::OperationOptions::default(),
+            )
+            .await
+            .expect_err("request rejection should be typed");
+
+        match err {
+            Error::AdminOperation { status, error } => {
+                assert_eq!(status, 422);
+                assert_eq!(error.kind, operations::OperationErrorKind::InvalidRequest);
+                assert_eq!(
+                    error.message.as_deref(),
+                    Some("topic runtime mutation is not supported")
+                );
+            }
+            other => panic!("unexpected error: {other}"),
+        }
+    }
+
+    /// Scenario: a caller polls a rollout id that no longer exists or was never
+    /// created.
+    /// Guarantees: the backend maps HTTP 404 to `Ok(None)` for rollout status
+    /// lookups instead of treating it as an SDK error.
+    #[tokio::test]
+    async fn pipeline_rollout_status_returns_none_on_404() {
+        let server = MockServer::start().await;
+        Mock::given(method("GET"))
+            .and(path(
+                "/api/v1/groups/default/pipelines/main/rollouts/rollout-9",
+            ))
+            .respond_with(ResponseTemplate::new(404))
+            .mount(&server)
+            .await;
+
+        let response = client(&server)
+            .pipelines()
+            .rollout_status("default", "main", "rollout-9")
+            .await
+            .expect("rollout status should decode");
+
+        assert!(response.is_none());
+    }
+
+    /// Scenario: a caller waits on pipeline shutdown and the server times out
+    /// the wait while returning the latest shutdown snapshot.
+    /// Guarantees: the backend decodes that response as
+    /// `ShutdownOutcome::TimedOut` and preserves the embedded status.
+    #[tokio::test]
+    async fn pipeline_shutdown_decodes_timed_out_outcome() {
+        let server = MockServer::start().await;
+        Mock::given(method("POST"))
+            .and(path("/api/v1/groups/default/pipelines/main/shutdown"))
+            .and(query_param("wait", "true"))
+            .and(query_param("timeout_secs", "30"))
+            .respond_with(ResponseTemplate::new(504).set_body_json(json!({
+                "shutdownId": "shutdown-2",
+                "pipelineGroupId": "default",
+                "pipelineId": "main",
+                "state": "running",
+                "startedAt": "2026-01-01T00:00:00Z",
+                "updatedAt": "2026-01-01T00:00:30Z",
+                "cores": []
+            })))
+            .mount(&server)
+            .await;
+
+        let response = client(&server)
+            .pipelines()
+            .shutdown(
+                "default",
+                "main",
+                &operations::OperationOptions {
+                    wait: true,
+                    timeout_secs: 30,
+                },
+            )
+            .await
+            .expect("shutdown outcome should decode");
+
+        match response {
+            pipelines::ShutdownOutcome::TimedOut(status) => {
+                assert_eq!(status.shutdown_id, "shutdown-2");
+            }
+            other => panic!("unexpected outcome: {other:?}"),
+        }
+    }
+
+    /// Scenario: a caller polls a known pipeline shutdown operation by id.
+    /// Guarantees: the backend decodes the returned shutdown snapshot and
+    /// surfaces it as `Some(...)`.
+    #[tokio::test]
+    async fn pipeline_shutdown_status_returns_some_on_200() {
+        let server = MockServer::start().await;
+        Mock::given(method("GET"))
+            .and(path(
+                "/api/v1/groups/default/pipelines/main/shutdowns/shutdown-2",
+            ))
+            .respond_with(ResponseTemplate::new(200).set_body_json(json!({
+                "shutdownId": "shutdown-2",
+                "pipelineGroupId": "default",
+                "pipelineId": "main",
+                "state": "succeeded",
+                "startedAt": "2026-01-01T00:00:00Z",
+                "updatedAt": "2026-01-01T00:00:05Z",
+                "cores": []
+            })))
+            .mount(&server)
+            .await;
+
+        let response = client(&server)
+            .pipelines()
+            .shutdown_status("default", "main", "shutdown-2")
+            .await
+            .expect("shutdown status should decode");
+
+        assert!(response.is_some());
+    }
+
     #[tokio::test]
     async fn pipeline_livez_maps_failed_probe_and_message() {
         let server = MockServer::start().await;
         Mock::given(method("GET"))
-            .and(path("/api/v1/pipeline-groups/default/pipelines/main/livez"))
+            .and(path("/api/v1/groups/default/pipelines/main/livez"))
             .respond_with(ResponseTemplate::new(500).set_body_string("NOT OK"))
             .mount(&server)
             .await;
@@ -760,7 +1287,7 @@ mod tests {
     async fn pipeline_livez_maps_ok_probe_without_message() {
         let server = MockServer::start().await;
         Mock::given(method("GET"))
-            .and(path("/api/v1/pipeline-groups/default/pipelines/main/livez"))
+            .and(path("/api/v1/groups/default/pipelines/main/livez"))
             .respond_with(ResponseTemplate::new(200).set_body_string(""))
             .mount(&server)
             .await;
@@ -779,9 +1306,7 @@ mod tests {
     async fn pipeline_readyz_maps_service_unavailable_to_failed_probe() {
         let server = MockServer::start().await;
         Mock::given(method("GET"))
-            .and(path(
-                "/api/v1/pipeline-groups/default/pipelines/main/readyz",
-            ))
+            .and(path("/api/v1/groups/default/pipelines/main/readyz"))
             .respond_with(ResponseTemplate::new(503).set_body_string("NOT OK"))
             .mount(&server)
             .await;
@@ -800,7 +1325,7 @@ mod tests {
     async fn pipeline_probe_unexpected_status_is_remote_status() {
         let server = MockServer::start().await;
         Mock::given(method("GET"))
-            .and(path("/api/v1/pipeline-groups/default/pipelines/main/livez"))
+            .and(path("/api/v1/groups/default/pipelines/main/livez"))
             .respond_with(ResponseTemplate::new(418).set_body_string("teapot"))
             .mount(&server)
             .await;
diff --git a/rust/otap-dataflow/crates/admin-api/src/lib.rs b/rust/otap-dataflow/crates/admin-api/src/lib.rs
index 8cb7e635f0..79162b0607 100644
--- a/rust/otap-dataflow/crates/admin-api/src/lib.rs
+++ b/rust/otap-dataflow/crates/admin-api/src/lib.rs
@@ -11,12 +11,12 @@ mod client;
 #[cfg(feature = "http-client")]
 mod http_backend;
 
-pub use otap_df_admin_types::{engine, operations, pipeline_groups, pipelines, telemetry};
+pub use otap_df_admin_types::{engine, groups, operations, pipelines, telemetry};
 pub use otap_df_config as config;
 
 #[cfg(feature = "http-client")]
 pub use crate::client::{
-    AdminClient, AdminClientBuilder, EngineClient, HttpAdminClientSettings, PipelineGroupsClient,
+    AdminClient, AdminClientBuilder, EngineClient, GroupsClient, HttpAdminClientSettings,
     PipelinesClient, TelemetryClient,
 };
 pub use crate::endpoint::{AdminAuth, AdminEndpoint, AdminScheme};
diff --git a/rust/otap-dataflow/crates/admin-types/Cargo.toml b/rust/otap-dataflow/crates/admin-types/Cargo.toml
index 08681242f7..453ed40cbb 100644
--- a/rust/otap-dataflow/crates/admin-types/Cargo.toml
+++ b/rust/otap-dataflow/crates/admin-types/Cargo.toml
@@ -13,5 +13,7 @@ rust-version.workspace = true
 workspace = true
 
 [dependencies]
+otap-df-config = { workspace = true }
+
 serde = { workspace = true, features = ["derive"] }
 serde_json = { workspace = true }
diff --git a/rust/otap-dataflow/crates/admin-types/src/pipeline_groups.rs b/rust/otap-dataflow/crates/admin-types/src/groups.rs
similarity index 96%
rename from rust/otap-dataflow/crates/admin-types/src/pipeline_groups.rs
rename to rust/otap-dataflow/crates/admin-types/src/groups.rs
index 61558ae8e1..3da80352e4 100644
--- a/rust/otap-dataflow/crates/admin-types/src/pipeline_groups.rs
+++ b/rust/otap-dataflow/crates/admin-types/src/groups.rs
@@ -1,13 +1,13 @@
 // Copyright The OpenTelemetry Authors
 // SPDX-License-Identifier: Apache-2.0
 
-//! Shared pipeline-group-scoped admin models.
+//! Shared group-scoped admin models.
 
 use crate::pipelines::Status as PipelineStatus;
 use serde::{Deserialize, Serialize};
 use std::collections::BTreeMap;
 
-/// Pipeline-group status response.
+/// Group status response.
 #[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
 #[serde(rename_all = "camelCase")]
 pub struct Status {
diff --git a/rust/otap-dataflow/crates/admin-types/src/lib.rs b/rust/otap-dataflow/crates/admin-types/src/lib.rs
index 7524741dc5..9000275476 100644
--- a/rust/otap-dataflow/crates/admin-types/src/lib.rs
+++ b/rust/otap-dataflow/crates/admin-types/src/lib.rs
@@ -4,7 +4,7 @@
 //! Shared admin request, response, query, and model types.
 
 pub mod engine;
+pub mod groups;
 pub mod operations;
-pub mod pipeline_groups;
 pub mod pipelines;
 pub mod telemetry;
diff --git a/rust/otap-dataflow/crates/admin-types/src/operations.rs b/rust/otap-dataflow/crates/admin-types/src/operations.rs
index 76468d6e97..d4bcc6afad 100644
--- a/rust/otap-dataflow/crates/admin-types/src/operations.rs
+++ b/rust/otap-dataflow/crates/admin-types/src/operations.rs
@@ -5,13 +5,17 @@
 
 use serde::{Deserialize, Serialize};
 
-/// Generic options for long-running admin operations.
+/// Wait behavior for long-running admin operations such as reconfigure and shutdown.
+///
+/// By default operations are asynchronous: the SDK returns as soon as the
+/// request has been accepted for background execution or has already completed.
+/// Set `wait = true` to wait up to `timeout_secs` for a terminal result.
 #[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
 pub struct OperationOptions {
-    /// Whether to wait for completion.
+    /// Whether the SDK should wait for the operation to reach a terminal result.
     #[serde(default)]
     pub wait: bool,
-    /// Wait timeout in seconds.
+    /// Maximum number of seconds to wait when `wait` is `true`.
     #[serde(default = "default_timeout_secs")]
     pub timeout_secs: u64,
 }
@@ -30,7 +34,10 @@ const fn default_timeout_secs() -> u64 {
 }
 
 impl OperationOptions {
-    /// Converts this request into URL query pairs.
+    /// Converts these options into URL query pairs for SDK transports.
+    ///
+    /// Most callers do not need this directly because the built-in HTTP
+    /// transport uses it automatically.
     #[must_use]
     pub fn to_query_pairs(&self) -> Vec<(&'static str, String)> {
         vec![
@@ -39,3 +46,80 @@ impl OperationOptions {
         ]
     }
 }
+
+/// Typed request rejection for live admin operations.
+///
+/// This is returned when the server refuses to start the requested operation at
+/// all. It is different from an accepted operation that later reports
+/// `Failed(...)` or `TimedOut(...)`.
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+#[serde(rename_all = "camelCase")]
+pub struct OperationError {
+    /// Machine-readable rejection kind.
+    pub kind: OperationErrorKind,
+    /// Optional human-readable detail.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub message: Option<String>,
+}
+
+impl OperationError {
+    /// Creates a typed operation rejection without a human-readable message.
+    #[must_use]
+    pub const fn new(kind: OperationErrorKind) -> Self {
+        Self {
+            kind,
+            message: None,
+        }
+    }
+
+    /// Attaches a human-readable detail message to the rejection.
+    #[must_use]
+    pub fn with_message(mut self, message: impl Into<String>) -> Self {
+        self.message = Some(message.into());
+        self
+    }
+}
+
+/// Machine-readable rejection kinds for live admin operations.
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
+#[serde(rename_all = "snake_case")]
+pub enum OperationErrorKind {
+    /// The requested pipeline group does not exist.
+    GroupNotFound,
+    /// The requested pipeline does not exist.
+    PipelineNotFound,
+    /// The requested rollout does not exist.
+    RolloutNotFound,
+    /// The requested shutdown does not exist.
+    ShutdownNotFound,
+    /// Another incompatible live operation is active in the server's consistency scope.
+    Conflict,
+    /// The request was rejected as invalid.
+    InvalidRequest,
+    /// The server failed while processing the request.
+    Internal,
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use serde_json::json;
+
+    /// Scenario: the server returns a structured admin operation rejection in
+    /// the shared public wire format.
+    /// Guarantees: the SDK-owned `OperationError` model round-trips without
+    /// renaming fields or changing enum values.
+    #[test]
+    fn operation_error_roundtrips() {
+        let value = json!({
+            "kind": "invalid_request",
+            "message": "core allocation change is not supported"
+        });
+        let parsed: OperationError =
+            serde_json::from_value(value.clone()).expect("fixture should deserialize");
+        assert_eq!(
+            serde_json::to_value(parsed).expect("model should serialize"),
+            value
+        );
+    }
+}
diff --git a/rust/otap-dataflow/crates/admin-types/src/pipelines.rs b/rust/otap-dataflow/crates/admin-types/src/pipelines.rs
index 2ad1e7ea53..1140757c32 100644
--- a/rust/otap-dataflow/crates/admin-types/src/pipelines.rs
+++ b/rust/otap-dataflow/crates/admin-types/src/pipelines.rs
@@ -3,10 +3,64 @@
 
 //! Shared pipeline-scoped admin models.
 
+use otap_df_config::{PipelineGroupId, PipelineId, pipeline::PipelineConfig};
 use serde::{Deserialize, Deserializer, Serialize, Serializer};
 use serde_json::Value;
 use std::collections::BTreeMap;
 
+const fn default_rollout_timeout_secs() -> u64 {
+    60
+}
+
+/// Rollout state summary exposed on pipeline status snapshots and rollout resources.
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
+#[serde(rename_all = "snake_case")]
+pub enum PipelineRolloutState {
+    /// Rollout has been accepted but work has not started yet.
+    Pending,
+    /// Rollout is actively applying changes.
+    Running,
+    /// Rollout completed successfully and the target generation is serving.
+    Succeeded,
+    /// Rollout failed before completion.
+    Failed,
+    /// Automatic rollback is in progress.
+    RollingBack,
+    /// Rollback could not restore a fully healthy serving set.
+    RollbackFailed,
+}
+
+/// Lightweight rollout summary embedded into pipeline status payloads.
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+#[serde(rename_all = "camelCase")]
+pub struct PipelineRolloutSummary {
+    /// Controller-assigned rollout identifier.
+    pub rollout_id: String,
+    /// Current rollout lifecycle state.
+    pub state: PipelineRolloutState,
+    /// Candidate generation being rolled out.
+    pub target_generation: u64,
+    /// RFC3339 timestamp for rollout creation.
+    pub started_at: String,
+    /// RFC3339 timestamp for the latest rollout state transition.
+    pub updated_at: String,
+    /// Human-readable failure or rollback reason when present.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub failure_reason: Option<String>,
+}
+
+/// Per-instance runtime status entry for generation-aware pipeline status payloads.
+#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
+#[serde(rename_all = "camelCase")]
+pub struct RuntimeInstanceStatus {
+    /// CPU core hosting this runtime instance.
+    pub core_id: usize,
+    /// Deployment generation for this runtime instance.
+    pub deployment_generation: u64,
+    /// Runtime status for this instance.
+    pub status: CoreStatus,
+}
+
 /// Pipeline status across all cores.
 #[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
 #[serde(rename_all = "camelCase")]
@@ -19,6 +73,198 @@ pub struct Status {
     pub running_cores: usize,
     /// Per-core details.
     pub cores: BTreeMap<usize, CoreStatus>,
+    /// Per-instance details when overlapping generations are present.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub instances: Option<Vec<RuntimeInstanceStatus>>,
+    /// Last committed active generation, if known.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub active_generation: Option<u64>,
+    /// Serving generation selected per core by the controller during rollout.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub serving_generations: Option<BTreeMap<usize, u64>>,
+    /// Optional rollout summary mirrored into `/status`.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub rollout: Option<PipelineRolloutSummary>,
+}
+
+/// Committed live definition of one logical pipeline.
+///
+/// This is the configuration that the controller currently treats as active for
+/// the logical pipeline. It is not a runtime status snapshot; use [`Status`]
+/// when you need per-core progress or overlapping-instance state.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+#[serde(rename_all = "camelCase")]
+pub struct PipelineDetails {
+    /// Logical pipeline group id.
+    pub pipeline_group_id: PipelineGroupId,
+    /// Logical pipeline id.
+    pub pipeline_id: PipelineId,
+    /// Last committed active generation, if known.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub active_generation: Option<u64>,
+    /// Current live pipeline configuration.
+    pub pipeline: PipelineConfig,
+    /// Optional rollout summary mirrored into `/status`.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub rollout: Option<PipelineRolloutSummary>,
+}
+
+/// Desired pipeline definition and timing options for a live reconfiguration request.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+#[serde(rename_all = "camelCase")]
+pub struct ReconfigureRequest {
+    /// Candidate pipeline configuration to create or roll out.
+    pub pipeline: PipelineConfig,
+    /// Per-core admission/ready timeout in seconds.
+    #[serde(default = "default_rollout_timeout_secs")]
+    pub step_timeout_secs: u64,
+    /// Graceful drain timeout in seconds when shutting down the old generation.
+    #[serde(default = "default_rollout_timeout_secs")]
+    pub drain_timeout_secs: u64,
+}
+
+/// Detailed per-core rollout progress.
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+#[serde(rename_all = "camelCase")]
+pub struct RolloutCoreStatus {
+    /// Target core for this step.
+    pub core_id: usize,
+    /// Previously serving generation on this core, if any.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub previous_generation: Option<u64>,
+    /// Candidate generation being launched for this core.
+    pub target_generation: u64,
+    /// Current lifecycle state for this core step.
+    pub state: String,
+    /// RFC3339 timestamp for the latest step transition.
+    pub updated_at: String,
+    /// Optional human-readable detail for failures or waits.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub detail: Option<String>,
+}
+
+/// Snapshot of one live reconfiguration operation.
+///
+/// This describes the current state of a specific rollout id. It is operation
+/// status, not a stable pipeline definition. These snapshots are retained in
+/// controller memory only for a bounded window.
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+#[serde(rename_all = "camelCase")]
+pub struct RolloutStatus {
+    /// Controller-assigned rollout identifier.
+    pub rollout_id: String,
+    /// Logical target pipeline group id.
+    pub pipeline_group_id: PipelineGroupId,
+    /// Logical target pipeline id.
+    pub pipeline_id: PipelineId,
+    /// `create`, `noop`, `replace`, or `resize`.
+    pub action: String,
+    /// Current rollout lifecycle state.
+    pub state: PipelineRolloutState,
+    /// Candidate generation targeted by this rollout.
+    pub target_generation: u64,
+    /// Previously committed generation, if any.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub previous_generation: Option<u64>,
+    /// RFC3339 timestamp for rollout creation.
+    pub started_at: String,
+    /// RFC3339 timestamp for the latest rollout transition.
+    pub updated_at: String,
+    /// Optional failure or rollback reason.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub failure_reason: Option<String>,
+    /// Per-core rollout progress entries.
+    pub cores: Vec<RolloutCoreStatus>,
+}
+
+/// Detailed per-core shutdown progress.
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+#[serde(rename_all = "camelCase")]
+pub struct ShutdownCoreStatus {
+    /// Target core being drained.
+    pub core_id: usize,
+    /// Deployment generation targeted for shutdown on this core.
+    pub deployment_generation: u64,
+    /// Current lifecycle state for this core shutdown step.
+    pub state: String,
+    /// RFC3339 timestamp for the latest step transition.
+    pub updated_at: String,
+    /// Optional human-readable detail for failures or waits.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub detail: Option<String>,
+}
+
+/// Snapshot of one pipeline shutdown operation.
+///
+/// This describes the current state of a specific shutdown id. It is operation
+/// status, not a stable pipeline definition. These snapshots are retained in
+/// controller memory only for a bounded window.
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+#[serde(rename_all = "camelCase")]
+pub struct ShutdownStatus {
+    /// Controller-assigned shutdown identifier.
+    pub shutdown_id: String,
+    /// Logical target pipeline group id.
+    pub pipeline_group_id: PipelineGroupId,
+    /// Logical target pipeline id.
+    pub pipeline_id: PipelineId,
+    /// Current shutdown lifecycle state.
+    pub state: String,
+    /// RFC3339 timestamp for shutdown creation.
+    pub started_at: String,
+    /// RFC3339 timestamp for the latest shutdown transition.
+    pub updated_at: String,
+    /// Optional failure reason when shutdown does not complete cleanly.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub failure_reason: Option<String>,
+    /// Per-core shutdown progress entries.
+    pub cores: Vec<ShutdownCoreStatus>,
+}
+
+/// Caller-facing outcome of a live reconfiguration request.
+///
+/// The variant tells you whether the request was only accepted, reached a
+/// terminal state within the requested wait window, or outlived that wait
+/// window.
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+pub enum ReconfigureOutcome {
+    /// The request was accepted and the rollout continues asynchronously.
+    ///
+    /// Poll [`RolloutStatus`] later if you need progress or a terminal
+    /// result.
+    Accepted(RolloutStatus),
+    /// The rollout reached a successful terminal state within the requested wait window.
+    Completed(RolloutStatus),
+    /// The rollout reached a failed terminal state within the requested wait window.
+    Failed(RolloutStatus),
+    /// The requested wait window elapsed before the rollout reached a terminal state.
+    ///
+    /// The included snapshot is the latest known rollout status. The rollout
+    /// may still continue running in the engine.
+    TimedOut(RolloutStatus),
+}
+
+/// Caller-facing outcome of a pipeline shutdown request.
+///
+/// The variant tells you whether the request was only accepted, reached a
+/// terminal state within the requested wait window, or outlived that wait
+/// window.
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+pub enum ShutdownOutcome {
+    /// The request was accepted and the shutdown continues asynchronously.
+    ///
+    /// Poll [`ShutdownStatus`] later if you need progress or a terminal
+    /// result.
+    Accepted(ShutdownStatus),
+    /// The shutdown reached a successful terminal state within the requested wait window.
+    Completed(ShutdownStatus),
+    /// The shutdown reached a failed terminal state within the requested wait window.
+    Failed(ShutdownStatus),
+    /// The requested wait window elapsed before the shutdown reached a terminal state.
+    ///
+    /// The included snapshot is the latest known shutdown status. The shutdown
+    /// may still continue running in the engine.
+    TimedOut(ShutdownStatus),
 }
 
 /// Per-core pipeline status.
@@ -573,6 +819,172 @@ mod tests {
                     }
                 ]
             }
+            },
+            "instances": [
+                {
+                    "coreId": 0,
+                    "deploymentGeneration": 7,
+                    "status": {
+                        "phase": "running",
+                        "lastHeartbeatTime": "2026-01-01T00:00:00Z",
+                        "conditions": [],
+                        "deletePending": false
+                    }
+                }
+            ],
+            "activeGeneration": 7,
+            "servingGenerations": {
+                "0": 7
+            },
+            "rollout": {
+                "rolloutId": "rollout-7",
+                "state": "running",
+                "targetGeneration": 7,
+                "startedAt": "2026-01-01T00:00:00Z",
+                "updatedAt": "2026-01-01T00:00:01Z"
+            }
+        }));
+    }
+
+    /// Scenario: a caller serializes or deserializes the public live
+    /// reconfiguration request body.
+    /// Guarantees: the shared SDK model preserves the committed camelCase wire
+    /// shape for pipeline config and timeout fields.
+    #[test]
+    fn reconfigure_request_roundtrips_current_wire_shape() {
+        assert_roundtrip::<ReconfigureRequest>(json!({
+            "pipeline": {
+                "type": "otap",
+                "nodes": {
+                    "recv": {
+                        "type": "urn:otel:receiver:fake",
+                        "config": {}
+                    }
+                }
+            },
+            "stepTimeoutSecs": 45,
+            "drainTimeoutSecs": 30
+        }));
+    }
+
+    /// Scenario: a caller reads the committed pipeline-details resource through
+    /// the public SDK.
+    /// Guarantees: the shared model preserves the current wire shape for the
+    /// committed config, active generation, and embedded rollout summary.
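+    /// The fixture uses a deliberately minimal `otap` pipeline config; any
+    /// richer config accepted by `PipelineConfig` should round-trip the same
+    /// way.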
+    #[test]
+    fn pipeline_details_roundtrips_current_wire_shape() {
+        assert_roundtrip::<PipelineDetails>(json!({
+            "pipelineGroupId": "default",
+            "pipelineId": "main",
+            "activeGeneration": 2,
+            "pipeline": {
+                "type": "otap",
+                "nodes": {
+                    "recv": {
+                        "type": "urn:otel:receiver:fake",
+                        "config": {}
+                    }
+                }
+            },
+            "rollout": {
+                "rolloutId": "rollout-2",
+                "state": "succeeded",
+                "targetGeneration": 2,
+                "startedAt": "2026-01-01T00:00:00Z",
+                "updatedAt": "2026-01-01T00:00:05Z"
+            }
+        }));
+    }
+
+    /// Scenario: a caller polls the status of one rollout operation by id.
+    /// Guarantees: the shared rollout snapshot model round-trips the current
+    /// wire shape, including action, lifecycle state, and per-core progress.
+    #[test]
+    fn pipeline_rollout_status_roundtrips_current_wire_shape() {
+        assert_roundtrip::<RolloutStatus>(json!({
+            "rolloutId": "rollout-2",
+            "pipelineGroupId": "default",
+            "pipelineId": "main",
+            "action": "replace",
+            "state": "rolling_back",
+            "targetGeneration": 2,
+            "previousGeneration": 1,
+            "startedAt": "2026-01-01T00:00:00Z",
+            "updatedAt": "2026-01-01T00:00:05Z",
+            "failureReason": "candidate failed admission",
+            "cores": [
+                {
+                    "coreId": 0,
+                    "previousGeneration": 1,
+                    "targetGeneration": 2,
+                    "state": "waiting_ready",
+                    "updatedAt": "2026-01-01T00:00:03Z"
+                }
+            ]
+        }));
+    }
+
+    /// Scenario: a caller polls the status of one pipeline shutdown operation
+    /// by id.
+    /// Guarantees: the shared shutdown snapshot model round-trips the current
+    /// wire shape, including failure detail and per-core progress.
+    #[test]
+    fn pipeline_shutdown_status_roundtrips_current_wire_shape() {
+        assert_roundtrip::<ShutdownStatus>(json!({
+            "shutdownId": "shutdown-1",
+            "pipelineGroupId": "default",
+            "pipelineId": "main",
+            "state": "failed",
+            "startedAt": "2026-01-01T00:00:00Z",
+            "updatedAt": "2026-01-01T00:00:05Z",
+            "failureReason": "drain deadline exceeded",
+            "cores": [
+                {
+                    "coreId": 0,
+                    "deploymentGeneration": 2,
+                    "state": "draining",
+                    "updatedAt": "2026-01-01T00:00:05Z"
+                }
+            ]
+        }));
+    }
+
+    /// Scenario: the SDK receives a waited reconfigure result that completed
+    /// within the requested wait window.
+    /// Guarantees: the caller-facing outcome enum preserves the external wire
+    /// encoding for completed rollout results.
+    #[test]
+    fn reconfigure_outcome_roundtrips() {
+        assert_roundtrip::<ReconfigureOutcome>(json!({
+            "Completed": {
+                "rolloutId": "rollout-2",
+                "pipelineGroupId": "default",
+                "pipelineId": "main",
+                "action": "noop",
+                "state": "succeeded",
+                "targetGeneration": 2,
+                "startedAt": "2026-01-01T00:00:00Z",
+                "updatedAt": "2026-01-01T00:00:00Z",
+                "cores": []
+            }
+        }));
+    }
+
+    /// Scenario: the SDK receives a waited shutdown result whose wait window
+    /// expired before the operation finished.
+    /// Guarantees: the caller-facing outcome enum preserves the external wire
+    /// encoding for timed-out shutdown results.
+    #[test]
+    fn shutdown_outcome_roundtrips() {
+        assert_roundtrip::<ShutdownOutcome>(json!({
+            "TimedOut": {
+                "shutdownId": "shutdown-1",
+                "pipelineGroupId": "default",
+                "pipelineId": "main",
+                "state": "running",
+                "startedAt": "2026-01-01T00:00:00Z",
+                "updatedAt": "2026-01-01T00:00:05Z",
+                "cores": []
             }
         }));
     }
diff --git a/rust/otap-dataflow/crates/admin-types/src/telemetry.rs b/rust/otap-dataflow/crates/admin-types/src/telemetry.rs
index cc9fd01444..3a49c7bb8e 100644
--- a/rust/otap-dataflow/crates/admin-types/src/telemetry.rs
+++ b/rust/otap-dataflow/crates/admin-types/src/telemetry.rs
@@ -18,7 +18,10 @@ pub struct MetricsOptions {
 }
 
 impl MetricsOptions {
-    /// Converts these options into URL query pairs.
+ /// Converts these options into URL query pairs for SDK transports. + /// + /// Most callers do not need this directly because the built-in HTTP + /// transport uses it automatically. #[must_use] pub fn to_query_pairs(&self) -> Vec<(&'static str, String)> { vec![ @@ -189,7 +192,10 @@ pub struct LogsQuery { } impl LogsQuery { - /// Converts this request into URL query pairs. + /// Converts this query into URL query pairs for SDK transports. + /// + /// Most callers do not need this directly because the built-in HTTP + /// transport uses it automatically. #[must_use] pub fn to_query_pairs(&self) -> Vec<(&'static str, String)> { let mut pairs = Vec::new(); diff --git a/rust/otap-dataflow/crates/admin/README.md b/rust/otap-dataflow/crates/admin/README.md index 7cd40c4472..c121e83f5e 100644 --- a/rust/otap-dataflow/crates/admin/README.md +++ b/rust/otap-dataflow/crates/admin/README.md @@ -3,12 +3,16 @@ `otap-df-admin` provides: - admin, health, status, and telemetry HTTP endpoints; +- live pipeline mutation endpoints for create, replace, resize, rollout + tracking, and shutdown tracking; - an embedded single-page UI served from the same process and origin. For architecture and runtime behavior details, see [`docs/admin/architecture.md`](../../docs/admin/architecture.md). For the admin docs landing page, see [`docs/admin/README.md`](../../docs/admin/README.md). +For the operator guide to live pipeline mutation, see +[`docs/admin/live-reconfiguration.md`](../../docs/admin/live-reconfiguration.md). ## Main routes @@ -30,11 +34,19 @@ For the admin docs landing page, see - `GET /api/v1/status` - `GET /api/v1/livez` - `GET /api/v1/readyz` -- `GET /api/v1/pipeline-groups/status` -- `GET /api/v1/pipeline-groups/{pipeline_group_id}/pipelines/{pipeline_id}/status` -- `GET /api/v1/pipeline-groups/{pipeline_group_id}/pipelines/{pipeline_id}/livez` -- `GET /api/v1/pipeline-groups/{pipeline_group_id}/pipelines/{pipeline_id}/readyz` -- `POST /api/v1/pipeline-groups/shutdown` +- `GET /api/v1/groups/status` +- `GET /api/v1/groups/{pipeline_group_id}/pipelines/{pipeline_id}` +- `GET /api/v1/groups/{pipeline_group_id}/pipelines/{pipeline_id}/status` +- `GET /api/v1/groups/{pipeline_group_id}/pipelines/{pipeline_id}/rollouts/{rollout_id}` +- `GET /api/v1/groups/{pipeline_group_id}/pipelines/{pipeline_id}/shutdowns/{shutdown_id}` +- `GET /api/v1/groups/{pipeline_group_id}/pipelines/{pipeline_id}/livez` +- `GET /api/v1/groups/{pipeline_group_id}/pipelines/{pipeline_id}/readyz` + +### Pipeline lifecycle + +- `PUT /api/v1/groups/{pipeline_group_id}/pipelines/{pipeline_id}` +- `POST /api/v1/groups/{pipeline_group_id}/pipelines/{pipeline_id}/shutdown` +- `POST /api/v1/groups/shutdown` ## Embedded UI layout (crate-relative) @@ -86,8 +98,11 @@ guidance, see [`docs/admin/architecture.md`](../../docs/admin/architecture.md). through an enforced integration layer). - [ ] Add TLS support in-process or enforce TLS at a mandatory front proxy boundary. -- [ ] Protect `POST /pipeline-groups/shutdown` with stricter access controls - than read-only endpoints. +- [ ] Protect mutating endpoints such as + `PUT /api/v1/groups/{pipeline_group_id}/pipelines/{pipeline_id}`, + `POST /api/v1/groups/{pipeline_group_id}/pipelines/{pipeline_id}/shutdown`, + and `POST /api/v1/groups/shutdown` with stricter access controls than + read-only endpoints. - [ ] Apply the same hardened response headers to API endpoints (`/api/v1/status`, `/api/v1/livez`, `/api/v1/readyz`, `/api/v1/telemetry/*`, `/api/v1/metrics`), not only UI/static. 
@@ -105,4 +120,6 @@ guidance, see [`docs/admin/architecture.md`](../../docs/admin/architecture.md). - strong authentication/authorization - network ACLs / source allow-listing - route-level restrictions for mutating endpoints such as - `/api/v1/pipeline-groups/shutdown` + `/api/v1/groups/{pipeline_group_id}/pipelines/{pipeline_id}`, + `/api/v1/groups/{pipeline_group_id}/pipelines/{pipeline_id}/shutdown`, + and `/api/v1/groups/shutdown` diff --git a/rust/otap-dataflow/crates/admin/src/convert.rs b/rust/otap-dataflow/crates/admin/src/convert.rs index 73aca5a920..e7043b289a 100644 --- a/rust/otap-dataflow/crates/admin/src/convert.rs +++ b/rust/otap-dataflow/crates/admin/src/convert.rs @@ -3,8 +3,6 @@ //! Conversion helpers from internal admin/server types to public SDK models. -use otap_df_admin_types::telemetry as api; -use otap_df_telemetry::attributes::AttributeValue; use serde::Serialize; use serde::de::DeserializeOwned; @@ -18,19 +16,3 @@ where ) .expect("public admin model should deserialize from the current wire shape") } - -/// Convert an engine `AttributeValue` to the public admin API representation. -pub(crate) fn convert_attribute_value(value: &AttributeValue) -> api::AttributeValue { - match value { - AttributeValue::String(s) => api::AttributeValue::String(s.clone()), - AttributeValue::Int(v) => api::AttributeValue::Int(*v), - AttributeValue::UInt(v) => api::AttributeValue::UInt(*v), - AttributeValue::Double(v) => api::AttributeValue::Double(*v), - AttributeValue::Boolean(v) => api::AttributeValue::Boolean(*v), - AttributeValue::Map(m) => api::AttributeValue::Map( - m.iter() - .map(|(k, v)| (k.clone(), convert_attribute_value(v))) - .collect(), - ), - } -} diff --git a/rust/otap-dataflow/crates/admin/src/lib.rs b/rust/otap-dataflow/crates/admin/src/lib.rs index 0b20e1c668..f99600a8ce 100644 --- a/rust/otap-dataflow/crates/admin/src/lib.rs +++ b/rust/otap-dataflow/crates/admin/src/lib.rs @@ -12,22 +12,118 @@ mod pipeline_group; mod telemetry; use axum::Router; +use otap_df_admin_types::operations::{OperationError, OperationErrorKind}; +pub use otap_df_admin_types::pipelines::{ + PipelineDetails, PipelineRolloutState, PipelineRolloutSummary, ReconfigureRequest, + RolloutCoreStatus, RolloutStatus, ShutdownCoreStatus, ShutdownStatus, +}; +use serde::Serialize; use std::net::SocketAddr; use std::sync::Arc; use tokio::net::TcpListener; -use tokio::sync::Mutex; use tokio_util::sync::CancellationToken; use tower::ServiceBuilder; use crate::error::Error; use otap_df_config::engine::HttpAdminSettings; -use otap_df_engine::control::PipelineAdminSender; use otap_df_engine::memory_limiter::MemoryPressureState; use otap_df_state::store::ObservedStateHandle; use otap_df_telemetry::log_tap::InternalLogTapHandle; use otap_df_telemetry::registry::TelemetryRegistryHandle; use otap_df_telemetry::{otel_info, otel_warn}; +/// Control-plane error surfaced to admin handlers. +#[derive(Debug, Clone, Serialize, PartialEq, Eq)] +#[serde(tag = "kind", rename_all = "snake_case")] +pub enum ControlPlaneError { + /// The requested pipeline group does not exist. + GroupNotFound, + /// The requested pipeline does not exist. + PipelineNotFound, + /// Another incompatible live operation is active in the current consistency scope. + RolloutConflict, + /// Submitted pipeline configuration failed validation or violated a runtime boundary. + InvalidRequest { + /// Human-readable validation failure detail. + message: String, + }, + /// The requested rollout could not be found. 
+    RolloutNotFound,
+    /// The requested shutdown could not be found.
+    ShutdownNotFound,
+    /// Unexpected internal failure while processing the request.
+    Internal {
+        /// Human-readable internal failure detail.
+        message: String,
+    },
+}
+
+impl ControlPlaneError {
+    /// Converts a control-plane error into the public operation rejection model.
+    #[must_use]
+    pub fn as_operation_error(&self) -> OperationError {
+        match self {
+            Self::GroupNotFound => OperationError::new(OperationErrorKind::GroupNotFound),
+            Self::PipelineNotFound => OperationError::new(OperationErrorKind::PipelineNotFound),
+            Self::RolloutConflict => OperationError::new(OperationErrorKind::Conflict),
+            Self::InvalidRequest { message } => {
+                OperationError::new(OperationErrorKind::InvalidRequest)
+                    .with_message(message.clone())
+            }
+            Self::RolloutNotFound => OperationError::new(OperationErrorKind::RolloutNotFound),
+            Self::ShutdownNotFound => OperationError::new(OperationErrorKind::ShutdownNotFound),
+            Self::Internal { message } => {
+                OperationError::new(OperationErrorKind::Internal).with_message(message.clone())
+            }
+        }
+    }
+}
+
+/// Control-plane interface implemented by the controller runtime.
+pub trait ControlPlane: Send + Sync {
+    /// Requests shutdown of all currently running runtime instances.
+    fn shutdown_all(&self, timeout_secs: u64) -> Result<(), ControlPlaneError>;
+
+    /// Requests shutdown of all currently running runtime instances for one logical pipeline.
+    fn shutdown_pipeline(
+        &self,
+        pipeline_group_id: &str,
+        pipeline_id: &str,
+        timeout_secs: u64,
+    ) -> Result<ShutdownStatus, ControlPlaneError>;
+
+    /// Reconfigures a logical pipeline and returns the rollout job snapshot.
+    fn reconfigure_pipeline(
+        &self,
+        pipeline_group_id: &str,
+        pipeline_id: &str,
+        request: ReconfigureRequest,
+    ) -> Result<RolloutStatus, ControlPlaneError>;
+
+    /// Returns the live active config for a logical pipeline.
+    fn pipeline_details(
+        &self,
+        pipeline_group_id: &str,
+        pipeline_id: &str,
+    ) -> Result<Option<PipelineDetails>, ControlPlaneError>;
+
+    /// Returns the detailed status for a rollout job.
+    fn rollout_status(
+        &self,
+        pipeline_group_id: &str,
+        pipeline_id: &str,
+        rollout_id: &str,
+    ) -> Result<Option<RolloutStatus>, ControlPlaneError>;
+
+    /// Returns the detailed status for a shutdown job.
+    fn shutdown_status(
+        &self,
+        pipeline_group_id: &str,
+        pipeline_id: &str,
+        shutdown_id: &str,
+    ) -> Result<Option<ShutdownStatus>, ControlPlaneError>;
+}
+
 /// Shared state for the HTTP admin server.
 #[derive(Clone)]
 struct AppState {
@@ -37,12 +133,12 @@ struct AppState {
     /// The metrics registry for querying current metrics.
     metrics_registry: TelemetryRegistryHandle,
 
+    /// Resident controller control plane for runtime mutations.
+    controller: Arc<dyn ControlPlane>,
+
     /// Optional internal log tap for querying retained internal logs.
     log_tap: Option<InternalLogTapHandle>,
 
-    /// The control message senders for controlling pipelines.
-    ctrl_msg_senders: Arc<Mutex<Vec<PipelineAdminSender>>>,
-
     /// Shared process-wide memory pressure state.
     memory_pressure_state: MemoryPressureState,
 }
 
@@ -51,7 +147,7 @@
 pub async fn run(
     config: HttpAdminSettings,
     observed_store: ObservedStateHandle,
-    ctrl_msg_senders: Vec<PipelineAdminSender>,
+    controller: Arc<dyn ControlPlane>,
     metrics_registry: TelemetryRegistryHandle,
     memory_pressure_state: MemoryPressureState,
     log_tap: Option<InternalLogTapHandle>,
@@ -60,8 +156,8 @@
     let app_state = AppState {
         observed_state_store: observed_store,
         metrics_registry,
+        controller,
         log_tap,
-        ctrl_msg_senders: Arc::new(Mutex::new(ctrl_msg_senders)),
        memory_pressure_state,
     };
diff --git a/rust/otap-dataflow/crates/admin/src/pipeline.rs b/rust/otap-dataflow/crates/admin/src/pipeline.rs
index b86d9bd3a1..a9a6dec221 100644
--- a/rust/otap-dataflow/crates/admin/src/pipeline.rs
+++ b/rust/otap-dataflow/crates/admin/src/pipeline.rs
@@ -2,49 +2,376 @@
 // SPDX-License-Identifier: Apache-2.0
 
 //! Pipeline endpoints.
-//! Status: Not implemented.
 //!
-//! - GET `/api/v1/pipeline-groups/{pipeline_group_id}/pipelines/{pipeline_id}`
+//! - GET `/api/v1/groups/{pipeline_group_id}/pipelines/{pipeline_id}`
 //!   Get the configuration of the specified pipeline.
-//! - GET `/api/v1/pipeline-groups/{pipeline_group_id}/pipelines/{pipeline_id}/status`
+//! - GET `/api/v1/groups/{pipeline_group_id}/pipelines/{pipeline_id}/status`
 //!   Get the status of the specified pipeline.
-//! - POST `/api/v1/pipeline-groups/{pipeline_group_id}/pipelines/{pipeline_id}/shutdown`
-//!   Shutdown a specific pipeline
+//! - GET `/api/v1/groups/{pipeline_group_id}/pipelines/{pipeline_id}/rollouts/{rollout_id}`
+//!   Get the status of a specific rollout job for the logical pipeline.
+//!   Older rollout ids may return `404 Not Found` after bounded in-memory
+//!   retention evicts terminal history.
+//! - GET `/api/v1/groups/{pipeline_group_id}/pipelines/{pipeline_id}/shutdowns/{shutdown_id}`
+//!   Get the status of a specific shutdown job for the logical pipeline.
+//!   Older shutdown ids may return `404 Not Found` after bounded in-memory
+//!   retention evicts terminal history.
+//! - PUT `/api/v1/groups/{pipeline_group_id}/pipelines/{pipeline_id}`
+//!   Create or replace a pipeline and return a rollout job status snapshot.
+//! - POST `/api/v1/groups/{pipeline_group_id}/pipelines/{pipeline_id}/shutdown`
+//!   Shutdown a specific logical pipeline and return a shutdown job status snapshot.
+//!   - Query parameters:
+//!     - `wait` (bool, default: false) - if true, block until the pipeline stops
+//!     - `timeout_secs` (u64, default: 60) - maximum seconds to wait when `wait=true`
+//!   - 200 OK if `wait=true` and the pipeline stopped successfully
 //!   - 202 Accepted if the stop request was accepted and is being processed (async operation)
 //!   - 400 Bad Request if the pipeline is already stopped
-//!   - 404 Not Found if the pipeline does not exist
+//!   - 404 Not Found if the group or pipeline does not exist
+//!   - 409 Conflict if a rollout or shutdown is active for the pipeline, or if a waited
+//!     shutdown fails
 //!   - 500 Internal Server Error if the stop request could not be processed
+//!   - 504 Gateway Timeout if `wait=true` and the pipeline did not stop within timeout
 //!
 //! ToDo Alternative -> avoid verb-y subpaths and support PATCH /.../pipelines/{pipelineId} with a body like {"status":"stopped"}. Use 409 if already stopping/stopped.
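+//!
+//! Example (start a rollout, then poll it; ids and the config file name are
+//! illustrative):
+//! ```sh
+//! curl -X PUT "http://localhost:8080/api/v1/groups/default/pipelines/main" \
+//!   -H "Content-Type: application/json" --data @pipeline.json
+//! ```
+//! A `202 Accepted` response carries a `rolloutId`; poll it until the state is
+//! terminal, treating a later `404 Not Found` as possible eviction of retained
+//! history (the rollout summary mirrored into `/status` remains available):
+//! ```sh
+//! curl "http://localhost:8080/api/v1/groups/default/pipelines/main/rollouts/rollout-1"
+//! ```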
 use crate::AppState;
 use crate::convert::json_shape;
-use axum::extract::{Path, State};
+use axum::extract::{Path, Query, State};
 use axum::http::StatusCode;
-use axum::response::IntoResponse;
-use axum::routing::get;
+use axum::response::{IntoResponse, Response};
+use axum::routing::{get, post};
 use axum::{Json, Router};
-use otap_df_admin_types::pipelines::Status as ApiPipelineStatus;
+use otap_df_admin_types::pipelines::{PipelineRolloutState, Status as ApiPipelineStatus};
 use otap_df_config::PipelineKey;
+use otap_df_telemetry::otel_info;
+use serde::Deserialize;
+use std::time::{Duration, Instant};
 
 /// All the routes for pipelines.
 pub(crate) fn routes() -> Router<AppState> {
     Router::new()
+        .route(
+            "/groups/{pipeline_group_id}/pipelines/{pipeline_id}",
+            get(show_pipeline).put(put_pipeline),
+        )
         // Returns the status of a specific pipeline.
         .route(
-            "/pipeline-groups/{pipeline_group_id}/pipelines/{pipeline_id}/status",
+            "/groups/{pipeline_group_id}/pipelines/{pipeline_id}/status",
             get(show_status),
         )
+        .route(
+            "/groups/{pipeline_group_id}/pipelines/{pipeline_id}/rollouts/{rollout_id}",
+            get(show_rollout),
+        )
+        .route(
+            "/groups/{pipeline_group_id}/pipelines/{pipeline_id}/shutdowns/{shutdown_id}",
+            get(show_shutdown),
+        )
+        .route(
+            "/groups/{pipeline_group_id}/pipelines/{pipeline_id}/shutdown",
+            post(shutdown_pipeline),
+        )
         // liveness and readiness probes.
         .route(
-            "/pipeline-groups/{pipeline_group_id}/pipelines/{pipeline_id}/livez",
+            "/groups/{pipeline_group_id}/pipelines/{pipeline_id}/livez",
            get(liveness),
         )
         .route(
-            "/pipeline-groups/{pipeline_group_id}/pipelines/{pipeline_id}/readyz",
+            "/groups/{pipeline_group_id}/pipelines/{pipeline_id}/readyz",
             get(readiness),
         )
 }
 
+#[derive(Deserialize)]
+pub(crate) struct WaitParams {
+    #[serde(default)]
+    wait: bool,
+    #[serde(default = "default_timeout_secs")]
+    timeout_secs: u64,
+}
+
+const fn default_timeout_secs() -> u64 {
+    60
+}
+
+/// Converts a typed control-plane rejection into the shared HTTP error shape.
+fn operation_error_response(status: StatusCode, error: crate::ControlPlaneError) -> Response {
+    (status, Json(error.as_operation_error())).into_response()
+}
+
+/// Returns whether a rollout status is already in a terminal state.
+fn rollout_is_terminal(state: PipelineRolloutState) -> bool {
+    matches!(
+        state,
+        PipelineRolloutState::Succeeded
+            | PipelineRolloutState::Failed
+            | PipelineRolloutState::RollbackFailed
+    )
+}
+
+/// Returns whether a terminal rollout finished successfully.
+fn rollout_is_success(state: PipelineRolloutState) -> bool {
+    state == PipelineRolloutState::Succeeded
+}
+
+/// Returns whether a shutdown status string represents a terminal state.
+fn shutdown_is_terminal(state: &str) -> bool {
+    matches!(state, "succeeded" | "failed")
+}
+
+/// Returns whether a terminal shutdown finished successfully.
+fn shutdown_is_success(state: &str) -> bool {
+    state == "succeeded"
+}
+
+/// Returns committed configuration details for one logical pipeline.
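+///
+/// Unknown groups or pipelines map to `404 Not Found`; any other control-plane
+/// failure maps to `500 Internal Server Error`.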
+pub async fn show_pipeline(
+    Path((pipeline_group_id, pipeline_id)): Path<(String, String)>,
+    State(state): State<AppState>,
+) -> Result<Json<PipelineDetails>, StatusCode> {
+    match state
+        .controller
+        .pipeline_details(&pipeline_group_id, &pipeline_id)
+    {
+        Ok(Some(details)) => Ok(Json(details)),
+        Ok(None) => Err(StatusCode::NOT_FOUND),
+        Err(
+            crate::ControlPlaneError::PipelineNotFound | crate::ControlPlaneError::GroupNotFound,
+        ) => Err(StatusCode::NOT_FOUND),
+        Err(_) => Err(StatusCode::INTERNAL_SERVER_ERROR),
+    }
+}
+
+/// Starts a pipeline reconfiguration and optionally waits for its terminal result.
+pub async fn put_pipeline(
+    Path((pipeline_group_id, pipeline_id)): Path<(String, String)>,
+    Query(params): Query<WaitParams>,
+    State(state): State<AppState>,
+    Json(request): Json<ReconfigureRequest>,
+) -> impl IntoResponse {
+    let rollout =
+        match state
+            .controller
+            .reconfigure_pipeline(&pipeline_group_id, &pipeline_id, request)
+        {
+            Ok(rollout) => rollout,
+            Err(crate::ControlPlaneError::GroupNotFound) => {
+                return operation_error_response(
+                    StatusCode::NOT_FOUND,
+                    crate::ControlPlaneError::GroupNotFound,
+                );
+            }
+            Err(crate::ControlPlaneError::RolloutConflict) => {
+                return operation_error_response(
+                    StatusCode::CONFLICT,
+                    crate::ControlPlaneError::RolloutConflict,
+                );
+            }
+            Err(crate::ControlPlaneError::InvalidRequest { message }) => {
+                return operation_error_response(
+                    StatusCode::UNPROCESSABLE_ENTITY,
+                    crate::ControlPlaneError::InvalidRequest { message },
+                );
+            }
+            Err(other) => {
+                return operation_error_response(StatusCode::INTERNAL_SERVER_ERROR, other);
+            }
+        };
+
+    if !params.wait {
+        let status = if rollout_is_terminal(rollout.state) {
+            if rollout_is_success(rollout.state) {
+                StatusCode::OK
+            } else {
+                StatusCode::CONFLICT
+            }
+        } else {
+            StatusCode::ACCEPTED
+        };
+        return (status, Json(rollout)).into_response();
+    }
+
+    let deadline = Instant::now() + Duration::from_secs(params.timeout_secs);
+    let mut last_status = Some(rollout);
+    loop {
+        let Some(rollout_id) = last_status.as_ref().map(|status| status.rollout_id.clone()) else {
+            return operation_error_response(
+                StatusCode::INTERNAL_SERVER_ERROR,
+                crate::ControlPlaneError::Internal {
+                    message: "initial rollout status disappeared while waiting".to_string(),
+                },
+            );
+        };
+        match state
+            .controller
+            .rollout_status(&pipeline_group_id, &pipeline_id, &rollout_id)
+        {
+            Ok(Some(current)) if rollout_is_terminal(current.state) => {
+                let status = if rollout_is_success(current.state) {
+                    StatusCode::OK
+                } else {
+                    StatusCode::CONFLICT
+                };
+                return (status, Json(current)).into_response();
+            }
+            Ok(Some(current)) => {
+                last_status = Some(current);
+            }
+            Ok(None) | Err(crate::ControlPlaneError::RolloutNotFound) => {
+                return operation_error_response(
+                    StatusCode::NOT_FOUND,
+                    crate::ControlPlaneError::RolloutNotFound,
+                );
+            }
+            Err(other) => {
+                return operation_error_response(StatusCode::INTERNAL_SERVER_ERROR, other);
+            }
+        }
+
+        if Instant::now() >= deadline {
+            return match last_status {
+                Some(status) => (StatusCode::GATEWAY_TIMEOUT, Json(status)).into_response(),
+                None => operation_error_response(
+                    StatusCode::INTERNAL_SERVER_ERROR,
+                    crate::ControlPlaneError::Internal {
+                        message: "rollout status disappeared before timeout response".to_string(),
+                    },
+                ),
+            };
+        }
+        tokio::time::sleep(Duration::from_millis(100)).await;
+    }
+}
+
+/// Returns the latest snapshot for one rollout operation id.
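+///
+/// Unknown or evicted rollout ids map to `404 Not Found`.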
+pub async fn show_rollout(
+    Path((pipeline_group_id, pipeline_id, rollout_id)): Path<(String, String, String)>,
+    State(state): State<AppState>,
+) -> Result<Json<RolloutStatus>, StatusCode> {
+    match state
+        .controller
+        .rollout_status(&pipeline_group_id, &pipeline_id, &rollout_id)
+    {
+        Ok(Some(status)) => Ok(Json(status)),
+        Ok(None) => Err(StatusCode::NOT_FOUND),
+        Err(crate::ControlPlaneError::RolloutNotFound) => Err(StatusCode::NOT_FOUND),
+        Err(_) => Err(StatusCode::INTERNAL_SERVER_ERROR),
+    }
+}
+
+/// Returns the latest snapshot for one shutdown operation id.
+pub async fn show_shutdown(
+    Path((pipeline_group_id, pipeline_id, shutdown_id)): Path<(String, String, String)>,
+    State(state): State<AppState>,
+) -> Result<Json<ShutdownStatus>, StatusCode> {
+    match state
+        .controller
+        .shutdown_status(&pipeline_group_id, &pipeline_id, &shutdown_id)
+    {
+        Ok(Some(status)) => Ok(Json(status)),
+        Ok(None) => Err(StatusCode::NOT_FOUND),
+        Err(crate::ControlPlaneError::ShutdownNotFound) => Err(StatusCode::NOT_FOUND),
+        Err(_) => Err(StatusCode::INTERNAL_SERVER_ERROR),
+    }
+}
+
+/// Starts a tracked shutdown for one logical pipeline and optionally waits.
+pub async fn shutdown_pipeline(
+    Path((pipeline_group_id, pipeline_id)): Path<(String, String)>,
+    Query(params): Query<WaitParams>,
+    State(state): State<AppState>,
+) -> impl IntoResponse {
+    otel_info!(
+        "pipeline.shutdown.requested",
+        pipeline_group_id = pipeline_group_id.as_str(),
+        pipeline_id = pipeline_id.as_str(),
+        wait = params.wait,
+        timeout_secs = params.timeout_secs
+    );
+
+    match state
+        .controller
+        .shutdown_pipeline(&pipeline_group_id, &pipeline_id, params.timeout_secs)
+    {
+        Ok(shutdown) => {
+            if !params.wait {
+                return (StatusCode::ACCEPTED, Json(shutdown)).into_response();
+            }
+
+            let deadline = Instant::now() + Duration::from_secs(params.timeout_secs);
+            let mut last_status = Some(shutdown);
+            loop {
+                let Some(shutdown_id) = last_status
+                    .as_ref()
+                    .map(|status| status.shutdown_id.clone())
+                else {
+                    return operation_error_response(
+                        StatusCode::INTERNAL_SERVER_ERROR,
+                        crate::ControlPlaneError::Internal {
+                            message: "initial shutdown status disappeared while waiting"
+                                .to_string(),
+                        },
+                    );
+                };
+                match state.controller.shutdown_status(
+                    &pipeline_group_id,
+                    &pipeline_id,
+                    &shutdown_id,
+                ) {
+                    Ok(Some(current)) if shutdown_is_terminal(&current.state) => {
+                        let status = if shutdown_is_success(&current.state) {
+                            StatusCode::OK
+                        } else {
+                            StatusCode::CONFLICT
+                        };
+                        return (status, Json(current)).into_response();
+                    }
+                    Ok(Some(current)) => {
+                        last_status = Some(current);
+                    }
+                    Ok(None) | Err(crate::ControlPlaneError::ShutdownNotFound) => {
+                        return operation_error_response(
+                            StatusCode::NOT_FOUND,
+                            crate::ControlPlaneError::ShutdownNotFound,
+                        );
+                    }
+                    Err(other) => {
+                        return operation_error_response(StatusCode::INTERNAL_SERVER_ERROR, other);
+                    }
+                }
+
+                if Instant::now() >= deadline {
+                    return match last_status {
+                        Some(status) => (StatusCode::GATEWAY_TIMEOUT, Json(status)).into_response(),
+                        None => operation_error_response(
+                            StatusCode::INTERNAL_SERVER_ERROR,
+                            crate::ControlPlaneError::Internal {
+                                message: "shutdown status disappeared before timeout response"
+                                    .to_string(),
+                            },
+                        ),
+                    };
+                }
+
+                tokio::time::sleep(Duration::from_millis(100)).await;
+            }
+        }
+        Err(error @ crate::ControlPlaneError::GroupNotFound)
+        | Err(error @ crate::ControlPlaneError::PipelineNotFound) => {
+            operation_error_response(StatusCode::NOT_FOUND, error)
+        }
+        Err(crate::ControlPlaneError::RolloutConflict) => operation_error_response(
+            StatusCode::CONFLICT,
+            crate::ControlPlaneError::RolloutConflict,
+        ),
+        Err(crate::ControlPlaneError::InvalidRequest { message }) => operation_error_response(
+            StatusCode::UNPROCESSABLE_ENTITY,
+            crate::ControlPlaneError::InvalidRequest { message },
+        ),
+        Err(other) => operation_error_response(StatusCode::INTERNAL_SERVER_ERROR, other),
+    }
+}
+
+/// Returns aggregated runtime status for one logical pipeline.
 pub async fn show_status(
     Path((pipeline_group_id, pipeline_id)): Path<(String, String)>,
     State(state): State<AppState>,
@@ -66,6 +393,8 @@ pub async fn show_status(
 /// - Should be cheap and internal (not dependent on external systems).
 ///
 /// ToDo Implement heartbeat checks.
+///
+/// Serves the liveness probe for one logical pipeline.
 async fn liveness(
     Path((pipeline_group_id, pipeline_id)): Path<(String, String)>,
     State(state): State<AppState>,
@@ -86,6 +415,8 @@ async fn liveness(
 /// - Gate traffic until startup work is done (pipeline deployed and running).
 /// - Temporarily remove the Pod from load balancing when it can't serve correctly.
 /// - Can check key dependencies, but avoid making it too fragile.
+///
+/// Serves the readiness probe for one logical pipeline.
 async fn readiness(
     Path((pipeline_group_id, pipeline_id)): Path<(String, String)>,
     State(state): State<AppState>,
@@ -98,3 +429,317 @@ async fn readiness(
         (StatusCode::SERVICE_UNAVAILABLE, "NOT OK")
     }
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::{ControlPlane, ControlPlaneError, PipelineDetails, RolloutStatus, ShutdownStatus};
+    use axum::body::to_bytes;
+    use otap_df_admin_types::operations::{OperationError, OperationErrorKind};
+    use otap_df_config::observed_state::ObservedStateSettings;
+    use otap_df_engine::memory_limiter::MemoryPressureState;
+    use otap_df_state::store::ObservedStateStore;
+    use otap_df_telemetry::registry::TelemetryRegistryHandle;
+    use serde_json::json;
+    use std::sync::Arc;
+
+    #[derive(Clone)]
+    struct StubControlPlane {
+        replace_result: Result<RolloutStatus, ControlPlaneError>,
+        rollout_status_result: Result<Option<RolloutStatus>, ControlPlaneError>,
+        shutdown_result: Result<ShutdownStatus, ControlPlaneError>,
+        shutdown_status_result: Result<Option<ShutdownStatus>, ControlPlaneError>,
+    }
+
+    impl ControlPlane for StubControlPlane {
+        fn shutdown_all(&self, _timeout_secs: u64) -> Result<(), ControlPlaneError> {
+            Ok(())
+        }
+
+        fn shutdown_pipeline(
+            &self,
+            _pipeline_group_id: &str,
+            _pipeline_id: &str,
+            _timeout_secs: u64,
+        ) -> Result<ShutdownStatus, ControlPlaneError> {
+            self.shutdown_result.clone()
+        }
+
+        fn reconfigure_pipeline(
+            &self,
+            _pipeline_group_id: &str,
+            _pipeline_id: &str,
+            _request: crate::ReconfigureRequest,
+        ) -> Result<RolloutStatus, ControlPlaneError> {
+            self.replace_result.clone()
+        }
+
+        fn pipeline_details(
+            &self,
+            _pipeline_group_id: &str,
+            _pipeline_id: &str,
+        ) -> Result<Option<PipelineDetails>, ControlPlaneError> {
+            Ok(None)
+        }
+
+        fn rollout_status(
+            &self,
+            _pipeline_group_id: &str,
+            _pipeline_id: &str,
+            _rollout_id: &str,
+        ) -> Result<Option<RolloutStatus>, ControlPlaneError> {
+            self.rollout_status_result.clone()
+        }
+
+        fn shutdown_status(
+            &self,
+            _pipeline_group_id: &str,
+            _pipeline_id: &str,
+            _shutdown_id: &str,
+        ) -> Result<Option<ShutdownStatus>, ControlPlaneError> {
+            self.shutdown_status_result.clone()
+        }
+    }
+
+    fn test_app_state(controller: Arc<dyn ControlPlane>) -> AppState {
+        let metrics_registry = TelemetryRegistryHandle::new();
+        let observed_state_store =
+            ObservedStateStore::new(&ObservedStateSettings::default(), metrics_registry.clone());
+
+        AppState {
+            observed_state_store: observed_state_store.handle(),
+            metrics_registry,
+            controller,
+            log_tap: None,
+            memory_pressure_state: MemoryPressureState::default(),
+        }
+    }
+
+    fn request() -> crate::ReconfigureRequest {
+        crate::ReconfigureRequest {
+            pipeline: serde_json::from_value(json!({
+                "type": "otap",
+                "nodes": {
+                    "recv": {
+                        "type": "receiver:fake",
+                        "config": {}
+                    }
+                }
+            }))
+            .expect("fixture pipeline should deserialize"),
+            step_timeout_secs: 60,
+            drain_timeout_secs: 60,
+        }
+    }
+
+    fn rollout_status(state: PipelineRolloutState) -> RolloutStatus {
+        serde_json::from_value(json!({
+            "rolloutId": "rollout-1",
+            "pipelineGroupId": "default",
+            "pipelineId": "main",
+            "action": "replace",
+            "state": state,
+            "targetGeneration": 1,
+            "previousGeneration": 0,
+            "startedAt": "2026-01-01T00:00:00Z",
+            "updatedAt": "2026-01-01T00:00:01Z",
+            "cores": []
+        }))
+        .expect("fixture rollout status should deserialize")
+    }
+
+    fn shutdown_status(state: &str) -> ShutdownStatus {
+        serde_json::from_value(json!({
+            "shutdownId": "shutdown-1",
+            "pipelineGroupId": "default",
+            "pipelineId": "main",
+            "state": state,
+            "startedAt": "2026-01-01T00:00:00Z",
+            "updatedAt": "2026-01-01T00:00:01Z",
+            "cores": []
+        }))
+        .expect("fixture shutdown status should deserialize")
+    }
+
+    /// Scenario: the control plane rejects a pipeline reconfigure request
+    /// before rollout work starts.
+    /// Guarantees: the admin handler converts that rejection into a structured
+    /// operation-error body with the expected HTTP status.
+    #[tokio::test]
+    async fn put_pipeline_returns_operation_error_body_on_invalid_request() {
+        let response = put_pipeline(
+            Path(("default".to_string(), "main".to_string())),
+            Query(WaitParams {
+                wait: false,
+                timeout_secs: 60,
+            }),
+            State(test_app_state(Arc::new(StubControlPlane {
+                replace_result: Err(ControlPlaneError::InvalidRequest {
+                    message: "invalid candidate".to_string(),
+                }),
+                rollout_status_result: Ok(None),
+                shutdown_result: Ok(shutdown_status("succeeded")),
+                shutdown_status_result: Ok(None),
+            }))),
+            Json(request()),
+        )
+        .await
+        .into_response();
+
+        assert_eq!(response.status(), StatusCode::UNPROCESSABLE_ENTITY);
+        let body = to_bytes(response.into_body(), usize::MAX)
+            .await
+            .expect("body should collect");
+        let error: OperationError =
+            serde_json::from_slice(&body).expect("error body should deserialize");
+        assert_eq!(error.kind, OperationErrorKind::InvalidRequest);
+        assert_eq!(error.message.as_deref(), Some("invalid candidate"));
+    }
+
+    /// Scenario: a waited pipeline reconfigure request times out and the
+    /// control plane can still report the latest rollout snapshot.
+    /// Guarantees: the admin handler returns HTTP 504 with that rollout status
+    /// body instead of dropping the operation context.
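+    /// A `timeout_secs` of zero makes the wait deadline expire on its first
+    /// check, so the test is deterministic and never sleeps.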
+ #[tokio::test] + async fn put_pipeline_timeout_returns_latest_rollout_status_snapshot() { + let response = put_pipeline( + Path(("default".to_string(), "main".to_string())), + Query(WaitParams { + wait: true, + timeout_secs: 0, + }), + State(test_app_state(Arc::new(StubControlPlane { + replace_result: Ok(rollout_status(PipelineRolloutState::Running)), + rollout_status_result: Ok(Some(rollout_status(PipelineRolloutState::Running))), + shutdown_result: Ok(shutdown_status("succeeded")), + shutdown_status_result: Ok(None), + }))), + Json(request()), + ) + .await + .into_response(); + + assert_eq!(response.status(), StatusCode::GATEWAY_TIMEOUT); + let body = to_bytes(response.into_body(), usize::MAX) + .await + .expect("body should collect"); + let status: RolloutStatus = + serde_json::from_slice(&body).expect("timeout body should deserialize"); + assert_eq!(status.rollout_id, "rollout-1"); + assert_eq!(status.state, PipelineRolloutState::Running); + } + + /// Scenario: a pipeline shutdown request collides with an active rollout + /// for the same logical pipeline. + /// Guarantees: the admin handler returns a typed conflict body so callers + /// can distinguish request rejection from shutdown progress. + #[tokio::test] + async fn shutdown_pipeline_returns_operation_error_body_on_conflict() { + let response = shutdown_pipeline( + Path(("default".to_string(), "main".to_string())), + Query(WaitParams { + wait: false, + timeout_secs: 60, + }), + State(test_app_state(Arc::new(StubControlPlane { + replace_result: Ok(rollout_status(PipelineRolloutState::Succeeded)), + rollout_status_result: Ok(None), + shutdown_result: Err(ControlPlaneError::RolloutConflict), + shutdown_status_result: Ok(None), + }))), + ) + .await + .into_response(); + + assert_eq!(response.status(), StatusCode::CONFLICT); + let body = to_bytes(response.into_body(), usize::MAX) + .await + .expect("body should collect"); + let error: OperationError = + serde_json::from_slice(&body).expect("error body should deserialize"); + assert_eq!(error.kind, OperationErrorKind::Conflict); + assert_eq!(error.message, None); + } + + /// Scenario: a waited pipeline shutdown request times out while the control + /// plane still has a current shutdown snapshot. + /// Guarantees: the admin handler responds with HTTP 504 and the latest + /// shutdown status body for follow-up polling. + #[tokio::test] + async fn shutdown_pipeline_timeout_returns_latest_status_snapshot() { + let response = shutdown_pipeline( + Path(("default".to_string(), "main".to_string())), + Query(WaitParams { + wait: true, + timeout_secs: 0, + }), + State(test_app_state(Arc::new(StubControlPlane { + replace_result: Ok(rollout_status(PipelineRolloutState::Succeeded)), + rollout_status_result: Ok(None), + shutdown_result: Ok(shutdown_status("running")), + shutdown_status_result: Ok(Some(shutdown_status("running"))), + }))), + ) + .await + .into_response(); + + assert_eq!(response.status(), StatusCode::GATEWAY_TIMEOUT); + let body = to_bytes(response.into_body(), usize::MAX) + .await + .expect("body should collect"); + let status: ShutdownStatus = + serde_json::from_slice(&body).expect("timeout body should deserialize"); + assert_eq!(status.shutdown_id, "shutdown-1"); + assert_eq!(status.state, "running"); + } + + /// Scenario: a caller asks for a rollout status id that is no longer + /// available from the control plane. + /// Guarantees: the admin handler returns HTTP 404 so evicted rollout + /// history is observable as not found. 
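+    /// The stub reports `Ok(None)` to model bounded-retention eviction rather
+    /// than a control-plane failure.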
+ #[tokio::test] + async fn show_rollout_returns_not_found_when_status_is_missing() { + let response = show_rollout( + Path(( + "default".to_string(), + "main".to_string(), + "rollout-1".to_string(), + )), + State(test_app_state(Arc::new(StubControlPlane { + replace_result: Ok(rollout_status(PipelineRolloutState::Succeeded)), + rollout_status_result: Ok(None), + shutdown_result: Ok(shutdown_status("succeeded")), + shutdown_status_result: Ok(None), + }))), + ) + .await + .into_response(); + + assert_eq!(response.status(), StatusCode::NOT_FOUND); + } + + /// Scenario: a caller asks for a shutdown status id that is no longer + /// available from the control plane. + /// Guarantees: the admin handler returns HTTP 404 so evicted shutdown + /// history is observable as not found. + #[tokio::test] + async fn show_shutdown_returns_not_found_when_status_is_missing() { + let response = show_shutdown( + Path(( + "default".to_string(), + "main".to_string(), + "shutdown-1".to_string(), + )), + State(test_app_state(Arc::new(StubControlPlane { + replace_result: Ok(rollout_status(PipelineRolloutState::Succeeded)), + rollout_status_result: Ok(None), + shutdown_result: Ok(shutdown_status("succeeded")), + shutdown_status_result: Ok(None), + }))), + ) + .await + .into_response(); + + assert_eq!(response.status(), StatusCode::NOT_FOUND); + } +} diff --git a/rust/otap-dataflow/crates/admin/src/pipeline_group.rs b/rust/otap-dataflow/crates/admin/src/pipeline_group.rs index e98efc3dcb..4a9de0909f 100644 --- a/rust/otap-dataflow/crates/admin/src/pipeline_group.rs +++ b/rust/otap-dataflow/crates/admin/src/pipeline_group.rs @@ -3,19 +3,19 @@ //! Pipeline group endpoints. //! -//! - GET `/api/v1/pipeline-groups/:id/pipelines` - list active pipelines and their status (ToDo) -//! - POST `/api/v1/pipeline-groups/shutdown` - shutdown all pipelines in all groups +//! - GET `/api/v1/groups/:id/pipelines` - list active pipelines and their status (ToDo) +//! - POST `/api/v1/groups/shutdown` - shutdown all pipelines in all groups //! - Query parameters: //! - `wait` (bool, default: false) - if true, block until all pipelines have stopped //! - `timeout_secs` (u64, default: 60) - maximum seconds to wait when `wait=true` //! //! Example (fire-and-forget): //! ```sh -//! curl -X POST http://localhost:8080/api/v1/pipeline-groups/shutdown +//! curl -X POST http://localhost:8080/api/v1/groups/shutdown //! ``` //! Example (wait for graceful shutdown with 30s timeout): //! ```sh -//! curl -X POST "http://localhost:8080/api/v1/pipeline-groups/shutdown?wait=true&timeout_secs=30" +//! curl -X POST "http://localhost:8080/api/v1/groups/shutdown?wait=true&timeout_secs=30" //! ``` //! //! - 200 OK if `wait=true` and all pipelines stopped successfully @@ -37,8 +37,8 @@ use axum::routing::{get, post}; use axum::{Json, Router}; use chrono::Utc; use otap_df_admin_types::{ + groups::{ShutdownResponse, ShutdownStatus, Status as GroupsStatus}, operations::OperationOptions, - pipeline_groups::{ShutdownResponse, ShutdownStatus, Status as PipelineGroupsStatus}, }; use otap_df_telemetry::otel_info; use std::time::{Duration, Instant}; @@ -47,16 +47,14 @@ use std::time::{Duration, Instant}; pub(crate) fn routes() -> Router { Router::new() // Returns a summary of all pipelines and their statuses. - .route("/pipeline-groups/status", get(show_status)) + .route("/groups/status", get(show_status)) // Shutdown all pipelines in all groups. 
- .route("/pipeline-groups/shutdown", post(shutdown_all_pipelines)) + .route("/groups/shutdown", post(shutdown_all_pipelines)) // ToDo Global liveness and readiness probes. } -pub async fn show_status( - State(state): State, -) -> Result, StatusCode> { - Ok(Json(PipelineGroupsStatus { +pub async fn show_status(State(state): State) -> Result, StatusCode> { + Ok(Json(GroupsStatus { generated_at: Utc::now().to_rfc3339(), pipelines: json_shape(&state.observed_state_store.snapshot()), })) @@ -74,31 +72,13 @@ async fn shutdown_all_pipelines( timeout_secs = params.timeout_secs ); - // Send shutdown message to all pipelines - let errors: Vec<_> = (*state.ctrl_msg_senders.lock().await) - .drain(..) - .filter_map(|sender| { - // Use the timeout from params for the shutdown deadline - let deadline = Instant::now() + Duration::from_secs(params.timeout_secs); - sender - .try_send_shutdown( - deadline, - "Shutdown requested via the `/api/v1/pipeline-groups/shutdown` endpoint." - .to_owned(), - ) - .err() - }) - .map(|e| e.to_string()) - .collect(); - - // If there were errors sending shutdown messages, return immediately - if !errors.is_empty() { - otel_info!("shutdown.failed", error_count = errors.len()); + if let Err(err) = state.controller.shutdown_all(params.timeout_secs) { + otel_info!("shutdown.failed", error = ?err); return ( StatusCode::INTERNAL_SERVER_ERROR, Json(ShutdownResponse { status: ShutdownStatus::Failed, - errors: Some(errors), + errors: Some(vec![format!("{err:?}")]), duration_ms: Some(start_time.elapsed().as_millis() as u64), }), ); diff --git a/rust/otap-dataflow/crates/admin/src/telemetry.rs b/rust/otap-dataflow/crates/admin/src/telemetry.rs index 8e014a9fa7..87b1748d35 100644 --- a/rust/otap-dataflow/crates/admin/src/telemetry.rs +++ b/rust/otap-dataflow/crates/admin/src/telemetry.rs @@ -5,11 +5,12 @@ //! //! - /api/v1/telemetry/live-schema - current semantic conventions registry //! - /api/v1/telemetry/logs - retained internal logs from the in-memory log tap +//! - /api/v1/telemetry/logs/stream - live internal log stream over WebSocket //! - /api/v1/telemetry/metrics - current aggregated metrics in JSON, line protocol, or Prometheus text format //! 
 //! - /api/v1/telemetry/metrics/aggregate - aggregated metrics grouped by metric set name and optional attributes
 
 use crate::AppState;
-use crate::convert::{convert_attribute_value, json_shape};
+use crate::convert::json_shape;
 use axum::extract::ws::{Message, WebSocket, WebSocketUpgrade};
 use axum::extract::{Query, State};
 use axum::http::{StatusCode, header};
@@ -27,7 +28,7 @@ use otap_df_telemetry::self_tracing::format_log_record_to_string;
 use otap_df_telemetry::semconv::SemConvRegistry;
 use serde::{Deserialize, Serialize};
 use std::collections::hash_map::Entry;
-use std::collections::{BTreeMap, HashMap, HashSet};
+use std::collections::{HashMap, HashSet};
 use std::fmt::Write as _;
 use std::sync::Arc;
 use tokio::sync::broadcast;
@@ -147,8 +148,40 @@ struct AggregateGroup {
     metrics: HashMap,
 }
 
-fn logs_response(registry: &TelemetryRegistryHandle, result: LogQueryResult) -> api::LogsResponse {
-    api::LogsResponse {
+#[derive(Serialize)]
+pub(crate) struct LogsResponse {
+    oldest_seq: Option<u64>,
+    newest_seq: Option<u64>,
+    next_seq: u64,
+    truncated_before_seq: Option<u64>,
+    dropped_on_ingest: u64,
+    dropped_on_retention: u64,
+    retained_bytes: usize,
+    logs: Vec<LogEntry>,
+}
+
+#[derive(Serialize)]
+struct LogEntry {
+    seq: u64,
+    timestamp: String,
+    level: String,
+    target: String,
+    event_name: String,
+    file: Option<String>,
+    line: Option<u32>,
+    rendered: String,
+    contexts: Vec<ResolvedLogContext>,
+}
+
+#[derive(Serialize)]
+struct ResolvedLogContext {
+    entity_key: String,
+    schema_name: Option<String>,
+    attributes: HashMap,
+}
+
+fn logs_response(registry: &TelemetryRegistryHandle, result: LogQueryResult) -> LogsResponse {
+    LogsResponse {
         oldest_seq: result.oldest_seq,
         newest_seq: result.newest_seq,
         next_seq: result.next_seq,
@@ -164,9 +197,9 @@ fn logs_response(registry: &TelemetryRegistryHandle, result: LogQueryResult) ->
     }
 }
 
-fn render_log_entry(registry: &TelemetryRegistryHandle, entry: &RetainedLogEvent) -> api::LogEntry {
+fn render_log_entry(registry: &TelemetryRegistryHandle, entry: &RetainedLogEvent) -> LogEntry {
     let callsite = entry.event.record.callsite();
-    api::LogEntry {
+    LogEntry {
         seq: entry.seq,
         timestamp: chrono::DateTime::<chrono::Utc>::from(entry.event.time).to_rfc3339(),
         level: callsite.level().to_string(),
@@ -186,25 +219,25 @@ fn render_log_message(event: &LogEvent) -> String {
 fn resolve_log_contexts(
     registry: &TelemetryRegistryHandle,
     event: &LogEvent,
-) -> Vec<api::ResolvedLogContext> {
+) -> Vec<ResolvedLogContext> {
     event
         .record
         .context
         .iter()
         .map(|entity_key| {
             registry
-                .visit_entity(*entity_key, |attrs| api::ResolvedLogContext {
+                .visit_entity(*entity_key, |attrs| ResolvedLogContext {
                     entity_key: format!("{entity_key:?}"),
                     schema_name: Some(attrs.schema_name().to_string()),
                     attributes: attrs
                         .iter_attributes()
-                        .map(|(key, value)| (key.to_string(), convert_attribute_value(value)))
+                        .map(|(key, value)| (key.to_string(), value.clone()))
                        .collect(),
                 })
-                .unwrap_or_else(|| api::ResolvedLogContext {
+                .unwrap_or_else(|| ResolvedLogContext {
                     entity_key: format!("{entity_key:?}"),
                     schema_name: None,
-                    attributes: BTreeMap::new(),
+                    attributes: HashMap::new(),
                 })
         })
         .collect()
@@ -233,7 +266,10 @@ pub async fn get_logs(
         after: q.after,
         limit,
     });
-    Ok(Json(logs_response(&state.metrics_registry, result)))
+    Ok(Json(json_shape(&logs_response(
+        &state.metrics_registry,
+        result,
+    ))))
 }
 
 /// Handler for the `/api/v1/telemetry/metrics` endpoint.
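For orientation, here is a hypothetical response for `GET /api/v1/telemetry/logs` assembled from the `LogsResponse` fields above. The camelCase key style assumes `json_shape` applies the same key shaping used by the other admin endpoints; every value shown is illustrative:

```json
{
  "oldestSeq": 12,
  "newestSeq": 57,
  "nextSeq": 58,
  "truncatedBeforeSeq": 12,
  "droppedOnIngest": 0,
  "droppedOnRetention": 3,
  "retainedBytes": 8192,
  "logs": [
    {
      "seq": 57,
      "timestamp": "2026-01-01T00:00:00+00:00",
      "level": "INFO",
      "target": "otap_df_engine",
      "eventName": "pipeline.started",
      "file": null,
      "line": null,
      "rendered": "pipeline.started core=0",
      "contexts": []
    }
  ]
}
```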
@@ -1242,7 +1278,7 @@ fn escape_prom_help(s: &str) -> String {
 // WebSocket live log stream (/api/v1/telemetry/logs/stream)
 // ---------------------------------------------------------------------------
 
-/// Map a level string to a numeric severity (TRACE=0 … ERROR=4).
+/// Map a level string to a numeric severity (TRACE=0 through ERROR=4).
 /// Unknown levels are treated as TRACE (lowest severity).
 ///
 /// Uses ASCII-only comparison to avoid allocating a temporary uppercase string.
@@ -1289,7 +1325,7 @@ struct LogFilter {
 
 impl LogFilter {
     /// Returns `true` when the rendered log entry passes all active criteria.
-    fn matches(&self, entry: &api::LogEntry) -> bool {
+    fn matches(&self, entry: &LogEntry) -> bool {
         if let Some(min_ts) = &self.minimum_timestamp {
             if let Ok(ts) = chrono::DateTime::parse_from_rfc3339(&entry.timestamp) {
                 if ts.with_timezone(&chrono::Utc) < *min_ts {
@@ -1331,7 +1367,7 @@ impl LogFilter {
     ///
     /// Checks `minimum_level` and `minimum_timestamp` without rendering the
     /// entry, so we can skip the more expensive `render_log_entry()` call for
-    /// events that would be rejected anyway.  `search_query` is intentionally
+    /// events that would be rejected anyway. `search_query` is intentionally
     /// not checked here because it operates on the rendered text.
     fn prefilter_raw(&self, event: &RetainedLogEvent) -> bool {
         if let Some(min_ts) = &self.minimum_timestamp {
@@ -1373,16 +1409,16 @@ impl LogFilter {
     }
 }
 
-/// Client → server WebSocket messages.
+/// Client to server WebSocket messages.
 #[derive(Deserialize)]
 #[serde(tag = "type", rename_all = "camelCase")]
 enum WsClientMsg {
-    /// Begin streaming.  Sends an initial retained-log snapshot, then follows
+    /// Begin streaming. Sends an initial retained-log snapshot, then follows
     /// with live events.
     Subscribe {
         /// Cursor: only include retained entries strictly newer than this seq.
         after: Option<u64>,
-        /// Maximum retained entries in the initial snapshot (clamped 1–1000).
+        /// Maximum retained entries in the initial snapshot (clamped 1-1000).
         limit: Option<usize>,
         /// Case-insensitive text filter (applied server-side).
         #[serde(rename = "searchQuery")]
@@ -1419,7 +1455,7 @@ enum WsClientMsg {
     Ping,
 }
 
-/// Server → client WebSocket messages.
+/// Server to client WebSocket messages.
 #[derive(Serialize)]
 #[serde(tag = "type", rename_all = "snake_case")]
 enum WsServerMsg {
@@ -1432,12 +1468,12 @@ enum WsServerMsg {
         dropped_on_ingest: u64,
         dropped_on_retention: u64,
         retained_bytes: usize,
-        logs: Vec<api::LogEntry>,
+        logs: Vec<LogEntry>,
     },
     /// Single live log entry pushed by the server.
     Log {
         #[serde(flatten)]
-        entry: api::LogEntry,
+        entry: LogEntry,
     },
     /// Current pause state and cursor position.
     State { paused: bool, next_seq: u64 },
@@ -1501,10 +1537,10 @@ async fn ws_send_snapshot(
 /// 2. The server sends the initial retained-log snapshot, then streams live
 ///    events via `log` messages.
 /// 3. `pause` / `resume` toggle server-side forwarding without closing the
-///    socket.  While paused the server still drains the broadcast channel so
+///    socket. While paused the server still drains the broadcast channel so
 ///    that the producer is never slowed by this client.
 /// 4. On `backfill` the server re-queries the retained ring buffer and sends a
-///    `snapshot`.  The cursor is updated so subsequent live events do not
+///    `snapshot`. The cursor is updated so subsequent live events do not
 ///    duplicate.
 /// 5.
If the client falls more than `SUBSCRIBER_CHANNEL_CAPACITY` events /// behind, the broadcast channel drops the overflow; the server notifies the @@ -1529,8 +1565,8 @@ async fn handle_ws_logs(mut ws: WebSocket, state: AppState) { let mut paused = false; let mut filter = LogFilter::default(); // Tracks the sequence number of the last event we acknowledged (sent or - // deliberately skipped while paused). Used in `state` replies so the - // client knows where the live cursor stands. + // deliberately skipped while paused). Used in `state` replies so the client + // knows where the live cursor stands. let mut cursor: u64 = 0; loop { @@ -1562,11 +1598,9 @@ async fn handle_ws_logs(mut ws: WebSocket, state: AppState) { Ok(WsClientMsg::Backfill { after, limit }) => { let limit = limit.unwrap_or(100).clamp(1, 1000); let result = log_tap.query(LogQuery { after, limit }); - // Only advance cursor — never move it backward. A - // client may request an older `after` (e.g. a lag - // gap backfill) while the live stream has already - // moved the cursor forward; preserving the maximum - // keeps the dedup guard in the live event arm sound. + // Only advance cursor; never move it backward. A client may + // request an older `after` (e.g. a lag gap backfill) while the + // live stream has already moved the cursor forward. cursor = cursor.max(result.next_seq); if !ws_send_snapshot(&mut ws, registry, result, &filter).await { break; @@ -1586,7 +1620,7 @@ async fn handle_ws_logs(mut ws: WebSocket, state: AppState) { } } Some(Ok(Message::Close(_))) | None => break, - Some(Ok(_)) => {} // binary / ping frames — ignore + Some(Ok(_)) => {} // binary / ping frames; ignore Some(Err(_)) => break, } } @@ -1600,7 +1634,7 @@ async fn handle_ws_logs(mut ws: WebSocket, state: AppState) { // were already delivered in the most recent snapshot // or backfill (the subscribe-before-query race window). if entry_seq <= cursor { - // Discard silently — already in the snapshot. + // Discard silently; already in the snapshot. } else { // Advance cursor so `state` replies are accurate // even when paused or filtered. @@ -1627,8 +1661,8 @@ async fn handle_ws_logs(mut ws: WebSocket, state: AppState) { } Err(broadcast::error::RecvError::Lagged(n)) => { // The client was too slow; events were dropped from its - // receiver slot. `cursor` here is the last seq we - // successfully delivered — the client can use it as + // receiver slot. `cursor` here is the last seq we + // successfully delivered; the client can use it as // the `after` param for a backfill to recover the gap. let msg = WsServerMsg::Error { message: format!( @@ -1661,7 +1695,7 @@ async fn handle_ws_logs(mut ws: WebSocket, state: AppState) { { // Subscribe to the broadcast channel BEFORE querying // retained logs so we do not miss events recorded between - // the query and the first receive. Live events with + // the query and the first receive. Live events with // seq <= cursor (set from snapshot.next_seq below) are // silently discarded in the live_event arm to prevent // duplicates for that race window. 
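To make the stream protocol concrete, here is a hypothetical exchange implied by the `WsClientMsg` and `WsServerMsg` definitions above: client messages carry camelCase `type` tags, server messages carry snake_case tags, payload values are illustrative, and fields elided here are marked with `...`:

```text
client -> server
  {"type": "subscribe", "after": null, "limit": 100, "searchQuery": "shutdown"}
  {"type": "pause"}
  {"type": "resume"}
  {"type": "backfill", "after": 42, "limit": 100}
  {"type": "ping"}

server -> client
  {"type": "snapshot", "next_seq": 58, "retained_bytes": 8192, "logs": [...], ...}
  {"type": "log", "seq": 58, "level": "INFO", "rendered": "...", ...}
  {"type": "state", "paused": false, "next_seq": 58}
  {"type": "error", "message": "..."}
```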
@@ -1703,13 +1737,75 @@ async fn handle_ws_logs(mut ws: WebSocket, state: AppState) {
 #[cfg(test)]
 mod tests {
     use super::*;
-    use axum::body::to_bytes;
+    use crate::{
+        ControlPlane, ControlPlaneError, PipelineDetails, ReconfigureRequest, RolloutStatus,
+        ShutdownStatus,
+    };
+    use axum::body::{Body, to_bytes};
     use otap_df_config::observed_state::ObservedStateSettings;
     use otap_df_engine::memory_limiter::MemoryPressureState;
     use otap_df_state::store::ObservedStateStore;
     use otap_df_telemetry::descriptor::{Instrument, MetricsField, Temporality};
     use std::sync::Arc;
-    use tokio::sync::Mutex;
+    use tower::ServiceExt;
+
+    struct NoopControlPlane;
+
+    impl ControlPlane for NoopControlPlane {
+        fn shutdown_all(&self, _timeout_secs: u64) -> Result<(), ControlPlaneError> {
+            Err(ControlPlaneError::Internal {
+                message: "not used in telemetry tests".to_string(),
+            })
+        }
+
+        fn shutdown_pipeline(
+            &self,
+            _pipeline_group_id: &str,
+            _pipeline_id: &str,
+            _timeout_secs: u64,
+        ) -> Result<ShutdownStatus, ControlPlaneError> {
+            Err(ControlPlaneError::Internal {
+                message: "not used in telemetry tests".to_string(),
+            })
+        }
+
+        fn reconfigure_pipeline(
+            &self,
+            _pipeline_group_id: &str,
+            _pipeline_id: &str,
+            _request: ReconfigureRequest,
+        ) -> Result<RolloutStatus, ControlPlaneError> {
+            Err(ControlPlaneError::Internal {
+                message: "not used in telemetry tests".to_string(),
+            })
+        }
+
+        fn pipeline_details(
+            &self,
+            _pipeline_group_id: &str,
+            _pipeline_id: &str,
+        ) -> Result<Option<PipelineDetails>, ControlPlaneError> {
+            Ok(None)
+        }
+
+        fn rollout_status(
+            &self,
+            _pipeline_group_id: &str,
+            _pipeline_id: &str,
+            _rollout_id: &str,
+        ) -> Result<Option<RolloutStatus>, ControlPlaneError> {
+            Ok(None)
+        }
+
+        fn shutdown_status(
+            &self,
+            _pipeline_group_id: &str,
+            _pipeline_id: &str,
+            _shutdown_id: &str,
+        ) -> Result<Option<ShutdownStatus>, ControlPlaneError> {
+            Ok(None)
+        }
+    }
 
     static TEST_METRICS_DESCRIPTOR: MetricsDescriptor = MetricsDescriptor {
         name: "test_metrics",
@@ -1753,8 +1849,8 @@ mod tests {
         AppState {
             observed_state_store: observed_state_store.handle(),
             metrics_registry,
+            controller: Arc::new(NoopControlPlane),
             log_tap: None,
-            ctrl_msg_senders: Arc::new(Mutex::new(Vec::new())),
             memory_pressure_state: MemoryPressureState::default(),
         }
     }
@@ -1789,6 +1885,22 @@ mod tests {
         );
     }
 
+    #[tokio::test]
+    async fn telemetry_routes_include_logs_stream_websocket_endpoint() {
+        let response = routes()
+            .with_state(test_app_state())
+            .oneshot(
+                axum::http::Request::builder()
+                    .uri("/telemetry/logs/stream")
+                    .body(Body::empty())
+                    .expect("request should build"),
+            )
+            .await
+            .expect("route should respond");
+
+        assert_ne!(response.status(), StatusCode::NOT_FOUND);
+    }
+
     /// Ensures aggregate group ordering is deterministic: metric-set name first,
     /// then metric count when names are equal.
     #[test]
@@ -2218,8 +2330,8 @@ mod tests {
     // LogFilter unit tests
     // ---------------------------------------------------------------------------
 
-    fn make_log_entry(rendered: &str, level: &str, target: &str, timestamp: &str) -> api::LogEntry {
-        api::LogEntry {
+    fn make_log_entry(rendered: &str, level: &str, target: &str, timestamp: &str) -> LogEntry {
+        LogEntry {
             seq: 1,
             timestamp: timestamp.to_string(),
             level: level.to_string(),
@@ -2299,11 +2411,9 @@ mod tests {
             "2026-01-01T00:00:01Z",
         );
 
-        // The filter must pass the matching entry and reject the other.
         assert!(filter.matches(&match_entry));
         assert!(!filter.matches(&no_match_entry));
 
-        // Simulate the retain() call used in ws_send_snapshot.
let mut logs = vec![match_entry, no_match_entry]; logs.retain(|e| filter.matches(e)); assert_eq!(logs.len(), 1); @@ -2312,7 +2422,6 @@ mod tests { #[test] fn level_severity_ordering_is_correct() { - // TRACE < DEBUG < INFO < WARN < ERROR assert!(level_severity("TRACE") < level_severity("DEBUG")); assert!(level_severity("DEBUG") < level_severity("INFO")); assert!(level_severity("INFO") < level_severity("WARN")); @@ -2361,7 +2470,6 @@ mod tests { #[test] fn log_filter_minimum_level_and_search_query_combine() { - // Both constraints must pass. let filter = LogFilter::from_params( Some("critical".to_string()), None, @@ -2376,28 +2484,35 @@ mod tests { } // --------------------------------------------------------------------------- - // WebSocket ↔ HTTP schema alignment tests + // WebSocket / HTTP schema alignment tests // --------------------------------------------------------------------------- #[test] fn ws_log_msg_serializes_same_fields_as_api_log_entry() { let entry = make_log_entry("hello", "INFO", "admin", "2026-01-01T00:00:00Z"); - let msg = WsServerMsg::Log { - entry: entry.clone(), - }; + let expected_seq = entry.seq; + let expected_timestamp = entry.timestamp.clone(); + let expected_level = entry.level.clone(); + let expected_target = entry.target.clone(); + let expected_event_name = entry.event_name.clone(); + let expected_rendered = entry.rendered.clone(); + let msg = WsServerMsg::Log { entry }; let json: serde_json::Value = serde_json::to_value(&msg).unwrap(); let obj = json.as_object().unwrap(); - // The flattened entry must carry the same fields as api::LogEntry - // plus the discriminator tag. assert_eq!(obj.get("type").unwrap(), "log"); - assert_eq!(obj.get("seq").unwrap(), entry.seq); - assert_eq!(obj.get("timestamp").unwrap(), &entry.timestamp); - assert_eq!(obj.get("level").unwrap(), &entry.level); - assert_eq!(obj.get("target").unwrap(), &entry.target); - assert_eq!(obj.get("event_name").unwrap(), &entry.event_name); - assert_eq!(obj.get("rendered").unwrap(), &entry.rendered); + assert_eq!(obj.get("seq").unwrap(), expected_seq); + assert_eq!(obj.get("timestamp").unwrap(), &expected_timestamp); + assert_eq!(obj.get("level").unwrap(), &expected_level); + assert_eq!(obj.get("target").unwrap(), &expected_target); + assert_eq!(obj.get("event_name").unwrap(), &expected_event_name); + assert_eq!(obj.get("rendered").unwrap(), &expected_rendered); assert!(obj.contains_key("contexts")); + + let roundtrip: api::LogEntry = + serde_json::from_value(json).expect("log message should match api::LogEntry shape"); + assert_eq!(roundtrip.seq, 1); + assert_eq!(roundtrip.rendered, "hello"); } #[test] @@ -2417,7 +2532,6 @@ mod tests { let logs = json.get("logs").unwrap().as_array().unwrap(); assert_eq!(logs.len(), 1); - // Each log in the snapshot must deserialize as a valid api::LogEntry. let roundtrip: api::LogEntry = serde_json::from_value(logs[0].clone()) .expect("snapshot log should match api::LogEntry"); assert_eq!(roundtrip.seq, 1); diff --git a/rust/otap-dataflow/crates/channel/src/mpmc.rs b/rust/otap-dataflow/crates/channel/src/mpmc.rs index c6456b8079..cd9a841d1d 100644 --- a/rust/otap-dataflow/crates/channel/src/mpmc.rs +++ b/rust/otap-dataflow/crates/channel/src/mpmc.rs @@ -350,6 +350,14 @@ impl Receiver { let state = self.channel.state.borrow(); state.buffer.is_empty() } + + /// Checks whether the channel has been closed and will accept no further + /// sends. 
+ #[must_use] + pub fn is_closed(&self) -> bool { + let state = self.channel.state.borrow(); + state.is_closed + } } struct SendFuture { diff --git a/rust/otap-dataflow/crates/channel/src/mpsc.rs b/rust/otap-dataflow/crates/channel/src/mpsc.rs index e5f3b558e4..86c7faea55 100644 --- a/rust/otap-dataflow/crates/channel/src/mpsc.rs +++ b/rust/otap-dataflow/crates/channel/src/mpsc.rs @@ -342,6 +342,14 @@ impl Receiver { let state = self.channel.state.borrow(); state.buffer.is_empty() } + + /// Checks whether the channel has been closed and will accept no further + /// sends. + #[must_use] + pub fn is_closed(&self) -> bool { + let state = self.channel.state.borrow(); + state.is_closed + } } struct SendFuture { diff --git a/rust/otap-dataflow/crates/config/src/engine/resolve.rs b/rust/otap-dataflow/crates/config/src/engine/resolve.rs index 8865c99896..70200e33ef 100644 --- a/rust/otap-dataflow/crates/config/src/engine/resolve.rs +++ b/rust/otap-dataflow/crates/config/src/engine/resolve.rs @@ -76,6 +76,59 @@ pub struct ResolvedPipelineConfig { pub role: ResolvedPipelineRole, } +impl ResolvedPipelineConfig { + /// Compares two resolved pipelines for exact runtime equivalence. + /// + /// Logical identity is intentionally ignored here; callers compare two + /// candidate snapshots for the same logical pipeline and only care whether + /// runtime-relevant config and resolved policies match. + #[must_use] + pub fn runtime_matches(&self, other: &Self) -> bool { + let Self { + pipeline_group_id: _, + pipeline_id: _, + pipeline: self_pipeline, + policies: self_policies, + role: self_role, + } = self; + let Self { + pipeline_group_id: _, + pipeline_id: _, + pipeline: other_pipeline, + policies: other_policies, + role: other_role, + } = other; + + self_role == other_role + && self_pipeline == other_pipeline + && self_policies == other_policies + } + + /// Compares two resolved pipelines while ignoring resource-only policy + /// differences used by resize classification. + #[must_use] + pub fn runtime_shape_matches_ignoring_resources(&self, other: &Self) -> bool { + let Self { + pipeline_group_id: _, + pipeline_id: _, + pipeline: self_pipeline, + policies: self_policies, + role: self_role, + } = self; + let Self { + pipeline_group_id: _, + pipeline_id: _, + pipeline: other_pipeline, + policies: other_policies, + role: other_role, + } = other; + + self_role == other_role + && self_pipeline.eq_ignoring_policies(other_pipeline) + && self_policies.eq_ignoring_resources(other_policies) + } +} + impl OtelDataflowSpec { /// Resolves and materializes policies once for all pipelines. 
/// @@ -173,3 +226,127 @@ impl OtelDataflowSpec { self.topics.get(topic_name).cloned() } } + +#[cfg(test)] +mod tests { + use super::{ResolvedPipelineConfig, ResolvedPipelineRole}; + use crate::pipeline::PipelineConfig; + use crate::policy::{CoreAllocation, ResolvedPolicies, ResourcesPolicy, TelemetryPolicy}; + + #[test] + fn runtime_shape_matches_ignoring_resources_ignores_resource_only_changes() { + let current = ResolvedPipelineConfig { + pipeline_group_id: "g1".into(), + pipeline_id: "p1".into(), + pipeline: PipelineConfig::from_yaml( + "g1".into(), + "p1".into(), + r#" +policies: + resources: + core_allocation: + type: core_count + count: 1 +nodes: + receiver: + type: "urn:test:receiver:example" + config: null + exporter: + type: "urn:test:exporter:example" + config: null +connections: + - from: receiver + to: exporter +"#, + ) + .expect("current pipeline should parse"), + policies: ResolvedPolicies { + resources: ResourcesPolicy { + core_allocation: CoreAllocation::core_count(1), + memory_limiter: None, + }, + ..ResolvedPolicies::default() + }, + role: ResolvedPipelineRole::Regular, + }; + let candidate = ResolvedPipelineConfig { + pipeline_group_id: "g1".into(), + pipeline_id: "p1".into(), + pipeline: PipelineConfig::from_yaml( + "g1".into(), + "p1".into(), + r#" +policies: + resources: + core_allocation: + type: core_count + count: 2 +nodes: + receiver: + type: "urn:test:receiver:example" + config: null + exporter: + type: "urn:test:exporter:example" + config: null +connections: + - from: receiver + to: exporter +"#, + ) + .expect("candidate pipeline should parse"), + policies: ResolvedPolicies { + resources: ResourcesPolicy { + core_allocation: CoreAllocation::core_count(2), + memory_limiter: None, + }, + ..ResolvedPolicies::default() + }, + role: ResolvedPipelineRole::Regular, + }; + + assert!(!current.runtime_matches(&candidate)); + assert!(current.runtime_shape_matches_ignoring_resources(&candidate)); + } + + #[test] + fn runtime_shape_matches_ignoring_resources_detects_runtime_policy_change() { + let current = ResolvedPipelineConfig { + pipeline_group_id: "g1".into(), + pipeline_id: "p1".into(), + pipeline: PipelineConfig::from_yaml( + "g1".into(), + "p1".into(), + r#" +nodes: + receiver: + type: "urn:test:receiver:example" + config: null + exporter: + type: "urn:test:exporter:example" + config: null +connections: + - from: receiver + to: exporter +"#, + ) + .expect("current pipeline should parse"), + policies: ResolvedPolicies::default(), + role: ResolvedPipelineRole::Regular, + }; + let candidate = ResolvedPipelineConfig { + pipeline_group_id: "g1".into(), + pipeline_id: "p1".into(), + pipeline: current.pipeline.clone(), + policies: ResolvedPolicies { + telemetry: TelemetryPolicy { + pipeline_metrics: false, + ..TelemetryPolicy::default() + }, + ..ResolvedPolicies::default() + }, + role: ResolvedPipelineRole::Regular, + }; + + assert!(!current.runtime_shape_matches_ignoring_resources(&candidate)); + } +} diff --git a/rust/otap-dataflow/crates/config/src/lib.rs b/rust/otap-dataflow/crates/config/src/lib.rs index 107fdee33c..55711bc009 100644 --- a/rust/otap-dataflow/crates/config/src/lib.rs +++ b/rust/otap-dataflow/crates/config/src/lib.rs @@ -153,7 +153,7 @@ impl Serialize for PipelineKey { } /// Unique key for identifying a pipeline running on a specific core. -#[derive(Debug, Clone, Serialize)] +#[derive(Debug, Clone, Serialize, PartialEq, Eq, Hash)] pub struct DeployedPipelineKey { /// The unique ID of the pipeline group the pipeline belongs to. 
pub pipeline_group_id: PipelineGroupId, @@ -163,4 +163,11 @@ pub struct DeployedPipelineKey { /// The CPU core ID the pipeline is pinned to. pub core_id: CoreId, + + /// Monotonic deployment generation for this logical pipeline. + /// + /// Generation `0` is the initial startup deployment. Higher generations are + /// created by live reconfiguration rollouts. + #[serde(default)] + pub deployment_generation: u64, } diff --git a/rust/otap-dataflow/crates/config/src/pipeline.rs b/rust/otap-dataflow/crates/config/src/pipeline.rs index 065bfe2971..2e56d2977b 100644 --- a/rust/otap-dataflow/crates/config/src/pipeline.rs +++ b/rust/otap-dataflow/crates/config/src/pipeline.rs @@ -510,6 +510,34 @@ impl FromIterator<(NodeId, Arc)> for PipelineNodes { } impl PipelineConfig { + /// Compares two pipeline configs while intentionally ignoring the optional + /// pipeline-level policies block. + /// + /// This keeps the "what is pipeline shape vs. what is policy" decision next + /// to the struct definition so new fields require an explicit choice. + #[must_use] + pub fn eq_ignoring_policies(&self, other: &Self) -> bool { + let Self { + r#type: self_type, + policies: _, + nodes: self_nodes, + extensions: self_extensions, + connections: self_connections, + } = self; + let Self { + r#type: other_type, + policies: _, + nodes: other_nodes, + extensions: other_extensions, + connections: other_connections, + } = other; + + self_type == other_type + && self_nodes == other_nodes + && self_extensions == other_extensions + && self_connections == other_connections + } + /// Create a new [`PipelineConfig`] from a JSON string. pub fn from_json( pipeline_group_id: PipelineGroupId, @@ -1379,6 +1407,99 @@ mod tests { use crate::pipeline::{PipelineConfigBuilder, PipelineType}; use serde_json::json; + #[test] + fn eq_ignoring_policies_ignores_policy_only_changes() { + let current = super::PipelineConfig::from_yaml( + "g1".into(), + "p1".into(), + r#" +policies: + resources: + core_allocation: + type: core_count + count: 1 +nodes: + receiver: + type: "urn:test:receiver:example" + config: null + exporter: + type: "urn:test:exporter:example" + config: null +connections: + - from: receiver + to: exporter +"#, + ) + .expect("current pipeline should parse"); + let candidate = super::PipelineConfig::from_yaml( + "g1".into(), + "p1".into(), + r#" +policies: + resources: + core_allocation: + type: core_count + count: 2 + telemetry: + pipeline_metrics: false +nodes: + receiver: + type: "urn:test:receiver:example" + config: null + exporter: + type: "urn:test:exporter:example" + config: null +connections: + - from: receiver + to: exporter +"#, + ) + .expect("candidate pipeline should parse"); + + assert_ne!(current, candidate); + assert!(current.eq_ignoring_policies(&candidate)); + } + + #[test] + fn eq_ignoring_policies_detects_topology_change() { + let current = super::PipelineConfig::from_yaml( + "g1".into(), + "p1".into(), + r#" +nodes: + receiver: + type: "urn:test:receiver:example" + config: null + exporter: + type: "urn:test:exporter:example" + config: null +connections: + - from: receiver + to: exporter +"#, + ) + .expect("current pipeline should parse"); + let candidate = super::PipelineConfig::from_yaml( + "g1".into(), + "p1".into(), + r#" +nodes: + input: + type: "urn:test:receiver:example" + config: null + exporter: + type: "urn:test:exporter:example" + config: null +connections: + - from: input + to: exporter +"#, + ) + .expect("candidate pipeline should parse"); + + assert!(!current.eq_ignoring_policies(&candidate)); + } 
+ #[test] fn test_duplicate_node_errors() { let result = PipelineConfigBuilder::new() diff --git a/rust/otap-dataflow/crates/config/src/policy.rs b/rust/otap-dataflow/crates/config/src/policy.rs index d5309cb598..c3092d7052 100644 --- a/rust/otap-dataflow/crates/config/src/policy.rs +++ b/rust/otap-dataflow/crates/config/src/policy.rs @@ -206,6 +206,33 @@ pub struct ResolvedPolicies { /// (opt-in only -- no headers are captured or propagated by default). pub transport_headers: Option, } + +impl ResolvedPolicies { + /// Compares resolved policies while intentionally ignoring the resources + /// policy, which controls placement and scaling rather than runtime shape. + #[must_use] + pub fn eq_ignoring_resources(&self, other: &Self) -> bool { + let Self { + channel_capacity: self_channel_capacity, + health: self_health, + telemetry: self_telemetry, + resources: _, + transport_headers: self_transport_headers, + } = self; + let Self { + channel_capacity: other_channel_capacity, + health: other_health, + telemetry: other_telemetry, + resources: _, + transport_headers: other_transport_headers, + } = other; + + self_channel_capacity == other_channel_capacity + && self_health == other_health + && self_telemetry == other_telemetry + && self_transport_headers == other_transport_headers + } +} /// instrumentation overhead. #[derive( Clone, Copy, Debug, Default, PartialEq, Eq, PartialOrd, Ord, Serialize, Deserialize, JsonSchema, @@ -595,6 +622,41 @@ mod tests { use super::{MemoryLimiterMode, MemoryLimiterPolicy, MemoryLimiterSource, Policies}; use std::time::Duration; + #[test] + fn resolved_policies_eq_ignoring_resources_ignores_resource_only_changes() { + let current = super::ResolvedPolicies { + resources: super::ResourcesPolicy { + core_allocation: super::CoreAllocation::core_count(1), + memory_limiter: None, + }, + ..super::ResolvedPolicies::default() + }; + let candidate = super::ResolvedPolicies { + resources: super::ResourcesPolicy { + core_allocation: super::CoreAllocation::core_count(2), + memory_limiter: None, + }, + ..super::ResolvedPolicies::default() + }; + + assert_ne!(current, candidate); + assert!(current.eq_ignoring_resources(&candidate)); + } + + #[test] + fn resolved_policies_eq_ignoring_resources_detects_runtime_policy_change() { + let current = super::ResolvedPolicies::default(); + let candidate = super::ResolvedPolicies { + telemetry: super::TelemetryPolicy { + pipeline_metrics: false, + ..super::TelemetryPolicy::default() + }, + ..super::ResolvedPolicies::default() + }; + + assert!(!current.eq_ignoring_resources(&candidate)); + } + #[test] fn defaults_match_expected_values() { let defaults = Policies::resolve([&Policies::default()]); diff --git a/rust/otap-dataflow/crates/controller/Cargo.toml b/rust/otap-dataflow/crates/controller/Cargo.toml index 2b0b198ddd..66cc251c10 100644 --- a/rust/otap-dataflow/crates/controller/Cargo.toml +++ b/rust/otap-dataflow/crates/controller/Cargo.toml @@ -30,3 +30,5 @@ tokio = { workspace = true } tokio-util = { workspace = true } flume = { workspace = true } smallvec = { workspace = true } +chrono = { workspace = true } +serde_json = { workspace = true } diff --git a/rust/otap-dataflow/crates/controller/src/lib.rs b/rust/otap-dataflow/crates/controller/src/lib.rs index 547ac54b37..ee635562f5 100644 --- a/rust/otap-dataflow/crates/controller/src/lib.rs +++ b/rust/otap-dataflow/crates/controller/src/lib.rs @@ -68,13 +68,14 @@ use otap_df_engine::ReceivedAtNode; use otap_df_engine::Unwindable; use otap_df_engine::context::{ControllerContext, 
PipelineContext};
 use otap_df_engine::control::{
-    PipelineCompletionMsgReceiver, PipelineCompletionMsgSender, RuntimeCtrlMsgReceiver,
-    RuntimeCtrlMsgSender, pipeline_completion_msg_channel, runtime_ctrl_msg_channel,
+    PipelineAdminSender, PipelineCompletionMsgReceiver, PipelineCompletionMsgSender,
+    RuntimeCtrlMsgReceiver, RuntimeCtrlMsgSender, pipeline_completion_msg_channel,
+    runtime_ctrl_msg_channel,
 };
 use otap_df_engine::entity_context::{
     node_entity_key, pipeline_entity_key, set_pipeline_entity_key,
 };
-use otap_df_engine::error::{Error as EngineError, error_summary_from};
+use otap_df_engine::error::Error as EngineError;
 use otap_df_engine::memory_limiter::{
     EffectiveMemoryLimiter, MemoryLimiterTick, MemoryPressureBehaviorConfig, MemoryPressureChanged,
     MemoryPressureLevel,
@@ -93,17 +94,25 @@ use otap_df_telemetry::{
 };
 use smallvec::smallvec;
 use std::collections::{HashMap, HashSet};
+use std::panic::{AssertUnwindSafe, catch_unwind};
 use std::sync::Arc;
 use std::sync::mpsc as std_mpsc;
 use std::thread;
+use std::time::Duration;
 
 /// Error types and helpers for the controller module.
 pub mod error;
+mod live_control;
 /// Reusable startup helpers (validation, CLI overrides, system info).
 pub mod startup;
 /// Utilities to spawn async tasks on dedicated threads with graceful shutdown.
 pub mod thread_task;
 
+use live_control::{
+    ControllerRuntime, LaunchedPipelineThread, PanicReport, RuntimeInstanceError,
+    RuntimeInstanceExit,
+};
+
 /// Controller for managing pipelines in a thread-per-core model.
 ///
 /// # Thread Safety
@@ -272,6 +281,103 @@ impl<PData> Controller<PData> {
+    fn validate_pipeline_components_with_factory(
+        pipeline_factory: &'static PipelineFactory<PData>,
+        pipeline_group_id: &PipelineGroupId,
+        pipeline_id: &PipelineId,
+        pipeline_cfg: &PipelineConfig,
+    ) -> Result<(), String> {
+        for (node_id, node_cfg) in pipeline_cfg.node_iter() {
+            let urn_str = node_cfg.r#type.as_str();
+            let validate_config_fn = match node_cfg.kind() {
+                NodeKind::Receiver => pipeline_factory
+                    .get_receiver_factory_map()
+                    .get(urn_str)
+                    .map(|factory| factory.validate_config),
+                NodeKind::Processor | NodeKind::ProcessorChain => pipeline_factory
+                    .get_processor_factory_map()
+                    .get(urn_str)
+                    .map(|factory| factory.validate_config),
+                NodeKind::Exporter => pipeline_factory
+                    .get_exporter_factory_map()
+                    .get(urn_str)
+                    .map(|factory| factory.validate_config),
+                NodeKind::Extension => {
+                    // Extensions are not yet validated here because PipelineFactory
+                    // does not expose an extension factory registry.
+                    continue;
+                }
+            };
+
+            let Some(validate_fn) = validate_config_fn else {
+                let kind_name = match node_cfg.kind() {
+                    NodeKind::Receiver => "receiver",
+                    NodeKind::Processor | NodeKind::ProcessorChain => "processor",
+                    NodeKind::Exporter => "exporter",
+                    NodeKind::Extension => unreachable!("handled above"),
+                };
+                return Err(format!(
+                    "Unknown {} component `{}` in pipeline_group={} pipeline={} node={}",
+                    kind_name,
+                    urn_str,
+                    pipeline_group_id.as_ref(),
+                    pipeline_id.as_ref(),
+                    node_id.as_ref()
+                ));
+            };
+
+            validate_fn(&node_cfg.config).map_err(|err| {
+                format!(
+                    "Invalid config for component `{}` in pipeline_group={} pipeline={} node={}: {}",
+                    urn_str,
+                    pipeline_group_id.as_ref(),
+                    pipeline_id.as_ref(),
+                    node_id.as_ref(),
+                    err
+                )
+            })?;
+        }
+        Ok(())
+    }
+
+    /// Validates every configured pipeline and observability pipeline against registered components.
+ fn validate_engine_components_with_factory( + pipeline_factory: &'static PipelineFactory, + engine_cfg: &OtelDataflowSpec, + ) -> Result<(), String> { + for (pipeline_group_id, pipeline_group) in &engine_cfg.groups { + for (pipeline_id, pipeline_cfg) in &pipeline_group.pipelines { + Self::validate_pipeline_components_with_factory( + pipeline_factory, + pipeline_group_id, + pipeline_id, + pipeline_cfg, + )?; + } + } + + if let Some(obs_pipeline) = &engine_cfg.engine.observability.pipeline { + let obs_group_id: PipelineGroupId = SYSTEM_PIPELINE_GROUP_ID.into(); + let obs_pipeline_id: PipelineId = SYSTEM_OBSERVABILITY_PIPELINE_ID.into(); + let obs_pipeline_config = obs_pipeline.clone().into_pipeline_config(); + Self::validate_pipeline_components_with_factory( + pipeline_factory, + &obs_group_id, + &obs_pipeline_id, + &obs_pipeline_config, + )?; + } + + Ok(()) + } + + /// Validates that every configured node resolves to a registered component and that the + /// static component-specific configuration validates. + pub fn validate_engine_components(&self, engine_cfg: &OtelDataflowSpec) -> Result<(), String> { + Self::validate_engine_components_with_factory(self.pipeline_factory, engine_cfg) + } + /// Starts the controller with the given engine configurations. pub fn run_forever(&self, engine_config: OtelDataflowSpec) -> Result<(), Error> { self.run_with_mode( @@ -448,10 +554,13 @@ impl, group_names: &HashMap<(PipelineGroupId, TopicName), TopicName>, - ) -> ( - HashMap, - Vec, - ) { + ) -> Result< + ( + HashMap, + Vec, + ), + Error, + > { let mut usage_by_declared_topic = HashMap::::new(); for declared_name in global_names.values().chain(group_names.values()) { _ = usage_by_declared_topic.insert(declared_name.clone(), TopicUsageSummary::default()); @@ -518,16 +627,13 @@ impl = usage_by_declared_topic.keys().cloned().collect(); - declared_topics.sort_by(|left, right| left.as_ref().cmp(right.as_ref())); + let mut declared_topics: Vec<_> = usage_by_declared_topic.into_iter().collect(); + declared_topics.sort_by(|(left, _), (right, _)| left.as_ref().cmp(right.as_ref())); - for declared_topic in declared_topics { - let summary = usage_by_declared_topic - .get(&declared_topic) - .expect("declared topic must have a usage summary"); - let topology_mode = Self::infer_topic_mode(summary); + let mut inferred_modes = HashMap::with_capacity(declared_topics.len()); + let mut inferred_mode_reports = Vec::with_capacity(declared_topics.len()); + for (declared_topic, summary) in declared_topics { + let topology_mode = Self::infer_topic_mode(&summary); inferred_mode_reports.push(InferredTopicModeReport { topic: declared_topic.clone(), topology_mode, @@ -542,7 +648,7 @@ impl>(); group_ids.sort_by(|left, right| left.as_ref().cmp(right.as_ref())); for group_id in group_ids { - let group_cfg = config - .groups - .get(&group_id) - .expect("group collected from config must still exist"); - let mut pipeline_ids = group_cfg.pipelines.keys().cloned().collect::>(); - pipeline_ids.sort_by(|left, right| left.as_ref().cmp(right.as_ref())); - for pipeline_id in pipeline_ids { - let pipeline_cfg = group_cfg - .pipelines - .get(&pipeline_id) - .expect("pipeline collected from config must still exist"); + let Some(group_cfg) = config.groups.get(&group_id) else { + return Err(Error::PipelineRuntimeError { + source: Box::new(EngineError::InternalError { + message: format!( + "group `{}` disappeared while validating topic wiring", + group_id.as_ref() + ), + }), + }); + }; + let mut pipelines = 
group_cfg.pipelines.iter().collect::>(); + pipelines.sort_by(|(left, _), (right, _)| left.as_ref().cmp(right.as_ref())); + for (pipeline_id, pipeline_cfg) in pipelines { Self::collect_topic_wiring_edges_for_pipeline( &mut adjacency, &group_id, - &pipeline_id, + pipeline_id, pipeline_cfg, global_names, group_names, @@ -887,13 +995,20 @@ impl> = - ctrl_msg_senders - .into_iter() - .map(|sender| { - Arc::new(sender) - as Arc - }) - .collect(); - otap_df_admin::run( admin_settings, obs_state_handle, - admin_senders, + control_plane, telemetry_registry, memory_pressure_state, log_tap_handle, @@ -1461,48 +1529,8 @@ impl> = Vec::with_capacity(threads.len()); - for (thread_name, thread_id, pipeline_key, handle) in threads { - match handle.join() { - Ok(Ok(_)) => { - engine_evt_reporter.report(EngineEvent::drained(pipeline_key, None)); - } - Ok(Err(e)) => { - let err_summary: ErrorSummary = error_summary_from_gen(&e); - engine_evt_reporter.report(EngineEvent::pipeline_runtime_error( - pipeline_key.clone(), - "Pipeline encountered a runtime error.", - err_summary, - )); - results.push(Err(e)); - } - Err(e) => { - let err_summary = ErrorSummary::Pipeline { - error_kind: "panic".into(), - message: "The pipeline panicked during execution.".into(), - source: Some(format!("{e:?}")), - }; - engine_evt_reporter.report(EngineEvent::pipeline_runtime_error( - pipeline_key.clone(), - "The pipeline panicked during execution.", - err_summary, - )); - // Thread join failed, handle the error - let core_id = pipeline_key.core_id; - return Err(Error::ThreadPanic { - thread_name, - thread_id, - core_id, - panic_message: format!("{e:?}"), - }); - } - } - } - - // Check if any pipeline threads returned an error - if let Some(err) = results.into_iter().find_map(Result::err) { - return Err(err); + if run_mode == RunMode::ShutdownWhenDone { + runtime.wait_until_all_instances_exit(); } // In standard engine mode we keep the main thread parked after startup. 
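The internals behind `runtime.wait_until_all_instances_exit()` and `note_instance_exit()` are not part of this hunk; one plausible shape, using the `Hash`/`Eq` derives this change adds to `DeployedPipelineKey`, is a mutex-guarded exit map plus a condvar. Everything below except the `DeployedPipelineKey` name is a hypothetical sketch, not the module's actual code:

```rust
use std::collections::HashMap;
use std::sync::{Condvar, Mutex};

// Hypothetical stand-ins for types this diff only references.
#[derive(Clone, PartialEq, Eq, Hash)]
struct DeployedPipelineKey { /* group, pipeline, core, generation */ }

enum RuntimeInstanceExit {
    Success,
    Error(String),
}

struct ExitTracker {
    // Keyed by the deployed instance; `None` means the thread is still running.
    instances: Mutex<HashMap<DeployedPipelineKey, Option<RuntimeInstanceExit>>>,
    all_exited: Condvar,
}

impl ExitTracker {
    /// Called from a pipeline thread's unwind guard when it exits.
    fn note_instance_exit(&self, key: DeployedPipelineKey, exit: RuntimeInstanceExit) {
        let mut instances = self.instances.lock().expect("tracker poisoned");
        let _ = instances.insert(key, Some(exit));
        self.all_exited.notify_all();
    }

    /// Blocks until every registered instance has reported an exit.
    fn wait_until_all_instances_exit(&self) {
        let mut instances = self.instances.lock().expect("tracker poisoned");
        while instances.values().any(|exit| exit.is_none()) {
            instances = self.all_exited.wait(instances).expect("tracker poisoned");
        }
    }
}
```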
@@ -1523,6 +1551,10 @@ impl Ok(available_core_ids), - CoreAllocationStrategy::CoreCount => { - match core_allocation.count { - Some(count) => { - if count == 0 { - Ok(available_core_ids) - } else if count > num_cores { - Err(Error::InvalidCoreAllocation { + CoreAllocationStrategy::CoreCount => match core_allocation.count { + Some(count) => { + if count == 0 { + Ok(available_core_ids) + } else if count > num_cores { + Err(Error::InvalidCoreAllocation { + alloc: core_allocation.clone(), + message: format!( + "Requested {} cores but only {} cores available on this system", + count, num_cores + ), + available: available_core_ids.iter().map(|c| c.id).collect(), + }) + } else { + Ok(available_core_ids.into_iter().take(count).collect()) + } + } + None => Ok(available_core_ids), + }, + CoreAllocationStrategy::CoreSet => match &core_allocation.set { + Some(set) => { + for r in set.iter() { + if r.start > r.end { + return Err(Error::InvalidCoreAllocation { alloc: core_allocation.clone(), message: format!( - "Requested {} cores but only {} cores available on this system", - count, num_cores + "Invalid core range: start ({}) is greater than end ({})", + r.start, r.end ), available: available_core_ids.iter().map(|c| c.id).collect(), - }) - } else { - Ok(available_core_ids.into_iter().take(count).collect()) + }); + } + if r.start > max_core_id { + return Err(Error::InvalidCoreAllocation { + alloc: core_allocation.clone(), + message: format!( + "Core ID {} exceeds available cores (system has cores 0-{})", + r.start, max_core_id + ), + available: available_core_ids.iter().map(|c| c.id).collect(), + }); + } + if r.end > max_core_id { + return Err(Error::InvalidCoreAllocation { + alloc: core_allocation.clone(), + message: format!( + "Core ID {} exceeds available cores (system has cores 0-{})", + r.end, max_core_id + ), + available: available_core_ids.iter().map(|c| c.id).collect(), + }); } } - None => { - // Treat no count supplied the same as count: 0 - Ok(available_core_ids) - } - } - } - CoreAllocationStrategy::CoreSet => { - match &core_allocation.set { - Some(set) => { - // Validate all ranges first - for r in set.iter() { - if r.start > r.end { - return Err(Error::InvalidCoreAllocation { - alloc: core_allocation.clone(), - message: format!( - "Invalid core range: start ({}) is greater than end ({})", - r.start, r.end - ), - available: available_core_ids.iter().map(|c| c.id).collect(), - }); - } - if r.start > max_core_id { - return Err(Error::InvalidCoreAllocation { - alloc: core_allocation.clone(), - message: format!( - "Core ID {} exceeds available cores (system has cores 0-{})", - r.start, max_core_id - ), - available: available_core_ids.iter().map(|c| c.id).collect(), - }); - } - if r.end > max_core_id { + + for (i, r1) in set.iter().enumerate() { + for r2 in set.iter().skip(i + 1) { + if r1.start <= r2.end && r2.start <= r1.end { + let overlap_start = r1.start.max(r2.start); + let overlap_end = r1.end.min(r2.end); return Err(Error::InvalidCoreAllocation { alloc: core_allocation.clone(), message: format!( - "Core ID {} exceeds available cores (system has cores 0-{})", - r.end, max_core_id + "Core ranges overlap: {}-{} and {}-{} share cores {}-{}", + r1.start, + r1.end, + r2.start, + r2.end, + overlap_start, + overlap_end ), available: available_core_ids.iter().map(|c| c.id).collect(), }); } } + } - // Check for overlapping ranges - for (i, r1) in set.iter().enumerate() { - for r2 in set.iter().skip(i + 1) { - // Two ranges overlap if they share any common cores - if r1.start <= r2.end && r2.start 
<= r1.end {
-                            let overlap_start = r1.start.max(r2.start);
-                            let overlap_end = r1.end.min(r2.end);
-                            return Err(Error::InvalidCoreAllocation {
-                                alloc: core_allocation.clone(),
-                                message: format!(
-                                    "Core ranges overlap: {}-{} and {}-{} share cores {}-{}",
-                                    r1.start,
-                                    r1.end,
-                                    r2.start,
-                                    r2.end,
-                                    overlap_start,
-                                    overlap_end
-                                ),
-                                available: available_core_ids
-                                    .iter()
-                                    .map(|c| c.id)
-                                    .collect(),
-                            });
-                        }
-                    }
-                }
-
-                // Filter cores in range
-                let selected: Vec<_> = available_core_ids
-                    .into_iter()
-                    // Naively check if each interval contains the point
-                    // This problem is known as the "Interval Stabbing Problem"
-                    // and has more efficient but more complex solutions
-                    .filter(|c| set.iter().any(|r| r.start <= c.id && c.id <= r.end))
-                    .collect();
-
-                if selected.is_empty() {
-                    return Err(Error::InvalidCoreAllocation {
-                        alloc: core_allocation.clone(),
-                        message: "No available cores in the specified ranges".to_owned(),
-                        available: core_affinity::get_core_ids()
-                            .unwrap_or_default()
-                            .iter()
-                            .map(|c| c.id)
-                            .collect(),
-                    });
-                }
+                    let selected: Vec<_> = available_core_ids
+                        .into_iter()
+                        .filter(|c| set.iter().any(|r| r.start <= c.id && c.id <= r.end))
+                        .collect();
 
-                Ok(selected)
+                    if selected.is_empty() {
+                        return Err(Error::InvalidCoreAllocation {
+                            alloc: core_allocation.clone(),
+                            message: "No available cores in the specified ranges".to_owned(),
+                            available: core_affinity::get_core_ids()
+                                .unwrap_or_default()
+                                .iter()
+                                .map(|c| c.id)
+                                .collect(),
+                        });
+                    }
-            None => Err(Error::InvalidCoreAllocation {
-                alloc: core_allocation.clone(),
-                message: "No range of cores supplied for allocation".to_owned(),
-                available: core_affinity::get_core_ids()
-                    .unwrap_or_default()
-                    .iter()
-                    .map(|c| c.id)
-                    .collect(),
-            }),
+
+                    Ok(selected)
                 }
-        }
+                None => Ok(Vec::new()),
+            },
         }
     }
 
@@ -1755,29 +1762,150 @@ impl<PData> Controller<PData> {
+    fn launch_pipeline_thread(
+        pipeline_factory: &'static PipelineFactory<PData>,
+        pipeline_key: DeployedPipelineKey,
+        core_id: CoreId,
+        num_cores: usize,
+        pipeline_config: PipelineConfig,
+        channel_capacity_policy: ChannelCapacityPolicy,
+        telemetry_policy: TelemetryPolicy,
+        transport_headers_policy: Option,
+        controller_ctx: ControllerContext,
+        metrics_reporter: MetricsReporter,
+        engine_evt_reporter: ObservedEventReporter,
+        tracing_setup: TracingSetup,
+        telemetry_reporting_interval: Duration,
+        memory_pressure_tx: tokio::sync::watch::Sender,
+        config: &OtelDataflowSpec,
+        declared_topics: &DeclaredTopics,
+        runtime: std::sync::Weak<ControllerRuntime<PData>>,
+        thread_id: usize,
+        internal_telemetry: Option<(
+            InternalTelemetrySettings,
+            std_mpsc::SyncSender<Result<(), Error>>,
+        )>,
+    ) -> Result<LaunchedPipelineThread<PData>, Error> {
+        let mut pipeline_ctx = controller_ctx.pipeline_context_with_generation(
+            pipeline_key.pipeline_group_id.clone(),
+            pipeline_key.pipeline_id.clone(),
+            pipeline_key.core_id,
+            num_cores,
+            thread_id,
+            pipeline_key.deployment_generation,
+        );
+        let topic_set = Self::build_pipeline_topic_set(
+            config,
+            declared_topics,
+            &pipeline_key.pipeline_group_id,
+            &pipeline_key.pipeline_id,
+            pipeline_key.core_id,
+        )?;
+        pipeline_ctx.set_topic_set(topic_set);
+        let (runtime_ctrl_msg_tx, runtime_ctrl_msg_rx) =
+            runtime_ctrl_msg_channel(channel_capacity_policy.control.pipeline);
+        let (pipeline_completion_msg_tx, pipeline_completion_msg_rx) =
+            pipeline_completion_msg_channel(channel_capacity_policy.control.completion);
+        let control_sender: Arc<dyn PipelineAdminSender> = Arc::new(runtime_ctrl_msg_tx.clone());
+        let memory_pressure_rx = memory_pressure_tx.subscribe();
+        let thread_name = format!(
+            "pipeline-{}-{}-core-{}-gen-{}",
+            pipeline_key.pipeline_group_id.as_ref(),
+            pipeline_key.pipeline_id.as_ref(),
pipeline_key.core_id, + pipeline_key.deployment_generation + ); + let run_key = pipeline_key.clone(); + let runtime_key = pipeline_key.clone(); + let runtime_thread_name = thread_name.clone(); + let _handle = thread::Builder::new() + .name(thread_name.clone()) + .spawn(move || { + let exit = match catch_unwind(AssertUnwindSafe(|| { + Self::run_pipeline_thread( + run_key, + core_id, + pipeline_config, + channel_capacity_policy, + telemetry_policy, + transport_headers_policy, + telemetry_reporting_interval, + pipeline_factory, + pipeline_ctx, + engine_evt_reporter, + metrics_reporter, + runtime_ctrl_msg_tx, + runtime_ctrl_msg_rx, + pipeline_completion_msg_tx, + pipeline_completion_msg_rx, + memory_pressure_rx, + tracing_setup, + internal_telemetry, + ) + })) { + Ok(Ok(_)) => RuntimeInstanceExit::Success, + Ok(Err(err)) => { + RuntimeInstanceExit::Error(RuntimeInstanceError::runtime(err.to_string())) + } + Err(panic) => RuntimeInstanceExit::Error(RuntimeInstanceError::from_panic( + PanicReport::capture( + "runtime thread", + panic, + Some(runtime_thread_name), + Some(thread_id), + Some(runtime_key.core_id), + ), + )), + }; + if let Some(runtime) = runtime.upgrade() { + runtime.note_instance_exit(runtime_key, exit); + } + // The controller runtime may already be gone during teardown. In that case there + // is nothing left to update, so late exit reporting is intentionally best-effort. + }) + .map_err(|e| Error::ThreadSpawnError { + thread_name: thread_name.clone(), + source: e, + })?; + + Ok(LaunchedPipelineThread { + pipeline_key, + control_sender, + _marker: std::marker::PhantomData, + }) + } + /// Spawns the internal telemetry pipeline if engine observability config provides one. /// /// Returns the thread handle if an internal pipeline was spawned /// and waits for it to start, or None. 
#[allow(clippy::too_many_arguments)] fn spawn_internal_pipeline_if_configured( + runtime: std::sync::Weak>, its_key: DeployedPipelineKey, its_core: CoreId, observability_pipeline: Option, config: &OtelDataflowSpec, - declared_topics: &DeclaredTopics, telemetry_system: &InternalTelemetrySystem, pipeline_factory: &'static PipelineFactory, controller_ctx: &ControllerContext, engine_evt_reporter: &ObservedEventReporter, metrics_reporter: &MetricsReporter, - telemetry_reporting_interval: std::time::Duration, + telemetry_reporting_interval: Duration, memory_pressure_tx: &tokio::sync::watch::Sender, tracing_setup: TracingSetup, - ) -> Result, Error>>)>, Error> { + ) -> Result>, Error> { let (internal_config, channel_capacity_policy, telemetry_policy): ( PipelineConfig, ChannelCapacityPolicy, @@ -1809,66 +1937,32 @@ impl its_settings, }; - let mut internal_pipeline_ctx = controller_ctx.pipeline_context_with( - its_key.pipeline_group_id.clone(), - its_key.pipeline_id.clone(), - its_key.core_id, - 1, // Internal telemetry pipeline runs on a single core - 0, // TODO: we do not have a thread_id - ); - let topic_set = Self::build_pipeline_topic_set( - config, - declared_topics, - &its_key.pipeline_group_id, - &its_key.pipeline_id, - its_key.core_id, - )?; - internal_pipeline_ctx.set_topic_set(topic_set); - - // Create control message channel for internal pipeline - let (internal_ctrl_tx, internal_ctrl_rx) = - runtime_ctrl_msg_channel(channel_capacity_policy.control.pipeline); - let (internal_return_tx, internal_return_rx) = - pipeline_completion_msg_channel(channel_capacity_policy.control.completion); - // Create a channel to signal startup success/failure let (startup_tx, startup_rx) = std_mpsc::sync_channel::>(1); - - let thread_name = "internal-pipeline".to_string(); - let internal_evt_reporter = engine_evt_reporter.clone(); - let internal_metrics_reporter = metrics_reporter.clone(); - let internal_channel_capacity_policy = channel_capacity_policy; - let internal_telemetry_policy = telemetry_policy; - let internal_memory_pressure_rx = memory_pressure_tx.subscribe(); - - let handle = thread::Builder::new() - .name(thread_name.clone()) - .spawn(move || { - Self::run_pipeline_thread( - its_key, - its_core, - internal_config, - internal_channel_capacity_policy, - internal_telemetry_policy, - None, // no transport headers for the internal observability pipeline - telemetry_reporting_interval, - pipeline_factory, - internal_pipeline_ctx, - internal_evt_reporter, - internal_metrics_reporter, - internal_ctrl_tx, - internal_ctrl_rx, - internal_return_tx, - internal_return_rx, - internal_memory_pressure_rx, - tracing_setup, - Some((its_settings, startup_tx)), - ) - }) - .map_err(|e| Error::ThreadSpawnError { - thread_name: thread_name.clone(), - source: e, - })?; + let launched = Self::launch_pipeline_thread( + pipeline_factory, + its_key, + its_core, + 1, + internal_config, + channel_capacity_policy, + telemetry_policy, + None, + controller_ctx.clone(), + metrics_reporter.clone(), + engine_evt_reporter.clone(), + tracing_setup, + telemetry_reporting_interval, + memory_pressure_tx.clone(), + config, + runtime + .upgrade() + .expect("controller runtime should exist while spawning internal pipeline") + .declared_topics(), + runtime, + 0, + Some((its_settings, startup_tx)), + )?; // Wait for the internal pipeline to signal successful startup match startup_rx.recv() { @@ -1892,7 +1986,7 @@ impl, - telemetry_reporting_interval: std::time::Duration, + telemetry_reporting_interval: Duration, pipeline_factory: 
&'static PipelineFactory<PData>,
         pipeline_context: PipelineContext,
         obs_evt_reporter: ObservedEventReporter,
@@ -2017,27 +2111,6 @@ impl<PData> Controller<PData> {
-
-fn error_summary_from_gen(error: &Error) -> ErrorSummary {
-    match error {
-        Error::PipelineRuntimeError { source } => {
-            if let Some(engine_error) = source.downcast_ref::<EngineError>() {
-                error_summary_from(engine_error)
-            } else {
-                ErrorSummary::Pipeline {
-                    error_kind: "runtime".into(),
-                    message: source.to_string(),
-                    source: None,
-                }
-            }
-        }
-        _ => ErrorSummary::Pipeline {
-            error_kind: "runtime".into(),
-            message: error.to_string(),
-            source: None,
-        },
-    }
-}
-
 #[cfg(test)]
 mod tests {
     use super::*;
@@ -2305,7 +2378,6 @@ connections:
 
     #[test]
     fn select_with_adjacent_ranges_succeeds() {
-        // Adjacent but non-overlapping ranges should work
         let core_allocation = CoreAllocation::core_set(vec![
             CoreRange { start: 2, end: 3 },
             CoreRange { start: 4, end: 5 },
@@ -3122,7 +3194,7 @@ groups:
         );
         assert_eq!(
             local_block.default_publish_outcome_config().timeout,
-            std::time::Duration::from_secs(46)
+            Duration::from_secs(46)
         );
 
         // group-local declaration must override global policy for same local name
@@ -3147,7 +3219,7 @@ groups:
         );
         assert_eq!(
             overridden.default_publish_outcome_config().timeout,
-            std::time::Duration::from_secs(47)
+            Duration::from_secs(47)
         );
     }
 
diff --git a/rust/otap-dataflow/crates/controller/src/live_control/README.md b/rust/otap-dataflow/crates/controller/src/live_control/README.md
new file mode 100644
index 0000000000..584d669657
--- /dev/null
+++ b/rust/otap-dataflow/crates/controller/src/live_control/README.md
@@ -0,0 +1,109 @@
+# Controller Live Control
+
+`live_control` owns the in-process runtime model used by the admin control
+plane to reconfigure and shut down logical pipelines while the engine is
+running. It is deliberately internal to the controller: public admin API
+shapes live in `otap-df-admin` and `otap-df-admin-types`, while this module
+tracks the mutable controller state required to execute those API requests.
+
+## Goals
+
+- Accept per-pipeline rollout and shutdown requests without restarting the
+  whole engine.
+- Keep controller state consistent across asynchronous pipeline-thread exits,
+  rollout workers, shutdown workers, and observed-state updates.
+- Preserve useful recent operation snapshots while bounding in-memory
+  retention.
+- Keep old runtime instances visible only while active controller work still
+  needs generation-specific status.
+
+## Architecture
+
+The module is split by responsibility:
+
+- `mod.rs` is the facade. It defines `ControllerRuntime`, the control-plane
+  adapter, startup registration, shared pruning helpers, and the
+  `ControlPlane` implementation.
+- `state.rs` defines the in-memory state model: rollout/shutdown records,
+  runtime-instance records, candidate plans, panic/error reports, and retention
+  constants.
+- `planning.rs` validates requests, classifies rollout actions, prepares
+  candidate rollout/shutdown plans, records accepted operations, updates status
+  snapshots, and spawns background workers.
+- `execution.rs` runs rollout and shutdown workers. It handles create, resize,
+  replace, rollback, panic cleanup, and per-core progress updates.
+- `runtime.rs` launches pipeline threads, registers instances, records exits,
+  sends shutdown requests, waits for readiness/exit, and exposes global runtime
+  shutdown/error helpers.
+
+`ControllerRuntime` is the shared owner. It is held behind an `Arc` by the
+admin control-plane adapter and by detached rollout/shutdown workers.
+Pipeline threads receive a `Weak<ControllerRuntime>` so they can report exits
+without extending the controller lifetime.
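+
+For orientation, the control-plane surface this module implements looks
+roughly like the sketch below. It is reconstructed from the admin crate's
+`NoopControlPlane` test stub rather than copied from the real definition, so
+treat the exact trait bounds as approximate:
+
+```rust
+// Sketch only: reconstructed from the admin test stub. `Ok(None)` from the
+// status getters means the id was never known here or has already been
+// evicted from the bounded in-memory operation history.
+pub trait ControlPlane: Send + Sync {
+    fn shutdown_all(&self, timeout_secs: u64) -> Result<(), ControlPlaneError>;
+    fn shutdown_pipeline(
+        &self,
+        pipeline_group_id: &str,
+        pipeline_id: &str,
+        timeout_secs: u64,
+    ) -> Result<ShutdownStatus, ControlPlaneError>;
+    fn reconfigure_pipeline(
+        &self,
+        pipeline_group_id: &str,
+        pipeline_id: &str,
+        request: ReconfigureRequest,
+    ) -> Result<RolloutStatus, ControlPlaneError>;
+    fn pipeline_details(
+        &self,
+        pipeline_group_id: &str,
+        pipeline_id: &str,
+    ) -> Result<Option<PipelineDetails>, ControlPlaneError>;
+    fn rollout_status(
+        &self,
+        pipeline_group_id: &str,
+        pipeline_id: &str,
+        rollout_id: &str,
+    ) -> Result<Option<RolloutStatus>, ControlPlaneError>;
+    fn shutdown_status(
+        &self,
+        pipeline_group_id: &str,
+        pipeline_id: &str,
+        shutdown_id: &str,
+    ) -> Result<Option<ShutdownStatus>, ControlPlaneError>;
+}
+```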
Pipeline
+threads receive a `Weak<ControllerRuntime>` so they can report exits without
+extending the controller lifetime.
+
+## Lifecycle Model
+
+Live control separates four related concepts:
+
+- A logical pipeline is identified by `(pipeline_group_id, pipeline_id)` and
+  points at the committed resolved pipeline plus its active generation.
+- A pipeline group is the config hierarchy that contains related pipelines,
+  group-local topics, and group-level policies. Current live-control operations
+  target one logical pipeline inside that group.
+- A deployed runtime instance is identified by `(pipeline_group_id,
+  pipeline_id, core_id, deployment_generation)` and tracks whether that thread
+  is still active or has exited.
+- A controller operation is a rollout or shutdown record with public progress
+  state and per-core details.
+
+Rollouts are classified before execution:
+
+- `create` launches a logical pipeline that did not exist.
+- `noop` commits an identical request without launching a worker.
+- `resize` changes core placement without changing the runtime shape.
+- `replace` launches a new generation, waits for readiness, then drains the
+  previous generation.
+
+Shutdown targets the currently active deployed instances for one logical
+pipeline. Global shutdown bypasses operation history and broadcasts shutdown to
+all active instances.
+
+## Design Decisions
+
+- The controller is the authority for when old generations can be retired.
+  Observed-state compaction is invoked only after active rollout/shutdown work
+  no longer needs generation-specific entries.
+- The current consistency scope is one logical pipeline. Planning validates a
+  candidate against a cloned full config snapshot, but commit patches only that
+  pipeline into the latest live config. This intentionally does not provide
+  whole-config serializability across concurrent operations on different
+  logical pipelines.
+- Terminal rollout and shutdown records are retained in memory with both a
+  per-logical-pipeline cap and a TTL. This keeps recent admin lookups useful
+  without unbounded history growth.
+- Runtime exit reporting is race-tolerant. A pipeline thread can exit before
+  `register_launched_instance()` publishes it as active; such exits are parked
+  in `pending_instance_exits` and reconciled during registration.
+- Worker panic handling is unwind-safe. Panic cleanup records terminal failure,
+  clears active-operation conflict state, and reports concise public failure
+  reasons plus detailed internal diagnostics.
+- Topic broker runtime shape is not mutable through live reconfiguration.
+  Rollout planning rejects requests that would require changing declared topic
+  backend, policy, or selected implementation mode.
+
+## Current Limits
+
+- Rollout and shutdown workers are detached OS threads. They are supervised by
+  panic cleanup, but there is no bounded worker pool or join-handle supervisor.
+- Topic declaration changes are intentionally rejected. Supporting them would
+  require a separate broker migration model.
+- Operation history is in-memory only. It is bounded and useful for recent
+  lookups, but it is not durable across controller restarts.
+- Full group shutdown is orchestrated above this module by issuing
+  per-pipeline/global control-plane calls; this module tracks per-pipeline
+  live-control state.
+- Future group-level reconfiguration can widen the active-operation conflict
+  scope from logical pipeline to pipeline group without changing the existing
+  per-pipeline endpoint shape.
+- Rollbacks are best effort.
If rollback itself fails, the operation records + `rollback_failed` and preserves diagnostics for operators. diff --git a/rust/otap-dataflow/crates/controller/src/live_control/execution.rs b/rust/otap-dataflow/crates/controller/src/live_control/execution.rs new file mode 100644 index 0000000000..30f1ca69b4 --- /dev/null +++ b/rust/otap-dataflow/crates/controller/src/live_control/execution.rs @@ -0,0 +1,906 @@ +// Copyright The OpenTelemetry Authors +// SPDX-License-Identifier: Apache-2.0 + +//! Background rollout and shutdown execution. +//! +//! The planning module records accepted operations and spawns workers; this +//! module contains the worker bodies. Each worker updates per-core progress, +//! drives runtime instance launch/shutdown, commits successful generations, and +//! performs best-effort rollback when a multi-step rollout fails. + +use super::*; + +impl + ControllerRuntime +{ + /// Emits the internal telemetry event for a rollout/shutdown worker panic. + pub(super) fn report_controller_worker_panic( + &self, + pipeline_key: &PipelineKey, + operation_kind: &'static str, + operation_id: &str, + report: &PanicReport, + ) { + let ErrorSummary::Pipeline { + error_kind, + message, + source, + } = report.error_summary() + else { + unreachable!("panic reports are always pipeline-level summaries"); + }; + + otel_error!( + "controller.worker_panic", + pipeline_group_id = %pipeline_key.pipeline_group_id(), + pipeline_id = %pipeline_key.pipeline_id(), + operation_kind = operation_kind, + operation_id = operation_id, + error_kind = error_kind.as_str(), + message = message.as_str(), + source = source.as_deref().unwrap_or(""), + ); + } + + /// Forces rollout terminal cleanup when the detached rollout worker panics. + pub(super) fn handle_rollout_worker_panic( + &self, + pipeline_key: &PipelineKey, + rollout_id: &str, + thread_name: String, + panic: Box, + ) { + let report = PanicReport::capture("rollout worker", panic, Some(thread_name), None, None); + let failure_reason = report.summary_message(); + self.request_rollout_panic_candidate_cleanup(pipeline_key, rollout_id); + self.update_rollout(pipeline_key, rollout_id, |rollout| { + rollout.state = RolloutLifecycleState::Failed; + rollout.failure_reason = Some(failure_reason.clone()); + }); + self.report_controller_worker_panic(pipeline_key, "rollout", rollout_id, &report); + self.finish_rollout(pipeline_key, rollout_id); + } + + /// Sends shutdown to candidate instances left behind by a panicking rollout worker. + fn request_rollout_panic_candidate_cleanup( + &self, + pipeline_key: &PipelineKey, + rollout_id: &str, + ) { + let (mut candidates, timeout_secs) = { + let state = self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + let Some(rollout) = state.rollouts.get(rollout_id) else { + return; + }; + let target_generation = rollout.target_generation; + let timeout_secs = rollout.drain_timeout_secs; + + // Only an uncommitted target generation is safe to clean up here. + // Resize rollouts use the committed generation, and a post-commit + // panic means the target generation is already the serving one. 
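+            // `noop` rollouts also reuse the committed generation, so this
+            // guard makes their panic cleanup a no-op as well.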
+ if state + .logical_pipelines + .get(pipeline_key) + .is_some_and(|record| record.active_generation == target_generation) + { + return; + } + + let candidates = state + .runtime_instances + .iter() + .filter_map(|(deployed_key, instance)| { + if deployed_key.pipeline_group_id == *pipeline_key.pipeline_group_id() + && deployed_key.pipeline_id == *pipeline_key.pipeline_id() + && deployed_key.deployment_generation == target_generation + && matches!(instance.lifecycle, RuntimeInstanceLifecycle::Active) + && instance.control_sender.is_some() + { + Some(deployed_key.clone()) + } else { + None + } + }) + .collect::>(); + (candidates, timeout_secs) + }; + candidates.sort_by_key(|deployed_key| deployed_key.core_id); + + for deployed_key in candidates { + if let Err(message) = + self.request_instance_shutdown(&deployed_key, timeout_secs, "rollout panic cleanup") + { + otel_error!( + "controller.rollout_panic_cleanup_failed", + pipeline_group_id = %deployed_key.pipeline_group_id.as_ref(), + pipeline_id = %deployed_key.pipeline_id.as_ref(), + core_id = deployed_key.core_id, + deployment_generation = deployed_key.deployment_generation, + rollout_id = rollout_id, + error = message.as_str(), + ); + } + } + } + + /// Forces shutdown terminal cleanup when the detached shutdown worker panics. + pub(super) fn handle_shutdown_worker_panic( + &self, + pipeline_key: &PipelineKey, + shutdown_id: &str, + thread_name: String, + panic: Box, + ) { + let report = PanicReport::capture("shutdown worker", panic, Some(thread_name), None, None); + let failure_reason = report.summary_message(); + self.update_shutdown(pipeline_key, shutdown_id, |shutdown| { + shutdown.state = ShutdownLifecycleState::Failed; + shutdown.failure_reason = Some(failure_reason.clone()); + }); + self.report_controller_worker_panic(pipeline_key, "shutdown", shutdown_id, &report); + } + + /// Runs one accepted rollout plan and records its terminal state. + pub(super) fn run_rollout(self: Arc, plan: CandidateRolloutPlan) { + self.update_rollout(&plan.pipeline_key, &plan.rollout.rollout_id, |rollout| { + rollout.state = RolloutLifecycleState::Running; + }); + + let result = match plan.action { + RolloutAction::Create => self.run_create_rollout(&plan), + RolloutAction::NoOp => Ok(()), + RolloutAction::Replace => self.run_replace_rollout(&plan), + RolloutAction::Resize => self.run_resize_rollout(&plan), + }; + + match result { + Ok(()) => { + self.update_rollout(&plan.pipeline_key, &plan.rollout.rollout_id, |rollout| { + rollout.state = RolloutLifecycleState::Succeeded; + rollout.failure_reason = None; + }); + } + Err(RolloutExecutionError::Failed(reason)) => { + self.update_rollout(&plan.pipeline_key, &plan.rollout.rollout_id, |rollout| { + rollout.state = RolloutLifecycleState::Failed; + rollout.failure_reason = Some(reason); + }); + } + Err(RolloutExecutionError::RollbackFailed(reason)) => { + self.update_rollout(&plan.pipeline_key, &plan.rollout.rollout_id, |rollout| { + rollout.state = RolloutLifecycleState::RollbackFailed; + rollout.failure_reason = Some(reason); + }); + } + } + + self.finish_rollout(&plan.pipeline_key, &plan.rollout.rollout_id); + } + + /// Drives one pipeline shutdown operation to completion or failure. 
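+    ///
+    /// The worker requests shutdown from every target instance up front, then
+    /// polls instance exits on a 50 ms cadence until all targets drain, one
+    /// exits with an error, or the overall deadline passes.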
+ pub(super) fn run_shutdown(self: Arc, plan: CandidateShutdownPlan) { + self.update_shutdown(&plan.pipeline_key, &plan.shutdown.shutdown_id, |shutdown| { + shutdown.state = ShutdownLifecycleState::Running; + }); + + for deployed_key in &plan.target_instances { + if let Err(message) = + self.request_instance_shutdown(deployed_key, plan.timeout_secs, "pipeline shutdown") + { + self.update_shutdown(&plan.pipeline_key, &plan.shutdown.shutdown_id, |shutdown| { + shutdown.state = ShutdownLifecycleState::Failed; + shutdown.failure_reason = Some(message.clone()); + if let Some(core) = shutdown.cores.iter_mut().find(|core| { + core.core_id == deployed_key.core_id + && core.deployment_generation == deployed_key.deployment_generation + }) { + core.state = "failed".to_owned(); + core.updated_at = timestamp_now(); + core.detail = Some(message.clone()); + } + }); + return; + } + + self.update_shutdown(&plan.pipeline_key, &plan.shutdown.shutdown_id, |shutdown| { + if let Some(core) = shutdown.cores.iter_mut().find(|core| { + core.core_id == deployed_key.core_id + && core.deployment_generation == deployed_key.deployment_generation + }) { + core.state = "shutdown_requested".to_owned(); + core.updated_at = timestamp_now(); + } + }); + } + + let deadline = Instant::now() + Duration::from_secs(plan.timeout_secs); + let mut remaining: HashSet<_> = plan.target_instances.iter().cloned().collect(); + while !remaining.is_empty() { + let mut completed = Vec::new(); + for deployed_key in &remaining { + match self.instance_exit(deployed_key) { + Some(RuntimeInstanceExit::Success) => { + completed.push(deployed_key.clone()); + } + Some(RuntimeInstanceExit::Error(error)) => { + self.update_shutdown( + &plan.pipeline_key, + &plan.shutdown.shutdown_id, + |shutdown| { + shutdown.state = ShutdownLifecycleState::Failed; + shutdown.failure_reason = Some(error.message.clone()); + if let Some(core) = shutdown.cores.iter_mut().find(|core| { + core.core_id == deployed_key.core_id + && core.deployment_generation + == deployed_key.deployment_generation + }) { + core.state = "failed".to_owned(); + core.updated_at = timestamp_now(); + core.detail = Some(error.message.clone()); + } + }, + ); + return; + } + None => {} + } + } + + for deployed_key in completed { + let _ = remaining.remove(&deployed_key); + self.update_shutdown(&plan.pipeline_key, &plan.shutdown.shutdown_id, |shutdown| { + if let Some(core) = shutdown.cores.iter_mut().find(|core| { + core.core_id == deployed_key.core_id + && core.deployment_generation == deployed_key.deployment_generation + }) { + core.state = "exited".to_owned(); + core.updated_at = timestamp_now(); + } + }); + } + + if remaining.is_empty() { + break; + } + + if Instant::now() >= deadline { + let failure_reason = remaining + .iter() + .next() + .map(|deployed_key| { + format!( + "timed out waiting for pipeline {}:{} core={} generation={} to drain", + deployed_key.pipeline_group_id.as_ref(), + deployed_key.pipeline_id.as_ref(), + deployed_key.core_id, + deployed_key.deployment_generation + ) + }) + .unwrap_or_else(|| "shutdown timed out".to_owned()); + self.update_shutdown(&plan.pipeline_key, &plan.shutdown.shutdown_id, |shutdown| { + shutdown.state = ShutdownLifecycleState::Failed; + shutdown.failure_reason = Some(failure_reason.clone()); + for deployed_key in &remaining { + if let Some(core) = shutdown.cores.iter_mut().find(|core| { + core.core_id == deployed_key.core_id + && core.deployment_generation == deployed_key.deployment_generation + }) { + core.state = "failed".to_owned(); + core.updated_at 
= timestamp_now(); + core.detail = Some(failure_reason.clone()); + } + } + }); + return; + } + + thread::sleep(Duration::from_millis(50)); + } + + self.update_shutdown(&plan.pipeline_key, &plan.shutdown.shutdown_id, |shutdown| { + shutdown.state = ShutdownLifecycleState::Succeeded; + }); + } + + /// Creates a brand-new logical pipeline by launching all target instances. + pub(super) fn run_create_rollout( + self: &Arc, + plan: &CandidateRolloutPlan, + ) -> Result<(), RolloutExecutionError> { + let mut launched = Vec::new(); + let deadline = Instant::now() + Duration::from_secs(plan.step_timeout_secs); + for core_id in &plan.target_assigned_cores { + self.update_rollout_core_state( + &plan.pipeline_key, + &plan.rollout.rollout_id, + *core_id, + "starting", + None, + ); + let deployed_key = match self.launch_regular_pipeline_instance( + &plan.resolved_pipeline, + *core_id, + plan.target_generation, + ) { + Ok(deployed_key) => deployed_key, + Err(err) => { + let reason = err.to_string(); + self.update_rollout_core_state( + &plan.pipeline_key, + &plan.rollout.rollout_id, + *core_id, + "failed", + Some(reason.clone()), + ); + // Create rollouts have no previous generation to restore, so a launch + // failure must tear down any candidate instances that were already started. + let _ = self.shutdown_instances(&launched, plan.drain_timeout_secs); + return Err(RolloutExecutionError::Failed(reason)); + } + }; + launched.push(deployed_key); + } + + for deployed_key in &launched { + self.wait_for_pipeline_ready(deployed_key, deadline) + .map_err(|reason| { + let _ = self.shutdown_instances(&launched, plan.drain_timeout_secs); + RolloutExecutionError::Failed(reason) + })?; + self.update_rollout_core_state( + &plan.pipeline_key, + &plan.rollout.rollout_id, + deployed_key.core_id, + "ready", + None, + ); + } + + self.commit_pipeline_record(plan, plan.target_generation); + Ok(()) + } + + /// Resizes a pipeline when only core allocation changed and common cores stay untouched. 
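+    ///
+    /// New cores launch at the committed generation and must report ready
+    /// before any removed core is drained; a failure on either side hands the
+    /// cores touched so far to `rollback_resize_rollout`.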
+ pub(super) fn run_resize_rollout( + self: &Arc, + plan: &CandidateRolloutPlan, + ) -> Result<(), RolloutExecutionError> { + let Some(current_record) = plan.current_record.as_ref() else { + return Err(RolloutExecutionError::Failed( + "internal error: resize rollout missing current record".to_owned(), + )); + }; + let active_generation = current_record.active_generation; + let mut started_cores = Vec::new(); + let mut retired_cores = Vec::new(); + + for core_id in &plan.resize_start_cores { + self.update_rollout_core_state( + &plan.pipeline_key, + &plan.rollout.rollout_id, + *core_id, + "starting", + None, + ); + + let new_key = match self.launch_regular_pipeline_instance( + &plan.resolved_pipeline, + *core_id, + active_generation, + ) { + Ok(new_key) => new_key, + Err(err) => { + let reason = err.to_string(); + self.update_rollout_core_state( + &plan.pipeline_key, + &plan.rollout.rollout_id, + *core_id, + "failed", + Some(reason.clone()), + ); + return self.rollback_resize_rollout( + plan, + &started_cores, + &retired_cores, + reason, + ); + } + }; + let ready_deadline = Instant::now() + Duration::from_secs(plan.step_timeout_secs); + if let Err(reason) = self.wait_for_pipeline_ready(&new_key, ready_deadline) { + let _ = self.shutdown_instances(&[new_key], plan.drain_timeout_secs); + return self.rollback_resize_rollout(plan, &started_cores, &retired_cores, reason); + } + + started_cores.push(*core_id); + self.update_rollout_core_state( + &plan.pipeline_key, + &plan.rollout.rollout_id, + *core_id, + "started", + None, + ); + } + + for core_id in &plan.resize_stop_cores { + self.update_rollout_core_state( + &plan.pipeline_key, + &plan.rollout.rollout_id, + *core_id, + "draining_old", + None, + ); + + let old_key = DeployedPipelineKey { + pipeline_group_id: plan.pipeline_group_id.clone(), + pipeline_id: plan.pipeline_id.clone(), + core_id: *core_id, + deployment_generation: active_generation, + }; + if let Err(reason) = + self.shutdown_instance(&old_key, plan.drain_timeout_secs, "resize rollout drain") + { + return self.rollback_resize_rollout(plan, &started_cores, &retired_cores, reason); + } + + retired_cores.push(*core_id); + self.update_rollout_core_state( + &plan.pipeline_key, + &plan.rollout.rollout_id, + *core_id, + "retired", + None, + ); + } + + self.commit_pipeline_record(plan, active_generation); + self.clear_pipeline_serving_generations( + &plan.pipeline_key, + plan.current_assigned_cores + .iter() + .chain(plan.target_assigned_cores.iter()) + .copied(), + ); + Ok(()) + } + + /// Performs the serial rolling cutover used for topology or config changes. 
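+    ///
+    /// Added cores start the target generation first; each common core then
+    /// starts the new generation and drains its old instance before the next
+    /// core is touched; removed cores are drained last. Serving-generation
+    /// markers are updated per core so status stays accurate mid-cutover.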
+ pub(super) fn run_replace_rollout( + self: &Arc, + plan: &CandidateRolloutPlan, + ) -> Result<(), RolloutExecutionError> { + let Some(previous) = plan.current_record.as_ref() else { + return Err(RolloutExecutionError::Failed( + "internal error: replace rollout missing current record".to_owned(), + )); + }; + let previous_generation = previous.active_generation; + for core_id in &plan.current_assigned_cores { + self.observed_state_store.set_pipeline_serving_generation( + plan.pipeline_key.clone(), + *core_id, + previous_generation, + ); + } + + let mut activated_added_cores = Vec::new(); + let mut switched_common_cores = Vec::new(); + let mut retired_removed_cores = Vec::new(); + + for core_id in &plan.added_assigned_cores { + self.update_rollout_core_state( + &plan.pipeline_key, + &plan.rollout.rollout_id, + *core_id, + "starting", + None, + ); + + let new_key = match self.launch_regular_pipeline_instance( + &plan.resolved_pipeline, + *core_id, + plan.target_generation, + ) { + Ok(new_key) => new_key, + Err(err) => { + let reason = err.to_string(); + self.update_rollout_core_state( + &plan.pipeline_key, + &plan.rollout.rollout_id, + *core_id, + "failed", + Some(reason.clone()), + ); + return self.rollback_replace_rollout( + plan, + &switched_common_cores, + &activated_added_cores, + &retired_removed_cores, + reason, + ); + } + }; + let ready_deadline = Instant::now() + Duration::from_secs(plan.step_timeout_secs); + if let Err(reason) = self.wait_for_pipeline_ready(&new_key, ready_deadline) { + let _ = self.shutdown_instances(&[new_key], plan.drain_timeout_secs); + return self.rollback_replace_rollout( + plan, + &switched_common_cores, + &activated_added_cores, + &retired_removed_cores, + reason, + ); + } + + self.observed_state_store.set_pipeline_serving_generation( + plan.pipeline_key.clone(), + *core_id, + plan.target_generation, + ); + activated_added_cores.push(*core_id); + self.update_rollout_core_state( + &plan.pipeline_key, + &plan.rollout.rollout_id, + *core_id, + "switched", + None, + ); + } + + for core_id in &plan.common_assigned_cores { + self.update_rollout_core_state( + &plan.pipeline_key, + &plan.rollout.rollout_id, + *core_id, + "starting", + None, + ); + + let new_key = match self.launch_regular_pipeline_instance( + &plan.resolved_pipeline, + *core_id, + plan.target_generation, + ) { + Ok(new_key) => new_key, + Err(err) => { + let reason = err.to_string(); + self.update_rollout_core_state( + &plan.pipeline_key, + &plan.rollout.rollout_id, + *core_id, + "failed", + Some(reason.clone()), + ); + return self.rollback_replace_rollout( + plan, + &switched_common_cores, + &activated_added_cores, + &retired_removed_cores, + reason, + ); + } + }; + let ready_deadline = Instant::now() + Duration::from_secs(plan.step_timeout_secs); + if let Err(reason) = self.wait_for_pipeline_ready(&new_key, ready_deadline) { + let _ = self.shutdown_instances(&[new_key], plan.drain_timeout_secs); + return self.rollback_replace_rollout( + plan, + &switched_common_cores, + &activated_added_cores, + &retired_removed_cores, + reason, + ); + } + + self.update_rollout_core_state( + &plan.pipeline_key, + &plan.rollout.rollout_id, + *core_id, + "draining_old", + None, + ); + + let old_key = DeployedPipelineKey { + pipeline_group_id: plan.pipeline_group_id.clone(), + pipeline_id: plan.pipeline_id.clone(), + core_id: *core_id, + deployment_generation: previous_generation, + }; + if let Err(reason) = + self.shutdown_instance(&old_key, plan.drain_timeout_secs, "rolling cutover drain") + { + let _ = 
self.shutdown_instances(&[new_key], plan.drain_timeout_secs); + return self.rollback_replace_rollout( + plan, + &switched_common_cores, + &activated_added_cores, + &retired_removed_cores, + reason, + ); + } + + self.observed_state_store.set_pipeline_serving_generation( + plan.pipeline_key.clone(), + *core_id, + plan.target_generation, + ); + switched_common_cores.push(*core_id); + self.update_rollout_core_state( + &plan.pipeline_key, + &plan.rollout.rollout_id, + *core_id, + "switched", + None, + ); + } + + for core_id in &plan.removed_assigned_cores { + self.update_rollout_core_state( + &plan.pipeline_key, + &plan.rollout.rollout_id, + *core_id, + "draining_old", + None, + ); + + let old_key = DeployedPipelineKey { + pipeline_group_id: plan.pipeline_group_id.clone(), + pipeline_id: plan.pipeline_id.clone(), + core_id: *core_id, + deployment_generation: previous_generation, + }; + if let Err(reason) = self.shutdown_instance( + &old_key, + plan.drain_timeout_secs, + "resource policy rollout drain", + ) { + return self.rollback_replace_rollout( + plan, + &switched_common_cores, + &activated_added_cores, + &retired_removed_cores, + reason, + ); + } + + self.observed_state_store + .clear_pipeline_serving_generation(plan.pipeline_key.clone(), *core_id); + retired_removed_cores.push(*core_id); + self.update_rollout_core_state( + &plan.pipeline_key, + &plan.rollout.rollout_id, + *core_id, + "retired", + None, + ); + } + + self.commit_pipeline_record(plan, plan.target_generation); + self.clear_pipeline_serving_generations( + &plan.pipeline_key, + plan.current_assigned_cores + .iter() + .chain(plan.target_assigned_cores.iter()) + .copied(), + ); + Ok(()) + } + + /// Restores the prior core footprint after a resize rollout fails. + pub(super) fn rollback_resize_rollout( + self: &Arc, + plan: &CandidateRolloutPlan, + started_cores: &[usize], + retired_cores: &[usize], + failure_reason: String, + ) -> Result<(), RolloutExecutionError> { + self.update_rollout(&plan.pipeline_key, &plan.rollout.rollout_id, |rollout| { + rollout.state = RolloutLifecycleState::RollingBack; + rollout.failure_reason = Some(failure_reason.clone()); + }); + let Some(previous) = plan.current_record.as_ref() else { + return Err(RolloutExecutionError::RollbackFailed( + "internal error: resize rollback missing current record".to_owned(), + )); + }; + let previous_generation = previous.active_generation; + + for core_id in retired_cores.iter().rev() { + self.update_rollout_core_state( + &plan.pipeline_key, + &plan.rollout.rollout_id, + *core_id, + "rollback_starting", + None, + ); + + let old_key = self + .launch_regular_pipeline_instance(&previous.resolved, *core_id, previous_generation) + .map_err(|err| RolloutExecutionError::RollbackFailed(err.to_string()))?; + let ready_deadline = Instant::now() + Duration::from_secs(plan.step_timeout_secs); + self.wait_for_pipeline_ready(&old_key, ready_deadline) + .map_err(RolloutExecutionError::RollbackFailed)?; + self.update_rollout_core_state( + &plan.pipeline_key, + &plan.rollout.rollout_id, + *core_id, + "rolled_back", + None, + ); + } + + for core_id in started_cores.iter().rev() { + self.update_rollout_core_state( + &plan.pipeline_key, + &plan.rollout.rollout_id, + *core_id, + "rollback_starting", + None, + ); + + let new_key = DeployedPipelineKey { + pipeline_group_id: plan.pipeline_group_id.clone(), + pipeline_id: plan.pipeline_id.clone(), + core_id: *core_id, + deployment_generation: previous_generation, + }; + self.shutdown_instance(&new_key, plan.drain_timeout_secs, "rollback 
cleanup") + .map_err(RolloutExecutionError::RollbackFailed)?; + self.update_rollout_core_state( + &plan.pipeline_key, + &plan.rollout.rollout_id, + *core_id, + "rolled_back", + None, + ); + } + + self.clear_pipeline_serving_generations( + &plan.pipeline_key, + plan.current_assigned_cores + .iter() + .chain(plan.target_assigned_cores.iter()) + .copied(), + ); + Err(RolloutExecutionError::Failed(failure_reason)) + } + + /// Restores the previous serving generation after a replace rollout fails. + pub(super) fn rollback_replace_rollout( + self: &Arc, + plan: &CandidateRolloutPlan, + switched_common_cores: &[usize], + activated_added_cores: &[usize], + retired_removed_cores: &[usize], + failure_reason: String, + ) -> Result<(), RolloutExecutionError> { + self.update_rollout(&plan.pipeline_key, &plan.rollout.rollout_id, |rollout| { + rollout.state = RolloutLifecycleState::RollingBack; + rollout.failure_reason = Some(failure_reason.clone()); + }); + let Some(previous) = plan.current_record.as_ref() else { + return Err(RolloutExecutionError::RollbackFailed( + "internal error: replace rollback missing current record".to_owned(), + )); + }; + let previous_generation = previous.active_generation; + + for core_id in retired_removed_cores.iter().rev() { + self.update_rollout_core_state( + &plan.pipeline_key, + &plan.rollout.rollout_id, + *core_id, + "rollback_starting", + None, + ); + + let old_key = self + .launch_regular_pipeline_instance(&previous.resolved, *core_id, previous_generation) + .map_err(|err| RolloutExecutionError::RollbackFailed(err.to_string()))?; + let ready_deadline = Instant::now() + Duration::from_secs(plan.step_timeout_secs); + self.wait_for_pipeline_ready(&old_key, ready_deadline) + .map_err(RolloutExecutionError::RollbackFailed)?; + self.observed_state_store.set_pipeline_serving_generation( + plan.pipeline_key.clone(), + *core_id, + previous_generation, + ); + self.update_rollout_core_state( + &plan.pipeline_key, + &plan.rollout.rollout_id, + *core_id, + "rolled_back", + None, + ); + } + + for core_id in switched_common_cores.iter().rev() { + self.update_rollout_core_state( + &plan.pipeline_key, + &plan.rollout.rollout_id, + *core_id, + "rollback_starting", + None, + ); + + let old_key = self + .launch_regular_pipeline_instance(&previous.resolved, *core_id, previous_generation) + .map_err(|err| RolloutExecutionError::RollbackFailed(err.to_string()))?; + let ready_deadline = Instant::now() + Duration::from_secs(plan.step_timeout_secs); + self.wait_for_pipeline_ready(&old_key, ready_deadline) + .map_err(RolloutExecutionError::RollbackFailed)?; + + let new_key = DeployedPipelineKey { + pipeline_group_id: plan.pipeline_group_id.clone(), + pipeline_id: plan.pipeline_id.clone(), + core_id: *core_id, + deployment_generation: plan.target_generation, + }; + self.shutdown_instance(&new_key, plan.drain_timeout_secs, "rollback drain") + .map_err(RolloutExecutionError::RollbackFailed)?; + self.observed_state_store.set_pipeline_serving_generation( + plan.pipeline_key.clone(), + *core_id, + previous_generation, + ); + self.update_rollout_core_state( + &plan.pipeline_key, + &plan.rollout.rollout_id, + *core_id, + "rolled_back", + None, + ); + } + + for core_id in activated_added_cores.iter().rev() { + self.update_rollout_core_state( + &plan.pipeline_key, + &plan.rollout.rollout_id, + *core_id, + "rollback_starting", + None, + ); + + let new_key = DeployedPipelineKey { + pipeline_group_id: plan.pipeline_group_id.clone(), + pipeline_id: plan.pipeline_id.clone(), + core_id: *core_id, + 
deployment_generation: plan.target_generation, + }; + self.shutdown_instance(&new_key, plan.drain_timeout_secs, "rollback cleanup") + .map_err(RolloutExecutionError::RollbackFailed)?; + self.observed_state_store + .clear_pipeline_serving_generation(plan.pipeline_key.clone(), *core_id); + self.update_rollout_core_state( + &plan.pipeline_key, + &plan.rollout.rollout_id, + *core_id, + "rolled_back", + None, + ); + } + self.clear_pipeline_serving_generations( + &plan.pipeline_key, + plan.current_assigned_cores + .iter() + .chain(plan.target_assigned_cores.iter()) + .copied(), + ); + Err(RolloutExecutionError::Failed(failure_reason)) + } + + /// Best-effort cleanup helper for batches of launched candidate instances. + pub(super) fn shutdown_instances( + self: &Arc, + keys: &[DeployedPipelineKey], + timeout_secs: u64, + ) -> Result<(), String> { + for key in keys { + self.shutdown_instance(key, timeout_secs, "candidate cleanup")?; + } + Ok(()) + } +} diff --git a/rust/otap-dataflow/crates/controller/src/live_control/mod.rs b/rust/otap-dataflow/crates/controller/src/live_control/mod.rs new file mode 100644 index 0000000000..ba106f66a5 --- /dev/null +++ b/rust/otap-dataflow/crates/controller/src/live_control/mod.rs @@ -0,0 +1,529 @@ +// Copyright The OpenTelemetry Authors +// SPDX-License-Identifier: Apache-2.0 + +//! Live reconfiguration runtime for controller-owned pipelines. +//! +//! This module is the controller's internal state machine for live pipeline +//! rollout and shutdown. It translates admin control-plane requests into +//! concrete runtime-instance changes, tracks active and terminal operations, +//! reconciles pipeline-thread exits with controller bookkeeping, and compacts +//! observed state once old generations are no longer needed for coordination. +//! +//! The submodules intentionally split the lifecycle by concern: +//! planning validates requests and records accepted operations, execution runs +//! rollout/shutdown workers, runtime owns per-instance launch and exit +//! reporting, and state contains the shared in-memory model. + +use super::*; +use chrono::Utc; +use otap_df_admin::{ + ControlPlane, ControlPlaneError, PipelineDetails, + PipelineRolloutState as ApiPipelineRolloutState, + PipelineRolloutSummary as ApiPipelineRolloutSummary, ReconfigureRequest, RolloutCoreStatus, + RolloutStatus, ShutdownCoreStatus, ShutdownStatus, +}; +use otap_df_state::conditions::ConditionStatus; +use otap_df_state::phase::PipelinePhase; +use otap_df_state::pipeline_status::{PipelineRolloutState, PipelineRolloutSummary}; +use std::any::Any; +use std::backtrace::Backtrace; +use std::collections::VecDeque; +use std::io; +use std::panic::{AssertUnwindSafe, catch_unwind}; +use std::sync::{Condvar, Mutex}; +use std::time::{Duration, Instant}; + +mod execution; +mod planning; +mod runtime; +mod state; + +#[cfg(test)] +use self::state::TERMINAL_OPERATION_RETENTION_TTL; +use self::state::{ + ActiveRuntimeCoreState, CandidateRolloutPlan, CandidateShutdownPlan, ControllerRuntimeState, + LogicalPipelineRecord, RolloutAction, RolloutCoreProgress, RolloutExecutionError, + RolloutLifecycleState, RolloutRecord, RuntimeInstanceLifecycle, RuntimeInstanceRecord, + ShutdownCoreProgress, ShutdownLifecycleState, ShutdownRecord, TERMINAL_ROLLOUT_RETENTION_LIMIT, + TERMINAL_SHUTDOWN_RETENTION_LIMIT, TopicRuntimeProfile, is_expired, timestamp_now, +}; +pub(crate) use self::state::{PanicReport, RuntimeInstanceError, RuntimeInstanceExit}; + +/// Shared live-control runtime used by the admin control plane and workers. 
+/// +/// `ControllerRuntime` is the synchronization point for logical pipeline +/// records, deployed runtime instances, rollout/shutdown histories, observed +/// state updates, and topic/runtime registries. All mutable controller state is +/// kept behind `state`; pipeline execution threads report back through a +/// `Weak>` so they do not keep the controller alive during +/// teardown. +pub(super) struct ControllerRuntime { + /// Factory used to build runtime pipelines for new instances. + pipeline_factory: &'static PipelineFactory, + /// Static controller context cloned into launched pipeline threads. + controller_context: ControllerContext, + /// Mutable observed-state store used for compaction and status updates. + observed_state_store: ObservedStateStore, + /// Read handle used by wait paths to observe readiness and phase changes. + observed_state_handle: ObservedStateHandle, + /// Reporter used for lifecycle and runtime-error events. + engine_event_reporter: ObservedEventReporter, + /// Metrics reporter cloned into launched runtime instances. + metrics_reporter: MetricsReporter, + /// Topic registry shared by all runtime instances. + declared_topics: DeclaredTopics, + /// Controller-wide core ids available for policy-based allocation. + available_core_ids: Vec, + /// Tracing setup cloned into launched runtime threads. + engine_tracing_setup: TracingSetup, + /// Runtime telemetry reporting cadence. + telemetry_reporting_interval: Duration, + /// Memory-pressure signal fanout shared with pipeline runtimes. + memory_pressure_tx: tokio::sync::watch::Sender, + /// All mutable live-control state protected by a single mutex. + state: Mutex, + /// Wakes global shutdown waiters when runtime instance liveness changes. + state_changed: Condvar, +} + +/// Thin adapter that exposes `ControllerRuntime` through the admin trait. +struct ControllerControlPlane { + runtime: Arc>, +} + +/// Result of launching one pipeline runtime thread. +/// +/// The controller stores the `control_sender` while the instance is active and +/// drops it after shutdown is requested so the pipeline can observe control +/// channel closure once node tasks finish. +pub(super) struct LaunchedPipelineThread { + /// Concrete deployed instance key for the launched runtime thread. + pub(super) pipeline_key: DeployedPipelineKey, + /// Admin sender used by live control to send shutdown to the instance. + pub(super) control_sender: Arc, + /// Keeps the launch result tied to the pipeline data type. + pub(super) _marker: std::marker::PhantomData, +} + +impl + ControllerRuntime +{ + #[allow(clippy::too_many_arguments)] + /// Builds the resident controller runtime used by live reconfiguration. 
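+    ///
+    /// The operation maps and instance registry start empty; only
+    /// `live_config` is seeded here, and pipelines already committed at
+    /// startup are registered afterwards via `register_committed_pipeline`.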
+ pub(super) fn new( + pipeline_factory: &'static PipelineFactory, + controller_context: ControllerContext, + observed_state_store: ObservedStateStore, + observed_state_handle: ObservedStateHandle, + engine_event_reporter: ObservedEventReporter, + metrics_reporter: MetricsReporter, + declared_topics: DeclaredTopics, + available_core_ids: Vec, + engine_tracing_setup: TracingSetup, + telemetry_reporting_interval: Duration, + memory_pressure_tx: tokio::sync::watch::Sender, + live_config: OtelDataflowSpec, + ) -> Self { + Self { + pipeline_factory, + controller_context, + observed_state_store, + observed_state_handle, + engine_event_reporter, + metrics_reporter, + declared_topics, + available_core_ids, + engine_tracing_setup, + telemetry_reporting_interval, + memory_pressure_tx, + state: Mutex::new(ControllerRuntimeState { + live_config, + logical_pipelines: HashMap::new(), + runtime_instances: HashMap::new(), + pending_instance_exits: HashMap::new(), + rollouts: HashMap::new(), + active_rollouts: HashMap::new(), + terminal_rollouts: HashMap::new(), + shutdowns: HashMap::new(), + active_shutdowns: HashMap::new(), + terminal_shutdowns: HashMap::new(), + generation_counters: HashMap::new(), + active_instances: 0, + next_rollout_id: 0, + next_shutdown_id: 0, + next_thread_id: 1, + first_error: None, + }), + state_changed: Condvar::new(), + } + } + + /// Seeds the runtime registry with a pipeline already committed at startup. + pub(super) fn register_committed_pipeline( + &self, + resolved: ResolvedPipelineConfig, + generation: u64, + ) { + let pipeline_key = PipelineKey::new( + resolved.pipeline_group_id.clone(), + resolved.pipeline_id.clone(), + ); + if let Ok(active_cores) = self.assigned_cores_for_resolved(&resolved) { + self.observed_state_store + .set_pipeline_active_cores(pipeline_key.clone(), active_cores); + } + self.observed_state_store + .set_pipeline_active_generation(pipeline_key.clone(), generation); + + let mut state = self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + _ = state + .generation_counters + .insert(pipeline_key.clone(), generation + 1); + _ = state.logical_pipelines.insert( + pipeline_key, + LogicalPipelineRecord { + resolved, + active_generation: generation, + }, + ); + } + + /// Allocates the next controller-local logical thread identifier. + pub(super) fn next_thread_id(&self) -> usize { + let mut state = self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + let thread_id = state.next_thread_id; + state.next_thread_id += 1; + thread_id + } + + /// Returns the declared-topic registry shared with launched pipelines. + pub(super) fn declared_topics(&self) -> &DeclaredTopics { + &self.declared_topics + } + + /// Exposes the runtime as the admin control-plane trait object. + pub(super) fn control_plane(self: &Arc) -> Arc { + Arc::new(ControllerControlPlane { + runtime: Arc::clone(self), + }) + } + + /// Checks whether a logical pipeline still has an active rollout or shutdown. + fn pipeline_has_active_operation_locked( + state: &ControllerRuntimeState, + pipeline_key: &PipelineKey, + ) -> bool { + state.active_rollouts.contains_key(pipeline_key) + || state.active_shutdowns.contains_key(pipeline_key) + } + + /// Applies a terminal instance exit to controller state after the instance + /// has been registered as active. 
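+    ///
+    /// It records the exit on the instance, decrements the active-instance
+    /// count, captures the first runtime error, and returns whether exited
+    /// instances for the logical pipeline could be pruned immediately.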
+ fn apply_instance_exit_locked( + state: &mut ControllerRuntimeState, + pipeline_key: &DeployedPipelineKey, + exit: &RuntimeInstanceExit, + ) -> bool { + if let Some(instance) = state.runtime_instances.get_mut(pipeline_key) { + instance.lifecycle = RuntimeInstanceLifecycle::Exited(exit.clone()); + } + state.active_instances = state.active_instances.saturating_sub(1); + if let RuntimeInstanceExit::Error(error) = exit { + if state.first_error.is_none() { + state.first_error = Some(error.message.clone()); + } + } + let logical_pipeline_key = PipelineKey::new( + pipeline_key.pipeline_group_id.clone(), + pipeline_key.pipeline_id.clone(), + ); + Self::prune_exited_runtime_instances_for_pipeline_locked(state, &logical_pipeline_key) + } + + /// Marks a rollout terminal and enqueues it for bounded retention. + fn record_terminal_rollout_locked( + state: &mut ControllerRuntimeState, + pipeline_key: &PipelineKey, + rollout_id: &str, + now: Instant, + ) { + let mut enqueue = false; + if let Some(rollout) = state.rollouts.get_mut(rollout_id) { + if rollout.state.is_terminal() && rollout.completed_at.is_none() { + rollout.completed_at = Some(now); + enqueue = true; + } + } + if enqueue { + state + .terminal_rollouts + .entry(pipeline_key.clone()) + .or_default() + .push_back(rollout_id.to_owned()); + } + Self::prune_terminal_rollout_queue_locked(state, pipeline_key, now); + } + + /// Evicts expired or over-cap terminal rollout snapshots for one pipeline. + fn prune_terminal_rollout_queue_locked( + state: &mut ControllerRuntimeState, + pipeline_key: &PipelineKey, + now: Instant, + ) { + while let Some((rollout_id, queue_len)) = + state.terminal_rollouts.get(pipeline_key).and_then(|queue| { + queue + .front() + .cloned() + .map(|rollout_id| (rollout_id, queue.len())) + }) + { + let should_evict = queue_len > TERMINAL_ROLLOUT_RETENTION_LIMIT + || state + .rollouts + .get(&rollout_id) + .is_none_or(|rollout| is_expired(rollout.completed_at, now)); + if !should_evict { + break; + } + + if let Some(evicted_id) = state + .terminal_rollouts + .get_mut(pipeline_key) + .and_then(VecDeque::pop_front) + { + _ = state.rollouts.remove(&evicted_id); + } + } + + if state + .terminal_rollouts + .get(pipeline_key) + .is_some_and(VecDeque::is_empty) + { + _ = state.terminal_rollouts.remove(pipeline_key); + } + } + + /// Marks a shutdown terminal and enqueues it for bounded retention. + fn record_terminal_shutdown_locked( + state: &mut ControllerRuntimeState, + pipeline_key: &PipelineKey, + shutdown_id: &str, + now: Instant, + ) { + let mut enqueue = false; + if let Some(shutdown) = state.shutdowns.get_mut(shutdown_id) { + if shutdown.state.is_terminal() && shutdown.completed_at.is_none() { + shutdown.completed_at = Some(now); + enqueue = true; + } + } + if enqueue { + state + .terminal_shutdowns + .entry(pipeline_key.clone()) + .or_default() + .push_back(shutdown_id.to_owned()); + } + Self::prune_terminal_shutdown_queue_locked(state, pipeline_key, now); + } + + /// Evicts expired or over-cap terminal shutdown snapshots for one pipeline. 
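+    ///
+    /// Mirrors the rollout variant: entries pop from the queue front while the
+    /// queue exceeds its retention cap or the front record's TTL has expired,
+    /// and an emptied queue is removed outright.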
+ fn prune_terminal_shutdown_queue_locked( + state: &mut ControllerRuntimeState, + pipeline_key: &PipelineKey, + now: Instant, + ) { + while let Some((shutdown_id, queue_len)) = state + .terminal_shutdowns + .get(pipeline_key) + .and_then(|queue| { + queue + .front() + .cloned() + .map(|shutdown_id| (shutdown_id, queue.len())) + }) + { + let should_evict = queue_len > TERMINAL_SHUTDOWN_RETENTION_LIMIT + || state + .shutdowns + .get(&shutdown_id) + .is_none_or(|shutdown| is_expired(shutdown.completed_at, now)); + if !should_evict { + break; + } + + if let Some(evicted_id) = state + .terminal_shutdowns + .get_mut(pipeline_key) + .and_then(VecDeque::pop_front) + { + _ = state.shutdowns.remove(&evicted_id); + } + } + + if state + .terminal_shutdowns + .get(pipeline_key) + .is_some_and(VecDeque::is_empty) + { + _ = state.terminal_shutdowns.remove(pipeline_key); + } + } + + /// Runs TTL/cap eviction across all retained terminal operation history. + fn prune_terminal_operation_history_locked(state: &mut ControllerRuntimeState, now: Instant) { + let rollout_keys: Vec<_> = state.terminal_rollouts.keys().cloned().collect(); + for pipeline_key in rollout_keys { + Self::prune_terminal_rollout_queue_locked(state, &pipeline_key, now); + } + + let shutdown_keys: Vec<_> = state.terminal_shutdowns.keys().cloned().collect(); + for pipeline_key in shutdown_keys { + Self::prune_terminal_shutdown_queue_locked(state, &pipeline_key, now); + } + } + + /// Drops exited runtime instances once no active controller work still needs them. + fn prune_exited_runtime_instances_for_pipeline_locked( + state: &mut ControllerRuntimeState, + pipeline_key: &PipelineKey, + ) -> bool { + if Self::pipeline_has_active_operation_locked(state, pipeline_key) { + return false; + } + + state.runtime_instances.retain(|deployed_key, instance| { + if deployed_key.pipeline_group_id != *pipeline_key.pipeline_group_id() + || deployed_key.pipeline_id != *pipeline_key.pipeline_id() + { + return true; + } + + matches!(instance.lifecycle, RuntimeInstanceLifecycle::Active) + }); + true + } + + /// Opportunistically trims retained rollout and shutdown history. + fn prune_retained_operation_history(&self) { + let mut state = self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + Self::prune_terminal_operation_history_locked(&mut state, Instant::now()); + } + + /// Trims exited instances and terminal history for one logical pipeline. 
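+    ///
+    /// When no rollout or shutdown is still active for the pipeline, exited
+    /// instances are dropped and observed state is compacted so stale
+    /// per-generation entries leave the status views.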
+ fn prune_pipeline_runtime_and_history(&self, pipeline_key: &PipelineKey) { + let should_compact = { + let mut state = self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + let should_compact = + Self::prune_exited_runtime_instances_for_pipeline_locked(&mut state, pipeline_key); + Self::prune_terminal_rollout_queue_locked(&mut state, pipeline_key, Instant::now()); + Self::prune_terminal_shutdown_queue_locked(&mut state, pipeline_key, Instant::now()); + should_compact + }; + if should_compact { + self.observed_state_store + .compact_pipeline_instances(pipeline_key); + } + } +} + +impl + ControlPlane for ControllerControlPlane +{ + fn shutdown_all(&self, timeout_secs: u64) -> Result<(), ControlPlaneError> { + self.runtime.request_shutdown_all(timeout_secs) + } + + fn shutdown_pipeline( + &self, + pipeline_group_id: &str, + pipeline_id: &str, + timeout_secs: u64, + ) -> Result { + self.runtime + .request_shutdown_pipeline(pipeline_group_id, pipeline_id, timeout_secs) + } + + fn reconfigure_pipeline( + &self, + pipeline_group_id: &str, + pipeline_id: &str, + request: ReconfigureRequest, + ) -> Result { + let plan = self + .runtime + .prepare_rollout_plan(pipeline_group_id, pipeline_id, &request)?; + self.runtime.spawn_rollout(plan) + } + + fn pipeline_details( + &self, + pipeline_group_id: &str, + pipeline_id: &str, + ) -> Result, ControlPlaneError> { + self.runtime.pipeline_details_snapshot(&PipelineKey::new( + pipeline_group_id.to_owned().into(), + pipeline_id.to_owned().into(), + )) + } + + fn rollout_status( + &self, + pipeline_group_id: &str, + pipeline_id: &str, + rollout_id: &str, + ) -> Result, ControlPlaneError> { + let expected_key = PipelineKey::new( + pipeline_group_id.to_owned().into(), + pipeline_id.to_owned().into(), + ); + let Some(status) = self.runtime.rollout_status_snapshot(rollout_id) else { + return Ok(None); + }; + let actual_key = + PipelineKey::new(status.pipeline_group_id.clone(), status.pipeline_id.clone()); + if actual_key != expected_key { + return Err(ControlPlaneError::RolloutNotFound); + } + Ok(Some(status)) + } + + fn shutdown_status( + &self, + pipeline_group_id: &str, + pipeline_id: &str, + shutdown_id: &str, + ) -> Result, ControlPlaneError> { + let expected_key = PipelineKey::new( + pipeline_group_id.to_owned().into(), + pipeline_id.to_owned().into(), + ); + let Some(status) = self.runtime.shutdown_status_snapshot(shutdown_id) else { + return Ok(None); + }; + let actual_key = + PipelineKey::new(status.pipeline_group_id.clone(), status.pipeline_id.clone()); + if actual_key != expected_key { + return Err(ControlPlaneError::ShutdownNotFound); + } + Ok(Some(status)) + } +} + +#[cfg(test)] +#[path = "../live_control_tests.rs"] +mod tests; diff --git a/rust/otap-dataflow/crates/controller/src/live_control/planning.rs b/rust/otap-dataflow/crates/controller/src/live_control/planning.rs new file mode 100644 index 0000000000..725c702343 --- /dev/null +++ b/rust/otap-dataflow/crates/controller/src/live_control/planning.rs @@ -0,0 +1,883 @@ +// Copyright The OpenTelemetry Authors +// SPDX-License-Identifier: Apache-2.0 + +//! Request planning, operation recording, and worker spawning. +//! +//! Planning converts admin requests into explicit candidate plans while holding +//! no long-running runtime resources. It also owns operation-record insertion +//! and status snapshot materialization because those steps are tightly coupled +//! to conflict detection and bounded history retention. 
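+//!
+//! A sketch of the accept path, hedged: `prepare_rollout_plan` and
+//! `spawn_rollout` are the controller methods used by the admin control
+//! plane; the other bindings below are placeholders.
+//!
+//! ```ignore
+//! // Validate the request against a cloned config snapshot and classify it.
+//! let plan = runtime.prepare_rollout_plan(group_id, pipeline_id, &request)?;
+//! // Record the accepted rollout and hand the plan to a detached worker.
+//! runtime.spawn_rollout(plan)
+//! ```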
+ +use super::*; + +impl + ControllerRuntime +{ + /// Resolves the concrete core ids selected by a pipeline resource policy. + pub(super) fn assigned_cores_for_resolved( + &self, + resolved_pipeline: &ResolvedPipelineConfig, + ) -> Result, ControlPlaneError> { + Controller::::select_cores_for_allocation( + self.available_core_ids.clone(), + &resolved_pipeline.policies.resources.core_allocation, + ) + .map(|cores| cores.into_iter().map(|core| core.id).collect()) + .map_err(|err| ControlPlaneError::InvalidRequest { + message: err.to_string(), + }) + } + + /// Reports which active cores still belong to the current committed generation. + pub(super) fn active_runtime_core_state( + &self, + pipeline_key: &PipelineKey, + active_generation: u64, + ) -> ActiveRuntimeCoreState { + let state = self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + let mut current_generation_cores = Vec::new(); + let mut has_foreign_active_generations = false; + + for (deployed_key, instance) in &state.runtime_instances { + if deployed_key.pipeline_group_id != *pipeline_key.pipeline_group_id() + || deployed_key.pipeline_id != *pipeline_key.pipeline_id() + || !matches!(instance.lifecycle, RuntimeInstanceLifecycle::Active) + { + continue; + } + + if deployed_key.deployment_generation == active_generation { + current_generation_cores.push(deployed_key.core_id); + } else { + has_foreign_active_generations = true; + } + } + + current_generation_cores.sort_unstable(); + ActiveRuntimeCoreState { + current_generation_cores, + has_foreign_active_generations, + } + } + + /// Builds the effective runtime topic profile map used to reject broker mutations. + pub(super) fn pipeline_topic_profiles( + config: &OtelDataflowSpec, + ) -> Result, ControlPlaneError> { + let (global_names, group_names) = + Controller::::build_declared_topic_name_maps(config).map_err(|err| { + ControlPlaneError::InvalidRequest { + message: err.to_string(), + } + })?; + Controller::::validate_topic_wiring_acyclic(config, &global_names, &group_names) + .map_err(|err| ControlPlaneError::InvalidRequest { + message: err.to_string(), + })?; + let (inferred_modes, _) = + Controller::::infer_topic_modes(config, &global_names, &group_names).map_err( + |err| ControlPlaneError::InvalidRequest { + message: err.to_string(), + }, + )?; + let default_selection_policy = config.engine.topics.impl_selection; + + let mut profiles = HashMap::new(); + for (topic_name, spec) in &config.topics { + let declared_name = global_names + .get(topic_name) + .ok_or_else(|| ControlPlaneError::Internal { + message: format!( + "missing declared topic name for global topic `{}` while building runtime profiles", + topic_name.as_ref() + ), + })? 
+ .clone(); + let topology_mode = inferred_modes + .get(&declared_name) + .copied() + .unwrap_or(InferredTopicMode::Mixed); + let selection_policy = spec.impl_selection.unwrap_or(default_selection_policy); + let selected_mode = Controller::::apply_topic_impl_selection_policy( + topology_mode, + selection_policy, + ); + _ = profiles.insert( + declared_name, + TopicRuntimeProfile { + backend: spec.backend, + policies: spec.policies.clone(), + selected_mode, + }, + ); + } + + for (group_id, group_cfg) in &config.groups { + for (topic_name, spec) in &group_cfg.topics { + let declared_name = group_names + .get(&(group_id.clone(), topic_name.clone())) + .ok_or_else(|| ControlPlaneError::Internal { + message: format!( + "missing declared topic name for group `{}` topic `{}` while building runtime profiles", + group_id.as_ref(), + topic_name.as_ref() + ), + })? + .clone(); + let topology_mode = inferred_modes + .get(&declared_name) + .copied() + .unwrap_or(InferredTopicMode::Mixed); + let selection_policy = spec.impl_selection.unwrap_or(default_selection_policy); + let selected_mode = Controller::::apply_topic_impl_selection_policy( + topology_mode, + selection_policy, + ); + _ = profiles.insert( + declared_name, + TopicRuntimeProfile { + backend: spec.backend, + policies: spec.policies.clone(), + selected_mode, + }, + ); + } + } + + Ok(profiles) + } + + /// Classifies a reconfigure request and prepares the rollout state machine inputs. + pub(super) fn prepare_rollout_plan( + &self, + pipeline_group_id: &str, + pipeline_id: &str, + request: &ReconfigureRequest, + ) -> Result { + let pipeline_group_id: PipelineGroupId = pipeline_group_id.to_owned().into(); + let pipeline_id: PipelineId = pipeline_id.to_owned().into(); + let pipeline_key = PipelineKey::new(pipeline_group_id.clone(), pipeline_id.clone()); + + let (live_config, current_record) = { + let state = self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + if !state.live_config.groups.contains_key(&pipeline_group_id) { + return Err(ControlPlaneError::GroupNotFound); + } + if state.active_rollouts.contains_key(&pipeline_key) + || state.active_shutdowns.contains_key(&pipeline_key) + { + return Err(ControlPlaneError::RolloutConflict); + } + ( + state.live_config.clone(), + state.logical_pipelines.get(&pipeline_key).cloned(), + ) + }; + + let candidate_pipeline = request.pipeline.clone(); + candidate_pipeline + .validate(&pipeline_group_id, &pipeline_id) + .map_err(|err| ControlPlaneError::InvalidRequest { + message: err.to_string(), + })?; + + let mut candidate_config = live_config.clone(); + let group_cfg = candidate_config + .groups + .get_mut(&pipeline_group_id) + .ok_or_else(|| ControlPlaneError::Internal { + message: format!( + "group `{}` disappeared while preparing rollout plan", + pipeline_group_id.as_ref() + ), + })?; + _ = group_cfg + .pipelines + .insert(pipeline_id.clone(), candidate_pipeline.clone()); + + candidate_config + .validate() + .map_err(|err| ControlPlaneError::InvalidRequest { + message: err.to_string(), + })?; + Controller::::validate_engine_components_with_factory( + self.pipeline_factory, + &candidate_config, + ) + .map_err(|message| ControlPlaneError::InvalidRequest { message })?; + + let current_profiles = Self::pipeline_topic_profiles(&live_config)?; + let candidate_profiles = Self::pipeline_topic_profiles(&candidate_config)?; + if current_profiles != candidate_profiles { + return Err(ControlPlaneError::InvalidRequest { + message: "request would require runtime topic broker 
mutation".to_owned(), + }); + } + + let resolved_pipeline = candidate_config + .resolve() + .pipelines + .into_iter() + .find(|pipeline| { + pipeline.role == ResolvedPipelineRole::Regular + && pipeline.pipeline_group_id == pipeline_group_id + && pipeline.pipeline_id == pipeline_id + }) + .ok_or_else(|| ControlPlaneError::Internal { + message: "candidate pipeline disappeared during resolution".to_owned(), + })?; + let current_assigned_cores = if let Some(record) = current_record.as_ref() { + self.assigned_cores_for_resolved(&record.resolved)? + } else { + Vec::new() + }; + let target_assigned_cores = self.assigned_cores_for_resolved(&resolved_pipeline)?; + let current_core_set: HashSet<_> = current_assigned_cores.iter().copied().collect(); + let target_core_set: HashSet<_> = target_assigned_cores.iter().copied().collect(); + let active_runtime_state = current_record + .as_ref() + .map(|record| self.active_runtime_core_state(&pipeline_key, record.active_generation)) + .unwrap_or(ActiveRuntimeCoreState { + current_generation_cores: Vec::new(), + has_foreign_active_generations: false, + }); + let active_core_set: HashSet<_> = active_runtime_state + .current_generation_cores + .iter() + .copied() + .collect(); + let common_assigned_cores: Vec<_> = target_assigned_cores + .iter() + .copied() + .filter(|core_id| current_core_set.contains(core_id)) + .collect(); + let added_assigned_cores: Vec<_> = target_assigned_cores + .iter() + .copied() + .filter(|core_id| !current_core_set.contains(core_id)) + .collect(); + let removed_assigned_cores: Vec<_> = current_assigned_cores + .iter() + .copied() + .filter(|core_id| !target_core_set.contains(core_id)) + .collect(); + let resize_start_cores: Vec<_> = target_assigned_cores + .iter() + .copied() + .filter(|core_id| !active_core_set.contains(core_id)) + .collect(); + let resize_stop_cores: Vec<_> = active_runtime_state + .current_generation_cores + .iter() + .copied() + .filter(|core_id| !target_core_set.contains(core_id)) + .collect(); + let action = if let Some(record) = current_record.as_ref() { + let identical_update = current_assigned_cores == target_assigned_cores + && active_runtime_state.current_generation_cores == target_assigned_cores + && !active_runtime_state.has_foreign_active_generations + && record.resolved.runtime_matches(&resolved_pipeline); + let resize_only = current_assigned_cores != target_assigned_cores + && !active_runtime_state.has_foreign_active_generations + && record + .resolved + .runtime_shape_matches_ignoring_resources(&resolved_pipeline); + if identical_update { + RolloutAction::NoOp + } else if resize_only { + RolloutAction::Resize + } else { + RolloutAction::Replace + } + } else { + RolloutAction::Create + }; + let (resize_start_cores, resize_stop_cores) = match action { + RolloutAction::Resize => (resize_start_cores, resize_stop_cores), + RolloutAction::Create | RolloutAction::NoOp | RolloutAction::Replace => { + (Vec::new(), Vec::new()) + } + }; + let previous_generation = current_record + .as_ref() + .map(|record| record.active_generation); + + let (rollout_id, target_generation) = { + let mut state = self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + if state.active_rollouts.contains_key(&pipeline_key) + || state.active_shutdowns.contains_key(&pipeline_key) + { + return Err(ControlPlaneError::RolloutConflict); + } + let rollout_id = format!("rollout-{}", state.next_rollout_id); + state.next_rollout_id += 1; + let target_generation = match action { + RolloutAction::NoOp | 
RolloutAction::Resize => { + previous_generation.ok_or_else(|| ControlPlaneError::Internal { + message: format!( + "rollout planner produced {:?} for {}:{} without a current generation", + action, + pipeline_key.pipeline_group_id().as_ref(), + pipeline_key.pipeline_id().as_ref() + ), + })? + } + RolloutAction::Create | RolloutAction::Replace => { + let generation_counter = state + .generation_counters + .entry(pipeline_key.clone()) + .or_insert(0); + let target_generation = *generation_counter; + *generation_counter += 1; + target_generation + } + }; + (rollout_id, target_generation) + }; + + let rollout_core_ids = match action { + RolloutAction::NoOp => Vec::new(), + RolloutAction::Resize => { + let mut ids = resize_start_cores.clone(); + let additional_stop_cores: Vec<_> = resize_stop_cores + .iter() + .copied() + .filter(|core_id| !ids.contains(core_id)) + .collect(); + ids.extend(additional_stop_cores); + ids + } + RolloutAction::Create | RolloutAction::Replace => { + let mut ids = target_assigned_cores.clone(); + ids.extend(removed_assigned_cores.iter().copied()); + ids + } + }; + let cores = rollout_core_ids + .into_iter() + .map(|core_id| RolloutCoreProgress { + core_id, + previous_generation: match action { + RolloutAction::Create => None, + RolloutAction::NoOp => active_core_set + .contains(&core_id) + .then_some(previous_generation) + .flatten(), + RolloutAction::Replace => current_core_set + .contains(&core_id) + .then_some(previous_generation) + .flatten(), + RolloutAction::Resize => active_core_set + .contains(&core_id) + .then_some(previous_generation) + .flatten(), + }, + target_generation, + state: "pending".to_owned(), + updated_at: timestamp_now(), + detail: None, + }) + .collect(); + let step_timeout_secs = request.step_timeout_secs.max(1); + let drain_timeout_secs = request.drain_timeout_secs.max(1); + let rollout = RolloutRecord::new( + rollout_id, + pipeline_group_id.clone(), + pipeline_id.clone(), + action, + target_generation, + current_record + .as_ref() + .map(|record| record.active_generation), + drain_timeout_secs, + cores, + ); + + Ok(CandidateRolloutPlan { + pipeline_key, + pipeline_group_id, + pipeline_id, + action, + resolved_pipeline, + current_record, + current_assigned_cores, + target_assigned_cores, + common_assigned_cores, + added_assigned_cores, + removed_assigned_cores, + resize_start_cores, + resize_stop_cores, + target_generation, + rollout, + step_timeout_secs, + drain_timeout_secs, + }) + } + + /// Registers a newly accepted rollout and publishes its initial summary. + pub(super) fn insert_rollout( + &self, + pipeline_key: &PipelineKey, + rollout: RolloutRecord, + ) -> Result<(), ControlPlaneError> { + self.prune_retained_operation_history(); + { + let mut state = self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + if state.active_rollouts.contains_key(pipeline_key) + || state.active_shutdowns.contains_key(pipeline_key) + { + return Err(ControlPlaneError::RolloutConflict); + } + _ = state + .active_rollouts + .insert(pipeline_key.clone(), rollout.rollout_id.clone()); + _ = state + .rollouts + .insert(rollout.rollout_id.clone(), rollout.clone()); + } + self.observed_state_store + .set_pipeline_rollout_summary(pipeline_key.clone(), rollout.summary()); + Ok(()) + } + + /// Applies an in-place update to a rollout record and refreshes observed state. 
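+    ///
+    /// A minimal sketch of the closure-based update pattern; the rollout id
+    /// and failure message are hypothetical:
+    ///
+    /// ```ignore
+    /// // Mark the rollout failed; `update_rollout` stamps `updated_at`,
+    /// // records the terminal snapshot, and republishes the summary.
+    /// runtime.update_rollout(&pipeline_key, "rollout-7", |rollout| {
+    ///     rollout.state = RolloutLifecycleState::Failed;
+    ///     rollout.failure_reason = Some("core 3 never became ready".to_owned());
+    /// });
+    /// ```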
+ pub(super) fn update_rollout(&self, pipeline_key: &PipelineKey, rollout_id: &str, update: F) + where + F: FnOnce(&mut RolloutRecord), + { + let summary = { + let mut state = self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + let Some(rollout) = state.rollouts.get_mut(rollout_id) else { + return; + }; + update(rollout); + rollout.updated_at = timestamp_now(); + let is_terminal = rollout.state.is_terminal(); + let summary = rollout.summary(); + if is_terminal { + Self::record_terminal_rollout_locked( + &mut state, + pipeline_key, + rollout_id, + Instant::now(), + ); + } + summary + }; + self.observed_state_store + .set_pipeline_rollout_summary(pipeline_key.clone(), summary); + } + + /// Updates the per-core progress entry for a rollout. + pub(super) fn update_rollout_core_state( + &self, + pipeline_key: &PipelineKey, + rollout_id: &str, + core_id: usize, + state: &str, + detail: Option, + ) { + self.update_rollout(pipeline_key, rollout_id, |rollout| { + if let Some(core) = rollout + .cores + .iter_mut() + .find(|core| core.core_id == core_id) + { + core.state = state.to_owned(); + core.updated_at = timestamp_now(); + core.detail = detail; + } + }); + } + + /// Marks a rollout inactive and prunes any no-longer-needed retained state. + pub(super) fn finish_rollout(&self, pipeline_key: &PipelineKey, rollout_id: &str) { + { + let mut state = self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + if state + .active_rollouts + .get(pipeline_key) + .is_some_and(|id| id == rollout_id) + { + let _ = state.active_rollouts.remove(pipeline_key); + } + } + self.prune_pipeline_runtime_and_history(pipeline_key); + } + + /// Returns the latest rollout snapshot, evicting expired history first. + pub(super) fn rollout_status_snapshot(&self, rollout_id: &str) -> Option { + let mut state = self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + Self::prune_terminal_operation_history_locked(&mut state, Instant::now()); + state.rollouts.get(rollout_id).map(RolloutRecord::status) + } + + /// Clears temporary serving-generation overrides after a rollout settles. + pub(super) fn clear_pipeline_serving_generations( + &self, + pipeline_key: &PipelineKey, + core_ids: I, + ) where + I: IntoIterator, + { + for core_id in core_ids { + self.observed_state_store + .clear_pipeline_serving_generation(pipeline_key.clone(), core_id); + } + } + + /// Commits the winning pipeline config and active generation into runtime state. + pub(super) fn commit_pipeline_record( + &self, + plan: &CandidateRolloutPlan, + active_generation: u64, + ) { + { + let mut state = self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + if let Some(group_cfg) = state.live_config.groups.get_mut(&plan.pipeline_group_id) { + _ = group_cfg.pipelines.insert( + plan.pipeline_id.clone(), + plan.resolved_pipeline.pipeline.clone(), + ); + } + _ = state.logical_pipelines.insert( + plan.pipeline_key.clone(), + LogicalPipelineRecord { + resolved: plan.resolved_pipeline.clone(), + active_generation, + }, + ); + } + self.observed_state_store.set_pipeline_active_cores( + plan.pipeline_key.clone(), + plan.target_assigned_cores.iter().copied(), + ); + self.observed_state_store + .set_pipeline_active_generation(plan.pipeline_key.clone(), active_generation); + } + + /// Selects the active instances targeted by a per-pipeline shutdown request. 
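+    ///
+    /// Hypothetical caller sketch (group/pipeline ids and timeout are
+    /// illustrative): planning only selects targets and reserves the
+    /// operation id; nothing stops until the plan is spawned.
+    ///
+    /// ```ignore
+    /// let plan = runtime.prepare_shutdown_plan("g1", "p1", 30)?;
+    /// assert!(!plan.target_instances.is_empty());
+    /// let status = runtime.spawn_shutdown(plan)?; // returns the initial snapshot
+    /// ```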
+ pub(super) fn prepare_shutdown_plan( + &self, + pipeline_group_id: &str, + pipeline_id: &str, + timeout_secs: u64, + ) -> Result { + let pipeline_group_id: PipelineGroupId = pipeline_group_id.to_owned().into(); + let pipeline_id: PipelineId = pipeline_id.to_owned().into(); + let pipeline_key = PipelineKey::new(pipeline_group_id.clone(), pipeline_id.clone()); + + let target_instances = { + let state = self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + if !state.live_config.groups.contains_key(&pipeline_group_id) { + return Err(ControlPlaneError::GroupNotFound); + } + if !state.logical_pipelines.contains_key(&pipeline_key) { + return Err(ControlPlaneError::PipelineNotFound); + } + if state.active_rollouts.contains_key(&pipeline_key) + || state.active_shutdowns.contains_key(&pipeline_key) + { + return Err(ControlPlaneError::RolloutConflict); + } + + let targets: Vec<_> = state + .runtime_instances + .iter() + .filter_map(|(deployed_key, instance)| { + if deployed_key.pipeline_group_id == pipeline_group_id + && deployed_key.pipeline_id == pipeline_id + && matches!(instance.lifecycle, RuntimeInstanceLifecycle::Active) + { + Some(deployed_key.clone()) + } else { + None + } + }) + .collect(); + if targets.is_empty() { + return Err(ControlPlaneError::InvalidRequest { + message: format!( + "pipeline {}:{} is already stopped", + pipeline_group_id.as_ref(), + pipeline_id.as_ref() + ), + }); + } + targets + }; + + let shutdown_id = { + let mut state = self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + if state.active_rollouts.contains_key(&pipeline_key) + || state.active_shutdowns.contains_key(&pipeline_key) + { + return Err(ControlPlaneError::RolloutConflict); + } + let shutdown_id = format!("shutdown-{}", state.next_shutdown_id); + state.next_shutdown_id += 1; + shutdown_id + }; + + let shutdown = ShutdownRecord::new( + shutdown_id, + pipeline_group_id, + pipeline_id, + target_instances + .iter() + .map(|instance| ShutdownCoreProgress { + core_id: instance.core_id, + deployment_generation: instance.deployment_generation, + state: "pending".to_owned(), + updated_at: timestamp_now(), + detail: None, + }) + .collect(), + ); + + Ok(CandidateShutdownPlan { + pipeline_key, + shutdown, + target_instances, + timeout_secs: timeout_secs.max(1), + }) + } + + /// Registers a newly accepted shutdown operation. + pub(super) fn insert_shutdown( + &self, + pipeline_key: &PipelineKey, + shutdown: ShutdownRecord, + ) -> Result<(), ControlPlaneError> { + self.prune_retained_operation_history(); + let mut state = self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + if state.active_rollouts.contains_key(pipeline_key) + || state.active_shutdowns.contains_key(pipeline_key) + { + return Err(ControlPlaneError::RolloutConflict); + } + _ = state + .active_shutdowns + .insert(pipeline_key.clone(), shutdown.shutdown_id.clone()); + _ = state + .shutdowns + .insert(shutdown.shutdown_id.clone(), shutdown); + Ok(()) + } + + /// Applies an in-place update to a shutdown record and prunes on completion. 
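+    ///
+    /// Sketch of the same closure pattern used for shutdown records (the
+    /// shutdown id is hypothetical):
+    ///
+    /// ```ignore
+    /// runtime.update_shutdown(&pipeline_key, "shutdown-2", |shutdown| {
+    ///     shutdown.state = ShutdownLifecycleState::Succeeded;
+    ///     shutdown.failure_reason = None;
+    /// });
+    /// // Terminal states also clear `active_shutdowns` and prune retained state.
+    /// ```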
+ pub(super) fn update_shutdown( + &self, + pipeline_key: &PipelineKey, + shutdown_id: &str, + update: F, + ) where + F: FnOnce(&mut ShutdownRecord), + { + let should_prune = { + let mut state = self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + let Some(shutdown) = state.shutdowns.get_mut(shutdown_id) else { + return; + }; + update(shutdown); + shutdown.updated_at = timestamp_now(); + let is_terminal = shutdown.state.is_terminal(); + if is_terminal { + Self::record_terminal_shutdown_locked( + &mut state, + pipeline_key, + shutdown_id, + Instant::now(), + ); + let _ = state.active_shutdowns.remove(pipeline_key); + true + } else { + false + } + }; + + if should_prune { + self.prune_pipeline_runtime_and_history(pipeline_key); + } + } + + /// Returns the latest shutdown snapshot, evicting expired history first. + pub(super) fn shutdown_status_snapshot(&self, shutdown_id: &str) -> Option { + let mut state = self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + Self::prune_terminal_operation_history_locked(&mut state, Instant::now()); + state.shutdowns.get(shutdown_id).map(ShutdownRecord::status) + } + + /// Returns committed pipeline details plus any active rollout summary. + pub(super) fn pipeline_details_snapshot( + &self, + pipeline_key: &PipelineKey, + ) -> Result, ControlPlaneError> { + let state = self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + let Some(record) = state.logical_pipelines.get(pipeline_key) else { + if !state + .live_config + .groups + .contains_key(pipeline_key.pipeline_group_id()) + { + return Err(ControlPlaneError::GroupNotFound); + } + return Ok(None); + }; + let rollout = state + .active_rollouts + .get(pipeline_key) + .and_then(|rollout_id| state.rollouts.get(rollout_id)) + .map(RolloutRecord::api_summary); + Ok(Some(PipelineDetails { + pipeline_group_id: pipeline_key.pipeline_group_id().clone(), + pipeline_id: pipeline_key.pipeline_id().clone(), + active_generation: Some(record.active_generation), + pipeline: record.resolved.pipeline.clone(), + rollout, + })) + } + + /// Records a rollout and launches its background execution worker. 
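+    ///
+    /// End-to-end sketch of the accept path used by the admin API (request
+    /// construction elided; error handling is illustrative):
+    ///
+    /// ```ignore
+    /// let plan = runtime.prepare_rollout_plan("g1", "p1", &request)?;
+    /// let initial = runtime.spawn_rollout(plan)?;
+    /// // Poll until the background worker reaches a terminal state.
+    /// let status = runtime.rollout_status_snapshot(&initial.rollout_id);
+    /// ```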
+ pub(super) fn spawn_rollout( + self: &Arc, + plan: CandidateRolloutPlan, + ) -> Result { + let rollout_id = plan.rollout.rollout_id.clone(); + let pipeline_key = plan.pipeline_key.clone(); + self.insert_rollout(&pipeline_key, plan.rollout.clone())?; + if matches!(plan.action, RolloutAction::NoOp) { + self.commit_pipeline_record(&plan, plan.target_generation); + self.update_rollout(&pipeline_key, &rollout_id, |rollout| { + rollout.state = RolloutLifecycleState::Succeeded; + rollout.failure_reason = None; + }); + self.finish_rollout(&pipeline_key, &rollout_id); + return self.rollout_status_snapshot(&rollout_id).ok_or_else(|| { + ControlPlaneError::Internal { + message: format!("rollout {rollout_id} disappeared before response"), + } + }); + } + + let initial_status = plan.rollout.status(); + let runtime = Arc::clone(self); + let rollout_runtime = Arc::clone(&runtime); + let rollout_cleanup_runtime = Arc::clone(&runtime); + let worker_pipeline_key = pipeline_key.clone(); + let worker_rollout_id = rollout_id.clone(); + let worker_thread_name = format!( + "rollout-{}-{}", + pipeline_key.pipeline_group_id().as_ref(), + pipeline_key.pipeline_id().as_ref() + ); + let _rollout_handle = thread::Builder::new() + .name(worker_thread_name.clone()) + .spawn(move || { + if let Err(panic) = + catch_unwind(AssertUnwindSafe(|| rollout_runtime.run_rollout(plan))) + { + rollout_cleanup_runtime.handle_rollout_worker_panic( + &worker_pipeline_key, + &worker_rollout_id, + worker_thread_name, + panic, + ); + } + }) + .map_err(|err| { + runtime.finish_rollout(&pipeline_key, &rollout_id); + ControlPlaneError::Internal { + message: err.to_string(), + } + })?; + Ok(initial_status) + } + + /// Records a shutdown and launches its background execution worker. + pub(super) fn spawn_shutdown( + self: &Arc, + plan: CandidateShutdownPlan, + ) -> Result { + let shutdown_id = plan.shutdown.shutdown_id.clone(); + let pipeline_key = plan.pipeline_key.clone(); + let initial_status = plan.shutdown.status(); + self.insert_shutdown(&pipeline_key, plan.shutdown.clone())?; + let runtime = Arc::clone(self); + let shutdown_runtime = Arc::clone(&runtime); + let shutdown_cleanup_runtime = Arc::clone(&runtime); + let worker_pipeline_key = pipeline_key.clone(); + let worker_shutdown_id = shutdown_id.clone(); + let worker_thread_name = format!( + "shutdown-{}-{}", + pipeline_key.pipeline_group_id().as_ref(), + pipeline_key.pipeline_id().as_ref() + ); + let _shutdown_handle = thread::Builder::new() + .name(worker_thread_name.clone()) + .spawn(move || { + if let Err(panic) = + catch_unwind(AssertUnwindSafe(|| shutdown_runtime.run_shutdown(plan))) + { + shutdown_cleanup_runtime.handle_shutdown_worker_panic( + &worker_pipeline_key, + &worker_shutdown_id, + worker_thread_name, + panic, + ); + } + }) + .map_err(|err| { + runtime.update_shutdown(&pipeline_key, &shutdown_id, |shutdown| { + shutdown.state = ShutdownLifecycleState::Failed; + shutdown.failure_reason = Some(err.to_string()); + }); + ControlPlaneError::Internal { + message: err.to_string(), + } + })?; + Ok(initial_status) + } +} diff --git a/rust/otap-dataflow/crates/controller/src/live_control/runtime.rs b/rust/otap-dataflow/crates/controller/src/live_control/runtime.rs new file mode 100644 index 0000000000..2b21eff1b5 --- /dev/null +++ b/rust/otap-dataflow/crates/controller/src/live_control/runtime.rs @@ -0,0 +1,497 @@ +// Copyright The OpenTelemetry Authors +// SPDX-License-Identifier: Apache-2.0 + +//! Runtime-instance launch, shutdown, and exit reporting. +//! +//! 
This module owns the boundary between controller state and actual pipeline +//! threads. It registers launched instances, reconciles early exits, sends +//! shutdown control messages, waits for readiness/exit transitions, and exposes +//! global runtime shutdown/error helpers used by controller teardown. + +use super::*; + +/// Formats a deployed instance compactly for aggregated operator errors. +fn deployed_instance_label(deployed_key: &DeployedPipelineKey) -> String { + format!( + "{}:{} core={} generation={}", + deployed_key.pipeline_group_id.as_ref(), + deployed_key.pipeline_id.as_ref(), + deployed_key.core_id, + deployed_key.deployment_generation + ) +} + +impl + ControllerRuntime +{ + /// Launches one regular pipeline instance on a specific core and generation. + pub(super) fn launch_regular_pipeline_instance( + self: &Arc, + resolved_pipeline: &ResolvedPipelineConfig, + core_id: usize, + deployment_generation: u64, + ) -> Result { + let thread_id = self.next_thread_id(); + let num_cores = self + .assigned_cores_for_resolved(resolved_pipeline) + .map_err(|err| Error::PipelineRuntimeError { + source: Box::new(io::Error::other(format!("{err:?}"))), + })? + .len(); + let deployed_key = DeployedPipelineKey { + pipeline_group_id: resolved_pipeline.pipeline_group_id.clone(), + pipeline_id: resolved_pipeline.pipeline_id.clone(), + core_id, + deployment_generation, + }; + let launched = Controller::::launch_pipeline_thread( + self.pipeline_factory, + deployed_key.clone(), + CoreId { id: core_id }, + num_cores, + resolved_pipeline.pipeline.clone(), + resolved_pipeline.policies.channel_capacity.clone(), + resolved_pipeline.policies.telemetry.clone(), + resolved_pipeline.policies.transport_headers.clone(), + self.controller_context.clone(), + self.metrics_reporter.clone(), + self.engine_event_reporter.clone(), + self.engine_tracing_setup.clone(), + self.telemetry_reporting_interval, + self.memory_pressure_tx.clone(), + &self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()) + .live_config, + &self.declared_topics, + Arc::downgrade(self), + thread_id, + None, + )?; + self.register_launched_instance(launched); + Ok(deployed_key) + } + + /// Registers a launched instance and reconciles the race where the thread exited first. + /// + /// The launch path inserts the instance as Active here, while the runtime thread reports its + /// terminal exit independently through note_instance_exit(). If that exit arrived first, it + /// was parked in pending_instance_exits and is applied immediately during registration. 
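+    ///
+    /// Both orderings converge on the same terminal record (sketch; the key
+    /// and launched handle are assumed to come from the launch path):
+    ///
+    /// ```ignore
+    /// // Exit report lands first and is parked:
+    /// runtime.note_instance_exit(key.clone(), RuntimeInstanceExit::Success);
+    /// // Registration then applies the parked exit immediately:
+    /// runtime.register_launched_instance(launched);
+    /// // The instance is observed as Exited(Success), never as stale Active.
+    /// ```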
+ pub(crate) fn register_launched_instance( + self: &Arc, + launched: LaunchedPipelineThread, + ) { + let should_compact = { + let mut state = self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + _ = state.runtime_instances.insert( + launched.pipeline_key.clone(), + RuntimeInstanceRecord { + control_sender: Some(launched.control_sender.clone()), + lifecycle: RuntimeInstanceLifecycle::Active, + }, + ); + state.active_instances += 1; + let pending_exit = state.pending_instance_exits.remove(&launched.pipeline_key); + let should_compact = if let Some(exit) = pending_exit.as_ref() { + Self::apply_instance_exit_locked(&mut state, &launched.pipeline_key, exit) + } else { + false + }; + self.state_changed.notify_all(); + should_compact + }; + + if should_compact { + let logical_pipeline_key = PipelineKey::new( + launched.pipeline_key.pipeline_group_id.clone(), + launched.pipeline_key.pipeline_id.clone(), + ); + self.observed_state_store + .compact_pipeline_instances(&logical_pipeline_key); + } + } + + /// Records a pipeline instance exit and closes the registration-before/after-exit race. + /// + /// If the instance is already visible in runtime_instances, the exit is applied immediately. + /// Otherwise we store it in pending_instance_exits so register_launched_instance() can + /// reconcile it as soon as registration becomes visible. + pub(crate) fn note_instance_exit( + &self, + pipeline_key: DeployedPipelineKey, + exit: RuntimeInstanceExit, + ) { + match &exit { + RuntimeInstanceExit::Success => { + self.engine_event_reporter + .report(EngineEvent::drained(pipeline_key.clone(), None)); + } + RuntimeInstanceExit::Error(error) => { + self.engine_event_reporter + .report(EngineEvent::pipeline_runtime_error( + pipeline_key.clone(), + "Pipeline encountered a runtime error.", + error.error_summary(), + )); + } + } + + let should_compact = { + let mut state = self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + if state.runtime_instances.contains_key(&pipeline_key) { + Self::apply_instance_exit_locked(&mut state, &pipeline_key, &exit) + } else { + _ = state + .pending_instance_exits + .insert(pipeline_key.clone(), exit.clone()); + false + } + }; + if should_compact { + let logical_pipeline_key = PipelineKey::new( + pipeline_key.pipeline_group_id.clone(), + pipeline_key.pipeline_id.clone(), + ); + self.observed_state_store + .compact_pipeline_instances(&logical_pipeline_key); + } + self.state_changed.notify_all(); + } + + /// Waits for a specific deployed instance to report admitted plus ready. 
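+    ///
+    /// Illustrative use inside a rollout step (deadline arithmetic assumed):
+    ///
+    /// ```ignore
+    /// let deadline = Instant::now() + Duration::from_secs(plan.step_timeout_secs);
+    /// runtime
+    ///     .wait_for_pipeline_ready(&deployed_key, deadline)
+    ///     .map_err(RolloutExecutionError::Failed)?;
+    /// ```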
+ pub(super) fn wait_for_pipeline_ready( + &self, + deployed_key: &DeployedPipelineKey, + deadline: Instant, + ) -> Result<(), String> { + let pipeline_key = PipelineKey::new( + deployed_key.pipeline_group_id.clone(), + deployed_key.pipeline_id.clone(), + ); + loop { + if let Some(status) = self.observed_state_handle.pipeline_status(&pipeline_key) { + if let Some(instance) = + status.instance_status(deployed_key.core_id, deployed_key.deployment_generation) + { + let accepted = instance.accepted_condition().status == ConditionStatus::True; + let ready = instance.ready_condition().status == ConditionStatus::True; + if accepted && ready { + return Ok(()); + } + match instance.phase() { + PipelinePhase::Failed(_) + | PipelinePhase::Rejected(_) + | PipelinePhase::Deleted + | PipelinePhase::Stopped => { + return Err(format!( + "pipeline failed to become ready on core {} (generation {})", + deployed_key.core_id, deployed_key.deployment_generation + )); + } + _ => {} + } + } + } + + if let Some(exit) = self.instance_exit(deployed_key) { + return match exit { + RuntimeInstanceExit::Success => Err(format!( + "pipeline exited before reporting ready on core {} (generation {})", + deployed_key.core_id, deployed_key.deployment_generation + )), + RuntimeInstanceExit::Error(error) => Err(error.message), + }; + } + + if Instant::now() >= deadline { + return Err(format!( + "timed out waiting for admitted+ready on core {} (generation {})", + deployed_key.core_id, deployed_key.deployment_generation + )); + } + thread::sleep(Duration::from_millis(50)); + } + } + + /// Returns the terminal exit result for one deployed instance, if any. + pub(super) fn instance_exit( + &self, + deployed_key: &DeployedPipelineKey, + ) -> Option { + let state = self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + state + .runtime_instances + .get(deployed_key) + .and_then(|instance| match &instance.lifecycle { + RuntimeInstanceLifecycle::Active => None, + RuntimeInstanceLifecycle::Exited(exit) => Some(exit.clone()), + }) + } + + /// Sends shutdown to one instance and releases the retained control sender. + pub(super) fn request_instance_shutdown( + &self, + deployed_key: &DeployedPipelineKey, + timeout_secs: u64, + reason: &str, + ) -> Result<(), String> { + let sender = { + let state = self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + let Some(instance) = state.runtime_instances.get(deployed_key) else { + return Err(format!( + "pipeline instance {}:{} core={} generation={} is not registered", + deployed_key.pipeline_group_id.as_ref(), + deployed_key.pipeline_id.as_ref(), + deployed_key.core_id, + deployed_key.deployment_generation + )); + }; + + match &instance.lifecycle { + RuntimeInstanceLifecycle::Exited(RuntimeInstanceExit::Success) => return Ok(()), + RuntimeInstanceLifecycle::Exited(RuntimeInstanceExit::Error(error)) => { + return Err(error.message.clone()); + } + RuntimeInstanceLifecycle::Active => {} + } + + instance.control_sender.clone().ok_or_else(|| { + format!( + "shutdown already requested for pipeline {}:{} core={} generation={}", + deployed_key.pipeline_group_id.as_ref(), + deployed_key.pipeline_id.as_ref(), + deployed_key.core_id, + deployed_key.deployment_generation + ) + })? 
+ }; + + if let Err(err) = sender.try_send_shutdown( + Instant::now() + Duration::from_secs(timeout_secs.max(1)), + reason.to_owned(), + ) { + return match self.instance_exit(deployed_key) { + Some(RuntimeInstanceExit::Success) => Ok(()), + Some(RuntimeInstanceExit::Error(error)) => Err(error.message), + None => Err(err.to_string()), + }; + } + self.release_instance_control_sender(deployed_key); + Ok(()) + } + + /// Waits until a specific deployed instance exits or the deadline expires. + pub(super) fn wait_for_instance_exit( + &self, + deployed_key: &DeployedPipelineKey, + deadline: Instant, + ) -> Result<(), String> { + let mut state = self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + loop { + if let Some(instance) = state.runtime_instances.get(deployed_key) { + match &instance.lifecycle { + RuntimeInstanceLifecycle::Active => {} + RuntimeInstanceLifecycle::Exited(RuntimeInstanceExit::Success) => { + return Ok(()); + } + RuntimeInstanceLifecycle::Exited(RuntimeInstanceExit::Error(error)) => { + return Err(error.message.clone()); + } + } + } + + let Some(remaining) = deadline.checked_duration_since(Instant::now()) else { + return Err(format!( + "timed out waiting for pipeline {}:{} core={} generation={} to drain", + deployed_key.pipeline_group_id.as_ref(), + deployed_key.pipeline_id.as_ref(), + deployed_key.core_id, + deployed_key.deployment_generation + )); + }; + + // Runtime registration and exit reporting both publish through this + // mutex/condvar pair, so exit waits can sleep until real controller + // state changes instead of polling every 50ms. + let (next_state, _) = self + .state_changed + .wait_timeout(state, remaining) + .unwrap_or_else(|poisoned| poisoned.into_inner()); + state = next_state; + } + } + + /// Requests shutdown for one instance and waits until it exits. + pub(super) fn shutdown_instance( + &self, + deployed_key: &DeployedPipelineKey, + timeout_secs: u64, + reason: &str, + ) -> Result<(), String> { + self.request_instance_shutdown(deployed_key, timeout_secs, reason)?; + self.wait_for_instance_exit( + deployed_key, + Instant::now() + Duration::from_secs(timeout_secs.max(1)), + ) + } + + /// Drops the retained admin sender after shutdown has been accepted. + /// + /// The retained sender is the controller's "not yet signaled" marker for + /// an active instance. Releasing it makes shutdown dispatch idempotent for + /// that instance and lets the pipeline control loop observe channel closure + /// once node tasks have exited. + pub(super) fn release_instance_control_sender(&self, deployed_key: &DeployedPipelineKey) { + let mut state = self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + if let Some(instance) = state.runtime_instances.get_mut(deployed_key) { + instance.control_sender = None; + } + } + + /// Broadcasts shutdown to every currently active runtime instance. + /// + /// This is best-effort across the snapshot: one failed send must not prevent + /// later instances from receiving shutdown. It is also idempotent at the + /// dispatch boundary: instances that already accepted shutdown have released + /// their retained control sender and are skipped by later calls. + pub(super) fn request_shutdown_all(&self, timeout_secs: u64) -> Result<(), ControlPlaneError> { + // Snapshot under the state lock, then send outside the lock so runtime + // callbacks can report exits while shutdown dispatch is in progress. 
+ // Only active instances with a retained sender are eligible; a missing + // sender means shutdown was already accepted by a previous request. + let mut senders: Vec<_> = { + let state = self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + state + .runtime_instances + .iter() + .filter_map(|(deployed_key, instance)| match instance.lifecycle { + RuntimeInstanceLifecycle::Active => instance + .control_sender + .as_ref() + .map(|sender| (deployed_key.clone(), sender.clone())), + RuntimeInstanceLifecycle::Exited(_) => None, + }) + .collect() + }; + // Stabilize both test assertions and the aggregated error message. + senders.sort_by_key(|(deployed_key, _)| { + ( + deployed_key.pipeline_group_id.as_ref().to_owned(), + deployed_key.pipeline_id.as_ref().to_owned(), + deployed_key.core_id, + deployed_key.deployment_generation, + ) + }); + + let mut failures = Vec::new(); + for (deployed_key, sender) in senders { + if let Err(err) = sender.try_send_shutdown( + Instant::now() + Duration::from_secs(timeout_secs.max(1)), + "global shutdown".to_owned(), + ) { + // A failed send can race with the runtime thread exiting after + // the snapshot was taken. Treat clean exit as success, report a + // terminal runtime error if one was recorded, and otherwise keep + // the retained sender so a later shutdown-all can retry it. + match self.instance_exit(&deployed_key) { + Some(RuntimeInstanceExit::Success) => { + self.release_instance_control_sender(&deployed_key); + } + Some(RuntimeInstanceExit::Error(error)) => { + failures.push(format!( + "{}: {}", + deployed_instance_label(&deployed_key), + error.message + )); + } + None => { + failures.push(format!( + "{}: {}", + deployed_instance_label(&deployed_key), + err + )); + } + } + } else { + // After a successful send, the controller should not send + // another shutdown message to this same active instance. + self.release_instance_control_sender(&deployed_key); + } + } + + if failures.is_empty() { + Ok(()) + } else { + // Report all failures together after every eligible instance has + // been attempted, preserving best-effort shutdown semantics. + Err(ControlPlaneError::Internal { + message: format!( + "failed to send global shutdown to {} runtime instance(s): {}", + failures.len(), + failures.join("; ") + ), + }) + } + } + + /// Starts a tracked shutdown operation for one logical pipeline. + pub(super) fn request_shutdown_pipeline( + self: &Arc, + pipeline_group_id: &str, + pipeline_id: &str, + timeout_secs: u64, + ) -> Result { + let plan = self.prepare_shutdown_plan(pipeline_group_id, pipeline_id, timeout_secs)?; + self.spawn_shutdown(plan) + } + + /// Blocks until all active runtime instances have exited. + pub(crate) fn wait_until_all_instances_exit(&self) { + let mut state = self + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + while state.active_instances > 0 { + state = self + .state_changed + .wait(state) + .unwrap_or_else(|poisoned| poisoned.into_inner()); + } + } + + /// Returns the first runtime error observed by any watched pipeline thread. 
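+    ///
+    /// Teardown sketch: drain everything, then surface the first recorded
+    /// failure, if any (the caller's error plumbing is assumed):
+    ///
+    /// ```ignore
+    /// runtime.request_shutdown_all(30)?;
+    /// runtime.wait_until_all_instances_exit();
+    /// if let Some(error) = runtime.take_runtime_error() {
+    ///     return Err(error);
+    /// }
+    /// ```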
+    pub(crate) fn take_runtime_error(&self) -> Option<Error> {
+        let state = self
+            .state
+            .lock()
+            .unwrap_or_else(|poisoned| poisoned.into_inner());
+        state
+            .first_error
+            .as_ref()
+            .map(|message| Error::PipelineRuntimeError {
+                source: Box::new(io::Error::other(message.clone())),
+            })
+    }
+}
diff --git a/rust/otap-dataflow/crates/controller/src/live_control/state.rs b/rust/otap-dataflow/crates/controller/src/live_control/state.rs
new file mode 100644
index 0000000000..bd71c231f9
--- /dev/null
+++ b/rust/otap-dataflow/crates/controller/src/live_control/state.rs
@@ -0,0 +1,569 @@
+// Copyright The OpenTelemetry Authors
+// SPDX-License-Identifier: Apache-2.0
+
+//! Shared state types for the live-control runtime.
+//!
+//! This module intentionally contains mostly data and small conversion helpers.
+//! Planning, execution, and runtime-instance management all mutate these
+//! records through `ControllerRuntime` while holding the runtime mutex.
+
+use super::*;
+
+/// Maximum terminal rollout records retained per logical pipeline.
+pub(super) const TERMINAL_ROLLOUT_RETENTION_LIMIT: usize = 32;
+/// Maximum terminal shutdown records retained per logical pipeline.
+pub(super) const TERMINAL_SHUTDOWN_RETENTION_LIMIT: usize = 32;
+/// Maximum age for terminal rollout/shutdown records kept in memory.
+pub(super) const TERMINAL_OPERATION_RETENTION_TTL: Duration = Duration::from_secs(24 * 60 * 60);
+
+fn panic_payload_message(payload: &(dyn Any + Send)) -> String {
+    if let Some(message) = payload.downcast_ref::<&str>() {
+        (*message).to_owned()
+    } else if let Some(message) = payload.downcast_ref::<String>() {
+        message.clone()
+    } else {
+        "non-string panic payload".to_owned()
+    }
+}
+
+#[derive(Debug, Clone)]
+/// Structured panic capture with public and diagnostic renderings.
+///
+/// Rollout/shutdown workers and runtime thread watchers use this type to keep
+/// operator-visible failure messages concise while preserving thread context
+/// and a forced backtrace for internal telemetry.
+pub(crate) struct PanicReport {
+    pub(super) kind: &'static str,
+    pub(super) payload_message: String,
+    pub(super) thread_name: Option<String>,
+    pub(super) thread_id: Option<usize>,
+    pub(super) core_id: Option<usize>,
+    pub(super) backtrace: String,
+}
+
+impl PanicReport {
+    /// Captures a panic payload plus best-effort worker/thread context.
+    pub(crate) fn capture(
+        kind: &'static str,
+        panic: Box<dyn Any + Send>,
+        thread_name: Option<String>,
+        thread_id: Option<usize>,
+        core_id: Option<usize>,
+    ) -> Self {
+        Self {
+            kind,
+            payload_message: panic_payload_message(&*panic),
+            thread_name,
+            thread_id,
+            core_id,
+            backtrace: Backtrace::force_capture().to_string(),
+        }
+    }
+
+    /// Returns the short message stored in public rollout/shutdown status.
+    pub(super) fn summary_message(&self) -> String {
+        format!("{} panicked: {}", self.kind, self.payload_message)
+    }
+
+    /// Returns the diagnostic message used as internal error source detail.
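+    ///
+    /// Sketch of the rendered shape (kind and values are illustrative):
+    ///
+    /// ```text
+    /// rollout worker panicked: boom
+    /// context: thread_name=rollout-g1-p1, core_id=3
+    /// backtrace:
+    /// <forced capture>
+    /// ```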
+    pub(super) fn detail_message(&self) -> String {
+        let mut context = Vec::new();
+        if let Some(thread_name) = &self.thread_name {
+            context.push(format!("thread_name={thread_name}"));
+        }
+        if let Some(thread_id) = self.thread_id {
+            context.push(format!("thread_id={thread_id}"));
+        }
+        if let Some(core_id) = self.core_id {
+            context.push(format!("core_id={core_id}"));
+        }
+
+        let mut detail = self.summary_message();
+        if !context.is_empty() {
+            detail.push_str("\ncontext: ");
+            detail.push_str(&context.join(", "));
+        }
+        detail.push_str("\nbacktrace:\n");
+        detail.push_str(&self.backtrace);
+        detail
+    }
+
+    /// Converts the panic report into the observed-state error payload.
+    pub(super) fn error_summary(&self) -> ErrorSummary {
+        ErrorSummary::Pipeline {
+            error_kind: "panic".into(),
+            message: self.summary_message(),
+            source: Some(self.detail_message()),
+        }
+    }
+}
+
+#[derive(Debug, Clone)]
+/// Error recorded when a deployed runtime instance exits unsuccessfully.
+pub(crate) struct RuntimeInstanceError {
+    pub(super) error_kind: String,
+    pub(super) message: String,
+    pub(super) detail: Option<String>,
+}
+
+impl RuntimeInstanceError {
+    /// Builds a plain runtime error without panic diagnostics.
+    pub(crate) fn runtime(message: String) -> Self {
+        Self {
+            error_kind: "runtime".into(),
+            message,
+            detail: None,
+        }
+    }
+
+    /// Builds a runtime error from structured panic diagnostics.
+    pub(crate) fn from_panic(report: PanicReport) -> Self {
+        Self {
+            error_kind: "panic".into(),
+            message: report.summary_message(),
+            detail: Some(report.detail_message()),
+        }
+    }
+
+    /// Converts the runtime error into the observed-state error payload.
+    pub(super) fn error_summary(&self) -> ErrorSummary {
+        ErrorSummary::Pipeline {
+            error_kind: self.error_kind.clone(),
+            message: self.message.clone(),
+            source: self.detail.clone(),
+        }
+    }
+}
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+/// Execution strategy selected for a rollout request.
+pub(super) enum RolloutAction {
+    Create,
+    NoOp,
+    Replace,
+    Resize,
+}
+
+impl RolloutAction {
+    const fn as_str(self) -> &'static str {
+        match self {
+            Self::Create => "create",
+            Self::NoOp => "noop",
+            Self::Replace => "replace",
+            Self::Resize => "resize",
+        }
+    }
+}
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+/// Internal lifecycle for one rollout operation.
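+///
+/// Intended progression, as driven by the rollout worker (a sketch inferred
+/// from the terminal-state set; this type does not itself enforce
+/// transitions):
+///
+/// ```text
+/// Pending -> Running -> Succeeded
+///                    -> Failed
+///                    -> RollingBack -> Failed (previous shape restored)
+///                                   -> RollbackFailed
+/// ```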
+pub(super) enum RolloutLifecycleState {
+    Pending,
+    Running,
+    Succeeded,
+    Failed,
+    RollingBack,
+    RollbackFailed,
+}
+
+impl RolloutLifecycleState {
+    const fn as_pipeline_rollout_state(self) -> PipelineRolloutState {
+        match self {
+            Self::Pending => PipelineRolloutState::Pending,
+            Self::Running => PipelineRolloutState::Running,
+            Self::Succeeded => PipelineRolloutState::Succeeded,
+            Self::Failed => PipelineRolloutState::Failed,
+            Self::RollingBack => PipelineRolloutState::RollingBack,
+            Self::RollbackFailed => PipelineRolloutState::RollbackFailed,
+        }
+    }
+
+    const fn as_api_pipeline_rollout_state(self) -> ApiPipelineRolloutState {
+        match self {
+            Self::Pending => ApiPipelineRolloutState::Pending,
+            Self::Running => ApiPipelineRolloutState::Running,
+            Self::Succeeded => ApiPipelineRolloutState::Succeeded,
+            Self::Failed => ApiPipelineRolloutState::Failed,
+            Self::RollingBack => ApiPipelineRolloutState::RollingBack,
+            Self::RollbackFailed => ApiPipelineRolloutState::RollbackFailed,
+        }
+    }
+
+    pub(super) const fn is_terminal(self) -> bool {
+        matches!(self, Self::Succeeded | Self::Failed | Self::RollbackFailed)
+    }
+}
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+/// Internal lifecycle for one pipeline shutdown operation.
+pub(super) enum ShutdownLifecycleState {
+    Pending,
+    Running,
+    Succeeded,
+    Failed,
+}
+
+impl ShutdownLifecycleState {
+    const fn as_str(self) -> &'static str {
+        match self {
+            Self::Pending => "pending",
+            Self::Running => "running",
+            Self::Succeeded => "succeeded",
+            Self::Failed => "failed",
+        }
+    }
+
+    pub(super) const fn is_terminal(self) -> bool {
+        matches!(self, Self::Succeeded | Self::Failed)
+    }
+}
+
+#[derive(Debug, Clone)]
+/// Per-core progress row within a rollout operation.
+pub(super) struct RolloutCoreProgress {
+    pub(super) core_id: usize,
+    pub(super) previous_generation: Option<u64>,
+    pub(super) target_generation: u64,
+    pub(super) state: String,
+    pub(super) updated_at: String,
+    pub(super) detail: Option<String>,
+}
+
+#[derive(Debug, Clone)]
+/// In-memory rollout record retained for active and recent terminal lookups.
+pub(super) struct RolloutRecord {
+    pub(super) rollout_id: String,
+    pub(super) pipeline_group_id: PipelineGroupId,
+    pub(super) pipeline_id: PipelineId,
+    pub(super) action: RolloutAction,
+    pub(super) state: RolloutLifecycleState,
+    pub(super) target_generation: u64,
+    pub(super) previous_generation: Option<u64>,
+    /// Drain timeout requested with the rollout, reused for panic cleanup.
+    pub(super) drain_timeout_secs: u64,
+    pub(super) started_at: String,
+    pub(super) updated_at: String,
+    pub(super) failure_reason: Option<String>,
+    pub(super) cores: Vec<RolloutCoreProgress>,
+    pub(super) completed_at: Option<Instant>,
+}
+
+impl RolloutRecord {
+    /// Creates the initial in-memory record for a rollout operation.
+    pub(super) fn new(
+        rollout_id: String,
+        pipeline_group_id: PipelineGroupId,
+        pipeline_id: PipelineId,
+        action: RolloutAction,
+        target_generation: u64,
+        previous_generation: Option<u64>,
+        drain_timeout_secs: u64,
+        cores: Vec<RolloutCoreProgress>,
+    ) -> Self {
+        let now = timestamp_now();
+        Self {
+            rollout_id,
+            pipeline_group_id,
+            pipeline_id,
+            action,
+            state: RolloutLifecycleState::Pending,
+            target_generation,
+            previous_generation,
+            drain_timeout_secs,
+            started_at: now.clone(),
+            updated_at: now,
+            failure_reason: None,
+            cores,
+            completed_at: None,
+        }
+    }
+
+    /// Builds the compact rollout summary exposed through observed state.
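+    ///
+    /// The summary deliberately omits per-core rows; callers needing
+    /// core-level detail use `RolloutRecord::status`. Sketch:
+    ///
+    /// ```ignore
+    /// let summary = rollout.summary();
+    /// assert_eq!(summary.rollout_id, rollout.rollout_id);
+    /// assert_eq!(summary.target_generation, rollout.target_generation);
+    /// ```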
+    pub(super) fn summary(&self) -> PipelineRolloutSummary {
+        PipelineRolloutSummary {
+            rollout_id: self.rollout_id.clone(),
+            state: self.state.as_pipeline_rollout_state(),
+            target_generation: self.target_generation,
+            started_at: self.started_at.clone(),
+            updated_at: self.updated_at.clone(),
+            failure_reason: self.failure_reason.clone(),
+        }
+    }
+
+    /// Builds the admin-facing rollout summary embedded in pipeline details.
+    pub(super) fn api_summary(&self) -> ApiPipelineRolloutSummary {
+        ApiPipelineRolloutSummary {
+            rollout_id: self.rollout_id.clone(),
+            state: self.state.as_api_pipeline_rollout_state(),
+            target_generation: self.target_generation,
+            started_at: self.started_at.clone(),
+            updated_at: self.updated_at.clone(),
+            failure_reason: self.failure_reason.clone(),
+        }
+    }
+
+    /// Materializes the full rollout status returned by the control plane.
+    pub(super) fn status(&self) -> RolloutStatus {
+        RolloutStatus {
+            rollout_id: self.rollout_id.clone(),
+            pipeline_group_id: self.pipeline_group_id.clone(),
+            pipeline_id: self.pipeline_id.clone(),
+            action: self.action.as_str().to_owned(),
+            state: self.state.as_api_pipeline_rollout_state(),
+            target_generation: self.target_generation,
+            previous_generation: self.previous_generation,
+            started_at: self.started_at.clone(),
+            updated_at: self.updated_at.clone(),
+            failure_reason: self.failure_reason.clone(),
+            cores: self
+                .cores
+                .iter()
+                .map(|core| RolloutCoreStatus {
+                    core_id: core.core_id,
+                    previous_generation: core.previous_generation,
+                    target_generation: core.target_generation,
+                    state: core.state.clone(),
+                    updated_at: core.updated_at.clone(),
+                    detail: core.detail.clone(),
+                })
+                .collect(),
+        }
+    }
+}
+
+#[derive(Debug, Clone)]
+/// Per-instance progress row within a shutdown operation.
+pub(super) struct ShutdownCoreProgress {
+    pub(super) core_id: usize,
+    pub(super) deployment_generation: u64,
+    pub(super) state: String,
+    pub(super) updated_at: String,
+    pub(super) detail: Option<String>,
+}
+
+#[derive(Debug, Clone)]
+/// In-memory shutdown record retained for active and recent terminal lookups.
+pub(super) struct ShutdownRecord {
+    pub(super) shutdown_id: String,
+    pub(super) pipeline_group_id: PipelineGroupId,
+    pub(super) pipeline_id: PipelineId,
+    pub(super) state: ShutdownLifecycleState,
+    pub(super) started_at: String,
+    pub(super) updated_at: String,
+    pub(super) failure_reason: Option<String>,
+    pub(super) cores: Vec<ShutdownCoreProgress>,
+    pub(super) completed_at: Option<Instant>,
+}
+
+impl ShutdownRecord {
+    /// Creates the initial in-memory record for a pipeline shutdown operation.
+    pub(super) fn new(
+        shutdown_id: String,
+        pipeline_group_id: PipelineGroupId,
+        pipeline_id: PipelineId,
+        cores: Vec<ShutdownCoreProgress>,
+    ) -> Self {
+        let now = timestamp_now();
+        Self {
+            shutdown_id,
+            pipeline_group_id,
+            pipeline_id,
+            state: ShutdownLifecycleState::Pending,
+            started_at: now.clone(),
+            updated_at: now,
+            failure_reason: None,
+            cores,
+            completed_at: None,
+        }
+    }
+
+    /// Materializes the full shutdown status returned by the control plane.
+    pub(super) fn status(&self) -> ShutdownStatus {
+        ShutdownStatus {
+            shutdown_id: self.shutdown_id.clone(),
+            pipeline_group_id: self.pipeline_group_id.clone(),
+            pipeline_id: self.pipeline_id.clone(),
+            state: self.state.as_str().to_owned(),
+            started_at: self.started_at.clone(),
+            updated_at: self.updated_at.clone(),
+            failure_reason: self.failure_reason.clone(),
+            cores: self
+                .cores
+                .iter()
+                .map(|core| ShutdownCoreStatus {
+                    core_id: core.core_id,
+                    deployment_generation: core.deployment_generation,
+                    state: core.state.clone(),
+                    updated_at: core.updated_at.clone(),
+                    detail: core.detail.clone(),
+                })
+                .collect(),
+        }
+    }
+}
+
+/// Controller-owned record for one deployed runtime instance.
+pub(super) struct RuntimeInstanceRecord {
+    // The controller drops this sender once shutdown is requested so the
+    // pipeline control loop can observe channel closure after node tasks exit.
+    pub(super) control_sender: Option<Arc<dyn PipelineAdminSender>>,
+    pub(super) lifecycle: RuntimeInstanceLifecycle,
+}
+
+#[derive(Debug, Clone)]
+/// Runtime-instance liveness as understood by the controller.
+pub(super) enum RuntimeInstanceLifecycle {
+    /// The pipeline thread is still expected to be running.
+    Active,
+    /// The pipeline thread reported a terminal exit.
+    Exited(RuntimeInstanceExit),
+}
+
+#[derive(Debug, Clone)]
+/// Terminal result reported by a deployed pipeline runtime thread.
+pub(crate) enum RuntimeInstanceExit {
+    /// The runtime exited normally after drain/shutdown.
+    Success,
+    /// The runtime exited due to a pipeline error or panic.
+    Error(RuntimeInstanceError),
+}
+
+#[derive(Debug, Clone)]
+/// Committed logical pipeline config plus the active deployment generation.
+pub(super) struct LogicalPipelineRecord {
+    pub(super) resolved: ResolvedPipelineConfig,
+    pub(super) active_generation: u64,
+}
+
+#[derive(Debug, Clone, PartialEq, Eq)]
+/// Topic runtime properties that cannot be mutated by live rollout.
+pub(super) struct TopicRuntimeProfile {
+    pub(super) backend: TopicBackendKind,
+    pub(super) policies: otap_df_config::topic::TopicPolicies,
+    pub(super) selected_mode: InferredTopicMode,
+}
+
+/// Complete mutable state protected by `ControllerRuntime::state`.
+///
+/// Keep this type as plain data: methods that enforce lifecycle invariants
+/// should live on `ControllerRuntime` so mutations can update observed state
+/// and wake condition variables consistently.
+pub(super) struct ControllerRuntimeState {
+    /// Latest accepted full engine config, including committed live changes.
+    pub(super) live_config: OtelDataflowSpec,
+    /// Committed logical pipelines keyed by group/pipeline id.
+    pub(super) logical_pipelines: HashMap<PipelineKey, LogicalPipelineRecord>,
+    /// Deployed runtime instances keyed by group/pipeline/core/generation.
+    pub(super) runtime_instances: HashMap<DeployedPipelineKey, RuntimeInstanceRecord>,
+    // A pipeline thread can finish before register_launched_instance() publishes it as Active.
+    // We park that exit here and reconcile it during registration instead of leaving stale
+    // liveness behind.
+    pub(super) pending_instance_exits: HashMap<DeployedPipelineKey, RuntimeInstanceExit>,
+    /// Rollout snapshots retained for active and recent terminal lookups.
+    pub(super) rollouts: HashMap<String, RolloutRecord>,
+    /// Active rollout id per logical pipeline; presence causes operation conflict.
+    pub(super) active_rollouts: HashMap<PipelineKey, String>,
+    /// FIFO terminal rollout ids per logical pipeline for cap/TTL eviction.
+    pub(super) terminal_rollouts: HashMap<PipelineKey, VecDeque<String>>,
+    /// Shutdown snapshots retained for active and recent terminal lookups.
+    pub(super) shutdowns: HashMap<String, ShutdownRecord>,
+    /// Active shutdown id per logical pipeline; presence causes operation conflict.
+    pub(super) active_shutdowns: HashMap<PipelineKey, String>,
+    /// FIFO terminal shutdown ids per logical pipeline for cap/TTL eviction.
+    pub(super) terminal_shutdowns: HashMap<PipelineKey, VecDeque<String>>,
+    /// Next deployment generation to assign for each logical pipeline.
+    pub(super) generation_counters: HashMap<PipelineKey, u64>,
+    /// Count of runtime instances still considered active by the controller.
+    pub(super) active_instances: usize,
+    /// Monotonic rollout id suffix.
+    pub(super) next_rollout_id: u64,
+    /// Monotonic shutdown id suffix.
+    pub(super) next_shutdown_id: u64,
+    /// Monotonic logical runtime-thread id used for diagnostics.
+    pub(super) next_thread_id: usize,
+    /// First runtime failure surfaced to global controller shutdown handling.
+    pub(super) first_error: Option<String>,
+}
+
+#[derive(Debug)]
+/// Fully validated rollout plan ready for background execution.
+///
+/// The planner precomputes generation ids, target core sets, resize deltas,
+/// operation records, and timeouts so the worker can execute without
+/// reinterpreting the admin request.
+pub(super) struct CandidateRolloutPlan {
+    /// Logical pipeline targeted by the rollout.
+    pub(super) pipeline_key: PipelineKey,
+    pub(super) pipeline_group_id: PipelineGroupId,
+    pub(super) pipeline_id: PipelineId,
+    /// Execution strategy selected by request classification.
+    pub(super) action: RolloutAction,
+    /// Resolved target pipeline config after applying the request.
+    pub(super) resolved_pipeline: ResolvedPipelineConfig,
+    /// Current committed record, absent for create rollouts.
+    pub(super) current_record: Option<LogicalPipelineRecord>,
+    /// Core allocation from the committed record.
+    pub(super) current_assigned_cores: Vec<usize>,
+    /// Core allocation requested by the candidate config.
+    pub(super) target_assigned_cores: Vec<usize>,
+    /// Cores present in both current and target assignments.
+    pub(super) common_assigned_cores: Vec<usize>,
+    /// Cores present only in the target assignment.
+    pub(super) added_assigned_cores: Vec<usize>,
+    /// Cores present only in the current assignment.
+    pub(super) removed_assigned_cores: Vec<usize>,
+    /// Cores to launch for resize-only rollouts.
+    pub(super) resize_start_cores: Vec<usize>,
+    /// Cores to drain for resize-only rollouts.
+    pub(super) resize_stop_cores: Vec<usize>,
+    /// Deployment generation assigned to the target runtime instances.
+    pub(super) target_generation: u64,
+    /// Initial rollout status record to insert before spawning a worker.
+    pub(super) rollout: RolloutRecord,
+    /// Per-step readiness timeout in seconds.
+    pub(super) step_timeout_secs: u64,
+    /// Drain timeout in seconds for old instances.
+    pub(super) drain_timeout_secs: u64,
+}
+
+#[derive(Debug)]
+/// Fully validated shutdown plan ready for background execution.
+pub(super) struct CandidateShutdownPlan {
+    /// Logical pipeline targeted by the shutdown.
+    pub(super) pipeline_key: PipelineKey,
+    /// Initial shutdown status record to insert before spawning a worker.
+    pub(super) shutdown: ShutdownRecord,
+    /// Active deployed instances that must exit for shutdown success.
+    pub(super) target_instances: Vec<DeployedPipelineKey>,
+    /// Per-instance shutdown timeout in seconds.
+    pub(super) timeout_secs: u64,
+}
+
+/// Snapshot of active cores for the current committed generation.
+pub(super) struct ActiveRuntimeCoreState {
+    /// Active cores still running the committed generation.
+    pub(super) current_generation_cores: Vec<usize>,
+    /// Whether another active generation exists for the same logical pipeline.
+ pub(super) has_foreign_active_generations: bool, +} + +/// Returns a fresh RFC3339 timestamp for externally visible status updates. +pub(super) fn timestamp_now() -> String { + Utc::now().to_rfc3339() +} + +/// Returns whether a terminal operation snapshot has exceeded retention TTL. +pub(super) fn is_expired(completed_at: Option, now: Instant) -> bool { + completed_at + .and_then(|completed_at| now.checked_duration_since(completed_at)) + .is_some_and(|age| age >= TERMINAL_OPERATION_RETENTION_TTL) +} + +#[derive(Debug)] +/// Rollout worker failure category used to distinguish rollback failures. +pub(super) enum RolloutExecutionError { + /// The rollout failed before or outside rollback handling. + Failed(String), + /// Rollback was attempted but did not restore the previous runtime shape. + RollbackFailed(String), +} diff --git a/rust/otap-dataflow/crates/controller/src/live_control_tests.rs b/rust/otap-dataflow/crates/controller/src/live_control_tests.rs new file mode 100644 index 0000000000..4955e9b794 --- /dev/null +++ b/rust/otap-dataflow/crates/controller/src/live_control_tests.rs @@ -0,0 +1,2707 @@ +// Copyright The OpenTelemetry Authors +// SPDX-License-Identifier: Apache-2.0 + +use super::*; +use otap_df_config::engine::ResolvedPipelineRole; +use otap_df_config::observed_state::ObservedStateSettings; +use otap_df_config::settings::telemetry::logs::LogLevel; +use otap_df_engine::ExporterFactory; +use otap_df_engine::ReceiverFactory; +use otap_df_engine::config::{ExporterConfig, ReceiverConfig}; +use otap_df_engine::control::{ + RuntimeControlMsg, RuntimeCtrlMsgReceiver, runtime_ctrl_msg_channel, +}; +use otap_df_engine::error::Error as EngineError; +use otap_df_engine::exporter::ExporterWrapper; +use otap_df_engine::receiver::ReceiverWrapper; +use otap_df_engine::wiring_contract::WiringContract; +use otap_df_state::pipeline_status::PipelineStatus; +use otap_df_telemetry::TracingSetup; +use otap_df_telemetry::event::EngineEvent; +use otap_df_telemetry::tracing_init::ProviderSetup; +use tokio_util::sync::CancellationToken; + +fn available_core_ids() -> Vec { + vec![ + CoreId { id: 0 }, + CoreId { id: 1 }, + CoreId { id: 2 }, + CoreId { id: 3 }, + CoreId { id: 4 }, + CoreId { id: 5 }, + CoreId { id: 6 }, + CoreId { id: 7 }, + ] +} + +fn test_validate_config(_config: &serde_json::Value) -> Result<(), otap_df_config::error::Error> { + Ok(()) +} + +fn test_receiver_create( + _pipeline_ctx: PipelineContext, + _node: otap_df_engine::node::NodeId, + _node_config: Arc, + _receiver_config: &ReceiverConfig, +) -> Result, otap_df_config::error::Error> { + panic!("test receiver factory should not be constructed") +} + +fn test_exporter_create( + _pipeline_ctx: PipelineContext, + _node: otap_df_engine::node::NodeId, + _node_config: Arc, + _exporter_config: &ExporterConfig, +) -> Result, otap_df_config::error::Error> { + panic!("test exporter factory should not be constructed") +} + +static TEST_RECEIVER_FACTORIES: &[ReceiverFactory<()>] = &[ + ReceiverFactory { + name: "urn:test:receiver:example", + create: test_receiver_create, + wiring_contract: WiringContract::UNRESTRICTED, + validate_config: test_validate_config, + }, + ReceiverFactory { + name: "urn:otel:receiver:topic", + create: test_receiver_create, + wiring_contract: WiringContract::UNRESTRICTED, + validate_config: test_validate_config, + }, +]; + +static TEST_EXPORTER_FACTORIES: &[ExporterFactory<()>] = &[ + ExporterFactory { + name: "urn:test:exporter:example", + create: test_exporter_create, + wiring_contract: 
WiringContract::UNRESTRICTED, + validate_config: test_validate_config, + }, + ExporterFactory { + name: "urn:otel:exporter:topic", + create: test_exporter_create, + wiring_contract: WiringContract::UNRESTRICTED, + validate_config: test_validate_config, + }, +]; + +static TEST_PIPELINE_FACTORY: PipelineFactory<()> = + PipelineFactory::new(TEST_RECEIVER_FACTORIES, &[], TEST_EXPORTER_FACTORIES, &[]); + +fn test_runtime(config: &OtelDataflowSpec) -> Arc> { + let registry = TelemetryRegistryHandle::new(); + let observed_state_store = + ObservedStateStore::new(&ObservedStateSettings::default(), registry.clone()); + let observed_state_handle = observed_state_store.handle(); + let engine_event_reporter = observed_state_store.reporter(Default::default()); + let (_metrics_rx, metrics_reporter) = MetricsReporter::create_new_and_receiver(8); + let declared_topics = + Controller::<()>::declare_topics(config).expect("declared topics should be valid"); + let (memory_pressure_tx, _memory_pressure_rx) = + tokio::sync::watch::channel(MemoryPressureChanged::initial()); + + Arc::new(ControllerRuntime::new( + &TEST_PIPELINE_FACTORY, + ControllerContext::new(registry), + observed_state_store, + observed_state_handle, + engine_event_reporter, + metrics_reporter, + declared_topics, + available_core_ids(), + TracingSetup::new(ProviderSetup::Noop, LogLevel::default(), engine_context), + Duration::from_secs(1), + memory_pressure_tx, + config.clone(), + )) +} + +struct ObservedStateRunner { + cancel: CancellationToken, + join: Option>, +} + +impl ObservedStateRunner { + fn start(runtime: &ControllerRuntime<()>) -> Self { + let cancel = CancellationToken::new(); + let store = runtime.observed_state_store.clone(); + let cancel_clone = cancel.clone(); + let join = thread::spawn(move || { + let runtime = tokio::runtime::Builder::new_current_thread() + .enable_all() + .build() + .expect("observed-state test runtime should build"); + runtime + .block_on(store.run(cancel_clone)) + .expect("observed-state consumer should exit cleanly"); + }); + Self { + cancel, + join: Some(join), + } + } +} + +impl Drop for ObservedStateRunner { + fn drop(&mut self) { + self.cancel.cancel(); + if let Some(join) = self.join.take() { + join.join() + .expect("observed-state consumer thread should join cleanly"); + } + } +} + +fn deployed_key( + pipeline_group_id: &str, + pipeline_id: &str, + core_id: usize, + generation: u64, +) -> DeployedPipelineKey { + DeployedPipelineKey { + pipeline_group_id: pipeline_group_id.to_owned().into(), + pipeline_id: pipeline_id.to_owned().into(), + core_id, + deployment_generation: generation, + } +} + +fn report_ready(runtime: &ControllerRuntime<()>, key: DeployedPipelineKey) { + runtime + .engine_event_reporter + .report(EngineEvent::admitted(key.clone(), None)); + runtime + .engine_event_reporter + .report(EngineEvent::ready(key, None)); +} + +fn report_stopped(runtime: &ControllerRuntime<()>, key: DeployedPipelineKey) { + runtime + .engine_event_reporter + .report(EngineEvent::admitted(key.clone(), None)); + runtime + .engine_event_reporter + .report(EngineEvent::ready(key.clone(), None)); + runtime + .engine_event_reporter + .report(EngineEvent::shutdown_requested(key.clone(), None)); + runtime + .engine_event_reporter + .report(EngineEvent::drained(key, None)); +} + +fn wait_for_observed_status( + runtime: &ControllerRuntime<()>, + pipeline_key: &PipelineKey, + predicate: F, +) -> PipelineStatus +where + F: Fn(&PipelineStatus) -> bool, +{ + let deadline = Instant::now() + Duration::from_secs(5); + loop { 
+ if let Some(status) = runtime.observed_state_handle.pipeline_status(pipeline_key) { + if predicate(&status) { + return status; + } + } + assert!( + Instant::now() < deadline, + "timed out waiting for observed status predicate on {}:{}", + pipeline_key.pipeline_group_id(), + pipeline_key.pipeline_id() + ); + thread::sleep(Duration::from_millis(25)); + } +} + +fn engine_config_with_pipeline(pipeline_yaml: &str) -> OtelDataflowSpec { + OtelDataflowSpec::from_yaml(&format!( + r#" +version: otel_dataflow/v1 +groups: + g1: + pipelines: + p1: +{pipeline_yaml} +"# + )) + .expect("engine config should parse") +} + +fn simple_pipeline_yaml() -> &'static str { + r#" + nodes: + receiver: + type: "urn:test:receiver:example" + config: null + exporter: + type: "urn:test:exporter:example" + config: null + connections: + - from: receiver + to: exporter +"# +} + +fn register_existing_pipeline(runtime: &ControllerRuntime<()>, config: &OtelDataflowSpec) { + register_pipeline(runtime, config, "g1", "p1"); +} + +fn register_pipeline( + runtime: &ControllerRuntime<()>, + config: &OtelDataflowSpec, + group_id: &str, + pipeline_id: &str, +) { + let resolved = config + .resolve() + .pipelines + .into_iter() + .find(|pipeline| { + pipeline.role == ResolvedPipelineRole::Regular + && pipeline.pipeline_group_id.as_ref() == group_id + && pipeline.pipeline_id.as_ref() == pipeline_id + }) + .expect("resolved pipeline should exist"); + runtime.register_committed_pipeline(resolved, 0); +} + +fn register_runtime_instance( + runtime: &ControllerRuntime<()>, + pipeline_group_id: &str, + pipeline_id: &str, + core_id: usize, + generation: u64, + lifecycle: RuntimeInstanceLifecycle, +) -> RuntimeCtrlMsgReceiver<()> { + let (tx, rx) = runtime_ctrl_msg_channel::<()>(4); + let control_sender: Arc = Arc::new(tx.clone()); + let is_active = matches!(&lifecycle, RuntimeInstanceLifecycle::Active); + let mut state = runtime + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + _ = state.runtime_instances.insert( + DeployedPipelineKey { + pipeline_group_id: pipeline_group_id.to_owned().into(), + pipeline_id: pipeline_id.to_owned().into(), + core_id, + deployment_generation: generation, + }, + RuntimeInstanceRecord { + control_sender: Some(control_sender), + lifecycle, + }, + ); + if is_active { + state.active_instances += 1; + } + rx +} + +fn register_runtime_instance_with_sender( + runtime: &ControllerRuntime<()>, + pipeline_key: DeployedPipelineKey, + control_sender: Arc, + lifecycle: RuntimeInstanceLifecycle, +) { + let is_active = matches!(&lifecycle, RuntimeInstanceLifecycle::Active); + let mut state = runtime + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + _ = state.runtime_instances.insert( + pipeline_key, + RuntimeInstanceRecord { + control_sender: Some(control_sender), + lifecycle, + }, + ); + if is_active { + state.active_instances += 1; + } +} + +struct RecordingPipelineAdminSender { + calls: Arc>>, + failure: Option, +} + +impl PipelineAdminSender for RecordingPipelineAdminSender { + fn try_send_shutdown(&self, _deadline: Instant, reason: String) -> Result<(), EngineError> { + self.calls + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()) + .push(reason); + if let Some(failure) = &self.failure { + Err(EngineError::RuntimeMsgError { + error: failure.clone(), + }) + } else { + Ok(()) + } + } +} + +fn recording_admin_sender( + failure: Option<&str>, +) -> (Arc, Arc>>) { + let calls = Arc::new(Mutex::new(Vec::new())); + let sender = Arc::new(RecordingPipelineAdminSender { + 
+        calls: Arc::clone(&calls),
+        failure: failure.map(ToOwned::to_owned),
+    });
+    (sender, calls)
+}
+
+fn launched_runtime_instance(
+    pipeline_group_id: &str,
+    pipeline_id: &str,
+    core_id: usize,
+    generation: u64,
+) -> LaunchedPipelineThread<()> {
+    let (tx, _rx) = runtime_ctrl_msg_channel::<()>(4);
+    let control_sender: Arc<dyn PipelineAdminSender> = Arc::new(tx);
+    LaunchedPipelineThread {
+        pipeline_key: DeployedPipelineKey {
+            pipeline_group_id: pipeline_group_id.to_owned().into(),
+            pipeline_id: pipeline_id.to_owned().into(),
+            core_id,
+            deployment_generation: generation,
+        },
+        control_sender,
+        _marker: std::marker::PhantomData,
+    }
+}
+
+fn wait_for_shutdown_state(
+    runtime: &ControllerRuntime<()>,
+    shutdown_id: &str,
+    expected_state: &str,
+) -> ShutdownStatus {
+    let deadline = Instant::now() + Duration::from_secs(5);
+    loop {
+        let status = runtime
+            .shutdown_status_snapshot(shutdown_id)
+            .expect("shutdown should exist");
+        if status.state == expected_state {
+            return status;
+        }
+        assert!(
+            Instant::now() < deadline,
+            "timed out waiting for shutdown {shutdown_id} to reach state {expected_state}, current state: {}",
+            status.state
+        );
+        thread::sleep(Duration::from_millis(25));
+    }
+}
+
+fn wait_for_shutdown_message(receiver: &mut RuntimeCtrlMsgReceiver<()>) -> RuntimeControlMsg<()> {
+    let deadline = Instant::now() + Duration::from_secs(2);
+    loop {
+        if let Ok(message) = receiver.try_recv() {
+            return message;
+        }
+        assert!(
+            Instant::now() < deadline,
+            "timed out waiting for shutdown control message"
+        );
+        thread::sleep(Duration::from_millis(25));
+    }
+}
+
+fn complete_instance_exit_on_shutdown(
+    runtime: Arc<ControllerRuntime<()>>,
+    mut receiver: RuntimeCtrlMsgReceiver<()>,
+    deployed_key: DeployedPipelineKey,
+    expected_reason: &'static str,
+) -> thread::JoinHandle<()> {
+    thread::spawn(move || {
+        assert!(matches!(
+            wait_for_shutdown_message(&mut receiver),
+            RuntimeControlMsg::Shutdown { reason, .. } if reason == expected_reason
+        ));
+        runtime.note_instance_exit(deployed_key, RuntimeInstanceExit::Success);
+    })
+}
+
+fn terminal_rollout_record(
+    pipeline_group_id: &str,
+    pipeline_id: &str,
+    rollout_id: &str,
+) -> RolloutRecord {
+    let mut rollout = RolloutRecord::new(
+        rollout_id.to_owned(),
+        pipeline_group_id.to_owned().into(),
+        pipeline_id.to_owned().into(),
+        RolloutAction::Replace,
+        1,
+        Some(0),
+        60,
+        Vec::new(),
+    );
+    rollout.state = RolloutLifecycleState::Succeeded;
+    rollout
+}
+
+fn terminal_shutdown_record(
+    pipeline_group_id: &str,
+    pipeline_id: &str,
+    shutdown_id: &str,
+) -> ShutdownRecord {
+    let mut shutdown = ShutdownRecord::new(
+        shutdown_id.to_owned(),
+        pipeline_group_id.to_owned().into(),
+        pipeline_id.to_owned().into(),
+        Vec::new(),
+    );
+    shutdown.state = ShutdownLifecycleState::Succeeded;
+    shutdown
+}
+
+/// Scenario: a reconfigure request changes only the effective core
+/// allocation from one assigned core to two.
+/// Guarantees: rollout planning classifies the change as a resize, starts
+/// only the added core, and keeps the current generation unchanged.
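+///
+/// A resize keeps `target_generation` unchanged because the committed graph
+/// itself is untouched; compare the replace-path test below, which bumps the
+/// generation to 1. The planner reports the core delta roughly as:
+///
+/// ```text
+/// current [0] -> target [0, 1]  =>  resize_start_cores = [1]
+/// ```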
+#[test]
+fn prepare_rollout_plan_accepts_core_allocation_scale_up() {
+    let config = engine_config_with_pipeline(
+        r#"
+        policies:
+          resources:
+            core_allocation:
+              type: core_count
+              count: 1
+        nodes:
+          receiver:
+            type: "urn:test:receiver:example"
+            config: null
+          exporter:
+            type: "urn:test:exporter:example"
+            config: null
+        connections:
+          - from: receiver
+            to: exporter
+"#,
+    );
+    let runtime = test_runtime(&config);
+    register_existing_pipeline(&runtime, &config);
+    let _receiver =
+        register_runtime_instance(&runtime, "g1", "p1", 0, 0, RuntimeInstanceLifecycle::Active);
+
+    let replacement = PipelineConfig::from_yaml(
+        "g1".into(),
+        "p1".into(),
+        r#"
+policies:
+  resources:
+    core_allocation:
+      type: core_count
+      count: 2
+nodes:
+  receiver:
+    type: "urn:test:receiver:example"
+    config: null
+  exporter:
+    type: "urn:test:exporter:example"
+    config: null
+connections:
+  - from: receiver
+    to: exporter
+"#,
+    )
+    .expect("replacement should parse");
+
+    let plan = runtime
+        .prepare_rollout_plan(
+            "g1",
+            "p1",
+            &ReconfigureRequest {
+                pipeline: replacement,
+                step_timeout_secs: 60,
+                drain_timeout_secs: 60,
+            },
+        )
+        .expect("core allocation changes should be planned");
+
+    assert_eq!(plan.action, RolloutAction::Resize);
+    assert_eq!(plan.current_assigned_cores, vec![0]);
+    assert_eq!(plan.target_assigned_cores, vec![0, 1]);
+    assert_eq!(plan.common_assigned_cores, vec![0]);
+    assert_eq!(plan.added_assigned_cores, vec![1]);
+    assert!(plan.removed_assigned_cores.is_empty());
+    assert_eq!(plan.resize_start_cores, vec![1]);
+    assert!(plan.resize_stop_cores.is_empty());
+    assert_eq!(plan.target_generation, 0);
+    assert_eq!(
+        plan.rollout
+            .cores
+            .iter()
+            .map(|core| core.core_id)
+            .collect::<Vec<_>>(),
+        vec![1]
+    );
+}
+
+/// Scenario: a reconfigure request changes only the effective core
+/// allocation from two assigned cores to one.
+/// Guarantees: rollout planning classifies the change as a resize, stops
+/// only the removed core, and keeps the current generation unchanged.
+#[test]
+fn prepare_rollout_plan_accepts_core_allocation_scale_down() {
+    let config = engine_config_with_pipeline(
+        r#"
+        policies:
+          resources:
+            core_allocation:
+              type: core_count
+              count: 2
+        nodes:
+          receiver:
+            type: "urn:test:receiver:example"
+            config: null
+          exporter:
+            type: "urn:test:exporter:example"
+            config: null
+        connections:
+          - from: receiver
+            to: exporter
+"#,
+    );
+    let runtime = test_runtime(&config);
+    register_existing_pipeline(&runtime, &config);
+    let _receiver0 =
+        register_runtime_instance(&runtime, "g1", "p1", 0, 0, RuntimeInstanceLifecycle::Active);
+    let _receiver1 =
+        register_runtime_instance(&runtime, "g1", "p1", 1, 0, RuntimeInstanceLifecycle::Active);
+
+    let replacement = PipelineConfig::from_yaml(
+        "g1".into(),
+        "p1".into(),
+        r#"
+policies:
+  resources:
+    core_allocation:
+      type: core_count
+      count: 1
+nodes:
+  receiver:
+    type: "urn:test:receiver:example"
+    config: null
+  exporter:
+    type: "urn:test:exporter:example"
+    config: null
+connections:
+  - from: receiver
+    to: exporter
+"#,
+    )
+    .expect("replacement should parse");
+
+    let plan = runtime
+        .prepare_rollout_plan(
+            "g1",
+            "p1",
+            &ReconfigureRequest {
+                pipeline: replacement,
+                step_timeout_secs: 60,
+                drain_timeout_secs: 60,
+            },
+        )
+        .expect("core allocation changes should be planned");
+
+    assert_eq!(plan.action, RolloutAction::Resize);
+    assert_eq!(plan.current_assigned_cores, vec![0, 1]);
+    assert_eq!(plan.target_assigned_cores, vec![0]);
+    assert_eq!(plan.common_assigned_cores, vec![0]);
+    assert!(plan.added_assigned_cores.is_empty());
+    assert_eq!(plan.removed_assigned_cores, vec![1]);
+    assert!(plan.resize_start_cores.is_empty());
+    assert_eq!(plan.resize_stop_cores, vec![1]);
+    assert_eq!(plan.target_generation, 0);
+    assert_eq!(
+        plan.rollout
+            .cores
+            .iter()
+            .map(|core| core.core_id)
+            .collect::<Vec<_>>(),
+        vec![1]
+    );
+}
+
+/// Scenario: the submitted pipeline config is effectively identical to the
+/// committed active pipeline and serving footprint.
+/// Guarantees: rollout planning short-circuits to `NoOp` rather than
+/// scheduling a replace or resize operation.
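+///
+/// Planning identical submissions as `NoOp` makes reconfigure idempotent at
+/// the planning layer: callers receive a succeeded `noop` outcome (see
+/// `spawn_rollout_returns_immediate_success_for_noop` below) without any
+/// runtime instance being started or stopped.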
+#[test]
+fn prepare_rollout_plan_returns_noop_for_identical_active_pipeline() {
+    let config = engine_config_with_pipeline(
+        r#"
+        policies:
+          resources:
+            core_allocation:
+              type: core_count
+              count: 1
+        nodes:
+          receiver:
+            type: "urn:test:receiver:example"
+            config: null
+          exporter:
+            type: "urn:test:exporter:example"
+            config: null
+        connections:
+          - from: receiver
+            to: exporter
+"#,
+    );
+    let runtime = test_runtime(&config);
+    register_existing_pipeline(&runtime, &config);
+    let _receiver =
+        register_runtime_instance(&runtime, "g1", "p1", 0, 0, RuntimeInstanceLifecycle::Active);
+
+    let replacement = PipelineConfig::from_yaml(
+        "g1".into(),
+        "p1".into(),
+        r#"
+policies:
+  resources:
+    core_allocation:
+      type: core_count
+      count: 1
+nodes:
+  receiver:
+    type: "urn:test:receiver:example"
+    config: null
+  exporter:
+    type: "urn:test:exporter:example"
+    config: null
+connections:
+  - from: receiver
+    to: exporter
+"#,
+    )
+    .expect("replacement should parse");
+
+    let plan = runtime
+        .prepare_rollout_plan(
+            "g1",
+            "p1",
+            &ReconfigureRequest {
+                pipeline: replacement,
+                step_timeout_secs: 60,
+                drain_timeout_secs: 60,
+            },
+        )
+        .expect("identical updates should be planned");
+
+    assert_eq!(plan.action, RolloutAction::NoOp);
+    assert_eq!(plan.target_generation, 0);
+    assert!(plan.rollout.cores.is_empty());
+    assert!(plan.resize_start_cores.is_empty());
+    assert!(plan.resize_stop_cores.is_empty());
+}
+
+/// Scenario: the controller executes a rollout plan that has already been
+/// classified as `NoOp`.
+/// Guarantees: the controller returns an immediate successful rollout
+/// snapshot, preserves the committed generation, and leaves no in-flight
+/// rollout summary behind.
+#[test]
+fn spawn_rollout_returns_immediate_success_for_noop() {
+    let config = engine_config_with_pipeline(
+        r#"
+        policies:
+          resources:
+            core_allocation:
+              type: core_count
+              count: 1
+        nodes:
+          receiver:
+            type: "urn:test:receiver:example"
+            config: null
+          exporter:
+            type: "urn:test:exporter:example"
+            config: null
+        connections:
+          - from: receiver
+            to: exporter
+"#,
+    );
+    let runtime = test_runtime(&config);
+    register_existing_pipeline(&runtime, &config);
+    let _receiver =
+        register_runtime_instance(&runtime, "g1", "p1", 0, 0, RuntimeInstanceLifecycle::Active);
+
+    let replacement = PipelineConfig::from_yaml(
+        "g1".into(),
+        "p1".into(),
+        r#"
+policies:
+  resources:
+    core_allocation:
+      type: core_count
+      count: 1
+nodes:
+  receiver:
+    type: "urn:test:receiver:example"
+    config: null
+  exporter:
+    type: "urn:test:exporter:example"
+    config: null
+connections:
+  - from: receiver
+    to: exporter
+"#,
+    )
+    .expect("replacement should parse");
+
+    let plan = runtime
+        .prepare_rollout_plan(
+            "g1",
+            "p1",
+            &ReconfigureRequest {
+                pipeline: replacement,
+                step_timeout_secs: 60,
+                drain_timeout_secs: 60,
+            },
+        )
+        .expect("identical updates should be planned");
+
+    let status = runtime
+        .spawn_rollout(plan)
+        .expect("noop rollout should succeed");
+
+    assert_eq!(status.action, "noop");
+    assert_eq!(status.state, ApiPipelineRolloutState::Succeeded);
+    assert_eq!(status.target_generation, 0);
+    assert!(status.cores.is_empty());
+
+    let pipeline_key = PipelineKey::new("g1".into(), "p1".into());
+    let details = runtime
+        .pipeline_details_snapshot(&pipeline_key)
+        .expect("group should exist")
+        .expect("pipeline should exist");
+    assert_eq!(details.active_generation, Some(0));
+    assert!(details.rollout.is_none());
+
+    let rollout = runtime
+        .rollout_status_snapshot(&status.rollout_id)
.expect("completed rollout should remain queryable"); + assert_eq!(rollout.state, ApiPipelineRolloutState::Succeeded); +} + +/// Scenario: a reconfigure request changes the runtime graph shape while +/// also changing the resource footprint. +/// Guarantees: planning keeps the safer replace path instead of collapsing +/// the update into a resource-only resize. +#[test] +fn prepare_rollout_plan_keeps_replace_when_runtime_shape_changes() { + let config = engine_config_with_pipeline( + r#" + policies: + resources: + core_allocation: + type: core_count + count: 1 + nodes: + receiver: + type: "urn:test:receiver:example" + config: null + exporter: + type: "urn:test:exporter:example" + config: null + connections: + - from: receiver + to: exporter +"#, + ); + let runtime = test_runtime(&config); + register_existing_pipeline(&runtime, &config); + let _receiver = + register_runtime_instance(&runtime, "g1", "p1", 0, 0, RuntimeInstanceLifecycle::Active); + + let replacement = PipelineConfig::from_yaml( + "g1".into(), + "p1".into(), + r#" +policies: + resources: + core_allocation: + type: core_count + count: 2 +nodes: + input: + type: "urn:test:receiver:example" + config: null + output: + type: "urn:test:exporter:example" + config: null +connections: + - from: input + to: output +"#, + ) + .expect("replacement should parse"); + + let plan = runtime + .prepare_rollout_plan( + "g1", + "p1", + &ReconfigureRequest { + pipeline: replacement, + step_timeout_secs: 60, + drain_timeout_secs: 60, + }, + ) + .expect("runtime shape changes should still be planned"); + + assert_eq!(plan.action, RolloutAction::Replace); + assert_eq!(plan.target_generation, 1); + assert_eq!(plan.common_assigned_cores, vec![0]); + assert_eq!(plan.added_assigned_cores, vec![1]); + assert!(plan.resize_start_cores.is_empty()); + assert!(plan.resize_stop_cores.is_empty()); + assert_eq!( + plan.rollout + .cores + .iter() + .map(|core| core.core_id) + .collect::>(), + vec![0, 1] + ); +} + +/// Scenario: a reconfigure request would require a runtime topic-broker +/// mutation for an existing logical pipeline. +/// Guarantees: planning rejects the request before rollout starts and +/// surfaces an invalid-request error to the caller. 
+#[test]
+fn prepare_rollout_plan_rejects_topic_runtime_mutation() {
+    let config = OtelDataflowSpec::from_yaml(
+        r#"
+version: otel_dataflow/v1
+topics:
+  shared: {}
+groups:
+  g1:
+    pipelines:
+      p1:
+        policies:
+          resources:
+            core_allocation:
+              type: core_count
+              count: 1
+        nodes:
+          receiver:
+            type: "urn:test:receiver:example"
+            config: null
+          to_topic:
+            type: "urn:otel:exporter:topic"
+            config:
+              topic: shared
+        connections:
+          - from: receiver
+            to: to_topic
+"#,
+    )
+    .expect("config should parse");
+    let runtime = test_runtime(&config);
+    register_existing_pipeline(&runtime, &config);
+
+    let replacement = PipelineConfig::from_yaml(
+        "g1".into(),
+        "p1".into(),
+        r#"
+policies:
+  resources:
+    core_allocation:
+      type: core_count
+      count: 1
+nodes:
+  from_topic:
+    type: "urn:otel:receiver:topic"
+    config:
+      topic: shared
+      subscription:
+        mode: balanced
+        group: workers
+  exporter:
+    type: "urn:test:exporter:example"
+    config: null
+connections:
+  - from: from_topic
+    to: exporter
+"#,
+    )
+    .expect("replacement should parse");
+
+    let err = runtime
+        .prepare_rollout_plan(
+            "g1",
+            "p1",
+            &ReconfigureRequest {
+                pipeline: replacement,
+                step_timeout_secs: 60,
+                drain_timeout_secs: 60,
+            },
+        )
+        .expect_err("topic runtime changes should be rejected");
+
+    match err {
+        ControlPlaneError::InvalidRequest { message } => {
+            assert!(message.contains("topic broker mutation"));
+        }
+        other => panic!("unexpected error: {other:?}"),
+    }
+}
+
+/// Scenario: a second rollout is requested for a logical pipeline that
+/// already has an active rollout record.
+/// Guarantees: planning rejects the new request with a rollout conflict
+/// instead of interleaving two rollout state machines.
+#[test]
+fn prepare_rollout_plan_rejects_concurrent_rollout_for_same_pipeline() {
+    let config = engine_config_with_pipeline(
+        r#"
+        policies:
+          resources:
+            core_allocation:
+              type: core_count
+              count: 1
+        nodes:
+          receiver:
+            type: "urn:test:receiver:example"
+            config: null
+          exporter:
+            type: "urn:test:exporter:example"
+            config: null
+        connections:
+          - from: receiver
+            to: exporter
+"#,
+    );
+    let runtime = test_runtime(&config);
+    register_existing_pipeline(&runtime, &config);
+
+    let replacement = PipelineConfig::from_yaml(
+        "g1".into(),
+        "p1".into(),
+        r#"
+policies:
+  resources:
+    core_allocation:
+      type: core_count
+      count: 1
+nodes:
+  input:
+    type: "urn:test:receiver:example"
+    config: null
+  output:
+    type: "urn:test:exporter:example"
+    config: null
+connections:
+  - from: input
+    to: output
+"#,
+    )
+    .expect("replacement should parse");
+    let plan = runtime
+        .prepare_rollout_plan(
+            "g1",
+            "p1",
+            &ReconfigureRequest {
+                pipeline: replacement.clone(),
+                step_timeout_secs: 60,
+                drain_timeout_secs: 60,
+            },
+        )
+        .expect("first rollout plan should be accepted");
+    runtime
+        .insert_rollout(&plan.pipeline_key, plan.rollout.clone())
+        .expect("rollout should register");
+
+    let err = runtime
+        .prepare_rollout_plan(
+            "g1",
+            "p1",
+            &ReconfigureRequest {
+                pipeline: replacement,
+                step_timeout_secs: 60,
+                drain_timeout_secs: 60,
+            },
+        )
+        .expect_err("second rollout should conflict");
+
+    assert_eq!(err, ControlPlaneError::RolloutConflict);
+}
+
+/// Scenario: a new rollout has been registered for a logical pipeline but
+/// has not yet committed its candidate config.
+/// Guarantees: pipeline details still return the committed config while
+/// exposing the pending rollout summary separately.
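+///
+/// Callers can treat `details.pipeline` as what is serving right now and
+/// `details.rollout` as the in-flight intent: the candidate config becomes
+/// the committed one only after the rollout commits its target generation.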
+#[test]
+fn pipeline_details_returns_committed_config_while_rollout_is_pending() {
+    let config = engine_config_with_pipeline(
+        r#"
+        policies:
+          resources:
+            core_allocation:
+              type: core_count
+              count: 1
+        nodes:
+          receiver:
+            type: "urn:test:receiver:example"
+            config: null
+          exporter:
+            type: "urn:test:exporter:example"
+            config: null
+        connections:
+          - from: receiver
+            to: exporter
+"#,
+    );
+    let runtime = test_runtime(&config);
+    register_existing_pipeline(&runtime, &config);
+
+    let replacement = PipelineConfig::from_yaml(
+        "g1".into(),
+        "p1".into(),
+        r#"
+policies:
+  resources:
+    core_allocation:
+      type: core_count
+      count: 1
+nodes:
+  input:
+    type: "urn:test:receiver:example"
+    config: null
+  output:
+    type: "urn:test:exporter:example"
+    config: null
+connections:
+  - from: input
+    to: output
+"#,
+    )
+    .expect("replacement should parse");
+    let plan = runtime
+        .prepare_rollout_plan(
+            "g1",
+            "p1",
+            &ReconfigureRequest {
+                pipeline: replacement.clone(),
+                step_timeout_secs: 60,
+                drain_timeout_secs: 60,
+            },
+        )
+        .expect("rollout plan should be accepted");
+    runtime
+        .insert_rollout(&plan.pipeline_key, plan.rollout.clone())
+        .expect("rollout should register");
+
+    let details = runtime
+        .pipeline_details_snapshot(&PipelineKey::new("g1".into(), "p1".into()))
+        .expect("group should exist")
+        .expect("pipeline details should exist");
+
+    let mut committed_nodes = details
+        .pipeline
+        .node_iter()
+        .map(|(node_id, _)| node_id.as_ref().to_owned())
+        .collect::<Vec<_>>();
+    committed_nodes.sort();
+    assert_eq!(
+        committed_nodes,
+        vec!["exporter".to_owned(), "receiver".to_owned()]
+    );
+    assert_eq!(details.active_generation, Some(0));
+    assert_eq!(
+        details
+            .rollout
+            .expect("pending rollout summary should be present")
+            .target_generation,
+        1
+    );
+}
+
+/// Scenario: panic diagnostics are captured for a worker panic with explicit
+/// thread metadata.
+/// Guarantees: the short summary stays operator-friendly while the detailed
+/// form includes thread context and a captured backtrace.
+#[test]
+fn panic_report_formats_summary_and_detail() {
+    let report = PanicReport::capture(
+        "rollout worker",
+        Box::new("boom"),
+        Some("rollout-g1-p1".to_owned()),
+        Some(17),
+        Some(3),
+    );
+
+    assert_eq!(report.summary_message(), "rollout worker panicked: boom");
+    let detail = report.detail_message();
+    assert!(detail.contains("rollout worker panicked: boom"));
+    assert!(detail.contains("thread_name=rollout-g1-p1"));
+    assert!(detail.contains("thread_id=17"));
+    assert!(detail.contains("core_id=3"));
+    assert!(detail.contains("backtrace:"));
+}
+
+/// Scenario: a panic is raised with a non-string payload.
+/// Guarantees: the captured panic summary stays readable and avoids the older
+/// generic placeholder text.
+#[test]
+fn panic_report_non_string_payload_has_useful_fallback() {
+    let report = PanicReport::capture("shutdown worker", Box::new(7usize), None, None, None);
+
+    assert_eq!(
+        report.summary_message(),
+        "shutdown worker panicked: non-string panic payload"
+    );
+    assert!(
+        !report
+            .summary_message()
+            .contains("panic payload was not a string")
+    );
+}
+
+/// Scenario: a detached rollout worker panics before it reaches the normal
+/// terminal-state bookkeeping path.
+/// Guarantees: the rollout is forced into a failed terminal state and the
+/// logical pipeline no longer stays blocked by a stale active-rollout entry.
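+///
+/// The recorded `failure_reason` carries the short `PanicReport` summary
+/// ("rollout worker panicked: boom") rather than the multi-line detail form,
+/// so API consumers are not handed an embedded backtrace.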
+#[test]
+fn rollout_worker_panic_marks_failed_and_clears_conflict() {
+    let config = engine_config_with_pipeline(simple_pipeline_yaml());
+    let runtime = test_runtime(&config);
+    register_existing_pipeline(&runtime, &config);
+
+    let replacement = PipelineConfig::from_yaml(
+        "g1".into(),
+        "p1".into(),
+        r#"
+nodes:
+  input:
+    type: "urn:test:receiver:example"
+    config: null
+  output:
+    type: "urn:test:exporter:example"
+    config: null
+connections:
+  - from: input
+    to: output
+"#,
+    )
+    .expect("replacement should parse");
+    let plan = runtime
+        .prepare_rollout_plan(
+            "g1",
+            "p1",
+            &ReconfigureRequest {
+                pipeline: replacement.clone(),
+                step_timeout_secs: 60,
+                drain_timeout_secs: 60,
+            },
+        )
+        .expect("rollout plan should be accepted");
+    runtime
+        .insert_rollout(&plan.pipeline_key, plan.rollout.clone())
+        .expect("rollout should register");
+
+    runtime.handle_rollout_worker_panic(
+        &plan.pipeline_key,
+        &plan.rollout.rollout_id,
+        "rollout-g1-p1".to_owned(),
+        Box::new("boom"),
+    );
+
+    let status = runtime
+        .rollout_status_snapshot(&plan.rollout.rollout_id)
+        .expect("rollout should remain queryable");
+    assert_eq!(status.state, ApiPipelineRolloutState::Failed);
+    assert!(
+        status
+            .failure_reason
+            .as_deref()
+            .is_some_and(|message| message.contains("rollout worker panicked: boom"))
+    );
+    assert!(
+        status
+            .failure_reason
+            .as_deref()
+            .is_some_and(|message| !message.contains("backtrace:"))
+    );
+
+    let state = runtime
+        .state
+        .lock()
+        .unwrap_or_else(|poisoned| poisoned.into_inner());
+    assert!(!state.active_rollouts.contains_key(&plan.pipeline_key));
+    drop(state);
+
+    let _next_plan = runtime
+        .prepare_rollout_plan(
+            "g1",
+            "p1",
+            &ReconfigureRequest {
+                pipeline: replacement,
+                step_timeout_secs: 60,
+                drain_timeout_secs: 60,
+            },
+        )
+        .expect("rollout conflict should be cleared after panic cleanup");
+}
+
+/// Scenario: a rollout worker panics after launching an uncommitted candidate
+/// generation.
+/// Guarantees: panic cleanup requests shutdown for the candidate generation
+/// before clearing the active rollout, avoiding active orphan instances.
+#[test]
+fn rollout_worker_panic_requests_shutdown_for_uncommitted_candidate_generation() {
+    let config = engine_config_with_pipeline(simple_pipeline_yaml());
+    let runtime = test_runtime(&config);
+    register_existing_pipeline(&runtime, &config);
+
+    let replacement = PipelineConfig::from_yaml(
+        "g1".into(),
+        "p1".into(),
+        r#"
+nodes:
+  input:
+    type: "urn:test:receiver:example"
+    config: null
+  output:
+    type: "urn:test:exporter:example"
+    config: null
+connections:
+  - from: input
+    to: output
+"#,
+    )
+    .expect("replacement should parse");
+    let plan = runtime
+        .prepare_rollout_plan(
+            "g1",
+            "p1",
+            &ReconfigureRequest {
+                pipeline: replacement,
+                step_timeout_secs: 60,
+                drain_timeout_secs: 60,
+            },
+        )
+        .expect("rollout plan should be accepted");
+    runtime
+        .insert_rollout(&plan.pipeline_key, plan.rollout.clone())
+        .expect("rollout should register");
+
+    let candidate_key = deployed_key("g1", "p1", 0, plan.target_generation);
+    let mut candidate_rx = register_runtime_instance(
+        &runtime,
+        "g1",
+        "p1",
+        0,
+        plan.target_generation,
+        RuntimeInstanceLifecycle::Active,
+    );
+
+    runtime.handle_rollout_worker_panic(
+        &plan.pipeline_key,
+        &plan.rollout.rollout_id,
+        "rollout-g1-p1".to_owned(),
+        Box::new("boom"),
+    );
+
+    assert!(matches!(
+        wait_for_shutdown_message(&mut candidate_rx),
+        RuntimeControlMsg::Shutdown { reason, ..
+        } if reason == "rollout panic cleanup"
+    ));
+
+    let state = runtime
+        .state
+        .lock()
+        .unwrap_or_else(|poisoned| poisoned.into_inner());
+    assert!(
+        state
+            .runtime_instances
+            .get(&candidate_key)
+            .expect("candidate instance should still be tracked until exit")
+            .control_sender
+            .is_none(),
+        "panic cleanup should release the retained sender after shutdown dispatch"
+    );
+    assert!(!state.active_rollouts.contains_key(&plan.pipeline_key));
+}
+
+/// Scenario: a rollout worker panics after the target generation was already
+/// committed as serving.
+/// Guarantees: panic cleanup does not shut down the committed generation,
+/// which would turn a late bookkeeping panic into a runtime outage.
+#[test]
+fn rollout_worker_panic_does_not_shutdown_committed_target_generation() {
+    let config = engine_config_with_pipeline(simple_pipeline_yaml());
+    let runtime = test_runtime(&config);
+    register_existing_pipeline(&runtime, &config);
+
+    let replacement = PipelineConfig::from_yaml(
+        "g1".into(),
+        "p1".into(),
+        r#"
+nodes:
+  input:
+    type: "urn:test:receiver:example"
+    config: null
+  output:
+    type: "urn:test:exporter:example"
+    config: null
+connections:
+  - from: input
+    to: output
+"#,
+    )
+    .expect("replacement should parse");
+    let plan = runtime
+        .prepare_rollout_plan(
+            "g1",
+            "p1",
+            &ReconfigureRequest {
+                pipeline: replacement,
+                step_timeout_secs: 60,
+                drain_timeout_secs: 60,
+            },
+        )
+        .expect("rollout plan should be accepted");
+    runtime
+        .insert_rollout(&plan.pipeline_key, plan.rollout.clone())
+        .expect("rollout should register");
+    runtime.commit_pipeline_record(&plan, plan.target_generation);
+
+    let mut candidate_rx = register_runtime_instance(
+        &runtime,
+        "g1",
+        "p1",
+        0,
+        plan.target_generation,
+        RuntimeInstanceLifecycle::Active,
+    );
+
+    runtime.handle_rollout_worker_panic(
+        &plan.pipeline_key,
+        &plan.rollout.rollout_id,
+        "rollout-g1-p1".to_owned(),
+        Box::new("boom"),
+    );
+
+    assert!(
+        candidate_rx.try_recv().is_err(),
+        "committed target generation must not receive panic-cleanup shutdown"
+    );
+}
+
+/// Scenario: a resize rollback must clean up cores that were already started
+/// before a later step fails.
+/// Guarantees: rollback sends shutdown to those started cores instead of
+/// leaving them running after the rollout fails.
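+///
+/// The test stands in for the real runtime thread with
+/// `complete_instance_exit_on_shutdown`, which acknowledges the rollback's
+/// shutdown control message and reports a successful instance exit so the
+/// rollback bookkeeping can complete.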
+#[test]
+fn rollback_resize_rollout_cleans_up_started_cores() {
+    let config = engine_config_with_pipeline(
+        r#"
+        policies:
+          resources:
+            core_allocation:
+              type: core_count
+              count: 1
+        nodes:
+          receiver:
+            type: "urn:test:receiver:example"
+            config: null
+          exporter:
+            type: "urn:test:exporter:example"
+            config: null
+        connections:
+          - from: receiver
+            to: exporter
+"#,
+    );
+    let runtime = test_runtime(&config);
+    register_existing_pipeline(&runtime, &config);
+    let _existing =
+        register_runtime_instance(&runtime, "g1", "p1", 0, 0, RuntimeInstanceLifecycle::Active);
+
+    let replacement = PipelineConfig::from_yaml(
+        "g1".into(),
+        "p1".into(),
+        r#"
+policies:
+  resources:
+    core_allocation:
+      type: core_count
+      count: 2
+nodes:
+  receiver:
+    type: "urn:test:receiver:example"
+    config: null
+  exporter:
+    type: "urn:test:exporter:example"
+    config: null
+connections:
+  - from: receiver
+    to: exporter
+"#,
+    )
+    .expect("replacement should parse");
+    let plan = runtime
+        .prepare_rollout_plan(
+            "g1",
+            "p1",
+            &ReconfigureRequest {
+                pipeline: replacement,
+                step_timeout_secs: 60,
+                drain_timeout_secs: 60,
+            },
+        )
+        .expect("resize rollout plan should be accepted");
+    runtime
+        .insert_rollout(&plan.pipeline_key, plan.rollout.clone())
+        .expect("rollout should register");
+
+    let started_key = deployed_key("g1", "p1", 1, plan.target_generation);
+    let started_rx = register_runtime_instance(
+        &runtime,
+        "g1",
+        "p1",
+        1,
+        plan.target_generation,
+        RuntimeInstanceLifecycle::Active,
+    );
+    let exit_thread = complete_instance_exit_on_shutdown(
+        Arc::clone(&runtime),
+        started_rx,
+        started_key.clone(),
+        "rollback cleanup",
+    );
+
+    let result = runtime.rollback_resize_rollout(&plan, &[1], &[], "boom".to_owned());
+
+    assert!(matches!(
+        result,
+        Err(RolloutExecutionError::Failed(reason)) if reason == "boom"
+    ));
+    exit_thread
+        .join()
+        .expect("resize rollback shutdown helper should join cleanly");
+    assert!(matches!(
+        runtime.instance_exit(&started_key),
+        Some(RuntimeInstanceExit::Success)
+    ));
+}
+
+/// Scenario: a replace rollback must clean up added candidate cores that were
+/// already serving the target generation before a later step fails.
+/// Guarantees: rollback sends shutdown to those activated added cores instead
+/// of leaving the candidate generation running.
+#[test]
+fn rollback_replace_rollout_cleans_up_activated_added_cores() {
+    let config = engine_config_with_pipeline(
+        r#"
+        policies:
+          resources:
+            core_allocation:
+              type: core_count
+              count: 1
+        nodes:
+          receiver:
+            type: "urn:test:receiver:example"
+            config: null
+          exporter:
+            type: "urn:test:exporter:example"
+            config: null
+        connections:
+          - from: receiver
+            to: exporter
+"#,
+    );
+    let runtime = test_runtime(&config);
+    register_existing_pipeline(&runtime, &config);
+    let _existing =
+        register_runtime_instance(&runtime, "g1", "p1", 0, 0, RuntimeInstanceLifecycle::Active);
+
+    let replacement = PipelineConfig::from_yaml(
+        "g1".into(),
+        "p1".into(),
+        r#"
+policies:
+  resources:
+    core_allocation:
+      type: core_count
+      count: 2
+nodes:
+  input:
+    type: "urn:test:receiver:example"
+    config: null
+  output:
+    type: "urn:test:exporter:example"
+    config: null
+connections:
+  - from: input
+    to: output
+"#,
+    )
+    .expect("replacement should parse");
+    let plan = runtime
+        .prepare_rollout_plan(
+            "g1",
+            "p1",
+            &ReconfigureRequest {
+                pipeline: replacement,
+                step_timeout_secs: 60,
+                drain_timeout_secs: 60,
+            },
+        )
+        .expect("replace rollout plan should be accepted");
+    runtime
+        .insert_rollout(&plan.pipeline_key, plan.rollout.clone())
+        .expect("rollout should register");
+
+    let added_key = deployed_key("g1", "p1", 1, plan.target_generation);
+    let added_rx = register_runtime_instance(
+        &runtime,
+        "g1",
+        "p1",
+        1,
+        plan.target_generation,
+        RuntimeInstanceLifecycle::Active,
+    );
+    let exit_thread = complete_instance_exit_on_shutdown(
+        Arc::clone(&runtime),
+        added_rx,
+        added_key.clone(),
+        "rollback cleanup",
+    );
+
+    let result = runtime.rollback_replace_rollout(&plan, &[], &[1], &[], "boom".to_owned());
+
+    assert!(matches!(
+        result,
+        Err(RolloutExecutionError::Failed(reason)) if reason == "boom"
+    ));
+    exit_thread
+        .join()
+        .expect("replace rollback shutdown helper should join cleanly");
+    assert!(matches!(
+        runtime.instance_exit(&added_key),
+        Some(RuntimeInstanceExit::Success)
+    ));
+}
+
+/// Scenario: a shutdown request targets a group id that does not exist in
+/// the controller's committed config.
+/// Guarantees: per-pipeline shutdown fails fast with `GroupNotFound`
+/// instead of creating a shutdown record.
+#[test]
+fn request_shutdown_pipeline_rejects_missing_group() {
+    let config = engine_config_with_pipeline(
+        r#"
+        nodes:
+          receiver:
+            type: "urn:test:receiver:example"
+            config: null
+          exporter:
+            type: "urn:test:exporter:example"
+            config: null
+        connections:
+          - from: receiver
+            to: exporter
+"#,
+    );
+    let runtime = test_runtime(&config);
+
+    let err = runtime
+        .request_shutdown_pipeline("missing", "p1", 5)
+        .expect_err("missing group should be rejected");
+
+    assert_eq!(err, ControlPlaneError::GroupNotFound);
+}
+
+/// Scenario: a shutdown request targets a pipeline id that is not present
+/// in an existing group.
+/// Guarantees: per-pipeline shutdown rejects the request with
+/// `PipelineNotFound` before any runtime instances are touched.
+#[test]
+fn request_shutdown_pipeline_rejects_missing_pipeline() {
+    let config = engine_config_with_pipeline(
+        r#"
+        nodes:
+          receiver:
+            type: "urn:test:receiver:example"
+            config: null
+          exporter:
+            type: "urn:test:exporter:example"
+            config: null
+        connections:
+          - from: receiver
+            to: exporter
+"#,
+    );
+    let runtime = test_runtime(&config);
+
+    let err = runtime
+        .request_shutdown_pipeline("g1", "missing", 5)
+        .expect_err("missing pipeline should be rejected");
+
+    assert_eq!(err, ControlPlaneError::PipelineNotFound);
+}
+
+/// Scenario: a detached shutdown worker panics before it reaches the normal
+/// terminal-state bookkeeping path.
+/// Guarantees: the shutdown is forced into a failed terminal state and the
+/// logical pipeline no longer stays blocked by a stale active-shutdown entry.
+#[test]
+fn shutdown_worker_panic_marks_failed_and_clears_conflict() {
+    let config = engine_config_with_pipeline(simple_pipeline_yaml());
+    let runtime = test_runtime(&config);
+    register_existing_pipeline(&runtime, &config);
+    let _rx =
+        register_runtime_instance(&runtime, "g1", "p1", 0, 0, RuntimeInstanceLifecycle::Active);
+
+    let plan = runtime
+        .prepare_shutdown_plan("g1", "p1", 5)
+        .expect("shutdown plan should be accepted");
+    runtime
+        .insert_shutdown(&plan.pipeline_key, plan.shutdown.clone())
+        .expect("shutdown should register");
+
+    runtime.handle_shutdown_worker_panic(
+        &plan.pipeline_key,
+        &plan.shutdown.shutdown_id,
+        "shutdown-g1-p1".to_owned(),
+        Box::new("boom"),
+    );
+
+    let status = runtime
+        .shutdown_status_snapshot(&plan.shutdown.shutdown_id)
+        .expect("shutdown should remain queryable");
+    assert_eq!(status.state, "failed");
+    assert!(
+        status
+            .failure_reason
+            .as_deref()
+            .is_some_and(|message| message.contains("shutdown worker panicked: boom"))
+    );
+    assert!(
+        status
+            .failure_reason
+            .as_deref()
+            .is_some_and(|message| !message.contains("backtrace:"))
+    );
+
+    let state = runtime
+        .state
+        .lock()
+        .unwrap_or_else(|poisoned| poisoned.into_inner());
+    assert!(!state.active_shutdowns.contains_key(&plan.pipeline_key));
+    drop(state);
+
+    let _next_plan = runtime
+        .prepare_shutdown_plan("g1", "p1", 5)
+        .expect("shutdown conflict should be cleared after panic cleanup");
+}
+
+/// Scenario: a shutdown request arrives while the same logical pipeline is
+/// already under rollout.
+/// Guarantees: shutdown is rejected with a rollout conflict so the rollout
+/// controller remains the single owner of that pipeline's lifecycle.
+#[test]
+fn request_shutdown_pipeline_rejects_active_rollout() {
+    let config = engine_config_with_pipeline(
+        r#"
+        nodes:
+          receiver:
+            type: "urn:test:receiver:example"
+            config: null
+          exporter:
+            type: "urn:test:exporter:example"
+            config: null
+        connections:
+          - from: receiver
+            to: exporter
+"#,
+    );
+    let runtime = test_runtime(&config);
+    register_existing_pipeline(&runtime, &config);
+
+    let pipeline_key = PipelineKey::new("g1".into(), "p1".into());
+    let mut state = runtime
+        .state
+        .lock()
+        .unwrap_or_else(|poisoned| poisoned.into_inner());
+    _ = state
+        .active_rollouts
+        .insert(pipeline_key, "rollout-42".to_owned());
+    drop(state);
+
+    let err = runtime
+        .request_shutdown_pipeline("g1", "p1", 5)
+        .expect_err("active rollout should conflict");
+
+    assert_eq!(err, ControlPlaneError::RolloutConflict);
+}
+
+/// Scenario: a second shutdown request targets a logical pipeline that
+/// already has an active shutdown operation.
+/// Guarantees: the controller rejects the duplicate request instead of
+/// creating competing shutdown records.
+#[test]
+fn request_shutdown_pipeline_rejects_active_shutdown() {
+    let config = engine_config_with_pipeline(
+        r#"
+        nodes:
+          receiver:
+            type: "urn:test:receiver:example"
+            config: null
+          exporter:
+            type: "urn:test:exporter:example"
+            config: null
+        connections:
+          - from: receiver
+            to: exporter
+"#,
+    );
+    let runtime = test_runtime(&config);
+    register_existing_pipeline(&runtime, &config);
+    let pipeline_key = PipelineKey::new("g1".into(), "p1".into());
+    let shutdown = ShutdownRecord::new(
+        "shutdown-0".to_owned(),
+        "g1".into(),
+        "p1".into(),
+        vec![ShutdownCoreProgress {
+            core_id: 0,
+            deployment_generation: 0,
+            state: "pending".to_owned(),
+            updated_at: timestamp_now(),
+            detail: None,
+        }],
+    );
+    runtime
+        .insert_shutdown(&pipeline_key, shutdown)
+        .expect("shutdown should register");
+
+    let err = runtime
+        .request_shutdown_pipeline("g1", "p1", 5)
+        .expect_err("active shutdown should conflict");
+
+    assert_eq!(err, ControlPlaneError::RolloutConflict);
+}
+
+/// Scenario: a shutdown request targets a committed pipeline that currently
+/// has no active runtime instances.
+/// Guarantees: the controller rejects the request as invalid because the
+/// pipeline is already stopped, instead of synthesizing a no-op shutdown
+/// operation.
+#[test]
+fn request_shutdown_pipeline_rejects_already_stopped_pipeline() {
+    let config = engine_config_with_pipeline(
+        r#"
+        nodes:
+          receiver:
+            type: "urn:test:receiver:example"
+            config: null
+          exporter:
+            type: "urn:test:exporter:example"
+            config: null
+        connections:
+          - from: receiver
+            to: exporter
+"#,
+    );
+    let runtime = test_runtime(&config);
+    register_existing_pipeline(&runtime, &config);
+
+    let err = runtime
+        .request_shutdown_pipeline("g1", "p1", 5)
+        .expect_err("already stopped pipeline should be rejected");
+
+    match err {
+        ControlPlaneError::InvalidRequest { message } => {
+            assert!(message.contains("already stopped"));
+        }
+        other => panic!("unexpected error: {other:?}"),
+    }
+}
+
+/// Scenario: a shutdown request targets one logical pipeline while other
+/// pipelines and exited instances still exist in the runtime registry.
+/// Guarantees: only active instances for the requested logical pipeline
+/// receive shutdown control messages and relinquish their control senders.
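+///
+/// A released control sender is the controller's "already signaled" marker:
+/// an instance that still holds its sender was either never told to shut
+/// down or had a failed send that should be retried, which is exactly what
+/// the polling loop in this test distinguishes per instance.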
+#[test]
+fn request_shutdown_pipeline_targets_only_active_instances_for_pipeline() {
+    let config = OtelDataflowSpec::from_yaml(
+        r#"
+version: otel_dataflow/v1
+groups:
+  g1:
+    pipelines:
+      p1:
+        policies:
+          resources:
+            core_allocation:
+              type: core_count
+              count: 2
+        nodes:
+          receiver:
+            type: "urn:test:receiver:example"
+            config: null
+          exporter:
+            type: "urn:test:exporter:example"
+            config: null
+        connections:
+          - from: receiver
+            to: exporter
+      p2:
+        policies:
+          resources:
+            core_allocation:
+              type: core_count
+              count: 1
+        nodes:
+          receiver:
+            type: "urn:test:receiver:example"
+            config: null
+          exporter:
+            type: "urn:test:exporter:example"
+            config: null
+        connections:
+          - from: receiver
+            to: exporter
+"#,
+    )
+    .expect("config should parse");
+    let runtime = test_runtime(&config);
+    register_pipeline(&runtime, &config, "g1", "p1");
+    register_pipeline(&runtime, &config, "g1", "p2");
+
+    let mut p1_core0 =
+        register_runtime_instance(&runtime, "g1", "p1", 0, 0, RuntimeInstanceLifecycle::Active);
+    let mut p1_core1 =
+        register_runtime_instance(&runtime, "g1", "p1", 1, 0, RuntimeInstanceLifecycle::Active);
+    let mut p1_exited = register_runtime_instance(
+        &runtime,
+        "g1",
+        "p1",
+        2,
+        0,
+        RuntimeInstanceLifecycle::Exited(RuntimeInstanceExit::Success),
+    );
+    let mut p2_core0 =
+        register_runtime_instance(&runtime, "g1", "p2", 3, 0, RuntimeInstanceLifecycle::Active);
+
+    let _shutdown = runtime
+        .request_shutdown_pipeline("g1", "p1", 5)
+        .expect("shutdown request should be accepted");
+
+    assert!(matches!(
+        wait_for_shutdown_message(&mut p1_core0),
+        RuntimeControlMsg::Shutdown { reason, .. } if reason == "pipeline shutdown"
+    ));
+    assert!(matches!(
+        wait_for_shutdown_message(&mut p1_core1),
+        RuntimeControlMsg::Shutdown { reason, .. } if reason == "pipeline shutdown"
+    ));
+    assert!(
+        p1_exited.try_recv().is_err(),
+        "exited runtime should not receive shutdown"
+    );
+    assert!(
+        p2_core0.try_recv().is_err(),
+        "other pipelines must not receive shutdown"
+    );
+    let deadline = Instant::now() + Duration::from_secs(2);
+    loop {
+        let state = runtime
+            .state
+            .lock()
+            .unwrap_or_else(|poisoned| poisoned.into_inner());
+        let p1_core0_released = state
+            .runtime_instances
+            .get(&DeployedPipelineKey {
+                pipeline_group_id: "g1".into(),
+                pipeline_id: "p1".into(),
+                core_id: 0,
+                deployment_generation: 0,
+            })
+            .and_then(|instance| instance.control_sender.as_ref())
+            .is_none();
+        let p1_core1_released = state
+            .runtime_instances
+            .get(&DeployedPipelineKey {
+                pipeline_group_id: "g1".into(),
+                pipeline_id: "p1".into(),
+                core_id: 1,
+                deployment_generation: 0,
+            })
+            .and_then(|instance| instance.control_sender.as_ref())
+            .is_none();
+        let p2_core0_retained = state
+            .runtime_instances
+            .get(&DeployedPipelineKey {
+                pipeline_group_id: "g1".into(),
+                pipeline_id: "p2".into(),
+                core_id: 3,
+                deployment_generation: 0,
+            })
+            .and_then(|instance| instance.control_sender.as_ref())
+            .is_some();
+        drop(state);
+
+        if p1_core0_released && p1_core1_released && p2_core0_retained {
+            break;
+        }
+        assert!(
+            Instant::now() < deadline,
+            "timed out waiting for targeted control senders to be released"
+        );
+        thread::sleep(Duration::from_millis(25));
+    }
+}
+
+/// Scenario: global shutdown dispatch encounters a send failure for one
+/// active runtime instance while other active instances still need the signal.
+/// Guarantees: shutdown dispatch is best-effort across the whole snapshot:
+/// every active sender is attempted, successful sends relinquish their retained
+/// control sender, repeated calls do not re-signal instances that already
+/// accepted shutdown, and failures are reported only after the full pass.
+#[test]
+fn request_shutdown_all_attempts_all_active_instances_before_returning_error() {
+    let runtime = test_runtime(&engine_config_with_pipeline(simple_pipeline_yaml()));
+    let key0 = deployed_key("g1", "p1", 0, 0);
+    let key1 = deployed_key("g1", "p1", 1, 0);
+    let key2 = deployed_key("g1", "p1", 2, 0);
+    let (sender0, calls0) = recording_admin_sender(None);
+    let (sender1, calls1) = recording_admin_sender(Some("simulated send failure"));
+    let (sender2, calls2) = recording_admin_sender(None);
+
+    register_runtime_instance_with_sender(
+        &runtime,
+        key0.clone(),
+        sender0,
+        RuntimeInstanceLifecycle::Active,
+    );
+    register_runtime_instance_with_sender(
+        &runtime,
+        key1.clone(),
+        sender1,
+        RuntimeInstanceLifecycle::Active,
+    );
+    register_runtime_instance_with_sender(
+        &runtime,
+        key2.clone(),
+        sender2,
+        RuntimeInstanceLifecycle::Active,
+    );
+
+    let err = runtime
+        .request_shutdown_all(5)
+        .expect_err("shutdown-all should report the failed sender after dispatching all sends");
+
+    assert_eq!(
+        *calls0
+            .lock()
+            .unwrap_or_else(|poisoned| poisoned.into_inner()),
+        vec!["global shutdown".to_owned()]
+    );
+    assert_eq!(
+        *calls1
+            .lock()
+            .unwrap_or_else(|poisoned| poisoned.into_inner()),
+        vec!["global shutdown".to_owned()]
+    );
+    assert_eq!(
+        *calls2
+            .lock()
+            .unwrap_or_else(|poisoned| poisoned.into_inner()),
+        vec!["global shutdown".to_owned()]
+    );
+
+    let ControlPlaneError::Internal { message } = err else {
+        panic!("unexpected shutdown-all error: {err:?}");
+    };
+    assert!(message.contains("g1:p1 core=1 generation=0"));
+    assert!(message.contains("simulated send failure"));
+
+    let state = runtime
+        .state
+        .lock()
+        .unwrap_or_else(|poisoned| poisoned.into_inner());
+    assert!(
+        state
+            .runtime_instances
+            .get(&key0)
+            .and_then(|instance| instance.control_sender.as_ref())
+            .is_none(),
+        "successful shutdown send should release key0 control sender"
+    );
+    assert!(
+        state
+            .runtime_instances
+            .get(&key1)
+            .and_then(|instance| instance.control_sender.as_ref())
+            .is_some(),
+        "failed shutdown send should retain key1 control sender"
+    );
+    assert!(
+        state
+            .runtime_instances
+            .get(&key2)
+            .and_then(|instance| instance.control_sender.as_ref())
+            .is_none(),
+        "successful shutdown send should release key2 control sender"
+    );
+    drop(state);
+
+    // The first pass released the control sender for successful instances, so
+    // a retry should only reattempt the instance whose shutdown send failed.
+    let err = runtime
+        .request_shutdown_all(5)
+        .expect_err("shutdown-all retry should still report the failed sender");
+
+    let ControlPlaneError::Internal { message } = err else {
+        panic!("unexpected shutdown-all retry error: {err:?}");
+    };
+    assert!(message.contains("g1:p1 core=1 generation=0"));
+    assert!(message.contains("simulated send failure"));
+    assert_eq!(
+        *calls0
+            .lock()
+            .unwrap_or_else(|poisoned| poisoned.into_inner()),
+        vec!["global shutdown".to_owned()]
+    );
+    assert_eq!(
+        *calls1
+            .lock()
+            .unwrap_or_else(|poisoned| poisoned.into_inner()),
+        vec!["global shutdown".to_owned(), "global shutdown".to_owned()]
+    );
+    assert_eq!(
+        *calls2
+            .lock()
+            .unwrap_or_else(|poisoned| poisoned.into_inner()),
+        vec!["global shutdown".to_owned()]
+    );
+}
+
+/// Scenario: all targeted runtime instances exit cleanly after a pipeline
+/// shutdown request is accepted.
+/// Guarantees: the shutdown record reaches `succeeded`, tracks per-core
+/// completion, and removes the active shutdown lock for that pipeline.
+#[test]
+fn request_shutdown_pipeline_tracks_completion() {
+    let config = engine_config_with_pipeline(
+        r#"
+        policies:
+          resources:
+            core_allocation:
+              type: core_count
+              count: 2
+        nodes:
+          receiver:
+            type: "urn:test:receiver:example"
+            config: null
+          exporter:
+            type: "urn:test:exporter:example"
+            config: null
+        connections:
+          - from: receiver
+            to: exporter
+"#,
+    );
+    let runtime = test_runtime(&config);
+    register_existing_pipeline(&runtime, &config);
+
+    let mut core0 =
+        register_runtime_instance(&runtime, "g1", "p1", 0, 0, RuntimeInstanceLifecycle::Active);
+    let mut core1 =
+        register_runtime_instance(&runtime, "g1", "p1", 1, 0, RuntimeInstanceLifecycle::Active);
+
+    let shutdown = runtime
+        .request_shutdown_pipeline("g1", "p1", 5)
+        .expect("shutdown request should be accepted");
+    assert_eq!(shutdown.state, "pending");
+
+    assert!(matches!(
+        wait_for_shutdown_message(&mut core0),
+        RuntimeControlMsg::Shutdown { reason, .. } if reason == "pipeline shutdown"
+    ));
+    assert!(matches!(
+        wait_for_shutdown_message(&mut core1),
+        RuntimeControlMsg::Shutdown { reason, ..
+        } if reason == "pipeline shutdown"
+    ));
+
+    runtime.note_instance_exit(
+        DeployedPipelineKey {
+            pipeline_group_id: "g1".into(),
+            pipeline_id: "p1".into(),
+            core_id: 0,
+            deployment_generation: 0,
+        },
+        RuntimeInstanceExit::Success,
+    );
+    {
+        let state = runtime
+            .state
+            .lock()
+            .unwrap_or_else(|poisoned| poisoned.into_inner());
+        assert!(
+            state.runtime_instances.contains_key(&DeployedPipelineKey {
+                pipeline_group_id: "g1".into(),
+                pipeline_id: "p1".into(),
+                core_id: 0,
+                deployment_generation: 0,
+            }),
+            "active shutdown should retain exited instances until completion"
+        );
+    }
+    runtime.note_instance_exit(
+        DeployedPipelineKey {
+            pipeline_group_id: "g1".into(),
+            pipeline_id: "p1".into(),
+            core_id: 1,
+            deployment_generation: 0,
+        },
+        RuntimeInstanceExit::Success,
+    );
+
+    let status = wait_for_shutdown_state(&runtime, &shutdown.shutdown_id, "succeeded");
+    assert_eq!(status.cores.len(), 2);
+    assert!(status.cores.iter().all(|core| core.state == "exited"));
+
+    let state = runtime
+        .state
+        .lock()
+        .unwrap_or_else(|poisoned| poisoned.into_inner());
+    assert!(
+        !state
+            .active_shutdowns
+            .contains_key(&PipelineKey::new("g1".into(), "p1".into()))
+    );
+    assert!(!state.runtime_instances.contains_key(&DeployedPipelineKey {
+        pipeline_group_id: "g1".into(),
+        pipeline_id: "p1".into(),
+        core_id: 0,
+        deployment_generation: 0,
+    }));
+    assert!(!state.runtime_instances.contains_key(&DeployedPipelineKey {
+        pipeline_group_id: "g1".into(),
+        pipeline_id: "p1".into(),
+        core_id: 1,
+        deployment_generation: 0,
+    }));
+}
+
+/// Scenario: a pipeline shutdown request is accepted but the targeted
+/// runtime instance never exits before the shutdown deadline.
+/// Guarantees: the shutdown record transitions to `failed`, preserves the
+/// timeout reason, and records the failed per-core state for callers.
+#[test]
+fn request_shutdown_pipeline_tracks_timeout_failure() {
+    let config = engine_config_with_pipeline(
+        r#"
+        nodes:
+          receiver:
+            type: "urn:test:receiver:example"
+            config: null
+          exporter:
+            type: "urn:test:exporter:example"
+            config: null
+        connections:
+          - from: receiver
+            to: exporter
+"#,
+    );
+    let runtime = test_runtime(&config);
+    register_existing_pipeline(&runtime, &config);
+
+    let mut core0 =
+        register_runtime_instance(&runtime, "g1", "p1", 0, 0, RuntimeInstanceLifecycle::Active);
+
+    let shutdown = runtime
+        .request_shutdown_pipeline("g1", "p1", 1)
+        .expect("shutdown request should be accepted");
+    assert!(matches!(
+        wait_for_shutdown_message(&mut core0),
+        RuntimeControlMsg::Shutdown { reason, .. } if reason == "pipeline shutdown"
+    ));
+
+    let status = wait_for_shutdown_state(&runtime, &shutdown.shutdown_id, "failed");
+    assert!(
+        status
+            .failure_reason
+            .as_deref()
+            .is_some_and(|reason| reason.contains("timed out waiting"))
+    );
+    assert_eq!(status.cores.len(), 1);
+    assert_eq!(status.cores[0].state, "failed");
+}
+
+/// Scenario: terminal rollout history grows beyond the retention cap for one
+/// logical pipeline while another pipeline also retains rollout history.
+/// Guarantees: eviction is oldest-first and scoped per logical pipeline rather
+/// than dropping unrelated rollout history.
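+///
+/// The cap is `TERMINAL_ROLLOUT_RETENTION_LIMIT` entries per logical
+/// pipeline, so one pipeline churning through rollouts cannot evict another
+/// pipeline's terminal history; by-id lookups for evicted rollouts simply
+/// return nothing.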
+#[test]
+fn terminal_rollout_history_is_bounded_per_pipeline() {
+    let runtime = test_runtime(&engine_config_with_pipeline(simple_pipeline_yaml()));
+    let pipeline_key = PipelineKey::new("g1".into(), "p1".into());
+    let other_pipeline_key = PipelineKey::new("g1".into(), "p2".into());
+
+    let mut state = runtime
+        .state
+        .lock()
+        .unwrap_or_else(|poisoned| poisoned.into_inner());
+    for index in 0..=TERMINAL_ROLLOUT_RETENTION_LIMIT {
+        let rollout_id = format!("rollout-{index}");
+        _ = state.rollouts.insert(
+            rollout_id.clone(),
+            terminal_rollout_record("g1", "p1", &rollout_id),
+        );
+        ControllerRuntime::<()>::record_terminal_rollout_locked(
+            &mut state,
+            &pipeline_key,
+            &rollout_id,
+            Instant::now(),
+        );
+    }
+
+    let other_rollout_id = "rollout-other".to_owned();
+    _ = state.rollouts.insert(
+        other_rollout_id.clone(),
+        terminal_rollout_record("g1", "p2", &other_rollout_id),
+    );
+    ControllerRuntime::<()>::record_terminal_rollout_locked(
+        &mut state,
+        &other_pipeline_key,
+        &other_rollout_id,
+        Instant::now(),
+    );
+
+    assert!(!state.rollouts.contains_key("rollout-0"));
+    assert!(state.rollouts.contains_key("rollout-1"));
+    assert!(state.rollouts.contains_key(&other_rollout_id));
+    assert_eq!(
+        state
+            .terminal_rollouts
+            .get(&pipeline_key)
+            .map(|queue| queue.len()),
+        Some(TERMINAL_ROLLOUT_RETENTION_LIMIT)
+    );
+    assert_eq!(
+        state
+            .terminal_rollouts
+            .get(&other_pipeline_key)
+            .map(|queue| queue.len()),
+        Some(1)
+    );
+}
+
+/// Scenario: terminal shutdown history grows beyond the retention cap for one
+/// logical pipeline while another pipeline also retains shutdown history.
+/// Guarantees: shutdown eviction is oldest-first and scoped per logical
+/// pipeline rather than trimming unrelated shutdown history.
+#[test]
+fn terminal_shutdown_history_is_bounded_per_pipeline() {
+    let runtime = test_runtime(&engine_config_with_pipeline(simple_pipeline_yaml()));
+    let pipeline_key = PipelineKey::new("g1".into(), "p1".into());
+    let other_pipeline_key = PipelineKey::new("g1".into(), "p2".into());
+
+    let mut state = runtime
+        .state
+        .lock()
+        .unwrap_or_else(|poisoned| poisoned.into_inner());
+    for index in 0..=TERMINAL_SHUTDOWN_RETENTION_LIMIT {
+        let shutdown_id = format!("shutdown-{index}");
+        _ = state.shutdowns.insert(
+            shutdown_id.clone(),
+            terminal_shutdown_record("g1", "p1", &shutdown_id),
+        );
+        ControllerRuntime::<()>::record_terminal_shutdown_locked(
+            &mut state,
+            &pipeline_key,
+            &shutdown_id,
+            Instant::now(),
+        );
+    }
+
+    let other_shutdown_id = "shutdown-other".to_owned();
+    _ = state.shutdowns.insert(
+        other_shutdown_id.clone(),
+        terminal_shutdown_record("g1", "p2", &other_shutdown_id),
+    );
+    ControllerRuntime::<()>::record_terminal_shutdown_locked(
+        &mut state,
+        &other_pipeline_key,
+        &other_shutdown_id,
+        Instant::now(),
+    );
+
+    assert!(!state.shutdowns.contains_key("shutdown-0"));
+    assert!(state.shutdowns.contains_key("shutdown-1"));
+    assert!(state.shutdowns.contains_key(&other_shutdown_id));
+    assert_eq!(
+        state
+            .terminal_shutdowns
+            .get(&pipeline_key)
+            .map(|queue| queue.len()),
+        Some(TERMINAL_SHUTDOWN_RETENTION_LIMIT)
+    );
+    assert_eq!(
+        state
+            .terminal_shutdowns
+            .get(&other_pipeline_key)
+            .map(|queue| queue.len()),
+        Some(1)
+    );
+}
+
+/// Scenario: terminal rollout and shutdown ids outlive their retention TTL in
+/// the controller's in-memory history.
+/// Guarantees: history pruning expires those terminal records, and subsequent
+/// by-id lookups return not-found instead of the history growing without
+/// bound.
+#[test]
+fn terminal_operation_history_expires_after_ttl() {
+    let runtime = test_runtime(&engine_config_with_pipeline(simple_pipeline_yaml()));
+    let pipeline_key = PipelineKey::new("g1".into(), "p1".into());
+    let rollout_id = "rollout-old".to_owned();
+    let shutdown_id = "shutdown-old".to_owned();
+    let prune_now = Instant::now()
+        .checked_add(TERMINAL_OPERATION_RETENTION_TTL + Duration::from_secs(2))
+        .expect("synthetic prune deadline should be representable");
+    let expired_at = prune_now
+        .checked_sub(TERMINAL_OPERATION_RETENTION_TTL + Duration::from_secs(1))
+        .expect("synthetic completed_at should be representable");
+
+    {
+        let mut state = runtime
+            .state
+            .lock()
+            .unwrap_or_else(|poisoned| poisoned.into_inner());
+
+        let mut rollout = terminal_rollout_record("g1", "p1", &rollout_id);
+        rollout.completed_at = Some(expired_at);
+        _ = state.rollouts.insert(rollout_id.clone(), rollout);
+        state
+            .terminal_rollouts
+            .entry(pipeline_key.clone())
+            .or_default()
+            .push_back(rollout_id.clone());
+
+        let mut shutdown = terminal_shutdown_record("g1", "p1", &shutdown_id);
+        shutdown.completed_at = Some(expired_at);
+        _ = state.shutdowns.insert(shutdown_id.clone(), shutdown);
+        state
+            .terminal_shutdowns
+            .entry(pipeline_key.clone())
+            .or_default()
+            .push_back(shutdown_id.clone());
+
+        // Use a synthetic future `now` here instead of relying on
+        // `Instant::now() - ttl`, which can underflow on Windows near the
+        // monotonic clock origin.
+        ControllerRuntime::<()>::prune_terminal_operation_history_locked(&mut state, prune_now);
+    }
+
+    assert!(runtime.rollout_status_snapshot(&rollout_id).is_none());
+    assert!(runtime.shutdown_status_snapshot(&shutdown_id).is_none());
+
+    let state = runtime
+        .state
+        .lock()
+        .unwrap_or_else(|poisoned| poisoned.into_inner());
+    assert!(!state.rollouts.contains_key(&rollout_id));
+    assert!(!state.shutdowns.contains_key(&shutdown_id));
+    assert!(!state.terminal_rollouts.contains_key(&pipeline_key));
+    assert!(!state.terminal_shutdowns.contains_key(&pipeline_key));
+}
+
+/// Scenario: an instance exits when there is no active rollout or shutdown for
+/// its logical pipeline.
+/// Guarantees: the controller does not retain that exited runtime instance as
+/// history once no active control-plane operation depends on it.
+#[test]
+fn exited_runtime_instances_without_active_operation_are_pruned_immediately() {
+    let config = engine_config_with_pipeline(simple_pipeline_yaml());
+    let runtime = test_runtime(&config);
+    register_existing_pipeline(&runtime, &config);
+    let _rx =
+        register_runtime_instance(&runtime, "g1", "p1", 0, 0, RuntimeInstanceLifecycle::Active);
+
+    let deployed_key = DeployedPipelineKey {
+        pipeline_group_id: "g1".into(),
+        pipeline_id: "p1".into(),
+        core_id: 0,
+        deployment_generation: 0,
+    };
+    runtime.note_instance_exit(deployed_key.clone(), RuntimeInstanceExit::Success);
+
+    let state = runtime
+        .state
+        .lock()
+        .unwrap_or_else(|poisoned| poisoned.into_inner());
+    assert!(!state.runtime_instances.contains_key(&deployed_key));
+}
+
+/// Scenario: a runtime thread reports exit before the controller finishes
+/// registering the launched instance as active.
+/// Guarantees: early exit bookkeeping is reconciled during registration, so
+/// active-instance tracking does not leak and the pending-exit entry is cleared.
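+///
+/// Without this reconciliation the early exit notice would be lost (it
+/// arrives before the instance record exists) and registration would then
+/// count a dead thread as active indefinitely; `pending_instance_exits` is
+/// the buffer that closes that race.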
+#[test]
+fn register_launched_instance_reconciles_early_exit_without_leaking_active_count() {
+    let config = engine_config_with_pipeline(simple_pipeline_yaml());
+    let runtime = test_runtime(&config);
+    register_existing_pipeline(&runtime, &config);
+
+    let deployed_key = deployed_key("g1", "p1", 0, 0);
+    runtime.note_instance_exit(deployed_key.clone(), RuntimeInstanceExit::Success);
+
+    runtime.register_launched_instance(launched_runtime_instance("g1", "p1", 0, 0));
+
+    let state = runtime
+        .state
+        .lock()
+        .unwrap_or_else(|poisoned| poisoned.into_inner());
+    assert_eq!(state.active_instances, 0);
+    assert!(!state.pending_instance_exits.contains_key(&deployed_key));
+    assert!(!state.runtime_instances.contains_key(&deployed_key));
+}
+
+/// Scenario: a completed rollout has advanced the committed active generation,
+/// but observed state still contains the older generation for the same core.
+/// Guarantees: controller cleanup compacts observed state to the selected
+/// active generation so retained instance memory no longer grows with rollout
+/// count after completion.
+#[test]
+fn prune_pipeline_runtime_and_history_compacts_observed_state_to_active_generation() {
+    let config = engine_config_with_pipeline(simple_pipeline_yaml());
+    let runtime = test_runtime(&config);
+    let _runner = ObservedStateRunner::start(&runtime);
+    register_existing_pipeline(&runtime, &config);
+
+    let pipeline_key = PipelineKey::new("g1".into(), "p1".into());
+    report_ready(&runtime, deployed_key("g1", "p1", 0, 0));
+    report_ready(&runtime, deployed_key("g1", "p1", 0, 1));
+    let status = wait_for_observed_status(&runtime, &pipeline_key, |status| {
+        status.per_instance().len() == 2
+    });
+    assert!(status.instance_status(0, 0).is_some());
+    assert!(status.instance_status(0, 1).is_some());
+
+    runtime
+        .observed_state_store
+        .set_pipeline_active_generation(pipeline_key.clone(), 1);
+    runtime.prune_pipeline_runtime_and_history(&pipeline_key);
+
+    let status = wait_for_observed_status(&runtime, &pipeline_key, |status| {
+        status.per_instance().len() == 1
+    });
+    assert!(status.instance_status(0, 1).is_some());
+    assert!(status.instance_status(0, 0).is_none());
+}
+
+/// Scenario: a logical pipeline has fully shut down and observed state still
+/// contains an older generation alongside the final stopped generation.
+/// Guarantees: controller cleanup keeps the last stopped generation per core so
+/// `/status` remains useful after shutdown while superseded generations are
+/// released.
+#[test] +fn prune_pipeline_runtime_and_history_keeps_last_stopped_generation_view() { + let config = engine_config_with_pipeline(simple_pipeline_yaml()); + let runtime = test_runtime(&config); + let _runner = ObservedStateRunner::start(&runtime); + register_existing_pipeline(&runtime, &config); + + let pipeline_key = PipelineKey::new("g1".into(), "p1".into()); + report_stopped(&runtime, deployed_key("g1", "p1", 0, 0)); + report_stopped(&runtime, deployed_key("g1", "p1", 0, 1)); + let status = wait_for_observed_status(&runtime, &pipeline_key, |status| { + status.per_instance().len() == 2 + }); + assert!(status.instance_status(0, 0).is_some()); + assert!(status.instance_status(0, 1).is_some()); + + runtime + .observed_state_store + .set_pipeline_active_generation(pipeline_key.clone(), 1); + runtime.prune_pipeline_runtime_and_history(&pipeline_key); + + let status = wait_for_observed_status(&runtime, &pipeline_key, |status| { + status.per_instance().len() == 1 + }); + assert_eq!(status.total_cores(), 1); + assert_eq!(status.running_cores(), 0); + assert!(matches!( + status + .instance_status(0, 1) + .expect("latest stopped generation should remain") + .phase(), + PipelinePhase::Stopped + )); + assert!(status.instance_status(0, 0).is_none()); +} + +/// Scenario: a pure resize-down retires one core without changing the active +/// generation, and observed state still retains both core instances on that +/// same generation. +/// Guarantees: controller cleanup compacts observed state to the committed +/// active core footprint so `/status` stops counting the drained core as +/// serving after the resize completes. +#[test] +fn prune_pipeline_runtime_and_history_compacts_resize_down_same_generation() { + let config = engine_config_with_pipeline( + r#" + policies: + resources: + core_allocation: + type: core_count + count: 2 + nodes: + receiver: + type: "urn:test:receiver:example" + config: null + exporter: + type: "urn:test:exporter:example" + config: null + connections: + - from: receiver + to: exporter +"#, + ); + let runtime = test_runtime(&config); + let _runner = ObservedStateRunner::start(&runtime); + register_existing_pipeline(&runtime, &config); + + let pipeline_key = PipelineKey::new("g1".into(), "p1".into()); + report_ready(&runtime, deployed_key("g1", "p1", 0, 0)); + report_stopped(&runtime, deployed_key("g1", "p1", 1, 0)); + let status = wait_for_observed_status(&runtime, &pipeline_key, |status| { + status.per_instance().len() == 2 + }); + assert_eq!(status.total_cores(), 2); + assert_eq!(status.running_cores(), 1); + assert!(status.instance_status(0, 0).is_some()); + assert!(status.instance_status(1, 0).is_some()); + + runtime + .observed_state_store + .set_pipeline_active_cores(pipeline_key.clone(), [0]); + runtime.prune_pipeline_runtime_and_history(&pipeline_key); + + let status = wait_for_observed_status(&runtime, &pipeline_key, |status| { + status.per_instance().len() == 1 + }); + assert_eq!(status.total_cores(), 1); + assert_eq!(status.running_cores(), 1); + assert!(status.instance_status(0, 0).is_some()); + assert!(status.instance_status(1, 0).is_none()); +} + +/// Scenario: a runtime instance exits while a shutdown operation for the same +/// logical pipeline is still active and observed state contains overlapping +/// generations. +/// Guarantees: observed state is not compacted early, so controller wait paths +/// can continue reading generation-specific status until the shutdown finishes. 
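+/// The test simulates the in-flight operation by planting a synthetic
+/// "shutdown-0" entry in `active_shutdowns` before reporting the exit.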
+#[test] +fn note_instance_exit_does_not_compact_observed_state_while_shutdown_is_active() { + let config = engine_config_with_pipeline(simple_pipeline_yaml()); + let runtime = test_runtime(&config); + let _runner = ObservedStateRunner::start(&runtime); + register_existing_pipeline(&runtime, &config); + let _rx = + register_runtime_instance(&runtime, "g1", "p1", 0, 0, RuntimeInstanceLifecycle::Active); + + let pipeline_key = PipelineKey::new("g1".into(), "p1".into()); + report_ready(&runtime, deployed_key("g1", "p1", 0, 0)); + report_ready(&runtime, deployed_key("g1", "p1", 0, 1)); + let status = wait_for_observed_status(&runtime, &pipeline_key, |status| { + status.per_instance().len() == 2 + }); + assert!(status.instance_status(0, 0).is_some()); + assert!(status.instance_status(0, 1).is_some()); + + { + let mut state = runtime + .state + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + let _ = state + .active_shutdowns + .insert(pipeline_key.clone(), "shutdown-0".to_owned()); + } + + runtime.note_instance_exit(deployed_key("g1", "p1", 0, 0), RuntimeInstanceExit::Success); + + let status = wait_for_observed_status(&runtime, &pipeline_key, |status| { + status.per_instance().len() == 2 + }); + assert!(status.instance_status(0, 0).is_some()); + assert!(status.instance_status(0, 1).is_some()); +} + +/// Scenario: a watched runtime thread panics after the runtime instance has +/// already been admitted and marked ready in observed state. +/// Guarantees: the public runtime error message stays short while the recent +/// event stores richer panic diagnostics in `ErrorSummary::source`. +#[test] +fn runtime_thread_panic_populates_error_source_in_observed_status() { + let config = engine_config_with_pipeline(simple_pipeline_yaml()); + let runtime = test_runtime(&config); + let _runner = ObservedStateRunner::start(&runtime); + register_existing_pipeline(&runtime, &config); + + let deployed_key = deployed_key("g1", "p1", 0, 0); + let _rx = + register_runtime_instance(&runtime, "g1", "p1", 0, 0, RuntimeInstanceLifecycle::Active); + report_ready(&runtime, deployed_key.clone()); + + let pipeline_key = PipelineKey::new("g1".into(), "p1".into()); + let _ = wait_for_observed_status(&runtime, &pipeline_key, |status| { + matches!( + status + .instance_status(0, 0) + .map(|instance| instance.phase()), + Some(PipelinePhase::Running) + ) + }); + + runtime.note_instance_exit( + deployed_key, + RuntimeInstanceExit::Error(RuntimeInstanceError::from_panic(PanicReport::capture( + "runtime thread", + Box::new("boom"), + Some("pipeline-g1-p1-core-0".to_owned()), + Some(11), + Some(0), + ))), + ); + + let status = wait_for_observed_status(&runtime, &pipeline_key, |status| { + matches!( + status + .instance_status(0, 0) + .map(|instance| instance.phase()), + Some(PipelinePhase::Failed(_)) + ) + }); + let json = serde_json::to_value(&status).expect("status should serialize"); + let recent_event = &json["instances"][0]["status"]["recentEvents"][0]["Engine"]; + let error = &recent_event["type"]["Error"]["RuntimeError"]["Pipeline"]; + assert_eq!( + recent_event["message"], + "Pipeline encountered a runtime error." 
+ ); + assert_eq!(error["error_kind"], "panic"); + assert_eq!(error["message"], "runtime thread panicked: boom"); + let source = error["source"] + .as_str() + .expect("runtime panic source should be serialized"); + assert!(source.contains("thread_name=pipeline-g1-p1-core-0")); + assert!(source.contains("thread_id=11")); + assert!(source.contains("core_id=0")); + assert!(source.contains("backtrace:")); +} diff --git a/rust/otap-dataflow/crates/core-nodes/src/exporters/topic_exporter/mod.rs b/rust/otap-dataflow/crates/core-nodes/src/exporters/topic_exporter/mod.rs index e2d4102e67..7c522cad97 100644 --- a/rust/otap-dataflow/crates/core-nodes/src/exporters/topic_exporter/mod.rs +++ b/rust/otap-dataflow/crates/core-nodes/src/exporters/topic_exporter/mod.rs @@ -263,6 +263,8 @@ impl TopicExporter { match queue_on_full { TopicQueueOnFullPolicy::Block => { let published = Arc::new(data.clone_without_context()); + // Preserve a cheap uncontended fast path: only retain a blocked + // publish when the topic runtime reports real backpressure. if should_track_end_to_end { let tracked_publisher = tracked_publisher .expect("tracked publisher should exist when ack propagation is auto"); @@ -385,6 +387,10 @@ impl Exporter for TopicExporter { let mut pending_outcomes: FuturesUnordered< Pin + Send>>, > = FuturesUnordered::new(); + // The exporter owns at most one blocked publish at a time. While that + // future is waiting inside the topic runtime, the inbox is switched to + // control-only reads so shutdown stays responsive and ownership of the + // blocked pdata remains unambiguous. let mut blocked_publish: Option = None; let tracked_publisher = (ack_propagation_mode == TopicAckPropagationMode::Auto) .then(|| topic.tracked_publisher()); diff --git a/rust/otap-dataflow/crates/core-nodes/src/receivers/fake_data_generator/mod.rs b/rust/otap-dataflow/crates/core-nodes/src/receivers/fake_data_generator/mod.rs index ff0107f07c..19678fae54 100644 --- a/rust/otap-dataflow/crates/core-nodes/src/receivers/fake_data_generator/mod.rs +++ b/rust/otap-dataflow/crates/core-nodes/src/receivers/fake_data_generator/mod.rs @@ -1068,6 +1068,65 @@ mod tests { .run_validation(drain_validation); } + /// Scenario: receiver-first shutdown reaches the pre-generated hot path + /// while it is sending many batches in one iteration. + /// Guarantees: the generated send loop yields often enough for the outer + /// control select to observe `DrainIngress` promptly instead of timing out + /// behind a long uncapped send burst. 
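+    /// The scenario below drives a large pre-generated burst and then sends
+    /// `DrainIngress` with a 5 s deadline; the test only passes if the
+    /// receiver yields out of its send loop and honors the drain well inside
+    /// that window.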
+    #[test]
+    fn test_drain_ingress_exits_promptly_during_high_throughput_send_loop() {
+        let test_runtime = TestRuntime::new();
+
+        let registry_path = VirtualDirectoryPath::GitRepo {
+            url: "https://github.com/open-telemetry/semantic-conventions.git".to_owned(),
+            sub_folder: Some("model".to_owned()),
+            refspec: None,
+        };
+
+        let traffic_config = TrafficConfig::new(Some(1000), None, 1, 0, 0, 1);
+        let config = Config::new(traffic_config, registry_path)
+            .with_data_source(DataSource::Static)
+            .with_generation_strategy(GenerationStrategy::PreGenerated);
+
+        let node_config = Arc::new(NodeUserConfig::new_receiver_config(
+            OTAP_FAKE_DATA_GENERATOR_URN,
+        ));
+        let telemetry_registry_handle = TelemetryRegistryHandle::new();
+        let controller_ctx = ControllerContext::new(telemetry_registry_handle.clone());
+        let pipeline_ctx =
+            controller_ctx.pipeline_context_with("grp".into(), "pipeline".into(), 0, 1, 0);
+        let receiver = ReceiverWrapper::local(
+            FakeGeneratorReceiver::new(pipeline_ctx, config),
+            test_node("fake_receiver_hot_drain"),
+            node_config,
+            test_runtime.config(),
+        );
+
+        let drain_scenario =
+            move |ctx: TestContext| -> Pin<Box<dyn Future<Output = ()>>> {
+                Box::pin(async move {
+                    sleep(Duration::from_millis(200)).await;
+                    let deadline = std::time::Instant::now() + Duration::from_secs(5);
+                    ctx.send_control_msg(NodeControlMsg::DrainIngress {
+                        deadline,
+                        reason: "test hot drain".to_owned(),
+                    })
+                    .await
+                    .expect("Failed to send DrainIngress");
+                })
+            };
+
+        let drain_validation =
+            |_ctx: NotSendValidateContext| -> Pin<Box<dyn Future<Output = ()>>> {
+                Box::pin(async {})
+            };
+
+        test_runtime
+            .set_receiver(receiver)
+            .run_test(drain_scenario)
+            .run_validation(drain_validation);
+    }
+
     /// Regression test: verifies that a non-terminal control message
     /// (CollectTelemetry) arriving during the rate-limit sleep does NOT
     /// break the sleep early – the receiver should still respect the
diff --git a/rust/otap-dataflow/crates/core-nodes/src/receivers/topic_receiver/mod.rs b/rust/otap-dataflow/crates/core-nodes/src/receivers/topic_receiver/mod.rs
index feb1766256..a770b9ff9e 100644
--- a/rust/otap-dataflow/crates/core-nodes/src/receivers/topic_receiver/mod.rs
+++ b/rust/otap-dataflow/crates/core-nodes/src/receivers/topic_receiver/mod.rs
@@ -255,6 +255,13 @@ impl local::Receiver for TopicReceiver {
         );
         let mut draining_deadline: Option<Instant> = None;
         let mut draining_reason: Option<String> = None;
+        // These represent two different handoff stages:
+        // - `pending_forward` is one permitted topic delivery that is still
+        //   trying to enter the downstream pipeline and therefore must not be
+        //   committed yet.
+        // - `pending_tracked_message_ids` are deliveries that were already
+        //   forwarded and committed locally, but whose downstream Ack/Nack has
+        //   not been bridged back to the topic runtime yet.
         let mut pending_tracked_message_ids = HashSet::new();
         let mut pending_forward: Option = None;
 
@@ -659,6 +666,12 @@ impl local::Receiver for TopicReceiver {
                     let send_started_at = Instant::now();
                     match effect_handler.try_send_message_with_source_node(pdata) {
                         Ok(()) => {
+                            // Commit the topic delivery permit only after
+                            // the downstream pipeline accepts the message.
+                            // That keeps drain precise: an unadmitted
+                            // message can still be aborted locally
+                            // instead of turning into tracked
+                            // in-flight work.
delivery.commit(); if let Some(message_id) = tracked_message_id { _ = pending_tracked_message_ids.insert(message_id); diff --git a/rust/otap-dataflow/crates/engine/src/attributes.rs b/rust/otap-dataflow/crates/engine/src/attributes.rs index 303ce09217..761816b34d 100644 --- a/rust/otap-dataflow/crates/engine/src/attributes.rs +++ b/rust/otap-dataflow/crates/engine/src/attributes.rs @@ -91,6 +91,10 @@ pub struct PipelineAttributeSet { /// Pipeline group identifier. #[attribute] pub pipeline_group_id: Cow<'static, str>, + + /// Deployment generation for this runtime instance. + #[attribute] + pub deployment_generation: u64, } /// Node attributes. diff --git a/rust/otap-dataflow/crates/engine/src/context.rs b/rust/otap-dataflow/crates/engine/src/context.rs index 0e42692a2b..2eb733173d 100644 --- a/rust/otap-dataflow/crates/engine/src/context.rs +++ b/rust/otap-dataflow/crates/engine/src/context.rs @@ -126,6 +126,7 @@ pub struct PipelineContextParams { pub struct PipelineContext { controller_context: ControllerContext, pipeline_context_params: PipelineContextParams, + deployment_generation: u64, pipeline_telemetry_attrs: HashMap, node_id: ConfigNodeId, node_urn: NodeUrn, @@ -169,7 +170,28 @@ impl ControllerContext { num_cores: usize, thread_id: usize, ) -> PipelineContext { - PipelineContext::new( + self.pipeline_context_with_generation( + pipeline_group_id, + pipeline_id, + core_id, + num_cores, + thread_id, + 0, + ) + } + + /// Returns a new pipeline context with an explicit deployment generation. + #[must_use] + pub fn pipeline_context_with_generation( + &self, + pipeline_group_id: PipelineGroupId, + pipeline_id: PipelineId, + core_id: usize, + num_cores: usize, + thread_id: usize, + deployment_generation: u64, + ) -> PipelineContext { + PipelineContext::new_with_generation( self.clone(), PipelineContextParams { pipeline_group_id, @@ -178,6 +200,7 @@ impl ControllerContext { num_cores, thread_id, }, + deployment_generation, ) } @@ -212,13 +235,24 @@ impl ControllerContext { impl PipelineContext { /// Creates a new `PipelineContext`. + #[allow(dead_code)] pub(crate) fn new( parent_ctx: ControllerContext, pipeline_context_params: PipelineContextParams, + ) -> Self { + Self::new_with_generation(parent_ctx, pipeline_context_params, 0) + } + + /// Creates a new `PipelineContext` with an explicit deployment generation. + pub(crate) fn new_with_generation( + parent_ctx: ControllerContext, + pipeline_context_params: PipelineContextParams, + deployment_generation: u64, ) -> Self { Self { controller_context: parent_ctx, pipeline_context_params, + deployment_generation, node_id: Default::default(), node_urn: Default::default(), node_kind: Default::default(), @@ -248,6 +282,12 @@ impl PipelineContext { self.pipeline_context_params.core_id } + /// Returns the deployment generation associated with this pipeline runtime. + #[must_use] + pub const fn deployment_generation(&self) -> u64 { + self.deployment_generation + } + /// Returns the total number of cores allocated to this pipeline. 
/// /// This is useful for nodes that need to share resources (like disk budgets) @@ -451,6 +491,7 @@ impl PipelineContext { engine_attrs: self.engine_attribute_set(), pipeline_id: self.pipeline_context_params.pipeline_id.clone(), pipeline_group_id: self.pipeline_context_params.pipeline_group_id.clone(), + deployment_generation: self.deployment_generation, } } @@ -542,6 +583,7 @@ impl PipelineContext { Self { controller_context: self.controller_context.clone(), pipeline_context_params: self.pipeline_context_params.clone(), + deployment_generation: self.deployment_generation, pipeline_telemetry_attrs: self.pipeline_telemetry_attrs.clone(), node_id, node_urn, diff --git a/rust/otap-dataflow/crates/engine/src/local/message.rs b/rust/otap-dataflow/crates/engine/src/local/message.rs index c6fa64bc5f..c56e6b0a3a 100644 --- a/rust/otap-dataflow/crates/engine/src/local/message.rs +++ b/rust/otap-dataflow/crates/engine/src/local/message.rs @@ -255,4 +255,14 @@ impl LocalReceiver { LocalReceiverInner::Mpmc(receiver) => receiver.is_empty(), } } + + /// Returns `true` once all senders are gone or the channel has been + /// explicitly closed. + #[must_use] + pub fn is_closed(&self) -> bool { + match &self.inner { + LocalReceiverInner::Mpsc(receiver) => receiver.is_closed(), + LocalReceiverInner::Mpmc(receiver) => receiver.is_closed(), + } + } } diff --git a/rust/otap-dataflow/crates/engine/src/message.rs b/rust/otap-dataflow/crates/engine/src/message.rs index 9957fedabf..39acff3ea1 100644 --- a/rust/otap-dataflow/crates/engine/src/message.rs +++ b/rust/otap-dataflow/crates/engine/src/message.rs @@ -182,6 +182,15 @@ impl Receiver { Receiver::Shared(receiver) => receiver.is_empty(), } } + + /// Checks whether the receive side has observed channel closure. + #[must_use] + pub fn is_closed(&self) -> bool { + match self { + Receiver::Local(receiver) => receiver.is_closed(), + Receiver::Shared(receiver) => receiver.is_closed(), + } + } } /// Small private adapter trait used by [`InboxCore`]. @@ -201,6 +210,8 @@ trait ChannelReceiver { fn try_recv(&mut self) -> Result; fn is_empty(&self) -> bool; + + fn is_closed(&self) -> bool; } impl ChannelReceiver for Receiver { @@ -215,6 +226,10 @@ impl ChannelReceiver for Receiver { fn is_empty(&self) -> bool { Receiver::is_empty(self) } + + fn is_closed(&self) -> bool { + Receiver::is_closed(self) + } } impl ChannelReceiver for SharedReceiver { @@ -229,6 +244,10 @@ impl ChannelReceiver for SharedReceiver { fn is_empty(&self) -> bool { SharedReceiver::is_empty(self) } + + fn is_closed(&self) -> bool { + SharedReceiver::is_closed(self) + } } /// Shutdown-drain policy for [`InboxCore::recv_with_policy`]. @@ -344,10 +363,14 @@ where } fn shutdown_drain_complete(&self) -> bool { - self.pdata_rx - .as_ref() - .expect("pdata_rx must exist") - .is_empty() + let pdata_rx = self.pdata_rx.as_ref().expect("pdata_rx must exist"); + // Shutdown may only be released once no upstream sender can still + // deliver more work into this inbox. Queue emptiness alone is not + // sufficient because an upstream node can still finish one already + // admitted message outside the channel and send it after we observe an + // empty buffer. + pdata_rx.is_closed() + && pdata_rx.is_empty() && self .local_scheduler .as_ref() @@ -1095,4 +1118,58 @@ mod tests { Message::Control(NodeControlMsg::Shutdown { .. 
})
        ));
    }
+
+    /// Scenario: an exporter has latched shutdown, its bounded inbox is
+    /// temporarily empty, but an upstream sender is still alive and may still
+    /// forward one already-admitted message later.
+    /// Guarantees: the exporter does not release the latched shutdown on queue
+    /// emptiness alone; it stays alive until that late pdata arrives and the
+    /// upstream channel closes.
+    #[tokio::test]
+    async fn exporter_inbox_waits_for_upstream_closure_before_shutdown() {
+        let (control_tx, control_rx) = mpsc::Channel::<NodeControlMsg<TestMsg>>::new(16);
+        let (pdata_tx, pdata_rx) = mpsc::Channel::<TestMsg>::new(16);
+        let mut inbox = ExporterInbox::new(
+            Receiver::Local(LocalReceiver::mpsc(control_rx)),
+            Receiver::Local(LocalReceiver::mpsc(pdata_rx)),
+            9,
+            Interests::empty(),
+        );
+
+        control_tx
+            .send_async(NodeControlMsg::Shutdown {
+                deadline: Instant::now() + Duration::from_secs(1),
+                reason: "shutdown".to_owned(),
+            })
+            .await
+            .expect("shutdown should enqueue");
+
+        let pending = tokio::time::timeout(Duration::from_millis(20), inbox.recv_when(false)).await;
+        assert!(
+            pending.is_err(),
+            "shutdown should stay latched while upstream senders can still deliver pdata"
+        );
+
+        pdata_tx
+            .send_async(TestMsg::new("late"))
+            .await
+            .expect("late pdata should enqueue");
+
+        let drained = tokio::time::timeout(Duration::from_millis(50), inbox.recv_when(false))
+            .await
+            .expect("late pdata should unblock the exporter inbox")
+            .expect("late pdata should drain");
+        assert!(matches!(drained, Message::PData(TestMsg(ref value)) if value == "late"));
+
+        drop(pdata_tx);
+
+        let shutdown = tokio::time::timeout(Duration::from_millis(50), inbox.recv_when(false))
+            .await
+            .expect("shutdown should follow once upstream closes")
+            .expect("shutdown control should arrive");
+        assert!(matches!(
+            shutdown,
+            Message::Control(NodeControlMsg::Shutdown { .. })
+        ));
+    }
 }
diff --git a/rust/otap-dataflow/crates/engine/src/pipeline_ctrl.rs b/rust/otap-dataflow/crates/engine/src/pipeline_ctrl.rs
index 60827c7a35..49b209a54b 100644
--- a/rust/otap-dataflow/crates/engine/src/pipeline_ctrl.rs
+++ b/rust/otap-dataflow/crates/engine/src/pipeline_ctrl.rs
@@ -1332,6 +1332,7 @@ mod tests {
                 pipeline_group_id,
                 pipeline_id,
                 core_id,
+                deployment_generation: 0,
             },
             pipeline_context,
             pipeline_rx,
@@ -1813,6 +1814,7 @@ mod tests {
            pipeline_group_id: pipeline_group_id.clone(),
            pipeline_id: pipeline_id.clone(),
            core_id,
+            deployment_generation: 0,
        };
        let controller_context = ControllerContext::new(metrics_system.registry());
        let pipeline_context_params = PipelineContextParams {
@@ -3175,6 +3177,7 @@ mod tests {
                 pipeline_group_id,
                 pipeline_id,
                 core_id: 0,
+                deployment_generation: 0,
             },
             pipeline_context.clone(),
             pipeline_rx,
@@ -3416,6 +3419,7 @@ mod tests {
                 pipeline_group_id,
                 pipeline_id,
                 core_id: 0,
+                deployment_generation: 0,
             },
             pipeline_context,
             pipeline_rx,
@@ -3492,6 +3496,7 @@ mod tests {
                 pipeline_group_id,
                 pipeline_id,
                 core_id: 0,
+                deployment_generation: 0,
             },
             pipeline_context,
             pipeline_rx,
diff --git a/rust/otap-dataflow/crates/engine/src/shared/message.rs b/rust/otap-dataflow/crates/engine/src/shared/message.rs
index 5722e307c7..23a6139227 100644
--- a/rust/otap-dataflow/crates/engine/src/shared/message.rs
+++ b/rust/otap-dataflow/crates/engine/src/shared/message.rs
@@ -267,6 +267,16 @@ impl SharedReceiver {
             SharedReceiverInner::Mpmc(receiver) => receiver.is_empty(),
         }
     }
+
+    /// Returns `true` once all senders are gone or the channel has been
+    /// explicitly closed.
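+    ///
+    /// A closed channel can still hold buffered items, so shutdown paths
+    /// should pair this with `is_empty` (as `InboxCore::shutdown_drain_complete`
+    /// does) before releasing a latched shutdown.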
+ #[must_use] + pub fn is_closed(&self) -> bool { + match &self.inner { + SharedReceiverInner::Mpsc(receiver) => receiver.is_closed(), + SharedReceiverInner::Mpmc(receiver) => receiver.is_disconnected(), + } + } } #[cfg(test)] diff --git a/rust/otap-dataflow/crates/engine/src/testing/dst/common.rs b/rust/otap-dataflow/crates/engine/src/testing/dst/common.rs index b53f9530e7..14276dc1e3 100644 --- a/rust/otap-dataflow/crates/engine/src/testing/dst/common.rs +++ b/rust/otap-dataflow/crates/engine/src/testing/dst/common.rs @@ -139,6 +139,7 @@ pub(super) fn build_manager( pipeline_group_id, pipeline_id, core_id: 0, + deployment_generation: 0, }, pipeline_context.clone(), pipeline_rx, diff --git a/rust/otap-dataflow/crates/engine/src/testing/exporter.rs b/rust/otap-dataflow/crates/engine/src/testing/exporter.rs index 6c239ad38d..dbf043ba86 100644 --- a/rust/otap-dataflow/crates/engine/src/testing/exporter.rs +++ b/rust/otap-dataflow/crates/engine/src/testing/exporter.rs @@ -40,7 +40,7 @@ pub struct TestContext { /// Sender for control messages control_tx: Sender>, /// Sender for pipeline data - pdata_tx: Sender, + pdata_tx: Option>, /// Message counter for tracking processed messages counters: CtrlMsgCounters, /// Receiver for runtime control messages @@ -71,7 +71,7 @@ impl TestContext { ) -> Self { Self { control_tx, - pdata_tx, + pdata_tx: Some(pdata_tx), counters, runtime_ctrl_msg_receiver: None, pipeline_completion_msg_receiver: None, @@ -142,7 +142,11 @@ impl TestContext { /// /// Returns an error if the message could not be sent. pub async fn send_pdata(&self, content: PData) -> Result<(), SendError> { - self.pdata_tx.send(content).await + self.pdata_tx + .as_ref() + .expect("pdata sender must exist during the active test phase") + .send(content) + .await } /// Sleeps for the specified duration. @@ -362,10 +366,15 @@ impl ValidationPhase { let ValidationPhase { rt, local_tasks, - context, + mut context, run_exporter_handle, } = self; + // Validation does not drive new pdata. Drop the harness-side sender + // clone before waiting for exporter shutdown so tests do not keep the + // exporter input channel artificially open after the scenario finishes. + let _ = context.pdata_tx.take(); + // First run all the spawned tasks to completion rt.block_on(local_tasks); diff --git a/rust/otap-dataflow/crates/engine/src/topic/topic.rs b/rust/otap-dataflow/crates/engine/src/topic/topic.rs index 806e77a5ad..5503aac5b9 100644 --- a/rust/otap-dataflow/crates/engine/src/topic/topic.rs +++ b/rust/otap-dataflow/crates/engine/src/topic/topic.rs @@ -692,8 +692,10 @@ impl MixedTopic { } // Acquire balanced-group capacity atomically from the publisher's point of - // view: if one group is full, drop any partial acquisitions before waiting - // and retry from a fresh mixed-topic snapshot. + // view. Mixed topics are intentionally all-or-nothing across balanced and + // broadcast delivery, so publishers must not keep permits reserved in + // "fast" groups while they wait on a "slow" one. Drop any partial + // acquisitions before awaiting capacity and retry from a fresh snapshot. async fn acquire_balanced_admission( &self, ) -> Result<(Arc<[Arc>]>, BalancedPermitVec), Error> { @@ -750,6 +752,9 @@ impl MixedTopic { } } + // Broadcast is intentionally last so mixed async publish has the same + // visible contract as mixed try_publish: no broadcast subscriber can + // observe a message before the balanced side has admitted it. 
self.broadcast_ring.publish(Envelope { id, tracked: false, @@ -772,6 +777,9 @@ impl MixedTopic { let id = self.next_message_id(); if self.has_balanced_groups.load(Ordering::Acquire) { let (groups, permits) = self.acquire_balanced_admission().await?; + // Tracked publish capacity is consumed only after admission + // succeeds. Waiting on a full balanced group must not hand back a + // receipt for a message that has not been accepted anywhere yet. let receipt = self.outcomes.register(id, timeout, permit); let envelope = Envelope { id, @@ -821,6 +829,8 @@ impl MixedTopic { }; let (permits, blocking_group) = Self::try_acquire_balanced_admission(&groups)?; if blocking_group.is_some() { + // Keep try_publish all-or-nothing for mixed topics: if balanced + // admission fails, nothing is published to broadcast either. Ok((PublishOutcome::DroppedOnFull, id)) } else { for (group, permit) in groups.as_ref().iter().zip(permits) { diff --git a/rust/otap-dataflow/crates/otap/tests/core_node_liveness_tests.rs b/rust/otap-dataflow/crates/otap/tests/core_node_liveness_tests.rs index e0c45127dd..2a39bfe532 100644 --- a/rust/otap-dataflow/crates/otap/tests/core_node_liveness_tests.rs +++ b/rust/otap-dataflow/crates/otap/tests/core_node_liveness_tests.rs @@ -231,6 +231,7 @@ fn run_pipeline_with_condition( pipeline_group_id: pipeline_group_id.clone(), pipeline_id: pipeline_id.clone(), core_id: 0, + deployment_generation: 0, }; let metrics_reporter = telemetry_system.reporter(); let event_reporter = observed_state_store.reporter(SendPolicy::default()); @@ -365,6 +366,7 @@ where pipeline_group_id: pipeline_group_id.clone(), pipeline_id: pipeline_id.clone(), core_id: 0, + deployment_generation: 0, }; let metrics_reporter = telemetry_system.reporter(); let event_reporter = observed_state_store.reporter(SendPolicy::default()); diff --git a/rust/otap-dataflow/crates/otap/tests/durable_buffer_processor_tests.rs b/rust/otap-dataflow/crates/otap/tests/durable_buffer_processor_tests.rs index a7ffbca265..021c896ebf 100644 --- a/rust/otap-dataflow/crates/otap/tests/durable_buffer_processor_tests.rs +++ b/rust/otap-dataflow/crates/otap/tests/durable_buffer_processor_tests.rs @@ -510,6 +510,7 @@ where pipeline_group_id: pipeline_group_id.clone(), pipeline_id: pipeline_id.clone(), core_id: 0, + deployment_generation: 0, }; // Create a metrics reporter with our own receiver so we can inspect metrics. 
// Use a very large channel so it never overflows, even on extremely slow CI @@ -825,6 +826,7 @@ where pipeline_group_id: pipeline_group_id.clone(), pipeline_id: pipeline_id.clone(), core_id: 0, + deployment_generation: 0, }; let metrics_reporter = telemetry_system.reporter(); let event_reporter = observed_state_store.reporter(SendPolicy::default()); diff --git a/rust/otap-dataflow/crates/otap/tests/pipeline_tests.rs b/rust/otap-dataflow/crates/otap/tests/pipeline_tests.rs index dc3d111cb2..c308b37e70 100644 --- a/rust/otap-dataflow/crates/otap/tests/pipeline_tests.rs +++ b/rust/otap-dataflow/crates/otap/tests/pipeline_tests.rs @@ -87,6 +87,7 @@ fn test_telemetry_registries_cleanup() { pipeline_group_id, pipeline_id, core_id: 0, + deployment_generation: 0, }; let metrics_reporter = telemetry_system.reporter(); let event_reporter = observed_state_store.reporter(SendPolicy::default()); diff --git a/rust/otap-dataflow/crates/state/src/pipeline_rt_status.rs b/rust/otap-dataflow/crates/state/src/pipeline_rt_status.rs index 54acc4d79a..8b2fd73339 100644 --- a/rust/otap-dataflow/crates/state/src/pipeline_rt_status.rs +++ b/rust/otap-dataflow/crates/state/src/pipeline_rt_status.rs @@ -269,24 +269,33 @@ impl PipelineRuntimeStatus { } } + /// Returns the current Accepted condition for this runtime instance. #[must_use] - pub(crate) const fn accepted_condition(&self) -> &ConditionState { + pub const fn accepted_condition(&self) -> &ConditionState { &self.accepted_condition } #[must_use] - pub(crate) const fn ready_condition(&self) -> &ConditionState { + /// Returns the current Ready condition for this runtime instance. + pub const fn ready_condition(&self) -> &ConditionState { &self.ready_condition } #[must_use] - pub(crate) fn conditions(&self) -> [Condition; 2] { + /// Returns the serialized condition pair for this runtime instance. + pub fn conditions(&self) -> [Condition; 2] { [ Condition::from_state(ConditionKind::Accepted, &self.accepted_condition), Condition::from_state(ConditionKind::Ready, &self.ready_condition), ] } + /// Returns the current phase for this runtime instance. + #[must_use] + pub const fn phase(&self) -> PipelinePhase { + self.phase + } + /// Apply a single observed event to this pipeline. /// Returns what changed (if anything) or an Error::InvalidTransition. 
pub(crate) fn apply(&mut self, event_type: EventType) -> Result { @@ -382,6 +391,9 @@ impl PipelineRuntimeStatus { (PipelinePhase::Draining, EventType::Error(ErrEv::DrainError(_))) => { self.goto(PipelinePhase::Failed(FailReason::DrainError)) } + (PipelinePhase::Draining, EventType::Error(ErrEv::RuntimeError(_))) => { + self.goto(PipelinePhase::Failed(FailReason::RuntimeError)) + } (PipelinePhase::Draining, EventType::Request(Req::DeleteRequested)) => { if !self.delete_pending { self.delete_pending = true; @@ -680,6 +692,29 @@ mod tests { assert_eq!(p2.phase, PipelinePhase::Failed(FailReason::DeleteError)); } + #[test] + fn draining_runtime_error_becomes_terminal_failure() { + let mut p = PipelineRuntimeStatus::default(); + p.apply_many([ + EventType::Success(OkEv::Admitted), + EventType::Success(OkEv::Ready), + EventType::Request(Req::ShutdownRequested), + ]) + .unwrap(); + + _ = p + .apply(EventType::Error(ErrEv::RuntimeError( + ErrorSummary::Pipeline { + error_kind: "".to_string(), + message: "late send failed during shutdown".to_string(), + source: None, + }, + ))) + .unwrap(); + + assert_eq!(p.phase, PipelinePhase::Failed(FailReason::RuntimeError)); + } + #[test] fn invalid_transition_is_error() { let mut p = PipelineRuntimeStatus::default(); // Pending diff --git a/rust/otap-dataflow/crates/state/src/pipeline_status.rs b/rust/otap-dataflow/crates/state/src/pipeline_status.rs index 1c281a69f8..230bdb3797 100644 --- a/rust/otap-dataflow/crates/state/src/pipeline_status.rs +++ b/rust/otap-dataflow/crates/state/src/pipeline_status.rs @@ -1,7 +1,7 @@ // Copyright The OpenTelemetry Authors // SPDX-License-Identifier: Apache-2.0 -//! Observed pipeline status and aggregation logic per core. +//! Observed pipeline status and aggregation logic per runtime instance. use crate::conditions::{ Condition, ConditionKind, ConditionReason, ConditionState, ConditionStatus, @@ -12,15 +12,81 @@ use otap_df_config::CoreId; use otap_df_config::health::{HealthPolicy, PhaseKind, Quorum}; use serde::Serialize; use serde::ser::SerializeStruct; -use std::collections::HashMap; +use std::collections::{HashMap, HashSet}; use std::time::SystemTime; -/// Aggregated, controller-synthesized view for a pipeline across all targeted -/// cores. This is what external APIs will return for `status`. +/// Unique runtime-instance key for a logical pipeline. +#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)] +pub struct RuntimeInstanceKey { + /// CPU core hosting the runtime instance. + pub core_id: CoreId, + /// Deployment generation for this runtime instance. + pub deployment_generation: u64, +} + +/// Rollout state summary exposed on pipeline status snapshots. +#[derive(Debug, Clone, Serialize, PartialEq, Eq)] +#[serde(rename_all = "snake_case")] +pub enum PipelineRolloutState { + /// Rollout has been accepted but work has not started yet. + Pending, + /// Rollout is actively applying changes. + Running, + /// Rollout completed successfully and the target generation is serving. + Succeeded, + /// Rollout failed before completion. + Failed, + /// Automatic rollback is in progress. + RollingBack, + /// Rollback could not restore a fully healthy serving set. + RollbackFailed, +} + +/// Lightweight rollout summary embedded into `/status` pipeline payloads. +#[derive(Debug, Clone, Serialize, PartialEq, Eq)] +#[serde(rename_all = "camelCase")] +pub struct PipelineRolloutSummary { + /// Controller-assigned rollout identifier. + pub rollout_id: String, + /// Current rollout lifecycle state. 
+    pub state: PipelineRolloutState,
+    /// Candidate generation being rolled out.
+    pub target_generation: u64,
+    /// RFC3339 timestamp for rollout creation.
+    pub started_at: String,
+    /// RFC3339 timestamp for the latest rollout state transition.
+    pub updated_at: String,
+    /// Human-readable failure or rollback reason when present.
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub failure_reason: Option<String>,
+}
+
+#[derive(Debug, Serialize)]
+#[serde(rename_all = "camelCase")]
+struct RuntimeInstanceStatusView<'a> {
+    core_id: CoreId,
+    deployment_generation: u64,
+    status: &'a PipelineRuntimeStatus,
+}
+
+/// Aggregated, controller-synthesized view for a logical pipeline.
 #[derive(Debug, Clone)]
 pub struct PipelineStatus {
-    /// Per-core details to aid debugging and aggregation.
-    pub(crate) cores: HashMap<CoreId, PipelineRuntimeStatus>,
+    /// Per-instance details to aid debugging and overlap-aware generation aggregation.
+    pub(crate) instances: HashMap<RuntimeInstanceKey, PipelineRuntimeStatus>,
+
+    /// Serving generation selected per core by the controller during rollout.
+    pub(crate) serving_generations: HashMap<CoreId, u64>,
+
+    /// Last committed generation for this logical pipeline.
+    pub(crate) active_generation: Option<u64>,
+
+    /// Committed core footprint for the active generation when no rollout-specific
+    /// per-core serving override is active.
+    pub(crate) active_cores: HashSet<CoreId>,
+
+    /// Optional rollout summary for UI/API consumers.
+    pub(crate) rollout: Option<PipelineRolloutSummary>,
 
     health_policy: HealthPolicy,
 }
@@ -28,40 +94,111 @@ impl PipelineStatus {
     pub(crate) fn new(health_policy: HealthPolicy) -> Self {
         Self {
-            cores: HashMap::new(),
+            instances: HashMap::new(),
+            serving_generations: HashMap::new(),
+            active_generation: None,
+            active_cores: HashSet::new(),
+            rollout: None,
             health_policy,
         }
     }
 
-    /// Returns the current per-core status map.
+    /// Returns the current per-instance status map.
     #[must_use]
-    pub const fn per_core(&self) -> &HashMap<CoreId, PipelineRuntimeStatus> {
-        &self.cores
+    pub const fn per_instance(&self) -> &HashMap<RuntimeInstanceKey, PipelineRuntimeStatus> {
+        &self.instances
+    }
+
+    /// Returns the current serving generation map keyed by core.
+    #[must_use]
+    pub const fn serving_generations(&self) -> &HashMap<CoreId, u64> {
+        &self.serving_generations
+    }
+
+    /// Returns the committed active generation, if known.
+    #[must_use]
+    pub const fn active_generation(&self) -> Option<u64> {
+        self.active_generation
+    }
+
+    /// Returns the status for one observed runtime instance generation on a core.
+    #[must_use]
+    pub fn instance_status(
+        &self,
+        core_id: CoreId,
+        deployment_generation: u64,
+    ) -> Option<&PipelineRuntimeStatus> {
+        self.instances.get(&RuntimeInstanceKey {
+            core_id,
+            deployment_generation,
+        })
+    }
+
+    /// Records the committed active generation for this logical pipeline.
+    pub(crate) fn set_active_generation(&mut self, generation: u64) {
+        self.active_generation = Some(generation);
+    }
+
+    /// Records the committed serving core footprint for the active generation.
+    pub(crate) fn set_active_cores<I>(&mut self, core_ids: I)
+    where
+        I: IntoIterator<Item = CoreId>,
+    {
+        self.active_cores = core_ids.into_iter().collect();
+    }
+
+    /// Pins the serving generation chosen for one logical core.
+    pub(crate) fn set_serving_generation(&mut self, core_id: CoreId, generation: u64) {
+        _ = self.serving_generations.insert(core_id, generation);
+    }
+
+    /// Removes the serving-generation override for one logical core.
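+    /// After the override is gone, `selected_runtime_keys` falls back to the
+    /// committed active generation (or the highest observed generation) for
+    /// this core.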
+ pub(crate) fn clear_serving_generation(&mut self, core_id: CoreId) { + let _ = self.serving_generations.remove(&core_id); + } + + /// Stores the rollout summary currently exposed for this pipeline. + pub(crate) fn set_rollout_summary(&mut self, rollout: PipelineRolloutSummary) { + self.rollout = Some(rollout); + } + + /// Clears the rollout summary once no rollout is active anymore. + pub(crate) fn clear_rollout_summary(&mut self) { + self.rollout = None; + } + + /// Compacts retained runtime instances down to the generations currently + /// selected for status aggregation. + pub(crate) fn compact_instances_to_selected(&mut self) { + let retained: HashSet<_> = self.selected_runtime_keys().into_iter().collect(); + self.instances.retain(|key, _| retained.contains(key)); } #[must_use] - /// Returns the number of cores currently tracked for this pipeline. + /// Returns the number of currently serving cores for this logical pipeline. pub fn total_cores(&self) -> usize { - self.cores.len() + self.selected_runtimes().len() } #[must_use] - /// Returns how many cores are presently in the running phase. + /// Returns how many serving cores are presently in the running phase. pub fn running_cores(&self) -> usize { - self.cores - .values() - .filter(|c| matches!(c.phase, PipelinePhase::Running)) + self.selected_runtimes() + .into_iter() + .filter(|(_, runtime)| matches!(runtime.phase, PipelinePhase::Running)) .count() } #[must_use] - /// Returns true if all cores have reached a terminal state (Stopped, Deleted, Failed, or Rejected). - /// Returns false if there are no cores tracked or if any core is still active. + /// Returns true if all observed runtime instances have reached a terminal state. pub fn is_terminated(&self) -> bool { - if self.cores.is_empty() { + if self.instances.is_empty() { return false; } - self.cores.values().all(|c| c.phase.is_terminal()) + self.instances + .values() + .all(|runtime| runtime.phase.is_terminal()) } #[must_use] @@ -73,8 +210,10 @@ impl PipelineStatus { ] } + /// Aggregates the accepted condition across the selected serving runtimes. fn aggregate_accepted_condition(&self) -> Condition { - if self.cores.is_empty() { + let selected = self.selected_runtimes(); + if selected.is_empty() { return Condition { kind: ConditionKind::Accepted, status: ConditionStatus::Unknown, @@ -89,7 +228,7 @@ impl PipelineStatus { let mut any_unknown: Option = None; let mut latest_true_time: Option = None; - for runtime in self.cores.values() { + for (_, runtime) in selected { let cond = runtime.accepted_condition().clone(); match cond.status { ConditionStatus::True => { @@ -122,7 +261,10 @@ impl PipelineStatus { status: ConditionStatus::False, reason: state.reason.clone().or(Some(ConditionReason::NotAccepted)), message: state.message.clone().or_else(|| { - Some("One or more cores have not accepted the configuration.".to_string()) + Some( + "One or more serving cores have not accepted the configuration." 
+ .to_string(), + ) }), last_transition_time: state.last_transition_time, }; @@ -136,10 +278,9 @@ impl PipelineStatus { .reason .clone() .or_else(|| Some(ConditionReason::unknown("Unknown"))), - message: state - .message - .clone() - .or_else(|| Some("Acceptance is unknown for one or more cores.".to_string())), + message: state.message.clone().or_else(|| { + Some("Acceptance is unknown for one or more serving cores.".to_string()) + }), last_transition_time: state.last_transition_time, }; } @@ -149,15 +290,17 @@ impl PipelineStatus { status: ConditionStatus::True, reason: Some(ConditionReason::ConfigValid), message: Some( - "Pipeline configuration validated and resource policy constraints are satisfied." + "Serving pipeline configuration validated and resource policy constraints are satisfied." .to_string(), ), last_transition_time: latest_true_time, } } + /// Aggregates the ready condition across the selected serving runtimes. fn aggregate_ready_condition(&self) -> Condition { - if self.cores.is_empty() { + let selected = self.selected_runtimes(); + if selected.is_empty() { return Condition { kind: ConditionKind::Ready, status: ConditionStatus::Unknown, @@ -167,8 +310,9 @@ impl PipelineStatus { }; } - let (ready_numer, ready_denom) = self.count_quorum(|c| { - c.phase.kind() != PhaseKind::Deleted && self.health_policy.is_ready(c.phase.kind()) + let (ready_numer, ready_denom) = self.count_quorum_from(&selected, |runtime| { + runtime.phase.kind() != PhaseKind::Deleted + && self.health_policy.is_ready(runtime.phase.kind()) }); let required = required_ready_count(self.health_policy.ready_quorum, ready_denom); let readiness_met = ready_denom > 0 && ready_numer >= required; @@ -178,7 +322,7 @@ impl PipelineStatus { let mut latest_false_time: Option = None; let mut latest_unknown: Option = None; - for runtime in self.cores.values() { + for (_, runtime) in selected { let cond = runtime.ready_condition().clone(); match cond.status { ConditionStatus::True => { @@ -224,14 +368,16 @@ impl PipelineStatus { kind: ConditionKind::Ready, status: ConditionStatus::False, reason: Some(ConditionReason::NoActiveCores), - message: Some("No active cores are available to evaluate readiness.".to_string()), + message: Some( + "No active serving cores are available to evaluate readiness.".to_string(), + ), last_transition_time: last_time, }; } if let Some(state) = latest_false { let message = format!( - "Pipeline is not ready; ready quorum {} not met ({} of {} cores ready).", + "Pipeline is not ready; ready quorum {} not met ({} of {} serving cores ready).", describe_quorum(self.health_policy.ready_quorum, required), ready_numer, ready_denom @@ -253,10 +399,9 @@ impl PipelineStatus { .reason .clone() .or_else(|| Some(ConditionReason::unknown("Unknown"))), - message: state - .message - .clone() - .or_else(|| Some("Readiness is unknown for one or more cores.".to_string())), + message: state.message.clone().or_else(|| { + Some("Readiness is unknown for one or more serving cores.".to_string()) + }), last_transition_time: state.last_transition_time, }; } @@ -270,43 +415,121 @@ impl PipelineStatus { } } - /// Returns a boolean representing the liveness across cores, governed by the aggregation - /// policy. + /// Returns a boolean representing the liveness across serving cores. 
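+    /// Superseded generations that are still retained for debugging do not
+    /// count against the quorum; only the selected serving set is evaluated.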
     #[must_use]
     pub fn liveness(&self) -> bool {
-        let (numer, denom) = self.count_quorum(|c| self.health_policy.is_live(c.phase.kind()));
+        let selected = self.selected_runtimes();
+        let (numer, denom) = self.count_quorum_from(&selected, |runtime| {
+            self.health_policy.is_live(runtime.phase.kind())
+        });
         quorum_satisfied(numer, denom, self.health_policy.live_quorum)
     }
 
-    /// Returns a boolean representing the readiness across cores, governed by the aggregation
-    /// policy.
+    /// Returns a boolean representing the readiness across serving cores.
     #[must_use]
     pub fn readiness(&self) -> bool {
-        let (numer, denom) = self.count_quorum(|c| {
-            c.phase.kind() != PhaseKind::Deleted && self.health_policy.is_ready(c.phase.kind())
+        let selected = self.selected_runtimes();
+        let (numer, denom) = self.count_quorum_from(&selected, |runtime| {
+            runtime.phase.kind() != PhaseKind::Deleted
+                && self.health_policy.is_ready(runtime.phase.kind())
         });
         denom > 0 && quorum_satisfied(numer, denom, self.health_policy.ready_quorum)
     }
 
-    /// Counts how many cores satisfy the given predicate, returning (numerator, denominator).
-    ///
-    /// The denominator excludes cores in `Deleted` phase.
-    /// The numerator excludes cores in `Deleted` phase and counts only cores satisfying the
-    /// predicate. The predicate is usually checking for liveness or readiness.
-    fn count_quorum<F>(&self, pred: F) -> (usize, usize)
+    /// Selects the runtime instances that currently represent this logical pipeline.
+    fn selected_runtime_keys(&self) -> Vec<RuntimeInstanceKey> {
+        if !self.serving_generations.is_empty() {
+            return self
+                .serving_generations
+                .iter()
+                .map(|(core_id, generation)| RuntimeInstanceKey {
+                    core_id: *core_id,
+                    deployment_generation: *generation,
+                })
+                .filter(|key| self.instances.contains_key(key))
+                .collect();
+        }
+
+        if let Some(active_generation) = self.active_generation {
+            let selected: Vec<_> = self
+                .instances
+                .iter()
+                .filter(|(key, _)| {
+                    key.deployment_generation == active_generation
+                        && (self.active_cores.is_empty()
+                            || self.active_cores.contains(&key.core_id))
+                })
+                .map(|(key, _)| *key)
+                .collect();
+            if !selected.is_empty() {
+                return selected;
+            }
+        }
+
+        let mut per_core: HashMap<CoreId, RuntimeInstanceKey> = HashMap::new();
+        for key in self.instances.keys() {
+            if !self.active_cores.is_empty() && !self.active_cores.contains(&key.core_id) {
+                continue;
+            }
+            let replace = per_core
+                .get(&key.core_id)
+                .is_none_or(|existing| key.deployment_generation > existing.deployment_generation);
+            if replace {
+                _ = per_core.insert(key.core_id, *key);
+            }
+        }
+        per_core.into_values().collect()
+    }
+
+    /// Selects the runtime instances that should contribute to aggregated status.
+    fn selected_runtimes(&self) -> Vec<(RuntimeInstanceKey, &PipelineRuntimeStatus)> {
+        self.selected_runtime_keys()
+            .into_iter()
+            .filter_map(|key| self.instances.get(&key).map(|runtime| (key, runtime)))
+            .collect()
+    }
+
+    /// Builds a per-core view of the selected runtime instances.
+    fn selected_core_map(&self) -> HashMap<CoreId, PipelineRuntimeStatus> {
+        self.selected_runtimes()
+            .into_iter()
+            .map(|(key, runtime)| (key.core_id, runtime.clone()))
+            .collect()
+    }
+
+    /// Builds the retained per-instance view exposed for overlap-aware status
+    /// debugging.
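+    /// Unlike the aggregated `cores` field, entries here are sorted by
+    /// `(core_id, deployment_generation)` and may include superseded
+    /// generations that have not been compacted away yet.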
+ fn retained_instance_views(&self) -> Vec> { + let mut instances = self + .instances + .iter() + .map(|(key, status)| RuntimeInstanceStatusView { + core_id: key.core_id, + deployment_generation: key.deployment_generation, + status, + }) + .collect::>(); + instances.sort_by_key(|instance| (instance.core_id, instance.deployment_generation)); + instances + } + + /// Counts how many selected runtimes satisfy a quorum predicate. + fn count_quorum_from( + &self, + selected: &[(RuntimeInstanceKey, &PipelineRuntimeStatus)], + pred: F, + ) -> (usize, usize) where F: Fn(&PipelineRuntimeStatus) -> bool, { - let denom = self - .cores - .values() - .filter(|c| c.phase.kind() != PhaseKind::Deleted) + let denom = selected + .iter() + .filter(|(_, runtime)| runtime.phase.kind() != PhaseKind::Deleted) .count(); - let numer = self - .cores - .values() - .filter(|c| c.phase.kind() != PhaseKind::Deleted) - .filter(|c| pred(c)) + let numer = selected + .iter() + .filter(|(_, runtime)| runtime.phase.kind() != PhaseKind::Deleted) + .filter(|(_, runtime)| pred(runtime)) .count(); (numer, denom) } @@ -368,12 +591,18 @@ impl Serialize for PipelineStatus { where S: serde::Serializer, { - let mut state = serializer.serialize_struct("PipelineStatus", 5)?; - let conditions = self.conditions(); - state.serialize_field("conditions", &conditions)?; + let selected_cores = self.selected_core_map(); + let retained_instances = self.retained_instance_views(); + + let mut state = serializer.serialize_struct("PipelineStatus", 8)?; + state.serialize_field("conditions", &self.conditions())?; state.serialize_field("totalCores", &self.total_cores())?; state.serialize_field("runningCores", &self.running_cores())?; - state.serialize_field("cores", &self.cores)?; + state.serialize_field("cores", &selected_cores)?; + state.serialize_field("instances", &retained_instances)?; + state.serialize_field("activeGeneration", &self.active_generation)?; + state.serialize_field("servingGenerations", &self.serving_generations)?; + state.serialize_field("rollout", &self.rollout)?; state.end() } } @@ -383,7 +612,6 @@ mod tests { use super::*; use crate::conditions::{ConditionKind, ConditionReason, ConditionState, ConditionStatus}; use crate::phase::FailReason; - use std::collections::HashMap; use std::time::{Duration, SystemTime}; fn runtime(phase: PipelinePhase) -> PipelineRuntimeStatus { @@ -393,11 +621,23 @@ mod tests { } } + fn insert_runtime( + status: &mut PipelineStatus, + core_id: CoreId, + generation: u64, + runtime: PipelineRuntimeStatus, + ) { + _ = status.instances.insert( + RuntimeInstanceKey { + core_id, + deployment_generation: generation, + }, + runtime, + ); + } + fn new_status(policy: HealthPolicy) -> PipelineStatus { - PipelineStatus { - cores: HashMap::new(), - health_policy: policy, - } + PipelineStatus::new(policy) } fn runtime_with_conditions( @@ -440,24 +680,31 @@ mod tests { ready_quorum: Quorum::Percent(100), }; let mut status = new_status(policy); - _ = status.cores.insert(0, runtime(PipelinePhase::Running)); - _ = status.cores.insert(1, runtime(PipelinePhase::Running)); - _ = status - .cores - .insert(2, runtime(PipelinePhase::Failed(FailReason::RuntimeError))); - _ = status.cores.insert(3, runtime(PipelinePhase::Deleted)); + insert_runtime(&mut status, 0, 0, runtime(PipelinePhase::Running)); + insert_runtime(&mut status, 1, 0, runtime(PipelinePhase::Running)); + insert_runtime( + &mut status, + 2, + 0, + runtime(PipelinePhase::Failed(FailReason::RuntimeError)), + ); + insert_runtime(&mut status, 3, 0, 
runtime(PipelinePhase::Deleted)); + status.set_active_generation(0); assert!(status.liveness()); - _ = status - .cores - .insert(1, runtime(PipelinePhase::Failed(FailReason::RuntimeError))); + insert_runtime( + &mut status, + 1, + 0, + runtime(PipelinePhase::Failed(FailReason::RuntimeError)), + ); assert!(!status.liveness()); } #[test] - fn readiness_requires_all_non_deleted_cores_to_be_ready() { + fn readiness_requires_all_selected_cores_to_be_ready() { let policy = HealthPolicy { live_if: vec![PhaseKind::Running], ready_if: vec![PhaseKind::Running], @@ -465,21 +712,24 @@ mod tests { ready_quorum: Quorum::Percent(100), }; let mut status = new_status(policy); - _ = status.cores.insert(0, runtime(PipelinePhase::Running)); - _ = status.cores.insert(1, runtime(PipelinePhase::Running)); + insert_runtime(&mut status, 0, 0, runtime(PipelinePhase::Running)); + insert_runtime(&mut status, 1, 0, runtime(PipelinePhase::Running)); + status.set_active_generation(0); assert!(status.readiness()); - _ = status.cores.insert(1, runtime(PipelinePhase::Updating)); + insert_runtime(&mut status, 1, 0, runtime(PipelinePhase::Updating)); assert!(!status.readiness()); } #[test] - fn aggregated_accept_condition_false_if_any_core_not_accepted() { + fn aggregated_accept_condition_false_if_any_serving_core_not_accepted() { let policy = HealthPolicy::default(); let mut status = new_status(policy); - _ = status.cores.insert( + insert_runtime( + &mut status, + 0, 0, runtime_with_conditions( PipelinePhase::Running, @@ -491,8 +741,10 @@ mod tests { Some(ts(10)), ), ); - _ = status.cores.insert( + insert_runtime( + &mut status, 1, + 0, runtime_with_conditions( PipelinePhase::Pending, ConditionStatus::False, @@ -503,6 +755,7 @@ mod tests { Some(ts(20)), ), ); + status.set_active_generation(0); let accepted = status .conditions() @@ -524,7 +777,9 @@ mod tests { ready_quorum: Quorum::Percent(100), }; let mut status = new_status(policy); - _ = status.cores.insert( + insert_runtime( + &mut status, + 0, 0, runtime_with_conditions( PipelinePhase::Running, @@ -536,8 +791,10 @@ mod tests { Some(ts(5)), ), ); - _ = status.cores.insert( + insert_runtime( + &mut status, 1, + 0, runtime_with_conditions( PipelinePhase::Failed(FailReason::RuntimeError), ConditionStatus::True, @@ -548,6 +805,7 @@ mod tests { Some(ts(12)), ), ); + status.set_active_generation(0); let ready = status .conditions() @@ -565,4 +823,184 @@ mod tests { ); assert_eq!(ready.last_transition_time, Some(ts(12))); } + + /// Scenario: observed state contains overlapping generations during a + /// mixed-generation rollout and the controller marks which generation is + /// serving on each core. + /// Guarantees: aggregation selects the serving generation per core so + /// total/running core counts and readiness reflect the active serving set. 
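+    /// Core 0 serves generation 1 while core 1 still serves generation 0; the
+    /// retained stopped generation-0 instance on core 0 must not drag
+    /// readiness down.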
+ #[test] + fn serving_generation_selection_supports_mixed_blue_green_rollout() { + let mut status = new_status(HealthPolicy::default()); + insert_runtime(&mut status, 0, 0, runtime(PipelinePhase::Stopped)); + insert_runtime(&mut status, 0, 1, runtime(PipelinePhase::Running)); + insert_runtime(&mut status, 1, 0, runtime(PipelinePhase::Running)); + status.set_active_generation(0); + status.set_serving_generation(0, 1); + status.set_serving_generation(1, 0); + + assert_eq!(status.total_cores(), 2); + assert_eq!(status.running_cores(), 2); + assert!(status.readiness()); + } + + /// Scenario: observed state contains multiple generations for the same + /// cores while the controller has already pinned the serving generation on + /// each core. + /// Guarantees: compaction retains only the serving `(core, generation)` + /// instances and removes superseded generations from retained state. + #[test] + fn compact_instances_retains_only_serving_generations() { + let mut status = new_status(HealthPolicy::default()); + insert_runtime(&mut status, 0, 0, runtime(PipelinePhase::Stopped)); + insert_runtime(&mut status, 0, 1, runtime(PipelinePhase::Running)); + insert_runtime(&mut status, 1, 0, runtime(PipelinePhase::Running)); + insert_runtime(&mut status, 1, 1, runtime(PipelinePhase::Stopped)); + status.set_active_generation(0); + status.set_serving_generation(0, 1); + status.set_serving_generation(1, 0); + + status.compact_instances_to_selected(); + + assert_eq!(status.per_instance().len(), 2); + assert!(status.instance_status(0, 1).is_some()); + assert!(status.instance_status(1, 0).is_some()); + assert!(status.instance_status(0, 0).is_none()); + assert!(status.instance_status(1, 1).is_none()); + } + + /// Scenario: observed state has multiple generations but there is no + /// mixed-generation serving override and the controller has committed a new + /// active generation. + /// Guarantees: compaction retains only the committed active generation. + #[test] + fn compact_instances_retains_only_active_generation_when_no_serving_override() { + let mut status = new_status(HealthPolicy::default()); + insert_runtime(&mut status, 0, 0, runtime(PipelinePhase::Stopped)); + insert_runtime(&mut status, 0, 1, runtime(PipelinePhase::Running)); + insert_runtime(&mut status, 1, 0, runtime(PipelinePhase::Stopped)); + insert_runtime(&mut status, 1, 1, runtime(PipelinePhase::Running)); + status.set_active_generation(1); + + status.compact_instances_to_selected(); + + assert_eq!(status.per_instance().len(), 2); + assert!(status.instance_status(0, 1).is_some()); + assert!(status.instance_status(1, 1).is_some()); + assert!(status.instance_status(0, 0).is_none()); + assert!(status.instance_status(1, 0).is_none()); + } + + /// Scenario: observed state contains only superseded generations relative + /// to the last committed active generation. + /// Guarantees: compaction falls back to the highest observed generation per + /// core so status remains bounded without dropping the last known view. 
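+    /// Generation 9 was committed but never observed, so the highest observed
+    /// generation per core (2 on core 0, 3 on core 1) is what survives
+    /// compaction.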
+ #[test] + fn compact_instances_falls_back_to_latest_generation_per_core() { + let mut status = new_status(HealthPolicy::default()); + insert_runtime(&mut status, 0, 0, runtime(PipelinePhase::Stopped)); + insert_runtime(&mut status, 0, 2, runtime(PipelinePhase::Stopped)); + insert_runtime(&mut status, 1, 1, runtime(PipelinePhase::Running)); + insert_runtime(&mut status, 1, 3, runtime(PipelinePhase::Stopped)); + status.set_active_generation(9); + + status.compact_instances_to_selected(); + + assert_eq!(status.per_instance().len(), 2); + assert!(status.instance_status(0, 2).is_some()); + assert!(status.instance_status(1, 3).is_some()); + assert!(status.instance_status(0, 0).is_none()); + assert!(status.instance_status(1, 1).is_none()); + } + + /// Scenario: a logical pipeline has been fully shut down and observed state + /// still contains an older generation alongside the final stopped + /// generation. + /// Guarantees: compaction keeps the last stopped generation per core so + /// `/status` continues to surface the final stopped view instead of + /// collapsing to an empty runtime set. + #[test] + fn compact_instances_preserves_last_stopped_generation_view_after_shutdown() { + let mut status = new_status(HealthPolicy::default()); + insert_runtime(&mut status, 0, 0, runtime(PipelinePhase::Stopped)); + insert_runtime(&mut status, 0, 1, runtime(PipelinePhase::Stopped)); + status.set_active_generation(1); + status.set_active_cores([0]); + + status.compact_instances_to_selected(); + + assert_eq!(status.per_instance().len(), 1); + assert_eq!(status.total_cores(), 1); + assert_eq!(status.running_cores(), 0); + assert!(matches!( + status + .instance_status(0, 1) + .expect("latest generation should remain") + .phase(), + PipelinePhase::Stopped + )); + assert!(status.instance_status(0, 0).is_none()); + } + + /// Scenario: a pure resize-down retires one core without changing the + /// committed generation, so multiple retained instances share the same + /// active generation across different cores. + /// Guarantees: aggregated status and compaction respect the committed core + /// footprint instead of treating every instance on the active generation as + /// still serving. + #[test] + fn active_generation_selection_respects_committed_core_footprint() { + let mut status = new_status(HealthPolicy::default()); + insert_runtime(&mut status, 0, 0, runtime(PipelinePhase::Running)); + insert_runtime(&mut status, 1, 0, runtime(PipelinePhase::Stopped)); + status.set_active_generation(0); + status.set_active_cores([0]); + + assert_eq!(status.total_cores(), 1); + assert_eq!(status.running_cores(), 1); + + status.compact_instances_to_selected(); + + assert_eq!(status.per_instance().len(), 1); + assert!(status.instance_status(0, 0).is_some()); + assert!(status.instance_status(1, 0).is_none()); + } + + /// Scenario: a rolling cutover overlaps the old and new generations on one + /// core while aggregation must still reflect only the selected serving set. + /// Guarantees: `/status.instances` preserves both retained generations for + /// debugging, while aggregated `cores` and core counts continue to use the + /// selected serving generation per core. 
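+    /// Three instances are retained in `instances` (two generations on core 0,
+    /// one on core 1) while the aggregated `cores` object and the core counts
+    /// reflect only the two serving `(core, generation)` pairs.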
+    #[test]
+    fn serialization_preserves_overlap_aware_instances_while_aggregating_selected_cores() {
+        let mut status = new_status(HealthPolicy::default());
+        insert_runtime(&mut status, 0, 0, runtime(PipelinePhase::Stopped));
+        insert_runtime(&mut status, 0, 1, runtime(PipelinePhase::Running));
+        insert_runtime(&mut status, 1, 0, runtime(PipelinePhase::Running));
+        status.set_active_generation(0);
+        status.set_serving_generation(0, 1);
+        status.set_serving_generation(1, 0);
+
+        let json = serde_json::to_value(&status).expect("pipeline status should serialize");
+        let instances = json["instances"]
+            .as_array()
+            .expect("instances should serialize as an array");
+
+        assert_eq!(json["totalCores"], 2);
+        assert_eq!(json["runningCores"], 2);
+        assert_eq!(
+            json["cores"]
+                .as_object()
+                .expect("cores should serialize as an object")
+                .len(),
+            2
+        );
+        assert_eq!(instances.len(), 3);
+        assert_eq!(instances[0]["coreId"], 0);
+        assert_eq!(instances[0]["deploymentGeneration"], 0);
+        assert_eq!(instances[1]["coreId"], 0);
+        assert_eq!(instances[1]["deploymentGeneration"], 1);
+        assert_eq!(instances[2]["coreId"], 1);
+        assert_eq!(instances[2]["deploymentGeneration"], 0);
+    }
}
diff --git a/rust/otap-dataflow/crates/state/src/store.rs b/rust/otap-dataflow/crates/state/src/store.rs
index 8b1660d3f8..da1dd01ec5 100644
--- a/rust/otap-dataflow/crates/state/src/store.rs
+++ b/rust/otap-dataflow/crates/state/src/store.rs
@@ -7,7 +7,7 @@ use crate::ObservedEventRingBuffer;
use crate::error::Error;
use crate::phase::PipelinePhase;
use crate::pipeline_rt_status::{ApplyOutcome, PipelineRuntimeStatus};
-use crate::pipeline_status::PipelineStatus;
+use crate::pipeline_status::{PipelineRolloutSummary, PipelineStatus, RuntimeInstanceKey};
use otap_df_config::PipelineKey;
use otap_df_config::health::HealthPolicy;
use otap_df_config::observed_state::{ObservedStateSettings, SendPolicy};
@@ -170,6 +170,134 @@ impl ObservedStateStore {
        _ = policies.insert(pipeline_key, health_policy);
    }
+    /// Returns the health policy currently configured for one logical pipeline.
+    fn health_policy_for_pipeline(&self, pipeline_key: &PipelineKey) -> HealthPolicy {
+        self.health_policies
+            .lock()
+            .ok()
+            .and_then(|policies| policies.get(pipeline_key).cloned())
+            .unwrap_or_else(|| self.default_health_policy.clone())
+    }
+
+    /// Records the committed active generation for a logical pipeline.
+    pub fn set_pipeline_active_generation(&self, pipeline_key: PipelineKey, generation: u64) {
+        let mut pipelines = self.pipelines.lock().unwrap_or_else(|poisoned| {
+            otel_error!(
+                "state.mutex_poisoned",
+                action = "continuing with possibly inconsistent state"
+            );
+            poisoned.into_inner()
+        });
+        let status = pipelines
+            .entry(pipeline_key.clone())
+            .or_insert_with(|| PipelineStatus::new(self.health_policy_for_pipeline(&pipeline_key)));
+        status.set_active_generation(generation);
+    }
+
+    /// Records the committed serving core footprint for a logical pipeline.
+    pub fn set_pipeline_active_cores<I>(&self, pipeline_key: PipelineKey, core_ids: I)
+    where
+        I: IntoIterator<Item = otap_df_config::CoreId>,
+    {
+        let mut pipelines = self.pipelines.lock().unwrap_or_else(|poisoned| {
+            otel_error!(
+                "state.mutex_poisoned",
+                action = "continuing with possibly inconsistent state"
+            );
+            poisoned.into_inner()
+        });
+        let status = pipelines
+            .entry(pipeline_key.clone())
+            .or_insert_with(|| PipelineStatus::new(self.health_policy_for_pipeline(&pipeline_key)));
+        status.set_active_cores(core_ids);
+    }
+
+    /// Records which generation is serving traffic for the given logical core.
+ pub fn set_pipeline_serving_generation( + &self, + pipeline_key: PipelineKey, + core_id: otap_df_config::CoreId, + generation: u64, + ) { + let mut pipelines = self.pipelines.lock().unwrap_or_else(|poisoned| { + otel_error!( + "state.mutex_poisoned", + action = "continuing with possibly inconsistent state" + ); + poisoned.into_inner() + }); + let status = pipelines + .entry(pipeline_key.clone()) + .or_insert_with(|| PipelineStatus::new(self.health_policy_for_pipeline(&pipeline_key))); + status.set_serving_generation(core_id, generation); + } + + /// Removes the serving-generation marker for a logical core. + pub fn clear_pipeline_serving_generation( + &self, + pipeline_key: PipelineKey, + core_id: otap_df_config::CoreId, + ) { + let mut pipelines = self.pipelines.lock().unwrap_or_else(|poisoned| { + otel_error!( + "state.mutex_poisoned", + action = "continuing with possibly inconsistent state" + ); + poisoned.into_inner() + }); + if let Some(status) = pipelines.get_mut(&pipeline_key) { + status.clear_serving_generation(core_id); + } + } + + /// Updates the rollout summary exposed in `/status`. + pub fn set_pipeline_rollout_summary( + &self, + pipeline_key: PipelineKey, + rollout: PipelineRolloutSummary, + ) { + let mut pipelines = self.pipelines.lock().unwrap_or_else(|poisoned| { + otel_error!( + "state.mutex_poisoned", + action = "continuing with possibly inconsistent state" + ); + poisoned.into_inner() + }); + let status = pipelines + .entry(pipeline_key.clone()) + .or_insert_with(|| PipelineStatus::new(self.health_policy_for_pipeline(&pipeline_key))); + status.set_rollout_summary(rollout); + } + + /// Clears any rollout summary for the logical pipeline. + pub fn clear_pipeline_rollout_summary(&self, pipeline_key: PipelineKey) { + let mut pipelines = self.pipelines.lock().unwrap_or_else(|poisoned| { + otel_error!( + "state.mutex_poisoned", + action = "continuing with possibly inconsistent state" + ); + poisoned.into_inner() + }); + if let Some(status) = pipelines.get_mut(&pipeline_key) { + status.clear_rollout_summary(); + } + } + + /// Compacts retained observed instances for one logical pipeline to the + /// generations currently selected for status aggregation. + pub fn compact_pipeline_instances(&self, pipeline_key: &PipelineKey) { + let mut pipelines = self.pipelines.lock().unwrap_or_else(|poisoned| { + otel_error!( + "state.mutex_poisoned", + action = "continuing with possibly inconsistent state" + ); + poisoned.into_inner() + }); + if let Some(status) = pipelines.get_mut(pipeline_key) { + status.compact_instances_to_selected(); + } + } + /// Returns a handle that can be used to read the current observed state. 
#[must_use] pub fn handle(&self) -> ObservedStateHandle { @@ -281,11 +409,17 @@ impl ObservedStateStore { let ps = pipelines .entry(pipeline_key) .or_insert_with(|| PipelineStatus::new(health_policy)); + if ps.active_generation().is_none() { + ps.set_active_generation(key.deployment_generation); + } - // Upsert the core record and its condition snapshot + // Upsert the runtime-instance record and its condition snapshot let cs = ps - .cores - .entry(key.core_id) + .instances + .entry(RuntimeInstanceKey { + core_id: key.core_id, + deployment_generation: key.deployment_generation, + }) .or_insert_with(|| PipelineRuntimeStatus { phase: PipelinePhase::Pending, last_heartbeat_time: observed_event.time, @@ -444,6 +578,7 @@ mod tests { pipeline_group_id: Cow::Borrowed("group"), pipeline_id: Cow::Borrowed("pipeline"), core_id, + deployment_generation: 0, } } @@ -587,7 +722,7 @@ mod tests { "All {num_cores} cores should reach Running when engine events are reliable. \ Stuck in Pending: {}", status - .per_core() + .per_instance() .values() .filter(|c| matches!(c.phase, PipelinePhase::Pending)) .count(), @@ -684,7 +819,7 @@ mod tests { "All {num_cores} cores should reach Running despite log channel contention. \ Stuck in Pending: {}", status - .per_core() + .per_instance() .values() .filter(|c| matches!(c.phase, PipelinePhase::Pending)) .count(), diff --git a/rust/otap-dataflow/crates/telemetry/src/lib.rs b/rust/otap-dataflow/crates/telemetry/src/lib.rs index b9d3e78cec..ecb7b93143 100644 --- a/rust/otap-dataflow/crates/telemetry/src/lib.rs +++ b/rust/otap-dataflow/crates/telemetry/src/lib.rs @@ -443,6 +443,7 @@ mod tests { use otap_df_pdata::proto::opentelemetry::resource::v1::Resource; use otap_df_pdata::testing::equiv::assert_equivalent; use prost::Message; + use std::time::Duration; fn test_reporter() -> ObservedEventReporter { let (sender, _receiver) = flume::bounded(16); @@ -503,7 +504,9 @@ mod tests { }); // Receiver should have the log - let recv = rx.recv().expect("receiver should have log after emit"); + let recv = rx + .recv_timeout(Duration::from_secs(1)) + .expect("receiver should have log after emit"); assert!(matches!(recv, ObservedEvent::Log(_))); let text = recv.to_string(); assert!(text.contains("test log message"), "log message is {}", text); diff --git a/rust/otap-dataflow/crates/validation/src/simulate.rs b/rust/otap-dataflow/crates/validation/src/simulate.rs index de963c04aa..02f53ec0a5 100644 --- a/rust/otap-dataflow/crates/validation/src/simulate.rs +++ b/rust/otap-dataflow/crates/validation/src/simulate.rs @@ -5,7 +5,7 @@ use crate::error::ValidationError; use crate::metrics_types::{MetricSetSnapshot, MetricsSnapshot}; use otap_df_admin_api::{ AdminClient, AdminEndpoint, HttpAdminClientSettings, engine::ProbeStatus, - operations::OperationOptions, pipeline_groups::ShutdownStatus, telemetry::MetricsOptions, + groups::ShutdownStatus, operations::OperationOptions, telemetry::MetricsOptions, }; use otap_df_config::engine::OtelDataflowSpec; use otap_df_controller::Controller; @@ -147,7 +147,7 @@ async fn wait_for_validation_finished( /// shutdown pipeline after running async fn shutdown_pipeline(client: &AdminClient) -> Result<(), ValidationError> { let response = client - .pipeline_groups() + .groups() .shutdown(&OperationOptions::default()) .await .map_err(admin_error)?; @@ -476,7 +476,7 @@ mod tests { .await; Mock::given(method("POST")) - .and(path("/api/v1/pipeline-groups/shutdown")) + .and(path("/api/v1/groups/shutdown")) .and(query_param("wait", "false")) 
        .and(query_param("timeout_secs", "60"))
        .respond_with(ResponseTemplate::new(202).set_body_json(serde_json::json!({
diff --git a/rust/otap-dataflow/docs/admin/README.md b/rust/otap-dataflow/docs/admin/README.md
index f7e2c187d8..e31505f3ec 100644
--- a/rust/otap-dataflow/docs/admin/README.md
+++ b/rust/otap-dataflow/docs/admin/README.md
@@ -3,12 +3,14 @@
This section documents the admin surface of the OTAP Dataflow Engine:

- runtime endpoints used for health, status, and telemetry;
+- live pipeline reconfiguration and shutdown operations;
- embedded browser UI behavior and architecture.
- the public Rust admin SDK.

## Document map

- [Admin UI Architecture](architecture.md)
+- [Live Pipeline Reconfiguration](live-reconfiguration.md)
- [Crate README (admin endpoints and crate layout)](../../crates/admin/README.md)
- [Public Rust SDK README](../../crates/admin-api/README.md)
@@ -26,6 +28,9 @@ raw HTTP requests directly.

For architecture details (state model, derivation rules, graph rules, testing),
start with [Admin UI Architecture](architecture.md).

+For the live mutation API used to create, replace, resize, and shut down
+logical pipelines, see [Live Pipeline Reconfiguration](live-reconfiguration.md).
+
## UI module tests

Prerequisite:
diff --git a/rust/otap-dataflow/docs/admin/live-reconfiguration.md b/rust/otap-dataflow/docs/admin/live-reconfiguration.md
new file mode 100644
index 0000000000..4c728228d6
--- /dev/null
+++ b/rust/otap-dataflow/docs/admin/live-reconfiguration.md
@@ -0,0 +1,480 @@
+# Live Pipeline Reconfiguration
+
+This document describes the live reconfiguration flow exposed by the admin API.
+
+The feature lets a running OTel Dataflow Engine mutate one logical pipeline at
+a time without restarting the process or reloading the full startup file.
+
+## Goals
+
+- Reconfigure one pipeline in a running engine instance.
+- Keep the mutation scoped to a single `(pipeline_group_id, pipeline_id)`.
+- Preserve service continuity for topology/config changes with a serial rolling
+  cutover that overlaps old and new instances only on the affected cores.
+- Support pure resource policy changes, including scale up and scale down,
+  without restarting unchanged cores.
+- Make progress observable through admin endpoints instead of hidden internal
+  controller state.
+
+## Supported Operations
+
+- Create a new pipeline inside an existing pipeline group.
+- Replace an existing pipeline with a new topology or node configuration.
+- Resize an existing pipeline when the only effective runtime change is
+  `policies.resources.core_allocation`.
+- Accept an effectively identical update as a `noop`.
+- Track rollout progress with a rollout id.
+- Shut down a logical pipeline and track shutdown progress with a shutdown id.
+
+## Terminology
+
+Live reconfiguration uses a few controller-specific terms. They are important
+because the admin API exposes both committed pipeline state and in-progress
+runtime state.
+
+- Logical pipeline: the named pipeline addressed by `(pipeline_group_id,
+  pipeline_id)`. A logical pipeline can have several runtime instances over
+  time as it is rolled, resized, or shut down.
+- Runtime instance: one concrete execution of a logical pipeline on one core.
+  Runtime instances are identified by `(pipeline_group_id, pipeline_id, core_id,
+  deployment_generation)`.
+- Deployment generation: a monotonically assigned version for runtime
+  instances of one logical pipeline. `create` and `replace` rollouts start a new
+  generation. `resize` keeps the same generation and only changes the active
+  core set.
+- Active generation: the generation currently committed by the controller as
+  the logical pipeline's desired serving generation.
+- Serving generation: the generation currently selected for a specific core in
+  observed state. During a rolling cutover, different cores may temporarily
+  serve different generations.
+- Candidate pipeline config: the pipeline config submitted by the client and
+  validated by the controller before it is committed into the live in-memory
+  engine config.
+- Candidate generation: the target generation for a `create` or `replace`
+  rollout while it is still being tested and has not yet been committed as the
+  active generation.
+- Candidate instance: a runtime instance launched from the candidate generation.
+  Candidate instances must become admitted and ready before the controller uses
+  them for serving. If the rollout fails before commit, the controller
+  best-effort shuts them down.
+- Rollout worker: the background controller thread that executes an accepted
+  rollout plan after the admin request has been accepted. The API can return
+  before this worker finishes when `wait=false`.
+- Rollout worker panic: an unexpected Rust panic in the rollout worker itself,
+  not a normal pipeline runtime error. The controller catches this panic, marks
+  the rollout failed, reports diagnostics, clears the active-operation conflict,
+  and cleans up uncommitted candidate instances when needed.
+- Drain: a graceful shutdown step. The runtime stops accepting new ingress,
+  lets already admitted work finish as far as the node contracts allow, and
+  exits before the drain timeout.
+
+This document uses the term `serial rolling cutover with overlap` for
+topology-changing replacement.
+
+During `replace`, the controller overlaps old and new instances only on the
+core currently being switched:
+
+- start the new instance for one core;
+- wait for `Admitted` and `Ready`;
+- drain the old instance on that same core;
+- move to the next core.
+
+The controller never starts a second complete serving fleet and then performs
+one atomic traffic flip across the whole pipeline.
+
+## Boundaries and Current Limits
+
+- Updates are in-memory only. The startup YAML file is not rewritten.
+- The target pipeline group must already exist.
+- Runtime topic broker mutation is rejected. In practice this means:
+  - no new or removed declared topics;
+  - no change to the selected topic mode;
+  - no change to topic backend or topic policies.
+- Group-level and engine-level policy mutation is out of scope.
+- There is no dedicated scale endpoint. Scale-only changes use the same `PUT`
+  endpoint as topology changes.
+
+## Consistency Model
+
+The current API serializes live operations per logical pipeline, identified by
+`(pipeline_group_id, pipeline_id)`. A rollout or shutdown conflicts with another
+active operation for the same logical pipeline, while operations for different
+logical pipelines may run concurrently.
+
+Rollout planning validates a candidate by patching one pipeline into the
+controller's current in-memory `OtelDataflowSpec` snapshot and running full
+engine validation on that candidate snapshot. That validation does not make the
+operation a whole-config transaction: another logical pipeline can commit before
+this rollout commits, and commit applies only the accepted pipeline back into
+the latest live config.
+
+The API intentionally leaves room to widen the consistency scope later.
If +group-level invariants become mutable, the controller can serialize +config-mutating operations per pipeline group and return `409 Conflict` for +concurrent operations in that group without changing the existing pipeline +endpoint or response schema. Engine-level reconfiguration can be added as a +separate operation surface if full-engine transactions become necessary. + +## How It Works + +1. The client submits a candidate pipeline config to + `PUT /groups/{group}/pipelines/{id}`. +1. The controller patches exactly that pipeline into its live in-memory + `OtelDataflowSpec`. +1. The candidate config is validated as a full engine snapshot: + - pipeline structure and canonicalization; + - component config validation; + - whole-config validation, including topic cycle checks; + - topic runtime profile compatibility. +1. The controller classifies the update: + - `create`: the logical pipeline does not exist yet; + - `noop`: the resolved pipeline and active serving footprint already match + the request; + - `replace`: the runtime graph or runtime-significant node config changed; + - `resize`: only the effective core allocation changed. +1. The controller executes the plan: + - `create`: start all target instances in parallel and commit only if they + all become healthy. + - `noop`: record an immediately successful rollout result without restarting + any runtime instances. + - `replace`: do a serial rolling cutover with overlap per common core. + Start the new generation on one core, wait for admission and readiness, + then drain the old generation on that core. + - `resize`: start only newly added cores and drain only removed cores. + Common cores stay up and keep serving the current generation. +1. The controller records rollout progress and mirrors a summary into + `GET /groups/{group}/pipelines/{id}/status`. + +### Success Gate + +For `replace` and `create`, a new instance must reach both `Admitted` and +`Ready` before the controller commits the new serving state for that step. + +The request body carries two timeouts: + +- `stepTimeoutSecs`: how long to wait for the new instance to admit and become + ready. Default: `60`. +- `drainTimeoutSecs`: how long to wait for graceful drain of the old instance. + Default: `60`. + +The query string also supports an overall client wait timeout: + +- `timeout_secs` on the `PUT` request when `wait=true`. + +### Failure Handling + +- `create`: if any target instance fails to admit or become ready, the + controller shuts down the instances that were already launched and leaves the + committed config unchanged. +- `replace`: if a core fails during the rollout, the controller stops and + automatically rolls back already switched cores to the previous generation. +- `resize`: if added or removed cores fail during the operation, the controller + rolls the resize back by draining newly added cores and relaunching retired + cores when possible. +- If rollback cannot restore a healthy serving set, the rollout ends in + `rollback_failed` and the mixed state remains visible through status + endpoints. + +### Controller Safety Behaviors + +The controller treats live reconfiguration as a runtime lifecycle operation, +not just as an in-memory config edit. Several edge cases are handled explicitly +to avoid orphaned runtime instances, stale conflicts, or unbounded status +growth. 
+
+- Partial `create` launch failure: if one core fails to launch after earlier
+  cores were already started, the controller best-effort shuts down the
+  candidate instances that were launched by that same create operation before
+  returning rollout failure.
+- Readiness failure after candidate launch: if a candidate generation starts
+  but does not reach `Admitted` and `Ready` before the step timeout, the
+  controller shuts down the candidate instance before continuing with failure
+  handling or rollback.
+- Rollout worker panic: if the detached rollout worker panics, the controller
+  records a terminal failed rollout, clears the active-operation conflict, and
+  emits internal panic diagnostics. If the panic happened while an uncommitted
+  target generation was active, the controller best-effort sends shutdown to
+  those candidate instances first.
+- Committed generation protection: panic cleanup does not shut down a target
+  generation that is already committed as the active serving generation. This
+  prevents a late bookkeeping panic from turning a successful rollout into an
+  outage.
+- Shutdown worker panic: if the detached shutdown worker panics, the controller
+  records a terminal failed shutdown and clears the active-operation conflict,
+  so later operations for the same logical pipeline are not blocked until
+  restart.
+- Runtime thread panic or error: runtime instance failures are reported back
+  into observed state with a concise operator message and diagnostic source
+  detail. The instance is marked exited so controller liveness accounting can
+  progress.
+- Launch and exit races: a runtime thread can exit before its launch
+  registration is visible to the controller. The controller records early exits
+  and reconciles them during registration, avoiding stale active-instance
+  counts.
+- Global shutdown dispatch: `POST /groups/shutdown` snapshots active instances
+  and attempts shutdown delivery to all of them. One failed send does not
+  prevent later instances from receiving shutdown. Dispatch is idempotent for
+  instances that already accepted shutdown.
+- Observed-state compaction: after active controller work no longer needs old
+  generations, the controller compacts retained instance status to the selected
+  serving view. During active rollout overlap, status still exposes both old and
+  new generations so operators can debug cutover behavior.
+- Bounded operation history: terminal rollout and shutdown records are retained
+  only in a bounded in-memory window. Recent terminal ids remain useful for
+  follow-up inspection, but old by-id history is intentionally evictable.
+
+## API Surface
+
+### Read current pipeline config
+
+`GET /groups/{group}/pipelines/{id}`
+
+Returns:
+
+- `pipelineGroupId`
+- `pipelineId`
+- `activeGeneration`
+- `pipeline`
+- optional `rollout` summary
+
+### Create, replace, or resize a pipeline
+
+`PUT /groups/{group}/pipelines/{id}?wait=true|false&timeout_secs=<seconds>`
+
+Request body:
+
+```json
+{
+  "pipeline": {
+    "...": "PipelineConfig"
+  },
+  "stepTimeoutSecs": 60,
+  "drainTimeoutSecs": 60
+}
+```
+
+Behavior:
+
+- If `(group, id)` does not exist, the action is `create`.
+- If the submitted config is already in effect, the action is `noop`.
+- If only the effective core allocation changed, the action is `resize`.
+- Otherwise the action is `replace`.
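+
+The same request can be driven through the typed Rust SDK instead of raw
+HTTP. The sketch below is illustrative only: `AdminClient`,
+`pipelines().reconfigure(...)`, and `operations::OperationOptions` are real
+admin-api names, but the exact argument list and the outcome shape shown here
+are assumptions, not a stabilized signature.
+
+```rust
+// Hedged sketch: submit a live update with wait=true semantics, where a
+// failed or timed-out rollout comes back as a typed outcome and only
+// request rejection surfaces as an SDK error.
+use otap_df_admin_api::{AdminClient, operations::OperationOptions};
+
+async fn update_pipeline(
+    client: &AdminClient,
+    body: serde_json::Value, // {"pipeline": ..., "stepTimeoutSecs": 60, ...}
+) -> Result<(), Box<dyn std::error::Error>> {
+    let outcome = client
+        .pipelines()
+        .reconfigure(
+            "topic_multitenant_isolation", // pipeline group id
+            "tenant_c_pipeline",           // pipeline id
+            &body,
+            &OperationOptions::default(), // assumed to carry wait/timeout knobs
+        )
+        .await?;
+    println!("rollout outcome: {outcome:?}");
+    Ok(())
+}
+```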
+ +Response body is a `RolloutStatus` with: + +- `rolloutId` +- `action` (`create`, `noop`, `replace`, `resize`) +- `state` (`pending`, `running`, `succeeded`, `failed`, `rolling_back`, + `rollback_failed`) +- `targetGeneration` +- `previousGeneration` +- `startedAt` +- `updatedAt` +- optional `failureReason` +- `cores` + +Status codes: + +- `202 Accepted`: request accepted and `wait=false` +- `200 OK`: `wait=true` and the rollout finished successfully +- `404 Not Found`: pipeline group does not exist +- `409 Conflict`: another incompatible live operation is active in the + controller's current consistency scope, or a waited rollout finished in + failure. In the current version of the API, that scope is one logical + pipeline. +- `422 Unprocessable Entity`: validation failure or unsupported runtime + mutation +- `504 Gateway Timeout`: `wait=true` exceeded the overall wait timeout + +### Read rollout progress + +`GET /groups/{group}/pipelines/{id}/rollouts/{rolloutId}` + +Returns the current `RolloutStatus` snapshot for that operation. +Terminal rollout ids are retained only within a bounded in-memory window, so +older ids may return `404 Not Found` after eviction. + +### Read observed pipeline status + +`GET /groups/{group}/pipelines/{id}/status` + +Returns the aggregated pipeline status. Useful fields during rollout: + +- `conditions` +- `totalCores` +- `runningCores` +- `activeGeneration` +- `servingGenerations` +- `rollout` +- `instances` + +Each `instances` entry is keyed by `(coreId, deploymentGeneration)`, so +overlapping old/new generations stay distinguishable during a rolling cutover. + +### Related shutdown endpoints + +- `POST /groups/{group}/pipelines/{id}/shutdown` +- `GET /groups/{group}/pipelines/{id}/shutdowns/{shutdownId}` +- `POST /groups/shutdown` + +These are separate from reconfiguration, but they use the same resident +controller and the same live-operation consistency scope. +Terminal shutdown ids are retained only within a bounded in-memory window, so +older ids may return `404 Not Found` after eviction. + +## Manual Examples + +The examples below use +[`configs/engine-conf/topic_multitenant_isolation.yaml`](../../configs/engine-conf/topic_multitenant_isolation.yaml). +That config binds admin HTTP to `127.0.0.1:8085` and defines the logical +pipeline `topic_multitenant_isolation/tenant_c_pipeline`. + +### Start the sample engine + +```bash +cargo run -- -c configs/engine-conf/topic_multitenant_isolation.yaml +``` + +In another terminal: + +```bash +BASE=http://127.0.0.1:8085/api/v1 +GROUP=topic_multitenant_isolation +PIPE=tenant_c_pipeline +``` + +Inspect the current committed config and observed runtime state: + +```bash +curl -s "$BASE/groups/$GROUP/pipelines/$PIPE" | jq . +curl -s "$BASE/groups/$GROUP/pipelines/$PIPE/status" | jq . +``` + +### Example: Topology change with serial rolling cutover + +This example inserts a debug processor between the topic receiver and the retry +processor. 
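+
+If you script the change from Rust instead of jq, the same request body can be
+built with `serde_json`. This is a hedged sketch of the body construction only
+(the node and connection names match this example); it is not part of the SDK
+surface.
+
+```rust
+// Hedged sketch: take the GET /groups/{group}/pipelines/{id} response and
+// build the PUT request body, inserting the debug processor between the
+// receiver and the retry processor.
+use serde_json::{Value, json};
+
+fn insert_debug_processor(mut live: Value) -> Value {
+    let pipeline = &mut live["pipeline"];
+    // Add the new processor node.
+    pipeline["nodes"]["tenant_c_debug"] = json!({
+        "type": "processor:debug",
+        "config": { "verbosity": "basic" }
+    });
+    // Rewire connections so the debug node sits on the receiver -> retry path.
+    pipeline["connections"] = json!([
+        { "from": "tenant_c_receiver", "to": "tenant_c_debug" },
+        { "from": "tenant_c_debug", "to": "tenant_c_retry" },
+        { "from": "tenant_c_retry", "to": "tenant_c_sink" }
+    ]);
+    // Wrap the edited pipeline in the PUT body with default timeouts.
+    json!({
+        "pipeline": pipeline.take(),
+        "stepTimeoutSecs": 60,
+        "drainTimeoutSecs": 60
+    })
+}
+```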
+ +Build the request body from the live config: + +```bash +curl -s "$BASE/groups/$GROUP/pipelines/$PIPE" \ + | jq ' + { + pipeline: ( + .pipeline + | .nodes += { + tenant_c_debug: { + type: "processor:debug", + config: { + verbosity: "basic" + } + } + } + | .connections = [ + {from: "tenant_c_receiver", to: "tenant_c_debug"}, + {from: "tenant_c_debug", to: "tenant_c_retry"}, + {from: "tenant_c_retry", to: "tenant_c_sink"} + ] + ), + stepTimeoutSecs: 60, + drainTimeoutSecs: 60 + } + ' \ + > /tmp/tenant_c_pipeline-debug.json +``` + +Submit the update and wait for completion: + +```bash +curl -sS -X PUT \ + "$BASE/groups/$GROUP/pipelines/$PIPE?wait=true&timeout_secs=120" \ + -H 'content-type: application/json' \ + --data-binary @/tmp/tenant_c_pipeline-debug.json | jq . +``` + +Expected result: + +- `action` is `replace` +- `state` ends as `succeeded` +- `targetGeneration` is greater than `previousGeneration` + +Verify the committed config and rollout-aware status: + +```bash +curl -s "$BASE/groups/$GROUP/pipelines/$PIPE" | jq . +curl -s "$BASE/groups/$GROUP/pipelines/$PIPE/status" \ + | jq '{conditions, totalCores, runningCores, activeGeneration, servingGenerations, rollout, instances}' +``` + +### Example: Async rollout tracking + +Use `wait=false` to return immediately, then poll the rollout resource: + +```bash +ROLLOUT_ID=$( + curl -sS -X PUT \ + "$BASE/groups/$GROUP/pipelines/$PIPE?wait=false" \ + -H 'content-type: application/json' \ + --data-binary @/tmp/tenant_c_pipeline-debug.json \ + | jq -r '.rolloutId' +) + +curl -s "$BASE/groups/$GROUP/pipelines/$PIPE/rollouts/$ROLLOUT_ID" | jq . +``` + +### Example: Pure resource-policy resize + +This example changes only `coreAllocation.count` from `1` to `2`. The +controller detects that the runtime shape is otherwise unchanged and executes a +`resize` rollout instead of a full replace. + +```bash +curl -s "$BASE/groups/$GROUP/pipelines/$PIPE" \ + | jq ' + { + pipeline: .pipeline, + stepTimeoutSecs: 60, + drainTimeoutSecs: 60 + } + | .pipeline.policies.resources.coreAllocation.count = 2 + ' \ + > /tmp/tenant_c_pipeline-scale-up.json +``` + +```bash +curl -sS -X PUT \ + "$BASE/groups/$GROUP/pipelines/$PIPE?wait=true&timeout_secs=120" \ + -H 'content-type: application/json' \ + --data-binary @/tmp/tenant_c_pipeline-scale-up.json | jq . +``` + +Expected result: + +- `action` is `resize` +- `targetGeneration` stays equal to `previousGeneration` +- only the added core is started + +Verify the pipeline footprint: + +```bash +curl -s "$BASE/groups/$GROUP/pipelines/$PIPE/status" \ + | jq '{totalCores, runningCores, activeGeneration, servingGenerations, rollout}' +``` + +Scale back down by setting `coreAllocation.count = 1` in the same request body +pattern. + +## Operational Notes + +- Different logical pipelines may roll concurrently in the current + implementation. +- A single logical pipeline allows only one active rollout or shutdown at a + time. +- Future group-level consistency can widen the conflict scope so concurrent + operations in the same group return `409 Conflict`. +- `GET /groups/{group}/pipelines/{id}` always returns the committed + live config, not an uncommitted candidate. +- `GET /groups/{group}/pipelines/{id}/status` is the best endpoint + for watching serving generations and per-instance phase changes during a + rollout. diff --git a/rust/otap-dataflow/src/main.rs b/rust/otap-dataflow/src/main.rs index 1c419ab935..d130a542d9 100644 --- a/rust/otap-dataflow/src/main.rs +++ b/rust/otap-dataflow/src/main.rs @@ -3,6 +3,7 @@ //! 
Create and run a multi-core pipeline
+use cfg_if::cfg_if;
use clap::Parser;
use otap_df_config::config_provider::{ConfigFormat, resolve_config};
use otap_df_config::engine::OtelDataflowSpec;
@@ -16,7 +17,6 @@ use otap_df_controller::startup;
// Keep this side-effect import so the crate is linked and its `linkme`
// distributed-slice registrations (core nodes) are visible
// in `OTAP_PIPELINE_FACTORY` at runtime.
-use cfg_if::cfg_if;
use otap_df_core_nodes as _;
use otap_df_otap::OTAP_PIPELINE_FACTORY;
/// Project license text (Apache-2.0), embedded at compile time.
@@ -238,6 +238,7 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
    {
        dhat_start();
    }
+
    // Install the rustls crypto provider selected by the crypto-* feature flag.
    // This must happen before any TLS connections (reqwest, tonic, etc.).
    otap_df_otap::crypto::install_crypto_provider()
@@ -324,14 +325,14 @@ mod tests {
            Ok(CoreAllocation::core_set(vec![
                CoreRange { start: 0, end: 4 },
                CoreRange { start: 5, end: 5 },
-                CoreRange { start: 6, end: 7 },
+                CoreRange { start: 6, end: 7 }
            ]))
        );
        assert_eq!(
            parse_core_id_allocation("0..4"),
            Ok(CoreAllocation::core_set(vec![CoreRange {
                start: 0,
-                end: 4,
+                end: 4
            }]))
        );
    }
@@ -499,7 +500,7 @@ connections:
        args.core_id_range,
        Some(CoreAllocation::core_set(vec![
            CoreRange { start: 1, end: 3 },
-            CoreRange { start: 7, end: 7 },
+            CoreRange { start: 7, end: 7 }
        ]))
    );
    assert_eq!(args.num_cores, None);