Add OS-level resource metric collection to flow run subprocesses#21071

Merged
desertaxle merged 9 commits into main from
alexs/oss-7694-add-os-level-resource-metric-collection-to-flow-run
Mar 11, 2026
@desertaxle
Member

Summary

  • Adds TelemetrySettings model with enable_resource_metrics (default: True) and resource_metrics_interval_seconds (default: 10) settings under PREFECT_TELEMETRY_ env prefix
  • Adds opentelemetry-instrumentation-system-metrics to the otel optional extra and dev dependency group
  • Adds RunMetrics context manager that creates an OTel MeterProvider with OTLPMetricExporter, starts SystemMetricsInstrumentor filtered to process.cpu.utilization, process.memory.usage, process.memory.virtual with flow run resource attributes
  • Wraps run_flow() in engine.py __main__ block with RunMetrics so metrics are collected for the lifetime of flow run subprocesses
  • Endpoint resolution priority: OTEL_EXPORTER_OTLP_METRICS_ENDPOINT env var > auto-derived from Cloud API URL > disabled
  • Gracefully no-ops when disabled, no endpoint available, or instrumentation packages not installed
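
The graceful no-op behaviour described in the last bullet can be sketched as a small context manager. This is a minimal illustration using only the standard library, not the PR's actual `RunMetrics` code: `run_metrics_sketch` and `package_available` are hypothetical names, and the real setup and shutdown of the OTel provider are left as comments.

```python
import contextlib
import importlib.util
from typing import Iterator, Optional


def package_available(name: str) -> bool:
    """Return True if an import spec for `name` can be found."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises instead of returning None.
        return False


@contextlib.contextmanager
def run_metrics_sketch(
    enabled: bool,
    endpoint: Optional[str],
    required_package: str = "opentelemetry.instrumentation.system_metrics",
) -> Iterator[bool]:
    """Hypothetical stand-in for RunMetrics: yields whether collection is active."""
    active = bool(enabled and endpoint and package_available(required_package))
    if not active:
        # Disabled, no endpoint, or packages not installed: do nothing.
        yield False
        return
    # A real implementation would build the meter provider and start the
    # instrumentor here, then shut both down in the `finally` block.
    try:
        yield True
    finally:
        pass
```

Wrapping the flow run in `with run_metrics_sketch(...)` then costs nothing when any of the three conditions is unmet, which is the property the summary describes.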

Closes: OSS-7694

🤖 Generated with Claude Code

Add OpenTelemetry-based CPU and memory metric collection inside flow run
subprocesses, exporting via OTLP HTTP to the Cloud telemetry endpoint.

- Add TelemetrySettings model with enable_resource_metrics and
  resource_metrics_interval_seconds settings
- Add opentelemetry-instrumentation-system-metrics to otel extra and dev deps
- Add RunMetrics context manager that starts SystemMetricsInstrumentor
  filtered to process.cpu.utilization, process.memory.usage,
  process.memory.virtual with proper resource attributes
- Wrap run_flow() in engine.py __main__ block with RunMetrics

Closes: OSS-7694

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codspeed-hq

codspeed-hq bot commented Mar 10, 2026

Merging this PR will not alter performance

✅ 2 untouched benchmarks


Comparing alexs/oss-7694-add-os-level-resource-metric-collection-to-flow-run (bdf7702) with main (4ade8bb)


desertaxle and others added 2 commits March 10, 2026 14:27
…and type annotation

- Add PREFECT_TELEMETRY_ENABLE_RESOURCE_METRICS and
  PREFECT_TELEMETRY_RESOURCE_METRICS_INTERVAL_SECONDS to SUPPORTED_SETTINGS
- Fix test_noop_when_import_fails to use builtins module instead of
  __builtins__ dict
- Fix test_instruments_and_shuts_down to patch OTel classes at their source
  modules instead of at the import site
- Add type annotation to logger and use get_logger()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Collaborator

@chrisguidry chrisguidry left a comment

🥵 This is going to be real!

desertaxle and others added 2 commits March 11, 2026 12:08
The OTLPMetricExporter needs the Prefect API key to authenticate with
Cloud's telemetry ingestion endpoint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Prevents a 30s stall on shutdown when the endpoint is unreachable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@desertaxle desertaxle marked this pull request as ready for review March 11, 2026 20:27
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 219e28a67c

…tion

- Only send API key to Cloud-derived endpoints, not user-overridden ones
- Pass settings into _resolve_metrics_endpoint to avoid double call
- Set export_timeout_millis=5000 on the reader to prevent 30s shutdown stall
- Add ge=1 validation on resource_metrics_interval_seconds

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
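
The two settings and the `ge=1` constraint can be illustrated with a standard-library sketch. The PR describes a `TelemetrySettings` model; the `TelemetrySettingsSketch` dataclass below only mimics its defaults, env var names, and lower bound as stated in this PR. The boolean parsing rules are a simplifying assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def parse_bool(raw: str) -> bool:
    """Parse a small set of truthy/falsy strings (an assumed, simplified rule set)."""
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"not a boolean: {raw!r}")


@dataclass(frozen=True)
class TelemetrySettingsSketch:
    enable_resource_metrics: bool = True
    resource_metrics_interval_seconds: int = 10

    def __post_init__(self) -> None:
        # Mirrors the ge=1 validation: an interval below 1 is rejected.
        if self.resource_metrics_interval_seconds < 1:
            raise ValueError("resource_metrics_interval_seconds must be >= 1")

    @classmethod
    def from_env(
        cls, env: Optional[Mapping[str, str]] = None
    ) -> "TelemetrySettingsSketch":
        """Build settings from PREFECT_TELEMETRY_* variables, using defaults when unset."""
        env = os.environ if env is None else env
        prefix = "PREFECT_TELEMETRY_"
        raw_enabled = env.get(prefix + "ENABLE_RESOURCE_METRICS")
        raw_interval = env.get(prefix + "RESOURCE_METRICS_INTERVAL_SECONDS")
        enabled = True if raw_enabled is None else parse_bool(raw_enabled)
        interval = 10 if raw_interval is None else int(raw_interval)
        return cls(enabled, interval)
```

Rejecting an interval of zero or less at settings load time surfaces the misconfiguration early instead of inside the export loop.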
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3453cf9dd0

Comment on lines +122 to +128
with RunMetrics(flow_run, flow):
    if flow.isasync:
        run_coro_as_sync(
            run_flow(flow, flow_run=flow_run, error_logger=run_logger)
        )
    else:
        run_flow(flow, flow_run=flow_run, error_logger=run_logger)
Collaborator

😮‍💨

desertaxle and others added 3 commits March 11, 2026 15:56
…rt, cloud auth on override

- Wrap telemetry setup in try/except so initialization errors degrade to
  no-op instead of aborting the flow run
- Honor standard OTEL_EXPORTER_OTLP_ENDPOINT env var as fallback when the
  metrics-specific variable is not set
- Preserve Cloud auth headers when endpoint is overridden via env var
  (is_cloud now derived from connected_to_cloud, not endpoint source)
- Only pass headers kwarg to OTLPMetricExporter for Cloud endpoints so
  non-cloud exporters can use OTEL_EXPORTER_OTLP_HEADERS env vars

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
User-overridden endpoints (OTEL_EXPORTER_OTLP_METRICS_ENDPOINT or
OTEL_EXPORTER_OTLP_ENDPOINT) now always return is_cloud=False, so the
Prefect API key is only attached to the auto-derived Cloud endpoint.
This prevents leaking credentials to third-party collectors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
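
The resolution order and the credential rule from this commit can be sketched as two small functions. This is an illustration under stated assumptions, not the PR's `_resolve_metrics_endpoint`: the function names and signatures are hypothetical, the Cloud endpoint is passed in already derived, appending `/v1/metrics` to the generic variable follows the OTLP exporter convention and may differ from what the PR does, and the `Bearer` header format is assumed.

```python
from typing import Dict, Mapping, Optional, Tuple


def resolve_metrics_endpoint(
    env: Mapping[str, str], cloud_endpoint: Optional[str]
) -> Tuple[Optional[str], bool]:
    """Return (endpoint, is_cloud); endpoint is None when metrics are disabled."""
    specific = env.get("OTEL_EXPORTER_OTLP_METRICS_ENDPOINT")
    if specific:
        # Signal-specific override: used as given, never treated as Cloud.
        return specific, False
    generic = env.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if generic:
        # Assumption: append the metrics path per the OTLP convention.
        return generic.rstrip("/") + "/v1/metrics", False
    if cloud_endpoint:
        # Only the auto-derived endpoint counts as Cloud.
        return cloud_endpoint, True
    return None, False


def exporter_kwargs(
    endpoint: str, is_cloud: bool, api_key: Optional[str]
) -> Dict[str, object]:
    """Build exporter keyword arguments, attaching the API key only for Cloud."""
    kwargs: Dict[str, object] = {"endpoint": endpoint}
    if is_cloud and api_key:
        # Assumed header format; omitted entirely for user-supplied endpoints
        # so OTEL_EXPORTER_OTLP_HEADERS can take effect there.
        kwargs["headers"] = {"Authorization": f"Bearer {api_key}"}
    return kwargs
```

Because `is_cloud` is decided by where the endpoint came from, a user-supplied collector never receives the Prefect API key, even when the process is connected to Cloud.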
Prevents double-slash in derived endpoint when PREFECT_API_URL ends
with /, which can cause export failures behind strict proxies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
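
The trailing-slash fix amounts to normalizing the base URL before joining. A minimal sketch follows; `derive_metrics_endpoint` and the `telemetry/v1/metrics` suffix are hypothetical, since the PR text does not state the actual Cloud telemetry path.

```python
def derive_metrics_endpoint(api_url: str, suffix: str = "telemetry/v1/metrics") -> str:
    """Join a base API URL and a path suffix with exactly one slash between them."""
    # Strip slashes on both sides of the join so "…/" + "/…" cannot produce "//".
    return f"{api_url.rstrip('/')}/{suffix.lstrip('/')}"


# Placeholder account and workspace segments; both inputs yield the same URL.
with_slash = derive_metrics_endpoint(
    "https://api.prefect.cloud/api/accounts/ACCOUNT/workspaces/WORKSPACE/"
)
without_slash = derive_metrics_endpoint(
    "https://api.prefect.cloud/api/accounts/ACCOUNT/workspaces/WORKSPACE"
)
```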
@desertaxle desertaxle merged commit 37f6a32 into main Mar 11, 2026
76 checks passed
@desertaxle desertaxle deleted the alexs/oss-7694-add-os-level-resource-metric-collection-to-flow-run branch March 11, 2026 21:43
3 participants