
feat: support managed endpoint direct push in config-reloader #22

Merged: nandajavarma merged 3 commits into main from nv/managed-metrics-push on May 7, 2026
@nandajavarma (Collaborator)

What

Updates the runner VM's Prometheus config-reloader (update-prometheus-config.sh) to support the new managedEndpointUrl and managedBearerToken fields in the PrometheusConfig JSON stored in Secret Manager.

Why

Part of the telemetry ingest token work (design doc). The GCP runner now mints scoped JWTs and writes them to the PrometheusConfig secret. The config-reloader needs to read these fields and render a remote_write target with Bearer auth pointing directly at the management plane's /telemetry/metrics/v1 endpoint.

This bypasses the local RemoteWriteReceiver → ReportRunnerMetrics RPC path, reducing latency and removing a hop.
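A minimal Python sketch of the target-selection rule: use the managed endpoint with Bearer auth when its fields are set, otherwise fall back to localRemoteWriteUrl. The real config-reloader is the shell script update-prometheus-config.sh, so this is an illustrative translation, not its code. `RemoteWriteTarget`, `select_target`, and the example URLs are hypothetical. Treating a half-populated managed config (only one of the two fields set) as absent is an assumption based on the "when both are present" wording in the How section.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RemoteWriteTarget:
    """Hypothetical model of where Prometheus should push samples."""
    url: str
    bearer_token: Optional[str] = None  # None: render no authorization block


def select_target(secret_json: str) -> RemoteWriteTarget:
    """Pick the remote_write target from the PrometheusConfig secret payload.

    Assumption: the managed endpoint is used only when BOTH managed fields are
    present and non-empty; anything else falls back to localRemoteWriteUrl.
    """
    cfg = json.loads(secret_json)
    url = cfg.get("managedEndpointUrl") or ""
    token = cfg.get("managedBearerToken") or ""
    if url and token:
        return RemoteWriteTarget(url=url, bearer_token=token)
    local = cfg.get("localRemoteWriteUrl")
    if not local:
        raise ValueError("neither managed endpoint nor localRemoteWriteUrl is configured")
    return RemoteWriteTarget(url=local)
```

A payload from an older runner that lacks both managed fields yields a target with `bearer_token is None`, which keeps the previous local behaviour.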

How

  • Parse managedEndpointUrl and managedBearerToken from the Secret Manager JSON
  • When both are present, render a remote_write target with authorization.type: Bearer and authorization.credentials
  • Fall back to localRemoteWriteUrl when managed endpoint fields are absent (backward compatible)
  • Refactor allowlist regex construction into a shared append_allowlist_relabel helper to avoid duplication between managed and local targets
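The shared allowlist helper can be sketched as follows. This sketch assumes the allowlist is a list of metric-name prefixes (the later proxy commit names an allowlistPrefixes field) and that it is enforced with a Prometheus `write_relabel_configs` keep rule on `__name__`. `build_allowlist_regex` is hypothetical, and `append_allowlist_relabel` here is a Python analogue of the shell helper named in the PR. The real helper's exact regex is not shown.

```python
import re
from typing import Any, Dict, List

# Prometheus metric names match [a-zA-Z_:][a-zA-Z0-9_:]*, so a prefix that
# passes this check contains no regex metacharacters and needs no escaping.
_PREFIX_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def build_allowlist_regex(prefixes: List[str]) -> str:
    """Build the keep-regex once, e.g. ["runner_", "go_"] -> "(?:runner_|go_).*"."""
    if not prefixes:
        # Assumption: an empty allowlist is a configuration error, not "keep all".
        raise ValueError("allowlist is empty")
    for p in prefixes:
        if not _PREFIX_RE.fullmatch(p):
            raise ValueError(f"invalid metric name prefix: {p!r}")
    return "(?:" + "|".join(prefixes) + ").*"


def append_allowlist_relabel(target: Dict[str, Any], regex: str) -> Dict[str, Any]:
    """Append a rule to the target's write_relabel_configs that keeps only allowlisted metrics."""
    rules = target.setdefault("write_relabel_configs", [])
    rules.append({"source_labels": ["__name__"], "regex": regex, "action": "keep"})
    return target
```

Prometheus anchors relabel regexes at both ends, so `(?:runner_|go_).*` keeps exactly the series whose names start with one of the prefixes. Building the string once and passing it to each target is the duplication the refactor removes.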

Related PRs

  • gitpod-io/gitpod-next#22088 — backend auth (merged)
  • gitpod-io/gitpod-next#22089 — proto + handler (merged)
  • gitpod-io/gitpod-next#22114 — EC2 runner (draft)
  • gitpod-io/gitpod-next#22116 — GCP runner (draft, stacked on #22114)

Add support for managedEndpointUrl and managedBearerToken fields in the
PrometheusConfig JSON read from Secret Manager. When both fields are
present, the config-reloader renders a remote_write target with Bearer
authorization pointing directly at the management plane's telemetry
ingest endpoint, bypassing the local RemoteWriteReceiver.

Falls back to localRemoteWriteUrl when the managed endpoint fields are
absent, preserving backward compatibility.

Refactors the allowlist regex construction into a shared helper to avoid
duplication between managed and local targets.

Co-authored-by: Ona <no-reply@ona.com>
@nandajavarma (Collaborator, Author)

The runner-side PR (gitpod-io/gitpod-next#22116) now gates the TelemetryTokenSource wiring on a MANAGED_METRICS_DIRECT_PUSH=true env var. This makes the runner code backward-compatible — without the env var, it falls back to the local RemoteWriteReceiver path.

This PR needs to set that env var on the runner container so the direct-push path activates once the Terraform module is applied. Add it to the docker run command for gitpod-runner:

         --env GITPOD_DEVELOPMENT_VERSION=${DEVELOPMENT_VERSION} \
+        --env MANAGED_METRICS_DIRECT_PUSH=true \
         --env no_proxy=${NO_PROXY} \

This ensures the rollout order is safe:

  1. Runner code lands first (env var unset → old path)
  2. Terraform module applied (sets env var + updates config-reloader → new path)
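The gate and rollout order above come down to a single check on the runner side. This is a Python illustration, not the runner's code. The PR does not say how the value is parsed, so the sketch assumes only the exact string "true" enables direct push. Under that rule, an unset variable in step 1 keeps the old path. `direct_push_enabled` and `metrics_path` are hypothetical names.

```python
import os
from typing import Mapping, Optional


def direct_push_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """True only when MANAGED_METRICS_DIRECT_PUSH is exactly "true".

    Unset or any other value keeps the legacy RemoteWriteReceiver path, which is
    what makes landing the runner code before the Terraform change safe.
    """
    env = os.environ if env is None else env
    return env.get("MANAGED_METRICS_DIRECT_PUSH", "") == "true"


def metrics_path(env: Optional[Mapping[str, str]] = None) -> str:
    """Name the metrics path a runner would take for the given environment."""
    return "direct-push" if direct_push_enabled(env) else "local-remote-write-receiver"
```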

nandajavarma and others added 2 commits May 7, 2026 08:46
Add a Python-based metrics audit receiver (port 9095) that accepts
Prometheus remote_write POSTs and writes each payload to GCS using
the same path layout as the existing audit trail:
  metrics/runner/{runnerID}/{YYYY}/{MM}/{DD}/{HHmmss}.pb.snappy

When the managed endpoint is configured, the config-reloader now
renders two remote_write targets: one for the management plane
(Bearer auth) and one for the local audit receiver. This ensures
the audit trail captures exactly what is pushed to the management
plane.

Also sets MANAGED_METRICS_DIRECT_PUSH=true in runner.env and the
runner container env, enabling the direct-push code path in the
GCP runner (gitpod-io/gitpod-next#22116).

Co-authored-by: Ona <no-reply@ona.com>
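The two behaviours in this commit, audit object naming and dual remote_write rendering, can be sketched together. The sketch makes three assumptions: timestamps are taken in UTC, the audit receiver is reached at http://127.0.0.1:9095/write (the commit only gives the port), and the same allowlist keep rule is applied to both targets so the audit copy matches what the management plane receives. All names are hypothetical. The actual receiver writes to GCS, which this sketch does not call.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

# Hypothetical audit receiver URL: the commit only states port 9095.
AUDIT_RECEIVER_URL = "http://127.0.0.1:9095/write"


def audit_object_path(runner_id: str, received_at: datetime) -> str:
    """GCS object name: metrics/runner/{runnerID}/{YYYY}/{MM}/{DD}/{HHmmss}.pb.snappy."""
    if received_at.tzinfo is None:
        raise ValueError("received_at must be timezone-aware")
    ts = received_at.astimezone(timezone.utc)  # assumption: UTC path layout
    return f"metrics/runner/{runner_id}/{ts:%Y/%m/%d/%H%M%S}.pb.snappy"


def _keep_rule(allowlist_regex: str) -> List[Dict[str, Any]]:
    # Fresh objects per target, so a YAML serializer emits no anchors/aliases.
    return [{"source_labels": ["__name__"], "regex": allowlist_regex, "action": "keep"}]


def remote_write_targets(
    managed_url: Optional[str],
    managed_token: Optional[str],
    local_url: str,
    allowlist_regex: str,
) -> List[Dict[str, Any]]:
    """Render the remote_write list for prometheus.yml."""
    if managed_url and managed_token:
        return [
            # Management plane: Bearer auth with the scoped JWT.
            {
                "url": managed_url,
                "authorization": {"type": "Bearer", "credentials": managed_token},
                "write_relabel_configs": _keep_rule(allowlist_regex),
            },
            # Local audit receiver: same filter, so it records what was pushed.
            {"url": AUDIT_RECEIVER_URL, "write_relabel_configs": _keep_rule(allowlist_regex)},
        ]
    # Backward-compatible fallback: a single local target, no audit copy.
    return [{"url": local_url, "write_relabel_configs": _keep_rule(allowlist_regex)}]
```

With {HHmmss} granularity, two pushes for the same runner within one second map to the same object name. A receiver would need to overwrite, suffix, or otherwise disambiguate them, and the commit does not say which.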
@nandajavarma nandajavarma marked this pull request as ready for review May 7, 2026 10:21
@nandajavarma nandajavarma merged commit e9d5fb1 into main May 7, 2026
1 check passed
nandajavarma added a commit that referenced this pull request May 8, 2026
Port the managed metrics pipeline from the runner cloud-init (PR #22)
to the proxy cloud-init:

- Parse managedEndpointUrl, managedBearerToken, and allowlistPrefixes
  from the metrics secret
- Add managed endpoint remote_write target with Bearer auth
- Add metrics audit receiver (Python, port 9095) that writes payloads
  to GCS for customer audit trails
- Build allowlist regex once and reuse via append_allowlist_relabel
- Upgrade proxy SA bucket role from objectViewer to objectAdmin for
  audit writes
- Pass RUNNER_ASSETS_BUCKET_NAME to proxy cloud-init template

Co-authored-by: Ona <no-reply@ona.com>
easyCZ pushed a commit that referenced this pull request May 21, 2026
feat: support managed endpoint direct push in config-reloader
easyCZ pushed a commit that referenced this pull request May 21, 2026