gcp: support sidecar containers + addons/gcp/otel-sidecar #245
Open
robbiet480 wants to merge 1 commit into
Adds two inputs to the gcp module so callers can wire OpenTelemetry
collectors (or any other sidecar) into the fleet-api Cloud Run service:
- sidecar_containers: list of arbitrary sidecar containers to colocate
with Fleet. Lets users plug in DDOT, otelcol-contrib, Grafana Alloy,
or any other observability agent that exposes an OTLP receiver on a
localhost port.
- service_only_env_vars: extra env vars applied only to the Cloud Run
service but not the migration job. The migration job runs as a Cloud
Run Job without sidecars, so things like OTEL_EXPORTER_OTLP_ENDPOINT
would point to a nonexistent listener during job execution.
Behavior is unchanged when both inputs use their empty defaults —
existing deployments don't need to update tfvars.
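For illustration, a caller might pass the two inputs roughly as follows. This is a sketch rather than code from the PR: the module source path, the sidecar name, and the image reference are placeholders, the module's other required inputs are omitted, and the container_name / container_image field names assume the upstream cloud-run v2 module's container schema.

```hcl
module "fleet" {
  # Hypothetical relative path to the gcp module; other required inputs omitted.
  source = "../../gcp"

  # One sidecar colocated with the fleet-api container.
  # Field names assume the upstream cloud-run v2 container schema.
  sidecar_containers = [
    {
      container_name  = "otel-collector"                              # placeholder name
      container_image = "otel/opentelemetry-collector-contrib:latest" # placeholder image
    }
  ]

  # Applied to the Cloud Run service only, never to the migration job.
  service_only_env_vars = {
    OTEL_EXPORTER_OTLP_ENDPOINT = "http://localhost:4317"
  }
}
```

Leaving both inputs out should behave the same as passing [] and {}, since those are the stated defaults.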
The new addons/gcp/otel-sidecar addon emits the corresponding Fleet env
vars (FLEET_LOGGING_TRACING_ENABLED, OTEL_EXPORTER_OTLP_*, etc.) in a
{universal, service_only} split so callers route them correctly. It
doesn't ship a sidecar container itself; the README has copy-pasteable
examples for OpenTelemetry Collector contrib and Datadog DDOT.
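The {universal, service_only} routing can be pictured with plain locals. The sketch below does not reproduce the addon's actual outputs: the local names are hypothetical, the resource attribute value is made up, and placing OTEL_RESOURCE_ATTRIBUTES in the universal bucket is an assumption. Only OTEL_EXPORTER_OTLP_ENDPOINT is described above as service-only.

```hcl
locals {
  # Assumed safe for both the service and the migration job (illustrative value).
  universal_env_vars = {
    OTEL_RESOURCE_ATTRIBUTES = "deployment.environment=example"
  }

  # Only meaningful when a collector sidecar is listening on localhost.
  service_only_env_vars = {
    OTEL_EXPORTER_OTLP_ENDPOINT = "http://localhost:4317"
  }

  # The Cloud Run service receives both sets ...
  service_env = merge(local.universal_env_vars, local.service_only_env_vars)

  # ... while the migration job receives only the universal set.
  migration_job_env = local.universal_env_vars
}
```

Keeping the two maps separate until the last step is what prevents the migration job from being pointed at a listener that does not exist.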
Multi-container Cloud Run services depend on
GoogleCloudPlatform/cloud-run/google//modules/v2 accepting sidecars
without a ports block, tracked in
GoogleCloudPlatform/terraform-google-cloud-run#450. Until that lands
upstream, callers using sidecar_containers will hit "exactly one
container with an exposed port" — documented in the addon README.
Force-pushed from 443b3b0 to 79f7005.
Contributor
@robbiet480 Thank you for your contribution! We'll schedule the review of your proposed changes for the next sprint.
@BCTBB this is quite an important fix: without it, the sidecar pattern for Cloud Run is blocked when using the official Terraform module, so we currently have to maintain a forked module as a workaround.
Summary
Adds two inputs to the gcp module so callers can colocate sidecars (typically OpenTelemetry collectors) with the fleet-api Cloud Run service, plus a new addons/gcp/otel-sidecar addon that emits the matching Fleet env vars.
- var.sidecar_containers (list of container objects): wires arbitrary sidecars into the Cloud Run service. Matches the upstream cloud-run module's container schema. Useful for any collector exposing an OTLP receiver on a localhost port (Datadog DDOT, otelcol-contrib, Grafana Alloy, Honeycomb agent, …).
- var.service_only_env_vars (map): env vars applied to the Cloud Run service but not the migration job. Needed for vars that depend on a sidecar being present, e.g. OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317. The migration job (a Cloud Run Job) has no sidecars, so it would log exporter retry noise during each migration if those vars leaked in.
- addons/gcp/otel-sidecar/: emits the FLEET_LOGGING_TRACING_ENABLED, FLEET_LOGGING_OTEL_LOGS_ENABLED, OTEL_EXPORTER_OTLP_*, and OTEL_RESOURCE_ATTRIBUTES env vars in a {universal, service_only} split so callers route them correctly. Doesn't ship a sidecar container itself; the README has copy-pasteable examples for OpenTelemetry Collector contrib and Datadog DDOT.
Follows the addons/gcp/* convention established by #223 (okta-conditional-access) and #231 (pubsub-to-bigquery).
Why
Fleet's server has built-in OpenTelemetry exporters for traces, metrics, and logs (otlptracegrpc, otlpmetricgrpc, otlploggrpc; all gRPC). The pattern that actually works on Cloud Run is: run an OTel collector as a sidecar in the same instance, point Fleet at localhost:4317, let the collector forward to whatever backend.
Today the gcp module only renders a single container per service, so there's no clean way to express this. Direct OTLP intake from Fleet to a SaaS backend isn't a workable substitute: Datadog's direct intake is HTTP-only and requires delta temporality, neither of which Fleet supports without source patches.
Comparable prior art: addons/xrays-sidecar does the same job for AWS X-Ray on ECS. This PR fills the GCP gap.
Backwards compatibility
Both new inputs default to empty ([] and {}). With defaults, the rendered output is functionally identical to today; the only field added is container_name = "fleet" on the main container (forces one revision swap on first deploy after upgrade, no behavior change).
The migration job is untouched. The fleet-service Cloud Run service gets the new container name + a depends_on only when callers actually pass sidecars.
Prerequisite
Multi-container Cloud Run services require GoogleCloudPlatform/terraform-google-cloud-run#450 (open, mergeable, awaiting maintainer review) to land. That PR fixes a lookup() != {} skip in the v2 module's dynamic "ports" block that prevents sidecars from opting out of the ingress port. Until it merges, attempting terraform apply with non-empty sidecar_containers fails with an "exactly one container with an exposed port" error.
I've called this out clearly in the addon README and the var.sidecar_containers description. Callers can vendor the patched module locally as a workaround; that's how the production deployment this PR is extracted from runs today.
This PR is fine to merge before #450; the new inputs default to empty so nothing breaks for existing users, and the new code path becomes useful once #450 ships.
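To make the port opt-out concrete, here is a generic sketch of a dynamic "ports" block that renders nothing for containers that declare no port. It is neither the upstream module's code nor the #450 patch: the variable shape, resource name, and location are hypothetical, and optional() attributes assume Terraform 1.3 or later.

```hcl
variable "containers" {
  # Hypothetical shape; the real module's container schema has more fields.
  type = list(object({
    container_name  = string
    container_image = string
    ports = optional(object({
      name           = optional(string)
      container_port = number
    }))
  }))
}

resource "google_cloud_run_v2_service" "example" {
  name     = "example"     # placeholder
  location = "us-central1" # placeholder

  template {
    dynamic "containers" {
      for_each = var.containers
      content {
        name  = containers.value.container_name
        image = containers.value.container_image

        # Emit a ports block only when the container declares one, so that
        # sidecars can omit it and a single container exposes the ingress port.
        dynamic "ports" {
          for_each = containers.value.ports == null ? [] : [containers.value.ports]
          content {
            name           = ports.value.name
            container_port = ports.value.container_port
          }
        }
      }
    }
  }
}
```

With this shape, the main container sets ports and each sidecar leaves it unset, which matches the single-exposed-port requirement named in the error above.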
Validation
- terraform fmt -recursive + terraform validate clean for both gcp/ and addons/gcp/otel-sidecar/.
- […] (gcr.io/datadoghq/agent:latest-full + DD_OTELCOLLECTOR_ENABLED=true) to Datadog us5, is live in production at Campus today, on both a primary fleet-api service and a secondary bulk service. Verified all three signal types land in Datadog APM Traces, Metrics Explorer, and Log Explorer.
Test plan
- […] sidecar_containers = [] and service_only_env_vars = {} → confirm no infrastructure change from current main.
- […] /healthz stays passing and OTel data reaches the collector (e.g. via the debug exporter).
- […] OTEL_EXPORTER_OTLP_ENDPOINT after wiring service_only_env_vars.