Wires Fleet's built-in OpenTelemetry SDK to a sidecar collector running alongside fleet-api on Cloud Run. Ships traces, metrics, and logs through whatever collector you mount (Datadog DDOT, the upstream OpenTelemetry Collector, Grafana Alloy, Honeycomb agent, ...).
This addon emits environment variables only. You provide the sidecar container yourself via the GCP module's sidecar_containers input — examples below.
Fleet emits OTLP/gRPC for all three signal types (otlptracegrpc, otlpmetricgrpc, otlploggrpc). You can ship that data through any OTel-compatible backend by running a collector as a sidecar in the same Cloud Run instance and pointing Fleet at localhost:4317.
Compared to Fleet's other observability paths:
| Path | What it covers | Where it goes |
|---|---|---|
addons/logging-destination-* |
osquery scheduled query / status / activity audit logs | SaaS log backends |
| This addon | Fleet server traces, metrics, logs (everything except osquery results) | Any OTel-compatible backend |
addons/xrays-sidecar |
Fleet server traces (AWS-only) | AWS X-Ray |
The GCP module must accept multi-container Cloud Run services, which requires the upstream GoogleCloudPlatform/cloud-run/google//modules/v2 module to support sidecars without a ports block. That fix is being tracked in GoogleCloudPlatform/terraform-google-cloud-run#450 — until it merges, attempting to deploy with var.sidecar_containers will fail with:
Error 400: Revision template should contain exactly one container with an exposed port.
Until upstream merges, you can vendor the patched module locally; this addon's outputs are unaffected by where the cloud-run module comes from.
module "fleet_otel" {
source = "github.com/fleetdm/fleet-terraform//addons/gcp/otel-sidecar?ref=main"
service_name = "fleet"
service_version = "v4.85.0" # match var.fleet_config.image_tag
deployment_environment = "prod"
extra_resource_attributes = {
"team" = "sre"
"region" = "us-central1"
}
}
module "fleet" {
source = "github.com/fleetdm/fleet-terraform//gcp?ref=main"
# ... other inputs ...
fleet_config = merge(var.fleet_config, {
extra_env_vars = merge(
coalesce(var.fleet_config.extra_env_vars, {}),
module.fleet_otel.fleet_extra_environment_variables.universal,
)
})
service_only_env_vars = module.fleet_otel.fleet_extra_environment_variables.service_only
sidecar_containers = [
# Pick one of the example sidecar definitions below.
]
}The split between universal and service_only env vars exists because Fleet's migration job (a Cloud Run Job, not a Service) runs without sidecars. The OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 var would point to a non-existent listener during migrations, causing exporter retry noise in the job logs. Wire universal into extra_env_vars (visible to both) and service_only into service_only_env_vars (Service only).
Each example is a single object you drop into module.fleet.sidecar_containers = [ ... ]. Cloud Run requires:
- Every sidecar declared in another container's
depends_on_containermust have its ownstartup_probe. The probe targets the OTLP gRPC port — once it accepts connections, the collector is ready. - Only one container per service may own the ingress port. Set
ports = { name = "", container_port = 0 }on sidecars to opt out (sentinel honored by the cloud-run v2 module once PR #450 lands).
sidecar_containers = [
{
container_name = "otel-collector"
container_image = "otel/opentelemetry-collector-contrib:latest"
container_args = ["--config=/etc/otelcol/config.yaml"]
# You need to mount a config.yaml. Store it as a Secret Manager secret
# and mount via volume_mounts. See addons/gcp/otel-sidecar/examples/
# for a config that exports to Datadog, Honeycomb, or Grafana Cloud.
volume_mounts = [{
name = "otel-config"
mount_path = "/etc/otelcol"
}]
resources = {
limits = { cpu = "500m", memory = "256Mi" }
}
ports = { name = "", container_port = 0 }
startup_probe = {
tcp_socket = { port = 4317 }
initial_delay_seconds = 5
period_seconds = 5
timeout_seconds = 2
failure_threshold = 30
}
}
]DDOT is the Datadog Agent in OTel-collector mode (DD_OTELCOLLECTOR_ENABLED=true). Documented for Linux, Kubernetes, and EKS Fargate; the Cloud Run install path isn't covered by Datadog docs but works in practice (verified production usage shipping all three signal types to us5).
Use agent:latest-full, not agent:latest — the -full variant bundles the DDOT subprocess.
sidecar_containers = [
{
container_name = "datadog-agent"
container_image = "gcr.io/datadoghq/agent:latest-full"
env_vars = {
DD_SITE = "us5.datadoghq.com" # or datadoghq.com, datadoghq.eu, etc.
DD_SERVICE = "fleet"
DD_ENV = "prod"
DD_VERSION = "v4.85.0"
DD_OTELCOLLECTOR_ENABLED = "true"
DD_LOGS_ENABLED = "true"
DD_HOSTNAME = "fleet-api-cloudrun" # Cloud Run instances are ephemeral; pin to a logical hostname
}
env_secret_vars = {
DD_API_KEY = {
secret = google_secret_manager_secret.datadog_api_key.secret_id
version = "latest"
}
}
resources = {
limits = { cpu = "1", memory = "512Mi" }
}
ports = { name = "", container_port = 0 }
startup_probe = {
tcp_socket = { port = 4317 }
initial_delay_seconds = 5
period_seconds = 5
timeout_seconds = 2
failure_threshold = 30
}
}
]Set enable_otel_logs = false on this addon if you're using gcr.io/datadoghq/serverless-init instead — its OTLP logs pipeline is broken (datadog-agent#34097). DDOT's is fine.
Similar pattern: image is grafana/alloy:latest, args point at a config file mounted via volume_mounts. Config follows the Alloy "river" config format with an otelcol.receiver.otlp block.
- Fleet sets
OTEL_SERVICE_NAME=fleetinternally viasemconv.ServiceName("fleet"). We set it again in the env var so callers can override. - The Go OTel SDK defaults to TLS for OTLP gRPC. Talking plaintext to a localhost sidecar requires
OTEL_EXPORTER_OTLP_INSECURE=true(this addon sets it). FLEET_LOGGING_OTEL_LOGS_ENABLED=truerequiresFLEET_LOGGING_TRACING_ENABLED=true(Fleet validates this on startup — seecmd/fleet/serve.go).- The migration job runs only once per
image_tagchange. Even without the env-var split, the noise window is brief, but the split is correct hygiene.