[LFXv2-608] Add OpenTelemetry tracing support#22
Conversation
Integrate slog-otel to automatically include trace_id and span_id from OpenTelemetry context in log output, enabling correlation between logs and distributed traces. Changes: - pkg/log/log.go: Wrap JSON handler with slogotel.OtelHandler - pkg/utils/otel_test.go: Fix TestNewPropagator to pass OTelConfig - charts/: Bump version to 0.3.5 - go.mod: Add github.com/remychantenay/slog-otel v1.3.4 Issue: LFXV2-608 Signed-off-by: Trevor Bramwell <tbramwell@linuxfoundation.org> Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: Trevor Bramwell <tbramwell@linuxfoundation.org>
WalkthroughAdds OpenTelemetry: new OTEL setup and tests, initializes OTEL at startup with graceful shutdown, instruments server and client HTTP paths and logging for trace context, exposes OTEL Helm values, embeds build metadata via ldflags, and updates dependencies and CI build steps. Changes
Sequence Diagram(s)sequenceDiagram
participant App as Auth Service (main)
participant OTEL as OpenTelemetry SDK
participant Exporters as OTLP Exporters
participant HTTP as HTTP Handler / Client
participant Logging as Logging Pipeline
Note over App: Startup
App->>OTEL: OTelConfigFromEnv()
App->>OTEL: SetupOTelSDKWithConfig(ctx, cfg)
activate OTEL
OTEL->>Exporters: Init traces/metrics/logs providers
OTEL-->>App: shutdown func
deactivate OTEL
Note over App: Incoming request
HTTP->>OTEL: Extract trace context (server)
HTTP->>Logging: Emit logs (slog-otel adds trace_id/span_id)
Logging->>Exporters: Export logs/traces
HTTP-->>App: Response
Note over App: Outbound request
App->>HTTP: Outbound HTTP client request
HTTP->>OTEL: Inject trace context (otelhttp transport)
HTTP->>Exporters: Export spans
Note over App: Shutdown
App->>OTEL: shutdown(ctx)
OTEL->>Exporters: Flush & close
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@go.mod`:
- Around line 14-29: The OpenTelemetry versions in go.mod are inconsistent:
align go.opentelemetry.io/otel, go.opentelemetry.io/otel/sdk,
go.opentelemetry.io/otel/sdk/metric and go.opentelemetry.io/otel/trace to the
same released SDK version (e.g., change all v1.39.0 entries to the supported
v1.38.0) and ensure exporter modules
go.opentelemetry.io/otel/exporters/otlp/otlptrace/* and
go.opentelemetry.io/otel/exporters/otlp/otlpmetric/* match that SDK release;
keep otlplog (go.opentelemetry.io/otel/exporters/otlp/otlplog/*) and
go.opentelemetry.io/otel/log or go.opentelemetry.io/otel/sdk/log at their
compatible log-module version (e.g., v0.15.0) if that is required, or
bump/downgrade them to a set of consistent, tested versions—update the go.mod
entries for the modules listed (e.g., go.opentelemetry.io/otel,
go.opentelemetry.io/otel/sdk,
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc,
go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc,
go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc,
go.opentelemetry.io/otel/log, go.opentelemetry.io/otel/sdk/log) so they are
consistent, then run go mod tidy to verify.
🧹 Nitpick comments (6)
pkg/utils/otel.go (1)
282-319: Consider explicit protocol validation.The
elsebranch (line 296) treats any non-HTTP protocol value as gRPC. If an invalid protocol value is configured (e.g.,OTEL_EXPORTER_OTLP_PROTOCOL=invalid), the code will silently use gRPC instead of warning or failing.💡 Optional: Add explicit protocol validation
func newTraceProvider(ctx context.Context, cfg OTelConfig, res *resource.Resource) (*trace.TracerProvider, error) { var exporter trace.SpanExporter var err error - if cfg.Protocol == OTelProtocolHTTP { + switch cfg.Protocol { + case OTelProtocolHTTP: opts := []otlptracehttp.Option{} if cfg.Endpoint != "" { opts = append(opts, otlptracehttp.WithEndpoint(cfg.Endpoint)) } if cfg.Insecure { opts = append(opts, otlptracehttp.WithInsecure()) } exporter, err = otlptracehttp.New(ctx, opts...) - } else { + case OTelProtocolGRPC: opts := []otlptracegrpc.Option{} if cfg.Endpoint != "" { opts = append(opts, otlptracegrpc.WithEndpoint(cfg.Endpoint)) } if cfg.Insecure { opts = append(opts, otlptracegrpc.WithInsecure()) } exporter, err = otlptracegrpc.New(ctx, opts...) + default: + return nil, fmt.Errorf("unsupported OTLP protocol: %s", cfg.Protocol) }pkg/utils/otel_test.go (2)
120-132: Test for empty string may not work as intended.In the table-driven test, when
tt.envValueis empty string, thet.Setenvis skipped (line 122-124). However,t.Setenvfrom a previous subtest in the same parent test may still have setOTEL_TRACES_SAMPLE_RATIO. Sincet.Setenvcleanup happens at the end of each subtest, and subtests run sequentially, this should be fine. But the logic is confusing - the empty string case actually tests "no env var set" rather than "env var set to empty string".Consider clarifying the test name or explicitly testing both scenarios:
Suggested clarification
- {"empty string", "", 1.0}, // defaults to 1.0 + {"unset env var", "", 1.0}, // defaults to 1.0 when not setIf you also want to test the behavior when the env var is explicitly set to an empty string, add a separate test case that always calls
t.Setenv.
152-164: Same empty string test concern as above.The empty string case (line 146) won't call
t.Setenv, so it tests the "unset" scenario rather than "set to empty string". Consider renaming for clarity:- {"empty", "", false}, + {"unset", "", false},charts/lfx-v2-auth-service/templates/deployment.yaml (1)
57-60: Conditional check may have unintended behavior forinsecurefield.The check
{{- if .Values.app.otel.insecure }}evaluates the string value. Sinceinsecuredefaults to"false"in values.yaml (a non-empty string), this condition is always true, andOTEL_EXPORTER_OTLP_INSECUREwill always be set.This is likely the intended behavior (explicitly setting
falseis clearer than omitting the variable), but consider using a more explicit pattern for boolean-like fields:Alternative: Use `ne ""` for explicit empty check
- {{- if .Values.app.otel.insecure }} + {{- if ne .Values.app.otel.insecure "" }} - name: OTEL_EXPORTER_OTLP_INSECURE value: {{ .Values.app.otel.insecure | quote }} {{- end }}Or document that the field should be set to empty string
""to omit the env var entirely.pkg/logging/context.go (2)
13-37: Potentially redundant with slog-otel handler inpkg/log/log.go.The
slogotel.OtelHandlerinpkg/log/log.goalready automatically injectstrace_idandspan_idfrom the context into all log records. These helper functions provide the same functionality but require explicit usage.Consider documenting when to use these helpers vs relying on the automatic injection:
- Automatic (via OtelHandler): Default behavior when using
slog.Default()with a context-aware logging call- Manual (via these helpers): When creating child loggers or using loggers that bypass the default handler
If these helpers are intended for specific use cases not covered by the automatic injection, consider adding a comment explaining when to use them:
// LogWithContext returns a logger with trace context attributes added. // Use this when you need to create a child logger that persists trace context // across multiple log calls, or when using a logger that doesn't have // the OtelHandler wrapper.
28-37: Consider nil logger handling.
LogWithContextwill panic ifloggeris nil. While callers should ensure a valid logger, a defensive nil check could prevent runtime errors:Defensive nil check
func LogWithContext(ctx context.Context, logger *slog.Logger) *slog.Logger { + if logger == nil { + logger = slog.Default() + } spanCtx := trace.SpanContextFromContext(ctx) if !spanCtx.IsValid() { return logger }
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Jira integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (10)
charts/lfx-v2-auth-service/Chart.yamlcharts/lfx-v2-auth-service/templates/deployment.yamlcharts/lfx-v2-auth-service/values.yamlcmd/server/http.gocmd/server/main.gogo.modpkg/log/log.gopkg/logging/context.gopkg/utils/otel.gopkg/utils/otel_test.go
🧰 Additional context used
🧬 Code graph analysis (1)
cmd/server/main.go (1)
pkg/utils/otel.go (2)
OTelConfigFromEnv(84-161)SetupOTelSDKWithConfig(171-239)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: CodeQL analysis (go)
- GitHub Check: Agent
- GitHub Check: MegaLinter
- GitHub Check: Build and Test
- GitHub Check: Publish Container
🔇 Additional comments (14)
charts/lfx-v2-auth-service/Chart.yaml (1)
8-8: LGTM!Version bump to 0.3.5 appropriately reflects the OpenTelemetry configuration additions to the chart.
pkg/utils/otel.go (5)
32-80: LGTM!Well-documented constants and configuration struct with clear environment variable mappings for each field.
82-161: LGTM!The environment variable parsing is well-implemented with sensible defaults. The sample ratio validation (lines 116-129) properly handles invalid values with warnings.
171-239: LGTM!Solid implementation of the SDK setup with proper:
- Error aggregation via
errors.Joinfor shutdown- Cleanup on initialization failure via
handleErr- Conditional provider initialization based on exporter configuration
- Global provider registration
253-280: LGTM!Good defensive implementation with fallback to default propagators if none are valid, and appropriate warning logs for unknown propagator values.
321-393: LGTM!Metrics and logger provider implementations follow consistent patterns. The 30-second periodic reader interval for metrics is a reasonable default for production use.
cmd/server/http.go (1)
64-65: LGTM!The
otelhttp.NewHandlerwrapper is correctly placed after the debug middleware, ensuring all requests are traced. The operation name "auth-service" serves as a reasonable top-level span name, with individual route details captured via standard HTTP semantic conventions (method, route, status code).cmd/server/main.go (1)
51-64: LGTM!OpenTelemetry initialization is correctly positioned early in the startup sequence, ensuring traces are available for subsequent operations. Key observations:
- Using
context.Background()for shutdown (line 61) is correct since the original context may be cancelled at that point.- The deferred shutdown will execute after the graceful shutdown sequence completes, ensuring all spans are flushed.
pkg/utils/otel_test.go (2)
353-372: Good documentation of zero-value behavior.This test effectively documents an important API contract: zero-value
OTelConfigdoes not disable exporters. Users must explicitly set exporters toOTelExporterNone. This is valuable for preventing silent misconfiguration.
1-399: Comprehensive test coverage for OTel configuration.The test suite provides thorough coverage of:
- Default configuration values
- Environment variable parsing (including edge cases)
- SDK initialization and shutdown (including idempotency)
- Resource and propagator creation
- API constant stability
charts/lfx-v2-auth-service/templates/deployment.yaml (1)
38-80: OTEL environment variable injection looks well-structured.The conditional pattern ensures env vars are only set when values are provided, keeping the container environment clean when OTEL is disabled. The
extraEnvblock (lines 38-40) allows additional flexibility for custom configurations.charts/lfx-v2-auth-service/values.yaml (2)
144-178: Well-documented OTEL configuration with secure defaults.The configuration block provides:
- Clear inline documentation for each field
- Sensible defaults (exporters disabled, gRPC protocol, full sampling)
- Proper typing with quoted strings for Helm templating
The default
propagators: "tracecontext,baggage,jaeger"includes Jaeger for compatibility with legacy tracing systems, which is a good choice for gradual migration.
141-143: FlexibleextraEnvinjection point.The
extraEnvarray allows operators to inject additional environment variables without modifying the chart, which is useful for environment-specific configurations or secrets from external sources.pkg/log/log.go (1)
106-112: Handler wrapping order is correct.The logging pipeline correctly chains:
JSONHandler→OtelHandler→contextHandler. This ensures:
- Base JSON formatting
- OTel trace_id/span_id injection from span context
- Application-specific context attributes via
AppendCtx
slogotel.OtelHandleris correctly used as a value type; all required methods (Handle,WithAttrs,WithGroup) use value receivers in the slog-otel library.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
There was a problem hiding this comment.
Pull request overview
This pull request adds comprehensive OpenTelemetry (OTel) instrumentation to the LFX V2 Auth Service, enabling distributed tracing, metrics collection, and context-rich logging. The implementation includes flexible configuration through environment variables and Helm chart values, with all telemetry exporters disabled by default for safe deployment.
Changes:
- Added OpenTelemetry SDK initialization with support for OTLP exporters (traces, metrics, logs) via both gRPC and HTTP protocols
- Instrumented HTTP server with automatic trace generation and added trace context (trace_id, span_id) to structured logs
- Introduced Helm chart configuration for OTel settings with environment variable injection in the deployment template
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/utils/otel.go | Core OTel SDK setup with configuration parsing, exporter initialization, and resource/propagator management |
| pkg/utils/otel_test.go | Comprehensive test coverage for OTel configuration and SDK initialization |
| pkg/logging/context.go | Utility functions for extracting and adding trace context to logs |
| pkg/log/log.go | Wrapped logger with slog-otel handler to automatically include trace context |
| cmd/server/main.go | Added OTel SDK initialization and graceful shutdown in service startup |
| cmd/server/http.go | Instrumented HTTP handler with otelhttp middleware for automatic tracing |
| charts/lfx-v2-auth-service/values.yaml | Added OTel configuration section with defaults for service name, exporters, and sampling |
| charts/lfx-v2-auth-service/templates/deployment.yaml | Template updates to inject OTel environment variables from Helm values |
| charts/lfx-v2-auth-service/Chart.yaml | Bumped chart version to 0.3.5 |
| go.mod | Added OpenTelemetry dependencies and slog-otel package |
| go.sum | Updated checksums for new dependencies |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # serviceVersion is the service version for OpenTelemetry resource | ||
| # identification | ||
| # (default: "1.0.0") | ||
| serviceVersion: "" |
There was a problem hiding this comment.
The comment for serviceVersion says "(default: "1.0.0")" but the actual default in the Go code (pkg/utils/otel.go line 90) is an empty string, not "1.0.0". The comment is misleading and should either be removed or corrected to match the actual implementation.
| shutdown = func(ctx context.Context) error { | ||
| var err error | ||
| for _, fn := range shutdownFuncs { | ||
| err = errors.Join(err, fn(ctx)) | ||
| } | ||
| shutdownFuncs = nil | ||
| return err | ||
| } |
There was a problem hiding this comment.
The shutdown function captures the shutdownFuncs slice and sets it to nil after calling all shutdown functions (line 182). However, if shutdown is called concurrently from multiple goroutines, this could lead to a race condition where one goroutine is iterating over shutdownFuncs while another sets it to nil. While the defer in main.go (line 60-64) suggests single-threaded shutdown, the public API doesn't document or enforce this constraint. Consider adding synchronization (e.g., sync.Once) to ensure thread-safe shutdown behavior.
Include jaeger propagator in default propagators alongside tracecontext and baggage for better trace context propagation compatibility. Update helm deployment template to only set OTEL env vars when endpoint is configured. Also updates OpenTelemetry and related dependencies to latest versions. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Issue: LFXV2-608 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: Trevor Bramwell <tbramwell@linuxfoundation.org>
- Add Version, BuildTime, GitCommit build-time variables - Update OTEL shutdown to use context with timeout - Add ldflags to ko-build workflows for version injection - ServiceVersion falls back to build-time Version if env var empty 🤖 Generated with [Claude Code](https://claude.com/claude-code) Issue: LFXV2-608 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: Trevor Bramwell <tbramwell@linuxfoundation.org>
6eed861 to
de5d5b0
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @.github/workflows/ko-build-tag.yaml:
- Around line 56-63: The GOFLAGS definition repeats -ldflags which causes only
the last -ldflags to take effect; change the GOFLAGS export so all -X
assignments (main.Version, main.BuildTime, main.GitCommit) are combined into a
single -ldflags string (i.e. one -ldflags with all three -X=... entries) before
running ko build for github.com/linuxfoundation/lfx-v2-auth-service/cmd/server
so Version, BuildTime and GitCommit are all embedded.
GOFLAGS parses values as space-separated flags. Use separate -ldflags entries with = syntax for each -X flag. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: Trevor Bramwell <tbramwell@linuxfoundation.org>
Test coverage for AppendCtx, Priority, PriorityCritical, contextHandler.Handle, and InitStructureLogConfig functions. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Issue: LFXV2-608 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: Trevor Bramwell <tbramwell@linuxfoundation.org>
de5d5b0 to
b4ae39d
Compare
Declare and assign separately to avoid masking return values. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Issue: LFXV2-608 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: Trevor Bramwell <tbramwell@linuxfoundation.org>
🤖 Generated with [Claude Code](https://claude.com/claude-code) Issue: LFXV2-608 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: Trevor Bramwell <tbramwell@linuxfoundation.org>
- Delete pkg/logging/context.go (slog-otel handles this) - Use WarnContext in authelia/user.go for trace correlation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Issue: LFXV2-608 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: Trevor Bramwell <tbramwell@linuxfoundation.org>
Wrap HTTP client transport with otelhttp.NewTransport to enable distributed tracing for outgoing API calls. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Issue: LFXV2-608 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: Trevor Bramwell <tbramwell@linuxfoundation.org>
Run go mod tidy to move otel/trace to indirect. Replace empty slice literals with var declarations and trim propagator string before switch statement. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Issue: LFXV2-608 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: Trevor Bramwell <tbramwell@linuxfoundation.org>
🤖 Generated with [Claude Code](https://claude.com/claude-code) Issue: LFXV2-608 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Trevor Bramwell <tbramwell@linuxfoundation.org>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@pkg/log/log_test.go`:
- Around line 287-342: TestInitStructureLogConfig mutates global slog state via
InitStructureLogConfig and leaks into other tests; fix by capturing the current
default logger at the start of each subtest (using slog.Default()) and restoring
it after the subtest (using slog.SetDefault(saved))—i.e., inside each t.Run
before calling InitStructureLogConfig save prev := slog.Default() and defer
slog.SetDefault(prev) so the global logger state is restored after the subtest.
The otel SDK v1.40.0 uses schema v1.39.0 internally. Using semconv v1.37.0 caused a conflicting Schema URL error in tests. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Issue: LFXV2-608 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: Trevor Bramwell <tbramwell@linuxfoundation.org>
The version is set at build-time via ldflags, not hardcoded to "1.0.0". 🤖 Generated with [Claude Code](https://claude.com/claude-code) Issue: LFXV2-608 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: Trevor Bramwell <tbramwell@linuxfoundation.org>
Save and restore slog.Default() in each subtest of TestInitStructureLogConfig to prevent InitStructureLogConfig from leaking global logger state into other tests. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Issue: LFXV2-608 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: Trevor Bramwell <tbramwell@linuxfoundation.org>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@pkg/utils/otel.go`:
- Around line 82-161: OTelConfigFromEnv currently accepts any
OTEL_EXPORTER_OTLP_PROTOCOL value and silently defaults to OTelProtocolGRPC;
update the handling of the local variable protocol to normalize (strings.ToLower
and trim spaces) and validate against supported values ("grpc", "http/protobuf",
"http/json") (also treat "http" as alias for "http/protobuf"), set protocol to
the matching canonical constant (e.g. OTelProtocolGRPC or OTelProtocolHTTP with
subtype) and emit a slog.Warn when an unsupported value is provided before
falling back to OTelProtocolGRPC; make the changes where protocol is read and
defaulted in OTelConfigFromEnv so the returned OTelConfig.Protocol is always one
of the supported canonical values.
| // OTelConfigFromEnv creates an OTelConfig from environment variables. | ||
| // See OTelConfig struct fields for supported environment variables. | ||
| func OTelConfigFromEnv() OTelConfig { | ||
| serviceName := os.Getenv("OTEL_SERVICE_NAME") | ||
| if serviceName == "" { | ||
| serviceName = "lfx-v2-auth-service" | ||
| } | ||
|
|
||
| serviceVersion := os.Getenv("OTEL_SERVICE_VERSION") | ||
|
|
||
| protocol := os.Getenv("OTEL_EXPORTER_OTLP_PROTOCOL") | ||
| if protocol == "" { | ||
| protocol = OTelProtocolGRPC | ||
| } | ||
|
|
||
| endpoint := os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT") | ||
|
|
||
| insecure := os.Getenv("OTEL_EXPORTER_OTLP_INSECURE") == "true" | ||
|
|
||
| tracesExporter := os.Getenv("OTEL_TRACES_EXPORTER") | ||
| if tracesExporter == "" { | ||
| tracesExporter = OTelExporterNone | ||
| } | ||
|
|
||
| metricsExporter := os.Getenv("OTEL_METRICS_EXPORTER") | ||
| if metricsExporter == "" { | ||
| metricsExporter = OTelExporterNone | ||
| } | ||
|
|
||
| logsExporter := os.Getenv("OTEL_LOGS_EXPORTER") | ||
| if logsExporter == "" { | ||
| logsExporter = OTelExporterNone | ||
| } | ||
|
|
||
| tracesSampleRatio := 1.0 | ||
| if ratio := os.Getenv("OTEL_TRACES_SAMPLE_RATIO"); ratio != "" { | ||
| if parsed, err := strconv.ParseFloat(ratio, 64); err == nil { | ||
| if parsed >= 0.0 && parsed <= 1.0 { | ||
| tracesSampleRatio = parsed | ||
| } else { | ||
| slog.Warn("OTEL_TRACES_SAMPLE_RATIO must be between 0.0 and 1.0, using default 1.0", | ||
| "provided-value", ratio) | ||
| } | ||
| } else { | ||
| slog.Warn("invalid OTEL_TRACES_SAMPLE_RATIO value, using default 1.0", | ||
| "provided-value", ratio, "error", err) | ||
| } | ||
| } | ||
|
|
||
| propagators := os.Getenv("OTEL_PROPAGATORS") | ||
| if propagators == "" { | ||
| propagators = "tracecontext,baggage,jaeger" | ||
| } | ||
|
|
||
| slog.With( | ||
| "service-name", serviceName, | ||
| "version", serviceVersion, | ||
| "protocol", protocol, | ||
| "endpoint", endpoint, | ||
| "insecure", insecure, | ||
| "traces-exporter", tracesExporter, | ||
| "traces-sample-ratio", tracesSampleRatio, | ||
| "metrics-exporter", metricsExporter, | ||
| "logs-exporter", logsExporter, | ||
| "propagators", propagators, | ||
| ).Debug("OTelConfig") | ||
|
|
||
| return OTelConfig{ | ||
| ServiceName: serviceName, | ||
| ServiceVersion: serviceVersion, | ||
| Protocol: protocol, | ||
| Endpoint: endpoint, | ||
| Insecure: insecure, | ||
| TracesExporter: tracesExporter, | ||
| TracesSampleRatio: tracesSampleRatio, | ||
| MetricsExporter: metricsExporter, | ||
| LogsExporter: logsExporter, | ||
| Propagators: propagators, | ||
| } | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's examine the full otel.go file to understand the constants and usage
fd -t f "otel.go" --exec wc -l {} \;Repository: linuxfoundation/lfx-v2-auth-service
Length of output: 102
🏁 Script executed:
# Read the otel.go file
cat -n pkg/utils/otel.go | head -200Repository: linuxfoundation/lfx-v2-auth-service
Length of output: 8092
🌐 Web query:
OpenTelemetry OTEL_EXPORTER_OTLP_PROTOCOL specification standard values
💡 Result:
OTEL_EXPORTER_OTLP_PROTOCOL (and the signal-specific variants OTEL_EXPORTER_OTLP_{TRACES|METRICS|LOGS}_PROTOCOL, plus ..._PROFILES_PROTOCOL where supported) is an enum with these standard values:
grpc— OTLP over gRPC (protobuf over HTTP/2)http/protobuf— OTLP over HTTP with protobuf payloadshttp/json— OTLP over HTTP with JSON payloads (optional for SDKs) [1][2]
Default: the specification says the default transport SHOULD be http/protobuf unless an SDK has strong reasons to default to grpc (e.g., backward compatibility), so the effective default can be SDK-dependent. [1][2]
🏁 Script executed:
# Search for how the protocol is used in the setup function
cat -n pkg/utils/otel.go | sed -n '171,250p'Repository: linuxfoundation/lfx-v2-auth-service
Length of output: 2908
🏁 Script executed:
# Continue reading to see full protocol usage
cat -n pkg/utils/otel.go | sed -n '200,300p'Repository: linuxfoundation/lfx-v2-auth-service
Length of output: 3834
🏁 Script executed:
# Check the rest of protocol usage for metrics and logs providers
cat -n pkg/utils/otel.go | sed -n '300,395p'Repository: linuxfoundation/lfx-v2-auth-service
Length of output: 3367
Normalize and validate OTEL_EXPORTER_OTLP_PROTOCOL to handle standard values like http/protobuf.
The current code accepts any protocol value without validation. When set to standard values like http/protobuf or http/json, the equality checks against OTelProtocolHTTP ("http") fail, silently falling back to gRPC. This causes telemetry to be sent via the wrong protocol, leading to connection failures or dropped data. Normalize protocol values, validate against supported types (grpc, http/protobuf, http/json), and warn on unsupported values.
🔧 Proposed fix (normalize and validate protocol)
- protocol := os.Getenv("OTEL_EXPORTER_OTLP_PROTOCOL")
- if protocol == "" {
- protocol = OTelProtocolGRPC
- }
+ protocol := strings.ToLower(strings.TrimSpace(os.Getenv("OTEL_EXPORTER_OTLP_PROTOCOL")))
+ switch protocol {
+ case "", "grpc":
+ protocol = OTelProtocolGRPC
+ case "http", "http/protobuf", "http/json":
+ protocol = OTelProtocolHTTP
+ default:
+ slog.Warn("unsupported OTEL_EXPORTER_OTLP_PROTOCOL, defaulting to grpc",
+ "protocol", protocol)
+ protocol = OTelProtocolGRPC
+ }🤖 Prompt for AI Agents
In `@pkg/utils/otel.go` around lines 82 - 161, OTelConfigFromEnv currently accepts
any OTEL_EXPORTER_OTLP_PROTOCOL value and silently defaults to OTelProtocolGRPC;
update the handling of the local variable protocol to normalize (strings.ToLower
and trim spaces) and validate against supported values ("grpc", "http/protobuf",
"http/json") (also treat "http" as alias for "http/protobuf"), set protocol to
the matching canonical constant (e.g. OTelProtocolGRPC or OTelProtocolHTTP with
subtype) and emit a slog.Warn when an unsupported value is provided before
falling back to OTelProtocolGRPC; make the changes where protocol is read and
defaulted in OTelConfigFromEnv so the returned OTelConfig.Protocol is always one
of the supported canonical values.
This pull request adds comprehensive OpenTelemetry (OTel) tracing, metrics, and logging support to the LFX V2 Auth Service. It introduces new configuration options for OTel in the Helm chart, updates the deployment to inject OTel-related environment variables, instruments the HTTP server for tracing, and enhances logging to include trace context. Additionally, it adds and updates dependencies required for OTel support.
OpenTelemetry Integration:
main.go, reading configuration from environment variables and ensuring proper shutdown on exit. (cmd/server/main.gocmd/server/main.goR51-R65)cmd/server/http.go[1] [2]Helm Chart and Configuration Enhancements:
.Values.app.otelconfiguration section invalues.yaml, allowing users to customize OTel service name, version, protocol, endpoint, exporters, sampling ratio, and propagators. (charts/lfx-v2-auth-service/values.yamlcharts/lfx-v2-auth-service/values.yamlR141-R178).Values.app.extraEnv. (charts/lfx-v2-auth-service/templates/deployment.yamlcharts/lfx-v2-auth-service/templates/deployment.yamlR38-R80)0.3.5. (charts/lfx-v2-auth-service/Chart.yamlcharts/lfx-v2-auth-service/Chart.yamlL8-R8)Logging Improvements:
slog-otelto automatically includetrace_idandspan_idin logs, and added a utility for extracting trace context fromcontext.Context. (pkg/log/log.go[1] [2];pkg/logging/context.go[3]Dependency Updates:
slog-otelpackage togo.mod. (go.mod[1] [2] [3] [4]These changes lay the foundation for distributed tracing, metrics, and context-rich logging, making it easier to observe, debug, and monitor the authentication service in production.