@bernielomax bernielomax commented Jun 10, 2025

Airbyte Server - Automatic HTTP Request Tracing Implementation

Overview

This implementation adds automatic distributed tracing to all HTTP requests in the Airbyte server without requiring any changes to existing business logic. Every HTTP request now automatically gets:

  • OpenTelemetry distributed tracing spans
  • Trace and span IDs in ALL log messages
  • JSON structured logging with trace correlation
  • Zero-footprint instrumentation (no code changes needed in controllers)

Architecture

Components Added

  1. MicronautHttpTracingFilter.kt - Core HTTP filter that automatically instruments all requests
  2. ObservationRegistryBeanFactory.kt - Creates and configures the OpenTelemetry SDK
  3. Enhanced application.yml - Comprehensive OpenTelemetry and Micronaut tracing configuration

How It Works

HTTP Request → MicronautHttpTracingFilter → Creates OpenTelemetry Span → Populates MDC → Business Logic → Response
                      ↓
               Automatic trace/span IDs in ALL log messages via existing LogEvent.kt
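The flow above can be approximated without any framework. The following is a minimal sketch, not the code in MicronautHttpTracingFilter.kt: `InMemoryMdc`, `randomHexId`, and `traced` are hypothetical names, and a thread-local map stands in for SLF4J's MDC.

```kotlin
import java.security.SecureRandom

// Hypothetical stand-in for SLF4J's MDC: a per-thread map of logging context.
object InMemoryMdc {
    private val context = ThreadLocal.withInitial { mutableMapOf<String, String>() }
    fun put(key: String, value: String) { context.get()[key] = value }
    fun get(key: String): String? = context.get()[key]
    fun remove(key: String) { context.get().remove(key) }
}

val idSource = SecureRandom()

// Random lowercase hex ID: 16 bytes give a 32-char trace ID, 8 bytes a 16-char span ID.
fun randomHexId(bytes: Int): String =
    ByteArray(bytes).also { idSource.nextBytes(it) }
        .joinToString("") { "%02x".format(it) }

// Wraps a handler the way a tracing filter wraps a request:
// create IDs -> populate the logging context -> run business logic -> always clean up.
fun <T> traced(handler: () -> T): T {
    InMemoryMdc.put("traceId", randomHexId(16))
    InMemoryMdc.put("spanId", randomHexId(8))
    try {
        return handler()
    } finally {
        InMemoryMdc.remove("traceId")
        InMemoryMdc.remove("spanId")
    }
}

fun main() {
    val during = traced { InMemoryMdc.get("traceId") }
    println("traceId during request: $during")
    println("traceId after request: ${InMemoryMdc.get("traceId")}")
}
```

The business logic passed to `traced` never touches the IDs itself, which is the property the description refers to as zero code changes in controllers.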

Key Features

1. Automatic Span Creation

Every HTTP request automatically gets an OpenTelemetry span with:

  • HTTP method, URL, status code, host, scheme
  • Request/response timing and duration
  • Error handling and exception recording
  • Parent-child span relationships for nested operations

2. Structured Logging Integration

The existing LogEvent.kt automatically includes trace correlation:

{
  "timestamp": 1749588203250,
  "message": "Processing request",
  "level": "INFO", 
  "traceId": "79a889b557cf0f57f4c823f774f6b390",
  "spanId": "95e1fde57a13a036"
}
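As a rough illustration of how a log event could pick up the correlation fields, here is a sketch of a JSON line formatter. It is not the actual LogEvent.kt; `jsonEscape` and `toJsonLine` are hypothetical helpers, and the `context` map is assumed to hold whatever the filter put into MDC.

```kotlin
// Escapes the characters that would otherwise break a JSON string literal.
fun jsonEscape(s: String): String =
    s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n")

// Hypothetical formatter: merges the logging context (for example MDC contents)
// into a single-line JSON log event, after the fixed fields.
fun toJsonLine(timestamp: Long, level: String, message: String, context: Map<String, String>): String {
    val fields = mutableListOf(
        "\"timestamp\":$timestamp",
        "\"message\":\"${jsonEscape(message)}\"",
        "\"level\":\"${jsonEscape(level)}\"",
    )
    for ((key, value) in context) {
        fields += "\"${jsonEscape(key)}\":\"${jsonEscape(value)}\""
    }
    return fields.joinToString(",", prefix = "{", postfix = "}")
}

fun main() {
    val context = linkedMapOf(
        "traceId" to "79a889b557cf0f57f4c823f774f6b390",
        "spanId" to "95e1fde57a13a036",
    )
    println(toJsonLine(1749588203250, "INFO", "Processing request", context))
}
```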

3. Zero Code Changes Required

Existing controllers work unchanged:

@Get("/api/v1/health")
fun getHealthCheck(): HealthCheckRead? = healthCheckHandler.health()

This automatically gets traced with no modifications needed.

4. Dual Tracing Integration

The implementation uses both:

  • Micronaut's built-in OpenTelemetry integration for service-level configuration
  • Custom HTTP filter for guaranteed MDC population across thread boundaries

Configuration

Environment Variables

All tracing can be controlled via environment variables:

# Core OpenTelemetry tracing
OTEL_TRACING_ENABLED=true                    # Enable OpenTelemetry SDK (default: true)
OTEL_SAMPLER_RATIO=1.0                       # Sample 100% of traces (default: 1.0)

# Micronaut tracing integration
MICRONAUT_TRACING_ENABLED=true               # Enable Micronaut tracing (default: true)
HTTP_TRACING_ENABLED=true                    # Enable HTTP request tracing (default: true)
HTTP_SERVER_TRACING_ENABLED=true             # Enable HTTP server tracing (default: true)
HTTP_CLIENT_TRACING_ENABLED=true             # Enable HTTP client tracing (default: true)
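To give a feel for what a sampling ratio such as OTEL_SAMPLER_RATIO means, the sketch below shows one common way to sample by trace ID, similar in spirit to OpenTelemetry's ratio-based sampler. How this PR wires the ratio into the SDK is not shown in the description, so `isSampled` is a hypothetical helper rather than the configured sampler.

```kotlin
import kotlin.math.abs

// Decides whether a trace is sampled from its trace ID alone, so every span
// belonging to the same trace gets the same decision.
fun isSampled(traceId: String, ratio: Double): Boolean {
    require(traceId.length == 32) { "trace ID must be 32 hex characters" }
    require(ratio in 0.0..1.0) { "ratio must be between 0.0 and 1.0" }
    if (ratio == 1.0) return true
    if (ratio == 0.0) return false
    // Treat the low 64 bits of the trace ID as the pseudo-random part.
    val low = traceId.substring(16).toULong(16).toLong()
    val upperBound = (ratio * Long.MAX_VALUE).toLong()
    return abs(low) < upperBound
}

fun main() {
    val traceId = "79a889b557cf0f57f4c823f774f6b390"
    println("sampled at 1.0: ${isSampled(traceId, 1.0)}")
    println("sampled at 0.1: ${isSampled(traceId, 0.1)}")
}
```

Because the decision depends only on the trace ID, lowering the ratio drops whole traces rather than individual spans.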

Local-Only Tracing

The configuration is set up for local tracing only (no external exporters):

otel:
  exporter:
    otlp:
      enabled: false  # No OTLP export
    jaeger:
      enabled: false  # No Jaeger export
    zipkin:
      enabled: false  # No Zipkin export

This means traces are generated for logging correlation but not exported to external systems by default.

Micronaut Tracing Configuration

micronaut:
  tracing:
    opentelemetry:
      enabled: ${MICRONAUT_TRACING_ENABLED:true}
      http:
        server:
          enabled: ${HTTP_SERVER_TRACING_ENABLED:true}
        client:
          enabled: ${HTTP_CLIENT_TRACING_ENABLED:true}
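The `${NAME:default}` placeholders above fall back to the value after the colon when the environment variable is unset. The sketch below imitates that behaviour for illustration only; `resolve` is a hypothetical function, not Micronaut's property resolver, and it handles only upper-case variable names.

```kotlin
// Matches ${NAME} or ${NAME:default}; group 1 is the name, group 2 the optional default.
val placeholder = Regex("""\$\{([A-Z0-9_]+)(?::([^}]*))?\}""")

// Replaces each placeholder with the environment value, else the default, else fails.
fun resolve(template: String, env: Map<String, String>): String =
    placeholder.replace(template) { match ->
        val name = match.groupValues[1]
        val default = match.groups[2]?.value
        env[name] ?: default ?: error("No value for $name and no default")
    }

fun main() {
    val template = "\${HTTP_SERVER_TRACING_ENABLED:true}"
    println(resolve(template, emptyMap()))                                        // default applies
    println(resolve(template, mapOf("HTTP_SERVER_TRACING_ENABLED" to "false")))   // env overrides
}
```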

Benefits

1. Request Correlation

Every log message from a request contains the same traceId, making it trivial to:

  • Follow a request end-to-end within the server (cross-service propagation is listed under Future Extensions)
  • Debug issues by filtering logs by trace ID
  • Understand request flow and timing

2. Observability Ready

The implementation is production-ready and can easily be extended to:

  • Export traces to Jaeger, Zipkin, or OTLP endpoints
  • Add custom spans for business operations
  • Integrate with APM tools

3. Performance Monitoring

Automatic timing and error tracking for:

  • HTTP request duration
  • Error rates and status codes
  • Exception capture and context

4. Thread-Safe MDC Management

Handles Micronaut's async request processing correctly:

  • MDC is populated when spans start
  • MDC is cleaned up in reactive stream doFinally to prevent leaks
  • Works across thread pool changes during request processing
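The reason thread hops matter is that MDC is thread-local: a value set on one thread is invisible on another unless it is carried over explicitly. The sketch below demonstrates the problem and one capture-and-restore approach using only the standard library. `mdc` and `withCapturedContext` are hypothetical stand-ins; the filter in this PR uses Reactor's doFinally rather than this wrapper.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors

// Hypothetical stand-in for SLF4J's MDC (thread-local logging context).
val mdc: ThreadLocal<Map<String, String>> = ThreadLocal.withInitial { emptyMap() }

// Captures the caller's context and installs it on whichever thread runs the task,
// then restores that thread's previous context so nothing leaks into the next task.
fun <T> withCapturedContext(task: () -> T): () -> T {
    val captured = mdc.get()
    return {
        val previous = mdc.get()
        mdc.set(captured)
        try {
            task()
        } finally {
            mdc.set(previous)
        }
    }
}

fun main() {
    val pool = Executors.newSingleThreadExecutor()
    try {
        mdc.set(mapOf("traceId" to "79a889b557cf0f57f4c823f774f6b390"))
        val plain = pool.submit(Callable { mdc.get()["traceId"] }).get()
        val carried = withCapturedContext { mdc.get()["traceId"] }
        val wrapped = pool.submit(Callable { carried() }).get()
        println("without propagation: $plain")
        println("with propagation: $wrapped")
    } finally {
        mdc.remove()
        pool.shutdown()
    }
}
```

The `finally` block plays the same role as the doFinally cleanup described above: the worker thread is reset whether the task succeeds or throws.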

Usage Examples

View Traces in Logs

# Make a request
curl http://localhost:8080/api/v1/health

# All logs from that request will have the same traceId
# Filter logs by trace ID:
grep "79a889b557cf0f57f4c823f774f6b390" application.log

Sample Log Output

{"timestamp":1749588203250,"message":"HTTP span started for GET /api/v1/health","level":"DEBUG","traceId":"79a889b557cf0f57f4c823f774f6b390","spanId":"95e1fde57a13a036"}
{"timestamp":1749588203251,"message":"Processing health check","level":"INFO","traceId":"79a889b557cf0f57f4c823f774f6b390","spanId":"95e1fde57a13a036"}
{"timestamp":1749588203252,"message":"HTTP span completed for GET /api/v1/health with status 200","level":"DEBUG","traceId":"79a889b557cf0f57f4c823f774f6b390","spanId":"95e1fde57a13a036"}

Add Custom Spans (Optional)

For business logic that needs custom spans:

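import io.opentelemetry.api.trace.Tracer
import jakarta.inject.Inject

// Micronaut injects via jakarta.inject; @Autowired is a Spring annotation
@Inject
lateinit var tracer: Tracer

fun someBusinessMethod() {
    val span = tracer.spanBuilder("business.operation").startSpan()
    try {
        span.makeCurrent().use {
            // Business logic here - logs keep the request's traceId
            logger.info("Processing business operation")
            // ... business logic
        }
    } catch (e: Exception) {
        span.recordException(e)  // Attach the failure to the span, then rethrow
        throw e
    } finally {
        span.end()  // Always end the span, even if the business logic throws
    }
}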

Implementation Details

Why This Approach?

  1. Manual MDC Population: We manually populate SLF4J MDC in the HTTP filter because Micronaut's automatic MDC integration wasn't working reliably across thread boundaries in reactive contexts.

  2. HTTP Filter + Micronaut Integration: We use both:

    • Micronaut's OpenTelemetry integration for service-level span creation and configuration
    • Custom HTTP filter for guaranteed MDC population and lifecycle management

  3. Reactive-Safe Design: The filter uses Reactor's doFinally for cleanup, ensuring MDC is cleared even if requests are cancelled or exceptions occur.

  4. Local-Only Default: External trace export is disabled by default to avoid operational overhead while still providing local correlation benefits.

File Structure

airbyte-server/src/main/kotlin/io/airbyte/server/config/
├── MicronautHttpTracingFilter.kt          # HTTP filter for automatic span creation + MDC
├── ObservationRegistryBeanFactory.kt      # OpenTelemetry SDK configuration
└── HealthApiController.kt                 # Reverted to original (gets automatic tracing)

airbyte-server/src/main/resources/
└── application.yml                        # OpenTelemetry + Micronaut tracing config

Span Attributes

Each HTTP request span includes:

span.setAttribute("http.method", method)           // GET, POST, etc.
span.setAttribute("http.url", request.uri.toString())
span.setAttribute("http.scheme", request.uri.scheme ?: "http")
span.setAttribute("http.host", request.uri.host ?: "unknown")
span.setAttribute("http.target", path)             // /api/v1/health
span.setAttribute("http.status_code", statusCode)  // 200, 404, 500, etc.

Error Handling

  • Exceptions are recorded on spans with span.recordException(e)
  • Error status codes (≥400) mark spans as errors
  • Span status is set appropriately (ERROR for failures, OK for success)
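The rules in this list reduce to a small decision function. The sketch below restates them with hypothetical names (`SpanOutcome`, `classify`); it describes the behaviour listed above rather than reproducing the filter's code.

```kotlin
enum class SpanOutcome { OK, ERROR }

// An exception, or a status code of 400 or above, marks the span as an error;
// everything else is reported as OK.
fun classify(statusCode: Int?, error: Throwable?): SpanOutcome =
    if (error != null || (statusCode != null && statusCode >= 400)) SpanOutcome.ERROR else SpanOutcome.OK

fun main() {
    println(classify(200, null))                          // OK
    println(classify(404, null))                          // ERROR
    println(classify(null, IllegalStateException("x")))   // ERROR
}
```

One design point worth weighing: the OpenTelemetry HTTP semantic conventions leave the span status unset for 4xx responses on server spans and reserve the error status for 5xx, so treating every status of 400 or above as an error is stricter than the convention.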

Monitoring and Debugging

Verify Tracing is Working

# 1. Check application startup logs for tracing initialization
grep -i "tracing\|opentelemetry" application.log

# 2. Make a request and verify trace IDs appear
curl http://localhost:8080/api/v1/health
grep "traceId" application.log | tail -10

# 3. Verify span lifecycle (start/end)
grep "HTTP span" application.log

Troubleshooting

No trace IDs in logs:

  • Check OTEL_TRACING_ENABLED=true
  • Check MICRONAUT_TRACING_ENABLED=true
  • Verify MicronautHttpTracingFilter is loaded

Inconsistent trace IDs:

  • Check that MDC cleanup is working in doFinally
  • Verify no other code is modifying MDC

Performance impact:

  • Adjust sampling ratio: OTEL_SAMPLER_RATIO=0.1 (10% sampling)
  • Disable specific tracing: HTTP_SERVER_TRACING_ENABLED=false

Future Extensions

This foundation enables easy addition of:

External Trace Export

# Enable Jaeger export
otel:
  exporter:
    jaeger:
      enabled: true
      endpoint: http://jaeger:14250

Custom Business Spans

  • Database query tracing
  • External API call tracing
  • Business operation timing

Distributed Tracing

  • Cross-service trace propagation
  • Temporal workflow tracing
  • Connector operation tracing

APM Integration

  • Datadog APM
  • New Relic
  • Elastic APM

Summary

Result: Every HTTP request to Airbyte server now automatically includes distributed tracing with complete log correlation, requiring zero changes to existing business logic!

Key Benefits:

  • 🔍 Instant request correlation across all log messages
  • 📊 Automatic performance monitoring for all HTTP endpoints
  • 🚀 Production-ready observability foundation
  • 🔧 Zero maintenance overhead - works automatically
  • 📈 Easily extensible for advanced monitoring needs

The implementation provides a solid foundation for observability while maintaining simplicity and requiring no changes to existing code.

Requires the following environment variable so that logs are emitted as JSON:

        - name: PLATFORM_LOG_FORMAT
          value: "json"


MDC.put("span_id", spanId) // OpenTelemetry standard format

// Also set camelCase for Airbyte compatibility
MDC.put("traceId", traceId) // Airbyte format

There is no Airbyte format!? Keys should follow MDC standard which AFAIK is snake case.

@bernielomax bernielomax force-pushed the bernielomax/feat/otel branch from 7df922d to 21c26d2 Compare June 13, 2025 22:05
@bernielomax bernielomax deleted the bernielomax/feat/otel branch June 18, 2025 16:33