@harshrai654

PR Description

This PR adds a forward_to argument to the stage.metrics block in loki.process, enabling metrics generated from log processing to be forwarded directly to remote storage components (like prometheus.remote_write) instead of being exposed at Alloy's local /metrics endpoint.
This feature is useful when you want to derive metrics from logs and send them directly to a Prometheus backend without cluttering the local metrics endpoint.

Key Changes

New Arguments:

  • forward_to (list(MetricsReceiver)) - List of receivers to forward generated metrics to
  • metrics_flush_interval (duration) - Configurable interval for flushing metrics to the receivers (default: 60s)

Behaviour:

  • When forward_to is provided, metrics are not registered with Alloy's Prometheus registry
  • Metrics are periodically flushed to the specified receivers based on metrics_flush_interval
  • When forward_to is not set, the existing behaviour is preserved (metrics exposed at /metrics endpoint)

Which issue(s) this PR fixes

Fixes #4779

Notes to the Reviewer

Implementation Details

  1. Prometheus Fanout Integration:
    The forward_to argument, whose receivers are of type storage.Appendable, uses prometheus.Fanout to send generated metrics to multiple receivers. The fanout requires a LabelStore, which caches the storage reference returned by append calls so that labels are not rehashed every time a metric with the same label set is appended. The LabelStore is passed down from the loki.process component to the pipeline and then to the metrics stage.

  2. Pipeline Signature Updates:
    Any stage that creates a new pipeline internally (e.g., stage.match) also needed the LabelStore instance. These stages have been modified to accept the LabelStore argument, which required updating various tests to follow the new NewPipeline method signature.

  3. Inspiration from spanmetrics:
    The metrics_flush_interval argument follows the pattern used by otelcol.connector.spanmetrics to configure the interval for flushing metric data.

  4. Conditional Registry Registration:
    Reuses the existing collector structure. Metrics are now registered with the global registry only when forward_to is not set.

  5. Metric Flushing (flushMetrics):

    • Writes metric state from collectors to DTO objects
    • Converts *dto.LabelPair to labels.Label slices
    • Manually adds the __name__ label with the metric name
    • For histograms, appends _bucket, _sum, and _count metric suffixes appropriately

  6. Metadata Limitation:
    The flushMetrics function calls UpdateMetadata on the appender, but this currently does not work as expected. The remote write protocol does not support updating metadata for appended metrics. This is a known limitation tracked in #547. Please let me know if there is a way around this so that the generated metrics' metadata can also be sent to the remote endpoint.

  7. Data Race Prevention:
    Modified the metric collectors (Counters, Gauges, Histograms) to use atomic.Int64 for the lastModSec field. This prevents data races between updating metrics in the Process method and the Collect call in the separate flush metrics goroutine.

Testing

In addition to the unit tests, I manually tested the change with the following configuration against local Loki and Prometheus instances:

logging {
  level = "debug"
  format = "logfmt"
}

loki.source.file "tmpfiles" {
  targets    = [
    {__path__ = "/path/to/test-service.log"},
  ]
  forward_to = [loki.process.test_service.receiver]
  tail_from_end = true
}

loki.process "test_service" {
  forward_to = [loki.write.test_service_logs.receiver]

  stage.regex {
    expression = "level=(?P<level>\\S+).*status=(?P<status>\\S+).*latency_ms=(?P<latency>\\S+)"
  }

  stage.metrics {
    metric.counter {
      name = "test_service_requests_total"
      match_all = true
      description = "Total number of requests"
      action = "inc"
    }

    metric.histogram {
      name = "test_service_request_latency_ms"
      description = "Request latency in milliseconds"
      source = "latency"
      buckets = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
    }

    metric.gauge {
      name = "test_service_request_status_gauge"
      description = "Request status gauge"
      source = "status"
      action = "set"
    }

    forward_to = [prometheus.remote_write.test_service_metrics.receiver]
    metrics_flush_interval = "5s"
  }
}

loki.write "test_service_logs" {
  endpoint {
    url = "http://localhost:3100/loki/api/v1/push"
  }
}

prometheus.remote_write "test_service_metrics" {
  endpoint {
    url = "http://localhost:9090/api/v1/write"
  }
}
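
To see which values the stage.regex expression in this configuration makes available to the metrics stage, the same pattern can be exercised with Go's regexp package. This is only a sketch: the `extract` helper and the sample log line are hypothetical, since the contents of test-service.log are not shown in this PR.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same expression as the stage.regex block, with the Alloy string escaping removed.
var lineRE = regexp.MustCompile(`level=(?P<level>\S+).*status=(?P<status>\S+).*latency_ms=(?P<latency>\S+)`)

// extract returns the named capture groups for a log line, or nil if the line does not match.
func extract(line string) map[string]string {
	m := lineRE.FindStringSubmatch(line)
	if m == nil {
		return nil
	}
	out := make(map[string]string)
	for i, name := range lineRE.SubexpNames() {
		if name != "" {
			out[name] = m[i]
		}
	}
	return out
}

func main() {
	// Hypothetical log line in logfmt style.
	line := `ts=2025-12-07T18:00:00Z level=info status=200 latency_ms=137`
	fmt.Println(extract(line))
}
```

In the configuration above, the histogram reads the extracted `latency` value and the gauge reads `status`, while the counter uses match_all and therefore increments for every line.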

PR Checklist

  • CHANGELOG.md updated
  • Documentation added
  • Tests updated

@harshrai654 requested review from a team and clayton-cornell as code owners on December 7, 2025, 18:31
Development

Successfully merging this pull request may close these issues.

Feature Request: Add forward_to support for stage.metrics to enable direct push to prometheus.remote_write