This repository was archived by the owner on Jul 7, 2025. It is now read-only.

Trace-Log-Metric correlation proposal #320

@zzhutianyu

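Correlating metrics with traces

The exemplar mechanism

Prometheus

Prometheus mainly relies on exemplars to attach extra information to metrics. Exemplars can be exposed through the same metrics endpoint as the samples themselves (in the OpenMetrics exposition format):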
https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#exemplars-1

# Everything after the "#" on a sample line is the exemplar
# Format: {labels} sampled-value [sample-timestamp]
foo_bucket{le="0.1"} 8 # {} 0.054
foo_bucket{le="1"} 11 # {trace_id="KOO5S4vxi0o"} 0.67
foo_bucket{le="10"} 17 # {trace_id="oHg5SJYRHA0"} 9.8 1520879607.789
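
To make the layout of the exemplar suffix concrete, here is a minimal Go sketch that splits one exposition line into exemplar labels, value, and optional timestamp. The `Exemplar` type and `parseExemplar` function are hypothetical names introduced for illustration. This is not the Prometheus parser: it assumes label values contain no quotes, commas, or braces.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Exemplar holds the parts of an OpenMetrics exemplar suffix (illustrative type).
type Exemplar struct {
	Labels    map[string]string
	Value     float64
	Timestamp float64 // seconds since the epoch; only meaningful when HasTS is true
	HasTS     bool
}

// parseExemplar extracts the exemplar that follows " # " on one sample line.
// ok is false when the line carries no exemplar.
// Simplification: label values must not contain '"', ',', '}' or " # ".
func parseExemplar(line string) (ex Exemplar, ok bool, err error) {
	idx := strings.Index(line, " # ")
	if idx < 0 {
		return Exemplar{}, false, nil
	}
	rest := strings.TrimSpace(line[idx+3:])
	if !strings.HasPrefix(rest, "{") {
		return Exemplar{}, false, fmt.Errorf("exemplar must start with '{': %q", rest)
	}
	end := strings.Index(rest, "}")
	if end < 0 {
		return Exemplar{}, false, fmt.Errorf("unterminated label set: %q", rest)
	}
	ex.Labels = map[string]string{}
	if body := rest[1:end]; body != "" {
		for _, pair := range strings.Split(body, ",") {
			k, v, found := strings.Cut(pair, "=")
			if !found {
				return Exemplar{}, false, fmt.Errorf("bad label pair: %q", pair)
			}
			ex.Labels[strings.TrimSpace(k)] = strings.Trim(strings.TrimSpace(v), `"`)
		}
	}
	// After the label set: the sampled value, then an optional timestamp.
	fields := strings.Fields(rest[end+1:])
	if len(fields) < 1 || len(fields) > 2 {
		return Exemplar{}, false, fmt.Errorf("expected value and optional timestamp: %q", rest)
	}
	if ex.Value, err = strconv.ParseFloat(fields[0], 64); err != nil {
		return Exemplar{}, false, err
	}
	if len(fields) == 2 {
		if ex.Timestamp, err = strconv.ParseFloat(fields[1], 64); err != nil {
			return Exemplar{}, false, err
		}
		ex.HasTS = true
	}
	return ex, true, nil
}

func main() {
	ex, ok, err := parseExemplar(`foo_bucket{le="10"} 17 # {trace_id="oHg5SJYRHA0"} 9.8 1520879607.789`)
	fmt.Println(ex.Labels["trace_id"], ex.Value, ex.HasTS, ok, err)
}
```

A line without the " # " separator simply yields `ok == false`, so the same function can be run over every sample line of a scrape.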

How to inject

c := GetPlayURLTotal.WithLabelValues(
    strconv.FormatInt(int64(callerType), 10),
    strconv.FormatInt(int64(device.GetOs()), 10),
    strconv.FormatInt(int64(device.GetNetwork()), 10),
    videoFormat,
)
// Read the span context of the current request from ctx.
sp := trace.SpanFromContext(ctx).SpanContext()
if sp.IsSampled() { // more conditions can be added here to make the exemplar samples more representative
    // For a histogram, assert to prometheus.ExemplarObserver instead.
    c.(prometheus.ExemplarAdder).AddWithExemplar(1, prometheus.Labels{
        "traceID": sp.TraceID().String(),
    })
} else {
    // Unsampled spans are counted without an exemplar.
    c.Inc()
}

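OTLP

The OTLP protocol defines an Exemplar field, so a sampled span can be associated with a metric when the metric is reported. The OTLP SDK injects it automatically: trace, log, and metric share the same OTLP context, so no manual association is needed.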

// A representation of an exemplar, which is a sample input measurement.
// Exemplars also hold information about the environment when the measurement
// was recorded, for example the span and trace ID of the active span when the
// exemplar was recorded.
message Exemplar {
  // The set of key/value pairs that were filtered out by the aggregator, but
  // recorded alongside the original measurement. Only key/value pairs that were
  // filtered out by the aggregator should be included
  repeated opentelemetry.proto.common.v1.KeyValue filtered_attributes = 7;

  // Labels is deprecated and will be removed soon.
  // 1. Old senders and receivers that are not aware of this change will
  // continue using the `filtered_labels` field.
  // 2. New senders, which are aware of this change MUST send only
  // `filtered_attributes`.
  // 3. New receivers, which are aware of this change MUST convert this into
  // `filtered_labels` by simply converting all int64 values into float.
  //
  // This field will be removed in ~3 months, on July 1, 2021.
  repeated opentelemetry.proto.common.v1.StringKeyValue filtered_labels = 1 [deprecated = true];

  // time_unix_nano is the exact time when this exemplar was recorded
  //
  // Value is UNIX Epoch time in nanoseconds since 00:00:00 UTC on 1 January
  // 1970.
  fixed64 time_unix_nano = 2;

  // The value of the measurement that was recorded. An exemplar is
  // considered invalid when one of the recognized value fields is not present
  // inside this oneof.
  oneof value {
    double as_double = 3;
    sfixed64 as_int = 6;
  }

  // (Optional) Span ID of the exemplar trace.
  // span_id may be missing if the measurement is not recorded inside a trace
  // or if the trace is not sampled.
  bytes span_id = 4;

  // (Optional) Trace ID of the exemplar trace.
  // trace_id may be missing if the measurement is not recorded inside a trace
  // or if the trace is not sampled.
  bytes trace_id = 5;
}

Prometheus storage approach (tjg uses this approach)

https://github.com/prometheus/prometheus/pull/6635/files
Prometheus stores exemplars in a circular buffer backed by contiguous memory and provides a matching query API.
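
The circular-buffer idea can be sketched in a few lines of Go. This is a simplified illustration, not the code from the pull request above: the `Exemplar`, `Ring`, `NewRing`, `Add`, and `Select` names are hypothetical, and the real implementation additionally indexes exemplars per series.

```go
package main

import "fmt"

// Exemplar is a simplified stand-in for a stored exemplar (illustrative type).
type Exemplar struct {
	TraceID string
	Value   float64
	TS      int64 // milliseconds since the epoch
}

// Ring is a fixed-capacity circular buffer over one contiguous slice.
// Once it is full, each Add overwrites the oldest entry.
type Ring struct {
	buf   []Exemplar
	next  int // index the next Add writes to
	count int // number of valid entries, at most len(buf)
}

// NewRing allocates the backing slice once; it never grows afterwards.
func NewRing(capacity int) *Ring {
	if capacity <= 0 {
		capacity = 1
	}
	return &Ring{buf: make([]Exemplar, capacity)}
}

// Add stores e, evicting the oldest exemplar when the buffer is full.
func (r *Ring) Add(e Exemplar) {
	r.buf[r.next] = e
	r.next = (r.next + 1) % len(r.buf)
	if r.count < len(r.buf) {
		r.count++
	}
}

// Select returns the stored exemplars with start <= TS <= end, oldest first.
func (r *Ring) Select(start, end int64) []Exemplar {
	var out []Exemplar
	oldest := (r.next - r.count + len(r.buf)) % len(r.buf)
	for i := 0; i < r.count; i++ {
		e := r.buf[(oldest+i)%len(r.buf)]
		if e.TS >= start && e.TS <= end {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	r := NewRing(2)
	r.Add(Exemplar{TraceID: "a", Value: 1, TS: 10})
	r.Add(Exemplar{TraceID: "b", Value: 2, TS: 20})
	r.Add(Exemplar{TraceID: "c", Value: 3, TS: 30}) // overwrites "a"
	fmt.Println(r.Select(0, 100))
}
```

The fixed capacity bounds memory use regardless of how many exemplars are ingested; the trade-off is that older exemplars are silently dropped.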

$ curl -g 'http://localhost:9090/api/v1/query_exemplars?query=test_exemplar_metric_total&start=2020-09-14T15:22:25.479Z&end=2020-09-14T15:23:25.479Z'
{
    "status": "success",
    "data": [
        {
            "seriesLabels": {
                "__name__": "test_exemplar_metric_total",
                "instance": "localhost:8090",
                "job": "prometheus",
                "service": "bar"
            },
            "exemplars": [
                {
                    "labels": {
                        "traceID": "EpTxMJ40fUus7aGY"
                    },
                    "value": "6",
                    "timestamp": 1600096945.479
                }
            ]
        },
        {
            "seriesLabels": {
                "__name__": "test_exemplar_metric_total",
                "instance": "localhost:8090",
                "job": "prometheus",
                "service": "foo"
            },
            "exemplars": [
                {
                    "labels": {
                        "traceID": "Olp9XHlq763ccsfa"
                    },
                    "value": "19",
                    "timestamp": 1600096955.479
                },
                {
                    "labels": {
                        "traceID": "hCtjygkIHwAN9vs4"
                    },
                    "value": "20",
                    "timestamp": 1600096965.489
                }
            ]
        }
    ]
}
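
A client that jumps from a metric to its traces could consume this API roughly as follows. The Go sketch below only builds the request URL and decodes a response body; the HTTP call itself is left out. The type and function names (`exemplarResponse`, `exemplarQueryURL`, `traceIDs`) are hypothetical, and the `traceID` label key is taken from the sample response. Note that `encoding/json` only accepts strict JSON, so a body with trailing commas would be rejected.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// exemplarSeries mirrors one element of "data" in the query_exemplars response.
type exemplarSeries struct {
	SeriesLabels map[string]string `json:"seriesLabels"`
	Exemplars    []struct {
		Labels    map[string]string `json:"labels"`
		Value     string            `json:"value"`
		Timestamp float64           `json:"timestamp"`
	} `json:"exemplars"`
}

// exemplarResponse mirrors the top-level response envelope.
type exemplarResponse struct {
	Status string           `json:"status"`
	Data   []exemplarSeries `json:"data"`
}

// exemplarQueryURL builds the request URL with properly escaped parameters.
func exemplarQueryURL(base, query, start, end string) string {
	v := url.Values{}
	v.Set("query", query)
	v.Set("start", start)
	v.Set("end", end)
	return base + "/api/v1/query_exemplars?" + v.Encode()
}

// traceIDs decodes a response body and collects every "traceID" exemplar label.
func traceIDs(body []byte) ([]string, error) {
	var resp exemplarResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if resp.Status != "success" {
		return nil, fmt.Errorf("query failed with status %q", resp.Status)
	}
	var ids []string
	for _, s := range resp.Data {
		for _, e := range s.Exemplars {
			if id, ok := e.Labels["traceID"]; ok {
				ids = append(ids, id)
			}
		}
	}
	return ids, nil
}

func main() {
	fmt.Println(exemplarQueryURL("http://localhost:9090", "test_exemplar_metric_total",
		"2020-09-14T15:22:25.479Z", "2020-09-14T15:23:25.479Z"))
	body := []byte(`{"status":"success","data":[{"seriesLabels":{"service":"bar"},` +
		`"exemplars":[{"labels":{"traceID":"EpTxMJ40fUus7aGY"},"value":"6","timestamp":1600096945.479}]}]}`)
	fmt.Println(traceIDs(body))
}
```

The returned trace IDs are what a UI would use to link a point on a metric chart to the matching trace.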


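Correlating logs with traces

Correlating logs with traces is relatively simple: as long as the TraceId and SpanId of the current trace are obtained when a log line is written, each individual log entry can be linked to its trace.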

Log
timestamp= TraceId=xxxx SpanId=xxxxx
JSON
{"trace_id": "xxx", "span_id": "xxx", "log": "xxxx"}

Finally, the logs are cleaned and ingested with trace_id and span_id tagged on each record, which is enough to link the two.
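
Both sides of this flow can be sketched in Go: writing a JSON log record that carries the IDs, and pulling the IDs back out of a plain-text line during cleaning. The names `logRecord`, `formatLog`, and `extractIDs` are hypothetical. In a real service the IDs would presumably come from the span context of the current request; here they are passed in as plain strings so the sketch needs only the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// logRecord matches the JSON log shape used in this proposal.
type logRecord struct {
	TraceID string `json:"trace_id"`
	SpanID  string `json:"span_id"`
	Log     string `json:"log"`
}

// formatLog renders one JSON log line tagged with the trace and span IDs.
func formatLog(traceID, spanID, msg string) (string, error) {
	b, err := json.Marshal(logRecord{TraceID: traceID, SpanID: spanID, Log: msg})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// extractIDs reads TraceId=... and SpanId=... from a space-separated text log line.
// A missing key yields an empty string.
func extractIDs(line string) (traceID, spanID string) {
	for _, field := range strings.Fields(line) {
		key, value, ok := strings.Cut(field, "=")
		if !ok {
			continue
		}
		switch key {
		case "TraceId":
			traceID = value
		case "SpanId":
			spanID = value
		}
	}
	return traceID, spanID
}

func main() {
	line, err := formatLog("abc", "def", "hello")
	fmt.Println(line, err)
	fmt.Println(extractIDs("timestamp= TraceId=abc SpanId=def"))
}
```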

With the OTLP SDK, this correlation can eventually happen by default, because the signals share the same Context.

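Storing exemplars in the monitoring platform

InfluxDB does not currently support storing exemplars, so within the existing storage architecture the monitoring platform can store exemplars in ES, which also avoids the high-cardinality problem.
The required changes are:

  • The Prometheus data parsers need to parse exemplars and report them
  • transfer needs to support writing exemplar data to ES
  • SaaS needs to support querying exemplar data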
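
As an illustration of the second step, the Go sketch below turns one exemplar into a JSON document that could be indexed in ES. The document shape, the field names, and the `exemplarDoc` and `toESDoc` identifiers are all assumptions made for this example, not an agreed schema. The point it shows is that the trace ID is stored as an ordinary document field rather than as a series tag, so it does not multiply the number of time series.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// exemplarDoc is a hypothetical ES document for one exemplar.
type exemplarDoc struct {
	Metric       string            `json:"metric"`
	SeriesLabels map[string]string `json:"series_labels"`
	TraceID      string            `json:"trace_id"`
	Value        float64           `json:"value"`
	TimestampMs  int64             `json:"timestamp_ms"`
}

// toESDoc serializes one exemplar together with the labels of its series.
func toESDoc(metric string, labels map[string]string, traceID string, value float64, tsMs int64) ([]byte, error) {
	return json.Marshal(exemplarDoc{
		Metric:       metric,
		SeriesLabels: labels,
		TraceID:      traceID,
		Value:        value,
		TimestampMs:  tsMs,
	})
}

func main() {
	doc, err := toESDoc(
		"test_exemplar_metric_total",
		map[string]string{"job": "prometheus", "service": "bar"},
		"EpTxMJ40fUus7aGY", 6, 1600096945479,
	)
	fmt.Println(string(doc), err)
}
```

Queries from the SaaS side would then filter on the metric name, series labels, and a time range to retrieve the matching trace IDs.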
