This example demonstrates tail sampling with the OpenTelemetry Collector using the Tail Sampling Processor. Traces are sampled according to the defined sampling policy and sent to New Relic via OTLP.
block-beta
columns 1
Upstream("telemetrygen")
space
block:LB
Loadbalancer_Collector
end
space
block:ID
Sampling_Collector_A
Sampling_Collector_B
Sampling_Collector_C
end
space
NR("New Relic")
Upstream --"Spans"--> LB
LB --> Sampling_Collector_A
LB --> Sampling_Collector_B
LB --> Sampling_Collector_C
Sampling_Collector_A --"Sampling"--> NR
Sampling_Collector_B --"Sampling"--> NR
Sampling_Collector_C --"Sampling"--> NR
style Upstream fill:#000,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5
style NR fill:#000,stroke:#03AC13,stroke-width:2px,color:#fff,stroke-dasharray: 5 5
- You need a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. Docker Desktop includes a standalone Kubernetes server and client, which is useful for local testing.
- A New Relic account
- A New Relic license key
- The telemetrygen tool
As a prerequisite, you'll need to install both cert-manager and the OpenTelemetry Operator for Kubernetes in your cluster.
Install cert-manager with the following command:
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.15.1/cert-manager.yaml
Install the OpenTelemetry Operator for K8s with the following command:
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
If successful, you should see all Pods in a Running state.
$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
cert-manager cert-manager-5798486f6b-6fm2b 1/1 Running 0 17m
cert-manager cert-manager-cainjector-7666685ff5-qbt8v 1/1 Running 0 17m
cert-manager cert-manager-webhook-5f594df789-w5dkl 1/1 Running 0 17m
kube-system coredns-5dd5756b68-dhm5f 1/1 Running 0 142m
kube-system etcd-otel-minikube 1/1 Running 0 143m
kube-system kube-apiserver-otel-minikube 1/1 Running 0 143m
kube-system kube-controller-manager-otel-minikube 1/1 Running 0 143m
kube-system kube-proxy-w4sfq 1/1 Running 0 142m
kube-system kube-scheduler-otel-minikube 1/1 Running 0 143m
kube-system storage-provisioner 1/1 Running 1 (142m ago) 143m
opentelemetry-operator-system opentelemetry-operator-controller-manager-54c987dd5b-5cqvh 2/2 Running 0 15m
In order for the OTel Collectors to authenticate successfully with New Relic's OTLP endpoint, you'll need to store your New Relic ingest license key in a Kubernetes Secret within the demo namespace. Use the following command to create both the namespace and the secret:
kubectl create ns demo && kubectl create secret generic newrelic-license-key --from-literal=licensekey=<YOUR NR LICENSE KEY> -n demo
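For reference, the collector configuration can read this license key and pass it to New Relic as the api-key header on the OTLP exporter. The snippet below is a minimal sketch of that pattern, assuming a NEW_RELIC_LICENSE_KEY environment variable is injected into the collector pod from the newrelic-license-key Secret (for example via the OpenTelemetryCollector resource's env settings); it is not the literal contents of the example manifests.
# Minimal sketch: export to New Relic's OTLP endpoint, authenticating with the
# license key injected from the newrelic-license-key Secret created above.
exporters:
  otlp:
    endpoint: https://otlp.nr-data.net:4317
    headers:
      api-key: ${env:NEW_RELIC_LICENSE_KEY}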
You'll be setting up a total of four OTel Collectors in this example. The loadbalancer collector instance will use the loadbalancing exporter to ship spans to one of three sampling collector instances based on trace.id (the default routing_key).
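For context, the loadbalancing exporter portion of lb.yaml might look roughly like the sketch below. The Kubernetes resolver and the headless service name shown here are illustrative assumptions; consult the actual lb.yaml in this example for the exact configuration.
# Illustrative sketch of a loadbalancing exporter that routes spans by trace ID
# so that all spans of a given trace land on the same sampling collector.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  loadbalancing:
    routing_key: traceID          # the default; shown here for clarity
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      k8s:
        service: otelcol-sampling-collector-headless.demo   # assumed service name
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [loadbalancing]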
Run the following commands to install all collector instances:
kubectl apply -f lb.yaml -n demo
kubectl apply -f sampling.yaml -n demo
You can validate that all are up and running with kubectl get pods -n demo:
$ kubectl get pods -n demo
NAME READY STATUS RESTARTS AGE
demo-app-57cf4bdc99-57x2n 2/2 Running 0 46h
otelcol-loadbalancer-collector-7fdc8fd969-vsfq6 1/1 Running 0 32h
otelcol-sampling-collector-6db8f8c5c7-9hz2d 1/1 Running 0 34m
otelcol-sampling-collector-6db8f8c5c7-pr7b2 1/1 Running 0 34m
otelcol-sampling-collector-6db8f8c5c7-shv69 1/1 Running 0 34m
You'll need to use port forwarding to open a port on your local machine to the OTel Collector running in your cluster. The following command opens port 4317 on your localhost, which is the port the telemetrygen tool uses by default.
kubectl port-forward svc/otelcol-loadbalancer-collector -n demo 4317 2>&1 >/dev/null &
To cancel the port-forward session, run the fg command to bring it to the foreground, then press ctrl + c.
The loadgen.sh shell script uses the telemetrygen tool to generate traces. Ensure that you have telemetrygen installed on your machine, then run the loadgen script:
./loadgen.sh
If it's working correctly, you should see output in your terminal that resembles the following:
...
2024-07-10T21:56:49.469-0500 INFO [email protected]/balancer_wrapper.go:103 [core][Channel #1]Channel switches to new LB policy "pick_first" {"system": "grpc", "grpc_log": true}
2024-07-10T21:56:49.469-0500 INFO [email protected]/balancer_wrapper.go:170 [core][Channel #1 SubChannel #2]Subchannel created {"system": "grpc", "grpc_log": true}
2024-07-10T21:56:49.469-0500 INFO [email protected]/clientconn.go:535 [core][Channel #1]Channel Connectivity change to CONNECTING {"system": "grpc", "grpc_log": true}
2024-07-10T21:56:49.469-0500 INFO [email protected]/clientconn.go:1210 [core][Channel #1 SubChannel #2]Subchannel Connectivity change to CONNECTING {"system": "grpc", "grpc_log": true}
2024-07-10T21:56:49.469-0500 INFO [email protected]/clientconn.go:1326 [core][Channel #1 SubChannel #2]Subchannel picks a new address "[::1]:4317" to connect {"system": "grpc", "grpc_log": true}
2024-07-10T21:56:49.481-0500 INFO [email protected]/clientconn.go:1210 [core][Channel #1 SubChannel #2]Subchannel Connectivity change to READY {"system": "grpc", "grpc_log": true}
2024-07-10T21:56:49.481-0500 INFO [email protected]/clientconn.go:535 [core][Channel #1]Channel Connectivity change to READY {"system": "grpc", "grpc_log": true}
2024-07-10T21:56:49.525-0500 INFO [email protected]/clientconn.go:535 [core][Channel #1]Channel Connectivity change to SHUTDOWN {"system": "grpc", "grpc_log": true}
2024-07-10T21:56:49.525-0500 INFO [email protected]/resolver_wrapper.go:100 [core][Channel #1]Closing the name resolver {"system": "grpc", "grpc_log": true}
2024-07-10T21:56:49.525-0500 INFO [email protected]/balancer_wrapper.go:135 [core][Channel #1]ccBalancerWrapper: closing {"system": "grpc", "grpc_log": true}
2024-07-10T21:56:49.525-0500 INFO [email protected]/clientconn.go:1210 [core][Channel #1 SubChannel #2]Subchannel Connectivity change to SHUTDOWN {"system": "grpc", "grpc_log": true}
2024-07-10T21:56:49.525-0500 INFO [email protected]/clientconn.go:1154 [core][Channel #1 SubChannel #2]Subchannel deleted {"system": "grpc", "grpc_log": true}
2024-07-10T21:56:49.525-0500 INFO [email protected]/clientconn.go:1156 [core][Channel #1]Channel deleted {"system": "grpc", "grpc_log": true}
2024-07-10T21:56:49.525-0500 INFO traces/traces.go:64 stopping the exporter
After a minute or so, you'll start to see spans populate in your New Relic account. The following NRQL query can be used to validate that your setup is working and that a smaller percentage of traces show up for the non-critical-service compared to the critical-service.
from Span select uniqueCount(trace.id) where recipe = 'newrelic-tail-sampling' and service.name in ('critical-service', 'non-critical-service') facet service.name TIMESERIES 1 minute since 15 minutes ago
Notice that the number of traces for the critical-service is much higher than for the non-critical-service. This is tail sampling via the OpenTelemetry Collector in action!
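For reference, this kind of per-service skew is what a tail_sampling processor policy on the sampling collectors produces. The sketch below shows one way such a policy could be written, keeping every trace from critical-service and only a fraction of traces from non-critical-service; the policy names, percentage, and decision_wait value are assumptions for illustration, not necessarily what sampling.yaml ships with.
processors:
  tail_sampling:
    decision_wait: 10s          # how long to wait for a trace's spans before deciding
    policies:
      # Keep every trace that comes from critical-service.
      - name: keep-critical-service
        type: and
        and:
          and_sub_policy:
            - name: match-critical-service
              type: string_attribute
              string_attribute:
                key: service.name
                values: [critical-service]
            - name: keep-all
              type: always_sample
      # Keep only a small percentage of traces from non-critical-service.
      - name: sample-non-critical-service
        type: and
        and:
          and_sub_policy:
            - name: match-non-critical-service
              type: string_attribute
              string_attribute:
                key: service.name
                values: [non-critical-service]
            - name: keep-a-fraction
              type: probabilistic
              probabilistic:
                sampling_percentage: 10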
See get started with querying for additional details on querying data in New Relic.
You can also validate that load balancing is working in your cluster by looking at the otelcol-sampling-collector-* pod logs. Note that multiple pods are all exporting traces. A tool like k9s can make this a breeze if you're not a kubectl jedi.
Congratulations! You've successfully completed the OpenTelemetry tail sampling example. Hopefully this helped illustrate how valuable tail sampling can be. Be sure to check out the documentation for more advanced tail sampling policy examples that you can enable in your own environment.