Commit 44bd082

rubenvp8510 authored and max-cx committed

OBSDOCS-1142: Add Tempo troubleshooting page

Signed-off-by: Ruben Vargas <[email protected]>

1 parent bef5ba4 commit 44bd082

5 files changed: +102 -0 lines changed

Diff for: _topic_maps/_topic_map.yml

+2
@@ -2929,6 +2929,8 @@ Topics:
     File: distr-tracing-tempo-installing
   - Name: Configuring
     File: distr-tracing-tempo-configuring
+  - Name: Troubleshooting
+    File: distr-tracing-tempo-troubleshooting
   - Name: Upgrading
     File: distr-tracing-tempo-updating
   - Name: Removing
Diff for: modules/distr-tracing-tempo-troubleshooting-issues-ingesting-traces.adoc

+36

@@ -0,0 +1,36 @@
// Module included in the following assemblies:
//
// * observability/distr_tracing/distr_tracing_tempo/distr-tracing-tempo-troubleshooting.adoc

:_mod-docs-content-type: PROCEDURE
[id="problems-ingesting-traces_{context}"]
= Troubleshooting issues ingesting traces

When the TempoStack instance fails to ingest traces, you can troubleshoot it as follows.

.Procedure

. Review the components in the ingestion path to check for the following issues:
+
* The spans are not sent correctly.
* The spans are not sampled.
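+
For example, if your applications send traces through an OpenTelemetry Collector, you can verify that the exporter points at the TempoStack distributor and that sampling is not dropping all spans. The following is only a minimal sketch, not a required configuration: the service name `tempo-<tempostack_name>-distributor`, the OTLP gRPC port `4317`, and the `probabilistic_sampler` settings are assumptions to adapt to your environment.
+
[source,yaml]
----
# Hypothetical OpenTelemetry Collector snippet for checking the ingestion path.
receivers:
  otlp:
    protocols:
      grpc: {}
processors:
  probabilistic_sampler:
    sampling_percentage: 100 # temporarily sample everything while troubleshooting
exporters:
  otlp:
    endpoint: tempo-<tempostack_name>-distributor:4317 # assumed distributor service and OTLP gRPC port
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [probabilistic_sampler]
      exporters: [otlp]
----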

. Check for errors in the logs that identify unhealthy ingesters, which might be failing due to out-of-memory (OOM) issues or scale-down events.
+
In high-volume tracing environments, the default trace limits might not be adequate.
If spans are being refused due to these limits, you can discover this in the distributor logs.
+
You can also see that the `tempo_discarded_spans_total` metric is incremented.
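+
For example, you can scan the ingester and distributor logs for errors and refused spans. This is a sketch that assumes the default workload names created by the Tempo Operator, `tempo-<tempostack_name>-ingester` and `tempo-<tempostack_name>-distributor`; substitute your own namespace and instance name:
+
[source,terminal]
----
$ oc logs -n <namespace> statefulset/tempo-<tempostack_name>-ingester | grep -iE "error|unhealthy|out of memory"
$ oc logs -n <namespace> deployment/tempo-<tempostack_name>-distributor | grep -iE "refused|discarded|limit"
----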

. Check the following two metrics to verify that the TempoStack instance is successfully receiving traces:
+
[source,yaml]
----
tempo_distributor_spans_received_total
tempo_ingester_traces_created_total
----
+
** If both of these metrics are greater than zero, the TempoStack instance is successfully ingesting data.
** If `tempo_distributor_spans_received_total` is `0`, the distributor is not receiving spans.
Troubleshoot by checking the network configuration, protocols, and ports of the applications that are attempting to send traces to the distributor.
** If `tempo_ingester_traces_created_total` is `0`, there is a communication issue between the ingester and the distributor. Troubleshoot by inspecting the logs of both components to find errors.
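
For example, you can read both counters from the component metrics endpoints by port-forwarding to the services. This is a sketch that assumes the default service names `tempo-<tempostack_name>-distributor` and `tempo-<tempostack_name>-ingester` and the Tempo HTTP port `3200`; adjust them for your environment:

[source,terminal]
----
$ oc port-forward -n <namespace> svc/tempo-<tempostack_name>-distributor 3200:3200 &
$ curl -s http://localhost:3200/metrics | grep tempo_distributor_spans_received_total
$ oc port-forward -n <namespace> svc/tempo-<tempostack_name>-ingester 3201:3200 &
$ curl -s http://localhost:3201/metrics | grep tempo_ingester_traces_created_total
----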
Diff for: modules/distr-tracing-tempo-troubleshooting-issues-with-quering-traces.adoc

+25

@@ -0,0 +1,25 @@
// Module included in the following assemblies:
//
// * observability/distr_tracing/distr_tracing_tempo/distr-tracing-tempo-troubleshooting.adoc

:_mod-docs-content-type: PROCEDURE
[id="problems-quering-traces_{context}"]
= Troubleshooting issues with querying traces

When the issues remain even though you have verified that the TempoStack instance is successfully ingesting data, troubleshoot for issues with querying the data.

.Procedure

. Check the logs of the `query-frontend` pod, which runs with two containers: `query-frontend` and `querier`. Logs containing errors similar to the following indicate an error in the query path:
+
[source,terminal]
----
level=info ts=<...> caller=frontend.go:63 method=GET traceID=<...> url=/api/traces/<...> duration=<...> status=500
could not dial 10.X.X.X:3200 connection refused
----
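+
For example, you can view the logs of each container separately. The container names come from the step above; the deployment name `tempo-<tempostack_name>-query-frontend` is an assumption based on the default naming, so substitute your own:
+
[source,terminal]
----
$ oc logs -n <namespace> deployment/tempo-<tempostack_name>-query-frontend -c query-frontend
$ oc logs -n <namespace> deployment/tempo-<tempostack_name>-query-frontend -c querier
----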

. Review the logs of the `tempo-querier` and `tempo-queryfrontend` services to determine whether either has crashed and is therefore unable to communicate with the other.

. Check your network settings and policies to ensure proper communication between them.
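+
For example, you can confirm that both services resolve to healthy endpoints and that no network policy blocks traffic between them. The `tempo` filter below assumes the default `tempo-<tempostack_name>-*` naming:
+
[source,terminal]
----
$ oc get svc,endpoints -n <namespace> | grep tempo
$ oc get networkpolicies -n <namespace>
----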
24+
25+
. Review the `cortex_query_frontend_connected_clients` metric, which is exposed by the `query-frontend`. If `cortex_query_frontend_connected_clients` is greater than `0`, it is an indication than the `queriers` are connected to the `query-frontend`.
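+
For example, you can read the metric from the `query-frontend` metrics endpoint. This sketch assumes the default service name `tempo-<tempostack_name>-query-frontend` and the Tempo HTTP port `3200`:
+
[source,terminal]
----
$ oc port-forward -n <namespace> svc/tempo-<tempostack_name>-query-frontend 3200:3200 &
$ curl -s http://localhost:3200/metrics | grep cortex_query_frontend_connected_clients
----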
Diff for: modules/distr-tracing-tempo-getting-the-tempostack-instance-logs.adoc

+17

@@ -0,0 +1,17 @@
// Module included in the following assemblies:
//
// * observability/distr_tracing/distr_tracing_tempo/distr-tracing-tempo-troubleshooting.adoc

:_mod-docs-content-type: PROCEDURE
[id="getting-tempo-stack-logs_{context}"]
= Getting the TempoStack instance logs

You can get the logs from a TempoStack component as follows.

.Procedure

. Locate the component that you want to get the logs for. You can do this by listing all the deployments in a specific namespace.

. Identify the pods that belong to this component.

. Watch the logs by using the `oc logs` command, or retrieve the logs by using the web console.
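
For example, a minimal sequence might look like the following, assuming that the component workloads follow the default `tempo-<tempostack_name>-<component>` naming pattern:

[source,terminal]
----
$ oc get deployments,statefulsets -n <namespace>    # locate the component
$ oc get pods -n <namespace> | grep <component>     # identify the pods that belong to it
$ oc logs -n <namespace> <pod_name> -f              # watch the logs of one pod
----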
Diff for: observability/distr_tracing/distr_tracing_tempo/distr-tracing-tempo-troubleshooting.adoc

+22

@@ -0,0 +1,22 @@
:_mod-docs-content-type: ASSEMBLY
[id="dist-tracing-tempo-troubleshooting"]
= Troubleshooting
include::_attributes/common-attributes.adoc[]
:context: dist-tracing-tempo-troubleshooting

toc::[]

You can diagnose and fix TempoStack instance issues by using various troubleshooting methods.

You can inspect the logs of the different components along with some useful metrics.

If the TempoStack instance has no traces, there are two possible causes:

* The TempoStack instance is not ingesting traces.
* There is an issue with the query path.

include::modules/distr-tracing-tempo-getting-the-tempostack-instance-logs.adoc[leveloffset=+1]

include::modules/distr-tracing-tempo-troubleshooting-issues-ingesting-traces.adoc[leveloffset=+1]

include::modules/distr-tracing-tempo-troubleshooting-issues-with-quering-traces.adoc[leveloffset=+1]
