OBSDOCS-1142: Add Tempo troubleshooting page #81524

Open
wants to merge 1 commit into
base: main
from

Conversation

max-cx
Contributor

@max-cx max-cx commented Sep 10, 2024

Version(s):

4.12, 4.13, 4.14, 4.15, 4.16, 4.17

Issue:

https://issues.redhat.com/browse/OBSDOCS-1142

Link to docs preview:

https://81524--ocpdocs-pr.netlify.app/openshift-enterprise/latest/observability/distr_tracing/distr_tracing_tempo/distr-tracing-tempo-troubleshooting.html

QE review:

  • QE has approved this change.

Additional information:

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 10, 2024
@openshift-ci-robot
openshift-ci-robot commented Sep 10, 2024

@max-cx: This pull request references OBSDOCS-1142 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.18.0" version, but no target version was set.

In response to this:

Signed-off-by: Ruben Vargas [email protected]

Version(s):

4.12, 4.13, 4.14, 4.15, 4.16, 4.17

Issue:

https://issues.redhat.com/browse/OBSDOCS-1142

Link to docs preview:

https://79058--ocpdocs-pr.netlify.app/openshift-enterprise/latest/observability/distr_tracing/distr_tracing_tempo/distr-tracing-tempo-troubleshooting.html

QE review:

  • QE has approved this change.

Additional information:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 10, 2024
@ocpdocs-previewbot
ocpdocs-previewbot commented Sep 10, 2024

@max-cx max-cx force-pushed the OBSDOCS-1142 branch 4 times, most recently from c6a29e3 to 44bd082 Compare September 13, 2024 16:40
@max-cx
Contributor Author

max-cx commented Sep 13, 2024

/test deploy-preview

openshift-ci bot commented Sep 13, 2024

@max-cx: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test validate-asciidoc
  • /test validate-portal

Use /test all to run all jobs.

In response to this:

/test deploy-preview

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@max-cx
Contributor Author

max-cx commented Sep 13, 2024

/test all

@max-cx max-cx force-pushed the OBSDOCS-1142 branch 4 times, most recently from 6bbd736 to 22f2d44 Compare September 13, 2024 19:18
@max-cx
Contributor Author

max-cx commented Sep 16, 2024

/label peer-review-needed

@openshift-ci openshift-ci bot added the peer-review-needed Signifies that the peer review team needs to review this PR label Sep 16, 2024
@openshift-ci-robot
openshift-ci-robot commented Sep 16, 2024

@max-cx: This pull request references OBSDOCS-1142 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.18.0" version, but no target version was set.

In response to this:

Version(s):

4.12, 4.13, 4.14, 4.15, 4.16, 4.17

Issue:

https://issues.redhat.com/browse/OBSDOCS-1142

Link to docs preview:

https://81524--ocpdocs-pr.netlify.app/openshift-enterprise/latest/observability/distr_tracing/distr_tracing_tempo/distr-tracing-tempo-troubleshooting.html

QE review:

  • QE has approved this change.

Additional information:

Contributor

@abrennan89 abrennan89 left a comment

Some initial comments


.Procedure

. Locate the component you want to get the logs for. You can do this by listing all the deployments on an specific namespace.
Contributor

It's not clear if a user would know how to do this - maybe if there is an oc CLI command they can use to do this we should include it here, and an example output?

Contributor Author

@max-cx max-cx Sep 18, 2024

@rubenvp8510, WDYT?

  1. Any CLI command?

  2. What's the navigation path in the web console?

Contributor

@rubenvp8510 rubenvp8510 Sep 21, 2024

Yes,
I think the standard way of doing this is to run:
oc get pods
to list all pods.

Then select the pods that belong to the component whose logs you want to inspect, and run:

oc logs <pod name>

The pod name usually follows this pattern: tempo-<tempo-stack-name>-<component>
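
The steps above can be sketched as a small shell snippet. This is only an illustration under stated assumptions: the namespace `tracing`, the TempoStack name `simplest`, the component `distributor`, and the helper `filter_component_pods` are hypothetical placeholders that do not come from this PR. The helper only filters pod names by the tempo-<tempo-stack-name>-<component> pattern; the `oc` calls are left as comments because they need a live cluster.

```shell
#!/bin/sh
# Hypothetical values; substitute your own namespace, TempoStack name, and component.
NAMESPACE="tracing"
STACK="simplest"
COMPONENT="distributor"

# Read pod names on stdin and keep only those that match
# tempo-<tempo-stack-name>-<component>, with an optional "pod/" prefix
# as printed by "oc get pods -o name".
filter_component_pods() {
  grep -E "^(pod/)?tempo-${1}-${2}(-|\$)"
}

# Against a live cluster (requires the oc CLI and a logged-in session):
#   oc get pods -n "$NAMESPACE" -o name | filter_component_pods "$STACK" "$COMPONENT"
#   oc logs -n "$NAMESPACE" <pod name>

# Offline demonstration with made-up pod names; only the distributor pod is printed.
printf '%s\n' \
  "pod/tempo-simplest-distributor-7c9d8f6b5-abcde" \
  "pod/tempo-simplest-ingester-0" \
  "pod/other-app-1" | filter_component_pods "$STACK" "$COMPONENT"
```

Filtering by name is a convenience that relies on the naming pattern described above; if the pattern does not hold in a given deployment, list the pods with `oc get pods` and pick the pod manually.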


. Locate the component you want to get the logs for. You can do this by listing all the deployments on an specific namespace.

. Identify the pods that belong to this component.
Contributor

Again, how would a user do this? Are there specific commands involved or something specific they're looking for to identify it?

Contributor Author

@rubenvp8510, WDYT?

Contributor

Yes,
I think the standard way of doing this is to run:
oc get pods
to list all pods. The pods that belong to the TempoStack follow this format:

tempo-<tempo-stack-name>-<component>


. Identify the pods that belong to this component.

. Watch the logs by using `oc logs` command or retrieve the logs by using the web console.
Contributor

Should include an example command here instead I think, something like:

. Watch the logs for the component by running the following command:
+
[source,terminal]
----
$ oc logs <component>
----
+
You can also view logs by navigating to X in the {product-title} web console.

(I'm not 100% sure what the command would be; this is just an example.)
I think maybe there should also be separate procedures for using the oc CLI versus using the web console. Neither one is listed under .Prerequisites, even though a user would need either access to the web console or the oc CLI installed to follow this procedure.

Contributor Author

@rubenvp8510, two requests to you here:

  1. Could you recommend a specific oc logs <options> command?

  2. What's the navigation path in the web console?

[id="problems-ingesting-traces_{context}"]
= Troubleshooting issues with ingesting traces

When the TempoStack instance is failing to ingest traces, you can troubleshoot it as follows.
Contributor

Q: how would a user know if traces aren't being ingested? Would they see a particular error or run into particular issues that signal this is the case?

Contributor Author

@rubenvp8510, WDYT?

Contributor

There are different indicators, but the most important one is that the customer won't be able to see traces in the UI.

When that happens, you need to investigate whether the traces are being dropped, using the metrics mentioned in this doc.

@abrennan89
Contributor

It seems like the included procedures don't really explain how a user would perform certain tasks; for example, see #81524 (comment).

Is there a reason why these modules couldn't be refactored to provide step-by-step guidance with specific oc commands that users can follow, or to direct them to a specific place in the web console where they can find the relevant information for each step?

@max-cx
Contributor Author

max-cx commented Sep 18, 2024

@abrennan89, thank you for your insightful review. Looks like this PR needs a bit of further development. The hardest part of it is about making the procedure module work for the purpose of troubleshooting, which really should be a different module type like https://www.oxygenxml.com/dita/1.3/specs/archSpec/technicalContent/dita-troubleshooting-topic.html.

@bergerhoffer
Contributor

The branch/enterprise-4.18 label has been added to this PR.

This is because your PR targets the main branch and is labeled for enterprise-4.17. And any PR going into main must also target the latest version branch (enterprise-4.18).

If the update in your PR does NOT apply to version 4.18 onward, please re-target this PR to go directly into the appropriate version branch or branches (enterprise-4.x) instead of main.

openshift-ci bot commented Nov 15, 2024

@max-cx: all tests passed!


@openshift-bot
Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 14, 2025
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 14, 2025
@openshift-merge-robot
PR needs rebase.

@bergerhoffer
Contributor

The branch/enterprise-4.19 label has been added to this PR.

This is because your PR targets the main branch and is labeled for enterprise-4.18. And any PR going into main must also target the latest version branch (enterprise-4.19).

If the update in your PR does NOT apply to version 4.19 onward, please re-target this PR to go directly into the appropriate version branch or branches (enterprise-4.x) instead of main.

@openshift-bot
Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 27, 2025
Labels
branch/enterprise-4.12 branch/enterprise-4.13 branch/enterprise-4.14 branch/enterprise-4.15 branch/enterprise-4.16 branch/enterprise-4.17 branch/enterprise-4.18 branch/enterprise-4.19 jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. peer-review-in-progress Signifies that the peer review team is reviewing this PR peer-review-needed Signifies that the peer review team needs to review this PR size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
9 participants