feat: integrate OTEL/Jaeger #3815

Open: wants to merge 46 commits into base: master

@hthieu1110 (Contributor) commented Feb 24, 2025

Implementation for #2434

We can filter by BlockHeight
Screenshot 2025-03-05 at 11 19 19

We can trace the calls
Screenshot 2025-03-05 at 11 20 03

@github-actions github-actions bot added the "📦 🌐 tendermint v2" and "📦 ⛰️ gno.land" labels Feb 24, 2025
@hthieu1110 hthieu1110 changed the title from "wip" to "infra: integrate OTEL/Jaeger" Feb 24, 2025
@Gno2D2 Gno2D2 requested a review from a team February 24, 2025 14:26
@Gno2D2 (Collaborator) commented Feb 24, 2025

🛠 PR Checks Summary

All Automated Checks passed. ✅

Manual Checks (for Reviewers):
  • IGNORE the bot requirements for this PR (force green CI check)

🤖 This bot helps streamline PR reviews by verifying automated checks and providing guidance for contributors and reviewers.

✅ Automated Checks (for Contributors):

🟢 Maintainers must be able to edit this pull request (more info)
🟢 Pending initial approval by a review team member, or review from tech-staff

☑️ Contributor Actions:
  1. Fix any issues flagged by automated checks.
  2. Follow the Contributor Checklist to ensure your PR is ready for review.
    • Add new tests, or document why they are unnecessary.
    • Provide clear examples/screenshots, if necessary.
    • Update documentation, if required.
    • Ensure no breaking changes, or include BREAKING CHANGE notes.
    • Link related issues/PRs, where applicable.
☑️ Reviewer Actions:
  1. Complete manual checks for the PR, including the guidelines and additional checks if applicable.
📚 Resources:
Debug
Automated Checks
Maintainers must be able to edit this pull request (more info)

If

🟢 Condition met
└── 🟢 And
    ├── 🟢 The base branch matches this pattern: ^master$
    └── 🟢 The pull request was created from a fork (head branch repo: hthieu1110/gno)

Then

🟢 Requirement satisfied
└── 🟢 Maintainer can modify this pull request

Pending initial approval by a review team member, or review from tech-staff

If

🟢 Condition met
└── 🟢 And
    ├── 🟢 The base branch matches this pattern: ^master$
    └── 🟢 Not (🔴 Pull request author is a member of the team: tech-staff)

Then

🟢 Requirement satisfied
└── 🟢 If
    ├── 🟢 Condition
    │   └── 🟢 Or
    │       ├── 🔴 At least 1 user(s) of the organization reviewed the pull request (with state "APPROVED")
    │       ├── 🟢 At least 1 user(s) of the team tech-staff reviewed pull request
    │       └── 🔴 This pull request is a draft
    └── 🟢 Then
        └── 🟢 And
            ├── 🟢 Not (🔴 This label is applied to pull request: review/triage-pending)
            └── 🟢 At least 1 user(s) of the team tech-staff reviewed pull request

Manual Checks
**IGNORE** the bot requirements for this PR (force green CI check)

If

🟢 Condition met
└── 🟢 On every pull request

Can be checked by

  • Any user with comment edit permission

codecov bot commented Feb 25, 2025

@hthieu1110 hthieu1110 marked this pull request as ready for review March 5, 2025 04:22
@hthieu1110 hthieu1110 force-pushed the feat/integrate-otel-jaeger branch from c6d3691 to f8437e6 Compare April 17, 2025 10:20
@hthieu1110 (Contributor, Author)

Hi @gfanton, I've updated the code with your suggested changes. Could you take another look, please? :)
I've checked, and as far as I can tell all the addSpan calls already come after the lock acquisition.

@thehowl (Member) commented May 15, 2025

Can you address the failing CI?

@github-actions github-actions bot added the "🧾 package/realm" label May 26, 2025
@hthieu1110 (Contributor, Author)

Hi @gfanton, sorry for the late reply. It seems that some tests are broken on master, so I'm waiting for that to be fixed first.

@zivkovicmilos (Member) left a comment

Maybe the consensus module is not the best place, for now, to showcase tracing.

What do you think about hooking up the keeper query handlers, or RPC handlers?

MeterName string `json:"meter_name" toml:"meter_name"`
ServiceName string `json:"service_name" toml:"service_name" comment:"in Prometheus this is transformed into the label 'exported_job'"`
ServiceInstanceID string `json:"service_instance_id" toml:"service_instance_id" comment:"the ID helps to distinguish instances of the same service that exist at the same time (e.g. instances of a horizontally scaled service), in Prometheus this is transformed into the label 'exported_instance"`
ExporterEndpoint string `json:"exporter_endpoint" toml:"exporter_endpoint" comment:"the endpoint to export metrics to, like a local OpenTelemetry collector"`

TracesEnabled bool `json:"traces_enabled" toml:"traces_enabled"`
GracefulShutdownTelemetry bool `json:"graceful_shutdown_telemetry" toml:"graceful_shutdown_telemetry"`
Member

Why would we ever not want to gracefully shut down telemetry?

Contributor Author

Originally there was no graceful shutdown, so I added this for the cases where we need it. I think we will need GracefulShutdown when we run in production, but not when running locally or in tests. Wdyt?
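
To make the reviewer's point concrete, here is a minimal sketch of unconditional graceful shutdown, with no config flag. It relies only on a `Shutdown(ctx) error` method, which the OpenTelemetry SDK tracer and meter providers both expose. The names `shutdownTelemetry`, `fakeProvider`, `errFake`, and the 5 second timeout are assumptions made for illustration, not code from this PR.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// defaultShutdownTimeout bounds how long telemetry may take to flush on exit.
// The value is an assumption for this sketch, not taken from the PR.
const defaultShutdownTimeout = 5 * time.Second

// errFake is a stand-in failure used by the fake provider below.
var errFake = errors.New("fake shutdown failure")

// shutdowner is the only behavior the sketch needs from a provider.
type shutdowner interface {
	Shutdown(ctx context.Context) error
}

// fakeProvider is a hypothetical stand-in for a tracer or meter provider.
type fakeProvider struct {
	stopped bool
	err     error
}

func (f *fakeProvider) Shutdown(ctx context.Context) error {
	f.stopped = true
	return f.err
}

// shutdownTelemetry stops every provider within the timeout and joins the
// errors, so one failing provider does not hide another.
func shutdownTelemetry(timeout time.Duration, providers ...shutdowner) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	var errs []error
	for _, p := range providers {
		if err := p.Shutdown(ctx); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...)
}

func main() {
	traces, metrics := &fakeProvider{}, &fakeProvider{err: errFake}
	err := shutdownTelemetry(defaultShutdownTimeout, traces, metrics)
	fmt.Println(traces.stopped, metrics.stopped, err)
}
```

Because the timeout bounds the wait, the same path can run locally and in tests without a `graceful_shutdown_telemetry` switch.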

"go.opentelemetry.io/otel/trace/noop"
)

type TracerFactory func() trace.Tracer
Member

You can drop this indirection

// Check if the metrics are enabled at all
if !c.MetricsEnabled {
return nil
func Init(c config.Config, logger *slog.Logger) (*sdkTrace.TracerProvider, *sdkMetric.MeterProvider, error) {
Member

Passing a logger just to log that something happened in the func is overkill

Member

This function should also be split up into 2 different ones, one for the tracing, and one for the metrics -- because in the current state you still require the caller to check whether something is nil
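
One possible shape for that split, sketched under assumptions: `InitTracing` and `InitMetrics` each return a shutdown function that is never nil on success, so a disabled subsystem yields a no-op instead of a nil provider. The `Config` type here carries only the fields needed for the example, and `ShutdownFunc`, `noopShutdown`, and both function names are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Config mirrors only the fields this sketch needs; the type is hypothetical.
type Config struct {
	MetricsEnabled   bool
	TracesEnabled    bool
	ExporterEndpoint string
}

// ShutdownFunc releases whatever an Init function set up.
// It is never nil when the returned error is nil.
type ShutdownFunc func(ctx context.Context) error

func noopShutdown(context.Context) error { return nil }

// InitTracing sets up tracing only. When tracing is disabled it returns a
// no-op shutdown, so callers never need a nil check.
func InitTracing(cfg Config) (ShutdownFunc, error) {
	if !cfg.TracesEnabled {
		return noopShutdown, nil
	}
	if cfg.ExporterEndpoint == "" {
		return nil, errors.New("traces enabled but exporter endpoint is empty")
	}
	// A real implementation would build the trace exporter and provider here
	// and return the provider's Shutdown method instead of a no-op.
	return noopShutdown, nil
}

// InitMetrics is the metrics counterpart, with the same contract.
func InitMetrics(cfg Config) (ShutdownFunc, error) {
	if !cfg.MetricsEnabled {
		return noopShutdown, nil
	}
	if cfg.ExporterEndpoint == "" {
		return nil, errors.New("metrics enabled but exporter endpoint is empty")
	}
	// Placeholder, as above: a real version would return the meter
	// provider's Shutdown method.
	return noopShutdown, nil
}

func main() {
	cfg := Config{TracesEnabled: true, ExporterEndpoint: "http://localhost:4318"}

	stopTraces, err := InitTracing(cfg)
	if err != nil {
		fmt.Println("tracing:", err)
		return
	}
	stopMetrics, err := InitMetrics(cfg)
	if err != nil {
		fmt.Println("metrics:", err)
		return
	}

	// No nil checks needed: both values are callable even when disabled.
	ctx := context.Background()
	fmt.Println(stopTraces(ctx), stopMetrics(ctx))
}
```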

return nil, fmt.Errorf("unable to create http traces exporter, %w", err)
}
default:
exp, err = otlptracegrpc.New(
Member

After talking to Jae, we have a convention that gRPC fallbacks like this should be dropped entirely; only support http / https
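
A sketch of what the endpoint handling could look like without the fallback, using only the standard library: the scheme is checked explicitly and anything other than http or https is rejected rather than silently treated as gRPC. The helper name `parseExporterEndpoint` and its return values are assumptions for illustration; the actual exporter construction is left out.

```go
package main

import (
	"fmt"
	"net/url"
)

// parseExporterEndpoint validates an exporter endpoint and reports whether
// it needs TLS. Anything other than http or https is an error; there is no
// default branch that falls back to another transport.
func parseExporterEndpoint(endpoint string) (secure bool, host string, err error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return false, "", fmt.Errorf("error parsing exporter endpoint: %s, %w", endpoint, err)
	}
	if u.Host == "" {
		return false, "", fmt.Errorf("exporter endpoint %q has no host", endpoint)
	}

	switch u.Scheme {
	case "http":
		return false, u.Host, nil
	case "https":
		return true, u.Host, nil
	default:
		return false, "", fmt.Errorf("unsupported exporter scheme %q: only http and https are supported", u.Scheme)
	}
}

func main() {
	for _, ep := range []string{
		"http://localhost:4318",
		"https://collector.example:4318",
		"grpc://localhost:4317",
	} {
		secure, host, err := parseExporterEndpoint(ep)
		fmt.Println(ep, secure, host, err)
	}
}
```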

return nil, fmt.Errorf("error parsing tracer exporter endpoint: %s, %w", cfg.ExporterEndpoint, err)
}

// Use oltp metric exporter with http/https or grpc
Member

Typo here?

@@ -1445,6 +1513,12 @@ func (cs *ConsensusState) defaultSetProposal(proposal *types.Proposal) error {
// NOTE: block is not necessarily valid.
// Asynchronously triggers either enterPrevote (before we timeout of propose) or tryFinalizeCommit, once we have the full block.
func (cs *ConsensusState) addProposalBlockPart(msg *BlockPartMessage, peerID p2pTypes.ID) (added bool, err error) {
span := cs.addSpan("addProposalBlockPart")
Member

We can drop this

@@ -1515,6 +1589,13 @@ func (cs *ConsensusState) addProposalBlockPart(msg *BlockPartMessage, peerID p2p

// Attempt to add the vote. if its a duplicate signature, dupeout the validator
func (cs *ConsensusState) tryAddVote(vote *types.Vote, peerID p2pTypes.ID) (bool, error) {
span := cs.addSpan("tryAddVote")
Member

We can drop this

@@ -1741,6 +1832,9 @@ func (cs *ConsensusState) voteTime() time.Time {

// sign the vote and publish on internalMsgQueue
func (cs *ConsensusState) signAddVote(type_ types.SignedMsgType, hash []byte, header types.PartSetHeader) *types.Vote {
span := cs.addSpan("signAddVote")
Member

We can drop this

func (cs *ConsensusState) addTrace(spanName string, opts ...trace.SpanStartOption) trace.Span {
var span trace.Span

if telemetry.TracesEnabled() {
Member

@hthieu1110

Any updates on this?

attribute.Int("csStep", int(cs.Step)),
)
}
cs.traceCtx = ctx
Member

@gfanton is right, no need to introduce traceCtx. The fact that context is not propagated to most of the consensus state operations is a bug, not a feature
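
To illustrate the alternative to a stored `traceCtx`, here is a stdlib-only sketch of explicit context propagation. The `startSpan` helper imitates the shape of the OpenTelemetry `Tracer.Start(ctx, name)` call, which returns a derived context plus a span; the `span` type, the `receiveVote` and `addVote` functions, and `traceRound` are hypothetical and stand in for real consensus operations.

```go
package main

import (
	"context"
	"fmt"
)

// spanKey is the private context key under which the current span is stored.
type spanKey struct{}

// span is a toy span that only remembers its name and parent.
type span struct {
	name   string
	parent *span
}

// path renders the chain of spans from the root down to this one.
func (s *span) path() string {
	if s.parent == nil {
		return s.name
	}
	return s.parent.path() + "/" + s.name
}

// startSpan creates a child of whatever span ctx carries and returns a
// derived context holding the new span.
func startSpan(ctx context.Context, name string) (context.Context, *span) {
	parent, _ := ctx.Value(spanKey{}).(*span)
	s := &span{name: name, parent: parent}
	return context.WithValue(ctx, spanKey{}, s), s
}

// receiveVote passes its derived context down instead of saving it on a
// struct field, so the callee's span is parented correctly.
func receiveVote(ctx context.Context) string {
	ctx, _ = startSpan(ctx, "receiveVote")
	return addVote(ctx)
}

// addVote starts its own span from the context it was handed.
func addVote(ctx context.Context) string {
	_, s := startSpan(ctx, "addVote")
	return s.path()
}

// traceRound runs the call chain from a fresh root context.
func traceRound() string {
	return receiveVote(context.Background())
}

func main() {
	fmt.Println(traceRound()) // prints "receiveVote/addVote"
}
```

The parent and child relationship comes entirely from the argument, so concurrent callers cannot overwrite each other's trace state the way they could with a shared field.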

@hthieu1110 hthieu1110 requested a review from zivkovicmilos May 29, 2025 08:57
@hthieu1110 (Contributor, Author)

@zivkovicmilos I've addressed your feedback, please re-check this PR :) Thanks

@github-actions github-actions bot added the "📦 🤖 gnovm" label Jun 11, 2025
Labels
📦 🌐 tendermint v2 Issues or PRs tm2 related 📦 ⛰️ gno.land Issues or PRs gno.land package related 📦 🤖 gnovm Issues or PRs gnovm related 🧾 package/realm Tag used for new Realms or Packages.
Projects
Status: In Progress
Status: In Review
Development

Successfully merging this pull request may close these issues.

[chain] Add OTEL tracing functionality + Jaeger
7 participants