
Introduce load-balanced channel for OpenTelemetry exporters #2175

Open
dkostyrev wants to merge 1 commit into TraceMachina:main from joomcode:feature/balanced-channel


@dkostyrev
Contributor

@dkostyrev dkostyrev commented Feb 23, 2026

Summary

This PR introduces client-side load balancing for OpenTelemetry (OTLP) gRPC connections using the ginepro library. When the NL_OTEL_ENDPOINT environment variable is set (the name is open for discussion; it could instead be a boolean flag), NativeLink creates a load-balanced channel for exporting logs, traces, and metrics and distributes requests across the backend endpoints resolved via DNS. This spreads OTLP traffic across multiple OTLP collector instances.
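To make the idea of distributing requests across resolved endpoints concrete, here is a minimal std-only sketch. It is not ginepro's API and makes no claim about the balancing policy ginepro or tonic actually use; round-robin is chosen only because it is the simplest policy to show. The `RoundRobin` type and the example addresses are hypothetical.

```rust
use std::net::SocketAddr;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical round-robin picker over a fixed set of collector addresses.
/// A real balanced channel would also refresh this set from DNS periodically.
pub struct RoundRobin {
    backends: Vec<SocketAddr>,
    next: AtomicUsize,
}

impl RoundRobin {
    /// Returns `None` when the resolver produced no addresses.
    pub fn new(backends: Vec<SocketAddr>) -> Option<Self> {
        if backends.is_empty() {
            return None;
        }
        Some(Self {
            backends,
            next: AtomicUsize::new(0),
        })
    }

    /// Picks the backend for the next request, cycling through the list.
    pub fn pick(&self) -> SocketAddr {
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.backends.len();
        self.backends[i]
    }
}
```

With two resolved collectors, successive calls to `pick()` alternate between them, so no single collector receives all exports.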

Changes

Load-balanced OTLP exports:

  • Added ginepro dependency to provide client-side load balancing for gRPC channels used by OpenTelemetry exporters
  • Introduced NL_OTEL_ENDPOINT environment variable to configure the OTLP endpoint for load-balanced connections
  • Changed init_tracing() from synchronous to async to support balanced channel initialization
  • Updated OpenTelemetry dependencies from v0.29 to v0.30 to support the new channel configuration
  • All three exporters (logs, traces, metrics) now share the same load-balanced channel when configured
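As a rough illustration of the environment-variable gating described in the list above, the sketch below decides between the default exporter setup and a balanced channel. Only the variable name NL_OTEL_ENDPOINT comes from this PR; the `host:port` value format, the `ChannelMode` enum, and both function names are assumptions made for the example, and the PR's actual parsing may differ.

```rust
use std::env;

/// Hypothetical description of how the exporters obtain their channel.
#[derive(Debug, PartialEq, Eq)]
pub enum ChannelMode {
    /// Variable unset or blank: keep the exporters' default endpoint handling.
    Default,
    /// Variable set: build one balanced channel shared by all three exporters.
    Balanced { host: String, port: u16 },
}

/// Parses a raw value. The `host:port` format is an assumption, not the PR's.
pub fn channel_mode(raw: Option<&str>) -> Result<ChannelMode, String> {
    let Some(raw) = raw.map(str::trim).filter(|s| !s.is_empty()) else {
        return Ok(ChannelMode::Default);
    };
    let (host, port) = raw
        .rsplit_once(':')
        .ok_or_else(|| format!("expected host:port, got {raw:?}"))?;
    if host.is_empty() {
        return Err(format!("missing host in {raw:?}"));
    }
    let port = port
        .parse::<u16>()
        .map_err(|e| format!("invalid port in {raw:?}: {e}"))?;
    Ok(ChannelMode::Balanced {
        host: host.to_owned(),
        port,
    })
}

/// Reads NL_OTEL_ENDPOINT from the process environment.
pub fn channel_mode_from_env() -> Result<ChannelMode, String> {
    channel_mode(env::var("NL_OTEL_ENDPOINT").ok().as_deref())
}
```

Keeping the parsing separate from the environment lookup lets the decision be tested without mutating process-wide state.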


@MarcusSorealheis
Collaborator

@dkostyrev Thank you for this awesome PR. I have finally had a bit of time to explore it and now I understand quite a bit.

This is fantastic.

@MarcusSorealheis
Collaborator

@amankrx when you get a chance, please help with the merge conflict here.

  .block_on(async {
      // The OTLP exporters need to run in a Tokio context.
-     spawn!("init tracing", async { init_tracing() })
+     spawn!("init tracing", async { init_tracing().await })
Collaborator


oof! Good catch.
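For context on why the missing `.await` matters once `init_tracing()` is async: calling an async function only constructs a future, and its body does not run until that future is polled. The std-only sketch below shows this with a stand-in function; `init_tracing_stub`, the tiny `block_on` executor, and the two helper functions are all hypothetical and unrelated to NativeLink's `spawn!` macro or Tokio.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Tiny busy-polling executor, enough for futures that never really wait.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Stand-in for an async `init_tracing()`; it only records that it ran.
async fn init_tracing_stub(initialized: &Cell<bool>) {
    initialized.set(true);
}

/// Mirrors the old line: the inner future is created but never polled.
pub fn ran_without_await() -> bool {
    let initialized = Cell::new(false);
    block_on(async {
        let _never_polled = init_tracing_stub(&initialized);
    });
    initialized.get()
}

/// Mirrors the fixed line: awaiting drives the inner future to completion.
pub fn ran_with_await() -> bool {
    let initialized = Cell::new(false);
    block_on(async {
        init_tracing_stub(&initialized).await;
    });
    initialized.get()
}
```

In this sketch `ran_without_await()` reports that the stub never executed, while `ran_with_await()` reports that it did, which is the behavior difference the one-line fix addresses.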

@palfrey
Member

palfrey commented Mar 6, 2026

That coverage issue is interesting. mold: error: undefined symbol: __memcpy_chk is the sort of problem we used to have before the fixes in #2192, which removed the hardeningDisable bit that had been a workaround for it. Not quite sure why this is hitting here now, but we might need to re-add the workaround.

@dkostyrev dkostyrev force-pushed the feature/balanced-channel branch from bb5068e to d6a38f4 Compare May 14, 2026 14:27
@dkostyrev dkostyrev force-pushed the feature/balanced-channel branch from d6a38f4 to 9e6ddec Compare May 14, 2026 14:34
@dkostyrev dkostyrev force-pushed the feature/balanced-channel branch from 7a99137 to 300e6c5 Compare May 18, 2026 15:57
@dkostyrev dkostyrev force-pushed the feature/balanced-channel branch from 5927abc to 7005eab Compare May 19, 2026 14:53
@dkostyrev dkostyrev force-pushed the feature/balanced-channel branch from 8b3d6c2 to 6f77c5a Compare May 19, 2026 15:31
@dkostyrev dkostyrev force-pushed the feature/balanced-channel branch from 6f77c5a to 35001bc Compare May 30, 2026 13:37
@vercel

vercel Bot commented May 30, 2026

@dkostyrev is attempting to deploy a commit to the native-link-web-assets Team on Vercel.

A member of the Team first needs to authorize it.

@dkostyrev dkostyrev force-pushed the feature/balanced-channel branch 2 times, most recently from f4fc7d3 to 89f99fa Compare May 30, 2026 16:37
@vercel

vercel Bot commented May 30, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| nativelink | Ready | Preview, Comment | May 30, 2026 4:40pm |
| nativelink-aidm | Ready | Preview, Comment | May 30, 2026 4:40pm |


Add client-side load balancing to OTLP gRPC connections using ginepro.
When NL_OTEL_ENDPOINT is set, the telemetry system creates a load-balanced
channel shared across log, trace, and metric exporters. This enables better
distribution of telemetry traffic across multiple OTLP collector instances
and improves overall system resilience.

- Add ginepro dependency for gRPC load balancing
- Upgrade OpenTelemetry dependencies from 0.29 to 0.30
- Change init_tracing() to async to support channel initialization
- Add NL_OTEL_ENDPOINT environment variable for configuration
- Update all OTLP exporters to use shared load-balanced channel

# Conflicts:
#	Cargo.lock
@dkostyrev dkostyrev force-pushed the feature/balanced-channel branch from c09418b to 0bd5066 Compare May 30, 2026 19:21


4 participants