Add connection pooling support to OTLP exporter for high-throughput scenarios #14364
base: main
Conversation
When service.telemetry.metrics.level is set to 'none', the collector should skip registering process metrics to avoid errors on platforms where gopsutil is not supported (such as AIX). This change conditionally registers process metrics only when the metrics level is not LevelNone, preventing the 'failed to register process metrics: not implemented yet' error on unsupported platforms. Fixes regression introduced in v0.136.0 where the check for metrics level was removed.
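For illustration, a minimal Go sketch of the gating described in that commit, assuming a registration helper; registerProcessMetrics is a stand-in here, not the collector's actual call:

```go
package service // illustrative package name

import (
	"fmt"

	"go.opentelemetry.io/collector/config/configtelemetry"
)

// registerProcessMetrics is a stand-in for the collector's real registration
// call, which relies on gopsutil and is not implemented on platforms such as AIX.
func registerProcessMetrics() error { return nil }

// initProcessMetrics skips registration entirely when the telemetry metrics
// level is "none", so unsupported platforms never hit the
// "failed to register process metrics: not implemented yet" error.
func initProcessMetrics(level configtelemetry.Level) error {
	if level == configtelemetry.LevelNone {
		return nil
	}
	if err := registerProcessMetrics(); err != nil {
		return fmt.Errorf("failed to register process metrics: %w", err)
	}
	return nil
}
```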
Similar to the resolution for pcommon.Value in previous changes, this update ensures consistent documentation across all pdata types by clarifying that calling functions on zero-initialized instances is invalid usage.

Changes:
- Updated template files (one_of_field.go, one_of_message_value.go) to generate improved comment wording
- Updated pcommon/value.go comments manually
- Updated all generated pdata files to use consistent wording: 'is invalid and will cause a panic' instead of 'will cause a panic'

This makes it clearer that using zero-initialized instances is not just dangerous but explicitly invalid usage, improving API documentation clarity.
…onfig file endpoints

Fixes open-telemetry#14286

When both the OTEL_EXPORTER_OTLP_TRACES_ENDPOINT environment variable and a configured endpoint in the config file are present, the URL scheme from the environment variable was incorrectly overriding the scheme from the config file, resulting in mixed endpoints (e.g., http scheme from the env var + path from the config file). This fix ensures that environment variables do not override explicitly configured endpoints by temporarily unsetting the OTEL_EXPORTER_OTLP_*_ENDPOINT environment variables before creating the SDK, then restoring them afterward. According to the OpenTelemetry specification, explicit configuration should take precedence over environment variables.

Changes:
- Modified sdk.go to temporarily unset OTEL_EXPORTER_OTLP_*_ENDPOINT environment variables before calling config.NewSDK()
- Added helper functions unsetOTLPEndpointEnvVars() and restoreEnvVars()
- Added comprehensive tests to verify env vars don't override config
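A rough sketch of what the unset/restore helpers could look like; the function names come from the commit message, but the signatures and the exact set of variables covered are assumptions:

```go
package telemetry // illustrative package name

import "os"

// otlpEndpointEnvVars lists the endpoint overrides that should not take
// precedence over endpoints configured explicitly in the config file.
var otlpEndpointEnvVars = []string{
	"OTEL_EXPORTER_OTLP_ENDPOINT",
	"OTEL_EXPORTER_OTLP_TRACES_ENDPOINT",
	"OTEL_EXPORTER_OTLP_METRICS_ENDPOINT",
	"OTEL_EXPORTER_OTLP_LOGS_ENDPOINT",
}

// unsetOTLPEndpointEnvVars clears the endpoint env vars and returns the
// saved values so they can be restored once the SDK has been created.
func unsetOTLPEndpointEnvVars() map[string]string {
	saved := make(map[string]string)
	for _, name := range otlpEndpointEnvVars {
		if v, ok := os.LookupEnv(name); ok {
			saved[name] = v
			os.Unsetenv(name)
		}
	}
	return saved
}

// restoreEnvVars puts the previously saved values back.
func restoreEnvVars(saved map[string]string) {
	for name, v := range saved {
		os.Setenv(name, v)
	}
}
```

Around SDK construction this would be used roughly as `saved := unsetOTLPEndpointEnvVars(); defer restoreEnvVars(saved)` before the call to config.NewSDK().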
…cenarios

This enhancement adds a connection_pool_size configuration option to the OTLP exporter, enabling multiple gRPC connections with round-robin load balancing.

Key changes:
- Add connection_pool_size config parameter (default: 0, uses 1 connection)
- Implement round-robin load balancing across multiple connections
- Support for 1-256 concurrent gRPC connections
- Backward compatible: default behavior unchanged

This resolves performance issues in high-throughput environments (10K+ spans/sec) and high-latency network scenarios where a single gRPC connection becomes a bottleneck.

Also fixes an unrelated service.go issue per contributor feedback on PR open-telemetry#14342.
I hope this works well, as was addressed through the issue. If any unrelated changes occur or anything is wrong, please do tell after the review.
bogdandrutu
left a comment
Can you show me some data that demonstrates this is needed? gRPC says that you don't need to do this and it will automatically use multiple sockets, etc.
It doesn’t seem general enough to justify inclusion in the core component.
Description
This PR adds connection pooling support to the OTLP exporter to resolve performance issues in high-throughput and high-latency environments.
Motivation
As reported in the issue, users experience unreliability in the OTLP exporter in high-throughput environments (10K+ spans/sec) and over high-latency network links. The single gRPC connection becomes a bottleneck, causing queue overflow and dropped spans.
Changes:
Core Implementation:
- Added connection_pool_size configuration parameter to the Config struct, with validation in Config.Validate()
- Implemented the connection pool in baseExporter
- Thread-safe round-robin distribution; the selection method uses atomic.Uint32 (see the Go sketch after the usage example below)
Documentation
Bug Fix
Testing
Usage Example
```yaml
exporters:
  otlp/high-throughput:
    endpoint: otel-gateway:443
    connection_pool_size: 5  # Creates 5 gRPC connections
    compression: snappy
    timeout: 20s
    sending_queue:
      num_consumers: 100
      queue_size: 2000
```
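For reference, a minimal Go sketch of the round-robin pool described under Core Implementation; the names connPool, newConnPool, and pick are illustrative assumptions, not the exporter's actual API, and credentials are hard-coded to insecure only to keep the example short:

```go
package otlpexporter // illustrative package name

import (
	"sync/atomic"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// connPool holds several client connections to the same endpoint and
// hands them out in round-robin order.
type connPool struct {
	conns []*grpc.ClientConn
	next  atomic.Uint32
}

// newConnPool dials `size` independent gRPC connections to one endpoint.
func newConnPool(endpoint string, size int) (*connPool, error) {
	p := &connPool{}
	for i := 0; i < size; i++ {
		cc, err := grpc.NewClient(endpoint, grpc.WithTransportCredentials(insecure.NewCredentials()))
		if err != nil {
			return nil, err
		}
		p.conns = append(p.conns, cc)
	}
	return p, nil
}

// pick returns the next connection; atomic.Uint32 keeps the counter safe
// under concurrent export calls.
func (p *connPool) pick() *grpc.ClientConn {
	idx := (p.next.Add(1) - 1) % uint32(len(p.conns))
	return p.conns[idx]
}
```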