feat(sdk)!: provider builder build() returns Result instead of panicking#3462
feat(sdk)!: provider builder build() returns Result instead of panicking#3462lazureykis wants to merge 1 commit intoopen-telemetry:mainfrom
Conversation
657f1a4 to
5832e63
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3462 +/- ##
======================================
Coverage 83.2% 83.3%
======================================
Files 128 128
Lines 25086 25199 +113
======================================
+ Hits 20896 21013 +117
+ Misses 4190 4186 -4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
4f8a2fa to
abbee3d
Compare
TracerProviderBuilder::build() and LoggerProviderBuilder::build() now return Result<_, ProviderBuildError> instead of panicking on invalid configuration. BatchSpanProcessor::builder().build() and BatchLogProcessor::builder().build() now return Result<_, ProviderBuildError> instead of panicking when no async runtime is available. Removes with_batch_exporter() convenience methods from provider builders. Users now construct batch processors explicitly, which makes the fallible step visible in their code and avoids hidden panics. BREAKING CHANGES: - TracerProviderBuilder::build() returns Result<SdkTracerProvider, ProviderBuildError> - LoggerProviderBuilder::build() returns Result<SdkLoggerProvider, ProviderBuildError> - BatchSpanProcessor::builder().build() returns Result<BatchSpanProcessor, ProviderBuildError> - BatchLogProcessor::builder().build() returns Result<BatchLogProcessor, ProviderBuildError> - Removed TracerProviderBuilder::with_batch_exporter() - Removed LoggerProviderBuilder::with_batch_exporter() Assisted-by: Claude Opus 4.6
abbee3d to
6020391
Compare
|
Hey @lazureykis thanks for opening this PR! Anyway, to that end, it would be great if you could add some text to that tracking issue about how you came across this and the misery it caused to help make it clear this needs addressing. |
Thanks Scott! I played with the API to eliminate the panic using 3 different approaches and this PR is what I ended up with. Curious what @cijothomas thinks. |
|
https://github.com/open-telemetry/opentelemetry-rust#project-status |
|
Adding helpers to the OTLP crate would only cover OTLP users and the panic in BatchLogProcessor remains reachable through the SDK. |
You are fully correct. However, my position still remains the same - we cannot take breaking change to stable components, so OTLP Exporter builder is the only place we can address now. I think that should address majority of user feedback. Fixing BatchProcessor has to wait for v2.0 where we can make breaking changes. |
Fixes #2690
Design discussion issue (if applicable) #2673
Changes
Makes
build()returnResultfor all SDK provider and batch processor builders, eliminatingtwo categories of panics at initialization time:
Thread spawn failures (
ProviderBuildError::ThreadSpawnFailed) —BatchSpanProcessorandBatchLogProcessorboth spawn a background OS thread. On resource-exhausted systems thisthread::Builder::spawn()call can fail; previously it wouldexpect()and crash the process.Async-runtime/processor mismatch (
ProviderBuildError::AsyncRuntimeRequired) — The batchprocessor runs on a dedicated OS thread using blocking calls (since 0.28). Async HTTP clients
like
reqwest::ClientandHyperClientrequire a Tokio reactor, which isn't available on thatthread — causing a panic at export time with "there is no reactor running". This is a common
pitfall because
reqwest::Client(async) is the default in many Rust applications.This PR surfaces that misconfiguration at build time.
SpanExporterandLogExportergain arequires_async_runtime() -> boolmethod (default:false); OTLP exporters propagate thisflag from their underlying
HttpClient. The check is best-effort: third-party exporters thatdo not override the method continue to return
falseand may still panic at export time.Build-time detection is preferred over simply documenting "don't use async clients" because:
reqwest::Clientis what most Rust developers reach for firstdiagnose
HttpClientimplementations can opt into the same safety net by overridingrequires_async_runtime()ProviderBuildError::AsyncRuntimeRequiredis better DX than a tokio reactor panicRemoved
with_batch_exporter()convenience method — from bothTracerProviderBuilderandLoggerProviderBuilder. Users now build the batch processor explicitly and pass it viawith_span_processor()/with_log_processor(). This eliminates the design tension of havinga fallible operation hidden inside an infallible builder method (see design rationale below).
API changes (breaking)
TracerProviderBuilder::build() -> SdkTracerProvider-> Result<SdkTracerProvider, ProviderBuildError>LoggerProviderBuilder::build() -> SdkLoggerProvider-> Result<SdkLoggerProvider, ProviderBuildError>BatchSpanProcessorBuilder::build() -> BatchSpanProcessor-> Result<BatchSpanProcessor, ProviderBuildError>BatchLogProcessorBuilder::build() -> BatchLogProcessor-> Result<BatchLogProcessor, ProviderBuildError>TracerProviderBuilder::with_batch_exporter()with_span_processor()LoggerProviderBuilder::with_batch_exporter()with_log_processor()ProviderBuildErroris#[non_exhaustive]to allow future expansion (e.g. detecting additionalinvalid client combinations as noted in #2690).
Design rationale: removing
with_batch_exporter()with_batch_exporter()was a convenience wrapper that internally constructed aBatchSpanProcessor— a fallible operation (thread spawn, async-runtime validation). Thiscreated a design problem: how should a builder method that returns
Selfreport errors?Three approaches were considered:
Deferred error (
pending_errorfield) — Store the error in the builder and surface itat
build(). This is whatreqwest::ClientBuilderdoes. The problem: hidden state, and ifwith_batch_exporter()is called twice with two invalid exporters, only the last error isreported (the first is silently dropped).
Factory closure (
PendingProcessorenum) — Store a closure that captures the exporterand defer construction to
build(). Correct and order-preserving, but novel — no major Rustcrate uses this pattern — and adds complexity (closure boxing, manual
Debugimpls).Remove the convenience method entirely — Require users to build the batch processor
explicitly via
BatchSpanProcessor::builder(exporter).build()?and pass it towith_span_processor(). This follows thetracing-subscriberpattern (.with(layer)takespre-built layers) and the broader Rust ecosystem consensus: builder setters are infallible,
build()is the single fallible call site.Option 3 was chosen because it matches the dominant pattern across the Rust ecosystem
(
tracing-subscriber,axum,tower,tokio,rdkafka— all take pre-built componentsrather than wrapping fallible construction in convenience methods). The two-line migration is
straightforward and the explicit processor construction gives users access to
BatchConfigBuilderfor customization — something
with_batch_exporter()did not expose.Provider
build()infallibility noteBoth
TracerProviderBuilder::build()andLoggerProviderBuilder::build()are currentlyinfallible at the provider level — the
Resultreturn type is prophylactic for futurevalidation. The only errors today come from batch processor construction
(
BatchSpanProcessor::builder().build()/BatchLogProcessor::builder().build()), whichhappens before the provider
build()call.SdkTracerProvider::Defaultuses.expect()which is safe because the default builderhas no batch processors.
Scope
This PR covers both
TracerProviderandLoggerProviderin a single change so the two providersstay consistent and the breaking change is delivered atomically.
Migration
Merge requirement checklist
CHANGELOG.mdfiles updated for non-trivial, user-facing changes