feat(sdk)!: provider builder build() returns Result instead of panicking #3462

Open
lazureykis wants to merge 1 commit into open-telemetry:main from lazureykis:feat/provider-build-result

Conversation

@lazureykis lazureykis commented Apr 16, 2026

Fixes #2690
Design discussion issue (if applicable) #2673

Changes

Makes build() return Result for the TracerProvider and LoggerProvider builders and for the
batch span and log processor builders, eliminating two categories of panics that originate in
provider setup:

Thread spawn failures (ProviderBuildError::ThreadSpawnFailed) — BatchSpanProcessor and
BatchLogProcessor both spawn a background OS thread. On resource-exhausted systems this
thread::Builder::spawn() call can fail; previously it would expect() and crash the process.

Async-runtime/processor mismatch (ProviderBuildError::AsyncRuntimeRequired) — The batch
processor runs on a dedicated OS thread using blocking calls (since 0.28). Async HTTP clients
like reqwest::Client and HyperClient require a Tokio reactor, which isn't available on that
thread — causing a panic at export time with "there is no reactor running". This is a common
pitfall because reqwest::Client (async) is the default in many Rust applications.

This PR surfaces that misconfiguration at build time. SpanExporter and LogExporter gain a
requires_async_runtime() -> bool method (default: false); OTLP exporters propagate this
flag from their underlying HttpClient. The check is best-effort: third-party exporters that
do not override the method continue to return false and may still panic at export time.
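
The build-time check described above could look roughly like the following minimal sketch. It is not the PR's actual code: the traits are simplified stand-ins for the SDK's HttpClient and SpanExporter, and OtlpLikeExporter, BlockingClient, AsyncClient, and build_batch_processor are hypothetical names introduced only for illustration.

```rust
use std::io;
use std::thread;

/// Simplified stand-in for the HTTP client abstraction (assumption).
trait HttpClient {
    /// True when the client needs an async reactor to make requests.
    fn requires_async_runtime(&self) -> bool {
        false
    }
}

/// Hypothetical blocking client: keeps the default (false).
struct BlockingClient;
impl HttpClient for BlockingClient {}

/// Hypothetical async client: opts in to the check.
struct AsyncClient;
impl HttpClient for AsyncClient {
    fn requires_async_runtime(&self) -> bool {
        true
    }
}

/// Simplified stand-in for the exporter trait (assumption).
trait SpanExporter {
    fn requires_async_runtime(&self) -> bool {
        false
    }
}

/// Hypothetical exporter that propagates the flag from its client.
struct OtlpLikeExporter<C: HttpClient> {
    client: C,
}

impl<C: HttpClient> SpanExporter for OtlpLikeExporter<C> {
    fn requires_async_runtime(&self) -> bool {
        self.client.requires_async_runtime()
    }
}

#[derive(Debug)]
#[non_exhaustive]
enum ProviderBuildError {
    ThreadSpawnFailed(io::Error),
    AsyncRuntimeRequired,
}

struct BatchProcessor {
    handle: thread::JoinHandle<()>,
}

impl BatchProcessor {
    /// Waits for the background thread to finish.
    fn shutdown(self) {
        let _ = self.handle.join();
    }
}

/// Rejects async-only exporters, then spawns the worker thread,
/// mapping a spawn failure to an error instead of panicking.
fn build_batch_processor<E>(exporter: E) -> Result<BatchProcessor, ProviderBuildError>
where
    E: SpanExporter + Send + 'static,
{
    if exporter.requires_async_runtime() {
        return Err(ProviderBuildError::AsyncRuntimeRequired);
    }
    let handle = thread::Builder::new()
        .name("batch-processor".to_string())
        .spawn(move || {
            // A real processor would loop over queued spans here.
            let _exporter = exporter;
        })
        .map_err(ProviderBuildError::ThreadSpawnFailed)?;
    Ok(BatchProcessor { handle })
}
```

An exporter that keeps the default requires_async_runtime() passes the check unchanged, which is why the detection is best-effort for third-party exporters.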

Build-time detection is preferred over simply documenting "don't use async clients" because:

  • The async reqwest::Client is what most Rust developers reach for first
  • The panic surfaces deep in the export path, often minutes after startup, making it hard to
    diagnose
  • Third-party HttpClient implementations can opt into the same safety net by overriding
    requires_async_runtime()
  • A clear ProviderBuildError::AsyncRuntimeRequired is better DX than a tokio reactor panic

Removed with_batch_exporter() convenience method — from both TracerProviderBuilder and
LoggerProviderBuilder. Users now build the batch processor explicitly and pass it via
with_span_processor() / with_log_processor(). This eliminates the design tension of having
a fallible operation hidden inside an infallible builder method (see design rationale below).

API changes (breaking)

| Before | After |
|---|---|
| TracerProviderBuilder::build() -> SdkTracerProvider | -> Result<SdkTracerProvider, ProviderBuildError> |
| LoggerProviderBuilder::build() -> SdkLoggerProvider | -> Result<SdkLoggerProvider, ProviderBuildError> |
| BatchSpanProcessorBuilder::build() -> BatchSpanProcessor | -> Result<BatchSpanProcessor, ProviderBuildError> |
| BatchLogProcessorBuilder::build() -> BatchLogProcessor | -> Result<BatchLogProcessor, ProviderBuildError> |
| TracerProviderBuilder::with_batch_exporter() | Removed; use with_span_processor() |
| LoggerProviderBuilder::with_batch_exporter() | Removed; use with_log_processor() |

ProviderBuildError is #[non_exhaustive] to allow future expansion (e.g. detecting additional
invalid client combinations as noted in #2690).
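
For callers, a #[non_exhaustive] error means match arms need a wildcard so that future variants do not break compilation. A minimal sketch follows, assuming the enum carries an io::Error in ThreadSpawnFailed (the payload is an assumption, not confirmed by the PR text) and using a hypothetical remediation_hint helper.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Assumed shape of the error; the variant payloads are illustrative.
#[derive(Debug)]
#[non_exhaustive]
pub enum ProviderBuildError {
    ThreadSpawnFailed(io::Error),
    AsyncRuntimeRequired,
}

impl fmt::Display for ProviderBuildError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::ThreadSpawnFailed(e) => write!(f, "failed to spawn background thread: {e}"),
            Self::AsyncRuntimeRequired => {
                write!(f, "exporter requires an async runtime, which the batch thread lacks")
            }
        }
    }
}

impl Error for ProviderBuildError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            Self::ThreadSpawnFailed(e) => Some(e),
            _ => None,
        }
    }
}

/// Hypothetical caller-side helper. Downstream crates must keep the
/// wildcard arm because the enum is non_exhaustive.
pub fn remediation_hint(err: &ProviderBuildError) -> &'static str {
    match err {
        ProviderBuildError::AsyncRuntimeRequired => "switch to a blocking HTTP client",
        _ => "see the error message and source for details",
    }
}
```

Inside the defining crate the wildcard is optional; it only becomes mandatory for code in other crates.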

Design rationale: removing with_batch_exporter()

with_batch_exporter() was a convenience wrapper that internally constructed a
BatchSpanProcessor — a fallible operation (thread spawn, async-runtime validation). This
created a design problem: how should a builder method that returns Self report errors?

Three approaches were considered:

  1. Deferred error (pending_error field) — Store the error in the builder and surface it
    at build(). This is what reqwest::ClientBuilder does. The problem: hidden state, and if
    with_batch_exporter() is called twice with two invalid exporters, only the last error is
    reported (the first is silently dropped).

  2. Factory closure (PendingProcessor enum) — Store a closure that captures the exporter
    and defer construction to build(). Correct and order-preserving, but novel — no major Rust
    crate uses this pattern — and adds complexity (closure boxing, manual Debug impls).

  3. Remove the convenience method entirely — Require users to build the batch processor
    explicitly via BatchSpanProcessor::builder(exporter).build()? and pass it to
    with_span_processor(). This follows the tracing-subscriber pattern (.with(layer) takes
    pre-built layers) and the broader Rust ecosystem consensus: builder setters are infallible,
    build() is the single fallible call site.

Option 3 was chosen because it matches the dominant pattern across the Rust ecosystem
(tracing-subscriber, axum, tower, tokio, rdkafka — all take pre-built components
rather than wrapping fallible construction in convenience methods). The two-line migration is
straightforward and the explicit processor construction gives users access to BatchConfigBuilder
for customization — something with_batch_exporter() did not expose.
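
The difference between option 1 and option 3 can be sketched with toy builders. Everything here is hypothetical (DeferredBuilder, ExplicitBuilder, build_processor, and the valid flag that simulates a bad exporter); it only illustrates how a single pending_error slot drops the first failure, while explicit construction reports each failure at its own call site.

```rust
#[derive(Debug, PartialEq)]
enum BuildError {
    InvalidExporter(&'static str),
}

/// Option 1: the builder stores one deferred error and reports it at build().
#[derive(Default)]
struct DeferredBuilder {
    processors: Vec<&'static str>,
    pending_error: Option<BuildError>,
}

impl DeferredBuilder {
    fn with_batch_exporter(mut self, name: &'static str, valid: bool) -> Self {
        if valid {
            self.processors.push(name);
        } else {
            // Overwrites any earlier error: only the last one survives.
            self.pending_error = Some(BuildError::InvalidExporter(name));
        }
        self
    }

    fn build(self) -> Result<Vec<&'static str>, BuildError> {
        match self.pending_error {
            Some(err) => Err(err),
            None => Ok(self.processors),
        }
    }
}

/// Option 3: processor construction is its own fallible step.
fn build_processor(name: &'static str, valid: bool) -> Result<&'static str, BuildError> {
    if valid {
        Ok(name)
    } else {
        Err(BuildError::InvalidExporter(name))
    }
}

#[derive(Default)]
struct ExplicitBuilder {
    processors: Vec<&'static str>,
}

impl ExplicitBuilder {
    /// Infallible setter: it only accepts an already-built processor.
    fn with_processor(mut self, processor: &'static str) -> Self {
        self.processors.push(processor);
        self
    }

    fn build(self) -> Result<Vec<&'static str>, BuildError> {
        Ok(self.processors)
    }
}

/// Each `?` surfaces its own error, so the first failure is the one reported.
fn explicit_setup() -> Result<Vec<&'static str>, BuildError> {
    let first = build_processor("first", false)?;
    let second = build_processor("second", false)?;
    ExplicitBuilder::default()
        .with_processor(first)
        .with_processor(second)
        .build()
}
```

With two invalid exporters, the deferred builder reports only "second", whereas the explicit version stops at "first".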

Provider build() infallibility note

Both TracerProviderBuilder::build() and LoggerProviderBuilder::build() are currently
infallible at the provider level — the Result return type is prophylactic for future
validation. The only errors today come from batch processor construction
(BatchSpanProcessor::builder().build() / BatchLogProcessor::builder().build()), which
happens before the provider build() call.

The Default impl for SdkTracerProvider calls .expect() on the build() result, which is safe
because the default builder has no batch processors.
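
A minimal sketch of why the .expect() in Default is acceptable, using hypothetical Provider and ProviderBuilder types rather than the SDK's own: build() is typed as fallible for future validation, but with no fallible components configured it always returns Ok today.

```rust
#[derive(Debug)]
#[non_exhaustive]
enum ProviderBuildError {
    // Placeholder variant so the sketch compiles; not the PR's definition.
    InvalidConfiguration(String),
}

#[derive(Debug, Default)]
struct ProviderBuilder {
    processors: Vec<String>,
}

#[derive(Debug)]
struct Provider {
    processors: Vec<String>,
}

impl ProviderBuilder {
    /// Infallible setter that takes an already-constructed processor (by name here).
    fn with_processor(mut self, name: &str) -> Self {
        self.processors.push(name.to_string());
        self
    }

    /// Fallible by signature only: there is no validation yet, so this is always Ok.
    fn build(self) -> Result<Provider, ProviderBuildError> {
        Ok(Provider {
            processors: self.processors,
        })
    }
}

impl Provider {
    fn builder() -> ProviderBuilder {
        ProviderBuilder::default()
    }
}

impl Default for Provider {
    fn default() -> Self {
        // The default builder configures nothing that can fail.
        Provider::builder()
            .build()
            .expect("default builder has no fallible components")
    }
}
```

If provider-level validation is added later, this .expect() is the spot that would need revisiting.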

Scope

This PR covers both TracerProvider and LoggerProvider in a single change so the two providers
stay consistent and the breaking change is delivered atomically.

Migration

// Before
let provider = SdkTracerProvider::builder()
    .with_batch_exporter(exporter)
    .build();

// After
let processor = BatchSpanProcessor::builder(exporter).build()?;
let provider = SdkTracerProvider::builder()
    .with_span_processor(processor)
    .build()?;

// Before (logs)
let provider = SdkLoggerProvider::builder()
    .with_batch_exporter(exporter)
    .build();

// After (logs)
let processor = BatchLogProcessor::builder(exporter).build()?;
let provider = SdkLoggerProvider::builder()
    .with_log_processor(processor)
    .build()?;

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

@lazureykis lazureykis force-pushed the feat/provider-build-result branch from 657f1a4 to 5832e63 Compare April 16, 2026 20:02
codecov Bot commented Apr 16, 2026

Codecov Report

❌ Patch coverage is 90.86957% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.3%. Comparing base (8e95e16) to head (6020391).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| opentelemetry-appender-tracing/src/layer.rs | 80.0% | 6 Missing ⚠️ |
| opentelemetry-http/src/lib.rs | 66.6% | 3 Missing ⚠️ |
| opentelemetry-otlp/src/exporter/http/logs.rs | 0.0% | 3 Missing ⚠️ |
| opentelemetry-otlp/src/exporter/http/trace.rs | 0.0% | 3 Missing ⚠️ |
| opentelemetry-sdk/src/logs/batch_log_processor.rs | 89.6% | 3 Missing ⚠️ |
| opentelemetry-sdk/src/trace/span_processor.rs | 89.6% | 3 Missing ⚠️ |
Additional details and impacted files
@@          Coverage Diff           @@
##            main   #3462    +/-   ##
======================================
  Coverage   83.2%   83.3%            
======================================
  Files        128     128            
  Lines      25086   25199   +113     
======================================
+ Hits       20896   21013   +117     
+ Misses      4190    4186     -4     

@lazureykis lazureykis force-pushed the feat/provider-build-result branch 3 times, most recently from 4f8a2fa to abbee3d Compare April 19, 2026 20:07
@lazureykis lazureykis marked this pull request as ready for review April 19, 2026 20:48
@lazureykis lazureykis requested a review from a team as a code owner April 19, 2026 20:48
TracerProviderBuilder::build() and LoggerProviderBuilder::build() now
return Result<_, ProviderBuildError> instead of panicking on invalid
configuration.

BatchSpanProcessor::builder().build() and BatchLogProcessor::builder().build()
now return Result<_, ProviderBuildError> when the background thread cannot be
spawned or the exporter requires an async runtime, instead of panicking.

Removes with_batch_exporter() convenience methods from provider builders.
Users now construct batch processors explicitly, which makes the fallible
step visible in their code and avoids hidden panics.

BREAKING CHANGES:
- TracerProviderBuilder::build() returns Result<SdkTracerProvider, ProviderBuildError>
- LoggerProviderBuilder::build() returns Result<SdkLoggerProvider, ProviderBuildError>
- BatchSpanProcessor::builder().build() returns Result<BatchSpanProcessor, ProviderBuildError>
- BatchLogProcessor::builder().build() returns Result<BatchLogProcessor, ProviderBuildError>
- Removed TracerProviderBuilder::with_batch_exporter()
- Removed LoggerProviderBuilder::with_batch_exporter()

Assisted-by: Claude Opus 4.6
@lazureykis lazureykis force-pushed the feat/provider-build-result branch from abbee3d to 6020391 Compare April 19, 2026 20:51
@scottgerring
Member

Hey @lazureykis thanks for opening this PR!
#3375 talks about doing this; as it's a breaking change and reasonably sprawling, we need to make sure the main maintainer @cijothomas is happy with the approach before moving forward. I think structured errors on this API make sense and the only controversy is likely to be the breaking-ness, but we're still pre-release so ...

Anyway, to that end, it would be great if you could add some text to that tracking issue about how you came across this and the misery it caused to help make it clear this needs addressing.

@lazureykis
Author

> Hey @lazureykis thanks for opening this PR! #3375 talks about doing this; as it's a breaking change and reasonably sprawling, we need to make sure the main maintainer @cijothomas is happy with the approach before moving forward. I think structured errors on this API make sense and the only controversy is likely to be the breaking-ness, but we're still pre-release so ...
>
> Anyway, to that end, it would be great if you could add some text to that tracking issue about how you came across this and the misery it caused to help make it clear this needs addressing.

Thanks Scott! I played with the API to eliminate the panic using 3 different approaches and this PR is what I ended up with. Curious what @cijothomas thinks.

@cijothomas
Member

https://github.com/open-telemetry/opentelemetry-rust#project-status
API and SDK are marked stable for Logs and Metrics, so no breaking changes allowed there. OTLP Exporter is the perfect candidate to apply this change to, and that is also where we see most of the issues this is trying to address.

@lazureykis
Author

Adding helpers to the OTLP crate would only cover OTLP users and the panic in BatchLogProcessor remains reachable through the SDK.

@cijothomas
Member

> Adding helpers to the OTLP crate would only cover OTLP users and the panic in BatchLogProcessor remains reachable through the SDK.

You are fully correct. However, my position remains the same: we cannot take a breaking change to stable components, so the OTLP Exporter builder is the only place we can address this now. I think that should address the majority of user feedback. Fixing BatchProcessor has to wait for v2.0, where we can make breaking changes.
