feat(coprocessor): standardize OTLP init + shutdown guard by Eikix · Pull Request #1942 · zama-ai/fhevm

Eikix · 2026-02-09T08:54:17Z

Summary

- unify coprocessor OTEL init and shutdown lifecycle in fhevm-engine-common::telemetry
- centralize sns-worker subscriber + optional OTEL layer wiring in common telemetry
- keep behavior minimal: global tracer provider only, no extra propagator config

Validation

- pre-commit passed (cargo fmt, cargo check, clippy)
- targeted check passed:
  SQLX_OFFLINE=true cargo check -p fhevm-engine-common -p sns-worker -p gw-listener -p host-listener -p tfhe-worker -p transaction-sender -p zkproof-worker
- telemetry unit tests pass in fhevm-engine-common

Scope

- no gRPC migration
- no operation-level span migration

Closes https://github.com/zama-ai/fhevm-internal/issues/1006

coprocessor/fhevm-engine/fhevm-engine-common/src/telemetry.rs

Eikix · 2026-02-09T10:08:23Z

@claude /pr-review
is a drop guard good practice or outdated?

claude · 2026-02-09T10:09:19Z

Claude finished @Eikix's task in 1m 17s

PR Review: feat(coprocessor): standardize OTLP init + shutdown guard

- Gather context and read changed files
- Analyze drop guard pattern usage
- Answer question about drop guard best practices
- Provide comprehensive PR review feedback

Question: Is a Drop Guard Good Practice or Outdated?

Short answer: Drop guards are absolutely still good practice in Rust, especially for your use case.

Why Drop Guards are the Right Choice Here

RAII Principle: Drop guards are the idiomatic Rust way to implement Resource Acquisition Is Initialization (RAII). Cleanup runs whenever the guard goes out of scope (though not if the process aborts, for example with panic = "abort" or std::process::exit), including when:
- The program panics and the stack unwinds
- Early returns occur
- Errors propagate with ?
OpenTelemetry's Flush Requirement: The SdkTracerProvider::shutdown() method flushes pending spans before shutting down. Without proper shutdown, you risk losing traces that haven't been exported yet. A drop guard makes this happen on every exit path that runs destructors.
Modern Rust Standard Library Uses This Pattern: Examples include:
- MutexGuard<T> - releases the lock on drop
- BufWriter<W> - attempts to flush its buffer on drop (errors are ignored)
- File - closes the file on drop (errors on close are ignored)
Note that JoinHandle<T> is not an example: dropping it detaches the thread rather than joining it.
Your Implementation is Clean: The Option::take() pattern in shutdown_once() is the correct way to ensure idempotent shutdown. It prevents a double shutdown if shutdown is called manually before drop.
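
To make the Option::take() point concrete, here is a minimal sketch of an idempotent shutdown guard. It is not the PR's code: FakeProvider is a hypothetical stand-in for SdkTracerProvider that only counts shutdown calls, and ProviderGuard and shutdown_calls_after_manual_and_drop are illustrative names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a tracer provider; counts shutdown calls.
pub struct FakeProvider {
    shutdowns: Arc<AtomicUsize>,
}

impl FakeProvider {
    pub fn shutdown(&self) -> Result<(), String> {
        self.shutdowns.fetch_add(1, Ordering::SeqCst);
        Ok(())
    }
}

/// Guard that shuts the provider down at most once.
pub struct ProviderGuard {
    provider: Option<FakeProvider>,
}

impl ProviderGuard {
    pub fn new(provider: FakeProvider) -> Self {
        Self { provider: Some(provider) }
    }

    /// `take()` leaves `None` behind, so a second call is a no-op.
    pub fn shutdown_once(&mut self) {
        if let Some(provider) = self.provider.take() {
            if let Err(err) = provider.shutdown() {
                eprintln!("failed to shut down provider: {err}");
            }
        }
    }
}

impl Drop for ProviderGuard {
    fn drop(&mut self) {
        self.shutdown_once();
    }
}

/// Calls `shutdown_once()` manually, lets the guard drop, and returns
/// how many times the provider's `shutdown()` actually ran.
pub fn shutdown_calls_after_manual_and_drop() -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    {
        let provider = FakeProvider { shutdowns: Arc::clone(&counter) };
        let mut guard = ProviderGuard::new(provider);
        guard.shutdown_once();
    } // `guard` is dropped here; `Drop` finds `None` and does nothing.
    counter.load(Ordering::SeqCst)
}
```

The manual call and the implicit drop together trigger exactly one shutdown, which is the property the unit tests in the PR are described as checking.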

When Drop Guards Might Be "Outdated"

Drop guards are mainly problematic in async contexts, where the destructor would need to perform async work: Drop::drop cannot .await. In your case, SdkTracerProvider::shutdown() is synchronous, so the guard can call it directly, although it may block the dropping thread while pending spans are flushed.

PR Review Summary

✅ What's Good

Unified OTLP Initialization API: All binaries now use the same init_otlp() or init_otlp_tracing() path, eliminating drift.
Idempotent Shutdown: The Option::take() pattern ensures shutdown_once() is safe to call multiple times.
Good Test Coverage: The unit tests verify the guard semantics (shutdown_once() disarms, tracer presence/absence).
Graceful Degradation: Empty service_name returns Ok(None) instead of failing, allowing opt-out of OTLP.

Consistent Error Handling Across Binaries: All binaries follow the same pattern:

let _otlp_runtime = match telemetry::init_otlp(&config.service_name) {
    Ok(runtime) => runtime,
    Err(err) => {
        error!(error = %err, "Failed to setup OTLP");
        None
    }
};

💡 Suggestions (Non-Blocking)

1. Consider Adding Timeout to Shutdown

telemetry.rs:35-40

The shutdown could hang if the OTLP collector is unresponsive. Consider wrapping the call in a timeout; the comment in the snippet marks where it would go:

fn shutdown_once(&mut self) {
    if let Some(provider) = self.provider.take() {
        // Consider: timeout around shutdown to prevent hanging on exit
        if let Err(err) = provider.shutdown() {
            warn!(error = %err, "Failed to shutdown OTLP tracer provider");
        }
    }
}
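
One way to bound a blocking call like this with only the standard library is to run it on a helper thread and wait on a channel. The sketch below is illustrative: run_with_timeout is a hypothetical helper, not part of the PR or of opentelemetry_sdk, and the SDK version in use may already bound shutdown on its own, which is worth checking first.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `op` on a helper thread and waits at most `timeout` for its result.
/// Returns `Some(result)` if `op` finished in time and `None` otherwise.
/// On timeout the helper thread is left detached and keeps running until
/// `op` returns or the process exits.
pub fn run_with_timeout<F, T>(op: F, timeout: Duration) -> Option<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the caller already timed out, the receiver is gone and
        // `send` fails; that error is deliberately ignored.
        let _ = tx.send(op());
    });
    rx.recv_timeout(timeout).ok()
}
```

Inside shutdown_once() this would mean moving the provider into the closure, for example run_with_timeout(move || provider.shutdown(), Duration::from_secs(5)), and logging a warning when the result is None. The trade-off is that a timed-out flush is abandoned rather than cancelled.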


2. Document Why `tracer()` Returns a Clone

telemetry.rs:67-69

pub fn tracer(&self) -> Option<opentelemetry_sdk::trace::Tracer> {
    self.tracer.clone()  // Why clone? Is Tracer expensive to clone?
}

This is probably fine since Tracer is likely cheap to clone, but a brief comment would help future readers understand the design choice.

3. Consider `#[must_use]` for `OtlpRuntime`

telemetry.rs:51-70

Adding #[must_use] would catch cases where callers accidentally discard the runtime:

#[must_use = "OtlpRuntime must be kept alive for the duration of the process to ensure proper shutdown"]
pub struct OtlpRuntime {
    // ...
}
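
A related pitfall is worth spelling out, since #[must_use] does not catch it: binding a guard to the bare pattern _ drops it immediately, while a named binding such as _otlp_runtime keeps it alive until the end of the scope. The sketch below records the order of events to show the difference; Guard, underscore_pattern, and named_binding are hypothetical names used only for illustration.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical guard that records when it is dropped.
#[must_use = "dropping the guard immediately runs its cleanup"]
pub struct Guard {
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Guard {
    fn drop(&mut self) {
        self.log.borrow_mut().push("guard dropped");
    }
}

/// `let _ = ...` does not bind the value, so the guard is dropped at the
/// end of the statement, before any work happens. No lint fires here.
pub fn underscore_pattern() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let _ = Guard { log: Rc::clone(&log) };
    log.borrow_mut().push("work");
    let events = log.borrow().clone();
    events
}

/// A named binding (even one starting with `_`) keeps the guard alive
/// until the end of the function, after the snapshot is taken.
pub fn named_binding() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let _guard = Guard { log: Rc::clone(&log) };
    log.borrow_mut().push("work");
    let events = log.borrow().clone();
    events
}
```

So #[must_use] helps with a bare init call whose result is discarded as a statement, but callers still need a named binding for the guard to outlive the work it protects.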


✅ Overall Assessment

This PR is well-designed and ready to merge. The drop guard pattern is appropriate, the implementation is clean, and it successfully standardizes OTLP initialization across all coprocessor binaries. The review comments from the self-review have been addressed properly (renaming shutdown_if_needed → shutdown_once, removing redundant public shutdown(), etc.).


mergify · 2026-02-09T11:00:16Z

🧪 CI Insights

Here's what we observed from your CI run for 60f4a1f.

🟢 All jobs passed!

But CI Insights is watching 👀

Eikix · 2026-02-09T15:55:36Z

@claude /pr-review

claude · 2026-02-09T15:56:04Z

Claude finished @Eikix's task in 5m 47s

PR Review: feat(coprocessor): standardize OTLP init + shutdown guard

- Pre-review validation
- Gather context (PR summary + guidelines)
- Run parallel review agents (bug hunter, guideline compliance, error handling)
- Validate high-confidence issues
- Post final review

Summary

This PR successfully standardizes OTLP initialization across the coprocessor binaries with a clean TracerProviderGuard pattern that ensures proper shutdown. The implementation is sound and follows OpenTelemetry best practices.

Verdict: ✅ Ready to merge with one minor consistency fix suggested below.

What's Good

Clean Drop Guard Pattern: The TracerProviderGuard using Option::take() ensures idempotent shutdown. It is safe to call multiple times, and cleanup also runs on drop.
Graceful Degradation: init_json_subscriber still initializes JSON logs when OTLP export setup fails, so logging keeps working without tracing.
Unified API: All binaries now follow the same initialization pattern, eliminating drift.
Good Test Coverage: Unit tests verify guard semantics (shutdown_once disarms, tracer presence/absence).
Correct Clone Semantics: SdkTracerProvider uses Arc internally, so cloning shares state - this is intentional and ensures shutdown affects all references.
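
The clone semantics described in the last point can be illustrated with a small sketch. SharedProvider is a hypothetical type, not the SDK's SdkTracerProvider; it only mirrors the idea that clones share one Arc-backed state, so shutting down through one handle is visible through every other.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical provider whose clones share one shutdown flag.
#[derive(Clone, Default)]
pub struct SharedProvider {
    is_shutdown: Arc<AtomicBool>,
}

impl SharedProvider {
    pub fn new() -> Self {
        Self::default()
    }

    /// Marks the shared state as shut down for every clone.
    pub fn shutdown(&self) {
        self.is_shutdown.store(true, Ordering::SeqCst);
    }

    pub fn is_shutdown(&self) -> bool {
        self.is_shutdown.load(Ordering::SeqCst)
    }
}

/// Shuts down through one handle and reports what a clone observes.
pub fn clone_observes_shutdown() -> bool {
    let original = SharedProvider::new();
    let clone = original.clone();
    original.shutdown();
    clone.is_shutdown()
}
```

This is why a guard holding one clone can shut down tracing for handles held elsewhere, such as one registered as the global provider.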

Issues Found

🟡 Minor: Inconsistent Log Level for OTLP Failure

Confidence: 95%

File: coprocessor/fhevm-engine/host-listener/src/poller/mod.rs:96

The poller uses warn! while all other services use error! for OTLP initialization failures:

// poller/mod.rs:96 - uses warn!
warn!(error = %err, "Failed to setup OTLP");

// All other services use error!
error!(error = %err, "Failed to setup OTLP");

Impact: Operators monitoring for ERROR-level logs may miss OTLP failures in the poller service. Since OTLP failure means no distributed tracing, this should be treated consistently across all services.


Notes

Drop Guard Pattern - Answering the Original Question

Q: Is a drop guard good practice or outdated?

A: Drop guards are absolutely still good practice, especially for this use case:

RAII Principle: This is idiomatic Rust for resource cleanup that must happen even during unwinding panics or early returns.
Standard Library Uses This: MutexGuard<T> releases its lock on drop and File closes its handle on drop. (JoinHandle<T> is not an example: dropping it detaches the thread rather than joining it.)
OpenTelemetry Requirement: SdkTracerProvider::shutdown() flushes pending spans. Without proper shutdown, traces can be lost.
Your Implementation is Correct: The Option::take() pattern prevents double-shutdown issues.

Drop guards are mainly problematic for async cleanup, because Drop::drop cannot .await. SdkTracerProvider::shutdown() is synchronous, so this is not an issue here, though the call may block while spans flush. Destructors also do not run if the process aborts or calls std::process::exit.


coprocessor/fhevm-engine/fhevm-engine-common/src/telemetry.rs

Eikix · 2026-02-10T12:10:06Z

@mergify queue

mergify · 2026-02-10T12:10:27Z

feat(coprocessor): standardize otlp init with shutdown guard

eed3f9b

Eikix requested a review from a team as a code owner February 9, 2026 08:54

cla-bot bot added the cla-signed label Feb 9, 2026

Eikix left 5 review comments on coprocessor/fhevm-engine/fhevm-engine-common/src/telemetry.rs on Feb 9, 2026 (all outdated and resolved)

Eikix added 5 commits February 9, 2026 10:03

refactor(coprocessor): clarify otlp helper naming

44d9824

refactor(coprocessor): remove non-shutdown otlp setup path

28a941a

refactor(coprocessor): simplify OTLP init runtime API

29c65ea

refactor(coprocessor): make OTLP runtime init explicit and tested

8649de2

refactor(coprocessor): drop unused OTEL propagator setup

b140ece

Eikix added 3 commits February 9, 2026 11:38

refactor(coprocessor): align OTEL runtime naming

aabb2e9

refactor(coprocessor): collapse OTEL runtime to single guard

c624d37

refactor(coprocessor): decouple tracer from OTEL guard

4ec1f47

Eikix marked this pull request as draft February 9, 2026 11:01

Eikix added 9 commits February 9, 2026 12:02

docs(coprocessor): note transitional tracer return

0e3cb83

refactor(coprocessor): centralize sns-worker subscriber OTEL wiring

b97b1e8

refactor(coprocessor): drop migration TODO note

cb7e9a3

refactor(coprocessor): simplify otel setup and guard naming

4b5fb67

refactor(coprocessor): unify json subscriber init path

52f4378

refactor(coprocessor): keep one json subscriber init API

b01b2df

refactor(coprocessor): remove expect from otel subscriber init

0a0a9cd

refactor(coprocessor): split otel setup paths by use case

f0f77c2

refactor(coprocessor): inline otel tracer-provider setup

eb3c8bf

Eikix marked this pull request as ready for review February 9, 2026 15:04

Eikix added 2 commits February 9, 2026 16:13

test(coprocessor): stabilize host-listener restart reorg test

4c1bf91

fix(coprocessor): keep logs when OTLP init fails

900b416

fix(coprocessor): use error level for poller OTLP init failure

6919a91

obatirou reviewed Feb 10, 2026, leaving 2 comments on coprocessor/fhevm-engine/fhevm-engine-common/src/telemetry.rs (both outdated and resolved)

refactor(coprocessor): inline trivial OTEL helpers

60f4a1f

obatirou approved these changes Feb 10, 2026


mergify bot added the queued label Feb 10, 2026

mergify bot added a commit that referenced this pull request Feb 10, 2026

Merge of #1942

bb8427d

mergify bot mentioned this pull request Feb 10, 2026

merge queue: embarking main (22f7d0f) and #1942 together #1955

Closed

59 tasks

mergify bot merged commit 4dafed3 into main Feb 10, 2026
72 of 76 checks passed

mergify bot deleted the codex/coprocessor-otel-phase-a-main branch February 10, 2026 14:47

mergify bot removed the queued label Feb 10, 2026

mergify bot added a commit that referenced this pull request Feb 10, 2026

Merge of #1942

7169c3f

mergify[bot] merged 22 commits into main from codex/coprocessor-otel-phase-a-main
