feat(coprocessor): otel cardinality cleanup, dfg pattern attribution, spanmetrics by Eikix · Pull Request #2019 · zama-ai/fhevm

Eikix · 2026-02-23T10:45:05Z

Summary

This PR finishes the coprocessor OTEL cleanup in four parts:

- Low-cardinality tracing cleanup across host-listener, tfhe-worker, sns-worker, zkproof-worker, and transaction-sender
- Scheduler pattern attribution via bounded operation_pattern_id and transaction_pattern_id span attributes
- Spanmetrics plumbing via an OTEL Collector + Prometheus pipeline for the local stack
- Pattern integration test hardening so span assertions remain reliable under the shared global subscriber used by the test suite

Closes zama-ai/fhevm-internal#1044, zama-ai/fhevm-internal#1013, zama-ai/fhevm-internal#1012.

What changed

1. Coprocessor span cardinality cleanup

- Removes high-cardinality child-span identifiers such as handle, txn_id, request_id, and similar per-item fields from runtime spans
- Keeps transaction-scoped identifiers on root/parent spans where they remain useful for debugging
- Replaces a few parent: &span call sites with span.in_scope() / equivalent scoped flow so parent-child relationships stay correct
- Propagates span context correctly across spawned transaction tasks
- Updates the affected SQLx metadata under coprocessor/fhevm-engine/.sqlx
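
To illustrate why per-item identifiers matter once spans feed metrics, here is a minimal std-only sketch. It is not code from this PR: `SpanRecord`, `span`, `series_count`, and `strip_keys` are hypothetical names, and the attribute values are made up. It counts distinct (span name, attribute set) combinations, which roughly approximates how many time series a span-derived metric would produce.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical stand-in for an exported span: a name plus string attributes.
#[derive(Clone, Debug)]
pub struct SpanRecord {
    pub name: String,
    pub attrs: BTreeMap<String, String>,
}

/// Convenience constructor for the sketch.
pub fn span(name: &str, attrs: &[(&str, &str)]) -> SpanRecord {
    SpanRecord {
        name: name.to_string(),
        attrs: attrs
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect(),
    }
}

/// Number of distinct (name, attribute set) combinations.
pub fn series_count(spans: &[SpanRecord]) -> usize {
    spans
        .iter()
        .map(|s| (s.name.clone(), s.attrs.clone()))
        .collect::<HashSet<_>>()
        .len()
}

/// Returns copies of the spans with the given attribute keys removed.
pub fn strip_keys(spans: &[SpanRecord], keys: &[&str]) -> Vec<SpanRecord> {
    spans
        .iter()
        .map(|s| {
            let mut s = s.clone();
            s.attrs.retain(|k, _| !keys.contains(&k.as_str()));
            s
        })
        .collect()
}
```

With three `compute` spans that differ only in `handle`, `series_count` returns 3; after `strip_keys(&spans, &["handle"])` it returns 1, since the count no longer grows with the number of handles.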

Main files:

coprocessor/fhevm-engine/host-listener/src/database/tfhe_event_propagate.rs
coprocessor/fhevm-engine/tfhe-worker/src/tfhe_worker.rs
coprocessor/fhevm-engine/sns-worker/src/aws_upload.rs
coprocessor/fhevm-engine/sns-worker/src/executor.rs
coprocessor/fhevm-engine/sns-worker/src/squash_noise.rs
coprocessor/fhevm-engine/transaction-sender/src/ops/*.rs
coprocessor/fhevm-engine/zkproof-worker/src/verifier.rs

2. Scheduler pattern attribution

- Adds deterministic operation_pattern_id and transaction_pattern_id attribution in the scheduler
- Computes pattern IDs from logical-operation grouping on the pre-partition transaction graph
- Uses a bounded encoding strategy so span attributes stay low-cardinality and size-safe
- Splits the pattern implementation into focused modules:
  - scheduler/src/dfg/pattern/encoding.rs
  - scheduler/src/dfg/pattern/grouping.rs
  - scheduler/src/dfg/pattern/types.rs
  - scheduler/src/dfg/pattern/tests.rs
- Updates scheduler execution spans in scheduler/src/dfg/scheduler.rs to emit the new pattern attributes while keeping transaction_id on the transaction root span
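
The bounded encoding idea can be sketched as a two-tier scheme: small patterns are encoded compactly and reversibly, larger ones collapse to a fixed-size digest. The sketch below is an illustration under assumptions, not the layout in encoding.rs: the `v1:`/`v2:` string prefixes, `MAX_COMPACT_OPS`, and all function names are hypothetical, and FNV-1a stands in for the Keccak hash mentioned in review so that the example stays dependency-free.

```rust
/// Hypothetical size limit for the compact form; the real threshold is not
/// stated in the PR text.
pub const MAX_COMPACT_OPS: usize = 16;

/// FNV-1a 64-bit, used only as a dependency-free stand-in for Keccak.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Lowercase hex encoding of a byte slice.
fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Encodes a sequence of opcode bytes into a pattern ID of bounded length:
/// at most 3 + 2 * MAX_COMPACT_OPS characters for v1, exactly 19 for v2.
pub fn pattern_id(opcodes: &[u8]) -> String {
    if opcodes.len() <= MAX_COMPACT_OPS {
        format!("v1:{}", hex(opcodes))
    } else {
        format!("v2:{:016x}", fnv1a64(opcodes))
    }
}

/// Decodes a compact (v1) ID back into opcodes. Hashed (v2) IDs are not
/// reversible, so they yield `None`.
pub fn decode_v1(id: &str) -> Option<Vec<u8>> {
    let body = id.strip_prefix("v1:")?;
    if body.len() % 2 != 0 || !body.is_ascii() {
        return None;
    }
    (0..body.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&body[i..i + 2], 16).ok())
        .collect()
}
```

The property that matters for span attributes is that the output length is capped regardless of how many operations a group contains, while identical inputs always map to the same ID.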

Main files:

coprocessor/fhevm-engine/scheduler/src/dfg.rs
coprocessor/fhevm-engine/scheduler/src/dfg/scheduler.rs
coprocessor/fhevm-engine/scheduler/src/dfg/pattern/*

3. Local spanmetrics pipeline

- Adds OTEL Collector config with a spanmetrics connector
- Wires the collector into the local tracing docker-compose stack
- Updates Prometheus scrape config for the collector/spanmetrics path

Main files:

test-suite/fhevm/config/otel-collector/otel-collector-config.yaml
test-suite/fhevm/docker-compose/tracing-docker-compose.yml
test-suite/fhevm/config/prometheus/prometheus.yml
.gitignore

4. tfhe-worker pattern integration test hardening

- Adds the end-to-end pattern_integration test module for realistic encrypted-transfer shapes
- Installs a shared in-memory exporter in tfhe-worker test setup so tracing setup is no longer order-dependent
- Filters captured spans by the real transaction root transaction_id emitted on execute_transaction, so the assertions ignore cross-test span pollution without serializing the suite
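
The filtering step could look roughly like the following std-only sketch. It is not the code in pattern_integration.rs: `CapturedSpan`, `make_span`, and `spans_for_transaction` are hypothetical, and the sketch assumes that child spans share a trace ID with their `execute_transaction` root, since the children no longer carry `transaction_id` themselves.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified view of a span captured by an in-memory exporter.
#[derive(Clone, Debug, PartialEq)]
pub struct CapturedSpan {
    pub trace_id: u128,
    pub name: String,
    pub attrs: HashMap<String, String>,
}

/// Convenience constructor for the sketch.
pub fn make_span(trace_id: u128, name: &str, attrs: &[(&str, &str)]) -> CapturedSpan {
    CapturedSpan {
        trace_id,
        name: name.to_string(),
        attrs: attrs
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect(),
    }
}

/// Keeps only spans that belong to a trace whose `execute_transaction` root
/// carries the requested `transaction_id`. Spans emitted by other tests into
/// the same shared exporter are dropped.
pub fn spans_for_transaction<'a>(
    all: &'a [CapturedSpan],
    transaction_id: &str,
) -> Vec<&'a CapturedSpan> {
    // Step 1: find the traces rooted at a matching transaction span.
    let traces: HashSet<u128> = all
        .iter()
        .filter(|s| {
            s.name == "execute_transaction"
                && s.attrs.get("transaction_id").map(String::as_str) == Some(transaction_id)
        })
        .map(|s| s.trace_id)
        .collect();
    // Step 2: keep every span in those traces, root and children alike.
    all.iter().filter(|s| traces.contains(&s.trace_id)).collect()
}
```

Because the filter keys on a value unique to each test's transaction, tests can keep running in parallel against one exporter instead of being serialized.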

Main files:

coprocessor/fhevm-engine/tfhe-worker/src/tests/pattern_integration.rs
coprocessor/fhevm-engine/tfhe-worker/src/tests/utils.rs
coprocessor/fhevm-engine/tfhe-worker/src/tests/mod.rs

Follow-up

Log/trace correlation was split out to #2055 to keep this PR focused on span attribution and spanmetrics.

Commit map

feat(telemetry): add JSON log-trace correlation
refactor(coprocessor): standardize low-cardinality span attribution
feat(infra): add OTEL collector spanmetrics pipeline
feat(scheduler): add DFG pattern attribution with scalable encoding
fix(deps): pin tfhe-cuda-backend to 0.13.0 in lockfile
test(tfhe-worker): harden pattern integration tracing assertions
fix(tfhe-worker): align compressed ciphertext plumbing with scheduler
refactor(coprocessor): keep tx ids on root spans only

Validation

- Pre-commit cargo check and clippy passed while rewriting the branch history
- SQLX_OFFLINE=true cargo check -p fhevm-engine-common -p scheduler -p tfhe-worker -p transaction-sender -p sns-worker -p zkproof-worker -p host-listener --tests --quiet
- cargo test -p scheduler pattern::tests --quiet
- SQLX_OFFLINE=true cargo test -p tfhe-worker test_erc20_transaction_pattern_ids --no-run

Review notes

The branch was history-rewritten into a smaller themed stack; the final tree matches the pre-rewrite PR head plus the follow-up scope reduction that moved log/trace correlation into a separate issue.

mergify · 2026-02-23T12:55:29Z

🧪 CI Insights

Here's what we observed from your CI run for e9579f1.

🟢 All jobs passed!

But CI Insights is watching 👀

coprocessor/fhevm-engine/sns-worker/src/squash_noise.rs

claude · 2026-03-17T15:37:00Z

Claude finished @PanGan21's request.

claude

[AUTOMATED]

Code Review Summary

This is a well-structured PR that accomplishes four distinct goals: OTEL cardinality cleanup, DFG pattern attribution, spanmetrics plumbing, and test hardening. The code quality is high — the pattern encoding module is well-documented, the union-find implementation is correct, the binary layout is sound, and the tracing refactors (replacing parent: &span with span.in_scope() and .instrument()) are semantically correct.

What was reviewed

| Agent | Focus | Findings |
| --- | --- | --- |
| Bug Hunter ×2 (opus) | Runtime bugs, security | No high-confidence bugs found |
| Error Handling Auditor | Silent failures | 1 minor observability nit (see inline) |
| Guideline Compliance | Project conventions | No violations; new code follows established patterns |

Highlights

Pattern encoding (encoding.rs): Clean binary layout with proper bounds checking. Encode/decode round-trips are verified by extensive tests. The two-tier strategy (v1 compact encoding → v2 Keccak hash for large groups) is well-designed.
Union-find grouping (grouping.rs): Correct path-splitting implementation. The cone-tracing and merge algorithm is well-documented with worked examples.
Tracing refactors: The parent: &span → span.in_scope() / .instrument() changes are semantically correct and properly propagate span context across async boundaries.
Test coverage: Comprehensive unit tests for pattern encoding/decoding, plus an end-to-end integration test (pattern_integration.rs) that validates real encrypted-transfer shapes.
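
For readers unfamiliar with the path-splitting variant mentioned above, here is a minimal generic union-find sketch. It is not the implementation in grouping.rs; the `UnionFind` type and the choice of keeping the smaller index as root (for deterministic results) are assumptions made for illustration.

```rust
/// Minimal disjoint-set structure over indices 0..n.
pub struct UnionFind {
    parent: Vec<usize>,
}

impl UnionFind {
    /// Every element starts as its own root.
    pub fn new(n: usize) -> Self {
        Self {
            parent: (0..n).collect(),
        }
    }

    /// Finds the root of `x` using path splitting: every node visited on the
    /// way up is re-pointed at its grandparent, and the walk continues from
    /// the node's original parent.
    pub fn find(&mut self, mut x: usize) -> usize {
        while self.parent[x] != x {
            let next = self.parent[x];
            self.parent[x] = self.parent[next];
            x = next;
        }
        x
    }

    /// Merges the sets containing `a` and `b`. The smaller root index wins,
    /// which keeps the result independent of call order (an assumption of
    /// this sketch, chosen for determinism).
    pub fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra != rb {
            let (lo, hi) = if ra < rb { (ra, rb) } else { (rb, ra) };
            self.parent[hi] = lo;
        }
    }
}
```

Path splitting shortens chains in a single pass without recursion, so repeated `find` calls on the same group get progressively cheaper.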

One minor observability nit is noted inline. Overall this is a solid, well-tested change.

coprocessor/fhevm-engine/scheduler/src/dfg/pattern/grouping.rs

claude · 2026-03-17T15:46:52Z

The review has been posted successfully. Here's a summary of what was done:

Review Complete ✅

PR #2019: feat(coprocessor): otel cardinality cleanup, dfg pattern attribution, spanmetrics

Review Process

4 specialized agents ran in parallel (2× Bug Hunter opus, Error Handling Auditor sonnet, Guideline Compliance sonnet)
3 validation agents independently verified flagged issues
Strict confidence threshold (≥80) applied, with validator cross-checks

Results

| Check | Result |
| --- | --- |
| Bugs & Security (2× opus) | ✅ No high-confidence bugs found |
| Error Handling | ✅ 1 minor nit (inline comment posted) |
| Guideline Compliance | ✅ No violations; code follows established patterns |

Filtered Issues

~~Encode/decode opcode enum asymmetry~~ → False positive (enums have identical discriminant values)
~~Mutex poisoning risk~~ → False positive (lock scope contains only non-panicking operations)
Silent toposort failure → Posted as non-blocking nit (missing warn! log for consistency with scheduler.rs)

Posted

1 summary review comment with findings table
1 inline nit on grouping.rs:118 suggesting a warn! log for the silent toposort fallback
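
The nit concerns a fallback that happens without any log line. A rough std-only sketch of the suggested shape follows. Everything here is hypothetical: the function names, the use of Kahn's algorithm, the choice of insertion order as the fallback, and `eprintln!` standing in for a `warn!` macro are all assumptions, since the actual code at grouping.rs:118 is not shown in this thread.

```rust
use std::collections::VecDeque;

/// Kahn's algorithm over nodes 0..n. Returns `None` if the graph has a cycle.
/// Edge endpoints must be smaller than `n`; out-of-range indices panic.
pub fn toposort(n: usize, edges: &[(usize, usize)]) -> Option<Vec<usize>> {
    let mut indegree = vec![0usize; n];
    let mut adj: Vec<Vec<usize>> = vec![Vec::new(); n];
    for &(from, to) in edges {
        adj[from].push(to);
        indegree[to] += 1;
    }
    // Start from every node that has no incoming edge.
    let mut queue: VecDeque<usize> = (0..n).filter(|&i| indegree[i] == 0).collect();
    let mut order = Vec::with_capacity(n);
    while let Some(node) = queue.pop_front() {
        order.push(node);
        for &next in &adj[node] {
            indegree[next] -= 1;
            if indegree[next] == 0 {
                queue.push_back(next);
            }
        }
    }
    // If some nodes were never released, they sit on a cycle.
    (order.len() == n).then_some(order)
}

/// Returns a topological order, or falls back to insertion order while
/// making the fallback visible instead of silent.
pub fn order_or_fallback(n: usize, edges: &[(usize, usize)]) -> Vec<usize> {
    toposort(n, edges).unwrap_or_else(|| {
        // Stand-in for a `warn!` call in the real code base.
        eprintln!("warn: toposort failed (cycle?), falling back to insertion order");
        (0..n).collect()
    })
}
```

The behavior is unchanged by the log line; it only ensures an unexpected cycle leaves a trace for whoever debugs a surprising pattern ID later.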

coprocessor/fhevm-engine/scheduler/src/dfg/pattern/encoding.rs

antoniupop

Looks good - and very well tested. Pattern encoding looks nice!

Eikix · 2026-03-23T18:29:51Z

@Mergifyio queue

mergify · 2026-03-23T18:30:07Z

Eikix · 2026-03-25T10:06:21Z

CI seems to pass now.

cla-bot bot added the cla-signed label Feb 23, 2026

Eikix force-pushed the otel-metrics-improvement branch from ceb3dff to a4f023f Compare February 23, 2026 10:49

Eikix changed the title ~~feat(coprocessor): OTEL cardinality cleanup, DFG pattern attribution, spanmetrics~~ feat(coprocessor): otel cardinality cleanup, dfg pattern attribution, spanmetrics Feb 23, 2026

Eikix force-pushed the otel-metrics-improvement branch 11 times, most recently from 762d58e to 4d407b1 Compare February 25, 2026 13:02

Eikix marked this pull request as ready for review February 25, 2026 13:42

Eikix requested review from a team as code owners February 25, 2026 13:42

Eikix force-pushed the otel-metrics-improvement branch from c750e92 to 5f4e105 Compare February 25, 2026 15:01

Eikix commented Feb 25, 2026

coprocessor/fhevm-engine/sns-worker/src/squash_noise.rs (resolved)

Eikix force-pushed the otel-metrics-improvement branch 2 times, most recently from 0e498f2 to 256b446 Compare February 26, 2026 15:57

Eikix marked this pull request as draft February 27, 2026 07:51

Eikix marked this pull request as ready for review February 27, 2026 08:40

Eikix force-pushed the otel-metrics-improvement branch from 011fb1d to 81ed9e8 Compare March 4, 2026 13:08

This comment was marked as resolved.

zama-ai deleted a comment from claude bot Mar 5, 2026

Eikix force-pushed the otel-metrics-improvement branch 2 times, most recently from fe28ad9 to efbcbd1 Compare March 6, 2026 10:04

PanGan21 force-pushed the otel-metrics-improvement branch from 82f366a to d0b9ad0 Compare March 17, 2026 15:41

claude bot reviewed Mar 17, 2026

coprocessor/fhevm-engine/scheduler/src/dfg/pattern/grouping.rs (resolved)

PanGan21 force-pushed the otel-metrics-improvement branch 2 times, most recently from 68a953c to e9579f1 Compare March 18, 2026 10:20

Eikix commented Mar 23, 2026

coprocessor/fhevm-engine/scheduler/src/dfg/pattern/encoding.rs (outdated, resolved)

antoniupop previously approved these changes Mar 23, 2026

Eikix dismissed antoniupop’s stale review via b369da0 March 24, 2026 09:49

Eikix force-pushed the otel-metrics-improvement branch from ed1ef45 to b369da0 Compare March 24, 2026 09:49

Eikix and others added 16 commits March 27, 2026 09:25

feat(telemetry): add JSON log-trace correlation

bd7933f

refactor(coprocessor): standardize low-cardinality span attribution

a4f7f4e

feat(infra): add OTEL collector spanmetrics pipeline

0d33868

feat(scheduler): add DFG pattern attribution with scalable encoding

e4e2c2e

fix(deps): pin tfhe-cuda-backend to 0.13.0 in lockfile

1d55133

test(tfhe-worker): harden pattern integration tracing assertions

b979dfc

fix(tfhe-worker): align compressed ciphertext plumbing with scheduler

330e7b0

refactor(coprocessor): keep tx ids on root spans only

2e21758

fix(fhevm-cli): route local OTLP through collector

045ba5c

fix(coprocessor): move high-cardinality ids from child spans to log events and fix span correlation gap

3611f29

refactor(scheduler): clarify pattern tag naming

2a71654

fix(deps): resolve coprocessor audit advisories

39b9aa0

fix(zkproof-worker): restore low-cardinality verify spans

6e1fb25

fix(deps): sync tfhe-cuda-backend lockfile

3bf8036

fix(deps): drop stale OpenSSL license allowlist

0fe521e

fix(deps): minimize coprocessor lockfile churn

4519d5f

Eikix force-pushed the otel-metrics-improvement branch from f8bf5f8 to 4519d5f Compare March 27, 2026 08:27

Eikix wants to merge 16 commits into main from otel-metrics-improvement

3 participants
