
feat(coprocessor): slow lane for dependent ops #1907

Merged
mergify[bot] merged 77 commits into main from codex/slow-lane-throttle
Feb 16, 2026

Conversation

@Eikix
Contributor

@Eikix Eikix commented Feb 4, 2026

What

  • Add binary scheduling priority on dependence_chain: 0 (fast) / 1 (slow).
  • Classify slow chains in host-listener ingest using unweighted dependent-op count (+1 per newly inserted, allowed TFHE op with dependencies) per chain per ingest pass.
  • Add --dependent-ops-max-per-chain (0 disables slow-lane classification).
  • Persist monotonic priority with GREATEST(existing, incoming) so concurrent host-listener writes cannot downgrade a chain.
  • Keep worker fast-first ordering in normal path (schedule_priority, then last_updated_at).
  • Keep oldest-first fallback (acquire_early_lock) as liveness escape hatch when no progress is possible.
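A minimal Rust sketch of the priority model described above, assuming the `schedule_priority` values from the PR text; the `merge` helper mirrors the SQL `GREATEST(existing, incoming)` behavior, so a slow mark is sticky:

```rust
/// Illustrative type, not the actual code: 0 = fast, 1 = slow.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum SchedulePriority {
    Fast = 0, // default lane
    Slow = 1, // throttled lane
}

impl SchedulePriority {
    // Mirrors GREATEST(existing, incoming): concurrent writers can only escalate.
    fn merge(existing: Self, incoming: Self) -> Self {
        existing.max(incoming)
    }
}

fn main() {
    // A later Fast write cannot downgrade an already-slow chain.
    assert_eq!(
        SchedulePriority::merge(SchedulePriority::Slow, SchedulePriority::Fast),
        SchedulePriority::Slow
    );
    assert_eq!(
        SchedulePriority::merge(SchedulePriority::Fast, SchedulePriority::Fast),
        SchedulePriority::Fast
    );
}
```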

Inheritance model

  • Add ingest-only parent metadata (inheritance_parents) to improve slow-lane inheritance without changing scheduling behavior.
  • Scheduling still uses dependencies (no-fork parallelism behavior unchanged).
  • Slow-lane inheritance now uses inheritance_parents so parallel splits are less likely to drop lineage for throttling decisions.
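The inheritance rule can be sketched as a breadth-first propagation over the in-batch graph; the `children` map (dependents keyed by the parent they list in `inheritance_parents`) and the function name are illustrative, not the actual code:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Propagate "slow" transitively through the batch's dependency graph.
/// `children` maps a chain id to the chains that list it as an inheritance parent.
fn propagate_slow(
    initially_slow: &HashSet<u64>,
    children: &HashMap<u64, Vec<u64>>,
) -> HashSet<u64> {
    let mut slow: HashSet<u64> = initially_slow.clone();
    let mut queue: VecDeque<u64> = initially_slow.iter().copied().collect();
    while let Some(chain) = queue.pop_front() {
        for &child in children.get(&chain).into_iter().flatten() {
            if slow.insert(child) {
                queue.push_back(child); // newly marked: keep walking transitively
            }
        }
    }
    slow
}

fn main() {
    // 1 -> 2 -> 3: marking 1 slow drags the whole lineage along.
    let mut children = HashMap::new();
    children.insert(1u64, vec![2u64]);
    children.insert(2u64, vec![3u64]);
    let slow0: HashSet<u64> = [1u64].into_iter().collect();
    let slow = propagate_slow(&slow0, &children);
    assert!(slow.contains(&1) && slow.contains(&2) && slow.contains(&3));
}
```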

Off mode (--dependent-ops-max-per-chain=0)

  • No new slow-lane decisions are made.
  • Startup promotes chains back to fast (schedule_priority=0) using advisory-lock serialized, batched updates.
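A hedged sketch of the bounded promotion pass; the chunking helper and batch size are assumptions, and the real code additionally serializes the batched updates behind a Postgres advisory lock:

```rust
/// Split the chains pending promotion into bounded batches, so each
/// startup update touches at most `batch_size` rows at a time.
fn promotion_batches(slow_chain_ids: &[u64], batch_size: usize) -> Vec<Vec<u64>> {
    slow_chain_ids
        .chunks(batch_size.max(1)) // guard against a zero batch size
        .map(|c| c.to_vec())
        .collect()
}

fn main() {
    let batches = promotion_batches(&[1, 2, 3, 4, 5], 2);
    assert_eq!(batches.len(), 3);
    assert_eq!(batches[2], vec![5]);
}
```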

Why

  • Isolate heavy dependent chains from normal traffic without dropping data.
  • Avoid backoff-style priority churn/inversions.
  • Improve protection against parent-slow/child-fast mismatches when chain lineage is partially split for parallelism.

How

  • DB migration adds schedule_priority and aligns pending-chain index with worker acquisition order.
  • Ingest computes per-chain dependent-op totals and marks over-cap chains as slow.
  • Ingest also:
    • inherits slow from known slow parents in DB,
    • propagates transitively to dependents in the current batch graph.
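The ingest-side classification above can be sketched as follows; the struct and field names are illustrative, and `cap == 0` reproduces the off-mode behavior:

```rust
use std::collections::HashMap;

/// Illustrative shape of one ingested op: it counts toward the chain's
/// total only if it is newly inserted, allowed, and has dependencies.
struct IngestOp {
    chain_id: u64,
    newly_inserted: bool,
    allowed: bool,
    has_dependencies: bool,
}

/// Return the chains whose dependent-op count exceeds the cap this pass.
fn over_cap_chains(ops: &[IngestOp], cap: u64) -> Vec<u64> {
    if cap == 0 {
        return Vec::new(); // off mode: never mark a chain slow
    }
    let mut counts: HashMap<u64, u64> = HashMap::new();
    for op in ops {
        if op.newly_inserted && op.allowed && op.has_dependencies {
            *counts.entry(op.chain_id).or_insert(0) += 1; // unweighted: +1 per op
        }
    }
    let mut slow: Vec<u64> = counts
        .into_iter()
        .filter(|&(_, n)| n > cap)
        .map(|(id, _)| id)
        .collect();
    slow.sort_unstable();
    slow
}

fn main() {
    let op = |chain_id, newly_inserted| IngestOp {
        chain_id,
        newly_inserted,
        allowed: true,
        has_dependencies: true,
    };
    let ops = vec![op(7, true), op(7, true), op(7, true), op(9, false)];
    assert_eq!(over_cap_chains(&ops, 2), vec![7]); // 3 > 2: chain 7 goes slow
    assert!(over_cap_chains(&ops, 0).is_empty()); // cap 0 disables classification
}
```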

Impact

  • Hot path remains in-memory for classification/propagation.
  • Priority persistence is monotonic and idempotent.
  • Off-mode promotion is bounded per batch to reduce DB pressure spikes.

Tracking

Validation

  • SQLX_OFFLINE=true cargo +1.91.1 clippy -p host-listener --all-targets -- -D warnings
  • SQLX_OFFLINE=true cargo +1.91.1 clippy -p tfhe-worker --all-targets -- -D warnings
  • SQLX_OFFLINE=true cargo +1.91.1 test -p host-listener --test host_listener_integration_tests test_slow_lane_threshold_matrix_locally -- --nocapture
  • SQLX_OFFLINE=true cargo +1.91.1 test -p host-listener --test host_listener_integration_tests test_slow_lane_cross_block_sustained_below_cap_stays_fast_locally -- --nocapture
  • SQLX_OFFLINE=true cargo +1.91.1 test -p host-listener --test host_listener_integration_tests test_slow_lane_cross_block_parent_lookup_finds_known_slow_parent_locally -- --nocapture
  • SQLX_OFFLINE=true cargo +1.91.1 test -p host-listener --test host_listener_integration_tests test_slow_lane_priority_is_monotonic_across_blocks_locally -- --nocapture
  • SQLX_OFFLINE=true cargo +1.91.1 test -p host-listener --test host_listener_integration_tests test_slow_lane_off_mode_promotes_all_chains_on_startup_locally -- --nocapture
  • SQLX_OFFLINE=true cargo +1.91.1 test -p tfhe-worker test_acquire_early_lock_ignores_priority -- --nocapture

@Eikix
Contributor Author

Eikix commented Feb 4, 2026

@claude /pr-review

@claude

claude bot commented Feb 4, 2026

Claude finished @Eikix's task in 5m 28s.


PR Review Complete

  • Pre-review validation
  • Gathering context (PR summary & guidelines)
  • Running parallel review agents
  • Validating findings
  • Posting final review

Summary

Approved - This PR adds a well-designed slow-lane scheduling mechanism for rate-limited dependent operations.

Verified as working correctly:

  • Token bucket implementation handles edge cases properly
  • SQL parameter binding order is correct after switching to sqlx::query()
  • Greylisted chains are properly cleaned up within block processing
  • Worker ordering correctly prioritizes fast lane (lane 0)
  • Migration uses IF NOT EXISTS for safety

Non-blocking design notes posted in review:

  1. Rate limiting is per-operation not per-chain (may be more aggressive than expected)
  2. release_all_owned_locks doesn't reset schedule_lane (likely intentional)

See the full review for details.


claude bot previously approved these changes Feb 4, 2026

@claude claude bot left a comment

[AUTOMATED]

Code Review Summary

This PR adds a well-designed slow-lane scheduling mechanism for rate-limited dependent operations. The implementation is sound with proper use of GREATEST for sticky lane escalation and correct lane reset on successful processing.

Verified as Working Correctly

  • Token bucket implementation handles edge cases properly (no division by zero, correct refill logic)
  • SQL parameter binding order is correct after switching to sqlx::query()
  • Greylisted chains are properly cleaned up within block processing
  • tokio::sync::Mutex doesn't poison on panic (by design)
  • Boolean to i16 conversion for schedule_lane is safe
  • Worker ordering ORDER BY schedule_lane ASC, last_updated_at ASC correctly prioritizes fast lane
  • The GREATEST SQL function ensures lanes only escalate, never de-escalate during updates
  • Migration uses IF NOT EXISTS for safety
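The acquisition order the review verified (`ORDER BY schedule_lane ASC, last_updated_at ASC`) can be reproduced in memory as a composite sort key; the struct here is illustrative, with timestamps simplified to integers:

```rust
#[derive(Debug, PartialEq)]
struct Chain {
    id: u64,
    schedule_lane: i16,    // 0 = fast, 1 = slow
    last_updated_at: u64,  // simplified timestamp
}

/// Fast lane first, then oldest-first within the same lane.
fn acquisition_order(chains: &mut Vec<Chain>) {
    chains.sort_by_key(|c| (c.schedule_lane, c.last_updated_at));
}

fn main() {
    let mut chains = vec![
        Chain { id: 1, schedule_lane: 1, last_updated_at: 10 }, // slow, older
        Chain { id: 2, schedule_lane: 0, last_updated_at: 20 }, // fast, newer
    ];
    acquisition_order(&mut chains);
    // The fast chain wins even though the slow one is older.
    assert_eq!(chains[0].id, 2);
}
```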

Design Considerations (Non-blocking)

  1. Rate limiting granularity: The limiter consumes tokens per-operation rather than per-chain. If a single chain has 100 dependent operations, it will consume ~100 tokens. This is more aggressive than limiting by chain count. This may be intentional (mentioned as "per-op" in PR description) but could be documented.

  2. release_all_owned_locks and schedule_lane: The release_all_owned_locks function (called on worker startup) doesn't reset schedule_lane to 0, unlike release_current_lock. If a worker crashes while processing a greylisted chain, that chain will remain in lane 1 after recovery until it's successfully processed. This is likely intentional (keeping previously-hot chains throttled), but worth confirming.

Tests

Good test coverage added:

  • test_acquire_next_lock_prefers_fast_lane - verifies lane ordering
  • dependent_ops_limiter_defers_after_burst - verifies token bucket behavior
  • dependent_ops_limiter_refills_over_time - verifies refill logic
  • dependent_ops_limiter_disabled_when_rate_zero - verifies rate=0 disables limiter

LGTM! Approving.

@mergify

mergify bot commented Feb 4, 2026

🧪 CI Insights

Here's what we observed from your CI run for 8b8927d.

🟢 All jobs passed!

But CI Insights is watching 👀

Contributor

@rudy-6-4 rudy-6-4 left a comment

If easy, this could be done in the ingest function instead of the database struct. (I understand the existing state database is handy for that.)

@Eikix Eikix marked this pull request as ready for review February 5, 2026 09:21
@Eikix Eikix requested a review from a team as a code owner February 5, 2026 09:21
@Eikix
Contributor Author

Eikix commented Feb 5, 2026

@claude /pr-review

@claude

claude bot commented Feb 5, 2026

Claude finished @Eikix's task in 6m 48s.


PR Review Complete

  • Pre-review validation
  • Gathering context (PR summary & guidelines)
  • Running parallel review agents
  • Validating findings
  • Posting final review

Summary

Approved - This PR implements a well-designed slow-lane scheduling mechanism for rate-limited dependent operations.

Verified as working correctly:

  • Token bucket implementation handles edge cases properly (debt accumulation, time-based refill)
  • SQL GREATEST function correctly ensures lanes only escalate (0→1), never de-escalate
  • Worker ordering schedule_lane ASC, last_updated_at ASC correctly prioritizes fast lane
  • Lane reset to 0 on successful processing is intentional design
  • Migration uses IF NOT EXISTS for safety
  • Limiter is properly disabled when rate_per_min == 0

Reviewer comments addressed:

  • Code uses sqlx::query! macro for compile-time verification ✓
  • Throttling is done per-chain (not per-op) as intended ✓

Non-blocking: Minor defensive dead code in consume() rate check (constructor already validates).

See the full review for details.


claude bot previously approved these changes Feb 5, 2026

@claude claude bot left a comment

[AUTOMATED]

Code Review Summary

This PR implements a well-designed slow-lane scheduling mechanism for rate-limited dependent operations. The implementation is sound and I found no critical bugs.

✅ Verified as Working Correctly

  • Token bucket implementation - Correctly implements a leaky bucket with debt accumulation and time-based refill
  • SQL GREATEST usage - Properly ensures lanes only escalate (0→1), never de-escalate during concurrent updates
  • Lane reset on processing - Intentionally resets to 0 when chain processing completes (correct design)
  • Worker ordering - ORDER BY schedule_lane ASC, last_updated_at ASC correctly prioritizes fast lane
  • Migration safety - Uses ADD COLUMN IF NOT EXISTS for idempotency
  • Limiter disabled correctly - Returns None from constructor when rate_per_min == 0
  • Duplicate event handling - Correctly skips update_dependence_chain when no new insertions (chains were already persisted)
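A deterministic sketch of the token-bucket behavior this review describes (time-based refill, limiter disabled when the rate is zero); the names, units, and injected clock are assumptions, not the actual limiter API:

```rust
/// Illustrative token bucket: tokens refill continuously over time,
/// capped at `capacity`. A rate of zero means no limiter at all.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    rate_per_sec: f64,
    last_refill_secs: f64,
}

impl TokenBucket {
    fn new(capacity: f64, rate_per_min: f64) -> Option<Self> {
        if rate_per_min == 0.0 {
            return None; // rate = 0 disables the limiter entirely
        }
        Some(Self {
            capacity,
            tokens: capacity,
            rate_per_sec: rate_per_min / 60.0,
            last_refill_secs: 0.0,
        })
    }

    /// `now_secs` is injected so the refill logic is deterministic to test.
    fn try_consume(&mut self, n: f64, now_secs: f64) -> bool {
        let elapsed = (now_secs - self.last_refill_secs).max(0.0);
        self.tokens = (self.tokens + elapsed * self.rate_per_sec).min(self.capacity);
        self.last_refill_secs = now_secs;
        if self.tokens >= n {
            self.tokens -= n;
            true
        } else {
            false
        }
    }
}

fn main() {
    assert!(TokenBucket::new(2.0, 0.0).is_none()); // disabled at rate 0
    let mut b = TokenBucket::new(2.0, 60.0).unwrap(); // 1 token/sec
    assert!(b.try_consume(2.0, 0.0));  // burst drains the bucket
    assert!(!b.try_consume(1.0, 0.0)); // immediately after: deferred
    assert!(b.try_consume(1.0, 1.0));  // one second later: refilled
}
```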

Reviewer Comments Addressed

Regarding @rudy-6-4's comments:

  1. The code does use sqlx::query! macro for compile-time verified queries in update_dependence_chain
  2. The throttling logic location in ingest.rs and persistence in update_dependence_chain appears appropriate for the design goals
  3. Throttling is correctly done per-chain (not per-op) - the limiter consumes tokens based on chain's total dependent op count

Non-Blocking Observations

Nit: The rate_per_sec variable calculation in consume() (line 167-169) is defensive dead code since the constructor already validates rate_per_min > 0. Could be simplified but doesn't affect correctness.

Test Coverage

Good test coverage added:

  • dependent_ops_limiter_defers_after_burst - verifies token bucket behavior
  • dependent_ops_limiter_refills_over_time - verifies refill logic
  • dependent_ops_limiter_disabled_when_rate_zero - verifies rate=0 disables limiter
  • test_acquire_next_lock_prefers_fast_lane - verifies lane ordering in worker

LGTM! Approving.

@Eikix
Contributor Author

Eikix commented Feb 5, 2026

I am still thinking about the mental model of the slow lane; it will take some time to come up with something fair and good.

@Eikix Eikix marked this pull request as draft February 5, 2026 11:10
@Eikix
Contributor Author

Eikix commented Feb 5, 2026

@claude /pr-review

@claude

claude bot commented Feb 5, 2026

Claude finished @Eikix's task in 4m 6s.


PR Review Complete

  • Pre-review validation
  • Gathering context (PR summary & guidelines)
  • Running parallel review agents
  • Validating findings
  • Posting final review

Summary

Approved - This PR implements a well-designed slow-lane scheduling mechanism for dependent operations.

Verified as working correctly:

  • SchedulePriority type correctly implements Fast=0, Slow=1
  • Weighted ops calculation matches HCU limit ratios (mul=7, div=5, rem=8)
  • SQL GREATEST ensures priority monotonically escalates (0→1 only)
  • Worker ordering correctly prioritizes fast lane with FIFO within same priority
  • Per-chain cap logic correctly disables throttling when set to 0
  • Migrations use safe IF NOT EXISTS / RENAME COLUMN
  • All SQL queries use sqlx::query! macro for compile-time verification

Test coverage:

  • test_acquire_next_lock_prefers_fast_lane validates core priority ordering

Non-blocking notes:

  • Theoretical integer overflow in weighted ops would require ~536M ops per chain per block (unrealistic)
  • Test coverage could be expanded for boundary conditions and weighted ops calculation

See the full review for details.


claude bot previously approved these changes Feb 5, 2026

@claude claude bot left a comment

[AUTOMATED]

Code Review Summary

This PR implements a well-designed slow-lane scheduling mechanism for dependent operations. The implementation correctly prioritizes chains based on their weighted dependent operation count.

✅ Verified as Working Correctly

  1. SchedulePriority type - Correctly implements Fast=0, Slow=1 with proper From traits for conversion

  2. Weighted ops calculation - Operations are weighted appropriately (mul=7, div=5, rem=8, others=1) matching HCU limit ratios

  3. SQL GREATEST usage - The GREATEST(dependence_chain.schedule_priority, EXCLUDED.schedule_priority) correctly ensures priority monotonically escalates (0→1 only, never de-escalates)

  4. Worker ordering - ORDER BY schedule_priority ASC, last_updated_at ASC correctly prioritizes fast lane, with FIFO within same priority

  5. Per-chain cap - Comparison stats.total > options.dependent_ops_max_per_chain with > 0 check correctly disables throttling when cap is 0

  6. Migration safety - Both migrations use IF NOT EXISTS / RENAME COLUMN for idempotency

  7. Parameterized SQL - All queries use sqlx::query! macro for compile-time verification and SQL injection protection
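The weighted count from item 2 can be sketched with the stated weights and `saturating_add`; the op-name strings are illustrative:

```rust
/// Weights from the review: mul=7, div=5, rem=8, everything else 1,
/// matching the HCU limit ratios.
fn op_weight(op: &str) -> u32 {
    match op {
        "mul" => 7,
        "div" => 5,
        "rem" => 8,
        _ => 1,
    }
}

/// Sum the per-op weights; saturating_add rules out integer overflow
/// even for pathologically long chains.
fn weighted_ops(ops: &[&str]) -> u32 {
    ops.iter().fold(0u32, |acc, op| acc.saturating_add(op_weight(op)))
}

fn main() {
    // mul (7) + div (5) + add (1) = 13
    assert_eq!(weighted_ops(&["mul", "div", "add"]), 13);
    assert_eq!(op_weight("rem"), 8);
}
```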

Test Coverage

test_acquire_next_lock_prefers_fast_lane validates the core priority ordering behavior

Coverage gaps (non-blocking):

  • Weighted ops calculation not directly unit tested
  • Per-chain cap threshold boundary conditions not tested
  • Monotonic priority behavior (GREATEST) not tested

Non-Blocking Design Notes

  1. Integer overflow in weighted ops - Theoretically possible but requires ~536M weighted ops per chain per block, which is unrealistic given Ethereum gas limits

  2. Slow lane starvation - By design, slow lane chains only process when fast lane is empty. This is documented behavior and acceptable for the use case.

LGTM! Approving.

@Eikix Eikix marked this pull request as ready for review February 5, 2026 15:58
@Eikix
Contributor Author

Eikix commented Feb 6, 2026

@claude /pr-review

@claude

claude bot commented Feb 6, 2026

Claude finished @Eikix's task in 5m 46s.


PR Review Complete

  • Pre-review validation
  • Gathering context (PR summary & guidelines)
  • Running parallel review agents
  • Validating findings
  • Posting final review

Summary

Approved - This PR implements a well-designed slow-lane scheduling mechanism for dependent operations.

Verified as working correctly:

  • SchedulePriority type correctly implements Fast=0, Slow=1 with proper conversions
  • Weighted ops calculation uses saturating_add() for safe arithmetic (mul=7, div=5, rem=8)
  • SQL GREATEST ensures priority monotonically escalates (0→1 only)
  • Worker ordering correctly prioritizes fast lane with FIFO within same priority
  • Per-chain cap disable (--dependent-ops-max-per-chain=0) resets all priorities and skips throttling
  • Migration safely adds column with IF NOT EXISTS and recreates index aligned with ORDER BY
  • All SQL queries use sqlx::query! macro for compile-time verification

Previous reviewer comments addressed:

  • @rudy-6-4: compile-time verified queries ✓
  • @antoniupop: index alignment, HCU weight provenance, disable mode, consolidated migrations ✓

Test coverage: test_acquire_next_lock_prefers_fast_lane validates core priority ordering

See the full review for details.


claude bot previously approved these changes Feb 6, 2026

@claude claude bot left a comment

[AUTOMATED]

Code Review Summary

This PR implements a well-designed slow-lane scheduling mechanism for dependent operations. After comprehensive review by multiple specialized agents, no bugs with confidence >= 80% were found.

✅ Verified as Working Correctly

  1. SchedulePriority type - Correctly implements Fast=0, Slow=1 with proper From traits for bidirectional conversion

  2. Weighted ops calculation - Operations weighted appropriately (mul=7, div=5, rem=8, baseline=1) matching HCU limit ratios. Uses saturating_add() to prevent integer overflow.

  3. SQL GREATEST usage - GREATEST(dependence_chain.schedule_priority, EXCLUDED.schedule_priority) correctly ensures priority monotonically escalates (0→1 only, never de-escalates)

  4. Worker ordering - ORDER BY schedule_priority ASC, last_updated_at ASC correctly prioritizes fast lane, with FIFO within same priority

  5. Per-chain cap disable - When --dependent-ops-max-per-chain=0:

    • slow_lane_enabled is false, skipping all throttling logic
    • reset_schedule_priorities() resets all priorities to 0 at startup
    • Correctly reverts to FIFO behavior
  6. Migration safety - Uses ADD COLUMN IF NOT EXISTS and recreates index with proper column ordering matching the ORDER BY clause

  7. Parameterized SQL - All queries use sqlx::query! macro for compile-time verification and SQL injection protection

✅ Previous Reviewer Comments Addressed

  • @rudy-6-4's comments: Code uses sqlx::query! macro for compile-time verification ✓
  • @antoniupop's comments:
    • Index now aligned with ORDER BY clause ✓
    • HCU weight provenance documented inline ✓
    • Feature can be fully disabled with --dependent-ops-max-per-chain=0
    • Migrations consolidated to single file ✓

Test Coverage

test_acquire_next_lock_prefers_fast_lane validates the core priority ordering behavior - verifies that a chain with schedule_priority=0 (fast) is acquired before one with schedule_priority=1 (slow), even when the slow chain has an older timestamp.

Design Notes (Non-blocking)

  1. Monotonic demotion - By design, slow chains stay slow. This is documented and mitigated by processing slow lanes when fast lane is empty.

  2. Metrics pattern - Uses .unwrap() for Prometheus metrics registration, which follows the established pattern throughout the codebase.

LGTM! Approving.

@Eikix Eikix force-pushed the codex/slow-lane-throttle branch from 7c8e71e to 4f05b54 on February 13, 2026 14:12
Use inserted-only per-chain counting to avoid underestimating required producer pressure.
Keep no_fork parallelism, but aggregate inserted-op pressure over split dependency closures for slow-lane classification.
@Eikix
Contributor Author

Eikix commented Feb 16, 2026

@mergify queue

@mergify

mergify bot commented Feb 16, 2026

Merge Queue Status

Rule: main


This pull request spent 2 hours 52 minutes 34 seconds in the queue, including 1 hour 51 minutes 6 seconds running CI.

Required conditions to merge
  • #approved-reviews-by >= 1 [🛡 GitHub branch protection]
  • #changes-requested-reviews-by = 0 [🛡 GitHub branch protection]
  • #review-threads-unresolved = 0 [🛡 GitHub branch protection]
  • branch-protection-review-decision = APPROVED [🛡 GitHub branch protection]
  • check-success = run-e2e-tests / fhevm-e2e-test
  • any of [🛡 GitHub branch protection]:
    • check-success = common-pull-request/lint (bpr)
    • check-neutral = common-pull-request/lint (bpr)
    • check-skipped = common-pull-request/lint (bpr)
  • any of [🛡 GitHub branch protection]:
    • check-skipped = coprocessor-cargo-listener-tests/cargo-tests (bpr)
    • check-neutral = coprocessor-cargo-listener-tests/cargo-tests (bpr)
    • check-success = coprocessor-cargo-listener-tests/cargo-tests (bpr)
  • any of [🛡 GitHub branch protection]:
    • check-success = coprocessor-cargo-test/cargo-tests (bpr)
    • check-neutral = coprocessor-cargo-test/cargo-tests (bpr)
    • check-skipped = coprocessor-cargo-test/cargo-tests (bpr)
  • any of [🛡 GitHub branch protection]:
    • check-success = coprocessor-dependency-analysis/dependencies-check (bpr)
    • check-neutral = coprocessor-dependency-analysis/dependencies-check (bpr)
    • check-skipped = coprocessor-dependency-analysis/dependencies-check (bpr)
  • any of [🛡 GitHub branch protection]:
    • check-skipped = gateway-contracts-deployment-tests/sc-deploy (bpr)
    • check-neutral = gateway-contracts-deployment-tests/sc-deploy (bpr)
    • check-success = gateway-contracts-deployment-tests/sc-deploy (bpr)
  • any of [🛡 GitHub branch protection]:
    • check-skipped = kms-connector-tests/test-connector (bpr)
    • check-neutral = kms-connector-tests/test-connector (bpr)
    • check-success = kms-connector-tests/test-connector (bpr)

mergify bot added a commit that referenced this pull request Feb 16, 2026
@mergify mergify bot merged commit e1734b9 into main Feb 16, 2026
64 checks passed
@mergify mergify bot deleted the codex/slow-lane-throttle branch February 16, 2026 19:49
@mergify mergify bot removed the queued label Feb 16, 2026
immortal-tofu added a commit that referenced this pull request Apr 8, 2026
Cover undocumented user-facing changes since v0.11.3:
- coprocessor: DB state revert runbook (operator workflow, #2122)
- coprocessor: slow lane for dependent ops — config + fundamentals (#1907)
- gateway-contracts: KMS context ID — new env var, event rename, errors (#IGatewayConfig)
- gateway-contracts: IDecryption breaking changes — isUserDecryptionReady overload,
  isDelegatedUserDecryptionReady param removal, new errors (#2137)
- library-solidity: isPublicDecryptionResultValid view function (#1987)
- library-solidity: FHE.fromExternal uninitialized handle support (#1969)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>