feat(coprocessor): slow lane for dependent ops #1907
Conversation
@claude /pr-review
Claude finished @Eikix's task in 5m 28s. PR Review Complete.
Summary: Approved. This PR adds a well-designed slow-lane scheduling mechanism for rate-limited dependent operations.
Non-blocking design notes posted in the review; see the full review for details.
[AUTOMATED]
Code Review Summary
This PR adds a well-designed slow-lane scheduling mechanism for rate-limited dependent operations. The implementation is sound with proper use of GREATEST for sticky lane escalation and correct lane reset on successful processing.
Verified as Working Correctly
- Token bucket implementation handles edge cases properly (no division by zero, correct refill logic)
- SQL parameter binding order is correct after switching to `sqlx::query()`
- Greylisted chains are properly cleaned up within block processing
- `tokio::sync::Mutex` doesn't poison on panic (by design)
- Boolean to `i16` conversion for `schedule_lane` is safe
- Worker ordering `ORDER BY schedule_lane ASC, last_updated_at ASC` correctly prioritizes fast lane
- The `GREATEST` SQL function ensures lanes only escalate, never de-escalate during updates
- Migration uses `IF NOT EXISTS` for safety
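The token-bucket behavior verified above can be pictured with a minimal sketch. This is illustrative only, not the PR's actual code: the names `DependentOpsLimiter` and `consume`, the injected timestamps, and the capacity choice are all assumptions. It shows the two verified properties: `rate_per_min == 0` disables the limiter (no division by zero is possible), and refill is time-based and capped.

```rust
// Hypothetical sketch of a rate limiter in the spirit of the one reviewed
// above. Names and signatures are illustrative, not the PR's API.

struct DependentOpsLimiter {
    rate_per_min: u64,
    capacity: f64,
    tokens: f64,
    last_refill_secs: f64, // timestamp injected by caller, for testability
}

impl DependentOpsLimiter {
    /// Returns None when rate is 0, which disables throttling entirely.
    fn new(rate_per_min: u64, now_secs: f64) -> Option<Self> {
        if rate_per_min == 0 {
            return None; // limiter disabled; no division by zero below
        }
        let capacity = rate_per_min as f64;
        Some(Self { rate_per_min, capacity, tokens: capacity, last_refill_secs: now_secs })
    }

    /// Try to consume `n` tokens at time `now_secs`; false means "defer".
    fn consume(&mut self, n: u64, now_secs: f64) -> bool {
        // Time-based refill, capped at capacity; rate_per_min > 0 is guaranteed.
        let rate_per_sec = self.rate_per_min as f64 / 60.0;
        let elapsed = (now_secs - self.last_refill_secs).max(0.0);
        self.tokens = (self.tokens + elapsed * rate_per_sec).min(self.capacity);
        self.last_refill_secs = now_secs;
        if self.tokens >= n as f64 {
            self.tokens -= n as f64;
            true
        } else {
            false
        }
    }
}

fn main() {
    assert!(DependentOpsLimiter::new(0, 0.0).is_none()); // rate=0 disables
    let mut lim = DependentOpsLimiter::new(60, 0.0).unwrap(); // 60 tokens/min
    assert!(lim.consume(60, 0.0));  // a burst drains the bucket
    assert!(!lim.consume(1, 0.0));  // immediately after: deferred
    assert!(lim.consume(10, 10.0)); // ~10 tokens refilled after 10 seconds
}
```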
Design Considerations (Non-blocking)
- Rate limiting granularity: The limiter consumes tokens per operation rather than per chain. If a single chain has 100 dependent operations, it will consume ~100 tokens. This is more aggressive than limiting by chain count. This may be intentional (mentioned as "per-op" in the PR description) but could be documented.
- `release_all_owned_locks` and `schedule_lane`: The `release_all_owned_locks` function (called on worker startup) doesn't reset `schedule_lane` to 0, unlike `release_current_lock`. If a worker crashes while processing a greylisted chain, that chain will remain in lane 1 after recovery until it's successfully processed. This is likely intentional (keeping previously hot chains throttled), but worth confirming.
Tests
Good test coverage added:
- `test_acquire_next_lock_prefers_fast_lane` - verifies lane ordering
- `dependent_ops_limiter_defers_after_burst` - verifies token bucket behavior
- `dependent_ops_limiter_refills_over_time` - verifies refill logic
- `dependent_ops_limiter_disabled_when_rate_zero` - verifies rate=0 disables limiter
LGTM! Approving.
🧪 CI Insights: all jobs passed for the CI run on 8b8927d. 🟢
Review thread on `coprocessor/fhevm-engine/host-listener/src/database/tfhe_event_propagate.rs` (outdated, resolved)
Review thread on `coprocessor/fhevm-engine/host-listener/src/database/tfhe_event_propagate.rs` (outdated, resolved)
rudy-6-4
left a comment
If easy, this could be done in the ingest function instead of the database struct. (I understand the existing state database is handy for that.)
Review thread on `coprocessor/fhevm-engine/host-listener/src/database/tfhe_event_propagate.rs` (outdated, resolved)
@claude /pr-review
Claude finished @Eikix's task in 6m 48s. PR Review Complete.
Summary: Approved. This PR implements a well-designed slow-lane scheduling mechanism for rate-limited dependent operations.
Reviewer comments addressed. Non-blocking: minor defensive dead code in `consume()`; see the full review for details.
[AUTOMATED]
Code Review Summary
This PR implements a well-designed slow-lane scheduling mechanism for rate-limited dependent operations. The implementation is sound and I found no critical bugs.
✅ Verified as Working Correctly
- Token bucket implementation - Correctly implements a leaky bucket with debt accumulation and time-based refill
- SQL GREATEST usage - Properly ensures lanes only escalate (0→1), never de-escalate during concurrent updates
- Lane reset on processing - Intentionally resets to 0 when chain processing completes (correct design)
- Worker ordering - `ORDER BY schedule_lane ASC, last_updated_at ASC` correctly prioritizes fast lane
- Migration safety - Uses `ADD COLUMN IF NOT EXISTS` for idempotency
- Limiter disabled correctly - Returns `None` from constructor when `rate_per_min == 0`
- Duplicate event handling - Correctly skips `update_dependence_chain` when no new insertions (chains were already persisted)
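The `GREATEST`-based escalation invariant verified above can be modeled as a tiny merge function: on concurrent upserts a lane may go 0 to 1 but never back, and the only path back to 0 is the explicit reset on successful processing. A minimal sketch under those assumptions (function names are illustrative, not the PR's):

```rust
// Sketch of the invariant enforced by
// GREATEST(dependence_chain.schedule_lane, EXCLUDED.schedule_lane):
// concurrent updates can escalate a lane but never downgrade it.

fn merge_lane(existing: i16, incoming: i16) -> i16 {
    existing.max(incoming) // SQL GREATEST semantics on the two values
}

fn on_processed(_lane: i16) -> i16 {
    0 // successful processing is the only path that resets to the fast lane
}

fn main() {
    assert_eq!(merge_lane(0, 1), 1); // escalation sticks
    assert_eq!(merge_lane(1, 0), 1); // a concurrent fast-lane write cannot downgrade
    assert_eq!(on_processed(1), 0);  // reset happens only via the processing path
}
```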
Reviewer Comments Addressed
Regarding @rudy-6-4's comments:
- The code does use the `sqlx::query!` macro for compile-time verified queries in `update_dependence_chain`
- The throttling logic location in `ingest.rs` and persistence in `update_dependence_chain` appear appropriate for the design goals
- Throttling is correctly done per chain (not per op): the limiter consumes tokens based on the chain's total dependent op count
Non-Blocking Observations
Nit: The `rate_per_sec` calculation in `consume()` (lines 167-169) is defensive dead code, since the constructor already validates `rate_per_min > 0`. It could be simplified but doesn't affect correctness.
Test Coverage
Good test coverage added:
- `dependent_ops_limiter_defers_after_burst` - verifies token bucket behavior
- `dependent_ops_limiter_refills_over_time` - verifies refill logic
- `dependent_ops_limiter_disabled_when_rate_zero` - verifies rate=0 disables limiter
- `test_acquire_next_lock_prefers_fast_lane` - verifies lane ordering in worker
LGTM! Approving.
I am still thinking about the mental model of the slow lane; it will take some time to come up with something fair and good.
@claude /pr-review
Claude finished @Eikix's task in 4m 6s. PR Review Complete.
Summary: Approved. This PR implements a well-designed slow-lane scheduling mechanism for dependent operations.
See the full review for verified items, test coverage, and non-blocking notes.
[AUTOMATED]
Code Review Summary
This PR implements a well-designed slow-lane scheduling mechanism for dependent operations. The implementation correctly prioritizes chains based on their weighted dependent operation count.
✅ Verified as Working Correctly
- SchedulePriority type - Correctly implements Fast=0, Slow=1 with proper `From` traits for conversion
- Weighted ops calculation - Operations are weighted appropriately (mul=7, div=5, rem=8, others=1) matching HCU limit ratios
- SQL GREATEST usage - The `GREATEST(dependence_chain.schedule_priority, EXCLUDED.schedule_priority)` correctly ensures priority monotonically escalates (0→1 only, never de-escalates)
- Worker ordering - `ORDER BY schedule_priority ASC, last_updated_at ASC` correctly prioritizes fast lane, with FIFO within the same priority
- Per-chain cap - Comparison `stats.total > options.dependent_ops_max_per_chain` with the `> 0` check correctly disables throttling when the cap is 0
- Migration safety - Both migrations use `IF NOT EXISTS` / `RENAME COLUMN` for idempotency
- Parameterized SQL - All queries use the `sqlx::query!` macro for compile-time verification and SQL injection protection
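The weighted-ops rule described above (mul=7, div=5, rem=8, everything else 1) and the cap comparison can be sketched briefly. This is a hypothetical illustration: the function names, the string-keyed op representation, and the cap value are assumptions, not the PR's code. `saturating_add` mirrors the overflow protection the later review notes.

```rust
// Illustrative sketch of the weighted dependent-op count and per-chain cap
// check reviewed above. Op names and helpers are hypothetical.

fn op_weight(op: &str) -> u32 {
    match op {
        "mul" => 7, // weights mirror HCU limit ratios
        "div" => 5,
        "rem" => 8,
        _ => 1, // baseline weight for all other TFHE ops
    }
}

fn weighted_ops(ops: &[&str]) -> u32 {
    // saturating_add keeps the per-chain sum from overflowing
    ops.iter().fold(0u32, |acc, op| acc.saturating_add(op_weight(op)))
}

fn main() {
    assert_eq!(weighted_ops(&["add", "mul", "div"]), 13); // 1 + 7 + 5
    assert_eq!(weighted_ops(&["rem"]), 8);
    // A chain is classified slow once the weighted count exceeds the cap;
    // a cap of 0 disables the check entirely (not shown here).
    let cap = 10u32;
    assert!(weighted_ops(&["mul", "div"]) > cap); // 12 > 10, so slow lane
}
```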
Test Coverage
✅ `test_acquire_next_lock_prefers_fast_lane` validates the core priority-ordering behavior
Coverage gaps (non-blocking):
- Weighted ops calculation not directly unit tested
- Per-chain cap threshold boundary conditions not tested
- Monotonic priority behavior (GREATEST) not tested
Non-Blocking Design Notes
- Integer overflow in weighted ops - Theoretically possible, but it would require ~536M weighted ops per chain per block, which is unrealistic given Ethereum gas limits
- Slow lane starvation - By design, slow-lane chains only process when the fast lane is empty. This is documented behavior and acceptable for the use case.
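The acquisition order behind both notes (`ORDER BY schedule_priority ASC, last_updated_at ASC`) is easy to model in memory: fast lane first, then FIFO within a lane, so a slow chain runs only once no fast chain is pending. A minimal sketch with hypothetical field names:

```rust
// Sketch of the worker acquisition order: fast lane wins over slow lane,
// FIFO (oldest last_updated_at) within the same lane. Types are illustrative.

struct Chain {
    id: u32,
    schedule_priority: i16, // 0 = fast, 1 = slow
    last_updated_at: u64,   // simplified timestamp
}

fn acquire_order(chains: &mut Vec<Chain>) {
    // Mirrors ORDER BY schedule_priority ASC, last_updated_at ASC
    chains.sort_by_key(|c| (c.schedule_priority, c.last_updated_at));
}

fn main() {
    let mut chains = vec![
        Chain { id: 1, schedule_priority: 1, last_updated_at: 5 },  // slow, oldest
        Chain { id: 2, schedule_priority: 0, last_updated_at: 20 }, // fast, newest
        Chain { id: 3, schedule_priority: 0, last_updated_at: 10 }, // fast, older
    ];
    acquire_order(&mut chains);
    let ids: Vec<u32> = chains.iter().map(|c| c.id).collect();
    // The slow chain is oldest overall but still goes last.
    assert_eq!(ids, vec![3, 2, 1]);
}
```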
LGTM! Approving.
Review thread on `coprocessor/fhevm-engine/host-listener/src/database/tfhe_event_propagate.rs` (outdated, resolved)
Review thread on `...r/fhevm-engine/db-migration/migrations/20260205130000_dependence_chain_schedule_priority.sql` (outdated, resolved)
@claude /pr-review
Claude finished @Eikix's task in 5m 46s. PR Review Complete.
Summary: Approved. This PR implements a well-designed slow-lane scheduling mechanism for dependent operations.
Previous reviewer comments addressed; see the full review for details and test coverage.
[AUTOMATED]
Code Review Summary
This PR implements a well-designed slow-lane scheduling mechanism for dependent operations. After comprehensive review by multiple specialized agents, no bugs with confidence >= 80% were found.
✅ Verified as Working Correctly
- SchedulePriority type - Correctly implements Fast=0, Slow=1 with proper `From` traits for bidirectional conversion
- Weighted ops calculation - Operations weighted appropriately (mul=7, div=5, rem=8, baseline=1) matching HCU limit ratios. Uses `saturating_add()` to prevent integer overflow.
- SQL GREATEST usage - `GREATEST(dependence_chain.schedule_priority, EXCLUDED.schedule_priority)` correctly ensures priority monotonically escalates (0→1 only, never de-escalates)
- Worker ordering - `ORDER BY schedule_priority ASC, last_updated_at ASC` correctly prioritizes fast lane, with FIFO within the same priority
- Per-chain cap disable - When `--dependent-ops-max-per-chain=0`:
  - `slow_lane_enabled` is false, skipping all throttling logic
  - `reset_schedule_priorities()` resets all priorities to 0 at startup
  - Correctly reverts to FIFO behavior
- Migration safety - Uses `ADD COLUMN IF NOT EXISTS` and recreates the index with column ordering matching the ORDER BY clause
- Parameterized SQL - All queries use the `sqlx::query!` macro for compile-time verification and SQL injection protection
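The `SchedulePriority` contract verified above (Fast=0, Slow=1, `From` conversions in both directions) can be sketched as follows. The exact trait implementations in the PR may differ; this only illustrates the verified mapping, and the treatment of non-zero values as slow is an assumption of the sketch.

```rust
// Hypothetical sketch of a Fast=0 / Slow=1 priority type with bidirectional
// From conversions, matching a small-integer DB column.

#[derive(Clone, Copy, Debug, PartialEq)]
enum SchedulePriority {
    Fast, // stored as 0
    Slow, // stored as 1
}

impl From<SchedulePriority> for i16 {
    fn from(p: SchedulePriority) -> i16 {
        match p {
            SchedulePriority::Fast => 0,
            SchedulePriority::Slow => 1,
        }
    }
}

impl From<i16> for SchedulePriority {
    fn from(v: i16) -> Self {
        // Assumption of this sketch: only 0 maps back to Fast.
        if v == 0 { SchedulePriority::Fast } else { SchedulePriority::Slow }
    }
}

fn main() {
    assert_eq!(i16::from(SchedulePriority::Fast), 0);
    assert_eq!(i16::from(SchedulePriority::Slow), 1);
    assert_eq!(SchedulePriority::from(0i16), SchedulePriority::Fast);
    assert_eq!(SchedulePriority::from(1i16), SchedulePriority::Slow);
}
```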
✅ Previous Reviewer Comments Addressed
- @rudy-6-4's comments: Code uses the `sqlx::query!` macro for compile-time verification ✓
- @antoniupop's comments:
  - Index now aligned with the ORDER BY clause ✓
  - HCU weight provenance documented inline ✓
  - Feature can be fully disabled with `--dependent-ops-max-per-chain=0` ✓
  - Migrations consolidated to a single file ✓
Test Coverage
✅ `test_acquire_next_lock_prefers_fast_lane` validates the core priority-ordering behavior: it verifies that a chain with `schedule_priority=0` (fast) is acquired before one with `schedule_priority=1` (slow), even when the slow chain has an older timestamp.
Design Notes (Non-blocking)
- Monotonic demotion - By design, slow chains stay slow. This is documented and mitigated by processing the slow lane when the fast lane is empty.
- Metrics pattern - Uses `.unwrap()` for Prometheus metrics registration, which follows the established pattern throughout the codebase.
LGTM! Approving.
Drop is_allowed filter to avoid undercounting required producer ops.
Force-pushed from 7c8e71e to 4f05b54.
Use inserted-only per-chain counting to avoid underestimating required producer pressure.
Keep no_fork parallelism, but aggregate inserted-op pressure over split dependency closures for slow-lane classification.
Review thread on `coprocessor/fhevm-engine/host-listener/src/database/tfhe_event_propagate.rs` (outdated, resolved)
Review thread on `coprocessor/fhevm-engine/host-listener/src/database/tfhe_event_propagate.rs` (resolved)
@mergify queue
Merge Queue Status
Rule:
This pull request spent 2 hours 52 minutes 34 seconds in the queue, including 1 hour 51 minutes 6 seconds running CI.
Required conditions to merge:
Cover undocumented user-facing changes since v0.11.3:
- coprocessor: DB state revert runbook (operator workflow, #2122)
- coprocessor: slow lane for dependent ops, config + fundamentals (#1907)
- gateway-contracts: KMS context ID, new env var, event rename, errors (#IGatewayConfig)
- gateway-contracts: IDecryption breaking changes, isUserDecryptionReady overload, isDelegatedUserDecryptionReady param removal, new errors (#2137)
- library-solidity: isPublicDecryptionResultValid view function (#1987)
- library-solidity: FHE.fromExternal uninitialized handle support (#1969)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
What
- Adds a schedule priority on `dependence_chain`: `0` (fast) / `1` (slow).
- Counts pressure (`+1` per newly inserted, allowed TFHE op with dependencies) per chain per ingest pass.
- Cap configured via `--dependent-ops-max-per-chain` (`0` disables slow-lane classification).
- Priority updates use `GREATEST(existing, incoming)` so concurrent HL types cannot downgrade.
- Workers acquire chains ordered by `schedule_priority`, then `last_updated_at`.
- Keeps early lock acquisition (`acquire_early_lock`) as a liveness escape hatch when no progress is possible.
Inheritance model
- Tracks lineage (`inheritance_parents`) to improve slow-lane inheritance without changing scheduling behavior.
- Chain splitting still uses `dependencies` (no-fork parallelism behavior unchanged).
- Slow-lane classification aggregates over `inheritance_parents` so parallel splits are less likely to drop lineage for throttling decisions.
Off mode (`--dependent-ops-max-per-chain=0`)
- On startup, all chains are promoted to fast (`schedule_priority=0`) using advisory-lock serialized, batched updates.
Why
How
- A migration adds `schedule_priority` and aligns the pending-chain index with worker acquisition order.
Impact
Tracking
Validation
- `SQLX_OFFLINE=true cargo +1.91.1 clippy -p host-listener --all-targets -- -D warnings`
- `SQLX_OFFLINE=true cargo +1.91.1 clippy -p tfhe-worker --all-targets -- -D warnings`
- `SQLX_OFFLINE=true cargo +1.91.1 test -p host-listener --test host_listener_integration_tests test_slow_lane_threshold_matrix_locally -- --nocapture`
- `SQLX_OFFLINE=true cargo +1.91.1 test -p host-listener --test host_listener_integration_tests test_slow_lane_cross_block_sustained_below_cap_stays_fast_locally -- --nocapture`
- `SQLX_OFFLINE=true cargo +1.91.1 test -p host-listener --test host_listener_integration_tests test_slow_lane_cross_block_parent_lookup_finds_known_slow_parent_locally -- --nocapture`
- `SQLX_OFFLINE=true cargo +1.91.1 test -p host-listener --test host_listener_integration_tests test_slow_lane_priority_is_monotonic_across_blocks_locally -- --nocapture`
- `SQLX_OFFLINE=true cargo +1.91.1 test -p host-listener --test host_listener_integration_tests test_slow_lane_off_mode_promotes_all_chains_on_startup_locally -- --nocapture`
- `SQLX_OFFLINE=true cargo +1.91.1 test -p tfhe-worker test_acquire_early_lock_ignores_priority -- --nocapture`