Commit 92f9434

antoniupop, goshawk-3, and rudy-6-4 authored
feat(coprocessor): process work along dependence chains (#1718)
* feat(coprocessor): schedule computations along dependence chains in coprocessor (#1550)
  * feat(coprocessor): create dependence_chain table
  * feat(coprocessor): coordinate dependence-chain processing across multiple workers

    It provides a non-blocking, distributed locking mechanism that coordinates dependence-chain processing across multiple tfhe-workers. A worker can acquire ownership of the next available dependence-chain entry for processing ordered by last_updated_at (FIFO queue-like approach). Ownership expires after a timeout, enabling work-stealing by other workers.

    New CLI param --worker_id
  * test(coprocessor): ensure both acquire_next_lock and work-stealing features
  * fix(coprocessor): fix work-stealing when a lock has expired
    - Added LockingReason for logging
    - Make expiry configurable
  * fix(coprocessor): update the flow of acquire/extend/release chain_id lock
  * chore(coprocessor): update sqlx cache
  * chore(coprocessor): improve logging for dcid locking
  * chore(coprocessor): disable fallback for dependence_chain_id locking
  * fix(coprocessor): update in-memory lock info on extend_current_lock
  * fix(coprocessor): lock another dcid and continue processing
  * chore(coprocessor): update sqlx cache
  * chore(coprocessor): add idx_dependence_chain_processing_by_worker
  * chore(coprocessor): observe query timings in tfhe-worker
    - add --dcid_ttl_sec config
    - add otel traces for dcid
  * chore(coprocessor): implement both max_lock_ttl_sec and disable_dcid_locking options
  * chore(coprocessor): update sqlx cache
  * chore(coprocessor): update last_updated_at when releasing a lock
  * chore(coprocessor): support --dcid-timeslice-sec CLI param, tfhe-worker
  * chore(coprocessor): solve dcid unit-tests issue
  * chore(coprocessor): enable lock re-acquisition once the timeslice has been exceeded
  * chore(coprocessor): enable default timeslice
  * chore(coprocessor): run cleanup procedure to delete old processed dcids
  * chore(coprocessor): acquire locks only on DCIDs that are ready for computation
  * chore(coprocessor): update charts with new tfhe-worker args
  * chore(coprocessor): handle case no-dcid-available
  * chore(coprocessor): bump chart version
  * chore(coprocessor): notify work available if dependency count reaches zero
  * fix(coprocessor): add dependence chain index on last_updated when processed
  * fix(coprocessor): update is_completed only where a CT is inserted in DB
  * fix(coprocessor): restrict update of computation completion to uncompleted computations
  * fix(coprocessor): prevent dependence cycle overestimation on trivial encrypt handles
  * fix(coprocessor): update test for completion of processing of dcid
  * fix(coprocessor): add missing partial indexes on ciphertexts and ciphertext_digest tables
  * feat(coprocessor): add transaction dependence chains in HL (#1651)
  * fix(coprocessor): host-listener, dependency chain
  * fix(coprocessor): fix to squash, duplicated trivial encrypt
  * fix(coprocessor): fix to squash, duplicated trivial encrypt, test
  * fix(coprocessor): fix to squash, scalars are not handles
  * fix(coprocessor): fix to squash, cargo fmt
  * feat(coprocessor): topologic timestamp
  * fix(coprocessor): host-listener, reject cycle and describe out of order dependencies
  * fix(coprocessor): do not update dependence chain timestamp on row update
  * fix(coprocessor): host-listener, bad condition for need to sort tx
  * feat(coprocessor): host-listener, dependency_count for dependency_chain
  * feat(coprocessor): host-listener, dependents for dependency_chain
  * fix(coprocessor): restrict dependence counter to block scope
  * fix(coprocessor): do not update dependence chain last_updated_at on release
  * fix(coprocessor): emit warning only when dependence chain is missing dependences
  * feat(coprocessor): host-listener, dependency_chain as connected component
  * fix(coprocessor): host-listener, update last_updated_at de chain when already processed
  * fix(coprocessor): deprecate schedule order in TFHE worker
  * chore(coprocessor): fix CI
  * fix(coprocessor): host-listener, params for dependency chain policy
  * fix(coprocessor): hist-listener, dependency_chain, cycle detection
  * chore(coprocessor): update charts for new params
  * chore(coprocessor): fix TFHE worker CI test test_extend_or_release_lock_2

  ---------

  Co-authored-by: Antoniu Pop <antoniu.pop@zama.ai>
  Co-authored-by: Antoniu Pop <90181190+antoniupop@users.noreply.github.com>
  Co-authored-by: rudy <rudy.sicard@zama.ai>

* fix(coprocessor): db migration, improve indexing for sns worker fetching work (#1692)
  * fix(coprocessor): db migration, improve indexing for sns worker fetching work
  * fix(coprocessor): add missing indexes for selecting allowed handles when tx unsent

  ---------

  Co-authored-by: Antoniu Pop <antoniu.pop@zama.ai>

* feat(coprocessor): add mechanism to release dependence chains when no progress (#1696)
  * fix(coprocessor): do not update is_completed on unallowed handles
  * feat(coprocessor): add mechanism to release dependence chains when no progress
  * fix(coprocessor): remove obsolete row lock on computations
  * feat(coprocessor): set created_at as topological order within block
  * fix(coprocessor): chain release and update
  * chore(coprocessor): update charts
  * fix(coprocessor): fix top timestamp for tx
  * fix(coprocessor): update earliest schedule order
  * fix(coprocessor): remove adding epsilon to timestamp when releasing chain
  * fix(coprocessor): split dependence chains after forks instead of before

  ---------

  Co-authored-by: rudy <rudy.sicard@zama.ai>

* fix(coprocessor): add missing indexes on verify_proofs and dependence_chain tables (#1715)
* fix(coprocessor): db-migration, first clean on more obvious unused index (#1722)
* fix(coprocessor): align host listener and poller dependence params (#1728)

---------

Co-authored-by: goshawk-3 <76947196+goshawk-3@users.noreply.github.com>
Co-authored-by: rudy <rudy.sicard@zama.ai>
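
The locking flow described in the commit message (take the oldest ready chain, let ownership expire, allow another worker to steal it) can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the real implementation works against the dependence_chain table in the database and is not shown in this excerpt. The names acquire_next_lock, extend_current_lock, LockingReason, last_updated_at and dependency_count come from the commit message; `ChainRow`, `ChainTable`, the remaining fields, and the `LockingReason` variants are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for one dependence_chain row.
#[derive(Debug, Clone)]
struct ChainRow {
    last_updated_at: u64,  // logical timestamp; the oldest chain is served first
    dependency_count: u32, // assumed: a chain is ready only when this is zero
    owner: Option<String>, // worker_id currently holding the lock, if any
    lock_expires_at: u64,  // only meaningful while `owner` is set
    processed: bool,
}

/// Why a lock was granted. The variant names are assumptions.
#[derive(Debug, PartialEq)]
enum LockingReason {
    Unowned,
    ExpiredLock,
}

/// Hypothetical lock table; `ttl` plays the role of the lock time-to-live.
struct ChainTable {
    rows: HashMap<u64, ChainRow>,
    ttl: u64,
}

impl ChainTable {
    /// Non-blocking: returns None instead of waiting when nothing is available.
    fn acquire_next_lock(&mut self, worker_id: &str, now: u64) -> Option<(u64, LockingReason)> {
        let candidate = self
            .rows
            .iter()
            // Only unprocessed chains whose dependences are all satisfied.
            .filter(|(_, r)| !r.processed && r.dependency_count == 0)
            // Either nobody owns the chain, or the owner's lock has expired.
            .filter(|(_, r)| r.owner.is_none() || r.lock_expires_at <= now)
            // FIFO-like order: oldest last_updated_at first, id as tie-breaker.
            .min_by_key(|(id, r)| (r.last_updated_at, **id))
            .map(|(id, _)| *id)?;
        let row = self.rows.get_mut(&candidate)?;
        let reason = if row.owner.is_none() {
            LockingReason::Unowned
        } else {
            LockingReason::ExpiredLock // work-stealing from a stalled worker
        };
        row.owner = Some(worker_id.to_string());
        row.lock_expires_at = now + self.ttl;
        Some((candidate, reason))
    }

    /// Extends the lock only if `worker_id` still owns it and it has not expired.
    fn extend_current_lock(&mut self, chain_id: u64, worker_id: &str, now: u64) -> bool {
        match self.rows.get_mut(&chain_id) {
            Some(row) if row.owner.as_deref() == Some(worker_id) && row.lock_expires_at > now => {
                row.lock_expires_at = now + self.ttl;
                true
            }
            _ => false,
        }
    }
}
```

In this model a worker whose lock has lapsed cannot extend it, so after another worker steals the chain the original owner has to call `acquire_next_lock` again and move on to a different chain.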
1 parent 662df96 commit 92f9434
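
The commit message mentions building a dependency_chain "as connected component", and the chart exposes a `--dependence-by-connexity` option for it. The host-listener code is not part of this excerpt, so the following is only a generic union-find sketch of what grouping transactions into connected components means; `DisjointSet`, `connected_chains`, and the use of transaction indices are assumptions for illustration, not the project's actual data structures.

```rust
/// Minimal union-find over transaction indices 0..n (hypothetical helper).
struct DisjointSet {
    parent: Vec<usize>,
}

impl DisjointSet {
    fn new(n: usize) -> Self {
        Self { parent: (0..n).collect() }
    }

    /// Returns the representative of `x`, compressing the path on the way.
    fn find(&mut self, x: usize) -> usize {
        let mut root = x;
        while self.parent[root] != root {
            root = self.parent[root];
        }
        let mut cur = x;
        while self.parent[cur] != root {
            let next = self.parent[cur];
            self.parent[cur] = root;
            cur = next;
        }
        root
    }

    /// Merges the sets of `a` and `b`; the smaller index stays the representative.
    fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra != rb {
            let (lo, hi) = if ra < rb { (ra, rb) } else { (rb, ra) };
            self.parent[hi] = lo;
        }
    }
}

/// Maps each transaction to a chain id, where every pair in `deps` is
/// (producer, consumer) and all transitively linked transactions share one id.
fn connected_chains(n: usize, deps: &[(usize, usize)]) -> Vec<usize> {
    let mut ds = DisjointSet::new(n);
    for &(producer, consumer) in deps {
        ds.union(producer, consumer);
    }
    (0..n).map(|tx| ds.find(tx)).collect()
}
```

With five transactions and dependences `(0, 2)`, `(1, 2)`, `(3, 4)`, transactions 0, 1 and 2 end up in one chain and 3 and 4 in another, even though 0 and 1 do not depend on each other directly.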

53 files changed

Lines changed: 3274 additions & 298 deletions

.github/workflows/test-suite-e2e-operators-tests.yml

Lines changed: 7 additions & 0 deletions
@@ -101,6 +101,13 @@ jobs:
           username: ${{ github.actor }}
           password: ${{ secrets.GHCR_READ_TOKEN }}
 
+      - name: Login to Chainguard Registry
+        uses: docker/login-action@9780b0c442fbb1117ed29e0efdff1e18412f7567 # v3.3.0
+        with:
+          registry: cgr.dev
+          username: ${{ secrets.CGR_USERNAME }}
+          password: ${{ secrets.CGR_PASSWORD }}
+
       - name: Deploy fhevm Stack
         working-directory: test-suite/fhevm
         env:

charts/coprocessor/Chart.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 name: coprocessor
 description: A helm chart to distribute and deploy Zama fhevm Co-Processor services
-version: 0.7.8
+version: 0.7.10
 apiVersion: v2
 keywords:
   - fhevm

charts/coprocessor/values.yaml

Lines changed: 28 additions & 2 deletions
@@ -153,9 +153,13 @@ hostListener:
     - --initial-block-time=12 # it can switch to real blockTime when data is available
     - --log-level=INFO
     - --health-port=8080
-    - --dependence-cache-size=128
     - --reorg-maximum-duration-in-blocks=50
 
+    ### Dependence chains parameters
+    # - --dependence-cache-size=10000
+    # - --dependence-by-connexity # Whether to build connected components or linear chains (default no)
+    # - --dependence-cross-block # Do chains cross L1 block boundaries (default yes)
+
     ### Catchup parameters (optional)
     # - --catchup-margin
     # - --catchup-paging
@@ -266,6 +270,11 @@ hostListenerPoller:
     ### Prometheus metrics
     - --metrics-addr=0.0.0.0:9100 # Address for Prometheus metrics HTTP server
 
+    ### Dependence chains parameters
+    # - --dependence-cache-size=10000
+    # - --dependence-by-connexity # Whether to build connected components or linear chains (default no)
+    # - --dependence-cross-block # Do chains cross L1 block boundaries (default yes)
+
   # Service ports configuration
   ports:
     metrics: 9100
@@ -545,7 +554,7 @@ tfheWorker:
     - --run-bg-worker=true
     - --worker-polling-interval-ms=10000
     - --work-items-batch-size=100 # scheduling changed
-    - --dependence-chains-per-batch=100
+    - --dependence-chains-per-batch=100 # Deprecated. To be removed in a future release.
     - --tenant-key-cache-size=32
     - --coprocessor-fhe-threads=64 # scheduling changed
     - --tokio-threads=16 # scheduling changed
@@ -562,6 +571,23 @@ tfheWorker:
     - --maximum-handles-per-input=0
     - --server-addr=""
     - --coprocessor-private-key=""
+    # Unique worker identifier (valid UUID v4 format)
+    # If not set, defaults to a random UUID generated at startup
+    - --worker-id=$(WORKER_ID)
+    - --dcid-ttl-sec=30 # Time-to-live (in seconds) for dependence chain locks
+    # Disable dependence chain ID locking
+    # WARNING: May cause multiple workers to process the same DCID concurrently
+    # Defaults to false
+    - --disable-dcid-locking=false
+    # Time slice (in seconds) for processing a single dependence chain
+    # Locks are released if processing exceeds this duration
+    - --dcid-timeslice-sec=90
+    # Processed DCIDs older than this value are cleaned up
+    # Defaults to 48 hours (172800 seconds)
+    # Time-to-live (in seconds) for processed dependence chains
+    - --processed-dcid-ttl-sec=172800
+    - --dcid-cleanup-interval-sec=3600 # Interval (in seconds) for cleaning up expired DCID locks
+    - --dcid-max-no-progress-cycles=2 # Worker cycles without progress before releasing
 
   # Service ports configuration
   ports:
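
The tfhe-worker values for `--dcid-timeslice-sec` and `--dcid-max-no-progress-cycles` describe two separate reasons for a worker to give up a chain. The sketch below shows one way such a decision could be expressed after each worker cycle. It is an assumption-laden illustration, not the worker's actual code: `LockPolicy`, `LockState`, `LockDecision`, `after_cycle`, and the exact comparison operators are hypothetical, and only the two parameter meanings are taken from the chart comments.

```rust
/// Hypothetical outcome of one worker cycle for the currently locked chain.
#[derive(Debug, PartialEq)]
enum LockDecision {
    Extend,
    ReleaseTimesliceExceeded,
    ReleaseNoProgress,
}

/// Mirrors --dcid-timeslice-sec and --dcid-max-no-progress-cycles.
struct LockPolicy {
    timeslice_sec: u64,
    max_no_progress_cycles: u32,
}

/// Per-lock bookkeeping kept by the worker (assumed, in-memory only).
#[derive(Default)]
struct LockState {
    held_for_sec: u64,
    no_progress_cycles: u32,
}

impl LockPolicy {
    /// Updates the bookkeeping for one finished cycle and decides what to do next.
    fn after_cycle(&self, state: &mut LockState, cycle_sec: u64, made_progress: bool) -> LockDecision {
        state.held_for_sec += cycle_sec;
        if made_progress {
            state.no_progress_cycles = 0;
        } else {
            state.no_progress_cycles += 1;
        }
        if state.no_progress_cycles >= self.max_no_progress_cycles {
            // Assumed: release once the configured number of idle cycles is reached.
            LockDecision::ReleaseNoProgress
        } else if state.held_for_sec > self.timeslice_sec {
            // "Locks are released if processing exceeds this duration".
            LockDecision::ReleaseTimesliceExceeded
        } else {
            LockDecision::Extend
        }
    }
}
```

With the chart defaults (90 seconds, 2 cycles), three productive 40-second cycles would end with a time-slice release, while two consecutive cycles without progress would release the chain regardless of how long it has been held.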

coprocessor/fhevm-engine/.sqlx/query-0be7f94ac1356de126688b56b95593e80509b7834f14f39e8aed9a4f15fad410.json

Lines changed: 28 additions & 0 deletions
Some generated files are not rendered by default.

coprocessor/fhevm-engine/.sqlx/query-156dcfa2ae70e64be2eb8014928745a9c95e29d18a435f4d2e2fda2afd7952bf.json

Lines changed: 17 additions & 0 deletions

coprocessor/fhevm-engine/.sqlx/query-1dde98bc8c1076c5708f985512b88082da81a35fd1411d6f9871a5414075a666.json

Lines changed: 0 additions & 16 deletions
This file was deleted.

coprocessor/fhevm-engine/.sqlx/query-2e431116e7d3116265c42dda4fbee1b9954906485e02665c59431e4c6394d239.json

Lines changed: 14 additions & 0 deletions

coprocessor/fhevm-engine/.sqlx/query-356ad05cf8677b0e561e56e0b7d5298b39471d8431093f3297da926b3f97273e.json

Lines changed: 12 additions & 0 deletions

coprocessor/fhevm-engine/.sqlx/query-49417a40d2aa74a4a9d7486417acf5c791519c9b1de680de3516e18d24b4f48e.json

Lines changed: 15 additions & 0 deletions

coprocessor/fhevm-engine/.sqlx/query-eda7c952325475562fbe1d1a5793ac82366742c1618f83dfd6b4da5db9492544.json renamed to coprocessor/fhevm-engine/.sqlx/query-5e57f8be36c4ccd8fb28fb36a71d81a4cb7e13700a89267b53a3e7edfeb8b4cc.json

Lines changed: 2 additions & 2 deletions
