feat(coprocessor): add a non-blocking, distributed locking mechanism in tfhe-worker #1550

Merged: antoniupop merged 33 commits into release/0.10.x from feature/tfhe-worker/scalability on Dec 29, 2025

@goshawk-3 (Contributor) commented Dec 11, 2025

This adds a non-blocking, distributed locking mechanism that coordinates dependence-chain processing across multiple tfhe-worker replicas.

A worker acquires a lock on the next available dependence-chain entry, with entries ordered by last_updated_at (a FIFO, queue-like approach).

A worker may acquire a dependence-chain ID (DCID) when either of the following holds:

  • dependency_count is 0 and the DCID is not locked
  • dependency_count is 0 and the DCID is locked, but the lock has expired

In addition:

  • Ownership expires after a timeout, which enables work-stealing by other workers for resilience.
  • A GC procedure runs regularly to clean up processed DCIDs.
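The acquisition rule described above can be sketched in a few lines of Rust. This is only an illustrative in-memory model, not the code from this PR: the `ChainEntry` struct, its field types, the `can_acquire` and `next_acquirable` functions, and the choice to treat a lock as expired when its expiry is less than or equal to the current time are all assumptions made for the example. The actual mechanism operates on database rows.

```rust
/// Hypothetical in-memory view of one dependence-chain row.
/// Field names follow the PR description; the types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct ChainEntry {
    pub dcid: u64,
    pub dependency_count: u32,
    /// Lock expiry as a timestamp in seconds; `None` means "not locked".
    pub lock_expires_at: Option<u64>,
    pub last_updated_at: u64,
}

/// A DCID may be acquired when it has no pending dependencies and is
/// either unlocked or held under a lock that has already expired.
pub fn can_acquire(entry: &ChainEntry, now: u64) -> bool {
    if entry.dependency_count != 0 {
        return false;
    }
    match entry.lock_expires_at {
        None => true,
        // Assumption: a lock counts as expired once `now` reaches its expiry.
        Some(expiry) => expiry <= now,
    }
}

/// FIFO-like selection: among acquirable entries, pick the one with the
/// oldest `last_updated_at`.
pub fn next_acquirable(entries: &[ChainEntry], now: u64) -> Option<&ChainEntry> {
    entries
        .iter()
        .filter(|e| can_acquire(e, now))
        .min_by_key(|e| e.last_updated_at)
}
```

Because selection never waits on a held lock, a worker that finds nothing acquirable simply gets `None` back, which matches the non-blocking intent of the design.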

@cla-bot cla-bot Bot added the cla-signed label Dec 11, 2025
@mergify (Bot) commented Dec 11, 2025

🧪 CI Insights

Here's what we observed from your CI run for afdfe6b.

🟢 All jobs passed!

But CI Insights is watching 👀

@rudy-6-4 (Contributor)

The new parameters are missing from values.yaml.

@goshawk-3 (Contributor, Author)

See also: #1506 (comment)

@goshawk-3 goshawk-3 changed the title from "Feature/tfhe worker/scalability" to "feat(coprocessor): add a non-blocking, distributed locking mechanism across multiple tfhe-workers" Dec 16, 2025
@goshawk-3 goshawk-3 changed the title from "feat(coprocessor): add a non-blocking, distributed locking mechanism across multiple tfhe-workers" to "feat(coprocessor): add a non-blocking, distributed locking mechanism in tfhe-worker" Dec 16, 2025
@goshawk-3 goshawk-3 force-pushed the feature/tfhe-worker/scalability branch from 687e931 to e63b299 Compare December 16, 2025 10:36
@antoniupop (Collaborator)

New CLI params

  • --worker-id
  • --dcid-ttl-sec
  • --dcid-timeslice-sec
  • --disable-dcid-locking

Could you please update the charts with these (and any other new params that were added)? I think we've mostly converged on the architecture, so it would be good to start planning for deployment.
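For readers wiring these flags into a deployment, here is a rough sketch of how the four options could be collected into one settings struct. Everything beyond the flag names is an assumption: the `DcidLockArgs` struct, the helper functions, the field types (a string worker ID, whole seconds for the two durations, a value-less boolean switch) and the hand-rolled parsing are hypothetical and are not taken from the tfhe-worker source. Plain standard-library parsing is used only to keep the example dependency-free, and no default values are implied.

```rust
/// Hypothetical holder for the four new flags. Field types are assumptions.
#[derive(Debug, Default, PartialEq)]
pub struct DcidLockArgs {
    pub worker_id: Option<String>,
    pub dcid_ttl_sec: Option<u64>,
    pub dcid_timeslice_sec: Option<u64>,
    pub disable_dcid_locking: bool,
}

/// Take the value that follows a flag, or report which flag lacked one.
fn next_value<I, S>(it: &mut I, flag: &str) -> Result<String, String>
where
    I: Iterator<Item = S>,
    S: AsRef<str>,
{
    it.next()
        .map(|v| v.as_ref().to_string())
        .ok_or_else(|| format!("missing value for {flag}"))
}

/// Parse a duration given in whole seconds.
fn parse_secs(flag: &str, raw: &str) -> Result<u64, String> {
    raw.parse::<u64>()
        .map_err(|e| format!("invalid value for {flag}: {e}"))
}

/// Collect the DCID-related flags from an argument list; other arguments
/// are ignored in this sketch.
pub fn parse_dcid_args<I, S>(args: I) -> Result<DcidLockArgs, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut out = DcidLockArgs::default();
    let mut it = args.into_iter();
    while let Some(arg) = it.next() {
        match arg.as_ref() {
            "--worker-id" => out.worker_id = Some(next_value(&mut it, "--worker-id")?),
            "--dcid-ttl-sec" => {
                let raw = next_value(&mut it, "--dcid-ttl-sec")?;
                out.dcid_ttl_sec = Some(parse_secs("--dcid-ttl-sec", &raw)?);
            }
            "--dcid-timeslice-sec" => {
                let raw = next_value(&mut it, "--dcid-timeslice-sec")?;
                out.dcid_timeslice_sec = Some(parse_secs("--dcid-timeslice-sec", &raw)?);
            }
            // Assumption: this flag is a switch that takes no value.
            "--disable-dcid-locking" => out.disable_dcid_locking = true,
            _ => {}
        }
    }
    Ok(out)
}
```

A chart template would then only need to map each values.yaml entry to the matching flag when building the worker's argument list.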

@goshawk-3 goshawk-3 force-pushed the feature/tfhe-worker/scalability branch 4 times, most recently from b642f1b to c35b36c Compare December 19, 2025 16:06
@antoniupop antoniupop force-pushed the feature/tfhe-worker/scalability branch 4 times, most recently from 46dcec4 to 4402c9e Compare December 23, 2025 08:08
rudy-6-4 previously approved these changes Dec 24, 2025
@antoniupop antoniupop marked this pull request as ready for review December 28, 2025 22:10
@antoniupop antoniupop requested review from a team as code owners December 28, 2025 22:10
antoniupop previously approved these changes Dec 28, 2025
@antoniupop antoniupop force-pushed the feature/tfhe-worker/scalability branch from 458b71d to de42ee5 Compare December 29, 2025 08:01
…iple workers

It provides a non-blocking, distributed locking mechanism that
coordinates dependence-chain processing across multiple tfhe-workers.

A worker can acquire ownership of the next available dependence-chain entry for processing
ordered by last_updated_at (FIFO queue-like approach).

Ownership expires after a timeout, enabling work-stealing by other workers.

New CLI param --worker_id
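The ownership lifecycle in this commit message (acquire, keep ownership alive, lose it on timeout so another worker can take over) can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the PR's implementation: the `Lock` and `ChainLock` types, the method names, and the use of integer seconds for time are hypothetical, and the real lock state is kept in the database rather than in process memory.

```rust
/// Hypothetical lock record: who owns the chain and until when.
#[derive(Debug, Clone, PartialEq)]
pub struct Lock {
    pub owner: String,
    pub expires_at: u64,
}

/// Hypothetical lock slot for a single dependence chain.
#[derive(Debug, Default)]
pub struct ChainLock {
    pub current: Option<Lock>,
}

impl ChainLock {
    /// Non-blocking acquire: succeeds if the chain is unlocked or the
    /// previous owner's lock has expired (work-stealing).
    pub fn try_acquire(&mut self, worker: &str, now: u64, ttl: u64) -> bool {
        let free = match &self.current {
            None => true,
            Some(lock) => lock.expires_at <= now,
        };
        if free {
            self.current = Some(Lock {
                owner: worker.to_string(),
                expires_at: now + ttl,
            });
        }
        free
    }

    /// Extend the lock, but only for the current owner and only while
    /// the lock is still valid.
    pub fn extend(&mut self, worker: &str, now: u64, ttl: u64) -> bool {
        match &mut self.current {
            Some(lock) if lock.owner == worker && lock.expires_at > now => {
                lock.expires_at = now + ttl;
                true
            }
            _ => false,
        }
    }

    /// Release the lock if `worker` is the owner.
    pub fn release(&mut self, worker: &str) -> bool {
        if self.current.as_ref().map_or(false, |l| l.owner == worker) {
            self.current = None;
            true
        } else {
            false
        }
    }

    pub fn owner(&self) -> Option<&str> {
        self.current.as_ref().map(|l| l.owner.as_str())
    }
}
```

In this model a worker that stops extending its lock is never waited on: once the expiry passes, the next `try_acquire` from any other worker succeeds and the stalled worker's later `extend` or `release` calls fail.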
@antoniupop antoniupop force-pushed the feature/tfhe-worker/scalability branch 2 times, most recently from 9d4b718 to afbf0a1 Compare December 29, 2025 08:12
* fix(coprocessor): host-listener, dependency chain

* fix(coprocessor): fix to squash, duplicated trivial encrypt

* fix(coprocessor): fix to squash, duplicated trivial encrypt, test

* fix(coprocessor): fix to squash, scalars are not handles

* fix(coprocessor): fix to squash, cargo fmt

* feat(coprocessor): topologic timestamp

* fix(coprocessor): host-listener, reject cycle and describe out of order dependencies

* fix(coprocessor): do not update dependence chain timestamp on row update

* fix(coprocessor): host-listener, bad condition for need to sort tx

* feat(coprocessor): host-listener, dependency_count for dependency_chain

* feat(coprocessor): host-listener, dependents for dependency_chain

* fix(coprocessor): restrict dependence counter to block scope

* fix(coprocessor): do not update dependence chain last_updated_at on release

* fix(coprocessor): emit warning only when dependence chain is missing dependences

* feat(coprocessor): host-listener, dependency_chain as connected component

* fix(coprocessor): host-listener, update dependence chain last_updated_at when already processed

* fix(coprocessor): deprecate schedule order in TFHE worker

* chore(coprocessor): fix CI

* fix(coprocessor): host-listener, params for dependency chain policy

* fix(coprocessor): host-listener, dependency_chain, cycle detection

* chore(coprocessor): update charts for new params

* chore(coprocessor): fix TFHE worker CI test test_extend_or_release_lock_2

---------

Co-authored-by: rudy <rudy.sicard@zama.ai>
@antoniupop antoniupop force-pushed the feature/tfhe-worker/scalability branch from afbf0a1 to afdfe6b Compare December 29, 2025 08:20
@antoniupop antoniupop merged commit ad0d93b into release/0.10.x Dec 29, 2025
136 of 137 checks passed
@antoniupop antoniupop deleted the feature/tfhe-worker/scalability branch December 29, 2025 10:05
antoniupop added a commit that referenced this pull request Jan 6, 2026
…oprocessor (#1550)

* feat(coprocessor): create dependence_chain table

* feat(coprocessor): coordinate dependence-chain processing across multiple workers

It provides a non-blocking, distributed locking mechanism that
coordinates dependence-chain processing across multiple tfhe-workers.

A worker can acquire ownership of the next available dependence-chain entry for processing
ordered by last_updated_at (FIFO queue-like approach).

Ownership expires after a timeout, enabling work-stealing by other workers.

New CLI param --worker_id

* test(coprocessor): ensure both acquire_next_lock and work-stealing features

* fix(coprocessor): fix work-stealing when a lock has expired

- Added LockingReason for logging
- Make expiry configurable

* fix(coprocessor): update the flow of acquire/extend/release chain_id lock

* chore(coprocessor): update sqlx cache

* chore(coprocessor): improve logging for dcid locking

* chore(coprocessor): disable fallback for dependence_chain_id locking

* fix(coprocessor): update in-memory lock info on extend_current_lock

* fix(coprocessor): lock another dcid and continue processing

* chore(coprocessor): update sqlx cache

* chore(coprocessor): add idx_dependence_chain_processing_by_worker

* chore(coprocessor): observe query timings in tfhe-worker

- add --dcid_ttl_sec config
- add otel traces for dcid

* chore(coprocessor): implement both max_lock_ttl_sec and disable_dcid_locking options

* chore(coprocessor): update sqlx cache

* chore(coprocessor): update last_updated_at when releasing a lock

* chore(coprocessor): support --dcid-timeslice-sec CLI param, tfhe-worker

* chore(coprocessor): solve dcid unit-tests issue

* chore(coprocessor): enable lock re-acquisition once the timeslice has been exceeded

* chore(coprocessor): enable default timeslice

* chore(coprocessor): run cleanup procedure to delete old processed dcids

* chore(coprocessor): acquire locks only on DCIDs that are ready for computation

* chore(coprocessor): update charts with new tfhe-worker args

* chore(coprocessor): handle case no-dcid-available

* chore(coprocessor): bump chart version

* chore(coprocessor): notify work available if dependency count reaches zero

* fix(coprocessor): add dependence chain index on last_updated when processed

* fix(coprocessor): update is_completed only where a CT is inserted in DB

* fix(coprocessor): restrict update of computation completion to uncompleted computations

* fix(coprocessor): prevent dependence cycle overestimation on trivial encrypt handles

* fix(coprocessor): update test for completion of processing of dcid

* fix(coprocessor): add missing partial indexes on ciphertexts and ciphertext_digest tables

* feat(coprocessor): add transaction dependence chains in HL (#1651)

* fix(coprocessor): host-listener, dependency chain

* fix(coprocessor): fix to squash, duplicated trivial encrypt

* fix(coprocessor): fix to squash, duplicated trivial encrypt, test

* fix(coprocessor): fix to squash, scalars are not handles

* fix(coprocessor): fix to squash, cargo fmt

* feat(coprocessor): topologic timestamp

* fix(coprocessor): host-listener, reject cycle and describe out of order dependencies

* fix(coprocessor): do not update dependence chain timestamp on row update

* fix(coprocessor): host-listener, bad condition for need to sort tx

* feat(coprocessor): host-listener, dependency_count for dependency_chain

* feat(coprocessor): host-listener, dependents for dependency_chain

* fix(coprocessor): restrict dependence counter to block scope

* fix(coprocessor): do not update dependence chain last_updated_at on release

* fix(coprocessor): emit warning only when dependence chain is missing dependences

* feat(coprocessor): host-listener, dependency_chain as connected component

* fix(coprocessor): host-listener, update dependence chain last_updated_at when already processed

* fix(coprocessor): deprecate schedule order in TFHE worker

* chore(coprocessor): fix CI

* fix(coprocessor): host-listener, params for dependency chain policy

* fix(coprocessor): host-listener, dependency_chain, cycle detection

* chore(coprocessor): update charts for new params

* chore(coprocessor): fix TFHE worker CI test test_extend_or_release_lock_2

---------

Co-authored-by: Antoniu Pop <antoniu.pop@zama.ai>
Co-authored-by: Antoniu Pop <90181190+antoniupop@users.noreply.github.com>
Co-authored-by: rudy <rudy.sicard@zama.ai>
antoniupop added a commit that referenced this pull request Jan 6, 2026
…oprocessor (#1550)

antoniupop added a commit that referenced this pull request Jan 6, 2026
…oprocessor (#1550)

antoniupop added a commit that referenced this pull request Jan 7, 2026
…oprocessor (#1550)

antoniupop added a commit that referenced this pull request Jan 8, 2026
…oprocessor (#1550)

mergify Bot pushed a commit that referenced this pull request Jan 9, 2026
* feat(coprocessor): schedule computations along dependence chains in coprocessor (#1550)

* fix(coprocessor): db migration, improve indexing for sns worker fetching work (#1692)

* fix(coprocessor): db migration, improve indexing for sns worker fetching work

* fix(coprocessor): add missing indexes for selecting allowed handles when tx unsent

---------

Co-authored-by: Antoniu Pop <antoniu.pop@zama.ai>

* feat(coprocessor): add mechanism to release dependence chains when no progress (#1696)

* fix(coprocessor): do not update is_completed on unallowed handles

* feat(coprocessor): add mechanism to release dependence chains when no progress

* fix(coprocessor): remove obsolete row lock on computations

* feat(coprocessor): set created_at as topological order within block

* fix(coprocessor): chain release and update

* chore(coprocessor): update charts

* fix(coprocessor): fix top timestamp for tx

* fix(coprocessor): update earliest schedule order

* fix(coprocessor): remove adding epsilon to timestamp when releasing chain

* fix(coprocessor): split dependence chains after forks instead of before

---------

Co-authored-by: rudy <rudy.sicard@zama.ai>

* fix(coprocessor): add missing indexes on verify_proofs and dependence_chain tables (#1715)

* fix(coprocessor): db-migration, first clean on more obvious unused index (#1722)

* fix(coprocessor): align host listener and poller dependence params (#1728)

---------

Co-authored-by: goshawk-3 <76947196+goshawk-3@users.noreply.github.com>
Co-authored-by: rudy <rudy.sicard@zama.ai>