Skip to content

feat(kms-connector): garbage collection#1561

Merged
mergify[bot] merged 8 commits intomainfrom
eudelins/feat/garbage-collect-events
Dec 15, 2025
Merged

feat(kms-connector): garbage collection#1561
mergify[bot] merged 8 commits intomainfrom
eudelins/feat/garbage-collect-events

Conversation

@eudelins-zama
Copy link
Copy Markdown
Contributor

@eudelins-zama eudelins-zama commented Dec 12, 2025

Closes https://github.com/zama-ai/fhevm-internal/issues/749

DB changes

This PR adds the status (custom enum) and updated_at (timestamp) fields while removing the under_process field of all the requests/responses tables.
The status field can be:

  • pending: waiting to be processed
  • under_process: currently processed, so the request/response is "locked"
  • completed (will be deleted later by the garbage collector)
  • failed (will be deleted later by the garbage collector)

Behavior changes

Request/Response deletion

Before this PR, the kms-connector tried to immediatly delete request/response once it has been processed.
Now it will instead mark it as completed (or failed), and a garbage collector will handle them later on.
The garbage collector will:

  • delete decryption requests/responses
  • keep other kind of requests/responses
    • Why? Because they are related to keygen or initialization of the KMS, so they are pretty rare, and it could be useful to keep a trace of them

As we want to run a single instance of this garbage collector, it is running alongside the TransactionSender, as it is the only micro-service of the kms-connector that should also be a single instance.

Decryption Request/Response unlocking

Before this PR, if a KmsWorker picks and locks a decryption request from the DB to process it, if it crashes or is restarted for some reason, then this request will be locked forever and never processed.
Now the garbage collector will unlock the request if it is in a under_process status for too long.

Same behavior for the TransactionSender and decryption responses.

Note that we only unlock decryption requests/responses and not the other, because as explained above, they are related to keygen/init of the KMS, so are pretty rare, and under the protocol's control. Thus it is easier to manually investigate what is happening in such case, and manually unlock or re-trigger requests if needed. That way, it avoid running multiple SQL queries that will be useless 99.99% of the time.

Code changes

I removed the KmsResponseRemover abstraction. All of these abstractions were mostly created in the 1st place because a migration from Postgres DB to Kafka was something to consider.
But IMO, it seems less probable these days, and it was easier/cleaner to achieve the update of the KmsResponse status using method than a dedicated struct to pass everywhere.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements a garbage collection feature for the KMS connector to automatically clean up old database entries. The key changes include:

  • Replacing the under_process boolean field with an operation_status enum that supports four states: pending, under_process, completed, and failed
  • Adding updated_at timestamps to track when operations are last modified
  • Implementing garbage collection routines that delete old completed/failed operations and unlock stuck operations
  • Adding status update methods to KmsResponse and GatewayEvent types
  • Removing the KmsResponseRemover abstraction in favor of direct database updates via status methods

Reviewed changes

Copilot reviewed 34 out of 91 changed files in this pull request and generated no comments.

Show a summary per file
File Description
connector-db/migrations/20251208161022_implem_garbage_collection.sql Adds the database schema changes including the operation_status enum, status columns, updated_at columns, indexes, and triggers
crates/utils/src/types/db.rs Defines the OperationStatus enum type with sqlx annotations
crates/utils/src/types/kms_response.rs Adds status update methods (mark_as_pending, mark_as_completed, mark_as_failed) to KmsResponse
crates/utils/src/types/gw_event.rs Adds status update methods to GatewayEvent and refactors update logic
crates/tx-sender/src/monitoring/gc.rs Implements garbage collection routines for deleting old operations and unlocking stuck operations
crates/tx-sender/src/core/tx_sender.rs Updates to use the new status methods instead of the removed KmsResponseRemover
crates/tx-sender/src/core/kms_response_remover.rs File removed - functionality moved to status methods
crates/tx-sender/src/core/config/*.rs Adds configuration parameters for garbage collection intervals and expiry times
Test files Updates test helpers to accept status parameter and verify completed status instead of deletion
.sqlx/*.json Updated query metadata files reflecting the new schema
Files not reviewed (57)
  • kms-connector/.sqlx/query-05206597d00f58583f577f3d54ddd767c5696857085178cca25df687aa0cd39c.json: Language not supported
  • kms-connector/.sqlx/query-083e8a8de715c2a8dc806edb5aad8c9f92d1087059e7531e6232fde40e633f4e.json: Language not supported
  • kms-connector/.sqlx/query-0c7e1f45677401d5bf8d0dce8849637a4b533c4371b1620f9259529a1bfebc27.json: Language not supported
  • kms-connector/.sqlx/query-120c7bc6586f02642e5eaaa38471e8f20326fa5200cda61851355ea0f84b699e.json: Language not supported
  • kms-connector/.sqlx/query-14c8041a04281d0731559221e38686feb8a4acea36e1540bda75267c6412f4b0.json: Language not supported
  • kms-connector/.sqlx/query-179550430d054e86be5aa2295d4eb2aa7db2ed9a940a8ac1047c29a17ef661b5.json: Language not supported
  • kms-connector/.sqlx/query-1e4e2b8d09524458504e3969d8704fd31be45dcece212c7c7b8c5d9023a1be11.json: Language not supported
  • kms-connector/.sqlx/query-206a41e2c3ef34b4820cb4e7d271be51316580b3b2f2de67e15d2e6bcab12946.json: Language not supported
  • kms-connector/.sqlx/query-206d6c0bc40462ad05e063114c98d9081f918b663269baa8b9ed233c918b6d31.json: Language not supported
  • kms-connector/.sqlx/query-2350ac60a071822d470ac847cd54924059c23e74005f04d249cbc4cef0b74ac6.json: Language not supported
  • kms-connector/.sqlx/query-2c1c41025c09656ec4aa9a6e8ee58b52cadbc7039cf07d158f70dfc46b5e06eb.json: Language not supported
  • kms-connector/.sqlx/query-313949db8a6fe92e8b835fd03c177bd7da78bc55ac924f330207ba6d303be033.json: Language not supported
  • kms-connector/.sqlx/query-3d3c5a0bf214ee773ecde5d94480575d3d46afe6549da42deee242388984dfb9.json: Language not supported
  • kms-connector/.sqlx/query-3daeba4fbf77d703a538ae81bc4801f30455419dda08b31885f8d2d1ff924038.json: Language not supported
  • kms-connector/.sqlx/query-44bb0537af9736b1121d933649a456ef6f0921b6ea86d05d76fd48d4bffe7db6.json: Language not supported
  • kms-connector/.sqlx/query-4d063e25b766d3c96b516388f0682b12fd16ab2b6f0601d702c099c9725087b2.json: Language not supported
  • kms-connector/.sqlx/query-4d8fef33610f8baf96a2da80dec76e23423aa66331b1315c5c74a06d8a4af120.json: Language not supported
  • kms-connector/.sqlx/query-536fab7b8d99df255619355b084cdb4de9279b5ec84ad4d79772d326b1099fc6.json: Language not supported
  • kms-connector/.sqlx/query-55e39fce20ced8f64418404aedd2d8c2d764df3c5fe9680415e9c8ced391c87d.json: Language not supported
  • kms-connector/.sqlx/query-59428c72c6afbb794abc7a14db053e575e50199aadb1fc8bfec5b68a5bcb4394.json: Language not supported
  • kms-connector/.sqlx/query-5d0b62324d38394bc77e5a12024b0875102ee676ca73bfcacaf0945ed3d10d51.json: Language not supported
  • kms-connector/.sqlx/query-60f58397e8588260d3c79b89bfdef430f7bcdfba000a3166c7d68fd54908f6c9.json: Language not supported
  • kms-connector/.sqlx/query-69196b39e59fe6f9e449f7cd1f30fe6117c9670d56a08f64e49e3d9a7b109fb9.json: Language not supported
  • kms-connector/.sqlx/query-70c9f021267ee3989cfc313cd95f1aa1af7163fda39518bf6c9f9dc4a10911f1.json: Language not supported
  • kms-connector/.sqlx/query-72b5875fec8f017644c045dc7897fdc6798aabf9da3016fad71d0f3227d26e8e.json: Language not supported
  • kms-connector/.sqlx/query-775f56607468e59d6f88ed64b2bd789c18b3d7ba68140897d3a6f7729d56e61c.json: Language not supported
  • kms-connector/.sqlx/query-7bc37c994ea75c732017ff6c4eb36a56d23191c2a428af3e8847fde85a701b36.json: Language not supported
  • kms-connector/.sqlx/query-7f74ef6dec3a6111b0f0e31c74caf0d93877520081dd428c8bebc167a4e87104.json: Language not supported
  • kms-connector/.sqlx/query-7f96aa6c8b04c991cbffa6548229e57039a9137d1a6b96a7b677b41382b6f1a7.json: Language not supported
  • kms-connector/.sqlx/query-820e14ed1c8a9ec5a5462025a871aedc0b319f207cb55b80886f29a3f4e2e63b.json: Language not supported
  • kms-connector/.sqlx/query-828ed7aa3de4737fcc54dbb2ffdf07098ab6ef7810a1f27ac7d7e0b5a083db69.json: Language not supported
  • kms-connector/.sqlx/query-856389ba8fa80561c2a90c6fb02d77615ee97e14beea56eb9827e24a105cd304.json: Language not supported
  • kms-connector/.sqlx/query-88d0f1470e61797d49940ea36c9155e37cc3f0c622de57701c237ccdf359e7d0.json: Language not supported
  • kms-connector/.sqlx/query-93db5ac72e3acc12af313048a9d900371e903a45c1c172af2633129033aae5e1.json: Language not supported
  • kms-connector/.sqlx/query-9424450a5fe75af91766f2dfe3463e81e8872ed262b34a6d6865af21f24ba3da.json: Language not supported
  • kms-connector/.sqlx/query-9b355f759f69f02cca8a740911cd9a1b55d495c5c7644364bfaca734cb835fc9.json: Language not supported
  • kms-connector/.sqlx/query-9b9d3bbd21139dbb78601c4374e14bc118c3a39a06d682f82db7310e8d8c8cf7.json: Language not supported
  • kms-connector/.sqlx/query-9e6abfc0df77b3587972fc590ee888338d1e9a3d4f2bdb506d8588319adad879.json: Language not supported
  • kms-connector/.sqlx/query-a44b1d4e88190c3319559b50ae6cd8ade6b26b04b71e9a90b4a64bbf48a9bff8.json: Language not supported
  • kms-connector/.sqlx/query-ac8229a3ceb462177827da9963f51aaced2c230e0ae6603866711ca949d4a99f.json: Language not supported
  • kms-connector/.sqlx/query-ac82565c4c15f51c6df30bd691519619bb6ff80098fe4e077ed2b43cf6ac525a.json: Language not supported
  • kms-connector/.sqlx/query-af73e253d2f8f12077e59607c44ff182930372d1452c32330464c911c970754d.json: Language not supported
  • kms-connector/.sqlx/query-b3ee6c99edad54e3ef0f8a19bdc036142463f678111f32cf1ae8607e4a281634.json: Language not supported
  • kms-connector/.sqlx/query-bac064549047cb319d62b7d3a639cfc3933fc455e8e97407aacba220f251a61f.json: Language not supported
  • kms-connector/.sqlx/query-be7952ac7f042f7b4756783cb2c40cdc4b92eb360bc5ae95bb3ce3805a3c0a00.json: Language not supported
  • kms-connector/.sqlx/query-c4846436287bb74b8286442658da646740df133377bd8ffdb4e3451283ced0b7.json: Language not supported
  • kms-connector/.sqlx/query-cf58df3893aa66257170af9712e22096ef64d0525678bc97a883209d6ec39445.json: Language not supported
  • kms-connector/.sqlx/query-cf60be82db94427c2ec3236b82ad353f0b8b222e7a3453b43b2f7542f9c8ea62.json: Language not supported
  • kms-connector/.sqlx/query-d0b5c0db41828ffa93492074e25d8c3401f1858435db7f7dc2bc59a257616c1c.json: Language not supported
  • kms-connector/.sqlx/query-d79b1f1cd29fc807bf29d3c8ac39983aa705d35ce944449e8708aa28446625d7.json: Language not supported
  • kms-connector/.sqlx/query-dba00e47fff69c431958cbce600923bd055efcc1d89134e1dbcb5699c770028e.json: Language not supported
  • kms-connector/.sqlx/query-e0d8f33e5502609d10d0a141bb2bd20421f84523b5a88dcd3f44305effaa5c6e.json: Language not supported
  • kms-connector/.sqlx/query-e1dcc890ca2f261e88d804e3ba9982a45915fd578fda8e74986aaf7cbd560310.json: Language not supported
  • kms-connector/.sqlx/query-e77e47909be8011605f261071241f6d480bc45712ba34889f50a8c271e67035b.json: Language not supported
  • kms-connector/.sqlx/query-eb4c0459deb269abd66cd57bba8174a4548b7cba727730db8938be880f9bb748.json: Language not supported
  • kms-connector/.sqlx/query-f12b35632ebb9921e6817a061661f854aec53dd785f257d2b016d3b2d2a1707e.json: Language not supported
  • kms-connector/.sqlx/query-fb7fe0e88036441a7ccb5ae083545ef7b2178e39c6e32ead5d9a3848ef72a616.json: Language not supported

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@eudelins-zama eudelins-zama marked this pull request as ready for review December 12, 2025 10:53
@eudelins-zama eudelins-zama requested review from a team and dartdart26 as code owners December 12, 2025 10:53
@mergify
Copy link
Copy Markdown

mergify Bot commented Dec 12, 2025

🧪 CI Insights

Here's what we observed from your CI run for 4c76f87.

🟢 All jobs passed!

But CI Insights is watching 👀

@dartdart26
Copy link
Copy Markdown
Collaborator

I think what could be nice is to have a high-level description of the change as part of the PR (and commit message) as that would make it easier for the reviewer and also to track what changes have been made if going through commits later. Copilot has taken a stab at it, but I am sure you as an author would have much better context :)

Copy link
Copy Markdown
Collaborator

@dartdart26 dartdart26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I am almost ready to approve, just wanted to discuss some of the comments I put first.

Comment thread kms-connector/config/tx-sender.toml
Comment thread kms-connector/config/tx-sender.toml Outdated
Comment thread kms-connector/crates/kms-worker/src/core/event_picker/picker.rs Outdated
Comment thread kms-connector/crates/tx-sender/src/monitoring/garbage_collection.rs
@eudelins-zama
Copy link
Copy Markdown
Contributor Author

eudelins-zama commented Dec 15, 2025

I think what could be nice is to have a high-level description of the change as part of the PR (and commit message) as that would make it easier for the reviewer and also to track what changes have been made if going through commits later. Copilot has taken a stab at it, but I am sure you as an author would have much better context :)

You're right @dartdart26, done! 🙂

@eudelins-zama eudelins-zama force-pushed the eudelins/feat/garbage-collect-events branch from c974901 to 553dbfc Compare December 15, 2025 13:13
Copy link
Copy Markdown
Collaborator

@dartdart26 dartdart26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, thanks!

@eudelins-zama
Copy link
Copy Markdown
Contributor Author

@Mergifyio queue

@mergify
Copy link
Copy Markdown

mergify Bot commented Dec 15, 2025

queue

✅ The pull request has been merged automatically

Details

The pull request has been merged automatically at fa8b337

@mergify
Copy link
Copy Markdown

mergify Bot commented Dec 15, 2025

Merge Queue Status

✅ The pull request has been merged at 4c76f87

This pull request spent 25 minutes 49 seconds in the queue, including 23 minutes 37 seconds running CI.
The checks were run on draft #1575.

Required conditions to merge
  • #approved-reviews-by >= 1 [🛡 GitHub branch protection]
  • #changes-requested-reviews-by = 0 [🛡 GitHub branch protection]
  • #review-threads-unresolved = 0 [🛡 GitHub branch protection]
  • branch-protection-review-decision = APPROVED [🛡 GitHub branch protection]
  • check-success = run-e2e-tests / fhevm-e2e-test
  • any of [🛡 GitHub branch protection]:
    • check-success = common-pull-request/lint (bpr)
    • check-neutral = common-pull-request/lint (bpr)
    • check-skipped = common-pull-request/lint (bpr)
  • any of [🛡 GitHub branch protection]:
    • check-skipped = coprocessor-cargo-listener-tests/cargo-tests (bpr)
    • check-neutral = coprocessor-cargo-listener-tests/cargo-tests (bpr)
    • check-success = coprocessor-cargo-listener-tests/cargo-tests (bpr)
  • any of [🛡 GitHub branch protection]:
    • check-skipped = coprocessor-cargo-test/cargo-tests (bpr)
    • check-neutral = coprocessor-cargo-test/cargo-tests (bpr)
    • check-success = coprocessor-cargo-test/cargo-tests (bpr)
  • any of [🛡 GitHub branch protection]:
    • check-skipped = coprocessor-dependency-analysis/dependencies-check (bpr)
    • check-neutral = coprocessor-dependency-analysis/dependencies-check (bpr)
    • check-success = coprocessor-dependency-analysis/dependencies-check (bpr)
  • any of [🛡 GitHub branch protection]:
    • check-skipped = gateway-contracts-deployment-tests/sc-deploy (bpr)
    • check-neutral = gateway-contracts-deployment-tests/sc-deploy (bpr)
    • check-success = gateway-contracts-deployment-tests/sc-deploy (bpr)
  • any of [🛡 GitHub branch protection]:
    • check-success = kms-connector-tests/test-connector (bpr)
    • check-neutral = kms-connector-tests/test-connector (bpr)
    • check-skipped = kms-connector-tests/test-connector (bpr)

@mergify mergify Bot added the queued label Dec 15, 2025
mergify Bot added a commit that referenced this pull request Dec 15, 2025
@mergify mergify Bot merged commit fa8b337 into main Dec 15, 2025
65 checks passed
@mergify mergify Bot deleted the eudelins/feat/garbage-collect-events branch December 15, 2025 16:41
@mergify mergify Bot removed the queued label Dec 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants