feat(kms-connector): garbage collection#1561
Conversation
There was a problem hiding this comment.
Pull request overview
This PR implements a garbage collection feature for the KMS connector to automatically clean up old database entries. The key changes include:
- Replacing the
under_processboolean field with anoperation_statusenum that supports four states:pending,under_process,completed, andfailed - Adding
updated_attimestamps to track when operations are last modified - Implementing garbage collection routines that delete old completed/failed operations and unlock stuck operations
- Adding status update methods to
KmsResponseandGatewayEventtypes - Removing the
KmsResponseRemoverabstraction in favor of direct database updates via status methods
Reviewed changes
Copilot reviewed 34 out of 91 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| connector-db/migrations/20251208161022_implem_garbage_collection.sql | Adds the database schema changes including the operation_status enum, status columns, updated_at columns, indexes, and triggers |
| crates/utils/src/types/db.rs | Defines the OperationStatus enum type with sqlx annotations |
| crates/utils/src/types/kms_response.rs | Adds status update methods (mark_as_pending, mark_as_completed, mark_as_failed) to KmsResponse |
| crates/utils/src/types/gw_event.rs | Adds status update methods to GatewayEvent and refactors update logic |
| crates/tx-sender/src/monitoring/gc.rs | Implements garbage collection routines for deleting old operations and unlocking stuck operations |
| crates/tx-sender/src/core/tx_sender.rs | Updates to use the new status methods instead of the removed KmsResponseRemover |
| crates/tx-sender/src/core/kms_response_remover.rs | File removed - functionality moved to status methods |
| crates/tx-sender/src/core/config/*.rs | Adds configuration parameters for garbage collection intervals and expiry times |
| Test files | Updates test helpers to accept status parameter and verify completed status instead of deletion |
| .sqlx/*.json | Updated query metadata files reflecting the new schema |
Files not reviewed (57)
- kms-connector/.sqlx/query-05206597d00f58583f577f3d54ddd767c5696857085178cca25df687aa0cd39c.json: Language not supported
- kms-connector/.sqlx/query-083e8a8de715c2a8dc806edb5aad8c9f92d1087059e7531e6232fde40e633f4e.json: Language not supported
- kms-connector/.sqlx/query-0c7e1f45677401d5bf8d0dce8849637a4b533c4371b1620f9259529a1bfebc27.json: Language not supported
- kms-connector/.sqlx/query-120c7bc6586f02642e5eaaa38471e8f20326fa5200cda61851355ea0f84b699e.json: Language not supported
- kms-connector/.sqlx/query-14c8041a04281d0731559221e38686feb8a4acea36e1540bda75267c6412f4b0.json: Language not supported
- kms-connector/.sqlx/query-179550430d054e86be5aa2295d4eb2aa7db2ed9a940a8ac1047c29a17ef661b5.json: Language not supported
- kms-connector/.sqlx/query-1e4e2b8d09524458504e3969d8704fd31be45dcece212c7c7b8c5d9023a1be11.json: Language not supported
- kms-connector/.sqlx/query-206a41e2c3ef34b4820cb4e7d271be51316580b3b2f2de67e15d2e6bcab12946.json: Language not supported
- kms-connector/.sqlx/query-206d6c0bc40462ad05e063114c98d9081f918b663269baa8b9ed233c918b6d31.json: Language not supported
- kms-connector/.sqlx/query-2350ac60a071822d470ac847cd54924059c23e74005f04d249cbc4cef0b74ac6.json: Language not supported
- kms-connector/.sqlx/query-2c1c41025c09656ec4aa9a6e8ee58b52cadbc7039cf07d158f70dfc46b5e06eb.json: Language not supported
- kms-connector/.sqlx/query-313949db8a6fe92e8b835fd03c177bd7da78bc55ac924f330207ba6d303be033.json: Language not supported
- kms-connector/.sqlx/query-3d3c5a0bf214ee773ecde5d94480575d3d46afe6549da42deee242388984dfb9.json: Language not supported
- kms-connector/.sqlx/query-3daeba4fbf77d703a538ae81bc4801f30455419dda08b31885f8d2d1ff924038.json: Language not supported
- kms-connector/.sqlx/query-44bb0537af9736b1121d933649a456ef6f0921b6ea86d05d76fd48d4bffe7db6.json: Language not supported
- kms-connector/.sqlx/query-4d063e25b766d3c96b516388f0682b12fd16ab2b6f0601d702c099c9725087b2.json: Language not supported
- kms-connector/.sqlx/query-4d8fef33610f8baf96a2da80dec76e23423aa66331b1315c5c74a06d8a4af120.json: Language not supported
- kms-connector/.sqlx/query-536fab7b8d99df255619355b084cdb4de9279b5ec84ad4d79772d326b1099fc6.json: Language not supported
- kms-connector/.sqlx/query-55e39fce20ced8f64418404aedd2d8c2d764df3c5fe9680415e9c8ced391c87d.json: Language not supported
- kms-connector/.sqlx/query-59428c72c6afbb794abc7a14db053e575e50199aadb1fc8bfec5b68a5bcb4394.json: Language not supported
- kms-connector/.sqlx/query-5d0b62324d38394bc77e5a12024b0875102ee676ca73bfcacaf0945ed3d10d51.json: Language not supported
- kms-connector/.sqlx/query-60f58397e8588260d3c79b89bfdef430f7bcdfba000a3166c7d68fd54908f6c9.json: Language not supported
- kms-connector/.sqlx/query-69196b39e59fe6f9e449f7cd1f30fe6117c9670d56a08f64e49e3d9a7b109fb9.json: Language not supported
- kms-connector/.sqlx/query-70c9f021267ee3989cfc313cd95f1aa1af7163fda39518bf6c9f9dc4a10911f1.json: Language not supported
- kms-connector/.sqlx/query-72b5875fec8f017644c045dc7897fdc6798aabf9da3016fad71d0f3227d26e8e.json: Language not supported
- kms-connector/.sqlx/query-775f56607468e59d6f88ed64b2bd789c18b3d7ba68140897d3a6f7729d56e61c.json: Language not supported
- kms-connector/.sqlx/query-7bc37c994ea75c732017ff6c4eb36a56d23191c2a428af3e8847fde85a701b36.json: Language not supported
- kms-connector/.sqlx/query-7f74ef6dec3a6111b0f0e31c74caf0d93877520081dd428c8bebc167a4e87104.json: Language not supported
- kms-connector/.sqlx/query-7f96aa6c8b04c991cbffa6548229e57039a9137d1a6b96a7b677b41382b6f1a7.json: Language not supported
- kms-connector/.sqlx/query-820e14ed1c8a9ec5a5462025a871aedc0b319f207cb55b80886f29a3f4e2e63b.json: Language not supported
- kms-connector/.sqlx/query-828ed7aa3de4737fcc54dbb2ffdf07098ab6ef7810a1f27ac7d7e0b5a083db69.json: Language not supported
- kms-connector/.sqlx/query-856389ba8fa80561c2a90c6fb02d77615ee97e14beea56eb9827e24a105cd304.json: Language not supported
- kms-connector/.sqlx/query-88d0f1470e61797d49940ea36c9155e37cc3f0c622de57701c237ccdf359e7d0.json: Language not supported
- kms-connector/.sqlx/query-93db5ac72e3acc12af313048a9d900371e903a45c1c172af2633129033aae5e1.json: Language not supported
- kms-connector/.sqlx/query-9424450a5fe75af91766f2dfe3463e81e8872ed262b34a6d6865af21f24ba3da.json: Language not supported
- kms-connector/.sqlx/query-9b355f759f69f02cca8a740911cd9a1b55d495c5c7644364bfaca734cb835fc9.json: Language not supported
- kms-connector/.sqlx/query-9b9d3bbd21139dbb78601c4374e14bc118c3a39a06d682f82db7310e8d8c8cf7.json: Language not supported
- kms-connector/.sqlx/query-9e6abfc0df77b3587972fc590ee888338d1e9a3d4f2bdb506d8588319adad879.json: Language not supported
- kms-connector/.sqlx/query-a44b1d4e88190c3319559b50ae6cd8ade6b26b04b71e9a90b4a64bbf48a9bff8.json: Language not supported
- kms-connector/.sqlx/query-ac8229a3ceb462177827da9963f51aaced2c230e0ae6603866711ca949d4a99f.json: Language not supported
- kms-connector/.sqlx/query-ac82565c4c15f51c6df30bd691519619bb6ff80098fe4e077ed2b43cf6ac525a.json: Language not supported
- kms-connector/.sqlx/query-af73e253d2f8f12077e59607c44ff182930372d1452c32330464c911c970754d.json: Language not supported
- kms-connector/.sqlx/query-b3ee6c99edad54e3ef0f8a19bdc036142463f678111f32cf1ae8607e4a281634.json: Language not supported
- kms-connector/.sqlx/query-bac064549047cb319d62b7d3a639cfc3933fc455e8e97407aacba220f251a61f.json: Language not supported
- kms-connector/.sqlx/query-be7952ac7f042f7b4756783cb2c40cdc4b92eb360bc5ae95bb3ce3805a3c0a00.json: Language not supported
- kms-connector/.sqlx/query-c4846436287bb74b8286442658da646740df133377bd8ffdb4e3451283ced0b7.json: Language not supported
- kms-connector/.sqlx/query-cf58df3893aa66257170af9712e22096ef64d0525678bc97a883209d6ec39445.json: Language not supported
- kms-connector/.sqlx/query-cf60be82db94427c2ec3236b82ad353f0b8b222e7a3453b43b2f7542f9c8ea62.json: Language not supported
- kms-connector/.sqlx/query-d0b5c0db41828ffa93492074e25d8c3401f1858435db7f7dc2bc59a257616c1c.json: Language not supported
- kms-connector/.sqlx/query-d79b1f1cd29fc807bf29d3c8ac39983aa705d35ce944449e8708aa28446625d7.json: Language not supported
- kms-connector/.sqlx/query-dba00e47fff69c431958cbce600923bd055efcc1d89134e1dbcb5699c770028e.json: Language not supported
- kms-connector/.sqlx/query-e0d8f33e5502609d10d0a141bb2bd20421f84523b5a88dcd3f44305effaa5c6e.json: Language not supported
- kms-connector/.sqlx/query-e1dcc890ca2f261e88d804e3ba9982a45915fd578fda8e74986aaf7cbd560310.json: Language not supported
- kms-connector/.sqlx/query-e77e47909be8011605f261071241f6d480bc45712ba34889f50a8c271e67035b.json: Language not supported
- kms-connector/.sqlx/query-eb4c0459deb269abd66cd57bba8174a4548b7cba727730db8938be880f9bb748.json: Language not supported
- kms-connector/.sqlx/query-f12b35632ebb9921e6817a061661f854aec53dd785f257d2b016d3b2d2a1707e.json: Language not supported
- kms-connector/.sqlx/query-fb7fe0e88036441a7ccb5ae083545ef7b2178e39c6e32ead5d9a3848ef72a616.json: Language not supported
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
🧪 CI InsightsHere's what we observed from your CI run for 4c76f87. 🟢 All jobs passed!But CI Insights is watching 👀 |
|
I think what could be nice is to have a high-level description of the change as part of the PR (and commit message) as that would make it easier for the reviewer and also to track what changes have been made if going through commits later. Copilot has taken a stab at it, but I am sure you as an author would have much better context :) |
dartdart26
left a comment
There was a problem hiding this comment.
Nice work! I am almost ready to approve, just wanted to discuss some of the comments I put first.
You're right @dartdart26, done! 🙂 |
c974901 to
553dbfc
Compare
|
@Mergifyio queue |
✅ The pull request has been merged automaticallyDetailsThe pull request has been merged automatically at fa8b337 |
Merge Queue Status✅ The pull request has been merged at 4c76f87 This pull request spent 25 minutes 49 seconds in the queue, including 23 minutes 37 seconds running CI. Required conditions to merge
|
Closes https://github.com/zama-ai/fhevm-internal/issues/749
DB changes
This PR adds the
status(custom enum) andupdated_at(timestamp) fields while removing theunder_processfield of all the requests/responses tables.The
statusfield can be:pending: waiting to be processedunder_process: currently processed, so the request/response is "locked"completed(will be deleted later by the garbage collector)failed(will be deleted later by the garbage collector)Behavior changes
Request/Response deletion
Before this PR, the kms-connector tried to immediatly delete request/response once it has been processed.
Now it will instead mark it as
completed(orfailed), and a garbage collector will handle them later on.The garbage collector will:
As we want to run a single instance of this garbage collector, it is running alongside the
TransactionSender, as it is the only micro-service of the kms-connector that should also be a single instance.Decryption Request/Response unlocking
Before this PR, if a
KmsWorkerpicks and locks a decryption request from the DB to process it, if it crashes or is restarted for some reason, then this request will be locked forever and never processed.Now the garbage collector will unlock the request if it is in a
under_processstatus for too long.Same behavior for the
TransactionSenderand decryption responses.Note that we only unlock decryption requests/responses and not the other, because as explained above, they are related to keygen/init of the KMS, so are pretty rare, and under the protocol's control. Thus it is easier to manually investigate what is happening in such case, and manually unlock or re-trigger requests if needed. That way, it avoid running multiple SQL queries that will be useless 99.99% of the time.
Code changes
I removed the
KmsResponseRemoverabstraction. All of these abstractions were mostly created in the 1st place because a migration from Postgres DB to Kafka was something to consider.But IMO, it seems less probable these days, and it was easier/cleaner to achieve the update of the
KmsResponsestatususing method than a dedicated struct to pass everywhere.