
Conversation

@rautenrieth-da
Contributor

[ci]

Signed-off-by: Robert Autenrieth <robert.autenrieth@digitalasset.com>
@rautenrieth-da rautenrieth-da force-pushed the rautenrieth-da/lock-table-for-acs-snapshots branch from 7717d5b to 1c60bd0 Compare September 26, 2025 15:37
@rautenrieth-da rautenrieth-da marked this pull request as ready for review September 26, 2025 16:30
Contributor

@meiersi-da meiersi-da left a comment

Thanks. Looks good to me, but I'm not deep enough in the code to judge whether it is really safe.

[ci]

Signed-off-by: Robert Autenrieth <robert.autenrieth@digitalasset.com>
Contributor

@ray-roestenburg-da ray-roestenburg-da left a comment

I don't like locks so you need to convince me there is no other option :-) (looks like I misread 'lock' vs 'advisory lock')

* In case the application crashes while holding the lock, the server should close the connection
* and abort the transaction as soon as it detects a disconnect.
* See [[com.digitalasset.canton.platform.store.backend.postgresql.PostgresDataSourceConfig]] for our connection keepalive settings.
* With default settings, the server should detect a dead connection within ~15sec.
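
For context, a minimal sketch (not the PR's actual code) of the behavior the comment above relies on. The wrapper name, lock key, and connection handling are hypothetical; pg_try_advisory_xact_lock is a standard PostgreSQL function, and the lock it takes is released when the transaction commits or aborts, including when the server aborts the transaction of a dead connection.

import java.sql.Connection

// Sketch only: take a transaction-scoped advisory lock before writing the
// ACS snapshot. If the client dies, the server's abort of the transaction
// releases the lock without any cleanup on the application side.
def withSnapshotLock[A](conn: Connection)(body: => A): A = {
  conn.setAutoCommit(false)
  val stmt = conn.prepareStatement("SELECT pg_try_advisory_xact_lock(?)")
  try {
    stmt.setLong(1, 0x5ca90001L) // hypothetical lock key for the snapshot table
    val rs = stmt.executeQuery()
    rs.next()
    if (!rs.getBoolean(1))
      throw new Exception("Failed to acquire exclusive lock") // analogous to the error seen later in this thread
    val result = body
    conn.commit() // commit releases the advisory lock
    result
  } catch {
    case e: Throwable =>
      conn.rollback() // rollback releases it as well
      throw e
  } finally stmt.close()
}
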
Contributor

will it though? We've seen queries stuck for longer than 15s after a pod kill/restart

Contributor

@nmarton-da: I remember that there was some trickery around keep alives that you figured out as part of the HA config for the IndexDB. Something around clients and servers having to set the right params? Do you remember?

Contributor

Actually, this ChatGPT convo has quite a bit of interesting info: https://chatgpt.com/share/68dbb7bd-48c8-8007-85e9-ae43fa3a5b54

Seems like it's worth checking our keep-alive configs for server and client -- independently of this PR.
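
For reference, and independent of this PR: a hedged sketch of the kind of settings involved. The property and parameter names below are standard PgJDBC and PostgreSQL settings; the values and the way they are wired up are illustrative assumptions, not Canton's actual PostgresDataSourceConfig.

import java.util.Properties
import java.sql.DriverManager

// Client side (PgJDBC): keepalive probes plus a bound on reads stuck on a dead socket.
val props = new Properties()
props.setProperty("user", "canton")       // illustrative credentials
props.setProperty("password", "secret")
props.setProperty("tcpKeepAlive", "true") // send TCP keepalive probes (off by default)
props.setProperty("socketTimeout", "30")  // give up on a read stuck for more than 30s
val conn = DriverManager.getConnection("jdbc:postgresql://db:5432/canton", props)

// Server side, set once by a DBA (SQL shown for reference):
//   ALTER SYSTEM SET tcp_keepalives_idle     = 5;  -- seconds of idle before the first probe
//   ALTER SYSTEM SET tcp_keepalives_interval = 5;  -- seconds between probes
//   ALTER SYSTEM SET tcp_keepalives_count    = 3;  -- failed probes before the backend is dropped
// With these example values the server would notice a dead client after roughly
// 5 + 5 * 3 = 20 seconds; the ~15sec figure quoted above depends on the defaults
// Canton actually ships.

Both directions matter: the server-side parameters determine how quickly the server notices a dead client and aborts the backend holding the lock, while the client-side ones bound how long a restarted application can hang on queries over a dead socket.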

Contributor Author

Added #2488 for double-checking the disconnect behavior; I'm leaving the advisory lock in place. We'd rather have stray locks than a corrupted database.

[ci]

Signed-off-by: Robert Autenrieth <robert.autenrieth@digitalasset.com>
Contributor

@ray-roestenburg-da ray-roestenburg-da left a comment

Thanks!

@rautenrieth-da rautenrieth-da merged commit a3bd719 into main Oct 1, 2025
57 checks passed
@rautenrieth-da rautenrieth-da deleted the rautenrieth-da/lock-table-for-acs-snapshots branch October 1, 2025 21:11
@rautenrieth-da
Contributor Author

Actual behavior on CILR:

2025-10-02T07:11:52.069Z: trigger runs (never finishing)
2025-10-02T07:13:18.317Z: scan shuts down
2025-10-02T07:13:51.000Z: scan restarts
2025-10-02T07:15:14.951Z: trigger runs again on the same snapshot record time
2025-10-02T07:15:14.973Z: trigger fails with java.lang.Exception: Failed to acquire exclusive lock (not retrying the error)
2025-10-02T07:15:46.134Z: trigger runs again on the same snapshot record time
2025-10-02T07:19:12.841Z: trigger succeeds

I think that's enough to conclude that it takes a few minutes to release the lock, potentially leading to log warnings, but the trigger does not deadlock itself.
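
A hedged diagnostic sketch for the "stray locks" scenario (standard PostgreSQL catalog views, nothing Canton-specific): listing advisory locks together with the state of the backends holding them is how one would confirm that a lock left behind by a killed pod disappears once the server reaps the dead backend.

import java.sql.Connection

// List all advisory locks and the backends that hold them.
def listAdvisoryLocks(conn: Connection): Unit = {
  val rs = conn.createStatement().executeQuery(
    """SELECT l.pid, l.classid, l.objid, l.granted, a.state
      |FROM pg_locks l
      |JOIN pg_stat_activity a ON a.pid = l.pid
      |WHERE l.locktype = 'advisory'""".stripMargin
  )
  while (rs.next())
    println(
      s"pid=${rs.getInt("pid")} key=(${rs.getLong("classid")},${rs.getLong("objid")}) " +
        s"granted=${rs.getBoolean("granted")} state=${rs.getString("state")}"
    )
}

If a lock lingers for a backend whose client is long gone, a DBA can release it immediately with pg_terminate_backend(pid); otherwise, as the timeline above shows, waiting for the server to notice the dead connection is also enough.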
