Skip to content

Conversation

@moritzkiefer-da
Copy link
Contributor

@moritzkiefer-da moritzkiefer-da commented Oct 13, 2025

fixes https://github.com/DACH-NY/canton-network-internal/issues/1490

This essentially reverts
https://github.com/DACH-NY/canton-network-node/pull/11483 which we can
do because we switched to pausing through mediator timeout = 0.

Pull Request Checklist

Cluster Testing

  • If a cluster test is required, comment /cluster_test on this PR to request it, and ping someone with access to the DA-internal system to approve it.
  • If a hard-migration test is required (from the latest release), comment /hdm_test on this PR to request it, and ping someone with access to the DA-internal system to approve it.

PR Guidelines

  • Include any change that might be observable by our partners or affect their deployment in the release notes.
  • Specify fixed issues with Fixes #n, and mention issues worked on using #n
  • Include a screenshot for frontend-related PRs - see README or use your favorite screenshot tool

Merge Guidelines

  • Make the git commit message look sensible when squash-merging on GitHub (most likely: just copy your PR description).

@moritzkiefer-da moritzkiefer-da force-pushed the cocreature/no-rollback branch 2 times, most recently from ee99cce to 93d50d9 Compare October 13, 2025 15:55
@moritzkiefer-da moritzkiefer-da changed the title Cocreature/no rollback Don't rollback update history after migration Oct 13, 2025
@moritzkiefer-da moritzkiefer-da force-pushed the cocreature/no-rollback branch 2 times, most recently from e182f40 to c9859f8 Compare October 20, 2025 15:50
.sequence(
Seq(
sql"""
select count(*) from update_history_creates
Copy link
Contributor Author

@moritzkiefer-da moritzkiefer-da Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the queries are identical to the delete variant so essentially the same query plan as well

 Finalize Aggregate  (cost=959598.74..959598.75 rows=1 width=8)
   ->  Gather  (cost=959598.52..959598.73 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=958598.52..958598.53 rows=1 width=8)
               ->  Parallel Index Only Scan using updt_hist_crea_hi_mi_rt on update_history_creates  (cost=0.57..953136.30 rows=2184889 width=0)
                     Index Cond: ((history_id = 2) AND (migration_id = 6) AND (record_time > '1760054400000000'::bigint))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SELECT EXISTS(SELECT 1 FROM ... WHERE ...)

is probably even faster, but I don't think it's worth optimizing for maximum speed here since AFAICT we're only running these queries on actual migration/DR procedures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah also the counts are useful in logs imho. So I'd lean towards keeping it as it is.

@moritzkiefer-da moritzkiefer-da force-pushed the cocreature/no-rollback branch 2 times, most recently from 35638c4 to ee20c80 Compare October 20, 2025 16:08
[ci]

fixes https://github.com/DACH-NY/canton-network-internal/issues/1490

This essentially reverts
DACH-NY/canton-network-node#11483 which we can
do because we switched to pausing through mediator timeout = 0.

Signed-off-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org>
_ <- create(domain1, cid2, offset2, party1, store1, t1)
updates1 <- updates(store1)
ex <- recoverToExceptionIf[IllegalStateException](initStore(store2TimeTooEarly))
_ = ex.getMessage should include("Found List(1, 0, 1, 0, 0) rows")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log is a bit confusing but it's the same log format we have for the delete and I can't be bothered to fix that righ tnow.

@moritzkiefer-da moritzkiefer-da marked this pull request as ready for review October 20, 2025 17:28
Copy link
Contributor

@rautenrieth-da rautenrieth-da left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

.sequence(
Seq(
sql"""
select count(*) from update_history_creates
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SELECT EXISTS(SELECT 1 FROM ... WHERE ...)

is probably even faster, but I don't think it's worth optimizing for maximum speed here since AFAICT we're only running these queries on actual migration/DR procedures.

migrationId: Long,
recordTime: CantonTimestamp,
)(implicit tc: TraceContext): Future[Unit] = {
val action = sql"""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm somehow I hit this in tests

2025-10-20T19:38:41.004+0200 [⋮] DEBUG - o.l.s.a.a.HttpRequestLogger:SvOffboardingIntegrationTest/config=426b87db/scan=sv2Scan (48a11ed8da4f8f67570214ff1f7918c4---39233d6980013913) - HTTP POST /api/scan/v0/state/acs/force from (127.0.0.1:41912): received request. 
2025-10-20T19:38:41.004+0200 [⋮] DEBUG - o.l.s.a.a.HttpRequestLogger:SvOffboardingIntegrationTest/config=426b87db/scan=sv2Scan (48a11ed8da4f8f67570214ff1f7918c4---39233d6980013913) - HTTP POST /api/scan/v0/state/acs/force from (127.0.0.1:41912): omitting logging of request entity data. 
2025-10-20T19:38:41.023+0200 [⋮] DEBUG - o.l.s.a.a.HttpRequestLogger:SvOffboardingIntegrationTest/config=426b87db/scan=sv2Scan (48a11ed8da4f8f67570214ff1f7918c4---39233d6980013913) - HTTP POST /api/scan/v0/state/acs/force from (127.0.0.1:41912): Responding with status code: 200 OK 
2025-10-20T19:38:41.023+0200 [⋮] DEBUG - o.l.s.a.a.HttpRequestLogger:SvOffboardingIntegrationTest/config=426b87db/scan=sv2Scan (48a11ed8da4f8f67570214ff1f7918c4---39233d6980013913) - HTTP POST /api/scan/v0/state/acs/force from (127.0.0.1:41912): Responding with entity data: {"record_time":"2025-10-20T17:38:33.082427Z","migration_id":0} 
2025-10-20T19:40:39.182+0200 [⋮] INFO - o.l.s.s.a.AcsSnapshotTrigger:DecentralizedSynchronizerMigrationIntegrationTest/config=1e0ce823/scan=sv2Scan (2d7c6cb4b923a9a8d12966e2663ce745-AcsSnapshotTrigger--1a3ab5fb07777d77) - Still not time to take a snapshot. Now: 2025-10-20T17:40:39.177592Z. Next snapshot time: 2025-10-20T18:00:00Z. 
2025-10-20T19:46:21.396+0200 [⋮] INFO - c.d.c.r.DbStorageSingle:DecentralizedSynchronizerMigrationIntegrationTest/config=1e0ce823/scan=sv2ScanLocal (a642760f98f4c3fa3c393238ead07771-ledger ingestion loop--70a3dfb0faf77da1) - Detected an error. java.lang.IllegalStateException: Found 1 acs snapshots for DSO-1e0ce823-1e0ce823::1220b88707c5... where migration_id = 0 and record_time > 2025-10-20T17:42:56.116892Z, but the configuration says the domain was paused during the migration. Check the domain migration configuration and the content of the update history database

But I don't understand why, I don't see a snapshot with a higher time 😕

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2025-10-21T08:14:42.806+0200 [⋮] INFO - o.l.s.s.a.AcsSnapshotTrigger:DecentralizedSynchronizerMigrationIntegrationTest/config=6c5a71a3/scan=sv2ScanLocal (7b749ed6846de9cc6cebb9d0b4dc4423-AcsSnapshotTrigger--234cf2d4cae90320) - Still not time to take a snapshot. Now: 2025-10-21T06:14:42.805240Z. Next snapshot time: 2025-10-21T09:00:00Z. 
2025-10-21T08:14:43.449+0200 [⋮] INFO - c.d.c.r.DbStorageSingle:DecentralizedSynchronizerMigrationIntegrationTest/config=6c5a71a3/scan=sv2ScanLocal (d23d0a2bdd9e7135da49e3b4a602eb8e-ledger ingestion loop--d2bdb56a49aefe7c) - Detected an error. java.lang.IllegalStateException: Found acs snapshots at Vector(2025-10-21T09:00:00Z) for DSO-6c5a71a3-6c5a71a3::1220264dccd0... where migration_id = 0 and record_time > 2025-10-21T06:11:11.069075Z, but the configuration says the domain was paused during the migration. Check the domain migration configuration and the content of the update history database

This makes no sense to me.

@OriolMunoz-da I think I need a 🦆, am I missing something obvious here? I can't find any log that inserts this snapshot. We also are nowhere close to 09:00:00 (and this is a wallclock test not simtime). There also are no weird force calls (and even if they were they would not be at 09:00:00). Is there some codepath I'm missing that creates this snapshot here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed with @OriolMunoz-da: this is a bug in the acs snapshot logic. He's looking into fixing it

[ci]

Signed-off-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org>
Copy link
Contributor

@nicu-da nicu-da left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! thanks

[ci]

---------

Signed-off-by: Oriol Muñoz <oriol.munoz@digitalasset.com>
Signed-off-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org>
@moritzkiefer-da moritzkiefer-da enabled auto-merge (squash) October 21, 2025 11:59
@moritzkiefer-da moritzkiefer-da merged commit ee48e61 into main Oct 21, 2025
59 checks passed
@moritzkiefer-da moritzkiefer-da deleted the cocreature/no-rollback branch October 21, 2025 12:19
moritzkiefer-da added a commit that referenced this pull request Oct 21, 2025
backport from #2674

We technically only need the change to the export format on 0.4 but
seems safer to backport the whole PR.

[ci]

---------

Signed-off-by: Oriol Muñoz <oriol.munoz@digitalasset.com>
Signed-off-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org>

---------

Signed-off-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org>
Signed-off-by: Oriol Muñoz <oriol.munoz@digitalasset.com>
Co-authored-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org>
Co-authored-by: Oriol Muñoz <oriol.munoz@digitalasset.com>
Signed-off-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org>
moritzkiefer-da added a commit that referenced this pull request Oct 21, 2025
* Don't rollback update history after migration

backport from #2674

We technically only need the change to the export format on 0.4 but
seems safer to backport the whole PR.

[ci]

---------

Signed-off-by: Oriol Muñoz <oriol.munoz@digitalasset.com>
Signed-off-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org>

---------

Signed-off-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org>
Signed-off-by: Oriol Muñoz <oriol.munoz@digitalasset.com>
Co-authored-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org>
Co-authored-by: Oriol Muñoz <oriol.munoz@digitalasset.com>
Signed-off-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org>

* Sneak in release notes

[ci]

Signed-off-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org>

---------

Signed-off-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org>
Signed-off-by: Oriol Muñoz <oriol.munoz@digitalasset.com>
Co-authored-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org>
Co-authored-by: Oriol Muñoz <oriol.munoz@digitalasset.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants