-
Notifications
You must be signed in to change notification settings - Fork 59
Don't rollback update history after migration #2674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ee99cce to
93d50d9
Compare
e182f40 to
c9859f8
Compare
| .sequence( | ||
| Seq( | ||
| sql""" | ||
| select count(*) from update_history_creates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the queries are identical to the delete variant so essentially the same query plan as well
Finalize Aggregate (cost=959598.74..959598.75 rows=1 width=8)
-> Gather (cost=959598.52..959598.73 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=958598.52..958598.53 rows=1 width=8)
-> Parallel Index Only Scan using updt_hist_crea_hi_mi_rt on update_history_creates (cost=0.57..953136.30 rows=2184889 width=0)
Index Cond: ((history_id = 2) AND (migration_id = 6) AND (record_time > '1760054400000000'::bigint))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SELECT EXISTS(SELECT 1 FROM ... WHERE ...)is probably even faster, but I don't think it's worth optimizing for maximum speed here since AFAICT we're only running these queries on actual migration/DR procedures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah also the counts are useful in logs imho. So I'd lean towards keeping it as it is.
35638c4 to
ee20c80
Compare
[ci] fixes https://github.com/DACH-NY/canton-network-internal/issues/1490 This essentially reverts DACH-NY/canton-network-node#11483 which we can do because we switched to pausing through mediator timeout = 0. Signed-off-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org>
ee20c80 to
2ea4976
Compare
| _ <- create(domain1, cid2, offset2, party1, store1, t1) | ||
| updates1 <- updates(store1) | ||
| ex <- recoverToExceptionIf[IllegalStateException](initStore(store2TimeTooEarly)) | ||
| _ = ex.getMessage should include("Found List(1, 0, 1, 0, 0) rows") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log is a bit confusing but it's the same log format we have for the delete and I can't be bothered to fix that righ tnow.
rautenrieth-da
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
| .sequence( | ||
| Seq( | ||
| sql""" | ||
| select count(*) from update_history_creates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SELECT EXISTS(SELECT 1 FROM ... WHERE ...)is probably even faster, but I don't think it's worth optimizing for maximum speed here since AFAICT we're only running these queries on actual migration/DR procedures.
| migrationId: Long, | ||
| recordTime: CantonTimestamp, | ||
| )(implicit tc: TraceContext): Future[Unit] = { | ||
| val action = sql""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm somehow I hit this in tests
2025-10-20T19:38:41.004+0200 [⋮] DEBUG - o.l.s.a.a.HttpRequestLogger:SvOffboardingIntegrationTest/config=426b87db/scan=sv2Scan (48a11ed8da4f8f67570214ff1f7918c4---39233d6980013913) - HTTP POST /api/scan/v0/state/acs/force from (127.0.0.1:41912): received request.
2025-10-20T19:38:41.004+0200 [⋮] DEBUG - o.l.s.a.a.HttpRequestLogger:SvOffboardingIntegrationTest/config=426b87db/scan=sv2Scan (48a11ed8da4f8f67570214ff1f7918c4---39233d6980013913) - HTTP POST /api/scan/v0/state/acs/force from (127.0.0.1:41912): omitting logging of request entity data.
2025-10-20T19:38:41.023+0200 [⋮] DEBUG - o.l.s.a.a.HttpRequestLogger:SvOffboardingIntegrationTest/config=426b87db/scan=sv2Scan (48a11ed8da4f8f67570214ff1f7918c4---39233d6980013913) - HTTP POST /api/scan/v0/state/acs/force from (127.0.0.1:41912): Responding with status code: 200 OK
2025-10-20T19:38:41.023+0200 [⋮] DEBUG - o.l.s.a.a.HttpRequestLogger:SvOffboardingIntegrationTest/config=426b87db/scan=sv2Scan (48a11ed8da4f8f67570214ff1f7918c4---39233d6980013913) - HTTP POST /api/scan/v0/state/acs/force from (127.0.0.1:41912): Responding with entity data: {"record_time":"2025-10-20T17:38:33.082427Z","migration_id":0}
2025-10-20T19:40:39.182+0200 [⋮] INFO - o.l.s.s.a.AcsSnapshotTrigger:DecentralizedSynchronizerMigrationIntegrationTest/config=1e0ce823/scan=sv2Scan (2d7c6cb4b923a9a8d12966e2663ce745-AcsSnapshotTrigger--1a3ab5fb07777d77) - Still not time to take a snapshot. Now: 2025-10-20T17:40:39.177592Z. Next snapshot time: 2025-10-20T18:00:00Z.
2025-10-20T19:46:21.396+0200 [⋮] INFO - c.d.c.r.DbStorageSingle:DecentralizedSynchronizerMigrationIntegrationTest/config=1e0ce823/scan=sv2ScanLocal (a642760f98f4c3fa3c393238ead07771-ledger ingestion loop--70a3dfb0faf77da1) - Detected an error. java.lang.IllegalStateException: Found 1 acs snapshots for DSO-1e0ce823-1e0ce823::1220b88707c5... where migration_id = 0 and record_time > 2025-10-20T17:42:56.116892Z, but the configuration says the domain was paused during the migration. Check the domain migration configuration and the content of the update history database
But I don't understand why, I don't see a snapshot with a higher time 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2025-10-21T08:14:42.806+0200 [⋮] INFO - o.l.s.s.a.AcsSnapshotTrigger:DecentralizedSynchronizerMigrationIntegrationTest/config=6c5a71a3/scan=sv2ScanLocal (7b749ed6846de9cc6cebb9d0b4dc4423-AcsSnapshotTrigger--234cf2d4cae90320) - Still not time to take a snapshot. Now: 2025-10-21T06:14:42.805240Z. Next snapshot time: 2025-10-21T09:00:00Z.
2025-10-21T08:14:43.449+0200 [⋮] INFO - c.d.c.r.DbStorageSingle:DecentralizedSynchronizerMigrationIntegrationTest/config=6c5a71a3/scan=sv2ScanLocal (d23d0a2bdd9e7135da49e3b4a602eb8e-ledger ingestion loop--d2bdb56a49aefe7c) - Detected an error. java.lang.IllegalStateException: Found acs snapshots at Vector(2025-10-21T09:00:00Z) for DSO-6c5a71a3-6c5a71a3::1220264dccd0... where migration_id = 0 and record_time > 2025-10-21T06:11:11.069075Z, but the configuration says the domain was paused during the migration. Check the domain migration configuration and the content of the update history database
This makes no sense to me.
@OriolMunoz-da I think I need a 🦆, am I missing something obvious here? I can't find any log that inserts this snapshot. We also are nowhere close to 09:00:00 (and this is a wallclock test not simtime). There also are no weird force calls (and even if they were they would not be at 09:00:00). Is there some codepath I'm missing that creates this snapshot here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed with @OriolMunoz-da: this is a bug in the acs snapshot logic. He's looking into fixing it
[ci] Signed-off-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org>
nicu-da
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! thanks
[ci] --------- Signed-off-by: Oriol Muñoz <oriol.munoz@digitalasset.com> Signed-off-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org>
328fdd5 to
2557922
Compare
backport from #2674 We technically only need the change to the export format on 0.4 but seems safer to backport the whole PR. [ci] --------- Signed-off-by: Oriol Muñoz <oriol.munoz@digitalasset.com> Signed-off-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org> --------- Signed-off-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org> Signed-off-by: Oriol Muñoz <oriol.munoz@digitalasset.com> Co-authored-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org> Co-authored-by: Oriol Muñoz <oriol.munoz@digitalasset.com> Signed-off-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org>
* Don't rollback update history after migration backport from #2674 We technically only need the change to the export format on 0.4 but seems safer to backport the whole PR. [ci] --------- Signed-off-by: Oriol Muñoz <oriol.munoz@digitalasset.com> Signed-off-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org> --------- Signed-off-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org> Signed-off-by: Oriol Muñoz <oriol.munoz@digitalasset.com> Co-authored-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org> Co-authored-by: Oriol Muñoz <oriol.munoz@digitalasset.com> Signed-off-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org> * Sneak in release notes [ci] Signed-off-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org> --------- Signed-off-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org> Signed-off-by: Oriol Muñoz <oriol.munoz@digitalasset.com> Co-authored-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org> Co-authored-by: Oriol Muñoz <oriol.munoz@digitalasset.com>
fixes https://github.com/DACH-NY/canton-network-internal/issues/1490
This essentially reverts
https://github.com/DACH-NY/canton-network-node/pull/11483 which we can
do because we switched to pausing through mediator timeout = 0.
Pull Request Checklist
Cluster Testing
/cluster_teston this PR to request it, and ping someone with access to the DA-internal system to approve it./hdm_teston this PR to request it, and ping someone with access to the DA-internal system to approve it.PR Guidelines
Fixes #n, and mention issues worked on using#nMerge Guidelines