Skip to content

Commit 193a1fb

Browse files
authored
AckSn loop after de/commit is finalized on leader node (#2510)
## Fix: Prevent Infinite AckSn Loop After De/commit ### Problem After an incremental decommit, snapshot confirmation could get stuck in an infinite loop where `AckSn` messages were requeued repeatedly, preventing the snapshot from ever reaching `SnapshotConfirmed` state and blocking all subsequent snapshots. **Root Cause:** Race condition on the leader node when `DecommitFinalized` arrives before `AckSn` messages: 1. Leader requests snapshot N with decommit → state becomes `RequestedSnapshot{lastSeen=N-1, requested=N}` 2. `DecrementTx` is observed on-chain before `AckSn` messages arrive 3. `DecommitFinalized` aggregate incorrectly used `seenSnapshotNumber()` which returned `lastSeen` (N-1) instead of `requested` (N) 4. When `AckSn(N)` messages arrived, they failed the guard check `N <= N-1` and were requeued infinitely Same scenarion goes also for incremental commits. ### Solution Modified the `De/commitFinalized` aggregate in `HeadLogic.hs` to use explicit pattern matching instead of `seenSnapshotNumber()`: - `NoSeenSnapshot` → `LastSeenSnapshot{lastSeen = 0}` - `LastSeenSnapshot{lastSeen}` → `LastSeenSnapshot{lastSeen}` (unchanged) - `RequestedSnapshot{requested}` → `LastSeenSnapshot{lastSeen = requested}` ✅ **The fix** - `SeenSnapshot{snapshot}` → `LastSeenSnapshot{lastSeen = number}` The key insight: when the leader is in `RequestedSnapshot` state and `DecommitFinalized` arrives, the decommit belongs to the **requested** snapshot (N), not the previously confirmed snapshot (N-1). --- <!-- Consider each and tick it off one way or the other --> * [x] CHANGELOG updated or not needed * [x] Documentation updated or not needed * [x] Haddocks updated or not needed * [x] No new TODOs introduced or explained herafter
2 parents cc5e1c0 + 8f97f57 commit 193a1fb

File tree

3 files changed

+154
-9
lines changed

3 files changed

+154
-9
lines changed

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -45,6 +45,7 @@ changes.
4545
- `POST /snapshot` now returns the specific side-load validation failure instead of timing out [#2462](https://github.com/cardano-scaling/hydra/issues/2462).
4646
- Fixed the internal wallet fee estimation, which was more often than not using maximum plutus execution units. This reduces costs for initializing, open, etc. of a head by a factor of ~4x [#2473](https://github.com/cardano-scaling/hydra/pull/2473).
4747
- Fixed another race-condition around incremental commits/decommits [#2500](https://github.com/cardano-scaling/hydra/issues/2500)
48+
- Fixed infinite AckSn requeue loop after decommit when DecommitFinalized arrives before snapshot confirmation on the leader node [#2510](https://github.com/cardano-scaling/hydra/pull/2510)
4849
- **BREAKING** Improved reporting of chain synchronization status by exposing the node's chain time and drift.
4950
- `NodeSynced` and `NodeUnsynced` state-changed events, and their corresponding server outputs, now include the observed chain time and drift.
5051
- `NodeState` now tracks the latest observed chan slot in addition to the chain time (`UTCTime`) and its drift measured in seconds.

hydra-node/src/Hydra/HeadLogic.hs

Lines changed: 21 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1787,10 +1787,7 @@ aggregateNodeState nodeState sc =
17871787
-- depositTxId, but we should not verify this here.
17881788
currentDepositTxId = Nothing
17891789
, localUTxO = localUTxO <> newUTxO
1790-
, seenSnapshot =
1791-
LastSeenSnapshot
1792-
{ lastSeen = seenSnapshotNumber (seenSnapshot coordinatedHeadState)
1793-
}
1790+
, seenSnapshot = toLastSeenSnapshot (seenSnapshot coordinatedHeadState)
17941791
}
17951792
}
17961793
, pendingDeposits = Map.delete depositTxId currentPendingDeposits
@@ -1815,6 +1812,25 @@ aggregateNodeState nodeState sc =
18151812

18161813
-- * HeadState aggregate
18171814

1815+
-- | Convert any SeenSnapshot to LastSeenSnapshot, preserving the correct snapshot number.
1816+
-- Used by CommitFinalized and DecommitFinalized to handle race conditions where
1817+
-- on-chain transaction confirmation happens before AckSn messages arrive.
1818+
toLastSeenSnapshot :: SeenSnapshot tx -> SeenSnapshot tx
1819+
toLastSeenSnapshot = \case
1820+
NoSeenSnapshot ->
1821+
LastSeenSnapshot{lastSeen = 0}
1822+
LastSeenSnapshot{lastSeen} ->
1823+
LastSeenSnapshot{lastSeen}
1824+
-- NB: Use 'requested' not 'lastSeen' to prevent infinite AckSn loop.
1825+
-- When leader requests snapshot N with commit/decommit and the on-chain transaction
1826+
-- is observed before AckSn messages arrive, the transaction is part of snapshot N
1827+
-- (requested), not snapshot N-1 (lastSeen). Using lastSeen would cause AckSn(N)
1828+
-- messages to fail the guard check and be requeued infinitely.
1829+
RequestedSnapshot{requested} ->
1830+
LastSeenSnapshot{lastSeen = requested}
1831+
SeenSnapshot{snapshot = Snapshot{number}} ->
1832+
LastSeenSnapshot{lastSeen = number}
1833+
18181834
-- | Reflect 'StateChanged' events onto the 'HeadState' aggregate.
18191835
aggregate :: IsChainState tx => HeadState tx -> StateChanged tx -> HeadState tx
18201836
aggregate st = \case
@@ -2034,10 +2050,7 @@ aggregate st = \case
20342050
coordinatedHeadState
20352051
{ decommitTx = Nothing
20362052
, version = newVersion
2037-
, seenSnapshot =
2038-
LastSeenSnapshot
2039-
{ lastSeen = seenSnapshotNumber (seenSnapshot coordinatedHeadState)
2040-
}
2053+
, seenSnapshot = toLastSeenSnapshot (seenSnapshot coordinatedHeadState)
20412054
}
20422055
}
20432056
_otherState -> st

hydra-node/test/Hydra/HeadLogicSpec.hs

Lines changed: 132 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -379,10 +379,13 @@ spec =
379379
let s1 = aggregateState s0 decommitFinalizedOutcome
380380

381381
-- Verify seenSnapshot was reset (not stuck as RequestedSnapshot)
382+
-- After DecommitFinalized, lastSeen should be the snapshot number that
383+
-- included the decommit (snapshot 1), not the previously confirmed snapshot (0).
384+
-- This prevents incoming AckSn messages for snapshot 1 from being requeued infinitely.
382385
case s1 of
383386
NodeInSync{headState = Open OpenState{coordinatedHeadState = chs}} -> do
384387
chs.version `shouldBe` 4
385-
chs.seenSnapshot `shouldBe` LastSeenSnapshot{lastSeen = 0}
388+
chs.seenSnapshot `shouldBe` LastSeenSnapshot{lastSeen = 1}
386389
chs.decommitTx `shouldBe` Nothing
387390
_ -> fail "expected Open state"
388391

@@ -403,13 +406,141 @@ spec =
403406
getState
404407

405408
-- Verify the head is not stuck: seenSnapshot should have advanced
409+
-- (snapshot number will be based on confirmedSnapshot.number, not lastSeen)
406410
case s2 of
407411
NodeInSync{headState = Open OpenState{coordinatedHeadState = chs}} ->
408412
chs.seenSnapshot `shouldSatisfy` \case
409413
RequestedSnapshot{} -> True
410414
_ -> False
411415
_ -> fail "expected Open state"
412416

417+
it "DecommitFinalized with RequestedSnapshot uses requested number not lastSeen" $ do
418+
let localUTxO = utxoRefs [1]
419+
confirmedSn =
420+
ConfirmedSnapshot
421+
{ snapshot = testSnapshot 0 3 [] localUTxO
422+
, signatures = Crypto.aggregate []
423+
}
424+
-- State: leader sent ReqSn(v=3, sn=1) with lastSeen=0
425+
s0 =
426+
inOpenState' threeParties $
427+
coordinatedHeadState
428+
{ localUTxO
429+
, version = 3
430+
, confirmedSnapshot = confirmedSn
431+
, seenSnapshot = RequestedSnapshot{lastSeen = 0, requested = 1}
432+
, decommitTx = Just (SimpleTx 10 mempty (utxoRef 99))
433+
}
434+
435+
-- DecommitFinalized arrives before AckSn messages
436+
let decrementObservation = observeTx $ OnDecrementTx{headId = testHeadId, newVersion = 4, distributedUTxO = mempty}
437+
now <- nowFromSlot s0.chainPointTime.currentSlot
438+
let decommitFinalizedOutcome = update aliceEnv ledger now s0 decrementObservation
439+
let s1 = aggregateState s0 decommitFinalizedOutcome
440+
441+
-- Verify seenSnapshot uses requested (1), not lastSeen (0)
442+
case s1 of
443+
NodeInSync{headState = Open OpenState{coordinatedHeadState = chs}} -> do
444+
chs.version `shouldBe` 4
445+
chs.seenSnapshot `shouldBe` LastSeenSnapshot{lastSeen = 1}
446+
chs.decommitTx `shouldBe` Nothing
447+
_ -> fail "expected Open state"
448+
449+
it "AckSn for decommit snapshot is noop after DecommitFinalized" $ do
450+
let localUTxO = utxoRefs [1]
451+
snapshot1 = testSnapshot 0 4 [] localUTxO & \s -> s{number = 1}
452+
confirmedSn =
453+
ConfirmedSnapshot
454+
{ snapshot = testSnapshot 0 3 [] localUTxO
455+
, signatures = Crypto.aggregate []
456+
}
457+
-- State after DecommitFinalized: lastSeen=1
458+
s0 =
459+
inOpenState' threeParties $
460+
coordinatedHeadState
461+
{ localUTxO
462+
, version = 4
463+
, confirmedSnapshot = confirmedSn
464+
, seenSnapshot = LastSeenSnapshot{lastSeen = 1}
465+
, decommitTx = Nothing
466+
}
467+
468+
-- AckSn for snapshot 1 arrives late
469+
let ackSn = AckSn (sign aliceSk snapshot1) 1
470+
input = receiveMessageFrom alice ackSn
471+
now <- nowFromSlot s0.chainPointTime.currentSlot
472+
473+
-- Should be noop (or error), not Wait
474+
update aliceEnv ledger now s0 input `shouldSatisfy` \case
475+
Continue{} -> True
476+
Error{} -> True
477+
Wait{} -> False -- Must NOT Wait (infinite AckSn requeue)
478+
it "DecommitFinalized with SeenSnapshot state extracts correct snapshot number" $ do
479+
let localUTxO = utxoRefs [1]
480+
decommitTx = SimpleTx 10 mempty (utxoRef 99)
481+
snapshot1 = testSnapshot 0 3 [] localUTxO & \s -> s{number = 1}
482+
confirmedSn =
483+
ConfirmedSnapshot
484+
{ snapshot = testSnapshot 0 3 [] localUTxO
485+
, signatures = Crypto.aggregate []
486+
}
487+
-- State: follower has seen ReqSn and is collecting signatures
488+
s0 =
489+
inOpenState' threeParties $
490+
coordinatedHeadState
491+
{ localUTxO
492+
, version = 3
493+
, confirmedSnapshot = confirmedSn
494+
, seenSnapshot = SeenSnapshot{snapshot = snapshot1, signatories = mempty}
495+
, decommitTx = Just decommitTx
496+
}
497+
498+
-- DecommitFinalized arrives while collecting signatures
499+
let decrementObservation = observeTx $ OnDecrementTx{headId = testHeadId, newVersion = 4, distributedUTxO = mempty}
500+
now <- nowFromSlot s0.chainPointTime.currentSlot
501+
let decommitFinalizedOutcome = update bobEnv ledger now s0 decrementObservation
502+
let s1 = aggregateState s0 decommitFinalizedOutcome
503+
504+
-- Verify DecommitFinalized correctly extracted snapshot number from SeenSnapshot
505+
case s1 of
506+
NodeInSync{headState = Open OpenState{coordinatedHeadState = chs}} -> do
507+
chs.version `shouldBe` 4
508+
chs.decommitTx `shouldBe` Nothing
509+
chs.seenSnapshot `shouldBe` LastSeenSnapshot{lastSeen = 1}
510+
_ -> fail "expected Open state"
511+
512+
it "CommitFinalized with RequestedSnapshot should use requested number not lastSeen" $ do
513+
let localUTxO = utxoRefs [1]
514+
confirmedSn =
515+
ConfirmedSnapshot
516+
{ snapshot = testSnapshot 0 3 [] localUTxO
517+
, signatures = Crypto.aggregate []
518+
}
519+
depositTxId = 42
520+
-- State after leader sent ReqSn(sn=1) with pending deposit: snapshot in flight
521+
s0 =
522+
inOpenState' threeParties $
523+
coordinatedHeadState
524+
{ localUTxO
525+
, version = 3
526+
, confirmedSnapshot = confirmedSn
527+
, seenSnapshot = RequestedSnapshot{lastSeen = 0, requested = 1}
528+
, currentDepositTxId = Just depositTxId
529+
}
530+
531+
-- CommitFinalized arrives before AckSn messages (race condition)
532+
let incrementObservation = observeTx $ OnIncrementTx{headId = testHeadId, newVersion = 4, depositTxId}
533+
now <- nowFromSlot s0.chainPointTime.currentSlot
534+
let commitFinalizedOutcome = update aliceEnv ledger now s0 incrementObservation
535+
let s1 = aggregateState s0 commitFinalizedOutcome
536+
537+
case s1 of
538+
NodeInSync{headState = Open OpenState{coordinatedHeadState = chs}} -> do
539+
chs.version `shouldBe` 4
540+
chs.currentDepositTxId `shouldBe` Nothing
541+
chs.seenSnapshot `shouldBe` LastSeenSnapshot{lastSeen = 1}
542+
_ -> fail "expected Open state"
543+
413544
describe "Tracks Transaction Ids" $ do
414545
it "keeps transactions in allTxs given it receives a ReqTx" $ do
415546
let s0 = inOpenState threeParties

0 commit comments

Comments
 (0)