handle deadlocks/blocked tasks in parallel reconstruction#7800

Open
agnxsh wants to merge 15 commits into unstable from rp2
Conversation

agnxsh (Contributor) commented Dec 15, 2025

No description provided.

github-actions bot commented Dec 15, 2025

Unit Test Results

    12 files  ±0    2 444 suites ±0    47m 36s ⏱️ +2m 28s
12 893 tests  ±0   12 346 ✔️ ±0   547 💤 ±0   0 ±0
65 264 runs   ±0   64 554 ✔️ ±0   710 💤 ±0   0 ±0

Results for commit a878974. ± Comparison against base commit 42eeb42.

♻️ This comment has been updated with latest results.

if true:
  return
# # Currently, this logic is broken
# if true:
Contributor

This if statement can just be removed entirely, rather than commented out.

localIndices[j] = idxArr[j]
localCells[j] = cellsArr[j]
# use the task wrapper which maps string errors to void
recoverCellsAndKzgProofsTask(localIndices, localCells)
Member

There is no need for this seq as far as I can tell. Just change the argument type of ...Task to openArray, then pass idxArr.toOpenArray(0, columnCount - 1).
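
A minimal sketch of the suggestion above, assuming a plain (non-spawned) call. `sumIndices` is a hypothetical stand-in for the real task proc, and the data is made up. Note that an openArray is a borrowed view, so the underlying seq has to stay alive for as long as the callee uses it.

```nim
# Hypothetical stand-in for the real task body: it only sums the indices it is given.
proc sumIndices(indices: openArray[int]): int =
  for i in indices:
    result += i

let idxArr = @[3, 1, 4, 1, 5, 9]  # made-up data for illustration
let columnCount = 4

# toOpenArray passes a view of elements 0 .. columnCount - 1;
# no intermediate seq is allocated and no elements are copied.
let total = sumIndices(idxArr.toOpenArray(0, columnCount - 1))
doAssert total == 9
```

Compared with filling a local seq element by element, this removes one allocation and one copy loop per call.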

@agnxsh agnxsh marked this pull request as ready for review December 26, 2025 08:48
break # Stop spawning new tasks
trace "PeerDAS reconstruction timed out while preparing columns",
spawned = spawned, total = blobCount
return err("Data column reconstruction timed out")
Member

What happens to the tasks that are still pending here? Presumably, they still reference the memory allocated by pendingIndices and pendingCells?

agnxsh (Contributor, Author) commented Dec 30, 2025

In that case, we need to drain all the pending tasks. Another thing we can probably do, instead of an early return, is to use a timeout flag and check whether we have timed out before spawning newer tasks. That way the number of pending tasks may be lower. The timeout won't be strict in that sense, since we would still wait to drain the pending tasks, but it should be memory safe, most likely.

(I have made some more changes.)
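
The timeout-flag idea can be sketched roughly as follows. This is a single-threaded illustration only, not the code in this PR: `Pending`, `spawnTask`, `syncTask` and `reconstruct` are hypothetical stand-ins for a task pool's handle, spawn and blocking sync, and the deadline check is injected as a proc so the behaviour is deterministic.

```nim
type
  # Hypothetical stand-in for the handle a task pool returns from spawn.
  Pending = object
    id: int
  DeadlineCheck = proc(spawned: int): bool
  Outcome = tuple[ok: bool, spawned, drained: int]

proc spawnTask(id: int): Pending = Pending(id: id)  # pretend spawn
proc syncTask(p: Pending): bool = true              # pretend blocking sync

proc reconstruct(total: int, deadlineHit: DeadlineCheck): Outcome =
  var pending: seq[Pending]
  var timedOut = false
  for i in 0 ..< total:
    if deadlineHit(pending.len):
      timedOut = true
      break  # stop spawning, but do not return yet
    pending.add spawnTask(i)
  # Drain every task that was spawned before returning, so that no task
  # can outlive the buffers it references.
  var drained = 0
  for p in pending:
    discard syncTask(p)
    inc drained
  (ok: not timedOut, spawned: pending.len, drained: drained)

# The deadline "fires" once three tasks have been spawned.
let r = reconstruct(5, proc(spawned: int): bool = spawned >= 3)
doAssert not r.ok
doAssert r.spawned == 3 and r.drained == 3
```

The trade-off is the one noted above: the function can overrun the deadline by however long the already spawned tasks take to finish.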

tersec (Contributor) commented Jan 1, 2026

Run excluded_files="config.yaml|config.nims|beacon_chain.nimble"
The following files do not have an up-to-date copyright year:
- beacon_chain/nimbus_beacon_node.nim
- beacon_chain/spec/peerdas_helpers.nim

desc: "Subscribe to the first half of column subnets"
name: "light-supernode" .}: bool

debugDisableReconstruction* {.
Member

Let's avoid the double negative here, i.e. use --debug-enable-reconstruction instead.
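
To illustrate the readability point, here is a small sketch using simplified, hypothetical config objects. The real option is declared with a CLI pragma, and `OldConf`, `NewConf` and `shouldReconstruct` are made up for this example.

```nim
type
  # Hypothetical, simplified config objects for illustration only.
  OldConf = object
    debugDisableReconstruction: bool  # false means reconstruction runs
  NewConf = object
    debugEnableReconstruction: bool   # true means reconstruction runs

# With the "disable" flag, every call site has to negate it.
proc shouldReconstruct(c: OldConf): bool = not c.debugDisableReconstruction

# With the "enable" flag, the condition reads directly.
proc shouldReconstruct(c: NewConf): bool = c.debugEnableReconstruction

doAssert shouldReconstruct(OldConf(debugDisableReconstruction: false))
doAssert shouldReconstruct(NewConf(debugEnableReconstruction: true))
```

One consequence of the rename is that the default value has to flip as well if reconstruction is meant to stay on by default.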
