Skip to content

Conversation

@rach-id
Copy link
Member

@rach-id rach-id commented Oct 9, 2025

Closes #2222

@rach-id rach-id requested a review from cmwaters October 9, 2025 11:57
@rach-id rach-id self-assigned this Oct 9, 2025
@rach-id rach-id requested a review from evan-forbes as a code owner October 9, 2025 11:57
@rach-id rach-id added the C:consensus related to the core consensus label Oct 9, 2025
@rach-id rach-id marked this pull request as draft October 9, 2025 12:06
@rach-id rach-id changed the title chore: exit on consensus failures chore: panic on consensus failures Oct 9, 2025
})
}

// TestSwitchToConsensusVoteExtensions tests that the SwitchToConsensus correctly
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed this test because it's only for vote extensions and we're not using them and they provoke panics and expect them to be recovered, but now we re-panic to stop the node.

@rach-id rach-id marked this pull request as ready for review October 9, 2025 14:19
// some console or secure RPC system, but for now, halting the chain upon
// unexpected consensus bugs sounds like the better option.
onExit(cs)
panic(r)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol panicking in a recover function

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to re-propagate after we execute onExit() 😆

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are just going to panic, then does it even make sense to recover? Will this cleanly shut down everything. Isn't it possible with the multiplexer that there would be some dangling threads

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, I'll check it more closely

@evan-forbes
Copy link
Member

I'm fine with merging this, but I also don't see this as a huge priority. I know the consensus reactor can't recover here, but do we know node operators actually want this?

@evan-forbes
Copy link
Member

also, is this a break in behavior?

@rach-id
Copy link
Member Author

rach-id commented Oct 13, 2025

It's changing the behavior, so it can be considered breaking. However, given @cmwaters has the most context, I am waiting for his review before merging

Copy link
Contributor

@cmwaters cmwaters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure about simply panicking. If we are to use a panic, I just want to make sure it cleanly shuts down the node completely and not just the process that called the panic.

Yes this is a breaking change in behaviour but I think it's an important improvement in detecting problems in critical components. As it is breaking, we should probably target v7 with this (which is what I believe main targets)

// some console or secure RPC system, but for now, halting the chain upon
// unexpected consensus bugs sounds like the better option.
onExit(cs)
panic(r)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are just going to panic, then does it even make sense to recover? Will this cleanly shut down everything. Isn't it possible with the multiplexer that there would be some dangling threads

@rach-id
Copy link
Member Author

rach-id commented Oct 13, 2025

which is what I believe main targets

current main targets v6. v0.39.x-celestia can be deleted. As long as we don't have any v7 changes

@rach-id rach-id marked this pull request as draft October 13, 2025 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C:consensus related to the core consensus

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Shut down the entire node when their is a consensus panic

5 participants