Skip building on blocks on relay parents in old session #9990
cumulus/client/consensus/aura/src/collators/slot_based/block_builder_task.rs
I think we can do better than this, with some separation of concerns and a more robust check. We should actually query the collator protocol and ask it whether we can use a specific relay parent.
Please look at distribute_collation, as there are multiple checks on the relay parent there which we can do before we decide to create a collation on it.
Also, we should keep in mind that advertisement can actually happen later, and by that time the relay parent might not be valid anymore, if a new block was created in a new session. This means that what we do here will not always prevent the situation from happening.
where
	Client: RelayChainInterface,
{
	let Ok(relay_best_hash) = relay_client.best_block_hash().await else {
You already have relay_best_hash in the caller scope, so you can just pass it to the fn.
@sandreim can you expand a bit on your last message / explain more what you had in mind. I'll call checking if relay parent is in old session for
From where should we query the collator protocol? Later you say:
Thus you make it sound like performing
Hmm did you mean I should perform By "query the collator protocol", do you mean adding a method to the trait collator.collator_service().some_function() I feel properly confused now 😅
I thought we wanted to fix this on the relay chain side? #9766 I have a check here which already checks whether there will be an epoch change in the relay parent ancestry. If yes, I include the next epoch's authorities for verification. Currently these blocks are getting dropped anyway, so the implementation is already forward-looking, because I assumed that at some point we will not drop them anymore 😬.
We want to avoid even building the parablock in the first place, so as not to waste resources (and degrade block confidence), so it must happen on the parachain side then, right?
Confidence is dropping because we drop candidates at session boundaries. If we didn't do that, confidence would not drop and parachains could keep producing blocks as they do now, right? #9766 But yeah, I assume it takes too long or is not scheduled?
#9766 is a different issue. If you read the ticket, it is about candidates that have already been backed on chain and are pending availability. To solve that one we need to fix availability. The issue in #9977 is that these candidates are not even advertised by the collator protocol because the relay parent is already out of scope. To properly fix it we need to allow candidates with relay parents from the previous session. The fix should require changes in collator protocol, backing and prospective-parachains. We will need to do this to support low latency parachains. IIRC we discussed with @eskimor decoupling the relay parent we use for execution context from the one we use for scheduling information. The fix in this PR should be very easy but will not be perfect. The candidate could be fetched because a new session was not observed yet, but then dropped from prospective parachains as soon as the RC advances into the new session. What we can do for now is not build a collation on an older RP if we've already seen the RC best block in the new session.
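To make the interim rule above concrete, here is a minimal sketch of the decision, not the code in this PR. The names `SessionIndex`, `BuildDecision` and `decide_build` are hypothetical and exist only for illustration; how the two session indices are obtained from the relay chain is left out on purpose.

```rust
/// Hypothetical alias for a relay chain session index (assumption of this sketch).
type SessionIndex = u32;

/// Outcome of the pre-build check (hypothetical type, not part of cumulus).
#[derive(Debug, PartialEq, Eq)]
enum BuildDecision {
    /// The relay parent is in the same session as the best block we have seen.
    Build,
    /// The best relay chain block is already in a newer session than the
    /// relay parent, so the candidate would not be advertised anyway.
    SkipOldSession,
}

/// Decide whether to build a collation on a relay parent, given the session
/// of that relay parent and the session of the best relay chain block that
/// this node has observed so far.
fn decide_build(relay_parent_session: SessionIndex, best_block_session: SessionIndex) -> BuildDecision {
    if relay_parent_session < best_block_session {
        BuildDecision::SkipOldSession
    } else {
        BuildDecision::Build
    }
}
```

This only covers the case where the node has already seen the session change. If the new session is observed after the check but before advertisement, the candidate can still be dropped, as noted above.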
I am not familiar with this code, does it solve what I said above? Also, I don't think this should be solved in the cumulus code, because of separation of concerns. That's why I am proposing to query the collator protocol subsystem to do a sanity check on the relay parent before proceeding with block production. Once we allow RPs from the previous session, you won't need to change anything in cumulus.
Yes, before, I propose you send a message to
The check I mentioned above checks whether any of the RP descendants we use to enforce the offset contain a session change digest. If they do, we include additional relay chain authorities in the inherent storage proof. I did this because I was assuming that we will allow relay parents from old sessions at some point. So it currently does not fix the issue you want to fix. But the check can be used for this, because the condition is the same. The concerns of finding the correct relay parent are currently not separated anyway. We have the parent_search, which tries to find a suitable parent block which lives in the same session as the tip of the chain. While thinking about this issue here I realized that we pass the So I think what we should do is:
Fixes: #9977
On our Kusama Canary chain YAP-3392 has the log entry:
show up 400+ times (2025-10-03 -- 2025-10-10).
Luckily, it is easy to detect that a relay parent belongs to an old session, so we can avoid building the block in the first place.
This will (slightly) increase block confidence (more so on our Kusama Canary, where sessions last 1h instead of Polkadot's 4h).
N.B. We have similar logic in fn build_relay_parent_ancestry in cumulus/client/consensus/common/src/parent_search.rs: