Skip building on blocks on relay parents in old session #9990

Sajjon · 2025-10-10T08:46:57Z

On our Kusama Canary chain YAP-3392, the log entry:

Collation wasn't advertised because it was built on a relay chain block that is now part of an old session

showed up 400+ times (2025-10-03 -- 2025-10-10).

Luckily, it is easy to detect that the session of a relay parent is an old session, so we can avoid building the block in the first place.

This will (slightly) increase block confidence (more so on our Kusama Canary, where sessions last 1h instead of Polkadot's 4h).

N.B. We already have similar logic in fn build_relay_parent_ancestry in cumulus/client/consensus/common/src/parent_search.rs:

let session = relay_client.session_index_for_child(current_rp).await?;
if required_session.get_or_insert(session) != &session {
    // Respect the relay-chain rule not to cross session boundaries.
    break;
}
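For readers without the repository at hand, here is a minimal self-contained sketch of the same idea: walk back through the relay parent ancestry and stop at the first session boundary. Everything in it (MockRelayChain, ancestry_within_session, the u32 hashes, the synchronous calls) is a hypothetical stand-in for illustration, not the actual code in parent_search.rs.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins; the real relay chain types differ.
type Hash = u32;
type SessionIndex = u32;

/// Toy relay chain: each block knows its parent and the session index of its child.
struct MockRelayChain {
    parent_of: HashMap<Hash, Hash>,
    session_for_child: HashMap<Hash, SessionIndex>,
}

impl MockRelayChain {
    /// Build the toy chain from (block, parent, session) triples.
    fn new(blocks: &[(Hash, Option<Hash>, SessionIndex)]) -> Self {
        let mut parent_of = HashMap::new();
        let mut session_for_child = HashMap::new();
        for &(block, parent, session) in blocks {
            if let Some(parent) = parent {
                parent_of.insert(block, parent);
            }
            session_for_child.insert(block, session);
        }
        Self { parent_of, session_for_child }
    }

    fn session_index_for_child(&self, block: Hash) -> Option<SessionIndex> {
        self.session_for_child.get(&block).copied()
    }

    fn parent(&self, block: Hash) -> Option<Hash> {
        self.parent_of.get(&block).copied()
    }
}

/// Collect up to `max_depth` ancestors starting at `start`,
/// stopping as soon as a block belongs to a different session.
fn ancestry_within_session(chain: &MockRelayChain, start: Hash, max_depth: usize) -> Vec<Hash> {
    let mut ancestry = Vec::new();
    let mut required_session: Option<SessionIndex> = None;
    let mut current = start;
    while ancestry.len() < max_depth {
        let Some(session) = chain.session_index_for_child(current) else { break };
        // The first block fixes the session; any later mismatch is a boundary.
        if *required_session.get_or_insert(session) != session {
            break;
        }
        ancestry.push(current);
        let Some(parent) = chain.parent(current) else { break };
        current = parent;
    }
    ancestry
}
```

With blocks 1 and 2 in session 7 and blocks 3 and 4 in session 8, starting the walk at block 4 yields only blocks 4 and 3.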

sandreim

I think we can do better than this and have some separation of concerns and a more robust check. We should actually query the collator protocol and ask it if we can use a specific relay parent.

Please look at distribute_collation as there are multiple checks there on the relay parent which we can do before we decide to create a collation on it.

Also, we should keep in mind that advertisement can actually happen later, and by that time the relay parent might not be valid anymore, if a new block was created in a new session. This means that what we do here will not always prevent the situation from happening.

sandreim · 2025-10-10T10:18:24Z

cumulus/client/consensus/aura/src/collators/mod.rs

+where
+	Client: RelayChainInterface,
+{
+	let Ok(relay_best_hash) = relay_client.best_block_hash().await else {


You already have relay_best_hash in the caller scope; you can just pass it to the fn.

Sajjon · 2025-10-10T12:19:28Z

@sandreim can you expand a bit on your last message / explain more what you had in mind.

I'll refer to the check of whether a relay parent is in an old session as SessionBoundryCheck.

We should actually query the collator protocol and ask it if we can use a specific relay parent.

From where should we query the collator protocol? Later you say:

This means that what we do here will not always prevent the situation from happening.

Thus you make it sound like performing SessionBoundryCheck inside run_block_builder is wrong? Or do you mean it is wrong to only perform SessionBoundryCheck there? So perhaps do it in multiple places?

before building (my attempt of doing so is my current impl, inside: run_block_builder)
after building but before advertisement
some more place/time?

Please look at distribute_collation as there are multiple checks there on the relay parent which we can do before we decide to create a collation on it.

Hmm did you mean I should perform SessionBoundryCheck inside distribute_collation? Because that feels much too late? That function is called with a candidate, thus we have already built, but this issue is about not even building the block.

By "query the collator protocol", do you mean adding a method to the trait ServiceInterface, and calling it the way we do here in basic aura, like so:

collator.collator_service().some_function()

I feel properly confused now 😅

skunert · 2025-10-10T12:31:46Z

I thought we wanted to fix this on the relay chain side? #9766

I have a check here which already checks whether there will be an epoch change in the relay parent ancestry. If yes, I am including the next epoch's authorities for verification. Currently these blocks are getting dropped anyway, so the implementation is already forward looking, because I assumed at some point we will not drop anymore 😬.

Sajjon · 2025-10-10T12:52:52Z

@skunert

I thought we wanted to fix this on the relay chain side?

We want to avoid even building the parablock in the first place, to not waste resources (degrading block confidence), so it must happen on parachain side then, right?

skunert · 2025-10-10T13:03:34Z

@skunert

I thought we wanted to fix this on the relay chain side?

We want to avoid even building the parablock in the first place, to not waste resources (degrading block confidence), so it must happen on parachain side then, right?

Confidence is dropping because we drop candidates at session boundaries. If we didn't do that, confidence would not drop and parachains could keep producing blocks as they do now, right? #9766

But yeah I assume it takes too long or is not scheduled?

sandreim · 2025-10-10T14:27:15Z

@skunert

I thought we wanted to fix this on the relay chain side?

We want to avoid even building the parablock in the first place, to not waste resources (degrading block confidence), so it must happen on parachain side then, right?

Confidence is dropping because we drop candidates at session boundaries. If we didn't do that, confidence would not drop and parachains could keep producing blocks as they do now, right? #9766

But yeah I assume it takes too long or is not scheduled?

#9766 is a different issue. If you read the ticket it is about candidates that have already been backed on chain and are pending availability. To solve that one we need to fix availability.

The issue in #9977 is that these candidates are not even advertised by collator protocol because the relay parent is out of scope already. To properly fix it we need to allow candidates with relay parents from the previous session. The fix should require changes in collator protocol, backing and prospective-parachains.

We will need to do this for supporting low latency parachains. IIRC we discussed with @eskimor about decoupling the relay parent we use for execution context from the one we use for scheduling information.

The fix in this PR should be very easy but will not be perfect. The candidate could be fetched because a new session was not observed yet, but then dropped from prospective parachains as soon as the RC advances into a new session. What we can do for now is not build a collation on an older RP if we've already seen the RC best block in a new session.
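As a rough illustration of the interim check described above, the sketch below compares the session of the candidate relay parent with the session of the relay chain best block and skips building when the relay parent is older. The SessionLookup trait, the BuildDecision enum and the decide function are hypothetical names made up for this example; the real RelayChainInterface is async and uses different types.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins; the real relay chain types differ.
type Hash = u32;
type SessionIndex = u32;

/// Hypothetical minimal view of the relay chain: session index of the child of `block`.
trait SessionLookup {
    fn session_index_for_child(&self, block: Hash) -> Option<SessionIndex>;
}

// A plain map is enough to play the role of the relay client in this sketch.
impl SessionLookup for HashMap<Hash, SessionIndex> {
    fn session_index_for_child(&self, block: Hash) -> Option<SessionIndex> {
        self.get(&block).copied()
    }
}

#[derive(Debug, PartialEq)]
enum BuildDecision {
    /// Relay parent and best block are in the same session.
    Build,
    /// The best block is already in a newer session than the relay parent.
    SkipOldSession,
    /// One of the lookups failed; the caller decides how to proceed.
    Unknown,
}

/// Decide whether to build on `relay_parent`, given the already-known `relay_best_hash`.
fn decide(client: &impl SessionLookup, relay_parent: Hash, relay_best_hash: Hash) -> BuildDecision {
    let rp_session = client.session_index_for_child(relay_parent);
    let best_session = client.session_index_for_child(relay_best_hash);
    match (rp_session, best_session) {
        (Some(rp), Some(best)) if rp < best => BuildDecision::SkipOldSession,
        (Some(_), Some(_)) => BuildDecision::Build,
        _ => BuildDecision::Unknown,
    }
}
```

As noted in the comment, this only helps when the new session has already been observed at build time; a session change between building and advertisement is not caught.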

I have a check here which already checks whether there will be an epoch change in the relay parent ancestry. If yes, I am including the next epoch's authorities for verification. Currently these blocks are getting dropped anyway, so the implementation is already forward looking, because I assumed at some point we will not drop anymore 😬.

I am not familiar with this code. Does it solve what I said above?

Also, I don't think this should be solved in the cumulus code, because of separation of concerns. That's why I am proposing to query the collator protocol subsystem to do a sanity check on the relay parent before proceeding with block production. Once we allow RPs from the previous session, you won't need to change anything in cumulus.

sandreim · 2025-10-10T14:31:40Z

Thus you make it sound like performing SessionBoundryCheck inside run_block_builder is wrong? Or do you mean it is wrong to only perform SessionBoundryCheck there? So perhaps do it in multiple places?

before building (my attempt of doing so is my current impl, inside: run_block_builder)

Yes, before. I propose you send a message to the collator-protocol subsystem and ask it to tell you if the relay_parent is good to be built on. Collator protocol already tracks sessions and contains more checks for the RP.
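To make the proposal a bit more concrete, here is a toy request/response sketch using std channels. The IsRelayParentUsable message, spawn_mock_subsystem and can_build_on are all hypothetical; nothing in this thread says such a message exists in collator-protocol today, and the real subsystem uses its own async messaging rather than std::sync::mpsc.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread;

// Hypothetical stand-ins; the real relay chain types differ.
type Hash = u32;
type SessionIndex = u32;

/// Hypothetical request type, invented for this sketch.
enum CollatorProtocolRequest {
    IsRelayParentUsable { relay_parent: Hash, response: Sender<bool> },
}

/// Spawn a mock subsystem that knows the session of each relay parent
/// and answers whether a relay parent is in the current session.
fn spawn_mock_subsystem(
    sessions: HashMap<Hash, SessionIndex>,
    current_session: SessionIndex,
) -> Sender<CollatorProtocolRequest> {
    let (tx, rx) = channel::<CollatorProtocolRequest>();
    thread::spawn(move || {
        // The loop ends once every sender has been dropped.
        for request in rx {
            match request {
                CollatorProtocolRequest::IsRelayParentUsable { relay_parent, response } => {
                    let usable = sessions.get(&relay_parent) == Some(&current_session);
                    // The requester may have gone away; ignore send errors.
                    let _ = response.send(usable);
                }
            }
        }
    });
    tx
}

/// Collator side: ask before building. Treat any failure as "do not build".
fn can_build_on(subsystem: &Sender<CollatorProtocolRequest>, relay_parent: Hash) -> bool {
    let (response_tx, response_rx) = channel();
    let request = CollatorProtocolRequest::IsRelayParentUsable { relay_parent, response: response_tx };
    if subsystem.send(request).is_err() {
        return false;
    }
    response_rx.recv().unwrap_or(false)
}
```

The point of this shape is that the collator only asks a yes/no question; if the subsystem later starts accepting relay parents from the previous session, the caller would not need to change.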

skunert · 2025-10-10T15:20:12Z

The check I mentioned above checks whether any of the RP descendants we use to enforce the offset contain a session change digest. If they do, we include additional relay chain authorities in the inherent storage proof. I did this because I was assuming that we will allow relay parents from old sessions at some point. So it does not currently fix the issue you want to fix. But the check can be used for this, because the condition is the same.
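A simplified picture of that condition, with made-up types: scan the descendant headers and report the first one that carries a session change digest. The Header and DigestItem types and the first_session_change function below are hypothetical simplifications for illustration, not the actual relay chain header or digest types.

```rust
/// Hypothetical, simplified digest item; real headers carry consensus logs instead.
#[derive(Debug, Clone, PartialEq)]
enum DigestItem {
    SessionChange,
    Other,
}

/// Hypothetical, simplified relay chain header.
struct Header {
    number: u32,
    digest: Vec<DigestItem>,
}

/// Return the number of the first descendant that contains a session change digest, if any.
fn first_session_change(descendants: &[Header]) -> Option<u32> {
    descendants
        .iter()
        .find(|header| header.digest.contains(&DigestItem::SessionChange))
        .map(|header| header.number)
}
```

A Some result could then either trigger including the extra authorities, as the existing check does, or be used as a signal to skip building.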

The concerns of finding the correct relay parent are currently not separated anyway. We have the parent_search which tries to find a suitable parent block which lives in the same session as the tip of the chain. While thinking about this issue here I realized that we pass the relay_parent into this instead of the tip of the chain.

So I think what we should do is:

Pass the relay_best_hash here instead of relay_parent. This should enforce that the relay parent ancestry contains only blocks which we have no parachain blocks for, essentially leading to skipping the block.
However, we need to ensure that the included_header_hash here corresponds to the included hash from the relay_parent, same as it is now. Otherwise the runtime will later perform its checks against a different included block than what we check during authoring.

Skip building on blocks on relay parents in old session

340fdfb

Sajjon force-pushed the cyon/skip_building_blocks_on_relay_parents_in_old_session_issue_9977 branch from be812cb to 340fdfb on October 10, 2025 08:51

Sajjon added the T9-cumulus label (This PR/Issue is related to cumulus.) Oct 10, 2025

Add PRDoc

0e77f6b

Sajjon requested a review from sandreim October 10, 2025 08:59

Add trace logging about skipping building due to relay parent is in old session

7dc02c9

sandreim reviewed Oct 10, 2025

Compare session index not with grand_parent but with best_block

a8eaafc

Sajjon requested a review from sandreim October 10, 2025 10:07

sandreim reviewed Oct 10, 2025

avoid fetching best block twice

bcba623
