
Conversation

@eskimor eskimor commented Nov 28, 2025

Overview

This PR introduces V3 candidate descriptors with an explicit scheduling_parent field, separating the scheduling context (which determines validator group assignment) from the execution context (which determines parachain state). This is a critical foundation for enabling lookahead scheduling and improving parachain block production flexibility in async backing.

Key Innovation: V3 candidates can be scheduled based on a different relay chain block than the one they execute against, enabling validators to assign backing responsibilities ahead of time while maintaining correct execution semantics.

Problem

In V1 and V2 candidate descriptors, the relay_parent field serves a dual purpose:

  1. Execution context: Determines which relay chain state the parachain block executes against
  2. Scheduling context: Determines which validator group is assigned to back the candidate

This tight coupling limits flexibility:

  • Candidates must be scheduled on the exact block they execute against
  • No lookahead scheduling possible
  • Reduces predictability for parachains about when they can produce blocks

Solution

V3 candidates introduce an explicit scheduling_parent field that decouples these concerns:

  • relay_parent: Determines execution context (parachain state root, claim queue state for core assignment)
  • scheduling_parent: Determines validator group assignment (which validators back this candidate)

For backward compatibility:

  • V1/V2 candidates: scheduling_parent == relay_parent (implicit, behavior unchanged)
  • V3 candidates: scheduling_parent can differ from relay_parent (explicit field in descriptor)

This separation enables lookahead scheduling where parachains can be assigned to validator groups on future relay chain blocks while still executing against older state.
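
The fallback rule above can be sketched as follows. This is a minimal illustration, not the PR's code: `Descriptor`, `DescriptorVersion`, and the field `explicit_scheduling_parent` are hypothetical stand-ins, and the real accessor additionally takes a `v3_enabled` flag that this sketch leaves out.

```rust
/// Hypothetical 32-byte hash type used only for this sketch.
type H256 = [u8; 32];

/// Simplified stand-in for the descriptor versions discussed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DescriptorVersion {
    V1,
    V2,
    V3,
}

/// Simplified descriptor; the real type carries many more fields.
#[derive(Debug, Clone)]
struct Descriptor {
    version: DescriptorVersion,
    relay_parent: H256,
    /// Only meaningful for V3 in this sketch (hypothetical field name).
    explicit_scheduling_parent: H256,
}

impl Descriptor {
    /// V3 returns the explicit field; V1/V2 fall back to `relay_parent`.
    fn scheduling_parent(&self) -> H256 {
        match self.version {
            DescriptorVersion::V3 => self.explicit_scheduling_parent,
            DescriptorVersion::V1 | DescriptorVersion::V2 => self.relay_parent,
        }
    }
}
```

Callers that only ever go through the accessor do not need to branch on the descriptor version themselves.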

Key Changes

Primitives (polkadot/primitives/src/v9/mod.rs)

  • CandidateDescriptorVersion::V3: New enum variant for version detection
  • CandidateDescriptorV2::new_v3(): Constructor accepting explicit scheduling_parent parameter
  • scheduling_parent(v3_enabled: bool) -> Hash: Accessor returning scheduling_parent for V3, relay_parent for V1/V2
  • scheduling_session(v3_enabled: bool) -> Option<SessionIndex>: Accessor for scheduling session (offset-based for V3)
  • Version detection: Feature-gated logic (CandidateReceiptV3 node feature) to distinguish V1 from V3 using reserved fields

PVF Extension (polkadot/parachain/src/primitives.rs)

  • ValidationParamsExtension::V3: New extension type containing both relay_parent and scheduling_parent hashes
  • TrailingOption<T>: Backward-compatible wrapper that decodes T from trailing bytes if present, or None if at EOF
    • V3 candidates: PVF receives extension with both hashes
    • V1/V2 candidates: PVF receives no extension bytes (TrailingOption decodes as None)
    • Old PVFs: Gracefully ignore trailing extension bytes (don't fail to decode)
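
The trailing-bytes behaviour described above can be illustrated with a small, dependency-free sketch. It assumes a hand-rolled `DecodeFrom` trait in place of the real SCALE codec, and a simplified `ExtensionV3` that is just two raw hashes (the actual extension is a versioned enum, so its encoding differs). All names other than `TrailingOption` are hypothetical.

```rust
/// Hypothetical stand-in for a SCALE-like decoder reading from a byte cursor.
trait DecodeFrom: Sized {
    fn decode_from(input: &mut &[u8]) -> Result<Self, &'static str>;
}

/// Two 32-byte hashes, loosely mirroring the V3 extension payload.
#[derive(Debug, PartialEq, Eq)]
struct ExtensionV3 {
    relay_parent: [u8; 32],
    scheduling_parent: [u8; 32],
}

impl DecodeFrom for ExtensionV3 {
    fn decode_from(input: &mut &[u8]) -> Result<Self, &'static str> {
        // Copy the inner slice reference so we can advance the cursor afterwards.
        let bytes: &[u8] = *input;
        if bytes.len() < 64 {
            return Err("not enough bytes for two hashes");
        }
        let (head, rest) = bytes.split_at(64);
        let mut relay_parent = [0u8; 32];
        let mut scheduling_parent = [0u8; 32];
        relay_parent.copy_from_slice(&head[..32]);
        scheduling_parent.copy_from_slice(&head[32..]);
        *input = rest;
        Ok(ExtensionV3 { relay_parent, scheduling_parent })
    }
}

/// Decodes `T` only if bytes remain. It must be the last field of its
/// container, since "no bytes left" is the only signal for `None`.
#[derive(Debug, PartialEq, Eq)]
struct TrailingOption<T>(Option<T>);

impl<T: DecodeFrom> DecodeFrom for TrailingOption<T> {
    fn decode_from(input: &mut &[u8]) -> Result<Self, &'static str> {
        if input.is_empty() {
            // End of input: behave as if the field was never there (V1/V2).
            Ok(TrailingOption(None))
        } else {
            // Bytes remain: they must form a valid `T` (V3).
            T::decode_from(input).map(|value| TrailingOption(Some(value)))
        }
    }
}
```

Because an empty remainder is the only marker for `None`, placing such a field anywhere but last would make it consume bytes that belong to later fields.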

Subsystem Messages (polkadot/node/subsystem-types/src/messages.rs)

  • BackableCandidateRef: New struct containing candidate_hash and scheduling_parent (replaces bare CandidateHash)
  • CandidateBackingMessage::Second: Now includes explicit scheduling_parent: Hash parameter
  • CandidateBackingMessage::Statement: Now includes explicit scheduling_parent: Hash parameter
  • CandidateBackingMessage::GetBackableCandidates: Uses Vec<BackableCandidateRef> instead of Vec<CandidateHash>
  • CanSecondRequest: Includes candidate_scheduling_parent field for validator group lookup
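
As a rough sketch of why pairing the two hashes in a named struct helps, the snippet below groups requested candidates by their scheduling parent. Only the struct name and its two field names come from this description; `H256`, the simplified `CandidateHash`, and `group_by_scheduling_parent` are hypothetical and exist only for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical 32-byte hash type used only for this sketch.
type H256 = [u8; 32];

/// Simplified newtype standing in for the real candidate hash.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct CandidateHash(H256);

/// Named fields make it hard to pass the wrong hash in the wrong position.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BackableCandidateRef {
    candidate_hash: CandidateHash,
    scheduling_parent: H256,
}

/// Groups candidate hashes by scheduling parent, preserving request order
/// within each group (hypothetical helper, not part of the PR).
fn group_by_scheduling_parent(
    refs: &[BackableCandidateRef],
) -> BTreeMap<H256, Vec<CandidateHash>> {
    let mut grouped: BTreeMap<H256, Vec<CandidateHash>> = BTreeMap::new();
    for r in refs {
        grouped.entry(r.scheduling_parent).or_default().push(r.candidate_hash);
    }
    grouped
}
```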

Core Subsystems

Candidate Backing (polkadot/node/core/backing/)

  • Use scheduling_parent (not relay_parent) for validator group lookups
  • Track per-scheduling-parent state instead of per-relay-parent
  • Validate candidates against scheduling_parent context (session, group assignment)

Candidate Validation (polkadot/node/core/candidate-validation/)

  • For V3: Append ValidationParamsExtension::V3 bytes to PVF validation input
  • For V1/V2: No extension bytes appended (backward compatible)
  • PVF workers decode extension using TrailingOption pattern

Prospective Parachains (polkadot/node/core/prospective-parachains/)

  • Track candidates with their scheduling_parent hash
  • Validate scheduling_parent is in active leaves before accepting candidates
  • Support fragment chains with mixed V1/V2/V3 candidates

Network Protocol

Collator Protocol (polkadot/node/network/collator-protocol/)

  • Track collations per-scheduling-parent instead of per-relay-parent
  • PendingCollation and FetchedCollation now include scheduling_parent field
  • Validate advertised scheduling_parent matches fetched descriptor's actual scheduling_parent(v3_enabled)
  • Ensure scheduling_parent is an active leaf before accepting collations
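
The two collator-protocol checks listed above might look roughly like this. It is a sketch under assumptions: `check_fetched_collation`, `CollationCheckError`, and the use of a plain `HashSet` for active leaves are illustrative choices, not the subsystem's actual API.

```rust
use std::collections::HashSet;

/// Hypothetical 32-byte hash type used only for this sketch.
type H256 = [u8; 32];

/// Hypothetical error type for the two checks.
#[derive(Debug, PartialEq, Eq)]
enum CollationCheckError {
    /// The fetched descriptor disagrees with what the collator advertised.
    SchedulingParentMismatch,
    /// The scheduling parent is not one of our active leaves.
    SchedulingParentNotActiveLeaf,
}

/// Validates a fetched collation against its advertisement and the
/// validator's current view.
fn check_fetched_collation(
    advertised_scheduling_parent: H256,
    descriptor_scheduling_parent: H256,
    active_leaves: &HashSet<H256>,
) -> Result<(), CollationCheckError> {
    if advertised_scheduling_parent != descriptor_scheduling_parent {
        return Err(CollationCheckError::SchedulingParentMismatch);
    }
    if !active_leaves.contains(&descriptor_scheduling_parent) {
        return Err(CollationCheckError::SchedulingParentNotActiveLeaf);
    }
    Ok(())
}
```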

Statement Distribution (polkadot/node/network/statement-distribution/src/v2/)

  • Rename PerRelayParentState → PerSchedulingParentState
  • Rename per_relay_parent map → per_scheduling_parent
  • Use scheduling_parent as key for state lookups (validator groups, candidate tracking)
  • Add clarifying comments in tests explaining relay_parent serves dual role for V1/V2

Test Infrastructure

  • Remove 114 lines of obsolete CandidateReceiptV2 node feature checks (V2 now enabled everywhere)
  • Add V3-specific tests:
    • v3_descriptors_are_accepted_when_enabled: V3 with UMP signals accepted
    • v3_descriptors_without_ump_signals_are_rejected: V3 without UMP signals rejected
    • v3_descriptors_rejected_as_v1_when_disabled: V3 rejected as V1 when feature disabled
  • Update all test call sites to use new descriptor version enum

Backward Compatibility

Multiple layers of protection ensure safe gradual rollout:

  1. Node Feature Gating: V3 only recognized when CandidateReceiptV3 node feature enabled (requires 2/3+ validator upgrade)

  2. Mandatory UMP Signals: V3 candidates MUST include UMP signals (SelectCore at minimum)

    • Prevents old nodes from mistakenly backing V3 candidates (they'd see them as invalid V1)
    • No slashing risk for validators during transition period
  3. TrailingOption Pattern: PVF extension gracefully handled

    • Old PVFs: Don't decode extension, behave as before
    • New PVFs: Decode extension if present, use both hashes
  4. Version Detection: Backwards compatible logic distinguishes V1 from V3

    • V3 uses version == 1 (vs V2's version == 0)
    • Reserved fields checked to prevent misidentification
    • Old nodes see V3 as invalid V1 (missing UMP signals)
  5. Runtime Protection: Runtime drops candidates violating version-specific rules

    • V3 without UMP signals: Rejected
    • V3 with invalid scheduling_parent: Rejected
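
To make the feature-gated version detection more concrete, here is a deliberately simplified model. The layout is an assumption made for illustration only: it supposes that V3 stores the scheduling parent in a region that pre-V3 nodes treat as reserved, so a node without the feature sees non-zero reserved bytes and classifies the descriptor as V1. `RawDescriptor`, its fields, and `detect_version` are hypothetical names, and the real `version()` logic may differ.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DescriptorVersion {
    V1,
    V2,
    V3,
    Unknown,
}

/// Assumed, simplified layout (not the real descriptor).
struct RawDescriptor {
    /// Version byte: 0 for V2, 1 for V3 (per the description above).
    version: u8,
    /// Assumption: reserved before V3, holds the scheduling parent in V3.
    scheduling_slot: [u8; 32],
    /// Bytes that stay reserved (all zero) in both V2 and V3.
    reserved: [u8; 32],
}

fn detect_version(d: &RawDescriptor, v3_enabled: bool) -> DescriptorVersion {
    let reserved_is_zero = d.reserved == [0u8; 32];
    let slot_is_zero = d.scheduling_slot == [0u8; 32];

    // V3 is only recognised once the node feature is enabled.
    if v3_enabled && d.version == 1 && reserved_is_zero {
        return DescriptorVersion::V3;
    }
    // Pre-V3 behaviour: any non-zero byte in the old reserved region means V1.
    if !reserved_is_zero || !slot_is_zero {
        return DescriptorVersion::V1;
    }
    match d.version {
        0 => DescriptorVersion::V2,
        _ => DescriptorVersion::Unknown,
    }
}
```

In this model the same bytes are classified as V3 with the feature enabled and as V1 without it, which is the behaviour the mandatory UMP signals rely on to make old nodes reject such candidates.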

Review Focus

High Priority - Correctness

  1. Version detection logic (polkadot/primitives/src/v9/mod.rs:CandidateDescriptorV2::version())

    • Ensures V1 and V3 are correctly distinguished
    • Verify old nodes cannot misinterpret V3 as V1
  2. TrailingOption safety (polkadot/parachain/src/primitives.rs, polkadot/node/core/pvf/execute-worker/)

    • Confirm it only works as final field (documented with safety warnings)
    • Verify old PVFs don't fail when extension bytes absent
  3. Scheduling_parent validation (polkadot/node/network/collator-protocol/, polkadot/node/core/backing/)

    • Must be active leaf before accepting candidates
    • Used correctly for validator group lookups
  4. UMP signal enforcement (polkadot/runtime/parachains/src/paras_inherent/mod.rs)

    • Runtime rejects V3 without mandatory UMP signals
    • Prevents security issues with old nodes

Medium Priority - Architecture

  1. Message flow (polkadot/node/subsystem-types/src/messages.rs)

    • scheduling_parent correctly threaded through subsystem messages
    • BackableCandidateRef used consistently
  2. State tracking (polkadot/node/core/backing/, polkadot/node/network/statement-distribution/)

    • Per-scheduling-parent state management (not per-relay-parent)
    • Correct hashmap key usage
  3. PVF extension encoding/decoding (polkadot/node/core/candidate-validation/)

    • Extension appended correctly for V3
    • No extension for V1/V2 (backward compatible)

Lower Priority - Cleanup

  1. Obsolete V2 feature checks removed (114 lines in paras_inherent/tests.rs)
  2. Naming consistency (per_relay_parent → per_scheduling_parent)
  3. Test infrastructure refactoring (descriptor version enum in builder.rs)

CI Coverage

CI verifies:

  • All affected packages compile successfully
  • All existing tests pass (runtime, subsystems, network protocols)
  • New V3-specific tests validate:
    • V3 descriptors accepted when feature enabled
    • V3 descriptors rejected without mandatory UMP signals
    • V3 descriptors rejected as V1 when feature disabled
    • Scheduling_parent validation in collator-protocol and backing

Critical Invariants

  1. No slashing risk: Old validators cannot mistakenly back V3 candidates (UMP signal requirement prevents this)
  2. Correct validator grouping: Always determined by scheduling_parent, never relay_parent (for V3)
  3. Active leaf requirement: scheduling_parent must be in validator's active leaves
  4. Core assignment correctness: UMP SelectCore signal matches claim queue assignment
  5. PVF safety: V1/V2 PVFs don't fail when TrailingOption decodes as None

Related


Scope: ~4,500 lines added, ~2,350 lines removed across 58 files

@eskimor eskimor requested review from alindima and sandreim November 28, 2025 22:19
@eskimor eskimor force-pushed the rk-prep-new-candidate-version branch from 8c529fe to 1588d71 Compare November 28, 2025 22:21
@eskimor eskimor force-pushed the rk-prep-new-candidate-version branch 2 times, most recently from 21db3df to d66e38a Compare December 2, 2025 15:34
This change works towards supporting V3 candidate descriptors, which
allow the scheduling parent (the relay block used for core assignment)
to differ from the relay parent (the block the parachain builds on).
This is a prerequisite for low-latency collation.

Key changes:

collation-generation:
- Add comprehensive module documentation explaining the two modes of
  operation (CollatorFn callback vs SubmitCollation message) and V2/V3
  descriptor differences
- Pass scheduling_parent through to construct_and_distribute_receipt()
- Create V3 descriptors when scheduling_parent is Some, V2 otherwise
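
A minimal sketch of the "V3 when scheduling_parent is Some, V2 otherwise" rule. `BuiltDescriptor` and `build_descriptor` are hypothetical names for illustration; the real code builds full descriptors inside construct_and_distribute_receipt().

```rust
/// Hypothetical 32-byte hash type used only for this sketch.
type H256 = [u8; 32];

/// Simplified result of descriptor construction.
#[derive(Debug, PartialEq, Eq)]
enum BuiltDescriptor {
    V2 { relay_parent: H256 },
    V3 { relay_parent: H256, scheduling_parent: H256 },
}

/// Picks the descriptor version from the presence of a scheduling parent.
fn build_descriptor(relay_parent: H256, scheduling_parent: Option<H256>) -> BuiltDescriptor {
    match scheduling_parent {
        Some(scheduling_parent) => BuiltDescriptor::V3 { relay_parent, scheduling_parent },
        None => BuiltDescriptor::V2 { relay_parent },
    }
}
```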

candidate-backing:
- Rename PerRelayParentState to PerSchedulingParentState to reflect that
  state is now keyed by scheduling parent, not relay parent
- Store session_index in PerSchedulingParentState for V1 fallback (where
  session is not in the descriptor)
- Fetch executor_params on-demand using session from descriptor (V2/V3)
  or from scheduling parent state (V1 fallback), rather than storing it
  per scheduling parent
- Simplify core_index_from_statement() to take PerSchedulingParentState
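
The session lookup used for fetching executor_params on demand can be sketched like this. It assumes the descriptor exposes its session as an `Option` (present for V2/V3, absent for V1); `session_for_executor_params` is a hypothetical helper name, and the struct below keeps only the one field relevant here.

```rust
/// Session indices are plain `u32` values.
type SessionIndex = u32;

/// Reduced to the single field this sketch needs.
struct PerSchedulingParentState {
    /// Session of the scheduling parent, kept for the V1 fallback.
    session_index: SessionIndex,
}

/// Prefers the session carried by the descriptor (V2/V3) and falls back to
/// the session stored per scheduling parent (V1).
fn session_for_executor_params(
    descriptor_session: Option<SessionIndex>,
    state: &PerSchedulingParentState,
) -> SessionIndex {
    descriptor_session.unwrap_or(state.session_index)
}
```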

prospective-parachains:
- Add tests for V3 candidate descriptor handling

primitives:
- Add new_v3() constructor for CandidateDescriptorV2 with explicit
  scheduling_parent parameter
@eskimor eskimor force-pushed the rk-prep-new-candidate-version branch from 4f2c139 to f288d35 Compare January 9, 2026 17:01
@eskimor eskimor changed the base branch from master to rk-prospective-parachains-cleanup January 9, 2026 17:02
This commit introduces several related improvements to the backing and
validation subsystems:

1. Add BackableCandidateRef struct
   - Replaces bare (CandidateHash, Hash) tuples with type-safe struct
   - Explicitly names scheduling_parent field for clarity
   - Prevents accidental field swapping or wrong hash usage

2. Convert subsystem messages to named enum fields
   - CandidateBackingMessage::GetBackableCandidates
   - CandidateBackingMessage::Second
   - CandidateBackingMessage::Statement
   - ProspectiveParachainsMessage::GetBackableCandidates
   - Improves code readability and IDE support

3. Fix scheduling parent terminology
   - Rename candidate_relay_parent → candidate_scheduling_parent in
     CanSecondRequest
   - Fix variable naming: relay_parent → scheduling_parent where
     semantically correct
   - Update comments and logs to use accurate terminology
   - Distinguish between execution context (relay_parent) and scheduling
     context (scheduling_parent)

4. Add ValidationContext struct to PVF subsystem
   - Encapsulates candidate receipt, PVD, PoV, and execution params
   - Provides helper methods for accessing relay_parent and
     scheduling_parent
   - Reduces parameter explosion in validation code paths
   - ExecuteRequest now includes scheduling_parent and
     descriptor_version

5. Update fragment chain to track V3 scheduling_parent
   - CandidateEntry now stores both relay_parent and scheduling_parent
   - Validates relay_parent ancestry while using scheduling_parent for
     group assignment
   - Adds v3_enabled parameter to candidate entry creation

All changes are internal to the node - no network protocol changes. This
prepares the codebase for proper V3 candidate handling where
relay_parent (execution) and scheduling_parent (scheduling) can differ.
scheduling_parent

- Add v3_collation protocol imports for V3 AdvertiseCollation messages
- Add version field to PeerData for protocol version negotiation
- Rename PerRelayParent -> PerSchedulingParent throughout
- Add v3_enabled flag to PerSchedulingParent from node_features
- Update PendingCollation to track advertised_descriptor_version for V3
- Unified PendingCollation::new and new_v3 into single constructor
- Fix borrow checker issues by passing CollationVersion directly
- Update all tests to use V3 protocol where appropriate
ValidationParamsExtension

This commit introduces the concept of scheduling_parent as distinct from
relay_parent (execution parent) across node subsystems and extends the
PVF interface to pass both hashes for V3+ candidates.

For V1/V2 candidates: scheduling_parent == relay_parent (implicitly)
For V3 candidates: scheduling_parent may differ from relay_parent

The scheduling_parent determines:
- Which validator group is assigned to back the candidate
- Which per-parent state to use for candidate tracking
- The context for claim queue lookups and validator assignments

The relay_parent determines:
- The execution context (relay chain block state)
- Parent head data and storage root

Add ValidationParamsExtension for V3+ candidates:
- New versioned enum appended to ValidationParams encoding
- V3 variant contains both relay_parent and scheduling_parent hashes
- TrailingOption wrapper enables backward compatibility with V1/V2
- PVFs decode extension only if bytes remain (V3), otherwise None
  (V1/V2)
- Add comprehensive safety warnings to TrailingOption about its
  constraints

This allows PVFs to distinguish between scheduling and execution
contexts starting with V3 candidates.

CandidateBackingMessage:
- GetBackableCandidates: Introduce BackableCandidateRef type with
  candidate_hash + scheduling_parent, convert to struct variant
- Second: Convert to struct variant with explicit scheduling_parent
  field
- Statement: Add scheduling_parent field to track backing context
- CanSecondRequest: Rename candidate_relay_parent →
  candidate_scheduling_parent

CollationSecondedSignal:
- Rename relay_parent → scheduling_parent with clarifying documentation

SubmitCollationParams:
- Add optional scheduling_parent field for V3 descriptor creation

statement-distribution/v2:
- Rename PerRelayParentState → PerSchedulingParentState
- Rename per_relay_parent map → per_scheduling_parent
- Update all lookups to use scheduling_parent as the key
- Update comments distinguishing scheduling vs execution context

- Refactor descriptor version from two booleans to
  CandidateDescriptorVersionConfig enum (V1/V2/V3 variants) eliminating
  invalid state combinations
- Remove obsolete CandidateReceiptV2 feature flag checks (19 instances,
  114 lines) V2 is now always accepted regardless of feature flag
  (graduated in commit 4cdf77e)
- Update paras_inherent filtering documentation
- Add comments in grid tests clarifying relay_parent serves dual role
  for V1/V2
- Fix indentation in CandidateBackingMessage::Statement pattern matches

This is a preparatory refactoring. V1/V2 behavior is unchanged:
- ValidationParams encoding unchanged (extension appended only for V3)
- V
@eskimor eskimor mentioned this pull request Jan 16, 2026
@eskimor eskimor changed the title Prepare new candidate version: Add node feature V3 Candidate Descriptor Support with Explicit Scheduling Parent + node feature Jan 16, 2026
@eskimor eskimor marked this pull request as ready for review January 22, 2026 17:43
@eskimor eskimor added T8-polkadot This PR/Issue is related to/affects the Polkadot network. T0-node This PR/Issue is related to the topic “node”. T18-zombienet_tests Trigger zombienet CI tests. labels Jan 22, 2026
@eskimor eskimor mentioned this pull request Jan 22, 2026
9 tasks
@paritytech-workflow-stopper

All GitHub workflows were cancelled due to the failure of one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/21264051381
Failed job name: test-linux-stable

@paritytech-workflow-stopper

All GitHub workflows were cancelled due to the failure of one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/21264051381
Failed job name: test-linux-stable-runtime-benchmarks

Comment on lines +1956 to +1983
// must all be 0 by accident to cause any issues. Bitcoin hardest
// difficulty so far has been 24 digits/12 bytes

What's up with Bitcoin ? :)

/// return V3 descriptors. When `false`, the function preserves pre-V3
/// behavior for backwards compatibility - see explanation above.
pub fn version(&self, v3_enabled: bool) -> CandidateDescriptorVersion {
let old_v1_detected = self.reserved2 != [0u8; 32] ||

This should be v1, not really old v1.

/// via node features. When `true`, the function will properly detect and
/// return V3 descriptors. When `false`, the function preserves pre-V3
/// behavior for backwards compatibility - see explanation above.
pub fn version(&self, v3_enabled: bool) -> CandidateDescriptorVersion {

Why do we need this flag ? The primitive is not aware of feature flags, and the versioning information is contained fully in it.

pub(super) core_index: u16,
/// The session index of the candidate relay parent.
session_index: SessionIndex,
/// Session index for determining secondary checkers.

Before fixing session boundaries, session_index should be equal to scheduling_session_offset

/// The root of a block's erasure encoding Merkle tree.
erasure_root: Hash,
/// The relay chain block determining scheduling.
scheduling_parent: H, // Introduced in v3

Seems we're running out of space, one more hash left to add 😢

candidate: &BackedCandidate<T::Hash>,
allowed_relay_parents: &AllowedRelayParentsTracker<T::Hash, BlockNumberFor<T>>,
allow_v2_receipts: bool,
v3_enabled: bool,

also update fn name as it now also checks v3.

// Check if session index is equal to current session index.
if session_index != shared::CurrentSessionIndex::<T>::get() {
// Check if scheduling session is equal to current session index.
if scheduling_session != shared::CurrentSessionIndex::<T>::get() {

Also check candidate.descriptor().session_index(). For now they should still be the same.

@iulianbarbu iulianbarbu left a comment

Just a check in review - local zn-sdk testing is underway.

Comment on lines +293 to +294
v3_enabled,
scheduling_parent,

feels redundant to have both. Can't we assume v3 enabled if scheduling_parent.is_some()?

self.candidate_hash()
}

// Uses default implementation: returns relay_parent()

nit: not that useful, can be removed

Suggested change
// Uses default implementation: returns relay_parent()

vec![notification.into()],
metrics,
)
} else {

@iulianbarbu iulianbarbu Jan 28, 2026

We should handle a message for V3 here too. Logs during zn-sdk tests show "Major logic bug. Peer somehow has unsupported collation protocol version.", which no longer appears after this fix.

diff --git a/polkadot/node/network/bridge/src/rx/mod.rs b/polkadot/node/network/bridge/src/rx/mod.rs
index 7a8f2e3133..b00956dfd9 100644
--- a/polkadot/node/network/bridge/src/rx/mod.rs
+++ b/polkadot/node/network/bridge/src/rx/mod.rs
@@ -587,6 +587,15 @@ async fn handle_collation_message<AD>(
                                                metrics,
                                        )
                                } else if expected_versions[PeerSet::Collation] == Some(CollationVersion::V2.into())
+                               {
+                                       handle_peer_messages::<protocol_v2::CollationProtocol, _>(
+                                               peer,
+                                               PeerSet::Collation,
+                                               &mut shared.0.lock().collation_peers,
+                                               vec![notification.into()],
+                                               metrics,
+                                       )
+                               } else if expected_versions[PeerSet::Collation] == Some(CollationVersion::V3.into())
                                {
                                        handle_peer_messages::<protocol_v2::CollationProtocol, _>(
                                                peer,		

}
}
},
NetworkBridgeTxMessage::SendRequests(reqs, if_disconnected) => {

Probably something must be handled here accordingly for V3 CollationVersion. I am seeing such logs in the local testing, while parachain finalization is not happening and collations expire. Will try tomorrow to address it.

2026-01-28 17:44:48.032 DEBUG tokio-runtime-worker parachain::collator-protocol: [Relaychain] Collation was advertised but not requested by any validator. candidate_hash=0x94b8d1bf7709355052dca13d8ca33018d46956ba2133b185e20a72d02de13f06 pov_hash=0x2766fcabaee5da227b317ff0f41222e586bd4191d7e93a3c2d658507ee258870 traceID=197685380191164009165270678908973559832

4 participants