Skip to content

Conversation

@AlexandruCihodaru
Copy link
Contributor

Implements persistent storage for the experimental collator protocol's reputation database.

Changes:

  • Adds PersistentDb wrapper that persists the in-memory reputation DB to disk
  • Periodic persistence every 10 minutes (30s in test mode)
  • Immediate persistence on slashes and parachain deregistration
  • Loads existing state on startup with lookback for missed blocks

Implementation:

PersistentDb wraps the existing Db and adds persistence on top:

- All reputation logic (scoring, decay, LRU) stays in `Db`
- Persistence layer handles disk I/O and serialization
- Per-para data stored in parachains_db

Tests:

  • basic_persistence.rs: Validates persistence across restarts and startup lookback
  • pruning.rs: Validates automatic cleanup on parachain deregistration

Implements disk persistence for the experimental collator protocol's
reputation system to preserve peer scores across validator restarts.
                                                                                                                                                                                                                                                                                                   Key changes:
  - Add PersistentDb wrapper that persists in-memory Db to disk
  - Implement periodic persistence (10 min prod, 30 sec test mode)
  - Add immediate persistence for slashes and para pruning
  - Create serialization layer for ScoreEntry and reputation data
  - Add ReputationConfig for database column configuration

Signed-off-by: Alexandru Cihodaru <[email protected]>
Tests added:
  - basic_persistence.rs: Validates that reputation data persists across
    validator restarts and that the startup lookback mechanism correctly
    processes missed blocks during downtime. Tests both large gaps (20+
    blocks) and small gaps (<20 blocks).

  - pruning.rs: Verifies that reputation data is correctly pruned when
    parachains are deregistered. Tests para cleanup via sudo, session
    boundary detection, immediate persistence of pruned state, and
    correct loading of only remaining paras after restart.

Signed-off-by: Alexandru Cihodaru <[email protected]>
@paritytech-workflow-stopper
Copy link

All GitHub workflows were cancelled due to failure one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/21413586873
Failed job name: fmt

Copy link
Contributor

@alindima alindima left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, good job!

// along with Polkadot. If not, see <http://www.gnu.org/licenses/>.

//! Serialization types for disk persistence of collator reputation data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all the db-related file would be better off in a separate module within the peer manager (including the in-memory db). And the in-memory db should be renamed from just plain db.
But let's do the renames in a follow-up PR, to not make review more difficult right now


let last_finalized = instance.inner.processed_finalized_block_number().await;

// Use info level for clear observability in tests and production
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outdated comment

// Load metadata
if let Some(meta) = self.load_metadata()? {
// We need to directly access the inner fields to restore state
// This requires making Db fields pub(super) or adding setter methods
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this comment


/// Persist all in-memory data to disk.
///
/// This should be called periodically by the main loop (currently, every 10 minutes).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optimisation: it should only be done for the paras that have been "dirtied" (had any change).

}

/// Persist a single para's data to disk (called immediately after slash).
fn persist_para(&self, para_id: &ParaId) -> Result<(), PersistenceError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: for slash we only need to persist one peer's reputation. The way it was previously stored (in the old PR), this was possible

);

if block_data.experimental_send_approved_peer {
// Create a valid PeerId in multihash format: [hash_code, digest_size, ...digest_bytes]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just double checking: this was added for test purposes, but the code can handle these invalid peerids gracefully right?

// Copyright (C) Parity Technologies (UK) Ltd.
// SPDX-License-Identifier: Apache-2.0

//! Test: Basic Persistence on Graceful Shutdown with Startup Lookback Verification
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name of the test is not expressive enough. What reputation is this about?

Also, this tests persistence on restart more than anything

@@ -0,0 +1,266 @@
// Copyright (C) Parity Technologies (UK) Ltd.
// SPDX-License-Identifier: Apache-2.0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you want to have the tests running in CI you need to add them to the github actions config

//! 5. **Deregister parachain 2001 using sudo**
//! 6. Wait for session boundary (triggers pruning check)
//! 7. Verify pruning logs show para 2001 was pruned
//! 8. Wait for periodic persistence (pruned state written to disk)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the pruned state is written immediately, is it not?

);

log::info!("Verifying para 2000 continues normal operation (para 2001 is deregistered)");
assert_para_throughput(&validator0_client_after, 5, [(ParaId::from(PARA_ID_1), 4..6)])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also assert that para 2001 has throughput 0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants