feat: change Participants from a Vec-based structure to a BTreeMap for improved gas usage #1861

Draft
pbeza wants to merge 87 commits into main from
1572-replace-vec-with-setmap-based-data-structure-in-participants-for-improved-performance

Conversation

@pbeza
Contributor

@pbeza pbeza commented Jan 26, 2026

Fixes #1572

@pbeza pbeza force-pushed the 1572-replace-vec-with-setmap-based-data-structure-in-participants-for-improved-performance branch 10 times, most recently from ebd8911 to f2624ed Compare January 26, 2026 21:15
@pbeza pbeza force-pushed the 1572-replace-vec-with-setmap-based-data-structure-in-participants-for-improved-performance branch from f2624ed to 7288d28 Compare January 27, 2026 10:39
pbeza added 15 commits January 27, 2026 11:52
…-setmap-based-data-structure-in-participants-for-improved-performance
…-setmap-based-data-structure-in-participants-for-improved-performance
…-setmap-based-data-structure-in-participants-for-improved-performance
…-setmap-based-data-structure-in-participants-for-improved-performance
…ec-with-setmap-based-data-structure-in-participants-for-improved-performance
…place-vec-with-setmap-based-data-structure-in-participants-for-improved-performance
@pbeza
Contributor Author

pbeza commented Feb 14, 2026

@claude review

@claude

claude bot commented Feb 14, 2026

Code Review

I've analyzed this PR which migrates the internal Participants storage from Vec to BTreeMap while maintaining backward compatibility.

Critical Issues

1. Performance Regression: O(n) ID Uniqueness Check

Location: crates/contract/src/primitives/participants.rs:215-216

The PR removed the used_ids: BTreeSet<ParticipantId> secondary index, replacing an O(log n) lookup with O(n) iteration via .any(). While insertions are rare, the architecture supports hundreds of participants (tests use 800). An O(n) check during large resharing operations could cause gas issues.

Recommendation: Restore the used_ids BTreeSet or document why O(n) is acceptable with gas analysis.
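The trade-off behind this recommendation can be sketched with standard-library types only. This is a minimal illustration, not the contract's code: `AccountId` and `ParticipantId` are simplified to `String` and `u32`, and the struct layout and method names are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical stand-ins for the contract's types; the real definitions
// are not shown in this thread.
type AccountId = String;
type ParticipantId = u32;

#[derive(Default)]
struct Participants {
    by_account: BTreeMap<AccountId, ParticipantId>,
    // Secondary index: every ParticipantId currently in use.
    used_ids: BTreeSet<ParticipantId>,
}

impl Participants {
    /// O(n): scans every entry, as a `.any()` based check does.
    fn id_in_use_scan(&self, id: ParticipantId) -> bool {
        self.by_account.values().any(|existing| *existing == id)
    }

    /// O(log n): a single lookup in the secondary index.
    fn id_in_use_indexed(&self, id: ParticipantId) -> bool {
        self.used_ids.contains(&id)
    }

    /// Inserts a participant, rejecting a duplicate account or id.
    /// Both collections are updated together so the index cannot drift.
    fn insert(&mut self, account: AccountId, id: ParticipantId) -> Result<(), String> {
        if self.by_account.contains_key(&account) {
            return Err(format!("duplicate account {}", account));
        }
        // `BTreeSet::insert` returns false when the value was already present.
        if !self.used_ids.insert(id) {
            return Err(format!("duplicate participant id {}", id));
        }
        self.by_account.insert(account, id);
        Ok(())
    }
}
```

The index costs extra storage and must be kept in sync on every mutation, which is the price of the cheaper lookup. Whether that is worth it on-chain would need the gas analysis mentioned above.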

2. Missing Validation in Participants::init

Location: crates/contract/src/primitives/participants.rs:283-292

The code builds used_ids locally but never validates uniqueness or stores it (field was removed). This differs from the old version which called validate(). Invalid state could be created through init.

Recommendation: Make init return Result<Self, Error> and call validate() before returning.
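A minimal sketch of what a fallible `init` could look like, using only the standard library. The types are simplified (`String` accounts, `u32` ids, no `ParticipantInfo`) and the `InitError` enum is hypothetical; the contract's actual error type and `validate()` body are not shown in this thread.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical simplified types; the contract's real definitions differ.
type AccountId = String;
type ParticipantId = u32;

#[derive(Debug, PartialEq)]
enum InitError {
    DuplicateAccount,
    DuplicateParticipantId,
}

#[derive(Debug)]
struct Participants {
    next_id: ParticipantId,
    participants: BTreeMap<AccountId, ParticipantId>,
}

impl Participants {
    /// Builds the map from legacy-style entries and validates before returning,
    /// so an invalid value is never handed to the caller.
    fn init(
        next_id: ParticipantId,
        entries: Vec<(AccountId, ParticipantId)>,
    ) -> Result<Self, InitError> {
        let expected_len = entries.len();
        let participants: BTreeMap<AccountId, ParticipantId> = entries.into_iter().collect();
        // Collecting into a map collapses duplicate keys, so a shorter map
        // means the input contained duplicate accounts.
        if participants.len() != expected_len {
            return Err(InitError::DuplicateAccount);
        }
        let candidate = Self { next_id, participants };
        candidate.validate()?;
        Ok(candidate)
    }

    /// Checks that no two accounts share a ParticipantId.
    fn validate(&self) -> Result<(), InitError> {
        let mut seen = BTreeSet::new();
        for id in self.participants.values() {
            if !seen.insert(*id) {
                return Err(InitError::DuplicateParticipantId);
            }
        }
        Ok(())
    }
}
```

Returning `Result` pushes the decision about how to fail (panic, reject the call, log) to the caller instead of leaving an unvalidated value in circulation.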

Strengths

  • Excellent backward compatibility via custom Borsh/JSON serialization
  • Proper migration strategy using v3_4_1_state.rs
  • Comprehensive test coverage
  • Gas optimization for lookups

⚠️ Issues found - please address before merging

@pbeza
Contributor Author

pbeza commented Feb 15, 2026

@claude review

@claude

claude bot commented Feb 15, 2026

Code Review

Based on my analysis of the PR, I've identified the following:

Critical Issues Found: ⚠️

1. Potential Data Loss Risk in Deserialization (contract-interface)

In crates/contract-interface/src/types/participants.rs, the custom deserialize_participants function silently drops duplicate AccountIds when converting from Vec format:

ParticipantsField::Vec(vec) => Ok(vec
    .into_iter()
    .map(|(account, id, info)| (account, ParticipantData { id, info }))
    .collect()),

If the legacy Vec format contains duplicate AccountIds (which shouldn't happen but could exist due to bugs), the BTreeMap::collect() will silently keep only the last entry. This could lead to participant data being lost during migration without any error or warning.

Recommendation: Add validation after collection:

ParticipantsField::Vec(vec) => {
    let expected_len = vec.len();
    let map: BTreeMap<_, _> = vec
        .into_iter()
        .map(|(account, id, info)| (account, ParticipantData { id, info }))
        .collect();
    if map.len() != expected_len {
        return Err(serde::de::Error::custom(
            "Duplicate AccountIds detected during deserialization"
        ));
    }
    Ok(map)
}
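The failure mode can be reproduced without serde. The sketch below uses hypothetical `(String, u32)` pairs in place of the real `(AccountId, ParticipantId, ParticipantInfo)` tuples and shows that a plain `collect()` into a `BTreeMap` shrinks silently on duplicate keys, while a length comparison turns that into an error.

```rust
use std::collections::BTreeMap;

/// What a plain `collect()` does: duplicate keys collapse to one entry
/// and no error is reported.
fn lossy_collect(legacy: Vec<(String, u32)>) -> BTreeMap<String, u32> {
    legacy.into_iter().collect()
}

/// Same conversion, but a length mismatch is reported as an error.
fn checked_collect(legacy: Vec<(String, u32)>) -> Result<BTreeMap<String, u32>, String> {
    let expected_len = legacy.len();
    let map: BTreeMap<String, u32> = legacy.into_iter().collect();
    if map.len() != expected_len {
        return Err(format!(
            "duplicate accounts: {} entries collapsed to {}",
            expected_len,
            map.len()
        ));
    }
    Ok(map)
}
```

For input with two entries for the same account, `lossy_collect` returns a one-entry map and `checked_collect` returns `Err`. The check is a single length comparison, so its cost is negligible next to the deserialization itself.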

2. Incomplete Validation in Participants::validate()

The validate() method in crates/contract/src/primitives/participants.rs:110-121 checks for duplicate ParticipantIds and the next_id invariant, but there's an edge case:

if ids.last().is_some_and(|max| self.next_id.get() <= max.get()) {
    return Err(InvalidCandidateSet::IncoherentParticipantIds.into());
}

This only checks if next_id is greater than the maximum ID, but doesn't ensure that next_id is strictly greater than all IDs. While this is mathematically equivalent, the semantic intent is unclear. More importantly, it doesn't catch the case where next_id equals the max ID + 1 but some IDs in between are missing.

Actually, this is fine on second thought - the check is correct because next_id only needs to be greater than the max ID. The missing IDs in between are intentional (participants can be removed).
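The invariant in question can be written as a small predicate. This is a simplified sketch in which ids are plain `u32` values and the function name is hypothetical; it only illustrates why comparing against the maximum is sufficient and why gaps are accepted.

```rust
use std::collections::BTreeSet;

/// Returns true when `next_id` is strictly greater than every id in use.
/// Gaps between ids are accepted, since participants can be removed.
fn next_id_is_coherent(ids: &BTreeSet<u32>, next_id: u32) -> bool {
    // A BTreeSet iterates in ascending order, so the last element is the
    // maximum; being greater than it implies being greater than all ids.
    match ids.iter().next_back() {
        Some(max) => next_id > *max,
        None => true, // no ids in use, any next_id is fine
    }
}
```

With ids `{0, 2, 5}`, a `next_id` of 6 or 9 passes and 5 fails; the missing ids 1, 3 and 4 play no role in the check.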


3. Missing .expect() Safety Comment

In crates/contract/src/v3_4_1_state.rs:51:

crate::primitives::participants::Participants::init(old.next_id, old.participants)
    .expect("Migrated participants should be valid")

This .expect() could panic during contract upgrade if the old state contained invalid participant data. While the comment claims "should be valid", there's no guarantee. If old contract versions had bugs that allowed invalid states, this migration will fail catastrophically.

Recommendation: Add detailed error context explaining what validation failed:

.unwrap_or_else(|e| {
    env::panic_str(&format!(
        "Migration failed: invalid participants in v3.4.1 state: {:?}",
        e
    ))
})

Non-Critical Observations:

  1. Good: The Borsh serialization maintains backward compatibility by serializing as Vec format
  2. Good: Comprehensive test coverage for duplicate detection via JSON tampering
  3. Good: The init() method properly validates on construction
  4. Good: Migration code is well-separated in v3_4_1_state.rs

Verdict: ⚠️ Issues Found

Please address the critical deserialization data loss risk (#1) before merging. Issue #3 is lower priority but should be addressed for better operational safety during upgrades.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 35 out of 35 changed files in this pull request and generated no new comments.


@gilcu3
Contributor

gilcu3 commented Feb 18, 2026

@claude review

Now that you are Opus 4.6, you might have better insights.

@pbeza there is a conflict that needs to be resolved either way, but maybe better wait to get two reviews first

@claude

claude bot commented Feb 18, 2026

Code Review

I've reviewed the full diff for this Vec-to-BTreeMap migration. The PR is well-structured overall — migration logic is comprehensive, gas improvements are meaningful, and test coverage has been updated. Previous review rounds have addressed the major issues (ParticipantId uniqueness validation in validate(), init() returning Result, duplicate-ID detection tests). Here are remaining concerns:

Issues

1. Silent data loss in contract-interface deserialization (previously flagged, appears unresolved)

crates/contract-interface/src/types/participants.rs — the deserialize_participants function silently drops duplicate AccountIds when converting from legacy Vec format:

ParticipantsField::Vec(vec) => Ok(vec
    .into_iter()
    .map(|(account, id, info)| (account, ParticipantData { id, info }))
    .collect()),

BTreeMap::collect() keeps only the last entry for duplicate keys. While duplicates "shouldn't" exist in production data, this is a DTO consumed by external callers (backup-cli, node code) where the input isn't under contract control. A length check after .collect() would catch corruption without adding meaningful cost:

ParticipantsField::Vec(vec) => {
    let expected_len = vec.len();
    let map: BTreeMap<_, _> = vec
        .into_iter()
        .map(|(account, id, info)| (account, ParticipantData { id, info }))
        .collect();
    if map.len() != expected_len {
        return Err(serde::de::Error::custom("duplicate AccountIds in legacy Vec format"));
    }
    Ok(map)
}

2. Migration uses new_unvalidated — skips threshold validation

crates/contract/src/v3_4_1_state.rs:47: OldThresholdParameters::from uses ThresholdParameters::new_unvalidated():

impl From<OldThresholdParameters> for crate::primitives::thresholds::ThresholdParameters {
    fn from(old: OldThresholdParameters) -> Self {
        crate::primitives::thresholds::ThresholdParameters::new_unvalidated(
            old.participants.into(),
            old.threshold,
        )
    }
}

While the data was previously valid on-chain, note that Participants::init() (called via .into()) already validates participant invariants. Using new_unvalidated means the threshold/participant-count relationship is not re-checked after migration. This is likely fine since the old contract enforced these invariants, but it's worth a brief comment explaining why validation is intentionally skipped (e.g., // Old state was already validated; re-validation would reject valid states during resharing transitions).

3. ABI schema/wire-format change is a breaking change for external consumers

The Participants struct's JSON format changed from Vec-of-tuples to BTreeMap. The ABI snapshot confirms this. Any existing external consumers parsing the old format from the state() view method will break after upgrade. This is deliberate — just confirming it's expected and that downstream consumers have been notified.

4. remove() moved from test-only to production code

The remove method was previously #[cfg(any(test, feature = "test-utils"))] and is now public production code (used in tee_state.rs). The implementation is correct (self.participants.remove(account)), but note this is a behavioral change — remove is now available in non-test builds. This seems intentional since it's needed by the TEE validation cleanup path.

Minor

  • crates/node/src/p2p.rs:160 — Changed from ParticipantId::from_raw(rand::random()) to ParticipantId::from_raw(i as u32). Good change — random IDs could cause duplicate-ID validation failures with BTreeSet-based validation.

Verdict

The core logic is sound. Issue #1 (silent data loss in DTO deserialization) is the most actionable item — a two-line length check would add defense-in-depth. Issue #2 is informational. The breaking ABI change (#3) should be confirmed as intentional.

⚠️ Issues found — please address #1 before merging.

Contributor

@gilcu3 gilcu3 left a comment

This was a big PR. I had several comments, but none of them are blocking IMO. There are also the comments above by claude. I think many of them, both mine and claude's, are good to have.

@@ -47,65 +48,128 @@ pub struct ParticipantInfo {
pub sign_pk: String,
Contributor

I missed this in the PR the other day (sorry), but this should be a proper type that we already have defined in this crate: Ed25519PublicKey

Comment on lines 214 to 215
/// Returns a subset of the participants according to the given range of indices.
pub fn subset(&self, range: std::ops::Range<usize>) -> Participants {
Contributor

I did not know we had this function!

Comment on lines +9 to +13
//!
//! ## Changes since 3.4.1
//! - `Participants` struct changed from serializing participants as
//! `Vec<(AccountId, ParticipantId, ParticipantInfo)>` to
//! `BTreeMap<AccountId, ParticipantData>` where `ParticipantData { id, info }`.
Contributor

we have a few more changes to add, for example all the foreign tx stuff

Comment on lines +208 to +210
/// StaleData in v3.4.1 is empty (participant_attestations was cleaned up in 3.4.0 → 3.4.1).
#[derive(Debug, Default, BorshSerialize, BorshDeserialize)]
struct StaleData {}
Contributor

why do we need this struct if that's the case?

Comment on lines -50 to +48
- pub const CURRENT_CONTRACT_DEPLOY_DEPOSIT: NearToken = NearToken::from_millinear(14000);
+ pub const CURRENT_CONTRACT_DEPLOY_DEPOSIT: NearToken = NearToken::from_millinear(14201);
Contributor

In general I would rather leave some room here; putting an exact value invites future flakiness

Comment on lines -391 to -402
/// Submit mock attestations for all participants in parallel.
/// Submit mock attestations for the given accounts in parallel.
pub async fn submit_attestations(
contract: &Contract,
accounts: &[Account],
participants: &Participants,
) {
let futures: Vec<_> = participants
.participants()
let futures: Vec<_> = accounts
.iter()
.zip(accounts)
.enumerate()
.map(|(i, ((_, _, participant), account))| async move {
Contributor

this is worth fixing in a future PR, we could use the much simpler async transactions instead. See for example the function execute_async_transactions

Comment on lines +73 to +83
// Build new participants: all except mpc_signer_accounts[0]
let subset_dto = dtos::Participants {
next_id: initial_participants.next_id,
participants: initial_participants
.participants
.iter()
.filter(|(account_id, _)| account_id.0 != excluded_account)
.map(|(k, v)| (k.clone(), v.clone()))
.collect(),
};
let new_participants: Participants = (&subset_dto).into_contract_type();
Contributor

why are we doing this double conversion here? Shouldn't we just construct the contract type?

/// Wrapper around [`execute_key_generation_and_add_random_state_with_proposal`]
/// that serialises the threshold proposal in the legacy Vec-of-tuples format
/// expected by old contract binaries.
async fn add_random_state_to_old_contract(
Contributor

more accurate, and seems to align better with existing code

Suggested change
async fn add_random_state_to_old_contract(
async fn execute_key_generation_and_add_random_state_to_old_contract(

Comment on lines -262 to +264
- if participants_with_valid_attestation.len() != participants.len() {
-     let participants_with_valid_attestation =
-         Participants::init(participants.next_id(), participants_with_valid_attestation);
+ if !invalid_accounts.is_empty() {
+     let mut participants_with_valid_attestation = participants.clone();
+     for account in &invalid_accounts {
+         participants_with_valid_attestation.remove(account);
+     }
Contributor

this change added the requirement of the "remove" function to the prod contract, where it was previously test-only. I think the previous code was slightly more efficient, wdyt?
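For reference, the two shapes being compared can be sketched on a plain `BTreeMap`. This is an illustration under simplifying assumptions (hypothetical `String` accounts and `u32` values instead of the real participant data), and it makes no claim about measured gas in the contract.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical simplified types for illustration only.
type AccountId = String;

/// Clone everything, then remove each invalid account.
/// Entries that end up removed are cloned first and then discarded.
fn clone_and_remove(
    all: &BTreeMap<AccountId, u32>,
    invalid: &BTreeSet<AccountId>,
) -> BTreeMap<AccountId, u32> {
    let mut kept = all.clone();
    for account in invalid {
        kept.remove(account);
    }
    kept
}

/// Rebuild the map while filtering, so invalid entries are never cloned.
fn filter_and_collect(
    all: &BTreeMap<AccountId, u32>,
    invalid: &BTreeSet<AccountId>,
) -> BTreeMap<AccountId, u32> {
    all.iter()
        .filter(|(account, _)| !invalid.contains(*account))
        .map(|(account, value)| (account.clone(), *value))
        .collect()
}
```

Both functions return the same map. The filtering variant avoids cloning entries that are about to be dropped and does not need a public `remove`, while the clone-and-remove variant reads more directly when only a few accounts are invalid. Which one is cheaper in practice depends on the real entry size and would need measuring.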

Contributor

@kevindeforth kevindeforth left a comment

This PR currently does two things:

  • it changes the contract-internal Participants representation
  • it propagates those changes to the contract-interface crate.

It is unclear to me why we need to add the new internal representation to the contract-interface crate. It makes reviewing this PR much harder.
To me, it looks like there is a breaking change in the ABI.

I would prefer we split this up into two separate PRs, with the first one only doing the contract-internal changes. If we stay backwards-compatible, then this shouldn't require changes to any of our other crates.

Requesting changes for now - but we can discuss this on slack or over a call.

Comment on lines +78 to +104
#[serde(deserialize_with = "deserialize_participants")]
pub participants: BTreeMap<AccountId, ParticipantData>,
}

/// Custom deserializer for the `participants` field that accepts both the
/// current map format (`{account: {id, info}}`) and the legacy vec-of-tuples
/// format (`[[account, id, info], ...]`).
fn deserialize_participants<'de, D>(
deserializer: D,
) -> Result<BTreeMap<AccountId, ParticipantData>, D::Error>
where
D: serde::Deserializer<'de>,
{
#[derive(Deserialize)]
#[serde(untagged)]
enum ParticipantsField {
Map(BTreeMap<AccountId, ParticipantData>),
Vec(Vec<(AccountId, ParticipantId, ParticipantInfo)>),
}

match ParticipantsField::deserialize(deserializer)? {
ParticipantsField::Map(map) => Ok(map),
ParticipantsField::Vec(vec) => Ok(vec
.into_iter()
.map(|(account, id, info)| (account, ParticipantData { id, info }))
.collect()),
}
Contributor

Hmm, so two things:

  • I don't think this PR requires any changes to the contract-interface. The only method that needs to be modified is the into_dto_type(), but that lives in the contract crate.
  • If we really want to add the new Participants type to the contract-interface crate, then I would opt to do that in a follow-up PR, not this one. It's just a lot of changes otherwise.

Contributor

Why does this file require changes if we stay backwards compatible?
IIRC, the backup-service depends only on the contract-interface type and thus would expect the same format as before?

],
"maxItems": 3,
"minItems": 3
"type": "object",
Contributor

isn't this a breaking change?

@gilcu3 gilcu3 self-requested a review February 19, 2026 08:52
Contributor

@gilcu3 gilcu3 left a comment

Unfortunately, we found that the current PR introduces breaking changes with respect to the 3.4.1 contract (checked with localnet automation):

mpc on  1572-replace-vec-with-setmap-based-data-structure-in-participants-for-improved-performance via 🦀 v1.86.0 took 59s
❯ ./scripts/launch-localnet.sh
Using mpc-contract binary from ./target/near/mpc_contract/mpc_contract.wasm
Creating network with 2 mpc nodes and threshold 2
Logs will be stored in /tmp/mpc-localnet.9QpDvx
Cleaning ~/.near folder
Started: neard PID: 261214
Waiting 60 seconds for neard to start properly
Creating mpc-contract account
Deploying mpc-contract
Creating mpc-node accounts
Creating mpc nodes configuration
Starting mpc nodes
Started: mpc-node-1.test.near PID: 266942
Started: mpc-node-2.test.near PID: 266944
Waiting 20 seconds for mpc nodes to start properly
Adding account keys for the nodes
Initializing contract
Adding domains to contract
Waiting 20 seconds for key generation to happen
Command near contract call-function as-read-only mpc-contract.test.near state json-args {} network-config mpc-localnet now 2>&1 | grep Running failed. Retrying in 2s...
Command near contract call-function as-read-only mpc-contract.test.near state json-args {} network-config mpc-localnet now 2>&1 | grep Running failed. Retrying in 2s...
Command near contract call-function as-read-only mpc-contract.test.near state json-args {} network-config mpc-localnet now 2>&1 | grep Running failed. Retrying in 2s...
Command near contract call-function as-read-only mpc-contract.test.near state json-args {} network-config mpc-localnet now 2>&1 | grep Running failed. Retrying in 2s...

For some reason the old contract does not transition to Running state

@pbeza
Contributor Author

pbeza commented Feb 19, 2026

Thanks for all the reviews!

I’m putting this on hold for now, since addressing all the concerns would take too much time and we need to prioritize the HOT Wallet migration work.

@gilcu3
Contributor

gilcu3 commented Feb 19, 2026

Thanks for all the reviews!

I’m putting this on hold for now, since addressing all the concerns would take too much time and we need to prioritize the HOT Wallet migration work.

Makes sense, marking it as draft so that the intention of work in progress is explicit

@gilcu3 gilcu3 marked this pull request as draft February 19, 2026 12:01
Development

Successfully merging this pull request may close these issues.

Replace Vec with Set/Map-based data structure in Participants for improved performance

4 participants