Skip to content

feat: Da sequencer, add state propagation#1226

Merged
0xmovses merged 66 commits into
feature/da-sequencer-nodefrom
musitdev/propagate_state
May 13, 2025
Merged

feat: Da sequencer, add state propagation#1226
0xmovses merged 66 commits into
feature/da-sequencer-nodefrom
musitdev/propagate_state

Conversation

@musitdev
Copy link
Copy Markdown
Contributor

@musitdev musitdev commented May 9, 2025

Summary

  • RFCs: Link to RFC, Link to RFC, or $\emptyset$.
  • Categories: any of protocol-units, networks, scripts, util, cicd, or misc.

Implementation of the issue #1222

Some commits are already in feature branch because I've used squash and merge in the other PR instead of simple merge, so as this branch start from the other PR branch, git can't re-associate them.
The setup and test part is already merged in feature branch.

Add state propagation from a main node to all other full node.
Full node validate the received state with the one they have at the same height.
If it diverges the node is stopped.
The propagation use the da-sequencer grpc service.

Start updating the follower node setup / migration doc update.

Changelog

  • Add a new entry point to da sequencer grpc: send_state
  • Add a property in node config to define the main node.
  • Main node send state after execution.
  • All full node store the last 100 state they receive.
  • All full node validate the state after execution with the one they have received.
  • Full node installation doc creation.

Testing

Unit test of the state validation:

 cargo test -p movement-full-node test_validate_state

e2e test:

CELESTIA_LOG_LEVEL=FATAL nix develop --extra-experimental-features nix-command --extra-experimental-features flakes --command bash  -c "CARGO_PROFILE=release just movement-full-node native build.setup.eth-local.celestia-local.load --keep-tui"

Outstanding issues

0xmovses and others added 30 commits April 7, 2025 18:32
…1087)

Co-authored-by: Andreas Penzkofer <36140574+apenzk@users.noreply.github.com>
Co-authored-by: Mikhail Zabaluev <mikhail.zabaluev@movementlabs.xyz>
Co-authored-by: Liam Monninger <79056955+l-monninger@users.noreply.github.com>
Co-authored-by: Andy Golay <andygolay@gmail.com>
Co-authored-by: musitdev <philippe.delrieu@free.fr>
Co-authored-by: Richard Melkonian <35300528+0xmovses@users.noreply.github.com>
Co-authored-by: primata <primata@movementlabs.xyz>
Co-authored-by: Richard Melkonian <r.v.melkonian@gmail.com>
Co-authored-by: Radu Popa <radupopa21be@gmail.com>
Co-authored-by: Liam Monninger <l.mak.monninger@gmail.com>
"Scenarios per client", removed the "number_" prefix
which was a handful to type and didn't add clarity.
@musitdev musitdev requested review from l-monninger and sebtomba May 9, 2025 14:39
@ganymedio
Copy link
Copy Markdown
Contributor

@musitdev the unit test passes for me, but for the e2e test, I'm wondering, is expected, for setup?:

image

I was able to run the test, and it looks like some scenarios failed. I'll DM you the run logs.

Comment thread docs/movement-node/run-fullnode/manual/README.md Outdated
)
.await
tokio::spawn({
let da_signer = self
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@musitdev I think it might be better to move these into a config struct for the exec_settle_task at this point. Would you like for me to drop that in a PR?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if you want. As I say later, it's still a test to understand better why state of node changes. So it can be removed depending on the cause of the state divergence.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will drop something tomorrow.

@@ -0,0 +1,117 @@
use maptos_opt_executor::executor::ExecutionState;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add module comments for this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I do that. I'll put the explanation that I'm going to write to your next question.

Comment thread proto/movementlabs/protocol_units/da/sequencer/v1.proto
Comment thread protocol-units/da-sequencer/node/src/server.rs Outdated
bytes signature = 2;
}

message MainNodeState {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes more sense as LeaderNodeState. Why the change to Main*?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leader* was never meant to refer to nodes that gossip state to to Follower*. That was a misconception that you guys decided to advance, additionally shutting down the transaction posting from followers.

The Leader* produces blocks as well as executing. The Follower* receives blocks and executes them.

Copy link
Copy Markdown
Contributor Author

@musitdev musitdev May 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes—propagating a state isn’t our goal. I added this feature because we’ve seen edge cases that cause nodes to diverge. Without on‐chain consensus, we need a “source of truth” node to decide which state is correct. I’ve called it the main node (to avoid confusing it with the leader node, which has a different responsibility). The main node’s only job is to broadcast its state; it has no other special functions.

Originally, this was just a test harness to uncover divergence scenarios. I could have used ad-hoc debug RPCs, but I decided to build a permanent, robust solution. Even after we find and fix specific bugs, we can’t guarantee they’ll never reappear. As partners rely more on their follower nodes—and as network load increases—early detection of divergence becomes critical. We’ll use this implementation first to detect divergence, then decide whether to keep it long-term.

We disabled transaction submission on follower nodes because our current DA light-node can’t handle Celestia back-pressure when multiple batches arrive simultaneously. On mainnet we discovered partners were unintentionally using their follower nodes: Celestia blobs backed up in the client, and transactions took minutes to include in a block.

The new DA-Sequencer architecture changes this:

  • The DA-Sequencer produces and publishes blocks.
  • Full nodes handle everything else (RPC, execution, indexing, gRPC, etc.).

Comment thread protocol-units/execution/maptos/dof/src/v1.rs
use tracing::info;

#[derive(Debug, Clone)]
pub struct ExecutionState {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#[derive(Serialize, Deserialize, Clone, Default, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]

BlockCommitment already has a height and an id. Commitments would match on timestamp and version too. What's the need for adding this separate struct versus including it in BlockCommitment?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They don't share the same use case. Change on one struct shouldn't affect the other. They must be separated.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe explain briefly why they must be separated as a doc comment /// above the struct.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add a comment.

Comment thread docs/movement-node/run-fullnode/README.md Outdated
@0xmovses 0xmovses requested a review from Copilot May 12, 2025 13:22
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements state propagation from the main node to full nodes, adding a new gRPC endpoint (send_state), configuration for the main node, and verification of state consistency between nodes.

  • Introduces a state verifier module to store and validate node states.
  • Enhances task execution in the DA sequencer client to propagate execution state and verify it.
  • Updates configuration and deployment scripts to support the new state propagation mechanism.

Reviewed Changes

Copilot reviewed 41 out of 41 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
networks/movement/movement-full-node/src/node/tasks/state_verifier.rs Adds a lightweight state verifier used to track and validate a limited history of node states.
networks/movement/movement-full-node/src/node/tasks/execute_settle.rs Updates the block processing logic to include validation of state propagation, as well as spawning task to send new state to the DA sequencer.
networks/movement/movement-full-node/src/node/tasks/mod.rs Registers the new state verifier module.
networks/movement/movement-full-node/src/node/partial.rs Updates task spawning to supply new configuration options related to state propagation.
Cargo.toml, shell scripts, docs Reflect configuration and dependency updates supporting the new state propagation functionality.

Comment thread networks/movement/movement-full-node/src/node/tasks/execute_settle.rs Outdated
Copy link
Copy Markdown
Contributor

@0xmovses 0xmovses left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unit tests and e2e test ran successfully. Output for e2e`movement-tests-load-tests was :

 Finished `dev` profile [unoptimized + debuginfo] target(s) in 2m 54s
     Running `target/debug/movement-tests-e2e-load-alice-bob`
  2025-05-12T13:53:16.421964Z  INFO movement_tests_e2e_load_alice_bob: final balances, Alice: 99999998400, Bob: 200
    at networks/movement/movement-client/src/bin/e2e/load_alice_bob.rs:162

  2025-05-12T13:53:16.423368Z  INFO movement_tests_e2e_load_alice_bob: final balances, Alice: 99999998400, Bob: 200
    at networks/movement/movement-client/src/bin/e2e/load_alice_bob.rs:162

  2025-05-12T13:53:16.425542Z  INFO movement_tests_e2e_load_alice_bob: final balances, Alice: 99999998400, Bob: 200
    at networks/movement/movement-client/src/bin/e2e/load_alice_bob.rs:162

  2025-05-12T13:53:16.425668Z  INFO movement_tests_e2e_load_alice_bob: final balances, Alice: 99999998400, Bob: 200
    at networks/movement/movement-client/src/bin/e2e/load_alice_bob.rs:162

  2025-05-12T13:53:16.426223Z  INFO movement_tests_e2e_load_alice_bob: final balances, Alice: 99999998400, Bob: 200
    at networks/movement/movement-client/src/bin/e2e/load_alice_bob.rs:162

  2025-05-12T13:53:16.427455Z  INFO movement_tests_e2e_load_alice_bob: final balances, Alice: 99999998400, Bob: 200
    at networks/movement/movement-client/src/bin/e2e/load_alice_bob.rs:162

  2025-05-12T13:53:16.427598Z  INFO movement_tests_e2e_load_alice_bob: final balances, Alice: 99999998400, Bob: 200
    at networks/movement/movement-client/src/bin/e2e/load_alice_bob.rs:162

  2025-05-12T13:53:16.428009Z  INFO movement_tests_e2e_load_alice_bob: final

I've left comments around style, implementation questions and doc improvements

Comment thread docs/movement-node/run-fullnode/README.md Outdated
Comment thread docs/movement-node/run-fullnode/README.md Outdated
Comment thread docs/movement-node/run-fullnode/README.md Outdated
Comment thread docs/movement-node/run-fullnode/README.md Outdated
Comment thread docs/movement-node/run-fullnode/README.md Outdated
Comment thread networks/movement/movement-full-node/src/node/tasks/state_verifier.rs Outdated
.map_err(|_| anyhow::anyhow!("Invalid main_node_verifying_key key"))
})
})
.transpose()?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems quite unwieldily just to get the verify_key. If we expect this verifying_key to be accessed elsewhere, abstract the map() call into some method on main_node_verifying_key.

Comment thread protocol-units/da-sequencer/node/src/server.rs Outdated
tracing::warn!("Bad node state data, no state in it.");
return Ok(tonic::Response::new(BatchWriteResponse { answer: false }));
}
let state = state_data.state.unwrap(); //unwrap tested before.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

//unwrap tested before. where, how? Maybe a more detailed comment about why its ok to unwrap() here, as I don't know what it is and would rather it be out.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just the statement before:

if state_data.state.is_none() {

If you prefer I can to a match; perhaps it's more clear. I change it to a match.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I liked anyhow::Context best here.

let state = state_data.context("state data should not be None")?;

use tracing::info;

#[derive(Debug, Clone)]
pub struct ExecutionState {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe explain briefly why they must be separated as a doc comment /// above the struct.

@ganymedio
Copy link
Copy Markdown
Contributor

The updated E2E test

CELESTIA_LOG_LEVEL=FATAL nix develop --extra-experimental-features nix-command --extra-experimental-features flakes --command bash  -c "CARGO_PROFILE=release just movement-full-node native build.setup.eth-local.celestia-local.load --keep-tui"

has passed for me:

image

Pending resolution of @l-monninger and @0xmovses change requests, I think this is good to merge.

musitdev and others added 3 commits May 13, 2025 10:23
Co-authored-by: Richard Melkonian <35300528+0xmovses@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@musitdev musitdev requested review from 0xmovses and l-monninger May 13, 2025 10:01
Comment thread networks/movement/setup/src/migrate.rs Outdated
Some(url) => url.to_string_lossy().into_owned(),
None => "http://movement-da-sequencer:30730/".to_string(),
};
tracing::info!("updating da-sequecner connection url with:{new_url}");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix typo.

Copy link
Copy Markdown
Contributor

@0xmovses 0xmovses left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@0xmovses 0xmovses merged commit 56b16b3 into feature/da-sequencer-node May 13, 2025
37 of 38 checks passed
@musitdev musitdev deleted the musitdev/propagate_state branch June 13, 2025 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants