
Conversation

@0xOmarA
Contributor

@0xOmarA 0xOmarA commented Jul 12, 2025

Summary

This PR persists the logs produced by the various nodes that we start, which aids in debugging certain errors that we're seeing.

Description

This change was inspired by the following error:

  2025-07-12T16:13:38.838203Z ERROR revive_dt_core::driver: Failed to execute transaction when deploying the contract on node : "revive_dt_core::Kitchensink", "Callable", server returned an error response: error code 1010: Invalid Transaction
    at crates/core/src/driver/mod.rs:281 on main ThreadId(1)
    in revive_dt_core::driver::Executing case with case: "first", case_idx: 0
    in retester::Running driver with metadata_file_path: "era-compiler-tests/solidity/complex/array_one_element/test.json"

Here, some transaction is being reported as invalid, but I have no way of knowing what exactly about the transaction is invalid.

  • Is it the structure of the transaction?
  • Is it its values?
  • Are the signatures perhaps invalid?
  • Was it rejected by the node or by some logic in the RPC?
  • Is the gas provided invalid?
  • Is it something about the bytecode of the contract that we're deploying?

I think many of these questions can be answered by keeping the logs from the nodes that we run, in the hope that the RPC or node logs shed more light on what exactly is wrong with the transaction so that we can fix it.

In the initial version of this PR, all I did was pipe the stdout and stderr of the std::process::Command that we're spawning to a file, in the hope of capturing those logs. From that point on, any "readiness" check we perform is done on the file and its contents rather than on stdout or stderr. However, I discovered that the logs are not flushed to the file immediately. Additionally, std::process::Command::kill does not send the process a signal telling it to flush its stdout and stderr to the file, so these logs would be lost.

Thus, I switched to the subprocess crate, which allows processes to be gracefully terminated through the subprocess::Popen::terminate function and therefore allows the logs to be flushed to the file.

So, the changes made in this PR are:

  • Added the tracing::instrument macro to all of the node functions. This is important as we want to be able to look at the logs and say "This TX failure happened on node 5, let's look at the logs for node 5". It lets us link the logs and errors to the nodes on which they occurred, which makes debugging easier.
  • All of the processes that we start for the nodes have their stdout and stderr piped to a file.
  • We've switched to subprocess for the processes that we spawn so that we can terminate them gracefully, collect their logs, and avoid any flushing issues (see the sketch below).
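
As a concrete illustration, here is a minimal sketch of the instrumented spawn plus log redirection (not the exact code from this PR; the binary name, log paths, and helper names are illustrative):

use std::fs::File;
use subprocess::{Popen, PopenConfig, Redirection};

// Instrumented so that log lines and errors can be tied back to a specific node.
#[tracing::instrument]
fn spawn_node_with_logs(node_id: usize) -> Result<Popen, Box<dyn std::error::Error>> {
    let stdout_log = File::create(format!("logs/node-{node_id}.stdout.log"))?;
    let stderr_log = File::create(format!("logs/node-{node_id}.stderr.log"))?;

    // Redirect both streams to files so the logs survive after the run.
    let node = Popen::create(
        &["substrate-node", "--dev"],
        PopenConfig {
            stdout: Redirection::File(stdout_log),
            stderr: Redirection::File(stderr_log),
            ..Default::default()
        },
    )?;
    Ok(node)
}

fn stop_node(mut node: Popen) -> Result<(), Box<dyn std::error::Error>> {
    // Graceful termination (SIGTERM on Unix) lets the node flush its buffers
    // to the log files before it exits, unlike an outright kill.
    node.terminate()?;
    node.wait()?;
    Ok(())
}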

With this change, I've already been able to get more useful logs to help me debug the failure above (which I plan to fix in a future PR). The relevant log is:

2025-07-12 19:13:38.836 DEBUG tokio-runtime-worker runtime::revive: try_into_checked_extrinsic: gas_price: 5000000, encoded_len: 1944 actual_fee: 19441617794180 eth_fee: 25000000    
2025-07-12 19:13:38.836 DEBUG tokio-runtime-worker runtime::revive: eth fees 25000000 too low, actual fees: 19441617794180    
2025-07-12 19:13:38.840 ERROR tokio-runtime-worker sc_service::task_manager: Essential task `txpool-background` failed. Shutting down service.    

This makes it clear that the transaction above was failing because the gas provided was not enough for the execution.

Member

@xermicus xermicus left a comment


Good idea, this helps debugging

Cargo.toml Outdated
"json",
"env-filter",
] }
subprocess = { version = "0.2.9" }
Member


This was last released 3 years ago. I'd prefer it if we could just rawdog the std library for process management; I find it sufficient.

Contributor Author


I was primarily using subprocess because it provides a way to gracefully terminate the processes, which lets them flush their stdout and stderr buffers to the files (the standard library doesn't allow graceful termination at the moment, and I encountered cases where the flushing never happens if we just kill the process).

I think another alternative would be to use the standard library for process management and then have some OS-specific logic for terminating the processes (which is slightly more overhead). It could be something like:

#[cfg(target_os = "windows")]
fn shutdown() {
    // Code that uses the winapis to perform the graceful termination
}

#[cfg(target_family = "unix")]
fn shutdown() {
    // Code that uses the unix APIs to perform the graceful termination
}

So it seemed better to use subprocess since we need the graceful termination for the logs.
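
For concreteness, a hedged sketch of what the Unix arm might look like (assuming the libc crate; this is illustrative, not code from this PR):

#[cfg(target_family = "unix")]
fn shutdown(child: &std::process::Child) {
    // SIGTERM asks the node to exit cleanly so it can flush its stdio buffers,
    // unlike the SIGKILL that std::process::Child::kill sends on Unix.
    unsafe {
        libc::kill(child.id() as libc::pid_t, libc::SIGTERM);
    }
}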

Member


It should be just a file handle, which can be flushed.

On that note, it should probably be wrapped in a BufWriter too.

Member

@xermicus xermicus left a comment


Thanks. I like it, modulo the added subprocess dependency, which is a bit of a thorn in my side. I'd appreciate it if we could just use the stdlib.

@0xOmarA
Contributor Author

0xOmarA commented Jul 14, 2025

@xermicus Thanks for approving! Yeah, I will try using std::process::Command before merging and will report back with the results.

@0xOmarA
Contributor Author

0xOmarA commented Jul 14, 2025

Hey @xermicus, I have reverted the subprocess change and we now use std::process::Command for the processes that we spawn.

Sadly, we can't use std::io::BufWriter<File> for stdout or stderr because it does not implement the Into<Stdio> trait. Additionally, to my understanding, we do not need to buffer the output in the parent process: the child process handles all of the buffering on its side, deciding between line and block buffering based on where its output is being piped. So even if we could do something like child.stdout.take().unwrap().flush() (which is not possible, since stdout is of type ChildStdout, which has no flush method), flushing wouldn't help, since everything is buffered inside the child process.

So, from my perspective, for us to correctly handle the buffering and flushing we have a few options:

  1. Use subprocess for graceful termination, which should force the child process to flush its buffers to the stdout and stderr files (they will be block buffered when pointed at anything that isn't a terminal); graceful shutdown would handle that for us.
  2. Continue using std::process::Command but use OS-specific APIs to shut the processes down gracefully, sending them the appropriate signals so they can perform any cleanup they need to do.
  3. Wrap our commands in stdbuf (e.g., stdbuf -oL substrate-node), which would let us control whether the program uses no buffering, line buffering, or block buffering (not ideal, as it appears to only work for programs that use glibc for their I/O; Go programs, for example, don't).

What are your thoughts around these?

@xermicus
Member

What are your thoughts around these?

I am not sure we need to complicate things. The file handle can be cloned. It should suffice to keep the cloned file handle and flush/sync it whenever the node should have stopped (gracefully or not)?
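
A minimal sketch of this suggestion with std::process::Command (binary name, paths, and helper names are illustrative, not the PR's exact code): give the child cloned handles of the log file for stdout/stderr, keep the original handle, and sync it once the node has stopped.

use std::fs::File;
use std::process::{Child, Command};

fn spawn_node_with_logs() -> std::io::Result<(Child, File)> {
    let log_file = File::create("logs/node-0.log")?;
    // The child writes to clones of the handle; we keep the original around.
    let child = Command::new("substrate-node")
        .arg("--dev")
        .stdout(log_file.try_clone()?)
        .stderr(log_file.try_clone()?)
        .spawn()?;
    Ok((child, log_file))
}

fn stop_node(mut child: Child, log_file: File) -> std::io::Result<()> {
    child.kill()?;
    child.wait()?;
    // Whether the node stopped gracefully or not, persist whatever has reached
    // the file handle to disk before the log is read or archived.
    log_file.sync_all()?;
    Ok(())
}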

@xermicus
Member


Just checked the subprocess crate. I think it's exactly what they do

@0xOmarA
Contributor Author

0xOmarA commented Jul 14, 2025

Just checked the subprocess crate. I think it's exactly what they do

Gotcha, let me take a closer look at that.

@0xOmarA
Contributor Author

0xOmarA commented Jul 14, 2025

@xermicus thanks for the recommendation, I think that this worked! I have made the changes you recommended in my latest commit. If all looks good to you I will go ahead and merge :)

@xermicus
Member

@xermicus thanks for the recommendation, I think that this worked! I have made the changes you recommended in my latest commit. If all looks good to you I will go ahead and merge :)

Nice thanks a lot

@0xOmarA 0xOmarA added this pull request to the merge queue Jul 14, 2025
Merged via the queue into main with commit b204de5 Jul 14, 2025
5 checks passed