
Conversation

@0xOmarA
Contributor

@0xOmarA 0xOmarA commented Jul 12, 2025

Summary

This PR persists the logs produced by the various nodes that we start, which aids in debugging certain errors that we're seeing.

Description

This change was inspired by the following error:

  2025-07-12T16:13:38.838203Z ERROR revive_dt_core::driver: Failed to execute transaction when deploying the contract on node : "revive_dt_core::Kitchensink", "Callable", server returned an error response: error code 1010: Invalid Transaction
    at crates/core/src/driver/mod.rs:281 on main ThreadId(1)
    in revive_dt_core::driver::Executing case with case: "first", case_idx: 0
    in retester::Running driver with metadata_file_path: "era-compiler-tests/solidity/complex/array_one_element/test.json"

Here, some transaction is being reported as invalid, but I have no way of knowing what exactly about the transaction is invalid.

  • Is it the structure of the transaction?
  • Is it its values?
  • Are the signatures perhaps invalid?
  • Was it rejected by the node or by some logic in the RPC?
  • Is the gas provided invalid?
  • Is it something about the bytecode of the contract that we're deploying?

I think many of these questions can be answered by keeping the logs from the nodes that we run, in the hope that the RPC or node logs shed more light on what exactly is wrong with the transaction so that we can fix it.

In the initial version of this PR, all I did was pipe the stdout and stderr of the std::process::Command that we're spawning to a file, in the hope of capturing those logs. From that point on, any "readiness" check we perform is done on the file and its contents rather than on stdout or stderr. However, I discovered that the logs are not flushed to the file immediately. Additionally, std::process::Command::kill does not send the process a signal telling it to flush its stdout and stderr to the file, so these logs would be lost.

Thus, I switched to the subprocess crate, which allows processes to be gracefully terminated through the subprocess::Popen::terminate function and therefore allows the logs to be flushed to the file.

So, the changes made in this PR are:

  • Added the tracing::instrument macro to all of the node functions. This is important as we want to be able to look at the logs and say "This TX failure happened on node 5, let's look at the logs for node 5". It lets us link the logs and errors to the nodes on which they occurred, which makes debugging easier.
  • All of the processes that we start for the nodes have their stdout and stderr piped to a file.
  • We've switched to subprocess for the processes that we spawn so that we can terminate them gracefully, collect their logs, and avoid any flushing issues (see the sketch below).
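
As a concrete illustration, here is a minimal sketch of the instrumented spawn plus log redirection (not the exact code from this PR; the binary name, log paths, and helper names are illustrative):

use std::fs::File;
use subprocess::{Popen, PopenConfig, Redirection};

// Instrumented so that log lines and errors can be tied back to a specific node.
#[tracing::instrument]
fn spawn_node_with_logs(node_id: usize) -> Result<Popen, Box<dyn std::error::Error>> {
    let stdout_log = File::create(format!("logs/node-{node_id}.stdout.log"))?;
    let stderr_log = File::create(format!("logs/node-{node_id}.stderr.log"))?;

    // Redirect both streams to files so the logs survive after the run.
    let node = Popen::create(
        &["substrate-node", "--dev"],
        PopenConfig {
            stdout: Redirection::File(stdout_log),
            stderr: Redirection::File(stderr_log),
            ..Default::default()
        },
    )?;
    Ok(node)
}

fn stop_node(mut node: Popen) -> Result<(), Box<dyn std::error::Error>> {
    // Graceful termination (SIGTERM on Unix) lets the node flush its buffers
    // to the log files before it exits, unlike an outright kill.
    node.terminate()?;
    node.wait()?;
    Ok(())
}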

With this change, I've already been able to get more useful logs to help me debug the failure above (which I plan to fix in a future PR). The relevant log is:

2025-07-12 19:13:38.836 DEBUG tokio-runtime-worker runtime::revive: try_into_checked_extrinsic: gas_price: 5000000, encoded_len: 1944 actual_fee: 19441617794180 eth_fee: 25000000    
2025-07-12 19:13:38.836 DEBUG tokio-runtime-worker runtime::revive: eth fees 25000000 too low, actual fees: 19441617794180    
2025-07-12 19:13:38.840 ERROR tokio-runtime-worker sc_service::task_manager: Essential task `txpool-background` failed. Shutting down service.    

This makes it clear that the transaction above was failing because the gas provided was not enough for the execution.

Member

@xermicus xermicus left a comment


Good idea, this helps debugging

Cargo.toml Outdated
"json",
"env-filter",
] }
subprocess = { version = "0.2.9" }
Member


This was last released 3 years ago. I'd prefer it if we could just rawdog the std library for process management; I find it sufficient.

Contributor Author


I was primarily using subprocess because it provides a way to gracefully terminate the processes, which lets them flush their stdout and stderr buffers to the files (the standard library doesn't allow graceful termination at the moment, and I encountered cases where the flushing never happens if we just kill the process).

I think another alternative would be to use the standard library for process management and then have some OS-specific logic for terminating the processes (which is slightly more overhead). It could be something like:

#[cfg(target_os = "windows")]
fn shutdown() {
    // Code that uses the winapis to perform the graceful termination
}

#[cfg(target_family = "unix")]
fn shutdown() {
    // Code that uses the unix APIs to perform the graceful termination
}

So it seemed better to use subprocess since we need the graceful termination for the logs.
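
For concreteness, a hedged sketch of what the Unix arm might look like (assuming the libc crate; this is illustrative, not code from this PR):

#[cfg(target_family = "unix")]
fn shutdown(child: &std::process::Child) {
    // SIGTERM asks the node to exit cleanly so it can flush its stdio buffers,
    // unlike the SIGKILL that std::process::Child::kill sends on Unix.
    unsafe {
        libc::kill(child.id() as libc::pid_t, libc::SIGTERM);
    }
}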

Member


It should be just a file handle, which can be flushed.

On that note, it should probably be wrapped in a BufWriter too.

Member

@xermicus xermicus left a comment


Thanks. I like it, modulo the added subprocess dependency, which is a bit of a thorn in my side. I'd appreciate it if we could just use the stdlib.

@0xOmarA
Contributor Author

0xOmarA commented Jul 14, 2025

@xermicus Thanks for approving! Yeah, I will try using std::process::Command before merging and will report back with the results.

@0xOmarA
Contributor Author

0xOmarA commented Jul 14, 2025

Hey @xermicus, I have reverted the subprocess change and we now use std::process::Command for the processes that we spawn.

Sadly, we can't use std::io::BufWriter<File> for stdout or stderr because it does not implement the Into<Stdio> trait. Additionally, to my understanding, we do not need to buffer the output in the parent process: the child process handles all of the buffering on its side, deciding between line and block buffering based on where its output is being piped. So even if we could do something like child.stdout.take().unwrap().flush() (which is not possible, since stdout is of type ChildStdout, which has no flush method), flushing wouldn't help, since everything is buffered inside the child process.

So, from my perspective, for us to correctly handle the buffering and flushing we have a few options:

  1. Use subprocess for graceful termination, which should force the child process to flush its buffers to the stdout and stderr files (they will be block buffered when pointed at anything that isn't a terminal); graceful shutdown would handle that for us.
  2. Continue using std::process::Command but use OS-specific APIs to shut the processes down gracefully, sending them the appropriate signals so they can perform any cleanup they need to do.
  3. Wrap our commands in stdbuf (e.g., stdbuf -oL substrate-node), which would let us control whether the program uses no buffering, line buffering, or block buffering (not ideal, as it appears to only work for programs that use glibc for their I/O; Go programs, for example, don't).

What are your thoughts around these?

@xermicus
Member

What are your thoughts around these?

I am not sure we need to complicate things. The file handle can be cloned. It should suffice to keep the cloned file handle and flush/sync it whenever the node should have stopped (gracefully or not)?
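
A minimal sketch of this suggestion with std::process::Command (binary name, paths, and helper names are illustrative, not the PR's exact code): give the child cloned handles of the log file for stdout/stderr, keep the original handle, and sync it once the node has stopped.

use std::fs::File;
use std::process::{Child, Command};

fn spawn_node_with_logs() -> std::io::Result<(Child, File)> {
    let log_file = File::create("logs/node-0.log")?;
    // The child writes to clones of the handle; we keep the original around.
    let child = Command::new("substrate-node")
        .arg("--dev")
        .stdout(log_file.try_clone()?)
        .stderr(log_file.try_clone()?)
        .spawn()?;
    Ok((child, log_file))
}

fn stop_node(mut child: Child, log_file: File) -> std::io::Result<()> {
    child.kill()?;
    child.wait()?;
    // Whether the node stopped gracefully or not, persist whatever has reached
    // the file handle to disk before the log is read or archived.
    log_file.sync_all()?;
    Ok(())
}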

@xermicus
Member


Just checked the subprocess crate. I think it's exactly what they do

@0xOmarA
Contributor Author

0xOmarA commented Jul 14, 2025

Just checked the subprocess crate. I think it's exactly what they do

Gotcha, let me take a closer look at that.

@0xOmarA
Contributor Author

0xOmarA commented Jul 14, 2025

@xermicus thanks for the recommendation, I think that this worked! I have made the changes you recommended in my latest commit. If all looks good to you I will go ahead and merge :)

@xermicus
Member

@xermicus thanks for the recommendation, I think that this worked! I have made the changes you recommended in my latest commit. If all looks good to you I will go ahead and merge :)

Nice thanks a lot

@0xOmarA 0xOmarA added this pull request to the merge queue Jul 14, 2025
Merged via the queue into main with commit b204de5 Jul 14, 2025
5 checks passed