Persist node logs #36
Conversation
xermicus
left a comment
Good idea, this helps debugging
Cargo.toml
Outdated
| "json", | ||
| "env-filter", | ||
| ] } | ||
| subprocess = { version = "0.2.9" } |
This was last released 3 years ago. I'd prefer if we could just rawdog the std library for process management; I find it sufficient.
I was primarily using `subprocess` because I found that it offers a way to gracefully terminate the processes, which lets them flush their stdout and stderr buffers to the files (the std library doesn't allow for graceful termination at the moment, and I encountered cases where the flushing never happens if we just kill the process).
I think another alternative would be to use the standard library for process management and then add some OS-specific logic for terminating the processes (which is slightly more overhead). It could be something like:
```rust
#[cfg(target_os = "windows")]
fn shutdown() {
    // Code that uses the Windows APIs to perform the graceful termination
}

#[cfg(target_family = "unix")]
fn shutdown() {
    // Code that uses the Unix APIs to perform the graceful termination
}
```

So it seemed better to use `subprocess`, since we need the graceful termination for the logs.
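For reference, a minimal sketch of what that Unix branch could look like, assuming we keep each node's `std::process::Child` around and pull in the `libc` crate (illustrative only, not code from this PR):

```rust
use std::process::Child;
use std::time::Duration;

// Send SIGTERM so the node gets a chance to flush its stdout/stderr,
// then fall back to a hard kill if it has not exited after a grace period.
#[cfg(target_family = "unix")]
fn shutdown(child: &mut Child) -> std::io::Result<()> {
    unsafe {
        libc::kill(child.id() as libc::pid_t, libc::SIGTERM);
    }
    std::thread::sleep(Duration::from_secs(5)); // grace period, tune as needed
    match child.try_wait()? {
        Some(_status) => Ok(()), // exited on its own
        None => child.kill(),    // still running: force-kill it
    }
}
```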
It should just be a file handle, which can be flushed. On that note, it should probably be wrapped in a `BufWriter` too.
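As a rough illustration of the `BufWriter` idea (assuming the harness writes some log output itself; the function names are placeholders, not code from this PR):

```rust
use std::fs::File;
use std::io::{BufWriter, Write};

// Buffer writes to the log file to avoid a syscall per line, and flush
// explicitly once the node has stopped so nothing is left in the buffer.
fn open_log_sink(path: &str) -> std::io::Result<BufWriter<File>> {
    Ok(BufWriter::new(File::create(path)?))
}

fn on_node_stopped(sink: &mut BufWriter<File>) -> std::io::Result<()> {
    sink.flush()?;             // move buffered bytes into the File
    sink.get_ref().sync_all()  // ask the OS to persist them to disk
}
```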
xermicus
left a comment
Thanks, I like it. Modulo the added `subprocess` dependency, which is a bit of a thorn in my side. I'd appreciate it if we could just use the stdlib.
@xermicus Thanks for approving! Yeah, I will try to keep using the std library.
Hey @xermicus, I have reverted the `subprocess` dependency. Sadly, we can't use […]. So, from my perspective, for us to correctly handle the buffering and flushing we have a few options: […]
What are your thoughts on these?
I am not sure we need to complicate things. The file handle can be cloned. It should suffice to keep the cloned file handle and flush / sync it whenever the node should have stopped (gracefully or not)?
Just checked the `subprocess` crate. I think it's exactly what they do.
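A minimal sketch of that suggestion, assuming the nodes are spawned via `std::process::Command` (the binary name and function names below are placeholders, not code from this PR):

```rust
use std::fs::File;
use std::process::{Child, Command, Stdio};

// Redirect the node's stdout/stderr into a log file, keep a cloned handle in
// the parent, and sync it once the node has stopped, gracefully or not.
fn spawn_node_with_log(log_path: &str) -> std::io::Result<(Child, File)> {
    let log = File::create(log_path)?;
    let kept_handle = log.try_clone()?; // same underlying file, kept by the parent
    let child = Command::new("node-binary") // placeholder binary name
        .stdout(Stdio::from(log.try_clone()?))
        .stderr(Stdio::from(log))
        .spawn()?;
    Ok((child, kept_handle))
}

fn on_node_stopped(kept_handle: &File) -> std::io::Result<()> {
    kept_handle.sync_all() // flush OS buffers for the log file to disk
}
```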
Gotcha, let me take a closer look at that.

@xermicus Thanks for the recommendation, I think this worked! I have made the changes you recommended in my latest commit. If all looks good to you, I will go ahead and merge :)

Nice, thanks a lot!
Summary
This PR persists the logs produced by the various nodes that we start, which aids in debugging some of the errors that we're seeing.
Description
This change was inspired by the following error:
There, I'm seeing that some transaction is being reported as invalid; however, I have no way of knowing what exactly about the transaction is invalid.
I think many of these questions can be answered by keeping the logs from the nodes that we run, in the hope that the RPC or node logs shed more light on what exactly is wrong with the transaction so that we can fix it.
In the initial version of this PR, all I did was pipe the stdout and stderr of the `std::process::Command` that we're spawning to a file in hopes of getting those logs. From that point, any kind of "readiness" check that we would perform would be done on the file and its contents rather than on stdout or stderr. But I discovered that the logs are not getting flushed to the file immediately. Additionally, `std::process::Child::kill` will not send the appropriate signal telling the program to flush its stdout and stderr to the file, and therefore these logs would be lost.

Thus, I switched to using the `subprocess` crate, which allows the processes to be gracefully terminated through the `subprocess::Popen::terminate` function and therefore allows the logs to be flushed to the file.

So, the changes made in this PR are:

- Added the `tracing::instrument` macro to all of the functions that the nodes have (see the sketch at the end of this description). This is important as we want to be able to look at the logs and say "This TX failure happened on node 5, let's look at the logs for node 5". So, it allows us to link the logs and errors with the nodes on which they happened, for better debugging.
- Used `subprocess` for the processes that we spawn in order to be able to gracefully terminate them, collect their logs, and avoid any issues with flushing.

With this change, I've already been able to get more useful logs to help me in debugging the above (which I plan on fixing in a future PR), which is the following log:
This makes it clear that the failing transaction above was actually failing because there was not enough gas for the execution.
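For illustration of the `tracing::instrument` point above (the function and field names here are placeholders, not code from this PR), tagging node functions with the node index could look roughly like this:

```rust
use tracing::instrument;

// Every log event emitted while this function runs carries the node's index,
// so a failure can be traced back to the specific node that produced it.
#[instrument(skip(args), fields(node_index = index))]
fn start_node(index: usize, args: &[String]) {
    tracing::info!("starting node");
    // ... spawn the node process, wait for readiness, etc.
}
```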