Create test-runner binary #179
base: pg/fmt
Conversation
this takes quite a while, can't we speed that up?
File by file (more sequential) is better.
I'm trying to determine if this is really needed or if `retester` can already fulfill all of the requirements here.
My main concern is the following: it seems like it takes a lot of code to do this (since large parts of the execution logic need to be copied), yet the goals of this PR are either already possible or could be made possible through small changes to `retester`. So it might be good to discuss the goals in detail and determine what the right approach is.
So, I will discuss the points from your markdown document one by one:
Running tests file-by-file (rather than in bulk)
I believe that this is already possible through `retester`. If you pass in `--concurrency.number-of-concurrent-tasks 1` then the execution will happen sequentially.
Caching passed tests to skip them in future runs
This is indeed something that retester can't currently do. So, let me ask some questions:
- Is this something essential, and do we think that skipping tests harms regression testing?
- Does it have enough impact on how long the tool takes to run to justify it?
- Couldn't this be something that we add to the retester tool quite easily, so that these test cases are ignored?
- Couldn't we go with a simpler implementation: the framework produces a JSON report at the end of a run. Couldn't this report be passed to the tool, which then uses it to execute only the tests that failed? (A rough sketch follows below.)
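As a minimal sketch of that last idea, assuming (hypothetically) that the JSON report is an array of entries with a `path` and a `status` field; the `ReportEntry` shape here is an assumption, not the framework's actual report format:

```rust
use std::{fs, path::PathBuf};

use serde::Deserialize;

/// Hypothetical shape of one entry in the JSON report; the real
/// report format produced by the framework may differ.
#[derive(Deserialize)]
struct ReportEntry {
    path: PathBuf,
    status: String, // e.g. "passed" | "failed"
}

/// Read a previous run's report and return only the tests that failed,
/// so a follow-up run can be restricted to them.
fn failed_tests(report_path: &str) -> anyhow::Result<Vec<PathBuf>> {
    let raw = fs::read_to_string(report_path)?;
    let entries: Vec<ReportEntry> = serde_json::from_str(&raw)?;
    Ok(entries
        .into_iter()
        .filter(|entry| entry.status == "failed")
        .map(|entry| entry.path)
        .collect())
}
```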
Providing cargo-test-style output for easy integration with ML pipelines
I think that there are various solutions to this concern:
- We can modify the output from the retester to look as close as possible to `cargo test`.
- Instead of changing the execution logic, we can implement a simple python script that goes through the JSON report after a run finishes and translates the results to a `cargo test`-style output. This is one of the reasons why we developed the JSON reports, and it's quite similar to how cargo test reports results (sketched below).
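The comment suggests a Python script; purely as an illustration, and reusing the hypothetical `ReportEntry` shape from the sketch above, the translation could look like this:

```rust
/// Render report entries in a cargo-test-like format. The exact output
/// format is an approximation of what cargo test prints.
fn print_cargo_style(entries: &[ReportEntry]) {
    println!("running {} tests", entries.len());
    let mut failed = 0;
    for entry in entries {
        let outcome = if entry.status == "passed" {
            "ok"
        } else {
            failed += 1;
            "FAILED"
        };
        println!("test {} ... {}", entry.path.display(), outcome);
    }
    let verdict = if failed == 0 { "ok" } else { "FAILED" };
    println!();
    println!(
        "test result: {}. {} passed; {} failed",
        verdict,
        entries.len() - failed,
        failed
    );
}
```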
Single platform testing (rather than differential testing)
This is currently possible through the `retester` binary. You just need to pass a single `--platform` argument and the tests will not be differential; rather, they will just run for a single platform.
Fail fast: Stop on first failure with `--bail`
I think that this could be a simple change to make to `retester` so that it stops once a single failure is encountered; a rough sketch is below.
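A minimal sketch of what such a bail mechanism could look like, assuming tests run as concurrent tasks that can check a shared flag; the names are illustrative, not retester's actual internals:

```rust
use std::sync::{
    atomic::{AtomicBool, Ordering},
    Arc,
};

/// Shared flag that concurrent test tasks can consult; illustrative
/// only, not taken from retester's actual internals.
#[derive(Clone)]
struct BailFlag(Arc<AtomicBool>);

impl BailFlag {
    fn new() -> Self {
        Self(Arc::new(AtomicBool::new(false)))
    }

    /// Called by a task when its test fails and `--bail` is set.
    fn trigger(&self) {
        self.0.store(true, Ordering::SeqCst);
    }

    /// Checked by every task before starting its next test,
    /// e.g. `if bail_flag.triggered() { break; }` in the runner loop.
    fn triggered(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}
```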
- File-by-file execution: Run tests on individual `.sol` files, corpus files (`.json`), or recursively walk directories
I actually agree with this and think that we could remove the need for corpus files from retester.
(You didn't mention it but I added it here) The ability to control the node spawning process.
I think that for this we should look at why we need it and then work backwards to what the best solution for it would be.
I think that your use case for it is being able to proxy the ETH RPC. What command do you use to do that? Perhaps this could be something that we allow retester to do, or perhaps we can introduce a proxy layer into the providers and make them write all of the requests and responses to a separate file (sketched below).
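One way the proxy-layer idea could look, as a sketch only: the `RpcTransport` trait and all names here are hypothetical stand-ins, not an existing provider API; a real version would wrap whatever the framework's providers actually use to send requests:

```rust
use std::{fs::OpenOptions, io::Write, sync::Mutex};

/// Hypothetical request-sending abstraction; a real implementation
/// would wrap the actual provider used by the framework.
trait RpcTransport {
    fn send(&self, request: &str) -> anyhow::Result<String>;
}

/// Wrapper that records every request/response pair to a log file
/// before delegating to the inner transport.
struct LoggingTransport<T> {
    inner: T,
    log: Mutex<std::fs::File>,
}

impl<T: RpcTransport> LoggingTransport<T> {
    fn new(inner: T, log_path: &str) -> anyhow::Result<Self> {
        let file = OpenOptions::new().create(true).append(true).open(log_path)?;
        Ok(Self { inner, log: Mutex::new(file) })
    }
}

impl<T: RpcTransport> RpcTransport for LoggingTransport<T> {
    fn send(&self, request: &str) -> anyhow::Result<String> {
        let response = self.inner.send(request)?;
        let mut log = self.log.lock().expect("log mutex poisoned");
        writeln!(log, ">> {request}")?;
        writeln!(log, "<< {response}")?;
        Ok(response)
    }
}
```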
```markdown
This is similar to the `retester` binary but designed for ML-based test execution with a focus on:
- Running tests file-by-file (rather than in bulk)
- Caching passed tests to skip them in future runs
```
On this point, I'm not sure if we should cache the test results.
We're using this tool to identify issues in our REVM and PolkaVM implementations, and part of that is testing for regressions.
Say we're working on a fix for Test1 and we implement this fix locally in the PolkadotSDK. We'd want to run Test1 to verify that our fix for the issue is correct, but also run all of the other tests to ensure that it didn't cause any regressions.
So, I'm not sure if we should cache the test results since it seems like it would mean that regressions would not be detected.
caching is mostly useful for development, the cache file should not be used in CI
````markdown
The runner produces cargo-test-style output:

```
test path/to/test1.sol ... ok
```
````
On the output, I think that it should look more like what we get from `cargo test`, since it currently conceals which test case actually failed. To use an analogy, this is currently like saying that a test from a Rust module failed without reporting which unit test exactly failed.
So, I think that the output should look something like:
```
Running path/to/test1.sol
running 2 tests
test case 0 ... ok
test case 1 ... ok
```
This way it's clear which specific test from the file failed and which succeeded.
I don't care too much about the test case; showing the details could be enabled if `--verbose` is passed.
```rust
}

let tx = {
    let deployer = self.platform_information.node.resolve_signer_or_default(deployer);
```
With how we set up the `resolc-compiler-tests` repo and with the constructor logic for the `EthereumWallet`, all of the addresses present in the metadata files should be addresses that we have the private keys to. In that case, why do we need to do this?
as mentioned, this is pretty bad. It looks like this is something we added to avoid incrementing the nonce,
but the end result is that:
- it makes things slower to boot (at least in the current setup) since you initialize the wallet with 100000 keypairs
- it requires a custom genesis
- if I want to replay one of these transactions I now need to go fetch this keypair to sign it

I don't really get the point of maximizing parallelization here, and what purpose it serves.
```rust
// Resolve the signer to ensure we use an address that has keys
if let Some(from) = tx.from {
    tx.from = Some(self.platform_information.node.resolve_signer_or_default(from));
```
Not sure how I feel about this logic. I think that instead of doing this we should use the wallet constructor logic provided in the `WalletConfiguration` struct from the `config` crate, since it allows us to have an `EthereumWallet` with the keys that we need.
I say this since I think that the tool should not do any kind of address replacement logic under the hood; it should be a hard error if the test specifies an address that the tool doesn't have the private key for (a sketch of that behaviour follows below).
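A sketch of the hard-error behaviour being proposed, assuming a simple lookup against the wallet's known addresses; the function and parameter names are hypothetical:

```rust
use alloy::primitives::Address;

/// Hypothetical signer lookup: instead of silently substituting a
/// default signer, fail loudly when the test references an address
/// whose private key is not in the wallet.
fn require_signer(
    wallet_addresses: &[Address],
    from: Address,
) -> anyhow::Result<Address> {
    if wallet_addresses.contains(&from) {
        Ok(from)
    } else {
        anyhow::bail!(
            "test specifies address {from} but the tool has no private key for it"
        )
    }
}
```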
> have an EthereumWallet with the keys that we need

see comment above
```rust
}

/// Discover test files from the given path
fn discover_test_files(path: &Path) -> anyhow::Result<Vec<PathBuf>> {
```
I think that we can reuse the logic for this defined in the `revive-dt-format` crate?
if that covers it then fine.
As a user I want to either:
- specify a sol file
- specify a folder with a test.json
- specify a root folder containing either of the above (roughly as sketched below)

the corpus.json thing is not really needed imo
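A std-only sketch of that discovery behaviour, matching the `discover_test_files` signature from the diff above; this is an illustration of the three modes, not the logic in `revive-dt-format`:

```rust
use std::{
    fs,
    path::{Path, PathBuf},
};

/// Discover test files: a single `.sol` file, a folder containing a
/// `test.json`, or a root folder walked recursively for both.
fn discover_test_files(path: &Path) -> anyhow::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    if path.is_file() {
        // Mode 1: a single file was specified directly.
        found.push(path.to_path_buf());
        return Ok(found);
    }
    let test_json = path.join("test.json");
    if test_json.is_file() {
        // Mode 2: a folder with a test.json.
        found.push(test_json);
        return Ok(found);
    }
    // Mode 3: recursively walk the root folder for either kind.
    for entry in fs::read_dir(path)? {
        let entry_path = entry?.path();
        if entry_path.is_dir() {
            found.extend(discover_test_files(&entry_path)?);
        } else if entry_path.extension().is_some_and(|ext| ext == "sol")
            || entry_path.file_name().is_some_and(|name| name == "test.json")
        {
            found.push(entry_path);
        }
    }
    Ok(found)
}
```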
```rust
let node: &'static dyn revive_dt_node_interaction::EthereumNode = if args.start_platform {
    info!("Starting blockchain node...");
    let node_handle =
        platform.new_node(context.clone()).context("Failed to spawn node thread")?;
```
I'd prefer that you use the `NodePool` here since it does more logic than just `platform.new_node` and calls more functions that need to be called during start up.
```rust
let existing_node: Box<dyn revive_dt_node_interaction::EthereumNode> = match args.platform {
    PlatformIdentifier::GethEvmSolc | PlatformIdentifier::LighthouseGethEvmSolc =>
        Box::new(
            revive_dt_node::node_implementations::geth::GethNode::new_existing(
```
I believe that using an existing node is done since you want to use a proxy in front of the ETH RPC, is that correct?
In this case, what command do you use to do that? I want to see if we truly need to allow for nodes to be spawned by users or if we can come up with a solution to keep the nodes managed by the framework.
I ask this since having the nodes managed by the framework allows us to pre-fund accounts during genesis, which is really desirable.
this is much more practical for dev; the other mode is mostly useful for CI.
It makes it much easier to inspect the rpc queries, plus I don't need to wait for the server to start every time I run the tool; I can just start it once and let it run. If I want to run a different batch of tests, I can do that without having to pay the bootstrap cost.
```rust
// Check if already passed
{
    let cache = cached_passed.lock().await;
    if cache.contains(&file_display) {
```
I still think that caching the passed tests would make it harder for us to catch regressions.
If we decide to continue with the caching, then I think that the cache key should not be just the file path, but rather:
```rust
struct CacheKey {
    metadata_file_path: PathBuf,
    case_idx: CaseIdx,
    mode: Mode,
}
```
This allows you to cache exactly which test cases passed, rather than using the file as the cache key.
If you use the file path as the cache key, then if the file contains 100 test cases and 2 modes and 1 test case fails in 1 mode, the entire file will be marked as failed, requiring you to rerun all 200 test cases again. (A sketch of persisting such a cache follows below.)
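A sketch of how such a cache could be persisted between runs. The `CaseIdx` and `Mode` stand-ins below are placeholders for the framework's types, which are assumed (not verified) to be serializable and hashable:

```rust
use std::{collections::HashSet, fs, path::PathBuf};

use serde::{Deserialize, Serialize};

// Stand-ins for the framework's `CaseIdx` and `Mode` types; assumed
// here to be serializable and hashable like these placeholders.
type CaseIdx = usize;
type Mode = String;

#[derive(Serialize, Deserialize, PartialEq, Eq, Hash)]
struct CacheKey {
    metadata_file_path: PathBuf,
    case_idx: CaseIdx,
    mode: Mode,
}

/// Load the set of already-passed (file, case, mode) keys, if present.
fn load_cache(path: &str) -> HashSet<CacheKey> {
    fs::read_to_string(path)
        .ok()
        .and_then(|raw| serde_json::from_str(&raw).ok())
        .unwrap_or_default()
}

/// Persist the cache after a run so the next run can skip these keys.
fn save_cache(path: &str, cache: &HashSet<CacheKey>) -> anyhow::Result<()> {
    fs::write(path, serde_json::to_string_pretty(cache)?)?;
    Ok(())
}
```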
see comments above, this is mostly useful for debugging, I can fix my code and just re-run the same command
.context("Failed to create cached compiler")?; | ||
|
||
let private_key_allocator = | ||
Arc::new(Mutex::new(PrivateKeyAllocator::new(alloy::primitives::U256::from(100)))); |
This should be exactly the number of accounts funded during genesis (assuming that private keys were funded in order), which is why I really want us to keep the nodes managed by the framework: it makes errors like this less likely.
yeah I don't make use of that, I just use a single account to sign
```rust
}

/// Build a test definition for a single test case
async fn build_test_definition<'a>(
```
I think that we should re-use the existing logic that we have in the core crate for reading the metadata files and transforming them into test case definitions since that logic is already there.
Sure, I mostly asked the AI to do the grunt work; I didn't fully check if it was reusing as much code as possible.

this is mostly useful for testing, not CI; I want to re-run the same command and skip already-passed tests

Maybe the right thing to ask here is who the users of retester are, and whether we need this complex machinery if just providing simple cargo test output is more practical. I would avoid making things more complicated for the user by piping another python script into the output.

see comments in the threads; having the ability to run the server manually is way more practical for development. It lets you inspect the rpc queries and makes the tool faster to run and re-run, as it does not have to boot the server every time.
Looks like currently there are some issues with nonces; haven't had time to investigate, but it seems that it's still suffering from some kind of concurrency issue. This is what I am running against geth:

```
# run geth
geth --http --http.api web3,eth,txpool,miner,debug,net

# test
cargo run --release --bin ml-test-runner -- ../resolc-compiler-tests/fixtures/solidity --platform geth-evm-solc --cached-passed .passed-geth
```

There are a few tests that fail as well because they are expecting a specific deployer; these should be changed to use the default signer account.
Gotcha, I agree that this would be useful. I think that the passed-tests caching could be implemented in retester relatively easily though?

I agree that having it is valuable; I'm primarily thinking about the easiest and simplest way that we can implement it. Perhaps having an…

Perhaps this is something that we can support too in retester? Something like…
@pgherveou I took a stab at implementing some of the functionality you mentioned here in retester, and it was quite simple to do. The maintenance burden for it is also quite low, which I like, since it reuses a lot of the code that we wrote for retester.
The PRs are quite small and contained, so it's nice to see that these features do not affect large parts of the code. The one remaining feature that would make this equivalent to this PR is the ability to use an existing node, which I think this PR already has good ideas for, so I'm not sure about the need for a separate binary to do this.
Nice, can we make this last task a priority? Then we can close this PR if not needed.
If we can also discover tests by passing a folder, that would be handy as well.
Add a test runner for executing Revive differential tests file-by-file with cargo-test-style output.
This is similar to the `retester` binary but designed for ML-based test execution with a focus on:
- Running tests file-by-file (rather than in bulk)
- Caching passed tests to skip them in future runs
- Providing cargo-test-style output for easy integration with ML pipelines
- Single platform testing (rather than differential testing)
- Fail fast: stop on first failure with `--bail`
- File-by-file execution: run tests on individual `.sol` files, corpus files (`.json`), or recursively walk directories

e.g. output: `test path/to/test1.sol ... ok`

Depends on #180