Create test-runner binary #179
base: pg/fmt
Conversation
this takes quite a while, can't we speed that up?
File by file (more sequential) is better.
I'm trying to determine if this is really needed or if `retester` can already fulfill all of the requirements here.
My main concern is the following: it seems like it takes a lot of code to do this (since large parts of the execution logic need to be copied), yet the goals of this PR are either already possible or could be made possible through small changes to `retester`. So it might be good to discuss the goals in detail and determine what the right approach is.
So, I will discuss the points from your markdown document one by one:
Running tests file-by-file (rather than in bulk)
I believe that this is already possible through `retester`. If you pass in `--concurrency.number-of-concurrent-tasks 1` then the execution will happen sequentially.
Caching passed tests to skip them in future runs
This is indeed something that retester can't currently do. So, let me ask some questions:
- Is this something essential, and do we think that skipping tests harms regression testing?
- Does it have enough impact on how long the tool takes to run to justify it?
- Couldn't this be something that we add to the retester tool quite easily, so that these test cases are ignored?
- Couldn't we go with a simpler implementation: the framework produces a JSON report at the end of a run. Couldn't this report be passed to the tool, which then uses it to execute only the tests that failed? (A rough sketch follows below.)
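As a minimal sketch of that last idea, assuming (hypothetically) that the JSON report is an array of entries with a `path` and a `status` field; the `ReportEntry` shape here is an assumption, not the framework's actual report format:

```rust
use std::{fs, path::PathBuf};

use serde::Deserialize;

/// Hypothetical shape of one entry in the JSON report; the real
/// report format produced by the framework may differ.
#[derive(Deserialize)]
struct ReportEntry {
    path: PathBuf,
    status: String, // e.g. "passed" | "failed"
}

/// Read a previous run's report and return only the tests that failed,
/// so a follow-up run can be restricted to them.
fn failed_tests(report_path: &str) -> anyhow::Result<Vec<PathBuf>> {
    let raw = fs::read_to_string(report_path)?;
    let entries: Vec<ReportEntry> = serde_json::from_str(&raw)?;
    Ok(entries
        .into_iter()
        .filter(|entry| entry.status == "failed")
        .map(|entry| entry.path)
        .collect())
}
```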
Providing cargo-test-style output for easy integration with ML pipelines
I think that there are various solutions to this concern:
- We can modify the output from the retester to look as close as possible to `cargo test`.
- Instead of changing the execution logic, we can implement a simple python script that goes through the JSON report after a run finishes and translates the results to a `cargo test`-style output. This is one of the reasons why we developed the JSON reports, and it's quite similar to how cargo test reports results (sketched below).
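The comment suggests a Python script; purely as an illustration, and reusing the hypothetical `ReportEntry` shape from the sketch above, the translation could look like this:

```rust
/// Render report entries in a cargo-test-like format. The exact output
/// format is an approximation of what cargo test prints.
fn print_cargo_style(entries: &[ReportEntry]) {
    println!("running {} tests", entries.len());
    let mut failed = 0;
    for entry in entries {
        let outcome = if entry.status == "passed" {
            "ok"
        } else {
            failed += 1;
            "FAILED"
        };
        println!("test {} ... {}", entry.path.display(), outcome);
    }
    let verdict = if failed == 0 { "ok" } else { "FAILED" };
    println!();
    println!(
        "test result: {}. {} passed; {} failed",
        verdict,
        entries.len() - failed,
        failed
    );
}
```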
Single platform testing (rather than differential testing)
This is currently possible through the `retester` binary. You just need to pass a single `--platform` argument and the tests will not be differential; rather, they will just run for a single platform.
Fail fast: Stop on first failure with `--bail`
I think that this could be a simple change to make to `retester` so that it stops once a single failure is encountered; a rough sketch is below.
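A minimal sketch of what such a bail mechanism could look like, assuming tests run as concurrent tasks that can check a shared flag; the names are illustrative, not retester's actual internals:

```rust
use std::sync::{
    atomic::{AtomicBool, Ordering},
    Arc,
};

/// Shared flag that concurrent test tasks can consult; illustrative
/// only, not taken from retester's actual internals.
#[derive(Clone)]
struct BailFlag(Arc<AtomicBool>);

impl BailFlag {
    fn new() -> Self {
        Self(Arc::new(AtomicBool::new(false)))
    }

    /// Called by a task when its test fails and `--bail` is set.
    fn trigger(&self) {
        self.0.store(true, Ordering::SeqCst);
    }

    /// Checked by every task before starting its next test,
    /// e.g. `if bail_flag.triggered() { break; }` in the runner loop.
    fn triggered(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}
```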
- File-by-file execution: Run tests on individual `.sol` files, corpus files (`.json`), or recursively walk directories
I actually agree with this and think that we could remove the need for corpus files from retester.
(You didn't mention it but I added it here) The ability to control the node spawning process.
I think that for this we should look at why we need it and then work backwards to what the best solution for it would be.
I think that your use case for it is being able to proxy the ETH RPC. What command do you use to do that? Perhaps this could be something that we allow retester to do, or perhaps we can introduce a proxy layer into the providers and make them write all of the requests and responses to a separate file (sketched below).
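One way the proxy-layer idea could look, as a sketch only: the `RpcTransport` trait and all names here are hypothetical stand-ins, not an existing provider API; a real version would wrap whatever the framework's providers actually use to send requests:

```rust
use std::{fs::OpenOptions, io::Write, sync::Mutex};

/// Hypothetical request-sending abstraction; a real implementation
/// would wrap the actual provider used by the framework.
trait RpcTransport {
    fn send(&self, request: &str) -> anyhow::Result<String>;
}

/// Wrapper that records every request/response pair to a log file
/// before delegating to the inner transport.
struct LoggingTransport<T> {
    inner: T,
    log: Mutex<std::fs::File>,
}

impl<T: RpcTransport> LoggingTransport<T> {
    fn new(inner: T, log_path: &str) -> anyhow::Result<Self> {
        let file = OpenOptions::new().create(true).append(true).open(log_path)?;
        Ok(Self { inner, log: Mutex::new(file) })
    }
}

impl<T: RpcTransport> RpcTransport for LoggingTransport<T> {
    fn send(&self, request: &str) -> anyhow::Result<String> {
        let response = self.inner.send(request)?;
        let mut log = self.log.lock().expect("log mutex poisoned");
        writeln!(log, ">> {request}")?;
        writeln!(log, "<< {response}")?;
        Ok(response)
    }
}
```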
```markdown
This is similar to the `retester` binary but designed for ML-based test execution with a focus on:
- Running tests file-by-file (rather than in bulk)
- Caching passed tests to skip them in future runs
```
On this point, I'm not sure if we should cache the test results.
We're using this tool to identify issues in our REVM and PolkaVM implementations, and part of that is testing for regressions.
Say we're working on a fix for Test1 and we implement this fix locally in the PolkadotSDK. We'd want to run Test1 to verify that our fix for the issue is correct, but also run all of the other tests to ensure that it didn't cause any regressions.
So, I'm not sure if we should cache the test results since it seems like it would mean that regressions would not be detected.
caching is mostly useful for development, the cache file should not be used in CI
````markdown
The runner produces cargo-test-style output:

```
test path/to/test1.sol ... ok
```
````
On the output, I think that it should look more like what we get from `cargo test`, since it currently conceals which test case actually failed. To use an analogy, this is currently like saying that a test from a Rust module failed without reporting which unit test exactly failed.
So, I think that the output should look something like:
```
Running path/to/test1.sol
running 2 tests
test case 0 ... ok
test case 1 ... ok
```
This way it's clear which specific test from the file failed and which succeeded.
I don't care too much about the test case; showing the details could be enabled if `--verbose` is passed.
```rust
}

let tx = {
    let deployer = self.platform_information.node.resolve_signer_or_default(deployer);
```
With how we set up the `resolc-compiler-tests` repo and with the constructor logic for the `EthereumWallet`, all of the addresses present in the metadata files should be addresses that we have the private keys to. In that case, why do we need to do this?
as mentioned, this is pretty bad. It looks like this is something we added to avoid incrementing the nonce,
but the end result is that:
- it makes things slower to boot (at least in the current setup) since you initialize the wallet with 100000 keypairs
- it requires a custom genesis
- if I want to replay one of these transactions I now need to go fetch this keypair to sign it

I don't really get the point of maximizing parallelization here, and what purpose it serves.
```rust
// Resolve the signer to ensure we use an address that has keys
if let Some(from) = tx.from {
    tx.from = Some(self.platform_information.node.resolve_signer_or_default(from));
```
Not sure how I feel about this logic. I think that instead of doing this we should use the wallet constructor logic provided in the `WalletConfiguration` struct from the `config` crate, since it allows us to have an `EthereumWallet` with the keys that we need.
I say this since I think that the tool should not do any kind of address replacement logic under the hood; it should be a hard error if the test specifies an address that the tool doesn't have the private key for (a sketch of that behaviour follows below).
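A sketch of the hard-error behaviour being proposed, assuming a simple lookup against the wallet's known addresses; the function and parameter names are hypothetical:

```rust
use alloy::primitives::Address;

/// Hypothetical signer lookup: instead of silently substituting a
/// default signer, fail loudly when the test references an address
/// whose private key is not in the wallet.
fn require_signer(
    wallet_addresses: &[Address],
    from: Address,
) -> anyhow::Result<Address> {
    if wallet_addresses.contains(&from) {
        Ok(from)
    } else {
        anyhow::bail!(
            "test specifies address {from} but the tool has no private key for it"
        )
    }
}
```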
> have an EthereumWallet with the keys that we need

see comment above
```rust
}

/// Discover test files from the given path
fn discover_test_files(path: &Path) -> anyhow::Result<Vec<PathBuf>> {
```
I think that we can reuse the logic for this defined in the `revive-dt-format` crate?
if that covers it then fine.
As a user I want to either:
- specify a sol file
- specify a folder with a test.json
- specify a root folder containing either of the above (roughly as sketched below)

the corpus.json thing is not really needed imo
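A std-only sketch of that discovery behaviour, matching the `discover_test_files` signature from the diff above; this is an illustration of the three modes, not the logic in `revive-dt-format`:

```rust
use std::{
    fs,
    path::{Path, PathBuf},
};

/// Discover test files: a single `.sol` file, a folder containing a
/// `test.json`, or a root folder walked recursively for both.
fn discover_test_files(path: &Path) -> anyhow::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    if path.is_file() {
        // Mode 1: a single file was specified directly.
        found.push(path.to_path_buf());
        return Ok(found);
    }
    let test_json = path.join("test.json");
    if test_json.is_file() {
        // Mode 2: a folder with a test.json.
        found.push(test_json);
        return Ok(found);
    }
    // Mode 3: recursively walk the root folder for either kind.
    for entry in fs::read_dir(path)? {
        let entry_path = entry?.path();
        if entry_path.is_dir() {
            found.extend(discover_test_files(&entry_path)?);
        } else if entry_path.extension().is_some_and(|ext| ext == "sol")
            || entry_path.file_name().is_some_and(|name| name == "test.json")
        {
            found.push(entry_path);
        }
    }
    Ok(found)
}
```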
```rust
let node: &'static dyn revive_dt_node_interaction::EthereumNode = if args.start_platform {
    info!("Starting blockchain node...");
    let node_handle =
        platform.new_node(context.clone()).context("Failed to spawn node thread")?;
```
I'd prefer that you use the `NodePool` here since it does more logic than just `platform.new_node` and calls more functions that need to be called during start up.
```rust
let existing_node: Box<dyn revive_dt_node_interaction::EthereumNode> = match args.platform {
    PlatformIdentifier::GethEvmSolc | PlatformIdentifier::LighthouseGethEvmSolc =>
        Box::new(
            revive_dt_node::node_implementations::geth::GethNode::new_existing(
```
I believe that using an existing node is done since you want to use a proxy in front of the ETH RPC, is that correct?
In this case, what command do you use to do that? I want to see if we truly need to allow for nodes to be spawned by users or if we can come up with a solution to keep the nodes managed by the framework.
I ask this since having the nodes managed by the framework allows us to pre-fund accounts during genesis, which is really desirable.
this is much more practical for dev; the other mode is mostly useful for CI.
It makes it much easier to inspect the rpc queries, plus I don't need to wait for the server to start every time I run the tool; I can just start it once and let it run. If I want to run a different batch of tests, I can do that without having to pay the bootstrap cost.
```rust
// Check if already passed
{
    let cache = cached_passed.lock().await;
    if cache.contains(&file_display) {
```
I still think that caching the passed tests would make it harder for us to catch regressions.
If we decide to continue with the caching, then I think that the cache key should not be just the file path, but rather:
```rust
struct CacheKey {
    metadata_file_path: PathBuf,
    case_idx: CaseIdx,
    mode: Mode,
}
```
This allows you to cache exactly which test cases passed, rather than using the file as the cache key.
If you use the file path as the cache key, then if the file contains 100 test cases and 2 modes and 1 test case fails in 1 mode, the entire file will be marked as failed, requiring you to rerun all 200 test cases again. (A sketch of persisting such a cache follows below.)
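A sketch of how such a cache could be persisted between runs. The `CaseIdx` and `Mode` stand-ins below are placeholders for the framework's types, which are assumed (not verified) to be serializable and hashable:

```rust
use std::{collections::HashSet, fs, path::PathBuf};

use serde::{Deserialize, Serialize};

// Stand-ins for the framework's `CaseIdx` and `Mode` types; assumed
// here to be serializable and hashable like these placeholders.
type CaseIdx = usize;
type Mode = String;

#[derive(Serialize, Deserialize, PartialEq, Eq, Hash)]
struct CacheKey {
    metadata_file_path: PathBuf,
    case_idx: CaseIdx,
    mode: Mode,
}

/// Load the set of already-passed (file, case, mode) keys, if present.
fn load_cache(path: &str) -> HashSet<CacheKey> {
    fs::read_to_string(path)
        .ok()
        .and_then(|raw| serde_json::from_str(&raw).ok())
        .unwrap_or_default()
}

/// Persist the cache after a run so the next run can skip these keys.
fn save_cache(path: &str, cache: &HashSet<CacheKey>) -> anyhow::Result<()> {
    fs::write(path, serde_json::to_string_pretty(cache)?)?;
    Ok(())
}
```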
see comments above, this is mostly useful for debugging, I can fix my code and just re-run the same command
.context("Failed to create cached compiler")?; | ||
|
||
let private_key_allocator = | ||
Arc::new(Mutex::new(PrivateKeyAllocator::new(alloy::primitives::U256::from(100)))); |
This should be exactly the number of accounts funded during genesis (assuming that private keys were funded in order), which is why I really want us to keep the nodes managed by the framework: it makes errors like this less likely.
yeah I don't make use of that, I just use a single account to sign
```rust
}

/// Build a test definition for a single test case
async fn build_test_definition<'a>(
```
I think that we should re-use the existing logic that we have in the core crate for reading the metadata files and transforming them into test case definitions since that logic is already there.
Sure, I mostly asked the AI to do the grunt work; I didn't fully check if it was reusing as much code as possible.

this is mostly useful for testing, not CI; I want to re-run the same command and skip already-passed tests

Maybe the right thing to ask here is who the users of retester are, and whether we need this complex machinery if just providing simple cargo test output is more practical. I would avoid making things more complicated for the user by piping another python script into the output.

see comments in the threads; having the ability to run the server manually is way more practical for development. It lets you inspect the rpc queries and makes the tool faster to run and re-run, as it does not have to boot the server every time.
Looks like currently there are some issues with nonces; haven't had time to investigate, but it seems that it's still suffering from some kind of concurrency issue. This is what I am running against geth:

```
# run geth
geth --http --http.api web3,eth,txpool,miner,debug,net

# test
cargo run --release --bin ml-test-runner -- ../resolc-compiler-tests/fixtures/solidity --platform geth-evm-solc --cached-passed .passed-geth
```

There are a few tests that fail as well because they are expecting a specific deployer; these should be changed to use the default signer account.
Gotcha, I agree that this would be useful. I think that the passed-tests caching could be implemented in retester relatively easily though?

I agree that having it is valuable; I'm primarily thinking about the easiest and simplest way that we can implement it. Perhaps having an…

Perhaps this is something that we can support too in retester? Something like…
@pgherveou I took a stab at implementing some of the functionality you mentioned here in retester, and it was quite simple to do. The maintenance burden for it is also quite low, which I like, since it reuses a lot of the code that we wrote for retester.
The PRs are quite small and contained, so it's nice to see that these features do not affect large parts of the code. The one remaining feature that would make this equivalent to this PR is the ability to use an existing node, which I think this PR already has good ideas for, so I'm not sure about the need for a separate binary to do this.
Nice, can we make this last task a priority? Then we can close this PR if not needed.
If we can also discover tests by passing a folder, that would be handy as well.
Add a test runner for executing Revive differential tests file-by-file with cargo-test-style output.
This is similar to the `retester` binary but designed for ML-based test execution with a focus on:
- Running tests file-by-file (rather than in bulk)
- Caching passed tests to skip them in future runs
- Providing cargo-test-style output for easy integration with ML pipelines
- Single platform testing (rather than differential testing)
- Fail fast: stop on first failure with `--bail`
- File-by-file execution: run tests on individual `.sol` files, corpus files (`.json`), or recursively walk directories

e.g. output: `test path/to/test1.sol ... ok`

Depends on #180