Feedback on fuzzer benchmarking setup #103

I'm trying to compare Hybrid-Echidna with Echidna and the Forge fuzzer on several benchmark contracts.

To make the comparison as fair as possible, I've created a benchmark generator that automatically generates challenging contracts. The benchmarks intentionally use a limited subset of Solidity to avoid language features that different tools might handle differently. Each contract contains ~50 assertions; some can fail, while others cannot because their path conditions are infeasible. (If you're curious, you can find one of the benchmarks [here](https://gist.githubusercontent.com/wuestholz/aec07f7d3572af8d477e8e0a387fb7ab/raw/1832587591dd04ab5cd4c2ecee4cae69321d942d/maze-0.sol). The benchmark-generation approach is inspired by the [Fuzzle](https://softsec.kaist.ac.kr/~sangkilc/papers/lee-ase22.pdf) benchmark generator for C-based fuzzers.) To find the assertions that can fail, a fuzzer needs to generate up to ~15 transactions and satisfy some input constraints for each transaction.

Since I'm not deeply familiar with Hybrid-Echidna, I'd like to check whether there are any potential issues with my benchmark setup before sharing results.

Since Hybrid-Echidna does not support limiting the execution time (see issue at https://github.com/crytic/optik/issues/101), I'm running it repeatedly in shorter campaigns until the time limit shared by all fuzzers (for instance, 1 hour per contract) is reached. For each of these short campaigns, I'm using the following settings, which deviate from the defaults:
- `seq-len`: 100 (instead of 10)
- `test-limit`: 50000 for the first short campaign and 500 for all subsequent ones (instead of 50000)
- `max-iters`: 3 for the first short campaign and 1 for all subsequent ones (instead of leaving the option unspecified)
- `no-incremental`: false for the first short campaign and true for all subsequent ones (instead of false)
- `codeSize` (Echidna): 0xc00000 (instead of 0x6000)

I increased `seq-len` to 100 since some assertions may require up to ~15 transactions and some generated transactions may fail. Echidna also uses 100 by default.
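The reasoning behind a longer sequence can be made concrete with a rough model. The sketch below is only an illustration under a strong, hypothetical assumption: each generated transaction independently reverts with some probability `p_fail`. It computes the chance that a sequence contains at least 15 successful transactions, which is a necessary (not sufficient) condition, since the right transactions must also appear in the right order. The function name `prob_enough_successes` and the value `p_fail = 0.5` are illustrative, not measured.

```python
from math import comb


def prob_enough_successes(seq_len: int, needed: int, p_fail: float) -> float:
    """P(at least `needed` of `seq_len` transactions succeed).

    Hypothetical model: each transaction reverts independently with
    probability p_fail, so the number of successes is Binomial(seq_len, 1 - p_fail).
    """
    p_ok = 1.0 - p_fail
    return sum(
        comb(seq_len, k) * p_ok**k * p_fail ** (seq_len - k)
        for k in range(needed, seq_len + 1)  # empty range if needed > seq_len
    )


if __name__ == "__main__":
    # Compare the Hybrid-Echidna default (10) with longer sequences.
    for seq_len in (10, 15, 30, 100):
        print(seq_len, round(prob_enough_successes(seq_len, 15, 0.5), 4))
```

With a sequence length of 10 the probability is exactly zero, because 15 successful transactions can never fit; with 100 it is close to one even when half of the transactions revert.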

I observed very high memory consumption when leaving `max-iters` unspecified or using larger values. For this reason, I bound the number of iterations.

I reduced `test-limit` to make sure that the short campaigns terminate reasonably fast. I also observed increased memory consumption for larger values.

I enable `no-incremental` for subsequent short campaigns since the first campaign will have already performed incremental seeding once.
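Because the short campaigns differ only in a few options, the restart scheme can be scripted. The following is a minimal Python sketch, not the actual script used for these experiments. The flag spellings (`--seq-len`, `--test-limit`, `--max-iters`, `--no-incremental`) are assumed from the option names listed above and should be checked against `hybrid-echidna --help`; the helpers `campaign_args`, `run_with_budget` and `run_subprocess` are hypothetical. `codeSize` is omitted because it is an Echidna configuration option, and the sketch does not show how the corpus is carried over between campaigns.

```python
import subprocess
import time
from typing import Callable, List


def campaign_args(index: int, target: str) -> List[str]:
    """Command line for the index-th short campaign (index 0 = first campaign).

    Assumption: flag names mirror the option names from this post.
    """
    first = index == 0
    args = [
        "hybrid-echidna", target,
        "--seq-len", "100",
        "--test-limit", "50000" if first else "500",
        "--max-iters", "3" if first else "1",
    ]
    if not first:
        # Incremental seeding has already been done by the first campaign.
        args.append("--no-incremental")
    return args


def run_with_budget(
    target: str,
    budget_s: float,
    run: Callable[[List[str], float], None],
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Run short campaigns back to back until budget_s seconds have elapsed.

    `run(args, timeout)` executes one campaign. Returns the number of campaigns started.
    """
    start = clock()
    index = 0
    while (remaining := budget_s - (clock() - start)) > 0:
        run(campaign_args(index, target), remaining)
        index += 1
    return index


def run_subprocess(args: List[str], timeout: float) -> None:
    """Run one campaign, stopping it if it would exceed the remaining budget."""
    try:
        subprocess.run(args, timeout=timeout, check=False)
    except subprocess.TimeoutExpired:
        pass  # subprocess.run kills the child process before raising


# Example (hypothetical target file, 1 hour budget):
# run_with_budget("maze-0.sol", 3600.0, run_subprocess)
```

Passing the remaining budget as `timeout` to `subprocess.run` stops the last campaign when the overall time limit is reached, so the per-contract budget is not overrun. The `run` and `clock` parameters are injected so the loop can be checked without launching the fuzzer.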

I also increased the `codeSize` setting to handle larger contracts, if necessary. Currently, all benchmark contracts are below the EVM contract-size limit (24,576 bytes, EIP-170) when compiled with the solc optimizer (solc 0.8.19).
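A quick way to confirm that a generated contract stays below the limit is to measure its deployed (runtime) bytecode, for example the hex printed by `solc --optimize --bin-runtime`. The Python sketch below is a hypothetical helper for that check; the names `runtime_size` and `fits` are illustrative. It also spells out the two `codeSize` values: the default `0x6000` equals the EIP-170 limit, and `0xc00000` is 512 times larger.

```python
EIP170_LIMIT = 0x6000          # 24,576 bytes: max deployed code size (EIP-170)
ECHIDNA_CODE_SIZE = 0xC00000   # value used in this setup: 12,582,912 bytes


def runtime_size(bytecode_hex: str) -> int:
    """Size in bytes of hex-encoded bytecode (an optional 0x prefix is allowed)."""
    h = bytecode_hex[2:] if bytecode_hex.startswith("0x") else bytecode_hex
    # bytes.fromhex raises ValueError for malformed or odd-length input.
    return len(bytes.fromhex(h))


def fits(bytecode_hex: str, limit: int = EIP170_LIMIT) -> bool:
    """True if the runtime bytecode does not exceed `limit` bytes."""
    return runtime_size(bytecode_hex) <= limit
```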

Somewhat surprisingly, Echidna performs much better on these benchmarks than Hybrid-Echidna, and it would be great to understand why. For instance, I tried setting a solver timeout, but this did not have a noticeable effect on fuzzing performance.

Please let me know if you see any potential issues with this setup.
