
Testing: Bytecode Equivalence Infrastructure #704

@gakonst

Description


Summary

Build infrastructure to compare Solar-generated bytecode against solc for behavioral and structural equivalence.

Parent issue: #687

Context

To achieve bytecode parity with solc, we need automated testing that:

  1. Compiles the same Solidity code with both compilers
  2. Compares the results (behavior and/or structure)
  3. Reports differences and tracks progress over time

This is Phase 1 of the roadmap and provides continuous feedback for all other work.

Tasks

Compilation harness

  • CLI or test harness that:
    • Takes Solidity files/projects
    • Compiles with solc (specific version)
    • Compiles with Solar (same language version)
    • Extracts creation + runtime bytecode
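The solc half of this harness can be sketched in Python. The `--combined-json bin,bin-runtime` flag and its output shape are real solc behavior; the Solar invocation is omitted because its CLI surface is still settling, and `parse_combined_json` is a hypothetical helper name.

```python
import json
import subprocess

def solc_bytecode(sol_path: str, solc_bin: str = "solc") -> dict[str, tuple[str, str]]:
    """Compile with solc and return {contract: (creation_hex, runtime_hex)}."""
    out = subprocess.run(
        [solc_bin, "--combined-json", "bin,bin-runtime", sol_path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_combined_json(out)

def parse_combined_json(raw: str) -> dict[str, tuple[str, str]]:
    """Parse solc's combined-json output into a name -> bytecode map."""
    data = json.loads(raw)
    result = {}
    for qualified_name, info in data["contracts"].items():
        # Keys look like "contracts/Token.sol:Token"; keep the short name.
        name = qualified_name.split(":")[-1]
        result[name] = (info["bin"], info["bin-runtime"])
    return result
```

A matching `solar_bytecode` would sit next to it once Solar emits comparable artifacts.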

Behavioral equivalence (Level 1)

  • Deploy both bytecodes to Anvil (or revm directly)
  • Run identical transactions/calls against both
  • Compare:
    • Return values
    • State changes
    • Revert behavior
    • Gas usage (within tolerance)
  • Fuzz testing: random inputs, compare outputs
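The comparison step above might be structured like this minimal Python sketch; `ExecResult`, `behaviorally_equal`, and the field layout are assumptions, to be filled in from whatever executor (Anvil or revm) the harness drives.

```python
from dataclasses import dataclass

@dataclass
class ExecResult:
    """Outcome of one call against a deployed contract."""
    reverted: bool
    return_data: bytes
    gas_used: int
    state_root: bytes  # or a map of touched storage slots

def behaviorally_equal(solar: ExecResult, solc: ExecResult,
                       gas_tolerance: float = 0.05) -> list[str]:
    """Return a list of human-readable differences (empty means equivalent)."""
    diffs = []
    if solar.reverted != solc.reverted:
        diffs.append(f"revert mismatch: Solar={solar.reverted}, solc={solc.reverted}")
    elif solar.return_data != solc.return_data:
        diffs.append("different return value")
    if solar.state_root != solc.state_root:
        diffs.append("state divergence")
    # Gas is compared within a relative tolerance, not exactly.
    if solc.gas_used and abs(solar.gas_used - solc.gas_used) / solc.gas_used > gas_tolerance:
        diffs.append(f"gas outside tolerance: {solar.gas_used} vs {solc.gas_used}")
    return diffs
```

The fuzz loop would call both deployments with the same random calldata and feed the two results through this function.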

Structural equivalence (Level 2, optional)

  • Ignore metadata and compiler version markers
  • Normalize trivial differences (e.g., jump label ordering)
  • Compare opcode sequences
  • Track bytecode size differences
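The metadata step has a well-known shape: solc appends a CBOR blob to runtime bytecode, with the blob's length encoded big-endian in the final two bytes. A minimal Python sketch of that plus a PUSH-aware opcode walk (helper names are illustrative):

```python
def strip_metadata(runtime: bytes) -> bytes:
    """Drop the CBOR metadata trailer solc appends to runtime bytecode.

    The last two bytes encode the CBOR blob's length (big-endian),
    not counting the length field itself.
    """
    if len(runtime) < 2:
        return runtime
    cbor_len = int.from_bytes(runtime[-2:], "big")
    if cbor_len + 2 > len(runtime):
        return runtime  # no plausible metadata trailer
    return runtime[: -(cbor_len + 2)]

def opcodes(code: bytes) -> list[int]:
    """Opcode sequence, skipping PUSH immediates (0x60..0x7f push 1..32 bytes)."""
    ops, i = [], 0
    while i < len(code):
        op = code[i]
        ops.append(op)
        i += 1 + (op - 0x5F if 0x60 <= op <= 0x7F else 0)
    return ops
```

Comparing `opcodes(strip_metadata(a))` against `opcodes(strip_metadata(b))` gives a first cut at structural equivalence before any jump-label normalization.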

Test corpus

  • Start with handcrafted tests for each feature
  • Import Solidity official tests (if license-compatible)
  • Generate random contracts for differential testing
  • Track which features pass/fail
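For the random-contract bullet, a toy generator shows the flavor: emit tiny, self-contained arithmetic kernels and feed the same source to both compilers. Everything here (`random_contract`, the `Fuzz` contract) is illustrative, not an existing tool.

```python
import random

def random_contract(seed: int) -> str:
    """Deterministically generate a small Solidity contract for differential testing."""
    rng = random.Random(seed)
    ops = ["+", "*", "|", "&", "^"]  # kept non-negative so literals stay uint-typed
    expr = str(rng.randint(0, 255))
    for _ in range(rng.randint(1, 6)):
        expr = f"({expr} {rng.choice(ops)} {rng.randint(1, 255)})"
    return (
        "// SPDX-License-Identifier: MIT\n"
        "pragma solidity ^0.8.0;\n"
        "contract Fuzz {\n"
        "    function f(uint256 x) public pure returns (uint256) {\n"
        f"        unchecked {{ return x + {expr}; }}\n"
        "    }\n"
        "}\n"
    )
```

Seeding makes every generated case reproducible, so a failing corpus entry can be re-run and minimized.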

CI integration

  • Run equivalence tests on every PR
  • Track metrics: % of tests passing
  • Store known deltas, track reductions over time
  • Gate features on maintaining or improving metrics
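The known-deltas gate could be as small as this Python sketch: store the known-failing test names as a JSON list, fail CI on any regression, and prompt shrinking the baseline when deltas are fixed (`gate` and the baseline layout are assumptions).

```python
import json

def gate(results: dict[str, bool], baseline_json: str) -> tuple[bool, str]:
    """CI gate: fail on regressions against the stored known-deltas baseline.

    `results` maps test name -> passed; `baseline_json` is the stored
    JSON list of known-failing tests.
    """
    known_failing = set(json.loads(baseline_json))
    regressions = sorted(t for t, ok in results.items()
                         if not ok and t not in known_failing)
    fixed = sorted(t for t in known_failing if results.get(t, False))
    if regressions:
        return False, f"regressions: {regressions}"
    if fixed:
        return True, f"now passing, remove from baseline: {fixed}"
    return True, "no change"
```

A monotonically shrinking baseline file doubles as the progress metric over time.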

Example workflow

```shell
# Run equivalence test on a contract
solar-test-equiv contracts/Token.sol

# Output:
# Token.sol:
#   Compile: ✓ Solar, ✓ solc
#   Bytecode size: Solar 1234, solc 1256 (-22 bytes)
#   Behavioral tests: 45/50 passing
#   Failures:
#     - transfer(address,uint256): different return value
#     - approve: Solar reverts, solc succeeds
```

Patterns to follow

From Venom & Sonatina:

  • Heavy reliance on differential testing against reference compilers
  • Corpus-based testing + fuzzers feeding random programs

Acceptance Criteria

  • Command like solar-test-equiv path/to/contracts works
  • Reports the number of tests passed vs. failed for both structural and behavioral checks
  • Clear metrics: % of contracts fully equivalent
  • New features can be gated on improving/preserving metrics

Estimated Complexity

Medium: primarily tooling work, not algorithmically complex

Dependencies

  • Solar can emit some bytecode (even if incomplete)
  • Foundry/Anvil/revm for execution

Metadata



Labels

  • A-codegen: Area: code generation and MIR
  • C-enhancement: Category: an issue proposing an enhancement or a PR with one
  • C-test: Category: a change that impacts how or what we test
