feat(decompile): detect and resolve internal function calls #680

Closed
Jon-Becker wants to merge 6 commits into main from jon-becker/explode-internal-calls

@Jon-Becker (Owner) commented Dec 26, 2025

What changed? Why?

This PR detects and resolves internal function calls during decompilation. While tracing execution, the decompiler now recognizes when one function calls another internally (a JUMP to another function's entry point or internal body) and emits an explicit function call in the decompiled output.

Key changes:

  • Add an InternalCall struct that represents a detected internal call with its selector, entry point, and arguments
  • Implement resolve_internal_body() to identify where a function's actual logic starts (after calldata parsing)
  • Track both function entry points and their internal bodies so that jumps to either are detected as internal calls
  • Extract function arguments from the stack based on resolved function signatures
  • Map selectors to argument counts and function names for proper formatting
  • Format internal calls as FunctionName(arg1, arg2, ...), or as Unresolved_SELECTOR(...) when the selector is not resolved
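
As a rough illustration of the last bullet, the formatting step could look like the sketch below. The struct fields, their types, and the `format_internal_call` helper are assumptions made for this example; the actual `InternalCall` in the PR may be shaped differently.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the PR's `InternalCall`; real fields may differ.
#[derive(Debug, Clone)]
pub struct InternalCall {
    /// 4-byte selector of the callee.
    pub selector: [u8; 4],
    /// Program counter of the callee's entry point.
    pub entry_point: usize,
    /// Arguments taken from the stack, already rendered as expressions.
    pub args: Vec<String>,
}

/// Render a call as `Name(arg1, arg2)`, falling back to
/// `Unresolved_<hex selector>(...)` when no name is known.
/// (Helper name and signature are assumptions for this sketch.)
pub fn format_internal_call(
    call: &InternalCall,
    names: &HashMap<[u8; 4], String>,
) -> String {
    let name = match names.get(&call.selector) {
        Some(resolved) => resolved.clone(),
        None => {
            let hex: String = call.selector.iter().map(|b| format!("{b:02x}")).collect();
            format!("Unresolved_{hex}")
        }
    };
    format!("{}({})", name, call.args.join(", "))
}
```

With a name map containing `a9059cbb -> transfer`, a call carrying `arg0` and `arg1` would render as `transfer(arg0, arg1)`, while an unknown selector such as `23de6651` with no arguments would render as `Unresolved_23de6651()`.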

This improves decompilation readability by making function calls explicit rather than leaving them as raw JUMP operations.

Notes to reviewers

  • crates/vm/src/ext/exec/mod.rs - Core VM execution tracing with internal call detection
  • crates/vm/src/ext/selectors.rs - New resolve_internal_body() function to find function body targets
  • crates/decompile/src/core/analyze.rs - Use detected internal calls to output proper function call syntax
  • crates/decompile/src/core/mod.rs - Build entry point and internal body maps, pass to analyzers
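
For orientation while reviewing, here is a minimal sketch of how entry point and internal body maps could be used to classify a JUMP destination. `CallTargets`, `resolve`, and `take_args` are hypothetical names, and the argument order (top of stack first) is an assumption; none of this is taken from the PR's code.

```rust
use std::collections::HashMap;

/// Hypothetical lookup tables: program counter -> selector of the owning function.
pub struct CallTargets {
    pub entry_points: HashMap<usize, [u8; 4]>,
    pub internal_bodies: HashMap<usize, [u8; 4]>,
}

impl CallTargets {
    /// Return the callee's selector if `jump_dest` lands on the entry point
    /// or internal body of a function other than `current`.
    pub fn resolve(&self, jump_dest: usize, current: [u8; 4]) -> Option<[u8; 4]> {
        self.entry_points
            .get(&jump_dest)
            .or_else(|| self.internal_bodies.get(&jump_dest))
            .copied()
            .filter(|callee| *callee != current)
    }
}

/// Take the top `argc` stack items as arguments.
/// Assumption: the last element of `stack` is the top of the stack,
/// and arguments are read starting from the top.
pub fn take_args(stack: &[String], argc: usize) -> Vec<String> {
    stack.iter().rev().take(argc).cloned().collect()
}
```

A jump that matches neither map, or that targets the function currently being traced, yields `None` and would stay an ordinary jump in this sketch.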

How has it been tested?

  • locally
  • with evaluations

@github-actions (Contributor)

❌ Eval Report for da9c2a5

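| Test Case | CFG | Decompilation |
|---|---|---|
| NestedLoop | 40 | 15 |
| NestedMappings | 100 | 45 |
| WhileLoop | 100 | 15 |
| TransientStorage | 100 | 15 |
| SimpleLoop | 85 | 15 |
| NestedMapping | 100 | 0 |
| SimpleStorage | 100 | 30 |
| Mapping | 100 | 45 |
| WETH9 | 100 | 45 |
| Events | 45 | 45 |
| Average | 87 | 27 |
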
⚠️ 10 eval(s) scoring <70%

NestedLoop (CFG: 40, Decompilation: 15)

Decompilation

{
  "score": 15,
  "summary": "Decompilation fails to capture the fundamental nested loop behavior and state-modifying operations. The control flow structure (nested loops) is completely missing, and the critical storage write operation (number += 1) has been incorrectly represented as a non-modifying require statement.",
  "differences": [
    "Missing nested loop structure: original has two nested for loops, decompiled has no loop constructs at all",
    "Missing state modification: original increments 'number' storage variable (loops * loops) times, decompiled performs no state changes",
    "Incorrect function mutability: original is implicitly 'public' (state-modifying), decompiled is marked 'public view' (read-only)",
    "Missing loop increment operations: original has i++ and j++ iterators, decompiled has no iteration logic",
    "Incorrect logic representation: decompiled shows 'require(!number > (number + 0x01))' which would always fail, completely misrepresenting the increment operation 'number += 1'",
    "Missing loop condition checks: original evaluates 'i < loops' and 'j < loops' multiple times, decompiled only has static require statements"
  ]
}

CFG

{
  "score": 40,
  "summary": "CFG captures loop structure (entry, condition, body, exit) but missing critical back edges that enable loop iteration. Both nested loops lack the edges that would allow them to iterate beyond the first execution.",
  "missing_paths": [
    "Inner loop back edge: missing edge from node 9 (or continuation after increment) back to node 8 to re-check inner loop condition (j < loops)",
    "Outer loop back edge: missing edge from inner loop exit (node 8 when condition false) back to node 7 to increment i and re-check outer loop condition (i < loops)",
    "Without these back edges, the nested loops appear to execute only once rather than iterating as intended in the source"
  ],
  "extra_paths": [
    "Node 10: Arithmetic overflow check and revert (compiler safety feature)",
    "Nodes 0-6, 12-15: Function dispatch, calldata decoding, and validation logic (compiler-generated)",
    "Various stack manipulation and memory operations throughout"
  ],
  "observations": [
    "The CFG correctly identifies two distinct loop structures (outer at node 7, inner at node 8)",
    "Loop entry points, condition checks, and exit paths are present",
    "The increment operation (number += 1) is captured in node 9",
    "However, the fundamental loop iteration mechanism is incomplete - back edges are missing",
    "This is a significant structural omission as loops cannot iterate without back edges to re-evaluate conditions",
    "The missing back edges represent the 'i++' and 'j++' increment operations followed by re-checking loop conditions",
    "A complete CFG would show: 9 -> 8 (inner loop back edge) and 8 -> 7 (outer loop back edge when inner completes)"
  ]
}
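
The missing back edges described in this report can be checked mechanically. The sketch below is a generic depth-first search that reports every edge whose target is still on the DFS stack; it is not heimdall's CFG code, and the adjacency-map representation and the `back_edges` name are assumptions for illustration.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq)]
enum Mark {
    Unvisited,
    OnStack,
    Done,
}

/// Return every edge `(from, to)` whose target is still on the DFS stack
/// when the edge is explored, i.e. a back edge that closes a loop.
pub fn back_edges(graph: &HashMap<u32, Vec<u32>>, entry: u32) -> Vec<(u32, u32)> {
    fn visit(
        node: u32,
        graph: &HashMap<u32, Vec<u32>>,
        marks: &mut HashMap<u32, Mark>,
        out: &mut Vec<(u32, u32)>,
    ) {
        marks.insert(node, Mark::OnStack);
        if let Some(successors) = graph.get(&node) {
            for &next in successors {
                match marks.get(&next).copied().unwrap_or(Mark::Unvisited) {
                    Mark::OnStack => out.push((node, next)),
                    Mark::Unvisited => visit(next, graph, marks, out),
                    Mark::Done => {}
                }
            }
        }
        marks.insert(node, Mark::Done);
    }

    let mut marks = HashMap::new();
    let mut out = Vec::new();
    visit(entry, graph, &mut marks, &mut out);
    out
}
```

Using the node numbers from the report plus a hypothetical exit node 11, a graph with edges 7 -> [8, 11], 8 -> [9, 7], and 9 -> [8] yields `[(9, 8), (8, 7)]`. Dropping 9 -> 8 and 8 -> 7, as the evaluated CFG apparently does, yields an empty list, which is how the loops come to look like straight-line code.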

NestedMappings (CFG: 100, Decompilation: 45)

Decompilation

{
  "score": 45,
  "summary": "Decompilation captures the approve function's storage write operation but with incorrect storage structure. The allowance view function is completely missing, and the public allowances mapping getter is incorrectly represented as an unresolved constant.",
  "differences": [
    "Missing allowance(address,address) view function - this function is completely absent from decompiled output",
    "Incorrect storage structure: nested mapping(address => mapping(address => uint256)) is represented as single-level mapping(bytes32 => bytes32), losing the two-key nested structure",
    "approve() function uses incorrect storage access pattern 'var_a = address(arg0); storage_map_a[var_a] = arg1;' which only uses one key (spender) instead of two keys (msg.sender, spender)",
    "Public allowances mapping getter incorrectly decompiled as 'unresolved_dd62ed3e' constant instead of a proper two-parameter view function",
    "Unresolved_55b6ed5c function appears in decompiled output with no corresponding functionality in original contract"
  ]
}

WhileLoop (CFG: 100, Decompilation: 15)

Decompilation

{
  "score": 15,
  "summary": "Decompilation fails to capture the fundamental while loop behavior. The loop logic, iteration, and state modifications are completely missing or incorrectly represented.",
  "differences": [
    "While loop control flow completely lost - no iteration logic present",
    "Loop variable 'i' increment operation missing entirely",
    "Storage write to 'number' variable missing - marked as 'view' instead of state-modifying",
    "Function incorrectly marked as 'view' when it should be state-modifying (writes to storage)",
    "Loop body logic (number += 1 for each iteration) not preserved",
    "Conditional checks present but do not represent loop iteration logic (require(!0 < arg0) is incorrect)",
    "No representation of the accumulation behavior where 'number' increases by 'loops' amount"
  ]
}

TransientStorage (CFG: 100, Decompilation: 15)

Decompilation

{
  "score": 15,
  "summary": "Decompilation critically fails to preserve program logic. Only 1 of 6 functions is present as executable code, and it contains incorrect bit manipulation logic instead of simple assignment. The other 5 functions are completely missing or represented as invalid constant declarations. Core functionality like increment, lock/unlock, and view functions are not captured.",
  "differences": [
    "incrementCounter() function completely missing - increment operation not preserved",
    "lock() function completely missing - boolean assignment to true not preserved",
    "unlock() function completely missing - boolean assignment to false not preserved",
    "getCounter() function completely missing - transient counter read operation not preserved",
    "isLocked() function completely missing - transient locked state read operation not preserved",
    "setTempOwner() uses incorrect bit manipulation logic ((address(arg0) * 0x01) | (uint96(transient[0x01]))) instead of simple assignment (tempOwner = owner)",
    "setTempOwner() incorrectly marked as 'pure' instead of state-modifying"
  ]
}

SimpleLoop (CFG: 85, Decompilation: 15)

Decompilation

{
  "score": 15,
  "summary": "Decompilation fails to capture the loop structure and state modifications. The function is incorrectly marked as view, contains nonsensical require statements, and completely omits the iteration and increment logic.",
  "differences": [
    "Function incorrectly marked as 'view' instead of state-modifying",
    "Loop iteration structure completely missing - no for loop or equivalent logic",
    "State variable increment operation (number++) is not present",
    "Contains incorrect require statement 'require(!0 < arg0)' which inverts the loop condition logic",
    "Contains tautological require 'require(arg0 == arg0)' that serves no functional purpose",
    "Contains nonsensical underflow check on number when original performs increment not decrement"
  ]
}

NestedMapping (CFG: 100, Decompilation: 0)

Decompilation

{
  "score": 0,
  "summary": "Complete decompilation failure - no functional logic preserved. All storage operations, mapping accesses, and core contract behavior are missing. Functions only contain meaningless self-comparison require statements.",
  "differences": [
    "All storage write operations missing (setAllowance, setGrid, setDeepNested should write to storage but decompiled versions have no storage operations)",
    "All storage read operations missing (getAllowance should read from nested mapping but decompiled version has no storage access)",
    "All nested mapping access logic missing - no keccak256 calculations or storage slot computations present",
    "Function logic replaced with meaningless 'require(arg == arg)' statements that serve no purpose",
    "Return values completely missing - getAllowance should return uint256 from storage but decompiled version returns nothing and is marked pure",
    "State mutability incorrect - all functions marked 'pure' when they should be 'view' or state-modifying",
    "Function parameter usage missing - parameters are compared to themselves but never used for actual logic"
  ]
}

SimpleStorage (CFG: 100, Decompilation: 30)

Decompilation

{
  "score": 30,
  "summary": "Critical storage layout misinterpretation leads to incorrect logic in 3 of 4 functions. Only setValue preserves correct functionality.",
  "differences": [
    "setOwner() performs bitwise OR operations preserving lower 96 bits instead of simple address assignment, corrupting storage",
    "initialize() writes to owner variable with bitwise operations instead of setting the separate initialized boolean storage slot",
    "reset() uses incorrect bitwise operations on owner instead of cleanly resetting owner to address(0) and initialized to false",
    "Missing initialized state variable declaration - decompiler failed to identify the third storage slot",
    "Storage operations in initialize() and reset() target wrong storage slots due to misidentified storage layout"
  ]
}

Mapping (CFG: 100, Decompilation: 45)

Decompilation

{
  "score": 45,
  "summary": "Decompilation captures some basic operations but has critical functional issues: missing register() function entirely, storage mapping mismatch where setOwner() writes to storage_map_a but owners() reads from storage_map_b, and setOwner() uses incorrect bitwise operations instead of simple assignment which would corrupt stored addresses.",
  "differences": [
    "register(address) function is completely missing from decompiled output",
    "setOwner() writes to storage_map_a but the corresponding owners() getter reads from storage_map_b, causing incorrect return values",
    "setOwner() uses bitwise operation '(address(arg1) * 0x01) | (uint96(storage_map_a[var_a]))' instead of simple assignment, which preserves lower 96 bits and would not correctly store the address",
    "Only 2 storage mappings identified (storage_map_a, storage_map_b) when original has 3 distinct mappings (balances, owners, registered)"
  ]
}

WETH9 (CFG: 100, Decompilation: 45)

Decompilation

{
  "score": 45,
  "summary": "Decompilation captures basic structure but has critical logic errors in transferFrom that break core ERC20 functionality, incorrect storage mappings, and flawed control flow",
  "differences": [
    "transferFrom has inverted logic: requires arg0 == msg.sender when it should allow this case, and requires allowance == uint256(-1) when it should skip allowance checks for this value",
    "transferFrom has unreachable code after return statements and duplicated/incorrect logic branches",
    "transferFrom incorrectly subtracts from arg0 balance in one branch (should subtract from src which is arg0)",
    "approve function uses wrong storage map (storage_map_c) instead of allowance mapping (storage_map_d)",
    "allowance and balanceOf functions return from storage_map_d but this mapping is never written to in any function",
    "Storage mapping usage is inconsistent: balances use storage_map_c but getters reference storage_map_d",
    "transfer function marked as 'view' but should be non-view since it calls transferFrom which modifies state",
    "Missing allowance decrement logic in transferFrom when allowance is not uint256(-1)"
  ]
}

Events (CFG: 45, Decompilation: 45)

Decompilation

{
  "score": 45,
  "summary": "Decompilation captures only 4 of 7 functions. The emitMultiple function logic is preserved correctly. However, emitTransfer, emitApproval, and emitWithdrawal functions are completely missing. One recovered function (Unresolved_23de6651) fails to emit any event despite taking an address parameter. Event emissions for Log, LogBytes, Deposit, and Transfer are present, but Approval and Withdrawal events are missing from declarations and logic.",
  "differences": [
    "Missing function: emitTransfer(address, address, uint256) - should emit Transfer event with 3 parameters",
    "Missing function: emitApproval(address, address, uint256) - should emit Approval event",
    "Missing function: emitWithdrawal(address, uint256) - should emit Withdrawal event",
    "Function Unresolved_23de6651 only validates address parameter but does not emit any event (likely corresponds to a missing emit function)",
    "Approval event not declared or emitted anywhere in decompiled output",
    "Withdrawal event not declared or emitted anywhere in decompiled output"
  ]
}

CFG

{
  "score": 45,
  "summary": "CFG captures function dispatcher and 2/7 complete execution paths. Major control flow missing: 5 out of 7 functions terminate prematurely before event emission, missing 71% of the contract's primary business logic.",
  "missing_paths": [
    "emitTransfer: Path exists through function selector (0x23de6651) and parameter decoding, but execution path terminates before Transfer event emission (missing LOG3 opcode and STOP)",
    "emitApproval: Path exists through function selector (0x5687f2b8) and parameter decoding, but execution path terminates before Approval event emission (missing LOG3 opcode and STOP)",
    "emitDeposit: Path exists through function selector (0x28ba84ca) and parameter decoding, but execution path terminates before Deposit event emission (missing LOG2 opcode and STOP)",
    "emitWithdrawal: Path exists through function selector (0xfc4ae4ba) and parameter decoding, but execution path terminates before Withdrawal event emission (missing LOG2 opcode and STOP)",
    "emitMultiple: Path exists through function selector (0xe3a379c5) and parameter decoding, but all three sequential event emissions are missing (Deposit, Transfer, and Log events with their respective LOG opcodes and final STOP)"
  ],
  "extra_paths": [
    "Callvalue check with REVERT path (constructor check, expected)",
    "Calldatasize validation with REVERT paths (ABI safety, expected)",
    "Function selector dispatcher using binary search tree structure (expected)",
    "ABI decoding validation with overflow checks for dynamic types (expected)",
    "Invalid function selector fallback REVERT path (expected)"
  ],
  "observations": [
    "Function dispatcher correctly implements binary search tree for all 7 function selectors with 100% coverage",
    "Only 2 out of 7 functions (emitLog and emitLogBytes) have complete execution paths with event emissions visible in CFG",
    "Pattern detected: functions with dynamic calldata types (string, bytes) show complete paths, while functions with only fixed-size types (address, uint256) terminate prematurely",
    "All event topic signatures exist in the deployed bytecode, confirming the issue is with CFG generation/extraction, not compilation",
    "The contract's primary purpose is event emission, yet 7 out of 9 total event emissions (77.8%) are not captured in the CFG",
    "Missing LOG opcodes: 4 LOG3 opcodes (for Transfer and Approval events) and 4 LOG2 opcodes (for Deposit and Withdrawal events)",
    "No source-level branching, loops, or conditionals exist in the contract, making this a straightforward linear control flow case",
    "CFG shows proper handling of JUMPDEST resolution for the paths it does capture"
  ]
}

@github-actions (Contributor)

Benchmark for da9c2a5

| Test | Base | PR | % |
|---|---|---|---|
| heimdall_cfg/complex | 10.3±0.94ms | 9.5±0.37ms | -7.77% |
| heimdall_cfg/simple | 966.9±17.27µs | 983.2±24.22µs | +1.69% |
| heimdall_decoder/seaport | 43.9±5.42µs | 40.9±1.91µs | -6.83% |
| heimdall_decoder/transfer | 3.0±0.31µs | 3.1±0.36µs | +3.33% |
| heimdall_decoder/uniswap | 11.7±0.63µs | 11.4±0.78µs | -2.56% |
| heimdall_decompiler/abi_complex | 44.0±1.29ms | 35.6±1.34ms | -19.09% |
| heimdall_decompiler/abi_simple | 1094.9±56.87µs | 1120.2±21.99µs | +2.31% |
| heimdall_decompiler/sol_complex | 63.3±2.47ms | 46.8±2.21ms | -26.07% |
| heimdall_decompiler/sol_simple | 1665.8±85.73µs | 1726.6±42.23µs | +3.65% |
| heimdall_decompiler/yul_complex | 47.8±3.43ms | 37.9±2.73ms | -20.71% |
| heimdall_decompiler/yul_simple | 1213.6±25.70µs | 1280.9±16.23µs | +5.55% |
| heimdall_disassembler/complex | 972.3±33.28µs | 993.1±61.18µs | +2.14% |
| heimdall_disassembler/simple | 50.6±3.76µs | 48.4±3.98µs | -4.35% |
| heimdall_vm/erc20_transfer | 187.4±9.01µs | 194.3±11.16µs | +3.68% |
| heimdall_vm/fib | 627.0±19.41µs | 652.6±25.29µs | +4.08% |
| heimdall_vm/ten_thousand_hashes | 489.3±19.78ms | 3.7±1.82s | +656.18% |
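
The % column appears to be the relative change of the PR mean against the base mean. The sketch below reproduces it from the printed figures; the helper names are made up for this example, and because the printed means are rounded, results may differ from the reported column in the last digit. Note that the last row mixes units (ms for base, s for PR), so both values need converting to a common unit first.

```rust
/// Convert a mean such as 489.3 "ms" or 3.7 "s" to nanoseconds.
/// Returns `None` for an unrecognized unit.
pub fn to_nanos(value: f64, unit: &str) -> Option<f64> {
    let scale = match unit {
        "ns" => 1.0,
        "µs" | "us" => 1e3,
        "ms" => 1e6,
        "s" => 1e9,
        _ => return None,
    };
    Some(value * scale)
}

/// Relative change of the PR mean against the base mean, in percent.
/// Positive values mean the PR is slower.
pub fn percent_change(base_ns: f64, pr_ns: f64) -> f64 {
    (pr_ns - base_ns) / base_ns * 100.0
}
```

For `heimdall_vm/ten_thousand_hashes`, 489.3 ms against 3.7 s works out to roughly +656%, a slowdown of about 7.6x, while `heimdall_decompiler/sol_complex` (63.3 ms against 46.8 ms) comes to roughly -26%.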

@Jon-Becker Jon-Becker closed this Feb 16, 2026