Skip to content

CallElimCorrect: rewrite using small-step semantics#1306

Open
PROgram52bc wants to merge 15 commits into
main2from
htd/callelim-smallstep-on-main2
Open

CallElimCorrect: rewrite using small-step semantics#1306
PROgram52bc wants to merge 15 commits into
main2from
htd/callelim-smallstep-on-main2

Conversation

@PROgram52bc

@PROgram52bc PROgram52bc commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Summary

This PR replaces the deprecated big-step CallElimCorrect proof with a small-step version, plus reusable infrastructure (SubstProps, CoreTransformProps) and shared-file helpers. The previous file carried a DEPRECATED notice flagging it for a small-step rewrite; this is that rewrite.

Top-level public theorem callElimStatementCorrect is preserved (now a conjunction over a _terminal arm and an _exiting arm), the contract semantics for procedure calls (EvalCommandContract.call_sem) is unified into a single failure-flag–parameterized rule, and the _terminal arm is fully polymorphic over the failure flag ({f : Bool}) — a source success trace maps to a target success trace and a source failure trace maps to a target failure trace, in one statement.

Net repository delta vs main2: +9,277 / -4,861 LoC across 19 files (the bulk being reusable infrastructure rather than CallElim-specific). Compared to the original 14,972-LoC big-step proof, this represents a ~-8k LoC reduction overall, while strictly strengthening what is proved.

What's added

The 19 files modified group into 4 layers:

Top-level proof (~10,525 LoC churn)

  • Strata/Transform/CallElimCorrect.lean: full rewrite against small-step semantics. Same public theorem name callElimStatementCorrect; structured as _terminal ∧ ∀ lbl, _exiting, mirroring Specification.Overapproximates and TripleBlock. The _terminal arm is polymorphic in the failure flag {f : Bool}; the _exiting arm reuses the original Heval for non-call cases and is vacuously discharged for the call case (a singleton call statement only ever reaches .terminal, never .exiting).

Reusable infrastructure (~2,747 LoC, new files)

  • Strata/Transform/SubstProps.lean (1,621 LoC, new): generic substitution-semantics infrastructure used by CallElim and reusable by future transforms. Includes H_asserts_zip / H_assumes_zip / H_check_block_zip walkers, singletonAssertEval / singletonAssumeEval and their _poly failure-flag variants, plus H_asserts_zip_fail (the L4 flag-flip walker that powers the polymorphic-f failure arm).
  • Strata/Transform/CoreTransformProps.lean (1,126 LoC, new): reusable helpers for the Core small-step layer — singleCmdToStmts / _poly, H_havocs / _poly, H_init, H_inits, genIdent* correctness, fresh-name predicates, EvalCallElim_glue and EvalCallElim_glue_fail (composing the L1–L6 segments with the L4 success/failure flip).

Core semantics adjustments (~30 LoC)

  • Strata/Languages/Core/StatementSemantics.lean: EvalCommandContract.call_sem unified — previously two separate path shapes (success: havoc + write-back; failure: σ' = σ and the precondition iff in the wrong direction); now a single rule parameterized by a Bool failure flag indicator, with the precondition iff narrowed to non-Free preconditions to match callElimCmd's actual emission rule. EvalCommand.call_sem propagates the callee body's terminal hasFailure flag to the caller (previously hardcoded to false, silently dropping any failures inside callee bodies).
  • Strata/Languages/Core/StatementSemanticsProps.lean (+297 LoC): 8 new helpers around updatedStates / readValues / evalExpressions / initStates, plus the revived 5-theorem refinement-transit chain (EvalCommandRefinesContract, StepStmt[Star]_refines_contract, EvalStatement[s]RefinesContract) over the unified call_sem shape. EvalCallBodyRefinesContract is closed (was a long-standing sorry inside a comment block).
  • Strata/Languages/Core/WF.lean: WFCallSiteProp.preBoolTyped and WFPrePostProp.boolTyped per-precondition Bool-totality clauses, used by the failure-arm to derive a failing-pre witness.
  • Strata/Transform/ProcBodyVerifyCorrect.lean: takes CalleesNoFailure π φ as a new hypothesis (every procedure body, run from a non-failing init env, terminates with hasFailure = false), threaded through core_noFailure_preserved. The hypothesis is exactly the body-noFailure clause that clause 2 of the theorem's own conclusion (ProcedureCorrect) establishes per procedure, so callers discharge it inductively.

Shared-file helpers (~250 LoC)

  • Strata/Languages/Core/StatementSemanticsProps.lean (above): 8 new shared helpers consumed by both CallElim and its sibling Core proofs.
  • Strata/DL/Util/ListUtils.lean (+150 LoC net): adds notin_append4, unzip_snd_length, disjoint_of_nodup_append_three, zip_pair_split, nodup_3_decompose, notin_3_append_of, zip_zip_unzip_{snd,fst_unzip_fst}_of_lengths, zip_notin_{fst,snd}_pair; promotes existing helpers to public.
  • Strata/DL/Imperative/CmdSemanticsProps.lean (new, +33 LoC): generalized pure-Imperative lemmas subst_defined_tail, subst_nodup_tail (P : PureExpr).
  • @[expose] attributes added where required by the new module-system reductions:
    • Strata/DL/Imperative/CmdSemantics.lean: isNotDefined, substStores, substDefined, substNodup, invStores, substSwap, WellFormedSemanticEvalVal.
    • Strata/DL/Lambda/LExprWF.lean: substFvar, substFvars.
    • Strata/DL/Util/StringGen.lean: StringGenState.gen.
    • Strata/Languages/Core/CoreGen.lean: CoreGenState.WF, CoreGenState.gen.
    • Strata/Languages/Core/Procedure.lean: Procedure.Spec.getCheckExprs, Procedure.Spec.updateCheckExprs.
    • Strata/Languages/Core/StatementSemantics.lean: updatedState, updatedStates', updatedStates, WellFormedCoreEvalTwoState.
    • Strata/Transform/CoreTransform.lean: ~17 generator/transform defs.
    • Strata/Transform/CallElim.lean: callElimCmd, plus an extracted oldTyLookupCallElim helper required for the proof to share auxiliary match definitions across two do-blocks.

Notable commits

The branch is structured so each landed commit can be reviewed independently:

  • 286c6ce36 — initial small-step rewrite (preserves the original _terminal-only theorem at f = false).
  • 3710bb339 — add the exit-arm to callElimStatementCorrect (review feedback from @aqjune-aws): mirrors Specification.Overapproximates shape; non-call cases reuse the original Heval, the call case is vacuously discharged via seq_reaches_exiting.
  • 0eb0342ee — review-feedback rename *Semantics → *SemanticsProps and relocate generic lemmas to ListUtils / CmdSemanticsProps / SubstSemanticsProps.
  • 4bfccc31b — integrate hasFailure flag into EvalCommand.call_sem (closes a soundness gap inherited from the original semantics).
  • e679bd560 — second review-feedback rename *SemanticsProps → *Props (the Semantics qualifier was misleading: these files hold transform machinery and substitution lemmas, not semantics defs).
  • 44deb44b2 — close EvalCallBodyRefinesContract and revive the dormant Core refinement bridge.
  • ba122da08 — unify EvalCommandContract.call_sem (failure-flag-parameterized) and revive the 5-theorem refinement-transit chain.
  • 3daec283b — cleanup pass: 6 unused theorems in CoreTransformProps, 5-helper dead chain in SubstProps, dangling docstring references, TODO disposition. Net -327 LoC.
  • be987fddc — lift _terminal to polymorphic {f : Bool} (PR CallElimCorrect: lift _terminal to polymorphic {f : Bool} #1340, merged into this branch).

Build status

  • lake build: clean (490 jobs)
  • 0 sorries introduced (baseline sorry inventory in ProgramWF / ProcedureWF / StatementWF / CmdEval is unchanged; this PR closes one previously commented-out sorry in EvalCallBodyRefinesContract)
  • 0 axioms introduced
  • #print axioms callElimStatementCorrect[propext, Classical.choice, Quot.sound] (Lean stdlib only)

Test plan

  • lake build clean across all 490 jobs
  • #print axioms callElimStatementCorrect reports stdlib axioms only
  • No new sorries / admits / axioms introduced
  • Conjunctive shape of callElimStatementCorrect matches Specification.Overapproximates / TripleBlock
  • _terminal arm polymorphic in {f : Bool} (success and failure traces both covered)
  • Review-feedback follow-ups landed (exit arm, file renames, generic-lemma relocation)
  • CI passes

Warning: This repository will shortly undergo a split into several separate repositories. If you're creating a PR that crosses the boundaries between these repositories, you may want to hold off until the split is complete or be prepared to rework your PR into multiple PRs once the split is complete.

The code that will be moved includes:

  • Strata/DDM/*
  • Strata/Languages/Boole/*
  • Strata/Languages/Python/* along with Tools/Python/*
  • Tools/BoogieToStrata

Replaces the deprecated big-step `CallElimCorrect` proof with a
small-step version, plus reusable infrastructure
(`SubstSemantics`, `CoreTransformSemantics`) and shared-file helpers.

The previous file carried a `DEPRECATED` notice flagging it for a
small-step rewrite; this is that rewrite.

## Summary

- `Strata/Transform/CallElimCorrect.lean`: 4597 -> 3814 LoC (-17%).
  Same top-level theorem name (`callElimStatementCorrect`); 6 public
  declarations preserved.
- New `Strata/Transform/SubstSemantics.lean` (1721 LoC): generic
  substitution-semantics infrastructure (used by CallElim; reusable by
  future transforms).
- New `Strata/Transform/CoreTransformSemantics.lean` (1151 LoC):
  reusable helpers for the Core small-step layer (`singleCmdToStmts`,
  `H_havocs`, `H_init`, `H_inits`, `genIdent*` correctness, fresh-name
  predicates, etc.).

Net repository delta: +2,345 LoC (6,916 insertions / 4,571 deletions
across 13 files), with the bulk being reusable infrastructure rather
than CallElim-specific. Compared to the original 14,972-LoC big-step
proof, this represents a -8,056 LoC reduction overall.

## Changes to shared files

- `Strata/Languages/Core/StatementSemanticsProps.lean` (+159 LoC): 8
  new helpers (`updatedStates_get_notin`, `readValues_get`,
  `evalExpressions_get`, `readValues_updatedStatesSame`,
  `evalExpressions_isDefined_flatMap`, `initStates_get_notin`, plus
  2-/3-layer fall-through variants).
- `Strata/DL/Util/ListUtils.lean` (+60 LoC net): adds
  `notin_append4`, `unzip_snd_length`,
  `disjoint_of_nodup_append_three`, `nodup_append_three_disjoint`;
  promotes existing helpers to `public`.
- `@[expose]` attributes added where needed by the new module-system
  reductions:
  - `Strata/DL/Imperative/CmdSemantics.lean`: `isNotDefined`,
    `substStores`, `substDefined`, `substNodup`, `invStores`,
    `substSwap`, `WellFormedSemanticEvalVal`.
  - `Strata/DL/Lambda/LExprWF.lean`: `substFvar`, `substFvars`.
  - `Strata/DL/Util/StringGen.lean`: `StringGenState.gen`.
  - `Strata/Languages/Core/CoreGen.lean`: `CoreGenState.WF`,
    `CoreGenState.gen`.
  - `Strata/Languages/Core/Procedure.lean`:
    `Procedure.Spec.getCheckExprs`, `Procedure.Spec.updateCheckExprs`.
  - `Strata/Languages/Core/StatementSemantics.lean`: `updatedState`,
    `updatedStates'`, `updatedStates`, `WellFormedCoreEvalTwoState`.
  - `Strata/Transform/CoreTransform.lean`: ~17 generator/transform
    defs.
  - `Strata/Transform/CallElim.lean`: `callElimCmd`, plus an extracted
    `oldTyLookupCallElim` helper required for the proof to share
    auxiliary `match` definitions across two `do`-blocks.

## Testing

Full repo build green (602 jobs); StrataTest green (617 jobs); zero
warnings.
@github-actions github-actions Bot added the Core label Jun 1, 2026
@PROgram52bc PROgram52bc marked this pull request as ready for review June 2, 2026 21:56
@PROgram52bc PROgram52bc requested a review from a team as a code owner June 2, 2026 21:56
@PROgram52bc PROgram52bc requested a review from atomb June 2, 2026 22:19

@aqjune-aws aqjune-aws left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments which can be applied orthogonally to our ongoing discussion in a private channel.
One extra question is that, I think the correctness of the top-level statement must also consider the case when a program may 'throw', through Imperative.exit. For example,

call f();
exit "some-block";

Note that the original program exits to "some-block", hence the final state after small-steps isn't ".terminal" (unlike what EvalStmtsSmall states, which is used transitively by EvalStatementsContract). If we apply call-elimination on this,

(omit temporary vars)
assume <pre>;
assert <post>;
exit "some-block";

it will have to exit to the right state.

Comment thread Strata/Transform/CoreTransformProps.lean
Comment thread Strata/Transform/SubstProps.lean
Comment thread Strata/Transform/SubstSemantics.lean Outdated
Comment thread Strata/Transform/SubstSemantics.lean Outdated
Comment thread Strata/Transform/CallElimCorrect.lean Outdated
Comment thread Strata/Transform/CallElimCorrect.lean Outdated
Address PR #1306 review comment from @aqjune-aws: the previous statement
of `callElimStatementCorrect` only covered programs reaching `.terminal`,
silently excluding programs whose top-level execution ends in an
`.exiting lbl ρ` configuration (e.g., `[call f(); exit "L"]` outside an
enclosing block). `EvalStatementsContract` unfolds to
`StepStmtStar (.terminal _)`, so such runs failed the hypothesis and
were vacuously "covered" by the existing proof.

Mirror the conjunctive shape used by `Specification.Overapproximates`
(Specification.lean:594) and `TripleBlock` (Specification.lean:211):
return `(terminal-arm) ∧ (∀ lbl, exiting-arm)`.

Refactor into three theorems for clarity:
- `callElimStatementCorrect_terminal` (private, unchanged): the
  existing call-elim chain via `EvalCallElim_glue`.
- `callElimStatementCorrect_exit` (private, new): non-call cases reuse
  the original `Heval` via `callElimStmt_non_call_eq` (sts = [st]); the
  call case is vacuously discharged because `step_cmd` only ever yields
  `.terminal`, so `(.stmts [.cmd (.call ...)] _) →* .exiting lbl _` is
  unreachable (proven by `seq_reaches_exiting` plus inversion).
- `callElimStatementCorrect` (public): assembles the two arms.

No label-collision hazard: callElim only introduces fresh temp idents,
not block labels, so exit labels transport unchanged. Zero external
callers, so the conjunctive signature change is free downstream.
…nticsProps

Address PR #1306 review feedback from @aqjune-aws:

* Files holding only lemmas (not actual semantics defs) follow the
  existing repo convention `*SemanticsProps.lean` (e.g.,
  `Strata/Languages/Core/StatementSemanticsProps.lean`):
    - `Strata/Transform/CoreTransformSemantics.lean`
        -> `Strata/Transform/CoreTransformSemanticsProps.lean`
    - `Strata/Transform/SubstSemantics.lean`
        -> `Strata/Transform/SubstSemanticsProps.lean`
  Audited `SubstSemantics.lean`'s deps: still requires
  `Core.Transform.createFvar(s)` and `singletonAssertEval`/
  `singletonAssumeEval`, so it stays under `Strata/Transform/`.

* Generic lemmas relocated to their proper homes:
    - Pure list lemmas -> `Strata/DL/Util/ListUtils.lean`
      (now in `List` namespace):
        `zip_pair_split`, `nodup_3_decompose`, `notin_3_append_of`,
        `zip_zip_unzip_snd_of_lengths`,
        `zip_zip_unzip_fst_unzip_fst_of_lengths`,
        `zip_notin_fst_pair`, `zip_notin_snd_pair`
    - Pure-Imperative lemmas -> new
      `Strata/DL/Imperative/CmdSemanticsProps.lean`
      (generalized from CoreStore/Expression.Ident to P : PureExpr,
      now in `Imperative` namespace):
        `subst_defined_tail`, `subst_nodup_tail`
    - Core-substitution helper moved out of `CallElimCorrect.lean`
      into `SubstSemanticsProps.lean`:
        `substDefined_of_app`

* Imports updated: `CallElimCorrect`, `SubstSemanticsProps`, and
  `Strata/DL/Imperative/Imperative.lean` (added new
  `CmdSemanticsProps` import).
@PROgram52bc

PROgram52bc commented Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

Fixed in 3710bb3. The theorem now has a conjunctive shape, refactored into three theorems: _terminal (private, unchanged), _exit (private, new for the call case), and a public conjunction. No external callers needed updating.

…ep-on-main2

# Conflicts:
#	Strata/Transform/CallElimCorrect.lean
Layer-A small-step semantics for procedure calls now propagates the
callee body's terminal hasFailure flag to the caller. Previously,
EvalCommand.call_sem hardcoded the output flag to `false`, silently
dropping any failures inside callee bodies (a soundness gap inherited
from the original semantics).

The change exposes a previously-vacuous case in
`evalCommand_failure_implies_assert_ff` (StatementSemanticsProps.lean):
when call_sem yields `true`, the failing assert lives somewhere inside
the callee body, not at the call site itself, so the lemma's existing
existential `∃ a, coreIsAtAssert (.stmt (.cmd c) ρ) a ∧ …` cannot be
discharged structurally.

Closure strategy: a new precondition `CalleesNoFailure π φ` asserts
that every procedure body in `π`, run from a non-failing init env,
terminates with `hasFailure = false`. This is threaded through both
`evalCommand_failure_implies_assert_ff` and `core_noFailure_preserved`.
The call_sem arm of the former is then discharged by applying
`hCallees` to the body trace bound by call_sem and contradicting the
resulting `false` against the `true` index.

Downstream: `procBodyVerify_procedureCorrect` (the sole consumer of
`core_noFailure_preserved`) takes `CalleesNoFailure` as a new
hypothesis. The hypothesis is exactly the body-noFailure clause that
clause 2 of `ProcedureCorrect` (this theorem's own conclusion)
establishes per procedure, so callers can discharge it inductively
across all procedures in `π` once the program is verified.

No new sorries, no axioms, no termination admits. Build clean
(488 jobs).
@aqjune-aws

Copy link
Copy Markdown
Contributor

Thanks for adding the exiting cases. It is good to see them proven :)

I have two high-level important comments which will more be related to what is to be proven.

  1. The call_sem of EvalCommandContract

You would have already figured out the issue in call_sem. To talk about its proper fix, let me introduce the difference between the small-step semantics of assume and assert first:

  • For assume, if the program state doesn't satisfy its condition, it is simply stuck; it doesn't proceed.
  • For assert, even if the program state doesn't satisfy the condition, it proceeds; it only updates the hasFailure flag.
    You can confirm this from EvalCmd which has eval_assert_pass, eval_assert_fail and eval_assume. This is designed in this way so that even if a program includes multiple asserts and the first assert fails, we can still talk about the second assert individually (without using hasFailure, because hasFailure will already have been marked as true in this case).

Now coming to call_sem... When we are calling a procedure and the procedure has pre-post condition, from the caller's side I think a natural flow would be (1) we assert the precondition, (2) execute the body, and (3) assume the postcondition. It means that actually this part is too strong:

    (∀ pre, (Procedure.Spec.getCheckExprs p.spec.preconditions).contains pre →
      isDefinedOver (HasVarsPure.getVars) σAO pre ∧
      δ σAO pre = .some HasBool.tt) →

This has to be something like this:

    preconditionPassed <-> (∀ pre, (Procedure.Spec.getCheckExprs p.spec.preconditions).contains pre →
      isDefinedOver (HasVarsPure.getVars) σAO pre ∧
      δ σAO pre = .some HasBool.tt) →
...
    ----
    EvalCommandContract π δ σ (.call n callArgs md) σ' (\neg preconditionPassed)
  1. Connection between EvalCommand and EvalCommandContract

I thought EvalCommand should be the ground-truth operational semantics for procedure call in a closed program, because when we do a procedure call, its interpreter will actually get into the body of the procedure call and concretely execute stuffs. How are EvalCommand and EvalCommandContracts related? Is there a theorem stating equivalence or refinement relation betweeen these two?

…icsProps -> SubstProps

Address PR #1306 follow-up review comment: the *Semantics qualifier in
the prior names is misleading since CoreTransform.lean and SubstSemantics.lean
do not define formal semantics — they hold transform machinery and
substitution lemmas respectively. The Props-suffix convention should
match the root file name without an inserted Semantics qualifier.

  Strata/Transform/CoreTransformSemanticsProps.lean
    -> Strata/Transform/CoreTransformProps.lean
  Strata/Transform/SubstSemanticsProps.lean
    -> Strata/Transform/SubstProps.lean

Updated importers (CallElimCorrect.lean, SubstProps.lean) and one stale
docstring reference in CmdSemanticsProps.lean.
@PROgram52bc

Copy link
Copy Markdown
Contributor Author

The call_sem of EvalCommandContract

Thanks for pointing it out. That makes sense. I will weaken the precondition requirement in call_sem, and adapt the proof accordingly.

Connection between EvalCommand and EvalCommandContract

Here is a potential theorem we could prove to connect the two. It is currently commented out because the contract version doesn't create the extra variables in the store, which makes the proof a little more complicated than it needs to be. But thanks for the question -- I will try to look into that next.

…ondition

Per @aqjune-aws review feedback, model the procedure-call contract
following the assert/assume distinction in EvalCmd:

* Caller asserts the precondition (flag-encoded violation).
* Body abstraction = havoc + postcondition assume (stuck on violation).
* Caller assumes the postcondition (no flag, only stuckness).

Implementation: split the existing single `call_sem` constructor into
two arms mirroring `eval_assert_pass` / `eval_assert_fail`:

* `call_sem` (renamed in spirit to "pass arm" via docstring) — the
  precondition holds (∀ pre, … = some tt); execution proceeds through
  havoc + postcondition assume; conclusion σ' false. Premise list and
  binders unchanged from the prior single rule, so all downstream
  destructures (notably CallElimCorrect.lean's `cases Hcc | call_sem`
  at lines 1393-1398) continue to work verbatim.

* `call_sem_pre_fail` (new) — some precondition fails (∃ pre, … = some ff);
  execution halts at the assert violation; store unchanged (σ' = σ),
  flag = true. Mirrors eval_assert_fail. Drops post-precondition
  premises (HavocVars / postcondition assume / output ReadValues /
  UpdateStates) since execution does not reach those phases.

The new fail arm is currently inert in proofs because:

* CallElimCorrect's terminal arm extracts `Hcc : EvalCommandContract …
  σ' false` (failure index pinned to false). Lean's index discrimination
  during `cases Hcc` auto-discharges the new arm — its conclusion `σ
  true` cannot unify with the σ' false hypothesis.

* core_noFailure_preserved is stated against EvalCommand (concrete
  semantics), not EvalCommandContract, so it is unaffected.

Build clean (488 jobs); zero new sorries; no axioms. Pre-existing
sorries unchanged at 22 (only 2 outside comment blocks: ProgramWF and
the inactive EvalCallBodyRefinesContract bridge).

Diff: Strata/Languages/Core/StatementSemantics.lean +27/-1.

Postcondition violations remain stuck-on-violation (no derivation),
mirroring `eval_assume`'s single-rule design. The verifier's
postcondition discharge happens on the callee side via
`procBodyVerify_procedureCorrect` (clause 2 of ProcedureCorrect), not
at the contract rule.
The dormant Core refinement bridge from concrete EvalCommand to
abstract EvalCommandContract has carried a long-standing `sorry`
inside a `/- ... -/` block-comment at StatementSemanticsProps.lean's
EvalCallBodyRefinesContract since commit 74e7be4 (2025-08-08).
This commit closes that sorry and brings the leaf bridge theorem out
of the comment block.

Proof sketch: given concrete `call_sem` premises (body trace
`CoreStepStar π φ (.stmts p.body ⟨σAO, δ, false⟩) (.terminal ρ')`,
postcondition validity `δ ρ'.store post = some HasBool.tt`,
ReadValues from ρ'.store, UpdateStates writeback), construct a
matching `EvalCommandContract.call_sem` derivation by:

  1. Take σO := updatedStates σAO (ListMap.keys p.header.outputs)
     modvals where modvals is the concrete-trace ReadValues witness.
  2. Discharge `HavocVars σAO outputs σO` by induction on outputs
     (havoc admits any value; we choose modvals[i] for output i).
  3. Discharge `ReadValues σO outputs modvals` directly from σO's
     definition via `readValues_updatedStatesSame`.
  4. Discharge `δ σO post = some HasBool.tt` via congruence: σO and
     ρ'.store agree at output keys (both = modvals[i] there); under
     `p_post_scoped`, postcondition free vars are confined to outputs;
     by `WellFormedSemanticEvalExprCongr`, agreement on FVs implies
     equal evaluation; transport concrete Hpost to σO.
  5. UpdateStates is reused verbatim from the concrete trace.

The theorem now takes three new outer hypotheses, each mathematically
true and derivable from upstream WF predicates:

  * `WellFormedSemanticEvalExprCongr δ` — a standard evaluator
    well-formedness assumption (existing congruence machinery).
  * `(ListMap.keys p.header.outputs).Nodup` — output parameter names
    are distinct in the procedure header.
  * `p_post_scoped : ∀ post ∈ postconditions, getVars post ⊆
    p.header.outputs.keys` — postconditions only mention output
    parameters (a scoping assumption typical for procedure contracts).

Implementation note: the destructure `cases H | call_sem` uses the
`generalize hf : (false : Bool) = b at H` trick (borrowed from
`evalCommand_failure_implies_assert_ff`) because the `false` index
cannot auto-substitute under `cases` when ρ' is existentially bound
in the constructor.

The five dependent transit theorems (EvalCommandRefinesContract,
StepStmt_refines_contract, StepStmtStar_refines_contract,
EvalStatementsRefinesContract, EvalStatementRefinesContract) remain
inside a fresh `/- ... -/` block following the proof, with an
explanatory header noting they need the three new outer hypotheses
threaded through. This is a follow-up; closing the leaf bridge is the
load-bearing step.

Build clean (488/488 jobs); zero new sorries; no axioms; no
termination admits. Net diff: +109/-12 LoC in
Strata/Languages/Core/StatementSemanticsProps.lean.
Unify `EvalCommandContract.call_sem` to be parameterized by a Bool
failure flag indicator. The precondition status is connected to `failed`
via an iff: the call fails iff some precondition fails to evaluate to
`tt` at `σAO`. On the success path (`failed = false`) the result store
is produced by havoc + write-back via `UpdateStates`; on the failure
path (`failed = true`) the result store is unchanged.

Revive the 5 refinement-transit theorems (View B: f = false-only)
threading 3 outer hypotheses (`δ_wfCong`, `π_wf` packaging
`p_outputs_nodup` and `p_post_scoped`, plus a `h_no_fail` non-failure
witness that back-propagates via `StepStmtStar_hasFailure_monotone`):

  - EvalCommandRefinesContract
  - StepStmt_refines_contract
  - StepStmtStar_refines_contract
  - EvalStatementsRefinesContract
  - EvalStatementRefinesContract

Adjust the `call_sem` destructure in CallElimCorrect.lean to
re-synthesize the legacy combined `Hpre` and the success-path
`Hupdate` from the new bool-indicator-shaped premises (the destructure
site is pinned to `failed = false`).
Resolves two conflicts introduced by upstream commit 0044067 (#1327)
"Factor out theorems and lemmas into dedicated Props files":

* Strata/Languages/Core/StatementSemanticsProps.lean — drop the local
  `import all Strata.DL.Imperative.SemanticsProps` (file deleted upstream)
  and adopt upstream's `Strata.DL.Imperative.StmtSemanticsProps` import,
  which now re-houses `StepStmtStar_hasFailure_monotone` and friends.
* Strata/DL/Imperative/CmdSemanticsProps.lean — add/add conflict.  Take
  upstream's full content (isDefined / isNotDefined / Init/UpdateState /
  set commutation lemmas) and append the local `subst_defined_tail` and
  `subst_nodup_tail` helpers, plus the Nodup import they require.

Follow-up fixes after the merge:

* Strata/Transform/SubstProps.lean and CallElimCorrect.lean — update
  `import Strata.DL.Lambda.Lambda` references to `Strata.DL.Lambda` for
  the Lambda.lean → Lambda.lean rename (also from #1327).
* Strata/Languages/Core/StatementSemanticsProps.lean — pass explicit
  `Expression`, `EvalCommand π φ`, `EvalPureFunc φ` arguments to
  `StepStmtStar_hasFailure_monotone` (the upstream version of the lemma
  binds these from the surrounding section's `variable` declarations,
  not as implicits).

`lake build` passes cleanly after resolution.
…sition

Audit-driven cleanup of branch htd/callelim-smallstep-on-main2 (no logic
changes; build clean at 490 jobs):

DELETIONS (~327 LoC removed across 6 files):
- Strata/Transform/CoreTransformProps.lean: removed 6 unused theorems
  (genOldExprIdentsTrip_snd, genIdentGeneratedWF,
   genOldExprIdentsTripGeneratedWF, genOldExprIdentsTrip_isOldTempIdent,
   genArgExprIdents_length, genOutExprIdents_length). Each grep-confirmed
  zero callers across Strata/.
- Strata/Transform/SubstProps.lean: removed 5-helper dead chain
  (createAsserts_list, createAssumes_list, H_check_block, H_asserts,
   H_assumes). The *_zip variants are the ones actually used by
  CallElimCorrect; these unzipped variants have no consumers.
- Strata/DL/Util/ListUtils.lean: removed
  List.nodup_append_three_disjoint (specialization of base lemma; no
  callers).
- Strata/Languages/Core/StatementSemanticsProps.lean: removed
  10-line commented InitVarsNotDefMonotone' stub (would not typecheck
  if uncommented).

COMMENT/DOCSTRING FIXES (~12 sites):
- Strata/Transform/CallElimCorrect.lean:
  - line 1091: 'six call-site WF clauses' → 'seven' (WFCallSiteProp
    has 7 fields); dropped legacy pointer to Hpre_post_lhs_disj
    (no longer in the codebase).
  - line 1221: 'two halves of legacy Hwfgenst' → 'three fields'
    (WFGenStateProp has 3 including wfgen).
  - line 1300: 'see WFCallSiteProp in Strata/Languages/Core/WF.lean'
    → 'see WFCallSiteProp above (line 1095 of this file)'.
  - line 1390: 'call_sem hardwires the failure flag to false' →
    accurate comment about Hcc being pinned to failed=false at this
    destructure (call_sem itself is failure-flag–parameterized).
- Strata/Transform/SubstProps.lean: dropped 5 dangling references to
  removed legacy theorems (Lambda.LExpr.substFvarCorrect,
  ReadValuesUpdatedStates, etc.).
- Strata/Transform/CoreTransformProps.lean: dropped 2 dangling
  references (ReadValuesUpdatedState, EvalExpressionUpdatedState).
- Strata/Languages/Core/StatementSemanticsProps.lean: fixed
  'assymmetry' → 'asymmetry' typo.

TODO DISPOSITION:
- Strata/Transform/CallElim.lean:171: replaced bare `/- TODO -/ false`
  with explicit comment documenting that the validation hook is
  intentionally conservative — call elimination replaces a callee
  body with its contract (over-approximation); the hook returns false
  until per-obligation proof witnesses are available. Logical value
  preserved.

REFINEMENT GENERALIZATION INVESTIGATED (no change):
- Investigated generalizing EvalCallBodyRefinesContract from
  `false → false` to a Bool failure-flag-indexed signature. Two
  structural obstacles make strict generalization infeasible:
  (1) σ' mismatch — concrete EvalCommand.call_sem always writes back
      via UpdateStates, while contract EvalCommandContract.call_sem
      forces σ' = σ when failed = true.
  (2) Precondition iff incompatibility — concrete failed = true
      (body asserts fail) carries Hpre saying ALL preconditions hold,
      but the contract iff says failed = false IFF all preconditions
      hold, so all-pre-hold forces contract failed = false.
  The current View B (false-only) signature is therefore retained.
  Bridging would require either weakening the contract's failure-path
  σ' = σ semantics or adding a procedure-correctness premise; both
  are out of scope for this cleanup pass.

Build clean (490 jobs); zero new sorries; no axioms. Working-tree
sorry inventory unchanged from pre-cleanup baseline (all sorries live
in ProgramWF/ProcedureWF/StatementWF/CmdEval — files this cleanup did
not touch).
@aqjune-aws

Copy link
Copy Markdown
Contributor

Hi David, thanks for working on this topic - I think that including the refinement proof might be a too big work & interfere with the unstructured CFG. I will be glad to work on the task.
In parallel, I am trying to port a few updates that fixes problems in small-step semantics and extending the definition of correctness of transformation.
Could you wait for a while until the updates from my end are delivered please? :)

@PROgram52bc

Copy link
Copy Markdown
Contributor Author

Hi June, Yes. I am still working on adapting the call elimination proof to work with a generic failure flag (the case where preconditions fail). So I can definitely wait for your updates while iterating on that. I think it would be good to reuse some of the definitions in your updates in the current call-elimination proof.

As for the refinement proof, I do think it is quite complicated with the failure flag integrated -- if we constrain it to failed = false, I believe the proof should be straightforward, but that's just a conjecture.

## Summary

Lifts `callElimStatementCorrect_terminal` from the parent branch's `f =
false → false` shape (PR #1306, success-only refinement) to fully
polymorphic `{f : Bool}`: a source trace ending in failure maps to a
target trace ending in failure, and a source success trace maps to a
target success trace, in one statement. Downstream consumers no longer
case-split on the failure flag at the call site.

This is a foundational refactor + generalization, not a feature.
Reviewers should follow the proof structure: Stages 1-3 are pure
extractions (success theorem unchanged), Path 1 narrows a public iff for
semantic alignment, Stage 6 (split into a scaffold and a fill-in) lands
the actual polymorphic-f result.

## Base branch

This PR sits on top of `htd/callelim-smallstep-on-main2` (PR #1306,
currently open). Merge into that branch, or rebase onto `main` once
#1306 lands.

## Why this matters

The parent's success-only theorem suffices to argue that
call-elimination preserves the *absence* of bugs, but it leaves a dual
gap: it does not say anything about preserving the *presence* of bugs.
Without the failure arm, callElim could in principle turn a buggy
program into one that verifies. Lifting `_terminal` to `{f : Bool}`
makes that argument compositional — the same lemma covers both
directions, and any future transform layered on top of call-elimination
inherits the polymorphic guarantee for free.

The failure arm also exercises corners of the contract semantics the
success arm never touches: the assert-zip on call-site preconditions,
havoc of out-parameters in the failure branch, and the assume-zip on
call-site postconditions all have to compose correctly under `hasFailure
= true`. Closing it flushed out a real semantic mismatch in
`EvalCommandContract.call_sem` (Path 1 below) that was invisible from
the success-only side.

## Commit-by-commit walkthrough

Read the seven commits in order:

### 1. 193dee2 — WIP: polymorphic-f infrastructure (no live consumers)

Forward-looking scaffolding, kept as a separate commit so that the
consumer-side proof changes in later commits aren't buried under
unrelated infrastructure. Lands in:

- `Strata/Languages/Core/StatementSemantics.lean` — **Change A**:
collapses the two conditional `Hupdate` premises of
`EvalCommandContract.call_sem` into a single unconditional
`UpdateStates` premise. The failure path no longer pins `σ' = σ`, which
is what makes a polymorphic-f rule shape coherent.
- `Strata/Languages/Core/WF.lean` — `WFPrePostProp.boolTyped` clause
added (zero migration cost, no live construction sites yet).
- `Strata/Transform/CoreTransformProps.lean` and
`Strata/Transform/SubstProps.lean` — new helpers
`singleCmdToStmts_poly`, `singletonAssumeEval_poly`, `H_havocs_poly`,
`H_check_block_zip_poly`, `H_assumes_zip_poly`,
`singletonAssertFailEval`, `singletonAssertEval_poly`, plus the
substantive new walker `H_asserts_zip_fail` (L4 flag-flip from `false`
to `true` via a failing-pre witness) and `EvalCallElim_glue_fail`
(composes L1-L6 with the L4 flip).

Build clean, sorry-free, axiom-free at this commit.

### 2. 580ef2d — Stage 1: extract `HoldEval_bridge_at_σO`

Pure refactor. Lifts the per-index δ-eval bridge for `mkOld`
old-variable fvars at the post-havoc store `σO` out of
`callElimStatementCorrect_terminal`'s call arm into a private theorem
with 11 explicit hypotheses. Net diff in
`Strata/Transform/CallElimCorrect.lean`: +114 / -61. Same proof body as
the inline version; the signature overhead is intentional because Stage
6 will share this helper between the success and failure arms. Sorry
count and axiom set unchanged.

### 3. 5406b08 — Stage 2a: extract `HoldSubBridge_at_σO`

Pure refactor in `Strata/Transform/CallElimCorrect.lean`. Lifts the
per-fvar bridge for `createOldVarsSubst`'s codomain at the L6
intermediate stores `σ_R1` / `σO` into a private theorem with 6
hypotheses; the inline block becomes a one-line call. Net diff: +82 /
-29. Sorry count and axiom set unchanged.

`HinputSubBridge` (the larger sibling, originally targeted for Stage 4)
is deferred, then ultimately made redundant by the flag-generalization
approach (see "Stages 4 & 5: skipped" below).

### 4. cf10129 — Stage 3: extract `b1`/`b2_var_witness_at_oldSubst`

Pure refactor in `Strata/Transform/CallElimCorrect.lean`. Lifts the two
class-(b) substitution-decomposition witnesses out of the call arm:

- `b1_var_witness_at_oldSubst`: when a `Map.find?` lands in the
`createOldVarsSubst` segment of `oldSubst_L6`, the codomain entry is a
fresh-old `createFvar gen` and the witness `var ∈ getVars w` forces `var
= genOldIdents[i]`.
- `b2_var_witness_at_oldSubst`: when the lookup misses
`createOldVarsSubst` and hits `callElim_inputOnlyOldSubst`, the codomain
is a positional `inArgs` element.

Both helpers take `proc'` and `args` directly so
`callElim_inputOnlyOldSubst` can be referenced by name (needed for
`find?_append_none_elim` unification at the consumer). 4 callsites each
in `_terminal` (inv + post variants); Stage 6's failure arm reuses the
same callsites with identical arguments. Net diff: +123 / -77.

### Stages 4 & 5: skipped

Originally planned to extract `HinputSubBridge` and a final
shared-plumbing block. The flag-generalization approach made these
unnecessary — the call branch in `_terminal` `cases` on `f`, and the
`false` arm reuses its existing 2200-LoC body verbatim, so we never
needed shared sub-helpers between the two arms. Keeping the inline block
on the `false` side was strictly cheaper than extracting it.

### 5. fdf2fc4 — Path 1: narrow `EvalCommandContract.call_sem` iff to
non-Free preconditions

This is the semantic alignment that makes the failure arm provable.
`callElimCmd` only emits asserts for non-`Free` preconditions (Free
requires are assumed by the implementation, not checked at call sites).
The previous `call_sem` iff ranged over **all** preconditions, which
made the polymorphic-f failure arm unattainable: a witness `pre` with
`eval ≠ tt` at `σAO` could fall in the Free segment, giving no L4 entry
to flip the flag.

That mismatch was harmless at `f = false` (the success arm never relies
on the iff direction that exposed it) but a hard blocker on the failure
arm. Restricting the iff to non-Free preconditions aligned the two
semantics without changing any existing proof:

- `Strata/Languages/Core/StatementSemantics.lean`: `call_sem` rule's
`Hpre_def` and `Hpre_iff` now range over `getCheckExprs
(preconditions.filter (·.attr ≠ .Free))`.
- `Strata/Languages/Core/StatementSemanticsProps.lean`:
`EvalCallBodyRefinesContract` / `EvalCommandRefinesContract` /
`StepStmt[Star]_refines_contract` / `EvalStatement[s]RefinesContract`
gain `[LawfulBEq Expression.Expr]` instances (needed for the
contains↔mem bridge in the new `presFiltered_subset` derivation that
powers the rule constructor).
- `Strata/Transform/CallElimCorrect.lean`: success arm's `Hpre_evalTt` /
`Hpre` re-synthesis ranges over filtered preconditions; `HpreFiltered`
no longer needs `filterCheck_in_getCheckExprs` — direct membership in
`presFiltered` transports straight to the filtered contains-form.
Failure-arm helper signature updated to take `Hpre_def` / `Hpre_iff` at
the new filtered shape.

Build clean (490 jobs) with one expected sorry at line 1654 (failure-arm
body, filled by the next commit).

### 6. 45db77d — Stage 6 scaffold: lift `_terminal` to `{f : Bool}`,
delegate failure arm

Shape change without filling in the body. In
`Strata/Transform/CallElimCorrect.lean`,
`callElimStatementCorrect_terminal`'s signature lifts from `f = false →
false` to `{f : Bool}`, with `Heval` and the conclusion both at flag
`f`. Non-call cases close polymorphically via `nc_close` (`Heval` passes
through unchanged for identity statements). The call branch dispatches:

```lean
cases f with
| false => existing 2200-LoC body verbatim (f=false case)
| true  => exact callElimStatementCorrect_terminal_call_arm_fail ...
```

The `_terminal_call_arm_fail` helper is declared with full signature
(~25 hypotheses from the `cases Hcc` destructure) and a `sorry` body,
ready for the final commit. Splitting the signature lift from the proof
close means the lifted public signature is reviewable and buildable on
its own, and the failure-arm proof body lands against an already-stable
interface. Sorry count: 18 → 19 (temporary).

### 7. be35c7f — Stage 6: close polymorphic-f failure arm + lift
public signature

Replaces the sorry-stubbed
`callElimStatementCorrect_terminal_call_arm_fail` in
`Strata/Transform/CallElimCorrect.lean` with the full proof body (~2,148
LoC) that mirrors the success arm's L1-L3 + L5/L6 plumbing while
flag-flipping at L4:

- **L1, L2, L3**: identical to success (init segments produce `f = false
→ false`).
- **L4 (asserts)**: `H_asserts_zip_fail` with failing-pre witness.
- **L5 (havocs)**: `H_havocs_poly` at `f := true`.
- **L6 (assumes)**: `H_assumes_zip_poly` at `f := true`.
- **Glue**: `EvalCallElim_glue_fail` composing `false → false → false →
false → true`.

Failing-pre witness derivation (now unblocked by Path 1):

1. `Hpre_iff` at `flag = true`: `Classical.em` on "all preconditions
eval to `tt`" → contrapositive gives `∃ pre ∈ presFiltered, δ σAO pre ≠
some tt`.
2. Bool-totality from new `WFCallSiteProp.preBoolTyped` field (mirrors
the `boolTyped` clause on `WFPrePostProp`): `δ σAO pre = some tt ∨ ff`.
3. Combine → `∃ pre, δ σAO pre = some HasBool.ff`.
4. Transport `σAO ↦ σ_old` via `subst_fvars_correct` against
`Hsubst_L4`.

WF additions in `Strata/Languages/Core/WF.lean`:

- `WFCallSiteProp.preBoolTyped` (~line 1156): per-precondition
bool-totality.
- `WFCallSiteSpec.preBoolTyped` (~line 1207): per-call specialization.
- `WFCallSiteProp.specialize` forwards the new field.

The exit arm is unchanged — call statements never produce `.exiting`.
Sorry count: 19 → 18 (back to baseline). Build clean, no new axioms.

## Polymorphic helpers — all genuinely consumed

| Helper | Source | Callsites |
| --- | --- | --- |
| `EvalCallElim_glue_fail` | `Strata/Transform/CoreTransformProps.lean`
| 1 |
| `H_asserts_zip_fail` | `Strata/Transform/SubstProps.lean` | 1 |
| `H_havocs_poly` | `Strata/Transform/CoreTransformProps.lean` | 1 |
| `H_assumes_zip_poly` | `Strata/Transform/CoreTransformProps.lean` | 1
|
| `WFPrePostProp.preBoolTyped` | `Strata/Languages/Core/WF.lean` | 1 |

Helpers are deliberately single-callsite for now: the polymorphic-f
infrastructure is staged to be reusable, but until a second consumer
materializes the helpers stay scoped to this proof and we avoid
speculative API design. No dormant scaffolding.

## Soundness

- **Build**: clean (490 jobs)
- **New sorries**: 0 (baseline 18 in `Strata/Languages/Core/WF.lean` are
pre-existing and unrelated to this work)
- **New admits**: 0
- **New axioms**: 0
- **`#print axioms callElimStatementCorrect_terminal`**: `[propext,
Classical.choice, Quot.sound]` — Lean stdlib only, no project-local
axioms
- **Helper consumption**: all 4 polymorphic helpers + `preBoolTyped`
have ≥ 1 live callsite (verified above)
- **Path 1 conservativity**: the narrowed `call_sem` iff matches
`callElimCmd`'s actual emission rule (asserts only for non-Free
precondition components); success-arm reproof is mechanical

## Commit list

```
be35c7f Stage 6: close polymorphic-f failure arm + lift public signature
fdf2fc4 Path 1: narrow EvalCommandContract.call_sem iff to non-Free preconditions
45db77d Stage 6 scaffold: lift _terminal to {f : Bool}, delegate failure arm
cf10129 Stage 3: extract b1/b2_var_witness_at_oldSubst helpers
5406b08 Stage 2a: extract HoldSubBridge_at_σO helper
580ef2d Stage 1: extract HoldEval_bridge_at_σO helper
193dee2 WIP: polymorphic-f infrastructure (no live consumers)
```

## Test plan

- [x] `lake build` clean across all 490 jobs
- [x] Zero new sorries introduced (baseline 18 in
`Strata/Languages/Core/WF.lean` preserved)
- [x] Zero new admits/axioms introduced
- [x] `#print axioms callElimStatementCorrect_terminal` → Lean stdlib
only (`[propext, Classical.choice, Quot.sound]`)
- [x] Each polymorphic helper genuinely consumed
(`EvalCallElim_glue_fail`, `H_asserts_zip_fail`, `H_havocs_poly`,
`H_assumes_zip_poly`, `WFPrePostProp.preBoolTyped` — 1 callsite each)
- [ ] Reviewer: spot-check Stages 1 / 2a / 3 (580ef2d, 5406b08,
cf10129) are pure refactors by diffing helper bodies against the
inline blocks they replaced
- [ ] Reviewer: confirm Path 1 (fdf2fc4) preserves success-arm
soundness — `Hpre_evalTt` re-synthesis over filtered preconditions is
the only consumer-side change in the success arm
- [ ] Reviewer: confirm Change A in 193dee2 (collapsed `UpdateStates`
premise on `call_sem`) preserves the parent branch's `false → false`
semantics
- [ ] Downstream consumers of `_terminal` (none on
`htd/callelim-smallstep-on-main2` today; re-check after rebase if #1306
lands first)
## Summary

Round 3 tier-1 cleanups for the polymorphic-`f` callElim work merged in
#1340. These commits were authored on top of #1340's head before the
squash merge but were not included; this PR brings them onto the parent
branch.

Net delta: **-287 LoC** (5 files, +181/-468) across 19 small, atomic
commits. No new sorries, no new axioms, no behavioral change.

## Highlights

| Commit | Description | LoC |
|---|---|---|
| 2bb66b3 | R3-1: collapse `WFCallSiteProp` into `WFCallSiteSpec` via
def-wrapper | -64 |
| ca67ea7 | R3-2: drop f=false corollaries; rely on poly versions |
-121 |
| 680e092 | R3-12: hoist `π/φ/δ` binders into file-scope `variable`
block | -10 |
| 26bf9ae | R3-10: extract `havocVars_3layer_lift` helper | -23 |
| ed3725d | R3-8: add `Procedure.Spec.checkedPreconditions` abbrev |
-12 |
| 7b6fbb3 | R3-19: simplify HentryFail mem-zip pattern in fail arm |
-10 |
| 8e3bc3f | R3-32: flatten nested mem_append cases via rcases | -14 |

Plus 12 smaller items: bool-totality cleanup, redundant hypothesis
drops, helper extractions, doc fixes (stale "seven" counts, removed
references to deleted `WFPrePostProp.boolTyped`, fixed inverted section
header hierarchy, dropped legacy `Hwfgenst` references).

## Soundness

- **Build clean**: 490 jobs, no warnings or errors
- **Sorries**: 0 in modified files
- **Axioms**: `callElimStatementCorrect` depends only on `[propext,
Classical.choice, Quot.sound]` — Lean stdlib only
- **No behavioral change**: each commit is a refactor verified by `lake
build`

## Test plan
- [ ] CI green
- [ ] No sorry/axiom regressions vs base
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants