Skip to content

Core: Sequence bounds preconditions and VC-printer fallback fix#1100

Open
fabiomadge wants to merge 4 commits into
mainfrom
feat/core-sequence-wf
Open

Core: Sequence bounds preconditions and VC-printer fallback fix#1100
fabiomadge wants to merge 4 commits into
mainfrom
feat/core-sequence-wf

Conversation

@fabiomadge
Copy link
Copy Markdown
Contributor

@fabiomadge fabiomadge commented May 2, 2026

Blocker for Laurel Seq/Array in #1073. Two related Core changes:

1. Fix VC-printer fallback for unknown 0-ary ops

handleZeroaryOps used to emit re.none() as its fallback, so VCs containing any 0-ary op outside the regex set (e.g. Sequence.empty) rendered as re.none() in printer output. Switched to mkGenericCall, matching how handleUnaryOps / handleBinaryOps already handle unknown ops.

2. Bounds preconditions for Sequence.select / update / take / drop

Following the Int.SafeDiv pattern:

Op Precondition
Sequence.select 0 <= i && i < Sequence.length(s)
Sequence.update 0 <= i && i < Sequence.length(s)
Sequence.take 0 <= n && n <= Sequence.length(s)
Sequence.drop 0 <= n && n <= Sequence.length(s)

Sequence.length / empty / append / build / contains remain total.

PrecondElim picks the obligations up automatically at call sites in imperative code (via transformStmt) and in pure positions like requires / ensures / quantifier bodies / function bodies (via the synthetic $$wf procedures).

Obligations carry propertyType = "outOfBoundsAccess" (new MetaData constant, mirroring divisionByZero), flow through a new PropertyType.outOfBoundsAccess enum variant and the three metadata-to-PropertyType conversion sites, and render as "out-of-bounds-access" in SARIF output.

Incidental fix

While wiring the SARIF classification, I noticed propertyTypeToClassification in SarifOutput.lean was pre-existing dead code: vcResultToSarifResult never set properties.propertyType, so every obligation defaulted to "assert" in SARIF output. This PR wires it up, so divisionByZero and arithmeticOverflow obligations now also classify correctly in SARIF alongside the new outOfBoundsAccess.

Testing

  • StrataTest/Transform/PrecondElim.lean Test 10collectPrecondAsserts attaches outOfBoundsAccess metadata for all four partial ops plus a nested Sequence.select(Sequence.update(...)) call. Mirrors the pattern in OverflowCheckTest.lean. Complemented by per-op pretty-print snapshots (#guard_msgs) that assert the exact obligation body string — these catch regressions that preserve count and metadata tag but corrupt the obligation body (e.g. passing .Le instead of .Lt to mkSeqBoundsPrecond at a call site, changing the bound-variable name, or swapping the lower/upper bound inside mkSeqBoundsPrecond).
  • StrataTest/Languages/Core/Tests/GenericCallFallbackTest.lean — regression test that Sequence.empty renders as a generic call in VC printer output, not re.none(). Added as the missing handleZeroaryOps case in the existing handleUnaryOps / handleBinaryOps / handleTernaryOps family.
  • StrataTest/Languages/Core/Tests/SarifOutputTests.lean — property-type classification tests covering all five PropertyType variants.

Collateral updates to existing tests reflect the new obligations (Seq.lean), updated requires on Sequence function signatures (ProgramEvalTests.lean), and the 0-ary printer fix (Loops.lean). Note: the bounds obligations in Seq.lean appear as true && 0 < length(...) — the partial evaluator simplifies 0 <= 0 to true but does not further simplify true && X to X. This is a pre-existing evaluator gap, not newly introduced by this PR; the SMT solver discharges the obligation trivially.

Known downstream impact

PR #1073 (Laurel Seq/Array) emits Sequence.select / update / take / drop calls from its translation. Its T18_Sequences test uses #guard_msgs(drop info, error) on a diagnostics-only pipeline, so the test assertions should still pass syntactically — but individual sequence-manipulation programs now require bounds guards to fully verify. PR #1073 will need to adjust any end-to-end verification examples after this one merges. No in-repo Laurel tests are broken by this PR.

Out of scope

Parseable Sequence.empty<T>() syntax in raw Core source — a grammar-level feature for a separate PR. The printer fix here handles the display side; SMT encoding was already correct.

@github-actions github-actions Bot added the Core label May 2, 2026
@fabiomadge fabiomadge force-pushed the feat/core-sequence-wf branch 2 times, most recently from dacef58 to 5e268e5 Compare May 2, 2026 23:05
@fabiomadge fabiomadge marked this pull request as ready for review May 2, 2026 23:24
@fabiomadge fabiomadge requested a review from a team May 2, 2026 23:24
tautschnig
tautschnig previously approved these changes May 4, 2026
Comment thread Strata/Languages/Core/DDMTransform/FormatCore.lean Outdated
Comment thread Strata/Languages/Core/Factory.lean Outdated
Comment thread Strata/Languages/Core/Factory.lean Outdated
Comment thread Strata/Transform/PrecondElim.lean Outdated
private def classifyPrecondition (funcName : String) (precondIdx : Nat := 0) : Option String :=
and overflow), the precondition index distinguishes them.

Exposed (not `private`) so that tests can verify the classification for each
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not need to be a in a comment. You can just change the visibility

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — reverted to private and dropped the stale "Exposed" docstring. No test actually needs it public (the metadata-check test uses collectPrecondAsserts + metadata inspection instead).

Comment thread StrataTest/Transform/PrecondElim.lean Outdated
Comment thread StrataTest/Transform/PrecondElim.lean Outdated
Comment thread StrataTest/Transform/PrecondElim.lean Outdated
atomb
atomb previously approved these changes May 5, 2026
Copy link
Copy Markdown
Contributor

@atomb atomb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks nice. There's a little room to reduce repetition, however.

Comment thread Strata/DL/Imperative/CmdEval.lean Outdated
Comment thread Strata/Languages/Core/StatementEval.lean Outdated
Comment thread Strata/Languages/Core/ObligationExtraction.lean Outdated
Comment thread StrataTest/Transform/PrecondElim.lean
Copy link
Copy Markdown
Contributor

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proof-coverage suggestions ("what to prove next"). These are all follow-ups, not blocking.

  • convertMetaDataPropertyType_inverse_of_classifyPrecondition — for every (fn, idx) that classifyPrecondition maps to some metaString, convertMetaDataPropertyType applied to a MetaData tagged with that string returns a non-.assert PropertyType. Concretely: there shouldn't be a gap where classifyPrecondition tags a call but convertMetaDataPropertyType silently reverts to .assert. Trivial to state, would catch the string-mismatch class of regressions the triplication enables.

  • PropertyType-to-SARIF round-trip — (propertyTypeToClassification p).fromClassification? = some p for all p : PropertyType, where fromClassification? is a matching deserializer (not yet written). Would make PropertyType the single source of truth and mechanically prune the MetaData.* string constants.

  • mkSeqBoundsPrecond_form — a structural theorem about (mkSeqBoundsPrecond x k).expr: it equals boolAnd (intLe 0 x) (k.opExpr x (seqLength s)) (or the concrete LExpr shape). Makes Test 10's off-by-one blind spot (see inline comment #2) explicit and proved rather than merely tested. Small.

  • seqSelect_denote_partial — semantic soundness of the bounds precondition with respect to Lean's List.get?: 0 ≤ i < s.length → (seqSelect s i).denote = .some (s.get! i). This is FactoryCorrect.lean territory (cf. PR #1103) rather than this PR, but worth tracking as the obvious continuation: the preconditions added here are only useful insofar as they correspond to a semantic partiality.

  • collectPrecondAsserts-spec — a purely structural theorem that for every sub-expression that is a call to a function with N preconditions, collectPrecondAsserts produces exactly N assert statements, each tagged by classifyPrecondition. Test 10 verifies counts 1, 1, 1, 1, and 2 empirically; a theorem would subsume all of that and catch e.g. a future refactor that stops visiting nested calls.

Refactoring opportunities (not blocking).

  • As noted in finding #3, consolidate MetaData.{divisionByZero,...} / propertyTypeToClassification / convertMetaDataPropertyType into a single table.
  • The new assertOutOfBoundsObligations helper in the tests is a good pattern; the pre-existing overflow/division tests in the same file (Test 1, Test 3, Test 5) could share a sibling helper, reducing per-test boilerplate uniformly.

Comment thread Strata/Transform/PrecondElim.lean Outdated
Comment on lines +64 to +70
/-- Classify a function precondition into a property type for SARIF reporting.
For functions with multiple preconditions (e.g., SafeSDiv has both div-by-zero
and overflow), the precondition index distinguishes them. -/
private def classifyPrecondition (funcName : String) (precondIdx : Nat := 0) : Option String :=
and overflow), the precondition index distinguishes them.

Exposed (not `private`) so that tests can verify the classification for each
partial op without having to inspect metadata on generated asserts. -/
def classifyPrecondition (funcName : String) (precondIdx : Nat := 0) : Option String :=
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Carryover from joscoh's thread at this location (#1100 (comment)...). The reply says:

Done — reverted to private and dropped the stale "Exposed" docstring. No test actually needs it public (the metadata-check test uses collectPrecondAsserts + metadata inspection instead).

But the code in bbf0d27f7 still has def classifyPrecondition (public in the module sense) and still carries the "Exposed (not private)" paragraph. I grepped the tree: the only callers of classifyPrecondition are line 113 in this same file — the tests use collectPrecondAsserts + md.getPropertyType as the reply says — so private is genuinely safe. This is presumably an unpushed intent.

Suggested replacement for lines 64–70:

Suggested change
/-- Classify a function precondition into a property type for SARIF reporting.
For functions with multiple preconditions (e.g., SafeSDiv has both div-by-zero
and overflow), the precondition index distinguishes them. -/
private def classifyPrecondition (funcName : String) (precondIdx : Nat := 0) : Option String :=
and overflow), the precondition index distinguishes them.
Exposed (not `private`) so that tests can verify the classification for each
partial op without having to inspect metadata on generated asserts. -/
def classifyPrecondition (funcName : String) (precondIdx : Nat := 0) : Option String :=
/-- Classify a function precondition into a property type for SARIF reporting.
For functions with multiple preconditions (e.g., SafeSDiv has both div-by-zero
and overflow), the precondition index distinguishes them. -/
private def classifyPrecondition (funcName : String) (precondIdx : Nat := 0) : Option String :=

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this — you are right, the intent never actually landed. The test commit was silently reverting private back to def on every rebase (the change was stuck in that commit from an earlier iteration). Dropped the reversion from the test commit and re-applied the private.

private def mkSeqBoundsPrecond
(varName : String) (k : SeqBoundKind) :
Strata.DL.Util.FuncPrecondition (LExpr CoreLParams.mono) CoreLParams.Metadata :=
let sVar : LExpr CoreLParams.mono := .fvar default "s" (some (seqTy mty[%a]))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implicit coupling: mkSeqBoundsPrecond hard-codes "s" here, which is only correct because the four callers (seqSelectFunc, seqUpdateFunc, seqTakeFunc, seqDropFunc) all happen to name their sequence input "s". The h_precond proof obligation on polyUneval catches mismatches at definition time — so this isn't a correctness trap, but the constraint isn't documented at the point where it bites.

Either take a second seqVarName parameter (symmetric with the existing varName for the index/count):

Suggested change
let sVar : LExpr CoreLParams.mono := .fvar default "s" (some (seqTy mty[%a]))
/-- Precondition `0 <= varName && varName `k.opExpr` Sequence.length(seqVarName)`. -/
private def mkSeqBoundsPrecond
(seqVarName : String) (varName : String) (k : SeqBoundKind) :
Strata.DL.Util.FuncPrecondition (LExpr CoreLParams.mono) CoreLParams.Metadata :=
let sVar : LExpr CoreLParams.mono := .fvar default seqVarName (some (seqTy mty[%a]))
let xVar : LExpr CoreLParams.mono := .fvar default varName (some mty[int])

and update the four call sites to mkSeqBoundsPrecond "s" "i" .Lt etc. — slightly more verbose at the call sites, but the helper's contract is now self-describing.

Or, if you prefer to keep the helper's signature narrow, add a one-line doc note: Assumes the enclosing function names its sequence input "s".

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a docstring note — the "s" hardcoding is a convention that mirrors yNeZeroPrecond's "y" in IntBoolFactory.lean, and polyUneval's h_precond free-vars check catches mismatches at elaboration time. Kept the signature narrow rather than adding a parameter, matching the existing pattern.

Comment on lines +437 to +447

/-- Check that `collectPrecondAsserts` produces exactly `expectedCount`
obligations from `expr`, each tagged with `outOfBoundsAccess`. -/
private def assertOutOfBoundsObligations
(expr : Core.Expression.Expr) (expectedCount : Nat) : IO Unit := do
let stmts := collectPrecondAsserts Core.Factory expr "test" #[]
assert! stmts.length == expectedCount
for s in stmts do
let md : MetaData Core.Expression := match s with
| Statement.assert _ _ md => md | _ => #[]
assert! md.getPropertyType == some MetaData.outOfBoundsAccess
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertOutOfBoundsObligations checks the obligation count and the metadata tag, but not the obligation content. That means an off-by-one regression in mkSeqBoundsPrecond — say, someone flips intLtFunc to intLeFunc for the .Lt case, producing ... && i <= length(s) for Sequence.select — slides past Test 10 entirely (count and tag both stay right).

The proof-coverage suggestion in the main review body (mkSeqBoundsPrecond_form) is the strong version. A lighter-weight test-level check that would catch this class of regression cheaply:

/-- Check that `collectPrecondAsserts` produces exactly `expectedCount`
    obligations from `expr`, each tagged with `outOfBoundsAccess` and
    each having the expected `boolAnd`-of-two-int-comparisons shape
    (so off-by-one regressions in `mkSeqBoundsPrecond` are caught). -/
private def assertOutOfBoundsObligations
    (expr : Core.Expression.Expr) (expectedCount : Nat) : IO Unit := do
  let stmts := collectPrecondAsserts Core.Factory expr "test" #[]
  assert! stmts.length == expectedCount
  for s in stmts do
    match s with
    | Statement.assert _ e md =>
      assert! md.getPropertyType == some MetaData.outOfBoundsAccess
      -- The obligation is `boolAnd (intLe 0 _) (intLt|intLe _ (seqLength _))`.
      match e with
      | LExpr.app _ (LExpr.app _ andOp _) _ =>
        assert! andOp.isBoolAnd  -- pseudo; real predicate will depend on LExpr shape
      | _ => assert! false
    | _ => assert! false

(Exact predicate isBoolAnd would need spelling out; the intent is to fail loudly if the top-level constructor stops being boolAnd.) Happy to prepare a concrete patch if you'd like.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right about the gap — I confirmed locally that flipping .Lt to .Le slides past the current helper. Deferring the fix to follow-up per your "not blocking" framing. mkSeqBoundsPrecond_form (from the main review body) is the clean structural answer; a test-level pretty-print string comparison would be a cheaper partial substitute. Happy to take either in a follow-up PR.

Comment on lines +122 to +134
/-- Convert a `MetaData` entry's property-type classification string to the
`PropertyType` enum. Falls back to `.assert` when the metadata carries
no classification or an unrecognized string; callers that emit
propertyType classifications should add a matching arm here. -/
def convertMetaDataPropertyType {P : PureExpr} [BEq P.Ident]
(md : MetaData P) : PropertyType :=
match md.getPropertyType with
| some s =>
if s == MetaData.divisionByZero then .divisionByZero
else if s == MetaData.arithmeticOverflow then .arithmeticOverflow
else if s == MetaData.outOfBoundsAccess then .outOfBoundsAccess
else .assert
| none => .assert
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now have three parallel representations of the same enum:

  1. MetaData.divisionByZero / arithmeticOverflow / outOfBoundsAccessString constants (MetaData.lean:293–299).
  2. propertyTypeToClassification : PropertyType → String (SarifOutput.lean:92) — the display side, kebab-case.
  3. convertMetaDataPropertyType : MetaData → PropertyType — the decode side, using the constants from (1).

Adding a new PropertyType variant requires changes in all three places plus a new kebab/camel string pair, and forgetting (1)+(3) silently degrades to .assert at runtime — there's no compile-time check that (1) and (3) stay in sync.

One way to consolidate: a single definition that tabulates the serialization plus inverse:

/-- Canonical mapping from `PropertyType` to the pair of (metadata tag, SARIF classification) strings. -/
def PropertyType.serialization : PropertyType → Option (String × String)
  | .divisionByZero     => some ("divisionByZero",     "division-by-zero")
  | .arithmeticOverflow => some ("arithmeticOverflow", "arithmetic-overflow")
  | .outOfBoundsAccess  => some ("outOfBoundsAccess",  "out-of-bounds-access")
  | .cover              => some ("cover",              "cover")
  | .assert             => none

Then propertyTypeToClassification and convertMetaDataPropertyType both derive from this single source, and the MetaData.* constants can go. A #[simp] round-trip lemma (fromClassification? (toClassification p) = some p) gives compile-time coverage of the set. See the proof-coverage suggestion in the main review body.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on the architectural observation. Deferring per your "not blocking" framing — a PropertyType.serialization table with the round-trip lemma you sketched is the right shape, but it touches pre-existing constants (MetaData.divisionByZero etc.) and the SARIF classifier, so a standalone cleanup PR is cleaner than folding it in here.

Copy link
Copy Markdown
Contributor

@MikaelMayer MikaelMayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Clean PR. The convertMetaDataPropertyType extraction is a good de-duplication, SeqBoundKind is a nice way to prevent nested-precondition accidents, and the SARIF property-type wiring fix is a genuine improvement (was dead code before). Test coverage is solid.

No new issues beyond what's already been flagged in existing threads.

-- -----------------------------------------------------------------------
-- Test: unknown 0-ary operation renders as generic call (e.g.
-- `Sequence.empty`, which is registered in the factory but not yet
-- parseable in raw Core source). Previously, the `handleZeroaryOps`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Nit: "Previously, the handleZeroaryOps fallback silently substituted re.none()." — this is a non-stateless comment referencing the old behavior. Consider rephrasing to describe what the test verifies without referencing the prior state, e.g.:

-- Test: unknown 0-ary operation renders as a generic call (e.g.
-- `Sequence.empty`, which is registered in the factory but not yet
-- parseable in raw Core source).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — dropped the "Previously, ..." sentence.

@fabiomadge fabiomadge force-pushed the feat/core-sequence-wf branch from bbf0d27 to 2bf64c3 Compare May 6, 2026 22:03
@fabiomadge fabiomadge requested review from atomb, joscoh and tautschnig May 7, 2026 22:30
tautschnig
tautschnig previously approved these changes May 8, 2026
Comment on lines +524 to +536
private def mkSeqBoundsPrecond
(varName : String) (k : SeqBoundKind) :
Strata.DL.Util.FuncPrecondition (LExpr CoreLParams.mono) CoreLParams.Metadata :=
let sVar : LExpr CoreLParams.mono := .fvar default "s" (some (seqTy mty[%a]))
let xVar : LExpr CoreLParams.mono := .fvar default varName (some mty[int])
let zero : LExpr CoreLParams.mono := .intConst default 0
let lenS : LExpr CoreLParams.mono := .app default seqLengthFunc.opExpr sVar
let lower : LExpr CoreLParams.mono :=
.app default (.app default (intLeFunc (T := CoreLParams)).opExpr zero) xVar
let upper : LExpr CoreLParams.mono :=
.app default (.app default k.opExpr xVar) lenS
⟨.app default (.app default (boolAndFunc (T := CoreLParams)).opExpr lower) upper,
default⟩
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proof coverage: a @[simp] free-vars theorem here, parallel to yNeZeroPrecond_freeVars at Strata/DL/Lambda/IntBoolFactory.lean:432, would both document the invariant and trivially discharge polyUneval's h_precond at every call site:

@[simp]
private theorem mkSeqBoundsPrecond_freeVars (varName : String) (k : SeqBoundKind) :
    ((mkSeqBoundsPrecond varName k).expr.freeVars).map (·.1.name) = ["s", varName] := by
  cases k <;>
  simp [mkSeqBoundsPrecond, LExpr.freeVars, SeqBoundKind.opExpr,
        LFunc.opExpr, intLeFunc, intLtFunc, boolAndFunc, seqLengthFunc]

(Exact simp lemmas will depend on how each helper is defined; the pattern in IntBoolFactory.lean:432 gives the template. If the .map (·.1.name) = ["s", varName] equality is too tight — e.g. ordering-dependent — a variant is the weaker-but-still-useful form.)

With that in place, the "s" hardcoding noted in your docstring becomes a theorem, not a convention maintained by vigilance. And the h_precond in each of the four seqSelect/Update/Take/Drop call sites discharges via simp rather than relying on grind's opacity-sensitive behaviour.

Second, stronger theorem that I think is more valuable — see the inline on the test file — is a structural theorem mkSeqBoundsPrecond_form pinning the top-level AST to boolAnd (intLe 0 _) (k.opExpr _ _). That one catches the off-by-one regression I flagged last round at elaboration time. Sketch:

theorem mkSeqBoundsPrecond_form (varName : String) (k : SeqBoundKind) :
    ∃ lo up,
      (mkSeqBoundsPrecond varName k).expr =
        .app default (.app default (boolAndFunc (T := CoreLParams)).opExpr lo) up ∧
      -- lo is `0 ≤ x` with `intLeFunc` (independent of `k`)
      lo = .app default (.app default (intLeFunc (T := CoreLParams)).opExpr (.intConst default 0))
             (.fvar default varName (some mty[int])) ∧
      -- up is `x ≤/< length(s)` with `k.opExpr`
      up = .app default (.app default k.opExpr
               (.fvar default varName (some mty[int])))
             (.app default seqLengthFunc.opExpr (.fvar default "s" (some (seqTy mty[%a])))) := by
  cases k <;> exact ⟨_, _, rfl, rfl, rfl⟩

If rfl holds as written (I'd expect it to, the definition is entirely concrete), this theorem is essentially free and closes the Test 10 content-check gap structurally.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Landed mkSeqBoundsPrecond_form per your sketch — cases k <;> rfl closes it. One subtlety: the up clause needs to pin the upper-bound op per-kind rather than abstract it as k.upperOpExpr, otherwise any change to the mapping inside upperOpExpr satisfies rfl trivially (both sides reduce through the same function). Verified locally that swapping intLtFuncintLeFunc in upperOpExpr now breaks this proof.

Deferring mkSeqBoundsPrecond_freeVars_subset to a follow-up: I landed it locally and confirmed the ⊆ [varName, "s"] form works (with rw [mkSeqBoundsPrecond_form] then cases k <;> simp [LExpr.freeVars, WFLFunc.opExpr, LFunc.opExpr, List.subset_def]). But I also probed stripping the @[simp] tag — the existing polyUneval default first | decide | grind was already closing h_precond at all four call sites, so the tag provides no observable benefit in this PR. Happy to land it separately if you want the theorem purely for documentation value.

Independently, mkSeqBoundsPrecond_form does not catch the call-site regression (.Lt.Le at a preconditions := [mkSeqBoundsPrecond "i" _]): the theorem is universally quantified over k, so a specific k at the call site is not pinned. That class is covered by the pretty-print snapshots on the test-file thread (see reply there).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up: after pushback from @joscoh here, I've dropped mkSeqBoundsPrecond_form. His argument is that the theorem reduces to restating the definition — writing code twice — and enforcing that a function is definitionally equal to its own unfolding.

I probed whether the theorem catches anything unique vs the per-op pretty-print snapshots (Test 10a) and confirmed it doesn't:

  • upperOpExpr .Lt => .Le drift → all 4 snapshots fire.
  • Swapping lower/upper inside mkSeqBoundsPrecond's body → all 4 snapshots fire.
  • Callsite .Lt → .Le → snapshot for that callsite fires (which the theorem never caught anyway, since it's quantified over k).

The theorem was only buying earlier detection at the feat-commit boundary, not unique coverage. Dropped to avoid the redundancy.

Keeping the upperOpExpr rename + scoping docstring (those are independent wins) and the snapshot tests (those do the coverage work).

Comment on lines +122 to +134
/-- Convert a `MetaData` entry's property-type classification string to the
`PropertyType` enum. Falls back to `.assert` when the metadata carries
no classification or an unrecognized string; callers that emit
propertyType classifications should add a matching arm here. -/
def convertMetaDataPropertyType {P : PureExpr} [BEq P.Ident]
(md : MetaData P) : PropertyType :=
match md.getPropertyType with
| some s =>
if s == MetaData.divisionByZero then .divisionByZero
else if s == MetaData.arithmeticOverflow then .arithmeticOverflow
else if s == MetaData.outOfBoundsAccess then .outOfBoundsAccess
else .assert
| none => .assert
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per your "deferring to a follow-up PR" reply on the triple-representation thread — agreed on scope. One cheap intermediate measure worth adding here, before the full PropertyType.serialization consolidation lands, is a @[simp] roundtrip theorem. It doesn't change the code shape, but it gives you a compile-time error if someone adds a MetaData.* constant without a matching convertMetaDataPropertyType arm (or vice versa):

/-- Encoding `p` as a MetaData propertyType string and decoding it back
    yields `p` (for all non-`.assert` variants; `.assert` is the default). -/
@[simp]
theorem convertMetaDataPropertyType_roundTrip
    {P : PureExpr} [BEq P.Ident] (p : PropertyType) :
    p ≠ .assert →
    let md : MetaData P := MetaData.fromPair MetaData.propertyType
      (match p with
       | .divisionByZero     => MetaData.divisionByZero
       | .arithmeticOverflow => MetaData.arithmeticOverflow
       | .outOfBoundsAccess  => MetaData.outOfBoundsAccess
       | .cover              => "cover"    -- if .cover is ever tagged via MetaData
       | .assert             => "")
    convertMetaDataPropertyType md = p := by
  intro hne
  cases p <;> simp_all [convertMetaDataPropertyType, MetaData.getPropertyType, ...]
  -- the `.cover` case needs a MetaData string constant if you want it covered here,
  -- or split the theorem to non-`.cover`-and-non-`.assert` PropertyTypes.

(Exact simp set depends on how MetaData.fromPair / getPropertyType reduce; adjust from there. If .cover isn't meant to flow through MetaData, restrict the theorem's scope.)

When you do the full consolidation follow-up, this theorem becomes the inverse-of-forward lemma and is still useful. If a new PropertyType variant gets added without the corresponding MetaData.* + convertMetaDataPropertyType arm, cases p now produces an unhandled case and the proof stops compiling. That's the compile-time coverage the triple-representation is currently missing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferring. I prototyped the three variant round-trip theorems locally and they go through, but proving them requires adding @[expose] to about seven definitions in Strata/DL/Imperative/MetaData.lean (pushElem, findElem, propertyType, getPropertyType, Field.beq, and the three property-type string constants) plus convertMetaDataPropertyType itself in EvalContext.lean, because their bodies are otherwise opaque to simp/decide outside their defining file.

Exposing pre-existing MetaData internals is a public-API change that predates this PR's scope, and (as you note) the cleanest version is bundled with the full PropertyType.serialization consolidation. I'd rather do it once, cleanly, in the follow-up PR than slip the exposure in here.

Secondary note: the construction in your sketch uses MetaData.fromPair, which doesn't exist in the current codebase — the canonical form is MetaData.empty.pushElem MetaData.propertyType (.msg MetaData.divisionByZero). Mentioning in case it's useful when you come back to this.

Comment on lines +438 to +447
/-- Check that `collectPrecondAsserts` produces exactly `expectedCount`
obligations from `expr`, each tagged with `outOfBoundsAccess`. -/
private def assertOutOfBoundsObligations
(expr : Core.Expression.Expr) (expectedCount : Nat) : IO Unit := do
let stmts := collectPrecondAsserts Core.Factory expr "test" #[]
assert! stmts.length == expectedCount
for s in stmts do
let md : MetaData Core.Expression := match s with
| Statement.assert _ _ md => md | _ => #[]
assert! md.getPropertyType == some MetaData.outOfBoundsAccess
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per your "deferring to follow-up" reply — fine on the blocking call. Restating the concrete substitute so it's in one place:

The current helper catches count and tag regressions but not content regressions. An off-by-one — .Lt ↔ .Le in mkSeqBoundsPrecond, or worse, swapping intLeFunc for intLtFunc on the lower bound — slides through Test 10 entirely. I confirmed locally that flipping mkSeqBoundsPrecond's upper-bound op on Select would not fail any test in this PR.

Two substitute shapes, in increasing order of strength:

Cheap: snapshot the pretty-printed form of each obligation (same shape as Examples/Seq.lean already does for the end-to-end case, but tighter-scoped to this helper):

/-- info: "0 <= i && i < Sequence.length(s)" -/
#guard_msgs in
#eval do
  let stmts := collectPrecondAsserts Core.Factory
    (LExpr.mkApp () Core.seqSelectOp [fxS, fxI]) "test" #[]
  match stmts.head? with
  | some (Statement.assert _ e _) => IO.println (repr (Std.format e))
  | _ => IO.println "<unexpected>"

Four of these (one per op, with .Lt vs .Le spelled in the snapshot) would catch any bound-operator regression. Pretty-print strings are a blunt instrument but cheap.

Structural: the mkSeqBoundsPrecond_form theorem I sketched in the inline comment on Factory.lean. If that rfls, it obviates this test-level check — the regression becomes a compile error in the factory file, not a test failure. That's the better target if you can get it.

Happy to prep either patch if useful.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Landed the cheap pretty-print snapshot version — four #guard_msgs blocks (one per partial op: Select, Update, Take, Drop), each asserting the exact printed body of the first obligation. Verified locally that flipping .Lt.Le at a call site (e.g. seqSelectFunc's preconditions) now trips these tests with a clear diff.

Note that this complements, rather than is subsumed by, the structural mkSeqBoundsPrecond_form theorem I also landed: the theorem catches helper-implementation drift (changes to mkSeqBoundsPrecond's body or SeqBoundKind.upperOpExpr's mapping), while these snapshots catch call-site drift (a caller passing the wrong SeqBoundKind). The theorem is universally quantified over k, so it cannot pin the specific k at a call site.


private def SeqBoundKind.opExpr : SeqBoundKind → LExpr CoreLParams.mono
| .Lt => (intLtFunc (T := CoreLParams)).opExpr
| .Le => (intLeFunc (T := CoreLParams)).opExpr
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: SeqBoundKind.opExpr hard-codes intLtFunc / intLeFunc for the current two kinds. Fine for today. If a third partial-Sequence op ever needs a different comparison (e.g. a Nat-typed variant, or bitvector sequences), the inductive grows and so does this match. Worth a one-line comment tying the enum to "comparison operators over int" so a future bvLtFunc-flavoured kind is clearly a separate thing rather than another arm here.

Also a naming nit: SeqBoundKind.opExpr returns the upper-bound comparison; mkSeqBoundsPrecond uses k.opExpr only for the upper bound (the lower is hardcoded to intLeFunc). A name like SeqBoundKind.upperOpExpr reads more self-documenting:

Suggested change
| .Le => (intLeFunc (T := CoreLParams)).opExpr
private def SeqBoundKind.upperOpExpr : SeqBoundKind → LExpr CoreLParams.mono
| .Lt => (intLtFunc (T := CoreLParams)).opExpr
| .Le => (intLeFunc (T := CoreLParams)).opExpr

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — renamed to upperOpExpr and added a scoping comment noting this is int-comparison-only and that a non-int variant (e.g. bitvector) should introduce a separate helper rather than extend this enum.

Copy link
Copy Markdown
Contributor

@MikaelMayer MikaelMayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 LGTM — all comments addressed.

Comment thread Strata/Languages/Core/Factory.lean Outdated
call sites. The upper-bound comparison is pinned per-kind (strict `<` for
`.Lt`, non-strict `≤` for `.Le`) rather than abstracted as `k.upperOpExpr`,
so that swapping the mapping inside `upperOpExpr` breaks this proof. -/
private theorem mkSeqBoundsPrecond_form (varName : String) (k : SeqBoundKind) :
Copy link
Copy Markdown
Contributor

@joscoh joscoh May 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I personally don't think this theorem is all that necessary or helpful. IIUC, it reduces to proving that

SeqBoundKind.upperOpExpr k = match k with
   | .Lt => (intLtFunc (T := CoreLParams)).opExpr
   | .Le => (intLeFunc (T := CoreLParams)).opExpr

Which is true, but it seems quite silly to enforce that functions in the code don't change by stating lemmas for each that prove they are definitionally equal to their unfolding. At this point, we are just writing the code twice.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. You're right that it reduces to writing the code twice, and I verified locally that everything this theorem catches is already caught by the per-op pretty-print snapshots in StrataTest/Transform/PrecondElim.lean (Test 10a) — both the upperOpExpr-mapping regression and the lower/upper-swap regression inside mkSeqBoundsPrecond. The theorem wasn't adding a unique safety net, just earlier detection at the feat-commit boundary.

Dropped the theorem. Kept the rename to upperOpExpr and its scoping comment, since those are legitimate standalone improvements independent of the theorem question.

MikaelMayer
MikaelMayer previously approved these changes May 8, 2026
Copy link
Copy Markdown
Contributor

@MikaelMayer MikaelMayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 LGTM — all comments addressed.

handleZeroaryOps fell back to logging an error and returning re.none() for
any 0-ary op outside the regex set. That silently substituted a regex
primitive for unrelated ops in VC printer output; users saw re.none()
where e.g. Sequence.empty() was intended.

Switch the fallback to mkGenericCall, matching how handleUnaryOps and
handleBinaryOps already handle unknown ops. The printer now emits the
op name as a free-variable reference, preserving the intent.

Parseable Sequence.empty<T>() syntax is still a separate grammar-level
feature; this commit only fixes the printer-side noise.
polyUneval is the combinator used to declare unevaluated polymorphic
functions with axioms. Unlike unaryOp and binaryOp, it had no way to
attach preconditions; callers had to hand-build the WFLFunc.

Add a 'preconditions' parameter and the matching free-vars proof
obligation (subset of the function's input names), defaulting to empty.
No behavioral change for existing callers.
Sequence.select and Sequence.update now require `0 <= i < length(s)`;
Sequence.take and Sequence.drop require `0 <= n <= length(s)`. PrecondElim
picks these up and generates VC obligations at call sites, both in
statement positions (via transformStmt) and in pure positions (via
mkContractWFProc / mkFuncWFProc) — so requires/ensures/quantifier-body
subscripts are also covered.

Obligations carry the propertyType metadata "outOfBoundsAccess" (new
MetaData constant) and flow through a new PropertyType.outOfBoundsAccess
enum variant — with matching entries in the statement-eval /
obligation-extraction / cmd-eval metadata-to-PropertyType conversion
sites — to finally render as "out-of-bounds-access" in SARIF output,
matching how divisionByZero and arithmeticOverflow are classified.

Side effect: `propertyTypeToClassification` in SarifOutput.lean was
previously dead code; `vcResultToSarifResult` never set
`properties.propertyType` so the SARIF output defaulted every obligation
to "assert". Wiring this up means divisionByZero and arithmeticOverflow
obligations now also classify correctly in SARIF — a pre-existing bug
this PR incidentally fixes.
New tests in StrataTest/Transform/PrecondElim.lean:
- Test 10:  Sequence.select in a procedure body emits the bounds assert
            (PrecondElim is unconditional — it inserts regardless of any
            surrounding requires guard; the SMT solver discharges).
- Test 10c: Sequence.select inside a requires clause triggers the
            $$wf-procedure path (mkContractWFProc).
- Test 10d: Sequence.select inside a function body triggers the
            function-body $$wf path (mkFuncWFStmts).
- Test 11:  collectPrecondAsserts attaches outOfBoundsAccess metadata
            for all four partial ops and a nested call. Mirrors
            OverflowCheckTest.lean. Also verifies Sequence.length emits
            no obligation (it is total).
- Test 12:  Sequence.empty printer regression for the commit-1 fix —
            renders as a generic call, not re.none().

New property-classification tests in
StrataTest/Languages/Core/Tests/SarifOutputTests.lean cover all five
PropertyType variants, exercising the SARIF wiring fix in commit 3.

Collateral test updates for real behavioral changes:
- StrataTest/Languages/Core/Examples/Seq.lean: expected VC output
  includes the new bounds obligations (all SMT-provable from the
  surrounding context, except the pre-existing contains_yes unknown).
- StrataTest/Languages/Core/Tests/ProgramEvalTests.lean: Sequence func
  signatures now render with the attached requires clauses.
- StrataTest/Languages/Core/Examples/Loops.lean: commit-1 printer fix
  propagates (re.none() -> top, error message format updated).
@fabiomadge fabiomadge force-pushed the feat/core-sequence-wf branch from a875db7 to 8325c0e Compare May 11, 2026 09:55
@MikaelMayer
Copy link
Copy Markdown
Contributor

🤖 Autopilot gave up.
Stuck on review_verify after 2 attempts with no progress.

@fabiomadge fabiomadge requested review from MikaelMayer and joscoh May 11, 2026 12:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants