Tool-Assisted Proof Patterns

Use this for hard proofs where computation, CAS, SMT, optimization solvers, or Lean may help. The goal is not to "ask tools for a proof"; it is to turn tool output into a smaller human-checkable lemma, certificate, counterexample, or theorem repair.

Map

Imported patterns and tool roles: choose the backend by expected artifact.
Tool plan and default loop: refute, shrink, run the narrowest query, translate.
Artifact translation and domain recipes: convert output into a proof step or repair.
Stop rules: avoid trusting numerical evidence or repeated timeouts.

Imported Patterns

Lean/Goedel blueprint pattern: maintain a human-readable proof map and a machine-checkable or reviewer-checkable dependency graph. Each node has a statement, statement dependencies, proof dependencies, downstream use, status, and failure diagnosis. Prove or check ready leaves on the current theorem path before side lemmas unless explicitly rewiring the graph.
Cost-aware tool routing: after repeated failed tool or Lean attempts, decide whether the next run has positive value. Continue only if the expected artifact is new: counterexample, exact condition, certificate, smaller subgoal, retrieved premise, or theorem repair.
Flyspeck pattern: combine a human proof with formal verification and external computations only when every computational step has a checked certificate or proof object.
CAS refutation pattern: before simplifying a theorem, ask for a counterexample to the theorem's negation under explicit domains.
Certificate pattern: prefer certificates that can be independently checked: dual variables, KKT conditions, exact rational identities, interval bounds, SMT models/unsat cores, or Lean proof terms.
Premise-retrieval pattern: when a lemma feels standard, search for the nearest theorem pattern or library lemma before inventing a new proof.
Repair-loop pattern: when a tool rejects a step, preserve the failed subgoal, extract the smallest missing lemma, then retry only that lemma.
Sorrify/localize pattern: when a skeleton is promising, replace the failed block with a named lemma, keep the good dependencies, and repair only that lemma.
Compact-feedback pattern: after a tool or Lean failure, keep the next repair input small: node statement, assumptions, dependencies, previous attempt, feedback, and proposed fix.

Tool Roles

Wolfram: symbolic simplification, quantifier elimination, condition discovery, inequality checks, FOCs/KKT algebra, counterexample search with FindInstance.
Python/SymPy/flint/gmpy2: exact rational algebra, polynomial coefficient checks, finite stress tests, high-precision sanity checks.
Z3/cvc5/PySAT: finite/integer/Boolean counterexamples, IC/IR feasibility, small-model search, discrete impossibility.
CVXPy/OR-Tools: primal/dual optimization sanity checks, finite LP/MIP certificates, policy/capacity/matching counterexamples.
Sage: exact combinatorics, graph/algebra/finite-field/group computations.
Lean/mathlib: local formalization of stable lemmas once the informal statement is precise.
Empirical/simulation tools: falsification and sanity checks only; they do not prove universal claims.

Tool Plan Template

For each proof-critical step, write:

lemma or claim fragment:
assumptions/domains:
negation to test:
backend:
query or script:
expected artifact: counterexample, True, conditions, dual certificate, unsat result, exact identity, Lean proof.
how the artifact becomes a proof step:
failure interpretation:
compact repair state if the step fails:

If the expected artifact is only "numerical evidence," the step is not proof-ready.

Default Tool Loop

Normalize the claim: remove ambiguous notation, name domains, list boundary cases.
Refute first: test the negation in the smallest scalar, finite, boundary, or relaxed-assumption model.
Guess a pattern if needed: use exact small cases, active sets, coefficient sequences, symbolic simplification, or optimized witnesses to infer a construction, invariant, or algebra normal form. If the evidence is a sequence, run scripts/pattern_miner.py before treating the guess as a candidate lemma.
Hold out a case: test at least one small case that was not used to guess the pattern.
Shrink the theorem into one or more tool-checkable lemmas.
Pick artifact type before running the tool.
Run the narrowest query: avoid global simplification on a full theorem. For multiple candidate lemmas, try the hardest boundary case or the most certificate-friendly subgoal first.
Translate tool output into a lemma, missing assumption, counterexample, or certificate.
Independently check proof-critical artifacts when feasible: symbolic plus numeric, CAS plus exact arithmetic, LP solution plus dual certificate, informal proof plus Lean.
Record command, assumptions, result, and translation in TOOL_PLAN.md and LEDGER.md.
Local repair if the check fails: preserve any proved parent nodes, isolate the first false or too-hard block, and retry only after adding a new parent, repaired statement, counterexample, theorem pattern, or certificate.

Artifact Translation

FindInstance returns an instance: the theorem is false under current assumptions.
FindInstance returns {}: no counterexample found in that theory/search scope; not a proof unless the backend solved the quantified formula completely.
Reduce/Resolve gives conditions: these are candidate assumptions or case splits, not prose by themselves.
FullSimplify[..., assumptions] == True: a proof-critical algebraic step is strongly checked under those assumptions; still state the algebraic lemma.
Numerical optimizer returns a solution: use it to guess active constraints, then prove KKT/duality/convexity or construct an exact certificate.
LP/MIP/SMT feasible model: use it as a counterexample or witness.
LP dual variables or Bellman inequalities: convert to a certificate that a human can verify line by line.
Lean accepts a lemma: status becomes formalized-local; still explain how the lemma fits the paper proof.
Lean fails: inspect the failed subgoal; often it reveals a missing assumption, coercion/domain issue, or too-strong statement.
Tool returns a counterexample to a sublemma: mark the node false-negated; repair/drop it and rewire dependents instead of regenerating the whole proof.
Tool times out or gives no useful artifact: classify the gap. If the missing lemma is a bad gap, switch to discovery/retrieval before more tool calls.

Domain Recipes

Optimization / OR/MS

Use tools to guess active sets, simplify KKT conditions, and search boundary failures.
Final proof needs convexity/concavity, constraint qualification, feasibility, and a primal-dual/KKT certificate.
For nonconvex problems, numerical optima are only conjecture generators unless paired with a global argument.

Dynamic Programming / MDP

Use finite-state Python/LP to test Bellman inequalities and policy structure.
Use Wolfram for Q-value difference signs and threshold boundaries.
Final proof needs Bellman equality for the candidate policy and Bellman inequality for deviations, with boundary/tie cases.

Mechanism Design / Econ

Use finite type grids and LP/Z3 for IC/IR feasibility and deviation search.
Use Wolfram for envelope derivatives, virtual values, and payoff differences.
Final proof needs monotonicity/cyclic monotonicity, payment construction, and boundary IR.

Learning Theory / Bandits

Use Python to stress-test finite horizons and constants; use Wolfram/SymPy for summation and tuning.
Use Lean only for stable local inequalities or algebraic lemmas.
Final proof needs a named good event, failure probability, instantaneous bound, summation/telescope, and exact quantifier type.

Lower Bounds

Use tools to tune hard instances and KL/TV expressions.
Final proof needs feasible instances, closeness bound, separation bound, and testing-to-risk transfer.

Stop Rules

If a tool query times out twice, shrink the lemma or change artifact type.
If two tools disagree, inspect domains, numeric precision, branch cuts, and boundary assumptions before trusting either.
If the same missing assumption appears in two tool outputs, repair the theorem statement before drafting prose.
If no artifact can be translated into a proof step, report still open with the smallest remaining lemma.

Source Patterns

Lean/mathlib projects: blueprint, premise retrieval, local formalization, proof-state repair.
Liquid Tensor Experiment: human-readable blueprint linked to Lean formalization.
Flyspeck/Kepler: formal proof plus checked external computation and nonlinear inequality certification.
Wolfram theorem proving: FindInstance, Reduce, Resolve, and simplification under explicit assumptions.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tool-Assisted Proof Patterns

Map

Imported Patterns

Tool Roles

Tool Plan Template

Default Tool Loop

Artifact Translation

Domain Recipes

Optimization / OR/MS

Dynamic Programming / MDP

Mechanism Design / Econ

Learning Theory / Bandits

Lower Bounds

Stop Rules

Source Patterns

FilesExpand file tree

tool-assisted-proof-patterns.md

Latest commit

History

tool-assisted-proof-patterns.md

File metadata and controls

Tool-Assisted Proof Patterns

Map

Imported Patterns

Tool Roles

Tool Plan Template

Default Tool Loop

Artifact Translation

Domain Recipes

Optimization / OR/MS

Dynamic Programming / MDP

Mechanism Design / Econ

Learning Theory / Bandits

Lower Bounds

Stop Rules

Source Patterns