Skip to content

MCSAT: L2O#635

Open
ahmed-irfan wants to merge 131 commits into
SRI-CSL:masterfrom
elipparini:L2O_calc_update
Open

MCSAT: L2O#635
ahmed-irfan wants to merge 131 commits into
SRI-CSL:masterfrom
elipparini:L2O_calc_update

Conversation

@ahmed-irfan
Copy link
Copy Markdown
Member

@ahmed-irfan ahmed-irfan commented Jun 1, 2026

Work done by

Ovascos and others added 27 commits August 20, 2025 14:56
Expose the l2o (learn-to-optimize) feature through a command-line/SMT2
option and gate when it runs relative to the target-cache recaching
scheme.

- Add a `bool l2o` mcsat option, wired as `--mcsat-l2o` /
  `:yices-mcsat-l2o` and auto-enabled for QF_NIA.
- In the recache handler, when l2o is enabled drop the recache interval
  300->50 and alternate: every even recache runs l2o_run() and clears
  the target cache (keeping the best cache), odd recaches fall back to
  trail_recache(count/2). Vary l2o's cache seeding so every 3rd run
  cold-starts.
- Add a keep_best parameter to trail_clear_extra_cache().
- Handle finite-field terms (ARITH_FF_CONSTANT/ARITH_FF_POLY) in
  collect_free_vars so l2o can collect and then gracefully reject
  FF assertions instead of crashing.
The ff_plugin was no longer registered in mcsat_add_plugins (the
ff_plugin_id field and the mcsat_add_plugin(..., ff_plugin_allocator,
...) call had been dropped on this branch). As a result no plugin owned
FF_TYPE / ARITH_FF_* terms, so finite-field constraints went
unenforced and QF_FFA problems could be reported sat when they are
unsat.

Restore the field and the registration line. Fixes the 4 failing
tests/regress/mcsat/ff regressions (now unsat = gold); verified no other
mcsat regression changes (differential over mcsat/{nra,bv,ff}: exactly
those 4 flip to passing, zero new failures).
Harden the l2o evaluator/search against crashes:

- Add l2o_term_is_supported(): walk each assertion and reject (skip
  l2o) any whose subterm kinds the cost function / evaluator can't
  handle. Checked before collect_free_vars, so neither collect_free_vars
  (get_composite NULL-deref) nor the evaluator (default-case longjmp into
  an already-returned setjmp frame) is reached for unsupported kinds.
- Drop the input-sized stack VLAs: the evaluator's POWER_PRODUCT and
  ARITH_POLY cases now re-fetch already-memoized values instead of a
  vars_eval[n] array; hill_climbing's step_size[n_var] is heap-allocated.
  These could overflow the stack on large polynomials / many variables.
- Route XOR_TERM through the unsupported path instead of silently
  storing INFINITY, and free eval_stack before the defensive longjmp.
A NaN cost (e.g. from Inf-Inf in a polynomial) made the search loop
guard `best_cost > 0.0` false, so hill climbing aborted immediately
without optimizing; an Inf/NaN seed could also leave did_improve unable
to accept any move.

- Loop guard `best_cost > 0.0` -> `!(best_cost <= 0.0)`: a non-finite
  cost is treated as unsatisfied so the search keeps running toward a
  finite cost.
- did_improve now only moves to a finite cost, accepting any finite
  value when the current best is non-finite (escape a bad seed) and a
  strict improvement otherwise. Behavior for finite costs is unchanged.
Equality/disequality atoms were decided with bit-exact == / != on
doubles produced by lossy q_get_double / pow / division, so terms that
are mathematically equal could compare unequal (and vice versa),
mis-scoring an assignment.

Add an L2O_APPROX_EQ(x,y) macro (|x-y| within a magnitude-scaled
tolerance) and use it in the evaluator (ARITH_EQ_ATOM, EQ_TERM,
ARITH_BINEQ_ATOM) and in the disequality branches of l2o_calculate.
Cheap branchless ops, so the hot evaluation loop stays fast. Also
document that integer div/mod stays in double on purpose (exact within
2^53) for speed. Affects only the cost/hint, not soundness.
l2o_evaluator_run_term runs per candidate move during hill climbing.

- Hoist the trace-tag lookup: trace_enabled(...) was called on essentially
  every term visited (36 sites); compute it once into a local bool.
- Stop computing term_type_kind eagerly each iteration; inline it at its
  only (trace/assert) uses, so release builds don't compute it at all.
- Skip the libm pow() call for the common small exponents in POWER_PRODUCT
  (exp 1 -> x, exp 2 -> x*x).

Behavior-preserving (results only feed the hint); no test regression.
The evaluator's eval_map/eval_cache were double_hmap hash maps keyed by
term id, but term ids are dense small integers so every lookup paid a
hash. Replace them with term_double_map_t: a flat array indexed by the
term id with a tracked "marked" set for O(#entries) reset -- the same
sparse-array-map scheme as utils/tag_map / utils/mark_vectors. Lookups
are now direct indexing, no hashing. Growth mirrors tag_map (with an
extra size_t-overflow guard for the double allocation).

Behavior-preserving: these maps were only ever used as term_id -> double
(no key readback, no iteration). Verified identical regression output on
mcsat/{nra,bv,ff}, and under a debug build l2o runs (up to 504 evaluator
calls) exercise the new map with the consistency asserts active and no
failures.
Record the parked Tier 1 perf idea (recompute only assertions touching
the changed variable) with its design sketch and the two caveats:
interaction with the evaluator eval_cache swap, and that the regression
diff can't catch a wrong delta (l2o only emits a hint) so it needs a
debug-only self-check assert. Benchmark before adopting.
double_hash_map (originally eval_hash_map, "adapted from int_hash_map")
was added by the L2O module solely as the evaluator's term_id -> double
value cache. The Tier 2 change replaced that with term_double_map, so the
type now has no users. Delete it: the .c/.h, its src/Makefile entry, and
the stale include in l2o.h. ~575 lines of dead, branch-introduced code.

Verified: builds clean and the mcsat/{nra,bv,ff} regression set is
unchanged.
Branch-introduced dead code with zero references anywhere (and not used
by the FF code or anything else):
- int_hset_close_and_sort (+ its now-unused int_array_sort.h include);
  kept int_hset_close and the is_closed debug guard, which are live.
- the #if 0 iteration/remove/iterate scaffolding in int_hash_mmap
  (.c definitions and the matching .h declarations).

Builds clean; mcsat/{nra,bv,ff} regression set unchanged.
These scratch files under examples/l2o/ were added in the original
"Add L2O module" commit and never referenced by any test harness,
build rule, or script.
Added by the L2O branch to int_vectors.h but never called anywhere.
get_composite now lives in terms.c and is used by both the
preprocessor and l2o; the extraction note is done.
An empty libpoly feasibility set means the variable has no value
consistent with the current trail -- a conflict the next propagation
will report (l2o_run executes before it). l2o_search_state_create now
detects this and skips the run instead of seeding a hint from a doomed
assignment. Previously it passed the empty set to
lp_feasibility_set_pick_value, whose emptiness assert is compiled out
in release libpoly, causing a NULL deref (SIGSEGV) on many QF_NRA
benchmarks (e.g. 20180501-Economics-Mulligan).

Also initialize the search state before the n_var==0 early return so
every early-return path leaves a valid (skippable) state.
l2o_set_hint converts each variable's value to a rational via
double_to_mcsat_value -> mpq_set_d, which is undefined for NaN/Inf.
Hill climbing should not produce a non-finite value (the cost functions
saturate at finite points), but guard it anyway: skip the hint for that
variable instead of feeding mpq_set_d a non-finite double. The hint is
advisory, so leaving one variable unset is safe.
double_to_mcsat_value converted a hill-climbing double to a rational
with an exact mpq_set_d, so e.g. 0.1 became a ~2^52 fraction that
libpoly then carried (and amplified by polynomial degree) through exact
arithmetic. Approximate instead by the simplest p/q with q <= 1024 using
continued-fraction convergents: simple values are recovered exactly
(0.1 -> 1/10, 2.4 -> 12/5, integers -> n/1), others are bounded. The
hint is advisory and the search is O(log cap) of double/int64 ops, so
the approximation is harmless and cheap. Large-magnitude (>= 2^52,
already integral) values use the exact conversion; non-finite values
are rejected defensively rather than reaching mpq_set_d.
No regression test ran with --mcsat-l2o, so the recent l2o crash fixes
were unguarded. Add two QF_NRA benchmarks (from SMT-LIB
20180501-Economics-Mulligan) with .options enabling --mcsat-l2o:

- MulliganEconomicsModel0064c (unsat): l2o_run hits a variable with an
  empty feasible set and bails; SIGSEGV'd before the empty-set fix.
- MulliganEconomicsModel0055a (sat): l2o hill-climbing runs (~2900 eval
  steps) and sets hints, exercising the value->rational conversion.
Match the smtcomp2025 branch's recache heuristic: drop the extra
non-base-level recache trigger at the top of the search loop, leaving
only the base-level-gated trigger after propagation. The recache
intervals and every-other-recache l2o split were already identical.
Change the cold-start cadence so the first two l2o runs after a recache
reuse the cached (target/best) assignment as the hill-climbing start,
and only every third run cold-starts. Previously the very first l2o run
cold-started, discarding the warm cache already populated during search.
clang under -std=gnu23 with -Werror flags a declaration immediately
following the case label (-Wc23-extensions). Wrap the case body in
braces so the declaration sits inside a compound statement, matching
the other cases in l2o_calculate.
…alized

In release builds (-O2/-O3, NDEBUG) the asserts that guarantee a map
hit are compiled out, so gcc sees term_double_map_find's out-param 'v'
as possibly-unwritten in evaluator_get and evaluator_get_cache. These
are static inline, so the warning propagated to every call site. The
caller always guarantees a hit; initialize 'v' to 0.0 to satisfy the
analysis without changing behavior.
@coveralls
Copy link
Copy Markdown

Coverage Status

coverage: 68.804% (+0.01%) from 68.793% — elipparini:L2O_calc_update into SRI-CSL:master

@ahmed-irfan ahmed-irfan requested review from Ovascos and disteph June 1, 2026 19:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants