Skip to content

[coz3-deepfix] nlsat: disable levelwise (lws) by default to fix QF_NIA SIGSEGV#9991

Draft
levnach wants to merge 1 commit into
masterfrom
coz3-deepfix-nlsat-lws-default-off-36468d717e205d13
Draft

[coz3-deepfix] nlsat: disable levelwise (lws) by default to fix QF_NIA SIGSEGV#9991
levnach wants to merge 1 commit into
masterfrom
coz3-deepfix-nlsat-lws-default-off-36468d717e205d13

Conversation

@levnach

@levnach levnach commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes a reproducible, non-deterministic SIGSEGV (signal-11) in z3 on a QF_NIA benchmark. The crash is in the experimental levelwise single-cell projection (nlsat.lws, default on). This PR sets its default to false; nlsat explanation falls back to the existing add_all_coeffs projection, eliminating the crash.

  • Candidate key: coz3-https-zenodo.org-records-16740866-files-QF_NIA.tar.zst-dow::non-incremental-QF_NIA-20170427-VeryMax-ITS-From_AProVE_2014__Round3.jar-obl-8__p11898_terminationG_0.smt2
  • Origin report: https://github.com/Z3Prover/bench/discussions/2794
  • Stats: (z3prover.github.io/redacted)
  • Benchmark set: QF_NIA
  • Query: non-incremental/QF_NIA/20170427-VeryMax/ITS/From_AProVE_2014__Round3.jar-obl-8__p11898_terminationG_0.smt2

Diagnosis

Reported as signal-11:2/3 across coz3 runs. Reproduced with a fresh master build at -T:20 model_validate=true: ~40% segfault, else timeout/solve. A SIGSEGV-handler backtrace (gdb/valgrind couldn't run: ptrace blocked, too slow) points at:

algebraic_numbers::manager::imp::compare
nlsat::levelwise::impl::sort_root_function_partitions   (src/nlsat/levelwise.cpp:977)
nlsat::levelwise::impl::single_cell                     (src/nlsat/levelwise.cpp:1437)
nlsat::explain::imp::levelwise_single_cell              (src/nlsat/nlsat_explain.cpp:944)
nlsat::solver::imp::check

The crash is inside the scoped_anum comparator during the partition sort. The levelwise path (CADE30 "More is Less") is recent and has a string of heap-corruption fixes; the scoped_anum move/sort interactions still corrupt memory. nlsat_explain only uses levelwise as an optimization with a add_all_coeffs fallback (nlsat_explain.cpp:1004-1010).

Change

src/nlsat/nlsat_params.pyg: lws default TrueFalse (one line). Users can re-enable with nlsat.lws=true.

Reproduce / verify (built binary, same query)

  • Before: 3/8, 2/3, 3/12 segfaults (EXIT=139) — matches signal-11.
  • After: disabling levelwise → 0/14 segfaults; default-off rebuild → 0/14. Returns unsat/timeout.

Minimization

No quick minimization: the crash is timing-dependent (fires only at full speed when -T:20 interrupts levelwise) and doesn't reproduce under gdb/valgrind, so ddsmt-style reduction wasn't fast/reliable here. The 99-line input is attached via the corpus query above.

Warning

Firewall blocked 2 domains

The following domains were blocked by the firewall during workflow execution:

  • clcgitlab.cs.uiowa.edu
  • z3prover.github.io

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "clcgitlab.cs.uiowa.edu"
    - "z3prover.github.io"

See Network Configuration for more information.

Generated by Fix one CoZ3 run-miner error query · 3.2K AIC · ⌖ 33.4 AIC · ⊞ 39.2K ·

The experimental levelwise single-cell projection (nlsat.lws) crashes
non-deterministically (~40%) with a SIGSEGV inside the root-function
partition sort during conflict explanation. Default it off; the explain
path falls back to add_all_coeffs projection, eliminating the crash.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant