Skip to content

Derive#9704

Open
NikolajBjorner wants to merge 22 commits into
masterfrom
derive
Open

Derive#9704
NikolajBjorner wants to merge 22 commits into
masterfrom
derive

Conversation

@NikolajBjorner
Copy link
Copy Markdown
Contributor

revise symbolic derivation routine

NikolajBjorner and others added 3 commits June 2, 2026 14:53
Implement a new seq::derive class (seq_derive.h/cpp) that computes
symbolic derivatives of regular expressions using ITE-trees, based on
the RE# approach (Varatalu, Veanes, Ernits - POPL 2025).

Key features:
- Two-argument operator()(ele, r): computes derivative of regex r w.r.t.
  element ele (concrete character or de Bruijn variable for symbolic mode)
- ACI canonicalization (flatten, stable_sort, dedup) for union/intersection
- ITE-tree combinators for binary/unary operations
- Info-based nullability with recursive fallback
- Complement absorption rules
- Depth-bounded recursion to prevent stack overflow

Integration with seq_rewriter:
- mk_derivative(ele, r) and mk_derivative(r) now delegate to m_derive
- Removed dead mk_derivative_rec function
- Added ITE hoisting in mk_re_star, mk_re_concat, mk_re_union0,
  mk_re_inter0, mk_re_complement
- Added depth limiting in Antimirov derivative helpers

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread src/ast/rewriter/seq_derive.cpp
Comment thread src/ast/rewriter/seq_derive.cpp Outdated
Comment thread src/ast/rewriter/seq_derive.cpp Outdated
Comment thread src/ast/rewriter/seq_derive.cpp Outdated
Comment thread src/ast/rewriter/seq_derive.cpp Outdated
NikolajBjorner and others added 3 commits June 3, 2026 11:32
- Add simplify_ite post-processing in operator() to simplify ITE conditions
- Add simplify_ite_rec(cond, sign, r) for propagating condition truth values
- Handles c == cond, x=ch1 vs x=ch2 with different constants
- Add eval(ele, d) for efficient two-phase: symbolic derivative + concrete eval
- mk_derivative uses two-phase pattern: m_derive(r) then m_derive.eval(ele, d)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread src/ast/rewriter/seq_derive.cpp
Comment thread src/ast/rewriter/seq_derive.cpp Outdated
Comment thread src/ast/rewriter/seq_derive.cpp
Comment thread src/ast/rewriter/seq_rewriter.cpp
Comment thread src/ast/rewriter/seq_rewriter.cpp
Comment thread src/ast/rewriter/seq_rewriter.cpp
NikolajBjorner and others added 2 commits June 3, 2026 15:29
Instead of treating reverse(r) as stuck (returning symbolic mk_derivative),
normalize it by pushing reverse inward through the regex structure, then
compute the derivative of the normalized result. Mirrors mk_re_reverse logic.

Handles: concat, union, intersection, diff, ite, opt, complement, star,
plus, loop, to_re (string literals, units, concats), and symmetric cases.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Cache now indexes by (ele, r) pair using obj_pair_map
- Remove eval() function; operator()(ele, r) handles all cases
- Rewrite simplify_ite_rec with path vector of signed conditions
- Add range-based simplification: (lo <= x, false) + (x <= hi, false)
  eliminates ite(x = v, t, e) when v is outside [lo, hi]
- Add is_itos case in derive_to_re: guards on n >= 0, digit range,
  and first character match
- Port is_reverse normalization (previous commit)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread src/ast/rewriter/seq_derive.cpp Outdated
Comment thread src/ast/rewriter/seq_derive.cpp Outdated
Comment thread src/ast/rewriter/seq_derive.cpp Outdated
Comment thread src/ast/rewriter/seq_derive.cpp Outdated
}
}

// Try to evaluate c directly
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move simplification to the beginning of the function before checking if it is determined by the path

Comment thread src/ast/rewriter/seq_derive.cpp Outdated
Comment thread src/ast/rewriter/seq_derive.h Outdated
@@ -0,0 +1,141 @@
/*++
Copyright (c) 2025 Microsoft Corporation
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is 2026

Comment thread src/ast/rewriter/seq_derive.cpp Outdated
Comment thread src/ast/rewriter/seq_derive.cpp Outdated
@@ -0,0 +1,950 @@
/*++
Copyright (c) 2025 Microsoft Corporation
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is 2026

Comment thread src/ast/rewriter/seq_derive.cpp Outdated

Authors:

Nikolaj Bjorner (nbjorner) 2025-06-03
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is 2026

- Add lightweight structural is_subset for union/inter simplification
- Use m.is_value instead of is_const_char for swap checks
- Move eval_cond to beginning of simplify_ite_rec
- Use path.shrink(sz) instead of copying extended_path
- Fix normalize_reverse stuck case to return mk_reverse(r)
- Expose subsumes() in public API

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread src/ast/rewriter/seq_derive.cpp Outdated
// Post-processing: simplify ITE conditions w.r.t. m_ele
// -------------------------------------------------------

bool derive::eval_cond(expr* cond, bool& result) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you change the signature to be

lbool eval_cond(expr* cond)

so that l_undef corresponds to returning false, and l_true corresponds to result being set to true.

NikolajBjorner and others added 2 commits June 4, 2026 08:29
- Add push_path(path, c, sign) that decomposes conjuncts/disjuncts
- Add simplify_ite_rec(path, c, t, e) helper for cleaner recursion
- Change eval_cond signature to return lbool (l_undef = undetermined)
- Fix copyright year from 2025 to 2026

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add top-level cache (m_top_cache) to ensure stable AST node identity
  across repeated derivative calls, preventing state graph divergence
- Add get_head_tail helper for derive_to_re with str.is_unit/str.is_concat
- Add ITE hoisting in mk_union/mk_inter to keep ITEs at top level
- Add De Morgan rule in mk_complement: ~(A∪B) → ~A ∩ ~B
- Add ~ε → .+ simplification in mk_complement
- Add prefix factoring: a·x ∪ a·y = a·(x∪y) and a·x ∩ a·y = a·(x∩y)
- Add r* ∩ .+ = r+ special case in mk_inter
- Enhance is_subset with union/intersection distributivity and complement
- Remove De Morgan from mk_inter to prevent infinite recursion loop

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread src/ast/rewriter/seq_derive.cpp Outdated
}
}
}
if (has_lo && has_hi && lo_bound <= v_val && v_val <= hi_bound) {
Copy link
Copy Markdown
Contributor Author

@NikolajBjorner NikolajBjorner Jun 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need this code. It can be removed. The same code occurs below.
So break out of the loop.

Comment thread src/ast/rewriter/seq_derive.cpp
Comment thread src/ast/rewriter/seq_derive.cpp
expr_ref mk_deriv_concat(expr* d, expr* tail);

// Extract head character and tail from a sequence expression
bool get_head_tail(expr* s1, expr* s2, expr_ref& hd, expr_ref& tl);
Copy link
Copy Markdown
Contributor Author

@NikolajBjorner NikolajBjorner Jun 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this function repeated elsewhere? Can you just use a single definition?

- Simplify trivial range bounds in derive_range: when lo=0, omit
  the lo<=x condition; when hi=max_char, omit the x<=hi condition.
  Full charset ranges return epsilon directly.

- Add char_le(0,x)=true and char_le(x,max)=true to eval_cond for
  always-valid bounds.

- Add range implication logic to simplify_ite_rec: when path has
  negated/positive char_le constraints, detect implied or contradicted
  char_le conditions (e.g., ¬(x<=127) implies 128<=x).

- Add is_subset(a, .+) check: non-nullable regexes are subsets of .+

- In update_state_graph, skip recursive exploration of nullable targets
  to avoid state explosion.

These fixes resolve timeouts on 5724 (all problems), 5721 P1, and 5693.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
NikolajBjorner and others added 2 commits June 4, 2026 16:59
Introduce exclusion intervals alongside the existing path-based condition
tracking in simplify_ite_rec. The intervals track which character values
are still possible at each point in the ITE tree, enabling simplification
of nested range conditions that the per-entry path approach cannot handle.

Key additions:
- intervals_t type and push_intervals() to maintain live character ranges
- eval_range_cond() checks AND-of-char_le conditions against intervals
- intersect_intervals/exclude_interval utilities from seq_rewriter pattern
- Negated AND handling: ¬(lo<=x ∧ x<=hi) excludes [lo,hi] from intervals

The interval check runs before the existing eval_path_cond logic, catching
cases like: if(0<=x<=10, t, if(1<=x<=8, t2, e2)) → if(0<=x<=10, t, e2)
where the inner range [1,8] is fully contained in the excluded outer range.

Fixes remaining regression timeouts on 5728 P2 and 5731 P4.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
Comment thread src/ast/rewriter/seq_derive.cpp
Comment thread src/ast/rewriter/seq_derive.cpp
Comment thread src/ast/rewriter/seq_derive.cpp
Comment thread src/ast/rewriter/seq_derive.cpp Outdated
if (m.is_true(cond)) return l_true;
if (m.is_false(cond)) return l_false;

// elem = char or char = elem
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use instead is_char_const_range

Comment thread src/ast/rewriter/seq_derive.cpp Outdated
Comment thread src/ast/rewriter/seq_derive.cpp Outdated
// not(e1)
if (m.is_not(cond, e1)) {
lbool inner = eval_cond(e1);
if (inner != l_undef)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the ~ operator on l_bool work for this?

Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
Comment thread src/ast/rewriter/seq_derive.cpp
Comment thread src/ast/rewriter/seq_derive.cpp
Comment thread src/ast/rewriter/seq_derive.cpp
Comment thread src/ast/rewriter/seq_derive.cpp
Comment thread src/ast/rewriter/seq_derive.cpp Outdated
}

// ITE hoisting with depth bound (fallback when pred_implies doesn't fire)
if (m_inter_hoist_depth < m_max_inter_hoist_depth) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this code, m_max_inter_hoist_depth is effectively 0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant