Derive#9704
Conversation
Implement a new seq::derive class (seq_derive.h/cpp) that computes symbolic derivatives of regular expressions using ITE-trees, based on the RE# approach (Varatalu, Veanes, Ernits - POPL 2025). Key features: - Two-argument operator()(ele, r): computes derivative of regex r w.r.t. element ele (concrete character or de Bruijn variable for symbolic mode) - ACI canonicalization (flatten, stable_sort, dedup) for union/intersection - ITE-tree combinators for binary/unary operations - Info-based nullability with recursive fallback - Complement absorption rules - Depth-bounded recursion to prevent stack overflow Integration with seq_rewriter: - mk_derivative(ele, r) and mk_derivative(r) now delegate to m_derive - Removed dead mk_derivative_rec function - Added ITE hoisting in mk_re_star, mk_re_concat, mk_re_union0, mk_re_inter0, mk_re_complement - Added depth limiting in Antimirov derivative helpers Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add simplify_ite post-processing in operator() to simplify ITE conditions - Add simplify_ite_rec(cond, sign, r) for propagating condition truth values - Handles c == cond, x=ch1 vs x=ch2 with different constants - Add eval(ele, d) for efficient two-phase: symbolic derivative + concrete eval - mk_derivative uses two-phase pattern: m_derive(r) then m_derive.eval(ele, d) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Instead of treating reverse(r) as stuck (returning symbolic mk_derivative), normalize it by pushing reverse inward through the regex structure, then compute the derivative of the normalized result. Mirrors mk_re_reverse logic. Handles: concat, union, intersection, diff, ite, opt, complement, star, plus, loop, to_re (string literals, units, concats), and symmetric cases. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Cache now indexes by (ele, r) pair using obj_pair_map - Remove eval() function; operator()(ele, r) handles all cases - Rewrite simplify_ite_rec with path vector of signed conditions - Add range-based simplification: (lo <= x, false) + (x <= hi, false) eliminates ite(x = v, t, e) when v is outside [lo, hi] - Add is_itos case in derive_to_re: guards on n >= 0, digit range, and first character match - Port is_reverse normalization (previous commit) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| } | ||
| } | ||
|
|
||
| // Try to evaluate c directly |
There was a problem hiding this comment.
move simplification to the beginning of the function before checking if it is determined by the path
| @@ -0,0 +1,141 @@ | |||
| /*++ | |||
| Copyright (c) 2025 Microsoft Corporation | |||
| @@ -0,0 +1,950 @@ | |||
| /*++ | |||
| Copyright (c) 2025 Microsoft Corporation | |||
|
|
||
| Authors: | ||
|
|
||
| Nikolaj Bjorner (nbjorner) 2025-06-03 |
- Add lightweight structural is_subset for union/inter simplification - Use m.is_value instead of is_const_char for swap checks - Move eval_cond to beginning of simplify_ite_rec - Use path.shrink(sz) instead of copying extended_path - Fix normalize_reverse stuck case to return mk_reverse(r) - Expose subsumes() in public API Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| // Post-processing: simplify ITE conditions w.r.t. m_ele | ||
| // ------------------------------------------------------- | ||
|
|
||
| bool derive::eval_cond(expr* cond, bool& result) { |
There was a problem hiding this comment.
can you change the signature to be
lbool eval_cond(expr* cond)
so that l_undef corresponds to returning false, and l_true corresponds to result being set to true.
- Add push_path(path, c, sign) that decomposes conjuncts/disjuncts - Add simplify_ite_rec(path, c, t, e) helper for cleaner recursion - Change eval_cond signature to return lbool (l_undef = undetermined) - Fix copyright year from 2025 to 2026 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add top-level cache (m_top_cache) to ensure stable AST node identity across repeated derivative calls, preventing state graph divergence - Add get_head_tail helper for derive_to_re with str.is_unit/str.is_concat - Add ITE hoisting in mk_union/mk_inter to keep ITEs at top level - Add De Morgan rule in mk_complement: ~(A∪B) → ~A ∩ ~B - Add ~ε → .+ simplification in mk_complement - Add prefix factoring: a·x ∪ a·y = a·(x∪y) and a·x ∩ a·y = a·(x∩y) - Add r* ∩ .+ = r+ special case in mk_inter - Enhance is_subset with union/intersection distributivity and complement - Remove De Morgan from mk_inter to prevent infinite recursion loop Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| } | ||
| } | ||
| } | ||
| if (has_lo && has_hi && lo_bound <= v_val && v_val <= hi_bound) { |
There was a problem hiding this comment.
we don't need this code. It can be removed. The same code occurs below.
So break out of the loop.
| expr_ref mk_deriv_concat(expr* d, expr* tail); | ||
|
|
||
| // Extract head character and tail from a sequence expression | ||
| bool get_head_tail(expr* s1, expr* s2, expr_ref& hd, expr_ref& tl); |
There was a problem hiding this comment.
is this function repeated elsewhere? Can you just use a single definition?
- Simplify trivial range bounds in derive_range: when lo=0, omit the lo<=x condition; when hi=max_char, omit the x<=hi condition. Full charset ranges return epsilon directly. - Add char_le(0,x)=true and char_le(x,max)=true to eval_cond for always-valid bounds. - Add range implication logic to simplify_ite_rec: when path has negated/positive char_le constraints, detect implied or contradicted char_le conditions (e.g., ¬(x<=127) implies 128<=x). - Add is_subset(a, .+) check: non-nullable regexes are subsets of .+ - In update_state_graph, skip recursive exploration of nullable targets to avoid state explosion. These fixes resolve timeouts on 5724 (all problems), 5721 P1, and 5693. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Introduce exclusion intervals alongside the existing path-based condition tracking in simplify_ite_rec. The intervals track which character values are still possible at each point in the ITE tree, enabling simplification of nested range conditions that the per-entry path approach cannot handle. Key additions: - intervals_t type and push_intervals() to maintain live character ranges - eval_range_cond() checks AND-of-char_le conditions against intervals - intersect_intervals/exclude_interval utilities from seq_rewriter pattern - Negated AND handling: ¬(lo<=x ∧ x<=hi) excludes [lo,hi] from intervals The interval check runs before the existing eval_path_cond logic, catching cases like: if(0<=x<=10, t, if(1<=x<=8, t2, e2)) → if(0<=x<=10, t, e2) where the inner range [1,8] is fully contained in the excluded outer range. Fixes remaining regression timeouts on 5728 P2 and 5731 P4. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| if (m.is_true(cond)) return l_true; | ||
| if (m.is_false(cond)) return l_false; | ||
|
|
||
| // elem = char or char = elem |
There was a problem hiding this comment.
use instead is_char_const_range
| // not(e1) | ||
| if (m.is_not(cond, e1)) { | ||
| lbool inner = eval_cond(e1); | ||
| if (inner != l_undef) |
There was a problem hiding this comment.
does the ~ operator on l_bool work for this?
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
| } | ||
|
|
||
| // ITE hoisting with depth bound (fallback when pred_implies doesn't fire) | ||
| if (m_inter_hoist_depth < m_max_inter_hoist_depth) { |
There was a problem hiding this comment.
remove this code, m_max_inter_hoist_depth is effectively 0
revise symbolic derivation routine