SYNTHESIZER: Add enumerative loop invariant synthesizer #7430

qinheping · 2022-12-12T16:59:45Z

Dependent on #7429.
This PR is the same as in #7393. But add the enumerative synthesizer into goto-synthesizer instead of goto-instrument.

Each commit message has a non-empty body, explaining why the change was made.
Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
n/a My commit message includes data points confirming performance improvements (if claimed).
My PR is restricted to a single feature or bugfix.
n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

Implement the functionality described below.

Motivation

This loop invariant synthesizer use the idea counter-example-guided synthesis (CEGIS) to synthesize loop invariants for programs with only checks instrumented by goto-instrument with flag --pointer-check.

This PR contain the driver of the synthesizer and the verifier we use to check invariant candidates.

Verifier

The verifier take as input a goto program with pointer checks and a map from loop id to loop invariant candidates. It first annotate and apply the loop invariant candidates into the goto model; and then simulate the CBMC api to verify the instrumented goto program. If there are some violations---loop invariants are not inductive, or some pointer checks fail---, it record valuation from trace generated by the back end to construct a formatted counterexample cext.

Counterexample

A counterexample cext record valuations of variables in the trace, and other information about the violation. The valuation we record including

set of live variables upon the entry of the loop.
the havoced value of all primitive-typed variables;
the havoced offset and the object size of all pointer-typed variables;
history values of 2 and 3.

The valuations will be used as true positive (history values) and true negative (havcoed valuation) to filter out bad invariant clause with the idea of the Daikon invariant detector in a following PR. However, in this PR we only construct the valuation but not actually use them.

Synthesizer

Loop invariants we synthesize are of the form
(in_clause || !guard) && (!guard -> pos_clause)
where in_clause and out_clause are predicates we store in two different map, and guard is the loop guard. The idea is that we separately synthesize the condition in_clause that should hold before the execution of the loop body, and condition pos_clause that should hold as post-condition of the loop.

When the violation happen in the loop, we update in_clause. When the violation happen after the loop, we update pos_clause. When the invariant candidate it not inductive, we enumerate strengthening clause to make it inductive.

To be more efficient, we choose different synthesis strategy for different type of violation

For out-of-boundary violation, we choose to use the violated predicate as the new clause, which is the WLP of the violation if the violation is only dependent on the havocing instruction. TODO: to make it more complete, we need to implement a WLP algorithm
For null-pointer violation, we choose __CPROVER_same_object(ptr, __CPROVER_loop_entry(ptr)) as the new clause. That is, the havoced pointer should points to the same object at the start of every iteration of the loop. It is a heuristic choice. This can be extended with the idea of alias analysis if needed.
For invariant-not-preserved violation, we enumerate strengthening clauses and check that if the invariant will be inductive after strengthening (disjunction with the new clause).

The synthesizer works as follow

initialize the two invariant maps,
verify the current candidates built from the two maps,
_a. return the candidate if there is no violation
_b. synthesize a new clause to resolve the first violation and add it to the correct map,
repeat 2.

codecov · 2022-12-12T18:17:47Z

Codecov Report

Base: 78.38% // Head: 78.46% // Increases project coverage by +0.08% 🎉

Coverage data is based on head (6827ce5) compared to base (2a745f8).
Patch coverage: 88.79% of modified lines in pull request are covered.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #7430      +/-   ##
===========================================
+ Coverage    78.38%   78.46%   +0.08%     
===========================================
  Files         1659     1662       +3     
  Lines       190372   190904     +532     
===========================================
+ Hits        149217   149794     +577     
+ Misses       41155    41110      -45

Impacted Files	Coverage Δ
src/goto-checker/solver_factory.h	`100.00% <ø> (ø)`
src/goto-instrument/contracts/utils.h	`100.00% <ø> (ø)`
src/goto-programs/loop_ids.h	`100.00% <ø> (ø)`
...nthesizer/enumerative_loop_contracts_synthesizer.h	`100.00% <ø> (ø)`
src/solvers/smt2_incremental/smt_solver_process.h	`100.00% <ø> (ø)`
src/goto-synthesizer/expr_enumerator.cpp	`74.54% <50.00%> (-0.98%)`	⬇️
src/goto-checker/solver_factory.cpp	`82.01% <80.00%> (ø)`
src/ansi-c/c_typecheck_base.cpp	`83.07% <85.71%> (-0.04%)`	⬇️
src/goto-synthesizer/cegis_verifier.cpp	`86.00% <86.00%> (ø)`
...hesizer/enumerative_loop_contracts_synthesizer.cpp	`91.77% <92.36%> (+4.27%)`	⬆️
... and 101 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

jimgrundy · 2022-12-13T16:29:09Z

Does this depend on PR7127 getting merged?

qinheping · 2022-12-13T16:31:10Z

Does this depend on PR7127 getting merged?

No. It only depend on #7429 which has been merged.

regression/goto-synthesizer/chain.sh

regression/goto-synthesizer/loop_contracts_synthesis_02/test.desc

regression/goto-synthesizer/loop_contracts_synthesis_03/test.desc

src/analyses/dependence_graph.cpp

src/goto-synthesizer/cegis_verifier.cpp

tautschnig · 2022-12-14T08:04:35Z

src/goto-synthesizer/cegis_verifier.h

+  cext()
+  {
+  }


Safe to omit

Seems this still needs to be removed?

I tried to omit it. But failed to compile with error

/Users/qinhh/Repos/CBMC/clone_2/src/goto-synthesizer/cegis_verifier.h:90:3: error: constructor for 'cegis_verifiert' must explicitly initialize the member 'return_cex' which does not have a default constructor cegis_verifiert( ^ /Users/qinhh/Repos/CBMC/clone_2/src/goto-synthesizer/cegis_verifier.h:105:8: note: member is declared here cext return_cex; ^ /Users/qinhh/Repos/CBMC/clone_2/src/goto-synthesizer/cegis_verifier.h:25:7: note: 'cext' declared here class cext ^ 1 error generated.

Perhaps the use should be fixed? A default-constructed cext seems ill-formed. Perhaps that return_cex class member shouldn't exist?

Agreed, the return value of verify() is now optionalt<cext>---it return a counterexample if there is some violation.

src/goto-synthesizer/enumerative_loop_contracts_synthesizer.h

src/goto-synthesizer/expr_enumerator.cpp

src/goto-synthesizer/cegis_verifier.cpp

kroening · 2022-12-15T18:43:02Z

src/goto-programs/goto_program.h

+    /// We record the original loop number for some instrumended instructions
+    /// so that we know which loop they are transformed from.
+    optionalt<unsigned> original_loop_number;
+


Please drop, this is not needed, and costs extra memory.

Sure. I plan to instead define ID_original_loop_number and store the original loop number in the field ID_original_loop_number of the lhs exprt of instrumented assignments such as loop-havoc. Does it sound OK to you?

No, please use a map.

Thank you, I will store the map from instrumented instructions to their original loop number in code_contractst. And the functions I used to decided whether an instruction is a loop-havoc or instrumented instruction will be no longer needed with such map.

tautschnig

Adding a blocker just to make sure we don't end up inadvertently merging this: must not be merged until the change in goto_program.h that was added after my earlier approval is removed (cf. conversation with Daniel).

tautschnig · 2022-12-16T08:27:07Z

src/ansi-c/compiler_headers/arm_builtin_headers.inc

+"// clang-format off\n"
+"void __breakpoint(int val);\n"
+"void __cdp(unsigned int coproc, unsigned int ops, unsigned int regs);\n"
+"void __clrex(void);\n"
+"unsigned char __clz(unsigned int val);\n"
+"unsigned int __current_pc(void);\n"


Something has gone wrong here, this is an auto-generated file. (There are a few others of this kind added in the latest commit.)

qinheping · 2022-12-16T09:20:14Z

Adding a blocker just to make sure we don't end up inadvertently merging this: must not be merged until the change in goto_program.h that was added after my earlier approval is removed (cf. conversation with Daniel).

I added another commit to remove undo the changes in goto_program.h. I use a map to store the original loop numbers for some instrumented instructions in code_contractst.
@kroening

kroening · 2022-12-16T11:57:45Z

I added another commit to remove undo the changes in goto_program.h. I use a map to store the original loop numbers for some instrumented instructions in code_contractst.

Please squash the commits!

martin-cs

Hi @qinheping , thanks for this; it is a really great piece of work and it is long-overdue that we have this capability. Given that, I feel a bit bad critiquing it. I have only looked at the code in analyses and goto-instrument (but not contracts).

As I understand it, the functionality you want for dependence-graphs is to be able to do reachability between two program locations. Is this correct? If so, dependence_grapht inherits from grapht so you should be able to use the graph functions directly. I would far rather we added an can_reach method to grapht than custom searching here because it decouples the computation of the dependence graph and the use of it. There is another set of dependency graph code analyses/variable-sensitivity/variable_sensitivity_dependence_graph.{h,cpp} which is more configurable and hopefully more precise, it would be nice to be able to switch between them if there are issues.

Perhaps I have misunderstood as your is_flow_dependent seems to consider control and then data, but doesn't then check control dependency again? If you want this behaviour then maybe use the graph visitor rather than recursing?

Also, non-blocking thoughts:

The commit message is great; can that make it in to the documentation?
I appreciate that in this case it is difficult but it would help reviewing in the future if things could be broken out a bit more.

src/goto-instrument/havoc_utils.cpp

martin-cs · 2022-12-16T16:09:16Z

src/goto-instrument/havoc_utils.cpp

@@ -62,4 +70,6 @@ void havoc_utilst::append_scalar_havoc_code_for_expr(
  side_effect_expr_nondett rhs(expr.type(), location);
  dest.add(goto_programt::make_assignment(
    code_assignt{expr, std::move(rhs), location}, location));
+  dest.instructions.back().source_location_nonconst().set_comment(
+    IS_LOOP_HAVOC);


I appreciate that you want to put some kind of tag in the program but this feels fragile and ad-hoc. Would it be possible to use a function call to end_of_loop_havoc_block or a LOCATION or even an OTHER instruction? Without knowing the wider context (hard to get from such a big commit) I can't really say what is best.

I removed is_loop_havoc function and related codes.
Instead, I changed the structure of transformed loop from

havoc(write_set)

to

DECL __is_in_loop_havoc_block ASSIGN __is_in_loop_havoc_block = true havoc(write_set) ASSIGN __is_in_loop_havoc_block = false

Now, after applying a loop contract, code_cotractst will scan instrumented instructions to find all assignments in loop-havoc blocks and store them into a set loop_havoc_set.

@tautschnig : We cannot decide whether an assignment is loop havoc from its instruction structure. So I change the structure of transformed loops to better identify loop havoc.

qinheping · 2022-12-18T05:41:42Z

Hi @qinheping , thanks for this; it is a really great piece of work and it is long-overdue that we have this capability. Given that, I feel a bit bad critiquing it. I have only looked at the code in analyses and goto-instrument (but not contracts).

As I understand it, the functionality you want for dependence-graphs is to be able to do reachability between two program locations. Is this correct? If so, dependence_grapht inherits from grapht so you should be able to use the graph functions directly. I would far rather we added an can_reach method to grapht than custom searching here because it decouples the computation of the dependence graph and the use of it. There is another set of dependency graph code analyses/variable-sensitivity/variable_sensitivity_dependence_graph.{h,cpp} which is more configurable and hopefully more precise, it would be nice to be able to switch between them if there are issues.

Perhaps I have misunderstood as your is_flow_dependent seems to consider control and then data, but doesn't then check control dependency again? If you want this behaviour then maybe use the graph visitor rather than recursing?

Also, non-blocking thoughts:
* The commit message is great; can that make it in to the documentation?

* I appreciate that in this case it is difficult but it would help reviewing in the future if things could be broken out a bit more.

Thank you! I removed all the changes to dependence_grapht and now use the get_reachable function instead.

Implement the functionality described below. Motivation --- This loop invariant synthesizer use the idea counter-example-guided synthesis (CEGIS) to synthesize loop invariants for programs with only checks instrumented by `goto-instrument` with flag `--pointer-check`. This PR contain the driver of the synthesizer and the verifier we use to check invariant candidates. Verifier --- The verifier take as input a goto program with pointer checks and a map from loop id to loop invariant candidates. It first annotate and apply the loop invariant candidates into the goto model; and then simulate the CBMC api to verify the instrumented goto program. If there are some violations---loop invariants are not inductive, or some pointer checks fail---, it record valuation from trace generated by the back end to construct a formatted counterexample `cext`. Counterexample --- A counterexample `cext` record valuations of variables in the trace, and other information about the violation. The valuation we record including 1. set of live variables upon the entry of the loop. 2. the havoced value of all primitive-typed variables; 3. the havoced offset and the object size of all pointer-typed variables; 4. history values of 2 and 3. The valuations will be used as true positive (history values) and true negative (havcoed valuation) to filter out bad invariant clause with the idea of the Daikon invariant detector in a following PR. However, in this PR we only construct the valuation but not actually use them. Synthesizer --- Loop invariants we synthesize are of the form `` (in_clause || !guard) && (!guard -> pos_clause)`` where `in_clause` and `out_clause` are predicates we store in two different map, and `guard` is the loop guard. The idea is that we separately synthesize the condition `in_clause` that should hold before the execution of the loop body, and condition `pos_clause` that should hold as post-condition of the loop. When the violation happen in the loop, we update `in_clause`. When the violation happen after the loop, we update `pos_clause`. When the invariant candidate it not inductive, we enumerate strengthening clause to make it inductive. To be more efficient, we choose different synthesis strategy for different type of violation * For out-of-boundary violation, we choose to use the violated predicate as the new clause, which is the WLP of the violation if the violation is only dependent on the havocing instruction. **TODO**: to make it more complete, we need to implement a WLP algorithm * For null-pointer violation, we choose `__CPROVER_same_object(ptr, __CPROVER_loop_entry(ptr))` as the new clause. That is, the havoced pointer should points to the same object at the start of every iteration of the loop. It is a heuristic choice. This can be extended with the idea of alias analysis if needed. * For invariant-not-preserved violation, we enumerate strengthening clauses and check that if the invariant will be inductive after strengthening (disjunction with the new clause). The synthesizer works as follow 1. initialize the two invariant maps, 2. verify the current candidates built from the two maps, _a. return the candidate if there is no violation _b. synthesize a new clause to resolve the **first** violation and add it to the correct map, 3. repeat 2.

martin-cs

@qinheping Thank you for the quick and accommodating changes. I am approving this although I don't think it touches anywhere I am a maintainer any more. Thanks for all your work.

qinheping requested review from tautschnig, feliperodri, remi-delmas-3000, martin-cs, chris-ryder, peterschrammel and kroening as code owners December 12, 2022 16:59

qinheping mentioned this pull request Dec 12, 2022

CONTRACTS: Add enumerative loop invariant synthesizer #7393

Closed

5 tasks

qinheping self-assigned this Dec 12, 2022

qinheping added aws Bugs or features of importance to AWS CBMC users aws-high Code Contracts Function and loop contracts Synthesis Invariant synthesis labels Dec 12, 2022

qinheping force-pushed the goto-synthesizer branch from 1ee20cf to d046d11 Compare December 13, 2022 07:10

qinheping force-pushed the goto-synthesizer branch from d046d11 to f5629d6 Compare December 13, 2022 17:10

feliperodri assigned remi-delmas-3000, feliperodri and tautschnig and unassigned qinheping Dec 13, 2022

tautschnig reviewed Dec 14, 2022

View reviewed changes

tautschnig assigned qinheping and unassigned tautschnig Dec 14, 2022

qinheping force-pushed the goto-synthesizer branch 4 times, most recently from 096e2b4 to 706be24 Compare December 15, 2022 06:47

tautschnig reviewed Dec 15, 2022

View reviewed changes

src/goto-synthesizer/cegis_verifier.cpp Outdated Show resolved Hide resolved

tautschnig assigned martin-cs Dec 15, 2022

kroening reviewed Dec 15, 2022

View reviewed changes

qinheping force-pushed the goto-synthesizer branch 4 times, most recently from 4730446 to 2bc4a55 Compare December 15, 2022 23:38

tautschnig requested changes Dec 16, 2022

View reviewed changes

qinheping force-pushed the goto-synthesizer branch 5 times, most recently from e599b85 to 17579cc Compare December 16, 2022 06:57

tautschnig reviewed Dec 16, 2022

View reviewed changes

qinheping force-pushed the goto-synthesizer branch from a536e77 to a84a654 Compare December 16, 2022 09:13

qinheping requested a review from tautschnig December 16, 2022 09:20

qinheping force-pushed the goto-synthesizer branch from 06d6a89 to 0b52b56 Compare December 16, 2022 15:46

qinheping requested a review from kroening December 16, 2022 16:04

martin-cs requested changes Dec 16, 2022

View reviewed changes

qinheping force-pushed the goto-synthesizer branch 2 times, most recently from f7e1845 to 8df49ed Compare December 18, 2022 05:35

qinheping requested a review from martin-cs December 18, 2022 05:50

qinheping force-pushed the goto-synthesizer branch from 8df49ed to 5a7873d Compare December 18, 2022 20:03

qinheping force-pushed the goto-synthesizer branch from 5a7873d to 6827ce5 Compare December 18, 2022 20:06

martin-cs approved these changes Dec 19, 2022

View reviewed changes

tautschnig approved these changes Dec 19, 2022

View reviewed changes

tautschnig merged commit 50a795e into diffblue:develop Dec 19, 2022

SYNTHESIZER: Add enumerative loop invariant synthesizer #7430

SYNTHESIZER: Add enumerative loop invariant synthesizer #7430

Uh oh!

Conversation

qinheping commented Dec 12, 2022

Motivation

Verifier

Counterexample

Synthesizer

Uh oh!

codecov bot commented Dec 12, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

jimgrundy commented Dec 13, 2022

Uh oh!

qinheping commented Dec 13, 2022

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

qinheping Dec 15, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tautschnig left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

qinheping commented Dec 16, 2022

Uh oh!

kroening commented Dec 16, 2022

Uh oh!

martin-cs left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

qinheping commented Dec 18, 2022

Uh oh!

martin-cs left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codecov bot commented Dec 12, 2022 •

edited

Loading

qinheping Dec 15, 2022 •

edited

Loading