Skip to content

SYNTHESIZER: Add enumerative loop invariant synthesizer #7430

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 19, 2022

Conversation

qinheping
Copy link
Collaborator

Dependent on #7429.
This PR is the same as in #7393. But add the enumerative synthesizer into goto-synthesizer instead of goto-instrument.

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • n/a My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

Implement the functionality described below.

Motivation

This loop invariant synthesizer use the idea counter-example-guided synthesis (CEGIS) to synthesize loop invariants for programs with only checks instrumented by goto-instrument with flag --pointer-check.

This PR contain the driver of the synthesizer and the verifier we use to check invariant candidates.

Verifier

The verifier take as input a goto program with pointer checks and a map from loop id to loop invariant candidates. It first annotate and apply the loop invariant candidates into the goto model; and then simulate the CBMC api to verify the instrumented goto program. If there are some violations---loop invariants are not inductive, or some pointer checks fail---, it record valuation from trace generated by the back end to construct a formatted counterexample cext.

Counterexample

A counterexample cext record valuations of variables in the trace, and other information about the violation. The valuation we record including

  1. set of live variables upon the entry of the loop.
  2. the havoced value of all primitive-typed variables;
  3. the havoced offset and the object size of all pointer-typed variables;
  4. history values of 2 and 3.

The valuations will be used as true positive (history values) and true negative (havcoed valuation) to filter out bad invariant clause with the idea of the Daikon invariant detector in a following PR. However, in this PR we only construct the valuation but not actually use them.

Synthesizer

Loop invariants we synthesize are of the form
(in_clause || !guard) && (!guard -> pos_clause)
where in_clause and out_clause are predicates we store in two different map, and guard is the loop guard. The idea is that we separately synthesize the condition in_clause that should hold before the execution of the loop body, and condition pos_clause that should hold as post-condition of the loop.

When the violation happen in the loop, we update in_clause. When the violation happen after the loop, we update pos_clause. When the invariant candidate it not inductive, we enumerate strengthening clause to make it inductive.

To be more efficient, we choose different synthesis strategy for different type of violation

  • For out-of-boundary violation, we choose to use the violated predicate as the new clause, which is the WLP of the violation if the violation is only dependent on the havocing instruction. TODO: to make it more complete, we need to implement a WLP algorithm
  • For null-pointer violation, we choose __CPROVER_same_object(ptr, __CPROVER_loop_entry(ptr)) as the new clause. That is, the havoced pointer should points to the same object at the start of every iteration of the loop. It is a heuristic choice. This can be extended with the idea of alias analysis if needed.
  • For invariant-not-preserved violation, we enumerate strengthening clauses and check that if the invariant will be inductive after strengthening (disjunction with the new clause).

The synthesizer works as follow

  1. initialize the two invariant maps,
  2. verify the current candidates built from the two maps,
    _a. return the candidate if there is no violation
    _b. synthesize a new clause to resolve the first violation and add it to the correct map,
  3. repeat 2.

@qinheping qinheping self-assigned this Dec 12, 2022
@qinheping qinheping added aws Bugs or features of importance to AWS CBMC users aws-high Code Contracts Function and loop contracts Synthesis Invariant synthesis labels Dec 12, 2022
@codecov
Copy link

codecov bot commented Dec 12, 2022

Codecov Report

Base: 78.38% // Head: 78.46% // Increases project coverage by +0.08% 🎉

Coverage data is based on head (6827ce5) compared to base (2a745f8).
Patch coverage: 88.79% of modified lines in pull request are covered.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #7430      +/-   ##
===========================================
+ Coverage    78.38%   78.46%   +0.08%     
===========================================
  Files         1659     1662       +3     
  Lines       190372   190904     +532     
===========================================
+ Hits        149217   149794     +577     
+ Misses       41155    41110      -45     
Impacted Files Coverage Δ
src/goto-checker/solver_factory.h 100.00% <ø> (ø)
src/goto-instrument/contracts/utils.h 100.00% <ø> (ø)
src/goto-programs/loop_ids.h 100.00% <ø> (ø)
...nthesizer/enumerative_loop_contracts_synthesizer.h 100.00% <ø> (ø)
src/solvers/smt2_incremental/smt_solver_process.h 100.00% <ø> (ø)
src/goto-synthesizer/expr_enumerator.cpp 74.54% <50.00%> (-0.98%) ⬇️
src/goto-checker/solver_factory.cpp 82.01% <80.00%> (ø)
src/ansi-c/c_typecheck_base.cpp 83.07% <85.71%> (-0.04%) ⬇️
src/goto-synthesizer/cegis_verifier.cpp 86.00% <86.00%> (ø)
...hesizer/enumerative_loop_contracts_synthesizer.cpp 91.77% <92.36%> (+4.27%) ⬆️
... and 101 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@jimgrundy
Copy link
Collaborator

Does this depend on PR7127 getting merged?

@qinheping
Copy link
Collaborator Author

Does this depend on PR7127 getting merged?

No. It only depend on #7429 which has been merged.

Comment on lines 53 to 57
cext()
{
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Safe to omit

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems this still needs to be removed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to omit it. But failed to compile with error

/Users/qinhh/Repos/CBMC/clone_2/src/goto-synthesizer/cegis_verifier.h:90:3: error: constructor for 'cegis_verifiert' must explicitly initialize the member 'return_cex' which does not have a default constructor
  cegis_verifiert(
  ^
/Users/qinhh/Repos/CBMC/clone_2/src/goto-synthesizer/cegis_verifier.h:105:8: note: member is declared here
  cext return_cex;
       ^
/Users/qinhh/Repos/CBMC/clone_2/src/goto-synthesizer/cegis_verifier.h:25:7: note: 'cext' declared here
class cext
      ^
1 error generated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the use should be fixed? A default-constructed cext seems ill-formed. Perhaps that return_cex class member shouldn't exist?

Copy link
Collaborator Author

@qinheping qinheping Dec 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, the return value of verify() is now optionalt<cext>---it return a counterexample if there is some violation.

@tautschnig tautschnig assigned qinheping and unassigned tautschnig Dec 14, 2022
@qinheping qinheping force-pushed the goto-synthesizer branch 4 times, most recently from 096e2b4 to 706be24 Compare December 15, 2022 06:47
/// We record the original loop number for some instrumended instructions
/// so that we know which loop they are transformed from.
optionalt<unsigned> original_loop_number;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please drop, this is not needed, and costs extra memory.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I plan to instead define ID_original_loop_number and store the original loop number in the field ID_original_loop_number of the lhs exprt of instrumented assignments such as loop-havoc. Does it sound OK to you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, please use a map.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I will store the map from instrumented instructions to their original loop number in code_contractst. And the functions I used to decided whether an instruction is a loop-havoc or instrumented instruction will be no longer needed with such map.

@qinheping qinheping force-pushed the goto-synthesizer branch 4 times, most recently from 4730446 to 2bc4a55 Compare December 15, 2022 23:38
Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a blocker just to make sure we don't end up inadvertently merging this: must not be merged until the change in goto_program.h that was added after my earlier approval is removed (cf. conversation with Daniel).

@qinheping qinheping force-pushed the goto-synthesizer branch 5 times, most recently from e599b85 to 17579cc Compare December 16, 2022 06:57
Comment on lines 1 to 6
"// clang-format off\n"
"void __breakpoint(int val);\n"
"void __cdp(unsigned int coproc, unsigned int ops, unsigned int regs);\n"
"void __clrex(void);\n"
"unsigned char __clz(unsigned int val);\n"
"unsigned int __current_pc(void);\n"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something has gone wrong here, this is an auto-generated file. (There are a few others of this kind added in the latest commit.)

@qinheping
Copy link
Collaborator Author

Adding a blocker just to make sure we don't end up inadvertently merging this: must not be merged until the change in goto_program.h that was added after my earlier approval is removed (cf. conversation with Daniel).

I added another commit to remove undo the changes in goto_program.h. I use a map to store the original loop numbers for some instrumented instructions in code_contractst.
@kroening

@qinheping qinheping requested a review from tautschnig December 16, 2022 09:20
@kroening
Copy link
Member

I added another commit to remove undo the changes in goto_program.h. I use a map to store the original loop numbers for some instrumented instructions in code_contractst.

Please squash the commits!

Copy link
Collaborator

@martin-cs martin-cs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @qinheping , thanks for this; it is a really great piece of work and it is long-overdue that we have this capability. Given that, I feel a bit bad critiquing it. I have only looked at the code in analyses and goto-instrument (but not contracts).

As I understand it, the functionality you want for dependence-graphs is to be able to do reachability between two program locations. Is this correct? If so, dependence_grapht inherits from grapht so you should be able to use the graph functions directly. I would far rather we added an can_reach method to grapht than custom searching here because it decouples the computation of the dependence graph and the use of it. There is another set of dependency graph code analyses/variable-sensitivity/variable_sensitivity_dependence_graph.{h,cpp} which is more configurable and hopefully more precise, it would be nice to be able to switch between them if there are issues.

Perhaps I have misunderstood as your is_flow_dependent seems to consider control and then data, but doesn't then check control dependency again? If you want this behaviour then maybe use the graph visitor rather than recursing?

Also, non-blocking thoughts:

  • The commit message is great; can that make it in to the documentation?
  • I appreciate that in this case it is difficult but it would help reviewing in the future if things could be broken out a bit more.

@@ -62,4 +70,6 @@ void havoc_utilst::append_scalar_havoc_code_for_expr(
side_effect_expr_nondett rhs(expr.type(), location);
dest.add(goto_programt::make_assignment(
code_assignt{expr, std::move(rhs), location}, location));
dest.instructions.back().source_location_nonconst().set_comment(
IS_LOOP_HAVOC);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate that you want to put some kind of tag in the program but this feels fragile and ad-hoc. Would it be possible to use a function call to end_of_loop_havoc_block or a LOCATION or even an OTHER instruction? Without knowing the wider context (hard to get from such a big commit) I can't really say what is best.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed is_loop_havoc function and related codes.
Instead, I changed the structure of transformed loop from

havoc(write_set)

to

DECL __is_in_loop_havoc_block
ASSIGN __is_in_loop_havoc_block = true
havoc(write_set)
ASSIGN __is_in_loop_havoc_block = false

Now, after applying a loop contract, code_cotractst will scan instrumented instructions to find all assignments in loop-havoc blocks and store them into a set loop_havoc_set.

@tautschnig : We cannot decide whether an assignment is loop havoc from its instruction structure. So I change the structure of transformed loops to better identify loop havoc.

@qinheping qinheping force-pushed the goto-synthesizer branch 2 times, most recently from f7e1845 to 8df49ed Compare December 18, 2022 05:35
@qinheping
Copy link
Collaborator Author

Hi @qinheping , thanks for this; it is a really great piece of work and it is long-overdue that we have this capability. Given that, I feel a bit bad critiquing it. I have only looked at the code in analyses and goto-instrument (but not contracts).

As I understand it, the functionality you want for dependence-graphs is to be able to do reachability between two program locations. Is this correct? If so, dependence_grapht inherits from grapht so you should be able to use the graph functions directly. I would far rather we added an can_reach method to grapht than custom searching here because it decouples the computation of the dependence graph and the use of it. There is another set of dependency graph code analyses/variable-sensitivity/variable_sensitivity_dependence_graph.{h,cpp} which is more configurable and hopefully more precise, it would be nice to be able to switch between them if there are issues.

Perhaps I have misunderstood as your is_flow_dependent seems to consider control and then data, but doesn't then check control dependency again? If you want this behaviour then maybe use the graph visitor rather than recursing?

Also, non-blocking thoughts:

* The commit message is great; can that make it in to the documentation?

* I appreciate that in this case it is difficult but it would help reviewing in the future if things could be broken out a bit more.

Thank you! I removed all the changes to dependence_grapht and now use the get_reachable function instead.

Implement the functionality described below.

Motivation
---
This loop invariant synthesizer use the idea counter-example-guided synthesis (CEGIS) to synthesize loop invariants for programs with only checks instrumented by `goto-instrument` with flag `--pointer-check`.

This PR contain the driver of the synthesizer and the verifier we use to check invariant candidates.

Verifier
---
The verifier take as input a goto program with pointer checks and a map from loop id to loop invariant candidates. It first annotate and apply the loop invariant candidates into the goto model; and then simulate the CBMC api to verify the instrumented goto program. If there are some violations---loop invariants are not inductive, or some pointer checks fail---, it record valuation from trace generated by the back end to construct a formatted counterexample `cext`.

Counterexample
---
A counterexample `cext` record valuations of variables in the trace, and other information about the violation. The valuation we record including
1. set of live variables upon the entry of the loop.
2. the havoced value of all primitive-typed variables;
3. the havoced offset and the object size of all pointer-typed variables;
4. history values of 2 and 3.

The valuations will be used as true positive (history values) and true negative (havcoed valuation) to filter out bad invariant clause with the idea of the Daikon invariant detector in a following PR. However, in this PR we only construct the valuation but not actually use them.

Synthesizer
---
Loop invariants we synthesize are of the form
`` (in_clause || !guard) && (!guard -> pos_clause)``
where `in_clause` and `out_clause` are predicates we store in two different map, and `guard` is the loop guard. The idea is that we separately synthesize the condition `in_clause` that should hold before the execution of the loop body, and condition `pos_clause` that should hold as post-condition of the loop.

When the violation happen in the loop, we update `in_clause`. When the violation happen after the loop, we update `pos_clause`. When the invariant candidate it not inductive, we enumerate strengthening clause to make it inductive.

To be more efficient, we choose different synthesis strategy for different type of violation
* For out-of-boundary violation, we choose to use the violated predicate as the new clause, which is the WLP of the violation if the violation is only dependent on the havocing instruction. **TODO**: to make it more complete, we need to implement a WLP algorithm
* For null-pointer violation, we choose `__CPROVER_same_object(ptr, __CPROVER_loop_entry(ptr))` as the new clause. That is, the havoced pointer should points to the same object at the start of every iteration of the loop. It is a heuristic choice. This can be extended with the idea of alias analysis if needed.
* For invariant-not-preserved violation, we enumerate strengthening clauses and check that if the invariant will be inductive after strengthening (disjunction with the new clause).

The synthesizer works as follow
1. initialize the two invariant maps,
2. verify the current candidates built from the two maps,
_a. return the candidate if there is no violation
_b. synthesize a new clause to resolve the **first** violation and add it to the correct map,
3. repeat 2.
Copy link
Collaborator

@martin-cs martin-cs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qinheping Thank you for the quick and accommodating changes. I am approving this although I don't think it touches anywhere I am a maintainer any more. Thanks for all your work.

@tautschnig tautschnig merged commit 50a795e into diffblue:develop Dec 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws Bugs or features of importance to AWS CBMC users aws-high Code Contracts Function and loop contracts Synthesis Invariant synthesis
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants