Skip to content

CONTRACTS: Add enumerative loop invariant synthesizer #7393

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

qinheping
Copy link
Collaborator

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • n/a My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

Implement the functionality described below.

Motivation

This loop invariant synthesizer use the idea counter-example-guided synthesis (CEGIS) to synthesize loop invariants for programs with only checks instrumented by goto-instrument with flag --pointer-check.

This PR contain the driver of the synthesizer and the verifier we use to check invariant candidates.

Verifier

The verifier take as input a goto program with pointer checks and a map from loop id to loop invariant candidates. It first annotate and apply the loop invariant candidates into the goto model; and then simulate the CBMC api to verify the instrumented goto program. If there are some violations---loop invariants are not inductive, or some pointer checks fail---, it record valuation from trace generated by the back end to construct a formatted counterexample cext.

Counterexample

A counterexample cext record valuations of variables in the trace, and other information about the violation. The valuation we record including

  1. set of live variables upon the entry of the loop.
  2. the havoced value of all primitive-typed variables;
  3. the havoced offset and the object size of all pointer-typed variables;
  4. history values of 2 and 3.

The valuations will be used as true positive (history values) and true negative (havcoed valuation) to filter out bad invariant clause with the idea of the Daikon invariant detector in a following PR. However, in this PR we only construct the valuation but not actually use them.

Synthesizer

Loop invariants we synthesize are of the form
(in_clause || !guard) && (!guard -> pos_clause)
where in_clause and out_clause are predicates we store in two different map, and guard is the loop guard. The idea is that we separately synthesize the condition in_clause that should hold before the execution of the loop body, and condition pos_clause that should hold as post-condition of the loop.

When the violation happen in the loop, we update in_clause. When the violation happen after the loop, we update pos_clause. When the invariant candidate it not inductive, we enumerate strengthening clause to make it inductive.

To be more efficient, we choose different synthesis strategy for different type of violation

  • For out-of-boundary violation, we choose to use the violated predicate as the new clause, which is the WLP of the violation if the violation is only dependent on the havocing instruction. TODO: to make it more complete, we need to implement a WLP algorithm
  • For null-pointer violation, we choose __CPROVER_same_object(ptr, __CPROVER_loop_entry(ptr)) as the new clause. That is, the havoced pointer should points to the same object at the start of every iteration of the loop. It is a heuristic choice. This can be extended with the idea of alias analysis if needed.
  • For invariant-not-preserved violation, we enumerate strengthening clauses and check that if the invariant will be inductive after strengthening (disjunction with the new clause).

The synthesizer works as follow

  1. initialize the two invariant maps,
  2. verify the current candidates built from the two maps,
    _a. return the candidate if there is no violation
    _b. synthesize a new clause to resolve the first violation and add it to the correct map,
  3. repeat 2.

The flag synthesize-loop-invariants will also apply synthesized loop contracts.

@qinheping qinheping added aws Bugs or features of importance to AWS CBMC users aws-medium Code Contracts Function and loop contracts Synthesis Invariant synthesis labels Nov 28, 2022
@qinheping qinheping self-assigned this Nov 28, 2022
@qinheping qinheping force-pushed the loop_invariant_synthesis branch 2 times, most recently from b91b168 to 0b0eb68 Compare November 28, 2022 08:30
@tautschnig
Copy link
Collaborator

You have an extensive PR description, which is really nice. Would you mind also including that in the commit message, or at least an executive summary thereof?

Comment on lines +10 to +11
This test shows that loop invariant with form of range predicates can be correctly
synthesized for programs with only pointer checks but no other assertions.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what happens when there are additional assertions?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The synthesizer tries to solve violation one by one. If there are other violations beyond pointer checks and invariant checks, it will throw an exception saying the type of violation is unsupported. My plan is focusing on pointer checks in this PR, and create another PR to add the enumerating-and-check mode that keeps enumerating until all checks pass for other type of violations.

Comment on lines 34 to 42
loop_idt() : function_id(""), loop_number(-1)
{
}

loop_idt(const loop_idt &other)
: function_id(other.function_id), loop_number(other.loop_number)
{
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which piece of code needs these? At least the first one looks a bit dangerous and I wonder whether we might instead need to go for a bigger change where loop_number is made optionalt<unsigned int>.

Copy link
Collaborator Author

@qinheping qinheping Nov 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When computing cause loop, we want a value to indicate that no cause loop found, which means that the violation is nothing about loop invariants. Therefore, synthesizing loop invariant doesn't help in such case.
Also, now loop_number is a member variable for every instruction. However, it make no sense to have a loop number for an instruction not in a loop. So I agree that changing loop_number from unsigned int to optionalt<unsigned int> is a good idea.
I will make the change for loop_idt in this PR. And then change other loop_number (those in goto_programt::instructiont) with another PR. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do return an optionalt<loop_idt> when looking for a cause.
Please don't change goto_programt::instructiont. It's deliberate that the loop_number in the instruction is undefined unless the instruction forms a loop.

Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My apologies, my comments are a bit all-over-the-place and very surface level: My biggest concern is that we're building what could (should?) be a free-standing tool into goto-instrument. I really think that this deserves to be a tool of its own. Maybe you really want to copy #6526, which should be the bare-bones for a new tool.

@qinheping qinheping force-pushed the loop_invariant_synthesis branch 2 times, most recently from 4104af1 to 36412d2 Compare December 5, 2022 20:42
@qinheping
Copy link
Collaborator Author

My apologies, my comments are a bit all-over-the-place and very surface level: My biggest concern is that we're building what could (should?) be a free-standing tool into goto-instrument. I really think that this deserves to be a tool of its own. Maybe you really want to copy #6526, which should be the bare-bones for a new tool.

It sounds great to me. I will create another PR to initialize the new tool goto-synthesizer. At the same time, let's keep pushing this PR to be merged. Once the PR for the new tool is approved, I will migrate the loop-contract synthesizer to the new tool.

@qinheping qinheping requested a review from tautschnig December 5, 2022 20:49
Implement the functionality described below.

Motivation
---
This loop invariant synthesizer use the idea counter-example-guided synthesis (CEGIS) to synthesize loop invariants for programs with only checks instrumented by `goto-instrument` with flag `--pointer-check`.

This PR contain the driver of the synthesizer and the verifier we use to check invariant candidates.

Verifier
---
The verifier take as input a goto program with pointer checks and a map from loop id to loop invariant candidates. It first annotate and apply the loop invariant candidates into the goto model; and then simulate the CBMC api to verify the instrumented goto program. If there are some violations---loop invariants are not inductive, or some pointer checks fail---, it record valuation from trace generated by the back end to construct a formatted counterexample `cext`.

Counterexample
---
A counterexample `cext` record valuations of variables in the trace, and other information about the violation. The valuation we record including
1. set of live variables upon the entry of the loop.
2. the havoced value of all primitive-typed variables;
3. the havoced offset and the object size of all pointer-typed variables;
4. history values of 2 and 3.

The valuations will be used as true positive (history values) and true negative (havcoed valuation) to filter out bad invariant clause with the idea of the Daikon invariant detector in a following PR. However, in this PR we only construct the valuation but not actually use them.

Synthesizer
---
Loop invariants we synthesize are of the form
`` (in_clause || !guard) && (!guard -> pos_clause)``
where `in_clause` and `out_clause` are predicates we store in two different map, and `guard` is the loop guard. The idea is that we separately synthesize the condition `in_clause` that should hold before the execution of the loop body, and condition `pos_clause` that should hold as post-condition of the loop.

When the violation happen in the loop, we update `in_clause`. When the violation happen after the loop, we update `pos_clause`. When the invariant candidate it not inductive, we enumerate strengthening clause to make it inductive.

To be more efficient, we choose different synthesis strategy for different type of violation
* For out-of-boundary violation, we choose to use the violated predicate as the new clause, which is the WLP of the violation if the violation is only dependent on the havocing instruction. **TODO**: to make it more complete, we need to implement a WLP algorithm
* For null-pointer violation, we choose `__CPROVER_same_object(ptr, __CPROVER_loop_entry(ptr))` as the new clause. That is, the havoced pointer should points to the same object at the start of every iteration of the loop. It is a heuristic choice. This can be extended with the idea of alias analysis if needed.
* For invariant-not-preserved violation, we enumerate strengthening clauses and check that if the invariant will be inductive after strengthening (disjunction with the new clause).

The synthesizer works as follow
1. initialize the two invariant maps,
2. verify the current candidates built from the two maps,
_a. return the candidate if there is no violation
_b. synthesize a new clause to resolve the **first** violation and add it to the correct map,
3. repeat 2.

The flag `synthesize-loop-invariants` will also apply synthesized loop contracts.
@qinheping qinheping force-pushed the loop_invariant_synthesis branch from 36412d2 to 8fa325f Compare December 5, 2022 21:01
@qinheping qinheping force-pushed the loop_invariant_synthesis branch 2 times, most recently from 5658e34 to 0ad2852 Compare December 6, 2022 06:54
@codecov
Copy link

codecov bot commented Dec 6, 2022

Codecov Report

Base: 78.39% // Head: 78.41% // Increases project coverage by +0.01% 🎉

Coverage data is based on head (c63e328) compared to base (b03d870).
Patch coverage: 89.36% of modified lines in pull request are covered.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #7393      +/-   ##
===========================================
+ Coverage    78.39%   78.41%   +0.01%     
===========================================
  Files         1655     1657       +2     
  Lines       190281   190766     +485     
===========================================
+ Hits        149172   149590     +418     
- Misses       41109    41176      +67     
Impacted Files Coverage Δ
src/goto-instrument/contracts/utils.h 100.00% <ø> (ø)
src/goto-instrument/havoc_utils.h 100.00% <ø> (ø)
...nthesizer/enumerative_loop_invariant_synthesizer.h 100.00% <ø> (ø)
...rc/goto-instrument/synthesizer/expr_enumerator.cpp 77.55% <50.00%> (+2.02%) ⬆️
.../goto-instrument/goto_instrument_parse_options.cpp 71.44% <81.81%> (-0.04%) ⬇️
src/goto-instrument/synthesizer/cegis_verifier.cpp 84.93% <84.93%> (ø)
...hesizer/enumerative_loop_invariant_synthesizer.cpp 90.72% <91.24%> (+3.22%) ⬆️
src/analyses/dependence_graph.cpp 91.57% <100.00%> (+1.27%) ⬆️
src/analyses/dependence_graph.h 85.89% <100.00%> (+0.76%) ⬆️
src/goto-instrument/contracts/contracts.cpp 95.40% <100.00%> (+<0.01%) ⬆️
... and 29 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@qinheping qinheping force-pushed the loop_invariant_synthesis branch from 0ad2852 to 4b558d8 Compare December 6, 2022 17:28
@tautschnig
Copy link
Collaborator

It sounds great to me. I will create another PR to initialize the new tool goto-synthesizer. At the same time, let's keep pushing this PR to be merged. Once the PR for the new tool is approved, I will migrate the loop-contract synthesizer to the new tool.

Now that this has happened: can this PR directly go in the new tool? It will avoid adding a whole bunch of dependencies into goto-instrument.

@qinheping
Copy link
Collaborator Author

It sounds great to me. I will create another PR to initialize the new tool goto-synthesizer. At the same time, let's keep pushing this PR to be merged. Once the PR for the new tool is approved, I will migrate the loop-contract synthesizer to the new tool.

Now that this has happened: can this PR directly go in the new tool? It will avoid adding a whole bunch of dependencies into goto-instrument.

Agreed. I created two PRs:
#7429 for Migrating the current synthesizer interface and the expression enumerator from goto-instrument into goto-synthesizer.
#7430 that make the same changes as in this PR but in goto-synthesizer.

@qinheping qinheping closed this Dec 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws Bugs or features of importance to AWS CBMC users aws-high Code Contracts Function and loop contracts do not merge Synthesis Invariant synthesis
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants