Skip to content

Unexpected assertion result with arrays of __CPROVER_bitvector #7104

Closed
@brenoguim

Description

@brenoguim

CBMC version: 5.65.1 (cbmc-5.65.1) 64-bit x86_64 linux
Operating system: RHEL7
Exact command line resulting in the issue:

main.c

struct Foo1 { __CPROVER_bitvector[1] m_array[2]; };
struct Foo8 { __CPROVER_bitvector[8] m_array[2]; };

int main(void)
{
    struct Foo1 f1;
    struct Foo8 f8;

    __CPROVER_assert(f1.m_array[1] == *(f1.m_array + 1), "");
    __CPROVER_assert(f8.m_array[1] == *(f8.m_array + 1), "");
    return 0;
}

$ ./cbmc main.c
CBMC version 5.65.1 (cbmc-5.65.1) 64-bit x86_64 linux
Parsing main.c
Converting
Type-checking main
Generating GOTO Program
Adding CPROVER library (x86_64)
Removal of function pointers and virtual functions
Generic Property Instrumentation
Running with 8 object bits, 56 offset bits (default)
Starting Bounded Model Checking
Runtime Symex: 0.00162542s
size of program expression: 40 steps
simple slicing removed 5 assignments
Generated 2 VCC(s), 1 remaining after simplification
Runtime Postprocess Equation: 0.000118281s
Passing problem to propositional reduction
converting SSA
Runtime Convert SSA: 0.000258398s
Running propositional reduction
Post-processing
Runtime Post-process: 6.0269e-05s
Solving with MiniSAT 2.2.1 with simplifier
62 variables, 37 clauses
SAT checker: instance is SATISFIABLE
Runtime Solver: 0.000188411s
Runtime decision procedure: 0.000598905s

** Results:
main.c function main
[main.assertion.1] line 9 assertion: FAILURE
[main.assertion.2] line 10 assertion: SUCCESS

** 1 of 2 failed (2 iterations)
VERIFICATION FAILED

What behaviour did you expect:
Both assertions to pass

What happened instead:
Assertion on line 9 failed.


Since I investigated this issue, I'll leave my analysis:

To my understanding, the issue is in the *(f1.m_array + 1) expression. When CBMC tries to dereference this expression, it eventually reaches value_set_dereference.cpp:627:

// try to build a member/index expression - do not use byte_extract
auto subexpr = get_subexpression_at_offset(
root_object_subexpression, o.offset(), dereference_type, ns);

At this point, it receives an expression to denote the member of Foo1 that is being dereferenced.
In the good case (Foo8), it receives a nice indexexpression, while in the bad case, it receives a byte_extract_little_endian.

The reason for this difference, I believe, is on a semantic mismatch of the layout of an array of bitvectors.

Diving in get_subexpression_at_offset leads us to pointer_offset_size.cpp:596, where we look for the member that matches the given offset inside the struct:

if(source_type.id()==ID_struct)
{
const struct_typet &struct_type = to_struct_type(source_type);
mp_integer m_offset_bits = 0;
for(const auto &component : struct_type.components())
{
const auto m_size_bits = pointer_offset_bits(component.type(), ns);
if(!m_size_bits.has_value())
return {};
// if this member completely contains the target, and this member is
// byte-aligned, recurse into it
if(
offset_bytes * 8 >= m_offset_bits && m_offset_bits % 8 == 0 &&
offset_bytes * 8 + *target_size_bits <= m_offset_bits + *m_size_bits)
{
const member_exprt member(expr, component.get_name(), component.type());
return get_subexpression_at_offset(
member, offset_bytes - m_offset_bits / 8, target_type_raw, ns);
}
m_offset_bits += *m_size_bits;
}

This is done by iterating the members of the struct, tracking their offset and their size. Then it's possible to calculate if the offset we are looking for is inside such member. If we find it, we recurse into that member.

In the line pointer_offset_size.cpp:603, for the "good" case, we ask for the size (in bits) of the __CPROVER_bitvector[8] [2] the answer is 16 bits. That's perfect.

However, on the "bad" case, we ask for the size of __CPROVER_bitvector[1] [2] and it answers 2 bits.
Of course, since I'm looking for the second element, my offset is 1 byte which is beyond the 2 bits of size claimed by the array.

This code fails to find the appropriate member, and fallsback to the byte_extract_little_endian. And that is still wrong because it extracts the second byte in the struct, which isn't where the element is (according to the semantics above), hence the property fails.

There seems to exist a semantic mistmatch between C language and the bitvector extension.

In C, there is no way an array with two elements takes only 2 bits. Since you can take the address of each element individually, and these addresses must be different, they must be at least one byte apart.

I understand this is not pure C, so there is room for custom semantics, but as shown in the testcase, these semantics can cause very unexpected results.

I thought about a few possible ways to solve this. This problem is generally fixed by introducing paddings. For example, bitfields inside of structs will trigger the insertion of padding bits to round the next element to a byte-aligned position:

  1. Add an "element padding" attribute on the array. This would allow the usage of arrays with the current semantics in other places, but the C frontend would create them with paddings. It would require review of all places using array types though.

  2. Create __CPROVER_aligned_bitvector types that have internal paddings to allow safe usage in C. All operations with these types would just forward to the internal __CPROVER_bitvector type. Then it's up to the user to choose.

  3. Change the semantic of __CPROVER_bitvector such that that their size is always rounded up. I guess this is fine if field-sensitivity kicks in, but can be wasteful if not.

Let me know if you need any more information, or I can help with anything.

Metadata

Metadata

Assignees

Labels

More info neededawsBugs or features of importance to AWS CBMC userssoundnessSoundness bug? Review and add "aws" if it is, or remove "soundness" if it isn't.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions