Skip to content

s2n_record_writev standalone example #7357

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

nwetzler
Copy link

@nwetzler nwetzler commented Nov 16, 2022

DO NOT MERGE.

This is a standalone example of a problem encountered with function contracts during investigation of s2n-tls API-level functions. Licensing should be resolved.

We need to merge #7395 first.

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@nwetzler nwetzler added do not merge aws Bugs or features of importance to AWS CBMC users Code Contracts Function and loop contracts labels Nov 16, 2022
@nwetzler nwetzler self-assigned this Nov 16, 2022
Copy link
Collaborator

@jimgrundy jimgrundy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also have a README saying that this is a contract/symex torture test that currently leads to a massive blowup in memory.

@jimgrundy
Copy link
Collaborator

@TGWDB This is a torture test that exposes a SymEx memory blowup. It is a stripped down problem from the s2n-tls (https://github.com/aws/s2n-tls), which is licensed under Apache 2.0. Can we include an Apache licensed file in CBMC -- it's not something that becomes part of the executable, it's just a regression test.

@codecov
Copy link

codecov bot commented Nov 17, 2022

Codecov Report

Base: 78.37% // Head: 78.38% // Increases project coverage by +0.00% 🎉

Coverage data is based on head (5357db4) compared to base (941030e).
Patch coverage: 94.26% of modified lines in pull request are covered.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #7357   +/-   ##
========================================
  Coverage    78.37%   78.38%           
========================================
  Files         1647     1647           
  Lines       190328   190362   +34     
========================================
+ Hits        149172   149213   +41     
+ Misses       41156    41149    -7     
Impacted Files Coverage Δ
src/analyses/constant_propagator.h 82.35% <ø> (ø)
src/ansi-c/expr2c.cpp 67.66% <ø> (+0.42%) ⬆️
src/goto-instrument/havoc_loops.cpp 0.00% <0.00%> (ø)
src/util/pointer_predicates.cpp 95.12% <ø> (+2.39%) ⬆️
src/util/pointer_predicates.h 100.00% <ø> (ø)
src/util/simplify_expr_class.h 90.47% <ø> (ø)
src/util/simplify_expr_pointer.cpp 86.58% <ø> (+0.75%) ⬆️
src/analyses/constant_propagator.cpp 95.46% <88.88%> (ø)
src/util/expr_util.cpp 90.55% <95.83%> (+0.81%) ⬆️
src/ansi-c/c_typecheck_expr.cpp 75.60% <100.00%> (ø)
... and 24 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@TGWDB
Copy link
Contributor

TGWDB commented Nov 17, 2022

@TGWDB This is a torture test that exposes a SymEx memory blowup. It is a stripped down problem from the s2n-tls (https://github.com/aws/s2n-tls), which is licensed under Apache 2.0. Can we include an Apache licensed file in CBMC -- it's not something that becomes part of the executable, it's just a regression test.

My interpretation of the Apache 2.0 licence is that this is a derivative work and thus needs to carry appropriate licence information. Practically, I think including the Apache 2.0 licence text for the s2n-tls project as comment in the source code for the regression test and clearly stating this source code is a derivative of the originating work would meet this standard.

An example doing this elsewhere in the codebase can be found here: https://github.com/diffblue/cbmc/blob/48893287099cb5780302fe9dc415eb6888354fd6/jbmc/regression/jbmc/swap2/org/springframework/build/gradle/MergePlugin.groovy

@jimgrundy
Copy link
Collaborator

@nwetzler - can you add the Apache copyright header at the top of the s2n_record_writev.c file ... you'll find the text at the top of every C source file in s2n-tls.

Also, add a NOTICE file to this directory. The Apache license expects this. You can pretty much take the NOTICE file form s2n-tls.

Resolve the check-clang-format issue.

Then check in with Micheal to check that this regression will be run at the correct time (which I presume is default never, but perhaps when some specific long-running tests are selected). Also, ask Michael how to mark it as expected to fail, because if run to completion he does report fails and if those are expected we don't want folks to investigate.

@jimgrundy jimgrundy assigned tautschnig and unassigned TGWDB Nov 17, 2022
@nwetzler
Copy link
Author

Can we also have a README saying that this is a contract/symex torture test that currently leads to a massive blowup in memory.

Done.

@nwetzler
Copy link
Author

@nwetzler - can you add the Apache copyright header at the top of the s2n_record_writev.c file ... you'll find the text at the top of every C source file in s2n-tls.

Done.

Also, add a NOTICE file to this directory. The Apache license expects this. You can pretty much take the NOTICE file form s2n-tls.

Done.

Resolve the check-clang-format issue.

Done.

Then check in with Micheal to check that this regression will be run at the correct time (which I presume is default never, but perhaps when some specific long-running tests are selected). Also, ask Michael how to mark it as expected to fail, because if run to completion he does report fails and if those are expected we don't want folks to investigate.

@tautschnig Do you know the what we want to do with this in regression?

@tautschnig
Copy link
Collaborator

Then check in with Micheal to check that this regression will be run at the correct time (which I presume is default never, but perhaps when some specific long-running tests are selected). Also, ask Michael how to mark it as expected to fail, because if run to completion he does report fails and if those are expected we don't want folks to investigate.

@tautschnig Do you know the what we want to do with this in regression?

We could tag it as THOROUGH, but this will likely mean that the CI job running THOROUGH tests will now fail. So it would be best to mark it FUTURE, which is the only set of tests we don't actually run at this time.

My take on this test is that we need to either keep this PR open for the foreseeable future or merge it as is an create an accompanying issue for someone to diagnose what is actually going on here.

@jimgrundy
Copy link
Collaborator

I would like to:

  • Remove the "do not merge tag"
  • Tag this as "FUTURE" so doesn't run by default in CI (or even in thorough mode)
  • Merge it
  • Create an issue to debug this referencing this test and how to run it

@jimgrundy jimgrundy self-requested a review November 21, 2022 20:59
@nwetzler
Copy link
Author

Then check in with Micheal to check that this regression will be run at the correct time (which I presume is default never, but perhaps when some specific long-running tests are selected). Also, ask Michael how to mark it as expected to fail, because if run to completion he does report fails and if those are expected we don't want folks to investigate.

@tautschnig Do you know the what we want to do with this in regression?

We could tag it as THOROUGH, but this will likely mean that the CI job running THOROUGH tests will now fail. So it would be best to mark it FUTURE, which is the only set of tests we don't actually run at this time.

My take on this test is that we need to either keep this PR open for the foreseeable future or merge it as is an create an accompanying issue for someone to diagnose what is actually going on here.

@tautschnig How do we tag this as FUTURE if the examples are self-contained with their own makefiles?

@nwetzler
Copy link
Author

I've added a new subfolder containing a new minimized example that Remi constructed. It completely removes function contracts from the problem and now reaches an error state much sooner.

In both examples now (the original and the minimized), we're having a memory out during conversion to SAT after producing SSA.

Copy link
Collaborator

@jimgrundy jimgrundy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this now be two separate regressions?


make veryclean && make result

This creates a large problem during symbolic execution and solving.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an accurate description of the minimized version? Doesn't seem right.


This creates a large problem during symbolic execution and solving.

This test is currently not included in the any regression.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one could be, right, because it's not slow, just bad?

@@ -0,0 +1,722 @@
/*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this in the minimized version?

@tautschnig
Copy link
Collaborator

tautschnig commented Nov 25, 2022

I've added a new subfolder containing a new minimized example that Remi constructed. It completely removes function contracts from the problem and now reaches an error state much sooner.

Does this make it possible to just add it to regular regression tests? We could then make use of the tags (KNOWNBUG or FUTURE).

tautschnig added a commit to tautschnig/cbmc that referenced this pull request May 9, 2023
The minimized example from diffblue#7357 resulted in an invariant failure in
`solvers/flattening/boolbv_add_sub.cpp`, reporting "add/sub with mixed
types." This was caused by simplifying nested byte-extract operations
where one used unsigned offsets and the other one signed. Since we do
not required a particular type for byte-extract offsets we must cope
with different offset types when folding nested byte-extract operations
into a single one.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request May 10, 2023
This avoids repeatedly visiting already-simplified operands. On the
example from diffblue#7357 this reduces symex time from 1172 seconds to 922
seconds.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request May 15, 2023
This is a performance-improving refactoring: We only need the result of
a `has_subtype` call under conditions that are infrequently met. On the
benchmark of diffblue#7357, this avoids 443214 calls of `has_subtype`, which
previously was the most costly part of `simplify_byte_extract`.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request May 15, 2023
This is a performance-improving refactoring: We only need the result of
a `has_subtype` call under conditions that are infrequently met. On the
benchmark of diffblue#7357, this avoids 443214 calls of `has_subtype`, which
previously was the most costly part of `simplify_byte_extract`.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request May 16, 2023
When `get_value_set_rec` discovers a nondet symbol it will consider the
pointer pointing to any of the known objects (as of 3789670). It
suffices to do this once for each run of `get_value_set`, even when
multiple nondet symbols are encountered while traversing an expression.

This reduces the symex time on the test of diffblue#7357 from 930 seconds to 404
seconds.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request May 18, 2023
This avoids repeatedly visiting already-simplified operands. On the
example from diffblue#7357 this reduces symex time from 1172 seconds to 922
seconds.
@tautschnig
Copy link
Collaborator

See #7378 (comment) for an analysis of the performance problems surfaced by this test. Next steps are to get #7716 and #7395 merged.

tautschnig added a commit to tautschnig/cbmc that referenced this pull request May 19, 2023
When `get_value_set_rec` discovers a nondet symbol it will consider the
pointer pointing to any of the known objects (as of 3789670). It
suffices to do this once for each run of `get_value_set`, even when
multiple nondet symbols are encountered while traversing an expression.

This reduces the symex time on the test of diffblue#7357 from 930 seconds to 404
seconds.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jul 19, 2023
When `get_value_set_rec` discovers a nondet symbol it will consider the
pointer pointing to any of the known objects (as of 3789670). It
suffices to do this once for each run of `get_value_set`, even when
multiple nondet symbols are encountered while traversing an expression.

This reduces the symex time on the test of diffblue#7357 from 930 seconds to 404
seconds.
@tautschnig
Copy link
Collaborator

With 6f3c16c this example now works as expected, although memory consumption likely remains too high to do these tests on a GitHub runner.

  • Minimized example:
User time (seconds): 109.80
System time (seconds): 5.36
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:55.17
Maximum resident set size (kbytes): 8634460
Exit status: 10
  • Full test:
User time (seconds): 705.16
System time (seconds): 53.04
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 12:38.28
Maximum resident set size (kbytes): 90213984
Exit status: 10

I tried to run the minimized example on revision d60295d (which is the base that this PR is built on top of), but had to abort the test after 25 minutes (during post-processing) when memory consumption exceeded 512 GB of RAM.

Given the memory consumption, I am wondering whether it makes sense to add the minimized variant as a THOROUGH test, but I am concerned that we still wouldn't be able to run it on any system other GitHub's MacOS runners (which feature 14 GB of memory).

@tautschnig tautschnig removed their assignment Sep 18, 2023
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Sep 18, 2023
This avoids repeatedly visiting already-simplified operands. On the
example from diffblue#7357 this reduces symex time from 1172 seconds to 922
seconds.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Nov 6, 2023
This avoids repeatedly visiting already-simplified operands. On the
example from diffblue#7357 this reduces symex time from 1172 seconds to 922
seconds.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws Bugs or features of importance to AWS CBMC users Code Contracts Function and loop contracts do not merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants