s2n_record_writev standalone example #7357

nwetzler · 2022-11-16T23:15:29Z

DO NOT MERGE.

This is a standalone example of a problem encountered with function contracts during investigation of s2n-tls API-level functions. Licensing should be resolved.

We need to merge #7395 first.

Each commit message has a non-empty body, explaining why the change was made.
Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
My commit message includes data points confirming performance improvements (if claimed).
My PR is restricted to a single feature or bugfix.
White-space or formatting changes outside the feature-related changed lines are in commits of their own.

jimgrundy

Can we also have a README saying that this is a contract/symex torture test that currently leads to a massive blowup in memory.

regression/contracts/s2n_record_writev/s2n_record_writev.c

jimgrundy · 2022-11-16T23:26:02Z

@TGWDB This is a torture test that exposes a SymEx memory blowup. It is a stripped down problem from the s2n-tls (https://github.com/aws/s2n-tls), which is licensed under Apache 2.0. Can we include an Apache licensed file in CBMC -- it's not something that becomes part of the executable, it's just a regression test.

codecov · 2022-11-17T00:29:57Z

Codecov Report

Base: 78.37% // Head: 78.38% // Increases project coverage by +0.00% 🎉

Coverage data is based on head (5357db4) compared to base (941030e).
Patch coverage: 94.26% of modified lines in pull request are covered.

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #7357   +/-   ##
========================================
  Coverage    78.37%   78.38%           
========================================
  Files         1647     1647           
  Lines       190328   190362   +34     
========================================
+ Hits        149172   149213   +41     
+ Misses       41156    41149    -7

Impacted Files	Coverage Δ
src/analyses/constant_propagator.h	`82.35% <ø> (ø)`
src/ansi-c/expr2c.cpp	`67.66% <ø> (+0.42%)`	⬆️
src/goto-instrument/havoc_loops.cpp	`0.00% <0.00%> (ø)`
src/util/pointer_predicates.cpp	`95.12% <ø> (+2.39%)`	⬆️
src/util/pointer_predicates.h	`100.00% <ø> (ø)`
src/util/simplify_expr_class.h	`90.47% <ø> (ø)`
src/util/simplify_expr_pointer.cpp	`86.58% <ø> (+0.75%)`	⬆️
src/analyses/constant_propagator.cpp	`95.46% <88.88%> (ø)`
src/util/expr_util.cpp	`90.55% <95.83%> (+0.81%)`	⬆️
src/ansi-c/c_typecheck_expr.cpp	`75.60% <100.00%> (ø)`
... and 24 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

TGWDB · 2022-11-17T16:05:24Z

@TGWDB This is a torture test that exposes a SymEx memory blowup. It is a stripped down problem from the s2n-tls (https://github.com/aws/s2n-tls), which is licensed under Apache 2.0. Can we include an Apache licensed file in CBMC -- it's not something that becomes part of the executable, it's just a regression test.

My interpretation of the Apache 2.0 licence is that this is a derivative work and thus needs to carry appropriate licence information. Practically, I think including the Apache 2.0 licence text for the s2n-tls project as comment in the source code for the regression test and clearly stating this source code is a derivative of the originating work would meet this standard.

An example doing this elsewhere in the codebase can be found here: https://github.com/diffblue/cbmc/blob/48893287099cb5780302fe9dc415eb6888354fd6/jbmc/regression/jbmc/swap2/org/springframework/build/gradle/MergePlugin.groovy

jimgrundy · 2022-11-17T22:14:10Z

@nwetzler - can you add the Apache copyright header at the top of the s2n_record_writev.c file ... you'll find the text at the top of every C source file in s2n-tls.

Also, add a NOTICE file to this directory. The Apache license expects this. You can pretty much take the NOTICE file form s2n-tls.

Resolve the check-clang-format issue.

Then check in with Micheal to check that this regression will be run at the correct time (which I presume is default never, but perhaps when some specific long-running tests are selected). Also, ask Michael how to mark it as expected to fail, because if run to completion he does report fails and if those are expected we don't want folks to investigate.

nwetzler · 2022-11-18T00:31:49Z

Can we also have a README saying that this is a contract/symex torture test that currently leads to a massive blowup in memory.

Done.

nwetzler · 2022-11-18T23:41:04Z

@nwetzler - can you add the Apache copyright header at the top of the s2n_record_writev.c file ... you'll find the text at the top of every C source file in s2n-tls.

Done.

Also, add a NOTICE file to this directory. The Apache license expects this. You can pretty much take the NOTICE file form s2n-tls.

Done.

Resolve the check-clang-format issue.

Done.

Then check in with Micheal to check that this regression will be run at the correct time (which I presume is default never, but perhaps when some specific long-running tests are selected). Also, ask Michael how to mark it as expected to fail, because if run to completion he does report fails and if those are expected we don't want folks to investigate.

@tautschnig Do you know the what we want to do with this in regression?

tautschnig · 2022-11-19T23:12:55Z

Then check in with Micheal to check that this regression will be run at the correct time (which I presume is default never, but perhaps when some specific long-running tests are selected). Also, ask Michael how to mark it as expected to fail, because if run to completion he does report fails and if those are expected we don't want folks to investigate.

@tautschnig Do you know the what we want to do with this in regression?

We could tag it as THOROUGH, but this will likely mean that the CI job running THOROUGH tests will now fail. So it would be best to mark it FUTURE, which is the only set of tests we don't actually run at this time.

My take on this test is that we need to either keep this PR open for the foreseeable future or merge it as is an create an accompanying issue for someone to diagnose what is actually going on here.

jimgrundy · 2022-11-21T20:58:32Z

I would like to:

Remove the "do not merge tag"
Tag this as "FUTURE" so doesn't run by default in CI (or even in thorough mode)
Merge it
Create an issue to debug this referencing this test and how to run it

nwetzler · 2022-11-22T00:13:59Z

Then check in with Micheal to check that this regression will be run at the correct time (which I presume is default never, but perhaps when some specific long-running tests are selected). Also, ask Michael how to mark it as expected to fail, because if run to completion he does report fails and if those are expected we don't want folks to investigate.

@tautschnig Do you know the what we want to do with this in regression?

We could tag it as THOROUGH, but this will likely mean that the CI job running THOROUGH tests will now fail. So it would be best to mark it FUTURE, which is the only set of tests we don't actually run at this time.

My take on this test is that we need to either keep this PR open for the foreseeable future or merge it as is an create an accompanying issue for someone to diagnose what is actually going on here.

@tautschnig How do we tag this as FUTURE if the examples are self-contained with their own makefiles?

nwetzler · 2022-11-22T00:16:32Z

I've added a new subfolder containing a new minimized example that Remi constructed. It completely removes function contracts from the problem and now reaches an error state much sooner.

In both examples now (the original and the minimized), we're having a memory out during conversion to SAT after producing SSA.

jimgrundy

Should this now be two separate regressions?

jimgrundy · 2022-11-25T17:37:57Z

regression/contracts/s2n_record_writev/minimized/README

+
+make veryclean && make result
+
+This creates a large problem during symbolic execution and solving.


Is this an accurate description of the minimized version? Doesn't seem right.

jimgrundy · 2022-11-25T17:38:15Z

regression/contracts/s2n_record_writev/minimized/README

+
+This creates a large problem during symbolic execution and solving.
+
+This test is currently not included in the any regression.


This one could be, right, because it's not slow, just bad?

jimgrundy · 2022-11-25T17:39:09Z

regression/contracts/s2n_record_writev/minimized/s2n_record_writev.c

@@ -0,0 +1,722 @@
+/*


Do we still need this in the minimized version?

tautschnig · 2022-11-25T18:45:52Z

I've added a new subfolder containing a new minimized example that Remi constructed. It completely removes function contracts from the problem and now reaches an error state much sooner.

Does this make it possible to just add it to regular regression tests? We could then make use of the tags (KNOWNBUG or FUTURE).

The minimized example from diffblue#7357 resulted in an invariant failure in `solvers/flattening/boolbv_add_sub.cpp`, reporting "add/sub with mixed types." This was caused by simplifying nested byte-extract operations where one used unsigned offsets and the other one signed. Since we do not required a particular type for byte-extract offsets we must cope with different offset types when folding nested byte-extract operations into a single one.

This avoids repeatedly visiting already-simplified operands. On the example from diffblue#7357 this reduces symex time from 1172 seconds to 922 seconds.

This is a performance-improving refactoring: We only need the result of a `has_subtype` call under conditions that are infrequently met. On the benchmark of diffblue#7357, this avoids 443214 calls of `has_subtype`, which previously was the most costly part of `simplify_byte_extract`.

When `get_value_set_rec` discovers a nondet symbol it will consider the pointer pointing to any of the known objects (as of 3789670). It suffices to do this once for each run of `get_value_set`, even when multiple nondet symbols are encountered while traversing an expression. This reduces the symex time on the test of diffblue#7357 from 930 seconds to 404 seconds.

This avoids repeatedly visiting already-simplified operands. On the example from diffblue#7357 this reduces symex time from 1172 seconds to 922 seconds.

tautschnig · 2023-05-19T11:23:26Z

See #7378 (comment) for an analysis of the performance problems surfaced by this test. Next steps are to get #7716 and #7395 merged.

When `get_value_set_rec` discovers a nondet symbol it will consider the pointer pointing to any of the known objects (as of 3789670). It suffices to do this once for each run of `get_value_set`, even when multiple nondet symbols are encountered while traversing an expression. This reduces the symex time on the test of diffblue#7357 from 930 seconds to 404 seconds.

tautschnig · 2023-09-18T11:33:46Z

With 6f3c16c this example now works as expected, although memory consumption likely remains too high to do these tests on a GitHub runner.

Minimized example:

User time (seconds): 109.80
System time (seconds): 5.36
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:55.17
Maximum resident set size (kbytes): 8634460
Exit status: 10

Full test:

User time (seconds): 705.16
System time (seconds): 53.04
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 12:38.28
Maximum resident set size (kbytes): 90213984
Exit status: 10

I tried to run the minimized example on revision d60295d (which is the base that this PR is built on top of), but had to abort the test after 25 minutes (during post-processing) when memory consumption exceeded 512 GB of RAM.

Given the memory consumption, I am wondering whether it makes sense to add the minimized variant as a THOROUGH test, but I am concerned that we still wouldn't be able to run it on any system other GitHub's MacOS runners (which feature 14 GB of memory).

This avoids repeatedly visiting already-simplified operands. On the example from diffblue#7357 this reduces symex time from 1172 seconds to 922 seconds.

nwetzler added do not merge aws Bugs or features of importance to AWS CBMC users Code Contracts Function and loop contracts labels Nov 16, 2022

nwetzler self-assigned this Nov 16, 2022

nwetzler force-pushed the s2n_record_writev branch from c6e9542 to a3f2acc Compare November 16, 2022 23:18

jimgrundy reviewed Nov 16, 2022

View reviewed changes

regression/contracts/s2n_record_writev/s2n_record_writev.c Outdated Show resolved Hide resolved

jimgrundy assigned TGWDB Nov 16, 2022

jimgrundy assigned tautschnig and unassigned TGWDB Nov 17, 2022

jimgrundy self-requested a review November 21, 2022 20:59

jimgrundy approved these changes Nov 21, 2022

View reviewed changes

nwetzler force-pushed the s2n_record_writev branch from 8c73988 to 97638f2 Compare November 22, 2022 00:11

jimgrundy reviewed Nov 25, 2022

View reviewed changes

s2n_record_writev standalone example

5357db4

nwetzler force-pushed the s2n_record_writev branch from 97638f2 to 5357db4 Compare November 29, 2022 17:02

tautschnig mentioned this pull request May 9, 2023

Simplify byte extract: support signed and unsigned offsets #7705

Merged

3 tasks

tautschnig mentioned this pull request May 15, 2023

simplify_byte_extract: avoid unnecessary calls to has_subtype #7712

Merged

4 tasks

tautschnig mentioned this pull request May 16, 2023

Value sets: make pointers nondet only once #7716

Merged

4 tasks

tautschnig removed their assignment Sep 18, 2023


		make veryclean && make result

		This creates a large problem during symbolic execution and solving.


		This creates a large problem during symbolic execution and solving.

		This test is currently not included in the any regression.

s2n_record_writev standalone example #7357

Are you sure you want to change the base?

s2n_record_writev standalone example #7357

Uh oh!

Conversation

nwetzler commented Nov 16, 2022 • edited by feliperodri Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jimgrundy left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jimgrundy commented Nov 16, 2022

Uh oh!

codecov bot commented Nov 17, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

TGWDB commented Nov 17, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jimgrundy commented Nov 17, 2022

Uh oh!

nwetzler commented Nov 18, 2022

Uh oh!

nwetzler commented Nov 18, 2022

Uh oh!

tautschnig commented Nov 19, 2022

Uh oh!

jimgrundy commented Nov 21, 2022

Uh oh!

nwetzler commented Nov 22, 2022

Uh oh!

nwetzler commented Nov 22, 2022

Uh oh!

jimgrundy left a comment

Choose a reason for hiding this comment

Uh oh!

jimgrundy Nov 25, 2022

Choose a reason for hiding this comment

Uh oh!

jimgrundy Nov 25, 2022

Choose a reason for hiding this comment

Uh oh!

jimgrundy Nov 25, 2022

Choose a reason for hiding this comment

Uh oh!

tautschnig commented Nov 25, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tautschnig commented May 19, 2023

Uh oh!

tautschnig commented Sep 18, 2023

Uh oh!

Uh oh!

nwetzler commented Nov 16, 2022 •

edited by feliperodri

Loading

jimgrundy left a comment •

edited

Loading

codecov bot commented Nov 17, 2022 •

edited

Loading

TGWDB commented Nov 17, 2022 •

edited

Loading

tautschnig commented Nov 25, 2022 •

edited

Loading