regression2sv-benchmarks improvements

This issue is to document possible improvements and fixes to the regression2sv-benchmarks script. Not intended to be done for SV-COMP 2022 any more.

- [ ] Support multi-property benchmarks.

    Technically this is already possible to an extent (data race and assert properties might end up in the same benchmark), but the validity of these combinations is questionable. For example, benchmarks containing data races wouldn't be valid for unreach-call at all.
    Also, our testing language doesn't allow specifying expected output for assert and race on the same line, which makes it difficult to properly label such multi-property benchmarks.

- [ ] Support successful asserts with irrelevant comments (i.e. not indicative of any other result).

    We have some asserts, where the comment is just a human-readable explanation, but this currently makes it unacceptable for the script. This was to ensure that we don't accidentally submit untrue assertions.
    Extra checks have to be put in place to consider comments that aren't failing or unknown or ignored to be handled as succeeding.

- [ ] Support asserts with `NOWARN`.

    Sometimes we do this to indicate unreachable code (i.e. Goblint shouldn't say anything about the assertion success/unknown/fail). These can be converted to `reach_error()` with expected `unreach-call` verdict being `true` (given compatibility with other asserts).

- [ ] Support always true asserts.

    Sometimes we use always true asserts (e.g. `assert(1)`) to indicate that the location should be reachable. As an SV-COMP benchmark, it would always be correct since it never fails, but such benchmark doesn't actually require the verifier to know it's reachable.
    Instead these should be converted to `reach_error()` with expected `unreach-call` verdict being `false` (given compatibility with other asserts).

- [ ] Check reachability of all normal asserts.

    Generalizing the previous point about always true asserts, in our testing language we actually require all asserts to be (may-)reachable. Therefore it would be possible to generate a special reachability variant for each assert individually, by converting it to `reach_error()` with expected `unreach-call` verdict being `false`.
    This requires that all of our asserts are actually reachable and aren't simply bogus checks on unreachable code (which should be explicitly `NOWARN`).

- [ ] Change from 32bit to 64bit architecture?

    Currently all benchmarks are generated to have 32bit architecture, but Goblint doesn't actually care and analyzes everything with the architecture where it was compiled on (#54). Since we all probably use 64bit machines and some singlethreaded benchmarks do checks specific to 64bit integer ranges, 64bit would probably be more appropriate.

- [ ] Generate all singlethreaded benchmarks.

    Currently the script is only restricted to multithreaded benchmarks. This is purely for logistical reasons:
    * We have much more singlethreaded tests and they'd have to be carefully considered for sv-benchmarks submission.
    * In sv-benchmarks the directory `c/goblint-regression` is part of ConcurrencySafety, so only multithreaded benchmarks should go in there. To submit singlethreaded benchmarks we would have to split the directories in sv-benchmarks first.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

regression2sv-benchmarks improvements #383

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

regression2sv-benchmarks improvements #383

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions