Skip to content

regression2sv-benchmarks improvements #383

Open
@sim642

Description

@sim642

This issue is to document possible improvements and fixes to the regression2sv-benchmarks script. Not intended to be done for SV-COMP 2022 any more.

  • Support multi-property benchmarks.

    Technically this is already possible to an extent (data race and assert properties might end up in the same benchmark), but the validity of these combinations is questionable. For example, benchmarks containing data races wouldn't be valid for unreach-call at all.
    Also, our testing language doesn't allow specifying expected output for assert and race on the same line, which makes it difficult to properly label such multi-property benchmarks.

  • Support successful asserts with irrelevant comments (i.e. not indicative of any other result).

    We have some asserts, where the comment is just a human-readable explanation, but this currently makes it unacceptable for the script. This was to ensure that we don't accidentally submit untrue assertions.
    Extra checks have to be put in place to consider comments that aren't failing or unknown or ignored to be handled as succeeding.

  • Support asserts with NOWARN.

    Sometimes we do this to indicate unreachable code (i.e. Goblint shouldn't say anything about the assertion success/unknown/fail). These can be converted to reach_error() with expected unreach-call verdict being true (given compatibility with other asserts).

  • Support always true asserts.

    Sometimes we use always true asserts (e.g. assert(1)) to indicate that the location should be reachable. As an SV-COMP benchmark, it would always be correct since it never fails, but such benchmark doesn't actually require the verifier to know it's reachable.
    Instead these should be converted to reach_error() with expected unreach-call verdict being false (given compatibility with other asserts).

  • Check reachability of all normal asserts.

    Generalizing the previous point about always true asserts, in our testing language we actually require all asserts to be (may-)reachable. Therefore it would be possible to generate a special reachability variant for each assert individually, by converting it to reach_error() with expected unreach-call verdict being false.
    This requires that all of our asserts are actually reachable and aren't simply bogus checks on unreachable code (which should be explicitly NOWARN).

  • Change from 32bit to 64bit architecture?

    Currently all benchmarks are generated to have 32bit architecture, but Goblint doesn't actually care and analyzes everything with the architecture where it was compiled on (make sizes of primitive types configurable with current machine as default #54). Since we all probably use 64bit machines and some singlethreaded benchmarks do checks specific to 64bit integer ranges, 64bit would probably be more appropriate.

  • Generate all singlethreaded benchmarks.

    Currently the script is only restricted to multithreaded benchmarks. This is purely for logistical reasons:

    • We have much more singlethreaded tests and they'd have to be carefully considered for sv-benchmarks submission.
    • In sv-benchmarks the directory c/goblint-regression is part of ConcurrencySafety, so only multithreaded benchmarks should go in there. To submit singlethreaded benchmarks we would have to split the directories in sv-benchmarks first.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions