Skip to content

Conversation

@sim642
Copy link
Member

@sim642 sim642 commented Aug 14, 2024

The may_create function for our history-based thread IDs from the ESOP paper is quite conservative.
After thinking about it for a while (in comparison to is_must_parent, I noticed many cases where may_create can be more precise and incrementally implemented them here.

I haven't constructed programs where all these thread IDs and their improvements would actually show up and matter, so these may be very obscure (or even infeasible for some other reasons I didn't notice).
Also, I'm not sure if it is now optimal, possibly not yet in the "both are non-unique" case.

TODO

  • Add unit tests.
  • Add regression tests.
  • Rename the two functions.
  • Make prefix list un-reversed? Being reversed is only useful for efficient "appending", but the prefixes should be very short. Reversing those lists for various predicates (which may be called many times) might be the more overwhelming inefficiency.
  • Benchmark for (unlikely) precision improvement or performance regression.
  • From GobCon: consider over-approximated $$C$$ implications.

@sim642 sim642 added feature precision relational Relational analyses (Apron, affeq, lin2var) labels Aug 14, 2024
@michael-schwarz michael-schwarz self-requested a review August 14, 2024 17:37
@michael-schwarz
Copy link
Member

I wonder how often (if at all) any of this actually hits: As soon as the creator is not the must-parent, it's already game over:

let definitely_not_started (current, created) other =
if (not (TID.is_must_parent current other)) then
false
else
let ident_or_may_be_created creator = TID.equal creator other || TID.may_create creator other in
if ConcDomain.ThreadSet.is_top created then

@sim642
Copy link
Member Author

sim642 commented Aug 14, 2024

Indeed, since those are our only usages of must_be_parent and may_create, they might not and I didn't consider that so far.
Although there we require current to be must parent, but creator is checked for may_create. The naming in the code is maybe a bit confusing, because creator (from $$C$$) is created by current.

I did a test run with coverage locally and (even to by surprise) all lines and branches are covered. But the actual usefulness of this will become clear when I try to construct regression test programs around these hypothetical improvements.

@sim642 sim642 added the performance Analysis time, memory usage label Aug 15, 2024
@sim642
Copy link
Member Author

sim642 commented Aug 15, 2024

I managed to simplify and optimize it to something reasonable by using a simple (but efficient) GobList.remove_common_prefix function. The old way of checking of p is a prefix of p' was

P.equal p (P.common_suffix p p')

which is very roundabout: it constructs the common prefix (via list reversals) just to check for equality (which goes through the entire list again).

In many cases the check

S.subset (S.union (S.of_list p) s) (S.union (S.of_list p') s')

does capture everything that's needed, but it's far from obvious that it's all that's needed (for optimality).
By using information from checks (e.g. s is empty, or p is prefix of p' or vice versa) and well-formedness (p has unique elements, and p and s are disjoint), a lot less needs to be actually checked – in some cases nothing at all!


I've now realized that the naming of the two functions is also a bit weird. I would rename:

  1. is_must_parentmust_be_ancestor because in standard terminology parent means immediate,
  2. may_createmay_be_ancestor to make naming uniform.

@sim642 sim642 force-pushed the threadid-history-may_create branch from aa3fb6d to cc7a76a Compare August 15, 2024 12:54
@sim642
Copy link
Member Author

sim642 commented Aug 15, 2024

I went through all the unit tests for the improved cases and tried to construct a real program for each one. Five of them have a corresponding regression test now that also checks for the improved precision from this. For others I documented why a real program for data race regression testing at least isn't possible.
There may be a use for may_create in the future that doesn't guarantee is_must_parent checks before it. And even for the time being, the change to may_create cannot even harm performance because in those cases the new logic isn't used anyway.

@sim642 sim642 marked this pull request as ready for review August 15, 2024 13:00
@sim642 sim642 self-assigned this Aug 20, 2024
@sim642
Copy link
Member Author

sim642 commented Aug 21, 2024

A before vs after benchmark run on SV-COMP ConcurrencySafety with 60s timeout gives zero verdict differences and performance differences are also just within measurement noise.

As to the over-approximated $$C$$ sets, I don't see how it would matter: this predicate even doesn't use it. The only place in thread spawning where it does matter is thread creation in a loop, where main could directly create main, {foo} (not just main, foo). But as these thread IDs, main, {foo} could equivalently be created by main, foo recursively. Since both predicates are about ancestors (not immediate parents), this shouldn't matter. main is considered both must and may ancestor of main, {foo}.

@michael-schwarz
Copy link
Member

I saw you have a pending thesis on verifying the correctness of this. Do we want to wait for this or should I attempt to review it such that we can then merge it?

@sim642
Copy link
Member Author

sim642 commented Dec 16, 2024

I offered a thesis topic on formally verifying this but there's no student working on it (and I'm not sure if there ever will be), so we shouldn't wait for that. I've left a lot of comments for the various cases along with tests, so I hope that's sufficiently clear and convincing that this should be right.

Copy link
Member

@michael-schwarz michael-schwarz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my POV this is good to go when the three remarks about comments are addressed, and the failing CI is fixed.

@sim642
Copy link
Member Author

sim642 commented Jan 23, 2025

I added some more explanations to the three comments. I wouldn't want to outright remove them because then it's unclear why something might appear missing.

@sim642 sim642 added this to the v2.6.0 milestone Jan 27, 2025
@sim642 sim642 merged commit bce9f92 into master Jan 27, 2025
21 checks passed
@sim642 sim642 deleted the threadid-history-may_create branch January 27, 2025 08:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature performance Analysis time, memory usage precision relational Relational analyses (Apron, affeq, lin2var)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants