- 
          
 - 
                Notifications
    
You must be signed in to change notification settings  - Fork 33.3k
 
GH-132657: Add lock-free set contains implementation #132290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This makes for longer code vs using the custom LOAD_*/STORE_* macros. However, I think this makes the code more clear.
| 
           🤖 New build scheduled with the buildbot fleet by @nascheme for commit 70a1c1f 🤖 Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F132290%2Fmerge If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.  | 
    
        
          
                Misc/NEWS.d/next/Core_and_Builtins/2025-07-11-19-57-27.gh-issue-132657.vwDuO2.rst
              
                Outdated
          
            Show resolved
            Hide resolved
        
      | 
           Hello @eendebakpt, since you have recently worked on the setobject.c source, perhaps you are able to take another look at this. I think the free-threaded scaling improvement is significant. I've merge with the main branch and updated the benchmark results.  | 
    
| NUM_LOOPS = 20 | ||
| 
               | 
          ||
| s = frozenset() | ||
| def make_set(): | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the make_set here in a thread? It just assigns the same frozenset to s each time.
Maybe change make_set so that it creates a different frozenset each time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this comment.  The make_set() binds s to a new frozenset object (at least I intend it to).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The s is indeed updated with a new frozenset, but it is a frozenset with the same elements each time. I was thinking about something like
def make_set():
            nonlocal s
            barrier.wait()
            num_elements = NUM_ITEMS
            while not done:
               num_elements = (num_elements  + 17 %) NUM_ITEMS
                s = frozenset(range(num_elements))
But what are we testing here. There is no concurrent mutation of the s, so this test is not specific to the free-threading build I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, I'm not sure what I was thinking when writing that test.  It doesn't test what I intended.  I re-wrote it so that a set() is used as the key and it mutated.
| #define SET_MARK_SHARED(so) _PyObject_GC_SET_SHARED(so) | ||
| 
               | 
          ||
| static void | ||
| ensure_shared_on_read(PySetObject *so) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: this is identical to the ensure_shared_on_read from dictobject.c. The SET_DICT_SHARED would correspond to SET_SET_SHARED, which is an odd name so here it is SET_MARK_SHARED.
| FT_ATOMIC_STORE_SSIZE_RELAXED(a->hash, -1); | ||
| FT_ATOMIC_STORE_SSIZE_RELAXED(b->hash, -1); | ||
| } | ||
| if (!SET_IS_SHARED(b) && SET_IS_SHARED(a)) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set_swap_bodies is only called in two locations in this file. For both cases the second argument b is a newly created temporary which is discarded afterwards. So we could replace part (all?) of these branches with asserts.  Part of the other work (e.g. copying back to b) might not be needed for the same reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would simplify the code and be slightly faster (not sure how much it actually matters). I'll create a separate issue for that so we can benchmark it.
| { | ||
| int status; | ||
| setentry *entry; | ||
| if (PyFrozenSet_CheckExact(so)) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: this path can be taken. set_lookkey_threadsafe is only called in set_contains_entry. The set_contains_entry is only called in _PySet_Contains. _PySet_Contains is called in set___contains___impl and in bytecodes.c _CONTAINS_OP_SET (the latter can have frozenset as input)
| if (table == FT_ATOMIC_LOAD_PTR_ACQUIRE(so->table) && | ||
| startkey == FT_ATOMIC_LOAD_PTR_ACQUIRE(ep->key)) { | ||
| assert(cmp == SET_LOOKKEY_FOUND || cmp == SET_LOOKKEY_NO_MATCH); | ||
| return cmp; | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are we checking here and why? For the original code the PyObject_RichCompareBool could have changed the set, and these checks guard against this. However, in the free-threading setting a concurrent thread can change the set so after the table == FT_ATOMIC_LOAD_PTR_ACQUIRE(so->table) && startkey == FT_ATOMIC_LOAD_PTR_ACQUIRE(ep->key), but before the return cmp;. So any calling method cannot assume that when SET_LOOKKEY_FOUND is returned the key is still in the set.
Maybe adding the check help to preserve backwards compatibility in case of a mutating rich compare?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the "why" is to handle PyObject_RichCompareBool() mutating the set, via the restart.  I think we want to preserve that, even for the free-threaded build, since some code that running single-threaded might assume set lookups handle mutation that way.
* Add comment to _PySet_Contains() * Rename set_compare_entry() to set_compare_entry_lock_held(). * Remove unneeded `startkey == NULL` test.
The test_contains_frozenset() was not actually doing anything useful, rewrite it. Add test_contains_hash_mutate() which triggers unusual code paths due to `__hash__` on the key mutating the container.
This roughly follows what was done for dictobject to make a lock-free lookup operation. The set contains operation scales much better when used from multiple-threads. The frozenset contains performance seems unchanged (as already lock-free).
Summary of changes:
set_lookkey()intoset_do_lookup()which now takes a function pointer that does the entry comparison. This is similar to dictobject anddo_lookup(). In an optimized build, the comparison function is inlined and there should be no performance cost to this.set_do_lookupto return a status separately from the entry value.set_compare_frozenset()and use if the object is a frozenset. For the free-threaded build, this avoids some overhead (locking, atomic operations, incref/decref on key)FT_ATOMIC_*macros as needed for atomic loads and storesmemcpy().so->tableto NULL while the change is a happening. Assign the real table array after the change is done.Free-threading scaling benchmark results from the attached scripts (result for 8 cores in parallel). This is a modified version of the
ftscalingbenchmark.pyscript.ftscaling_set.py.txt