-
Notifications
You must be signed in to change notification settings - Fork 3k
Optimise accesses to the Table name -> Table id map #7118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Optimise accesses to the Table name -> Table id map #7118
Conversation
CT Test Results 4 files 225 suites 1h 26m 33s ⏱️ Results for commit 5d58cad. ♻️ This comment has been updated with latest results. To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass. See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally. Artifacts
// Erlang/OTP Github Action Bot |
ping @sverker |
I'm sorry. I will try to get time to look at this again. I think it demands a proper review. Lock-less stuff are tricky. |
erts/emulator/beam/erl_db.c
Outdated
db_lock(tb, kind); | ||
if (name_lck) | ||
erts_rwmtx_runlock(name_lck); | ||
db_lock(tb, kind); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here in db_get_table_aux() when the table is locked we should double check the name of the table to be the expected (if we did lookup by name) and bail out with badarg if not. That would probably mean we got raced by a rename operation.
This will also make the ABA problem go away. And with this I think we could just clear an inline_entry in remove_named_tab() without having to allocate a new empty one. Remove in an outline bucket with more than one entry is still more complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried getting this to work and I'm hitting a small issue.
The lock is not always taken in db_get_table_aux(): if what == DB_READ_TBL_STRUCT
, then there is an early return before the lock is taken. The double-check might still work, but I'm struggling to figure out how to make it safe, and ensure that the table does not change under us. Apart from that I've got a patch with the relevant changes (removing the hack for size, and doing an atomic write of 0 to the inline_entry size when removing the only table in it (rather than allocating an empty outlined entry).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should double check the name after locking the table. Otherwise the table op may be seen as happening after the table was renamed.
ets:rename(aaa, bbb),
ets:insert(bbb, {key, bbb})
A concurrent process doing ets:lookup(aaa, key)
should never get {key,bbb}
if it never existed in the tabled when it was named aaa.
For DB_READ_TBL_STRUCT
it's enough to do the check without lock. Just to make sure the table got the correct name now. DB_READ_TBL_STRUCT
is only used by yielding ets:insert
as a first cheap lookup of the table. It will lock the table later and then also double check the name. See comment in function ets_insert_2_list_lock_tbl
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation, I added it to the comment (and moved the double-check to after taking the lock).
ping @RobinMorisset Would love to see this merged :) |
oops, sorry for forgetting about that. I'm a bit busy this week, but I'll try to get this ready for merging next week. |
@RobinMorisset would be nice to have in OTP 27 :) |
9047110
to
0f96524
Compare
This new version passes tests, so it should not be completely broken, but I've put the double-checking before the locking because the lock is not always taken (see my comment), so I'm not very confident yet that this last change is correct. |
Currently these accesses are guarded by (striped) read-write locks. With this patch, there are still (regular) locks for protecting writing threads from each other, but threads that only need to read that map (so the overwhelming majority) no longer need to use any lock. Instead we now rely on lightweight atomic synchronisation, and on the thread_progress functionality for memory reclaimation. One ugly hack is that if we add 1 table, then remove it, the corresponding bucket may go - inline, empty - inline, with a value - outlined vector without a value Once we have a filled the inline vector, we can only transition to an outlined vector, and never go back to the inline state. This saves us from an ABA hazard: - Reader sees inline vector of size 1 - Reader reads the name NameA that corresponds to table IdA, sees it matches what it looked for - Writer deletes (or rename the table), moving us to an empty vector - Writer inserts another table in that bucket, with a different name: NameB -> IdB - Reader reads IdB, concluding that NameA -> IdB. As long as we stick to outline vectors and always allocate a fresh one when deleting tables, we don't have that risk, as the vectors only get freed (and thus the adress may get reused by another vector) once all readers have made thread progress (and thus are no longer in the middle of an operation). One consequence of this hack, is that we must lie to the memory counting system, pretending to have an outlined vector at all times, even when we only have the initial inlined vector. Otherwise, the tests detect the memory increase in inline (1 element) -> outlined (0 element) -> outlined (1 element), and report a memory leak. This patch was motivated by noticing that heavyweight services can spend several % of CPU time in db_get_table_aux. A trial in production confirmed that this patch roughly halves the time spent in that function.
…), and avoid the hack where we make empty outline vectors.
6f1dfd3
to
da8337a
Compare
erts/emulator/beam/erl_db.c
Outdated
void schedule_meta_name_tab_entries_for_deletion(struct meta_name_tab_entries *entries) | ||
{ | ||
char* ptr = (char *) entries; | ||
ptr -= sizeof(ErtsThrPrgrLaterOp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using raw char pointer arithmetic with sizeof(ErtsThrPrgrLaterOp)
I think you should introduce a struct meta_name_tab_entries_allocated
, or whatever you want to call it.
struct meta_name_tab_entries_allocated {
ErtsThrPrgrLaterOp later_op;
struct meta_name_tab_entries data;
};
and then use macro ErtsContainerStruct(entries, struct meta_name_tab_entries_allocated, data)
to find the start of it.
and entries = &allocated->data
to go the other way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion, I did not know about ErtsContainerStruct and it is indeed much cleaner.
erts/emulator/beam/erl_db.c
Outdated
if (what == DB_READ_TBL_STRUCT) { | ||
if (name_lck) | ||
erts_rwmtx_runlock(name_lck); | ||
return tb; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you must check the name for DB_READ_TBL_STRUCT as well. Otherwise we may start to operate on a completely random wrong table which feels a bit scary. At least I think it can result in incorrect errors if the random table has a different keypos
for example.
erts/emulator/beam/erl_db.c
Outdated
if (ERTS_UNLIKELY(tb->common.the_name != id)) { | ||
*freason_p = BADARG | EXF_HAS_EXT_INFO; | ||
p->fvalue = EXI_ID; | ||
tb = NULL; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if
block should end with return NULL
, otherwise it will crash down where it checks tb->common.status
.
I assume that means this code has never been executed. It's quite difficult to provoke the necessary race condition, but I think we should at least try with an evil test case that concurrently loops around both ets:rename
and maybe ets:lookup
and ets:insert
. Instrumenting the code with a strategic erts_thr_yield()
could even make it practically possible for the test case to provoke the race.
if (entries->data[i].name_atom == id) {
erts_thr_yield();
tb = entries->data[i].tb;
break;
}
I've checked and the new test catches that bug (triggers the segfault, which disappears with the fix). Thanks again for the careful review! |
Did you get the test case to trigger the race as it is, without |
I got it to trigger the race as it is, but I had 1M iterations at the time. I reduced the number of iterations because it takes forever to run otherwise now that the race is fixed. |
Testing some more, 1k iterations does not trigger the bug, but 10k iterations is enough to trigger it reliably (2 out of 2 runs). |
I found the issue: I was missing a db_unlock in the double-check branch! |
You can also use undocumented |
You should always also run debug built emulator when developing. That will detect the missing unlock. |
How do you ensure that the debug version is used by the tests? I've just been following https://github.com/erlang/otp/blob/master/HOWTO/TESTING.md and doing
|
I don't know if there is a way to select emulator type when running tests like that. I use the old way by releasing the tests: Then I can start whatever
|
I've just tried that, and |
You need to stand in the test_server directory. The more modern variant is using common test: (though you will still need test_server in the path).
|
What is TEST_INSTALL_PATH supposed to be? Sorry for all the questions, I generally find the test and build system rather mystifying and I'd like to figure it out for future PRs. |
|
It worked! Thanks for the help. |
That is not surprising. The debug VM does a lot of extra checks that can radically change the timing. |
as per https://github.com/erlang/otp/blob/master/HOWTO/DEVELOPMENT.md#types-and-flavors |
@RobinMorisset bumping this is case you have time to complete this PR :) thanks |
Currently these accesses are guarded by (striped) read-write locks.
With this patch, there are still (regular) locks for protecting writing threads from each other, but threads that only need to read that map (so the overwhelming majority) no longer need to use any lock. Instead we now rely on lightweight atomic synchronisation, and on the thread_progress functionality for memory reclaimation.
One ugly hack is that if we add 1 table, then remove it, the corresponding bucket may go
Once we have a filled the inline vector, we can only transition to an outlined vector, and never go back to the inline state. This saves us from an ABA hazard:
One consequence of this hack, is that we must lie to the memory counting system, pretending to have an outlined vector at all times, even when we only have the initial inlined vector. Otherwise, the tests detect the memory increase in inline (1 element) -> outlined (0 element) -> outlined (1 element), and report a memory leak.
This patch was motivated by noticing that heavyweight services can spend several % of CPU time in db_get_table_aux. A trial in production confirmed that this patch roughly halves the time spent in that function.