Optimise accesses to the Table name -> Table id map #7118

RobinMorisset · 2023-04-13T14:40:20Z

Currently these accesses are guarded by (striped) read-write locks.

With this patch, there are still (regular) locks for protecting writing threads from each other, but threads that only need to read that map (the overwhelming majority) no longer need to take any lock. Instead we now rely on lightweight atomic synchronisation, and on the thread_progress functionality for memory reclamation.
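As a rough illustration of the read path described above, here is a minimal sketch in C11 atomics. It is not the code from this patch: `struct entry`, `struct bucket`, `bucket_publish` and `bucket_lookup` are hypothetical names, the writer is assumed to already hold the regular writer lock, and the caller is assumed to defer freeing the returned old entry until all readers have made thread progress.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for one name -> id mapping. */
struct entry {
    int name;   /* stand-in for the table-name atom */
    int id;     /* stand-in for the table identifier */
};

/* A bucket is one atomically published pointer to an immutable entry. */
struct bucket {
    _Atomic(struct entry *) current;
};

/* Writer side. Assumption: the caller holds the writer lock, so writers
 * never race each other. `fresh` is fully initialised before it is
 * published. Returns the previous entry, which must not be freed until
 * every reader has passed a quiescent point. */
static struct entry *bucket_publish(struct bucket *b, struct entry *fresh)
{
    assert(fresh != NULL);
    return atomic_exchange_explicit(&b->current, fresh, memory_order_acq_rel);
}

/* Reader side: no lock at all. The acquire load pairs with the writer's
 * release, so the fields of the entry are visible once the pointer is.
 * Returns 1 and stores the id when `name` is found, 0 otherwise. */
static int bucket_lookup(struct bucket *b, int name, int *id_out)
{
    struct entry *e = atomic_load_explicit(&b->current, memory_order_acquire);
    if (e == NULL || e->name != name)
        return 0;
    *id_out = e->id;
    return 1;
}
```

Because an entry is never modified after publication, a reader that loaded the pointer always sees a consistent name/id pair, whatever the writer does afterwards.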

One ugly hack is that if we add 1 table, then remove it, the corresponding bucket may go through the following states:

- inline, empty
- inline, with a value
- outlined vector without a value

Once we have filled the inline vector, we can only transition to an outlined vector, and never go back to the inline state. This saves us from an ABA hazard:

- Reader sees inline vector of size 1
- Reader reads the name NameA that corresponds to table IdA, sees it matches what it looked for
- Writer deletes (or renames) the table, moving us to an empty vector
- Writer inserts another table in that bucket, with a different name: NameB -> IdB
- Reader reads IdB, concluding that NameA -> IdB.

As long as we stick to outlined vectors and always allocate a fresh one when deleting tables, we don't have that risk, as the vectors only get freed (and thus the address may get reused by another vector) once all readers have made thread progress (and thus are no longer in the middle of an operation).
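The interleaving above can be replayed deterministically in a single thread. The sketch below is illustrative only: `struct slot` and both functions are hypothetical, and the writer's steps are inlined between the reader's two reads to stand in for the race.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one name -> id pair stored in a bucket. */
struct slot {
    int name;
    int id;
};

/* In-place reuse: the writer overwrites the very slot the reader is in
 * the middle of reading, so the reader pairs the old name with the new id. */
static int lookup_in_place_raced(struct slot *s, int wanted,
                                 int new_name, int new_id)
{
    assert(s != NULL);
    int matched = (s->name == wanted);  /* reader: NameA matches */
    s->name = new_name;                 /* writer: delete, then insert */
    s->id = new_id;                     /*         NameB -> IdB in place */
    return matched ? s->id : -1;        /* reader: reads IdB */
}

/* Fresh allocation: the writer publishes a new vector and leaves the old
 * one untouched until readers are done, so the reader's snapshot stays
 * consistent. */
static int lookup_fresh_raced(struct slot **published, struct slot *fresh,
                              int wanted)
{
    struct slot *seen = *published;     /* reader: snapshot of the pointer */
    int matched = (seen->name == wanted);
    *published = fresh;                 /* writer: swap in a new vector */
    return matched ? seen->id : -1;     /* reader: still reads IdA */
}
```

With NameA = 1, IdA = 10, NameB = 2 and IdB = 20, the first function returns 20 for a lookup of name 1 (the wrong table), while the second returns 10.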

One consequence of this hack is that we must lie to the memory counting system, pretending to have an outlined vector at all times, even when we only have the initial inlined vector. Otherwise, the tests detect the memory increase in inline (1 element) -> outlined (0 elements) -> outlined (1 element), and report a memory leak.

This patch was motivated by noticing that heavyweight services can spend several % of CPU time in db_get_table_aux. A trial in production confirmed that this patch roughly halves the time spent in that function.

github-actions · 2023-04-13T14:41:22Z

CT Test Results

4 files 225 suites 1h 26m 33s ⏱️
3 474 tests 3 380 ✅ 94 💤 0 ❌
4 901 runs 4 782 ✅ 119 💤 0 ❌

Results for commit 5d58cad.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

RobinMorisset · 2023-09-26T09:10:05Z

ping @sverker

sverker · 2023-10-04T18:00:12Z

> ping @sverker

I'm sorry. I will try to get time to look at this again. I think it demands a proper review. Lock-less stuff is tricky.

sverker · 2023-10-10T10:45:16Z

erts/emulator/beam/erl_db.c

-	db_lock(tb, kind);
-        if (name_lck)
-            erts_rwmtx_runlock(name_lck);
+        db_lock(tb, kind);


I think here in db_get_table_aux(), once the table is locked, we should double check that the name of the table is the expected one (if we did the lookup by name) and bail out with badarg if not. That would probably mean we got raced by a rename operation.
This will also make the ABA problem go away. And with this I think we could just clear an inline_entry in remove_named_tab() without having to allocate a new empty one. Removal from an outline bucket with more than one entry is still more complicated.

I've tried getting this to work and I'm hitting a small issue.
The lock is not always taken in db_get_table_aux(): if what == DB_READ_TBL_STRUCT, then there is an early return before the lock is taken. The double-check might still work, but I'm struggling to figure out how to make it safe, and ensure that the table does not change under us. Apart from that I've got a patch with the relevant changes: removing the hack for size, and doing an atomic write of 0 to the inline_entry size when removing the only table in it (rather than allocating an empty outlined entry).

You should double check the name after locking the table. Otherwise the table op may be seen as happening after the table was renamed.

ets:rename(aaa, bbb), ets:insert(bbb, {key, bbb})

A concurrent process doing ets:lookup(aaa, key) should never get {key,bbb} if it never existed in the table when it was named aaa.

For DB_READ_TBL_STRUCT it's enough to do the check without lock. Just to make sure the table got the correct name now. DB_READ_TBL_STRUCT is only used by yielding ets:insert as a first cheap lookup of the table. It will lock the table later and then also double check the name. See comment in function ets_insert_2_list_lock_tbl.

Thanks for the explanation, I added it to the comment (and moved the double-check to after taking the lock).
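The ordering discussed in this thread (look up by name without a lock, lock the table, then re-check the name) can be sketched as follows. This is a simplified single-threaded model, not the erl_db.c code: `struct table`, the `race_hook` callback and all function names are hypothetical, and the hook only simulates a rename landing between the lookup and the lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified table. */
struct table {
    int name;    /* current name; only changed while the table is locked */
    int locked;  /* stand-in for the table lock */
};

static void table_lock(struct table *t)   { assert(!t->locked); t->locked = 1; }
static void table_unlock(struct table *t) { assert(t->locked);  t->locked = 0; }

/* Lets a caller simulate a rename that slips in before the lock is taken. */
typedef void (*race_hook)(struct table *);

/* `found` is what the lock-free name lookup returned for `wanted`.
 * Returns the table locked and with its name verified, or NULL (the
 * badarg case) if a rename won the race. */
static struct table *get_table_by_name(struct table *found, int wanted,
                                       race_hook hook)
{
    if (hook != NULL)
        hook(found);               /* a concurrent rename may land here */
    table_lock(found);
    if (found->name != wanted) {   /* raced by a rename */
        table_unlock(found);       /* never leave the lock held */
        return NULL;
    }
    return found;
}

/* Simulated ets:rename(aaa, bbb), with aaa = 1 and bbb = 2. */
static void rename_to_bbb(struct table *t) { t->name = 2; }
```

Checking the name only after the lock is held means no rename can happen between the check and the operation, which is what rules out the {key,bbb} result in the example above.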

lpgauth · 2024-01-19T18:00:37Z

ping @RobinMorisset

Would love to see this merged :)

RobinMorisset · 2024-01-23T10:21:36Z

oops, sorry for forgetting about that. I'm a bit busy this week, but I'll try to get this ready for merging next week.

lpgauth · 2024-02-08T13:52:46Z

@RobinMorisset would be nice to have in OTP 27 :)

RobinMorisset · 2024-02-20T10:57:39Z

This new version passes tests, so it should not be completely broken, but I've put the double-checking before the locking because the lock is not always taken (see my comment), so I'm not very confident yet that this last change is correct.


sverker · 2024-02-22T19:58:56Z

erts/emulator/beam/erl_db.c

+void schedule_meta_name_tab_entries_for_deletion(struct meta_name_tab_entries *entries)
+{
+    char* ptr = (char *) entries;
+    ptr -= sizeof(ErtsThrPrgrLaterOp);


Instead of using raw char pointer arithmetic with sizeof(ErtsThrPrgrLaterOp), I think you should introduce a struct meta_name_tab_entries_allocated, or whatever you want to call it.

struct meta_name_tab_entries_allocated {
    ErtsThrPrgrLaterOp later_op;
    struct meta_name_tab_entries data;
};

and then use macro ErtsContainerStruct(entries, struct meta_name_tab_entries_allocated, data) to find the start of it.

and entries = &allocated->data to go the other way.

Thanks for the suggestion, I did not know about ErtsContainerStruct and it is indeed much cleaner.
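For readers unfamiliar with the pattern, here is a self-contained sketch of the container-of idiom suggested above. The types are stand-ins, and `CONTAINER_OF` is a generic macro written for this example; it is assumed to do roughly what ErtsContainerStruct does, not copied from ERTS.

```c
#include <assert.h>
#include <stddef.h>

struct later_op { void *next; };     /* stand-in for ErtsThrPrgrLaterOp */
struct entries  { unsigned size; };  /* stand-in for struct meta_name_tab_entries */

/* The allocation header and the payload live in one struct. */
struct entries_allocated {
    struct later_op later_op;
    struct entries data;
};

/* Recover the enclosing struct from a pointer to one of its members.
 * offsetof accounts for any padding the compiler inserted, which plain
 * sizeof-based pointer arithmetic does not. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Allocation -> payload. */
static struct entries *payload_of(struct entries_allocated *a)
{
    assert(a != NULL);
    return &a->data;
}

/* Payload -> allocation. */
static struct entries_allocated *allocation_of(struct entries *e)
{
    assert(e != NULL);
    return CONTAINER_OF(e, struct entries_allocated, data);
}
```

The two helpers are inverses of each other, so code that only sees a `struct entries *` can still reach the `later_op` field when it needs to schedule the deletion.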

sverker · 2024-02-22T20:07:33Z

erts/emulator/beam/erl_db.c

        if (what == DB_READ_TBL_STRUCT) {
-            if (name_lck)
-                erts_rwmtx_runlock(name_lck);
            return tb;
        }


I think you must check the name for DB_READ_TBL_STRUCT as well. Otherwise we may start to operate on a completely random wrong table which feels a bit scary. At least I think it can result in incorrect errors if the random table has a different keypos for example.

sverker · 2024-02-26T14:44:00Z

erts/emulator/beam/erl_db.c

+            if (ERTS_UNLIKELY(tb->common.the_name != id)) {
+                *freason_p = BADARG | EXF_HAS_EXT_INFO;
+                p->fvalue = EXI_ID;
+                tb = NULL;
+            }


This if block should end with return NULL, otherwise it will crash down where it checks tb->common.status.

I assume that means this code has never been executed. It's quite difficult to provoke the necessary race condition, but I think we should at least try with an evil test case that concurrently loops around both ets:rename and maybe ets:lookup and ets:insert. Instrumenting the code with a strategic erts_thr_yield() could even make it practically possible for the test case to provoke the race.

if (entries->data[i].name_atom == id) {
    erts_thr_yield();
    tb = entries->data[i].tb;
    break;
}

RobinMorisset · 2024-02-27T10:04:28Z

I've checked and the new test catches that bug (triggers the segfault, which disappears with the fix). Thanks again for the careful review!

sverker · 2024-02-27T10:57:31Z

Did you get the test case to trigger the race as it is, without erts_thr_yield and with only 1000 iterations?

RobinMorisset · 2024-02-27T11:06:32Z

I got it to trigger the race as it is, but I had 1M iterations at the time. I reduced the number of iterations because it takes forever to run otherwise now that the race is fixed.

RobinMorisset · 2024-02-27T11:14:38Z

Testing some more, 1k iterations does not trigger the bug, but 10k iterations is enough to trigger it reliably (2 out of 2 runs).
But once I fix the bug, 10k iterations takes forever (to the point I'm wondering whether it is triggering some kind of deadlock or livelock or something).

RobinMorisset · 2024-02-27T11:34:50Z

I found the issue: I was missing a db_unlock in the double-check branch!
The test now runs fine (and very fast) with the fix, and immediately blows up if I re-introduce the segfault in that branch, confirming that it is being tested.

sverker · 2024-02-27T11:38:51Z

You can also use undocumented spawn_opt option {scheduler,N} to make sure writer and reader are spawned on different schedulers.

sverker · 2024-02-27T11:42:38Z

You should always also run debug built emulator when developing. That will detect the missing unlock.
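To make the point concrete, here is a toy model of how a lock-balance check can expose an error path that forgets to unlock. It is only an illustration of the idea: the counter and every function name below are hypothetical, and this is not how the debug emulator's lock checking is implemented.

```c
#include <assert.h>

/* Hypothetical count of locks currently held by this thread. */
static int locks_held = 0;

static void lock_acquire(void) { locks_held++; }
static void lock_release(void) { assert(locks_held > 0); locks_held--; }

/* Buggy error path: bails out on a name mismatch without unlocking. */
static int op_buggy(int name_matches)
{
    lock_acquire();
    if (!name_matches)
        return -1;          /* lock is still held here */
    lock_release();
    return 0;
}

/* Fixed error path: release the lock before bailing out. */
static int op_fixed(int name_matches)
{
    lock_acquire();
    if (!name_matches) {
        lock_release();
        return -1;
    }
    lock_release();
    return 0;
}

/* What a check at a point where no lock should be held could look at. */
static int leaked_locks(void) { return locks_held; }
```

Both variants return the same error code, so an ordinary test sees no difference until the leaked lock blocks a later operation; a balance check flags the buggy variant immediately.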

RobinMorisset · 2024-02-27T11:59:56Z

> You should always also run debug built emulator when developing. That will detect the missing unlock.

How do you ensure that the debug version is used by the tests? I've just been following https://github.com/erlang/otp/blob/master/HOWTO/TESTING.md and doing

make stdlib_test ARGS="-suite ets_SUITE -case racy_rename"

sverker · 2024-02-27T12:56:51Z

I don't know if there is a way to select emulator type when running tests like that.

I use the old way by releasing the tests:
https://github.com/erlang/otp/blob/master/HOWTO/TESTING.md#releasing-tests

Then I can start whatever erl I want and run tests with ts:run.

$ERL_TOP/bin/cerl -debug
1> ts:install().
2> ts:run(stdlib,ets_SUITE,racy_rename,[batch]).

RobinMorisset · 2024-02-27T14:15:25Z

I've just tried that, and ts:install() fails with ** exception error: undefined function ts:install/0.
Is there an additional command I should run to bring ts in scope?

dgud · 2024-02-27T14:21:49Z

You need to stand in the test_server directory.

The more modern variant is using common test: (though you will still need test_server in the path).

cd  TEST_INSTALL_PATH/test/app/
erl -pa /TEST_INSTALL_PATH/test/test_server -sname foo EXTRA_ARGS
> ct:run_test([{suite, ets_SUITE}, {testcase,racy_rename}]).

RobinMorisset · 2024-02-27T14:41:51Z

What is TEST_INSTALL_PATH supposed to be?
I tried doing $ERL_TOP/bin/cerl -debug -pa ../../tests/test_server in $ERL_TOP/release/tests/stdlib_test and then calling ct:run_test([{suite, ets_SUITE}, {testcase,racy_rename}]). in the resulting shell, but it gave me an error message "Failed to start CTH, see the CT Log for details" (with no information on how to find that log) and "*** FAILED {ets_SUITE,init_per_suite} ***".
I also tried ts:install(), and it apparently found it, but it gave another error message: {error,no_configure_script}

Sorry for all the questions, I generally find the test and build system rather mystifying and I'd like to figure it out for future PRs.

dgud · 2024-02-27T15:26:19Z

-pa ../../tests/test_server: this needs to be an absolute path, for a reason unknown to me.

RobinMorisset · 2024-02-27T16:08:41Z

It worked! Thanks for the help.
Unfortunately, the race that reproduces every few thousand iterations in release mode stubbornly refuses to reproduce in debug mode, even with 1M iterations. Still it is good to know how to run tests in debug mode.

sverker · 2024-02-27T17:30:50Z

> It worked! Thanks for the help. Unfortunately, the race that reproduces every few thousand iterations in release mode stubbornly refuses to reproduce in debug mode, even with 1M iterations. Still, it is good to know how to run tests in debug mode.

That is not surprising. The debug VM does a lot of extra checks that can radically change the timing.

garazdawi · 2024-02-27T18:44:17Z

make TYPE=debug
make stdlib_test TYPE=debug ARGS="-suite ets_SUITE -case racy_rename"

as per https://github.com/erlang/otp/blob/master/HOWTO/DEVELOPMENT.md#types-and-flavors

lpgauth · 2024-07-12T14:53:44Z

@RobinMorisset bumping this in case you have time to complete this PR :) thanks

rickard-green added the team:VM (Assigned to OTP team VM) label Apr 17, 2023

rickard-green assigned sverker Apr 17, 2023

sverker reviewed Oct 10, 2023


sverker added the waiting (waiting for changes/input from author) label Oct 30, 2023

RobinMorisset force-pushed the ets_table_names_upstream branch from 9047110 to 0f96524 Compare February 20, 2024 10:54

RobinMorisset and others added 2 commits February 22, 2024 04:28

Apply Sverker's suggestion to do double-checking in db_get_table_aux(), and avoid the hack where we make empty outline vectors.

da8337a

RobinMorisset force-pushed the ets_table_names_upstream branch from 6f1dfd3 to da8337a Compare February 22, 2024 15:40

sverker reviewed Feb 22, 2024


RobinMorisset added 2 commits February 23, 2024 03:20

Add name double-checking to the path that does not take the lock

7e66a51

Use ErtsContainerStruct instead of raw pointer arithmetic

432f6e1

sverker reviewed Feb 26, 2024


RobinMorisset added 2 commits February 26, 2024 08:12

Fix bug in the branch that only executes in races

f8374af

Add racy_rename test to ets_SUITE

355a4e4

Fix missing db_unlock, and increase WriterIterations to 10k

5d58cad
