
Optimise accesses to the Table name -> Table id map #7118

Open
wants to merge 7 commits into base: master
67 changes: 38 additions & 29 deletions erts/emulator/beam/erl_db.c
@@ -837,6 +837,28 @@ DbTable* db_get_table_aux(Process *p,
DB_HASH_ADAPT_NUMBER_OF_LOCKS(tb);
db_lock(tb, kind);

if (is_atom(id)) {
// We are re-checking the name <-> id mapping.
// This protects from an ABA issue:
// - Reader sees inline vector of size 1
// - Reader reads the name NameA that corresponds to table IdA, sees it matches what it looked for
// - Writer deletes (or renames) the table, moving us to an empty vector
// - Writer inserts another table in that bucket, with a different name: NameB -> IdB
// - Reader reads IdB, concluding that NameA -> IdB.
// In that case, we will correctly fail with badarg, since IdB points to a table with the_name = NameB != NameA.
// This must be done while holding the lock, to deal with the following case:
// - Writer does ets:rename(a, b), ets:insert(b, {key, value})
// - Reader does ets:lookup(a, key)
// We want to make sure that the reader does not get {key, value}, since it was never in a table called "a".
// This check is skipped when what == DB_READ_TBL_STRUCT, but that is fine because that mode is only used by the insert operation
// which later does its own double-check after taking the lock (see code and comment in ets_insert_2_list_lock_tbl).
if (ERTS_UNLIKELY(tb->common.the_name != id)) {
*freason_p = BADARG | EXF_HAS_EXT_INFO;
p->fvalue = EXI_ID;
tb = NULL;
}
Contributor

This if block should end with return NULL, otherwise it will crash further down where it checks tb->common.status.

I assume that means this code has never been executed. It's quite difficult to provoke the necessary race condition, but I think we should at least try with an evil test case that concurrently loops around both ets:rename and maybe ets:lookup and ets:insert. Instrumenting the code with a strategic erts_thr_yield() could even make it practically possible for the test case to provoke the race.

if (entries->data[i].name_atom == id) {
    erts_thr_yield();
    tb = entries->data[i].tb;
    break;
}
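
For illustration, a corrected early exit might look like the sketch below; whether db_unlock(tb, kind) must be called here depends on the surrounding error paths, which are not shown in this hunk:

if (ERTS_UNLIKELY(tb->common.the_name != id)) {
    *freason_p = BADARG | EXF_HAS_EXT_INFO;
    p->fvalue = EXI_ID;
    db_unlock(tb, kind); /* assumption: release the lock taken by db_lock() above */
    return NULL;
}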

}

#ifdef ETS_DBG_FORCE_TRAP
/*
* The ets_SUITE uses this to verify that all table lookup calls
@@ -993,13 +1015,6 @@ static int insert_named_tab(Eterm name_atom, DbTable* tb, int have_lock)

if (entries != &bucket->inline_entry) {
schedule_meta_name_tab_entries_for_deletion(entries);
} else {
// This is a hack: we pretend to use an out-of-line vector even when we have an inline entry
// Otherwise the tests can detect the (bounded) memory increase caused by
// empty inline entry -> out-of-line vector -> empty out-of-line vector
// and mistake it for a memory leak...
int alloc_size = alloc_size_meta_name_tab_entries(META_NAME_TAB_ENTRIES_MIN_CAPACITY);
ERTS_ETS_MISC_MEM_ADD(-alloc_size);
}
}
ret = 1; /* Ok */
@@ -1046,22 +1061,23 @@ static int remove_named_tab(DbTable *tb, int have_lock)
goto done;
}

// Trying to remove an entry from the vector without reallocation
// while reader threads concurrently access it would be a nightmare.
// So we just move things to a new vector instead
new_entries = alloc_meta_name_tab_entries(size - 1);
memcpy(&new_entries->data[0], &entries->data[0], index * sizeof(struct meta_name_tab_entry));
memcpy(&new_entries->data[index], &entries->data[index+1], (size - index - 1) * sizeof(struct meta_name_tab_entry));
erts_atomic_set_wb(&bucket->entries, (erts_aint_t) new_entries);
if (entries != &bucket->inline_entry) {
schedule_meta_name_tab_entries_for_deletion(entries);
if (entries == &bucket->inline_entry) {
// We don't need to do anything to the actual entry: if a reader thread sees size=0 it won't look further.
// And if it saw size = 1, then it is as if its read had completed entirely before us.
erts_atomic32_set_wb(&bucket->inline_entry.size, (erts_aint32_t) 0);
} else {
// This is a hack: we pretend to use an out-of-line vector even when we have an inline entry
// Otherwise the tests can detect the (bounded) memory increase caused by
// empty inline entry -> out-of-line vector -> empty out-of-line vector
// and mistake it for a memory leak...
int alloc_size = alloc_size_meta_name_tab_entries(META_NAME_TAB_ENTRIES_MIN_CAPACITY);
ERTS_ETS_MISC_MEM_ADD(-alloc_size);
// Trying to remove an entry from the vector without reallocation
// while reader threads concurrently access it would be a nightmare.
// So we just move things to a new vector instead
if (size - 1 > 1) {
new_entries = alloc_meta_name_tab_entries(size - 1);
} else {
new_entries = &bucket->inline_entry;
}
memcpy(&new_entries->data[0], &entries->data[0], index * sizeof(struct meta_name_tab_entry));
memcpy(&new_entries->data[index], &entries->data[index+1], (size - index - 1) * sizeof(struct meta_name_tab_entry));
erts_atomic_set_wb(&bucket->entries, (erts_aint_t) new_entries);
schedule_meta_name_tab_entries_for_deletion(entries);
}

ret = 1; /* Ok */
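
To make the two removal paths concrete, here is a hypothetical walk-through (table names and ids invented for illustration):

/* Bucket holds the out-of-line vector [a->T1, b->T2, c->T3] (size 3);
 * remove_named_tab(b):
 *   size - 1 > 1, so a fresh vector of size 2 is allocated, filled with
 *   [a->T1, c->T3] by the two memcpy calls, and published with
 *   erts_atomic_set_wb(); the old vector is reclaimed later via
 *   schedule_meta_name_tab_entries_for_deletion(), so lock-free readers
 *   still iterating over it stay safe.
 * remove_named_tab(c) afterwards (size 2 -> 1):
 *   takes the other branch and copies the lone survivor into
 *   bucket->inline_entry, avoiding an allocation.
 */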
@@ -4748,13 +4764,6 @@ void init_db(ErtsDbSpinCount db_spin_count)
ERTS_ETS_MISC_MEM_ADD(size);

for (i=0; i<=meta_name_tab_mask; i++) {
// This is a hack: we pretend to use an out-of-line vector even though we have an inline entry
// Otherwise the tests can detect the (bounded) memory increase caused by
// empty inline entry -> out-of-line vector -> empty out-of-line vector
// and mistake it for a memory leak...
int alloc_size = alloc_size_meta_name_tab_entries(META_NAME_TAB_ENTRIES_MIN_CAPACITY);
ERTS_ETS_MISC_MEM_ADD(alloc_size);

erts_atomic_init_nob(&meta_name_tab[i].entries, (erts_aint_t) &meta_name_tab[i].inline_entry);
meta_name_tab[i].inline_entry.capacity = 1;
erts_atomic32_init_nob(&meta_name_tab[i].inline_entry.size, 0);
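
For orientation, the fields touched above (entries, inline_entry.capacity, inline_entry.size, and the data[i].name_atom / data[i].tb accesses in the reviewer's snippet) suggest a bucket layout roughly like the following; this is a reconstruction from the diff, not the actual declarations:

/* Reconstructed from the diff; the real declarations in erl_db.c
 * may differ in names, types and field order. */
struct meta_name_tab_entry {
    Eterm name_atom;                    /* table name */
    DbTable *tb;                        /* table the name maps to */
};

struct meta_name_tab_entries {
    erts_atomic32_t size;               /* live entry count, read lock-free */
    Uint32 capacity;                    /* allocated slots (1 for inline_entry) */
    struct meta_name_tab_entry data[1]; /* flexible array in practice */
};

struct meta_name_tab_bucket {
    erts_atomic_t entries;              /* struct meta_name_tab_entries*, swapped atomically */
    struct meta_name_tab_entries inline_entry; /* capacity-1 storage, avoids allocation */
};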