
Manual Flush() or CompactRange() can stall in 7.04, 7.3.1, 8.67, 9.7  #13280

Open
@matthewvon

Description

The thread calling Flush() or CompactRange() can stall or hang if min_write_buffer_number_to_merge is greater than 1. We previously used v6.20.3, which never had this problem.

Expected behavior

Calls to Flush() or CompactRange() never stall or hang the calling thread when min_write_buffer_number_to_merge is greater than 1.

Actual behavior

rocksdb::DBImpl::WaitForFlushMemTables() will wait forever, or until some other activity causes a buffer omitted by PickMemtablesToFlush() to flush.

Steps to reproduce the behavior

Stack trace of the hung thread in v7.3.1 looks like this:
#6 0x00007febda17b9ad in rocksdb::DBImpl::WaitForFlushMemTables (this=0x7feb04cc9300, cfds=..., flush_memtable_ids=..., resuming_from_bg_err=false) at /stardog/libs/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2355
#7 0x00007febda17a1d7 in rocksdb::DBImpl::FlushMemTable (this=0x7feb04cc9300, cfd=0x7febd127c140, flush_options=..., flush_reason=rocksdb::FlushReason::kManualFlush, writes_stopped=false) at /stardog/libs/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2101
#8 0x00007febda177dfb in rocksdb::DBImpl::Flush (this=0x7feb04cc9300, flush_options=..., column_family=0x7fecfeb9b650) at /stardog/libs/rocksdb/db/db_impl/db_impl_compaction_flush.cc:1711
#9 0x00007febd9f4d612 in rocksdb::StackableDB::Flush (this=0x7febd394eeb0, fopts=..., column_family=0x7fecfeb9b650) at /home/mmaszewski/.gradle/caches/8.11/transforms/da103af47e391520fee616b7e7f114b5/transformed/cpp-api-headers/rocksdb/utilities/stackable_db.h:344

Call sequence of background thread looks like this:
FlushMemTableToOutputFile needs_to_sync_closed_wals 1, GetLatestMemTableID 101 (~line 194 db_impl_compaction_flush.cc)
FlushMemTableToOutputFile-2 needs_to_sync_closed_wals 1, GetLatestMemTableID 102 (~line 231 db_impl_compaction_flush.cc)
break in PickMemtablesToFlush: GetID 102, max_memtable_id 101 (~line 365 in memtable_list.cc)

The background thread will not schedule a follow-up flush job because IsFlushPending() (~line 332 memtable_list.cc) sees that num_flush_not_started_ is less than min_write_buffer_number_to_merge_.
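The sequence above can be reproduced in a small standalone model. This is only a sketch of the behavior described in this issue, not RocksDB code: ToyMemTable, ToyMemTableList, Outcome, and SimulateBackgroundFlush are hypothetical names, and the toy IsFlushPending() keeps only the num_flush_not_started_ vs. min_write_buffer_number_to_merge_ comparison while ignoring any other conditions the real function may check.

```cpp
#include <cstdint>
#include <vector>

// Toy stand-in for an immutable memtable (hypothetical, not rocksdb::MemTable).
struct ToyMemTable {
  uint64_t id;
  bool flush_started;
};

// Toy stand-in for the immutable memtable list (hypothetical).
struct ToyMemTableList {
  std::vector<ToyMemTable> tables;  // oldest first
  int min_write_buffer_number_to_merge;

  int NumFlushNotStarted() const {
    int n = 0;
    for (const auto& t : tables) {
      if (!t.flush_started) ++n;
    }
    return n;
  }

  // Picks unstarted memtables in age order and stops at the first one whose
  // id exceeds max_memtable_id, mirroring the "break" reported in the issue.
  std::vector<uint64_t> PickMemtablesToFlush(uint64_t max_memtable_id) {
    std::vector<uint64_t> picked;
    for (auto& t : tables) {
      if (t.flush_started) continue;
      if (t.id > max_memtable_id) break;
      t.flush_started = true;
      picked.push_back(t.id);
    }
    return picked;
  }

  // Simplified: only the threshold comparison described in the issue.
  bool IsFlushPending() const {
    return NumFlushNotStarted() >= min_write_buffer_number_to_merge;
  }
};

struct Outcome {
  std::vector<uint64_t> picked;
  int not_started;
  bool follow_up_scheduled;
};

// Memtable 101 exists when the flush job starts (max_memtable_id = 101);
// memtable 102 appears before PickMemtablesToFlush() runs.
Outcome SimulateBackgroundFlush(int min_merge) {
  ToyMemTableList list;
  list.min_write_buffer_number_to_merge = min_merge;
  list.tables.push_back(ToyMemTable{101, false});
  list.tables.push_back(ToyMemTable{102, false});

  std::vector<uint64_t> picked = list.PickMemtablesToFlush(101);
  int not_started = list.NumFlushNotStarted();
  bool follow_up = list.IsFlushPending();
  return Outcome{picked, not_started, follow_up};
}
```

In this model, SimulateBackgroundFlush(2) picks only memtable 101 and reports no follow-up flush, so a caller waiting on memtable 102 would never be woken. SimulateBackgroundFlush(1) does report a follow-up, which is consistent with the workaround of setting min_write_buffer_number_to_merge to 1.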

Calling SyncWAL() immediately prior to Flush() or CompactRange() is not a reliable workaround: it works sometimes but not always.

The only known workaround at this point is setting min_write_buffer_number_to_merge to 1.

I am currently looking for ways to enhance IsFlushPending() to override the comparison of num_flush_not_started_ to min_write_buffer_number_to_merge_ if a manual flush or compaction is active.
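One possible shape for that override, written as a standalone predicate rather than a patch. This is a sketch under assumptions: IsFlushPendingWithOverride and the manual_flush_active flag are hypothetical names and do not exist in RocksDB, and how such a flag would be set and cleared safely is not addressed here.

```cpp
// Hypothetical predicate: returns true when a follow-up flush should be
// scheduled. Parameter names echo the members mentioned in this issue;
// manual_flush_active is an assumed flag, not an existing RocksDB field.
bool IsFlushPendingWithOverride(int num_flush_not_started,
                                int min_write_buffer_number_to_merge,
                                bool manual_flush_active) {
  // While a manual flush/compaction is waiting, any unstarted memtable
  // counts as pending, regardless of the merge threshold.
  if (manual_flush_active && num_flush_not_started > 0) {
    return true;
  }
  // Otherwise keep the threshold comparison described in the issue.
  return num_flush_not_started >= min_write_buffer_number_to_merge;
}
```

With one unstarted memtable and a threshold of 2, this returns true only when manual_flush_active is set, so the memtable left behind by PickMemtablesToFlush() would get a follow-up flush job while automatic flushes keep the existing batching behavior.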
