Manual Flush() or CompactRange() can stall in 7.0.4, 7.3.1, 8.6.7, 9.7 #13280
Description
It is possible to stall or hang the thread calling Flush() or CompactRange() when min_write_buffer_number_to_merge is greater than 1. We previously used v6.20.3, which never had this problem.
Expected behavior
Calls to Flush() or CompactRange() should never stall or hang the calling thread, even when min_write_buffer_number_to_merge is greater than 1.
Actual behavior
rocksdb::DBImpl::WaitForFlushMemTables() waits forever, or until some other activity causes the write buffer (memtable) omitted by PickMemtablesToFlush() to be flushed.
Steps to reproduce the behavior
The stack trace of the hung thread in v7.3.1 looks like this:
#6 0x00007febda17b9ad in rocksdb::DBImpl::WaitForFlushMemTables (this=0x7feb04cc9300, cfds=..., flush_memtable_ids=..., resuming_from_bg_err=false) at /stardog/libs/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2355
#7 0x00007febda17a1d7 in rocksdb::DBImpl::FlushMemTable (this=0x7feb04cc9300, cfd=0x7febd127c140, flush_options=..., flush_reason=rocksdb::FlushReason::kManualFlush, writes_stopped=false) at /stardog/libs/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2101
#8 0x00007febda177dfb in rocksdb::DBImpl::Flush (this=0x7feb04cc9300, flush_options=..., column_family=0x7fecfeb9b650) at /stardog/libs/rocksdb/db/db_impl/db_impl_compaction_flush.cc:1711
#9 0x00007febd9f4d612 in rocksdb::StackableDB::Flush (this=0x7febd394eeb0, fopts=..., column_family=0x7fecfeb9b650) at /home/mmaszewski/.gradle/caches/8.11/transforms/da103af47e391520fee616b7e7f114b5/transformed/cpp-api-headers/rocksdb/utilities/stackable_db.h:344
The call sequence of the background thread looks like this:
FlushMemTableToOutputFile needs_to_sync_closed_wals 1, GetLatestMemTableID 101 (~line 194 db_impl_compaction_flush.cc)
FlushMemTableToOutputFile-2 needs_to_sync_closed_wals 1, GetLatestMemTableID 102 (~line 231 db_impl_compaction_flush.cc)
break in PickMemtablesToFlush: GetID 102, max_memtable_id 101 (~line 365 in memtable_list.cc)
The background thread does not schedule a follow-up flush job because IsFlushPending() (~line 332 memtable_list.cc) sees that num_flush_not_started_ is less than min_write_buffer_number_to_merge_.
Calling SyncWAL() immediately before Flush() or CompactRange() is not a reliable workaround: it works sometimes, but not always.
The only known workaround at this point is setting min_write_buffer_number_to_merge to 1.
I am currently looking for ways to enhance IsFlushPending() so that it overrides the comparison of num_flush_not_started_ to min_write_buffer_number_to_merge_ when a manual flush or compaction is active.