TiFlash crashes when memory limit is exceeded #10739

@raymondzk

Description

Enhancement

1. TiFlash version?
8.1.1

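  2. Description
    The TiFlash process crashed with a FATAL error reporting that the memory limit was exceeded: process memory would have reached 137.91 GiB against a 136.63 GiB limit for data computing, while a flush was allocating a 2097152-byte (2 MiB) chunk.
    Log: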
    2131036:[2026/03/11 11:01:32.451 +08:00] [FATAL] [Exception.cpp:106] ["Code: 0, e.displayText() = DB::TiFlashException: Memory limit exceeded caused by 'RSS(Resident Set Size) much larger than limit' : process memory size would be 137.91 GiB for (attempt to allocate chunk of 2097152 bytes), limit of memory for data computing : 136.63 GiB. Memory Usage of Storage: non-query: peak=30.95 GiB, amount=1.92 MiB; kvstore: peak=904.57 MiB, amount=10.27 KiB; query-storage-task: peak=13.20 GiB, amount=12.75 GiB; fetch-pages: peak=0.00 B, amount=0.00 B; shared-column-data: peak=13.20 GiB, amount=12.75 GiB., e.what() = DB::TiFlashException, Stack trace:\n\n\n 0x1b97e0c\tDB::TiFlashException::TiFlashException(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::TiFlashError const&) [tiflash+28933644]\n \tdbms/src/Common/TiFlashException.h:263\n 0x1b97120\tMemoryTracker::alloc(long, bool) [tiflash+28930336]\n \tdbms/src/Common/MemoryTracker.cpp:219\n 0x1b96cf8\tMemoryTracker::alloc(long, bool) [tiflash+28929272]\n \tdbms/src/Common/MemoryTracker.cpp:230\n 0x1ba1c0c\tAllocator<false>::alloc(unsigned long, unsigned long) [tiflash+28974092]\n \tdbms/src/Common/Allocator.cpp:68\n 0x1c084c0\tvoid DB::PODArrayBase<1ul, 4096ul, Allocator<false>, 15ul, 16ul>::alloc<>(unsigned long) [tiflash+29394112]\n \tdbms/src/Common/PODArray.h:145\n 0x7291568\tDB::ColumnString::insertRangeFrom(DB::IColumn const&, unsigned long, unsigned long) [tiflash+120132968]\n \tdbms/src/Columns/ColumnString.cpp:97\n 0x6a6a800\tDB::DM::ColumnFileInMemory::readDataForFlush() const [tiflash+111585280]\n \tdbms/src/Storages/DeltaMerge/ColumnFile/ColumnFileInMemory.cpp:106\n 0x6a9ec44\tDB::DM::MemTableSet::buildFlushTask(DB::DM::DMContext&, unsigned long, unsigned long, unsigned long) [tiflash+111799364]\n \tdbms/src/Storages/DeltaMerge/Delta/MemTableSet.cpp:310\n 0x6a8cb90\tDB::DM::DeltaValueSpace::flush(DB::DM::DMContext&) [tiflash+111725456]\n 
\tdbms/src/Storages/DeltaMerge/Delta/DeltaValueSpace.cpp:365\n 0x695d65c\tDB::DM::Segment::flushCache(DB::DM::DMContext&) [tiflash+110483036]\n \tdbms/src/Storages/DeltaMerge/Segment.cpp:2279\n 0x690008c\tDB::DM::DeltaMergeStore::flushCache(std::__1::shared_ptr<DB::DM::DMContext> const&, DB::DM::RowKeyRange const&, bool) [tiflash+110100620]\n \tdbms/src/Storages/DeltaMerge/DeltaMergeStore.cpp:774\n 0x69028e0\tDB::DM::DeltaMergeStore::flushCache(DB::Context const&, DB::DM::RowKeyRange const&, bool) [tiflash+110110944]\n \tdbms/src/Storages/DeltaMerge/DeltaMergeStore.cpp:747\n 0x7bd939c\tDB::KVStore::tryFlushRegionCacheInStorage(DB::TMTContext&, DB::Region const&, std::__1::shared_ptr<DB::Logger> const&, bool) [tiflash+129864604]\n \tdbms/src/Storages/KVStore/KVStore.cpp:227\n 0x7c340c4\tDB::KVStore::forceFlushRegionDataImpl(DB::Region&, bool, DB::TMTContext&, DB::RegionTaskLock const&, unsigned long, unsigned long) const [tiflash+130236612]\n \tdbms/src/Storages/KVStore/MultiRaft/Persistence.cpp:255\n 0x7c3363c\tDB::KVStore::canFlushRegionDataImpl(std::__1::shared_ptr<DB::Region> const&, unsigned char, bool, DB::TMTContext&, DB::RegionTaskLock const&, unsigned long, unsigned long, unsigned long, unsigned long) [tiflash+130233916]\n \tdbms/src/Storages/KVStore/MultiRaft/Persistence.cpp:230\n 0x7c33dd4\tDB::KVStore::tryFlushRegionData(unsigned long, bool, bool, DB::TMTContext&, unsigned long, unsigned long, unsigned long, unsigned long) [tiflash+130235860]\n \tdbms/src/Storages/KVStore/MultiRaft/Persistence.cpp:123\n 0x7c0ea08\tTryFlushData [tiflash+130083336]\n \tdbms/src/Storages/KVStore/FFI/ProxyFFI.cpp:161\n 0xffff9a840f9c\t_$LT$engine_store_ffi..observer..TiFlashObserver$LT$T$C$ER$GT$$u20$as$u20$raftstore..coprocessor..AdminObserver$GT$::pre_exec_admin::h2f5bf67dbdf7c90f [libtiflash_proxy.so+26152860]\n \tcontrib/tiflash-proxy/proxy_components/engine_store_ffi/src/observer.rs:120\n 
0xffff9b665724\traftstore::store::fsm::apply::ApplyDelegate$LT$EK$GT$::apply_raft_cmd::h9308910d47c3ade6 [libtiflash_proxy.so+40982308]\n \tcontrib/tiflash-proxy/components/raftstore/src/store/fsm/apply.rs:1429\n 0xffff9b67aa94\traftstore::store::fsm::apply::ApplyDelegate$LT$EK$GT$::process_raft_cmd::he5587c01a9599a25 [libtiflash_proxy.so+41069204]\n \tcontrib/tiflash-proxy/components/raftstore/src/store/fsm/apply.rs:1377\n 0xffff9b67cb6c\traftstore::store::fsm::apply::ApplyDelegate$LT$EK$GT$::handle_raft_committed_entries::h849a05848402ae24 [libtiflash_proxy.so+41077612]\n \tcontrib/tiflash-proxy/components/raftstore/src/store/fsm/apply.rs:1129\n 0xffff9b65bce4\traftstore::store::fsm::apply::ApplyFsm$LT$EK$GT$::handle_apply::h915edb389d0ce878 [libtiflash_proxy.so+40942820]\n \tcontrib/tiflash-proxy/components/raftstore/src/store/fsm/apply.rs:4020\n 0xffff9b65ec14\traftstore::store::fsm::apply::ApplyFsm$LT$EK$GT$::handle_tasks::hc0f710a21a8448f8 [libtiflash_proxy.so+40954900]\n \tcontrib/tiflash-proxy/components/raftstore/src/store/fsm/apply.rs:4351\n 0xffff9a91e7b8\t_$LT$raftstore..store..fsm..apply..ApplyPoller$LT$EK$GT$$u20$as$u20$batch_system..batch..PollHandler$LT$raftstore..store..fsm..apply..ApplyFsm$LT$EK$GT$$C$raftstore..store..fsm..apply..ControlFsm$GT$$GT$::handle_normal::h474edac058d2c646 [libtiflash_proxy.so+27060152]\n \tcontrib/tiflash-proxy/components/raftstore/src/store/fsm/apply.rs:4633\n 0xffff9a89b618\tbatch_system::batch::Poller$LT$N$C$C$C$Handler$GT$::poll::hdbfc86c50b98d3ed [libtiflash_proxy.so+26523160]\n \tcontrib/tiflash-proxy/components/batch-system/src/batch.rs:380\n 0xffff9a970444\tstd::sys_common::backtrace::__rust_begin_short_backtrace::h209bcd90e7cc37ca [libtiflash_proxy.so+27395140]\n \t/root/.rustup/toolchains/nightly-2022-11-15-aarch64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:121\n 0xffff9a9b2284\tcore::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h159d73113cffcc67 
[libtiflash_proxy.so+27665028]\n \t/root/.rustup/toolchains/nightly-2022-11-15-aarch64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:513\n 0xffff9bd9f29c\tstd::sys::unix::thread::Thread::new::thread_start::h45f22376cc6c77f8 [libtiflash_proxy.so+48558748]\n \t/root/.rustup/toolchains/nightly-2022-11-15-aarch64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/unix/thread.rs:108\n 0xffff98e17d38\tstart_thread [libpthread.so.0+32056]\n 0xffff98c0f680\tthread_start [libc.so.6+915072]"] [source="uint8_t DB::TryFlushData(DB::EngineStoreServerWrap *, uint64_t, uint8_t, uint64_t, uint64_t, uint64_t, uint64_t)"] [thread_id=9637]

3. What did you expect to see? (Required)
TiFlash should abort the SQL query when the process memory limit is exceeded, instead of crashing the TiFlash process.
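
To make the expected behavior concrete, here is a minimal sketch, written in Python for brevity rather than TiFlash's C++. `MemoryTracker`, `MemoryLimitExceeded`, and `run_query` are hypothetical names for illustration only and do not mirror TiFlash's real classes. The sketch assumes the over-limit allocation happens inside the query itself. In the trace above the exception appears to be thrown on the raftstore apply thread (via `TryFlushData`), not in a query thread, so a real fix would also have to decide which query to cancel; that part is not covered here.

```python
class MemoryLimitExceeded(Exception):
    """Raised when a tracked allocation would push usage past the limit."""


class MemoryTracker:
    """Toy byte counter with a hard limit (hypothetical, not TiFlash's class)."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def alloc(self, size_bytes):
        # Reject the allocation before it happens if it would exceed the limit.
        projected = self.used_bytes + size_bytes
        if projected > self.limit_bytes:
            raise MemoryLimitExceeded(
                f"would use {projected} bytes, limit is {self.limit_bytes}"
            )
        self.used_bytes = projected

    def free(self, size_bytes):
        self.used_bytes = max(0, self.used_bytes - size_bytes)


def run_query(tracker, chunk_sizes):
    """Run one query; on a limit breach, fail only this query."""
    allocated = 0
    try:
        for size in chunk_sizes:
            tracker.alloc(size)
            allocated += size
        return "ok"
    except MemoryLimitExceeded as exc:
        # The error is contained here, so the process keeps running.
        return f"aborted: {exc}"
    finally:
        # Release whatever this query held, whether it finished or was aborted.
        tracker.free(allocated)


MIB = 1024 * 1024
tracker = MemoryTracker(limit_bytes=8 * MIB)
print(run_query(tracker, [2 * MIB] * 3))  # fits under the limit
print(run_query(tracker, [2 * MIB] * 5))  # fifth chunk would exceed the limit
```

The design point is that the limit check raises an error scoped to the query, and the query's caller catches it and releases the query's memory, so later queries can still run.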

Metadata

Assignees

No one assigned

    Labels

    contribution: This PR is from a community contributor.
    first-time-contributor: Indicates that the PR was contributed by an external member and is a first-time contributor.
    type/enhancement: The issue or PR belongs to an enhancement.
