[manager] write_location_manager callback failed #71
👋 Review Summary
This PR improves how write timeouts are handled by propagating WriteLocationInfo through the callback, wiring it into CacheManager::FinishWriteCache, and tightening the tests around expiration and cleanup paths. The changes overall move the write timeout behavior in a safer and more observable direction.
🛡️ Key Risks & Issues
- Write timeout callbacks in CacheManager::StartWriteCache now call FinishWriteCache with a non-null WriteLocationInfo instead of relying on GetAndDelete. This fixes the earlier issue where expired sessions could not be finished. It is still a behavior change: a timeout now actively processes all blocks with succeed_block = 0. Please double-check that this matches upper-layer expectations for how timed-out writes are reconciled and reclaimed, especially if callers previously assumed that a timeout did not trigger any delete work.
- The new callback signature in WriteLocationManager looks structurally sound. It makes ownership clearer by passing a unique_ptr<WriteLocationInfo> out of the internal map after the entry has been erased. I did not see new concurrency or lifetime hazards introduced by this pattern: the callbacks run after the map's lock is released, and each one holds exclusive ownership of its metadata.
- In WriteLocationManagerTest.MultiThreadTest, the callback captures keys and ids by value while the same variables are also passed to Put via std::move. It also calls GetAndDelete("session_1", dummy) for every worker. As a result, the assertions depend on unspecified argument evaluation order, and the cleanup call only ever exercises a single hard-coded session. I've left an inline comment with concrete suggestions to make this test deterministic and properly per-session.
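The ownership hand-off in the second bullet can be illustrated with a minimal sketch of a heavily simplified manager. The following are hypothetical: SketchLocationManager, ExpireUntil, the explicit deadline parameter (the test appears to pass a timeout of 2) and the long element type for location_ids. The field names keys and location_ids come from the test's assertions. The point is the shape: expired entries are erased under the lock, and their callbacks run only after the lock is released.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins; the real types in this PR may differ.
struct WriteLocationInfo {
    std::vector<std::string> keys;
    std::vector<long> location_ids;
};
using WriteLocationInfoPtr = std::unique_ptr<WriteLocationInfo>;
using ExpireCallback = std::function<void(WriteLocationInfoPtr)>;
using Clock = std::chrono::steady_clock;

class SketchLocationManager {
public:
    void Put(const std::string& session_id, std::vector<std::string> keys,
             std::vector<long> ids, Clock::time_point deadline, ExpireCallback cb) {
        auto info = std::make_unique<WriteLocationInfo>();
        info->keys = std::move(keys);
        info->location_ids = std::move(ids);
        std::lock_guard<std::mutex> lock(mu_);
        entries_[session_id] = Entry{std::move(info), deadline, std::move(cb)};
    }

    // Erases expired sessions while holding the lock, then invokes their
    // callbacks after the lock is released. Each callback receives exclusive
    // ownership of its metadata, so it cannot be consumed twice.
    std::size_t ExpireUntil(Clock::time_point now) {
        std::vector<std::pair<ExpireCallback, WriteLocationInfoPtr>> expired;
        {
            std::lock_guard<std::mutex> lock(mu_);
            for (auto it = entries_.begin(); it != entries_.end();) {
                if (it->second.deadline <= now) {
                    expired.emplace_back(std::move(it->second.callback),
                                         std::move(it->second.info));
                    it = entries_.erase(it);
                } else {
                    ++it;
                }
            }
        }  // lock released here, so callbacks may safely re-enter the manager
        for (auto& [cb, info] : expired) {
            if (cb) cb(std::move(info));
        }
        return expired.size();
    }

    bool Contains(const std::string& session_id) {
        std::lock_guard<std::mutex> lock(mu_);
        return entries_.count(session_id) != 0;
    }

private:
    struct Entry {
        WriteLocationInfoPtr info;
        Clock::time_point deadline;
        ExpireCallback callback;
    };
    std::mutex mu_;
    std::unordered_map<std::string, Entry> entries_;
};
```

Because std::mutex is not recursive, a callback that calls back into the manager (for example Contains) would deadlock if it were invoked while the lock was still held. Running callbacks outside the critical section avoids that.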
🧪 Verification Advice
- Add or extend a test that explicitly covers the timeout callback path CacheManager::StartWriteCache → FinishWriteCache. It should check that when the session expires and the callback fires, the passed WriteLocationInfo is used correctly and the reclaimer work is scheduled as intended.
- Consider adding a scenario that uses a non-zero success mask at timeout (e.g., some blocks already written). This would validate that FinishWriteCache's block-level accounting and the subsequent reclaim behavior remain correct under partial success.
- If feasible in the existing test harness, add a small test or assertion around the new logging branches in ReclaimerTaskSupervisor::Submit. It should confirm that failures from schedule_plan_executor_->Submit are observable in logs during negative-path testing.
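For the partial-success scenario, a test could pin down the expected accounting against a small model like the one below. SplitBlocksByMask and FinishResult are hypothetical stand-ins, not the PR's actual FinishWriteCache contract. The 64-bit mask layout (bit i set means block i was written) and the rule that unwritten blocks go to the reclaimer are also assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of block-level accounting at finish time.
struct FinishResult {
    std::vector<std::string> committed;  // blocks whose bit is set in the mask
    std::vector<std::string> reclaimed;  // blocks handed to the reclaimer
};

// Assumption: bit i of success_mask set => block i was written successfully.
// Blocks beyond bit 63 are treated as not written.
FinishResult SplitBlocksByMask(const std::vector<std::string>& keys,
                               std::uint64_t success_mask) {
    FinishResult result;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        const bool ok = i < 64 && ((success_mask >> i) & 1u) != 0;
        (ok ? result.committed : result.reclaimed).push_back(keys[i]);
    }
    return result;
}
```

With keys {"b0", "b1", "b2"}, a mask of 0 (the current timeout behavior) sends all three blocks to the reclaimer. A mask of 0b101 commits b0 and b2 and reclaims only b1. Those are the two cases worth asserting in the real test.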
💡 Thoughts & Suggestions
- The move to passing WriteLocationInfo via unique_ptr is a nice step toward making the ownership model explicit and avoiding double-consumption of session data. Once the MultiThreadTest lambda is cleaned up, these tests should give you a solid base for future evolutions of the expiration/cleanup mechanism.
- For future maintenance, it might be worth centralizing the logic that decides what success_block_mask should be in the timeout and normal completion paths. That way the semantics stay consistent if additional write states or metrics are introduced later.
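One way to centralize that decision is a single helper that every completion path calls. WriteOutcome, LowBits and DecideSuccessMask are hypothetical names. The timeout branch simply mirrors the current behavior of passing 0. Any later change of policy would then touch only this one function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical outcome enum; the PR does not define one.
enum class WriteOutcome { kCompleted, kTimedOut };

// Mask with the lowest n bits set (all 64 bits when n >= 64).
constexpr std::uint64_t LowBits(std::size_t n) {
    return n >= 64 ? ~std::uint64_t{0} : ((std::uint64_t{1} << n) - 1);
}

// Single decision point for the success mask on every completion path,
// so the timeout and normal paths cannot drift apart.
std::uint64_t DecideSuccessMask(WriteOutcome outcome,
                                std::uint64_t reported_mask,
                                std::size_t block_count) {
    switch (outcome) {
        case WriteOutcome::kCompleted:
            // Drop any bits that do not correspond to a real block.
            return reported_mask & LowBits(block_count);
        case WriteOutcome::kTimedOut:
            // Mirrors the PR's current timeout behavior: no block counts as written.
            return 0;
    }
    return 0;  // unreachable; keeps -Wreturn-type quiet
}
```

For example, DecideSuccessMask(WriteOutcome::kTimedOut, 0b111, 3) yields 0. A completed write reporting 0b1111 for 3 blocks yields 0b111.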
🤖 Generated by Qoder
    this->manager_.Put(session_id,
                       std::move(keys),
                       std::move(ids),
                       2,
                       [this, keys, ids, session_id](WriteLocationInfoPtr info) {
In MultiThreadTest, the lambda captures keys and ids by value while the same variables are passed as std::move(keys) / std::move(ids) into Put on lines 71–74. The evaluation order of function arguments is unspecified. A compiler may therefore initialize Put's parameters before it constructs the closure; assuming Put takes the vectors by value, that step moves out of keys and ids. In that case the captured keys/ids are empty, so the ASSERT_EQ(keys, info->keys) / ASSERT_EQ(ids, info->location_ids) checks rely on unspecified behavior. It would be safer to either avoid std::move here or build separate vectors solely for the assertions.
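Here is a minimal sketch of the second option, assuming Put takes the vectors by value. FakePut and RunOneWorker are hypothetical stand-ins for manager_.Put and one worker's body. The callback is simplified to receive the vectors directly rather than a WriteLocationInfoPtr, and plain assert replaces gtest's ASSERT_EQ so the snippet compiles on its own.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Keys = std::vector<std::string>;
using Ids = std::vector<long>;
using Callback = std::function<void(const Keys&, const Ids&)>;

// Hypothetical stand-in for manager_.Put: takes the vectors by value
// (so the move happens when its parameters are initialized) and hands
// them straight back through the callback.
void FakePut(Keys keys, Ids ids, Callback cb) { cb(keys, ids); }

bool RunOneWorker(Keys keys, Ids ids) {
    // Copies used for the assertions are made in separate statements, so
    // they are complete before any argument of the call below is evaluated.
    const Keys expected_keys = keys;
    const Ids expected_ids = ids;
    bool matched = false;
    FakePut(std::move(keys), std::move(ids),
            [expected_keys, expected_ids, &matched](const Keys& got_keys,
                                                    const Ids& got_ids) {
                matched = (got_keys == expected_keys) && (got_ids == expected_ids);
            });
    return matched;
}
```

The lambda now captures expected_keys and expected_ids rather than keys and ids. It therefore never observes a moved-from vector, whichever order the compiler evaluates the arguments in.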
Also, inside the same callback you call GetAndDelete("session_1", dummy) regardless of which session_id was used for this worker. This means the test does not actually validate per-session cleanup for the current worker and could hide issues with how different sessions are managed. Using the captured session_id instead of a hard-coded "session_1" would better exercise the intended multi-session behavior.
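Here is a sketch of the per-session variant, assuming a simplified registry with Put and GetAndDelete. The GetAndDelete name is taken from the test, but its real signature may differ; SessionRegistry and RunWorkers are hypothetical. Each worker derives its own session_id and cleans up exactly that session.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical minimal registry keyed by session id.
class SessionRegistry {
public:
    void Put(const std::string& id, std::vector<std::string> keys) {
        std::lock_guard<std::mutex> lock(mu_);
        sessions_[id] = std::move(keys);
    }
    // Removes `id` and fills `out` only if the session exists.
    bool GetAndDelete(const std::string& id, std::vector<std::string>& out) {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = sessions_.find(id);
        if (it == sessions_.end()) return false;
        out = std::move(it->second);
        sessions_.erase(it);
        return true;
    }
    std::size_t Size() {
        std::lock_guard<std::mutex> lock(mu_);
        return sessions_.size();
    }

private:
    std::mutex mu_;
    std::unordered_map<std::string, std::vector<std::string>> sessions_;
};

// Each worker registers and then cleans up its own session_id, so a
// mix-up between sessions shows up as a lower count.
int RunWorkers(SessionRegistry& registry, int workers) {
    std::atomic<int> cleaned{0};
    std::vector<std::thread> threads;
    for (int w = 0; w < workers; ++w) {
        threads.emplace_back([&registry, &cleaned, w] {
            const std::string session_id = "session_" + std::to_string(w);
            registry.Put(session_id, {session_id + "_key"});
            std::vector<std::string> dummy;
            if (registry.GetAndDelete(session_id, dummy) &&
                dummy == std::vector<std::string>{session_id + "_key"}) {
                ++cleaned;
            }
        });
    }
    for (auto& t : threads) t.join();
    return cleaned.load();
}
```

RunWorkers(registry, 8) should return 8 and leave the registry empty. If the lookup were hard-coded to "session_1" while still checking each worker's own key, at most one worker could pass, which is exactly the mix-up this test should catch.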