Skip to content

[manager] write_location_manager callback failed#71

Merged
MMeecatfish merged 1 commit intomainfrom
fix/write_location_manager_callback_failed
Mar 24, 2026
Merged

[manager] write_location_manager callback failed#71
MMeecatfish merged 1 commit intomainfrom
fix/write_location_manager_callback_failed

Conversation

@MMeecatfish
Copy link
Collaborator

No description provided.

Copy link

@qoderai qoderai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👋 Review Summary

This PR improves how write timeouts are handled by propagating WriteLocationInfo through the callback, wiring it into CacheManager::FinishWriteCache, and tightening the tests around expiration and cleanup paths. The changes overall move the write timeout behavior in a safer and more observable direction.

🛡️ Key Risks & Issues

  • Write timeout callbacks in CacheManager::StartWriteCache now call FinishWriteCache with a non-null WriteLocationInfo instead of relying on GetAndDelete. This fixes the prior issue where expired sessions could not be finished, but it’s a behavior change: timeout now actively processes all blocks with succeed_block = 0. It would be good to double-check this aligns with upper-layer expectations for how timed-out writes should be reconciled and reclaimed, especially if callers previously assumed timeouts did not trigger delete work.
  • The new callback signature in WriteLocationManager looks structurally sound and improves ownership clarity by passing a unique_ptr<WriteLocationInfo> out of the internal map after erasure. I did not see new concurrency or lifetime hazards introduced by this pattern; the callbacks are invoked after the map unlocks, with exclusive ownership of the metadata.
  • In WriteLocationManagerTest.MultiThreadTest, the callback currently captures keys and ids by value while also passing them by std::move into Put, and it calls GetAndDelete("session_1", dummy) for every worker. This makes the assertions rely on unspecified evaluation order and only exercises a single hard-coded session in the cleanup call. I’ve left an inline comment with concrete suggestions to make this test deterministic and properly per-session.

🧪 Verification Advice

  • Add or extend a test that explicitly covers the timeout callback path in CacheManager::StartWriteCacheFinishWriteCache, ensuring that when the session expires and the callback fires, the passed WriteLocationInfo is used correctly and the reclaimer work is scheduled as intended.
  • Consider adding a scenario that uses a non-zero success mask at timeout (e.g., some blocks already written) to validate that FinishWriteCache’s block-level accounting and subsequent reclaim behavior remain correct under partial success.
  • If feasible in the existing test harness, add a small test or assertion around the new logging branches in ReclaimerTaskSupervisor::Submit to confirm that failures from schedule_plan_executor_->Submit are observable in logs during negative-path testing.

💡 Thoughts & Suggestions

  • The move to passing WriteLocationInfo via unique_ptr is a nice step toward making the ownership model explicit and avoiding double-consumption of session data. Once the MultiThreadTest lambda is cleaned up, these tests should give you a solid base for future evolutions of the expiration/cleanup mechanism.
  • For future maintenance, it might be worth centralizing the logic that decides what success_block_mask should be in timeout vs. normal completion paths, so the semantics stay consistent if additional write states or metrics are introduced later.

🤖 Generated by QoderView workflow run

Comment on lines +71 to +75
this->manager_.Put(session_id,
std::move(keys),
std::move(ids),
2,
[this, keys, ids, session_id](WriteLocationInfoPtr info) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In MultiThreadTest, the lambda captures keys and ids by value while the same variables are passed as std::move(keys) / std::move(ids) into Put on lines 71–74. Because function argument evaluation order is unspecified, some compilers may evaluate the captures after the move, so the captured keys/ids end up empty and the ASSERT_EQ(keys, info->keys) / ASSERT_EQ(ids, info->location_ids) checks rely on unspecified behavior. It would be safer to either avoid std::move here or build separate vectors solely for the assertions.

Also, inside the same callback you call GetAndDelete("session_1", dummy) regardless of which session_id was used for this worker. That means this test is not actually validating per-session cleanup for the current worker and could hide issues with how different sessions are managed. Using the session_id parameter instead of a hard-coded "session_1" would better exercise the intended multi-session behavior.


🤖 Generated by QoderFix in Qoder

@github-actions github-actions bot added the ai reviewed AI has reviewed this PR label Mar 24, 2026
@MMeecatfish MMeecatfish merged commit aedeb8b into main Mar 24, 2026
9 of 10 checks passed
@MMeecatfish MMeecatfish deleted the fix/write_location_manager_callback_failed branch March 24, 2026 11:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai reviewed AI has reviewed this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant