[Store] Prefer LOCAL_DISK replica when both LOCAL_DISK and DISK types exist by ertcmm · Pull Request #1963 · kvcache-ai/Mooncake

ertcmm · 2026-04-23T13:03:36Z

Description

This PR fixes a data retrieval failure and routing bug in RealClient during SSD offloading when an object has both LOCAL_DISK and DISK replicas. It ensures that data can still be retrieved when LOCAL_DISK and DISK replicas coexist after the object's memory replica has been evicted.

The Issue:

Ignored Priority System (Blind Indexing): In RealClient::batch_get_into_internal (and likewise in batch_get_buffer_internal), the fetch dispatcher skipped GetPreferredReplica() entirely and always used query_result_values.replicas[0].
As a result, the dedicated SSD offload path (batch_get_into_offload_object_internal) was bypassed and the LOCAL_DISK replica was pushed into the standard Client::BatchGet pipeline. The standard transfer path cannot resolve a LocalDiskDescriptor, so routing broke and the fetch failed.

Fix

Removed Hardcoded Indexing (Blind Indexing): Rewrote the RealClient fetch pipelines to call client_->GetPreferredReplica() in all batch operations instead of taking replicas[0].
Removed Restrictive Routing Constraints: Removed the replicas.size() == 1 precondition. The dispatcher now routes based solely on whether the preferred replica reports is_local_disk_replica().
Secured Target Fetch Accuracy (Key Operations): Updated KeyOp to store and use the chosen preferred_replica. This ensures that objects with [LOCAL_DISK, DISK] replicas are routed to batch_get_into_offload_object_internal, avoiding the descriptor parsing failure.
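The routing change described above can be sketched as follows. This is a minimal illustration, not the PR's code: Replica, ReplicaKind, FetchPath, route_old and route_new are hypothetical stand-ins, and route_old reflects only one reading of the old behaviour (index 0 plus the size() == 1 precondition).

```cpp
#include <vector>

// Hypothetical stand-ins for the store's replica types; the real
// descriptors carry more fields than this.
enum class ReplicaKind { MEMORY, LOCAL_DISK, DISK };

struct Replica {
    ReplicaKind kind;
    bool is_local_disk_replica() const {
        return kind == ReplicaKind::LOCAL_DISK;
    }
};

enum class FetchPath { OFFLOAD, STANDARD_BATCH_GET };

// Old behaviour as described in the PR (assumed shape): inspect only
// replicas[0] and require exactly one replica for the offload path.
FetchPath route_old(const std::vector<Replica>& replicas) {
    if (replicas.size() == 1 && replicas[0].is_local_disk_replica()) {
        return FetchPath::OFFLOAD;
    }
    return FetchPath::STANDARD_BATCH_GET;
}

// New behaviour: route purely on the replica that the preference
// function selected, regardless of how many replicas exist.
FetchPath route_new(const Replica& preferred) {
    return preferred.is_local_disk_replica() ? FetchPath::OFFLOAD
                                             : FetchPath::STANDARD_BATCH_GET;
}
```

With a replica list of [LOCAL_DISK, DISK], route_old falls through to the standard path because the list has two entries, while route_new sends the preferred LOCAL_DISK replica to the offload path.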

Module

Type of Change

How Has This Been Tested?

The changes were verified by forcing local memory eviction so that target objects held both a LOCAL_DISK and a global DISK replica at the same time.

Confirmed that RealClient produces the expected payload size arrays for replica combinations of the form [LOCAL_DISK, DISK].
Validated that GetPreferredReplica consistently selects the replica for which is_local_disk_replica() is true.
Validated that removing the size == 1 check routes these payloads to batch_get_into_offload_object_internal rather than the legacy path.

Checklist

I have performed a self-review of my own code.
I have formatted my own code using ./scripts/code_format.sh before submitting.
I have updated the documentation.
I have added tests to prove my changes are effective.


gemini-code-assist

Code Review

This pull request introduces support for local disk replicas within the mooncake-store. Key changes include the addition of a completion status check for replicas and a revised selection strategy in GetPreferredReplica that prioritizes local memory, remote memory, local disk, and global disk in descending order. The RealClient has been updated to handle local disk offloading for single and batch retrieval operations. Review feedback identifies several critical issues: the local disk offload path currently lacks support for multi-slice objects, which could lead to data loss, and the use of key-based maps for tracking operations fails to account for duplicate keys in input vectors, potentially leaving buffers uninitialized. Additionally, an optimization was suggested to improve the efficiency of processing batch results from multiple storage nodes.
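The selection order summarized above (local memory, then remote memory, then local disk, then global disk, skipping replicas that are not complete) might look roughly like the sketch below. ReplicaClass, ReplicaInfo and pick_preferred are hypothetical names for illustration; the actual GetPreferredReplica signature and replica types are not shown in this page.

```cpp
#include <optional>
#include <vector>

// Lower enumerator value means higher preference (assumed encoding).
enum class ReplicaClass {
    LOCAL_MEMORY = 0,
    REMOTE_MEMORY = 1,
    LOCAL_DISK = 2,
    DISK = 3
};

// Hypothetical, simplified view of a replica descriptor.
struct ReplicaInfo {
    ReplicaClass cls;
    bool complete;  // stands in for the completion status check
};

// Return the highest-priority complete replica, or nothing if no
// replica is usable.
std::optional<ReplicaInfo> pick_preferred(
    const std::vector<ReplicaInfo>& replicas) {
    std::optional<ReplicaInfo> best;
    for (const auto& r : replicas) {
        if (!r.complete) continue;  // skip replicas still being written
        if (!best ||
            static_cast<int>(r.cls) < static_cast<int>(best->cls)) {
            best = r;
        }
    }
    return best;
}
```

Under these assumptions, a list holding a complete DISK replica, a complete LOCAL_DISK replica and an incomplete REMOTE_MEMORY replica yields the LOCAL_DISK replica.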

gemini-code-assist · 2026-04-23T13:07:06Z

+    if (replica.is_local_disk_replica()) {
+        std::unordered_map<std::string, Slice> slices_map;
+        slices_map.emplace(key, slices.at(0));


The LOCAL_DISK offload path currently only supports a single slice per key. If the object size exceeds kMaxSliceSize, allocateSlices will produce multiple slices, but only the first one is fetched here, leading to incomplete data retrieval. A check should be added to ensure slices.size() == 1 before proceeding.

if (replica.is_local_disk_replica()) {
    if (slices.size() != 1) {
        LOG(ERROR) << "Local disk offload currently only supports 1 slice per key, given: "
                   << slices.size() << " for key: " << key;
        return nullptr;
    }
    std::unordered_map<std::string, Slice> slices_map;
    slices_map.emplace(key, slices.at(0));

gemini-code-assist · 2026-04-23T13:07:06Z

+        for (const auto &op : valid_local_disk_ops) {
+            const auto &replica = op.preferred_replica;
+            auto [it, _] = offload_objects.try_emplace(
+                replica.get_local_disk_descriptor().transport_endpoint);
+            it->second.emplace(op.key, op.slices.at(0));
+        }


There are two issues here:

Similar to get_buffer_internal, this path only fetches the first slice (slices.at(0)). If an object is split into multiple slices, data will be lost. A check for op.slices.size() == 1 is needed.

If the input keys vector contains duplicate entries, it->second.emplace will only store the first occurrence. Consequently, only the buffer for the first occurrence will be filled, while subsequent occurrences will remain uninitialized but still be marked as success in the result processing loop (lines 2195-2201).

gemini-code-assist · 2026-04-23T13:07:06Z

+        if (replica.is_local_disk_replica()) {
            valid_local_disk_operations.emplace(


valid_local_disk_operations is a std::unordered_map, which means if the input keys vector contains duplicate entries, only the first occurrence will be recorded and fetched. However, results[i] is set to success for all occurrences (lines 3310 and 3323), leading to uninitialized buffers for duplicate keys. Positional tracking (e.g., using indices) should be used instead of a key-based map for destination buffers.
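The duplicate-key problem and the positional alternative can be illustrated with a small standalone sketch. PendingOp, track_by_key and track_by_position are hypothetical names, and a plain index stands in for the destination buffer.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Key-based tracking: std::unordered_map::emplace does nothing when the
// key already exists, so a repeated key keeps only its first index.
std::unordered_map<std::string, std::size_t> track_by_key(
    const std::vector<std::string>& keys) {
    std::unordered_map<std::string, std::size_t> ops;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        ops.emplace(keys[i], i);
    }
    return ops;
}

// Positional tracking: one entry per input position, so every
// occurrence of a key keeps its own destination index.
struct PendingOp {
    std::string key;
    std::size_t original_index;
};

std::vector<PendingOp> track_by_position(
    const std::vector<std::string>& keys) {
    std::vector<PendingOp> ops;
    ops.reserve(keys.size());
    for (std::size_t i = 0; i < keys.size(); ++i) {
        ops.push_back({keys[i], i});
    }
    return ops;
}
```

For the input {"a", "b", "a"}, the key-based map ends up with two entries and loses index 2, while the positional vector keeps all three.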

gemini-code-assist · 2026-04-23T13:07:06Z

+            valid_local_disk_operations.emplace(
+                key,


Similar to batch_get_into_internal, using a map for valid_local_disk_operations causes issues when duplicate keys are present in the input. Only the first occurrence is fetched, but all are marked as success in the results vector.

gemini-code-assist · 2026-04-23T13:07:06Z

+                for (auto &op : valid_local_disk_ops) {
+                    if (offload_objects_it.second.count(op.key)) {
+                        final_results[op.original_index] =
+                            std::make_shared<BufferHandle>(
+                                std::move(*op.buffer_handle));
+                    }
+                }


This nested loop has $O(N_{endpoints} \times N_{ops})$ complexity. For large batches and multiple storage nodes, this can be inefficient. Consider grouping valid_local_disk_ops by endpoint during the initial pass or using a mapping from key to operations to speed up result processing.
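One way to apply the grouping suggestion is to bucket operations by endpoint in a single pass, so that result processing for an endpoint only visits that endpoint's operations. The sketch below assumes a hypothetical DiskOp struct with a string endpoint; the real code derives the endpoint from the local disk descriptor.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified operation record.
struct DiskOp {
    std::string key;
    std::string endpoint;        // storage node that holds the replica
    std::size_t original_index;  // position in the caller's input
};

// Single pass over the operations: endpoint -> indices into `ops`.
// Processing the results of one endpoint then costs only the number
// of operations on that endpoint, giving O(N_ops) work overall.
std::unordered_map<std::string, std::vector<std::size_t>> group_by_endpoint(
    const std::vector<DiskOp>& ops) {
    std::unordered_map<std::string, std::vector<std::size_t>> groups;
    for (std::size_t i = 0; i < ops.size(); ++i) {
        groups[ops[i].endpoint].push_back(i);
    }
    return groups;
}
```

Storing indices rather than keys in each bucket also keeps duplicate keys distinct, which fits the positional tracking suggested in the earlier comments.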

feat: Prefer LOCAL_DISK replica when both LOCAL_DISK and DISK types e…

b75e854

…xist

ertcmm requested review from XucSh, YiXR, stmatengss and ykwd as code owners April 23, 2026 13:03

github-actions Bot added run-ci Store labels Apr 23, 2026

ertcmm changed the title ~~feat: Prefer LOCAL_DISK replica when both LOCAL_DISK and DISK types exist~~ [Store] Prefer LOCAL_DISK replica when both LOCAL_DISK and DISK types exist Apr 23, 2026

gemini-code-assist Bot reviewed Apr 23, 2026

ertcmm wants to merge 1 commit into kvcache-ai:main from ertcmm:feat-localdisk_first
