[data_storage] enhance DataStorageBackend::Create interface#58
Open
wangxiyu191 wants to merge 1 commit intomainfrom
Open
[data_storage] enhance DataStorageBackend::Create interface#58wangxiyu191 wants to merge 1 commit intomainfrom
wangxiyu191 wants to merge 1 commit intomainfrom
Conversation
::Create interface
There was a problem hiding this comment.
👋 Review Summary
整体改造思路清晰:通过 CreateBlocksRequest/SpecBlockKeys/SpecCreateResult 把原来一维的 Create 接口升级为按 spec 维度的二维结构,Hf3fs/Nfs/Mooncake 以及 DataStorageManager / CacheManager 的适配也基本做到了一致、可维护。
🛡️ Key Risks & Issues
目前没有在实现中发现明显的逻辑错误或数据一致性问题:
- CacheManager::GenWriteLocation 侧构造 CreateBlocksRequest 时,block_keys 与 original_key_indices 的映射保持了一致,位置与数量在当前实现下是自洽的(kv_cache_manager/manager/cache_manager.cc:701–727)。
- DataStorageBackend::Create 返回值(vector)在 Hf3fs/Nfs/Mooncake 的实现中都遵守了“按 spec 顺序一一对应、结果长度与 block_keys 长度一致”的约定,CacheManager 也有防御性的 size 校验和失败回滚逻辑(hf3fs_backend.cc:64–108, nfs_backend.cc:49–89, mooncake_backend.cc:84–105, cache_manager.cc:751–812)。
- DataStorageManager::Create 统一在 EC_OK 的 URI 上填充 hostName,并在 metrics 中统计总 key 数,行为与旧接口相比更规范(kv_cache_manager/data_storage/data_storage_manager.cc:189–210)。
整体来看,本次改造在接口设计和数据流上是协调一致的,没有发现会直接导致“脏数据”或 URI 错配的明显缺陷。
🧪 Verification Advice
建议在后续压测 / 联调阶段重点关注:
- 多 spec + location_spec_group_names 过滤场景下,某些 key 只落到部分 spec 的组合路径,验证生成的 CreateBlocksRequest 与最终 CacheLocation 的 spec 集合是否符合预期。
- 人为注入后端 Create 层面的部分失败(某个 spec 某几个 key 返回非 EC_OK),确认 CacheManager 能正确触发回滚,并且 DataStorageManager::Delete 收到的 URI 集合与已成功创建的完全一致。
- 针对 MooncakeBackend::Create 增补单元测试或联调用例,覆盖单 spec / 多 spec / 空请求等情况,确保 key、size 参数格式与上层调用方预期一致。
💡 Thoughts & Suggestions
- 当前通过 SpecBlockKeys + original_key_indices 做 index 映射的方式较为通用,后续如果有更多 group/filter 逻辑也可以在这层扩展,但建议在接口注释中明确 block_keys.size() 与 original_key_indices.size() 必须相等的约束,以避免未来后端实现者误用。
- CacheManager::GenWriteLocation 中对 size 不匹配和 EC 非 OK 的处理已经比较防御性,但可以考虑在日志中增加 instance_id/spec_name 维度的信息,方便线上排查某个 spec 或后端行为异常的问题。
🤖 Generated by Qoder • View workflow run
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
背景
当前NFS等backend不同 spec 的 key 会混入同一文件(同size的spec合并分配导致),会让多进程同时写一个文件。为解决该问题做如下重构,将Create分批自由度完全下放给Backend。
核心重构:
新增 SpecBlockKeys、CreateBlocksRequest、SpecCreateResult 数据结构,Create 参数和返回值从一维改为按 spec 维度的二维结构
各后端(Hf3fs、Nfs、Mooncake 等)完成了新接口适配
主要改动点:
Hf3fs / NfsBackend:按 spec_.key_count_per_file() 批大小合并同 spec 内的 block 到同一文件,通过 blkid 区分,保证不同 spec 的 key 不混入同一文件
MooncakeBackend:每个 key 独立生成 URI,不做批文件合并
DataStorageManager:统一在返回 URI 上填充 hostName,供后续 Meta 存储使用
CacheManager::GenWriteLocation:重写创建逻辑,支持按 group 过滤 spec,并在任何 spec 结果异常时对所有已分配 URI 做回滚清理,避免脏数据
单元测试:全面更新,覆盖了单/多 key、单/多 spec、空请求、分批行为及 URI 拼接等场景,包含 MultipleSpecsNotMixed 新测试用例
Background
Currently, for backends such as NFS, keys with different specifications (specs) may be intermingled within the same file (a consequence of merging and allocating specs of identical size), leading to scenarios where multiple processes attempt to write to the same file simultaneously. To resolve this issue, the following refactoring has been implemented: the full discretion regarding the batching of Create operations has been completely delegated to the Backend.
Core Refactoring:
Introduced new data structures:
SpecBlockKeys,CreateBlocksRequest, andSpecCreateResult. The parameters and return values for theCreateoperation have been restructured from a one-dimensional format to a two-dimensional structure organized byspecdimensions.All backend modules (Hf3fs, Nfs, Mooncake, etc.) have been updated to adapt to the new interfaces.
Key Changes:
Hf3fs / NfsBackend: Blocks belonging to the same
specare batched and consolidated into a single file, based on the batch size defined byspec_.key_count_per_file(). Blocks are distinguished via theirblkid, ensuring that keys from differentspecsare never intermingled within the same file.MooncakeBackend: A unique URI is generated independently for each individual key; no batch file consolidation is performed.
DataStorageManager: Standardized the process of populating the
hostNamefield within returned URIs, facilitating subsequent usage by the Meta storage layer.CacheManager::GenWriteLocation: The creation logic has been rewritten to support filtering
specsbygroup. Additionally, if an error occurs during the processing of anyspec, a rollback and cleanup operation is triggered to revoke all previously allocated URIs, thereby preventing the generation of "dirty data."Unit Tests: Comprehensively updated to cover various scenarios, including single/multiple keys, single/multiple
specs, empty requests, batching behaviors, and URI construction. This update includes a new test case specifically designed to verify thatMultipleSpecsNotMixed.