Search before continuing
- I have searched the Data-Juicer issues and found no similar feature requests.
Description
Currently, dj_ckpt_manager and the executor only support HF datasets. They essentially perform three actions:
1. Track and save the list of executed operations, from OP_1 to OP_i.
2. Save the processed dataset $D_{op_i}$.
3. On re-processing, if the feature is enabled, check for and load $D_{op_i}$.
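The check in the third action can be sketched as a prefix comparison between the recorded operation list and the operation list of the new run. This is only an illustration, not the actual dj_ckpt_manager code: the function name `count_resumable_ops` and the record layout (`name` plus `args`) are hypothetical, and the rule that a checkpoint is reusable only when the recorded list is an exact prefix of the planned list is an assumption.

```python
from typing import Any, Dict, List

# Hypothetical record layout: one dict per operation, with its name and arguments.
OpRecord = Dict[str, Any]


def count_resumable_ops(recorded: List[OpRecord], planned: List[OpRecord]) -> int:
    """Return how many leading ops of `planned` are covered by the checkpoint.

    Assumption: the saved dataset is reusable only if the recorded ops are an
    exact prefix of the planned ops; otherwise nothing can be skipped.
    """
    if len(recorded) > len(planned):
        return 0
    for done, todo in zip(recorded, planned):
        if done != todo:
            return 0
    return len(recorded)


planned = [
    {"name": "op_1", "args": {"min_len": 10}},
    {"name": "op_2", "args": {}},
    {"name": "op_3", "args": {}},
]
recorded = planned[:2]  # a previous run stopped after op_2

skip = count_resumable_ops(recorded, planned)
remaining = planned[skip:]  # only these ops still need to run
```

With a matching prefix, the run would load the saved dataset and continue with `remaining`; with any mismatch it would start again from the original input.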
It would be straightforward to extend this feature to ray_executor. For steps 2 and 3, we can implement a few new interfaces for snapshotting Ray Data state and writing it to persistent storage.
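One possible shape for such an interface is sketched below. Everything here is hypothetical: `SnapshotStore`, `LocalJsonStore`, and the file names are illustrative and do not exist in Data-Juicer. To keep the sketch self-contained, the example backend writes JSON lines to a local directory; a Ray-backed implementation of the same interface might instead write with `Dataset.write_parquet` and read back with `ray.data.read_parquet`, which is not shown.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Optional, Protocol, Tuple

Rows = List[Dict[str, Any]]


class SnapshotStore(Protocol):
    """Hypothetical interface an executor could use for steps 2 and 3."""

    def save(self, rows: Rows, ops_done: List[str]) -> None: ...

    def load(self) -> Optional[Tuple[Rows, List[str]]]: ...


class LocalJsonStore:
    """Stand-in backend that keeps the snapshot in a local directory."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.data_file = self.root / "dataset.jsonl"
        self.ops_file = self.root / "ops.json"

    def save(self, rows: Rows, ops_done: List[str]) -> None:
        self.root.mkdir(parents=True, exist_ok=True)
        # Drop the old op record first so a crash mid-save cannot leave a
        # stale record pointing at partially written data.
        self.ops_file.unlink(missing_ok=True)
        with self.data_file.open("w", encoding="utf-8") as f:
            for row in rows:
                f.write(json.dumps(row, ensure_ascii=False) + "\n")
        # The op record is written last and marks the snapshot as complete.
        self.ops_file.write_text(json.dumps(ops_done), encoding="utf-8")

    def load(self) -> Optional[Tuple[Rows, List[str]]]:
        if not self.ops_file.exists():
            return None  # no complete snapshot available
        ops_done = json.loads(self.ops_file.read_text(encoding="utf-8"))
        with self.data_file.open(encoding="utf-8") as f:
            rows = [json.loads(line) for line in f if line.strip()]
        return rows, ops_done


with tempfile.TemporaryDirectory() as tmp:
    store: SnapshotStore = LocalJsonStore(Path(tmp) / "ckpt")
    first_load = store.load()  # nothing saved yet
    store.save([{"text": "hello"}, {"text": "world"}], ["op_1", "op_2"])
    restored_rows, restored_ops = store.load()
```

Keeping the executor dependent only on the interface would let the HF and Ray paths share the same tracking logic while differing in how the dataset itself is persisted.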
Use case
No response
Additional
No response
Are you willing to submit a PR for this feature?
- Yes, I'd like to help by submitting a PR!