Checkpointer support for Ray-Mode #487

Open
@yxdyc

Search before continuing

  • I have searched the Data-Juicer issues and found no similar feature requests.

Description

Currently, the dj_ckpt_manager and executor only support Hugging Face (HF) datasets. They essentially perform three actions:

  1. Tracks and saves the executed operation list from OP_1 to OP_i.
  2. Saves the processed dataset $D_{op_i}$.
  3. Checks and loads $D_{op_i}$ when the feature is enabled during re-processing.
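
To make the three actions concrete, here is a minimal sketch in plain Python. It is not the actual dj_ckpt_manager code: the class name `OpCheckpoint`, the file names, the JSON storage format, and the op names `op_a`, `op_b`, `op_c` are all assumptions chosen for illustration. The dataset is modelled as a list of dicts so the example runs without any dataset library.

```python
import json
import os
import tempfile


class OpCheckpoint:
    """Hypothetical checkpoint helper (illustration only, not Data-Juicer's API)."""

    def __init__(self, ckpt_dir):
        self.ckpt_dir = ckpt_dir
        # File names below are assumptions made for this sketch.
        self.record_path = os.path.join(ckpt_dir, "op_record.json")
        self.data_path = os.path.join(ckpt_dir, "dataset.json")

    def save(self, executed_ops, dataset):
        # Actions 1 and 2: persist the executed op list and the dataset D_{op_i}.
        os.makedirs(self.ckpt_dir, exist_ok=True)
        with open(self.data_path, "w") as f:
            json.dump(dataset, f)
        with open(self.record_path, "w") as f:
            json.dump(executed_ops, f)

    def load(self, planned_ops):
        # Action 3: reuse the checkpoint only when the recorded ops are a
        # prefix of the planned ops; otherwise start from scratch.
        if not (os.path.exists(self.record_path)
                and os.path.exists(self.data_path)):
            return None, planned_ops
        with open(self.record_path) as f:
            done = json.load(f)
        if planned_ops[:len(done)] != done:
            return None, planned_ops
        with open(self.data_path) as f:
            dataset = json.load(f)
        return dataset, planned_ops[len(done):]


ops = ["op_a", "op_b", "op_c"]  # hypothetical op names
with tempfile.TemporaryDirectory() as ckpt_dir:
    ckpt = OpCheckpoint(ckpt_dir)
    ckpt.save(ops[:2], [{"text": "hello"}])  # checkpoint after the second op
    dataset, remaining = ckpt.load(ops)      # resume: only the last op is left
```

The prefix check is one possible way to decide whether a checkpoint is still valid: if the op list was edited before the checkpointed position, the saved dataset no longer matches the plan and is ignored.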

It would be straightforward to extend this feature to ray_executor. For steps 2 and 3, we can implement a few new interfaces that snapshot Ray Data state to persistent storage and load it back.
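
One possible shape for the step 2 and step 3 interfaces is sketched below. The class `RayParquetSnapshot`, its method names, and the `after_op_<i>` directory layout are assumptions made for illustration, not an existing part of ray_executor. The only Ray calls used are `Dataset.write_parquet` and `ray.data.read_parquet`. Parquet is just one choice of storage format.

```python
import os


class RayParquetSnapshot:
    """Hypothetical snapshot interface for Ray Data (illustration only)."""

    def __init__(self, storage_dir):
        # storage_dir is assumed to be persistent storage that every
        # Ray worker can reach, e.g. a shared filesystem.
        self.storage_dir = storage_dir

    def path_for(self, op_index):
        # Assumed layout: one directory per checkpointed op index.
        return os.path.join(self.storage_dir, f"after_op_{op_index}")

    def save(self, dataset, op_index):
        # Step 2: Dataset.write_parquet executes the pending plan and
        # writes the result as Parquet files under the given path.
        path = self.path_for(op_index)
        dataset.write_parquet(path)
        return path

    def load(self, op_index):
        # Step 3: read the snapshot back as a Ray Dataset.
        # Ray is imported lazily so this class can be defined without it.
        import ray
        return ray.data.read_parquet(self.path_for(op_index))
```

With Ray installed, usage would look like `snap.save(ds, i)` after running OP_i and `ds = snap.load(i)` when resuming. Step 1 (tracking the executed op list) does not depend on the dataset backend, so the existing op record could presumably be reused as-is.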

Use case

No response

Additional

No response

Are you willing to submit a PR for this feature?

  • Yes, I'd like to help by submitting a PR!

Metadata

Labels

enhancement: New feature or request
