[Feat] Support explicit `FusedOP` that allows for the configuration and application of multiple operators in smaller, manageable batches

### Search before continuing

- [X] I have searched the Data-Juicer issues and found no similar feature requests.


### Description

Currently, Data-Juicer's recipes and default executor only support applying a sequence of operators (e.g., OP1, then OP2) to the entire dataset, one operator after another:

```python
dataset.process([OP1, OP2])
```
To enable finer-grained control and better resource management, particularly when the data must be processed sequentially batch by batch, we propose the following approach:

```python
for data_batch in dataset.batch_iterator(batch_size):
    data_batch.process([OP1, OP2])
```
Applying the operators to smaller, manageable batches in this way can potentially improve efficiency, reduce the memory footprint, and simplify the implementation.
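The difference between the two execution orders can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Data-Juicer code: `clean_email` and `clean_links` are hypothetical regex-based stand-ins for `clean_email_mapper` and `clean_links_mapper`, samples are assumed to be dicts with a `"text"` field, and `batch_iterator` is a hypothetical helper mirroring the pseudocode above (it is not an existing Data-Juicer or Hugging Face `datasets` method).

```python
import re
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

Sample = Dict[str, str]
Op = Callable[[Sample], Sample]

# Hypothetical toy mappers; the real Data-Juicer operators are more thorough.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
LINK_RE = re.compile(r"https?://\S+")


def clean_email(sample: Sample) -> Sample:
    return {**sample, "text": EMAIL_RE.sub("", sample["text"])}


def clean_links(sample: Sample) -> Sample:
    return {**sample, "text": LINK_RE.sub("", sample["text"])}


def process_whole(dataset: List[Sample], ops: List[Op]) -> List[Sample]:
    """Current behaviour: each op passes over the full dataset before the next op starts."""
    for op in ops:
        dataset = [op(s) for s in dataset]  # a full intermediate result exists per op
    return dataset


def batch_iterator(dataset: Iterable[Sample], batch_size: int) -> Iterator[List[Sample]]:
    """Hypothetical helper: yield consecutive batches of at most batch_size samples."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(dataset)
    while batch := list(islice(it, batch_size)):
        yield batch


def process_batched(dataset: Iterable[Sample], ops: List[Op], batch_size: int) -> List[Sample]:
    """Proposed behaviour: all ops finish on one batch before the next batch is read."""
    out: List[Sample] = []
    for batch in batch_iterator(dataset, batch_size):
        for op in ops:
            batch = [op(s) for s in batch]
        out.extend(batch)  # a real executor could export each batch here instead
    return out
```

For per-sample mappers like these, both orders give the same result for any batch size. Operators that need a global view of the data (for example, deduplicators) would only see one batch at a time under the batched order, which a `FusedOP` design would need to account for.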

To expose this feature in the `cfg.yaml` configuration file, a special token could be introduced (e.g., `dj_batched_group_ops`; the example below names it `FusedOP`). It would let users specify batch-processing parameters directly in the configuration, as illustrated below:
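```yaml
# Current: operators are applied to the whole dataset in sequence
process:
  - clean_email_mapper:
  - clean_links_mapper:

---
# Proposed: operators grouped under FusedOP and applied batch by batch
process:
  - FusedOP:
      - batch_size: 1  # or any desired batch size
      - clean_email_mapper:
      - clean_links_mapper:
```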
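One way an executor might interpret the proposed `FusedOP` entry is sketched below. This is a hypothetical sketch, not Data-Juicer's API: `FusedStep`, `build_plan`, `run`, and `REGISTRY` are illustrative names, plain functions stand in for operator instances, and `CFG` is the structure that `yaml.safe_load` returns for the proposed `process` list above (each list item is a single-key mapping, and a bare `- clean_email_mapper:` loads as `{"clean_email_mapper": None}`).

```python
import re
from dataclasses import dataclass
from itertools import islice
from typing import Any, Callable, Dict, List, Optional

Sample = Dict[str, str]
Op = Callable[[Sample], Sample]

# Hypothetical toy operators standing in for the real mappers.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
LINK_RE = re.compile(r"https?://\S+")


def clean_email(s: Sample) -> Sample:
    return {**s, "text": EMAIL_RE.sub("", s["text"])}


def clean_links(s: Sample) -> Sample:
    return {**s, "text": LINK_RE.sub("", s["text"])}


REGISTRY: Dict[str, Op] = {"clean_email_mapper": clean_email, "clean_links_mapper": clean_links}

# What yaml.safe_load yields for the proposed `process` section.
CFG = {"process": [{"FusedOP": [{"batch_size": 1},
                                {"clean_email_mapper": None},
                                {"clean_links_mapper": None}]}]}


@dataclass
class FusedStep:
    """Hypothetical plan node: ops applied together, one batch at a time."""
    batch_size: int
    ops: List[Op]


def build_plan(process_cfg: List[Dict[str, Any]], registry: Dict[str, Op]) -> List[Any]:
    """Turn the loaded `process` list into plain ops and FusedSteps."""
    plan: List[Any] = []
    for entry in process_cfg:
        [(name, args)] = entry.items()  # each list item is a single-key mapping
        if name != "FusedOP":
            plan.append(registry[name])
            continue
        batch_size: Optional[int] = None
        ops: List[Op] = []
        for item in args:
            [(key, value)] = item.items()
            if key == "batch_size":
                batch_size = int(value)
            else:
                ops.append(registry[key])
        if batch_size is None or batch_size < 1:
            raise ValueError("FusedOP needs a positive batch_size")
        plan.append(FusedStep(batch_size, ops))
    return plan


def run(dataset: List[Sample], plan: List[Any]) -> List[Sample]:
    """Execute the plan; FusedSteps run all their ops per batch."""
    for step in plan:
        if isinstance(step, FusedStep):
            it, out = iter(dataset), []
            while batch := list(islice(it, step.batch_size)):
                for op in step.ops:
                    batch = [op(s) for s in batch]
                out.extend(batch)  # a real executor could export each batch instead
            dataset = out
        else:
            dataset = [step(s) for s in dataset]
    return dataset
```

Keeping `batch_size` as a list item, as in the proposal, means the parser has to separate it from the operator names; a mapping such as `FusedOP: {batch_size: 1, ops: [...]}` would avoid that special case, at the cost of a different config shape.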

### Use case

_No response_

### Additional

_No response_

### Are you willing to submit a PR for this feature?

- [X] Yes, I'd like to help by submitting a PR!

[Feat] Support explicit `FusedOP` that allows for the configuration and application of multiple operators in smaller, manageable batches #413

Search before continuing 先搜索，再继续

Description 描述

Use case 使用场景

Additional 额外信息

Are you willing to submit a PR for this feature? 您是否乐意为此功能提交一个 PR？

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Feat] Support explicit FusedOP that allows for the configuration and application of multiple operators in smaller, manageable batches #413

Description

Search before continuing 先搜索，再继续

Description 描述

Use case 使用场景

Additional 额外信息

Are you willing to submit a PR for this feature? 您是否乐意为此功能提交一个 PR？

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

[Feat] Support explicit `FusedOP` that allows for the configuration and application of multiple operators in smaller, manageable batches #413