[Data] Support strict=False mode for StreamingRepartition #60295
machichima wants to merge 24 commits into ray-project:master
Conversation
Code Review
This pull request introduces a strict parameter to StreamingRepartition, allowing for a non-strict mode. In non-strict mode, repartitioning doesn't stitch blocks, which enables more operator fusion opportunities. The changes are well-implemented across the logical planning, fusion rules, and physical planning layers. The default for repartition is now non-strict, which is a good choice for performance. The added tests are comprehensive and cover both the new non-strict behavior and the fusion logic. My main feedback is to add documentation for the new strict parameter in the user-facing Dataset.repartition method to ensure users understand how to use it.
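For reference, a minimal usage sketch of the new parameter (the dataset values and block sizes here are illustrative assumptions, not output from this PR):

```python
import ray

ds = ray.data.range(100)

# strict=False (the new default in this PR): blocks are capped at
# target_num_rows_per_block rows, but smaller leftover blocks are not stitched
# together, which keeps the repartition fusable with upstream map operators.
relaxed = ds.repartition(target_num_rows_per_block=30, strict=False)

# strict=True (the previous behavior): every output block except the last has
# exactly target_num_rows_per_block rows, at the cost of preventing fusion.
exact = ds.repartition(target_num_rows_per_block=30, strict=True)
```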
```python
num_blocks: Optional[int] = None,
target_num_rows_per_block: Optional[int] = None,
*,
strict: bool = False,
```
The new strict parameter should be documented in the repartition method's docstring. Explaining the difference between strict=True (the old behavior) and strict=False (the new default) is important for users to understand its impact on block sizes and fusion.
You could add something like this to the Args section:
strict: If ``True``, ``repartition`` guarantees that all output blocks, except for the last one, will have ``target_num_rows_per_block`` rows. If ``False``, ``repartition`` is more relaxed and may produce blocks smaller than ``target_num_rows_per_block`` without stitching them. This is only used with ``target_num_rows_per_block``. Defaults to ``False``.
@owenowenisme PTAL. Thank you!
6cfbfc5 to 04964bc (compare)
owenowenisme left a comment
test_operator_fusion is failing, could you please take a look?
```python
input_physical_dag,
data_context,
name=op.name,
compute_strategy=compute,
```
I think we need `min_rows_per_bundle = op.target_num_rows_per_block` here if `strict=False`?
Seems like when we set min_rows_per_bundle here, the BlockRefBundler will try to stitch the output (see ray/python/ray/data/_internal/execution/operators/map_operator.py, lines 828 to 835 in 68d01c4). Therefore, I think we should keep it as None here to prevent stitching.
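To make the trade-off concrete, here is a minimal sketch of the choice being discussed (the helper name and return shape are assumptions, not the PR's actual planner code):

```python
from ray.data._internal.streaming_repartition import StreamingRepartitionRefBundler

def choose_bundling(strict: bool, target_num_rows_per_block: int):
    # Returns (ref_bundler, min_rows_per_bundle) for the physical MapOperator.
    if strict:
        # Strict mode: the dedicated bundler stitches bundles so that every
        # output block (except the last) has exactly the target number of rows.
        return StreamingRepartitionRefBundler(target_num_rows_per_block), None
    # Non-strict mode: use the default BlockRefBundler and keep
    # min_rows_per_bundle=None, since setting it would make the bundler
    # accumulate (stitch) bundles and defeat the purpose of non-strict mode.
    return None, None
```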
```
strict: If ``True``, ``repartition`` guarantees that all output blocks,
    except for the last one, will have exactly ``target_num_rows_per_block`` rows.
    If ``False``, ``repartition`` is more relaxed and may produce blocks smaller
    than ``target_num_rows_per_block`` without stitching them together.
    This parameter is only used with ``target_num_rows_per_block``.
    Defaults to ``False``.
```
Might be better to say that it will only produce at most one block that is smaller than `target_num_rows_per_block` per input block if `strict` is false.
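As a concrete illustration of that guarantee (a standalone sketch, not the operator's implementation):

```python
def split_one_input_block(num_rows: int, target: int) -> list:
    # Non-strict splitting of a single input block: full-size chunks plus at
    # most one smaller remainder; nothing is stitched across input blocks.
    sizes = [target] * (num_rows // target)
    if num_rows % target:
        sizes.append(num_rows % target)
    return sizes

# A 100-row input block with target_num_rows_per_block=30 yields exactly one
# block smaller than the target.
assert split_one_input_block(100, 30) == [30, 30, 30, 10]
```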
```python
@pytest.mark.parametrize("batch_size", [30, 35, 45])
def test_streaming_repartition_fusion_non_strict(
```
I think the fusion test should be in python/ray/data/tests/test_operator_fusion.py.
There are existing fusion and streaming-repartition-related tests in this file; I think we can put this here as it aligns with the existing tests. WDYT?
```python
ref_bundler = StreamingRepartitionRefBundler(batch_size)
# No further fusion because StreamingRepartitionRefBundler is stateful
# and maintains internal buffering state across bundles.
supports_fusion = False
```
Will this prevent fusion when `batch_size == target_num_rows_per_block`?
Yes, but I think it's intended, as the original code (strict mode) hard-coded `supports_fusion=False` to prevent further fusion:

```python
# For now, we don't want to over-fuse StreamingRepartition with other map operators,
# so the result operator does not support further fusion.
supports_fusion=False,
```

```
strict: If True, guarantees that all output blocks, except for the last one,
    will have exactly target_num_rows_per_block rows. If False, is more relaxed
    and may produce blocks smaller than target_num_rows_per_block without
    stitching them together. Defaults to False.
```
Ditto with the comment in dataset.py
```python
if not (
    isinstance(up_logical_op, MapBatches)
    and up_logical_op._batch_size is not None
    and down_logical_op.target_num_rows_per_block is not None
    and down_logical_op.target_num_rows_per_block > 0
    # When the batch_size is a multiple of target_num_rows_per_block, fusing
    # would still produce exactly identical sequence of blocks.
    # See `_fuse_streaming_repartition_operators_in_dag` docstring for details.
    # TODO: when the StreamingRepartition supports none_strict_mode, we can fuse
    # `MapBatches -> StreamingRepartition` no matter what the `batch_size` and
    # `target_num_rows` are.
    # https://anyscale1.atlassian.net/browse/DATA-1731
    and up_logical_op._batch_size
    % down_logical_op.target_num_rows_per_block
    == 0
):
    return False
```
I don't think this logic is correct -- if _batch_size is None, we should still allow fusing StreamingRepartition.
Hi @alexeykudinkin,
I was following the original logic here, which also returns False when _batch_size is None:
ray/python/ray/data/_internal/logical/rules/operator_fusion.py, lines 280 to 294 in 8e2e0aa
Also, since we use StreamingRepartitionRefBundler(batch_size), based on the class definition, batch_size cannot be None:
ray/python/ray/data/_internal/streaming_repartition.py, lines 34 to 37 in 68d01c4
Therefore, I think we should keep this here?
Well, you're relaxing this, right?
There should now be 2 modes (see the sketch below):
- `StreamingRepartition(strict=True)`: batch_size needs to be an exact multiple of `target_num_rows_per_block` to produce correct results.
- `StreamingRepartition(strict=False)`: batch_size could be anything (even null).
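A minimal sketch of an eligibility check covering both modes (the import path and attribute names follow the snippets quoted in this thread and are assumptions about the final code):

```python
from ray.data._internal.logical.operators.map_operator import MapBatches

def can_fuse_into_streaming_repartition(up_logical_op, down_logical_op) -> bool:
    target = down_logical_op.target_num_rows_per_block
    if not isinstance(up_logical_op, MapBatches) or not target:
        return False
    if not down_logical_op.strict:
        # Non-strict: no stitching is required, so any batch_size (even None)
        # still produces valid output blocks.
        return True
    # Strict: only fuse when batch_size is an exact multiple of the target,
    # so the fused operator emits identical block boundaries.
    batch_size = up_logical_op._batch_size
    return batch_size is not None and batch_size % target == 0
```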
Makes sense! Thank you for pointing this out. Updated in c77787c.
```diff
-if not (
-    isinstance(up_logical_op, MapBatches)
-    and up_logical_op._batch_size is not None
-    and down_logical_op.target_num_rows_per_block is not None
-    and down_logical_op.target_num_rows_per_block > 0
-    # When the batch_size is a multiple of target_num_rows_per_block, fusing
-    # would still produce exactly identical sequence of blocks.
-    # See `_fuse_streaming_repartition_operators_in_dag` docstring for details.
-    # TODO: when the StreamingRepartition supports none_strict_mode, we can fuse
-    # `MapBatches -> StreamingRepartition` no matter what the `batch_size` and
-    # `target_num_rows` are.
-    # https://anyscale1.atlassian.net/browse/DATA-1731
-    and up_logical_op._batch_size
-    % down_logical_op.target_num_rows_per_block
-    == 0
-):
-    return False
+if (
+    not isinstance(up_logical_op, MapBatches)
+    or not down_logical_op.target_num_rows_per_block
+):
+    return False
```
Can we simplify the logic here like this? And add a check at the Dataset API that raises an error if target_num_rows_per_block is not None but is negative.
(Maybe move this: ray/python/ray/data/_internal/streaming_repartition.py, lines 35 to 37 in f5a53c4)
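A sketch of the validation suggested here (function name and placement are assumptions):

```python
def _check_target_num_rows_per_block(target_num_rows_per_block) -> None:
    # Fail fast at the Dataset API instead of deep inside planning.
    if target_num_rows_per_block is not None and target_num_rows_per_block < 0:
        raise ValueError(
            "target_num_rows_per_block must not be negative, got "
            f"{target_num_rows_per_block}."
        )
```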
```python
ref_bundler = StreamingRepartitionRefBundler(batch_size)
# No further fusion because StreamingRepartitionRefBundler is stateful
# and maintains internal buffering state across bundles.
supports_fusion = False
```
We shouldn't be blocking any subsequent fusion like that.
Let's add a test that we're able to fuse multiple ops like this (see the sketch after this list):
- Map > Map > SR
- Map > SR > SR
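For illustration, a test along these lines might look like the sketch below (the stats-based fusion assertion and the fused operator naming are assumptions, not the PR's actual tests):

```python
import ray

def test_fuse_map_map_streaming_repartition_non_strict():
    ds = (
        ray.data.range(90)
        .map_batches(lambda batch: batch, batch_size=10)
        .map_batches(lambda batch: batch, batch_size=10)
        .repartition(target_num_rows_per_block=7, strict=False)
    ).materialize()
    # Fused operators are reported with "->"-joined names, so a fused
    # Map > Map > SR chain should surface as a single operator in the stats.
    assert "->StreamingRepartition" in ds.stats()
```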
While the comment is on line 338 (supports_fusion=False), I want to make sure: do we want to support fusion for strict mode, or just add a test for non-strict mode? I think it's the latter?
The Map > SR > SR case cannot work here, because after the first Map > SR fusion the logical operator becomes AbstractUDFMap rather than MapBatches:
ray/python/ray/data/_internal/logical/rules/operator_fusion.py, lines 355 to 369 in f3d444a
The current implementation only allows MapBatches > SR fusion.
To support Map > SR > SR fusion, we would need more changes, which I think is a bit out of scope for this PR.
Description
Currently, the StreamingRepartition operator is essentially strict=True. We want to relax this to allow a non-strict mode with the following guarantees:
- no output block has more than target_num_rows rows, and
- at most one block smaller than target_num_rows is produced per input block (i.e. it wouldn't do any stitching).

This mode will be the default and would allow StreamingRepartition to be fused into the previous operator.
Related issues
Closes #60026
Additional information
- Added `strict: bool = False` parameter to `repartition()`.
- In `_get_fused_streaming_repartition_operator()` and `plan_streaming_repartition_op()`:
  - Strict: uses `ref_bundler=StreamingRepartitionRefBundler`.
  - Non-strict: uses `ref_bundler=None` (the default `BlockRefBundler`).