Commit 477e827

[FLINK-38564][docs] FLIP-537: Add documentation to sources.md explaining how a Source reassigns splits on recovery.
1 parent 4931d08 commit 477e827

2 files changed: 17 additions and 1 deletion


docs/content.zh/docs/dev/datastream/sources.md

Lines changed: 7 additions & 1 deletion
@@ -55,8 +55,14 @@ Data Source API 以统一的方式对无界流数据和有界批数据进行处
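
 In fact, the difference between the two cases is very small: in the bounded/batch case, the enumerator generates a fixed number of splits, and each split must be finite. In the unbounded streaming case, these restrictions do not apply: splits need not be finite, or the enumerator keeps generating new splits.

-<a name="examples"></a>

+**Split Reassignment on Job Recovery**
+
+Under normal circumstances, once the *SplitEnumerator* assigns *splits* to a *SourceReader*, those *splits* are not reassigned to other *SourceReaders*. When the job recovers from a failure, the *splits* from state are immediately added back to the *SourceReader*.
+
+When a source implements the `SupportsSplitReassignmentOnRecovery` interface, the recovery process behaves differently. Upon failure, the *splits* are not immediately reassigned to their original *SourceReader*; instead, all *splits* are collected and added back to the *SplitEnumerator*. The *SplitEnumerator* is then responsible for redistributing these *splits* among the *SourceReaders* to achieve a balanced assignment. This mechanism enables more flexible and efficient recovery by letting the centralized *SplitEnumerator* make the right decisions about *split* assignment.
+
+<a name="examples"></a>
 #### Examples

 Here are some simplified conceptual examples to illustrate how the data source components interact in the streaming and batch cases.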

docs/content/docs/dev/datastream/sources.md

Lines changed: 10 additions & 0 deletions
@@ -54,6 +54,16 @@ The Data Source API supports both unbounded streaming sources and bounded batch

 The difference between both cases is minimal: In the bounded/batch case, the enumerator generates a fixed set of splits, and each split is necessarily finite. In the unbounded streaming case, one of the two is not true (splits are not finite, or the enumerator keeps generating new splits).

+
+**Split Reassignment On Recovery**
+
+Under normal circumstances, once the *SplitEnumerator* assigns *Splits* to *SourceReaders*, these *splits* are not reassigned to other readers again. When the source is recovering from a failure, the *splits* from the saved state will be added back to the readers immediately.
+
+When a source implements the `SupportsSplitReassignmentOnRecovery` interface, the recovery process behaves differently.
+Upon failure, instead of immediately reassigning the *splits* back to the same *SourceReaders*, all *splits* are collected and added back to the *SplitEnumerator*.
+The *SplitEnumerator* then takes responsibility for redistributing these *splits* among the available *SourceReaders* in a balanced manner.
+This mechanism enables more flexible and efficient recovery by allowing the central *SplitEnumerator* to make informed decisions about split distribution.
+
 #### Examples

 Here are some simplified conceptual examples to illustrate how the data source components interact, in streaming and batch cases.
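
The two recovery behaviors described in the added section can be contrasted with a small, self-contained toy model. This is only a sketch and does not use any Flink classes: `RecoverySketch`, `restoreToOriginalReaders`, and `reassignViaEnumerator` are hypothetical names, splits are modeled as plain strings, and round-robin is an assumed stand-in for whatever balancing policy a real *SplitEnumerator* would apply.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: none of these names belong to the Flink API.
public class RecoverySketch {

    /** Default recovery: every reader gets back exactly the splits stored in its own state. */
    static Map<Integer, List<String>> restoreToOriginalReaders(
            Map<Integer, List<String>> checkpointed) {
        Map<Integer, List<String>> restored = new TreeMap<>();
        checkpointed.forEach((reader, splits) -> restored.put(reader, new ArrayList<>(splits)));
        return restored;
    }

    /**
     * Recovery with reassignment: all splits are pooled at the (simulated) enumerator,
     * which then redistributes them. Round-robin is an assumption made for this sketch.
     */
    static Map<Integer, List<String>> reassignViaEnumerator(
            Map<Integer, List<String>> checkpointed, int readerCount) {
        // Collect every split, iterating readers in ascending id order.
        List<String> pool = new ArrayList<>();
        new TreeMap<Integer, List<String>>(checkpointed).values().forEach(pool::addAll);

        Map<Integer, List<String>> assignment = new TreeMap<>();
        for (int reader = 0; reader < readerCount; reader++) {
            assignment.put(reader, new ArrayList<>());
        }
        // Hand the pooled splits out one at a time, cycling through the readers.
        for (int i = 0; i < pool.size(); i++) {
            assignment.get(i % readerCount).add(pool.get(i));
        }
        return assignment;
    }

    public static void main(String[] args) {
        // A skewed pre-failure state: reader 0 holds four splits, reader 2 holds none.
        Map<Integer, List<String>> checkpointed = new TreeMap<>();
        checkpointed.put(0, List.of("s0", "s1", "s2", "s3"));
        checkpointed.put(1, List.of("s4"));
        checkpointed.put(2, List.of());

        System.out.println("default:    " + restoreToOriginalReaders(checkpointed));
        System.out.println("reassigned: " + reassignViaEnumerator(checkpointed, 3));
    }
}
```

In this model the default path reproduces the skewed 4/1/0 distribution after recovery, while the reassignment path spreads the same five splits as 2/2/1 across the three readers. A real source may balance on other criteria, such as split size or reader locality.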
