[Feature][Zeta] Support the tolerable-failed configuration for checkpoints #10223

xiaochen-zhou · 2025-12-21T02:01:58Z

Purpose of this pull request

This PR adds a configuration option that defines how many consecutive checkpoint failures are tolerated before the whole job fails over. The default value is 0, which means no checkpoint failures are tolerated and the job fails on the first reported checkpoint failure.
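The counting rule described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the class `TolerableFailureTracker` and its method names are hypothetical, and resetting the counter after a successful checkpoint is an assumption inferred from the word "consecutive". The tolerance check mirrors the condition shown in the reviewed diff.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper (not part of SeaTunnel) that counts consecutive checkpoint failures. */
final class TolerableFailureTracker {
    private final int tolerableFailures;
    private final AtomicInteger consecutiveFailures = new AtomicInteger();

    TolerableFailureTracker(int tolerableFailures) {
        if (tolerableFailures < 0) {
            throw new IllegalArgumentException("tolerableFailures must be >= 0");
        }
        this.tolerableFailures = tolerableFailures;
    }

    /** Records one failure and returns true if the job may keep running. */
    boolean onCheckpointFailed() {
        int failed = consecutiveFailures.incrementAndGet();
        // With the default of 0 this is always false, so the first failure fails the job.
        return tolerableFailures > 0 && failed <= tolerableFailures;
    }

    /** Assumption: a completed checkpoint ends the current run of consecutive failures. */
    void onCheckpointCompleted() {
        consecutiveFailures.set(0);
    }

    int getConsecutiveFailures() {
        return consecutiveFailures.get();
    }
}
```

With a limit of 2, the first two consecutive calls to `onCheckpointFailed()` return true and the third returns false. With the default of 0, the first call already returns false.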

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

Added a test: CheckpointCoordinatorTest#testTolerableFailedCheckpoints()

Check list

If any new Jar binary package is added in your PR, please add a License Notice according to the New License Guide.
If necessary, please update the documentation to describe the new feature. https://github.com/apache/seatunnel/tree/dev/docs
If necessary, please update incompatible-changes.md to describe the incompatibility caused by this PR.
If you are contributing the connector code, please check that the following files are updated:
1. Update plugin-mapping.properties and add new connector information in it
2. Update the pom file of seatunnel-dist
3. Add ci label in label-scope-conf
4. Add e2e testcase in seatunnel-e2e
5. Update connector plugin_config

dybyte · 2025-12-26T15:16:58Z

...erver/src/main/java/org/apache/seatunnel/engine/server/checkpoint/CheckpointCoordinator.java

+        if (tolerableFailures > 0 && failedCount <= tolerableFailures) {
+            LOG.warn(
+                    "Checkpoint failed (consecutive failures: {}/{}): {}",
+                    failedCount,
+                    tolerableFailures,
+                    ExceptionUtils.getMessage(checkpointException));
+            cleanFailedCheckpoint(reason);
+            return;
+        }


What happens if a checkpoint fails during a savepoint operation? Is there a possibility that the job becomes non-responsive?
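To make the concern concrete, here is a minimal sketch of one possible guard. It is not SeaTunnel code and does not describe how `CheckpointCoordinator` actually handles savepoints. It assumes only that a savepoint caller waits on a future and that the tolerated-failure branch returns early. The names `SavepointFailureSketch`, `pendingSavepoint`, and the `isSavepoint` flag are hypothetical.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch (not SeaTunnel code) of notifying a savepoint waiter on failure. */
final class SavepointFailureSketch {
    private final int tolerableFailures;
    private int failedCount;

    // Hypothetical: the future that a savepoint caller is waiting on.
    final CompletableFuture<Void> pendingSavepoint = new CompletableFuture<>();

    SavepointFailureSketch(int tolerableFailures) {
        this.tolerableFailures = tolerableFailures;
    }

    /** Returns true if the failure is tolerated and the job keeps running. */
    boolean onCheckpointFailed(boolean isSavepoint, Throwable cause) {
        failedCount++;
        if (isSavepoint) {
            // Assumption: a savepoint is a one-off request, so its waiter is notified
            // before any early return; otherwise the caller could wait indefinitely.
            pendingSavepoint.completeExceptionally(cause);
        }
        // Same tolerance condition as in the reviewed diff.
        return tolerableFailures > 0 && failedCount <= tolerableFailures;
    }
}
```

In this sketch the job tolerates the failure, while the savepoint caller still receives the exception instead of blocking.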

xiaochen-zhou added 2 commits December 21, 2025 09:58

[Feature][Zeta] Support the tolerable-failed configuration for checkpoints (998075e)

[Feature][Zeta] Support the tolerable-failed configuration for checkpoints (0b5041d)

github-actions bot added document core SeaTunnel core module Zeta api labels Dec 21, 2025

dybyte reviewed Dec 26, 2025
