feat: resetOffsetsAndBackfill using bounded stream supervisor#19477
aho135 wants to merge 4 commits into
FrankChen021 left a comment
| Severity | Findings |
|---|---|
| P0 | 0 |
| P1 | 1 |
| P2 | 2 |
| P3 | 0 |
| Total | 3 |
Reviewed 11 of 11 changed files.
This is an automated review by Codex GPT-5.5
    endOffsets
);

streamSupervisor.resetOffsets(resetMetadata);
[P1] Reset can skip data before backfill is guaranteed
resetOffsets only enqueues a ResetOffsetsNotice; it does not synchronously update metadata. This code queues the main supervisor reset before the bounded config is fully built and before createOrUpdateAndStartSupervisor succeeds. If any later step fails, the queued reset can still advance the live supervisor to latest offsets with no backfill supervisor, losing the skipped range this endpoint is meant to preserve.
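One way to address this finding is to start the backfill first and enqueue the reset only after that step has succeeded. The following is a minimal sketch of that ordering, not the PR's code: `LiveSupervisor`, `BackfillStarter`, and `resetAndBackfill` are hypothetical stand-ins for the real supervisor and for the step that builds the bounded config and calls createOrUpdateAndStartSupervisor.

```java
import java.util.ArrayList;
import java.util.List;

public class ResetOrderingSketch
{
  /** Hypothetical stand-in for the live stream supervisor. */
  public interface LiveSupervisor
  {
    // Assumed to behave like resetOffsets in the review: it only enqueues a notice.
    void resetOffsets(String resetMetadata);
  }

  /** Hypothetical stand-in for building the bounded config and starting the backfill supervisor. */
  public interface BackfillStarter
  {
    void start(String boundedSpec) throws Exception;
  }

  /**
   * Starts the backfill first and enqueues the reset only if that succeeded.
   * Returns false, without touching the live supervisor, when the backfill cannot start.
   */
  public static boolean resetAndBackfill(
      LiveSupervisor live,
      BackfillStarter starter,
      String boundedSpec,
      String resetMetadata
  )
  {
    try {
      starter.start(boundedSpec);
    }
    catch (Exception e) {
      // No backfill supervisor exists, so the live offsets are left untouched.
      return false;
    }
    live.resetOffsets(resetMetadata);
    return true;
  }

  public static void main(String[] args)
  {
    List<String> calls = new ArrayList<>();
    boolean ok = resetAndBackfill(m -> calls.add("reset"), s -> calls.add("backfill"), "spec", "meta");
    System.out.println(ok + " " + calls);
  }
}
```

With this ordering, a failure while building or starting the backfill leaves the live supervisor's offsets untouched, so the skipped range can still be read. It does not cover a failure of the reset itself after the backfill has started, which would need separate handling.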
  @Override
- protected void updatePartitionLagFromStream()
+ public void updatePartitionLagFromStream()
[P2] Kinesis cannot provide backfill end offsets
The new manager path calls updatePartitionLagFromStream() and then getLatestSequencesFromStream(), but Kinesis only updates time lag here and does not override getLatestSequencesFromStream(), so it inherits the base empty map. Any Kinesis supervisor that passes the earlier checks will fail with empty latest offsets instead of starting a backfill.
Good callout. @jaykanakiya will be tackling Kinesis support in a separate PR.
// Verify useConcurrentLocks is enabled
final Map<String, Object> context = streamSpec.getContext();
if (context == null || !Boolean.TRUE.equals(context.get("useConcurrentLocks"))) {
[P2] Concurrent-lock check rejects valid true contexts
This check only accepts a literal Boolean true under the hard-coded key. Other Druid paths in this class parse Tasks.USE_CONCURRENT_LOCKS with QueryContexts.getAsBoolean, which also accepts values such as the string "true". Supervisors whose tasks actually use concurrent locks can therefore be rejected by this endpoint.
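To illustrate the gap, here is a self-contained sketch of a more lenient check. It is not the Druid helper: `usesConcurrentLocks` is a hypothetical method that stands in for the QueryContexts.getAsBoolean call the review mentions, and the real helper may treat unrecognized values differently (for example by throwing).

```java
import java.util.HashMap;
import java.util.Map;

public class ConcurrentLockCheckSketch
{
  // Mirrors the context key used in the reviewed snippet.
  static final String USE_CONCURRENT_LOCKS = "useConcurrentLocks";

  /**
   * Hypothetical lenient check: accepts Boolean true and the string "true"
   * (case-insensitive). Anything else, including a missing key, counts as false.
   */
  public static boolean usesConcurrentLocks(Map<String, Object> context)
  {
    if (context == null) {
      return false;
    }
    Object value = context.get(USE_CONCURRENT_LOCKS);
    if (value instanceof Boolean) {
      return (Boolean) value;
    }
    if (value instanceof String) {
      return Boolean.parseBoolean((String) value);
    }
    return false;
  }

  public static void main(String[] args)
  {
    Map<String, Object> context = new HashMap<>();
    context.put(USE_CONCURRENT_LOCKS, "true");
    // The strict check from the diff rejects this context; the lenient one accepts it.
    System.out.println(Boolean.TRUE.equals(context.get(USE_CONCURRENT_LOCKS)));
    System.out.println(usesConcurrentLocks(context));
  }
}
```

In the actual patch, reusing the existing Tasks.USE_CONCURRENT_LOCKS constant and the parsing helper already used elsewhere in the class would keep this endpoint consistent with the other code paths.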
This change adds an endpoint called resetOffsetsAndBackfill to SupervisorResource. This is a useful feature for operating Druid clusters where the most recent data is the most important (such as alerting use cases).
Description
Adds an endpoint called resetOffsetsAndBackfill that automatically ingests the skipped data when offsets are reset to latest. The endpoint requires useEarliestOffset=false and useConcurrentLocks=true because the backfill task and the main supervisor tasks can have conflicting time intervals. The Supervisor must also be in a running state so that updatePartitionLagFromStream() can be called to get the latest offsets.
Release note
Adds an endpoint called resetOffsetsAndBackfill for Stream Supervisors. This is a useful feature for operating Druid clusters where the most recent data is the most important.
Key changed/added classes in this PR
SupervisorResource
SupervisorManager
This PR has: