[Server] ReplicaFetcher busy loop retry storm during leader election or bucket migration

### Search before asking

- [x] I searched in the [issues](https://github.com/apache/fluss/issues) and found nothing similar.


### Fluss version

0.8.0 (latest release)

### Please describe the bug 🐞

When `ReplicaManager.readFromLog()` processes a fetch request and any bucket hits an error (e.g., `NOT_LEADER_OR_FOLLOWER`, `UNKNOWN_TABLE_OR_BUCKET_EXCEPTION`), the current implementation immediately short-circuits the entire fetch request.
The short-circuit bypasses the DelayedFetch mechanism, so the fetch response is returned right away. ReplicaFetcherThread therefore receives the response with no delay and retries at once. During leader election or bucket migration these errors persist for a while, so the result is a tight retry loop with no backoff.
In addition, when ReplicaFetcherThread handles a `NOT_LEADER_OR_FOLLOWER` error, it does not add the replica to replicasWithError, which prevents proper error tracking and handling.

### Solution

1. Classify fetch errors into critical and non-critical categories:
   - Non-critical (expected) errors: `NOT_LEADER_OR_FOLLOWER`, `UNKNOWN_TABLE_OR_BUCKET_EXCEPTION`
   - Critical errors: all other errors

2. Avoid short-circuiting for non-critical errors:
   - Collect non-critical error buckets separately instead of breaking immediately
   - Allow the fetch request to continue processing other buckets and enter the DelayedFetch flow normally
   - Merge the error buckets into the delayed response callback

3. Fix error tracking in ReplicaFetcherThread:
   - Add the replica to replicasWithError when a `NOT_LEADER_OR_FOLLOWER` error occurs
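
Steps 1 and 2 could be sketched roughly as follows. This is a minimal, self-contained illustration, not the actual Fluss code: `FetchErrorSketch`, `ErrorCode`, `Classified`, `classify`, `mergeIntoResponse`, and the `STORAGE_EXCEPTION` value are all hypothetical names, buckets are represented by plain strings, and it assumes that critical errors keep the current immediate-response behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: these types are hypothetical, not Fluss classes.
class FetchErrorSketch {

    enum ErrorCode {
        NONE,
        NOT_LEADER_OR_FOLLOWER,
        UNKNOWN_TABLE_OR_BUCKET_EXCEPTION,
        STORAGE_EXCEPTION // hypothetical stand-in for "any other error"
    }

    /** Buckets of one fetch request, split by how their outcome is handled. */
    static final class Classified {
        final Map<String, ErrorCode> readable = new LinkedHashMap<>();
        final Map<String, ErrorCode> nonCritical = new LinkedHashMap<>();
        final Map<String, ErrorCode> critical = new LinkedHashMap<>();

        // Assumption: only critical errors still force an immediate response.
        boolean mustRespondImmediately() {
            return !critical.isEmpty();
        }
    }

    /** Step 1: the two expected errors named in this issue are non-critical. */
    static boolean isNonCritical(ErrorCode code) {
        return code == ErrorCode.NOT_LEADER_OR_FOLLOWER
                || code == ErrorCode.UNKNOWN_TABLE_OR_BUCKET_EXCEPTION;
    }

    /** Step 2: visit every bucket instead of breaking on the first error. */
    static Classified classify(Map<String, ErrorCode> perBucket) {
        Classified out = new Classified();
        for (Map.Entry<String, ErrorCode> e : perBucket.entrySet()) {
            if (e.getValue() == ErrorCode.NONE) {
                out.readable.put(e.getKey(), e.getValue());
            } else if (isNonCritical(e.getValue())) {
                out.nonCritical.put(e.getKey(), e.getValue());
            } else {
                out.critical.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    /** Step 2, last bullet: add the collected error buckets to the delayed result. */
    static Map<String, ErrorCode> mergeIntoResponse(
            Map<String, ErrorCode> delayedResult, Map<String, ErrorCode> nonCritical) {
        Map<String, ErrorCode> merged = new LinkedHashMap<>(delayedResult);
        merged.putAll(nonCritical);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, ErrorCode> perBucket = new LinkedHashMap<>();
        perBucket.put("bucket-0", ErrorCode.NONE);
        perBucket.put("bucket-1", ErrorCode.NOT_LEADER_OR_FOLLOWER);

        Classified c = classify(perBucket);
        System.out.println("respond immediately: " + c.mustRespondImmediately());
        System.out.println("response: " + mergeIntoResponse(c.readable, c.nonCritical));
    }
}
```

In this sketch a request that contains only readable and non-critical buckets does not ask for an immediate response, so it can still be parked in the delayed-fetch path; the error buckets are reported when that delayed response completes.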

This ensures that even during leader election or bucket migration, fetch requests still go through the normal delay mechanism, preventing busy loop retry storms.
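
On the follower side, the error-tracking fix from step 3 might look like the sketch below. The class `FetcherErrorTrackingSketch`, its `ErrorCode` enum, and `handleBucketResponse` are hypothetical names used only for illustration; the only detail taken from this issue is that a `NOT_LEADER_OR_FOLLOWER` response should add the replica to `replicasWithError`. What the fetcher then does with that set is left out, since it is not described here.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative sketch only: not the real ReplicaFetcherThread.
class FetcherErrorTrackingSketch {

    enum ErrorCode {
        NONE,
        NOT_LEADER_OR_FOLLOWER,
        UNKNOWN_TABLE_OR_BUCKET_EXCEPTION
    }

    // Buckets (plain strings here) whose last fetch response carried an error.
    final Set<String> replicasWithError = new LinkedHashSet<>();

    void handleBucketResponse(String bucket, ErrorCode error) {
        switch (error) {
            case NONE:
                // Successful fetch: nothing to track.
                break;
            case NOT_LEADER_OR_FOLLOWER:
                // The fix described in step 3: track this replica as failed
                // instead of silently dropping the error.
                replicasWithError.add(bucket);
                break;
            default:
                // Assumption: other errors are tracked the same way.
                replicasWithError.add(bucket);
                break;
        }
    }

    public static void main(String[] args) {
        FetcherErrorTrackingSketch fetcher = new FetcherErrorTrackingSketch();
        fetcher.handleBucketResponse("bucket-0", ErrorCode.NONE);
        fetcher.handleBucketResponse("bucket-1", ErrorCode.NOT_LEADER_OR_FOLLOWER);
        System.out.println("replicas with error: " + fetcher.replicasWithError);
    }
}
```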

### Are you willing to submit a PR?

- [x] I'm willing to submit a PR!

[Server] ReplicaFetcher busy loop retry storm during leader election or bucket migration #2073

Search before asking

Fluss version

Please describe the bug 🐞

Solution

Are you willing to submit a PR?

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Server] ReplicaFetcher busy loop retry storm during leader election or bucket migration #2073

Description

Search before asking

Fluss version

Please describe the bug 🐞

Solution

Are you willing to submit a PR?

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions