branch-4.1: [fix](streaming-job) bound cdc_client RPCs with per-category timeouts #62870 #62983

Open
github-actions[bot] wants to merge 1 commit into branch-4.1 from auto-pick-62870-branch-4.1

Conversation

@github-actions
Contributor

Cherry-picked from #62870

…#62870)

### What problem does this PR solve?

Introduce two configurable timeouts (mirroring the BE
`brpc_light/heavy_work_pool` naming) and apply them to all 8 cdc_client
RPC call sites:

- `streaming_cdc_light_rpc_timeout_sec = 90` for `/api/close`,
`/api/compareOffset`, `/api/fetchEndOffset`, `/api/getTaskOffset`,
`/api/getFailReason` (server-side single-statement queries / cache
lookups, expected sub-second). Default is 90s rather than 30s to absorb
cdc_client cold-start: when the BE-spawned cdc_client process is not yet
running, `start_cdc_client` performs a health-check loop (worst case
~45s) before serving the request — 90s gives enough headroom to avoid
spurious timeouts during this window while still bounding
`JobManager.writeLock` hold time.
- `streaming_cdc_heavy_rpc_timeout_sec = 600` for `/api/initReader`,
`/api/fetchSplits`, `/api/writeRecords` (may legitimately take minutes
for replication slot creation, large snapshot split computation, or
batch writes).
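
The endpoint-to-timeout mapping described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the class `CdcRpcTimeouts`, the `Category` enum, the `timeoutFor` helper, and the static field names are hypothetical, and only the endpoint paths and the 90s/600s defaults come from the description.

```java
import java.time.Duration;
import java.util.Map;

// Hypothetical sketch: names here are illustrative and not taken from the Doris code base.
public class CdcRpcTimeouts {
    // Defaults from the PR description; in a real system these would be read from config.
    static int streamingCdcLightRpcTimeoutSec = 90;
    static int streamingCdcHeavyRpcTimeoutSec = 600;

    enum Category { LIGHT, HEAVY }

    // The 8 cdc_client endpoints listed in the description, grouped by category.
    static final Map<String, Category> ENDPOINT_CATEGORY = Map.of(
            "/api/close", Category.LIGHT,
            "/api/compareOffset", Category.LIGHT,
            "/api/fetchEndOffset", Category.LIGHT,
            "/api/getTaskOffset", Category.LIGHT,
            "/api/getFailReason", Category.LIGHT,
            "/api/initReader", Category.HEAVY,
            "/api/fetchSplits", Category.HEAVY,
            "/api/writeRecords", Category.HEAVY);

    // Returns the timeout that bounds a call to the given endpoint.
    static Duration timeoutFor(String endpoint) {
        Category category = ENDPOINT_CATEGORY.get(endpoint);
        if (category == null) {
            // Fail loudly so a new endpoint cannot silently run without a bound.
            throw new IllegalArgumentException("unknown cdc_client endpoint: " + endpoint);
        }
        int seconds = category == Category.LIGHT
                ? streamingCdcLightRpcTimeoutSec
                : streamingCdcHeavyRpcTimeoutSec;
        return Duration.ofSeconds(seconds);
    }
}
```

With these defaults, a light call leaves roughly 45s of headroom beyond the worst-case ~45s cold-start health check, while a heavy call is still bounded at 10 minutes.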

github-actions bot requested a review from yiguolei as a code owner on April 30, 2026 14:42
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hello-stephen
Contributor

run buildall

@JNSimba
Member

JNSimba commented May 2, 2026

run external

@JNSimba
Member

JNSimba commented May 2, 2026

run nonConcurrent
