[Feature] [Connectors-v2] Add Doris sink redirect enhancement #10715

Merged
corgy-w merged 7 commits into apache:dev from yzeng1618:dev-doris-redirect on Apr 20, 2026

@yzeng1618
Collaborator

Purpose of this pull request

This PR enhances Doris sink redirect handling and adds an opt-in direct-to-BE write path for connector-doris.
#10697

Does this PR introduce any user-facing change?

Yes.

Before this change, Doris sink always sent stream load requests to FE and relied on FE redirect. When users encountered a raw 307 Temporary Redirect, the connector only exposed limited diagnostics, and there was no explicit way to opt in to direct BE writes.

After this change:

  • users can configure benodes and direct_to_be=true to send stream load write requests directly to Doris BE
  • when direct_to_be=true and sink.enable-2pc=true, data writes go to BE while commit/abort still go through FE
  • the connector fails fast if direct_to_be=true but benodes is missing or blank
  • redirect-related exceptions now include more context such as request path, redirect location, direct_to_be / sink.enable-2pc state, and troubleshooting guidance
  • Doris connector docs now describe the new options and redirect troubleshooting guidance
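
The fail-fast rule for direct_to_be and benodes listed above could look roughly like the following. This is a minimal sketch, not the code in this PR: the class `DirectToBeOptions`, its `parse` method, and the comma-separated `host:port` format for benodes are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for the two new options; not the connector's real config class. */
final class DirectToBeOptions {
    final boolean directToBe;
    final List<String> beNodes;

    private DirectToBeOptions(boolean directToBe, List<String> beNodes) {
        this.directToBe = directToBe;
        this.beNodes = beNodes;
    }

    /**
     * Parses the raw option values and rejects direct_to_be=true without usable BE nodes.
     * Assumption: benodes is a comma-separated list such as "be1:8040,be2:8040".
     */
    static DirectToBeOptions parse(boolean directToBe, String rawBeNodes) {
        List<String> nodes = new ArrayList<>();
        if (rawBeNodes != null) {
            for (String node : rawBeNodes.split(",")) {
                String trimmed = node.trim();
                if (!trimmed.isEmpty()) {
                    nodes.add(trimmed);
                }
            }
        }
        // Fail at configuration time instead of on the first write request.
        if (directToBe && nodes.isEmpty()) {
            throw new IllegalArgumentException(
                    "direct_to_be=true requires a non-blank benodes option");
        }
        return new DirectToBeOptions(directToBe, nodes);
    }
}
```

Validating at parse time means a missing or blank benodes value is reported before any stream load request is sent.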

How was this patch tested?

Unit tests
E2E


@DanielLeens DanielLeens left a comment

I pulled this PR locally and traced both the stream-load path and the 2PC control path.

The direct-to-BE wiring itself looks reasonable, but the new 307-diagnostic branches are very hard to reach in the real runtime because HttpUtil still enables DefaultRedirectStrategy for PUT and marks all methods redirectable (seatunnel-connectors-v2/connector-doris/src/main/java/org/apache/seatunnel/connectors/doris/util/HttpUtil.java:28). In practice the client will usually auto-follow the 307 before DorisStreamLoad.handlePreCommitResponse() or DorisCommitter.commitTransaction()/abortTransaction() can inspect the raw redirect response (.../DorisStreamLoad.java:221, .../DorisCommitter.java:83). If the redirected host is unreachable, the caller will still just see an IOException, not the new Location/stage diagnostics.

The current tests mock a raw 307 directly, so they prove the formatter works, but they do not prove the production client path can actually surface that information. Please either disable automatic redirect for these requests and handle 307 explicitly in the connector, or move the diagnostic enrichment into the redirect strategy / IO-error path so the promised redirect context is observable in production.

@yzeng1618
Collaborator Author

With small synchronization volumes, the 307 redirect issue is not triggered; HttpUtil does follow redirects by default. During large-volume synchronization, however, the 307 error still occurs occasionally. This PR therefore adds detailed exception logs and introduces the direct_to_be and benodes options.

@DanielLeens

@yzeng1618 I think the blocker still stands.

The key issue is not whether 307 happens often or only under heavy load. It is that the current production client path usually does not let the connector observe the raw 307 at all:

  • HttpUtil still installs DefaultRedirectStrategy and makes every method redirectable (HttpUtil.java:28-36)
  • the new diagnostics in DorisStreamLoad.handlePreCommitResponse() and DorisCommitter.commitTransaction()/abortTransaction() only run if those methods receive the original 307 response (DorisStreamLoad.java:221-242, DorisCommitter.java:83-131)

So in the real runtime, Apache HttpClient will typically follow the redirect before those branches execute. If the redirected target is bad or unreachable, the caller still just gets an IOException, not the new Location / stage-specific redirect message that this PR promises.

That means the current tests prove the formatter logic, but they still do not prove the production path can surface the new diagnosis. I still think we need one of these before merge:

  1. disable auto-redirect for these Doris requests and handle 307 explicitly in the connector, or
  2. move the diagnostic enrichment into the redirect strategy / follow-up IO-error path where the production client actually fails.

Without that, the new redirect troubleshooting branch remains mostly unreachable in the real path.

@yzeng1618 yzeng1618 requested a review from zhangshenghang April 7, 2026 02:22
@yzeng1618
Collaborator Author

I checked this again against the actual client path and updated the PR.

The connector still keeps the existing auto-redirect behavior, but the Doris request execution path now uses HttpClientContext so that when Apache HttpClient follows the FE->BE 307 and the redirected target fails at the IO layer, we surface a Doris exception with the redirect context (request, Location, direct_to_be, 2pc, stage, and the underlying IO cause).

I also added regression tests for the real production path (307 -> auto-follow -> follow-up connection failure) instead of only mocking a raw 307 response.

So the diagnostics are now observable in the runtime path that previously only returned an IOException.
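
To make the idea concrete, here is a minimal sketch of enriching a follow-up IO failure with redirect context. It is not the code from this PR: `RedirectDiagnostics`, `RedirectFailureException`, and the message layout are hypothetical. The sketch assumes the caller already has the list of followed redirect URIs, which in Apache HttpClient 4.x should be obtainable from `HttpClientContext.getRedirectLocations()` after the request has been executed with that context.

```java
import java.io.IOException;
import java.net.URI;
import java.util.List;

/** Hypothetical helper that wraps an IO failure with the redirect hops seen so far. */
final class RedirectDiagnostics {

    /** Illustrative exception type; the connector's real exception class may differ. */
    static final class RedirectFailureException extends RuntimeException {
        RedirectFailureException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Builds an exception describing where the request was redirected before it failed.
     * redirectLocations may be null or empty when no redirect was followed.
     */
    static RuntimeException enrich(
            String stage,
            URI request,
            List<URI> redirectLocations,
            boolean directToBe,
            boolean enable2pc,
            IOException cause) {
        if (redirectLocations == null || redirectLocations.isEmpty()) {
            // No redirect was followed, so there is no extra context to add.
            return new RuntimeException(stage + " request to " + request + " failed", cause);
        }
        URI lastHop = redirectLocations.get(redirectLocations.size() - 1);
        String message =
                String.format(
                        "%s request to %s failed after redirect to %s "
                                + "(direct_to_be=%s, sink.enable-2pc=%s): %s",
                        stage, request, lastHop, directToBe, enable2pc, cause.getMessage());
        return new RedirectFailureException(message, cause);
    }
}
```

Keeping the original IOException as the cause preserves the low-level failure while the message carries the stage and Location information.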

@yzeng1618 yzeng1618 requested a review from DanielLeens April 7, 2026 02:23

@DanielLeens DanielLeens left a comment

I re-checked the latest HEAD locally.

The blocker from my previous review looks addressed now:

  • the production request path now goes through HttpClientContext, so when Apache HttpClient auto-follows the FE -> BE redirect and the follow-up target fails at the IO layer, the connector can still surface Doris-specific redirect context (HttpUtil.java:53-79)
  • DorisCommitter / DorisStreamLoad are wired to that tracked execution path (DorisCommitter.java:101-118, 165-177; DorisStreamLoad.java:295-305)
  • the regression tests now cover the real runtime shape (307 -> auto-follow -> follow-up connection failure) instead of only mocking a raw 307 (DorisCommitterTest.java:95-118 and the corresponding stream-load test)

I do not see the previous observability gap in the current revision.
Nice follow-up on closing the real client-path issue.

davidzollo previously approved these changes Apr 7, 2026
Contributor

@davidzollo davidzollo left a comment

+1 if CI passes
LGTM

@zhangshenghang
Member

Under the condition of direct_to_be=true && sink.enable-2pc=true, the writer first successfully creates the FE controlStreamLoad and then proceeds to create the BE dorisStreamLoad. If the second step fails, the constructor directly throws an exception, but the previously created controlStreamLoad will not be closed.

Similarly, if create() succeeds and subsequent abortPreCommit() fails inside initializeStreamLoad(), the current DorisStreamLoad will also be leaked.

@yzeng1618
Collaborator Author

For the resource leak, initializeLoad() now stages handles into local variables and wraps the entire init block in try/catch (RuntimeException). On failure, cleanupInitializedStreamLoads() closes whatever was already created, with an identity check to avoid double-closing. Both scenarios are covered by new unit tests.
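
The staging pattern described here can be sketched as follows. This is a simplified illustration rather than the PR's code: `StagedInit`, `Handle`, and the factory parameters are hypothetical stand-ins, and the case where both handles are the same object is an assumption about why an identity check is needed.

```java
import java.util.function.Supplier;

/** Hypothetical writer that stages two handles and cleans up if initialization fails. */
final class StagedInit {

    /** Stand-in for a stream-load handle; counts close() calls so double-closing is visible. */
    static final class Handle implements AutoCloseable {
        final String name;
        int closeCount;

        Handle(String name) {
            this.name = name;
        }

        @Override
        public void close() {
            closeCount++;
        }
    }

    Handle control;
    Handle data;

    void initialize(
            Supplier<Handle> controlFactory, Supplier<Handle> dataFactory, Runnable abortPreCommit) {
        // Stage into locals so fields are only assigned once every step has succeeded.
        Handle stagedControl = null;
        Handle stagedData = null;
        try {
            stagedControl = controlFactory.get();
            stagedData = dataFactory.get();
            abortPreCommit.run();
            this.control = stagedControl;
            this.data = stagedData;
        } catch (RuntimeException e) {
            cleanup(stagedControl, stagedData);
            throw e;
        }
    }

    /** Closes whatever was created; the identity check avoids closing a shared handle twice. */
    static void cleanup(Handle first, Handle second) {
        if (first != null) {
            first.close();
        }
        if (second != null && second != first) {
            second.close();
        }
    }
}
```

Because the fields are assigned only after the last step succeeds, a failed initialization leaves the writer without half-initialized state.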


@DanielLeens DanielLeens left a comment

I pulled the latest HEAD locally again. The redirect observability fix is still wired through the real Apache HttpClient path via HttpClientContext, and the follow-up initializeLoad() cleanup now stages both stream-load handles and closes the initialized ones on failure, so I do not see the previous Doris-side blockers in the current revision.

The current red Build is coming from seatunnel-engine-client (SeaTunnelEngineClusterRoleTest.testWorkerIsFirstMemberThenGetJobDetailStatus timeout in the unit-test job), which is outside the Doris connector files touched here.

@yzeng1618 yzeng1618 requested a review from davidzollo April 11, 2026 08:06

@DanielLeens DanielLeens left a comment

I re-pulled the latest HEAD locally and compared it against the previously reviewed Doris redirect revision.

On the current branch, the additional commits after my earlier approval are merge-from-dev only; the Doris connector files and Doris docs are unchanged relative to the revision I already checked.

So the earlier conclusion still stands:

  • redirect diagnostics are still surfaced through the real HttpClient follow-up failure path
  • staged stream-load handles are still cleaned up on init failure
  • the current Build check is now green

I do not see a new blocker in the latest revision.

Contributor

@davidzollo davidzollo left a comment

+1
Good job

@DanielLeens

Hi @yzeng1618, I rechecked the current head locally as seatunnel-review-10715. There is no code delta after my previous approval.

The runtime path remains:

DorisSinkWriter
  -> DorisStreamLoadFactory / DorisNodeResolver
      -> FE or BE target selection when direct_to_be is enabled
  -> DorisStreamLoad / HttpUtil
      -> redirect metadata is captured and surfaced with better context
  -> DorisCommitter
      -> 2PC commit behavior remains separated from writer-side stream load routing
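
The routing split in the path above (writes may go to BE, commit/abort stay on FE) can be sketched like this. It is an illustration only: `TargetSelector`, `RequestKind`, the round-robin choice among BE nodes, and the example ports are assumptions, not the behavior of DorisNodeResolver.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical selector that routes writes to BE nodes and control requests to FE. */
final class TargetSelector {

    enum RequestKind {
        WRITE,
        COMMIT,
        ABORT
    }

    private final String feNode;
    private final List<String> beNodes;
    private final boolean directToBe;
    private final AtomicInteger next = new AtomicInteger();

    TargetSelector(String feNode, List<String> beNodes, boolean directToBe) {
        if (directToBe && beNodes.isEmpty()) {
            throw new IllegalArgumentException("direct_to_be=true requires at least one BE node");
        }
        this.feNode = feNode;
        this.beNodes = beNodes;
        this.directToBe = directToBe;
    }

    /** Returns the host:port that a request of the given kind should be sent to. */
    String select(RequestKind kind) {
        if (directToBe && kind == RequestKind.WRITE) {
            // Assumption: simple round-robin over the configured BE nodes.
            int index = Math.floorMod(next.getAndIncrement(), beNodes.size());
            return beNodes.get(index);
        }
        // Commit and abort always go through FE, as does everything when direct_to_be is off.
        return feNode;
    }
}
```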

My previous approval still stands. The latest Build check is green, and I do not have a new blocker from my side.

Conclusion: can merge

@corgy-w corgy-w changed the title from "[Feature] [connector-doris] Add Doris sink redirect enhancement" to "[Feature] [Connectors-v2] Add Doris sink redirect enhancement" Apr 20, 2026
@corgy-w corgy-w merged commit 6af9ed0 into apache:dev Apr 20, 2026
6 checks passed
5 participants