
Fix misleading WARN when polling connector status after KafkaConnector delete #12710

Open
Paramesh324 wants to merge 5 commits into strimzi:main from Paramesh324:issue_12681

@Paramesh324 Paramesh324 commented May 5, 2026

Type of change

  • Bugfix

Description

After a successful DELETE of a connector, the operator polls GET /connectors/{name}/status until it gets 404. While the connector is still shutting down, Connect often returns 200 with normal status JSON (no Kafka Connect message field).

Previously, doGet treated any status code not in the allowed set as an error response and passed the body through tryToExtractErrorMessage. Because the interim 200 body has no message field, that method logged a WARN about failing to decode an error, which confused operators.
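To show where the confusing WARN came from, here is a minimal sketch of the pre-fix flow as described above. It is an illustrative model, not the Strimzi source: the class PreFixDoGet, the parsed-Map body (used instead of raw JSON so no JSON library is needed), and the WARN wording are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the pre-fix doGet flow (not the Strimzi source).
// The response body is modelled as an already-parsed Map so no JSON library is needed.
class PreFixDoGet {
    // Collects what would have been logged at WARN level.
    static final List<String> warnings = new ArrayList<>();

    // Stand-in for tryToExtractErrorMessage: Kafka Connect error bodies carry a "message" field.
    static String extractErrorMessage(Map<String, Object> body) {
        Object message = body.get("message");
        if (message == null) {
            // Hypothetical WARN text; the real log line may differ.
            warnings.add("Failed to decode error message from response body " + body);
            return "Unknown error message";
        }
        return message.toString();
    }

    // Returns null on success, otherwise the failure message.
    static String doGet(int statusCode, Set<Integer> okStatusCodes, Map<String, Object> body) {
        if (okStatusCodes.contains(statusCode)) {
            return null;
        }
        // Any status not in okStatusCodes, including an interim 200, is parsed as an error.
        return extractErrorMessage(body);
    }
}
```

Feeding it an interim 200 with connector status JSON and okStatusCodes of {404} fails the call and records one WARN, even though nothing actually went wrong.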

This change fails unexpected 2xx responses with a plain ConnectRestException message ("Unexpected HTTP status code <code>") instead of parsing the body as an error. The backoff/retry behavior is unchanged: only 404 counts as success for this poll, and a 200 must still fail so that withBackoff keeps retrying. A WireMock unit test covers the post-delete poll scenario.
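The new branching can be sketched as follows, again assuming a hypothetical parsed-Map body. PostFixDoGet, Result, and Outcome are illustrative names; only the "Unexpected HTTP status code" message text comes from the diff excerpt in this PR.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the post-fix branching in doGet (not the Strimzi source).
class PostFixDoGet {
    enum Outcome { SUCCESS, FAILURE }

    record Result(Outcome outcome, String message) { }

    static Result doGet(int statusCode, Set<Integer> okStatusCodes, Map<String, Object> body) {
        if (okStatusCodes.contains(statusCode)) {
            return new Result(Outcome.SUCCESS, null);
        }
        String message;
        if (statusCode >= 200 && statusCode < 300) {
            // Unlisted 2xx (e.g. a 200 while the connector is still shutting down):
            // the body is connector status JSON, not an error payload, so skip parsing it.
            message = "Unexpected HTTP status code " + statusCode;
        } else {
            // Genuine 4xx/5xx responses still go through error-message extraction.
            message = extractErrorMessage(body);
        }
        // Still a failure, so a surrounding retry/backoff loop keeps polling.
        return new Result(Outcome.FAILURE, message);
    }

    // Stand-in for tryToExtractErrorMessage.
    static String extractErrorMessage(Map<String, Object> body) {
        Object message = body.get("message");
        return message != null ? message.toString() : "Unknown error message";
    }
}
```

The design point is that an unlisted 2xx still returns a failure, so any retry loop around it behaves exactly as before; only the error-message extraction (and its WARN) is skipped.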

Note: Treating both 200 and 404 as “success” for status() would make the first 200 complete the future successfully and stop the backoff immediately, so we do not use that approach.
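Why a 200 must keep failing can be seen with a simplified retry-with-backoff loop. This is a hypothetical stand-in written for illustration, not Strimzi's withBackoff; PostDeletePoll, pollWithBackoff, and replay are assumed names, and the status sequence 200, 200, 404 is an example rather than observed output.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.IntSupplier;

// Hypothetical, simplified retry-with-backoff loop; it is NOT Strimzi's withBackoff.
class PostDeletePoll {
    // Calls getStatus until it returns a code in okStatusCodes. Any other code is a
    // failed attempt: wait, double the delay, and retry.
    // Returns the 1-based attempt that succeeded, or -1 if all attempts failed.
    static int pollWithBackoff(IntSupplier getStatus, Set<Integer> okStatusCodes,
                               int maxAttempts, long initialDelayMs) {
        long delayMs = initialDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (okStatusCodes.contains(getStatus.getAsInt())) {
                return attempt; // success completes the poll and stops the backoff
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return -1;
                }
                delayMs *= 2; // exponential backoff
            }
        }
        return -1;
    }

    // Replays a fixed sequence of status codes, repeating the last one when exhausted.
    static IntSupplier replay(List<Integer> statuses) {
        Iterator<Integer> it = statuses.iterator();
        return () -> it.hasNext() ? it.next() : statuses.get(statuses.size() - 1);
    }
}
```

With okStatusCodes of {404}, the replayed sequence succeeds on the third attempt, once the connector is gone. Adding 200 to the set makes the very first poll succeed while the connector is still shutting down, which is the premature completion the note describes.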

Closes #12681

Checklist

Please go through this checklist and make sure all applicable tasks have been done

  • Write tests
  • Make sure all tests pass
  • Update documentation
  • Check RBAC rights for Kubernetes / OpenShift roles
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Reference relevant issue(s) and close them after merging
  • Update CHANGELOG.md
  • Supply screenshots for visual changes, such as Grafana dashboards

@scholzj scholzj requested a review from katheris May 5, 2026 06:47
@scholzj scholzj added this to the 1.1.0 milestone May 5, 2026

codecov Bot commented May 5, 2026

Codecov Report

❌ Patch coverage is 83.33333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 75.16%. Comparing base (169cc6a) to head (2ff2715).

Files with missing lines (Patch % / Lines):
...cluster/operator/assembly/KafkaConnectApiImpl.java: 83.33%, 0 missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #12710      +/-   ##
============================================
- Coverage     75.16%   75.16%   -0.01%     
- Complexity     6452     6457       +5     
============================================
  Files           346      346              
  Lines         24325    24329       +4     
  Branches       3120     3121       +1     
============================================
+ Hits          18283    18286       +3     
- Misses         4805     4807       +2     
+ Partials       1237     1236       -1     
Files with missing lines (Coverage Δ):
...cluster/operator/assembly/KafkaConnectApiImpl.java: 59.68% <83.33%> (-0.76%) ⬇️

... and 6 files with indirect coverage changes

@tinaselenge tinaselenge left a comment

Thanks for the PR. Left a few minor comments but mostly looks good to me.

@tinaselenge
Contributor

/azp run regression

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@Paramesh324 Paramesh324 requested a review from tinaselenge May 12, 2026 16:41

@tinaselenge tinaselenge left a comment

Thanks, LGTM now.

@Paramesh324 Paramesh324 requested a review from tinaselenge May 15, 2026 16:21
@tinaselenge
Contributor

/gha run pipeline=regression

@github-actions

github-actions Bot commented May 18, 2026

⏳ System test verification started: link

The following 6 job(s) will be executed:

  • regression-brokers-and-security-amd64 (cncf-ubuntu-8-32-x86)
  • regression-operators-amd64 (cncf-ubuntu-8-32-x86)
  • regression-operands-amd64 (cncf-ubuntu-8-32-x86)
  • regression-brokers-and-security-arm64 (cncf-ubuntu-8-32-arm)
  • regression-operators-arm64 (cncf-ubuntu-8-32-arm)
  • regression-operands-arm64 (cncf-ubuntu-8-32-arm)

Tests will start after successful build completion.

@github-actions

🎉 System test verification passed: link

```java
    // Unlisted 2xx: poll may get 200 before 404 after delete.
    message = "Unexpected HTTP status code " + statusCode;
} else {
    message = tryToExtractErrorMessage(reconciliation, response.body());
```
Member

Given that the tryToExtractErrorMessage method already handles responses that are not JSON, I think it would be better to update that method to also handle JSON that is missing the error field, rather than adding status-code-specific handling here for one edge case. WDYT @Paramesh324 @tinaselenge ?

Contributor Author

Thanks for the suggestion. I considered moving this into tryToExtractErrorMessage for JSON without a message field, but that would also change behavior for real 4xx/5xx errors, where we still want the WARN today.

In the delete flow, we poll GET .../status with only 404 in okStatusCodes. Interim 200 responses carry normal connector status JSON, not a Connect error body. We fail those on purpose so that backoff keeps retrying until 404. Because the body isn’t an error payload, handling unexpected 2xx in doGet (without calling tryToExtractErrorMessage) keeps the fix targeted and gives a clearer message (Unexpected HTTP status code 200). Backoff behavior is unchanged.

Happy to refactor into tryToExtractErrorMessage if you’d prefer that centralized approach.

Contributor

I think the tryToExtractErrorMessage() method has a clear purpose as it is: to extract an error message when there is an error. I'm worried that handling this case in the method would make it more confusing, because we would have to pass it additional context (like the status code) to decide whether to parse the body as an error. I'm OK with the current changes, since this is the only method where we treat a 2xx status as a failure.
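To make the trade-off in this thread concrete, here is a hypothetical sketch of the centralized alternative: the extractor receives the status code so it can tell a success-range body apart from an error body. It is only an illustration; the class CentralizedExtractor, the method extractMessage, and its signature are assumptions, not Strimzi code.

```java
import java.util.Map;

// Hypothetical centralized variant (not Strimzi code): the extractor decides whether
// the body is an error payload, so callers must pass the status code as extra context.
class CentralizedExtractor {
    static String extractMessage(int statusCode, Map<String, Object> body) {
        if (statusCode >= 200 && statusCode < 300) {
            // Success-range body (e.g. connector status JSON): not an error payload.
            return "Unexpected HTTP status code " + statusCode;
        }
        Object message = body == null ? null : body.get("message");
        if (message == null) {
            // JSON error body without a "message" field: a genuine 4xx/5xx,
            // where the caller may still want a WARN.
            return "Unknown error message for HTTP status code " + statusCode;
        }
        return message.toString();
    }
}
```

Every caller now has to supply the status code even when it only wants an error message, which is the extra coupling the comment is concerned about.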

After DELETE, we poll GET /connectors/{name}/status until 404. Interim
200 responses carry connector status JSON, not an error body; do not run
tryToExtractErrorMessage on those or we log a confusing WARN.

Closes strimzi#12681

Signed-off-by: Parameshwaran Krishnasamy <Parameshwaran.K@ibm.com>
- Shorten test name to testStatusDoesNotTreatOkBodyAsErrorMessage
- Shorten test name to testStatusWithValidErrorBody
- Remove unnecessary comment in doGet delete handling

Signed-off-by: Parameshwaran Krishnasamy <Parameshwaran.K@ibm.com>
Include okStatusCodes in the debug message when a GET response has a
status code not in the allowed set, per review feedback.

Signed-off-by: Parameshwaran Krishnasamy <Parameshwaran.K@ibm.com>
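A minimal sketch of such a debug message, assuming a hypothetical helper named unexpectedStatus; the exact wording used in the PR is not shown on this page.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical formatting of a debug message that includes okStatusCodes.
class DebugMessage {
    static String unexpectedStatus(String path, int statusCode, Set<Integer> okStatusCodes) {
        // TreeSet gives a stable, sorted rendering of the allowed status codes.
        return "GET " + path + " returned status " + statusCode
                + " which is not in the expected status codes " + new TreeSet<>(okStatusCodes);
    }
}
```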
@snyk-io

snyk-io Bot commented May 27, 2026

Snyk checks have passed. No issues have been found so far.

Scan engine: Critical / High / Medium / Low / Total (0)
Open Source Security: 0 / 0 / 0 / 0 / 0 issues
Licenses: 0 / 0 / 0 / 0 / 0 issues
Code Security: 0 / 0 / 0 / 0 / 0 issues

Development

Successfully merging this pull request may close these issues.

[Bug]: Confusing warning message when deleting a KafkaConnector resource

4 participants