Fix misleading WARN when polling connector status after KafkaConnector delete#12710
Fix misleading WARN when polling connector status after KafkaConnector delete#12710Paramesh324 wants to merge 5 commits into
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #12710 +/- ##
============================================
- Coverage 75.16% 75.16% -0.01%
- Complexity 6452 6457 +5
============================================
Files 346 346
Lines 24325 24329 +4
Branches 3120 3121 +1
============================================
+ Hits 18283 18286 +3
- Misses 4805 4807 +2
+ Partials 1237 1236 -1
🚀 New features to boost your workflow:
|
tinaselenge
left a comment
There was a problem hiding this comment.
Thanks for the PR. Left a few minor comments but mostly looks good to me.
|
/azp run regression |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/gha run pipeline=regression |
|
⏳ System test verification started: link The following 6 job(s) will be executed:
Tests will start after successful build completion. |
|
🎉 System test verification passed: link |
| // Unlisted 2xx: poll may get 200 before 404 after delete. | ||
| message = "Unexpected HTTP status code " + statusCode; | ||
| } else { | ||
| message = tryToExtractErrorMessage(reconciliation, response.body()); |
There was a problem hiding this comment.
Given that the tryToExtractErrorMessage method already has handling for when the response is not JSON, I think it would be better to update that method to better handle when it's JSON but is missing the error field, rather than having status code specific handling here that's for one edge case. WDYT @Paramesh324 @tinaselenge ?
There was a problem hiding this comment.
Thanks for the suggestion — I considered moving this into tryToExtractErrorMessage for JSON without a message field, but that would also change behavior for real 4xx/5xx errors where we still want the WARN today.
In the delete flow we poll GET .../status with only 404 in okStatusCodes. Interim 200 responses carry normal connector status JSON, not a Connect error body. We fail those on purpose so backoff keeps retrying until 404 — the body isn’t an error payload, so handling unexpected 2xx in doGet (without calling tryToExtractErrorMessage) keeps the fix targeted and gives a clearer message (Unexpected HTTP status code 200). Backoff is unchanged.
Happy to refactor into tryToExtractErrorMessage if you’d prefer that centralized approach.
There was a problem hiding this comment.
There was a problem hiding this comment.
I think tryToExtractErrorMessage() method has a clear intention as it is, to extract an error message when there is an error. I'm worried if we handle it in the method, it could make it more confusing, because we would have to pass it an additional context (like the status code) to determine whether to parse as error. I'm ok with the current changes, since this is the only method we consider 2xx status as failure.
After DELETE, we poll GET /connectors/{name}/status until 404. Interim
200 responses carry connector status JSON, not an error body; do not run
tryToExtractErrorMessage on those or we log a confusing WARN.
Closes strimzi#12681
Signed-off-by: Parameshwaran Krishnasamy <Parameshwaran.K@ibm.com>
Signed-off-by: Parameshwaran Krishnasamy <Parameshwaran.K@ibm.com>
- Shorten test name to testStatusDoesNotTreatOkBodyAsErrorMessage - Shorten test name to testStatusWithValidErrorBody - Remove unnecessary comment in doGet delete handling Signed-off-by: Parameshwaran Krishnasamy <Parameshwaran.K@ibm.com>
Include okStatusCodes in the debug message when a GET response has a status code not in the allowed set, per review feedback. Signed-off-by: Parameshwaran Krishnasamy <Parameshwaran.K@ibm.com>
✅ Snyk checks have passed. No issues have been found so far.
💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse. |
Type of change
Description
After a successful DELETE of a connector, the operator polls GET /connectors/{name}/status until it gets 404. While the connector is still shutting down, Connect often returns 200 with normal status JSON (no Kafka Connect message field).
Previously, doGet treated any non-allowed status as an error response and passed the body through tryToExtractErrorMessage, which logged a WARN about failing to decode an error—confusing for operators.
This change fails unexpected 2xx responses with a plain ConnectRestException message instead of parsing the body as an error, so the backoff/retry behavior is unchanged (we still only treat 404 as success for this poll; 200 must continue to fail so withBackoff retries). A WireMock unit test covers the post-delete poll scenario.
Note: The idea of adding 200 and 404 both as “success” for status() would make the first 200 complete the future successfully and stop backoff immediately, so we do not use that approach.
Closes #12681
Checklist
Please go through this checklist and make sure all applicable tasks have been done