Fix connector state transitions from FAILED to PAUSED or STOPPED#12750
Fix connector state transitions from FAILED to PAUSED or STOPPED#12750guancioul wants to merge 6 commits into
Conversation
Signed-off-by: guancioul <guancioul@gmail.com>
Signed-off-by: guancioul <guancioul@gmail.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #12750 +/- ##
=========================================
Coverage 75.16% 75.16%
- Complexity 6458 6460 +2
=========================================
Files 346 346
Lines 24329 24337 +8
Branches 3120 3122 +2
=========================================
+ Hits 18287 18294 +7
Misses 4808 4808
- Partials 1234 1235 +1
🚀 New features to boost your workflow:
|
|
/gha run pipeline=regression |
|
⏳ System test verification started: link The following 6 job(s) will be executed:
Tests will start after successful build completion. |
|
❌ System test verification failed: link |
|
🎉 System test verification passed: link |
|
|
||
| ## 1.1.0 | ||
|
|
||
| * Fix connector state transitions from `FAILED` to `PAUSED` or `STOPPED` |
There was a problem hiding this comment.
Maybe this would be more clear and more positive (as the CHANGELOG is really about new features and major changes rather than bugfixes).
| * Fix connector state transitions from `FAILED` to `PAUSED` or `STOPPED` | |
| * Allow failed `KafkaConnectors` to be paused or stopped |
There was a problem hiding this comment.
Thanks for the comment, I will make the CHANGELOG clearer and more positive.
Signed-off-by: guancioul <guancioul@gmail.com>
katheris
left a comment
There was a problem hiding this comment.
Thanks @guancioul the changes look generally good, however I did a check and it looks like you can't pause a failed connector. When I tried with a local Kafka Connect I got back:
Failed to transition connector to target state (org.apache.kafka.connect.runtime.distributed.DistributedHerder:758)
org.apache.kafka.connect.errors.ConnectException: WorkerConnector{id=echo-sink} Cannot transition connector to PAUSED since it has failed
So I think to save the operator making a request that won't succeed we should instead in that case log a warning and add that to the conditions, but then carry on so that the connector can be restarted, so we would have something like:
case "FAILED" -> {
if (targetState == ConnectorState.PAUSED) {
String message = "Connector " + connectorName + "cannot be paused since it is in failed state.";
LOGGER.warnCr(reconciliation, message);
future = Future.succeededFuture(conditions);
} else if (targetState == ConnectorState.STOPPED) {
LOGGER.infoCr(reconciliation, "Stopping connector {}", connectorName);
future = Future.fromCompletionStage(apiClient.stop(reconciliation, host, port, connectorName));
}
}
@ppatierno @scholzj what do you think?
|
Sorry I missed this, I should have verified the actual Kafka Connect behavior before implementing. Thanks for catching it! FAILED → STOP FAILED → PAUSE As confirmed, |
Signed-off-by: guancioul <guancioul@gmail.com>
| String message = "Connector " + connectorName + " cannot be paused since it is in failed state."; | ||
| LOGGER.warnCr(reconciliation, message); | ||
| conditions.add(StatusUtils.buildWarningCondition("UpdateConnectorState", message)); | ||
| return Future.succeededFuture(conditions); |
|
TBH, I'm not sure I follow the Kafka Connect logic here:
|
|
After checking the Kafka Connect source code, restart and stop both use |
Signed-off-by: Kuan-Hao (Michael) Lai <42037371+guancioul@users.noreply.github.com>
✅ Snyk checks have passed. No issues have been found so far.
💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse. |
Signed-off-by: Kuan-Hao (Michael) Lai <42037371+guancioul@users.noreply.github.com>
Type of change
Description
Close #12542
When a Kafka connector enters the FAILED state, the operator was previously unable to transition it to PAUSED or STOPPED. The state-machine switch in
AbstractConnectOperatoronly handled transitions from RUNNING and PAUSED states, so any request to pause or stop a FAILED connector was silently ignored (falling through to the default branch).This PR adds a FAILED case to the connector state transition logic so that:
Changes:
AbstractConnectOperator.java: Added case "FAILED" to the state-machine switch to handle transitions from FAILED to PAUSED or STOPPED.ConnectorMockTest.java: Added two new tests (testConnectorFailedToPaused,testConnectorFailedToStopped) to verify the correct REST API calls are made. Also refactoredConnectorStatusrecord to useStringinstead ofConnectorStateenum so that the mock can simulate arbitrary states (e.g., FAILED) that are not part of the target-state enum.Checklist