[FLINK-34524] Scale down JM deployment to 0 before deletion #791
Conversation
cc @mateczagany, I would love to hear your thoughts on this as well. I still need to finish up some tests, but any feedback is welcome.
I wonder, what is the difference between scaling the deployment to zero vs removing it? I would think both issue a SIGTERM to the container. If that is the case, there should be no difference in shutdown behavior.
With respect to the JM there is no difference. However, deleting the deployment deletes the TM pods at the same time, so the JM sees the TM failures / loss, in many cases leading to a restart followed by SIGTERM. The idea here is to delete the JM before touching the TMs.
Ah, got it! That makes sense.
flinkStandaloneService.deleteClusterDeployment(
        flinkDeployment.getMetadata(),
        flinkDeployment.getStatus(),
        new Configuration(),
        false);

assertEquals(2, mockServer.getRequestCount() - requestsBeforeDelete);
Shouldn't this result in one request?
For the JM deletion, instead of 1 DELETE request, this will cost 3 requests:
- Patch resource
- waitUntilCondition: gets the resource and, if the condition is not fulfilled, starts a websocket (so this might be 2 requests)
- Delete resource
And we still have the TM deletion request below the new logic, so this will be 4 requests total instead of 2.
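For illustration, here is a rough sketch (hypothetical names, not the actual test or production code from this PR) of how the fabric8 client calls could map to requests against the mock API server under that accounting:

import io.fabric8.kubernetes.api.model.apps.Deployment;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.dsl.RollableScalableResource;

import java.util.concurrent.TimeUnit;

class RequestCountSketch {
    // Illustrative only: shows which client calls translate into API requests.
    static void deleteJmScalingDownFirst(KubernetesClient client, String namespace, String clusterId) {
        RollableScalableResource<Deployment> jm =
                client.apps().deployments().inNamespace(namespace).withName(clusterId);

        jm.scale(0);                  // request 1: PATCH setting spec.replicas to 0
        jm.waitUntilCondition(        // request 2: GET, plus a WATCH websocket if not yet fulfilled
                d -> d == null
                        || d.getStatus() == null
                        || d.getStatus().getReplicas() == null
                        || d.getStatus().getReplicas() == 0,
                1, TimeUnit.MINUTES);
        jm.delete();                  // request 3: DELETE of the JM deployment
        // The TM deployment deletion issued after this logic adds a 4th request.
    }
}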
Added some comments, but I really like this change; I think it makes a lot of sense to let the JMs shut down the TMs themselves in native mode.
But I wonder, how much difference will it make in standalone mode? Is it worth adding it there as well?
protected void scaleJmToZero(
        EditReplacePatchable<Deployment> jmDeployment, String namespace, String clusterId) {
    LOG.info("Scaling down JM deployment to 0 before deletion");
Since we also have a possible timeout here, maybe it's worth adding the duration of the timeout in the log message.
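A minimal sketch of what such a method could look like with the timeout included in the log line. This assumes a RollableScalableResource handle (rather than the EditReplacePatchable shown in the diff, since that exposes scale()), an SLF4J logger, and an explicit timeout parameter; the names and timeout source are illustrative, not the PR's actual implementation:

import io.fabric8.kubernetes.api.model.apps.Deployment;
import io.fabric8.kubernetes.client.dsl.RollableScalableResource;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.time.Duration;
import java.util.concurrent.TimeUnit;

class ScaleDownSketch {
    private static final Logger LOG = LoggerFactory.getLogger(ScaleDownSketch.class);

    // Illustrative only: assumed signature with an explicit shutdown timeout.
    protected void scaleJmToZero(
            RollableScalableResource<Deployment> jmDeployment,
            String namespace,
            String clusterId,
            Duration shutdownTimeout) {
        // Include the timeout in the log line, as suggested in the review.
        LOG.info(
                "Scaling JM deployment {}/{} to 0 before deletion (timeout: {})",
                namespace, clusterId, shutdownTimeout);
        jmDeployment.scale(0);
        try {
            // Wait until no JM replicas are reported, or the timeout expires.
            jmDeployment.waitUntilCondition(
                    d -> d == null
                            || d.getStatus() == null
                            || d.getStatus().getReplicas() == null
                            || d.getStatus().getReplicas() == 0,
                    shutdownTimeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (Exception e) {
            LOG.warn("JM deployment did not scale to 0 within {}, proceeding anyway", shutdownTimeout, e);
        }
    }
}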
...perator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java
The logic can be the same in standalone as well; right now we are issuing the delete request for both the TM and JM deployments directly one after the other. There is a good chance that a TM shuts down and sends an error signal to the JM before the JM terminates. In standalone, the alternative would simply be to wait for the JM deployment to be deleted before deleting the task managers. I am working on some cleanup / improvements in this area to make this nicer.
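As a rough illustration of that standalone alternative (hypothetical names; the TM deployment naming and the timeout are assumptions, not the PR's actual code):

import io.fabric8.kubernetes.api.model.apps.Deployment;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.dsl.Resource;

import java.util.Objects;
import java.util.concurrent.TimeUnit;

class StandaloneDeleteSketch {
    // Illustrative only: delete the JM deployment, wait for it to disappear,
    // and only then delete the TM deployment.
    static void deleteStandaloneCluster(KubernetesClient client, String namespace, String clusterId) {
        Resource<Deployment> jm =
                client.apps().deployments().inNamespace(namespace).withName(clusterId);
        // Assumed TM deployment naming; adjust to however the operator names it.
        Resource<Deployment> tm =
                client.apps().deployments().inNamespace(namespace).withName(clusterId + "-taskmanager");

        jm.delete();
        // Wait for the JM deployment to be fully removed so the JM never observes
        // TM loss while it is still shutting down.
        jm.waitUntilCondition(Objects::isNull, 1, TimeUnit.MINUTES);
        tm.delete();
    }
}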
@mateczagany I actually reworked the deletion flow and moved some code around to remove some duplicated logic and the unnecessary decoupling of deletion vs waiting. This leads to much cleaner logic and avoids using scale-to-zero for the standalone mode as well, while keeping the benefits of this logic. (I still need to update some now obsolete tests, will do that today)
Just one small comment, but overall this seems like a great improvement compared to the last solution
...-operator/src/main/java/org/apache/flink/kubernetes/operator/service/NativeFlinkService.java
The new and improved / consistent logging after the rework @mateczagany @mxm:
@gyfora Nice! LGTM
What is the purpose of the change
We recently improved the JM deployment deletion mechanism; however, it seems that task manager pod deletion can sometimes get stuck for a couple of minutes in native mode if we simply try to delete everything at once.
It speeds up the process and leads to a cleaner shutdown if we scale down the JM deployment to 0 (shutting down the JM pods first) and then perform the deletion.
Verifying this change
Does this pull request potentially affect one of the following parts:
CustomResourceDescriptors: no
Documentation