-
Notifications
You must be signed in to change notification settings - Fork 310
Add quorum status field #2031
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add quorum status field #2031
Conversation
Related to #1980. This commit adds a status field to `RabbitmqCluster` resource to expose the node's quorum status. The field exposes whether the quorum is "ok", meaning there are no nodes that are quorum critical, or whether one or more nodes are quorum critical. For example, during a rolling restart, it will show the remaining 2 nodes as quorum critical, and one node as unavailable. This field has an important limitation: it updates the quorum status _only_ when a reconcile loop triggers. For example, if an operator deletes a quorum member, the field won't be updated until the next reconcile. In other words, events external to Kuberentes that affect quorum status won't be reflected in this field.
|
Asking for some early feedback @mkuratczyk @MirahImage I need to tweak the TLS part a bit before marking this as ready, but the main idea is already implemented. |
This was somehow passing locally. I suspect other suites initialised the variable and I was just very lucky
Because skipping TLS verification is no bueno :-)
6dff6aa to
3cdbd92
Compare
MirahImage
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It bothers me some that there's no tests for any status other than unavailable, but otherwise looks good. I completely understand not putting any system tests in, that would either be a flaky mess or require doing ridiculous things to the cluster, which would be very time consuming.
|
I agree, it's bothersome. FWIW, the feature is simply querying rabbit and putting the result in a field. An alternative that I thought of, would be to create a fake HTTP server that mimics rabbit HTTP API, and create manually dummy Pods in testEnv; however, I would be mocking almost everything and that's usually a sign that you are doing something wrong. |
This closes #1980
Note to reviewers: remember to look at the commits in this PR and consider if they can be squashed
Summary Of Changes
Additional Context
This PR exposes the quorum status of a node in the custom resource status. When a reconcile loop runs, the operator will check the quorum status of all nodes, and update the field accordingly. The quorum status check is a health check to determine whether the node is quorum critical. This information is exposed as an HTTP API endpoint.
This implementation has an important limitation: it will only update the quorum status field when a reconcile loop triggers. It will not update the quorum status when external events affect the quorum status e.g. an operator manually removes a queue member.
Local Testing
There are unit tests for this feature.
I did not add a system test because observing the status field changing is time sensitive, and the test would be flaky and potentially very slow (it would have to observe a rolling restart). Testing at integration level with testEnv is not viable because there are no Pods in this env.