

@Zerpet (Member) commented Jan 8, 2026

This closes #1980

Note to reviewers: remember to look at the commits in this PR and consider whether they can be squashed.

Summary Of Changes

  • Add a quorum status field
  • Add agents file

Additional Context

This PR exposes the quorum status of each node in the custom resource status. When a reconcile loop runs, the operator checks the quorum status of all nodes and updates the field accordingly. The quorum status check is a RabbitMQ health check that determines whether a node is quorum critical, i.e. whether stopping that node would cause at least one quorum queue to lose its quorum. RabbitMQ exposes this check as an HTTP API endpoint.

This implementation has an important limitation: it only updates the quorum status field when a reconcile loop triggers. It does not update the field when external events affect the quorum status, e.g. when a human operator manually removes a queue member.

Local Testing

There are unit tests for this feature.

make unit-tests

I did not add a system test because observing the status field change is time-sensitive, and the test would be flaky and potentially very slow (it would have to observe a rolling restart). Testing at the integration level with testEnv is not viable either, because there are no Pods in that environment.

Zerpet added 2 commits January 8, 2026 12:11
Related to #1980. This commit adds a status field to the `RabbitmqCluster`
resource to expose the node's quorum status. The field exposes whether
the quorum is "ok", meaning there are no nodes that are quorum critical,
or whether one or more nodes are quorum critical. For example, during a
rolling restart, it will show the remaining 2 nodes as quorum critical,
and one node as unavailable.

This field has an important limitation: it updates the quorum status
_only_ when a reconcile loop triggers. For example, if an operator
deletes a quorum member, the field won't be updated until the next
reconcile. In other words, events external to Kubernetes that affect
quorum status won't be reflected in this field.
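The aggregation this commit describes could be sketched as below. The `summarize` helper and its output format are hypothetical; the commit only says the field reports "ok" when no node is quorum critical, and otherwise which nodes are quorum critical or unavailable.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// QuorumState is a hypothetical per-node result type.
type QuorumState string

const (
	QuorumOK       QuorumState = "ok"
	QuorumCritical QuorumState = "quorum critical"
	Unavailable    QuorumState = "unavailable"
)

// summarize folds per-node results (keyed by node name) into one status string.
func summarize(nodes map[string]QuorumState) string {
	var critical, unavailable []string
	for name, state := range nodes {
		switch state {
		case QuorumCritical:
			critical = append(critical, name)
		case Unavailable:
			unavailable = append(unavailable, name)
		}
	}
	if len(critical) == 0 && len(unavailable) == 0 {
		return string(QuorumOK)
	}
	// Map iteration order is random; sort so the status does not flap between reconciles.
	sort.Strings(critical)
	sort.Strings(unavailable)
	var parts []string
	if len(critical) > 0 {
		parts = append(parts, "quorum critical: "+strings.Join(critical, ", "))
	}
	if len(unavailable) > 0 {
		parts = append(parts, "unavailable: "+strings.Join(unavailable, ", "))
	}
	return strings.Join(parts, "; ")
}

func main() {
	// Mid rolling restart of a 3-node cluster: rabbit-2 is restarting,
	// so the remaining two nodes are quorum critical.
	fmt.Println(summarize(map[string]QuorumState{
		"rabbit-0": QuorumCritical,
		"rabbit-1": QuorumCritical,
		"rabbit-2": Unavailable,
	}))
	// prints: quorum critical: rabbit-0, rabbit-1; unavailable: rabbit-2
}
```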
Zerpet added this to the v2.19.0 milestone Jan 8, 2026
Zerpet marked this pull request as draft January 8, 2026 12:37
@Zerpet (Member Author) commented Jan 8, 2026

Asking for some early feedback @mkuratczyk @MirahImage

I need to tweak the TLS part a bit before marking this as ready, but the main idea is already implemented.

Zerpet added 2 commits January 8, 2026 12:42
This was somehow passing locally. I suspect other suites initialised the variable and I was just very lucky.
Zerpet marked this pull request as ready for review January 8, 2026 16:10
Because skipping TLS verification is no bueno :-)
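For context on this commit: in Go, the usual alternative to `InsecureSkipVerify` is to trust the cluster's CA explicitly. This sketch assumes the CA certificate is available as PEM bytes (for example, read from the cluster's TLS Secret); `newVerifyingClient` is a hypothetical helper, not the code from this PR.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"time"
)

// newVerifyingClient builds an HTTPS client that verifies the server
// certificate against caPEM instead of skipping verification.
// serverName overrides the hostname checked against the certificate
// (useful when dialing a Pod by IP); pass "" to use the URL's host.
func newVerifyingClient(caPEM []byte, serverName string) (*http.Client, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no CA certificates found in PEM data")
	}
	transport := &http.Transport{
		TLSClientConfig: &tls.Config{
			RootCAs:    pool,
			ServerName: serverName,
			MinVersion: tls.VersionTLS12,
		},
	}
	return &http.Client{Transport: transport, Timeout: 10 * time.Second}, nil
}

func main() {
	// Invalid PEM is rejected up front rather than silently trusting nothing.
	_, err := newVerifyingClient([]byte("not a certificate"), "")
	fmt.Println(err)
}
```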
Zerpet force-pushed the useful-error-message branch from 6dff6aa to 3cdbd92 January 8, 2026 16:52
@MirahImage (Member) left a comment

It bothers me some that there are no tests for any status other than unavailable, but otherwise this looks good. I completely understand not putting any system tests in; that would either be a flaky mess or require doing ridiculous things to the cluster, which would be very time-consuming.

@Zerpet (Member Author) commented Jan 9, 2026

I agree, it's bothersome. FWIW, the feature simply queries rabbit and puts the result in a field. An alternative I thought of would be to create a fake HTTP server that mimics the rabbit HTTP API and manually create dummy Pods in testEnv; however, I would be mocking almost everything, and that's usually a sign that you are doing something wrong.

Zerpet merged commit e8e8e09 into main Jan 9, 2026
39 checks passed
Zerpet deleted the useful-error-message branch January 9, 2026 09:38
Zerpet added a commit to rabbitmq/rabbitmq-website that referenced this pull request Jan 12, 2026


Development

Successfully merging this pull request may close these issues.

Create a "usefull error message" field in status conditions

3 participants