ovsdb: exit clustered ovsdb server if multiple raft leaders found#6065
ovsdb: exit clustered ovsdb server if multiple raft leaders found#6065zhangzujian merged 2 commits intokubeovn:masterfrom
Conversation
Summary of ChangesHello @zhangzujian, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the stability of OVN deployments by implementing a robust mechanism to detect and recover from split-brain scenarios in clustered OVSDB. By enabling the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a critical stability improvement by detecting and handling split-brain scenarios in the clustered OVSDB setup. When multiple raft leaders are found, the kube-ovn-leader-checker process will now exit to allow for a clean restart. The implementation also includes a valuable refactoring that centralizes the ovsdb-client execution logic into a new ovs.Query helper function, which enhances code quality and maintainability across the ovn_leader_checker and pinger packages. My feedback focuses on further improving maintainability by using constants for hardcoded values, ensuring test stability, and clarifying code comments.
| case "OVN_IC_Northbound": | ||
| dbAddr = ovs.OvsdbServerAddress(address, intstr.FromInt32(6645)) | ||
| case "OVN_IC_Southbound": | ||
| dbAddr = ovs.OvsdbServerAddress(address, intstr.FromInt32(6646)) |
There was a problem hiding this comment.
The database names "OVN_IC_Northbound" and "OVN_IC_Southbound" are hardcoded. To improve maintainability and avoid typos, it's recommended to define them as constants, similar to ovnnb.DatabaseName and ovnsb.DatabaseName. These constants could be placed in pkg/util/const.go or another appropriate shared package.
For example, in pkg/util/const.go:
const (
// ...
OvnICNB = "OVN_IC_Northbound"
OvnICSB = "OVN_IC_Southbound"
)Then you could use util.OvnICNB and util.OvnICSB here and in other places where these strings are used.
| } | ||
|
|
||
| output, err := exec.Command("ovsdb-client", cmd...).CombinedOutput() // #nosec G204 | ||
| result, err := ovs.Query(dbAddr, serverdb.DatabaseName, 1, ovsdb.Operation{ |
There was a problem hiding this comment.
The timeout for ovs.Query is hardcoded to 1 second. This might be too short for a loaded system, potentially causing false negatives. It's better to define this as a constant and consider a slightly larger value, for example, 3 or 5 seconds, to improve robustness.
For example, you could add a constant:
const (
// ...
DBLeaderCheckTimeout = 3
)And use it in the function call.
| result, err := ovs.Query(dbAddr, serverdb.DatabaseName, 1, ovsdb.Operation{ | |
| result, err := ovs.Query(dbAddr, serverdb.DatabaseName, 3, ovsdb.Operation{ |
|
|
||
| args := []string{"--timeout", strconv.Itoa(timeout), "query", address, string(query)} | ||
| if strings.HasPrefix(address, "ssl:") { | ||
| args = slices.Insert(args, 0, "-p", "/var/run/tls/key", "-c", "/var/run/tls/cert", "-C", "/var/run/tls/cacert") |
Pull Request Test Coverage Report for Build 20329458233Details
💛 - Coveralls |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a crucial high-availability feature to detect and handle split-brain scenarios in a clustered OVSDB setup. When multiple raft leaders are found, the ovsdb-server process will now exit, allowing for a restart and potential recovery. This is implemented by passing remote node addresses to kube-ovn-leader-checker and having it query the leader status on other nodes.
The changes also include a significant and very welcome refactoring of how OVSDB is queried. The logic is now centralized in a new pkg/ovs/ovsdb-client.go file, which wraps ovsdb-client calls. This greatly improves code clarity, maintainability, and robustness.
However, I've found a critical compilation error due to the use of a non-standard function slices.Values. I've also included a medium-severity suggestion to improve the consistency of the Configuration struct. Please see the detailed comments.
| } | ||
| } | ||
|
|
||
| for addr := range slices.Values(cfg.remoteAddresses) { |
There was a problem hiding this comment.
The function slices.Values is not part of the standard Go slices package and will cause a compilation error. To iterate over the slice of remote addresses, you should use a standard for...range loop. The range over func feature is experimental in Go 1.22 and slices.Values is not a standard iterator function.
| for addr := range slices.Values(cfg.remoteAddresses) { | |
| for _, addr := range cfg.remoteAddresses { |
| } | ||
| } | ||
|
|
||
| for addr := range slices.Values(cfg.remoteAddresses) { |
There was a problem hiding this comment.
| localAddress string | ||
| remoteAddresses []string |
There was a problem hiding this comment.
The new fields localAddress and remoteAddresses are unexported, which is inconsistent with other fields in the Configuration struct. It's a good practice to export fields of configuration structs for consistency and reusability. Please consider making them LocalAddress and RemoteAddresses and updating their usage accordingly in ParseFlags.
| localAddress string | |
| remoteAddresses []string | |
| LocalAddress string | |
| RemoteAddresses []string |
ffa8d3a to
fb42a16
Compare
Signed-off-by: zhangzujian <zhangzujian.7@gmail.com>
Signed-off-by: zhangzujian <zhangzujian.7@gmail.com>
04b4f1f to
743c5cc
Compare
Pull Request
What type of this PR
Examples of user facing changes:
Which issue(s) this PR fixes
Fixes #(issue-number)