Skip to content

Fix IP Port Mapping overwritten by new LBs in the same VPC#6301

Closed
SkalaNetworks wants to merge 4 commits intokubeovn:masterfrom
SkalaNetworks:fix-ipportmapping
Closed

Fix IP Port Mapping overwritten by new LBs in the same VPC#6301
SkalaNetworks wants to merge 4 commits intokubeovn:masterfrom
SkalaNetworks:fix-ipportmapping

Conversation

@SkalaNetworks
Copy link
Copy Markdown
Member

Pull Request

What type of this PR

  • Bug fixes
  • Tests

When adding a SwitchLBRule in a VPC, the ip_port_mapping is populated to help the healthchecks run. If that field is not generated correctly, the healthchecks will not run and the LB will not work.

In the following condition, the ip_port_mapping is wrong:

  • Add a first LB in the VPC (mapping is correct and holds the ports to the endpoints)
  • Add a second LB (mapping is overwritten for this LB, first LB has its ip_port_mapping erased)

The logic to handle ip_port_mapping has been faulty for a while, I tried fixing it a couple month back but made the issue worse.

This handles the fix differently:

  • I'm not trying to have an ip_port_mapping that contains only the backends that are currently active, if a LoadBalancer is deleted while the controller is down, the ip_port_mapping WILL contain stale data.
  • Because the ip_port_mapping is not trying to be updated in a smart way, we can't accidentally erase an entry we need.
  • The problem is, LBs will eventually clog the field with stale data.
  • For that, I added a GC on the OVN loadbalancers. It checks if the ip_port_mapping contains backends that are not targetted by any VIP.

With this fix:

  • We fix a bug that prevents creating more than one SLR per VPC (if healthchecks are activated, which they are by default)
  • We ensure loadbalancers are clean in OVN even if the ovn-controller is down while the user deletes LBs/Services

I still lack 2 things that I will add later in the GC:

  • Cleaning up VIPs if no service/SLR points to it
  • Cleaning up healthchecks if no service/SLR points to it

Signed-off-by: SkalaNetworks <contact@skala.network>
Signed-off-by: SkalaNetworks <contact@skala.network>
…ing correctly

Signed-off-by: SkalaNetworks <contact@skala.network>
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello @SkalaNetworks, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug where OVN Load Balancer IP port mappings were being incorrectly overwritten or erased when multiple load balancers were managed within the same VPC, causing health checks to fail. The core of the solution involves introducing a robust garbage collection routine that actively prunes stale IP port mapping entries from OVN Load Balancers. This change also refines the IP port mapping update and deletion logic to prevent unintended data loss and improve the overall stability and reliability of load balancing operations.

Highlights

  • Bug Fix: IP Port Mapping Overwrites: Resolved an issue where adding new Load Balancers (LBs) in the same Virtual Private Cloud (VPC) would incorrectly overwrite existing ip_port_mapping entries for other LBs, leading to health check failures.
  • New OVN Load Balancer Garbage Collection (GC): Introduced a garbage collection mechanism for OVN Load Balancers that periodically identifies and removes stale ip_port_mapping entries that are no longer associated with any active Virtual IPs (VIPs). This prevents accumulation of outdated data, especially if the controller is down during LB/Service deletions.
  • Refactored IP Port Mapping Deletion Logic: The LoadBalancerDeleteIPPortMapping function was refactored to specifically delete a single backend IP mapping. A new function, LoadBalancerDeleteVipIPPortMapping, was added to handle the deletion of IP port mappings associated with a specific VIP, ensuring more precise control over mapping lifecycle.
  • Improved LoadBalancerUpdateIPPortMapping Behavior: Modified the LoadBalancerUpdateIPPortMapping function to no longer attempt to delete 'outdated' mappings during an update. This change prevents accidental erasure of necessary entries and aligns with the new GC strategy.
  • Added E2E Tests for GC: New end-to-end tests were added to validate the garbage collection functionality, specifically ensuring that manually injected stale ip_port_mapping entries are correctly cleaned up by the controller.
Changelog
  • mocks/pkg/ovs/interface.go
    • Updated LoadBalancerDeleteIPPortMapping mock signature to use backendIP instead of vip.
    • Added mock for the new LoadBalancerDeleteVipIPPortMapping function.
  • pkg/controller/endpoint_slice.go
    • Updated calls to LoadBalancerDeleteIPPortMapping to use the newly introduced LoadBalancerDeleteVipIPPortMapping.
  • pkg/controller/gc.go
    • Imported the net package for IP parsing.
    • Registered the new gcOvnLb function to run as part of the garbage collection cycle.
    • Implemented gcOvnLb to iterate through OVN Load Balancers and remove ip_port_mapping entries that are not associated with any active VIPs.
  • pkg/controller/gc_test.go
    • Imported go.uber.org/mock/gomock for mock testing.
    • Added new test cases for gcOvnLb to verify the cleanup of stale ip_port_mapping entries, including IPv4 and IPv6 scenarios.
  • pkg/ovs/interface.go
    • Modified the LoadBalancerDeleteIPPortMapping interface method signature to accept backendIP.
    • Added a new interface method LoadBalancerDeleteVipIPPortMapping.
  • pkg/ovs/ovn-nb-load_balancer.go
    • Updated LoadBalancerDeleteVip to call the new LoadBalancerDeleteVipIPPortMapping.
    • Refactored LoadBalancerDeleteIPPortMapping to specifically delete a backend IP from ip_port_mappings.
    • Implemented LoadBalancerDeleteVipIPPortMapping to delete IP port mappings associated with a given VIP.
    • Removed the logic for deleting outdated mappings from LoadBalancerUpdateIPPortMapping.
  • pkg/ovs/ovn-nb-load_balancer_test.go
    • Added a test case to ensure LoadBalancerUpdateIPPortMapping does not overwrite existing data.
    • Renamed testLoadBalancerDeleteIPPortMapping to testLoadBalancerDeleteVipIPPortMapping and updated its internal calls.
    • Added a new testLoadBalancerDeleteIPPortMapping to test the deletion of a specific backend IP.
  • pkg/ovs/ovn-nb-suite_test.go
    • Added Test_LoadBalancerDeleteVipIPPortMapping to the test suite.
  • test/e2e/kube-ovn/e2e_test.go
    • Imported the new gc e2e test suite.
  • test/e2e/kube-ovn/gc/gc.go
    • Added a new e2e test file to verify the garbage collection of stale ip_port_mappings in OVN LoadBalancers.
Activity
  • The pull request introduces a bug fix and new garbage collection functionality.
  • New mock interfaces and implementations were added for refined IP port mapping deletion.
  • The controller's garbage collection routine was extended to include OVN Load Balancer cleanup.
  • Comprehensive unit and end-to-end tests were added to validate the new GC logic and ensure correct behavior.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively addresses the issue of ip_port_mapping being overwritten when multiple load balancers are created in the same VPC. The approach of simplifying the update logic in LoadBalancerUpdateIPPortMapping to only perform inserts and introducing a dedicated garbage collector (gcOvnLb) to clean up stale entries is robust and sound. The refactoring of LoadBalancerDeleteIPPortMapping into LoadBalancerDeleteVipIPPortMapping and a new function for deleting by backend IP improves code clarity. The accompanying unit and E2E tests provide good coverage for the new functionality. I have one minor suggestion to improve the E2E test implementation.

@SkalaNetworks
Copy link
Copy Markdown
Member Author

I truly was under the impression that the GC ran periodically, but it actually only works for LSPs.

My GC function runs only once when the controller is restarted. I still wrote tests, but obviously they don't work.
I'm skipping the test right now and keeping the test in the code.

I think we'll have to reasses that point

Signed-off-by: SkalaNetworks <contact@skala.network>
@SkalaNetworks
Copy link
Copy Markdown
Member Author

The current GC could do what I need, I'll clos this and address it elsewhere

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant