fix: use targeted patch in calcSubnetStatusIP to prevent U2O status overwrite #6350

Merged
oilbeater merged 1 commit into master from fix/calcSubnetStatusIP-targeted-patch on Feb 26, 2026

Conversation

@oilbeater
Collaborator

Summary

  • Fixed a race condition in calcSubnetStatusIP() where it used subnet.Status.Bytes() to patch the entire SubnetStatus, causing handleUpdateSubnetStatus to overwrite U2OInterconnectionVPC with stale cache data
  • Changed to a targeted JSON merge patch that only includes the 8 IP-related fields, leaving non-IP fields like U2OInterconnectionVPC, U2OInterconnectionIP, and conditions untouched
  • This resolves the flaky e2e test "should support underlay to overlay subnet interconnection" that intermittently timed out waiting for U2OInterconnectionVPC to be set

Root Cause

The race condition occurs after a controller restart when U2O is re-enabled:

  1. Pod deletion triggers IP release → IPAM releases immediately, but IP CRD deletion is async → IP count inconsistency detected → handleUpdateSubnetStatus requeued
  2. handleAddOrUpdateSubnet runs, sets U2OInterconnectionVPC = vpc.Status.Router, patches status ✓
  3. handleUpdateSubnetStatus retries, reads subnet from stale informer cache (without U2OInterconnectionVPC) → calcSubnetStatusIP patches the entire status including U2OInterconnectionVPC = "" → overwrites the correct value ✗
  4. Test polls for 30s but U2OInterconnectionVPC remains empty → timeout

Test plan

  • make lint passes (0 issues)
  • go build ./pkg/controller/... compiles
  • go test ./pkg/controller/... passes
  • E2E test "should support underlay to overlay subnet interconnection" no longer flaky

🤖 Generated with Claude Code

…verwrite

calcSubnetStatusIP previously used subnet.Status.Bytes() which serialized
the entire SubnetStatus and patched all fields to etcd. This caused a race
condition where handleUpdateSubnetStatus could overwrite U2OInterconnectionVPC
with stale data from its informer cache, leading to flaky e2e test failures
in "should support underlay to overlay subnet interconnection".

The race condition occurs when:
1. handleAddOrUpdateSubnet sets U2OInterconnectionVPC and patches status
2. handleUpdateSubnetStatus retries (from IP inconsistency requeue), reads
   stale cache without U2OInterconnectionVPC, and calcSubnetStatusIP
   overwrites all status fields including U2OInterconnectionVPC=""

Fix by using a targeted JSON merge patch that only includes the 8 IP-related
fields, leaving non-IP fields like U2OInterconnectionVPC untouched.

Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@dosubot dosubot bot added the size:S label (this PR changes 10-29 lines, ignoring generated files) Feb 26, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @oilbeater, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request resolves a critical race condition within the subnet status update mechanism. By switching from a full status overwrite to a precise, field-specific JSON patch, it prevents unintended data loss and ensures that different controllers can update their respective parts of the subnet status without interfering with each other. This change significantly improves the reliability of subnet interconnection features and related end-to-end tests.

Highlights

  • Race Condition Fix: Addressed a race condition in calcSubnetStatusIP() that previously caused handleUpdateSubnetStatus to overwrite U2OInterconnectionVPC with stale data by patching the entire SubnetStatus.
  • Targeted Patching: Implemented a targeted JSON merge patch that only updates the 8 IP-related fields within the SubnetStatus, ensuring non-IP fields like U2OInterconnectionVPC and conditions remain untouched by this specific handler.
  • E2E Test Stability: Resolved the flakiness of the e2e test 'should support underlay to overlay subnet interconnection', which was intermittently timing out due to U2OInterconnectionVPC not being set correctly.

Changelog

  • pkg/controller/subnet_status.go
    • Refactored calcSubnetStatusIP to use a structured JSON patch for IP-related status fields.
    • Replaced subnet.Status.Bytes() with json.Marshal on a custom struct containing only IP status fields.

Activity

  • The author has provided a detailed summary and root cause analysis of the issue.
  • The author has confirmed that make lint passes, go build compiles, and go test passes.
  • The author is awaiting confirmation that the E2E test 'should support underlay to overlay subnet interconnection' is no longer flaky.

@dosubot dosubot bot added the bug label (something isn't working) Feb 26, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly addresses a race condition in calcSubnetStatusIP by replacing a full status patch with a targeted JSON merge patch for IP-related fields. This change effectively prevents other status fields, such as U2OInterconnectionVPC, from being overwritten by stale cache data. The fix is well-implemented. I have provided one suggestion to improve the code's conciseness.

Comment on lines +248 to +267
    ipStatusPatch := struct {
        Status struct {
            V4AvailableIPs     float64 `json:"v4availableIPs"`
            V4AvailableIPRange string  `json:"v4availableIPrange"`
            V4UsingIPs         float64 `json:"v4usingIPs"`
            V4UsingIPRange     string  `json:"v4usingIPrange"`
            V6AvailableIPs     float64 `json:"v6availableIPs"`
            V6AvailableIPRange string  `json:"v6availableIPrange"`
            V6UsingIPs         float64 `json:"v6usingIPs"`
            V6UsingIPRange     string  `json:"v6usingIPrange"`
        } `json:"status"`
    }{}
    ipStatusPatch.Status.V4AvailableIPs = v4availableIPs
    ipStatusPatch.Status.V4AvailableIPRange = v4AvailableIPStr
    ipStatusPatch.Status.V4UsingIPs = v4UsingIPs
    ipStatusPatch.Status.V4UsingIPRange = v4UsingIPStr
    ipStatusPatch.Status.V6AvailableIPs = v6availableIPs
    ipStatusPatch.Status.V6AvailableIPRange = v6AvailableIPStr
    ipStatusPatch.Status.V6UsingIPs = v6UsingIPs
    ipStatusPatch.Status.V6UsingIPRange = v6UsingIPStr

medium

While the use of a struct to create the patch is type-safe, it's a bit verbose. For improved readability and conciseness, you could consider using a map[string]any to construct the patch payload. This is a common pattern for creating JSON patches and avoids the lengthy struct definition and separate field assignments.

Suggested change
- ipStatusPatch := struct {
-     Status struct {
-         V4AvailableIPs     float64 `json:"v4availableIPs"`
-         V4AvailableIPRange string  `json:"v4availableIPrange"`
-         V4UsingIPs         float64 `json:"v4usingIPs"`
-         V4UsingIPRange     string  `json:"v4usingIPrange"`
-         V6AvailableIPs     float64 `json:"v6availableIPs"`
-         V6AvailableIPRange string  `json:"v6availableIPrange"`
-         V6UsingIPs         float64 `json:"v6usingIPs"`
-         V6UsingIPRange     string  `json:"v6usingIPrange"`
-     } `json:"status"`
- }{}
- ipStatusPatch.Status.V4AvailableIPs = v4availableIPs
- ipStatusPatch.Status.V4AvailableIPRange = v4AvailableIPStr
- ipStatusPatch.Status.V4UsingIPs = v4UsingIPs
- ipStatusPatch.Status.V4UsingIPRange = v4UsingIPStr
- ipStatusPatch.Status.V6AvailableIPs = v6availableIPs
- ipStatusPatch.Status.V6AvailableIPRange = v6AvailableIPStr
- ipStatusPatch.Status.V6UsingIPs = v6UsingIPs
- ipStatusPatch.Status.V6UsingIPRange = v6UsingIPStr
+ ipStatusPatch := map[string]any{
+     "status": map[string]any{
+         "v4availableIPs":     v4availableIPs,
+         "v4availableIPrange": v4AvailableIPStr,
+         "v4usingIPs":         v4UsingIPs,
+         "v4usingIPrange":     v4UsingIPStr,
+         "v6availableIPs":     v6availableIPs,
+         "v6availableIPrange": v6AvailableIPStr,
+         "v6usingIPs":         v6UsingIPs,
+         "v6usingIPrange":     v6UsingIPStr,
+     },
+ }

@oilbeater oilbeater merged commit 21141a3 into master Feb 26, 2026
72 of 75 checks passed
@oilbeater oilbeater deleted the fix/calcSubnetStatusIP-targeted-patch branch February 26, 2026 11:54
oilbeater added a commit that referenced this pull request Mar 12, 2026
…verwrite (#6350)
