Skip to content

AV-232614 Static Route fixes #1705

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 16, 2025

Conversation

pkoshtavmware
Copy link
Contributor

@pkoshtavmware pkoshtavmware commented Mar 29, 2025

AV-232614 Static Route fixes.

This PR includes following changes:

  1. Replace route ids generation with golang's UUID generation(For nodes CRUD). Modify logic for addition/updation/deletion of block affinities respectively as per new UUIDs
  2. Safely Remove redundant data structures earlier needed when allocating route ids in serial order.
  3. Remove all the internal data in cache corresponding to a node, when there is any error while fetching node data/ node ip, which may have lead to multiple duplicate records in the past.
  4. Fixed existing UnitTestCases.
  5. Removing records with duplicate route_ids(not needed)
  6. Add logic to sanitize data before sending to controller to check for occurence of duplicate records.
  7. Addition of new unittest cases if needed to simulate the above error.
  8. Manual Testing.
  9. Run configmap fts
  10. Tested for deadlock/race conditions using Gemini Code Assist
  11. Check ako upgrade from 1.12.1 to 1.14.1
  12. Check K8s version upgrade from 1.28 to 1.29

Copy link
Contributor

@akshayhavile akshayhavile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some test case to replicate node error scenario and check behaviour of AKO? Also have we run configmap Ft against the changes to validate all test cases are running fine?

@@ -60,19 +56,36 @@ func (o *AviObjectGraph) BuildVRFGraph(key, vrfName, nodeName string, deleteFlag
node, err := utils.GetInformers().NodeInformer.Lister().Get(nodeName)
if err != nil {
utils.AviLog.Errorf("key: %s, Error in fetching node details: %s: %v", key, nodeName, err)
var staticRouteCopy []*models.StaticRoute
for i := 0; i < len(aviVrfNode.StaticRoutes); i++ {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is replicated at multiple places. Can we make it function and call it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

@pkoshtavmware pkoshtavmware added the open-for-review Pull request is up for review label Apr 1, 2025
@pkoshtavmware pkoshtavmware force-pushed the AV-232614-master branch 4 times, most recently from 548acd9 to cdd46c1 Compare April 2, 2025 15:47
}
sort.Strings(nodeNames)
for _, nodeKey := range nodeNames {
o.BuildVRFGraph(key, aviVrfNode.Name, nodeKey, false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If node events are happening, through dequeueIngestion, BuildVRFGraph will be called. This function at the beginning has lock so only one will run. Will be there any conflict ? IF yes, How will we avoid conflict or overwriting of data or some race conditions?

Copy link
Contributor Author

@pkoshtavmware pkoshtavmware Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid overwriting data, i have added one more lock within **CheckAndDeduplicateRecords** function. However, after all the changes in the code, chances are very bleak that this deduplication process will be triggerred and even if it does, it will behave normally the way simultaneous update on multiple nodes is handled.

Please let me know if there is any specific case where, my theoretical assumption will fail

@pkoshtavmware pkoshtavmware force-pushed the AV-232614-master branch 2 times, most recently from bd41b20 to bcdf957 Compare April 3, 2025 06:22
@pkoshtavmware
Copy link
Contributor Author

build ako

@@ -889,7 +889,7 @@ func processNodeObj(key, nodename string, sharedQueue *utils.WorkerQueue, fullsy
utils.AviLog.Errorf("key: %s, msg: Error creating vrf graph: %v", key, err)
return
}

aviModelGraph.CheckAndDeduplicateRecords(key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets call this function when it is not fullsync.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

for i := 0; i < len(aviVrfNode.StaticRoutes); i++ {
_, ok := podCidrNextHopMap[*aviVrfNode.StaticRoutes[i].Prefix.IPAddr.Addr]
if ok {
hasDuplicateRecords = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can have a log here mentioning duplicate records are found for the pod cidr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@akshayhavile
Copy link
Contributor

build ako

@pkoshtavmware pkoshtavmware force-pushed the AV-232614-master branch 2 times, most recently from 553f66b to fc0a2de Compare April 10, 2025 11:53
@akshayhavile
Copy link
Contributor

build ako

@pkoshtavmware pkoshtavmware force-pushed the AV-232614-master branch 5 times, most recently from 3c02817 to 90d3282 Compare April 11, 2025 12:26
@pkoshtavmware
Copy link
Contributor Author

build ako

1 similar comment
@pkoshtavmware
Copy link
Contributor Author

build ako

@pkoshtavmware
Copy link
Contributor Author

Ft run result:

Screenshot 2025-04-14 at 8 39 11 PM

@pkoshtavmware
Copy link
Contributor Author

build ako

1 similar comment
@pkoshtavmware
Copy link
Contributor Author

build ako

akshayhavile
akshayhavile previously approved these changes Apr 15, 2025
Copy link
Contributor

@akshayhavile akshayhavile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Over all changes looks good to me.

Copy link
Contributor

@arihantg arihantg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pkoshtavmware
Copy link
Contributor Author

build ako

@akshayhavile akshayhavile merged commit 30445ae into vmware:master Apr 16, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-not-required open-for-review Pull request is up for review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants