fix(controller): direct api to avoid stale promo object#5754
shamsalmon wants to merge 1 commit into akuity:release-1.9
Signed-off-by: sshannon <sshannon@Beeswax.com>
Force-pushed from 6141a4a to d8466f0.
@shamsalmon you lost me at step 4. How or why is the Promotion being re-reconciled immediately? I agree this could theoretically cause the problem you've been observing, but I can't quite see how or why immediate re-reconciliation would happen in the first place.
Hmmm. Is it from mashing refresh maybe?
Or I suppose the potential exists that hitting it even once could cause this.
I am sure my users are being aggressive with the refresh button, but I've seen this happen by chance after pressing it just once, as you say.
I think several factors probably make this worse, including latency from the Kargo controller to the targeted cluster in a sharded setup. We probably run 50-200 promotions a day (100ish users) and see this happen once or twice a day.
I'm fairly certain the refresh has something to do with this. For one, it could mean a promo that is currently being reconciled gets added to the work queue, which would produce the immediate re-reconciliation you spoke of. Two, I believe the handling of the refresh at the head end of the reconciliation process involves a programmatic immediate requeue. I don't have the code in front of me at the moment, but I plan to dig into whether or how these are contributing to this, individually or in combination. We can potentially consider your proposed change as a stopgap, but I feel we're really close to the smoking gun here and may be able to do something more strategic.
Sounds good! I would dig into this more but I need to switch my focus onto some other priorities. If you do have another idea I would be happy to test it out for you.
Closes: #5282
I am sure there is a more elegant solution than this, but I wanted to get a solution out there. We are still testing this, but so far it looks solid.
I noted on the issue why I believe this is occurring, and I can confirm the "direct" API object is correct. However, my initial assumptions were wrong.
Here is a timeline:
Using the apiReader (which already exists), we can ensure that we get the real, non-cached version of the object.
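To illustrate the difference between a cached read and a direct read, here is a minimal, stdlib-only sketch. It is not the controller's code: `apiServer`, `cachedReader`, `shouldRun`, and the phase strings are hypothetical stand-ins, and the scenario (a reconcile that starts right after a terminal phase was written) is assumed purely for illustration. In controller-runtime terms, the cached reader plays the role of the manager's default client and the direct reader plays the role of the reader returned by `GetAPIReader()`.

```go
package main

import (
	"fmt"
	"sync"
)

// Promotion is a hypothetical, stripped-down stand-in for the real resource.
type Promotion struct {
	Name  string
	Phase string // illustrative values: "Running", "Succeeded"
}

// Reader is a hypothetical read-only interface shared by both read paths.
type Reader interface {
	Get(name string) (Promotion, bool)
}

// apiServer is the source of truth; reading from it models a direct API read.
type apiServer struct {
	mu    sync.Mutex
	items map[string]Promotion
}

func newAPIServer() *apiServer {
	return &apiServer{items: map[string]Promotion{}}
}

func (s *apiServer) Get(name string) (Promotion, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	p, ok := s.items[name]
	return p, ok
}

func (s *apiServer) Update(p Promotion) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.items[p.Name] = p
}

// cachedReader serves reads from a snapshot that only changes when Sync is
// called, modelling a local cache that lags behind the API server.
type cachedReader struct {
	src      *apiServer
	snapshot map[string]Promotion
}

func newCachedReader(src *apiServer) *cachedReader {
	c := &cachedReader{src: src}
	c.Sync()
	return c
}

// Sync copies the current server state into the snapshot.
func (c *cachedReader) Sync() {
	c.src.mu.Lock()
	defer c.src.mu.Unlock()
	c.snapshot = make(map[string]Promotion, len(c.src.items))
	for k, v := range c.src.items {
		c.snapshot[k] = v
	}
}

func (c *cachedReader) Get(name string) (Promotion, bool) {
	p, ok := c.snapshot[name]
	return p, ok
}

// shouldRun reports whether a reconcile would (re)start work on the named
// Promotion, based on whatever the given reader returns.
func shouldRun(r Reader, name string) bool {
	p, ok := r.Get(name)
	return ok && p.Phase != "Succeeded"
}

func main() {
	api := newAPIServer()
	api.Update(Promotion{Name: "promo-1", Phase: "Running"})
	cache := newCachedReader(api)

	// A reconcile finishes and writes a terminal phase; the cache has not
	// caught up yet when the next reconcile begins.
	api.Update(Promotion{Name: "promo-1", Phase: "Succeeded"})

	fmt.Println("cached reader says run again:", shouldRun(cache, "promo-1"))
	fmt.Println("direct reader says run again:", shouldRun(api, "promo-1"))
}
```

In this sketch the cached reader still sees the old phase until `Sync` runs, while the direct reader sees the terminal phase immediately. The trade-off of reading directly is an extra round trip to the API server on each reconcile.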