Fix: allow requeueing cleanup without removing finalizer and add tests#1994
Fix: allow requeueing cleanup without removing finalizer and add tests#1994alex-hunt-materialize wants to merge 2 commits into
Conversation
Signed-off-by: Alex Hunt <alex.hunt@materialize.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1994 +/- ##
=======================================
+ Coverage 77.1% 78.0% +1.0%
=======================================
Files 89 89
Lines 8927 9042 +115
=======================================
+ Hits 6878 7052 +174
+ Misses 2049 1990 -59
🚀 New features to boost your workflow:
|
|
Ah, sorry about this. I missed this PR entirely. Going over now. |
|
I'm not really comfortable doing this kind of behaviour change implicitly... IMO it's fairly common practice to ~always return
You could filter out those errors outside of the (And either way, you do control that delay via
I'd say this feels more like a design problem for the reconciliation slot system in general. Perhaps ideally we'd have a way for a task to downgrade itself, and say "well, I'm still ongoing but this is going to take a while so don't count me for the semaphore anymore". But then again, I'd also argue that not all are made equally. We effectively have three kinds of finalizers:
|
clux
left a comment
There was a problem hiding this comment.
Thanks for this. Tests are really good. This part of the codebase was badly tested and the tower mock stuff is great for this.
This is potentially a breaking change for some, but it's explicitly only if requeue is returned from the Cleanup branch. It's a small subset of people, and from my search on github for patterns, I found that this change will help more people than it harms because it makes the behavior align more with the common expectation that a requeue (if requested) will happen.
I think we can inform people about this in the release.
| // A requeue means the cleanup is still in progress (e.g. waiting on a | ||
| // background process), so we keep the finalizer and let the controller | ||
| // re-run `Cleanup` later. Only `Action::await_change` signals that cleanup | ||
| // has finished and the finalizer can be removed. | ||
| if action.wants_requeue() { | ||
| return Ok(action); | ||
| } |
There was a problem hiding this comment.
The only scary part here is if people are using a blanket Action::requeue(5 minutes) in their reconcilers.
AFAIKT this used to be a common pattern for controllers to have some safety against missed events, and if i am reading this correctly, it would mean we never actually cleanup the finalizer. We would have to communicate this in the release just in case.
There was a problem hiding this comment.
searched around a bit and I could only find one controller getting this wrong
what i instead find, is more users doing Action::requeue inside the Event::Cleanup stage - which does not prevent them from having the object deleted before the requeue is started. See e.g.;
as such, despite the small risk, i now think this is a net harm-reduction change because it allows the majority of these requeue requesters to work better.
| // With the finalizer already present and the object not being deleted, the finalizer runs | ||
| // `Apply` and passes its action straight through, without touching the API. The mock handle | ||
| // is dropped, so any request would error and fail the test. | ||
| #[tokio::test] | ||
| async fn apply_runs_reconcile_and_passes_action_through() { | ||
| let (mock_service, handle) = mock::pair::<Request<Body>, Response<Body>>(); | ||
| drop(handle); // any API call now errors |
Allow requeueing cleanup without removing the finalizer.
Motivation
For long running asynchronous cleanup tasks, the user previously had only two options:
There is also a serious foot-gun in the previous code. If a user returns an
Ok(Action)fromcleanupto re-queue the event, the finalizer gets removed immediately and the cleanup does not get re-invoked. There is nothing to indicate to the user that this is an incorrect use of the library. From the user's perspective, this should just re-queue and not remove the finalizer.Solution
If the result of the
cleanupfunction isOk(Action { requeue_after: Some(Duration) }), short circuit before removing the finalizer to just return that action.I also added some unit tests for the finalizer code.