blog: primary resource caching #2815
Open
csviri
wants to merge
17
commits into
main
from
5.1-blogpost
+97
−0
6a97555
blog: caching
csviri 77db6f5
docs: blogpost about primary caching
csviri f59df34
wip
csviri 6b0eff6
wip
csviri 703bfca
Update docs/content/en/blog/news/primary-cache-for-next-recon.md
csviri 6259ac0
Update primary-cache-for-next-recon.md
csviri cbb3315
mermaid and improvement
csviri 93a4a8d
date
csviri 6bb4473
title
csviri c9f92f0
docs
csviri 7d2588e
improve
csviri 38b0ea7
wording
csviri a69f6f4
improve
csviri 1ff9952
docs: start improving wording
metacosm 7a2d569
wording
csviri 39bba6f
comment improve
csviri c98d572
docs: improve
metacosm
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,97 @@ | ||
--- | ||
title: How to guarantee allocated values for next reconciliation | ||
date: 2025-05-22 | ||
author: >- | ||
[Attila Mészáros](https://github.com/csviri) | ||
--- | ||
|
||
We recently released v5.1 of Java Operator SDK (JOSDK). One of the highlights of this release relates to the topic of | ||
so-called | ||
[allocated values](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#representing-allocated-values). | ||
|
||
To describe the problem, let's say that our controller needs to create a resource that has a generated identifier, i.e. | ||
a resource whose identifier cannot be directly derived from the custom resource's desired state as specified in its | ||
`spec` field. In order to record the fact that the resource was successfully created, and to avoid attempting to | ||
recreate the resource in subsequent reconciliations, it is typical for this type of controller to store the | ||
generated identifier in the custom resource's `status` field. | ||
|
||
The Java Operator SDK relies on the informers' cache to retrieve resources. These caches, however, are only guaranteed | ||
to be eventually consistent. Our status update has to be propagated first to the cluster and then back to the informer | ||
cache. If some other event triggers a new reconciliation **before** that round trip completes, the resource in the | ||
informer cache does **not** yet contain the latest version as modified by the reconciler. In that reconciliation the | ||
generated identifier would be missing from the resource status, so the reconciler would attempt to create the resource | ||
again, which is not what we'd like. | ||
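The race described above can be reproduced with a toy model. The sketch below is only an illustration under simplifying assumptions: the two maps stand in for the cluster and the informer cache, `sync()` stands in for the watch event finally arriving, and `StaleCacheDemo` and all of its members are hypothetical names, not JOSDK types.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Toy model of the stale-cache problem; none of these types are JOSDK classes. */
class StaleCacheDemo {

  /** Authoritative state on the "cluster": resource name -> identifier stored in status. */
  final Map<String, String> cluster = new HashMap<>();
  /** Informer cache: only catches up with the cluster when sync() is called. */
  final Map<String, String> informerCache = new HashMap<>();
  /** Number of external resources created so far. */
  int externalResourcesCreated = 0;

  /** Simulates the watch event finally reaching the informer. */
  void sync() {
    informerCache.putAll(cluster);
  }

  /** Naive reconciler: reads the status from the (possibly stale) informer cache. */
  void reconcile(String name) {
    if (informerCache.get(name) == null) {
      // No identifier visible in the cache: create the external resource and record its ID.
      externalResourcesCreated++;
      cluster.put(name, UUID.randomUUID().toString());
    }
  }

  public static void main(String[] args) {
    StaleCacheDemo demo = new StaleCacheDemo();
    demo.reconcile("my-resource"); // creates the external resource, status goes to the cluster
    demo.reconcile("my-resource"); // runs before sync(): the cache is stale, a duplicate is created
    demo.sync();
    demo.reconcile("my-resource"); // the cache is up to date now, nothing is created
    System.out.println("external resources created: " + demo.externalResourcesCreated);
  }
}
```

The second reconciliation creates a duplicate only because it ran before the informer cache caught up with the status update.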
|
||
Java Operator SDK now provides a utility class | ||
[`PrimaryUpdateAndCacheUtils`](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/reconciler/PrimaryUpdateAndCacheUtils.java) | ||
to handle this particular use case. It keeps the updated resource in an overlay cache on top of the informer cache, so | ||
your reconciler is guaranteed to see the most up-to-date version of the resource on the next reconciliation: | ||
|
||
```java | ||
|
||
@Override | ||
public UpdateControl<StatusPatchCacheCustomResource> reconcile( | ||
    StatusPatchCacheCustomResource resource, | ||
    Context<StatusPatchCacheCustomResource> context) { | ||
|
||
  // omitted code | ||
|
||
  // a fresh copy is needed because the SSA version of the update is used | ||
  var freshCopy = createFreshCopy(resource); | ||
  freshCopy | ||
      .getStatus() | ||
      .setValue(statusWithAllocatedValue()); | ||
|
||
  // using the utility instead of update control to patch the resource status | ||
  var updated = | ||
      PrimaryUpdateAndCacheUtils.ssaPatchStatusAndCacheResource(resource, freshCopy, context); | ||
  // the utility has already patched the status, so no further update is requested | ||
  return UpdateControl.noUpdate(); | ||
} | ||
``` | ||
|
||
How does `PrimaryUpdateAndCacheUtils` work? | ||
There are multiple ways to solve this problem, but ultimately, we only provide the solution described below. If you | ||
want to dig deeper into the alternatives, see | ||
this [PR](https://github.com/operator-framework/java-operator-sdk/pull/2800/files). | ||
|
||
The trick is to intercept the resource that the reconciler updated and cache that version in an additional cache on top | ||
of the informer's cache. Subsequently, if the reconciler needs to read the resource, the SDK will first check if it is | ||
in the overlay cache and read it from there if present, otherwise read it from the informer's cache. If the informer | ||
receives an event with a fresh resource, we always remove the resource from the overlay cache, since the resource from | ||
the event is more recent. But this **works only** if the reconciler updates the resource **with optimistic locking**. | ||
`PrimaryUpdateAndCacheUtils` handles this for you, provided you pass it a "fresh" copy of the resource (i.e. a version | ||
of the resource that contains only the fields you want to update), since Server-Side Apply will be used underneath. | ||
If the update fails on conflict, because the resource has already been updated on the cluster before we got the chance | ||
to get our update in, we simply wait and poll the informer cache until the new resource version from the server | ||
appears there. We then apply our updates again to that newer version, again with optimistic | ||
locking. | ||
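The read and eviction rules of the overlay cache can be sketched in a few lines. This is a deliberately simplified illustration, not the SDK's implementation: `OverlayCacheSketch`, its nested `Resource` class and the method names are all hypothetical, and resources are keyed by name only.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative overlay cache; a simplification, not the JOSDK implementation. */
class OverlayCacheSketch {

  /** Minimal stand-in for a Kubernetes resource. */
  static final class Resource {
    final String name;
    final String resourceVersion;
    final String statusValue;

    Resource(String name, String resourceVersion, String statusValue) {
      this.name = name;
      this.resourceVersion = resourceVersion;
      this.statusValue = statusValue;
    }
  }

  private final Map<String, Resource> informerCache = new ConcurrentHashMap<>();
  private final Map<String, Resource> overlayCache = new ConcurrentHashMap<>();

  /** Called after the reconciler's own update succeeded: remember the server's response. */
  void cacheOwnUpdate(Resource updated) {
    overlayCache.put(updated.name, updated);
  }

  /** Called for every informer event: store the event's resource and evict the overlay entry. */
  void onInformerEvent(Resource fromServer) {
    informerCache.put(fromServer.name, fromServer);
    overlayCache.remove(fromServer.name);
  }

  /** Reads prefer the overlay cache and fall back to the informer cache. */
  Optional<Resource> get(String name) {
    Resource overlaid = overlayCache.get(name);
    return Optional.ofNullable(overlaid != null ? overlaid : informerCache.get(name));
  }

  public static void main(String[] args) {
    OverlayCacheSketch caches = new OverlayCacheSketch();
    caches.onInformerEvent(new Resource("r", "1", null));
    caches.cacheOwnUpdate(new Resource("r", "2", "generated-id"));
    // The informer cache still holds version 1, but the read is served from the overlay.
    System.out.println("version read: " + caches.get("r").get().resourceVersion);
    caches.onInformerEvent(new Resource("r", "2", "generated-id"));
    // The overlay entry is evicted; the informer cache is now the only source.
    System.out.println("version read: " + caches.get("r").get().resourceVersion);
  }
}
```

Evicting unconditionally on every event is only safe because the update went through with optimistic locking, which is what the rest of the post explains.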
|
||
So why is optimistic locking required? We hinted at it above, but the gist of it is that if another party updates the | ||
resource before we get a chance to, we wouldn't be able to handle the resulting situation correctly in all | ||
cases. The informer would receive that new event before our own update would get a chance to propagate. Without | ||
optimistic locking, there wouldn't be a fail-proof way to determine which update should prevail (i.e. which occurred | ||
first), in particular in the event of the informer losing the connection to the cluster or other edge cases (the joys of | ||
distributed computing!). | ||
|
||
Optimistic locking simplifies the situation and provides us with stronger guarantees: if the update succeeds, then we | ||
can be sure we have the proper resource version in our caches. The next event will contain our update in all cases. | ||
Because we know that, we can also be sure that we can evict the cached resource in the overlay cache whenever we receive | ||
a new event. The overlay cache is only used if the SDK detects that the original resource (i.e. the one before we | ||
applied our status update in the example above) is still in the informer's cache. | ||
|
||
The following diagram sums up the process: | ||
|
||
```mermaid | ||
flowchart TD | ||
A["Update Resource with Lock"] --> B{"Successful?"} | ||
B -- Fails on conflict --> D{"Resource updated from cluster?"} | ||
D -- " No: poll until updated " --> D | ||
D -- Yes --> n4["Apply desired changes on the resource again"] | ||
B -- Yes --> n2{"Original resource still in informer cache?"} | ||
n2 -- Yes --> C["Cache the resource in overlay cache"] | ||
n2 -- No --> n3["Informer cache already contains up-to-date version, do not use overlay cache"] | ||
n4 --> A | ||
|
||
``` |
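The flow in the diagram can also be walked through in code. The following is a single-threaded sketch under stated assumptions: `FakeServer` imitates optimistic locking with a numeric version, `waitForInformer` is a hook that stands in for polling (a real implementation would wait for the informer instead), and every name in the block is hypothetical rather than part of the JOSDK API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Toy walk-through of the flow above; these types are illustrative, not JOSDK classes. */
class OptimisticUpdateSketch {

  static final class Resource {
    final String name;
    final long version;
    final String status;

    Resource(String name, long version, String status) {
      this.name = name;
      this.version = version;
      this.status = status;
    }

    Resource withStatus(String newStatus) {
      return new Resource(name, version, newStatus);
    }
  }

  static final class ConflictException extends RuntimeException {}

  /** Fake API server that only accepts an update made against the current version. */
  static final class FakeServer {
    final Map<String, Resource> store = new HashMap<>();

    Resource updateWithLock(Resource desired) {
      Resource current = store.get(desired.name);
      if (current.version != desired.version) {
        throw new ConflictException();
      }
      Resource stored = new Resource(desired.name, current.version + 1, desired.status);
      store.put(desired.name, stored);
      return stored;
    }
  }

  static Resource updateAndCache(
      String name,
      UnaryOperator<Resource> change,
      FakeServer server,
      Map<String, Resource> informerCache,
      Map<String, Resource> overlayCache,
      Runnable waitForInformer) {
    Resource original = informerCache.get(name);
    while (true) {
      try {
        // "Update Resource with Lock"
        Resource updated = server.updateWithLock(change.apply(original));
        // Success: use the overlay only if the informer still holds the original version.
        if (informerCache.get(name).version == original.version) {
          overlayCache.put(name, updated);
        }
        return updated;
      } catch (ConflictException e) {
        // Conflict: poll until the informer has a newer version, then retry on top of it.
        while (informerCache.get(name).version == original.version) {
          waitForInformer.run();
        }
        original = informerCache.get(name);
      }
    }
  }

  public static void main(String[] args) {
    FakeServer server = new FakeServer();
    Map<String, Resource> informerCache = new HashMap<>();
    Map<String, Resource> overlayCache = new HashMap<>();
    // Another party already moved the resource to version 2; the informer still holds version 1.
    server.store.put("r", new Resource("r", 2, null));
    informerCache.put("r", new Resource("r", 1, null));

    Resource result = updateAndCache("r", r -> r.withStatus("generated-id"), server,
        informerCache, overlayCache, () -> informerCache.put("r", server.store.get("r")));

    System.out.println("version=" + result.version + ", status=" + result.status);
    System.out.println("in overlay cache: " + overlayCache.containsKey("r"));
  }
}
```

In this run the first attempt conflicts, the hook brings the informer cache to version 2, and the retry succeeds and lands in the overlay cache because the informer has not yet seen the resulting version.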
Why do you return `noUpdate` here? Shouldn't it return `patchStatus` instead?
It should not, the utils does the patching.
So, to be clear, if a user wants to use the utils, they need to return `noUpdate`? Or, it's just that it won't matter if they return something else? Either way, that should be documented.
`using the utility instead of update control`: yes, that might not be enough. I will expand on that, and also open a separate PR for the core docs.