Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
175 changes: 175 additions & 0 deletions client2/apis/objectstorage/v1alpha2/bucket_types.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,175 @@
/*
Copyright 2025 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package v1alpha2

import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
)

// BucketDeletionPolicy configures COSI's behavior when a Bucket resource is deleted.
// +enum
// +kubebuilder:validation:Enum:=Retain;Delete
type BucketDeletionPolicy string

const (
// BucketDeletionPolicyRetain configures COSI to keep the Bucket object as well as the backend
// bucket when a Bucket resource is deleted.
BucketDeletionPolicyRetain BucketDeletionPolicy = "Retain"

// BucketDeletionPolicyDelete configures COSI to delete the Bucket object as well as the backend
// bucket when a Bucket resource is deleted.
BucketDeletionPolicyDelete BucketDeletionPolicy = "Delete"
)

// BucketSpec defines the desired state of Bucket
// +kubebuilder:validation:XValidation:message="parameters map is immutable",rule="has(oldSelf.parameters) == has(self.parameters)"
// +kubebuilder:validation:XValidation:message="protocols list is immutable",rule="has(oldSelf.protocols) == has(self.protocols)"
// +kubebuilder:validation:XValidation:message="existingBucketID is immutable",rule="has(oldSelf.existingBucketID) == has(self.existingBucketID)"
Comment on lines +40 to +42

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you want the message here to be may not be removed once set? These checks don't actually implement the immutability

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've followed the documented pattern here for this: https://kubernetes.io/blog/2022/09/29/enforce-immutability-using-cel/#immutablility-after-first-modification

There are 2 parts:

  • the struct-scope rule that prevents the config key from being wholesale removed in a manifest
  • the field-scope rule that prevents modification of the value

If you have an alternate recommendation, I can take it

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double checking here, your rules actually are problematic if you're aiming for immutable after first modification

The example has

// +kubebuilder:validation:XValidation:rule="!has(oldSelf.value) || has(self.value)", message="Value is required once set"

Which means that if the old value didn't exist, you can add the new value, but if the old value does exist, you must have a new value

Your current rule means that the old and new "has" must match. Which means you cannot add a value if you did not previously have one.

The example also notes "required once set" which is the same semantic I was suggesting with "may not be removed once set"

The part you have on the fields themselves is correct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. I was mistaken on my reply.

Yes, the intent is to prevent optional values from being added/removed at random. In my testing, I was able to prevent mutating the value of a field, but the field itself was able to be added/removed, which is also a mutation. I added these rules to prevent users from removing or adding optional values after creation.

Is there a more preferred way to encode this requirement?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so to clarify, if the user did not set this when they created the object, you don't want them to be able to set it later? If that's the case, you existing validation is correct, I would update the message to be more helpful with a message of something like "may only be set on create" to give the user a more specific feedback if they try to add it after the fact

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. Thanks

type BucketSpec struct {
// driverName is the name of the driver that fulfills requests for this Bucket.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered what valid data for this field looks like? What is the possible format of a driver name? Must it conform to any pattern (e.g. must it be a valid K8s resource name?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be in line with other Kubernetes names, following RFC1123.

// +required
// +kubebuilder:validation:MinLength=1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about a maximum length?

// +kubebuilder:validation:XValidation:message="driverName is immutable",rule="self == oldSelf"
DriverName string `json:"driverName"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current guidance is that every field should have omitempty regardless of whether it is optional/required

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain this more? From my experience, I would disagree because I find that omitempty also shows fields that are required when using kubectl describe, and it makes it easier at a glance to see if users have misconfigured something that should be set.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a read through https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#optional-vs-required which details this and talks about serialization.

For CRDs, the omitempty is even more important than for native types IMO.

Think about a structured go client which is marshalling a field without omitempty. If that field is required, this means for a CRD that the openapi schema will check for presence of the field. It does not check anything to do with the value.

In your current case, you have a MinLength > 0, most people don't. So for a field that doesn't have a min length, this would mean that a structured client who does not set an opinion for the field, would persist foo: "", and the required check would pass.

But was that the intent of making the field required? Almost certainly no.
The intent of making a field required would normally mean, I want you to specify a non-zero value here (at least in the string case).

By adding omitempty, the field, when the structured client doesn't explicitly set something, does not marshal the key, and they rightfully get an error message "foo is required"

I would disagree because I find that omitempty also shows fields that are required when using kubectl describe

Omitempty should have no effect on kubectl describe 🤔 describe is just reading whatever is in etcd, which is unstructured? Can you elaborate more on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would disagree because I find that omitempty also shows fields that are required when using kubectl describe

Omitempty should have no effect on kubectl describe 🤔 describe is just reading whatever is in etcd, which is unstructured? Can you elaborate more on this?

I may be thinking of times I've used kubectl get -o yaml instead of describe. But my memory is that omitempty will mean that zero-length strings are omitted from the output even when I would prefer to see that explicitly.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But my memory is that omitempty will mean that zero-length strings are omitted from the output even when I would prefer to see that explicitly.

This very depends on what the writer did. If you used kubectl apply, you probably wouldn't see these fields as empty because someone is unlikely to write foo: "" in their yaml. They'd get an error saying "foo is required" and then provide a non-empty value most likely.

If instead they generated something with a Go client, and didn't have omitempty, then yes, you'd see the empty string persisted, and that would pass a required check. Your controller would see that empty string as empty and then error out generally in that case.

By adding omitempty and minimum length checks we are creating a more consistent experience between different types of clients


// deletionPolicy determines whether a Bucket should be deleted when its bound BucketClaim is
// deleted. This is mutable to allow Admins to change the policy after creation.
// Possible values:
// - Retain: keep both the Bucket object and the backend bucket
// - Delete: delete both the Bucket object and the backend bucket
Comment on lines +52 to +54

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend trying kubectl explain on this field. I don't think this will render as it is written here.

I generally recommend full sentences over bullets for this reason.

Suggested change
// Possible values:
// - Retain: keep both the Bucket object and the backend bucket
// - Delete: delete both the Bucket object and the backend bucket
// Valid values are Retain and Delete.
// When set to Retain, both the Bucket object and the backend bucket will be kept upon deletion.
// When set to Delete, both the Bucket object and the backend bucket will be removed upon deletion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below is how it renders from kubectl explain. From my perspective, the bullets are rendering as intended, but we can still change them if you think there are other places where this won't render as intended.

  deletionPolicy	<string> -required-
  enum: Retain, Delete
    deletionPolicy determines whether a Bucket should be deleted when its bound
    BucketClaim is
    deleted. This is mutable to allow Admins to change the policy after
    creation.
    Possible values:
     - Retain: keep both the Bucket object and the backend bucket
     - Delete: delete both the Bucket object and the backend bucket

Looks somewhat better with --output plaintext-openapiv2, which is how the godoc preview renders locally:

   deletionPolicy	<string> -required-
     deletionPolicy determines whether a Bucket should be deleted when its bound
     BucketClaim is deleted. This is mutable to allow Admins to change the
     policy after creation. Possible values:
     - Retain: keep both the Bucket object and the backend bucket
     - Delete: delete both the Bucket object and the backend bucket

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're happy it's rendering correctly, feel free to keep. I don't have a strong opinion but generally lean towards prose over bullets

// +required
DeletionPolicy BucketDeletionPolicy `json:"deletionPolicy"`

// parameters is an opaque map of driver-specific configuration items passed to the driver that
// fulfills requests for this Bucket.
Comment on lines +58 to +59

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is a driver here? How would I know what valid options are? Is this CLI args, or a config file?

What happens if a bucket needs more flexibility for configuration than string to string allows?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are open options - how they are used it's up to the driver's implementation details, and should be documented in the specific driver's documentation. If one needs to have more flexibility here, they can use JSON/YAML in strings or reference config maps or secrets through that field.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the driver will effectively come up with a schema here and will be given a raw blob of data? Or how are you passing the data to the driver? Do you have any more context on this?

I'm wondering if a map here is the right thing to do, vs just being a raw blob of unstructured data

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reminds me of the early days of cluster API, we found raw data to be awkward and eventually switch to references to concrete CRs. Have you seen this pattern before? Or considered it here?

It would mean that each driver has its own config type that can be properly structured, can be evolved over time and provide proper validations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the goals of COSI is to do our best to balance familiarity with StorageClass/PV/PVC concepts against new designs that allow better usability. This is a case where COSI has borrowed the same opaque map concept from StorageClass.

My gut feel here is that this is still sufficient for block and file storage and that it will probably be slightly more beneficial to keep similar semantics. However, I will make a note to ask in the sig-storage group if there have been discussions to change this and solicit feedback.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reminds me of the early days of cluster API, we found raw data to be awkward and eventually switch to references to concrete CRs. Have you seen this pattern before? Or considered it here?

Yes, and I don't like this design from user's perspective. Balancing few different vendors and CAPI itself is painful, especially if you want to keep up to date. I'd like to keep it simple, using k/v pairs as here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spoke to sig-storage folks about this topic today, and they reported that the opaque parameter map has worked well for some time for StorageClasses, PVCs, and CSI. Unless additional info comes up here, the takeaway is that COSI and sig-storage prefer to keep this as-is to rely on user/driver/admin familiarity from CSI.

I consider this resolved, but I'll opt to leave this thread open because I think it will be a good reference in the future. Thanks again, Joel.

// +optional
// +kubebuilder:validation:XValidation:message="parameters map is immutable",rule="self == oldSelf"
Parameters map[string]string `json:"parameters,omitempty"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add a MinProperties=1 here to prevent someone from persisting the empty map as an unstructured client

Ideally you can set a max properties too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With MinProperties=1, does that prevent users from not specifying parameters or leaving parameters empty?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a structured Go client, since you have omitempty, when you have no values in the map, the json marshaller will omit the parameters key from the encoded object. So the only time you'd have a value in the json would be when the map has at least one property.

For an unstructured client, it prevents someone from sending (via YAML and kubectl maybe) parameters: {}, this would be rejected saying you need at least one property.

If you allowed parameters: {} to be persisted by the structured client, this implies there's a semantic difference between being omitted and present but empty (think about something like emptyDir where that is a semantic difference). If you want a semantic difference, you need to think about how the Go client serializes this map, because at the moment, if an unstructured client were to write up an object with parameters: {} in, the next time a structured client writes to the object, they'd omit it, and it would get removed from etcd. This means the data didn't round trip correctly through your structured client, which is very bad!


// protocols lists object store protocols that the provisioned Bucket must support.
// If specified, COSI will verify that each item is advertised as supported by the driver.
Comment on lines +64 to +65

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the valid protocol types? Should include the valid options here to that users can see from the docs what the valid options are

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be set through // +kubebuilder:validation:Enum, right? Do we need this separately in the comment though?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be set through // +kubebuilder:validation:Enum, right?

Yes, though I thought this was already done?

Most end user documentation is generated only from the godoc. It doesn't include the enum or other validations from the schema. Which means that when a user reads the docs for the field (e.g. kubectl explain), they would have no idea what the valid options are unless we tell them in the godoc

// +optional
// +listType=set
// +kubebuilder:validation:XValidation:message="protocols list is immutable",rule="self == oldSelf"
Protocols []ObjectProtocol `json:"protocols,omitempty"`

// bucketClaim references the BucketClaim that resulted in the creation of this Bucket.
// For statically-provisioned buckets, set the namespace and name of the BucketClaim that is
// allowed to bind to this Bucket.
// +required
BucketClaimRef BucketClaimReference `json:"bucketClaim"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, normally the field name and json tag name match

Suggested change
BucketClaimRef BucketClaimReference `json:"bucketClaim"`
BucketClaim BucketClaimReference `json:"bucketClaim"`


// existingBucketID is the unique identifier for an existing backend bucket known to the driver.
// Use driver documentation to determine how to set this value.
// This field is used only for Bucket static provisioning.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Static bucket provisioning vs bucket static provisioning?

// This field will be empty when the Bucket is dynamically provisioned from a BucketClaim.
// +optional
// +kubebuilder:validation:XValidation:message="existingBucketID is immutable",rule="self == oldSelf"
ExistingBucketID string `json:"existingBucketID,omitempty"`
}

// BucketClaimReference is a reference to a BucketClaim object.
// +kubebuilder:validation:XValidation:message="namespace is immutable once set",rule="!has(oldSelf.namespace) || has(self.namespace)"
// +kubebuilder:validation:XValidation:message="uid is immutable once set",rule="!has(oldSelf.uid) || has(self.uid)"
type BucketClaimReference struct {
// name is the name of the BucketClaim being referenced.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should document the valid pattern for this field

// It must be at most 253 characters in length and consist only of lower-case alphanumeric characters, hyphens and periods.
// Each period separated label must start and end with an alphanumeric character.

// +required
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:XValidation:message="name is immutable",rule="self == oldSelf"
Name string `json:"name"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What versions of Kube do you intend to support with this API? Depending on the version, you may be able to validate the DNS 1123 Subdomain property within a CEL helper

Copy link
Contributor Author

@BlaineEXE BlaineEXE Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, I think v1.30 and above. We could probably say v1.32 and above if that helps.

I don't see a k8s version here, so I presume that means any k8s version that supports CEL would support format.dns1123Label().validate(self) here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in v1.31, which means usable from 1.32 onwards, see https://pkg.go.dev/k8s.io/apiserver/pkg/cel/library#Format


// namespace is the namespace of the BucketClaim being referenced.
// If empty, the Kubernetes 'default' namespace is assumed.
// namespace is immutable except to update '' to 'default'.
Comment on lines +98 to +99

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? And why bother defaulting?

// +optional

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// +optional
// It must be at most 63 characters in length and consist only of lower-case alphanumeric characters and hyphens,
// and must start and end with an alphanumeric character.
// +optional

// +kubebuilder:validation:MinLength=0

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't want a user to explicitly set "" realistically, you always want a min length of 1 unless the empty string means something semantically

// +kubebuilder:validation:MaxLength=253

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Namespaces are DNS 1123 Labels, so top out at 63 chars

// +kubebuilder:validation:XValidation:message="namespace is immutable",rule="(oldSelf == '' && self == 'default') || self == oldSelf"
Namespace string `json:"namespace"`

// uid is the UID of the BucketClaim being referenced.
// +optional
// +kubebuilder:validation:XValidation:message="uid is immutable once set",rule="oldSelf == '' || self == oldSelf"
UID types.UID `json:"uid"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the kubebuilder format marker for UUID on this field

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a follow-up question here.

I looked into this quite deeply a number of weeks ago. I scoured k8s docs, and I didn't see anything on k8s resources or in documentation that says what the format is for UIDs. In my experience, they are all the the 60-char format like this: 8D8AC610-566D-4EF0-9C22-186B2A5ED793, but I didn't see any language that guaranteed a specific representation.

Is it safe to actually do this? And if so, is this documented somehwere I can reference for the future?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The types.UID format is as you say, an 8-4-4-4-12 hex value, documented standard in https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids

This field is representing the UID of K8s object, changing that format would be a major breaking change within K8s that I cannot imagine would happen lightly and you would not be the only person having to deal with that breaking change

As far as I'm aware the Kubebuilder format conforms to what's done here
https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/apis/resource/validation/validation.go#L98-L106

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In testing, I found that the uuid() CEL format helper does not work correctly. It allowed a UID with capitalized alpha chars and a string longer than 36 chars long. Do you know where the code for this is maintained, that I might file an issue?

Copy link

@JoelSpeed JoelSpeed Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try the kubebuilder format marker rather than the CEL (I've tested this/had this conversation recently)

The best place to raise an issue for the CEL is to raise it on K/K and tag sig-api-machinery. Walking it through, it relies on this regex, https://github.com/kubernetes/kube-openapi/blob/4e65d59e963e749fa7939999518f9e90682983c3/pkg/validation/strfmt/default.go#L59

Which looks correct to me? What was your validation rule?

}

// BucketStatus defines the observed state of Bucket.
// +kubebuilder:validation:XValidation:message="bucketID is immutable once set",rule="!has(oldSelf.bucketID) || has(self.bucketID)"
// +kubebuilder:validation:XValidation:message="protocols is immutable once set",rule="!has(oldSelf.protocols) || has(self.protocols)"
type BucketStatus struct {
// readyToUse indicates that the bucket is ready for consumption by workloads.
ReadyToUse bool `json:"readyToUse"`
Comment on lines +116 to +117

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better represented by a Ready condition


// bucketID is the unique identifier for the backend bucket known to the driver.
// +optional
// +kubebuilder:validation:XValidation:message="boundBucketName is immutable once set",rule="oldSelf == '' || self == oldSelf"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't need the ability to transition from "", I wouldn't expect "" to be a valid choice here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this was needed because "" is the default value. Or is there more complexity here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "" a valid user choice for this field? Would I send bucketID: "" and that mean something different from the field being omitted completely? I suspect the answer is no

So this field should prevent the empty string by adding a MinLength marker.

In which case there is no way the field could be persisted as the empty string. It would either be missing, or set to something no-zero.

When the oldSelf is missing, the rule does not run, it only runs once you have the field set, and as above, would not be empty.

BucketID string `json:"bucketID"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any thoughts on a pattern? Min length, max length?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the field is optional (unset initially), min length is 0 I think.

For max length, it's challenging because this is really determined by backend object store limitations and driver's needs. Object store bucket names can be quite long. Azure allows up to 1024 chars by their doc. S3 is 63 chars. GCS is unclear, but possibly 222 chars. And a driver might reasonably also want more length than just the proto max, though we could cap this if you think it's pertinent.

As for pattern, my personal theory has been that COSI will have an easier time by not overly clamping down on this. At the base layer, this has to support bucket id/name for all 3 current object protocols. It also has to support whatever drivers need/want to do with that string on top of the 3 protocols.

Since users aren't specifying this field, I have assumed there isn't much risk to leave it patternless.

With that background knowledge, are there still things you'd suggest here?

Or, alternately, is there a recommended way for COSI to document that we are omitting certain best-practices for reviews?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the field is optional (unset initially), min length is 0 I think.

Generally no. The validations don't run unless the field is present (present vs omitted from the serialization). You only want the zero value to be accepted here if there is a semantic difference between an end user setting the zero value and the field being omitted completely.

For max length, it's challenging because this is really determined by backend object store limitations and driver's needs. Object store bucket names can be quite long. Azure allows up to 1024 chars by their doc. S3 is 63 chars. GCS is unclear, but possibly 222 chars. And a driver might reasonably also want more length than just the proto max, though we could cap this if you think it's pertinent.

If you don't put a bound on, then the API server will assume the max length is 3 MiB for the purposes of CEL. Which means you won't be able to validate this string very well as you'll hit the cost limits.

It's also generally good hygiene to limit the input to good values.

In a case like this, I would suggest choosing something high enough that all of your current use cases can work, and if you have an idea about potential future use cases, maybe provide some headroom.

Bear in mind that loosening a validation is a much easier argument than tightening it later on.

Are you able to scan for the equivalent length limit on other platforms that integrate with Kube to get an indication what the max might be?

As for pattern, my personal theory has been that COSI will have an easier time by not overly clamping down on this. At the base layer, this has to support bucket id/name for all 3 current object protocols. It also has to support whatever drivers need/want to do with that string on top of the 3 protocols.

In this case, I'd recommend documenting in the godoc that the format of the field is based on the driver and give the user a hint of how to work out what valid data is

Since users aren't specifying this field, I have assumed there isn't much risk to leave it patternless.

With pretty much every API review I do, I come to it with very little context about the fields and the use case. So I normally comment for the most strict, and then have a conversation about exceptions. If a field documents something that clearly shows it is not possible to validate with a pattern, as is this case, then that generally gives me a hint not to suggest that we validate it. I'm trying to prompt with these comments to think through these cases (in case you haven't already) and ideally make sure the documentation on the fields is as full as it can possibly be to help the future end user understand to the best of our ability how to populate the data in this field correctly.


// protocols is the set of protocols the Bucket reports to support. BucketAccesses can request
// access to this BucketClaim using any of the protocols reported here.
// +optional
// +listType=set
Protocols []ObjectProtocol `json:"protocols"`
Comment on lines +124 to +128

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this field in both spec and status?

Copy link
Contributor Author

@BlaineEXE BlaineEXE Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Spec protocols is an optional set of user-defined protocols that the bucket is required to support. If a user supplies values and the driver doesn't support the user's request, provisioning will fail.

After provisioning, the driver reports the set of protocols that the bucket can support, which is allowed to be a superset of the user-provided values (including if the user provides no input). This is captured in the Status

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, that makes more sense now, thanks


// BucketInfo reported by the driver, rendered in the COSI_<PROTOCOL>_<KEY> format used for the
// BucketAccess Secret. e.g., COSI_S3_ENDPOINT, COSI_AZURE_STORAGE_ACCOUNT.
// This should not contain any sensitive information.
// +optional
BucketInfo map[string]string `json:"bucketInfo,omitempty"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better as a list of named objects, easier to apply validation and allow SSA style writes

bucketInfo:
- property: Blah
  value: something


// Error holds the most recent error message, with a timestamp.
// This is cleared when provisioning is successful.
// +optional
Error *TimestampedError `json:"error,omitempty"`
Comment on lines +136 to +139

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally error fields don't age well. A condition will probably be more intuitive

}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:resource:scope=Cluster
// +kubebuilder:metadata:annotations="api-approved.kubernetes.io=unapproved, experimental v1alpha2 changes"

// Bucket is the Schema for the buckets API
type Bucket struct {
metav1.TypeMeta `json:",inline"`

// metadata is a standard object metadata
// +optional
metav1.ObjectMeta `json:"metadata,omitempty,omitzero"`

// spec defines the desired state of Bucket
// +required
Spec BucketSpec `json:"spec"`

// status defines the observed state of Bucket
// +optional
Status BucketStatus `json:"status,omitempty,omitzero"`
}

// +kubebuilder:object:root=true

// BucketList contains a list of Bucket
type BucketList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []Bucket `json:"items"`
}

func init() {
SchemeBuilder.Register(&Bucket{}, &BucketList{})
}
217 changes: 217 additions & 0 deletions client2/apis/objectstorage/v1alpha2/bucketaccess_types.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,217 @@
/*
Copyright 2025 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package v1alpha2

import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// BucketAccessAuthenticationType specifies what authentication mechanism is used for provisioning
// bucket access.
// +enum
// +kubebuilder:validation:Enum:="";Key;ServiceAccount

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why support the empty string as a valid choice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is essentially just because the type is reused in multiple contexts. "" is needed in the status, which begins its life as an empty string that gets filled in later.

In the spec, we wrote a more restrictive enum to prohibit users from giving "" as input.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does status start life as an empty string? Why wouldn't it just be omitted?

type BucketAccessAuthenticationType string

const (
// The driver should generate a protocol-appropriate access key that clients can use to
// authenticate to the backend object store.
BucketAccessAuthenticationTypeKey = "Key"

// The driver should configure the system such that Pods using the given ServiceAccount
// authenticate to the backend object store automatically.
BucketAccessAuthenticationTypeServiceAccount = "ServiceAccount"
)

// BucketAccessMode describes the Read/Write mode an access should have for a bucket.
// +enum
// +kubebuilder:validation:Enum:=ReadWrite;ReadOnly;WriteOnly
type BucketAccessMode string

const (
// BucketAccessModeReadWrite represents read-write access mode.
BucketAccessModeReadWrite BucketAccessMode = "ReadWrite"

// BucketAccessModeReadOnly represents read-only access mode.
BucketAccessModeReadOnly BucketAccessMode = "ReadOnly"

// BucketAccessModeWriteOnly represents write-only access mode.
BucketAccessModeWriteOnly BucketAccessMode = "WriteOnly"
)

// BucketAccessSpec defines the desired state of BucketAccess
// +kubebuilder:validation:XValidation:message="serviceAccountName is immutable",rule="has(oldSelf.serviceAccountName) == has(self.serviceAccountName)"
type BucketAccessSpec struct {
// bucketClaims is a list of BucketClaims the provisioned access must have permissions for,
// along with per-BucketClaim access parameters and system output definitions.
// At least one BucketClaim must be referenced.
// Multiple references to the same BucketClaim are not permitted.
// +required
// +listType=map
// +listMapKey=bucketClaimName
// +kubebuilder:validation:MinItems=1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a sensible maximum number of bucket claims you can impose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something I've been thinking about. For the usecases I am aware of, I don't imagine more than 10 buckets being referenced. However, I don't know what usecases might need more that I'm not aware of.

Since this is a validation, I suppose it would be easy to impose some limit and raise it later if we get feedback about it.

Do you have any suggestions for our position here?

Copy link

@JoelSpeed JoelSpeed Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would generally suggest picking something larger than you currently expect the maximum to be, and to have some headroom. I tend to suggest either powers of 10 or powers of 2.

If we went for something like 64, that sounds like there's plenty of headroom there, and if you found issues later, you'd be able to reason about loosening that validation to something larger.

It may also be worth checking with various cloud providers if they have limitations that may impact this number

// +kubebuilder:validation:XValidation:message="bucketClaims list is immutable",rule="self == oldSelf"
BucketClaims []BucketClaimAccess `json:"bucketClaims"`

// bucketAccessClassName selects the BucketAccessClass for provisioning the access.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any formatting you can apply? Looks like this is probably a DNS 1123 subdomain right?

// +required
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:XValidation:message="bucketAccessClassName is immutable",rule="self == oldSelf"
BucketAccessClassName string `json:"bucketAccessClassName"`

// protocol is the object storage protocol that the provisioned access must use.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document the valid choices and what they mean

// +required
// +kubebuilder:validation:XValidation:message="protocol is immutable",rule="self == oldSelf"
Protocol ObjectProtocol `json:"protocol"`

// serviceAccountName is the name of the Kubernetes ServiceAccount that user application Pods
// intend to use for access to referenced BucketClaims.
// This has different behavior based on the BucketAccessClass's defined AuthenticationType:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean? How would I as an end user understand what the differences in behaviour are?

// - Key: This field is ignored.
// - ServiceAccount: This field is required. The driver should configure the system so that Pods
// using the ServiceAccount authenticate to the object storage backend automatically.
// +optional
// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:XValidation:message="serviceAccountName is immutable",rule="self == oldSelf"
ServiceAccountName string `json:"serviceAccountName,omitempty"`
}

// BucketAccessStatus defines the observed state of BucketAccess.
// +kubebuilder:validation:XValidation:message="accountID is immutable once set",rule="!has(oldSelf.accountID) || has(self.accountID)"
// +kubebuilder:validation:XValidation:message="accessedBuckets is immutable once set",rule="!has(oldSelf.accessedBuckets) || has(self.accessedBuckets)"
// +kubebuilder:validation:XValidation:message="driverName is immutable once set",rule="!has(oldSelf.driverName) || has(self.driverName)"
// +kubebuilder:validation:XValidation:message="authenticationType is immutable once set",rule="!has(oldSelf.authenticationType) || has(self.authenticationType)"
// +kubebuilder:validation:XValidation:message="parameters is immutable once set",rule="!has(oldSelf.parameters) || has(self.parameters)"
type BucketAccessStatus struct {
// readyToUse indicates that the BucketAccess is ready for consumption by workloads.
ReadyToUse bool `json:"readyToUse"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Condition?


// accountID is the unique identifier for the backend access known to the driver.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any validations you can apply here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is essentially just a copy-paste of what the driver tells us the account ID is. IMO, it might be best to leave this without validation so that drivers don't provision successfully and then fail at the last possible step when setting the status.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a min/max length would still be advisable. I cannot imagine an ID is shorter than 1 character or longer than say 256/512/1024 characters on any platform right?

// This field is populated by the COSI Sidecar once access has been successfully granted.
// +optional
// +kubebuilder:validation:XValidation:message="accountId is immutable once set",rule="oldSelf == '' || self == oldSelf"
AccountID string `json:"accountID"`

// accessedBuckets is a list of Buckets the provisioned access must have permissions for, along
// with per-Bucket access options. This field is populated by the COSI Controller based on the
// referenced BucketClaims in the spec.
// +optional
// +listType=map
// +listMapKey=bucketName
// +kubebuilder:validation:XValidation:message="accessedBuckets is immutable once set",rule="oldSelf.size() == 0 || self == oldSelf"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not add a minlength instead of allowing zero to more?

Is there a semantic difference between omitted and accessedBuckets: []?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't a distinction. At the start of its life, this status field is zero-length. After processing, we set it once, after which it shouldn't be mutated.

In my testing, this rule limited mutuation after first set.

I'm not sure what adding a minlength rule would add for us here, but this could be a misunderstanding on my part.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the start of its life, this status field is zero-length

Initially it should be omitted. Which means no validations run. If there's no semantic difference between zero length and omitted, you should not allow the zero length list to be persisted, so set a MinLength=1. You shouldn't need to pre-check for the zero value on any of these immutable validations.

Just to be sure, there is no configuration that should ever result in [] being the correct answer for this field?

AccessedBuckets []AccessedBucket `json:"accessedBuckets"`

// driverName holds a copy of the BucketAccessClass driver name from the time of BucketAccess
// provisioning. This field is populated by the COSI Controller.
Comment on lines +118 to +119

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any validations you can apply?

// +optional
// +kubebuilder:validation:XValidation:message="driverName is immutable once set",rule="oldSelf == '' || self == oldSelf"
DriverName string `json:"driverName"`

// authenticationType holds a copy of the BucketAccessClass authentication type from the time of
// BucketAccess provisioning. This field is populated by the COSI Controller.
Comment on lines +124 to +125

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document valid choices and what they mean

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a question here. I have documented the type itself. BucketAccessAuthenticationType has an enum defined which shows up in the CRD spec. What makes this approach insufficient?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm aiming for here is complete documentation in the godoc on fields. This means that, without seeing the CRD schema, just seeing the godoc for each field, a user should understand what the valid values are for setting this field.

Many tools exist today to allow folks to create documentation for their APIs from a generated schema. But generally the easy way for anyone to get docs for a field would be kubectl explain.

So as an end user, I have a cluster running, I run kubectl explain on a field within a CRD, I don't get to see the CRD spec, I get to see the godoc. If you don't document the valid enum values in your godoc, even though they exist in the schema, the user has no idea what the valid values are

// +optional
// +kubebuilder:validation:XValidation:message="authenticationType is immutable once set",rule="oldSelf == '' || self == oldSelf"
AuthenticationType BucketAccessAuthenticationType `json:"authenticationType"`

// parameters holds a copy of the BucketAccessClass parameters from the time of BucketAccess
// provisioning. This field is populated by the COSI Controller.
// +optional
// +kubebuilder:validation:XValidation:message="accessedBuckets is immutable once set",rule="oldSelf.size() == 0 || self == oldSelf"
Parameters map[string]string `json:"parameters,omitempty"`

// error holds the most recent error message, with a timestamp.
// This is cleared when provisioning is successful.
// +optional
Error *TimestampedError `json:"error,omitempty"`
Comment on lines +136 to +139

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better represented as conditions

}

// BucketClaimAccess selects a BucketClaim for access, defines access parameters for the
// corresponding bucket, and specifies where user-consumable bucket information and access
// credentials for the accessed bucket will be stored.
type BucketClaimAccess struct {
// bucketClaimName is the name of a BucketClaim the access should have permissions for.
// The BucketClaim must be in the same Namespace as the BucketAccess.
// +required
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=253
Comment on lines +149 to +150

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apply the DNS1123 subdomain feedback here too

BucketClaimName string `json:"bucketClaimName"`

// accessMode is the Read/Write access mode that the access should have for the bucket.
// Possible values: ReadWrite, ReadOnly, WriteOnly.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think you need to explain what these different choices mean?

// +required
AccessMode BucketAccessMode `json:"accessMode"`

// accessSecretName is the name of a Kubernetes Secret that COSI should create and populate with
// bucket info and access credentials for the bucket.
// The Secret is created in the same Namespace as the BucketAccess and is deleted when the
// BucketAccess is deleted and deprovisioned.
// The Secret name must be unique across all bucketClaimRefs for all BucketAccesses in the same
// Namespace.
// +required
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=253
Comment on lines +165 to +166

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apply DNS 1123 subdomain feedback here

AccessSecretName string `json:"accessSecretName"`
}

// AccessedBucket identifies a Bucket and correlates it to a BucketClaimAccess from the spec.
type AccessedBucket struct {
// bucketName is the name of a Bucket the access should have permissions for.
// +required
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=253
Comment on lines +174 to +175

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DNS 1123 subdomain feedback? Bucket here is a kube object or an external object?

BucketName string `json:"bucketName"`

// bucketClaimName must match a BucketClaimAccess's BucketClaimName from the spec.
// +required
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=253
Comment on lines +180 to +181

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DNS 1123 subdomain feedback here

BucketClaimName string `json:"bucketClaimName"`
}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:metadata:annotations="api-approved.kubernetes.io=unapproved, experimental v1alpha2 changes"

// BucketAccess is the Schema for the bucketaccesses API
type BucketAccess struct {
metav1.TypeMeta `json:",inline"`

// metadata is a standard object metadata
// +optional
metav1.ObjectMeta `json:"metadata,omitempty,omitzero"`

// spec defines the desired state of BucketAccess
// +required
Spec BucketAccessSpec `json:"spec"`

// status defines the observed state of BucketAccess
// +optional
Status BucketAccessStatus `json:"status,omitempty,omitzero"`
}

// +kubebuilder:object:root=true

// BucketAccessList contains a list of BucketAccess
type BucketAccessList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []BucketAccess `json:"items"`
}

func init() {
SchemeBuilder.Register(&BucketAccess{}, &BucketAccessList{})
}
Loading