v0.1 API Review #154
@@ -0,0 +1,45 @@
/*
Copyright 2024 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

// Package v1alpha1 contains API Schema definitions for the gateway v1alpha1 API group.
// +kubebuilder:object:generate=true
// +groupName=inference.networking.x-k8s.io
package v1alpha1

import (
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/scheme"
)

var (
	// GroupVersion is the group version used to register these objects.
	GroupVersion = schema.GroupVersion{Group: "inference.networking.x-k8s.io", Version: "v1alpha1"}

	// SchemeGroupVersion is an alias to GroupVersion for client-go libraries.
	// It is required by pkg/client/informers/externalversions/...
	SchemeGroupVersion = GroupVersion

	// SchemeBuilder is used to add Go types to the GroupVersionKind scheme.
	SchemeBuilder = &scheme.Builder{GroupVersion: GroupVersion}

	// AddToScheme adds the types in this group-version to the given scheme.
	AddToScheme = SchemeBuilder.AddToScheme
)

// Resource is required by pkg/client/listers/...
func Resource(resource string) schema.GroupResource {
	return GroupVersion.WithResource(resource).GroupResource()
}
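For readers unfamiliar with the client-go plumbing: `Resource` composes a group-qualified resource name from the `GroupVersion` above. A self-contained sketch of the string that `schema.GroupResource` ultimately renders; the `groupResource` type and names here are illustrative stand-ins, not the apimachinery implementation:

```go
package main

import "fmt"

// groupResource is a minimal stand-in for schema.GroupResource: its
// string form is "<resource>.<group>", or just "<resource>" when the
// group is empty (the core group).
type groupResource struct {
	Group    string
	Resource string
}

func (gr groupResource) String() string {
	if gr.Group == "" {
		return gr.Resource
	}
	return fmt.Sprintf("%s.%s", gr.Resource, gr.Group)
}

func main() {
	gr := groupResource{Group: "inference.networking.x-k8s.io", Resource: "inferencemodels"}
	fmt.Println(gr) // inferencemodels.inference.networking.x-k8s.io
}
```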
@@ -0,0 +1,202 @@
/*
Copyright 2024 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// InferenceModel is the Schema for the InferenceModels API.
//
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +genclient
type InferenceModel struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   InferenceModelSpec   `json:"spec,omitempty"`
	Status InferenceModelStatus `json:"status,omitempty"`
}

// InferenceModelList contains a list of InferenceModel.
//
// +kubebuilder:object:root=true
type InferenceModelList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []InferenceModel `json:"items"`
}
// InferenceModelSpec represents the desired state of a specific model use case. This resource is
// managed by the "Inference Workload Owner" persona.
//
// The Inference Workload Owner persona is someone who trains, verifies, and
// leverages a large language model from a model frontend, drives the lifecycle
// and rollout of new versions of those models, and defines the specific
// performance and latency goals for the model. These workloads are
// expected to operate within an InferencePool, sharing compute capacity with other
// InferenceModels, defined by the Inference Platform Admin.
//
// InferenceModel's modelName (not the ObjectMeta name) is unique for a given InferencePool;
// if the name is reused, an error will be shown on the status of the
// InferenceModel that attempted the reuse. The oldest InferenceModel, based on
// creation timestamp, will be selected to remain valid. In the event of a race
// condition, one will be selected at random.

[review] The "model use case" language wasn't immediately clear to me here. Is it fair to just say: … Or are we trying to make some additional distinction?
[reply] I was just trying not to use a circular definition, and to clarify what an InferenceModel is intended to represent. Open to changing.
[review] Small thing, please consider resolved at your discretion.

[review] I think we can assume anyone that's made it here understands "inference", "training", "models", etc., but might it be worth explaining more, or enumerating some examples of a "model frontend", if we're going to mention that here?
[reply] Tried rewording here.

[review] Platform Admin is only mentioned here. Do we need some sort of description similar to what you described above for the "Inference Workload Owner" persona?
[reply] Yeah, great point. I just added a section to the InferencePool where we mention the platform admin and link to the description we provide in the proposal: https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/204/files#diff-5c9369304e66d8144be23e421c131a54c4d370853b28f4cbd95b2ce4f43101baR26 LMKWYT
type InferenceModelSpec struct {
	// ModelName is the name of the model as users set it in the "model" parameter in their requests.
	// The name should be unique among the workloads that reference the same backend pool.
	// This is the parameter that the request will be matched on. In the future, we may
	// allow matching on other request parameters. The other approach to support matching
	// on other request parameters is to use a different ModelName per HTTPFilter.
	// Names can be reserved without implementing an actual model in the pool.
	// This can be done by specifying a target model and setting the weight to zero;
	// an error will be returned specifying that no valid target model is found.
	//
	// +kubebuilder:validation:MaxLength=256
	// +kubebuilder:validation:Required
	ModelName string `json:"modelName"`

[review] You've said "should" here, but is "must" meant? Do we already have a mechanism in place to enforce this?

	// Criticality defines how important it is to serve the model compared to other models referencing the same pool.
	// The lack of defaulting is intentional: not setting a criticality future-proofs the API without complicating it.
	//
	// +optional
	Criticality *Criticality `json:"criticality,omitempty"`

[review] I've been seeing and hearing a lot of discussion about Criticality. This doesn't mean I'm strictly against it or anything, mind you; I'm just trying to think through how this plays out in the real world. It might help me personally (since I'm very new to this project) to see some of the motivation and user stories that influenced criticality, if someone can spare a link.
[reply] There is some more recent discussion here: https://kubernetes.slack.com/archives/C071WA7R9LY/p1736906518995639 But Criticality has been a hot topic; agreed that it might not be in its final state. Since criticality is used in a load-balancing aspect, we are trying to limit options so we have something that we can guarantee to support out of the box. We expect iteration in the future as we (hopefully) increase usage. Would opening an issue about criticality to centralize conversation be acceptable?
[review] Yes, a TODO issue just to make sure the conversation remains topical and continues is a reasonable deferral at this stage, so that we can keep velocity up, test it out in its current state, and see what that teaches us. 👍
[reply] Created #213 and added a TODO in the Criticality comments.

	// TargetModels allow multiple versions of a model for traffic splitting.
	// If not specified, the target model name is defaulted to the modelName parameter.
	// modelName is often in reference to a LoRA adapter.
	//
	// +optional
	// +kubebuilder:validation:MaxItems=10
	TargetModels []TargetModel `json:"targetModels,omitempty"`

[review] It might be worth expanding on this piece in particular in this documentation to help future code readers.
[reply] Done, added examples to more clearly explain the variability.
[review] Is targetModels mutable? Can it be updated to add or remove models? If that happens, are the weights recalculated?
[reply] Yep, targetModels can be added/removed. The weights do recalculate; they are all relative to one another. Link to how the weights are consumed here:
[review] Then document it; IIUIC it should be something like …

	// PoolRef is a reference to the inference pool; the pool must exist in the same namespace.
	//
	// +kubebuilder:validation:Required
	PoolRef PoolObjectReference `json:"poolRef"`
}

// PoolObjectReference identifies an API object within the namespace of the
// referrer.
type PoolObjectReference struct {
	// Group is the group of the referent.
	//
	// +optional
	// +kubebuilder:default="inference.networking.x-k8s.io"
	// +kubebuilder:validation:MaxLength=253
	// +kubebuilder:validation:Pattern=`^$|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`
	Group string `json:"group,omitempty"`

	// Kind is the kind of the referent. For example, "InferencePool".
	//
	// +optional
	// +kubebuilder:default="InferencePool"
	// +kubebuilder:validation:MinLength=1
	// +kubebuilder:validation:MaxLength=63
	// +kubebuilder:validation:Pattern=`^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$`
	Kind string `json:"kind,omitempty"`

	// Name is the name of the referent.
	//
	// +kubebuilder:validation:MinLength=1
	// +kubebuilder:validation:MaxLength=253
	// +kubebuilder:validation:Required
	Name string `json:"name"`
}

// Criticality defines how important it is to serve the model compared to other models.
//
// +kubebuilder:validation:Enum=Critical;Default;Sheddable
type Criticality string

const (
	// Critical defines the highest level of criticality. Requests to this band will be shed last.
	Critical Criticality = "Critical"

	// Default defines the default criticality level and is more important than Sheddable but less
	// important than Critical. Requests in this band will be shed before critical traffic.
	Default Criticality = "Default"

	// Sheddable defines the lowest level of criticality. Requests to this band will be shed before
	// all other bands.
	Sheddable Criticality = "Sheddable"
)
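The shedding order implied by the three bands can be illustrated with a short sketch. The numeric priority mapping below is an assumption about how an implementation might order shedding, not part of the API; only the relative ordering (Sheddable before Default before Critical) comes from the comments above:

```go
package main

import "fmt"

type Criticality string

const (
	Critical  Criticality = "Critical"
	Default   Criticality = "Default"
	Sheddable Criticality = "Sheddable"
)

// shedPriority returns a rank where lower values are shed first:
// Sheddable before Default, Default before Critical, matching the
// per-band comments in the API. The values themselves are arbitrary.
func shedPriority(c Criticality) int {
	switch c {
	case Sheddable:
		return 0
	case Default:
		return 1
	case Critical:
		return 2
	}
	// Unset criticality is intentionally undefaulted in the API; treating
	// it like Default here is purely this sketch's assumption.
	return 1
}

func main() {
	fmt.Println(shedPriority(Sheddable) < shedPriority(Default)) // true
	fmt.Println(shedPriority(Default) < shedPriority(Critical))  // true
}
```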
// TargetModel represents a deployed model or a LoRA adapter. The
// Name field is expected to match the name of the LoRA adapter
// (or base model) as it is registered within the model server. The Inference
// Gateway assumes that the model exists on the model server, and it is the
// responsibility of the user to validate a correct match. Should a model fail
// to exist at request time, the error is processed by the Inference Gateway
// and emitted on the appropriate InferenceModel object.
type TargetModel struct {
	// Name is the name of the adapter or base model, as expected by the ModelServer.
	//
	// +kubebuilder:validation:MaxLength=253
	// +kubebuilder:validation:Required
	Name string `json:"name"`

	// Weight is used to determine the proportion of traffic that should be
	// sent to this model when multiple target models are specified.
	//
	// Weight defines the proportion of requests forwarded to the specified
	// model. This is computed as weight/(sum of all weights in this
	// TargetModels list). For non-zero values, there may be some epsilon from
	// the exact proportion defined here depending on the precision an
	// implementation supports. Weight is not a percentage and the sum of
	// weights does not need to equal 100.
	//
	// If only one model is specified and it has a weight greater than 0, 100%
	// of the traffic is forwarded to that model. If weight is set to 0, no
	// traffic should be forwarded for this model.
	//
	// +optional
	// +kubebuilder:validation:Minimum=0
	// +kubebuilder:validation:Maximum=1000000
	Weight *int32 `json:"weight,omitempty"`
}

[review] One question: if I have one client on HTTP/2 that sends 1000 requests (all pipelined over the same connection) and two models weighted 50 and 50, is the result 500 requests for each model? Is a "request" a connection request, a token, an HTTP request?
[reply] Gateway implementations handle the actual connection (ext-proc just uses gRPC communication with the GW). But yes, assuming equal weighting for two underlying models, the mathematical probability should be a 50:50 split over a large enough sample pool.
[review] From our experience with Services and affinity, I suggest you add these details as part of the API description, since people may interpret it differently and then open issues about it, and eventually someone will want to modify the API to adopt a new behavior and you end up with inconsistent implementations.
[review] @robscott note that the …

// InferenceModelStatus defines the observed state of InferenceModel.
type InferenceModelStatus struct {
	// Conditions track the state of the InferenceModel.
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}

// InferenceModelConditionType is a type of condition for the InferenceModel.
type InferenceModelConditionType string

// InferenceModelConditionReason is the reason for a given InferenceModelConditionType.
type InferenceModelConditionReason string

const (
	// This condition indicates whether the model is ready for traffic or not, and why.
	ModelConditionReady InferenceModelConditionType = "Ready"

	// Desired state. Model is ready for serving with no conflicts or issues.
	ModelReasonReady InferenceModelConditionReason = "Ready"

	// This reason is used when a given ModelName already exists within the pool.
	// Details about naming conflict resolution are on the ModelName field itself.
	ModelReasonNameInUse InferenceModelConditionReason = "ModelNameInUse"

	// This reason is the initial state, and indicates that the controller has not yet reconciled the InferenceModel.
	ModelReasonPending InferenceModelConditionReason = "Pending"
)

func init() {
	SchemeBuilder.Register(&InferenceModel{}, &InferenceModelList{})
}
@@ -0,0 +1,133 @@
/*
Copyright 2024 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// InferencePool is the Schema for the InferencePools API.
//
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +genclient
type InferencePool struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   InferencePoolSpec   `json:"spec,omitempty"`
	Status InferencePoolStatus `json:"status,omitempty"`
}

// InferencePoolList contains a list of InferencePool.
//
// +kubebuilder:object:root=true
type InferencePoolList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []InferencePool `json:"items"`
}

[review] Similar to the above, it might be nice to provide more documentation about what (and why) InferencePool is.
[reply] Added more details here, LMKWYT!

// InferencePoolSpec defines the desired state of InferencePool.
type InferencePoolSpec struct {
	// Selector defines a map of labels used to select the model server pods
	// that should be included in the InferencePool.
	// In some cases, implementations may translate this to a Service selector, so this matches the simple
	// map used for Service selectors instead of the full Kubernetes LabelSelector type.
	//
	// +kubebuilder:validation:Required
	Selector map[LabelKey]LabelValue `json:"selector"`

	// TargetPortNumber defines the port number to access the selected model servers.
	// The number must be in the range 1 to 65535.
	//
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:validation:Maximum=65535
	// +kubebuilder:validation:Required
	TargetPortNumber int32 `json:"targetPortNumber"`
}

[review] If applicable, I suggest reducing this port range to a smaller list of well-known ports that users can rely on for firewall configuration purposes. Also, don't allow overlap with other well-known ports like those used for DNS, HTTP/S, etc.
[reply] Holding off on changing this one, just to gather consensus on what range we should limit it to. But I do agree with the idea. It's possible that we could start with a small range and relax as needed, as the other direction would be nigh impossible.
[review] @candita this is meant to be a reference to a port number on a Pod; I can't think of any reasonable way to limit that, since Kubernetes has likely scaled far enough that there's probably at least one case of each individual port being in use across the many Kubernetes clusters that exist.
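Because Selector deliberately mirrors the simple map used by Service selectors, its matching semantics are plain map-subset matching: a pod is selected when its labels contain every key/value pair in the selector. A self-contained sketch with hypothetical label names (not the controller's code, which would use the apimachinery labels package):

```go
package main

import "fmt"

// selects reports whether a pod's labels satisfy every key/value pair in
// the pool's selector, i.e. Service-selector-style subset matching.
func selects(selector, podLabels map[string]string) bool {
	for k, v := range selector {
		if podLabels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	selector := map[string]string{"app": "vllm-llama2"} // hypothetical label
	fmt.Println(selects(selector, map[string]string{"app": "vllm-llama2", "pod-template-hash": "abc"})) // true
	fmt.Println(selects(selector, map[string]string{"app": "other"}))                                   // false
}
```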

// LabelKey was originally copied from: https://github.com/kubernetes-sigs/gateway-api/blob/99a3934c6bc1ce0874f3a4c5f20cafd8977ffcb4/apis/v1/shared_types.go#L694-L731
// Duplicated so as not to take an unexpected dependency on the Gateway API.
//
// LabelKey is the key of a label. This is used for validation
// of maps. This matches the Kubernetes "qualified name" validation that is used for labels.
//
// Valid values include:
//
// * example
// * example.com
// * example.com/path
// * example.com/path.html
//
// Invalid values include:
//
// * example~ - "~" is an invalid character
// * example.com. - can not start or end with "."
//
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:Pattern=`^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?([A-Za-z0-9][-A-Za-z0-9_.]{0,61})?[A-Za-z0-9]$`
type LabelKey string
// LabelValue is the value of a label. This is used for validation
// of maps. This matches the Kubernetes label validation rules:
// * must be 63 characters or less (can be empty),
// * unless empty, must begin and end with an alphanumeric character ([a-z0-9A-Z]),
// * could contain dashes (-), underscores (_), dots (.), and alphanumerics between.
//
// Labels are case sensitive, so: my-label and My-Label are considered distinct.
//
// Valid values include:
//
// * MyValue
// * my.name
// * 123-my-value
//
// +kubebuilder:validation:MinLength=0
// +kubebuilder:validation:MaxLength=63
// +kubebuilder:validation:Pattern=`^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`
type LabelValue string
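The kubebuilder markers on LabelValue are enforced by the API server at admission time, but the same constraints can be checked client-side. A sketch that applies the documented pattern and length limit; the function name is illustrative:

```go
package main

import (
	"fmt"
	"regexp"
)

// labelValueRe mirrors the kubebuilder Pattern on LabelValue above.
var labelValueRe = regexp.MustCompile(`^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`)

// validLabelValue checks the documented rules: at most 63 characters,
// empty allowed, otherwise alphanumeric at both ends with dashes,
// underscores, dots, and alphanumerics in between.
func validLabelValue(v string) bool {
	return len(v) <= 63 && labelValueRe.MatchString(v)
}

func main() {
	fmt.Println(validLabelValue("my.name"))      // true
	fmt.Println(validLabelValue("123-my-value")) // true
	fmt.Println(validLabelValue(""))             // true: empty values are allowed
	fmt.Println(validLabelValue("-leading"))     // false: must begin with an alphanumeric
}
```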

// InferencePoolStatus defines the observed state of InferencePool.
type InferencePoolStatus struct {
	// Conditions track the state of the InferencePool.
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}

// InferencePoolConditionType is a type of condition for the InferencePool.
type InferencePoolConditionType string

// InferencePoolConditionReason is the reason for a given InferencePoolConditionType.
type InferencePoolConditionReason string

const (
	// This condition indicates whether the pool is ready for traffic or not, and why.
	PoolConditionReady InferencePoolConditionType = "Ready"

	// Desired state. The pool and its components are initialized and ready for traffic.
	PoolReasonReady InferencePoolConditionReason = "Ready"

	// This reason is used when the EPP has not yet passed health checks, or has started failing them.
	PoolReasonEPPNotHealthy InferencePoolConditionReason = "EndpointPickerNotHealthy"

	// This reason is the initial state, and indicates that the controller has not yet reconciled this pool.
	PoolReasonPending InferencePoolConditionReason = "Pending"
)

func init() {
	SchemeBuilder.Register(&InferencePool{}, &InferencePoolList{})
}
[review] InferenceModel appears to be topical, and of particular importance, so it may be one of the first things a newcomer reads when learning these APIs. It may be beneficial to expand the documentation here to explain more thoroughly the "what" and "why" of it.
[reply] Is it a cop-out to reference our site or the doc proposal that goes into a bit more detail? I could see a brief blurb being valuable, with a more detailed explanation offloaded.
[reply] We also go into more detail in the spec; maybe it could be as simple as "a more detailed description is affixed to the InferenceModelSpec field below"?
[review] A link may be sufficient. As far as the spec section goes, I was thinking maybe higher level, but maybe that's OK as well. I would like to see something more in the documentation here. I trust your judgement in what that is; just please consider what a newcomer might be looking for when they come here, and try to accommodate that. Otherwise please feel free to consider this comment resolved at your discretion.
[reply] Took a stab! LMKWYT