v0.1 API Review #154

Closed · wants to merge 4 commits
45 changes: 45 additions & 0 deletions api/groupversion_info.go
@@ -0,0 +1,45 @@
/*
Copyright 2024 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

// Package v1alpha1 contains API Schema definitions for the gateway v1alpha1 API group
// +kubebuilder:object:generate=true
// +groupName=inference.networking.x-k8s.io
package v1alpha1

import (
"k8s.io/apimachinery/pkg/runtime/schema"
"sigs.k8s.io/controller-runtime/pkg/scheme"
)

var (
// GroupVersion is the group version used to register these objects
GroupVersion = schema.GroupVersion{Group: "inference.networking.x-k8s.io", Version: "v1alpha1"}

// SchemeGroupVersion is an alias to GroupVersion for client-go libraries.
// It is required by pkg/client/informers/externalversions/...
SchemeGroupVersion = GroupVersion

// SchemeBuilder is used to add go types to the GroupVersionKind scheme
SchemeBuilder = &scheme.Builder{GroupVersion: GroupVersion}

// AddToScheme adds the types in this group-version to the given scheme.
AddToScheme = SchemeBuilder.AddToScheme
)

// Resource is required by pkg/client/listers/...
func Resource(resource string) schema.GroupResource {
return GroupVersion.WithResource(resource).GroupResource()
}
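
For orientation, here is a minimal sketch of how these variables are typically consumed; the module import path is an assumption based on this repository's layout, not something this diff defines.

package main

import (
	"k8s.io/apimachinery/pkg/runtime"
	clientgoscheme "k8s.io/client-go/kubernetes/scheme"

	// Assumed import path for the package in this diff.
	v1alpha1 "sigs.k8s.io/gateway-api-inference-extension/api"
)

func main() {
	scheme := runtime.NewScheme()
	// Register core Kubernetes types first, then this group's types.
	_ = clientgoscheme.AddToScheme(scheme)
	// AddToScheme makes InferenceModel and InferencePool known to any
	// controller-runtime client or informer built from this scheme.
	_ = v1alpha1.AddToScheme(scheme)
}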
204 changes: 204 additions & 0 deletions api/inferencemodel_types.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,204 @@
/*
Copyright 2024 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package v1alpha1

import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// InferenceModel is the Schema for the InferenceModels API.
shaneutt (Member), Jan 10, 2025:

InferenceModel appears to be topical, and of particular importance, so it may be one of the first things a newcomer reads when learning these APIs. It may be beneficial to expand the documentation here to explain more thoroughly the "what" and "why" of it.

kfswain (Collaborator, Author):

Is it a cop-out to reference our site or the doc proposal that goes into a bit more detail? I could see a brief blurb being valuable, with a more detailed explanation offloaded.

kfswain (Collaborator, Author), Jan 16, 2025:

We also go into more detail in the spec; maybe it could be as simple as "a more detailed description is affixed to the InferenceModelSpec field below"?

shaneutt (Member), Jan 17, 2025:

A link may be sufficient. As far as the spec section goes, I was thinking maybe higher level, but maybe that's OK as well.

I would like to see something more in the documentation here. I trust your judgment on what that is; just please consider what a newcomer might be looking for when they come here, and try to accommodate that. Otherwise, please feel free to consider this comment resolved at your discretion.

kfswain (Collaborator, Author):

Took a stab! LMKWYT

//
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +genclient
type InferenceModel struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`

Spec InferenceModelSpec `json:"spec,omitempty"`
Status InferenceModelStatus `json:"status,omitempty"`
}

// InferenceModelList contains a list of InferenceModel.
//
// +kubebuilder:object:root=true
type InferenceModelList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []InferenceModel `json:"items"`
}

// InferenceModelSpec represents the desired state of a specific model use case. This resource is
Member:

The "model use case" language wasn't immediately clear to me here. Is it fair to just say:

Suggested change:
- // InferenceModelSpec represents the desired state of a specific model use case. This resource is
+ // InferenceModelSpec represents the desired state of an InferenceModel. This resource is

Or are we trying to make some additional distinction?

kfswain (Collaborator, Author):

I was just trying not to use a circular definition, and to clarify what an InferenceModel is intended to represent. Open to changing.

Member:

Small thing; please consider it resolved at your discretion.

kfswain (Collaborator, Author):

Adding more high-level documentation to the InferenceModel object allowed me to make the model use case / InferenceModel linkage above. Rewording.

// managed by the "Inference Workload Owner" persona.
//
// The Inference Workload Owner persona is someone that trains, verifies, and
// leverages a large language model from a model frontend, drives the lifecycle
Member:

I think we can assume anyone who's made it here understands "inference", "training", "models", etc., but might it be worth explaining more, or enumerating some examples of a "model frontend", if we're going to mention that here?

kfswain (Collaborator, Author):

Tried rewording here.

// and rollout of new versions of those models, and defines the specific
// performance and latency goals for the model. These workloads are
// expected to operate within an InferencePool sharing compute capacity with other
// InferenceModels, defined by the Inference Platform Admin.
Member:

Platform Admin is only mentioned here. Do we need some sort of description similar to what you described above for the "Inference Workload Owner" persona?

kfswain (Collaborator, Author):

Yeah, great point. I just added a section where we mention the platform admin on the InferencePool object and link to the description we provide in the proposal: https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/204/files#diff-5c9369304e66d8144be23e421c131a54c4d370853b28f4cbd95b2ce4f43101baR26

LMKWYT

//
// An InferenceModel's modelName (not the ObjectMeta name) is unique for a given InferencePool.
// If the name is reused, an error will be shown on the status of the
// InferenceModel that attempted to reuse it. The oldest InferenceModel, based on
// creation timestamp, will be selected to remain valid. In the event of a race
// condition, one will be selected at random.
type InferenceModelSpec struct {
// ModelName is the name of the model, as users set it in the "model" parameter of their requests.
// The name should be unique among the workloads that reference the same backend pool.
Member:

You've said "should" here, but is "must" meant? Do we already have a mechanism in place to enforce this?

// This is the parameter that requests will be matched on. In the future, we may
// allow matching on other request parameters. An alternative approach to supporting
// matching on other request parameters is to use a different ModelName per HTTPFilter.
// Names can be reserved without implementing an actual model in the pool.
// This can be done by specifying a target model and setting the weight to zero;
// an error will be returned specifying that no valid target model is found.
//
// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:Required
ModelName string `json:"modelName"`

// Criticality defines how important it is to serve the model compared to other models referencing the same pool.
//
// +optional
// +kubebuilder:default="Default"
Criticality *Criticality `json:"criticality,omitempty"`

// TargetModels allow multiple versions of a model for traffic splitting.
// If not specified, the target model name defaults to the modelName parameter.
// The modelName is often in reference to a LoRA adapter.
Member:

It might be worth expanding on this piece in particular in this documentation, to help future code readers.

kfswain (Collaborator, Author):

Done; added examples to more clearly explain the variability.

//
// +optional
// +kubebuilder:validation:MaxItems=10
TargetModels []TargetModel `json:"targetModels,omitempty"`
Reviewer:

Is targetModels mutable? Can it be updated to add or remove models? If that happens, are the weights recalculated?

kfswain (Collaborator, Author):

Yep, targetModels can be added/removed.

The weights do recalculate; they are all relative to one another. Link to how the weights are consumed:

func RandomWeightedDraw(model *v1alpha1.InferenceModel, seed int64) string {

Reviewer:

Then document it. IIUIC it should be something like "Target models can be added or removed, and their weights will be recalculated."


// PoolRef is a reference to the inference pool, the pool must exist in the same namespace.
//
// +kubebuilder:validation:Required
PoolRef PoolObjectReference `json:"poolRef"`
}
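
To ground the spec fields above, here is a hedged sketch of constructing an InferenceModel in this package; the adapter names, weights, and pool name are illustrative assumptions, not values from this PR.

// exampleInferenceModel is illustrative only: the names and weights below
// are assumptions, not values from this PR.
func exampleInferenceModel() *InferenceModel {
	critical := Critical // Criticality constant defined later in this file
	return &InferenceModel{
		ObjectMeta: metav1.ObjectMeta{Name: "chatbot", Namespace: "default"},
		Spec: InferenceModelSpec{
			ModelName:   "chatbot", // matched against the "model" parameter of requests
			Criticality: &critical,
			TargetModels: []TargetModel{
				{Name: "chatbot-lora-v1", Weight: 90}, // 90/(90+10) -> ~90% of traffic
				{Name: "chatbot-lora-v2", Weight: 10}, // canary gets the remainder
			},
			// Group and Kind default to this API group and "InferencePool".
			PoolRef: PoolObjectReference{Name: "llm-pool"},
		},
	}
}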

// PoolObjectReference identifies an API object within the namespace of the
// referrer.
type PoolObjectReference struct {
// Group is the group of the referent.
//
// +optional
// +kubebuilder:default="inference.networking.x-k8s.io"
// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:Pattern=`^$|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`
Group string `json:"group,omitempty"`

// Kind is kind of the referent. For example "InferencePool".
//
// +optional
// +kubebuilder:default="InferencePool"
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=63
// +kubebuilder:validation:Pattern=`^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$`
Kind string `json:"kind,omitempty"`

// Name is the name of the referent.
//
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:Required
Name string `json:"name"`
}

// Criticality defines how important it is to serve the model compared to other models.
// +kubebuilder:validation:Enum=Critical;Default;Sheddable
type Criticality string

const (
// Critical defines the highest level of criticality. Requests to this band will be shed last.
Critical Criticality = "Critical"

// Default defines the default criticality level and is more important than Sheddable but less
// important than Critical. Requests in this band will be shed before critical traffic.
Default Criticality = "Default"

// Sheddable defines the lowest level of criticality. Requests to this band will be shed before
// all other bands.
Sheddable Criticality = "Sheddable"
)

// TargetModel represents a deployed model or a LoRA adapter. The
// Name field is expected to match the name of the LoRA adapter
// (or base model) as it is registered within the model server. Inference
// Gateway assumes that the model exists on the model server and it's the
// responsibility of the user to validate a correct match. Should a model fail
// to exist at request time, the error is processed by the Inference Gateway
// and emitted on the appropriate InferenceModel object.
type TargetModel struct {
// Name is the name of the adapter or base model, as expected by the ModelServer.
//
// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:Required
Name string `json:"name"`

// Weight is used to determine the proportion of traffic that should be
// sent to this model when multiple target models are specified.
//
// Weight defines the proportion of requests forwarded to the specified
Reviewer:

One question: if one client using HTTP/2 sends 1000 requests (all pipelined over the same connection) to two models weighted 50 and 50, is the result 500 requests for each model?

Is a "request" a connection request, a token, or an HTTP request?

kfswain (Collaborator, Author):

Gateway implementations handle the actual connection (ext-proc just uses gRPC communication with the GW).

But yes, assuming equal weighting for 2 underlying models, the mathematical probability should be a 50:50 split over a large enough sample pool.

Reviewer:

From our experience with Services and affinity, I suggest you add these details as part of the API description. People may interpret it differently and then open issues about it, and eventually someone will want to modify the API to adopt a new behavior, and you end up with inconsistent implementations.

Contributor:

@robscott note that the weight godocs were ported from gateway-api, so ^ feedback should be incorporated there too.

// model. This is computed as weight/(sum of all weights in this
// TargetModels list). For non-zero values, there may be some epsilon from
// the exact proportion defined here depending on the precision an
// implementation supports. Weight is not a percentage and the sum of
// weights does not need to equal 100.
//
// If only one model is specified and it has a weight greater than 0, 100%
// of the traffic is forwarded to that model. If weight is set to 0, no
// traffic should be forwarded for this model. If unspecified, weight
// defaults to 1.
//
// +optional
// +kubebuilder:default=1
// +kubebuilder:validation:Minimum=0
// +kubebuilder:validation:Maximum=1000000
Weight int32 `json:"weight,omitempty"`
}
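
To make the weight semantics above concrete, here is a simplified sketch of a proportional draw. It is an illustration of the documented weight/(sum of weights) rule, not the repository's actual RandomWeightedDraw implementation, and it assumes "math/rand" is imported.

// weightedDraw illustrates the documented semantics: each target's expected
// share of requests is weight/(sum of all weights in the list).
func weightedDraw(targets []TargetModel, rng *rand.Rand) string {
	var total int32
	for _, t := range targets {
		total += t.Weight
	}
	if total == 0 {
		return "" // all weights zero: no traffic should be forwarded
	}
	n := rng.Int31n(total) // uniform in [0, total)
	for _, t := range targets {
		n -= t.Weight
		if n < 0 {
			return t.Name
		}
	}
	return targets[len(targets)-1].Name // defensive; unreachable when total > 0
}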

// InferenceModelStatus defines the observed state of InferenceModel
type InferenceModelStatus struct {
// Conditions track the state of the InferenceModel.
Conditions []metav1.Condition `json:"conditions,omitempty"`
}

// InferenceModelConditionType is a type of condition for the InferenceModel.
type InferenceModelConditionType string

// InferenceModelConditionReason is the reason for a given InferenceModelConditionType.
type InferenceModelConditionReason string

const (
// This condition indicates whether the model is ready for traffic or not, and why.
ModelConditionReady InferenceModelConditionType = "Ready"

// Desired state. Model is ready for serving with no conflicts or issues.
ModelReasonReady InferenceModelConditionReason = "Ready"

// This reason is used when a given ModelName already exists within the pool.
// Details about naming conflict resolution are on the ModelName field itself.
ModelReasonNameInUse InferenceModelConditionReason = "ModelNameInUse"

// This reason is the initial state, and indicates that the controller has not yet reconciled the InferenceModel.
PoolReasonPending InferenceModelConditionReason = "Pending"
)

func init() {
SchemeBuilder.Register(&InferenceModel{}, &InferenceModelList{})
}
135 changes: 135 additions & 0 deletions api/inferencepool_types.go
@@ -0,0 +1,135 @@
/*
Copyright 2024 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package v1alpha1

import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// InferencePool is the Schema for the InferencePools API.
//
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +genclient
type InferencePool struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`

Spec InferencePoolSpec `json:"spec,omitempty"`
Status InferencePoolStatus `json:"status,omitempty"`
}

// InferencePoolList contains a list of InferencePool.
//
Member:

Similar to the above, it might be nice to provide more documentation about what (and why) InferencePool is.

kfswain (Collaborator, Author):

Added more details here, LMKWYT!

// +kubebuilder:object:root=true
type InferencePoolList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []InferencePool `json:"items"`
}

// InferencePoolSpec defines the desired state of InferencePool
type InferencePoolSpec struct {
// Selector defines a map of labels used to watch model server pods
// that should be included in the InferencePool. Model servers should not
// be shared with any other Service or InferencePool; that behavior is not supported
// and will result in sub-optimal utilization.
// In some cases, implementations may translate this to a Service selector, so this matches the simple
// map used for Service selectors instead of the full Kubernetes LabelSelector type.
//
// +kubebuilder:validation:Required
Selector map[LabelKey]LabelValue `json:"selector"`

// TargetPortNumber defines the port number to access the selected model servers.
// The number must be in the range 1 to 65535.
//
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=65535
Comment on lines +56 to +59

Reviewer:

If applicable, I suggest reducing this port range to a smaller list of well-known ports that users can rely on for firewall configuration purposes. Also, don't allow overlap with other well-known ports like those used for DNS, HTTP/S, etc.

kfswain (Collaborator, Author):

Holding off on changing this one, just to gather consensus on what range we should limit it to. But I do agree with the idea.

It's possible that we could start with a small range and relax it as needed, as the other direction would be nigh impossible.

Member:

@candita this is meant to be a reference to a port number on a Pod; I can't think of any reasonable way to limit that, since Kubernetes has likely scaled far enough that there's probably at least one case of each individual port being in use across the many clusters that exist.

// +kubebuilder:validation:Required
TargetPortNumber int32 `json:"targetPortNumber"`
}

// LabelKey was originally copied from: https://github.com/kubernetes-sigs/gateway-api/blob/99a3934c6bc1ce0874f3a4c5f20cafd8977ffcb4/apis/v1/shared_types.go#L694-L731
// Duplicated so as not to take an unexpected dependency on the Gateway API.
//
// LabelKey is the key of a label. This is used for validation
// of maps. This matches the Kubernetes "qualified name" validation that is used for labels.
//
// Valid values include:
//
// * example
// * example.com
// * example.com/path
// * example.com/path.html
//
// Invalid values include:
//
// * example~ - "~" is an invalid character
// * example.com. - can not start or end with "."
//
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:Pattern=`^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?([A-Za-z0-9][-A-Za-z0-9_.]{0,61})?[A-Za-z0-9]$`
type LabelKey string

// LabelValue is the value of a label. This is used for validation
// of maps. This matches the Kubernetes label validation rules:
// * must be 63 characters or less (can be empty),
// * unless empty, must begin and end with an alphanumeric character ([a-z0-9A-Z]),
// * could contain dashes (-), underscores (_), dots (.), and alphanumerics between.
// Labels are case sensitive, so: my-label and My-Label are considered distinct.
//
// Valid values include:
//
// * MyValue
// * my.name
// * 123-my-value
//
// +kubebuilder:validation:MinLength=0
// +kubebuilder:validation:MaxLength=63
// +kubebuilder:validation:Pattern=`^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`
type LabelValue string
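
Tying the selector and port fields together, here is a hedged sketch of constructing a pool in this package; the label key/value and port number are illustrative assumptions.

// examplePool is illustrative only: the label and port are assumptions,
// not values from this PR.
func examplePool() *InferencePool {
	return &InferencePool{
		ObjectMeta: metav1.ObjectMeta{Name: "llm-pool", Namespace: "default"},
		Spec: InferencePoolSpec{
			// A simple label map, mirroring Service selectors rather than
			// the full Kubernetes LabelSelector type.
			Selector: map[LabelKey]LabelValue{
				"app": "vllm-model-server",
			},
			TargetPortNumber: 8000, // port the selected model server pods listen on
		},
	}
}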

// InferencePoolStatus defines the observed state of InferencePool
type InferencePoolStatus struct {
// Conditions track the state of the InferencePool.
Conditions []metav1.Condition `json:"conditions,omitempty"`
}

// InferencePoolConditionType is a type of condition for the InferencePool
type InferencePoolConditionType string

// InferencePoolConditionReason is the reason for a given InferencePoolConditionType
type InferencePoolConditionReason string

const (
// This condition indicates whether the pool is ready for traffic or not, and why.
PoolConditionReady InferencePoolConditionType = "Ready"

// Desired state. The pool and its components are initialized and ready for traffic.
PoolReasonReady InferencePoolConditionReason = "Ready"

// This reason is used when the EPP has not yet passed health checks, or has started failing them.
PoolReasonEPPNotHealthy InferencePoolConditionReason = "EndpointPickerNotHealthy"

// This reason is the initial state, and indicates that the controller has not yet reconciled this pool.
PoolReasonPending InferencePoolConditionReason = "Pending"
)

func init() {
SchemeBuilder.Register(&InferencePool{}, &InferencePoolList{})
}
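
And a small sketch of consuming the Ready condition defined above; using the standard apimachinery helper here is an assumption about how callers would gate traffic.

// Assumes k8s.io/apimachinery/pkg/api/meta is imported as "meta".
func poolReady(p *InferencePool) bool {
	// True only when the Ready condition exists and has Status=True.
	return meta.IsStatusConditionTrue(p.Status.Conditions, string(PoolConditionReady))
}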