API Reference

Packages

llmd.ai/v1alpha1

Package v1alpha1 contains API Schema definitions for the llmd v1alpha1 API group.

Resource Types

- VariantAutoscaling
- VariantAutoscalingList

ActuationStatus

ActuationStatus provides details about the actuation process and its current status.

Appears in: VariantAutoscalingStatus

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `applied` _boolean_ | Applied indicates whether the actuation was successfully applied. | | |

OptimizedAlloc

OptimizedAlloc describes the target optimized allocation for a model variant.

Appears in: VariantAutoscalingStatus

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `lastRunTime` _Time_ | LastRunTime is the timestamp of the last optimization run. | | |
| `accelerator` _string_ | Accelerator is the type of accelerator for the optimized allocation. Deprecated: this field will be removed in a future version. Use the node selector or node affinity of the scale target instead. | | |
| `numReplicas` _integer_ | NumReplicas is the number of replicas for the optimized allocation. nil means no optimization decision has been made yet. | | Minimum: 0 |
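
Because `numReplicas` may be nil, a consumer of the status has to distinguish "no decision yet" from a decision of zero replicas. The following is a minimal sketch of that check, assuming the object has been loaded into a plain Python dict (for example from JSON); the helper name `describe_decision` is hypothetical and not part of the API.

```python
from typing import Optional


def describe_decision(optimized_alloc: dict) -> str:
    """Summarize numReplicas, treating a missing or null value as undecided."""
    # A nil numReplicas shows up as an absent key or an explicit None.
    replicas: Optional[int] = optimized_alloc.get("numReplicas")
    if replicas is None:
        return "no decision yet"
    if replicas == 0:
        return "scale to zero"
    return f"scale to {replicas} replica(s)"


print(describe_decision({}))
print(describe_decision({"numReplicas": 0}))
print(describe_decision({"numReplicas": 3}))
```

Testing `is None` rather than truthiness matters here, since `0` is a valid decision and would otherwise be mistaken for "unset".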

VariantAutoscaling

VariantAutoscaling is the Schema for the variantautoscalings API. It represents the autoscaling configuration and status for a model variant.

Appears in: VariantAutoscalingList

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `llmd.ai/v1alpha1` | | |
| `kind` _string_ | `VariantAutoscaling` | | |
| `kind` _string_ | Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds | | Optional: {} |
| `apiVersion` _string_ | APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources | | Optional: {} |
| `metadata` _ObjectMeta_ | Refer to Kubernetes API documentation for fields of metadata. | | |
| `spec` _VariantAutoscalingSpec_ | Spec defines the desired state for autoscaling the model variant. | | |
| `status` _VariantAutoscalingStatus_ | Status represents the current status of autoscaling for the model variant. | | |

VariantAutoscalingConfigSpec

VariantAutoscalingConfigSpec holds the optional tuning fields for a VariantAutoscaling. It is extracted as a standalone embeddable type so that higher-level controllers (e.g. KServe) can inline it without duplicating field definitions.

Appears in: VariantAutoscalingSpec

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `variantCost` _string_ | VariantCost specifies the cost per replica for this variant (used in saturation analysis). | 10.0 | Optional: {} <br />Pattern: `^\d+(\.\d+)?$` |
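
The `variantCost` pattern accepts an unsigned decimal string with an optional fractional part. As a rough client-side pre-check, the same rule can be sketched in Python; the helper name `is_valid_variant_cost` is hypothetical, and the API server remains the authority on validation.

```python
import re

# Same shape as the documented pattern ^\d+(\.\d+)?$.
# re.ASCII restricts \d to 0-9, and fullmatch() rejects a trailing newline
# that Python's "$" would otherwise tolerate.
VARIANT_COST_RE = re.compile(r"\d+(\.\d+)?", re.ASCII)


def is_valid_variant_cost(value: str) -> bool:
    """Return True if value looks like a valid variantCost string."""
    return VARIANT_COST_RE.fullmatch(value) is not None


for candidate in ["10.0", "10", "0.5", ".5", "10.", "-1", "1e3"]:
    print(candidate, is_valid_variant_cost(candidate))
```

Note that the field is a string, so a manifest should quote the value (for example `"10.0"`) rather than rely on a numeric literal.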

VariantAutoscalingList

VariantAutoscalingList contains a list of VariantAutoscaling resources.

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `llmd.ai/v1alpha1` | | |
| `kind` _string_ | `VariantAutoscalingList` | | |
| `kind` _string_ | Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds | | Optional: {} |
| `apiVersion` _string_ | APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources | | Optional: {} |
| `metadata` _ListMeta_ | Refer to Kubernetes API documentation for fields of metadata. | | |
| `items` _VariantAutoscaling array_ | Items is the list of VariantAutoscaling resources. | | |

VariantAutoscalingSpec

VariantAutoscalingSpec defines the desired state for autoscaling a model variant.

Appears in: VariantAutoscaling

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `scaleTargetRef` _CrossVersionObjectReference_ | ScaleTargetRef references the scalable resource to manage. This follows the same pattern as HorizontalPodAutoscaler. | | Required: {} |
| `modelID` _string_ | ModelID specifies the unique identifier of the model to be autoscaled. | | MinLength: 1 <br />Required: {} |
| `minReplicas` _integer_ | MinReplicas is the lower bound on the number of replicas for this variant. A value of 0 enables scale-to-zero when the model is idle. Defaults to 1, preserving existing behavior for VAs that omit this field. | 1 | Minimum: 0 <br />Optional: {} |
| `maxReplicas` _integer_ | MaxReplicas is the upper bound on the number of replicas for this variant. The autoscaler will never scale beyond this value regardless of load. | 2 | Minimum: 1 |
| `variantCost` _string_ | VariantCost specifies the cost per replica for this variant (used in saturation analysis). | 10.0 | Optional: {} <br />Pattern: `^\d+(\.\d+)?$` |
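
To illustrate how the defaults and per-field validation rules above combine, here is a minimal sketch that fills in the documented defaults and checks the documented bounds on a spec held as a Python dict. The function names, the Deployment name `example-decode`, and the model ID `example-org/example-model` are hypothetical. The sketch checks only rules listed in the table; it assumes nothing about cross-field rules (such as `minReplicas` versus `maxReplicas`), which this reference does not document.

```python
from typing import List


def apply_spec_defaults(spec: dict) -> dict:
    """Return a copy of spec with the documented defaults filled in."""
    defaults = {"minReplicas": 1, "maxReplicas": 2, "variantCost": "10.0"}
    return {**defaults, **spec}


def validate_spec(spec: dict) -> List[str]:
    """Collect violations of the documented per-field rules."""
    errors = []
    if "scaleTargetRef" not in spec:
        errors.append("scaleTargetRef is required")
    if len(spec.get("modelID", "")) < 1:
        errors.append("modelID is required and must be non-empty")
    if spec.get("minReplicas", 1) < 0:
        errors.append("minReplicas must be >= 0")
    if spec.get("maxReplicas", 2) < 1:
        errors.append("maxReplicas must be >= 1")
    return errors


# Hypothetical spec: only the required fields are set explicitly.
spec = apply_spec_defaults({
    "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "example-decode"},
    "modelID": "example-org/example-model",
})
print(spec["minReplicas"], spec["maxReplicas"], spec["variantCost"])
print(validate_spec(spec))
```

The `scaleTargetRef` keys (`apiVersion`, `kind`, `name`) follow the CrossVersionObjectReference shape used by HorizontalPodAutoscaler.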

VariantAutoscalingStatus

VariantAutoscalingStatus represents the current status of autoscaling for a variant, including the current allocation, desired optimized allocation, and actuation status.

Appears in: VariantAutoscaling

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `desiredOptimizedAlloc` _OptimizedAlloc_ | DesiredOptimizedAlloc indicates the target optimized allocation based on autoscaling logic. | | |
| `actuation` _ActuationStatus_ | Actuation provides details about the actuation process and its current status. | | |
| `conditions` _Condition array_ | Conditions represent the latest available observations of the VariantAutoscaling's state. | | Optional: {} |
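
As a rough illustration of how the three status fields fit together, the sketch below flattens a status object (loaded as a Python dict) into a small summary. The helper name `summarize_status` and the condition type `ExampleReady` are hypothetical; this reference does not list the condition types the controller actually sets. The sketch assumes each condition carries the standard Kubernetes `type` and `status` keys.

```python
def summarize_status(status: dict) -> dict:
    """Flatten a VariantAutoscalingStatus-shaped dict into a short summary."""
    alloc = status.get("desiredOptimizedAlloc", {})
    return {
        # None means no optimization decision has been made yet.
        "desiredReplicas": alloc.get("numReplicas"),
        "applied": status.get("actuation", {}).get("applied", False),
        # Map each condition type to its status string ("True", "False", "Unknown").
        "conditions": {c["type"]: c["status"] for c in status.get("conditions", [])},
    }


# Hypothetical status payload for illustration only.
example_status = {
    "desiredOptimizedAlloc": {"numReplicas": 2},
    "actuation": {"applied": True},
    "conditions": [{"type": "ExampleReady", "status": "True"}],
}
print(summarize_status(example_status))
```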