Package v1alpha1 contains API Schema definitions for the llmd v1alpha1 API group.
ActuationStatus provides details about the actuation process and its current status.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `applied` boolean | Applied indicates whether the actuation was successfully applied. | | |

OptimizedAlloc describes the target optimized allocation for a model variant.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `lastRunTime` Time | LastRunTime is the timestamp of the last optimization run. | | |
| `accelerator` string | Accelerator is the type of accelerator for the optimized allocation. This field is deprecated and will be removed in a future version. Use node selector or node affinity from scale target instead. | | |
| `numReplicas` integer | NumReplicas is the number of replicas for the optimized allocation. nil means no optimization decision has been made yet. | | Minimum: 0 |

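
The distinction between an unset `numReplicas` and an explicit `0` is easy to miss when reading the status from a client. The following is a minimal sketch of how a consumer might interpret the field, assuming the `OptimizedAlloc` object has already been decoded into a plain Python dict; the helper name `describe_num_replicas` is hypothetical and not part of the API.

```python
def describe_num_replicas(optimized_alloc: dict) -> str:
    """Explain what the numReplicas field of a decoded OptimizedAlloc means."""
    num_replicas = optimized_alloc.get("numReplicas")
    if num_replicas is None:
        # Field absent (nil): no optimization decision has been made yet.
        return "no optimization decision yet"
    if num_replicas < 0:
        # The schema documents Minimum: 0, so a negative value is invalid.
        raise ValueError("numReplicas must be >= 0")
    if num_replicas == 0:
        # An explicit zero is a decision, not a missing value.
        return "target of zero replicas"
    return f"target of {num_replicas} replica(s)"
```

Checking `is None` rather than truthiness matters here: `if not num_replicas` would treat a deliberate target of zero the same as a missing decision.
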
VariantAutoscaling is the Schema for the variantautoscalings API. It represents the autoscaling configuration and status for a model variant.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` string | `llmd.ai/v1alpha1`. APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources | | Optional: {} |
| `kind` string | `VariantAutoscaling`. Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds | | Optional: {} |
| `metadata` ObjectMeta | Refer to Kubernetes API documentation for fields of metadata. | | |
| `spec` VariantAutoscalingSpec | Spec defines the desired state for autoscaling the model variant. | | |
| `status` VariantAutoscalingStatus | Status represents the current status of autoscaling for the model variant. | | |

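
To show how the top-level fields fit together, here is a minimal sketch that assembles a VariantAutoscaling object as a Python dict and prints it as JSON. The function name `make_variant_autoscaling`, the resource names, the model identifier, and the choice of an `apps/v1` Deployment as the scale target are all illustrative assumptions. `status` is left out on the assumption that it is reported by the controller rather than written by users.

```python
import json


def make_variant_autoscaling(name: str, namespace: str,
                             model_id: str, target_name: str) -> dict:
    """Build a VariantAutoscaling object with only the required spec fields."""
    return {
        "apiVersion": "llmd.ai/v1alpha1",
        "kind": "VariantAutoscaling",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Assumed target: a Deployment; any scalable resource could be referenced.
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": target_name,
            },
            "modelID": model_id,
        },
    }


# Hypothetical names, used for illustration only.
manifest = make_variant_autoscaling(
    "example-variant", "example-namespace",
    "example-org/example-model", "example-decode",
)
print(json.dumps(manifest, indent=2))
```
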
VariantAutoscalingConfigSpec holds the optional tuning fields for a VariantAutoscaling. It is extracted as a standalone embeddable type so that higher-level controllers (e.g. KServe) can inline it without duplicating field definitions.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `variantCost` string | VariantCost specifies the cost per replica for this variant (used in saturation analysis). | 10.0 | Optional: {} Pattern: `^\d+(\.\d+)?$` |

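
Because `variantCost` is a string constrained by a pattern rather than a numeric type, it can help to check values on the client before submitting them. The following is a minimal sketch of such a check; the helper name `is_valid_variant_cost` is hypothetical. It uses `fullmatch` with `re.ASCII` as an approximation of how the anchored pattern is assumed to behave on the server: the whole string must match, and only ASCII digits count.

```python
import re

# Same pattern as the documented validation, without the ^ and $ anchors
# because fullmatch() already requires the entire string to match.
_VARIANT_COST_RE = re.compile(r"\d+(\.\d+)?", re.ASCII)


def is_valid_variant_cost(value: str) -> bool:
    """Return True if value is a non-negative decimal such as "10" or "10.0"."""
    return _VARIANT_COST_RE.fullmatch(value) is not None
```

Under this pattern, values such as `"10"` and `"0.5"` are accepted, while `".5"`, `"10."`, `"-1"`, and `"1e3"` are rejected.
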
VariantAutoscalingList contains a list of VariantAutoscaling resources.
| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` string | `llmd.ai/v1alpha1`. APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources | | Optional: {} |
| `kind` string | `VariantAutoscalingList`. Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds | | Optional: {} |
| `metadata` ListMeta | Refer to Kubernetes API documentation for fields of metadata. | | |
| `items` VariantAutoscaling array | Items is the list of VariantAutoscaling resources. | | |

VariantAutoscalingSpec defines the desired state for autoscaling a model variant.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `scaleTargetRef` CrossVersionObjectReference | ScaleTargetRef references the scalable resource to manage. This follows the same pattern as HorizontalPodAutoscaler. | | Required: {} |
| `modelID` string | ModelID specifies the unique identifier of the model to be autoscaled. | | MinLength: 1 Required: {} |
| `minReplicas` integer | MinReplicas is the lower bound on the number of replicas for this variant. A value of 0 enables scale-to-zero when the model is idle. Defaults to 1, preserving existing behavior for VariantAutoscaling resources that omit this field. | 1 | Minimum: 0 Optional: {} |
| `maxReplicas` integer | MaxReplicas is the upper bound on the number of replicas for this variant. The autoscaler will never scale beyond this value regardless of load. | 2 | Minimum: 1 |
| `variantCost` string | VariantCost specifies the cost per replica for this variant (used in saturation analysis). | 10.0 | Optional: {} Pattern: `^\d+(\.\d+)?$` |

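
The defaults and validation rules in the table above can be mirrored in a small client-side check. This is a minimal sketch, not the server's actual admission logic: the function name `apply_defaults_and_validate` is hypothetical, and it encodes only the rules listed in the table. In particular, it does not compare `minReplicas` against `maxReplicas`, because no such rule is documented here.

```python
import re

_VARIANT_COST_RE = re.compile(r"\d+(\.\d+)?", re.ASCII)


def apply_defaults_and_validate(spec: dict) -> dict:
    """Return a copy of spec with documented defaults applied; raise ValueError if invalid."""
    out = dict(spec)
    # Documented defaults for omitted optional fields.
    out.setdefault("minReplicas", 1)
    out.setdefault("maxReplicas", 2)
    out.setdefault("variantCost", "10.0")

    if "scaleTargetRef" not in out:
        raise ValueError("scaleTargetRef is required")
    model_id = out.get("modelID")
    if not isinstance(model_id, str) or len(model_id) < 1:
        raise ValueError("modelID is required and must have length >= 1")
    if out["minReplicas"] < 0:
        raise ValueError("minReplicas must be >= 0")
    if out["maxReplicas"] < 1:
        raise ValueError("maxReplicas must be >= 1")
    cost = out["variantCost"]
    if not isinstance(cost, str) or not _VARIANT_COST_RE.fullmatch(cost):
        raise ValueError(r"variantCost must be a string matching ^\d+(\.\d+)?$")
    return out


# Hypothetical spec that opts in to scale-to-zero and relies on the other defaults.
spec = apply_defaults_and_validate({
    "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "example-decode"},
    "modelID": "example-org/example-model",
    "minReplicas": 0,
})
```

In this example the returned dict keeps `minReplicas` at 0 and fills in `maxReplicas` as 2 and `variantCost` as `"10.0"`.
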
VariantAutoscalingStatus represents the current status of autoscaling for a variant, including the current allocation, desired optimized allocation, and actuation status.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `desiredOptimizedAlloc` OptimizedAlloc | DesiredOptimizedAlloc indicates the target optimized allocation based on autoscaling logic. | | |
| `actuation` ActuationStatus | Actuation provides details about the actuation process and its current status. | | |
| `conditions` Condition array | Conditions represent the latest available observations of the VariantAutoscaling's state. | | Optional: {} |
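
The three status fields are typically read together when checking whether an autoscaling decision has taken effect. The following is a minimal sketch of such a reader, assuming the status has been decoded into a Python dict and that each condition carries `type` and `status` string fields in the usual Kubernetes condition shape. The function name `summarize_status` and the condition type `ExampleReady` are hypothetical.

```python
def summarize_status(status: dict) -> dict:
    """Collapse a decoded VariantAutoscalingStatus into a few easy-to-check values."""
    alloc = status.get("desiredOptimizedAlloc") or {}
    actuation = status.get("actuation") or {}
    conditions = status.get("conditions") or []
    return {
        # None means no optimization decision has been made yet.
        "desiredReplicas": alloc.get("numReplicas"),
        # A missing actuation block is treated as "not applied".
        "applied": bool(actuation.get("applied", False)),
        # Map each condition type to its status string.
        "conditions": {c["type"]: c["status"] for c in conditions},
    }


# Hypothetical status payload, for illustration only.
summary = summarize_status({
    "desiredOptimizedAlloc": {"numReplicas": 3},
    "actuation": {"applied": True},
    "conditions": [{"type": "ExampleReady", "status": "True"}],
})
```
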