25 changes: 24 additions & 1 deletion api/fma/v1alpha1/launcherconfig_types.go
@@ -21,11 +21,34 @@ import (
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 )
 
+// EmbeddedObjectMeta holds the subset of metav1.ObjectMeta fields that
+// we want in the CRD schema, so that strict decoding accepts them.
+type EmbeddedObjectMeta struct {
+	// Labels for organizing and categorizing objects.
+	// +optional
+	Labels map[string]string `json:"labels,omitempty"`
+
+	// Annotations for storing arbitrary non-identifying metadata.
+	// +optional
+	Annotations map[string]string `json:"annotations,omitempty"`
+}
+
+// EmbeddedPodTemplateSpec is a PodTemplateSpec whose metadata fields
+// are explicitly declared so that the CRD schema admits them.
+type EmbeddedPodTemplateSpec struct {
+	// +optional
+	Metadata EmbeddedObjectMeta `json:"metadata,omitempty"`
+
+	// Spec defines the behavior of pods created from this template.
+	// +optional
+	Spec corev1.PodSpec `json:"spec,omitempty"`
+}
+
 // LauncherConfigSpec defines the configuration to manage the nominal server-providing pod definition.
 type LauncherConfigSpec struct {
 	// PodTemplate defines the pod specification for the server-providing pod.
 	// +optional
-	PodTemplate corev1.PodTemplateSpec `json:"podTemplate,omitempty"`
+	PodTemplate EmbeddedPodTemplateSpec `json:"podTemplate,omitempty"`
 
 	// MaxSleepingInstances is the maximum number of sleeping inference engine instances allowed per launcher pod.
 	// +kubebuilder:validation:Required
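The comment on `EmbeddedObjectMeta` says its fields are declared so that strict decoding accepts them. As a rough analogy only, the sketch below uses Go's `encoding/json` with `DisallowUnknownFields` (not the Kubernetes API server's schema-based validation) to show a strict decoder rejecting a key the target type does not declare and accepting it once the field exists. The types `metaWithoutLabels` and `metaWithLabels` and the helper `strictDecode` are hypothetical names invented for this illustration; they are not part of the PR.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// metaWithoutLabels is a hypothetical type that declares no labels field.
type metaWithoutLabels struct {
	Annotations map[string]string `json:"annotations,omitempty"`
}

// metaWithLabels is a hypothetical type with the same two fields as
// EmbeddedObjectMeta in the diff above.
type metaWithLabels struct {
	Labels      map[string]string `json:"labels,omitempty"`
	Annotations map[string]string `json:"annotations,omitempty"`
}

// strictDecode decodes data into out and returns an error for any JSON key
// that the target type does not declare.
func strictDecode(data []byte, out any) error {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	return dec.Decode(out)
}

func main() {
	input := []byte(`{"labels":{"app":"launcher"}}`)

	var narrow metaWithoutLabels
	fmt.Println("undeclared field rejected:", strictDecode(input, &narrow) != nil)

	var wide metaWithLabels
	err := strictDecode(input, &wide)
	fmt.Println("declared field accepted:", err == nil, wide.Labels["app"])
}
```

The same reasoning motivates declaring `labels` and `annotations` explicitly in the Go type: the generated schema can only admit fields that the type spells out.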
46 changes: 46 additions & 0 deletions api/fma/v1alpha1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

21 changes: 16 additions & 5 deletions config/crd/fma.llm-d.ai_launcherconfigs.yaml
@@ -54,13 +54,24 @@ spec:
                 properties:
                   metadata:
                     description: |-
-                      Standard object's metadata.
-                      More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata
+                      EmbeddedObjectMeta holds the subset of metav1.ObjectMeta fields that
+                      we want in the CRD schema, so that strict decoding accepts them.
+                    properties:
+                      annotations:
+                        additionalProperties:
+                          type: string
+                        description: Annotations for storing arbitrary non-identifying
+                          metadata.
+                        type: object
+                      labels:
+                        additionalProperties:
+                          type: string
+                        description: Labels for organizing and categorizing objects.
+                        type: object
                     type: object
                   spec:
-                    description: |-
-                      Specification of the desired behavior of the pod.
-                      More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status
+                    description: Spec defines the behavior of pods created from this
+                      template.
                     properties:
                       activeDeadlineSeconds:
                         description: |-
7 changes: 5 additions & 2 deletions docs/cluster-sharing.md
@@ -29,10 +29,13 @@ object.
 
 (One design goal is to minimize chores for administrators of shared clusters.)
 
-- As development progresses, we never change the definitions in an
-  existing version of the `fma.llm-d.ai` API group; we only add new
+- As development progresses, we almost never change the definitions in
+  an existing version of the `fma.llm-d.ai` API group; we only add new
   versions. Old versions may be deleted only once we are sure there is
   no further dev/test activity using them.
+  We can make a change in an existing definition if the change does not
+  affect any current usage (e.g., add a field, change an optional thing
+  that does not appear in any existing YAML).
 
 - During development of a PR that adds a version of the API group,
   successive revisions of the PR's head branch can change the
12 changes: 9 additions & 3 deletions pkg/controller/utils/pod-helper.go
@@ -22,13 +22,16 @@ import (
 	"encoding/json"
 	"errors"
 	"fmt"
+	"maps"
 	"regexp"
 	"slices"
 
 	corev1 "k8s.io/api/core/v1"
+	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	"k8s.io/apimachinery/pkg/api/resource"
 	"k8s.io/apimachinery/pkg/util/intstr"
 
+	v1alpha1 "github.com/llm-d-incubation/llm-d-fast-model-actuation/api/fma/v1alpha1"
 	"github.com/llm-d-incubation/llm-d-fast-model-actuation/pkg/api"
 	"github.com/llm-d-incubation/llm-d-fast-model-actuation/pkg/controller/common"
 )
@@ -141,10 +144,13 @@ func IsPodReady(pod *corev1.Pod) bool {
 
 // BuildLauncherPodFromTemplate creates a launcher pod from a LauncherConfig object's
 // Spec.PodTemplate and assigns the built launcher pod to a node
-func BuildLauncherPodFromTemplate(template corev1.PodTemplateSpec, ns, nodeName, launcherConfigName string) (*corev1.Pod, error) {
+func BuildLauncherPodFromTemplate(template v1alpha1.EmbeddedPodTemplateSpec, ns, nodeName, launcherConfigName string) (*corev1.Pod, error) {
 	pod := &corev1.Pod{
-		ObjectMeta: template.ObjectMeta,
-		Spec:       *DeIndividualize(template.Spec.DeepCopy()),
+		ObjectMeta: metav1.ObjectMeta{
+			Labels:      maps.Clone(template.Metadata.Labels),
+			Annotations: maps.Clone(template.Metadata.Annotations),
+		},
Comment on lines +149 to +152
Copilot AI Apr 16, 2026

template.Metadata.Labels / Annotations are assigned directly into the Pod's ObjectMeta, but this function later mutates pod.Labels and pod.Annotations. Because maps are reference types, this will also mutate the LauncherConfig's cached spec.podTemplate.metadata maps (from the informer/lister) when users supply labels/annotations, which can cause surprising controller behavior and potential races. Clone the maps (or deep-copy the template metadata) before mutating the Pod's labels/annotations so the LauncherConfig object remains immutable in memory.

Collaborator Author

Fixed.

+		Spec: *DeIndividualize(template.Spec.DeepCopy()),
 	}
 	pod.Namespace = ns
 	pod.GenerateName = "launcher-"
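The review thread on this hunk turns on the fact that Go maps are reference types: assigning a map copies the reference, not the entries, so a later write through the Pod's labels would also change the template's map. A minimal sketch of the difference, assuming Go 1.21 or later for the standard `maps` package. The helpers `aliasLabels` and `cloneLabels` and the label keys are made up for illustration and do not appear in the PR.

```go
package main

import (
	"fmt"
	"maps"
)

// aliasLabels returns the very map it was given, so writes through the
// result are visible through the caller's map as well.
func aliasLabels(src map[string]string) map[string]string {
	return src
}

// cloneLabels returns a shallow copy, so writes through the result leave
// the caller's map untouched. maps.Clone returns nil for a nil input.
func cloneLabels(src map[string]string) map[string]string {
	return maps.Clone(src)
}

func main() {
	shared := map[string]string{"app": "launcher"}
	aliased := aliasLabels(shared)
	aliased["node"] = "n1" // also lands in shared
	fmt.Println("entries after aliased write:", len(shared))

	isolated := map[string]string{"app": "launcher"}
	cloned := cloneLabels(isolated)
	cloned["node"] = "n1" // lands only in the copy
	fmt.Println("entries after cloned write:", len(isolated))
}
```

Because `maps.Clone` preserves nil, code that writes into the cloned map still has to allocate one first when the template supplied no labels or annotations; writing to a nil map panics.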
4 changes: 4 additions & 0 deletions pkg/generated/applyconfiguration/utils.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

13 changes: 10 additions & 3 deletions test/e2e/deploy_fma.sh
@@ -89,8 +89,15 @@ CRD_NAMES=""
 for crd_file in config/crd/*.yaml; do
     crd_name=$(kubectl apply --dry-run=client -f "$crd_file" -o jsonpath='{.metadata.name}')
     CRD_NAMES="$CRD_NAMES $crd_name"
-    if kubectl get crd "$crd_name" &>/dev/null; then
-        echo " CRD $crd_name already exists, skipping"
+    live_spec=$(kubectl get crd "$crd_name" -o jsonpath='{.spec}' 2>/dev/null) || live_spec=""
+    if [ -n "$live_spec" ]; then
+        desired_spec=$(kubectl apply --dry-run=client -f "$crd_file" -o jsonpath='{.spec}')
+        if [ "$live_spec" = "$desired_spec" ]; then
+            echo " CRD $crd_name already exists with matching spec, skipping"
+        else
+            echo " CRD $crd_name exists but spec differs, updating"
+            kubectl apply --server-side -f "$crd_file"
+        fi
     else
         echo " Applying $crd_file ($crd_name)"
         kubectl apply --server-side -f "$crd_file"
@@ -178,7 +185,7 @@ helm upgrade --install "$FMA_CHART_INSTANCE_NAME" charts/fma-controllers \
 
 step "Wait for controllers to be ready"
 
-kubectl wait --for=condition=available --timeout=120s \
+kubectl wait --for=condition=available --timeout=180s \
     deployment "${FMA_CHART_INSTANCE_NAME}-dual-pods-controller" -n "$FMA_NAMESPACE"
 kubectl wait --for=condition=available --timeout=120s \
     deployment "${FMA_CHART_INSTANCE_NAME}-launcher-populator" -n "$FMA_NAMESPACE"
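The first hunk above replaces "skip if the CRD exists" with a three-way decision: apply when the CRD is absent, skip when the live and desired specs match, and update otherwise. The sketch below restates that decision in Go, assuming both specs are available as JSON text. It deliberately swaps the script's string equality for a structural comparison, so differences in key order or whitespace alone do not count as a change. The function name `decide` and the sample inputs are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// decide is a hypothetical helper that mirrors the script's branches.
// An empty liveSpec stands for "the CRD does not exist yet".
func decide(liveSpec, desiredSpec string) (string, error) {
	if liveSpec == "" {
		return "apply", nil
	}
	var live, desired any
	if err := json.Unmarshal([]byte(liveSpec), &live); err != nil {
		return "", fmt.Errorf("parse live spec: %w", err)
	}
	if err := json.Unmarshal([]byte(desiredSpec), &desired); err != nil {
		return "", fmt.Errorf("parse desired spec: %w", err)
	}
	// Compare decoded values rather than raw bytes.
	if reflect.DeepEqual(live, desired) {
		return "skip", nil
	}
	return "update", nil
}

func main() {
	cases := [][2]string{
		{"", `{"group":"fma.llm-d.ai"}`},
		{`{"group":"fma.llm-d.ai","scope":"Namespaced"}`, `{"scope":"Namespaced", "group":"fma.llm-d.ai"}`},
		{`{"scope":"Namespaced"}`, `{"scope":"Cluster"}`},
	}
	for _, c := range cases {
		action, err := decide(c[0], c[1])
		fmt.Println(action, err)
	}
}
```

Either form of comparison can report a difference when the live object carries fields that the client-side dry run does not produce; in that case the script falls through to the update branch and re-applies the file.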