Outline of changes from milestone 2 to 3

# Software engineering

We have a choice to make about how to develop m3 while maintaining a working m2. I can see three obvious possible approaches.

1. Use different branches. E.g., copy current m2 to branch `milestone-2` and start developing m3 on branch `main`.
2. Use different Go packages. E.g., leave `pkg/controller/dual-pods/` as it is, holding m2, and copy it into new package `pkg/controller/dual-pods-m3` and develop from there. Remember, m3 changes not only the controller but all the recipes and scripts that use it.
3. Introduce an option into the existing controller. E.g., a command line flag `--use-launcher` that takes a Boolean value.

I currently favor approach 3, because I think it will be least disruptive. I expect there may be some ongoing maintenance for m2, and I hate maintaining two copies of (more or less) the same code. Also, I think the change from m2 to m3 can be fairly smooth in terms of the code in the controller.
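As a rough illustration of approach 3, here is a minimal sketch of a Boolean `--use-launcher` flag using Go's standard `flag` package. The `options` type, the `parseOptions` helper, and the flag set name are hypothetical; the real controller may well use a different flag library and a different options structure.

```go
package main

import (
	"flag"
	"fmt"
)

// options is a hypothetical holder for controller settings.
// useLauncher selects the m3 (launcher-based) behavior when true.
type options struct {
	useLauncher bool
}

// parseOptions parses command line arguments (without the program name).
func parseOptions(args []string) (options, error) {
	var opts options
	fs := flag.NewFlagSet("dual-pods-controller", flag.ContinueOnError)
	fs.BoolVar(&opts.useLauncher, "use-launcher", false,
		"create vLLM instances through launchers (m3) instead of the m2 way")
	err := fs.Parse(args)
	return opts, err
}

func main() {
	opts, err := parseOptions([]string{"--use-launcher=true"})
	if err != nil {
		fmt.Println("bad flags:", err)
		return
	}
	if opts.useLauncher {
		fmt.Println("using the m3 launcher-based path")
	} else {
		fmt.Println("using the m2 path")
	}
}
```

Defaulting the flag to `false` in this sketch means existing m2 deployments would keep their behavior unless they opt in.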

# Changes to the dual-pods controller

## Authoritative store of binding state

In m2 this is an annotation on the server-providing Pod. I think that this can remain.

## Index into sleeping vLLM instances

In m2 this is an index maintained by the Pod informer. We can add an additional index into launched vLLM instances, in the controller's data structure. In `nodeData`, have a map from server-providing Pod name to `*launcherData`. Let `launcherData` have a map from nominal hash to last-used time. Use the nominal hash as the instance ID in the launcher. There is no need for a launcher to have multiple instances with the same nominal hash.

## Other data/logic notes

Let a launcher-based server-providing Pod have an annotation that says it is such a thing. Let the controller's handler for Pod notifications queue a reference to the Pod when notified of such a thing.

Let `launcherData` have a Boolean indicating whether its set of nominal hashes is accurate.

Let the controller use the extension of the generic controller that knows whether the initial load of objects has been processed.

When syncing a launcher-based server-providing Pod, create and populate the `launcherData` if it is not already present. If it is present but not accurate, update it to become accurate.

When it is time to create a new vLLM instance, use either the m2 code or the m3 code, as indicated by the option.

Before creating a launcher-based vLLM instance, check that every `launcherData` of the Pod has an accurate set of nominal hashes. If not, then enqueue a reference to the Pod and consider this a transient failure.

Each launcher has a personality. Create a new instance using only a launcher with the right personality.

Create a new launcher when needed. Delete an old launcher when its set of instances is empty and it has not been used for two minutes.

When the call to create or delete a launched instance fails with a networking problem, this controller considers this a transient failure and marks the `launcherData` as being uncertain about the correctness of the map of instances.
